# Core Module

## gigaspatial.core

### io

#### adls_data_store

##### ADLSDataStore

Bases: DataStore

An implementation of DataStore for Azure Data Lake Storage.

Source code in gigaspatial/core/io/adls_data_store.py
###### __init__(container=config.ADLS_CONTAINER_NAME, connection_string=config.ADLS_CONNECTION_STRING)

Create a new instance of ADLSDataStore.

:param container: The name of the container in ADLS to interact with.

Source code in gigaspatial/core/io/adls_data_store.py
###### copy_directory(source_dir, destination_dir)

Copies all files from a source directory to a destination directory within the same container.

:param source_dir: The source directory path in the blob storage
:param destination_dir: The destination directory path in the blob storage

Source code in gigaspatial/core/io/adls_data_store.py
###### download_directory(blob_dir_path, local_dir_path)

Downloads all files from a directory in Azure Blob Storage to a local directory.

Source code in gigaspatial/core/io/adls_data_store.py
###### get_file_metadata(path)

Retrieve comprehensive file metadata.

:param path: File path in blob storage
:return: File metadata dictionary

Source code in gigaspatial/core/io/adls_data_store.py
###### mkdir(path, exist_ok=False)

Create a directory in Azure Blob Storage. In ADLS, directories are conceptual and are created by adding a placeholder blob.

:param path: Path of the directory to create
:param exist_ok: If False, raise an error if the directory already exists

Source code in gigaspatial/core/io/adls_data_store.py
###### open(path, mode='r')

Context manager for file operations with enhanced mode support.

:param path: File path in blob storage
:param mode: File open mode ('r', 'rb', 'w', 'wb')

Source code in gigaspatial/core/io/adls_data_store.py
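The mode handling above (text vs. binary, read vs. write) can be illustrated with a minimal in-memory sketch. `InMemoryStore` is a hypothetical stand-in for a blob backend, not the ADLS implementation; it only shows how a store's `open()` can route the four modes through `StringIO`/`BytesIO`.

```python
from contextlib import contextmanager
from io import BytesIO, StringIO

class InMemoryStore:
    """Hypothetical stand-in for a blob store, keyed by path -> bytes."""

    def __init__(self):
        self._blobs = {}

    @contextmanager
    def open(self, path, mode="r"):
        if mode in ("r", "rb"):
            raw = self._blobs[path]  # KeyError models a missing blob
            yield StringIO(raw.decode("utf-8")) if mode == "r" else BytesIO(raw)
        elif mode in ("w", "wb"):
            buf = StringIO() if mode == "w" else BytesIO()
            yield buf
            data = buf.getvalue()
            self._blobs[path] = data.encode("utf-8") if mode == "w" else data
        else:
            raise ValueError(f"Unsupported mode: {mode}")

store = InMemoryStore()
with store.open("a.txt", "w") as f:
    f.write("hello")
with store.open("a.txt", "rb") as f:
    content = f.read()  # b"hello"
```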
###### read_file(path, encoding=None)

Read file with flexible encoding support.

:param path: Path to the file in blob storage
:param encoding: File encoding (optional)
:return: File contents as string or bytes

Source code in gigaspatial/core/io/adls_data_store.py
###### upload_directory(dir_path, blob_dir_path)

Uploads all files from a directory to Azure Blob Storage.

Source code in gigaspatial/core/io/adls_data_store.py

###### upload_file(file_path, blob_path)

Uploads a single file to Azure Blob Storage.

Source code in gigaspatial/core/io/adls_data_store.py

###### write_file(path, data)

Write file with support for content type and improved type handling.

:param path: Destination path in blob storage
:param data: File contents

Source code in gigaspatial/core/io/adls_data_store.py
#### data_api

##### GigaDataAPI

Source code in gigaspatial/core/io/data_api.py
###### __init__(profile_file=config.API_PROFILE_FILE_PATH, share_name=config.API_SHARE_NAME, schema_name=config.API_SCHEMA_NAME)

Initialize the GigaDataAPI class with the profile file, share name, and schema name.

profile_file: Path to the delta-sharing profile file.
share_name: Name of the share (e.g., "gold").
schema_name: Name of the schema (e.g., "school-master").

Source code in gigaspatial/core/io/data_api.py
###### get_all_cached_data_as_dict()

Retrieve all cached data in a dictionary format, where each key is a country code and the value is that country's DataFrame.

###### get_all_cached_data_as_json()

Retrieve all cached data in a JSON-like format. Each country is represented as a key, and the value is a list of records (i.e., the DataFrame's to_dict(orient='records') format).

Source code in gigaspatial/core/io/data_api.py
###### get_country_list(sort=True)

Retrieve a list of available countries in the dataset.

:param sort: Whether to sort the country list alphabetically (default is True).

Source code in gigaspatial/core/io/data_api.py
###### get_country_metadata(country)

Retrieve metadata (e.g., column names and data types) for a country's dataset.

country: The country code (e.g., "MWI").

Source code in gigaspatial/core/io/data_api.py

###### load_country_data(country, filters=None, use_cache=True)

Load the dataset for the specified country with optional filtering and caching.

country: The country code (e.g., "MWI").
filters: A dictionary with column names as keys and filter values as values.
use_cache: Whether to use cached data if available (default is True).

Source code in gigaspatial/core/io/data_api.py

###### load_multiple_countries(countries)

Load data for multiple countries and combine them into a single DataFrame.

countries: A list of country codes.

Source code in gigaspatial/core/io/data_api.py
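The `filters` argument of `load_country_data` maps column names to required values. As a rough sketch of that idea applied to plain records (a hypothetical helper, not the library's implementation, and the scalar-or-list semantics here are an assumption):

```python
def apply_filters(records, filters):
    """Keep records whose values match every column filter.
    A filter value is treated here as a scalar, or a list/set/tuple of
    acceptable values (an assumed convention for this sketch)."""
    def matches(rec, col, cond):
        if isinstance(cond, (list, set, tuple)):
            return rec.get(col) in cond
        return rec.get(col) == cond
    return [r for r in records
            if all(matches(r, c, v) for c, v in (filters or {}).items())]

rows = [
    {"school_id": 1, "admin1": "Central", "connected": True},
    {"school_id": 2, "admin1": "Northern", "connected": False},
]
filtered = apply_filters(rows, {"admin1": "Central"})
# filtered -> [{"school_id": 1, "admin1": "Central", "connected": True}]
```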
#### data_store

##### DataStore

Bases: ABC

Abstract base class defining the interface for data store implementations. This class serves as a parent for both local and cloud-based storage solutions.

Source code in gigaspatial/core/io/data_store.py
###### file_exists(path) *(abstractmethod)*

Check if a file exists in the data store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to check | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if file exists, False otherwise |

###### is_dir(path) *(abstractmethod)*

Check if path points to a directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to check | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if path is a directory, False otherwise |

###### is_file(path) *(abstractmethod)*

Check if path points to a file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to check | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if path is a file, False otherwise |

###### list_files(path) *(abstractmethod)*

List all files in a directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Directory path to list | required |

Returns:

| Type | Description |
| --- | --- |
| List[str] | List of file paths in the directory |
###### open(file, mode='r') *(abstractmethod)*

Context manager for file operations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file | str | Path to the file | required |
| mode | str | File mode ('r', 'w', 'rb', 'wb') | 'r' |

Yields:

| Type | Description |
| --- | --- |
| Union[str, bytes] | File-like object |

Raises:

| Type | Description |
| --- | --- |
| IOError | If file cannot be opened |

Source code in gigaspatial/core/io/data_store.py

###### read_file(path) *(abstractmethod)*

Read contents of a file from the data store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to the file to read | required |

Returns:

| Type | Description |
| --- | --- |
| Any | Contents of the file |

Raises:

| Type | Description |
| --- | --- |
| IOError | If file cannot be read |

Source code in gigaspatial/core/io/data_store.py
###### remove(path) *(abstractmethod)*

Remove a file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path to the file to remove | required |

Raises:

| Type | Description |
| --- | --- |
| IOError | If file cannot be removed |

###### rmdir(dir) *(abstractmethod)*

Remove a directory and all its contents.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dir | str | Path to the directory to remove | required |

Raises:

| Type | Description |
| --- | --- |
| IOError | If directory cannot be removed |

###### walk(top) *(abstractmethod)*

Walk through directory tree, similar to os.walk().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| top | str | Starting directory for the walk | required |

Returns:

| Type | Description |
| --- | --- |
| Generator | Generator yielding tuples of (dirpath, dirnames, filenames) |

Source code in gigaspatial/core/io/data_store.py
###### write_file(path, data) *(abstractmethod)*

Write data to a file in the data store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | Path where to write the file | required |
| data | Any | Data to write to the file | required |

Raises:

| Type | Description |
| --- | --- |
| IOError | If file cannot be written |

Source code in gigaspatial/core/io/data_store.py
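To make the contract concrete, here is a minimal sketch of the interface over an in-memory dict. It is duck-typed for brevity; a real implementation would subclass the `DataStore` ABC and also provide `open`, `walk`, `remove`, `rmdir`, and `mkdir`.

```python
class DictDataStore:
    """Sketch of the DataStore interface backed by a path -> bytes dict.
    Hypothetical illustration only; not one of the library's stores."""

    def __init__(self):
        self._files = {}

    def write_file(self, path, data):
        self._files[path] = data if isinstance(data, bytes) else data.encode("utf-8")

    def read_file(self, path):
        if path not in self._files:
            raise IOError(f"No such file: {path}")
        return self._files[path]

    def file_exists(self, path):
        return path in self._files

    def is_file(self, path):
        return path in self._files

    def is_dir(self, path):
        # Directories are implicit: any stored path under the prefix
        prefix = path.rstrip("/") + "/"
        return any(p.startswith(prefix) for p in self._files)

    def list_files(self, path):
        prefix = path.rstrip("/") + "/"
        return [p for p in self._files if p.startswith(prefix)]

store = DictDataStore()
store.write_file("data/raw/a.csv", "x,y\n1,2")
exists = store.file_exists("data/raw/a.csv")   # True
listing = store.list_files("data/raw")          # ["data/raw/a.csv"]
```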
#### local_data_store

##### LocalDataStore

Bases: DataStore

Implementation for local filesystem storage.

Source code in gigaspatial/core/io/local_data_store.py
#### readers

##### read_dataset(data_store, path, compression=None, **kwargs)

Read data from various file formats stored in both local and cloud-based storage.

Parameters:

data_store : DataStore
    Instance of DataStore for accessing data storage.
path : str or Path
    Path to the file in data storage.
**kwargs : dict
    Additional arguments passed to the specific reader function.

Returns:

pandas.DataFrame or geopandas.GeoDataFrame
    The data read from the file.

Raises:

FileNotFoundError
    If the file doesn't exist in blob storage.
ValueError
    If the file type is unsupported or if there's an error reading the file.

Source code in gigaspatial/core/io/readers.py
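`read_dataset` selects a reader based on the file extension. The dispatch pattern can be sketched as below; `read_records` is a hypothetical stand-in handling only `.csv` and `.json`, whereas the real function supports many more formats and returns (Geo)DataFrames.

```python
import csv
import io
import json

def read_records(path, raw_bytes):
    """Extension-based dispatch, analogous in spirit to read_dataset.
    Hypothetical sketch: returns plain Python records, not DataFrames."""
    if path.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(raw_bytes.decode("utf-8"))))
    if path.endswith(".json"):
        return json.loads(raw_bytes.decode("utf-8"))
    raise ValueError(f"Unsupported file type: {path}")

rows = read_records("points.csv", b"x,y\n1,2\n")
# rows -> [{"x": "1", "y": "2"}]
```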
##### read_datasets(data_store, paths, **kwargs)

Read multiple datasets from data storage at once.

Parameters:

data_store : DataStore
    Instance of DataStore for accessing data storage.
paths : list of str
    Paths to files in data storage.
**kwargs : dict
    Additional arguments passed to read_dataset.

Returns:

dict
    Dictionary mapping paths to their corresponding DataFrames/GeoDataFrames.

Source code in gigaspatial/core/io/readers.py
##### read_gzipped_json_or_csv(file_path, data_store)

Reads a gzipped file, attempting to parse it as line-delimited JSON (lines=True) first and falling back to CSV.

Source code in gigaspatial/core/io/readers.py
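The try-JSON-lines-then-CSV strategy can be sketched as follows. `parse_gzipped` is a hypothetical helper working on raw bytes with stdlib parsers, whereas the real function reads through a DataStore and returns a DataFrame.

```python
import csv
import gzip
import io
import json

def parse_gzipped(raw):
    """Decompress, then try line-delimited JSON; on failure, fall back to CSV.
    Hypothetical sketch of read_gzipped_json_or_csv's strategy."""
    text = gzip.decompress(raw).decode("utf-8")
    try:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    except json.JSONDecodeError:
        return list(csv.DictReader(io.StringIO(text)))

payload = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
parsed = parse_gzipped(payload)  # [{"id": 1}, {"id": 2}]
```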
##### read_kmz(file_obj, **kwargs)

Helper function to read KMZ files and return a GeoDataFrame.

Source code in gigaspatial/core/io/readers.py
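A KMZ file is a ZIP archive whose main document is a KML file (conventionally `doc.kml`). A minimal sketch of the extraction step, assuming the first `.kml` member is the document of interest; the library's `read_kmz` goes on to parse the KML into a GeoDataFrame.

```python
import io
import zipfile

def extract_kml(kmz_bytes):
    """Pull the KML document bytes out of a KMZ (ZIP) archive.
    Hypothetical helper illustrating the first step of KMZ reading."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as zf:
        kml_names = [n for n in zf.namelist() if n.lower().endswith(".kml")]
        if not kml_names:
            raise ValueError("No KML document found in KMZ archive")
        return zf.read(kml_names[0])

# Build a tiny KMZ in memory and round-trip it
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("doc.kml", "<kml></kml>")
kml = extract_kml(buf.getvalue())  # b"<kml></kml>"
```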
#### writers

##### write_dataset(data, data_store, path, **kwargs)

Write a DataFrame or GeoDataFrame to various file formats in data storage.

Parameters:

data : pandas.DataFrame or geopandas.GeoDataFrame
    The data to write to data storage.
data_store : DataStore
    Instance of DataStore for accessing data storage.
path : str
    Path where the file will be written in data storage.
**kwargs : dict
    Additional arguments passed to the specific writer function.

Raises:

ValueError
    If the file type is unsupported or if there's an error writing the file.
TypeError
    If input data is not a DataFrame or GeoDataFrame.

Source code in gigaspatial/core/io/writers.py

##### write_datasets(data_dict, data_store, **kwargs)

Write multiple datasets to data storage at once.

Parameters:

data_dict : dict
    Dictionary mapping paths to DataFrames/GeoDataFrames.
data_store : DataStore
    Instance of DataStore for accessing data storage.
**kwargs : dict
    Additional arguments passed to write_dataset.

Raises:

ValueError
    If there are any errors writing the datasets.

Source code in gigaspatial/core/io/writers.py
### schemas

#### entity

##### BaseGigaEntity

Bases: BaseModel

Base class for all Giga entities with common fields.

Source code in gigaspatial/core/schemas/entity.py

###### id: str *(property)*

Abstract property that must be implemented by subclasses.
##### EntityTable

Bases: BaseModel, Generic[E]

Source code in gigaspatial/core/schemas/entity.py
###### clear_cache()

###### filter_by_admin1(admin1_id_giga)

Filter entities by primary administrative division.

###### filter_by_admin2(admin2_id_giga)

Filter entities by secondary administrative division.

###### filter_by_bounds(min_lat, max_lat, min_lon, max_lon)

Filter entities whose coordinates fall within the given bounds.

Source code in gigaspatial/core/schemas/entity.py
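The bounding-box test can be sketched over plain records with latitude/longitude fields; `filter_by_bounds` below is a hypothetical free function mirroring what `EntityTable` does over its entities (field names are assumptions of this sketch).

```python
def filter_by_bounds(records, min_lat, max_lat, min_lon, max_lon):
    """Keep records whose (latitude, longitude) fall inside the box.
    Hypothetical helper mirroring EntityTable.filter_by_bounds."""
    return [
        r for r in records
        if min_lat <= r["latitude"] <= max_lat
        and min_lon <= r["longitude"] <= max_lon
    ]

schools = [
    {"id": "a", "latitude": -13.96, "longitude": 33.79},  # inside the box
    {"id": "b", "latitude": 47.50, "longitude": 19.04},   # outside the box
]
in_box = filter_by_bounds(schools, -17.2, -9.3, 32.6, 35.9)
# in_box keeps only record "a"
```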
###### filter_by_polygon(polygon)

Filter entities within a polygon.

Source code in gigaspatial/core/schemas/entity.py
###### from_file(file_path, entity_class, data_store=None, **kwargs) *(classmethod)*

Create an EntityTable instance from a file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | Union[str, Path] | Path to the dataset file | required |
| entity_class | Type[E] | The entity class for validation | required |

Returns:

| Type | Description |
| --- | --- |
| EntityTable | EntityTable instance |

Raises:

| Type | Description |
| --- | --- |
| ValidationError | If any row fails validation |
| FileNotFoundError | If the file doesn't exist |

Source code in gigaspatial/core/schemas/entity.py
###### get_lat_array()

Get an array of latitude values.

###### get_lon_array()

Get an array of longitude values.

###### get_nearest_neighbors(lat, lon, k=5)

Find the k nearest neighbors to a point using a cached KDTree.

Source code in gigaspatial/core/schemas/entity.py
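A KDTree query returns the k points closest to the query point; a brute-force equivalent makes the semantics clear. This sketch sorts by Euclidean distance in degree space, whereas the real method builds and caches a KDTree so repeated queries avoid the full scan.

```python
import math

def nearest_neighbors(points, lat, lon, k=5):
    """Brute-force equivalent of a KDTree query: the k (lat, lon) pairs
    closest to the query point. Hypothetical helper for illustration."""
    return sorted(points, key=lambda p: math.hypot(p[0] - lat, p[1] - lon))[:k]

pts = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
closest = nearest_neighbors(pts, 0.2, 0.1, k=2)
# closest -> [(0.0, 0.0), (1.0, 1.0)]
```

Note that distances in raw degrees are only a local approximation; over large extents a projected CRS or great-circle distance would be more appropriate.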
###### to_coordinate_vector()

Transforms the entity table into a numpy vector of coordinates.

Source code in gigaspatial/core/schemas/entity.py

###### to_dataframe()

###### to_file(file_path, data_store=None, **kwargs)

Save the entity data to a file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | Union[str, Path] | Path to save the file | required |

Source code in gigaspatial/core/schemas/entity.py
###### to_geodataframe()

Convert the entity table to a GeoDataFrame.

Source code in gigaspatial/core/schemas/entity.py

##### GigaEntity

Bases: BaseGigaEntity

Entity with location data.