Configuration Guide¶
GigaSpatial uses a unified configuration system managed by Pydantic. Configuration can be handled through Environment Variables or a .env file located in your project root.
Core Configuration Logic¶
The gigaspatial.config module provides a singleton config object that manages all system settings.
Path Management & Data Tiers¶
GigaSpatial organizes data into a hierarchical "Medallion Architecture" (Bronze, Silver, Gold). All paths are resolved relative to a root data directory.
| Variable | Description | Default |
|---|---|---|
ROOT_DATA_DIR | The base directory for all library data. | . |
BRONZE_DIR | Root for raw, source-aligned data. | bronze |
SILVER_DIR | Root for processed, cleaned data. | silver |
GOLD_DIR | Root for finalized, aggregated data. | gold |
CACHE_DIR | Base for temporary/cached files. | cache |
ADMIN_BOUNDARIES_DIR | Root for administrative boundary files. | Required for some tasks |
[!NOTE] GigaSpatial automatically constructs subpaths for specific handlers. For example, if
ROOT_DATA_DIR=/data, the path for raw WorldPop data will resolve to/data/bronze/worldpop.
Storage Backends¶
GigaSpatial supports multiple storage backends via the DataStore abstraction.
1. Local Storage (Default)¶
If no specialized environment variables are set, GigaSpatial defaults to using your local file system as the primary storage.
2. Azure Data Lake Storage (ADLS)¶
To use ADLS, install the optional dependencies: pip install "giga-spatial[azure]".
| Variable | Requirement | Description |
|---|---|---|
ADLS_CONTAINER_NAME | Required | The name of your Azure container. |
ADLS_CONNECTION_STRING | Optional* | Full connection string for authentication. |
ADLS_ACCOUNT_URL | Optional* | The URL of your storage account. |
ADLS_SAS_TOKEN | Optional* | SAS token (use with ADLS_ACCOUNT_URL). |
[!IMPORTANT] You must provide either
ADLS_CONNECTION_STRINGOR bothADLS_ACCOUNT_URLandADLS_SAS_TOKEN.
3. Snowflake Internal Stages¶
To use Snowflake, install the optional dependencies: pip install "giga-spatial[snowflake]".
| Variable | Description | Example |
|---|---|---|
SNOWFLAKE_ACCOUNT | Account identifier | xy12345.east-us-2.azure |
SNOWFLAKE_USER | Username | JSOW |
SNOWFLAKE_PASSWORD | Password | ******** |
SNOWFLAKE_WAREHOUSE | Compute warehouse | COMPUTE_WH |
SNOWFLAKE_DATABASE | Target database | GIGA_DB |
SNOWFLAKE_SCHEMA | Target schema | RAW |
SNOWFLAKE_STAGE_NAME | Internal stage for file storage | GIGA_STAGE |
External API Keys¶
Many data handlers require external credentials to fetch data.
Google Earth Engine (GEE)¶
Used by the GEE handler and zonal statistics modules. - GOOGLE_CLOUD_PROJECT: Your GCP Project ID. - GOOGLE_SERVICE_ACCOUNT: Service account email. - GOOGLE_SERVICE_ACCOUNT_KEY_PATH: Local path to the .json service account key.
OpenStreetMap (OSM)¶
By default, GigaSpatial uses a public instance of the Overpass API (http://overpass-api.de/api/interpreter). - No keys are required for basic usage. - If you use a private instance, you can override the base_url parameter in the OSMLocationFetcher constructor.
Ookla Speedtest¶
GigaSpatial accesses Ookla datasets via their public AWS S3 bucket. - No AWS credentials or API keys are required for standard access.
Global Master API Key Table¶
| Variable | Service | Use Case |
|---|---|---|
MAPBOX_ACCESS_TOKEN | Mapbox | Satellite imagery fetching |
MAXAR_API_KEY | Maxar | High-res satellite imagery |
EARTHDATA_USERNAME | NASA | SRTM / Elevation data |
EARTHDATA_PASSWORD | NASA | (Pair with username) |
OPENCELLID_ACCESS_TOKEN | OpenCellID | Cell tower location database |
HEALTHSITES_API_KEY | Healthsites.io | Global health facility data |
GEOREPO_API_KEY | GeoRepo | UNICEF Administrative boundaries |
GEOREPO_USER_EMAIL | GeoRepo | (Pair with API Key) |
GIGA_SCHOOL_LOCATION_API_KEY | Giga | School locations API |
GIGA_SCHOOL_PROFILE_API_KEY | Giga | School profiles API |
GIGA_SCHOOL_MEASUREMENTS_API_KEY | Giga | Connectivity measurements API |
Example .env File¶
# General
ROOT_DATA_DIR=/home/user/giga_data
# Azure Data Lake
ADLS_CONTAINER_NAME=mycontainer
ADLS_CONNECTION_STRING='DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;'
# Google Earth Engine
GOOGLE_CLOUD_PROJECT=giga-spatial-project
GOOGLE_SERVICE_ACCOUNT=service-account@giga-spatial.iam.gserviceaccount.com
GOOGLE_SERVICE_ACCOUNT_KEY_PATH=/secrets/gcp-key.json
# Specialized APIs
OPENCELLID_ACCESS_TOKEN=pk.some_token_here
MAPBOX_ACCESS_TOKEN=sk.another_token_here