Changelog
All notable changes to this project will be documented in this file.
[v0.9.3] - 2026-04-16
Changed
- DataStore-Aware Reading API (`gigaspatial/core/io/readers.py`)
  - Refactored `read_json`, `read_dataset`, and `read_datasets` to prioritize the file path as the first argument, making `data_store` an optional second argument that defaults to `LocalDataStore()` if not provided. This simplifies usage for local file operations: `df = read_dataset("data.shp")`.
  - Updated `read_gzipped_json_or_csv` to also support an optional `data_store`.
  - Systematically updated all downstream callers in `BaseHandler`, `AdminBoundaries`, `EntityTable`, `OpenCellIDReader`, and `HDXReader` to conform to the new argument order.
- `map_wp_pop`: Automatic Project-Specific Statistics (`gigaspatial/generators/`)
  - Implemented automatic statistic selection for the WorldPop population mapping method in both `PoiViewGenerator` and `ZonalViewGenerator`:
    - `degree_of_urbanization`: Automatically uses `"median"` (for categorical class mapping).
    - `pop` / `age_structures`: Automatically uses `"sum"` (for population count mapping).
  - This ensures consistent and mathematically correct aggregation defaults across different WorldPop datasets.
- TransmissionNodeSchema Update (`gigaspatial/core/schemas/transmission_node.py`)
  - Added `is_logical_node` (`Optional[bool]`) field to track whether a site hosts active transmission equipment.
  - Implemented `_normalize_transmission_medium` in `TransmissionNodeProcessor` to standardize physical medium values (e.g., mapping "fibre" to "fiber").
  - Integrated `BACKHAUL_ALIAS_MAP` from the `CellTower` schema to ensure consistent normalization of shared media types (Fiber, Microwave, Satellite) across infrastructure entities.
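The per-project defaults listed for `map_wp_pop` above can be pictured as a small lookup table. This is an illustrative sketch only, not the library's code: `DEFAULT_STATISTIC` and `resolve_statistic` are hypothetical names, and letting an explicit `stat` argument override the default is an assumption.

```python
# Default zonal statistic per WorldPop project, as listed in the entry above.
DEFAULT_STATISTIC = {
    "degree_of_urbanization": "median",  # categorical class values
    "pop": "sum",                        # population counts
    "age_structures": "sum",             # population counts
}


def resolve_statistic(project, stat=None):
    """Return an explicit statistic if given (assumed behavior), else the project default."""
    if stat is not None:
        return stat
    try:
        return DEFAULT_STATISTIC[project]
    except KeyError as e:
        raise ValueError(f"No default statistic for project {project!r}") from e
```

Summing class codes would produce meaningless values, which is why a categorical product needs a different default than a count raster.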
Fixed
- `EntityProcessor._drop_duplicates`: Robust handling of unhashable columns (`gigaspatial/processing/entity_processor.py`)
  - Resolved a `TypeError` that occurred when deduplicating DataFrames containing unhashable types (e.g., `set`, `list`, `dict`) in non-geometry columns.
  - Updated the deduplication pipeline to dynamically inspect `object` dtype columns and exclude those containing unhashable values from the comparison subset.
  - Added defensive handling to return the DataFrame unchanged if no comparable (hashable) columns are identified, preventing crashes in `pd.DataFrame.drop_duplicates`.
- Centralized GEE Dependency Management (`gigaspatial/handlers/gee/`)
  - Refactored `GEEConfig` and `GEEProfiler` to utilize a single shared Earth Engine availability check, centralizing the handling of `earthengine-api` and `geemap` optional dependencies.
  - Implemented robust `ImportError` protection for GEE-specific methods and improved type hint safety across the module, ensuring the core library remains functional without GEE installed.
- HDX Resource Matching: Prevent false positives for country codes
  - Modified `HDXConfig._match_pattern` in `gigaspatial/handlers/hdx.py` to support regex-based token matching, ensuring that country identifiers (e.g., ISO-2/ISO-3 codes) are matched as distinct components delimited by `_`, `-`, `/`, or `.` rather than arbitrary substrings. This prevents incorrect matching where a short code exists within a longer string or dataset identifier.
  - Added `token_match` parameter to `HDXConfig.get_dataset_resources` to toggle this behavior.
  - Enabled `token_match=True` by default in `RWIConfig.get_relevant_data_units` (`gigaspatial/handlers/rwi.py`) to resolve ambiguous resource matching across multi-country datasets.
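To illustrate the difference between substring and token matching described in the HDX entry above, here is a minimal sketch. It is not the actual `HDXConfig._match_pattern` implementation: the function name `token_match`, the exact regex, and the case-insensitive comparison are assumptions made for the example.

```python
import re


def token_match(code, name):
    """True if `code` appears in `name` as a whole token delimited by _, -, / or .

    The start and end of the string also count as delimiters.
    """
    pattern = rf"(?:^|[_\-/.]){re.escape(code)}(?:$|[_\-/.])"
    return re.search(pattern, name, flags=re.IGNORECASE) is not None


# A plain substring test would wrongly accept "in" inside "china".
print("in" in "china_rwi.csv")              # substring check
print(token_match("in", "china_rwi.csv"))   # token check
```

With this pattern, `token_match("ken", "rwi_ken_2021.csv")` matches because `ken` sits between two `_` delimiters, while `token_match("ken", "kenya_rwi.csv")` does not.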
Documentation
- README overhaul (`README.md`)
  - Rewrote the Quick Start section to use correct, runnable code. The prior example referenced a non-existent `POIViewGenerator` class (correct name is `PoiViewGenerator`), used an undefined `points` variable, and duplicated the GHSL mapping call. Updated to use the `GigaSchoolLocationFetcher` → `PoiViewGenerator` → enrichment chain, matching the actual v0.9.x API. Added a second grid-based example using `H3ViewGenerator`.
  - Added optional dependency table covering `azure-storage-blob`, `snowflake-connector-python`, and `earthengine-api` / `geemap` with links to the new Configuration Guide.
  - Replaced the vague Key Features prose with a concise, class-name-anchored bullet list (`TifProcessor`, `H3ViewGenerator`, `WPPopulationHandler`, etc.).
  - Replaced the Core Concepts freeform list with a structured Markdown table mapping concept names to their concrete Python classes.
  - Added a Supported Datasets table covering all nine source categories (Buildings, Population & Settlements, Network & Connectivity, POI, Humanitarian, Earth Observation, Giga, Admin Boundaries, Relative Wealth).
  - Removed the redundant View Generators section (content consolidated into Core Concepts table).
  - Updated installation link to point to the new Configuration Guide.
- New Configuration Guide (`docs/getting-started/configuration.md`)
  - Created a comprehensive guide serving as the master reference for all library settings.
  - Path Management: Documents the medallion architecture (`bronze` / `silver` / `gold` tiers) and how `ROOT_DATA_DIR` drives the tier hierarchy. Explains the `config.set_path()` and `config.get_path()` API and environment variable override precedence.
  - Storage Backends: Dedicated setup sections for Local filesystem, Azure Data Lake Storage (ADLS, including connection string vs. SAS token authentication), and Snowflake internal stages (including SPCS mode).
  - API Integrations: Covers Google Earth Engine (service account and Application Default Credentials), OpenStreetMap (Overpass endpoint config), and Ookla (tile cache configuration).
  - Environment Variable Master Table: Single reference table for all 30+ supported environment variables, grouped by subsystem with descriptions and example values.
  - Cross-referenced from the updated Installation Guide.
- Updated Installation Guide (`docs/getting-started/installation.md`)
  - Modernized "Next Steps" section to explicitly direct users to the new Configuration Guide before the Quick Start guide, reflecting the prerequisite relationship.
  - Fixed a missing `---` separator that left the section header floating.
- New User Guide How-to Articles (`docs/user-guide/`)
  - Extracted the five core workflows from production Jupyter notebooks in the repository root and converted them into structured narrative guides. Each guide explains the architectural "Why" (handler vs. generator responsibilities) before the "How" (code).
  - `school-proximity.md`: `GigaSchoolLocationFetcher` + `PoiViewGenerator.find_nearest_buildings()`, for building proximity analysis at national scale. Derived from `optimize_building_mapping.ipynb`.
  - `infrastructure-normalization.md`: `read_dataset` + `TransmissionNodeTable` + `EntityProcessor`, for normalizing partner KMZ/GPKG fiber data into the Giga schema. Derived from `process_infra_ken.ipynb`.
  - `settlement-characterization.md`: `GHSLDataHandler(product="GHS_SMOD")` + `PoiViewGenerator.map_smod()`, for automated urban/rural stratification using the GHSL Settlement Model. Derived from GHSL mapping notebooks.
  - `population-accessibility.md`: `WPPopulationHandler` + `PoiViewGenerator.map_wp_pop()`, for population catchment estimation within a configurable buffer radius. Derived from `test_worldpop.ipynb`.
  - `zonal-statistics-grids.md`: `TifProcessor` + `H3ViewGenerator.map_rasters()`, for aggregating raster data (elevation, nightlights) onto H3 hexagonal grids for global comparative analysis. Corrects the non-existent `H3Hexagons.agg_raster()` pattern in favor of the actual `H3ViewGenerator.map_rasters()` API. Derived from `test_tifprocessor.ipynb`.
- `mkdocs.yml`: Navigation updated
  - Added `Configuration: getting-started/configuration.md` to the Getting Started section.
  - Added all five new User Guide articles to the User Guide navigation section, replacing the previously commented-out placeholder entries.
Dependencies
- Added `pyarrow>=17.0.0` to core dependencies to support optimized tabular data processing and standard Parquet I/O.
[v0.9.2] - 2026-03-30
Added
- UNICEF SDG Data Fetcher (`gigaspatial/handlers/unicef/sdg.py`) - New stateless fetcher for the UNICEF SDMX Data Warehouse, providing access to 700+ child welfare indicators (mortality, education, nutrition, immunisation, WASH, SDGs).
  - `UnicefDataFetcher` - Pydantic dataclass fetcher wrapping the `unicefdata` Python package. Returns indicator data directly as a `pd.DataFrame` with no download/caching lifecycle, consistent with the GigaSpatial fetcher pattern (e.g., `GigaSchoolLocationFetcher`).
  - Supports indicator filtering by `countries` (ISO3, validated via `pycountry`), `year` / year range, `sex` disaggregation, `latest` / `mrv` recency filters, and `circa` approximate-year matching.
  - `fetch()` - Core method returning a flat `pd.DataFrame` of SDMX indicator observations.
  - `fetch_as_geodataframe()` - Spatially enriches fetched indicator data by left-joining to an admin boundary `GeoDataFrame` on ISO3 country code, enabling direct integration with GigaSpatial's view generator workflows.
  - `search_indicators()` / `list_categories()` - Static convenience wrappers exposing `unicefdata` discovery utilities without requiring fetcher instantiation.
  - `unicefdata` added as a new optional dependency under the `[unicef]` extra (`pip install "giga-spatial[unicef]"`).
- `OSMLocationFetcher.fetch_by_osmid`: Fetch a single OSM element by ID
  - Added `fetch_by_osmid(osmid, element_type, include_metadata)` to `gigaspatial/handlers/osm.py`, enabling direct lookup of any OSM element (node, way, or relation) by its numeric OSM ID via the Overpass API `<type>(id:<osmid>)` filter.
  - `element_type` accepts `"node"` (default), `"way"`, or `"relation"`, routing to the correct Overpass output mode (`center` for nodes/relations, `geom` for ways) and the appropriate internal processor (`_process_node_relation` or `_process_way`).
  - Returns a processed `Dict` with the same field structure as `fetch_locations` results (`source_id`, `name`, `name_en`, `type`, `geometry`, `latitude`, `longitude`, and metadata fields when `include_metadata=True`), ensuring interoperability with downstream GigaSpatial workflows.
  - Includes a graceful fallback for elements that do not match any configured `location_types`: returns a minimal dict with raw OSM tags and geometry instead of `None`, which is the expected behavior for direct ID-based lookups where tag filtering is not the intent.
  - Reuses `_make_request` for retry/backoff logic and `_process_node_relation` / `_process_way` for consistent element normalization, adding no new I/O or processing surface.
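The `<type>(id:<osmid>)` filter and the per-type output mode described for `fetch_by_osmid` above can be sketched as a query builder. This is a hedged illustration: `build_osmid_query` and `OUTPUT_MODE` are hypothetical names, and the `[out:json]` prefix is an assumption about how the request is framed, not something the entry states.

```python
# Overpass output mode per element type, following the entry above.
OUTPUT_MODE = {"node": "center", "relation": "center", "way": "geom"}


def build_osmid_query(osmid, element_type="node"):
    """Build an Overpass QL query that selects one element by its numeric ID."""
    if element_type not in OUTPUT_MODE:
        raise ValueError(f"element_type must be one of {sorted(OUTPUT_MODE)}")
    # int() rejects non-numeric IDs before they reach the query string.
    return f"[out:json];{element_type}(id:{int(osmid)});out {OUTPUT_MODE[element_type]};"


print(build_osmid_query(123))          # node lookup
print(build_osmid_query(456, "way"))   # way lookup with full geometry
```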
Changed
- Optional Dependency: `unicefdata` (UNICEF SDG Handler) - Refactored `UnicefDataFetcher` to make the `unicefdata` library optional.
  - Implemented defensive loading in `gigaspatial/handlers/unicef/sdg.py`: `import gigaspatial` no longer fails if `unicefdata` is missing.
  - Detailed `ImportError` messages now guide users to install the dependency via `pip install "giga-spatial[unicef]"` only when accessing UNICEF data functionality.
  - Added `unicef` extra to `setup.py` and updated `requirements.txt` with the appropriate git installation URL.
- UNICEF Handler Reorganization - Migrated the standalone `gigaspatial/handlers/unicef_georepo.py` to `gigaspatial/handlers/unicef/georepo.py`, centralizing all UNICEF-related providers under a single module. Updated imports in `AdminBoundaries` and top-level `__init__.py` for consistency.
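The defensive loading described above generally means deferring the import until the feature is used and raising an actionable error at that point. A minimal sketch of that pattern follows; `require_optional` is a hypothetical helper name, not a GigaSpatial function, and the message wording is illustrative.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency on first use, with an install hint on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f'Install it with: pip install "giga-spatial[{extra}]"'
        ) from e


# Called inside the method that needs the package, never at module import time,
# so importing the parent package succeeds even when the extra is not installed.
json_module = require_optional("json", "unicef")  # stdlib module used as a stand-in
```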
Fixed
- `TifProcessor._create_clipped_processor`: Fixed `FileNotFoundError` when using non-local DataStores
  - Resolved a `FileNotFoundError` during `clip_to_geometry(..., return_clipped_processor=True)` that occurred when the parent processor used a cloud-based `DataStore` (e.g., `ADLSDataStore`).
  - Fixed the initialization of the new processor instance to use `LocalDataStore()` for its local temporary path validation, even when the parent uses a non-local store.
  - Restored the placeholder initialization pattern to ensure the clipped file is saved in the new processor's `_temp_dir`, preventing accidental deletion when the original processor is cleaned up.
Dependencies
- `[unicef]`: `unicefdata` (install via git URL from `unicef-drp/unicefData` or `pip install unicefdata` from source).
[v0.9.1] - 2026-03-26
Added
- `read_dataset`: New geospatial format support
  - Added `.fgb` (FlatGeobuf) to `GEO_READERS` via `gpd.read_file`: FlatGeobuf is one of GigaSpatial's three primary export formats and was previously unhandled, raising an unsupported format error.
  - Added `.kml` (standalone KML) to `GEO_READERS` via `gpd.read_file`: previously only `.kmz` archives were supported; standalone KML exports were rejected.
  - Added `.geojsonl` and `.ndjson` (newline-delimited / GeoJSON lines) to `GEO_READERS` via `gpd.read_file`: supports NDJSON outputs common in data lake and streaming export pipelines.
  - Added `validate_crs` parameter (default `True`) to `read_dataset`: emits a `UserWarning` when a GeoDataFrame is returned with no CRS defined, surfacing silent projection issues that previously caused hard-to-debug downstream failures. Implemented via a new `_maybe_warn_crs()` internal helper applied consistently across all geo read paths.
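The `validate_crs` behavior above amounts to a check on the returned object's `crs` attribute. The sketch below shows the idea without requiring GeoPandas: `maybe_warn_crs` is a hypothetical stand-in for the internal helper, and a `SimpleNamespace` plays the role of a GeoDataFrame.

```python
import warnings
from types import SimpleNamespace


def maybe_warn_crs(gdf, validate_crs=True):
    """Warn when a geo object carries no CRS; return the object unchanged."""
    if validate_crs and getattr(gdf, "crs", None) is None:
        warnings.warn(
            "GeoDataFrame has no CRS defined; downstream spatial operations may be wrong.",
            UserWarning,
            stacklevel=2,
        )
    return gdf


# Stand-in objects: only the `crs` attribute matters for this check.
with_crs = maybe_warn_crs(SimpleNamespace(crs="EPSG:4326"))        # silent
no_check = maybe_warn_crs(SimpleNamespace(crs=None), validate_crs=False)  # silent
```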
Changed
- Refactored BaseHandlerDownloader and Handler Cleanup
  - Promoted `download_data_units` to a concrete non-abstract method in `BaseHandlerDownloader` (`gigaspatial/handlers/base.py`).
  - Centralized download logic including optional parallel execution (`multiprocessing.Pool`), `tqdm` progress tracking, and support for `pandas.DataFrame` units into the base class.
  - Added automatic result filtering (removing `None`) and flattening of list-based results (e.g., from extracted archives) to the base implementation.
  - Systematically removed redundant `download_data_units` logic from several subclasses, significantly reducing boilerplate and improving maintainability:
    - `microsoft_global_buildings.py`
    - `google_open_buildings.py`
    - `google_ms_combined_buildings.py`
    - `nasa_srtm.py`
    - `ookla_speedtest.py`
    - `hdx.py`
    - `ghsl.py`
    - `opencellid.py`
    - `worldpop.py`
  - Ensured proper `kwargs` propagation to `download_data_unit` in the base implementation to preserve handler-specific parameters (e.g., `extract`, `file_pattern`, `data_type`).
- `read_dataset`: Format registry and routing improvements
  - Removed `.zip` from `COMPRESSION_FORMATS`: it is a container format, not a compression wrapper like `.gz` / `.bz2`. Previously, `.zip` matched the compression branch before the geo-container branch, causing ambiguous routing and silent failures for zipped geospatial data.
  - Moved `.json` from `PANDAS_READERS` to `GEO_READERS` (using `gpd.read_file`): plain `.json` files in GigaSpatial pipelines are predominantly GeoJSON. The previous routing to `pd.read_json` returned geometry as raw strings with no spatial awareness. A pandas fallback remains for non-spatial JSON.
  - Removed `**kwargs` forwarding from `read_json`: `json.load` accepts only JSON decoder options (e.g., `cls`, `object_hook`); forwarding arbitrary kwargs (e.g. `compression=...`) caused `TypeError` at runtime.
  - Renamed inner `suffixes` variable in the `.shp` sidecar branch to `SHAPEFILE_SIDECAR_EXTENSIONS` (promoted to module-level constant): the previous local redeclaration silently shadowed `path_obj.suffixes` in the outer scope.
  - Unified bare `.gz` fallback to delegate to `read_gzipped_json_or_csv` instead of blindly assuming CSV: a `.gz` file containing JSON (without a compound `.json.gz` extension) previously returned a malformed DataFrame without error.
  - Replaced fragile `data_store.__class__.__name__.replace('DataStore', '').lower()` string manipulation with a `_storage_display_name()` helper using `in`-based class name detection, consistent with the DataStore cross-platform improvements in v0.9.0.
  - Added `raise ... from e` throughout all `except` blocks to preserve original tracebacks and improve debuggability.
- OpenCellID Handler Refactor - Complete architectural refactor to align with the `BaseHandler` design pattern.
  - Decomposed the legacy monolithic class into specialized `OpenCellIDConfig`, `OpenCellIDDownloader`, and `OpenCellIDReader` components, orchestrating them via a new `OpenCellIDHandler`.
  - Updated `extract_search_geometry` to return ISO 3166-1 alpha-2 country codes, matching the OpenCellID database's primary indexing method.
  - Implemented `get_relevant_data_units_by_geometry` to dynamically resolve download links from the OpenCellID web portal based on the resolved country code.
  - Standardized the local storage hierarchy to utilize country-specific subdirectories (`bronze/opencellid/{alpha2}/`) for better data organization and multi-country support.
  - Decoupled processing parameters (`created_newer`, `created_before`, and `drop_duplicates`) from the static configuration, moving them to `load_from_paths` to enable granular, call-time filtering of cell tower data.
  - Introduced `load_as_geodataframe` as a high-level convenience method for direct loading into spatial data structures.
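The result handling added to the base `download_data_units` in this release (dropping `None` and flattening list-based results) can be sketched in a few lines. `collect_results` is a hypothetical name used for illustration; the real method also handles parallel execution and progress reporting, which are omitted here.

```python
def collect_results(results):
    """Drop failed downloads (None) and flatten per-unit lists into one flat list."""
    flat = []
    for item in results:
        if item is None:
            continue  # a unit that failed or produced nothing
        if isinstance(item, list):
            # e.g. several files extracted from one archive
            flat.extend(path for path in item if path is not None)
        else:
            flat.append(item)
    return flat


paths = collect_results(["a.tif", None, ["b.shp", "b.dbf"], "c.csv"])
print(paths)
```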
Fixed
- Robust Quadkey Handling in MercatorTiles and RWIHandler
  - Fixed a potential `TypeError` in `MercatorTiles.from_quadkeys` by ensuring all input quadkeys are mapped to strings before calculating zoom levels. This protects the initialization workflow when quadkeys are passed as integers.
  - Updated `RWIHandler` to explicitly cast the quadkey column to strings after loading, preventing downstream failures during tiled aggregations when source data stores quadkeys in numeric format.
- `TifProcessor._create_clipped_processor`: Robust temp file initialization and data store handling
  - Fixed a `FileNotFoundError` raised during `clip_to_geometry(..., return_clipped_processor=True)` caused by the new `TifProcessor` being initialized with `data_store=self.data_store` while the placeholder file was written to a local temp path. When `data_store` is not a `LocalDataStore`, `__post_init__` routed to `data_store.file_exists()` which could not locate the locally created placeholder, raising the error.
  - Eliminated the unnecessary two-step placeholder pattern (create dummy `.tif` in a separate `tempfile.mkdtemp()` dir, then overwrite with clipped data). Clipped data is now written directly to a single temp file within `self._temp_dir`, matching the pattern used by `_reproject_to_temp_file`.
  - The new `TifProcessor` is now always initialized with `LocalDataStore()` for temp-file-backed instances, regardless of the parent processor's `data_store`, ensuring `__post_init__` correctly resolves the local absolute path.
  - Explicitly sets `clipped_file_path`, `dataset_path`, and `dataset_paths` on the returned processor so `open_dataset` correctly routes reads to the clipped file, consistent with the behavior of `clip_to_bounds`.
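The quadkey fix above relies on the fact that a quadkey's zoom level is the number of digits it contains, so the value must be a string before its length is taken. A small sketch, using hypothetical helper names rather than the `MercatorTiles` internals:

```python
def normalize_quadkeys(quadkeys):
    """Cast every quadkey to str so that length-based zoom computation works."""
    return [str(q) for q in quadkeys]


def zoom_level(quadkey):
    """Zoom level of a quadkey: one digit per zoom level."""
    return len(str(quadkey))


# len(1203) would raise TypeError; casting first avoids that.
keys = normalize_quadkeys([1203, "0231"])
zooms = [zoom_level(k) for k in keys]
```

One caveat worth keeping in mind: a quadkey that was stored as an integer has already lost any leading `0` digits, so casting it back to a string cannot recover them.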
[v0.9.0] - 2026-03-17
Added
- Giga Spatial Entity Schemas `gigaspatial/core/schemas/` - New core module for managing structured infrastructure and administrative entities using a Medallion Architecture (Bronze-Silver-Gold) pattern.
  - Base Entity Framework (`entity.py`):
    - `BaseGigaEntity`, `GigaEntity`, `GigaGeoEntity`, `GigaEntityNoLocation`: Pydantic-validated base models for all GigaSpatial entities, supporting stable UUID-based ID generation, coordinate validation (including "Null Island" checks), and geometry parsing (WKT/WKB/Shapely).
    - `EntityTable`: Generic container class for collections of entities, providing bulk loading from files/dataframes, serialization, spatial filtering (admin, polygon, bounds), and advanced spatial analysis (KDTree-based nearest neighbors, distance graph construction).
  - Silver-Layer Data Processing (`gigaspatial/processing/entity_processor.py`):
    - `EntityProcessor`: A robust, multi-step cleaning pipeline designed to transform raw "Bronze" data into "Silver" validated structures. Features include:
      - Automatic coordinate column detection and repair (merged lat/lon columns, trailing commas).
      - NFKC string normalization and whitespace stripping.
      - Sentinel-aware null coercion (handling "n/a", "none", etc.).
      - Geometry-safe row operations (deduplication, empty row removal).
      - Change tracking via `track_changes` decorator for detailed pipeline logging of row/column shifts.
  - Domain-Specific Entity Schemas:
    - Connectivity: `CellTower`, `Cell`, `TransmissionNode`, `MobileCoverage`. Includes logic for backhaul normalization, radio technology mapping, and cross-entity enrichment (e.g., populating tower performance from child cells).
    - Infrastructure: `BuildingFootprint` for managing building geometries.
    - Administrative: `AdminBoundary` for structured administrative region management and parent/child relationship mapping.
  - Shared Infrastructure:
    - Centralized enums for `DataConfidence`, `PowerSource`, `RadioType`, and `BackhaulType` ensuring cross-module consistency.
    - Configurable Entity ID Namespace: Relocated `ENTITY_UUID_NAMESPACE` to `shared.py` and linked it to a new `ENTITY_ID_NAMESPACE` global configuration field, allowing the UUID namespace used for stable ID generation to be overridden via environment variables (defaults to `uuid.NAMESPACE_DNS`).
- BigQuery Client `gigaspatial/core/io/bigquery_client.py` - New reusable BigQuery client layer for interacting with Google BigQuery datasets.
  - `BigQueryClientConfig` - Pydantic-validated configuration supporting service account key file authentication and Application Default Credentials (ADC) fallback, mirroring the credential pattern used by `GEEProfiler`. Defaults for `project`, `service_account`, and `service_account_key_path` are resolved from `global_config` (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_SERVICE_ACCOUNT`, `GOOGLE_SERVICE_ACCOUNT_KEY_PATH`).
  - `BigQueryClient` - Generic, dataset-agnostic client exposing:
    - `list_datasets(project_id)`: List available datasets within a GCP project.
    - `list_tables(dataset_id)`: List all tables within a dataset.
    - `get_table_schema(dataset_id, table_id)`: Retrieve field names, types, modes, and descriptions for a table.
    - `query(sql)`: Execute SQL and return a raw `RowIterator`.
    - `query_to_dataframe(sql)`: Execute SQL and return a `pd.DataFrame`, with optional BigQuery Storage API acceleration (`use_bq_storage`), configurable byte billing cap (`max_gb_allowed`, default 10 GB), and `tqdm` progress bar support.
    - `get_query_cost_estimate(sql)`: Dry-run cost estimation in USD using on-demand pricing ($6.25/TiB).
  - Designed for composition: dataset-specific handlers instantiate `BigQueryClient` internally rather than re-implementing auth or query logic.
- M-Lab NDT7 Handler (`gigaspatial/handlers/mlab.py`) - New handler for querying M-Lab network measurement data from the `measurement-lab.ndt.ndt7` BigQuery public dataset.
  - `MLabConfig` - Pydantic config holding dataset-level parameters (`project_id`, `dataset`, `default_start_date`). Auth and billing credentials are fully delegated to `BigQueryClientConfig` / `global_config`, keeping dataset config free of auth logic.
  - `MLabHandler` - Composes `BigQueryClient` to query NDT7 upload and download measurements:
    - `query_ndt7(country_code, start_date, end_date, measurement)`: Returns a `pd.DataFrame` of NDT7 measurements filtered by ISO alpha-2 country code and date range. Selects `id`, `date`, client geo fields (`country_code`, `city`, `lat`, `lon`, `postal_code`), `a.*` scalar metrics (`MeanThroughputMbps`, `MinRTT`, `LossRate`, `CongestionControl`), raw common fields (`ServerIP`, `ClientIP`, `StartTime`, `EndTime`), and direction-specific `ServerMeasurements` fields (AppInfo, BBRInfo, TCPInfo) based on the `measurement` parameter (`"download"`, `"upload"`, or `"both"`).
      - Automatic Schema Flattening: Handles complex BigQuery ARRAY and STRUCT navigation (e.g., `ServerMeasurements[SAFE_OFFSET(0)]`) internally, exposing a flat `pd.DataFrame` to the user.
    - `query_ndt7_gdf(country_code, start_date, end_date, measurement)`: Convenience wrapper returning a `gpd.GeoDataFrame` (EPSG:4326) from `client.Geo.Latitude` / `client.Geo.Longitude`; rows with null coordinates are dropped.
    - `estimate_query_cost(country_code, start_date, end_date, measurement)`: Pre-execution cost estimate in USD via `BigQueryClient.get_query_cost_estimate()`, recommended before running wide date-range queries given the scale of the public dataset.
  - Country codes validated via `pycountry` (raises `ValueError` for unknown codes).
  - Partition filter on `date` and direct `client.Geo.CountryCode` field filter minimise bytes scanned for cost efficiency.
- WorldPop Handler: Degree of Urbanisation datasets (`degree_of_urbanization`)
  - Added `project="degree_of_urbanization"` as a new supported project type in `WPPopulationConfig` and `WPPopulationHandler`, providing access to WorldPop's Degree of Urbanisation (DUG) datasets built on the R2025A Global2 methodology.
  - Covers years 2015–2030 via API category `dug_g2_v1` ("Degree of Urbanisation 2015-2030 using WorldPop Global2 R2025A").
  - Added `dug_level: Literal["L1", "L2"]` field (default `"L1"`) to select between:
    - `L1`: 3-class grid classification (urban / suburban / rural), files matching `*_GRID_L1_R2025A_v1.tif`
    - `L2`: 7-class detailed grid classification, files matching `*_GRID_L2_R2025A_v1.tif`
  - Only `.tif` grid files are fetched; accompanying `.zip` (entities, statistics) files are automatically excluded.
  - `constrained`, `un_adjusted`, `school_age`, and `under_18` fields are not applicable for DUG and are silently normalised to `False` with a warning if set.
  - `get_relevant_data_units_by_geometry` updated to route DUG requests to the correct API `dataset_type` (`"dug"`) and filter files by the selected `dug_level` pattern.
  - `__repr__` updated to show `dug_level` instead of population-specific fields when `project="degree_of_urbanization"`.
  - Fully compatible with existing download and reader pipeline: DUG `.tif` files follow the same `GIS/` path convention and require no changes to `get_data_unit_path`, `get_data_unit_paths`, or `WPPopulationReader`.
- HTTP Module `gigaspatial/core/http/` - New submodule providing a reusable, composable HTTP client layer for interacting with REST APIs across all GigaSpatial handlers.
  - `AuthConfig` and `AuthType` (`auth.py`) - Pydantic-validated authentication configuration supporting Bearer token, API key (header or query param), Basic auth, and no-auth modes. `AuthConfig.build()` resolves to `(headers, query_params, httpx_auth)` for direct use by `httpx.Client`.
  - `BaseRestApiClient` and `RestApiClientConfig` (`client.py`) - Abstract base class for REST API clients with built-in retry logic (exponential backoff), `Retry-After`-aware rate limiting, session lifecycle management via context manager (`__enter__` / `__exit__`), and convenience `get` / `post` wrappers. Configuration is fully Pydantic-validated (`base_url`, `auth`, `timeout`, `max_retries`, `retry_backoff`, `default_headers`).
  - `BasePaginationStrategy`, `OffsetPagination`, `CursorPagination`, `PageNumberPagination` (`pagination.py`) - Strategy-based pagination abstractions. Subclasses override `extract_records` and `next_request` to express any pagination pattern (offset/limit, cursor, page number) without modifying the client. `PageNumberPagination` added to support page/size APIs (used by Giga handlers).
  - All classes exported from `gigaspatial/core/http/__init__.py`.
- Giga Handlers Submodule `gigaspatial/handlers/giga/` - Reorganised Giga School API fetchers into a dedicated submodule with a shared internal client.
  - `GigaApiClient` (`api_client.py`) - Internal `BaseRestApiClient` subclass shared across all Giga fetchers. Uses Bearer token auth and `PageNumberPagination` with `records_key="data"`.
  - `GigaSchoolLocationFetcher` (`school_locations.py`) - Refactored to use `GigaApiClient`. Eliminates manual `requests` pagination loop; retry logic and rate limiting inherited from `BaseRestApiClient`.
  - `GigaSchoolProfileFetcher` (`school_profile.py`) - Refactored to use `GigaApiClient`. Supports optional `giga_id_school` filter; single-school requests are short-circuited to one page.
  - `GigaSchoolMeasurementsFetcher` (`school_measurements.py`) - Refactored to use `GigaApiClient`. Date range parameters (`start_date`, `end_date`) now overridable per-call in addition to instance level. `_format_date` promoted to `@staticmethod`.
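The dry-run cost estimate mentioned for `BigQueryClient.get_query_cost_estimate` above reduces to simple arithmetic once the number of bytes a query would scan is known. A sketch of that conversion at the stated on-demand rate of $6.25 per TiB; `estimate_cost_usd` is a hypothetical helper, and the byte count is assumed to come from a BigQuery dry run.

```python
TIB = 2 ** 40          # bytes in one tebibyte
USD_PER_TIB = 6.25     # on-demand rate quoted in the entry above


def estimate_cost_usd(bytes_processed):
    """Convert a dry-run byte count into an on-demand cost estimate in USD."""
    return bytes_processed / TIB * USD_PER_TIB


# A query that would scan half a tebibyte:
half_tib_cost = estimate_cost_usd(2 ** 39)
```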
Changed
- `HealthSitesFetcher._convert_country` - Added retry loop with exponential backoff and escalating timeouts (2000ms → 4000ms → 6000ms) for OSM country name resolution via `OSMLocationFetcher.get_osm_countries`, improving resilience against transient network timeouts.
- Optional Dependency Implementation (Core Framework) - Refactored the library to make heavy dependencies optional, reducing the base installation size and improving import times.
  - Implemented "Defensive Loading" pattern across all heavy modules: dependencies are only required when the specific functionality is accessed.
  - Added clear `ImportError` messages with tailored installation instructions (e.g., `pip install "giga-spatial[gee]"`) for missing optional packages.
  - Refactored the following modules to be optional:
    - Earth Engine: `GEEProfiler` and related GEE handlers (requires `[gee]`).
    - BigQuery: `BigQueryClient` and `MLabHandler` (requires `[bq]`).
    - Azure: `ADLSDataStore` (requires `[azure]`).
    - Snowflake: `SnowflakeDataStore` (requires `[snowflake]`).
    - Delta Sharing: `DeltaSharingDataStore` (requires `[delta]`).
    - Database: `DBConnection`, Trino support, and Dask integration (requires `[db]`).
    - DuckDB: `OvertureAmenityFetcher` (requires `[duckdb]`).
  - Updated `setup.py` with `extras_require` for all groups and an `[all]` extra for full installation.
  - Reorganized `requirements.txt` to clearly distinguish between core and optional dependencies.
- Giga fetchers - Removed `sleep_time` field and manual `time.sleep` calls; rate limiting is now handled transparently by `BaseRestApiClient`, which honors the `Retry-After` header and applies configurable backoff.
- `GigaSchoolMeasurementsFetcher.get_performance_summary` - Consolidated repetitive boolean flag summarisation into a loop over `(flag, label)` pairs for maintainability.
- Healthsites Handler `gigaspatial/handlers/healthsites.py` - Reorganised `HealthSitesFetcher` with its own internal client.
  - `_HealthSitesPaginationStrategy`: Custom `BasePaginationStrategy` handling both GeoJSON (`features[]`) and JSON (direct list) response formats from the Healthsites API.
  - `_HealthSitesApiClient`: Internal `BaseRestApiClient` subclass using `AuthType.API_KEY_QUERY` (`api-key` query parameter) and `_HealthSitesPaginationStrategy`.
  - `HealthSitesFetcher`: Refactored to use `_HealthSitesApiClient`. All fetch parameters (`country`, `extent`, `output_format`, `flat_properties`, `from_date`, `to_date`) are now overridable per call. `fetch_statistics` and `fetch_facility_by_id` use the client directly for non-paginated requests.
- `WPPopulationConfig.validate_configuration` docstring: Added a DUG availability section.
- WorldPop Handler default configuration: Updated the default dataset parameters in `WPPopulationHandler` to use `release="GR2"`, `year=2025`, and `constrained=True` (with `un_adjusted=False`), shifting the default data source to the more recent R2025A series.
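The retry behavior described for `HealthSitesFetcher._convert_country` above (a longer timeout on each attempt, with a growing pause between attempts) follows a common pattern. The sketch below is generic and makes several assumptions: `call_with_retries` is a hypothetical helper, timeouts are expressed in seconds, the wrapped callable accepts a `timeout` keyword, and the base delay of one second is illustrative.

```python
import time


def call_with_retries(func, timeouts=(2.0, 4.0, 6.0), base_delay=1.0, sleep=time.sleep):
    """Call func(timeout=...) with escalating timeouts and exponential backoff."""
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")
    last_error = None
    for attempt, timeout in enumerate(timeouts):
        try:
            return func(timeout=timeout)
        except TimeoutError as e:
            last_error = e
            if attempt < len(timeouts) - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.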
Fixed
- BaseHandler: Robust CRS handling in cropping and loading
  - Fixed a critical issue where `crop_to_geometry()` used a hardcoded `"EPSG:4326"` CRS when reprojecting geometries to match data projections. It now correctly uses the CRS provided in `kwargs` or falls back to the handler's default.
  - Added a `crs` property to `BaseHandlerConfig` (defaulting to `"EPSG:4326"`) to allow handler-specific coordinate systems (e.g., GHSL's Mollweide `ESRI:54009`) to be correctly propagated during the loading pipeline.
  - Updated `BaseHandlerReader.load()` to automatically retrieve and pass the configuration's CRS to `crop_to_geometry()`, ensuring that cached search geometries are correctly reprojected even when the handler uses a non-WGS84 coordinate system.
  - Ensured `crs` is popped from `kwargs` in `crop_to_geometry()` to avoid downstream conflicts with processing algorithms that might receive the same keyword arguments.
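The "pop `crs` from `kwargs`" step above is a small idiom: take the value out of the keyword dictionary (falling back to a default) so it is not forwarded a second time. A minimal sketch, where `split_crs` is a hypothetical function and not part of `BaseHandler`:

```python
def split_crs(default_crs="EPSG:4326", **kwargs):
    """Extract `crs` from kwargs, falling back to a default; return it with the rest."""
    crs = kwargs.pop("crs", default_crs)
    # `kwargs` no longer contains "crs", so it can be forwarded safely.
    return crs, kwargs


crs, remaining = split_crs(crs="ESRI:54009", all_touched=True)
```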
Dependencies
- Added `httpx>=0.27` as a new dependency.
- Refactored the library to use Optional Dependencies (Extras). See `setup.py` or `requirements.txt` for details.
  - `[gee]`: `earthengine-api`, `geemap`
  - `[bq]`: `google-cloud-bigquery`, `google-cloud-bigquery-storage`, `google-auth`, `google-auth-oauthlib`, `google-auth-httplib2`, `db-dtypes`
  - `[azure]`: `azure-storage-blob`
  - `[snowflake]`: `snowflake-connector-python`
  - `[delta]`: `delta-sharing`
  - `[db]`: `SQLAlchemy`, `sqlalchemy-trino`, `dask`
  - `[duckdb]`: `duckdb`
  - `[all]`: Install all optional dependencies.
[v0.8.2] - 2026-02-19¶
Changed¶
-
WorldPop Population Mapping Robustness
- Refactored GeometryBasedZonalViewGenerator.map_wp_pop and PoiViewGenerator.map_wp_pop to safely handle WPPopulationHandler.load_data returning either a single TifProcessor or List[TifProcessor].
- Introduced _ensure_tif_list helper to normalize handler outputs into flat lists, preventing crashes when filters (e.g., under_18=True, min_age/max_age) yield single rasters.
- Non-age_structures raster paths now build flat List[TifProcessor] across countries using extend, avoiding nested lists.
- Preserved per-raster summing semantics for age_structures + centroid_within (zonal or POI), now robust to single/list outputs.
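The normalization idea behind _ensure_tif_list can be sketched as follows. The function below is a hypothetical stand-in (its name and its handling of None and nested lists are assumptions), and plain strings take the place of TifProcessor objects.

```python
from typing import Any, List


def ensure_flat_list(result: Any) -> List[Any]:
    """Normalize a single object, a list, or a nested list into one flat list."""
    if result is None:
        return []
    if isinstance(result, (list, tuple)):
        flat: List[Any] = []
        for item in result:
            flat.extend(ensure_flat_list(item))  # flatten any nesting
        return flat
    return [result]  # a single raster becomes a one-element list


# Accumulate rasters across countries with extend, so no nested lists appear
rasters: List[Any] = []
for country_result in ["raster_a", ["raster_b", "raster_c"]]:
    rasters.extend(ensure_flat_list(country_result))
```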
-
PoiViewGenerator: Removed unused kwargs from aggregation calls
- Cleaned up map_points() and map_polygons() methods by removing **kwargs arguments passed to aggregate_points_to_zones() and aggregate_polygons_to_zones(), which do not accept additional keyword arguments.
Fixed¶
- Updated requests dependency from 2.32.3 to 2.32.4 to address CVE-2024-47081 (URL parsing issue leaking .netrc credentials).
[v0.8.1] - 2026-02-19¶
Added¶
- GeometryBasedZonalViewGenerator: Relative Wealth Index mapping
- Added map_rwi() helper to aggregate Relative Wealth Index values to arbitrary zones using RWIHandler as the data source.
- Supports both point-based (predicate="centroid_within", using quadkey centroids) and polygon-based ("intersects", "within") enrichment with configurable aggregations (mean, median, max, min) into a new output column (e.g., rwi_mean).
Changed¶
-
WorldPop Handler (WPPopulationConfig): Multi-release support with GR1/GR2
- Introduced a release field (Literal["GR1", "GR2"], default "GR2") to WPPopulationConfig and WPPopulationHandler to explicitly select between the two WorldPop Global dataset releases, replacing the previous implicit year-based release inference.
- GR1 (legacy release): covers years 2000–2020 for both pop and age_structures projects, with constrained and unconstrained variants and full UN adjustment support. All previously supported dataset categories (wpgp, wpic1km, cic2020_100m, aswpgp, ascic_2020, sapya1km, etc.) are preserved unchanged under this release.
- GR2 (WorldPop R2025A v1): covers years 2015–2030, constrained only, no UN adjustment. Supports:
  - pop: 100m (G2_CN_POP_R25A_100m) and 1km (G2_CN_POP_R25A_1km) resolution.
  - age_structures (full age breakdown): 100m (G2_CN_Age_R25A_100m) and 1km (G2_CN_Age_R25A_1km) resolution.
  - age_structures (under-18 population, under_18=True): 100m only (G2_Age_U18_R25A_100m).
- Added under_18: bool field (default False) to WPPopulationConfig and WPPopulationHandler for accessing WorldPop R2025A under-18 population datasets (project="age_structures", release="GR2" only).
- Removed the R2024-specific special-case branches (year == 2024, G2_CN_POP_2024_100m, G2_UC_POP_2024_100m, G2_CN_Age_2024_100m, G2_UC_Age_2024_100m) from validate_configuration; users requiring those datasets should remain on GR1 with year=2020 or use GR2 for the overlapping year range.
- Updated AVAILABLE_YEARS_GR1 (formerly AVAILABLE_YEARS) to range(2000, 2021) and added AVAILABLE_YEARS_GR2 as range(2015, 2031).
- Extended _filter_age_sex_paths to correctly handle under-18 filename patterns (ISO3_T/F/M_Under_18_YEAR_...tif): detects UNDER_18 keyword to avoid misclassification as non-school-age files, supports T (total) sex value, and defaults to returning only the T aggregate file when no sex filter is provided.
- Extended the zip-extraction branch in get_data_unit_paths to cover under_18=True datasets (previously only school_age=True).
- Extended get_relevant_data_units_by_geometry zip passthrough guard to include G2_Age_U18_R25A_100m alongside sapya1km.
- Updated validate_configuration docstring to reflect GR1/GR2 availability matrix for all project and resolution combinations.
-
Relative Wealth Index Handler (RWIHandler): Quadkey-enriched loading
- Updated load_data() to automatically compute and attach a quadkey column (zoom level 14) for all RWI records when only latitude/longitude are present, making tiled aggregation the default behavior.
- Added load_as_geodataframe() convenience method to return RWI data as a GeoDataFrame with Mercator quadkey geometries joined to the underlying rwi and error attributes for direct spatial analysis.
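For readers unfamiliar with quadkeys, the sketch below shows how a latitude/longitude pair maps to a zoom-14 quadkey under the standard Web Mercator tile scheme. It is an illustration only: the function names are hypothetical, and the handler may rely on a tiling library rather than hand-written math.

```python
import math


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of tile x/y into a base-4 quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat: float, lon: float, zoom: int = 14) -> str:
    """Convert WGS84 coordinates to the quadkey of the containing tile."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Web Mercator latitude limit
    n = 1 << zoom  # number of tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    x = min(max(x, 0), n - 1)  # clamp points on the eastern or southern edge
    y = min(max(y, 0), n - 1)
    return tile_to_quadkey(x, y, zoom)


quadkey = latlon_to_quadkey(0.5, 32.5)  # 14 characters at the default zoom
```

Records that share a quadkey fall in the same tile, which is what makes tile-level aggregation straightforward.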
-
aggregate_points_to_zones / aggregate_polygons_to_zones: Semantically correct fill values for empty zones
- Changed the fill behavior for zones with no overlapping point or polygon data: non-count aggregations (e.g., mean, sum, min, max) now correctly fill missing values with np.nan instead of 0, distinguishing "no data" from a true zero measurement.
- count aggregation continues to fill with 0, preserving the expected behavior that a zone with no matched features has a count of zero.
- Updated docstrings for both functions to explicitly document the np.nan vs. 0 fill semantics per aggregation method.
- Fixed a missing column rename step in aggregate_points_to_zones for non-MultiIndex aggregation results, ensuring output_suffix is consistently applied to all output columns.
- Fixed geometry column drop in aggregate_points_to_zones being incorrectly gated on aggregation type; geometry is now always dropped before non-count aggregation to avoid pandas errors.
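The fill rule above is small enough to show directly. The following sketch uses plain dictionaries instead of DataFrames, and the function name fill_empty_zones is hypothetical; it only illustrates the NaN-versus-zero distinction.

```python
import math
from typing import Dict, Hashable, Iterable


def fill_empty_zones(
    zone_ids: Iterable[Hashable],
    aggregated: Dict[Hashable, float],
    aggregation: str,
) -> Dict[Hashable, float]:
    """Return a value for every zone, filling zones that matched no features."""
    # A count of zero is meaningful; any other statistic is simply unknown.
    fill = 0 if aggregation == "count" else math.nan
    return {zone: aggregated.get(zone, fill) for zone in zone_ids}


counts = fill_empty_zones(["a", "b"], {"a": 3}, "count")
means = fill_empty_zones(["a", "b"], {"a": 2.5}, "mean")
```

Downstream code that previously treated 0 as "no data" for mean or sum columns would need to check for NaN instead.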
Fixed¶
- BaseHandler.ensure_data_available: Correct handling of post-download paths
- Fixed an issue where ensure_data_available() could still report missing data on the first download for handlers whose get_data_unit_paths() mapping changes after download (e.g., WorldPop age_structures with school_age=True, where ZIP resources are extracted to .tif files).
- After downloading, the method now refreshes the resolved data paths before performing the final existence check, ensuring that ZIP → .tif extraction workflows correctly pass availability checks on the first run.
[v0.8.0] - 2026-02-10¶
Added¶
-
Google Earth Engine Handler (GEEProfiler)
- Complete GEE Integration: New gigaspatial/handlers/gee/ module with full Google Earth Engine support via GEEProfiler class.
- Context & Impact: Multi-Sector GEE Intelligence
  - Critical Data Access: Built-in support for global infrastructure and environmental datasets (Nightlights, Population, Land Cover) provides instant, ready-to-use insights for emergency response, climate resilience, and urban development.
  - Beyond Built-ins: While the registry comes pre-configured for high-value datasets, GEEProfiler is fully flexible, unlocking the entire multi-petabyte Google Earth Engine public catalog (Landsat, Sentinel, MODIS, etc.) for custom analysis.
- Intelligent Asset Handling: Robust initialization automatically distinguishes ee.Image vs ee.ImageCollection, preventing backend loading errors during band validation.
- Dataset Registry: GEEDatasetRegistry and GEEDatasetEntry for managing built-in datasets (nightlights, population, etc.) with metadata (bands, scales, temporal coverage).
- Inspection Utilities: Comprehensive collection inspection including display_collection_info(), get_band_names(), get_date_range(), display_band_names(), display_properties().
- map_to_points(): Extracts GEE raster values at point locations with optional circular buffers (metric CRS buffering, automatic UTM reprojection).
- map_to_zones(): Spatial aggregation of GEE rasters within polygon boundaries (admin zones, grids) using configurable reducers.
- Automatic Chunking: Handles GEE API limits with intelligent chunking (chunk_size=1000 default), processing large feature sets without timeouts.
- Unified Processing Pipeline: Single _reduce_regions_with_chunking() core handles both points and zones with robust error recovery per chunk.
- Temporal Filtering: start_date/end_date filtering with automatic temporal reduction (mean, median, etc.) for ImageCollections.
- NRT-Ready Logic: Advanced date handling for Near Real-Time datasets, automatically defaulting to most recent data windows.
- Config-Driven: GEEConfig supports dataset lookup, authentication (service accounts), and defaults (band, reducer, scale, chunk_size).
- Production-Ready: Full validation (bands, dates, geometries), detailed logging, error handling, and config fallbacks.
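The chunking and per-chunk recovery described above follow a common pattern, sketched here without any Earth Engine calls. The names iter_chunks and reduce_in_chunks are hypothetical, and the reduce_chunk callback stands in for whatever request is issued per chunk.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(items: Sequence[T], chunk_size: int = 1000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def reduce_in_chunks(
    items: Sequence[T],
    reduce_chunk: Callable[[Sequence[T]], List[R]],
    chunk_size: int = 1000,
) -> Tuple[List[R], List[int]]:
    """Apply reduce_chunk per chunk; record failed chunk indexes instead of aborting."""
    results: List[R] = []
    failed: List[int] = []
    for index, chunk in enumerate(iter_chunks(items, chunk_size)):
        try:
            results.extend(reduce_chunk(chunk))
        except Exception:  # one bad chunk should not discard the others
            failed.append(index)
    return results, failed


features = list(range(2500))  # stand-in for 2,500 point or zone features
values, failed_chunks = reduce_in_chunks(features, lambda chunk: [v * 2 for v in chunk])
```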
-
GEE Built-in Datasets
- Pre-configured registry entries for popular datasets: nightlights (VIIRS), population (various sources), land cover, etc.
- Automatic collection loading via dataset_id: profiler = GEEProfiler("nightlights").
- Standardized metadata: resolution, default bands, reducers, temporal cadence.
-
DeltaSharingDataStore (formerly GigaDataAPI)
- Refactored internal GigaDataAPI into public DeltaSharingDataStore as a reusable data-access layer for Delta Sharing tables.
- Replaced hardcoded global config with flexible initialization supporting global config (API_PROFILE_FILE_PATH, API_SHARE_NAME, API_SCHEMA_NAME) + selective per-instance overrides via keyword arguments.
- Added DeltaSharingConfig (Pydantic model) for validated configuration with profile file existence checks and enable_cache flag for in-memory table caching.
- Implemented lazy SharingClient initialization via client property and comprehensive cache management (get_table_metadata(), clear_cache(), get_cached_tables(), cache_size_mb).
- Maintained backward-compatible API: list_tables() → get_country_list(), load_table() → load_country_data().
Changed¶
-
OvertureAmenityFetcher: Updated to latest Overture Places release and S3 layout
- Updated default Overture release to 2026-01-21.0 for Places theme queries.
- Switched to the current S3 URL pattern for Places GeoParquet (release/{release}/theme=places/...) and now build the read path via a configurable base_url instead of hardcoding it in the SQL query.
- Fixed IO errors caused by outdated bucket patterns by reading directly from the official Overture S3 bucket with DuckDB’s read_parquet().
- Enabled S3 access in DuckDB by installing and loading the httpfs extension and configuring the AWS region (s3_region='us-west-2') during connection setup.
-
DeltaSharingDataStore: Robust table discovery
- Enhanced list_tables() to handle SharingClient.list_all_tables() returning empty lists (common in constrained shares) by falling back to explicit list_shares() → list_schemas() → list_tables() enumeration on configured share/schema.
- Added structured logging for debugging share/schema mismatches while preserving stable public API contract.
-
DataStore: Cross-Platform Path Handling
-
ADLSDataStore and LocalDataStore: Enhanced path type support
- All path-accepting methods now support both str and PathLike objects (e.g., pathlib.Path) for improved type flexibility and cross-platform compatibility.
- Introduced Pathish = Union[str, PathLike[str]] type alias for consistent path parameter signatures across both data store implementations.
-
ADLSDataStore: Centralized blob key normalization
- Added _to_blob_key() method as the single source of truth for converting input paths to Azure-compatible blob keys.
- Automatically handles Windows backslashes → forward slashes conversion via PurePosixPath for cross-platform compatibility.
- Strips leading slashes (Azure convention) and optionally ensures trailing slashes for directory operations.
- All methods now use _to_blob_key() internally, eliminating inconsistent path handling across different operations.
- _normalize_path() deprecated in favor of _to_blob_key() but kept for backward compatibility.
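A rough sketch of the normalization rules listed for _to_blob_key() follows. It is not the method itself: the function name, the directory flag, and the explicit backslash replacement are assumptions made for illustration (PurePosixPath on its own does not translate backslashes, so the sketch replaces them before normalizing).

```python
from os import PathLike, fspath
from pathlib import PurePosixPath, PureWindowsPath
from typing import Union

Pathish = Union[str, "PathLike[str]"]


def to_blob_key(path: Pathish, directory: bool = False) -> str:
    """Convert a local-style path into a forward-slash blob key."""
    # Backslashes become forward slashes; leading slashes are stripped.
    text = fspath(path).replace("\\", "/").lstrip("/")
    # PurePosixPath collapses duplicate and trailing slashes.
    key = str(PurePosixPath(text)) if text else ""
    if key == ".":
        key = ""
    if directory and key:
        key += "/"  # directory-style prefix for listing operations
    return key


file_key = to_blob_key(PureWindowsPath("data\\raw\\tile.tif"))
dir_key = to_blob_key("/data//raw/", directory=True)
```

Keeping this logic in one place means handlers can build paths with pathlib on any platform and hand them over unchanged.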
-
LocalDataStore: Improved path resolution
- Enhanced _resolve_path() to accept PathLike objects alongside strings.
- Properly handles both absolute and relative paths with automatic resolution relative to base_path.
- All methods updated to use Pathish type signature for consistency with ADLSDataStore.
-
Benefits:
- Cross-platform safety: Eliminates Windows vs. Linux path separator issues when handlers build paths using pathlib and pass them to data stores.
- Type flexibility: Handlers can now pass Path objects, strings, or any PathLike without manual str() conversion.
- Backward compatible: Existing handler code with str() wrappers continues to work unchanged.
- Maintainability: Path normalization logic is centralized at the data store boundary, not scattered across handlers.
Documentation¶
- OvertureAmenityFetcher: Amenity category documentation
- Expanded the class docstring to explain that amenity_types should correspond to categories.primary values in the Overture Places schema.
- Linked to the authoritative Overture category list CSV on GitHub so users can discover all valid amenity categories without hardcoding them in giga-spatial: https://github.com/OvertureMaps/schema/blob/main/docs/schema/concepts/by-theme/places/overture_categories.csv.
[v0.7.6] - 2026-01-27¶
Changed¶
-
Maxar Imagery Handler: API Migration to GEGD Pro Platform
-
Updated MaxarConfig to support the new Maxar GEGD Pro API infrastructure:
- Replaced username/password/connection string authentication with API key-based authentication.
- Added support for OAuth bearer tokens as an alternative authentication method.
- Updated base URL from evwhs.digitalglobe.com to pro.gegd.com/streaming/v1/ogc/wms.
- Added auth_method parameter supporting three authentication modes: api_key (query parameter), header (custom header), and bearer_token.
- Updated layer names from DigitalGlobe:Imagery / DigitalGlobe:ImageryFootprint to Maxar:Imagery / Maxar:FinishedFeature.
- Replaced deprecated featureProfile parameter with optional profile parameter for stacking profiles.
- Renamed coverage_cql_filter to cql_filter for consistency with updated API specification.
- Added styles parameter supporting raster (imagery) and footprints (feature visualization) rendering modes.
- Updated to WMS version 1.3.0 as the default and recommended version.
-
Enhanced Initialization Flexibility:
- MaxarImageDownloader now accepts configuration as MaxarConfig object, dictionary, or keyword arguments.
- Supports mixing configuration sources with kwargs taking precedence for convenient overrides.
- Added validation for API key or bearer token requirement via Pydantic field validators.
-
Updated Authentication Implementation:
- Replaced basic authentication with header-based and query-parameter authentication methods.
- Implemented _build_auth_headers() for dynamic header construction based on auth method.
- Modified _initialize_wms() to handle API key injection and header configuration.
- Updated _download_single_image() to use srs parameter (OWSLib compatibility) while supporting WMS 1.3.0 crs in actual requests.
-
Maxar Imagery Handler: WFS Metadata Integration
-
Added Web Feature Service (WFS) support for querying imagery metadata alongside image downloads:
- Implemented _initialize_wfs() for WFS 2.0.0 service initialization with authentication.
- Added get_imagery_metadata() method supporting bbox and CQL filter queries with configurable output formats.
- Implemented get_metadata_for_bbox() convenience method providing summary statistics (feature count, sensors, date ranges, cloud cover, resolution).
- Added _download_single_image_with_metadata() for atomic image download with associated feature metadata.
-
Metadata Features:
- Retrieves comprehensive imagery attributes including: acquisition dates, sensor/source information, cloud cover percentage, ground sample distance, sun angles, off-nadir angle, product names, band descriptions, NIIRS quality ratings, and processing levels.
- Returns metadata as GeoDataFrame for seamless spatial analysis and filtering.
- Supports CQL filtering by date ranges, sensors, product types, and custom attribute queries.
- Includes BBOX-in-CQL support for combined spatial and attribute filtering (WFS limitation workaround).
-
Bulk Download Metadata Support:
- Added save_metadata parameter to all bulk download methods (download_images_by_tiles, download_images_by_bounds, download_images_by_coordinates).
- When enabled, automatically saves JSON metadata files alongside downloaded images with matching filenames.
- Metadata JSON includes feature properties with proper datetime serialization (ISO 8601 format).
- Geometry column excluded from JSON output to minimize file size while preserving spatial reference in GeoDataFrame workflows.
-
Maxar Imagery Handler: Date Filtering Convenience
-
Added build_date_filter() helper method for constructing CQL date range filters:
- Accepts string dates (YYYY-MM-DD), datetime, or date objects.
- Supports start-only, end-only, or date range filtering via acquisitionDate field.
- Uses BETWEEN syntax for cleaner queries when both dates provided.
- Configurable date_field parameter for filtering alternative temporal fields.
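The behaviour listed for build_date_filter() can be approximated as below. This is a sketch under stated assumptions: the function names are hypothetical, and the exact CQL quoting and comparison operators the handler emits are not given in this changelog, so the strings here are illustrative.

```python
from datetime import date, datetime
from typing import Optional, Union

DateLike = Union[str, date, datetime]


def _to_iso_day(value: DateLike) -> str:
    """Normalize a string, date, or datetime to YYYY-MM-DD."""
    if isinstance(value, datetime):
        return value.date().isoformat()
    if isinstance(value, date):
        return value.isoformat()
    # Strings are parsed so that malformed input fails early.
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()


def sketch_date_filter(
    start: Optional[DateLike] = None,
    end: Optional[DateLike] = None,
    date_field: str = "acquisitionDate",
) -> Optional[str]:
    """Build a CQL-style date predicate (assumed syntax)."""
    if start is not None and end is not None:
        return f"{date_field} BETWEEN '{_to_iso_day(start)}' AND '{_to_iso_day(end)}'"
    if start is not None:
        return f"{date_field} >= '{_to_iso_day(start)}'"
    if end is not None:
        return f"{date_field} <= '{_to_iso_day(end)}'"
    return None


def combine_filters(existing: Optional[str], extra: Optional[str]) -> Optional[str]:
    """Join two predicates with a logical AND, skipping empty ones."""
    parts = [part for part in (existing, extra) if part]
    return " AND ".join(f"({part})" for part in parts) if parts else None


both = sketch_date_filter("2024-01-01", date(2024, 12, 31))
combined = combine_filters("cloudCover < 20", sketch_date_filter(start="2024-06-01"))
```

The cloudCover attribute in the usage lines is a made-up example field, not one confirmed by the changelog.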
-
Integrated Date Filtering in Bulk Downloads:
- Added start_date and end_date parameters to all bulk download methods.
- Date filters automatically combined with existing cql_filter configuration via logical AND.
- Original CQL filter state preserved and restored after download completion.
- Includes informative logging when date filters are applied.
-
ADLSDataStore: Performance Optimizations for File and Directory Operations
-
Optimized list_files() Method:
- Replaced list_blobs() with list_blob_names() for performance improvement.
- Reduced memory usage by returning blob name strings instead of full BlobProperties objects.
- Added _normalize_path() helper method for consistent path handling (removes leading slashes, ensures trailing slashes for directories, converts backslashes to forward slashes).
- Maintains backward compatibility; still returns a list of file paths.
- Performance: ~2MB memory usage vs ~500MB for large directories.
-
New list_files_iter() Method:
- Memory-efficient generator-based iteration over files in large directories.
- Enables early exit and lazy evaluation without loading entire file lists into memory.
- Returns iterator of blob name strings, supporting streaming workflows.
- Ideal for directories with 100K+ files where full materialization is unnecessary.
-
Optimized walk() Method:
- Replaced list_blobs() with list_files_iter() for lazy evaluation.
- Eliminated full materialization of blob lists into memory.
- Performance: 2-3x faster and 100-500x less memory usage for large directory trees.
-
Optimized list_directories() Method:
- Replaced list_blobs() iteration with Azure's walk_blobs(delimiter='/') for hierarchical listing.
- Azure now returns directory prefixes directly without scanning all files.
- Performance: 100-1000x faster for large directories (1-5 seconds vs 30-60 seconds for 1M files).
- No longer requires iterating through all files to identify subdirectories.
-
New Utility Methods:
- has_files_with_extension(): Fast early-exit check for files with specific extensions without full directory scan.
- count_files(): Memory-efficient file counting using generator iteration.
- count_files_with_extension(): Memory-efficient counting of files by extension.
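The early-exit and generator-counting ideas behind these utilities can be illustrated without Azure. In the sketch below, fake_listing is a made-up stand-in for a lazy listing such as list_files_iter(), and the two functions are simplified, hypothetical versions that take any iterable of names.

```python
from typing import Iterable, Iterator, List

consumed: List[str] = []  # records how many names the lazy listing produced


def fake_listing() -> Iterator[str]:
    """Stand-in for a lazy blob listing; yields names one at a time."""
    for name in ["a/1.tif", "a/2.csv", "a/3.tif"]:
        consumed.append(name)
        yield name


def has_files_with_extension(names: Iterable[str], extension: str) -> bool:
    """Stop at the first matching name instead of scanning everything."""
    suffix = extension.lower()
    return any(name.lower().endswith(suffix) for name in names)


def count_files(names: Iterable[str]) -> int:
    """Count names without building an intermediate list."""
    return sum(1 for _ in names)


found = has_files_with_extension(fake_listing(), ".tif")
seen_before_exit = list(consumed)  # only the first name was requested
total = count_files(fake_listing())
```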
-
SnowflakeDataStore: Multiprocessing Support
- Implemented lazy connection initialization enabling safe pickling and multiprocessing usage.
- Added thread-safe connection creation via _get_connection() with double-checked locking pattern.
- Implemented custom __getstate__() and __setstate__() for proper serialization, excluding non-picklable connection and lock objects.
- Each worker process creates its own database connection on first access.
- Maintained full backward compatibility via connection property accessor.
- Improved startup performance by deferring connection creation until first use.
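The pattern described here (lazy connection, double-checked locking, and pickling that drops the live connection and lock) can be sketched with the standard library alone. The class below is hypothetical and uses a plain dictionary in place of a real Snowflake connection.

```python
import pickle
import threading


class LazyConnectionStore:
    """Illustrative store whose connection is created on first use."""

    def __init__(self, account: str):
        self.account = account
        self._connection = None  # nothing is opened at construction time
        self._lock = threading.Lock()

    def _connect(self):
        # Placeholder for opening a real database connection.
        return {"account": self.account}

    def _get_connection(self):
        if self._connection is None:  # first check, without the lock
            with self._lock:
                if self._connection is None:  # second check, under the lock
                    self._connection = self._connect()
        return self._connection

    @property
    def connection(self):
        return self._get_connection()

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_connection"] = None  # live connections are not picklable
        state.pop("_lock", None)  # neither are lock objects
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # each process gets a fresh lock


store = LazyConnectionStore("demo")
_ = store.connection  # forces the connection to exist in this process
clone = pickle.loads(pickle.dumps(store))  # what a worker process would receive
```

The clone starts without a connection and opens its own on first access, which matches the per-worker behaviour described above.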
Performance¶
-
Significant speedups for Azure Blob Storage operations:
- File listing operations now faster due to list_blob_names() optimization.
- Directory listing operations now 100-1000x faster using Azure's hierarchical walk_blobs() with delimiter.
- Memory usage reduced by 100-500x for large directory operations (2-5MB vs 500MB-1GB).
- Directory tree walking now 2-3x faster with lazy evaluation via list_files_iter().
- Benefits scale with directory size; the most dramatic improvements are for directories with 100K+ files.
-
Improved startup time for SnowflakeDataStore:
- Lazy connection initialization eliminates unnecessary connection overhead when data store is instantiated but not immediately used.
Developer Notes¶
-
Migration from Legacy Maxar API:
- Existing code using MAXAR_USERNAME, MAXAR_PASSWORD, and MAXAR_CONNECTION_STRING environment variables must migrate to MAXAR_API_KEY.
- Old base URL and authentication methods are no longer supported by Maxar's API infrastructure.
- Layer names in existing configurations must be updated to new Maxar: namespace.
- CQL filter parameter names updated throughout codebase for consistency with new API specification.
-
ADLSDataStore Improvements:
- Existing list_files() calls remain fully compatible; no code changes required.
- For large directories (100K+ files), consider using list_files_iter() for memory efficiency.
- The _normalize_path() helper ensures consistent path handling across all methods.
- All optimizations leverage Azure SDK's native capabilities for maximum performance.
Dependencies¶
- Updated owslib usage to support WFS 2.0.0 specification for metadata retrieval.
[v0.7.5] - 2026-01-20¶
Added¶
-
Google-Microsoft Combined Buildings Handler (VIDA)
- Added GoogleMSBuildingsHandler (gigaspatial/handlers/google_ms_combined_buildings.py) to access the merged Google V3 Open Buildings (1.8B footprints) and Microsoft Global Building Footprints (1.24B footprints) dataset hosted by VIDA/source.coop.
- Supports multiple data formats: GeoParquet (default), FlatGeobuf, and PMTiles.
- Flexible partition strategies: country-level (single file per country) or S2 grid partitioning (tiled by S2 cells for large countries).
- Download strategies: S3 (default) or HTTPS for cloud-native access.
- Source filtering: filter buildings by data source (google, microsoft, or both).
- Integrated with BaseHandler architecture for consistent data lifecycle management (download, cache, read).
- Automatic partition discovery: resolves relevant S2 tiles or country files based on query geometry.
- Streaming support: can stream GeoParquet row groups directly from cloud storage without full download.
-
Shared Building-Processing Engine
- Added GoogleMSBuildingsEngine (gigaspatial/processing/buildings_engine.py) as a reusable, high-performance building workflow engine.
- Encapsulates common building-processing logic (S2 tile job creation, per-tile processing loops, and result accumulation) previously duplicated across view generators.
- Provides two main entrypoints:
  - count_buildings_in_zones(): Efficiently counts buildings intersecting zones using STRtree spatial indexing.
  - nearest_buildings_to_pois(): Computes nearest-building distances for POIs using KD-tree nearest-neighbor search with haversine distance calculation.
- Handles both single-file countries (processes entire dataset) and partitioned countries (loads only intersecting S2 tiles).
- Supports source filtering (google, microsoft, or both) at the engine level.
- Designed for reuse: both zonal and POI view generators now delegate to this engine, reducing code duplication and ensuring consistent performance optimizations.
-
High-Performance Buildings Mapping for View Generators
- Added GeometryBasedZonalViewGenerator.map_buildings() to efficiently count buildings per zone using the combined dataset.
  - Uses S2 grid partitioning to load only intersecting building tiles for partitioned countries.
  - Leverages STRtree spatial indexing for fast intersection queries between buildings and zones.
  - Supports source filtering to count buildings from specific providers.
- Added PoiViewGenerator.find_nearest_buildings() to efficiently compute nearest-building distances for POIs.
  - Uses S2 grid partitioning with configurable search radius to limit tile processing.
  - Implements KD-tree nearest-neighbor search for fast candidate selection.
  - Computes final distances using haversine (great-circle) distance in meters.
  - Supports global nearest-building search with progressive radius expansion for partitioned countries.
  - Returns distance metrics and boolean flags indicating buildings within specified search radius.
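For reference, the haversine (great-circle) distance mentioned above can be computed as in the sketch below. The function names and the Earth radius constant are illustrative assumptions; the changelog does not state which radius the engine uses.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumed value)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # min() guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def within_radius(distance_m: float, search_radius_m: float) -> bool:
    """Boolean flag of the kind returned alongside the distance."""
    return distance_m <= search_radius_m


one_degree = haversine_m(0.0, 0.0, 0.0, 1.0)  # one degree of longitude at the equator
```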
-
Geo Processing Utilities
- Added estimate_utm_crs_with_fallback() (gigaspatial/processing/geo.py) as a robust utility for UTM CRS estimation.
  - Wraps GeoDataFrame.estimate_utm_crs() with comprehensive error handling and fallback logic.
  - Automatically falls back to a configurable CRS (default: Web Mercator EPSG:3857) when UTM estimation fails or returns None.
  - Handles edge cases: empty GeoDataFrames, estimation exceptions, and None return values.
  - Centralizes the common UTM estimation pattern used across the codebase, reducing duplication.
  - Provides optional logger parameter for warning messages when fallbacks occur.
Fixed¶
-
Spatial Matching Graph Construction (build_distance_graph)
- Fixed critical bug where exclude_same_index=True returned fewer matches than requested when querying the same dataframe against itself.
- Previously, when excluding self-matches, the function would query for max_k neighbors and then filter out self-matches, resulting in only max_k - 1 actual matches returned.
- Most critically, max_k=1 with exclude_same_index=True would always return zero matches (since the only neighbor found was the point itself).
- Now queries for max_k + 1 neighbors when exclude_same_index=True, ensuring max_k valid matches are returned after self-match removal.
- Affects build_distance_graph() in spatial matching workflows where the same dataframe is matched against itself.
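The off-by-one described above is easy to reproduce in miniature. The sketch below uses a brute-force sort in place of a KD-tree, and k_nearest is a hypothetical function, not build_distance_graph() itself; it only shows why querying max_k + 1 neighbors is needed when self-matches are dropped.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_nearest(
    points: Sequence[Point],
    queries: Sequence[Point],
    max_k: int,
    exclude_same_index: bool = False,
) -> List[List[int]]:
    """Return up to max_k nearest point indexes per query."""
    # Ask for one extra neighbor when the self-match will be removed.
    k = max_k + 1 if exclude_same_index else max_k
    matches: List[List[int]] = []
    for query_index, query in enumerate(queries):
        ranked = sorted(
            range(len(points)), key=lambda i: math.dist(query, points[i])
        )[:k]
        if exclude_same_index:
            ranked = [i for i in ranked if i != query_index]
        matches.append(ranked[:max_k])
    return matches


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
neighbors = k_nearest(pts, pts, max_k=1, exclude_same_index=True)
```

With the old behaviour (querying only max_k neighbors), each point's single nearest neighbor would be itself and the result would be empty after filtering.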
-
S2 cell polygon generation (to_geoms)
- Conversion of S2 cells to Shapely polygons now enforces a consistent counter-clockwise winding order using shapely.geometry.polygon.orient(), avoiding accidental orientation flips or hole-like polygons when rendering or exporting.
- Added validation and automatic repair of invalid polygons via buffer(0) to handle rare projection-related self-intersections, logging the number of repaired cells and skipping any that remain invalid after repair.
- Improved logging for debugging by including the S2 token in warnings and errors when a cell fails conversion or cannot be repaired.
Performance¶
- Significant speedups for buildings enrichment
- Zonal building counts and POI nearest-building mapping are now substantially faster than mapping Google and Microsoft buildings separately, achieving performance gains through:
  - Single combined dataset: Eliminates redundant I/O and repeated processing of overlapping building footprints from separate sources.
  - S2 tile filtering: For partitioned countries, loads only intersecting S2 building tiles instead of scanning entire country files, dramatically reducing memory usage and processing time.
  - Spatial indexing: Uses STRtree for O(log k) zone intersection queries and KD-tree for fast nearest-neighbor search, replacing slower sequential scans.
  - Shared engine architecture: Centralized optimizations benefit both zonal and POI workflows, ensuring consistent performance improvements across use cases.
Dependencies¶
- Added s3fs>=2024.12.0 as a new dependency.
[v0.7.4] - 2025-11-24¶
Added¶
-
TifProcessor: Raster Export Methods
-
save_to_file() method: Comprehensive raster export functionality with flexible compression and optimization options.
- Supports multiple compression algorithms: LZW (default), DEFLATE, ZSTD, JPEG, WEBP, and NONE.
- Configurable compression parameters: ZLEVEL for DEFLATE (default: 6), ZSTD_LEVEL for ZSTD (default: 9), JPEG_QUALITY (default: 85), WEBP_LEVEL (default: 75).
- Predictor support for improved compression: predictor=2 for integer data (horizontal differencing), predictor=3 for floating-point data.
- Tiled output enabled by default (512×512 blocksize) for optimal random access performance.
- Cloud-Optimized GeoTIFF (COG) support via cog=True parameter with automatic overview generation.
- Customizable overview levels and resampling methods for COG creation.
- BigTIFF support for files >4GB via bigtiff parameter.
- Multi-threading support for compatible compression algorithms via num_threads parameter.
- Integrates with self.open_dataset() context manager, automatically handling merged, reprojected, and clipped rasters.
- Writes through self.data_store abstraction layer, supporting both local and remote storage (e.g., ADLS).
- Preserves all bands from source rasters without skipping.
-
save_array_to_file() method: Export processed numpy arrays while preserving georeferencing metadata.
- Accepts 2D or 3D numpy arrays (with automatic dimension handling).
- Inherits CRS, transform, and nodata values from source raster or accepts custom values.
- Supports same compression options as save_to_file().
- Enables saving modified/processed raster data while maintaining spatial reference.
- Writes through self.data_store for consistent storage abstraction.
-
TifProcessor: Value-Based Filtering for DataFrame and GeoDataFrame Conversion
-
min_value and max_value parameters: Added optional filtering thresholds to to_dataframe() and to_geodataframe() methods.
- min_value: Filters out pixels with values ≤ threshold (exclusive).
- max_value: Filters out pixels with values ≥ threshold (exclusive).
- Filtering occurs before geometry creation in to_geodataframe(), significantly improving performance for sparse datasets.
- Supports both single-band and multi-band rasters with consistent behavior.
-
Enhanced _build_data_mask() method: Extended to incorporate value threshold filtering alongside nodata filtering.
- Combines multiple mask conditions using logical AND for efficient filtering.
- Maintains backward compatibility when no thresholds are specified.
-
Enhanced _build_multi_band_mask() method: Extended for multi-band value filtering.
- Drops pixels where ANY band has nodata or fails value thresholds.
- Ensures consistent filtering behavior across RGB, RGBA, and multi-band modes.
-
TifProcessor: Raster Statistics in get_raster_info()
- Added include_statistics and approx_ok flags to optionally return pixel statistics alongside metadata.
- New _get_basic_statistics() helper streams through raster blocks to compute per-band and overall min, max, mean, std, sum, and count with nodata-aware masking.
- Results are cached for reuse within the processor lifecycle to avoid repeated scans.
-
BaseHandler: Tabular Load Progress
- _load_tabular_data() now supports a tqdm progress bar, showing file-level load progress for large tabular batches.
- Added show_progress and progress_desc parameters so handlers can toggle or customize the indicator while keeping existing callers backward compatible.
-
Improved developer usability by enabling easier access to primary components without deep module references:
- Exposed core handlers, view generators, and processing modules at the top-level gigaspatial package namespace for improved user experience and simplified imports.
- Added convenient aliases for gigaspatial.core.io as io and gigaspatial.processing.algorithms as algorithms directly accessible from gigaspatial.
- Declared explicit public API in __init__.py to clarify stable, supported components.
Changed¶
- GHSLDataConfig: Improved SSL Certificate Handling for Tile Downloads
    - Replaced the `ssl._create_unverified_context` approach with a robust two-tier fallback strategy for downloading the GHSL tiles shapefile.
    - Primary method: Attempts download via `gpd.read_file()` with an unverified SSL context (fast, direct access).
    - Fallback method: Uses `requests.get()` with `verify=False` for environments where `gpd.read_file()` fails (e.g., cloud compute instances with Anaconda certificate bundles).
    - Downloads tiles to a temporary local file before reading when the fallback is triggered, ensuring compatibility across different Python environments.
    - Tile caching: Implemented GeoJSON-based caching in the `base_path/cache/` directory to minimize redundant downloads.
        - Cache checked before any download attempts.
        - Invalid cache automatically triggers re-download.
        - Uses `write_dataset()` for consistent storage abstraction across local and remote data stores.
    - Enhanced error handling:
        - Logs specific exception types (`type(e).__name__`) for better debugging.
        - Graceful fallback with informative warning messages.
        - Preserves exception chain for traceback analysis.
    - Improved compatibility with Azure ML compute instances where multiple certificate stores (system, Anaconda, certifi) coexist.
    - Temporary file cleanup guaranteed via `finally` block, preventing orphaned downloads.
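The two-tier strategy described above can be sketched with the standard library alone. This is a hypothetical illustration, not the `GHSLDataConfig` implementation: `read_with_fallback`, `primary_reader`, and `fetch_bytes` are assumed names, where `primary_reader` stands in for the `gpd.read_file()` call and `fetch_bytes` for the `requests.get(..., verify=False)` download.

```python
import logging
import os
import tempfile
from typing import Callable

logger = logging.getLogger(__name__)


def read_with_fallback(
    url: str,
    primary_reader: Callable[[str], object],
    fetch_bytes: Callable[[str], bytes],
    suffix: str = ".zip",
) -> object:
    """Try a direct read first; on failure, download to a temp file and read that."""
    try:
        return primary_reader(url)
    except Exception as e:
        # Log the specific exception type, then continue with the fallback tier.
        logger.warning(
            "Direct read failed (%s: %s); using fallback download",
            type(e).__name__,
            e,
        )

    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(fetch_bytes(url))
        return primary_reader(tmp_path)
    finally:
        # Cleanup runs whether or not the fallback read succeeded.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Passing the reader and downloader as callables keeps the sketch independent of any particular HTTP or geospatial library.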
Fixed¶
- GHSLDataConfig: SSL certificate verification failures in cloud environments
    - Resolved `CERTIFICATE_VERIFY_FAILED` errors when downloading the GHSL tiles shapefile on cloud compute instances.
Performance¶
- Reduced network overhead for GHSL tile metadata:
    - Tiles shapefile downloaded only once per coordinate system (WGS84/Mollweide) and cached locally.
    - Subsequent `GHSLDataConfig` instantiations load from cache, eliminating repeated ~MB shapefile downloads.
    - Benefits scale with number of GHSL queries across application lifecycle.
Documentation¶
- Improved `README` with clearer key workflows, core concepts, and updated overview text.
[v0.7.3] - 2025-11-11¶
Added¶
- SnowflakeDataStore Support
    - New `SnowflakeDataStore` class implementing the `DataStore` interface for Snowflake stages.
    - Supports file operations (read, write, list, delete) on Snowflake internal stages.
    - Integrated with `gigaspatial/config.py` for centralized configuration via environment variables.
    - Provides directory-like operations (`mkdir`, `rmdir`, `walk`, `is_dir`, `is_file`) for conceptual directories in Snowflake stages.
    - Includes context manager support and connection management.
    - Full compatibility with existing `DataStore` abstraction.
- BaseHandler: Config-Level Data Unit Caching
    - `BaseHandlerConfig` now maintains an internal `_unit_cache` for data unit and geometry caching.
    - Cache stores tuples of `(units, search_geometry)` for efficient reuse across handler, downloader, and reader operations.
    - New methods:
        - `get_cached_search_geometry()`: Retrieve cached geometry for a source.
        - `clear_unit_cache()`: Clear cached data for testing or manual refreshes.
        - `_cache_key()`: Generate canonical cache keys from various source types.
    - Benefits all components (handler, downloader, reader) regardless of entry point.
- Unified Geometry Extraction in BaseHandlerConfig
    - New `extract_search_geometry()` method providing standardized geometry extraction from various source types:
        - Country codes (via `AdminBoundaries`)
        - Shapely geometries (`BaseGeometry`)
        - GeoDataFrames with automatic CRS handling
        - Lists of points or coordinate tuples (converted to `MultiPoint`)
    - Centralizes geometry conversion logic, eliminating duplication across handler methods.
- BaseHandler: Crop-to-Source Feature for Handlers
    - New `crop_to_source` parameter in `BaseHandlerReader.load()` and `BaseHandler.load_data()` methods.
    - Allows users to load data clipped to exact source boundaries rather than full data units (e.g., tiles).
    - Particularly useful for tile-based datasets (Google Open Buildings, GHSL) where tiles extend beyond requested regions.
    - Implemented `crop_to_geometry()` method in `BaseHandlerReader` for spatial filtering:
        - Supports `(Geo)DataFrame` clipping using geometry intersection.
        - Supports raster clipping using `TifProcessor`'s `clip_to_geometry` method.
        - Extensible for future cropping implementations.
    - Search geometries are now cached alongside data units for efficient cropping operations.
- S2 Zonal View Generator (`S2ViewGenerator`)
    - New generator for producing zonal views using Google S2 cells (levels 0–30).
    - Supports sources:
        - Country name (`str`) via `CountryS2Cells.create(...)`
        - Shapely geometry or `gpd.GeoDataFrame` via `S2Cells.from_spatial(...)`
        - Points (`List[Point | (lon, lat)]`) via `S2Cells.from_points(...)`
        - Explicit cells (`List[int | str]`, S2 IDs or tokens) via `S2Cells.from_cells(...)`
    - Uses `cell_token` as the zone identifier.
    - Includes `map_wp_pop()` convenience method (auto-uses stored country when available).
- H3 Zonal View Generator (`H3ViewGenerator`)
    - New generator for producing zonal views using H3 hexagons (resolutions 0–15).
    - Supports sources:
        - Country name (`str`) via `CountryH3Hexagons.create(...)`
        - Shapely geometry or `gpd.GeoDataFrame` via `H3Hexagons.from_spatial(...)`
        - Points (`List[Point | (lon, lat)]`) via `H3Hexagons.from_spatial(...)`
        - Explicit H3 indexes (`List[str]`) via `H3Hexagons.from_hexagons(...)`
    - Uses `h3` as the zone identifier.
    - Includes `map_wp_pop()` convenience method (auto-uses stored country when available).
- TifProcessor: MultiPoint clipping support
    - `_prepare_geometry_for_clipping()` now accepts `MultiPoint` inputs and uses their bounding box for raster clipping.
    - Enables passing collections of points as a `MultiPoint` to `clip_to_geometry()` without pre-converting to a polygon.
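To illustrate what "uses their bounding box" means for a collection of points, here is a minimal pure-Python sketch. The function name `points_to_bounds` is hypothetical and not part of `TifProcessor`; with Shapely, the same tuple is available as `MultiPoint(points).bounds`.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (lon, lat)


def points_to_bounds(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (minx, miny, maxx, maxy) enclosing all points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Three illustrative points; the rectangle spans their extreme coordinates.
bounds = points_to_bounds([(23.7, 37.9), (23.1, 38.2), (24.0, 37.5)])
```

Note that a single point, or points sharing a longitude or latitude, yield a zero-area rectangle, so callers may want to buffer the bounds before clipping.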
Changed¶
- Configuration
    - Added Snowflake connection parameters to `gigaspatial/config.py`:
        - `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`
        - `SNOWFLAKE_WAREHOUSE`, `SNOWFLAKE_DATABASE`, `SNOWFLAKE_SCHEMA`
        - `SNOWFLAKE_STAGE_NAME`
    - Added Snowflake configuration variables to `.env_sample`
- BaseHandler
    - Streamlined Data Unit Resolution
        - Consolidated `get_relevant_data_units_by_country()`, `get_relevant_data_units_by_points()`, and `get_relevant_data_units_by_geometry()` into a unified workflow.
        - All source types now convert to geometry via `extract_search_geometry()` before unit resolution.
        - Subclasses now only need to implement `get_relevant_data_units_by_geometry()` for custom logic.
        - Significantly reduces code duplication in handler subclasses.
    - Optimized Handler Workflow
        - Eliminated redundant `get_relevant_data_units()` calls across handler, downloader, and reader operations.
        - `ensure_data_available()` now uses cached units and paths, preventing multiple lookups per request.
        - Data unit resolution occurs at most once per unique source query, improving performance for:
            - Repeated `load_data()` calls with the same source.
            - Operations involving both download and read steps.
            - Direct usage of downloader or reader components.
- Enhanced BaseHandlerReader
    - `resolve_source_paths()` now primarily handles explicit file paths.
    - Geometry/country/point conversion delegated to handler and config layers.
    - `load()` method updated to support the `crop_to_source` parameter with automatic geometry retrieval from cache.
    - Fallback geometry computation if a cache miss occurs (e.g., when the reader is used independently).
- BaseHandlerConfig Caching Logic
    - `get_relevant_data_units()` now checks cache before computing units.
    - Added `force_recompute` parameter to bypass cache when needed (e.g., `force_download=True`).
    - Cache operations include debug logging for transparency during development.
- TifProcessor temp-file handling
    - Simplified `_create_clipped_processor()` to mirror `_reproject_to_temp_file`: write clipped output to the new processor’s `_temp_dir`, set `_clipped_file_path`, update `dataset_path`, and reload metadata.
    - `open_dataset()` now prioritizes `_merged_file_path`, `_reprojected_file_path`, then `_clipped_file_path`, and opens local files directly.
    - Clipped processors consistently use `LocalDataStore()` for local temp files to avoid data-store path resolution issues.
Fixed¶
- TifProcessor: `clip_to_geometry()` open failure after merge
    - Fixed a bug where `open_dataset()` failed for processors returned by `clip_to_geometry()` when the source was initialized with multiple paths and loaded via handlers with `merge_rasters=True`.
    - The clipped raster is now saved directly into the new processor’s temp directory and tracked via `_clipped_file_path`, ensuring reliable access by `open_dataset()`.
    - Absolute path checks in `__post_init__` now use `os.path.exists()` for absolute paths with `LocalDataStore`, preventing false negatives for temp files.
Performance¶
- Significant reduction in redundant computations in handlers:
    - Single geometry extraction per source query (previously up to 3 times).
    - Single data unit resolution per source query (previously 2-3 times).
    - Cached geometry reuse for cropping operations.
- Benefits scale with:
    - Number of repeated queries.
    - Complexity of geometry extraction (especially country boundaries).
    - Number of data units per query.
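The "resolve once per source query" behaviour behind these gains can be sketched as a small keyed cache. Everything below is a hypothetical illustration rather than the `BaseHandlerConfig` code: the `UnitCache` class, its key scheme (upper-cased country strings, tuples of points), and the `resolve` callable are assumptions.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Resolved = Tuple[List[str], Any]  # (units, search_geometry)


class UnitCache:
    """Memoize (units, search_geometry) per canonical source key."""

    def __init__(self, resolve: Callable[[Any], Resolved]):
        self._resolve = resolve
        self._cache: Dict[Hashable, Resolved] = {}

    @staticmethod
    def _cache_key(source: Any) -> Hashable:
        # Assumed canonicalisation: country strings are case-insensitive,
        # point lists become nested tuples so they are hashable.
        if isinstance(source, str):
            return ("country", source.upper())
        if isinstance(source, (list, tuple)):
            return ("points", tuple(tuple(p) for p in source))
        raise TypeError(f"unsupported source type: {type(source).__name__}")

    def get(self, source: Any, force_recompute: bool = False) -> Resolved:
        key = self._cache_key(source)
        if force_recompute or key not in self._cache:
            self._cache[key] = self._resolve(source)
        return self._cache[key]

    def clear(self) -> None:
        self._cache.clear()
```

A real implementation would also need a stable key for geometries and GeoDataFrames, which this sketch deliberately leaves out.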
Developer Notes¶
- Subclass implementations should now:
    - Only override `get_relevant_data_units_by_geometry()` for custom unit resolution.
    - Use `extract_search_geometry()` for any geometry conversion needs.
    - Optionally override `crop_to_geometry()` for dataset-specific cropping logic.
Dependencies¶
- Added `snowflake-connector-python>=3.0.0` as a new dependency
[v0.7.2] - 2025-10-27¶
Added¶
- Ookla Speedtest Handler Integration (`OoklaSpeedtestHandler`)
    - New classes `OoklaSpeedtestHandler`, `OoklaSpeedtestConfig`, `OoklaSpeedtestDownloader`, and `OoklaSpeedtestReader` for managing Ookla Speedtest data.
    - The `OoklaSpeedtestHandler.load_data` method supports Mercator tile filtering by country or spatial geometry and includes an optional `process_geospatial` parameter for WKT to GeoDataFrame conversion.
    - In `OoklaSpeedtestConfig`, the `year` and `quarter` fields are optional (defaulting to `None`); `__post_init__` logs a warning when they are not explicitly provided and falls back to the latest available data.
    - In `OoklaSpeedtestReader`, the `resolve_source_paths` method is overridden to handle `None` or non-path sources by returning the `DATASET_URL`.
    - In `OoklaSpeedtestHandler`, the `__init__` method requires `type` as a mandatory argument, with `year` and `quarter` being optional.
- S2 Grid Generation Support (`S2Cells`)
    - Introduced `S2Cells` class for managing Google S2 cell grids using the `s2sphere` library.
    - Supports S2 levels 0-30, providing finer granularity than H3 (maximum level 30 vs. maximum resolution 15).
    - Provides multiple creation methods:
        - `from_cells()`: Create from lists of S2 cell IDs (integers or tokens).
        - `from_bounds()`: Create from geographic bounding box coordinates.
        - `from_spatial()`: Create from various spatial sources (geometries, GeoDataFrames, points).
        - `from_json()`: Load S2Cells from JSON files via DataStore.
    - Includes methods for spatial operations:
        - `get_neighbors()`: Get edge neighbors (4 per cell) with optional corner neighbors (8 total).
        - `get_children()`: Navigate to higher resolution child cells.
        - `get_parents()`: Navigate to lower resolution parent cells.
        - `filter_cells()`: Filter cells by a given set of cell IDs.
    - Provides conversion methods:
        - `to_dataframe()`: Convert to pandas DataFrame with cell IDs, tokens, and centroid coordinates.
        - `to_geoms()`: Convert cells to shapely Polygon geometries (square cells).
        - `to_geodataframe()`: Convert to GeoPandas GeoDataFrame with geometry column.
    - Supports saving to JSON, Parquet, or GeoJSON files via `save()` method.
    - Includes `average_cell_area` property for approximate area calculation based on S2 level.
- Country-Specific S2 Cells (`CountryS2Cells`)
    - Extends `S2Cells` for generating S2 grids constrained by country boundaries.
    - Integrates with `AdminBoundaries` to fetch country geometries for precise cell generation.
    - Factory method `create()` enforces proper instantiation with country code validation via `pycountry`.
- Expanded `write_dataset` to support generic JSON objects.
    - The `write_dataset` function can now write any serializable Python object (like a dict or list) directly to a `.json` file by leveraging the dedicated `write_json` helper.
- NASA SRTM Elevation Data Handler (`NasaSRTMHandler`)
    - New handler classes for downloading and processing NASA SRTM elevation data (30m and 90m resolution).
    - Supports Earthdata authentication via `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables.
    - `NasaSRTMConfig` provides dynamic 1°x1° tile grid generation covering the global extent.
    - `NasaSRTMDownloader` supports parallel downloads of SRTM .hgt.zip tiles using multiprocessing.
    - `NasaSRTMReader` loads SRTM data with options to return as pandas DataFrame or list of `SRTMParser` objects.
    - Integrated with `BaseHandler` architecture for consistent data lifecycle management.
- SRTM Parser (`SRTMParser`)
    - Efficient parser for NASA SRTM .hgt.zip files using memory mapping.
    - Supports both SRTM-1 (3601x3601, 1 arc-second) and SRTM-3 (1201x1201, 3 arc-second) formats.
    - Provides methods for:
        - `get_elevation(latitude, longitude)`: Get interpolated elevation for specific coordinates.
        - `get_elevation_batch(coordinates)`: Batch elevation queries with NumPy array support.
        - `to_dataframe()`: Convert elevation data to pandas DataFrame with optional NaN filtering.
    - Automatic tile coordinate extraction from filename (e.g., N37E023, S10W120).
- SRTM Manager (`SRTMManager`)
    - Manager class for accessing elevation data across multiple SRTM tiles with lazy loading.
    - Implements LRU caching (default cache size: 10 tiles) for efficient memory usage.
    - Methods include:
        - `get_elevation(latitude, longitude)`: Get interpolated elevation for any coordinate.
        - `get_elevation_batch(coordinates)`: Batch elevation queries across multiple tiles.
        - `get_elevation_profile(latitudes, longitudes)`: Generate elevation profiles along paths.
        - `check_coverage(latitude, longitude)`: Check if a coordinate has SRTM coverage.
        - `get_available_tiles()`: List available SRTM tiles.
        - `clear_cache()` and `get_cache_info()`: Cache management utilities.
    - Automatically handles tile boundary crossings for elevation profiles.
- Earthdata Session (`EarthdataSession`)
    - Custom `requests.Session` subclass for NASA Earthdata authentication.
    - Maintains Authorization headers through redirects to/from Earthdata hosts.
    - Required for accessing NASA's SRTM data repository.
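The lazy loading and LRU caching described for `SRTMManager` can be sketched as follows. This is an assumption-laden illustration, not the library's code: `srtm_tile_name`, `TileCache`, and the `loader` callable are hypothetical names. Only the tile naming convention (1°x1° tiles named after their south-west corner, such as N37E023) and the default cache size of 10 come from the notes above.

```python
import math
from collections import OrderedDict
from typing import Any, Callable


def srtm_tile_name(latitude: float, longitude: float) -> str:
    """Name of the 1°x1° tile containing the point (south-west corner convention)."""
    lat = math.floor(latitude)
    lon = math.floor(longitude)
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{ns}{abs(lat):02d}{ew}{abs(lon):03d}"


class TileCache:
    """Load tiles on first use and evict the least recently used one when full."""

    def __init__(self, loader: Callable[[str], Any], max_tiles: int = 10):
        if max_tiles < 1:
            raise ValueError("max_tiles must be at least 1")
        self._loader = loader
        self._max_tiles = max_tiles
        self._tiles: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, latitude: float, longitude: float) -> Any:
        name = srtm_tile_name(latitude, longitude)
        if name in self._tiles:
            self._tiles.move_to_end(name)  # mark as most recently used
            return self._tiles[name]
        tile = self._loader(name)
        self._tiles[name] = tile
        if len(self._tiles) > self._max_tiles:
            self._tiles.popitem(last=False)  # drop the least recently used tile
        return tile
```

Because floor is used rather than truncation, points in the southern and western hemispheres map to the correct tile (for example, latitude -9.5 falls in the S10 row).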
Changed¶
- ADLSDataStore Enhancements
    - Modified `__init__` method to support initialization using either `ADLS_CONNECTION_STRING` or a combination of `ADLS_ACCOUNT_URL` and `ADLS_SAS_TOKEN`.
    - Improved flexibility for authenticating with Azure Data Lake Storage.
- Configuration
    - Added `ADLS_ACCOUNT_URL` and `ADLS_SAS_TOKEN` to `gigaspatial/config.py` and `.env_sample` for alternative ADLS authentication.
    - Added `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` to `gigaspatial/config.py` and `.env_sample` for NASA Earthdata authentication.
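As a rough sketch of how the two ADLS authentication options could be selected from configuration, consider the helper below. The function `resolve_adls_auth`, its return shape, and the precedence of the connection string over the URL and SAS token pair are assumptions for illustration; the actual `ADLSDataStore.__init__` logic may differ.

```python
from typing import Dict, Mapping


def resolve_adls_auth(env: Mapping[str, str]) -> Dict[str, str]:
    """Pick an authentication mode from the available settings."""
    connection_string = env.get("ADLS_CONNECTION_STRING")
    if connection_string:
        # Assumed precedence: a connection string wins when both are present.
        return {"mode": "connection_string", "connection_string": connection_string}

    account_url = env.get("ADLS_ACCOUNT_URL")
    sas_token = env.get("ADLS_SAS_TOKEN")
    if account_url and sas_token:
        return {"mode": "sas", "account_url": account_url, "credential": sas_token}

    raise ValueError(
        "Set ADLS_CONNECTION_STRING, or both ADLS_ACCOUNT_URL and ADLS_SAS_TOKEN"
    )
```

In practice the mapping would typically be `os.environ`; passing it in explicitly keeps the helper easy to test.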
Fixed¶
- WorldPop: `RuntimeError` during `school_age=True` data availability check:
    - Resolved a `RuntimeError: Could not ensure data availability for loading` that occurred when `school_age=True` and WorldPop data was not yet present in the data store.
    - `WPPopulationConfig.get_data_unit_paths` now correctly returns the original `.zip` URLs to trigger the download/extraction process when filtered `.tif` files are missing.
    - After successful download and extraction, it now accurately identifies and returns the paths to the local `.tif` files, allowing `BaseHandler` to confirm availability and proceed with loading.
- WorldPop: `list index out of range` when no datasets found:
    - Added a `RuntimeError` in `WPPopulationConfig.get_relevant_data_units_by_country` when `self.client.search_datasets` returns no results, providing a clearer error message with the search parameters.
- WorldPop: Incomplete downloads with `min_age`/`max_age` filters for non-school-age `age_structures`:
    - Fixed an issue where `load_data` with `min_age` or `max_age` filters (when `school_age=False`) resulted in incomplete downloads.
    - `WPPopulationConfig.get_data_unit_paths` now returns all potential `.tif` URLs for non-school-age `age_structures` during the initial availability check, ensuring all necessary files are downloaded.
    - Age/sex filtering is now deferred and applied by `WPPopulationReader.load_from_paths` using `WPPopulationConfig._filter_age_sex_paths` after download, guaranteeing data integrity.
- HealthSitesFetcher
    - Ensured correct Coordinate Reference System (CRS) assignment (`EPSG:4326`) when returning `GeoDataFrame` from fetched health facility data.
Dependencies¶
- Added `s2sphere` as a new dependency for S2 geometry operations
[v0.7.1] - 2025-10-15¶
Added¶
- Healthsites.io API Integration (`HealthSitesFetcher`):
    - New class `HealthSitesFetcher` to fetch and process health facility data from the Healthsites.io API.
    - Supports filtering by country, bounding box extent, and date ranges (`from_date`, `to_date`).
    - Provides methods for:
        - `fetch_facilities()`: Retrieves health facility locations, returning a `pd.DataFrame` or `gpd.GeoDataFrame` based on output format.
        - `fetch_statistics()`: Fetches aggregated statistics for health facilities based on provided filters.
        - `fetch_facility_by_id()`: Retrieves details for a specific facility using its OSM type and ID.
    - Includes robust handling for API pagination, different output formats (JSON, GeoJSON), and nested data structures.
    - Integrates with `OSMLocationFetcher` and `pycountry` to standardize country names to OSM English names for consistent querying.
    - Configurable parameters for API URL, API key, page size, flat properties, tag format, output format, and request sleep time.
- OSMLocationFetcher Enhancements:
    - Historical Data Fetching (`fetch_locations_changed_between`):
        - New method `fetch_locations_changed_between()` to retrieve OSM objects that were created or modified within a specified date range. This enables historical analysis and change tracking.
        - Defaults `include_metadata` to `True` for this method, as it's typically used for change tracking.
    - Comprehensive OSM Country Information (`get_osm_countries`):
        - New static method `get_osm_countries()` to fetch country-level administrative boundaries directly from the OSM database.
        - Supports fetching all countries or a specific country by ISO 3166-1 alpha-3 code.
        - Option to include various name variants (e.g., `name:en`, `official_name`) and ISO codes.
    - Metadata Inclusion in Fetched Locations:
        - Added `include_metadata` parameter to `fetch_locations()` to optionally retrieve change tracking metadata (timestamp, version, changeset, user, uid) for each fetched OSM element.
        - This metadata is now extracted and included in the DataFrame for nodes, relations, and ways.
    - Flexible Date Filtering in Overpass Queries:
        - Introduced `date_filter_type` (`newer`, `changed`) and `start_date`/`end_date` parameters to `_build_queries()` for more granular control over time-based filtering in Overpass QL.
    - Date Normalization Utility:
        - Added `_normalize_date()` helper method to convert various date inputs (string, datetime object) into a standardized ISO 8601 format for Overpass API queries.
- TifProcessor
    - Comprehensive Memory Management:
        - Introduced `_check_available_memory()`, `_estimate_memory_usage()`, and `_memory_guard()` methods for proactive memory assessment across various operations.
        - Added warnings (`ResourceWarning`) for potentially high memory usage in batched operations, with suggestions for optimizing `n_workers`.
    - Chunked DataFrame Conversion:
        - Implemented `to_dataframe_chunked()` for memory-efficient processing of large rasters by converting them to DataFrames in manageable chunks.
        - Automatic calculation of optimal `chunk_size` based on target memory usage via `_calculate_optimal_chunk_size()`.
        - New helper methods: `_get_chunk_windows()`, `_get_chunk_coordinates()`.
    - Raster Clipping Functionality:
        - `clip_to_geometry()`: New method to clip rasters to arbitrary geometries (Shapely, GeoDataFrame, GeoSeries, GeoJSON-like dicts).
        - `clip_to_bounds()`: New method to clip rasters to rectangular bounding boxes, supporting optional CRS transformation for the bounds.
        - New helper methods for clipping: `_prepare_geometry_for_clipping()`, `_validate_geometry_crs()`, `_create_clipped_processor()`.
- WorldPopDownloader Zip Handling:
    - Modified `download_data_unit` in `WPPopulationDownloader` to correctly handle `.zip` files (e.g., school age datasets) by downloading them to a temporary location and extracting the contained `.tif` files.
    - Updated `download_data_units` to correctly flatten the list of paths returned by `download_data_unit` when zip extraction results in multiple files.
    - Adjusted `WPPopulationConfig.get_data_unit_paths` to correctly identify and return paths for extracted `.tif` files from zip resources. For school-age datasets, it returns paths to extracted `.tif` files if available; otherwise, it returns the original `.zip` path(s) to trigger download and extraction.
    - Added filter support to `WPPopulationConfig.get_data_unit_paths`, and hence to `WPPopulationHandler`, for:
        - School-age datasets: supports `sex` (e.g., "F", "M", "F_M") and `education_level` (e.g., "PRIMARY", "SECONDARY") filters on extracted `.tif` filenames.
        - Non-school-age age_structures: supports `sex`, `ages`, `min_age`, and `max_age` filters on `.tif` filenames.
- WorldPop: Filtered aggregation in `GeometryBasedZonalViewGenerator.map_wp_pop`:
    - `map_wp_pop` now enforces a single country input when `handler.config.project` is "age_structures".
    - When `predicate` is "centroid_within" and the project is "age_structures", individual `TifProcessor` objects (representing age/sex combinations) are loaded, sampled with `map_rasters(stat="sum")`, and their results are summed per zone, preventing unintended merging.
- PoiViewGenerator: Filtered aggregation in `PoiViewGenerator.map_wp_pop`:
    - `map_wp_pop` now enforces a single country input when `handler.config.project` is "age_structures".
    - When `predicate` is "centroid_within" and the project is "age_structures", individual `TifProcessor` objects (representing age/sex combinations) are loaded, sampled with `map_zonal_stats(stat="sum")`, and their results are summed per POI, preventing unintended merging.
- TifProcessor Multi-Raster Merging in Handlers and Generators:
    - Extended `_load_raster_data` in `BaseHandlerReader` to support an optional `merge_rasters` argument. When `True` and multiple raster paths are provided, `TifProcessor` now merges them into a single `TifProcessor` object during loading.
    - Integrated `merge_rasters` argument into `GHSLDataReader` and `WPPopulationReader`'s `load_from_paths` and `load` methods, enabling control over raster merging at the reader level.
    - Propagated `merge_rasters` to `GHSLDataHandler`'s `load_into_dataframe` and `load_into_geodataframe` methods for consistent behavior across the handler interface.
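The chunked conversion idea noted under TifProcessor above (size chunks from a memory target, then walk the raster window by window) can be sketched in plain Python. The helper names `rows_per_chunk` and `chunk_windows` and the row-band layout are assumptions for illustration; they are not the signatures of `_calculate_optimal_chunk_size()` or `_get_chunk_windows()`.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (row_off, col_off, n_rows, n_cols)


def rows_per_chunk(width: int, bytes_per_pixel: int, target_bytes: int) -> int:
    """Number of full-width rows that fit into the target memory budget."""
    return max(1, target_bytes // (width * bytes_per_pixel))


def chunk_windows(height: int, width: int, chunk_rows: int) -> Iterator[Window]:
    """Yield full-width row bands that together cover the whole raster."""
    if chunk_rows < 1:
        raise ValueError("chunk_rows must be at least 1")
    for row_off in range(0, height, chunk_rows):
        yield (row_off, 0, min(chunk_rows, height - row_off), width)


# A 10-row, 4-column raster split into bands of at most 4 rows.
windows = list(chunk_windows(height=10, width=4, chunk_rows=4))
```

Each window can then be read and converted independently, so peak memory is bounded by one chunk rather than by the full raster.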
Changed¶
- TifProcessor
    - Unified DataFrame Conversion:
        - Refactored `to_dataframe()` to act as a universal entry point, dynamically routing to internal, more efficient methods for single and multi-band processing.
        - Deprecated the individual `_to_band_dataframe()`, `_to_rgb_dataframe()`, `_to_rgba_dataframe()`, and `_to_multi_band_dataframe()` methods in favor of the new unified `_to_dataframe()`.
        - `to_dataframe()` now includes a `check_memory` parameter.
    - Optimized `open_dataset` Context Manager:
        - The `open_dataset` context manager now directly opens local files when `LocalDataStore` is used, avoiding unnecessary `rasterio.MemoryFile` creation for improved performance and reduced memory overhead.
    - Enhanced `to_geodataframe` and `to_graph`:
        - Added `check_memory` parameter to `to_geodataframe()` and `to_graph()` for memory pre-checks.
    - Refined `sample_by_polygons_batched`:
        - Included `check_memory` parameter for memory checks before batch processing.
        - Implemented platform-specific warnings for potential multiprocessing issues on Windows/macOS.
    - Improved Multiprocessing Initialization:
        - The `_initializer_worker()` method now prioritizes merged, reprojected, or original local file paths for opening, ensuring workers access the most relevant data.
    - Modular Masking and Coordinate Extraction:
        - Introduced new private helper methods: `_extract_coordinates_with_mask()`, `_build_data_mask()`, `_build_multi_band_mask()`, and `_bands_to_dict()` to centralize and improve data masking and coordinate extraction logic.
    - Streamlined Band-Mode Validation:
        - Moved the logic for validating `mode` and band count compatibility into a dedicated `_validate_mode_band_compatibility()` method for better code organization.
- GigaSchoolLocationFetcher
    - `fetch_locations()` method:
        - Added `process_geospatial` parameter (defaults to `False`) to optionally process geospatial data and return a `gpd.GeoDataFrame`.
    - `_process_geospatial_data()` method:
        - Modified to return a `gpd.GeoDataFrame` by converting the `pd.DataFrame` with a `geometry` column and `EPSG:4326` CRS.
- OSMLocationFetcher Refactoring:
    - Unified Query Execution and Processing: Refactored the core logic for executing Overpass queries and processing their results into a new private method `_execute_and_process_queries()`. This centralizes common steps and reduces code duplication between `fetch_locations()` and the new `fetch_locations_changed_between()`.
    - Enhanced `_build_queries`: Modified `_build_queries` to accept `date_filter_type`, `start_date`, `end_date`, and `include_metadata` to construct more dynamic and feature-rich Overpass QL queries.
    - Updated `fetch_locations` Signature:
        - Replaced `since_year` parameter with `since_date` (which can be a `str` or `datetime` object) for more precise time-based filtering.
        - Added `include_metadata` parameter.
    - Improved Logging of Category Distribution:
        - Modified the logging for category distribution to correctly handle cases where categories are combined into a list (when `handle_duplicates='combine'`).
    - `since_year` Parameter: Removed `since_year` from `fetch_locations()` as its functionality is now superseded by the more flexible `since_date` parameter and the `_build_queries` enhancements.
- `PoiViewGenerator` Mapping Methods (`map_zonal_stats`, `map_nearest_points`, `map_google_buildings`, `map_ms_buildings`, `map_built_s`, `map_smod`):
    - Changed `map_zonal_stats` and `map_nearest_points` to return `pd.DataFrame` results (including `'poi_id'` and new mapped columns) instead of directly updating the internal view.
    - Updated `map_google_buildings`, `map_ms_buildings`, `map_built_s`, and `map_smod` to capture the `pd.DataFrame` returned by their respective underlying mapping calls (`map_nearest_points` or `map_zonal_stats`) and then explicitly call `self._update_view()` with these results.
    - This enhances modularity and allows for more flexible result handling and accumulation.
- `ZonalViewGenerator.map_rasters` Enhancements:
    - Modified `map_rasters` to accept `raster_data` as either a single `TifProcessor` or a `List[TifProcessor]`.
    - Implemented internal merging of `List[TifProcessor]` into a single `TifProcessor` before performing zonal statistics.
    - Replaced `sample_multiple_tifs_by_polygons` with the `TifProcessor.sample_by_polygons` method.
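Since `since_date` accepts either a `str` or a `datetime`, some normalization step is needed before a date can be placed in an Overpass query. The sketch below shows one way to do this with the standard library; the function name `normalize_date` and the choice to treat naive datetimes as UTC are assumptions, and the library's `_normalize_date()` helper may behave differently.

```python
from datetime import datetime, timezone
from typing import Union


def normalize_date(value: Union[str, datetime]) -> str:
    """Return an ISO 8601 UTC timestamp such as 2024-01-15T00:00:00Z."""
    if isinstance(value, str):
        # Accepts ISO-style strings such as "2024-01-15" or "2024-01-15T08:30:00".
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    else:
        value = value.astimezone(timezone.utc)
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")
```

The resulting string has the timestamp shape used by Overpass QL date filters such as `newer:` and `changed:`.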
Fixed¶
- TifProcessor: `to_graph()` Sparse Matrix Creation:
    - Corrected the sparse matrix creation logic in `to_graph()` to ensure proper symmetric graph representation when `graph_type="sparse"`.
- Coordinate System Handling in `_initializer_worker`:
    - Ensured that `_initializer_worker` correctly handles different data storage scenarios to provide the correct dataset handle to worker processes, preventing `RuntimeError` due to uninitialized raster datasets.
Removed¶
- OSMLocationFetcher
    - Redundant Category Distribution Logging: Removed the explicit category distribution logging for `handle_duplicates == "separate"` since the `value_counts()` method on the 'category' column already provides this.
[v0.7.0] - 2025-09-17¶
Added¶
- TifProcessor Revamp
    - Explicit Reprojection Method: Introduced `reproject_to()` method, allowing on-demand reprojection of rasters to a new CRS with customizable `resampling_method` and `resolution`.
    - Reprojection Resolution Control: Added `reprojection_resolution` parameter to `TifProcessor` for precise control over output pixel size during reprojection.
    - Advanced Raster Information: Added `get_raster_info()` method to retrieve a comprehensive dictionary of raster metadata.
    - Graph Conversion Capabilities: Implemented `to_graph()` method to convert raster data into a graph (NetworkX or sparse matrix) based on pixel adjacency (4- or 8-connectivity).
    - Internal Refactoring:
        - `_reproject_to_temp_file`: Introduced `_reproject_to_temp_file` as a helper for reprojection into temporary files.
- H3 Grid Generation
    - H3 Grid Generation Module (`gigaspatial/grid/h3.py`):
        - Introduced `H3Hexagons` class for managing H3 cell IDs.
        - Supports creation from lists of hexagons, geographic bounds, spatial geometries, or points.
        - Provides methods to convert H3 hexagons to pandas DataFrames and GeoPandas GeoDataFrames.
        - Includes functionalities for filtering, getting k-ring neighbors, compacting hexagons, and getting children/parents at different resolutions.
        - Allows saving H3Hexagons to JSON, Parquet, or GeoJSON files.
    - Country-Specific H3 Hexagons (`CountryH3Hexagons`):
        - Extends `H3Hexagons` for generating H3 grids constrained by country boundaries.
        - Integrates with `AdminBoundaries` to fetch country geometries for precise H3 cell generation.
- Documentation
    - Improved `tif.md` example to showcase multi-raster initialization, explicit reprojection, and graph conversion.
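To make the 4- versus 8-connectivity distinction in the graph conversion concrete, the following pure-Python sketch builds an adjacency mapping over the valid pixels of a small mask. It is an illustration under stated assumptions, not the `to_graph()` implementation: `pixel_adjacency` is a hypothetical name, and the real method returns a NetworkX graph or a sparse matrix rather than a dict of sets.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def pixel_adjacency(mask: List[List[bool]], connectivity: int = 4) -> Dict[Pixel, Set[Pixel]]:
    """Map each valid pixel to the set of valid neighbouring pixels."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # edge neighbours
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # corner neighbours

    graph: Dict[Pixel, Set[Pixel]] = {
        (r, c): set()
        for r, row in enumerate(mask)
        for c, valid in enumerate(row)
        if valid
    }
    for (r, c), neighbours in graph.items():
        for dr, dc in offsets:
            candidate = (r + dr, c + dc)
            if candidate in graph:
                neighbours.add(candidate)
    return graph


mask = [[True, True], [False, True]]
g4 = pixel_adjacency(mask, connectivity=4)
g8 = pixel_adjacency(mask, connectivity=8)
```

Because the offset list is symmetric, every edge appears in both directions, which is the property a symmetric sparse adjacency matrix also needs.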
Changed¶
- TifProcessor
    - Improved Temporary File Management: Refactored temporary file handling for merging and reprojection using `tempfile.mkdtemp()` and `shutil.rmtree` for more robust and reliable cleanup. Integrated with context manager (`__enter__`, `__exit__`) and added a dedicated `cleanup()` method.
    - Reprojection during Initialization: Implemented automatic reprojection of single rasters to a specified `target_crs` during `TifProcessor` initialization.
    - Enhanced `open_dataset` Context Manager: The `open_dataset` context manager now opens the most up-to-date (merged or reprojected) version of the dataset.
    - More Flexible Multi-Dataset Validation: Modified `_validate_multiple_datasets` to issue a warning instead of raising an error for CRS mismatches when `target_crs` is not set.
    - Optimized `_get_reprojection_profile`: Dynamically calculates transform and dimensions based on `reprojection_resolution` and adds LZW compression to reprojected TIFF files to reduce file size.
- ADLSDataStore Enhancements
    - New `copy_file` method: Implemented a new method for copying individual files within ADLS, with an option to overwrite existing files.
    - New `rename` method: Added a new method to rename (move) files in ADLS, which internally uses `copy_file` and then deletes the source, with options for overwrite, waiting for copy completion, and polling.
    - Revamped `rmdir` method: Modified `rmdir` to perform batch deletions of blobs, addressing the Azure Blob batch delete limit (256 sub-requests) and improving efficiency for large directories.
- LocalDataStore Enhancements
    - New `copy_file` method: Implemented a new method for copying individual files.
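The batching that the revamped `rmdir` relies on amounts to splitting a list of blob names into groups no larger than the 256 sub-request limit. A minimal sketch, with the hypothetical helper name `batched` and made-up blob names:

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SUBREQUESTS = 256  # Azure Blob batch limit mentioned above


def batched(items: Sequence[str], size: int = MAX_BATCH_SUBREQUESTS) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 600 illustrative blob names are split into three delete batches.
blob_names = [f"dir/file_{i}.tif" for i in range(600)]
batch_sizes = [len(batch) for batch in batched(blob_names)]
```

Each batch would then be handed to a single batch-delete request, so a large directory needs far fewer round trips than deleting blobs one by one.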
Removed¶
- Removed deprecated `tabular` property and `get_zoned_geodataframe` method from `TifProcessor`. Users should now use `to_dataframe()` and `to_geodataframe()` respectively.
Dependencies¶
- Added `networkx` and `h3` as new dependencies.
Fixed¶
- Several small fixes and improvements to aggregation methods.
[v0.6.9] - 2025-07-26¶
Fixed¶
- Resolved a bug in the handler base class where non-hashable types (dicts) were incorrectly used as dictionary keys in `unit_to_path` mapping, preventing potential runtime errors during data availability checks.
[v0.6.8] - 2025-07-26¶
Added¶
- OSMLocationFetcher Enhancements
    - Support for querying OSM locations by arbitrary administrative levels (e.g., states, provinces, cities), in addition to country-level queries.
    - New optional parameters:
        - `admin_level`: Specify OSM administrative level (e.g., 4 for states, 6 for counties).
        - `admin_value`: Name of the administrative area to query (e.g., "California").
    - New static method `get_admin_names(admin_level, country=None)`:
        - Fetch all administrative area names for a given `admin_level`, optionally filtered by country.
        - Helps users discover valid admin area names for constructing precise queries.
-
Multi-Raster Merging Support in TifProcessor
- Added ability to initialize
TifProcessorwith multiple raster datasets. - Merges rasters on load with configurable strategies:
- Supported
merge_methodoptions:first,last,min,max,mean.
- Supported
- Supports on-the-fly reprojection for rasters with differing coordinate reference systems via
target_crs. - Handles resampling using
resampling_method(default:nearest). - Comprehensive validation to ensure compatibility of input rasters (e.g., resolution, nodata, dtype).
- Temporary file management for merged output with automatic cleanup.
- Backward compatible with single-raster use cases.
New TifProcessor Parameters: - merge_method (default: first) – How to combine pixel values across rasters. - target_crs (optional) – CRS to reproject rasters before merging. - resampling_method – Resampling method for reprojection.
New Properties: - is_merged: Indicates whether the current instance represents merged rasters. - source_count: Number of raster datasets merged.
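The merge_method options listed above can be read as per-pixel reducers over the overlapping inputs. The sketch below shows that idea on plain nested lists, with None standing in for nodata. It is an assumption about the semantics, not TifProcessor's code: it ignores georeferencing, reprojection, and resampling, and `merge_grids` and `REDUCERS` are hypothetical names.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

Grid = List[List[Optional[float]]]  # None marks a nodata cell

# One reducer per merge strategy named in the changelog entry.
REDUCERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "first": lambda vals: vals[0],
    "last": lambda vals: vals[-1],
    "min": min,
    "max": max,
    "mean": mean,
}


def merge_grids(grids: Sequence[Grid], merge_method: str = "first") -> Grid:
    """Merge equally shaped grids cell by cell, skipping nodata (None) values."""
    if merge_method not in REDUCERS:
        raise ValueError(f"Unsupported merge_method: {merge_method!r}")
    reduce_cell = REDUCERS[merge_method]
    rows, cols = len(grids[0]), len(grids[0][0])
    merged: Grid = []
    for r in range(rows):
        row: List[Optional[float]] = []
        for c in range(cols):
            valid = [g[r][c] for g in grids if g[r][c] is not None]
            row.append(reduce_cell(valid) if valid else None)
        merged.append(row)
    return merged


a = [[1.0, None], [3.0, 4.0]]
b = [[5.0, 6.0], [None, 8.0]]
merged_first = merge_grids([a, b], "first")  # first valid value wins per cell
```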
Changed¶
- OSMLocationFetcher Overpass Query Logic
  - Refactored Overpass QL query builder to support subnational queries using admin_level and admin_value.
  - Improved flexibility and precision for spatial data collection across different administrative hierarchies.
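As a rough illustration of a subnational query, the sketch below builds an Overpass QL string that selects an administrative area by admin_level and name and then searches inside it. The function `build_admin_query` and the tag used in the example are hypothetical; the query that OSMLocationFetcher actually generates may be structured differently.

```python
def build_admin_query(admin_level: int, admin_value: str, tag_key: str, tag_value: str) -> str:
    """Build an Overpass QL query for features inside a named administrative area."""
    name = admin_value.replace('"', '\\"')  # escape double quotes in the area name
    return (
        "[out:json][timeout:60];\n"
        f'area["boundary"="administrative"]["admin_level"="{admin_level}"]'
        f'["name"="{name}"]->.searchArea;\n'
        f'nwr["{tag_key}"="{tag_value}"](area.searchArea);\n'
        "out center;"
    )


# Example: schools within the level-4 area named "California".
query = build_admin_query(4, "California", "amenity", "school")
```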
Breaking Changes¶
- None. All changes are fully backward compatible.
[v0.6.7] - 2025-07-16¶
Fixed¶
- Fixed a bug in WorldPopHandler/ADLSDataStore integration where a Path object was passed instead of a string, causing a "quote_from_bytes() expected bytes" error during download.
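This error message comes from Python's urllib.parse.quote, which accepts only str or bytes. The sketch below reproduces it with a path object and shows the usual remedy of converting to str first. The blob path is illustrative, and where exactly the library performed the quoting is not stated in the changelog.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

blob_path = PurePosixPath("worldpop/RWA/rwa_ppp_2020.tif")  # illustrative path

try:
    quote(blob_path)  # a path object is neither str nor bytes
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Converting the path to str before URL-encoding avoids the TypeError.
encoded = quote(str(blob_path))
```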
[v0.6.6] - 2025-07-15¶
Added¶
- AdminBoundaries.from_global_country_boundaries(scale="medium")
  - New class method to load global admin level 0 boundaries from Natural Earth.
  - Supports "large" (10m), "medium" (50m), and "small" (110m) scale options.
- WorldPop Handler Refactor (API Integration)
  - Introduced WPPopulationHandler, WPPopulationConfig, WPPopulationDownloader, and WPPopulationReader.
  - Uses new WorldPopRestClient to dynamically query the WorldPop REST API.
  - Replaces static metadata files and hardcoded logic with API-based discovery and download.
  - Country code lookup and dataset filtering now handled at runtime.
  - Improved validation, extensibility, logging, and error handling.
- POI-Based WorldPop Mapping
  - PoiViewGenerator.map_wp_pop() method:
    - Maps WorldPop population data around POIs using flexible spatial predicates: "centroid_within", "intersects", "fractional" (1000m only), "within"
    - Supports configurable radius and resolution (100m or 1000m).
    - Aggregates population data and appends it to the view.
- Geometry-Based Zonal WorldPop Mapping
  - GeometryBasedZonalViewGenerator.map_wp_pop() method:
    - Maps WorldPop population data to polygons/zones using the "intersects" or "fractional" predicate
    - Returns zonal population sums as a new view column.
    - Handles predicate-dependent data loading (raster vs. GeoDataFrame).
Changed¶
- Refactored BaseHandler.ensure_data_available
  - More efficient data check and download logic.
  - Downloads only missing units unless force_download=True.
  - Cleaner structure and better reuse of get_relevant_data_units().
- Refactored WorldPop Module
  - Complete handler redesign using API-based architecture.
  - Dataset paths and URLs are now dynamically constructed from API metadata.
  - Resolution/year validation is more robust and descriptive.
  - Removed static constants, gender/school_age toggles, and local CSV dependency.
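The "download only missing units unless force_download=True" behavior can be sketched as a simple filter over the relevant units. This is an assumed reading of the entry, not BaseHandler's code: `units_to_download` and the `exists` callable are hypothetical stand-ins for the handler's existence check.

```python
from typing import Callable, Iterable, List


def units_to_download(
    units: Iterable[str],
    exists: Callable[[str], bool],
    force_download: bool = False,
) -> List[str]:
    """Return the units that still need to be downloaded."""
    all_units = list(units)
    if force_download:
        return all_units  # re-download everything
    return [u for u in all_units if not exists(u)]  # skip units already stored


already_stored = {"tile_a.tif"}
needed = ["tile_a.tif", "tile_b.tif", "tile_c.tif"]
missing = units_to_download(needed, already_stored.__contains__)
```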
Fixed¶
- Several small fixes and improvements to zonal aggregation methods, especially around CRS consistency, missing values, and result alignment.
[v0.6.5] - 2025-07-01¶
Added¶
- MercatorTiles.get_quadkeys_from_points()
  New static method for efficient 1:1 point-to-quadkey mapping using coordinate-based logic, improving performance over spatial joins.
- AdminBoundariesViewGenerator
  New generator class for producing zonal views based on administrative boundaries (e.g., districts, provinces) with flexible source and admin level support.
- Zonal View Generator Enhancements
  - _view: Internal attribute for accumulating mapped statistics.
  - view: Exposes current state of zonal view.
  - add_variable_to_view(): Adds mapped data from map_points, map_polygons, or map_rasters with robust validation and zone alignment.
  - to_dataframe() and to_geodataframe() methods added for exporting current view in tabular or spatial formats.
- PoiViewGenerator Enhancements
  - Consistent _view DataFrame for storing mapped results.
  - _update_view(): Central method to update POI data.
  - save_view(): Improved format handling (CSV, Parquet, GeoJSON, etc.) with geometry recovery.
  - to_dataframe() and to_geodataframe() methods added for convenient export of enriched POI view.
  - Robust duplicate ID detection and CRS validation in map_zonal_stats.
- TifProcessor Enhancements
  - sample_by_polygons_batched(): Parallel polygon sampling.
  - Enhanced sample_by_polygons() with nodata masking and multiple stats.
  - warn_on_error: Flag to suppress sampling warnings.
- GeoTIFF Multi-Band Support
  - multi mode added for multi-band raster support.
  - Auto-detects band names via metadata.
  - Strict validation of band count based on mode (single, rgb, rgba, multi).
- Spatial Distance Graph Algorithm
  - build_distance_graph() added for fast KD-tree-based spatial matching.
  - Supports both DataFrame and GeoDataFrame inputs.
  - Outputs a networkx.Graph with optional DataFrame of matches.
  - Handles projections, self-match exclusion, and includes verbose stats/logs.
- Database Integration (Experimental)
  - Added DBConnection class in core/io/database.py for unified Trino and PostgreSQL access.
  - Supports schema/table introspection, query execution, and reading into pandas or dask.
  - Handles connection creation, credential management, and diagnostics.
  - Utility methods for schema/view/table/column listings and parameterized queries.
- GHSL Population Mapping
  - map_ghsl_pop() method added to GeometryBasedZonalViewGenerator.
  - Aggregates GHSL population rasters to user-defined zones.
  - Supports intersects and fractional predicates (latter for 1000m resolution only).
  - Returns population statistics (e.g., sum) with customizable column prefix.
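The coordinate-based point-to-quadkey mapping mentioned for MercatorTiles.get_quadkeys_from_points() can be illustrated with the standard Web Mercator tile math: compute the tile column and row directly from longitude and latitude, then interleave their bits into a quadkey. The sketch below follows that well-known scheme. `point_to_quadkey` is a hypothetical helper, and the library's vectorized implementation is not shown in the changelog.

```python
import math


def point_to_quadkey(lat: float, lon: float, zoom: int) -> str:
    """Return the quadkey of the Web Mercator tile containing (lat, lon)."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # clamp to Web Mercator limits
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    x = min(max(x, 0), n - 1)  # keep edge coordinates inside the grid
    y = min(max(y, 0), n - 1)
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        # Each quadkey digit combines one bit of x (value 1) and one bit of y (value 2).
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


quadkey = point_to_quadkey(-50.0, -20.0, 3)  # falls in tile x=3, y=5 at zoom 3
```

Because each point needs only arithmetic, no spatial join against tile geometries is required, which is the performance benefit the entry describes.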
Changed¶
- MercatorTiles.from_points() now internally uses get_quadkeys_from_points() for better performance.
- map_points() and map_rasters() now return Dict[zone_id, value] to support direct usage with add_variable_to_view().
- Refactored aggregate_polygons_to_zones()
  - area_weighted deprecated in favor of predicate.
  - Supports flexible predicates like "within", "fractional" for spatial aggregation.
  - map_polygons() updated to reflect this change.
- Optional Admin Boundaries Configuration
  - ADMIN_BOUNDARIES_DATA_DIR is now optional.
  - AdminBoundaries.create() only attempts to load if explicitly configured or path is provided.
  - Improved documentation and fallback behavior for missing configs.
Fixed¶
- GHSL Downloader
  - ZIP files are now downloaded into a temporary cache directory using requests.get().
  - Avoids unnecessary writes and ensures cleanup.
- TifProcessor
  - Removed polygon sampling warnings unless explicitly enabled.
Deprecated¶
- TifProcessor.tabular → use to_dataframe() instead.
- TifProcessor.get_zoned_geodataframe() → use to_geodataframe() instead.
- area_weighted → use predicate in aggregation methods instead.
[v0.6.4] - 2025-06-19¶
Added¶
- GigaSchoolProfileFetcher
  - New class to fetch and process school profile data from the Giga School Profile API
  - Supports paginated fetching, filtering by country and school ID
  - Includes methods to generate connectivity summary statistics by region, connection type, and source
- GigaSchoolMeasurementsFetcher
  - New class to fetch and process daily real-time connectivity measurements from the Giga API
  - Supports filtering by date range and school
  - Includes performance summary generation (download/upload speeds, latency, quality flags)
- AdminBoundaries.from_geoboundaries
  - New class method to download and process geoBoundaries data by country and admin level
  - Automatically handles HDX dataset discovery, downloading, and fallback logic
- HDXConfig.search_datasets
  - Static method to search HDX datasets without full handler initialization
  - Supports query string, sort order, result count, HDX site selection, and custom user agent
Fixed¶
- Fixed a typo in MaxarImageDownloader that caused a runtime error
Documentation¶
- Improved Configuration Guide (docs/user-guide/configuration.md)
  - Added comprehensive table of environment variables with defaults and descriptions
  - Synced .env_sample and config.py with docs
  - Example .env file and guidance on path overrides using config.set_path
  - New section on config.ensure_directories_exist and troubleshooting tips
  - Clearer handling of credentials and security notes
  - Improved formatting and structure for clarity
[v0.6.3] - 2025-06-16¶
Added¶
- Major refactor of HDX module to align with unified BaseHandler architecture:
  - HDXConfig: fully aligned with BaseHandlerConfig structure.
    - Added flexible pattern matching for resource filtering.
    - Improved data unit resolution by country, geometry, and points.
    - Enhanced resource filtering with exact and regex options.
  - HDXDownloader fully aligned with BaseHandlerDownloader:
    - Simplified sequential download logic.
    - Improved error handling, validation, and logging.
  - HDXReader fully aligned with BaseHandlerReader:
    - Added resolve_source_paths and load_all_resources methods.
    - Simplified source handling for single and multiple files.
    - Cleaned up redundant and dataset-specific logic.
  - Introduced HDXHandler as unified orchestration layer using factory methods.
- Refactor of RelativeWealthIndex (RWI) module:
  - Added new RWIHandler class aligned with HDXHandler and BaseHandler.
  - Simplified class names: RWIDownloader and RWIReader.
  - Enhanced configuration with latest_only flag to select newest resources automatically.
  - Simplified resource filtering and country resolution logic.
  - Improved code maintainability, type hints, and error handling.
- New raster multi-band support in TifProcessor:
  - Added new multi mode for handling multi-band raster datasets.
  - Automatic band name detection from raster metadata.
  - Added strict mode validation (single, rgb, rgba, multi).
  - Enhanced error handling for invalid modes and band counts.
Fixed¶
- Fixed GHSL tiles loading behavior for correct coordinate system handling:
  - Moved TILES_URL formatting and tile loading to validate_configuration.
  - Ensures proper tile loading after CRS validation.
Documentation¶
- Updated and standardized API references across documentation.
- Standardized handler method names and usage examples.
- Added building enrichment examples for POI processing.
- Updated installation instructions.
Deprecated¶
- Deprecated direct imports from individual handler modules.
[v0.6.2] - 2025-06-11¶
Added¶
- New ROOT_DATA_DIR configuration option to set a base directory for all data tiers
  - Can be configured via environment variable ROOT_DATA_DIR or .env file
  - Defaults to current directory (.) if not specified
  - All tier data paths (bronze, silver, gold, views) are now constructed relative to this root directory
  - Example: Setting ROOT_DATA_DIR=/data/gigaspatial will store all data under /data/gigaspatial/bronze, /data/gigaspatial/silver, etc.
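The path construction described above can be sketched as joining each tier name onto the configured root. This is an illustration only: `tier_paths` is a hypothetical helper, the subdirectory names for gold and views are assumed to match the tier names, and the real logic lives in the package's config module.

```python
import os
from pathlib import PurePosixPath
from typing import Dict, Mapping

TIERS = ("bronze", "silver", "gold", "views")  # tier names from the entry above


def tier_paths(env: Mapping[str, str] = os.environ) -> Dict[str, PurePosixPath]:
    """Build one path per data tier under ROOT_DATA_DIR (default: current directory)."""
    root = PurePosixPath(env.get("ROOT_DATA_DIR", "."))
    return {tier: root / tier for tier in TIERS}


# Passing a plain dict instead of os.environ keeps the example deterministic.
paths = tier_paths({"ROOT_DATA_DIR": "/data/gigaspatial"})
```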
Fixed¶
- Fixed URL formatting in GHSL tiles by using Enum value instead of Enum member
  - Ensures consistent URL formatting with numeric values (4326) instead of Enum names (WGS84)
  - Fixes URL formatting issue across different Python environments
- Refactored GHSL downloader to follow DataStore abstraction
  - Directory creation is now handled by DataStore implementation
  - Removed redundant directory creation logic from download_data_unit method
  - Improves separation of concerns and makes the code more maintainable
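The Enum fix can be illustrated with a small example: formatting an Enum member into a string inserts the member's name-based representation, while using .value inserts the numeric code. The enum `CoordSystem` and the URL template below are hypothetical placeholders, not the GHSL handler's definitions. How enums with mixed-in types are formatted may also differ between Python versions, which is one reason to use .value explicitly.

```python
from enum import Enum


class CoordSystem(Enum):  # hypothetical enum for illustration
    WGS84 = 4326
    MOLLWEIDE = 54009


URL_TEMPLATE = "https://example.org/tiles_{crs}.zip"  # placeholder URL

# Formatting the member itself embeds its name-based string form.
by_member = URL_TEMPLATE.format(crs=CoordSystem.WGS84)

# Formatting the value embeds the numeric EPSG code, as intended.
by_value = URL_TEMPLATE.format(crs=CoordSystem.WGS84.value)
```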
[v0.6.1] - 2025-06-09¶
Fixed¶
- Gracefully handle missing or invalid GeoRepo API key in AdminBoundaries.create():
  - Wrapped GeoRepoClient initialization in a try-except block
  - Added fallback to GADM if GeoRepo client fails
  - Improved logging for better debugging and transparency
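The fallback pattern described above can be sketched generically: try to build the primary client, and if that fails, log the reason and use the secondary source. All names below (`load_boundaries`, the factory and loader callables) are hypothetical; the actual AdminBoundaries.create() signature and its GeoRepo and GADM calls are not reproduced here.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("boundaries_example")

Loader = Callable[[str], List[str]]


def load_boundaries(make_primary_client: Callable[[], Loader], fallback_loader: Loader, country: str) -> List[str]:
    """Load from the primary source, falling back if the client cannot be used."""
    try:
        client = make_primary_client()  # may raise, e.g. on a missing API key
        return client(country)
    except Exception as exc:
        # Log the cause so the switch to the fallback source is visible.
        logger.warning("Primary source unavailable (%s); using fallback.", exc)
        return fallback_loader(country)


def failing_factory() -> Loader:
    raise ValueError("missing API key")


result = load_boundaries(failing_factory, lambda c: [f"gadm:{c}"], "KEN")
```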
[v0.6.0] - 2025-06-09¶
Added¶
POI View Generator¶
- map_zonal_stats: New method for enriched spatial mapping with support for:
  - Raster point sampling (value at POI location)
  - Raster zonal statistics (with buffer zone)
  - Polygon aggregation (with optional area-weighted averaging)
- Auto-generated POI IDs in _init_points_gdf for consistent point tracking.
- Support for area-weighted aggregation for polygon-based statistics.
BaseHandler Orchestration Layer¶
- New abstract BaseHandler class providing unified lifecycle orchestration for config, downloader, and reader.
- High-level interface methods:
  - ensure_data_available()
  - load_data()
  - download_and_load()
  - get_available_data_info()
- Integrated factory pattern for safe and standardized component creation.
- Built-in context manager support for resource cleanup.
- Fully backwards compatible with existing handler architecture.
Handlers Updated to Use BaseHandler¶
- GoogleOpenBuildingsHandler
- MicrosoftBuildingsHandler
- GHSLDataHandler
- All now inherit from BaseHandler, supporting standardized behavior and cleaner APIs.
Changed¶
POI View Generator¶
- map_built_s and map_smod now internally use the new map_zonal_stats method.
- tif_processors renamed to data to support both raster and polygon inputs.
- Removed parameters:
  - id_column (now handled internally)
  - area_column (now automatically calculated)
Internals and Usability¶
- Improved error handling with clearer validation messages.
- Enhanced logging for better visibility during enrichment.
- More consistent use of coordinate column naming.
- Refined type hints and parameter documentation across key methods.
Notes¶
- Removed legacy POI generator classes and redundant poi.py file.
- Simplified imports and removed unused handler dependencies.
- All POI generator methods now include updated docstrings, parameter explanations, and usage examples.
- Added docs on the new BaseHandler interface and handler refactors.
[v0.5.0] - 2025-06-02¶
Changed¶
- Refactored data loading architecture:
  - Introduced dedicated reader classes for major datasets (Microsoft Global Buildings, Google Open Buildings, GHSL), each inheriting from a new BaseHandlerReader.
  - Centralized file existence checks and raster/tabular loading methods in BaseHandlerReader.
  - Improved maintainability by encapsulating dataset-specific logic inside each reader class.
- Modularized source resolution:
  - Each reader now supports resolving data by country, geometry, or individual points, improving code reuse and flexibility.
- Unified POI enrichment:
  - Merged all POI generators (Google Open Buildings, Microsoft Global Buildings, GHSL Built Surface, GHSL SMOD) into a single PoiViewGenerator class.
  - Supports flexible inputs: list of (lat, lon) tuples, list of dicts, DataFrame, or GeoDataFrame.
  - Maintains consistent internal state via points_gdf, updated after each mapping.
  - Enables chained enrichment of POI data using multiple datasets.
- Modernized internal data access:
  - All data loading now uses dedicated handler/reader classes, improving consistency and long-term maintainability.
Fixed¶
- Full DataStore integration:
  - Fixed OpenCellID and HDX handlers to fully support the DataStore abstraction.
  - All file reads, writes, and checks now use the configured DataStore (local or cloud).
  - Temporary files are only used during downloads; final data is always stored and accessed via the DataStore interface.
Removed¶
- Removed deprecated POI generator classes and the now-obsolete poi submodule. All enrichment is handled through the unified PoiViewGenerator.
Notes¶
- This release finalizes the architectural refactors started in v0.5.0.
- While marked stable, please report any issues or regressions from the new modular structure.
[v0.5.0b1] - 2025-05-27¶
Added¶
- New Handlers:
  - hdx.py: Handler for downloading and managing Humanitarian Data Exchange datasets.
  - rwi.py: Handler for the Relative Wealth Index dataset.
  - opencellid.py: Handler for OpenCellID tower locations.
  - unicef_georepo.py: Integration with UNICEF’s GeoRepo asset repository.
- Zonal Generators:
  - Introduced the generators/zonal/ module to support spatial aggregations of various data types (points, polygons, rasters) to zonal geometries such as grid tiles or catchment areas.
- New Geo-Processing Methods:
  - Added methods to compute centroids of (Multi)Polygon geometries.
  - Added methods to calculate area of (Multi)Polygon geometries in square meters.
Changed¶
- Refactored:
  - config.py: Added support for new environment variables (OpenCellID and UNICEF GeoRepo keys).
  - geo.py: Enhanced spatial join functions for improved performance and clarity.
  - handlers/:
    - Minor robustness improvements in google_open_buildings and microsoft_global_buildings.
    - Added a new class method in boundaries for initializing admin boundaries from UNICEF GeoRepo.
  - core/io/:
    - Added list_directories method to both ADLS and local storage backends.
- Documentation & Project Structure:
  - Updated .env_sample and .gitignore to align with new environment variables and data handling practices.
Dependencies¶
- Updated requirements.txt and setup.py to reflect new dependencies and ensure compatibility.
Notes¶
- This is a pre-release (v0.5.0b1) and is intended for testing and feedback.
- Some new modules, especially in handlers and generators, are experimental and may be refined in upcoming releases.
[v0.4.1] - 2025-04-17¶
Added¶
- Documentation:
  - Added API Reference documentation for all modules, classes, and functions.
  - Added a Configuration Guide to explain how to set up paths, API keys, and other settings.
- TifProcessor: added new to_dataframe method.
- config: added set_path method for dynamic path management.
Changed¶
- Documentation:
  - Restructured the docs/ directory to improve organization and navigation.
  - Updated the index.md for the User Guide to provide a clear overview of available documentation.
  - Updated Examples for downloading, processing, and storing geospatial data - more to come.
- README:
  - Updated the README with a clear description of the package’s purpose and key features.
  - Added a section on View Generators to explain spatial context enrichment and mapping to grid or POI locations.
  - Included a Supported Datasets section with an image of dataset provider logos.
Fixed¶
- Handled errors when processing nodes, relations, and ways in OSMLocationFetcher.
- Made admin1 and admin1_id_giga optional in GigaEntity instances for countries with no admin level 1 divisions.
[v0.4.0] - 2025-04-01¶
Added¶
- POI View Generators: Introduced a new module, generators, containing a base class for POI view generation.
- Expanded POI Support: Added new classes for generating POI views from:
  - Google Open Buildings
  - Microsoft Global Buildings
  - GHSL Settlement Model
  - GHSL Built Surface
- New Reader: Added read_gzipped_json_or_csv to handle compressed JSON/CSV files.
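A reader for compressed JSON/CSV files can be pictured as: decompress, try JSON, and fall back to CSV when JSON parsing fails. The sketch below shows that approach with the standard library only. `parse_gzipped_json_or_csv` is a hypothetical function operating on raw bytes; the signature and return types of the package's read_gzipped_json_or_csv are not documented in this changelog and may differ.

```python
import csv
import gzip
import io
import json
from typing import Any


def parse_gzipped_json_or_csv(raw: bytes) -> Any:
    """Decompress `raw`, then parse it as JSON, falling back to CSV rows."""
    text = gzip.decompress(raw).decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON: treat the content as CSV with a header row.
        return list(csv.DictReader(io.StringIO(text)))


json_blob = gzip.compress(b'{"name": "school", "count": 3}')
csv_blob = gzip.compress(b"id,lat,lon\n1,0.5,36.8\n")

parsed_json = parse_gzipped_json_or_csv(json_blob)
parsed_csv = parse_gzipped_json_or_csv(csv_blob)
```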
Changed¶
- ADLSDataStore Enhancements: Updated methods to match LocalDataStore for improved consistency.
- Geo Processing Updates:
  - Improved convert_to_dataframe for more efficient data conversion.
  - Enhanced annotate_with_admin_regions to improve spatial joins.
- New TifProcessor Methods:
  - sample_by_polygons for polygon-based raster sampling.
  - sample_multiple_tifs_by_coordinates & sample_multiple_tifs_by_polygons to manage multi-raster sampling.
- Fixed Global Config Handling: Resolved issues with handling configurations inside classes.
[v0.3.2] - 2025-03-21¶
Added¶
- Added a method to efficiently assign unique IDs to features.
Changed¶
- Enhanced logging for better debugging and clarity.
Fixed¶
- Minor bug fix in config.py
[v0.3.1] - 2025-03-20¶
Added¶
- Enhanced AdminBoundaries handler with improved error handling for cases where administrative level data is unavailable for a country.
- Added pyproject.toml and setup.py, enabling pip install support for the package.
- Introduced a new method annotate_with_admin_regions in geo.py to perform spatial joins between input points and administrative boundaries (levels 1 and 2), handling conflicts where points intersect multiple admin regions.
Removed¶
- Removed the utils module containing logger.py and integrated LOG_FORMAT and get_logger into config.py for a more streamlined logging approach.
[v0.3.0] - 2025-03-18¶
Added¶
- Compression support in readers for improved efficiency
- New GHSL data handler to manage GHSL dataset downloads
Fixed¶
- Small fixes/improvements in Microsoft Buildings, Maxar, and Overture handlers
[v0.2.2] - 2025-03-12¶
- Refactored Handlers: Improved structure and performance of maxar_image.py, osm.py and overture.py to enhance geospatial data handling.
- Documentation Improvements:
  - Updated index.md, advanced.md, and use-cases.md for better clarity.
  - Added installation.md under docs/getting-started for setup guidance.
  - Refined API documentation in docs/api/index.md.
- Configuration & Setup Enhancements:
  - Improved .gitignore to exclude unnecessary files.
  - Updated mkdocs.yml for better documentation structuring.
- Bug Fixes & Minor Optimizations: Small fixes and improvements across the codebase for stability and maintainability.
[v0.2.1] - 2025-02-28¶
Added¶
- Introduced WorldPopDownloader feature to handlers
- Refactored TifProcessor class for better performance
Fixed¶
- Minor bug fixes and performance improvements
[v0.2.0] - MaxarImageDownloader & Bug Fixes - 2025-02-24¶
- New Handler: MaxarImageDownloader for downloading Maxar images.
- Bug Fixes: Various improvements and bug fixes.
- Enhancements: Minor optimizations in handlers.
[v0.1.1] - 2025-02-24¶
Added¶
- Local Data Store: Introduced a new local data store alongside ADLS to improve data storage and read/write functionality.
- Boundaries Handler: Added boundaries.py, a new handler for reading administrative boundaries from GADM.
Changed¶
- Handler Refactoring: Refactored existing handlers to improve modularity and data handling.
- Configuration Management: Added config.py to manage paths, runtime settings, and environment variables.
Removed¶
- Administrative Schema: Removed administrative.py since its functionality is now handled by the boundaries handler.
- Globals Module: Removed globals.py and replaced it with config.py for better configuration management.
Updated Files¶
- config.py
- boundaries.py
- google_open_buildings.py
- mapbox_image.py
- microsoft_global_buildings.py
- ookla_speedtest.py
- mercator_tiles.py
- adls_data_store.py
- data_store.py
- local_data_store.py
- readers.py
- writers.py
- entity.py
[v0.1.0] - 2025-02-07¶
Added¶
- New data handlers: google_open_buildings.py, microsoft_global_buildings.py, overture.py, mapbox_image.py, osm.py
- Processing functions in tif_processor.py, geo.py and transform.py
- Grid generation modules: h3_tiles.py, mercator_tiles.py
- View managers: grid_view.py and national_view.py
- Schemas: administrative.py
Changed¶
- Updated requirements.txt with new dependencies
- Improved logging and data storage mechanisms
Removed¶
- Deprecated views: h3_view.py, mercator_view.py