Changelog
All notable changes to this project will be documented in this file.
[v0.7.4] - 2025-11-24
Added
- TifProcessor: Raster Export Methods
  - save_to_file() method: Comprehensive raster export functionality with flexible compression and optimization options.
    - Supports multiple compression algorithms: LZW (default), DEFLATE, ZSTD, JPEG, WEBP, and NONE.
    - Configurable compression parameters: ZLEVEL for DEFLATE (default: 6), ZSTD_LEVEL for ZSTD (default: 9), JPEG_QUALITY (default: 85), WEBP_LEVEL (default: 75).
    - Predictor support for improved compression: predictor=2 for integer data (horizontal differencing), predictor=3 for floating-point data.
    - Tiled output enabled by default (512×512 blocksize) for optimal random access performance.
    - Cloud-Optimized GeoTIFF (COG) support via cog=True parameter with automatic overview generation.
    - Customizable overview levels and resampling methods for COG creation.
    - BigTIFF support for files >4GB via bigtiff parameter.
    - Multi-threading support for compatible compression algorithms via num_threads parameter.
    - Integrates with self.open_dataset() context manager, automatically handling merged, reprojected, and clipped rasters.
    - Writes through self.data_store abstraction layer, supporting both local and remote storage (e.g., ADLS).
    - Preserves all bands from source rasters without skipping.
  - save_array_to_file() method: Export processed numpy arrays while preserving georeferencing metadata.
    - Accepts 2D or 3D numpy arrays (with automatic dimension handling).
    - Inherits CRS, transform, and nodata values from source raster or accepts custom values.
    - Supports same compression options as save_to_file().
    - Enables saving modified/processed raster data while maintaining spatial reference.
    - Writes through self.data_store for consistent storage abstraction.
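To illustrate why predictor=2 (horizontal differencing) tends to help on integer rasters, here is a minimal standard-library sketch. It is not the library's code path: the helper names are hypothetical, and a real TIFF encoder applies the predictor to the encoded samples of each row with modular arithmetic. The sketch only shows that a smooth row becomes far more compressible once each sample is replaced by its difference from the previous one.

```python
import struct
import zlib


def horizontal_difference(row):
    """Keep the first sample, then store each sample minus its left neighbour."""
    return [row[0]] + [b - a for a, b in zip(row, row[1:])]


def undo_horizontal_difference(diffs):
    """Invert the differencing with a running sum."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out


def deflate_size(samples):
    """Bytes needed after packing as little-endian int32 and applying DEFLATE."""
    raw = struct.pack(f"<{len(samples)}i", *samples)
    return len(zlib.compress(raw, 6))


# Synthetic, smoothly rising row (an assumption standing in for real pixel data).
row = [1000 + 3 * i for i in range(4096)]
diffs = horizontal_difference(row)

# The transform is lossless, and the differenced row compresses much better.
assert undo_horizontal_difference(diffs) == row
print("raw:", deflate_size(row), "bytes; differenced:", deflate_size(diffs), "bytes")
```

Noisy data benefits far less, which is one reason the predictor is a configurable option rather than a fixed behaviour.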
- TifProcessor: Value-Based Filtering for DataFrame and GeoDataFrame Conversion
  - min_value and max_value parameters: Added optional filtering thresholds to to_dataframe() and to_geodataframe() methods.
    - min_value: Filters out pixels with values ≤ threshold (exclusive bound: only values strictly above the threshold are kept).
    - max_value: Filters out pixels with values ≥ threshold (exclusive bound: only values strictly below the threshold are kept).
    - Filtering occurs before geometry creation in to_geodataframe(), significantly improving performance for sparse datasets.
    - Supports both single-band and multi-band rasters with consistent behavior.
  - Enhanced _build_data_mask() method: Extended to incorporate value threshold filtering alongside nodata filtering.
    - Combines multiple mask conditions using logical AND for efficient filtering.
    - Maintains backward compatibility when no thresholds are specified.
  - Enhanced _build_multi_band_mask() method: Extended for multi-band value filtering.
    - Drops pixels where ANY band has nodata or fails value thresholds.
    - Ensures consistent filtering behavior across RGB, RGBA, and multi-band modes.
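The filtering rules above can be sketched in plain Python. This is an illustration of the described semantics only; the function names below are hypothetical stand-ins and not the private TifProcessor methods, which presumably operate on arrays rather than lists.

```python
def build_data_mask(values, nodata=None, min_value=None, max_value=None):
    """Return one boolean per pixel; True means the pixel is kept."""
    mask = []
    for v in values:
        keep = True
        if nodata is not None:
            keep = keep and v != nodata
        if min_value is not None:
            keep = keep and v > min_value  # values <= min_value are dropped
        if max_value is not None:
            keep = keep and v < max_value  # values >= max_value are dropped
        mask.append(keep)
    return mask


def build_multi_band_mask(bands, nodata=None, min_value=None, max_value=None):
    """Keep a pixel only if every band passes (drop it if ANY band fails)."""
    per_band = [build_data_mask(b, nodata, min_value, max_value) for b in bands]
    return [all(flags) for flags in zip(*per_band)]


band = [-9999, 0, 5, 10, 20]
print(build_data_mask(band, nodata=-9999, min_value=0, max_value=20))
# [False, False, True, True, False]
```

With no thresholds and no nodata value, every pixel is kept, which matches the backward-compatibility note.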
- TifProcessor: Raster Statistics in get_raster_info()
  - Added include_statistics and approx_ok flags to optionally return pixel statistics alongside metadata.
  - New _get_basic_statistics() helper streams through raster blocks to compute per-band and overall min, max, mean, std, sum, and count with nodata-aware masking.
  - Results are cached for reuse within the processor lifecycle to avoid repeated scans.
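Block-wise statistics of this kind can be accumulated without holding the whole raster in memory. The sketch below is a hedged illustration, not the actual helper: the class name is hypothetical, blocks are plain lists, and the standard deviation is the population form (whether the library uses population or sample standard deviation is not stated here).

```python
import math


class RunningStats:
    """Accumulate min, max, sum, count, mean, and std one block at a time."""

    def __init__(self, nodata=None):
        self.nodata = nodata
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.min = math.inf
        self.max = -math.inf

    def update(self, block):
        for v in block:
            if self.nodata is not None and v == self.nodata:
                continue  # nodata-aware masking
            self.count += 1
            self.total += v
            self.total_sq += v * v
            self.min = min(self.min, v)
            self.max = max(self.max, v)

    def result(self):
        if self.count == 0:
            return {"count": 0}
        mean = self.total / self.count
        # Clamp at zero to absorb tiny negative values from rounding.
        variance = max(self.total_sq / self.count - mean * mean, 0.0)
        return {
            "min": self.min,
            "max": self.max,
            "mean": mean,
            "std": math.sqrt(variance),
            "sum": self.total,
            "count": self.count,
        }


stats = RunningStats(nodata=-9999)
for block in ([1, 2, -9999], [3, 4]):
    stats.update(block)
summary = stats.result()
print(summary["count"], summary["sum"], summary["mean"])  # 4 10.0 2.5
```

The sum-of-squares formula is simple but can lose precision on very large rasters; a production implementation might prefer a numerically stabler update such as Welford's method.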
- BaseHandler: Tabular Load Progress
  - _load_tabular_data() now supports a tqdm progress bar, showing file-level load progress for large tabular batches.
  - Added show_progress and progress_desc parameters so handlers can toggle or customize the indicator while keeping existing callers backward compatible.
- Improved developer usability by enabling easier access to primary components without deep module references:
  - Exposed core handlers, view generators, and processing modules at the top-level gigaspatial package namespace for improved user experience and simplified imports.
  - Added convenient aliases for gigaspatial.core.io as io and gigaspatial.processing.algorithms as algorithms directly accessible from gigaspatial.
  - Declared explicit public API in __init__.py to clarify stable, supported components.
Changed
- GHSLDataConfig: Improved SSL Certificate Handling for Tile Downloads
  - Replaced ssl._create_unverified_context approach with a robust two-tier fallback strategy for downloading the GHSL tiles shapefile.
  - Primary method: Attempts download via gpd.read_file() with unverified SSL context (fast, direct access).
  - Fallback method: Uses requests.get() with verify=False for environments where gpd.read_file() fails (e.g., cloud compute instances with Anaconda certificate bundles).
  - Downloads tiles to temporary local file before reading when fallback is triggered, ensuring compatibility across different Python environments.
  - Tile caching: Implemented GeoJSON-based caching in base_path/cache/ directory to minimize redundant downloads.
    - Cache checked before any download attempts.
    - Invalid cache automatically triggers re-download.
    - Uses write_dataset() for consistent storage abstraction across local and remote data stores.
  - Enhanced error handling:
    - Logs specific exception types (type(e).__name__) for better debugging.
    - Graceful fallback with informative warning messages.
    - Preserves exception chain for traceback analysis.
  - Improved compatibility with Azure ML compute instances where multiple certificate stores (system, Anaconda, certifi) coexist.
  - Temporary file cleanup guaranteed via finally block, preventing orphaned downloads.
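The control flow described for the tile download (cache first, then a primary reader, then a download-to-temp fallback with guaranteed cleanup) might look roughly like the following. This is a sketch under stated assumptions: the function and its four callables are hypothetical placeholders for the real cache lookup, gpd.read_file() call, requests download, and local read, and writing the result back to the cache is omitted.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def load_tiles(read_cache, read_direct, download_to, read_local):
    """Try the cache, then the primary reader, then a temp-file fallback."""
    cached = read_cache()
    if cached is not None:
        return cached
    try:
        return read_direct()
    except Exception as primary_error:
        logger.warning(
            "Primary read failed (%s); trying fallback download",
            type(primary_error).__name__,
        )
        fd, tmp_path = tempfile.mkstemp(suffix=".zip")
        os.close(fd)
        try:
            download_to(tmp_path)
            return read_local(tmp_path)
        except Exception as fallback_error:
            # "from" keeps the exception chain available in the traceback.
            raise RuntimeError("Both download methods failed") from fallback_error
        finally:
            # Runs on success and on failure, so no temp file is orphaned.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
```

Passing the steps in as callables keeps the sketch independent of geopandas and requests; in the real configuration class these would be concrete method calls.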
Fixed
- GHSLDataConfig: SSL certificate verification failures in cloud environments
  - Resolved CERTIFICATE_VERIFY_FAILED errors when downloading GHSL tiles shapefile on cloud compute instances.
Performance
- Reduced network overhead for GHSL tile metadata:
  - Tiles shapefile downloaded only once per coordinate system (WGS84/Mollweide) and cached locally.
  - Subsequent GHSLDataConfig instantiations load from cache, eliminating repeated ~MB shapefile downloads.
  - Benefits scale with number of GHSL queries across application lifecycle.
Documentation
- Improved README with clearer key workflows, core concepts, and updated overview text.
[v0.7.3] - 2025-11-11
Added
- SnowflakeDataStore Support
  - New SnowflakeDataStore class implementing the DataStore interface for Snowflake stages.
  - Supports file operations (read, write, list, delete) on Snowflake internal stages.
  - Integrated with gigaspatial/config.py for centralized configuration via environment variables.
  - Provides directory-like operations (mkdir, rmdir, walk, is_dir, is_file) for conceptual directories in Snowflake stages.
  - Includes context manager support and connection management.
  - Full compatibility with existing DataStore abstraction.
- BaseHandler: Config-Level Data Unit Caching
  - BaseHandlerConfig now maintains internal _unit_cache for data unit and geometry caching.
  - Cache stores tuples of (units, search_geometry) for efficient reuse across handler, downloader, and reader operations.
  - New methods:
    - get_cached_search_geometry(): Retrieve cached geometry for a source.
    - clear_unit_cache(): Clear cached data for testing or manual refreshes.
    - _cache_key(): Generate canonical cache keys from various source types.
  - Benefits all components (handler, downloader, reader) regardless of entry point.
- Unified Geometry Extraction in BaseHandlerConfig
  - New extract_search_geometry() method providing standardized geometry extraction from various source types:
    - Country codes (via AdminBoundaries)
    - Shapely geometries (BaseGeometry)
    - GeoDataFrames with automatic CRS handling
    - Lists of points or coordinate tuples (converted to MultiPoint)
  - Centralizes geometry conversion logic, eliminating duplication across handler methods.
- BaseHandler: Crop-to-Source Feature for Handlers
  - New crop_to_source parameter in BaseHandlerReader.load() and BaseHandler.load_data() methods.
  - Allows users to load data clipped to exact source boundaries rather than full data units (e.g., tiles).
  - Particularly useful for tile-based datasets (Google Open Buildings, GHSL) where tiles extend beyond requested regions.
  - Implemented crop_to_geometry() method in BaseHandlerReader for spatial filtering:
    - Supports (Geo)DataFrame clipping using geometry intersection.
    - Supports raster clipping using TifProcessor's clip_to_geometry method.
    - Extensible for future cropping implementations.
  - Search geometries are now cached alongside data units for efficient cropping operations.
- S2 Zonal View Generator (S2ViewGenerator)
  - New generator for producing zonal views using Google S2 cells (levels 0–30).
  - Supports sources:
    - Country name (str) via CountryS2Cells.create(...)
    - Shapely geometry or gpd.GeoDataFrame via S2Cells.from_spatial(...)
    - Points (List[Point | (lon, lat)]) via S2Cells.from_points(...)
    - Explicit cells (List[int | str], S2 IDs or tokens) via S2Cells.from_cells(...)
  - Uses cell_token as the zone identifier.
  - Includes map_wp_pop() convenience method (auto-uses stored country when available).
- H3 Zonal View Generator (H3ViewGenerator)
  - New generator for producing zonal views using H3 hexagons (resolutions 0–15).
  - Supports sources:
    - Country name (str) via CountryH3Hexagons.create(...)
    - Shapely geometry or gpd.GeoDataFrame via H3Hexagons.from_spatial(...)
    - Points (List[Point | (lon, lat)]) via H3Hexagons.from_spatial(...)
    - Explicit H3 indexes (List[str]) via H3Hexagons.from_hexagons(...)
  - Uses h3 as the zone identifier.
  - Includes map_wp_pop() convenience method (auto-uses stored country when available).
- TifProcessor: MultiPoint clipping support
  - _prepare_geometry_for_clipping() now accepts MultiPoint inputs and uses their bounding box for raster clipping.
  - Enables passing collections of points as a MultiPoint to clip_to_geometry() without pre-converting to a polygon.
Changed
- Configuration
  - Added Snowflake connection parameters to gigaspatial/config.py:
    - SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD
    - SNOWFLAKE_WAREHOUSE, SNOWFLAKE_DATABASE, SNOWFLAKE_SCHEMA
    - SNOWFLAKE_STAGE_NAME
  - Added Snowflake configuration variables to .env_sample
- BaseHandler
  - Streamlined Data Unit Resolution
    - Consolidated get_relevant_data_units_by_country(), get_relevant_data_units_by_points(), and get_relevant_data_units_by_geometry() into a unified workflow.
    - All source types now convert to geometry via extract_search_geometry() before unit resolution.
    - Subclasses now only need to implement get_relevant_data_units_by_geometry() for custom logic.
    - Significantly reduces code duplication in handler subclasses.
  - Optimized Handler Workflow
    - Eliminated redundant get_relevant_data_units() calls across handler, downloader, and reader operations.
    - ensure_data_available() now uses cached units and paths, preventing multiple lookups per request.
    - Data unit resolution occurs at most once per unique source query, improving performance for:
      - Repeated load_data() calls with the same source.
      - Operations involving both download and read steps.
      - Direct usage of downloader or reader components.
- Enhanced BaseHandlerReader
  - resolve_source_paths() now primarily handles explicit file paths.
  - Geometry/country/point conversion delegated to handler and config layers.
  - load() method updated to support crop_to_source parameter with automatic geometry retrieval from cache.
  - Fallback geometry computation if cache miss occurs (e.g., when reader used independently).
- BaseHandlerConfig Caching Logic
  - get_relevant_data_units() now checks cache before computing units.
  - Added force_recompute parameter to bypass cache when needed (e.g., force_download=True).
  - Cache operations include debug logging for transparency during development.
- TifProcessor temp-file handling
  - Simplified _create_clipped_processor() to mirror _reproject_to_temp_file: write clipped output to the new processor’s _temp_dir, set _clipped_file_path, update dataset_path, and reload metadata.
  - open_dataset() now prioritizes _merged_file_path, _reprojected_file_path, then _clipped_file_path, and opens local files directly.
  - Clipped processors consistently use LocalDataStore() for local temp files to avoid data-store path resolution issues.
Fixed
- TifProcessor: clip_to_geometry() open failure after merge
  - Fixed a bug where open_dataset() failed for processors returned by clip_to_geometry() when the source was initialized with multiple paths and loaded via handlers with merge_rasters=True.
  - The clipped raster is now saved directly into the new processor’s temp directory and tracked via _clipped_file_path, ensuring reliable access by open_dataset().
  - __post_init__ now uses os.path.exists() to check absolute paths with LocalDataStore, preventing false negatives for temp files.
Performance
- Significant reduction in redundant computations in handlers:
  - Single geometry extraction per source query (previously up to 3 times).
  - Single data unit resolution per source query (previously 2-3 times).
  - Cached geometry reuse for cropping operations.
- Benefits scale with:
  - Number of repeated queries.
  - Complexity of geometry extraction (especially country boundaries).
  - Number of data units per query.
Developer Notes
- Subclass implementations should now:
  - Only override get_relevant_data_units_by_geometry() for custom unit resolution.
  - Use extract_search_geometry() for any geometry conversion needs.
  - Optionally override crop_to_geometry() for dataset-specific cropping logic.
Dependencies
- Added snowflake-connector-python>=3.0.0 as a new dependency
[v0.7.2] - 2025-10-27
Added
- Ookla Speedtest Handler Integration (OoklaSpeedtestHandler)
  - New classes OoklaSpeedtestHandler, OoklaSpeedtestConfig, OoklaSpeedtestDownloader, and OoklaSpeedtestReader for managing Ookla Speedtest data.
  - OoklaSpeedtestHandler.load_data method supports Mercator tile filtering by country or spatial geometry and includes an optional process_geospatial parameter for WKT to GeoDataFrame conversion.
  - In OoklaSpeedtestConfig, year and quarter fields are optional (defaulting to None) and __post_init__ logs warnings if they are not explicitly provided, using the latest available data.
  - In OoklaSpeedtestReader, the resolve_source_paths method is overridden to appropriately handle None or non-path sources by returning the DATASET_URL.
  - In OoklaSpeedtestHandler, the __init__ method requires type as a mandatory argument, with year and quarter being optional.
- S2 Grid Generation Support (S2Cells)
  - Introduced S2Cells class for managing Google S2 cell grids using the s2sphere library.
  - Supports S2 levels 0-30, providing finer granularity than H3 (maximum level 30 vs. maximum resolution 15).
  - Provides multiple creation methods:
    - from_cells(): Create from lists of S2 cell IDs (integers or tokens).
    - from_bounds(): Create from geographic bounding box coordinates.
    - from_spatial(): Create from various spatial sources (geometries, GeoDataFrames, points).
    - from_json(): Load S2Cells from JSON files via DataStore.
  - Includes methods for spatial operations:
    - get_neighbors(): Get edge neighbors (4 per cell) with optional corner neighbors (8 total).
    - get_children(): Navigate to higher resolution child cells.
    - get_parents(): Navigate to lower resolution parent cells.
    - filter_cells(): Filter cells by a given set of cell IDs.
  - Provides conversion methods:
    - to_dataframe(): Convert to pandas DataFrame with cell IDs, tokens, and centroid coordinates.
    - to_geoms(): Convert cells to shapely Polygon geometries (square cells).
    - to_geodataframe(): Convert to GeoPandas GeoDataFrame with geometry column.
  - Supports saving to JSON, Parquet, or GeoJSON files via save() method.
  - Includes average_cell_area property for approximate area calculation based on S2 level.
- Country-Specific S2 Cells (CountryS2Cells)
  - Extends S2Cells for generating S2 grids constrained by country boundaries.
  - Integrates with AdminBoundaries to fetch country geometries for precise cell generation.
  - Factory method create() enforces proper instantiation with country code validation via pycountry.
- Expanded write_dataset to support generic JSON objects.
  - The write_dataset function can now write any serializable Python object (like a dict or list) directly to a .json file by leveraging the dedicated write_json helper.
- NASA SRTM Elevation Data Handler (NasaSRTMHandler)
  - New handler classes for downloading and processing NASA SRTM elevation data (30m and 90m resolution).
  - Supports Earthdata authentication via EARTHDATA_USERNAME and EARTHDATA_PASSWORD environment variables.
  - NasaSRTMConfig provides dynamic 1°x1° tile grid generation covering the global extent.
  - NasaSRTMDownloader supports parallel downloads of SRTM .hgt.zip tiles using multiprocessing.
  - NasaSRTMReader loads SRTM data with options to return as pandas DataFrame or list of SRTMParser objects.
  - Integrated with BaseHandler architecture for consistent data lifecycle management.
- SRTM Parser (SRTMParser)
  - Efficient parser for NASA SRTM .hgt.zip files using memory mapping.
  - Supports both SRTM-1 (3601x3601, 1 arc-second) and SRTM-3 (1201x1201, 3 arc-second) formats.
  - Provides methods for:
    - get_elevation(latitude, longitude): Get interpolated elevation for specific coordinates.
    - get_elevation_batch(coordinates): Batch elevation queries with NumPy array support.
    - to_dataframe(): Convert elevation data to pandas DataFrame with optional NaN filtering.
  - Automatic tile coordinate extraction from filename (e.g., N37E023, S10W120).
- SRTM Manager (SRTMManager)
  - Manager class for accessing elevation data across multiple SRTM tiles with lazy loading.
  - Implements LRU caching (default cache size: 10 tiles) for efficient memory usage.
  - Methods include:
    - get_elevation(latitude, longitude): Get interpolated elevation for any coordinate.
    - get_elevation_batch(coordinates): Batch elevation queries across multiple tiles.
    - get_elevation_profile(latitudes, longitudes): Generate elevation profiles along paths.
    - check_coverage(latitude, longitude): Check if a coordinate has SRTM coverage.
    - get_available_tiles(): List available SRTM tiles.
    - clear_cache() and get_cache_info(): Cache management utilities.
  - Automatically handles tile boundary crossings for elevation profiles.
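As a rough illustration of lazy tile loading with an LRU cache, the sketch below maps a coordinate to its SRTM tile name (tiles are named after their south-west corner, as in N37E023) and keeps only the most recently used tiles. The function and class names are hypothetical, and the loader is a stand-in for whatever actually opens a tile.

```python
import math
from collections import OrderedDict


def srtm_tile_name(latitude, longitude):
    """Name of the 1°x1° tile containing the point, e.g. N37E023."""
    lat0 = math.floor(latitude)
    lon0 = math.floor(longitude)
    ns = "N" if lat0 >= 0 else "S"
    ew = "E" if lon0 >= 0 else "W"
    return f"{ns}{abs(lat0):02d}{ew}{abs(lon0):03d}"


class TileCache:
    """Load tiles on first use and evict the least recently used one."""

    def __init__(self, loader, max_size=10):
        self._loader = loader
        self._max_size = max_size
        self._tiles = OrderedDict()

    def get(self, name):
        if name in self._tiles:
            self._tiles.move_to_end(name)  # mark as most recently used
            return self._tiles[name]
        tile = self._loader(name)
        self._tiles[name] = tile
        if len(self._tiles) > self._max_size:
            self._tiles.popitem(last=False)  # drop the oldest entry
        return tile


print(srtm_tile_name(37.5, 23.7))     # N37E023
print(srtm_tile_name(-9.5, -119.2))   # S10W120
```

A profile that crosses a tile boundary then reduces to calling srtm_tile_name() per point and fetching each tile through the cache.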
- Earthdata Session (EarthdataSession)
  - Custom requests.Session subclass for NASA Earthdata authentication.
  - Maintains Authorization headers through redirects to/from Earthdata hosts.
  - Required for accessing NASA's SRTM data repository.
Changed
- ADLSDataStore Enhancements
  - Modified __init__ method to support initialization using either ADLS_CONNECTION_STRING or a combination of ADLS_ACCOUNT_URL and ADLS_SAS_TOKEN.
  - Improved flexibility for authenticating with Azure Data Lake Storage.
- Configuration
  - Added ADLS_ACCOUNT_URL and ADLS_SAS_TOKEN to gigaspatial/config.py and .env_sample for alternative ADLS authentication.
  - Added EARTHDATA_USERNAME and EARTHDATA_PASSWORD to gigaspatial/config.py and .env_sample for NASA Earthdata authentication.
Fixed
- WorldPop: RuntimeError during school_age=True data availability check:
  - Resolved a RuntimeError: Could not ensure data availability for loading that occurred when school_age=True and WorldPop data was not yet present in the data store.
  - WPPopulationConfig.get_data_unit_paths now correctly returns the original .zip URLs to trigger the download/extraction process when filtered .tif files are missing.
  - After successful download and extraction, it now accurately identifies and returns the paths to the local .tif files, allowing BaseHandler to confirm availability and proceed with loading.
- WorldPop: list index out of range when no datasets found:
  - Added a RuntimeError in WPPopulationConfig.get_relevant_data_units_by_country when self.client.search_datasets returns no results, providing a clearer error message with the search parameters.
- WorldPop: Incomplete downloads with min_age/max_age filters for non-school-age age_structures:
  - Fixed an issue where load_data with min_age or max_age filters (when school_age=False) resulted in incomplete downloads.
  - WPPopulationConfig.get_data_unit_paths now returns all potential .tif URLs for non-school-age age_structures during the initial availability check, ensuring all necessary files are downloaded.
  - Age/sex filtering is now deferred and applied by WPPopulationReader.load_from_paths using WPPopulationConfig._filter_age_sex_paths after download, guaranteeing data integrity.
- HealthSitesFetcher
  - Ensured correct Coordinate Reference System (CRS) assignment (EPSG:4326) when returning GeoDataFrame from fetched health facility data.
Dependencies
- Added s2sphere as a new dependency for S2 geometry operations
[v0.7.1] - 2025-10-15
Added
- Healthsites.io API Integration (HealthSitesFetcher):
  - New class HealthSitesFetcher to fetch and process health facility data from the Healthsites.io API.
  - Supports filtering by country, bounding box extent, and date ranges (from_date, to_date).
  - Provides methods for:
    - fetch_facilities(): Retrieves health facility locations, returning a pd.DataFrame or gpd.GeoDataFrame based on output format.
    - fetch_statistics(): Fetches aggregated statistics for health facilities based on provided filters.
    - fetch_facility_by_id(): Retrieves details for a specific facility using its OSM type and ID.
  - Includes robust handling for API pagination, different output formats (JSON, GeoJSON), and nested data structures.
  - Integrates with OSMLocationFetcher and pycountry to standardize country names to OSM English names for consistent querying.
  - Configurable parameters for API URL, API key, page size, flat properties, tag format, output format, and request sleep time.
- OSMLocationFetcher Enhancements:
  - Historical Data Fetching (fetch_locations_changed_between):
    - New method fetch_locations_changed_between() to retrieve OSM objects that were created or modified within a specified date range. This enables historical analysis and change tracking.
    - Defaults include_metadata to True for this method, as it's typically used for change tracking.
  - Comprehensive OSM Country Information (get_osm_countries):
    - New static method get_osm_countries() to fetch country-level administrative boundaries directly from the OSM database.
    - Supports fetching all countries or a specific country by ISO 3166-1 alpha-3 code.
    - Option to include various name variants (e.g., name:en, official_name) and ISO codes.
  - Metadata Inclusion in Fetched Locations:
    - Added include_metadata parameter to fetch_locations() to optionally retrieve change tracking metadata (timestamp, version, changeset, user, uid) for each fetched OSM element.
    - This metadata is now extracted and included in the DataFrame for nodes, relations, and ways.
  - Flexible Date Filtering in Overpass Queries:
    - Introduced date_filter_type (newer, changed) and start_date/end_date parameters to _build_queries() for more granular control over time-based filtering in Overpass QL.
  - Date Normalization Utility:
    - Added _normalize_date() helper method to convert various date inputs (string, datetime object) into a standardized ISO 8601 format for Overpass API queries.
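Date normalization of the kind described can be sketched with the standard library alone. The helpers below are hypothetical and not the fetcher's private methods; they assume ISO-formatted input strings, treat naive datetimes as already being in UTC, and emit the `newer` and `changed` filter syntax that Overpass QL defines.

```python
from datetime import datetime, timezone


def normalize_date(value):
    """Return an ISO 8601 UTC timestamp such as 2024-01-15T00:00:00Z."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if not isinstance(value, datetime):
        raise TypeError(f"Unsupported date type: {type(value).__name__}")
    if value.tzinfo is not None:
        value = value.astimezone(timezone.utc)
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


def date_filter(filter_type, start_date, end_date=None):
    """Build an Overpass QL time filter for 'newer' or 'changed'."""
    start = normalize_date(start_date)
    if filter_type == "newer":
        return f'(newer:"{start}")'
    if filter_type == "changed":
        if end_date is None:
            return f'(changed:"{start}")'
        return f'(changed:"{start}","{normalize_date(end_date)}")'
    raise ValueError(f"Unknown filter type: {filter_type}")


print(normalize_date("2024-01-15"))
# 2024-01-15T00:00:00Z
print(date_filter("changed", "2024-01-01", datetime(2024, 2, 1, 8, 30)))
# (changed:"2024-01-01T00:00:00Z","2024-02-01T08:30:00Z")
```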
- TifProcessor
  - Comprehensive Memory Management:
    - Introduced _check_available_memory(), _estimate_memory_usage(), and _memory_guard() methods for proactive memory assessment across various operations.
    - Added warnings (ResourceWarning) for potentially high memory usage in batched operations, with suggestions for optimizing n_workers.
  - Chunked DataFrame Conversion:
    - Implemented to_dataframe_chunked() for memory-efficient processing of large rasters by converting them to DataFrames in manageable chunks.
    - Automatic calculation of optimal chunk_size based on target memory usage via _calculate_optimal_chunk_size().
    - New helper methods: _get_chunk_windows(), _get_chunk_coordinates().
  - Raster Clipping Functionality:
    - clip_to_geometry(): New method to clip rasters to arbitrary geometries (Shapely, GeoDataFrame, GeoSeries, GeoJSON-like dicts).
    - clip_to_bounds(): New method to clip rasters to rectangular bounding boxes, supporting optional CRS transformation for the bounds.
    - New helper methods for clipping: _prepare_geometry_for_clipping(), _validate_geometry_crs(), _create_clipped_processor().
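One plausible way to derive a chunk size from a memory budget and then walk a raster in row strips is sketched below. Both helper names and the (col_off, row_off, width, height) window layout are assumptions for illustration; the actual private helpers may size and shape their chunks differently.

```python
def rows_for_budget(width, bytes_per_pixel, target_bytes):
    """Largest number of full rows whose pixel data fits the memory budget."""
    return max(1, target_bytes // (width * bytes_per_pixel))


def chunk_windows(height, width, chunk_rows):
    """Yield (col_off, row_off, width, height) strips covering the raster."""
    for row_off in range(0, height, chunk_rows):
        rows = min(chunk_rows, height - row_off)  # last strip may be shorter
        yield (0, row_off, width, rows)


print(rows_for_budget(width=1000, bytes_per_pixel=4, target_bytes=1_000_000))
# 250
print(list(chunk_windows(height=10, width=5, chunk_rows=4)))
# [(0, 0, 5, 4), (0, 4, 5, 4), (0, 8, 5, 2)]
```

Full-width strips keep each chunk contiguous on disk for striped files; tiled files may be better served by windows aligned to the tile grid.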
- WorldPopDownloader Zip Handling:
  - Modified download_data_unit in WPPopulationDownloader to correctly handle .zip files (e.g., school age datasets) by downloading them to a temporary location and extracting the contained .tif files.
  - Updated download_data_units to correctly flatten the list of paths returned by download_data_unit when zip extraction results in multiple files.
  - Adjusted WPPopulationConfig.get_data_unit_paths to correctly identify and return paths for extracted .tif files from zip resources. It now resolves paths based on what is available. For school-age datasets, it returns paths to extracted .tif files if available; otherwise, it returns the original .zip path(s) to trigger download and extraction.
  - Added filter support to WPPopulationConfig.get_data_unit_paths, and hence to WPPopulationHandler, for:
    - School-age datasets: supports sex (e.g., "F", "M", "F_M") and education_level (e.g., "PRIMARY", "SECONDARY") filters on extracted .tif filenames.
    - Non-school-age age_structures: supports sex, ages, min_age, and max_age filters on .tif filenames.
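Extracting only the .tif members of an archive and flattening mixed single-path and multi-path results can be sketched with the standard library. The helper names are hypothetical, the download step is omitted, and the sketch reads from the local file system rather than through a data store.

```python
import zipfile


def extract_tifs(zip_path, output_dir):
    """Extract only the .tif members and return the paths that were written."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".tif"):
                # ZipFile.extract returns the path of the extracted file.
                extracted.append(archive.extract(member, output_dir))
    return extracted


def flatten_paths(results):
    """Flatten a mix of single paths and lists of paths into one list."""
    flat = []
    for item in results:
        flat.extend(item if isinstance(item, list) else [item])
    return flat


print(flatten_paths(["a.tif", ["b_F.tif", "b_M.tif"]]))
# ['a.tif', 'b_F.tif', 'b_M.tif']
```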
- WorldPop: Filtered aggregation in GeometryBasedZonalViewGenerator.map_wp_pop:
  - map_wp_pop now enforces a single country input when handler.config.project is "age_structures".
  - When predicate is "centroid_within" and the project is "age_structures", individual TifProcessor objects (representing age/sex combinations) are loaded, sampled with map_rasters(stat="sum"), and their results are summed per zone, preventing unintended merging.
- PoiViewGenerator: Filtered aggregation in PoiViewGenerator.map_wp_pop:
  - map_wp_pop now enforces a single country input when handler.config.project is "age_structures".
  - When predicate is "centroid_within" and the project is "age_structures", individual TifProcessor objects (representing age/sex combinations) are loaded, sampled with map_zonal_stats(stat="sum"), and their results are summed per POI, preventing unintended merging.
- TifProcessor Multi-Raster Merging in Handlers and Generators:
  - Extended _load_raster_data in BaseHandlerReader to support an optional merge_rasters argument. When True and multiple raster paths are provided, TifProcessor now merges them into a single TifProcessor object during loading.
  - Integrated merge_rasters argument into GHSLDataReader and WPPopulationReader's load_from_paths and load methods, enabling control over raster merging at the reader level.
  - Propagated merge_rasters to GHSLDataHandler's load_into_dataframe and load_into_geodataframe methods for consistent behavior across the handler interface.
Changed
- TifProcessor
  - Unified DataFrame Conversion:
    - Refactored to_dataframe() to act as a universal entry point, dynamically routing to internal, more efficient methods for single and multi-band processing.
    - Deprecated the individual _to_band_dataframe(), _to_rgb_dataframe(), _to_rgba_dataframe(), and _to_multi_band_dataframe() methods in favor of the new unified _to_dataframe().
    - to_dataframe() now includes a check_memory parameter.
  - Optimized open_dataset Context Manager:
    - The open_dataset context manager now directly opens local files when LocalDataStore is used, avoiding unnecessary rasterio.MemoryFile creation for improved performance and reduced memory overhead.
  - Enhanced to_geodataframe and to_graph:
    - Added check_memory parameter to to_geodataframe() and to_graph() for memory pre-checks.
  - Refined sample_by_polygons_batched:
    - Included check_memory parameter for memory checks before batch processing.
    - Implemented platform-specific warnings for potential multiprocessing issues on Windows/macOS.
  - Improved Multiprocessing Initialization:
    - The _initializer_worker() method now prioritizes merged, reprojected, or original local file paths for opening, ensuring workers access the most relevant data.
  - Modular Masking and Coordinate Extraction:
    - Introduced new private helper methods: _extract_coordinates_with_mask(), _build_data_mask(), _build_multi_band_mask(), and _bands_to_dict() to centralize and improve data masking and coordinate extraction logic.
  - Streamlined Band-Mode Validation:
    - Moved the logic for validating mode and band count compatibility into a dedicated _validate_mode_band_compatibility() method for better code organization.
- GigaSchoolLocationFetcher
  - fetch_locations() method:
    - Added process_geospatial parameter (defaults to False) to optionally process geospatial data and return a gpd.GeoDataFrame.
  - _process_geospatial_data() method:
    - Modified to return a gpd.GeoDataFrame by converting the pd.DataFrame with a geometry column and EPSG:4326 CRS.
- OSMLocationFetcher Refactoring:
  - Unified Query Execution and Processing: Refactored the core logic for executing Overpass queries and processing their results into a new private method _execute_and_process_queries(). This centralizes common steps and reduces code duplication between fetch_locations() and the new fetch_locations_changed_between().
  - Enhanced _build_queries: Modified _build_queries to accept date_filter_type, start_date, end_date, and include_metadata to construct more dynamic and feature-rich Overpass QL queries.
  - Updated fetch_locations Signature:
    - Replaced since_year parameter with since_date (which can be a str or datetime object) for more precise time-based filtering.
    - Added include_metadata parameter.
  - Improved Logging of Category Distribution:
    - Modified the logging for category distribution to correctly handle cases where categories are combined into a list (when handle_duplicates='combine').
  - since_year Parameter: Removed since_year from fetch_locations() as its functionality is now superseded by the more flexible since_date parameter and the _build_queries enhancements.
- PoiViewGenerator Mapping Methods (map_zonal_stats, map_nearest_points, map_google_buildings, map_ms_buildings, map_built_s, map_smod):
  - Changed map_zonal_stats and map_nearest_points to return pd.DataFrame results (including 'poi_id' and new mapped columns) instead of directly updating the internal view.
  - Updated map_google_buildings, map_ms_buildings, map_built_s, and map_smod to capture the pd.DataFrame returned by their respective underlying mapping calls (map_nearest_points or map_zonal_stats) and then explicitly call self._update_view() with these results.
  - This enhances modularity and allows for more flexible result handling and accumulation.
- ZonalViewGenerator.map_rasters Enhancements:
  - Modified map_rasters to accept raster_data as either a single TifProcessor or a List[TifProcessor].
  - Implemented internal merging of List[TifProcessor] into a single TifProcessor before performing zonal statistics.
  - Replaced sample_multiple_tifs_by_polygons with the TifProcessor.sample_by_polygons method.
Fixed¶
- TifProcessor:
to_graph()Sparse Matrix Creation:- Corrected the sparse matrix creation logic in
to_graph()to ensure proper symmetric graph representation whengraph_type="sparse".
- Corrected the sparse matrix creation logic in
- Coordinate System Handling in
_initializer_worker:- Ensured that
_initializer_workercorrectly handles different data storage scenarios to provide the correct dataset handle to worker processes, preventingRuntimeErrordue to uninitialized raster datasets.
- Ensured that
Removed¶
- OSMLocationFetcher
  - Redundant Category Distribution Logging: Removed the explicit category distribution logging for `handle_duplicates == "separate"` since the `value_counts()` method on the 'category' column already provides this.
[v0.7.0] - 2025-09-17¶
Added¶
- TifProcessor Revamp
  - Explicit Reprojection Method: Introduced `reproject_to()` method, allowing on-demand reprojection of rasters to a new CRS with customizable `resampling_method` and `resolution`.
  - Reprojection Resolution Control: Added `reprojection_resolution` parameter to `TifProcessor` for precise control over output pixel size during reprojection.
  - Advanced Raster Information: Added `get_raster_info()` method to retrieve a comprehensive dictionary of raster metadata.
  - Graph Conversion Capabilities: Implemented `to_graph()` method to convert raster data into a graph (NetworkX or sparse matrix) based on pixel adjacency (4- or 8-connectivity).
  - Internal Refactoring:
    - `_reproject_to_temp_file`: Introduced `_reproject_to_temp_file` as a helper for reprojection into temporary files.
- H3 Grid Generation
  - H3 Grid Generation Module (`gigaspatial/grid/h3.py`):
    - Introduced `H3Hexagons` class for managing H3 cell IDs.
    - Supports creation from lists of hexagons, geographic bounds, spatial geometries, or points.
    - Provides methods to convert H3 hexagons to pandas DataFrames and GeoPandas GeoDataFrames.
    - Includes functionalities for filtering, getting k-ring neighbors, compacting hexagons, and getting children/parents at different resolutions.
    - Allows saving H3Hexagons to JSON, Parquet, or GeoJSON files.
  - Country-Specific H3 Hexagons (`CountryH3Hexagons`):
    - Extends `H3Hexagons` for generating H3 grids constrained by country boundaries.
    - Integrates with `AdminBoundaries` to fetch country geometries for precise H3 cell generation.
- Documentation
  - Improved `tif.md` example to showcase multi-raster initialization, explicit reprojection, and graph conversion.
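The pixel-adjacency idea behind `to_graph()` can be illustrated with a small standalone sketch. This is not the library's implementation: `pixel_adjacency` is a hypothetical helper, and using `None` as the nodata marker is an assumption made for the example. It builds an undirected adjacency mapping over a 2D grid for either 4- or 8-connectivity.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]

# Neighbour offsets (row, col) for the two connectivity modes.
OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def pixel_adjacency(
    grid: List[List[Optional[float]]], connectivity: int = 4
) -> Dict[Cell, Set[Cell]]:
    """Hypothetical helper: map each valid pixel to its valid neighbours."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = OFFSETS_4 if connectivity == 4 else OFFSETS_8
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    graph: Dict[Cell, Set[Cell]] = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None:  # None stands in for nodata (assumption)
                continue
            neighbours: Set[Cell] = set()
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and grid[nr][nc] is not None:
                    neighbours.add((nr, nc))
            graph[(r, c)] = neighbours
    return graph


grid = [[1.0, 2.0], [None, 3.0]]
g4 = pixel_adjacency(grid, connectivity=4)
g8 = pixel_adjacency(grid, connectivity=8)
```

With 8-connectivity the top-left pixel also links to the diagonal pixel, which 4-connectivity omits. Because every edge is discovered from both endpoints, the mapping is symmetric by construction.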
Changed¶
- TifProcessor
  - Improved Temporary File Management: Refactored temporary file handling for merging and reprojection using `tempfile.mkdtemp()` and `shutil.rmtree` for more robust and reliable cleanup. Integrated with context manager (`__enter__`, `__exit__`) and added a dedicated `cleanup()` method.
  - Reprojection during Initialization: Implemented automatic reprojection of single rasters to a specified `target_crs` during `TifProcessor` initialization.
  - Enhanced `open_dataset` Context Manager: The `open_dataset` context manager now intelligently opens the most up-to-date (merged or reprojected) version of the dataset.
  - More Flexible Multi-Dataset Validation: Modified `_validate_multiple_datasets` to issue a warning instead of raising an error for CRS mismatches when `target_crs` is not set.
  - Optimized `_get_reprojection_profile`: Dynamically calculates transform and dimensions based on `reprojection_resolution` and added LZW compression to reprojected TIFF files to reduce file size.
- ADLSDataStore Enhancements
  - New `copy_file` method: Implemented a new method for copying individual files within ADLS, with an option to overwrite existing files.
  - New `rename` method: Added a new method to rename (move) files in ADLS, which internally uses `copy_file` and then deletes the source, with options for overwrite, waiting for copy completion, and polling.
  - Revamped `rmdir` method: Modified `rmdir` to perform batch deletions of blobs, addressing the Azure Blob batch delete limit (256 sub-requests) and improving efficiency for large directories.
- LocalDataStore Enhancements
  - New `copy_file` method: Implemented a new method for copying individual files.
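The batching approach described for `rmdir` can be sketched as follows. This is an illustration only, not the `ADLSDataStore` code: `chunked` and `delete_in_batches` are hypothetical names, and the `delete_batch` callable stands in for whatever batch-delete call the storage client exposes. Only the 256 sub-request limit is taken from the entry above.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_LIMIT = 256  # sub-request limit quoted in the changelog entry


def chunked(items: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def delete_in_batches(
    blob_names: Sequence[str], delete_batch: Callable[[List[str]], None]
) -> int:
    """Call `delete_batch` once per chunk and return the number of calls."""
    calls = 0
    for batch in chunked(blob_names):
        delete_batch(batch)  # hypothetical stand-in for the real client call
        calls += 1
    return calls


# Simulated run: record batch sizes instead of deleting anything.
names = [f"dir/file_{i}.tif" for i in range(600)]
sizes: List[int] = []
n_calls = delete_in_batches(names, lambda batch: sizes.append(len(batch)))
```

For 600 blobs this issues three requests rather than 600 individual deletions, and no request exceeds the limit.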
Removed¶
- Removed deprecated `tabular` property and `get_zoned_geodataframe` method from `TifProcessor`. Users should now use `to_dataframe()` and `to_geodataframe()` respectively.
Dependencies¶
- Added `networkx` and `h3` as new dependencies.
Fixed¶
- Several small fixes and improvements to aggregation methods.
[v0.6.9] - 2025-07-26¶
Fixed¶
- Resolved a bug in the handler base class where non-hashable types (dicts) were incorrectly used as dictionary keys in the `unit_to_path` mapping. The fix prevents potential runtime errors during data availability checks.
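The class of error behind this fix is easy to reproduce in isolation. The sketch below is not the handler code: `make_key` and the sample unit are hypothetical, and serializing to JSON is just one possible way to derive a hashable key.

```python
import json
from typing import Any, Hashable


def make_key(unit: Any) -> Hashable:
    """Hypothetical helper: return a hashable key for a data unit."""
    if isinstance(unit, dict):
        # Sorted keys make the key independent of insertion order.
        return json.dumps(unit, sort_keys=True)
    return unit


unit = {"tile_id": "R5_C19", "year": 2020}  # made-up data unit

# Using the dict itself as a key fails because dicts are not hashable.
try:
    {unit: "some/path"}
    raised = False
except TypeError:
    raised = True

# Using a derived hashable key works.
unit_to_path = {make_key(unit): "bronze/ghsl/R5_C19.tif"}
```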
[v0.6.8] - 2025-07-26¶
Added¶
- OSMLocationFetcher Enhancements
  - Support for querying OSM locations by arbitrary administrative levels (e.g., states, provinces, cities), in addition to country-level queries.
  - New optional parameters:
    - `admin_level`: Specify OSM administrative level (e.g., 4 for states, 6 for counties).
    - `admin_value`: Name of the administrative area to query (e.g., "California").
  - New static method `get_admin_names(admin_level, country=None)`:
    - Fetch all administrative area names for a given `admin_level`, optionally filtered by country.
    - Helps users discover valid admin area names for constructing precise queries.
- Multi-Raster Merging Support in TifProcessor
  - Added ability to initialize `TifProcessor` with multiple raster datasets.
  - Merges rasters on load with configurable strategies:
    - Supported `merge_method` options: `first`, `last`, `min`, `max`, `mean`.
  - Supports on-the-fly reprojection for rasters with differing coordinate reference systems via `target_crs`.
  - Handles resampling using `resampling_method` (default: `nearest`).
  - Comprehensive validation to ensure compatibility of input rasters (e.g., resolution, nodata, dtype).
  - Temporary file management for merged output with automatic cleanup.
  - Backward compatible with single-raster use cases.
- New TifProcessor Parameters:
  - `merge_method` (default: `first`): How to combine pixel values across rasters.
  - `target_crs` (optional): CRS to reproject rasters before merging.
  - `resampling_method`: Resampling method for reprojection.
- New Properties:
  - `is_merged`: Indicates whether the current instance represents merged rasters.
  - `source_count`: Number of raster datasets merged.
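To make the five `merge_method` names concrete, here is a minimal per-pixel sketch. It is not the `TifProcessor` implementation: `merge_pixel` is a hypothetical function, `None` stands in for nodata, and ignoring nodata values before applying a strategy is an assumption of this example.

```python
from statistics import mean
from typing import Callable, Dict, Optional, Sequence

# One reducer per strategy name listed above.
MERGERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "first": lambda vals: vals[0],
    "last": lambda vals: vals[-1],
    "min": min,
    "max": max,
    "mean": mean,
}


def merge_pixel(
    values: Sequence[Optional[float]], method: str = "first"
) -> Optional[float]:
    """Combine the values that overlapping rasters hold at one pixel."""
    if method not in MERGERS:
        raise ValueError(f"unknown merge method: {method}")
    valid = [v for v in values if v is not None]  # drop nodata (assumption)
    if not valid:
        return None  # every source was nodata at this pixel
    return MERGERS[method](valid)


# Values from three overlapping rasters at one pixel, in load order.
stack = [None, 4.0, 10.0]
results = {name: merge_pixel(stack, name) for name in MERGERS}
```

The strategies only differ where rasters overlap; a pixel covered by a single raster yields that raster's value under every method.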
Changed¶
- OSMLocationFetcher Overpass Query Logic
  - Refactored Overpass QL query builder to support subnational queries using `admin_level` and `admin_value`.
  - Improved flexibility and precision for spatial data collection across different administrative hierarchies.
Breaking Changes¶
- None. All changes are fully backward compatible.
[v0.6.7] - 2025-07-16¶
Fixed¶
- Fixed a bug in WorldPopHandler/ADLSDataStore integration where a `Path` object was passed instead of a string, causing a `quote_from_bytes() expected bytes` error during download.
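The error text matches what Python's standard `urllib.parse.quote` raises when it receives something that is neither `str` nor `bytes`. The sketch below reproduces that behaviour in isolation; where exactly the call happens in the download path is an assumption, and the blob path is made up.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

blob_path = PurePosixPath("worldpop/KEN/ken_ppp_2020.tif")  # hypothetical path

# Passing a path object: quote() treats any non-str input as bytes and fails.
try:
    quote(blob_path)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Converting to str first avoids the error.
encoded = quote(str(blob_path))
```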
[v0.6.6] - 2025-07-15¶
Added¶
- `AdminBoundaries.from_global_country_boundaries(scale="medium")`
  - New class method to load global admin level 0 boundaries from Natural Earth.
  - Supports `"large"` (10m), `"medium"` (50m), and `"small"` (110m) scale options.
- WorldPop Handler Refactor (API Integration)
  - Introduced `WPPopulationHandler`, `WPPopulationConfig`, `WPPopulationDownloader`, and `WPPopulationReader`.
  - Uses new `WorldPopRestClient` to dynamically query the WorldPop REST API.
  - Replaces static metadata files and hardcoded logic with API-based discovery and download.
  - Country code lookup and dataset filtering now handled at runtime.
  - Improved validation, extensibility, logging, and error handling.
- POI-Based WorldPop Mapping
  - `PoiViewGenerator.map_wp_pop()` method:
    - Maps WorldPop population data around POIs using flexible spatial predicates: `"centroid_within"`, `"intersects"`, `"fractional"` (1000m only), `"within"`.
    - Supports configurable radius and resolution (100m or 1000m).
    - Aggregates population data and appends it to the view.
- Geometry-Based Zonal WorldPop Mapping
  - `GeometryBasedZonalViewGenerator.map_wp_pop()` method:
    - Maps WorldPop population data to polygons/zones using the `"intersects"` or `"fractional"` predicate.
    - Returns zonal population sums as a new view column.
    - Handles predicate-dependent data loading (raster vs. GeoDataFrame).
Changed¶
- Refactored `BaseHandler.ensure_data_available`
  - More efficient data check and download logic.
  - Downloads only missing units unless `force_download=True`.
  - Cleaner structure and better reuse of `get_relevant_data_units()`.
- Refactored WorldPop Module
  - Complete handler redesign using API-based architecture.
  - Dataset paths and URLs are now dynamically constructed from API metadata.
  - Resolution/year validation is more robust and descriptive.
  - Removed static constants, gender/school_age toggles, and local CSV dependency.
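The "download only what is missing" rule can be expressed in a few lines. This is a sketch of the selection logic only, not of `BaseHandler`: `units_to_download` and the unit names are hypothetical, and only the `force_download` flag comes from the entry above.

```python
from typing import Iterable, List


def units_to_download(
    relevant_units: Iterable[str],
    existing_units: Iterable[str],
    force_download: bool = False,
) -> List[str]:
    """Return the units that still need fetching, preserving input order."""
    relevant = list(relevant_units)
    if force_download:
        return relevant  # re-fetch everything, even units already stored
    existing = set(existing_units)
    return [unit for unit in relevant if unit not in existing]


relevant = ["tile_a", "tile_b", "tile_c"]  # made-up unit names
already_stored = ["tile_b"]

missing_only = units_to_download(relevant, already_stored)
forced = units_to_download(relevant, already_stored, force_download=True)
```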
Fixed¶
- Several small fixes and improvements to zonal aggregation methods, especially around CRS consistency, missing values, and result alignment.
[v0.6.5] - 2025-07-01¶
Added¶
- `MercatorTiles.get_quadkeys_from_points()`
  - New static method for efficient 1:1 point-to-quadkey mapping using coordinate-based logic, improving performance over spatial joins.
- `AdminBoundariesViewGenerator`
  - New generator class for producing zonal views based on administrative boundaries (e.g., districts, provinces) with flexible source and admin level support.
- Zonal View Generator Enhancements
  - `_view`: Internal attribute for accumulating mapped statistics.
  - `view`: Exposes current state of zonal view.
  - `add_variable_to_view()`: Adds mapped data from `map_points`, `map_polygons`, or `map_rasters` with robust validation and zone alignment.
  - `to_dataframe()` and `to_geodataframe()` methods added for exporting current view in tabular or spatial formats.
- `PoiViewGenerator` Enhancements
  - Consistent `_view` DataFrame for storing mapped results.
  - `_update_view()`: Central method to update POI data.
  - `save_view()`: Improved format handling (CSV, Parquet, GeoJSON, etc.) with geometry recovery.
  - `to_dataframe()` and `to_geodataframe()` methods added for convenient export of enriched POI view.
  - Robust duplicate ID detection and CRS validation in `map_zonal_stats`.
- `TifProcessor` Enhancements
  - `sample_by_polygons_batched()`: Parallel polygon sampling.
  - Enhanced `sample_by_polygons()` with nodata masking and multiple stats.
  - `warn_on_error`: Flag to suppress sampling warnings.
- GeoTIFF Multi-Band Support
  - `multi` mode added for multi-band raster support.
  - Auto-detects band names via metadata.
  - Strict validation of band count based on mode (`single`, `rgb`, `rgba`, `multi`).
- Spatial Distance Graph Algorithm
  - `build_distance_graph()` added for fast KD-tree-based spatial matching.
  - Supports both `DataFrame` and `GeoDataFrame` inputs.
  - Outputs a `networkx.Graph` with optional DataFrame of matches.
  - Handles projections, self-match exclusion, and includes verbose stats/logs.
- Database Integration (Experimental)
  - Added `DBConnection` class in `core/io/database.py` for unified Trino and PostgreSQL access.
  - Supports schema/table introspection, query execution, and reading into `pandas` or `dask`.
  - Handles connection creation, credential management, and diagnostics.
  - Utility methods for schema/view/table/column listings and parameterized queries.
- GHSL Population Mapping
  - `map_ghsl_pop()` method added to `GeometryBasedZonalViewGenerator`.
  - Aggregates GHSL population rasters to user-defined zones.
  - Supports `intersects` and `fractional` predicates (latter for 1000m resolution only).
  - Returns population statistics (e.g., `sum`) with customizable column prefix.
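Coordinate-based point-to-quadkey mapping, as opposed to a spatial join against tile polygons, can be sketched with the standard Web Mercator tile arithmetic. This is not the `MercatorTiles` code: the function names and the `(lon, lat)` argument order are assumptions made for the example.

```python
import math
from typing import Iterable, List, Tuple

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Convert a WGS84 point to XYZ tile indices at the given zoom level."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or lat=-MAX_LAT stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of x and y into a base-4 quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def quadkeys_from_points(points: Iterable[Tuple[float, float]], zoom: int) -> List[str]:
    """One quadkey per (lon, lat) point, in input order."""
    return [tile_to_quadkey(*lonlat_to_tile(lon, lat, zoom), zoom) for lon, lat in points]


keys = quadkeys_from_points([(0.0, 0.0), (-180.0, 85.0)], zoom=1)
```

Each point costs a constant amount of arithmetic, so no geometry objects or spatial index are needed to assign points to tiles.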
Changed¶
- `MercatorTiles.from_points()` now internally uses `get_quadkeys_from_points()` for better performance.
- `map_points()` and `map_rasters()` now return `Dict[zone_id, value]` to support direct usage with `add_variable_to_view()`.
- Refactored `aggregate_polygons_to_zones()`
  - `area_weighted` deprecated in favor of `predicate`.
  - Supports flexible predicates like `"within"`, `"fractional"` for spatial aggregation.
  - `map_polygons()` updated to reflect this change.
- Optional Admin Boundaries Configuration
  - `ADMIN_BOUNDARIES_DATA_DIR` is now optional.
  - `AdminBoundaries.create()` only attempts to load if explicitly configured or path is provided.
  - Improved documentation and fallback behavior for missing configs.
Fixed¶
- GHSL Downloader
  - ZIP files are now downloaded into a temporary cache directory using `requests.get()`.
  - Avoids unnecessary writes and ensures cleanup.
- `TifProcessor`
  - Removed polygon sampling warnings unless explicitly enabled.
Deprecated¶
- `TifProcessor.tabular` → use `to_dataframe()` instead.
- `TifProcessor.get_zoned_geodataframe()` → use `to_geodataframe()` instead.
- `area_weighted` → use `predicate` in aggregation methods instead.
[v0.6.4] - 2025-06-19¶
Added¶
- GigaSchoolProfileFetcher
  - New class to fetch and process school profile data from the Giga School Profile API
  - Supports paginated fetching, filtering by country and school ID
  - Includes methods to generate connectivity summary statistics by region, connection type, and source
- GigaSchoolMeasurementsFetcher
  - New class to fetch and process daily real-time connectivity measurements from the Giga API
  - Supports filtering by date range and school
  - Includes performance summary generation (download/upload speeds, latency, quality flags)
- AdminBoundaries.from_geoboundaries
  - New class method to download and process geoBoundaries data by country and admin level
  - Automatically handles HDX dataset discovery, downloading, and fallback logic
- HDXConfig.search_datasets
  - Static method to search HDX datasets without full handler initialization
  - Supports query string, sort order, result count, HDX site selection, and custom user agent
Fixed¶
- Typo in `MaxarImageDownloader` causing runtime error
Documentation¶
- Improved Configuration Guide (`docs/user-guide/configuration.md`)
  - Added comprehensive table of environment variables with defaults and descriptions
  - Synced `.env_sample` and `config.py` with docs
  - Example `.env` file and guidance on path overrides using `config.set_path`
  - New section on `config.ensure_directories_exist` and troubleshooting tips
  - Clearer handling of credentials and security notes
  - Improved formatting and structure for clarity
[v0.6.3] - 2025-06-16¶
Added¶
- Major refactor of `HDX` module to align with unified `BaseHandler` architecture:
  - `HDXConfig`: fully aligned with `BaseHandlerConfig` structure.
    - Added flexible pattern matching for resource filtering.
    - Improved data unit resolution by country, geometry, and points.
    - Enhanced resource filtering with exact and regex options.
  - `HDXDownloader` fully aligned with `BaseHandlerDownloader`:
    - Simplified sequential download logic.
    - Improved error handling, validation, and logging.
  - `HDXReader` fully aligned with `BaseHandlerReader`:
    - Added `resolve_source_paths` and `load_all_resources` methods.
    - Simplified source handling for single and multiple files.
    - Cleaned up redundant and dataset-specific logic.
  - Introduced `HDXHandler` as unified orchestration layer using factory methods.
- Refactor of RelativeWealthIndex (RWI) module:
  - Added new `RWIHandler` class aligned with `HDXHandler` and `BaseHandler`.
  - Simplified class names: `RWIDownloader` and `RWIReader`.
  - Enhanced configuration with `latest_only` flag to select newest resources automatically.
  - Simplified resource filtering and country resolution logic.
  - Improved code maintainability, type hints, and error handling.
- New raster multi-band support in TifProcessor:
  - Added new `multi` mode for handling multi-band raster datasets.
  - Automatic band name detection from raster metadata.
  - Added strict mode validation (`single`, `rgb`, `rgba`, `multi`).
  - Enhanced error handling for invalid modes and band counts.
Fixed¶
- Fixed GHSL tiles loading behavior for correct coordinate system handling:
  - Moved `TILES_URL` formatting and tile loading to `validate_configuration`.
  - Ensures proper tile loading after CRS validation.
Documentation¶
- Updated and standardized API references across documentation.
- Standardized handler method names and usage examples.
- Added building enrichment examples for POI processing.
- Updated installation instructions.
Deprecated¶
- Deprecated direct imports from individual handler modules.
[v0.6.2] - 2025-06-11¶
Added¶
- New `ROOT_DATA_DIR` configuration option to set a base directory for all data tiers
  - Can be configured via environment variable `ROOT_DATA_DIR` or `.env` file
  - Defaults to current directory (`.`) if not specified
  - All tier data paths (bronze, silver, gold, views) are now constructed relative to this root directory
  - Example: Setting `ROOT_DATA_DIR=/data/gigaspatial` will store all data under `/data/gigaspatial/bronze`, `/data/gigaspatial/silver`, etc.
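The path construction described above can be sketched as follows. This is not the package's `config.py`: `tier_paths` is a hypothetical function, and the `gold` and `views` directory names are assumed to mirror the tier names, since the entry only spells out the `bronze` and `silver` paths.

```python
import os
from pathlib import PurePosixPath
from typing import Dict, Mapping, Optional

TIERS = ("bronze", "silver", "gold", "views")  # tier names from the entry above


def tier_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, PurePosixPath]:
    """Build one path per tier relative to ROOT_DATA_DIR (default: '.')."""
    env = os.environ if env is None else env
    root = PurePosixPath(env.get("ROOT_DATA_DIR", "."))
    return {tier: root / tier for tier in TIERS}


# Passing an explicit mapping keeps the example independent of the real environment.
paths = tier_paths({"ROOT_DATA_DIR": "/data/gigaspatial"})
default_paths = tier_paths({})
```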
Fixed¶
- Fixed URL formatting in GHSL tiles by using Enum value instead of Enum member
  - Ensures consistent URL formatting with numeric values (4326) instead of Enum names (WGS84)
  - Fixes URL formatting issue across different Python environments
- Refactored GHSL downloader to follow DataStore abstraction
  - Directory creation is now handled by DataStore implementation
  - Removed redundant directory creation logic from download_data_unit method
  - Improves separation of concerns and makes the code more maintainable
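The difference between formatting an Enum member and its value is shown below. The `CoordSystem` class and the URL template are placeholders invented for the example, not the GHSL handler's actual definitions.

```python
from enum import Enum


class CoordSystem(Enum):  # hypothetical stand-in for the handler's CRS enum
    WGS84 = 4326
    MOLLWEIDE = 54009


TEMPLATE = "https://example.invalid/tiles_{crs}.zip"  # placeholder URL

# Formatting the member inserts its name-based string representation.
by_member = TEMPLATE.format(crs=CoordSystem.WGS84)

# Formatting the value inserts the numeric code.
by_value = TEMPLATE.format(crs=CoordSystem.WGS84.value)
```

Using `.value` explicitly keeps the result independent of how a given Python version renders Enum members in format strings, which may differ for enums with a mixed-in data type.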
[v0.6.1] - 2025-06-09¶
Fixed¶
- Gracefully handle missing or invalid GeoRepo API key in `AdminBoundaries.create()`:
  - Wrapped `GeoRepoClient` initialization in a `try-except` block
  - Added fallback to GADM if GeoRepo client fails
  - Improved logging for better debugging and transparency
[v0.6.0] - 2025-06-09¶
Added¶
POI View Generator¶
- `map_zonal_stats`: New method for enriched spatial mapping with support for:
  - Raster point sampling (value at POI location)
  - Raster zonal statistics (with buffer zone)
  - Polygon aggregation (with optional area-weighted averaging)
- Auto-generated POI IDs in `_init_points_gdf` for consistent point tracking.
- Support for area-weighted aggregation for polygon-based statistics.
BaseHandler Orchestration Layer¶
- New abstract `BaseHandler` class providing unified lifecycle orchestration for config, downloader, and reader.
- High-level interface methods:
  - `ensure_data_available()`
  - `load_data()`
  - `download_and_load()`
  - `get_available_data_info()`
- Integrated factory pattern for safe and standardized component creation.
- Built-in context manager support for resource cleanup.
- Fully backwards compatible with existing handler architecture.
Handlers Updated to Use BaseHandler¶
- `GoogleOpenBuildingsHandler`
- `MicrosoftBuildingsHandler`
- `GHSLDataHandler`
- All now inherit from `BaseHandler`, supporting standardized behavior and cleaner APIs.
Changed¶
POI View Generator¶
- `map_built_s` and `map_smod` now internally use the new `map_zonal_stats` method.
- `tif_processors` renamed to `data` to support both raster and polygon inputs.
- Removed parameters:
  - `id_column` (now handled internally)
  - `area_column` (now automatically calculated)
Internals and Usability¶
- Improved error handling with clearer validation messages.
- Enhanced logging for better visibility during enrichment.
- More consistent use of coordinate column naming.
- Refined type hints and parameter documentation across key methods.
Notes¶
- Removed legacy POI generator classes and redundant `poi.py` file.
- Simplified imports and removed unused handler dependencies.
- All POI generator methods now include updated docstrings, parameter explanations, and usage examples.
- Added docs on the new `BaseHandler` interface and handler refactors.
[v0.5.0] - 2025-06-02¶
Changed¶
- Refactored data loading architecture:
  - Introduced dedicated reader classes for major datasets (Microsoft Global Buildings, Google Open Buildings, GHSL), each inheriting from a new `BaseHandlerReader`.
  - Centralized file existence checks and raster/tabular loading methods in `BaseHandlerReader`.
  - Improved maintainability by encapsulating dataset-specific logic inside each reader class.
- Modularized source resolution:
  - Each reader now supports resolving data by country, geometry, or individual points, improving code reuse and flexibility.
- Unified POI enrichment:
  - Merged all POI generators (Google Open Buildings, Microsoft Global Buildings, GHSL Built Surface, GHSL SMOD) into a single `PoiViewGenerator` class.
  - Supports flexible inputs: list of `(lat, lon)` tuples, list of dicts, DataFrame, or GeoDataFrame.
  - Maintains consistent internal state via `points_gdf`, updated after each mapping.
  - Enables chained enrichment of POI data using multiple datasets.
- Modernized internal data access:
  - All data loading now uses dedicated handler/reader classes, improving consistency and long-term maintainability.
Fixed¶
- Full DataStore integration:
  - Fixed `OpenCellID` and `HDX` handlers to fully support the `DataStore` abstraction.
  - All file reads, writes, and checks now use the configured `DataStore` (local or cloud).
  - Temporary files are only used during downloads; final data is always stored and accessed via the DataStore interface.
Removed¶
- Removed deprecated POI generator classes and the now-obsolete poi submodule. All enrichment is handled through the unified `PoiViewGenerator`.
Notes¶
- This release finalizes the architectural refactors started in `v0.5.0`.
- While marked stable, please report any issues or regressions from the new modular structure.
[v0.5.0b1] - 2025-05-27¶
Added¶
- New Handlers:
  - `hdx.py`: Handler for downloading and managing Humanitarian Data Exchange datasets.
  - `rwi.py`: Handler for the Relative Wealth Index dataset.
  - `opencellid.py`: Handler for OpenCellID tower locations.
  - `unicef_georepo.py`: Integration with UNICEF’s GeoRepo asset repository.
- Zonal Generators:
  - Introduced the `generators/zonal/` module to support spatial aggregations of various data types (points, polygons, rasters) to zonal geometries such as grid tiles or catchment areas.
- New Geo-Processing Methods:
  - Added methods to compute centroids of (Multi)Polygon geometries.
  - Added methods to calculate area of (Multi)Polygon geometries in square meters.
Changed¶
- Refactored:
  - `config.py`: Added support for new environment variables (OpenCellID and UNICEF GeoRepo keys).
  - `geo.py`: Enhanced spatial join functions for improved performance and clarity.
  - `handlers/`:
    - Minor robustness improvements in `google_open_buildings` and `microsoft_global_buildings`.
    - Added a new class method in `boundaries` for initializing admin boundaries from UNICEF GeoRepo.
  - `core/io/`:
    - Added `list_directories` method to both ADLS and local storage backends.
- Documentation & Project Structure:
  - Updated `.env_sample` and `.gitignore` to align with new environment variables and data handling practices.
Dependencies¶
- Updated `requirements.txt` and `setup.py` to reflect new dependencies and ensure compatibility.
Notes¶
- This is a pre-release (`v0.5.0b1`) and is intended for testing and feedback.
- Some new modules, especially in `handlers` and `generators`, are experimental and may be refined in upcoming releases.
[v0.4.1] - 2025-04-17¶
Added¶
- Documentation:
- Added API Reference documentation for all modules, classes, and functions.
- Added a Configuration Guide to explain how to set up paths, API keys, and other settings.
- TifProcessor: added new to_dataframe method.
- config: added set_path method for dynamic path management.
Changed¶
- Documentation:
- Restructured the `docs/` directory to improve organization and navigation.
- Updated the `index.md` for the User Guide to provide a clear overview of available documentation.
- Updated Examples for downloading, processing, and storing geospatial data - more to come.
- README:
- Updated the README with a clear description of the package’s purpose and key features.
- Added a section on View Generators to explain spatial context enrichment and mapping to grid or POI locations.
- Included a Supported Datasets section with an image of dataset provider logos.
Fixed¶
- Handled errors when processing nodes, relations, and ways in OSMLocationFetcher.
- Made `admin1` and `admin1_id_giga` optional in GigaEntity instances for countries with no admin level 1 divisions.
[v0.4.0] - 2025-04-01¶
Added¶
- POI View Generators: Introduced a new module, generators, containing a base class for POI view generation.
- Expanded POI Support: Added new classes for generating POI views from:
- Google Open Buildings
- Microsoft Global Buildings
- GHSL Settlement Model
- GHSL Built Surface
- New Reader: Added read_gzipped_json_or_csv to handle compressed JSON/CSV files.
Changed¶
- ADLSDataStore Enhancements: Updated methods to match LocalDataStore for improved consistency.
- Geo Processing Updates:
- Improved convert_to_dataframe for more efficient data conversion.
- Enhanced annotate_with_admin_regions to improve spatial joins.
- New TifProcessor Methods:
- sample_by_polygons for polygon-based raster sampling.
- sample_multiple_tifs_by_coordinates & sample_multiple_tifs_by_polygons to manage multi-raster sampling.
- Fixed Global Config Handling: Resolved issues with handling configurations inside classes.
[v0.3.2] - 2025-03-21¶
Added¶
- Added a method to efficiently assign unique IDs to features.
Changed¶
- Enhanced logging for better debugging and clarity.
Fixed¶
- Minor bug fix in config.py
[v0.3.1] - 2025-03-20¶
Added¶
- Enhanced AdminBoundaries handler with improved error handling for cases where administrative level data is unavailable for a country.
- Added pyproject.toml and setup.py, enabling pip install support for the package.
- Introduced a new method annotate_with_admin_regions in geo.py to perform spatial joins between input points and administrative boundaries (levels 1 and 2), handling conflicts where points intersect multiple admin regions.
Removed¶
- Removed the utils module containing logger.py and integrated LOG_FORMAT and get_logger into config.py for a more streamlined logging approach.
[v0.3.0] - 2025-03-18¶
Added¶
- Compression support in readers for improved efficiency
- New GHSL data handler to manage GHSL dataset downloads
Fixed¶
- Small fixes/improvements in Microsoft Buildings, Maxar, and Overture handlers
[v0.2.2] - 2025-03-12¶
- Refactored Handlers: Improved structure and performance of maxar_image.py, osm.py and overture.py to enhance geospatial data handling.
- Documentation Improvements:
  - Updated index.md, advanced.md, and use-cases.md for better clarity.
  - Added installation.md under docs/getting-started for setup guidance.
  - Refined API documentation in docs/api/index.md.
- Configuration & Setup Enhancements:
  - Improved .gitignore to exclude unnecessary files.
  - Updated mkdocs.yml for better documentation structuring.
- Bug Fixes & Minor Optimizations: Small fixes and improvements across the codebase for stability and maintainability.
[v0.2.1] - 2025-02-28¶
Added¶
- Introduced WorldPopDownloader feature to handlers
- Refactored TifProcessor class for better performance
Fixed¶
- Minor bug fixes and performance improvements
[v0.2.0] - MaxarImageDownloader & Bug Fixes - 2025-02-24¶
- New Handler: MaxarImageDownloader for downloading Maxar images.
- Bug Fixes: Various improvements and bug fixes.
- Enhancements: Minor optimizations in handlers.
[v0.1.1] - 2025-02-24¶
Added¶
- Local Data Store: Introduced a new local data store alongside ADLS to improve data storage and read/write functionality.
- Boundaries Handler: Added `boundaries.py`, a new handler for reading administrative boundaries from GADM.
Changed¶
- Handler Refactoring: Refactored existing handlers to improve modularity and data handling.
- Configuration Management: Added `config.py` to manage paths, runtime settings, and environment variables.
Removed¶
- Administrative Schema: Removed `administrative.py` since its functionality is now handled by the `boundaries` handler.
- Globals Module: Removed `globals.py` and replaced it with `config.py` for better configuration management.
Updated Files¶
- `config.py`
- `boundaries.py`
- `google_open_buildings.py`
- `mapbox_image.py`
- `microsoft_global_buildings.py`
- `ookla_speedtest.py`
- `mercator_tiles.py`
- `adls_data_store.py`
- `data_store.py`
- `local_data_store.py`
- `readers.py`
- `writers.py`
- `entity.py`
[v0.1.0] - 2025-02-07¶
Added¶
- New data handlers: `google_open_buildings.py`, `microsoft_global_buildings.py`, `overture.py`, `mapbox_image.py`, `osm.py`
- Processing functions in `tif_processor.py`, `geo.py` and `transform.py`
- Grid generation modules: `h3_tiles.py`, `mercator_tiles.py`
- View managers: `grid_view.py` and `national_view.py`
- Schemas: `administrative.py`
Changed¶
- Updated `requirements.txt` with new dependencies
- Improved logging and data storage mechanisms
Removed¶
- Deprecated views: `h3_view.py`, `mercator_view.py`