Processing Raster Files¶
This example demonstrates how to process raster files using the TifProcessor
class.
Prerequisites¶
Ensure you have installed the gigaspatial
package and set up the necessary configuration. Follow the Installation Guide if you haven’t already.
Example Code¶
from gigaspatial.processing import TifProcessor
from gigaspatial.core.io import LocalDataStore
from rasterio.warp import Resampling # For reprojection methods
# NOTE: For these examples, replace "/path/to/your/file.tif" with actual paths to your GeoTIFF files.
# You might need to create dummy files or use existing ones for local testing.
# 1. Initialize with a single TIFF file
print("--- Single TIFF File Processing ---")
single_processor = TifProcessor(
"/path/to/single_band.tif",
mode="single" # Can be "rgb", "rgba", "multi"
)
df_single = single_processor.to_dataframe()
print("Single-band DataFrame head:")
print(df_single.head())
print("Raster Info for single_processor:")
print(single_processor.get_raster_info())
# 2. Initialize with multiple TIFF files for merging and reprojection
print("\n--- Multi-raster Merging and Reprojection ---")
# Replace with actual paths to your tif files. Ensure they are compatible for merging.
# Example: two adjacent tiles from a dataset.
tif_paths = [
"/path/to/raster1.tif",
"/path/to/raster2.tif"
]
merged_reprojected_processor = TifProcessor(
dataset_path=tif_paths,
mode="single", # Or "multi", "rgb", "rgba" depending on your data
merge_method="mean", # Options: "first", "last", "min", "max", "mean"
target_crs="EPSG:4326", # Reproject all rasters to WGS84 during initialization
)
df_merged = merged_reprojected_processor.to_dataframe()
print("Merged and Reprojected DataFrame head:")
print(df_merged.head())
print("Raster Info for merged_reprojected_processor:")
print(merged_reprojected_processor.get_raster_info())
# 3. Explicit Reprojection after initialization
print("\n--- Explicit Reprojection ---")
# Reproject the current raster (e.g., the merged one) to a different CRS or resolution
# In a real scenario, you'd save this to a persistent location.
reprojected_output_path = "./temp_reprojected_raster.tif"
reprojected_path = merged_reprojected_processor.reproject_to(
target_crs="EPSG:3857", # Web Mercator
output_path=reprojected_output_path,
resampling_method=Resampling.bilinear # Different resampling method
)
print(f"Raster reprojected to: {reprojected_path}")
# 4. Convert raster to a graph (NetworkX example)
print("\n--- Raster to Graph Conversion ---")
# Assuming '/path/to/single_band.tif' is a suitable single-band raster
graph_processor = TifProcessor(
"/path/to/single_band.tif",
mode="single" # Graph conversion typically for single-band data
)
graph = graph_processor.to_graph(
connectivity=8, # 4-connectivity (von Neumann) or 8-connectivity (Moore)
include_coordinates=True, # Include 'x' and 'y' coordinates as node attributes
graph_type="networkx" # Or "sparse" for scipy.sparse.csr_matrix
)
print(f"Generated a NetworkX graph with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges.")
# Example: Access node attributes (first 5 nodes)
# for node_id, data in list(graph.nodes(data=True))[:5]:
# print(f"Node {node_id}: Value={data['value']:.2f}, X={data.get('x'):.2f}, Y={data.get('y'):.2f}")
Explanation¶
The TifProcessor
class provides robust functionalities for handling GeoTIFF files, from single-band to multi-band (RGB, RGBA) datasets, with advanced processing capabilities including:
- Initialization:
- Can be initialized with a single GeoTIFF file path.
- Supports a list of GeoTIFF file paths for automatic merging during initialization, configured via
merge_method
(first
,last
,min
,max
,mean
). - The
mode
parameter (single
,rgb
,rgba
,multi
) dictates how bands are interpreted and validated. target_crs
andreprojection_resolution
can be set during initialization to reproject rasters immediately to a consistent CRS and pixel size.
- Data Extraction:
to_dataframe()
: Converts raster data into a pandas DataFrame, with columns for longitude, latitude, and pixel values (or band-specific values for multi-band modes).to_geodataframe()
: Extendsto_dataframe()
by adding ageometry
column, converting each pixel into a GeoDataFrame representing its bounding box, with the correct CRS.
- Reprojection (
reproject_to
):- Allows explicit reprojection of the current raster to a new Coordinate Reference System (CRS) and/or resolution, saving the output to a specified path or a temporary file.
- Supports different
resampling_method
options (e.g.,Resampling.nearest
,Resampling.bilinear
).
- Raster Information (
get_raster_info
):- Provides a dictionary containing comprehensive metadata about the raster, such as band count, dimensions, CRS, bounds, transform, data types, nodata values, processing mode, and merge status.
- Graph Conversion (
to_graph
):- Converts raster data into a graph (NetworkX graph or SciPy sparse matrix) based on pixel adjacency.
- Supports
connectivity
of 4 (von Neumann neighborhood) or 8 (Moore neighborhood). - Can include geographic coordinates and pixel values as node attributes.
- Sampling:
sample_by_coordinates()
: Extracts pixel values at specific geographic coordinates.sample_by_polygons()
: Computes aggregate statistics (e.g., mean, sum, min, max, count) of pixel values within given polygon boundaries, supporting single or multiple statistics.sample_by_polygons_batched()
: Provides a parallelized version of polygon sampling for performance-intensive tasks.
Multi-raster reprojection¶
The differences in the reprojected metadata are expected and are a direct result of the order of operations: reproject then merge versus merge then reproject. The two processes follow different steps, leading to variations in the final raster's dimensions, bounds, and resolution.
Reproject then Merge¶
When you specify target_crs
at initialization, the code first reprojects each individual raster to the target CRS (EPSG:4326
) and then merges the reprojected outputs.
- Step 1: Reprojection: Each input raster is reprojected from
ESRI:54009
toEPSG:4326
. During this step,rasterio
'scalculate_default_transform
function computes a new transform and pixel dimensions (width
,height
) for each raster. The reprojected rasters are now in the same CRS with a consistent resolution (e.g.,0.00918...
degrees). - Step 2: Merging: The reprojected rasters, which are now in the same CRS and have similar resolutions, are merged. The
rasterio.merge
function can combine these aligned rasters seamlessly. The final output's dimensions are calculated by finding the union of all reprojected rasters' bounds and applying the shared resolution, resulting in a single, larger raster.
This process ensures a uniform resolution and grid alignment across all parts of the final merged raster.
Merge then Reproject¶
When target_crs
is not specified at initialization, the code first merges the two rasters in their original ESRI:54009
CRS and then reprojects the single, merged output to EPSG:4326
.
- Step 1: Merging: The two rasters are merged in
ESRI:54009
. Since they are in the same CRS and have the same resolution (1000.0
meters),rasterio.merge
can simply combine them side-by-side. The original raster was1000x1000
, so merging a second one next to it likely creates a2000x1000
raster, as seen in the metadata. The resolution remains1000.0
meters. - Step 2: Reprojection: The single
2000x1000
raster is then reprojected toEPSG:4326
. A new transform and pixel dimensions are calculated for this single, larger raster. Sincecalculate_default_transform
is working on a different-shaped input, it will calculate a different output resolution and grid shape. The resulting resolution (0.00973...
) and dimensions (2076x832
) will be different because the reprojection is performed on a single, larger input rather than two smaller ones.
Why the Metadata is Different¶
- Resolution: The
reproject-then-merge
approach maintains a consistent resolution that is calculated for a single tile and then applied to all. Themerge-then-reproject
approach calculates a single resolution for the entire, larger combined area. The process of resampling to a new grid (a core part of reprojection) is inherently sensitive to the input's size and shape. - Dimensions (
width
,height
): The final pixel dimensions are a function of the total bounds and the final resolution. Since the resolution is different in the two methods, the width and height must also be different to cover the same geographic area. - Bounds: The final bounds are nearly identical in latitude and longitude, which makes sense because both methods represent the same geographic area. Any slight differences are due to rounding and the nuances of resampling.
Conclusion: The differences are normal and reflect the non-commutative nature of these two geospatial operations. The reproject then merge approach is generally preferable as it ensures greater consistency and can be more accurate when dealing with rasters that have slightly different resolutions or alignments, as it creates a single, clean grid before combining the data.