Background
GeoZarr datasets need a well-considered chunking and multiscales strategy that serves two primary use cases:
- Web map tiles — small, spatially aligned reads; benefits from smaller chunks and pre-computed overview levels
- Analysis — larger contiguous reads across time or space; benefits from larger chunks
Reference discussion in the GeoZarr spec: zarr-developers/geozarr-spec#117
Chunking / sharding
- Tile-optimised chunk sizes (e.g. 256×256) create many small reads for analysis workloads
- Analysis-optimised sizes (e.g. 1024×1024+) are inefficient for tile rendering
- Proposal: use 512×512 spatial chunks as a pragmatic compromise — aligns well with common tile sizes while keeping chunk count manageable for analysis reads
- Evaluate whether Zarr v3 sharding (inner/outer chunk split) could let us satisfy both use cases from a single store without duplication
Questions to settle:
- Is 512×512 the right default, or should it be determined per-dataset based on grid resolution and typical query patterns?
- Should we chunk along the time axis, and if so at what granularity?
Multiscales (overview pyramids)
Multiscales add storage and ingestion overhead and are only meaningful for web map rendering. They should not be generated unconditionally.
Proposal: decide whether to generate multiscales, and how many levels, based on the spatial extent of the ingested grid:
- If the native grid is small enough to render at full resolution at zoom level N, skip multiscales entirely
- Otherwise, generate only as many levels as needed to reach a sensible "fit-in-one-tile" zoom level
- A concrete heuristic: number of levels ≈
ceil(log2(max(x_dim, y_dim) / 512)) — i.e. how many halvings are needed before the dataset fits in a single 512×512 tile
This avoids storing 6–8 pyramid levels for small regional datasets that will never be viewed at global zoom.
Open questions
Background
GeoZarr datasets need a well-considered chunking and multiscales strategy that serves two primary use cases:
Reference discussion in the GeoZarr spec: zarr-developers/geozarr-spec#117
Chunking / sharding
Questions to settle:
Multiscales (overview pyramids)
Multiscales add storage and ingestion overhead and are only meaningful for web map rendering. They should not be generated unconditionally.
Proposal: decide whether to generate multiscales, and how many levels, based on the spatial extent of the ingested grid:
ceil(log2(max(x_dim, y_dim) / 512))— i.e. how many halvings are needed before the dataset fits in a single 512×512 tileThis avoids storing 6–8 pyramid levels for small regional datasets that will never be viewed at global zoom.
Open questions