Skip to content

GeoZarr: optimal chunking and multiscales strategy for web tiles and analysis #45

@turban

Description

@turban

Background

GeoZarr datasets need a well-considered chunking and multiscales strategy that serves two primary use cases:

  1. Web map tiles — small, spatially aligned reads; benefits from smaller chunks and pre-computed overview levels
  2. Analysis — larger contiguous reads across time or space; benefits from larger chunks

Reference discussion in the GeoZarr spec: zarr-developers/geozarr-spec#117

Chunking / sharding

  • Tile-optimised chunk sizes (e.g. 256×256) create many small reads for analysis workloads
  • Analysis-optimised sizes (e.g. 1024×1024+) are inefficient for tile rendering
  • Proposal: use 512×512 spatial chunks as a pragmatic compromise — aligns well with common tile sizes while keeping chunk count manageable for analysis reads
  • Evaluate whether Zarr v3 sharding (inner/outer chunk split) could let us satisfy both use cases from a single store without duplication

Questions to settle:

  • Is 512×512 the right default, or should it be determined per-dataset based on grid resolution and typical query patterns?
  • Should we chunk along the time axis, and if so at what granularity?

Multiscales (overview pyramids)

Multiscales add storage and ingestion overhead and are only meaningful for web map rendering. They should not be generated unconditionally.

Proposal: decide whether to generate multiscales, and how many levels, based on the spatial extent of the ingested grid:

  • If the native grid is small enough to render at full resolution at zoom level N, skip multiscales entirely
  • Otherwise, generate only as many levels as needed to reach a sensible "fit-in-one-tile" zoom level
  • A concrete heuristic: number of levels ≈ ceil(log2(max(x_dim, y_dim) / 512)) — i.e. how many halvings are needed before the dataset fits in a single 512×512 tile

This avoids storing 6–8 pyramid levels for small regional datasets that will never be viewed at global zoom.

Open questions

  • Agree on default spatial chunk size (512×512 proposed)
  • Define the multiscales decision rule and level-count formula
  • Decide whether sharding (Zarr v3) is in scope now or a future optimisation
  • Document the chosen strategy in the ingestion pipeline so it is applied consistently

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions