feat: extensible transforms pipeline for zarr build #86

Closed
turban wants to merge 51 commits into restore/pypi-release from restore/transforms-pipeline

@turban
Contributor

@turban turban commented May 9, 2026

Closes #79.

Summary

  • Replaces the hardcoded _UNIT_CONVERSIONS dict and pre_process list with a single transforms pipeline in the dataset YAML
  • Each entry is a dotted-path callable (string or {function, params} dict), resolved at runtime the same way ingestion.function works
  • Adds climate_api/transforms/ with two built-in transforms: convert_units and deaccumulate_era5
  • Updates era5_land.yaml to use transforms: for both temperature and precipitation datasets
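
The resolution step described in the summary can be sketched as follows. This is a minimal illustration, not the code from the PR: the names `resolve_callable` and `run_transforms` are hypothetical, and it assumes each transform takes the data as its first positional argument and any `params` as keyword arguments.

```python
from importlib import import_module
from typing import Any, Callable, Sequence, Union

# A pipeline entry is either a dotted path or a {function, params} mapping.
TransformSpec = Union[str, dict]


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Import 'package.module.name' and return the attribute 'name'."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    func = getattr(import_module(module_path), attr)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} does not resolve to a callable")
    return func


def run_transforms(data: Any, transforms: Sequence[TransformSpec]) -> Any:
    """Apply each transform in order, feeding the result into the next one."""
    for spec in transforms:
        if isinstance(spec, str):
            path, params = spec, {}
        else:
            # Assumed dict shape: {"function": "...", "params": {...}}
            path, params = spec["function"], spec.get("params", {})
        data = resolve_callable(path)(data, **params)
    return data
```

Because resolution goes through the import system, any installed package can supply a transform; the pipeline runner never needs to know about it in advance.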

Usage

transforms:
  - climate_api.transforms.deaccumulate_era5
  - climate_api.transforms.convert_units

External transforms from dhis2eo or any other package can be referenced by dotted path without changes to core code.
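
As a rough idea of what an external transform might look like, here is a hedged sketch. The function `scale_and_offset`, its parameters, and the module path `my_package.transforms` are all hypothetical; the built-in transforms presumably operate on array datasets rather than plain lists, which are used here only to keep the example self-contained.

```python
from typing import Iterable, List


def scale_and_offset(
    values: Iterable[float], scale: float = 1.0, offset: float = 0.0
) -> List[float]:
    """Hypothetical transform: apply value * scale + offset to every element.

    The data comes first and the configuration arrives as keyword arguments,
    mirroring the assumed {function, params} entry shape.
    """
    return [v * scale + offset for v in values]


# The parsed pipeline entry that would reference this function, assuming it
# lives in an installed package at my_package/transforms.py (hypothetical):
example_entry = {
    "function": "my_package.transforms.scale_and_offset",
    "params": {"offset": -273.15},  # e.g. Kelvin to degrees Celsius
}
```

Keeping the configuration in `params` lets one function serve several datasets with different settings.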

Test plan

  • uv run pytest tests/test_transforms.py — 12 new tests covering unit conversion, deaccumulation, pipeline execution, and edge cases
  • uv run pytest — full suite passes

Stacked on #85. Restores PR #80, which was accidentally merged then reverted.

turban added 30 commits May 8, 2026 00:40
Add a display block to each built-in dataset template (colormap, value
range, nodata) and surface it through the STAC Render extension on every
published collection. Add a /maps endpoint that serves a single-page map
viewer: it reads the STAC catalog to list available datasets, loads the
Render and Datacube metadata to configure a ZarrLayer (MapLibre +
@carbonplan/zarr-layer), and builds a time slider from cube:dimensions.
No tile server or build step required — the browser reads the Zarr store
directly via the existing /zarr HTTP range endpoint.

Closes #66

Implements issue #72. Adds a server-rendered management page at GET /manage
that lets operators ingest and sync datasets without needing to know API
endpoint details or dataset template IDs.

- GET /manage renders a Jinja2 page with an ingest form (template dropdown,
  start/end dates, extent pre-filled) and a status table with per-dataset
  Sync buttons; flash messages show success or error after each operation
- POST /manage/ingest handles the ingest form and redirects back to /manage
- POST /manage/sync handles the sync form and redirects back to /manage
- Landing page gains an "Available dataset templates" card listing all
  registered templates and a Manage link in the Explore section

Pass opacity: 0.75 to ZarrLayer so the basemap shows through the data
layer. Also wire renders.nodata from the STAC collection into ZarrLayer's
fillValue so dataset-specific nodata pixels render as transparent.

Shows a gradient bar with min/max labels and units when a dataset is
selected. Legend is built from the same colormap and clim range used
by the data layer, sourced from the STAC renders block.

Replaces the OSM raster style with OpenFreeMap's positron vector style.
The data layer is inserted before the first symbol layer so country
borders, road labels, and place names always render on top of the
climate data.
turban added 21 commits May 9, 2026 15:35
feat: map viewer at /maps with STAC-backed display metadata
Separate deployment instances (e.g. Norway vs Sierra Leone) sharing
the same DOWNLOAD_DIR would silently reuse each other's NetCDF/Zarr
cache files because the prefix was keyed only on dataset id. Add an
optional extent_id suffix so each extent gets its own cache namespace.
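
The namespacing idea in this commit could look roughly like the sketch below. The function name `cache_prefix` and the underscore separator are assumptions made for illustration; the commit only states that an optional extent_id suffix is added to the dataset-id prefix.

```python
from typing import Optional


def cache_prefix(dataset_id: str, extent_id: Optional[str] = None) -> str:
    """Build a cache filename prefix that is unique per dataset and extent.

    Without an extent_id the prefix is the dataset id alone, so existing
    single-extent setups keep their current file names.
    """
    if extent_id:
        return f"{dataset_id}_{extent_id}"
    return dataset_id
```

Two instances that share a download directory but use different extent ids then write to disjoint sets of files.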

Validate bbox against a dataset's declared coverage field before
downloading, returning HTTP 400 early instead of a confusing
provider-level error. Add coverage: {lat: [-50, 50]} to chirps3.yaml
since CHIRPS3 does not cover latitudes above 50°N (e.g. Norway).
Aligns dataset YAML schema with OGC API Collections by replacing the
custom coverage.lat/lon block with extents.spatial.bbox (OGC [xmin,
ymin, xmax, ymax] format) and adding extents.temporal with begin, end,
trs, and resolution fields.

_validate_spatial_coverage now reads extents.spatial.bbox directly,
which covers both axes in one check without separate lat/lon keys.
All three dataset templates receive extents blocks.
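
A containment check of the kind described here can be sketched as below. This is illustrative only: `bbox_within` and `check_spatial_coverage` are hypothetical names, the bbox is assumed to be a flat [xmin, ymin, xmax, ymax] list, and a `ValueError` stands in for the HTTP 400 response.

```python
from typing import Mapping, Sequence


def bbox_within(request: Sequence[float], coverage: Sequence[float]) -> bool:
    """Return True if the request bbox lies fully inside the coverage bbox."""
    rx0, ry0, rx1, ry1 = request
    cx0, cy0, cx1, cy1 = coverage
    return cx0 <= rx0 and cy0 <= ry0 and rx1 <= cx1 and ry1 <= cy1


def check_spatial_coverage(request_bbox: Sequence[float], template: Mapping) -> None:
    """Raise ValueError when the request falls outside the declared extent."""
    coverage = template.get("extents", {}).get("spatial", {}).get("bbox")
    if coverage is None:
        return  # template declares no spatial extent, so nothing to check
    if not bbox_within(request_bbox, coverage):
        raise ValueError(
            f"bbox {list(request_bbox)} is outside dataset coverage {list(coverage)}"
        )
```

One bbox comparison covers both axes, which is why separate lat and lon keys are no longer needed.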
Each configured instance must now declare data_dir in climate-api.yaml.
The API raises a clear error at startup if a config file is present but
data_dir is not set, rather than silently falling back to a shared XDG
directory that another instance might also use.

Resolution order for the data directory:
1. CACHE_OVERRIDE env var — preserved for Docker/CI backward compat
2. data_dir from CLIMATE_API_CONFIG — required when config is present
3. XDG default — only used when no config file is configured

extent_id remains in cache filenames to support future multi-extent
configurations within a single instance.
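
The three-step resolution order listed in this commit might be expressed as in the sketch below. The function name `resolve_data_dir`, the use of XDG_CACHE_HOME, and the `climate-api` subdirectory are assumptions; the commit only says "XDG default" without naming the exact directory.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Mapping[str, str], config: Optional[Mapping]) -> Path:
    """Pick the data directory using the precedence described in the commit.

    `config` is the parsed config file, or None when no file is configured.
    """
    # 1. Explicit override, kept for Docker/CI setups.
    override = env.get("CACHE_OVERRIDE")
    if override:
        return Path(override)

    # 2. A config file is present: data_dir is mandatory.
    if config is not None:
        data_dir = config.get("data_dir")
        if not data_dir:
            raise RuntimeError("config file is present but data_dir is not set")
        return Path(data_dir)

    # 3. No config file at all: fall back to an XDG location (assumed here
    #    to be the cache directory with a hypothetical subdirectory name).
    xdg_root = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg_root) / "climate-api"
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps each branch easy to test.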
…lback

Data directory resolution now uses data_dir from climate-api.yaml (required
when a config file is present) with a clean XDG fallback. The legacy
CACHE_OVERRIDE environment variable is gone from all resolver functions,
tests, and .env.example.

Clarifies the distinction from data_dir (runtime storage) — templates_dir
points at user-supplied YAML templates and will cover both dataset and
processing templates going forward.

templates_dir now acts as a root directory. Dataset templates go in
templates_dir/datasets/, leaving room for processing/ and other template
types alongside it without structural changes.

- Validate request bbox against extents using the env fallback (DOWNLOAD_BBOX)
  when no explicit bbox is provided, so coverage checks apply to all request paths
- Guard against malformed template extents.spatial.bbox (non-list or wrong
  length) to avoid a 500 on user-supplied templates
- Update get_data_dir() docstring to accurately describe the None-on-missing-file
  behaviour introduced for CI safety
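
The guard against malformed template extents could be sketched along these lines. The helper name `declared_bbox` is hypothetical, and returning None for a malformed value (so the caller can skip or reject the check) is an assumption; the commit only states that such input no longer causes a 500.

```python
from typing import Any, Mapping, Optional, Tuple

BBox = Tuple[float, float, float, float]


def declared_bbox(template: Mapping[str, Any]) -> Optional[BBox]:
    """Return extents.spatial.bbox as four floats, or None if absent/malformed."""
    extents = template.get("extents")
    spatial = extents.get("spatial") if isinstance(extents, dict) else None
    bbox = spatial.get("bbox") if isinstance(spatial, dict) else None

    # Reject non-list values and lists of the wrong length.
    if not isinstance(bbox, (list, tuple)) or len(bbox) != 4:
        return None
    # Reject non-numeric entries (bool is excluded although it subclasses int).
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in bbox):
        return None
    return (float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3]))
```

Validating the shape up front means a bad user-supplied template yields a controlled result instead of an unpacking error deep in the request path.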
…e-validation

fix: scope cache files by extent_id and validate spatial coverage
@turban
Contributor Author

turban commented May 9, 2026

Included in #87

@turban turban closed this May 9, 2026