fix: scope cache files by extent_id and validate spatial coverage#88
Merged
Conversation
Separate deployment instances (e.g. Norway vs Sierra Leone) sharing
the same DOWNLOAD_DIR would silently reuse each other's NetCDF/Zarr
cache files because the prefix was keyed only on dataset id. Add an
optional extent_id suffix so each extent gets its own cache namespace.
Validate bbox against a dataset's declared coverage field before
downloading, returning HTTP 400 early instead of a confusing
provider-level error. Add coverage: {lat: [-50, 50]} to chirps3.yaml
since CHIRPS3 does not cover latitudes above 50°N (e.g. Norway).
Aligns dataset YAML schema with OGC API Collections by replacing the custom coverage.lat/lon block with extents.spatial.bbox (OGC [xmin, ymin, xmax, ymax] format) and adding extents.temporal with begin, end, trs, and resolution fields. _validate_spatial_coverage now reads extents.spatial.bbox directly, which covers both axes in one check without separate lat/lon keys. All three dataset templates receive extents blocks.
Each configured instance must now declare data_dir in climate-api.yaml. The API raises a clear error at startup if a config file is present but data_dir is not set, rather than silently falling back to a shared XDG directory that another instance might also use. Resolution order for the data directory: 1. CACHE_OVERRIDE env var — preserved for Docker/CI backward compat 2. data_dir from CLIMATE_API_CONFIG — required when config is present 3. XDG default — only used when no config file is configured Extent_id remains in cache filenames to support future multi-extent configurations within a single instance.
…lback Data directory resolution now uses data_dir from climate-api.yaml (required when a config file is present) with a clean XDG fallback. The legacy CACHE_OVERRIDE environment variable is gone from all resolver functions, tests, and .env.example.
Clarifies the distinction from data_dir (runtime storage) — templates_dir points at user-supplied YAML templates and will cover both dataset and processing templates going forward.
templates_dir now acts as a root directory. Dataset templates go in templates_dir/datasets/, leaving room for processing/ and other template types alongside it without structural changes.
Contributor
There was a problem hiding this comment.
Pull request overview
Updates Climate API instance isolation by requiring an explicit per-deployment data_dir, removes the legacy CACHE_OVERRIDE env var, and adds dataset-template extents metadata to validate request bounding boxes before calling upstream providers.
Changes:
- Add
data_dirconfig resolution (required when a config file exists) and migrate download/artifact/pygeoapi directories to be scoped under it (with XDG fallback when no config is present). - Rename
datasets_dir→templates_dirand change custom template loading totemplates_dir/datasets/. - Introduce dataset template
extentsand validate ingestion request bboxes againstextents.spatial.bbox(HTTP 400 when fully outside).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/climate_api/config.py |
Adds get_data_dir() to resolve required instance-scoped data directory. |
src/climate_api/data_manager/services/downloader.py |
Uses data_dir for downloads and adds spatial coverage validation based on template extents. |
src/climate_api/ingestions/services.py |
Moves artifacts directory under data_dir (XDG fallback otherwise). |
src/climate_api/publications/services.py |
Moves pygeoapi working directory under data_dir (XDG fallback otherwise). |
src/climate_api/data_registry/services/datasets.py |
Renames config key to templates_dir and loads custom datasets from templates_dir/datasets/. |
src/climate_api/data/datasets/chirps3.yaml |
Adds OGC-aligned extents metadata (with restricted spatial bbox). |
src/climate_api/data/datasets/era5_land.yaml |
Adds OGC-aligned extents metadata for both ERA5-Land templates. |
src/climate_api/data/datasets/worldpop.yaml |
Adds OGC-aligned extents metadata for WorldPop. |
tests/conftest.py |
Updates test instance config to include required data_dir. |
tests/test_config.py |
Adds tests for get_data_dir() behavior and updates to templates_dir semantics. |
tests/test_downloader.py |
Updates directory resolution tests and adds coverage validation tests. |
tests/test_publications.py |
Updates pygeoapi dir resolution tests to use data_dir. |
tests/test_datasets.py |
Adjusts test setup for Zarr path monkeypatching. |
docs/setup_guide.md |
Documents required data_dir and updates references to templates_dir. |
docs/adding_custom_datasets.md |
Updates guidance for templates_dir and documents extents blocks. |
climate-api.yaml.example |
Updates example config: adds required data_dir and renames templates_dir. |
.env.example |
Removes CACHE_OVERRIDE and updates comments for templates_dir. |
Comments suppressed due to low confidence (1)
src/climate_api/data_manager/services/downloader.py:76
- Spatial coverage validation is performed before the effective bbox is resolved via
_resolve_bbox(). IfbboxisNone(e.g. relying onDOWNLOAD_BBOX),_validate_spatial_coverage()will currently skip validation and the request can still reach the provider. Consider resolving the bbox (when applicable) first and validating the resolved bbox so all request paths get the intended early 400 response.
_validate_spatial_coverage(dataset, bbox)
ingestion = dataset["ingestion"]
eo_download_func_path = ingestion["function"]
eo_download_func = _get_dynamic_function(eo_download_func_path)
before_files = {path.resolve(): path.stat().st_mtime_ns for path in get_cache_files(dataset)}
params = dict(ingestion.get("default_params", {}))
params.update(
{
"start": start,
"end": end or datetime.date.today().isoformat(),
"dirname": DOWNLOAD_DIR,
"prefix": _get_cache_prefix(dataset),
"overwrite": overwrite,
}
)
sig = inspect.signature(eo_download_func)
try:
if "bbox" in sig.parameters:
params["bbox"] = _resolve_bbox(bbox=bbox)
if "country_code" in sig.parameters:
- Validate request bbox against extents using the env fallback (DOWNLOAD_BBOX) when no explicit bbox is provided, so coverage checks apply to all request paths - Guard against malformed template extents.spatial.bbox (non-list or wrong length) to avoid a 500 on user-supplied templates - Update get_data_dir() docstring to accurately describe the None-on-missing-file behaviour introduced for CI safety
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Breaking changes
Summary
Background
Discovered during Norway instance setup: WorldPop data downloaded for Sierra Leone was silently reused for Norway (wrong bounding box), and a CHIRPS3 ingest for Norway returned an HTTP 403 with no clear message. The root fix is directory isolation per instance via `data_dir`.
Test plan