gridstats is an Oceanum library for computing gridded statistics over large oceanographic and climate datasets. Pipelines are defined in YAML and run as a single CLI command. Computation is lazy and out-of-core via xarray and dask, so datasets of arbitrary size are handled without loading them into memory.
- YAML-driven pipelines — source, operations, and output are all declared in one config file
- Out-of-core — processes arbitrarily large grids lazily; spatial `tiles:` keeps peak memory bounded
- Rich stat library — aggregations, quantiles, exceedance, return period values, directional stats, distributions, and more
- Multiple output formats — NetCDF or Zarr
- CF-compliant — output variables are automatically annotated with standard names, units, and long names
- Extensible — register custom stat functions and loaders via decorator or package entry point
Requires Python ≥ 3.10.
Install with pip:

    pip install gridstats

For intake catalog support and cloud storage (GCS):

    pip install "gridstats[extra]"

Declare a pipeline in a YAML file:

    # stats.yml
    source:
      type: xarray
      urlpath: /data/hindcast/waves.zarr
      engine: zarr
      sel:
        time: {start: "2000-01-01", stop: "2020-12-31"}
        latitude: {start: -50, stop: -30}
        longitude: {start: 160, stop: 180}

    output:
      outfile: ./wave_stats.zarr

    calls:
      - func: mean
        dim: time
        data_vars: [hs, tp]

      - func: quantile
        dim: time
        data_vars: [hs]
        q: [0.5, 0.90, 0.95, 0.99]

      - func: rpv
        dim: time
        data_vars: [hs]
        return_periods: [10, 50, 100]
        distribution: gumbel_r

Run the pipeline:

    gridstats run stats.yml

The output dataset will contain variables like `hs_mean`, `tp_mean`, `hs_quantile`, and `hs_rpv`, each with CF-standard attributes.
Inspect the result with xarray:

    import xarray as xr

    ds = xr.open_zarr("wave_stats.zarr")
    print(ds)

| Function | Description |
|---|---|
| `mean`, `max`, `min`, `std`, `count` | Basic aggregations |
| `quantile` | Quantiles at arbitrary probability levels |
| `pcount` | Count of non-NaN values per grid cell |
| `exceedance` / `nonexceedance` | Probability of exceeding a threshold |
| `range_probability` | Probability of a value falling in a range |
| `rpv` | Return period values via extreme value fitting |
| `distribution2` / `distribution3` | 2-D and 3-D joint histograms |
| `statdir` | Directional statistics (sector-binned) |
| `hmo` | Significant wave height from spectral moments |
| `winpow` | Wind power density |
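
To make the `rpv` row more concrete, here is a minimal sketch of how a return period value can be derived from a Gumbel fit. It is not the gridstats implementation: it uses a simple method-of-moments fit in place of whatever fitting routine the library applies for `gumbel_r`, and the helper names (`fit_gumbel_moments`, `return_value`, `gumbel_cdf`) and the annual maxima are made up for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Estimate Gumbel (right-skewed) location and scale by the method of moments.

    Assumption: this replaces a proper maximum-likelihood fit for brevity.
    """
    mean = statistics.fmean(maxima)
    std = statistics.stdev(maxima)
    scale = std * math.sqrt(6) / math.pi   # from var = (pi * scale)^2 / 6
    loc = mean - EULER_GAMMA * scale       # from mean = loc + gamma * scale
    return loc, scale


def gumbel_cdf(x, loc, scale):
    """Non-exceedance probability F(x) = exp(-exp(-(x - loc) / scale))."""
    return math.exp(-math.exp(-(x - loc) / scale))


def return_value(loc, scale, period):
    """Level whose annual exceedance probability is 1 / period (period > 1)."""
    non_exceedance = 1.0 - 1.0 / period
    return loc - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maxima of significant wave height at one grid cell (metres).
annual_max_hs = [6.1, 7.4, 5.8, 8.2, 6.9, 7.1, 6.4, 9.0, 7.7, 6.6]

loc, scale = fit_gumbel_moments(annual_max_hs)
rpv = {period: return_value(loc, scale, period) for period in (10, 50, 100)}

for period, level in rpv.items():
    print(f"{period:>3}-year hs: {level:.2f} m")
```

In a gridded setting the same calculation would be repeated independently for every grid cell, which is why a pipeline of this kind benefits from lazy, chunked evaluation.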
All calls accept a `group:` key (`month`, `season`, `year`) to compute statistics per calendar period.
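
Conceptually, grouping by month pools samples by calendar month across all years before the statistic is applied. The sketch below shows that idea in plain Python with made-up data. The `monthly_mean` helper and the sample values are illustrative assumptions and not part of gridstats, which presumably relies on xarray's own group-by machinery rather than anything like this.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean

# Hypothetical (timestamp, hs) samples for a single grid cell.
samples = [
    (date(2000, 1, 15), 2.0),
    (date(2000, 7, 15), 4.0),
    (date(2001, 1, 15), 3.0),
    (date(2001, 7, 15), 5.0),
]


def monthly_mean(samples):
    """Average values per calendar month, pooling the same month across years."""
    buckets = defaultdict(list)
    for day, value in samples:
        buckets[day.month].append(value)
    return {month: fmean(values) for month, values in sorted(buckets.items())}


result = monthly_mean(samples)
print(result)  # {1: 2.5, 7: 4.5}
```

The result has one entry per calendar month rather than one per time step, so a grouped statistic over many years of data still yields at most twelve monthly values per grid cell.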
Example `calls` entries using grouping and spatial tiling:

    calls:
      # Monthly mean
      - func: mean
        dim: time
        data_vars: [hs]
        group: month

      # Quantile with spatial tiling to control memory
      - func: quantile
        dim: time
        data_vars: [hs]
        q: [0.95, 0.99]
        chunks: {time: -1, latitude: 50, longitude: 50}
        tiles: {latitude: 10, longitude: 10}

Register a custom stat function in your own package:

    from gridstats.registry import register_stat
    import xarray as xr


    @register_stat("my_stat")
    def my_stat(data: xr.Dataset, *, dim: str = "time", **kwargs) -> xr.Dataset:
        ...

Or declare it as a package entry point so it is discovered automatically:

    [project.entry-points."gridstats.stats"]
    my_stat = "my_package.stats:my_stat"

Command-line usage:

    Usage: gridstats [OPTIONS] COMMAND [ARGS]...

    Commands:
      run         Run a stats pipeline from a YAML configuration file.
      list-stats  List all registered stat functions.
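
For readers wondering how a decorator-based registry can feed a command such as `list-stats`, the following is a generic sketch of the pattern. Everything in it is hypothetical: `STAT_REGISTRY`, `register_toy_stat`, `list_stats`, and `value_range` are stand-ins written for this example and do not reflect the actual internals of `gridstats.registry`.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a stat name to its function.
STAT_REGISTRY: Dict[str, Callable] = {}


def register_toy_stat(name: str) -> Callable[[Callable], Callable]:
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func: Callable) -> Callable:
        if name in STAT_REGISTRY:
            raise ValueError(f"stat {name!r} is already registered")
        STAT_REGISTRY[name] = func
        return func  # leave the function itself unchanged
    return decorator


@register_toy_stat("value_range")
def value_range(values, **kwargs):
    """Toy stat: difference between the largest and smallest value."""
    return max(values) - min(values)


def list_stats():
    """Names of all registered stats, sorted for stable output."""
    return sorted(STAT_REGISTRY)


print(list_stats())                                   # ['value_range']
print(STAT_REGISTRY["value_range"]([1.5, 4.0, 2.5]))  # 2.5
```

Because registration happens as a side effect of importing the module that defines the function, a registry like this only sees stats from modules that have actually been imported. Entry points address that by letting a host package discover and import plugin modules on its own.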
MIT — see LICENSE.