diive is currently being prepared for the v1.0 release.

Time series data processing

diive is a Python library for time series processing, focused on ecosystem data. It was originally developed by the ETH Grassland Sciences group for Swiss FluxNet.

CHANGELOG | Releases


Citation

Cite diive using DOI 10.5281/zenodo.10884017. This concept DOI resolves to the latest release, so include the version number in your citation.

BibTeX format:

@software{diive2026,
  author = {Hörtnagl, Lukas},
  title = {diive: Python library for time series processing},
  version = {0.91.0},
  year = {2026},
  doi = {10.5281/zenodo.10884017}
}

Replace version and year with the values for your target release.


Installation

Requires Python 3.12+

pip install diive

Or with uv:

uv pip install diive

Quick start

import diive as dv

# Load example data
df = dv.load_exampledata_parquet()

# Plot time series
dv.plot_time_series(series=df['NEE']).plot()

# Gap-fill with Random Forest
from diive.core.ml.feature_engineer import FeatureEngineer
from diive.gapfilling.randomforest_ts import RandomForestTS

engineer = FeatureEngineer(target_col='NEE', features_lag=[-1, 1], features_rolling=[12, 24])
df_engineered = engineer.fit_transform(df)

model = RandomForestTS(input_df=df_engineered, target_col='NEE', n_estimators=100)
model.trainmodel()
model.fillgaps()

API

diive exposes its classes through the top-level namespace. Each class is available under its PascalCase name and under a snake_case alias:

import diive as dv

plot = dv.plot_time_series(series=data)   # snake_case alias
plot = dv.TimeSeries(series=data)         # PascalCase class name

| Area | Common exports |
|------|----------------|
| Plotting | TimeSeries, Cumulative, DielCycle, HeatmapDateTime |
| Gap-filling | RandomForestTS, XGBoostTS, FluxMDS |
| Analysis | GridAggregator, SeasonalTrendDecomposition |
| Eddy covariance | FluxProcessingChain, FluxDetectionLimit, WindDoubleRotation |
| I/O | load_parquet, save_parquet, load_exampledata_parquet |

For the full list, see diive.__all__.


Examples

104 runnable examples are organized by topic in examples/. They follow Sphinx Gallery format (# %% sections), so they run as plain scripts and convert to HTML docs automatically. Browse by use case in CATALOG.md, or check EXAMPLE_DATASET.md for documentation of the 37-variable dataset used throughout.

uv run python examples/visualization/plot_heatmap_datetime_basic.py
uv run python examples/analysis/analysis_daily_correlation.py
uv run python examples/gapfilling/gapfill_randomforest.py
uv run python examples/flux/fluxprocessingchain/fluxprocessingchain_composable.py
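For readers unfamiliar with the Sphinx Gallery layout mentioned above, here is a minimal sketch of what such a script can look like. The title, section names, and data are hypothetical and not taken from the diive examples; only the general structure (a module docstring with a reST title, followed by `# %%` cells whose comment lines become prose) reflects the format.

```python
"""
Cumulative sum of a short series
================================

Hypothetical example script in Sphinx Gallery layout. The module docstring
provides the page title and introduction.
"""

# %%
# Build the data
# --------------
# Comment lines directly after a ``# %%`` marker are rendered as text
# when the script is converted to HTML.
values = [1, 2, 3, 4]

# %%
# Compute the result
# ------------------
# Code between markers runs top to bottom, exactly as in a plain script.
from itertools import accumulate

cumulative = list(accumulate(values))
print(cumulative)
```

Because the markers are ordinary comments, the same file runs unchanged with `python` or `uv run python`.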

Features

Gap-filling

FeatureEngineer runs an 8-stage feature pipeline (lag features, rolling stats, differencing, EMA, polynomial terms, STL decomposition, timestamps, record numbering). You build the features once and reuse them across models.
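As a library-independent illustration of the first two stages, the sketch below builds a lag feature and a trailing rolling mean in plain Python. It does not call FeatureEngineer; the helper names `lag_feature` and `rolling_mean`, the column names, and the sign convention (a positive lag looks back in time) are assumptions made for this example and may differ from what diive does internally.

```python
from statistics import mean


def lag_feature(values, lag):
    """Shift a series by `lag` records (assumed convention: positive = look back).

    Positions that have no source record are filled with None.
    """
    n = len(values)
    shifted = []
    for i in range(n):
        j = i - lag
        shifted.append(values[j] if 0 <= j < n else None)
    return shifted


def rolling_mean(values, window):
    """Trailing mean over the last `window` records, ignoring gaps (None)."""
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(mean(chunk) if chunk else None)
    return out


# Hypothetical air temperature series with one gap.
ta = [10.0, 11.0, None, 13.0, 14.0]

features = {
    "TA": ta,
    "TA_lag1": lag_feature(ta, 1),
    "TA_roll3": rolling_mean(ta, 3),
}
```

Computing such columns once and passing the same table to several models is the reuse pattern described above.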

| Method | How it works |
|--------|--------------|
| XGBoostTS | Gradient boosting |
| RandomForestTS | Ensemble learning with SHAP importance |
| FluxMDS | Meteorological similarity, no training needed |
| Linear interpolation | Short gaps only |

Long-term variants support multi-year data with USTAR scenario options. See examples/gapfilling/.

Flux processing chain

Post-processing from quality flags through gap-filling, covering Levels 2 to 4.1 following Swiss FluxNet standards. Two entry points:

  • run_chain(data, config) — single call drives the full pipeline (L2 → L3.1 → L3.2 → L3.3 → L4.1) from one FluxConfig. Intentionally simple: fixed defaults for per-detector / per-model knobs (Hampel sub-options, MDS tolerances, ML hyperparameters). Use this for the standard FLUXNET-style workflow.
  • Composable per-level callables (run_level2, run_level31, make_level32_detector + run_level32, run_level33_constant_ustar / run_level33_ustar_detection, run_level41_mds / _rf / _xgb) — full control. Every detector class, model hyperparameter, MDS tolerance, and diagnostic flag is reachable here and only here.

To make a computed driver (e.g. VPD in kPa) available to L4.1, use add_driver(data, series), which puts it where L4.1 reads its drivers from. Call data.gap_stats() at any level for a monthly/annual breakdown with a listing of long gaps. data.plot_gapfilled_heatmaps() shows all gap-filling methods side by side; data.plot_cumulative_comparison() overlays their cumulative sums on a single set of axes.

Reference: Swiss FluxNet flux processing | Examples: examples/flux/fluxprocessingchain/

Quality control and outlier detection

FlagQCF merges multiple test flags into a single quality indicator with daytime/nighttime separation and USTAR scenario support.

Nine outlier detection methods are available: Hampel filter, Z-score (global, rolling, or split by day/night), local SD, Local Outlier Factor, absolute limits, incremental detection, manual removal, trimmed mean, and stepwise chaining across multiple methods. See examples/preprocessing/outlier_detection/.
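To give a sense of how one of these methods works, the following is a minimal sketch of a Hampel filter in plain Python. It is a generic textbook formulation, not diive's implementation: the function name `hampel_flags`, the parameters `half_window` and `n_sigmas`, and their defaults are assumptions chosen for this example.

```python
from statistics import median


def hampel_flags(values, half_window=3, n_sigmas=3.0):
    """Flag records that deviate strongly from the local median.

    For each record, the median and the median absolute deviation (MAD)
    are computed in a centered window. A record is flagged when its
    distance to the median exceeds n_sigmas * 1.4826 * MAD.
    """
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    flags = []
    for i, x in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        flags.append(abs(x - med) > n_sigmas * k * mad)
    return flags


# Hypothetical series with one obvious spike.
series = [1.0, 1.1, 0.9, 1.0, 8.0, 1.1, 0.9, 1.0, 1.05]
flags = hampel_flags(series)
outlier_positions = [i for i, flagged in enumerate(flags) if flagged]
```

Because the median and MAD are robust statistics, a single spike barely shifts the threshold, which is why this filter suits noisy flux data better than a mean-based test.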

Corrections and preprocessing

Tools cover offset correction for measurements, radiation, humidity, and wind direction; threshold and missing value handling; and timestamp sanitization (validation, regularization, frequency detection). See examples/preprocessing/corrections/ and examples/times/.
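The idea behind frequency detection and regularization can be sketched with the standard library alone. This is an illustration under simple assumptions (sorted timestamps, the most common time step taken as the nominal frequency), not diive's own procedure; `detect_frequency` and `regularize` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timedelta


def detect_frequency(timestamps):
    """Return the most common step between consecutive (sorted) timestamps."""
    steps = Counter(b - a for a, b in zip(timestamps, timestamps[1:]))
    return steps.most_common(1)[0][0]


def regularize(timestamps, values, freq):
    """Place records on a gap-free grid; missing time steps get None."""
    lookup = dict(zip(timestamps, values))
    grid = []
    t = timestamps[0]
    while t <= timestamps[-1]:
        grid.append((t, lookup.get(t)))
        t += freq
    return grid


# Hypothetical half-hourly records with the 01:30 record missing.
start = datetime(2024, 1, 1)
stamps = [start + timedelta(minutes=m) for m in (0, 30, 60, 120)]
values = [1.0, 2.0, 3.0, 4.0]

freq = detect_frequency(stamps)
grid = regularize(stamps, values, freq)
```

After regularization, every time step is present exactly once, so gaps show up as explicit missing values rather than as missing rows.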

Analysis

Seasonal-trend decomposition (STL, classical, or harmonic), lagged correlation and binned analysis, 2D grid aggregation, gap detection with monthly/annual breakdown, and percentiles/histograms. See examples/analysis/.

Derived variables

VPD from temperature and humidity, day/night flags from solar geometry, air density, aerodynamic resistance, unit conversions, lagged features, and clear-sky potential radiation. See examples/features/.
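For orientation, VPD can be derived from air temperature and relative humidity as sketched below. The saturation vapour pressure uses the Tetens-type formula from FAO-56; whether diive uses this exact formulation is an assumption, and the function names are hypothetical.

```python
import math


def saturation_vapor_pressure_kpa(ta_degc):
    """Saturation vapour pressure (kPa) from air temperature (°C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * ta_degc / (ta_degc + 237.3))


def vpd_kpa(ta_degc, rh_percent):
    """Vapour pressure deficit (kPa): saturation minus actual vapour pressure."""
    es = saturation_vapor_pressure_kpa(ta_degc)
    return es * (1.0 - rh_percent / 100.0)


# 25 °C at 50 % relative humidity gives a VPD of roughly 1.58 kPa.
vpd = vpd_kpa(25.0, 50.0)
```

Keeping the unit in the function name helps avoid mixing kPa and hPa, a common source of errors when VPD is used as a gap-filling driver.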

Eddy covariance

Flux detection limit from 20 Hz data, maximum covariance lag, pre-whitening bootstrap (PWB) for trace gases (CH4, N2O) with single-period and multi-file parallel variants, wind double rotation, self-heating correction for open-path IRGAs, USTAR filtering, and random error propagation. See examples/flux/.
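Wind double rotation is easy to illustrate in a few lines. The sketch below follows the standard two-step formulation (rotate around the vertical axis until the mean crosswind is zero, then tilt until the mean vertical wind is zero). It is a generic illustration, not the code behind WindDoubleRotation, and the function name and sample values are hypothetical.

```python
import math
from statistics import fmean


def double_rotation(u, v, w):
    """Rotate wind components so that mean(v) = 0 and mean(w) = 0."""
    # First rotation: align the x-axis with the mean horizontal wind.
    theta = math.atan2(fmean(v), fmean(u))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    u1 = [ui * cos_t + vi * sin_t for ui, vi in zip(u, v)]
    v1 = [-ui * sin_t + vi * cos_t for ui, vi in zip(u, v)]

    # Second rotation: tilt the x-axis into the mean streamline.
    phi = math.atan2(fmean(w), fmean(u1))
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    u2 = [ui * cos_p + wi * sin_p for ui, wi in zip(u1, w)]
    w2 = [-ui * sin_p + wi * cos_p for ui, wi in zip(u1, w)]
    return u2, v1, w2


# Hypothetical high-frequency wind samples (m/s) from one averaging period.
u = [2.0, 2.4, 1.8, 2.2]
v = [1.0, 0.8, 1.2, 1.0]
w = [0.1, 0.3, -0.1, 0.1]

u_rot, v_rot, w_rot = double_rotation(u, v, w)
```

After both rotations the mean of `u_rot` equals the mean wind speed, while the means of `v_rot` and `w_rot` vanish up to floating-point error.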

Visualization

14+ plot types including time series, cumulative, diel cycle, heatmaps (datetime and year-month), hexbin, histogram, ridgeline, scatter, and anomaly plots. Both Matplotlib and Plotly are supported. See examples/visualization/.

I/O

Load and save parquet files, read single or batch EddyPro output, detect and split irregular files, and format data for FLUXNET submission. See examples/io/.


Contributing

See CLAUDE.md for development setup, coding standards, and testing.


License

diive is released under the GNU General Public License v3.0.