Pull request overview
This PR adds a second, fully independent baseline uplift estimator (a naive energy-ratio method) and extends the existing v0 binned baseline to support toggle campaigns, enabling end-to-end scoring for both prepost and toggle synthetic studies (with rich per-run diagnostics).
Changes:
- Introduce NaiveRatioMethod (prepost + toggle) with per-run CSV diagnostics and optional diagnostic plots.
- Add toggle support to V0BinnedMethod by wiring wind_up’s native toggle assessment and generating a toggle signal dataframe.
- Improve Hill of Towie 10‑minute loader performance via per-(year, turbine) parquet caching; add new example driver(s) + tests (including a slow toggle E2E).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/benchmarking/baselines/test_v0_end_to_end.py | Updates HoT v0 context construction to use explicit turbine names. |
| tests/benchmarking/baselines/test_v0_binned.py | Adds unit tests for toggle wiring/config and toggle signal dataframe semantics. |
| tests/benchmarking/baselines/test_toggle_end_to_end.py | Adds slow end-to-end toggle study test scoring v0 + naive + oracle. |
| tests/benchmarking/baselines/test_naive_ratio.py | Adds comprehensive unit tests for naive ratio estimator, diagnostics, and plots. |
| docs/v1/issues.md | Rescopes Issue 4 documentation to the naive energy-ratio method and toggle/v0 wiring. |
| benchmarking/synthetic/sources/hill_of_towie.py | Adds cached unpacking of HoT year zips into per-turbine-year parquet files. |
| benchmarking/baselines/v0_binned.py | Implements toggle-mode support via wind_up toggle config + toggle_df generation. |
| benchmarking/baselines/naive_ratio.py | Implements the new naive energy-ratio baseline with diagnostics and plots. |
| benchmarking/baselines/inspect_naive.py | Adds a manual inspection driver to run naive ratio replicates with plots enabled. |
| benchmarking/baselines/example_v0_study.py | Adds NaiveRatioMethod to the existing prepost example driver. |
| benchmarking/baselines/example_toggle_study.py | Adds a new toggle-mode example driver scoring v0 + naive + oracle. |
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
benchmarking/baselines/example_prepost_study.py:111
main(..., data_dir=...) passes data_dir to load_hot_scada, but run_prepost_study builds the v0 context with build_hot_v0_context(...) using the default data dir. This can lead to duplicated downloads/caches (SCADA/metadata) or surprising behavior when data_dir is overridden.
benchmarking/baselines/example_prepost_study.py:185
main(..., data_dir=...) passes data_dir to load_hot_scada, but it is not forwarded into run_prepost_study(...) (and thus into build_hot_v0_context(...)). Forwarding it keeps SCADA + metadata using the same cache directory when data_dir is overridden.
Comment on lines +162 to +163
| if self.save_plots: | ||
| _save_plots(run_dir / "plots", wide=wide, mi=mi, test=mi.test_wtg) |
| n_replicates=n_replicates, | ||
| seed=0, | ||
| ) | ||
| return run_toggle_study(scada_df, profiles=TOGGLE_PROFILES, study=study, out_root=out_root) |
Issue 4 — Naive energy-ratio method (WS3)
Goal: a second, deliberately simple, fully independent method that validates the
harness is not implicitly tuned to v0 — and proves the existing thin method seam is
genuinely pluggable.
Why this, not the original "data contract" issue: the thin MethodInput/MethodOutput
seam from Issues 2–3 already is the shared, method-agnostic contract; the drafted
"per test-reference conditioned dataset" was over-fit to v0 (an R-learner fits once
per test turbine over all references at once), and the assessment_method production
selector only earns its place once there is a winner to promote. The durable kernel
of the old issue, a treatment-invariant reference-only feature builder plus the §8
bias-guard test (design note §3/§8), folds into Issue 5.
The method. For a set of rows let
ρ = Σ test_power / Σ reference_total_power over complete-case timestamps (test
turbine and every reference finite). Estimate uplift = ρ(treated) / ρ(baseline) − 1.
It never reads the test turbine's own wind speed (design note §3), shares no code
with v0, and has no wind_up dependency. It makes no covariate-shift correction by
design, so it is the "don't condition at all" floor: biased on prepost,
near-unbiased on toggle (interleaved on/off share a wind distribution).
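The ratio described above can be sketched in a few lines of pandas. This is only an illustration under assumptions, not the code in benchmarking/baselines/naive_ratio.py: the function names `ratio` and `naive_uplift`, the column names, and the toy numbers are all hypothetical.

```python
import numpy as np
import pandas as pd


def ratio(df: pd.DataFrame, test_col: str, ref_cols: list[str]) -> float:
    """rho = sum(test power) / sum(total reference power) on complete-case rows."""
    cols = [test_col, *ref_cols]
    # Complete case: the test turbine and every reference must be finite.
    values = df[cols].replace([np.inf, -np.inf], np.nan).dropna()
    return float(values[test_col].sum() / values[ref_cols].sum().sum())


def naive_uplift(
    df: pd.DataFrame, test_col: str, ref_cols: list[str], treated: pd.Series
) -> float:
    """uplift = rho(treated) / rho(baseline) - 1, with no covariate correction."""
    rho_treated = ratio(df[treated], test_col, ref_cols)
    rho_baseline = ratio(df[~treated], test_col, ref_cols)
    return rho_treated / rho_baseline - 1.0


# Made-up power values (hypothetical column names); the last row has a missing
# test reading and is dropped by the complete-case filter.
frame = pd.DataFrame(
    {
        "test_power": [100.0, 200.0, 110.0, 220.0, np.nan],
        "ref_a": [100.0, 200.0, 100.0, 200.0, 300.0],
        "ref_b": [100.0, 200.0, 100.0, 200.0, 300.0],
    }
)
treated = pd.Series([False, False, True, True, True])
uplift = naive_uplift(frame, "test_power", ["ref_a", "ref_b"], treated)
print(f"uplift = {uplift:.3f}")
```

In this toy frame the baseline ratio is 300/600 and the treated ratio is 330/600, so the estimate comes out at about 10%. Nothing in the sketch looks at wind speed, which is why the estimate inherits any difference in wind distribution between the two row sets.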
Scope
- NaiveRatioMethod behind the existing Method seam; prepost and toggle.
- Per-run diagnostics ([…] all/baseline/upgraded segment, a headline-results CSV,
  optional plots) so a human can confirm the right data was received and
  interpreted; the headline uplift is re-derivable from the stats CSV.
- Toggle support in V0BinnedMethod (wiring wind_up's native toggle assessment) so
  v0 can be scored on toggle campaigns too.
- […] (3% Cp increase, 20-min-on/20-min-off) scoring naive + v0 + oracle.
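As a rough illustration of the 20-min-on/20-min-off schedule mentioned in the scope, the sketch below labels 10-minute timestamps as on or off. It is an assumption-laden example: `toggle_signal`, the `toggle_on` column name, and the choice to start the cycle in the "on" state are hypothetical, and the result is not claimed to match the toggle signal dataframe that wind_up or V0BinnedMethod expects.

```python
import pandas as pd


def toggle_signal(
    index: pd.DatetimeIndex, on_minutes: int = 20, off_minutes: int = 20
) -> pd.Series:
    """Return True while the upgrade is on; the cycle starts 'on' at index[0]."""
    elapsed_min = (index - index[0]).total_seconds() / 60.0
    # Position within the repeating on/off cycle, in minutes.
    phase = elapsed_min % (on_minutes + off_minutes)
    return pd.Series(phase < on_minutes, index=index, name="toggle_on")


# Eight 10-minute timestamps cover two full 40-minute cycles.
timestamps = pd.date_range("2024-01-01", periods=8, freq="10min")
signal = toggle_signal(timestamps)
print(signal)
```

Because on and off periods alternate every 20 minutes, both states sample nearly the same wind conditions, which is the reason the issue text expects the naive ratio to be near-unbiased on toggle campaigns.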
Done when: naive_ratio is scored alongside v0_binned and the oracle on the
synthetic profiles for both prepost and toggle, the per-run diagnostics are written,
and its accuracy/precision appears in the leaderboard.