V1 refactor scadadf#108
Merged
Commits updated from 5ccd61d to e2ee2bd, then from e2ee2bd to f783901.
Pull request overview
Refactors the benchmarking/synthetic pipeline and harness to operate on source-native SCADA column names (rather than wind-up v0 DataColumns aliases), keeping v0 aliasing confined to the v0 baseline on-ramp.
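A minimal sketch of what a `ColumnSchema` might look like, assuming it is a frozen dataclass that maps semantic roles to source-native column names. The field names (`timestamp`, `turbine`, `active_power`, `wind_speed`), the `required()` helper, `missing_columns`, and the tag values in `HOT_COLUMNS_EXAMPLE` are hypothetical and not taken from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSchema:
    """Semantic role -> source-native column name (hypothetical fields)."""

    timestamp: str
    turbine: str
    active_power: str
    wind_speed: str

    def required(self) -> list[str]:
        # Columns a long-format SCADA table must carry for this schema.
        return [self.timestamp, self.turbine, self.active_power, self.wind_speed]


# Hypothetical Hill of Towie-style schema; the real HOT_COLUMNS may differ.
HOT_COLUMNS_EXAMPLE = ColumnSchema(
    timestamp="TimeStamp",
    turbine="StationId",
    active_power="wtc_ActPower_mean",
    wind_speed="wtc_AcWindSp_mean",
)


def missing_columns(columns, schema: ColumnSchema) -> list[str]:
    """Return schema columns absent from `columns` so callers can fail fast."""
    present = set(columns)
    return [c for c in schema.required() if c not in present]


print(missing_columns(["TimeStamp", "StationId", "wtc_ActPower_mean"], HOT_COLUMNS_EXAMPLE))
```

Because downstream code asks the schema for `schema.active_power` rather than importing a fixed alias, swapping in another data source only means constructing a different `ColumnSchema`.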
Changes:
- Introduces a `ColumnSchema` abstraction and threads it through synthetic generation, plotting, and ground-truth uplift computation.
- Updates the Hill of Towie adapter to load/cache source-native `wtc_*` tags, reshape wide→long for method-facing data, and add a v0-only `long_to_wind_up_format` conversion.
- Makes the harness/method seam carry the turbine-identifier column (`turbine_col`) and updates baselines/tests accordingly (including making `NaiveRatioMethod` configurable by its active-power column).
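The wide→long reshape and the v0-only aliasing step can be sketched with plain dictionaries. This assumes a hypothetical wide layout in which each row holds a timestamp plus `<turbine>:<tag>` columns; `wide_to_long` and `to_v0_aliases` are illustrative stand-ins for `scada_wide_to_long` and `long_to_wind_up_format`, not their actual signatures, and the v0 alias names in the usage lines are assumptions.

```python
from collections import defaultdict


def wide_to_long(wide_rows, timestamp_col, turbine_col, sep=":"):
    """Reshape wide rows into long rows, one per (timestamp, turbine).

    Hypothetical wide layout: every key except `timestamp_col` is
    '<turbine><sep><tag>', e.g. 'T01:wtc_ActPower_mean'.
    """
    long_rows = []
    for row in wide_rows:
        per_turbine = defaultdict(dict)  # turbine id -> {tag: value}
        for key, value in row.items():
            if key == timestamp_col:
                continue
            turbine, tag = key.split(sep, 1)
            per_turbine[turbine][tag] = value
        for turbine in sorted(per_turbine):
            long_rows.append(
                {timestamp_col: row[timestamp_col], turbine_col: turbine, **per_turbine[turbine]}
            )
    return long_rows


def to_v0_aliases(long_rows, alias_map):
    """v0-only on-ramp: rename source-native columns to v0 aliases.

    Columns missing from `alias_map` pass through unchanged.
    """
    return [{alias_map.get(k, k): v for k, v in row.items()} for row in long_rows]


wide = [{"TimeStamp": "2020-01-01 00:00", "T01:wtc_ActPower_mean": 1500.0, "T02:wtc_ActPower_mean": 1450.0}]
long_rows = wide_to_long(wide, "TimeStamp", "StationId")
# Hypothetical v0 alias names, applied only at the v0 baseline boundary.
v0_rows = to_v0_aliases(long_rows, {"StationId": "TurbineName", "wtc_ActPower_mean": "ActivePowerMean"})
```

Keeping the rename in its own function matches the stated intent: only the v0 baseline sees aliased names, while everything upstream works on source-native tags.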
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/benchmarking/synthetic/test_upgrades.py | Updates synthetic upgrade tests to use HOT_COLUMNS source-native schema. |
| tests/benchmarking/synthetic/test_plots.py | Updates plot tests to use source-native column schema. |
| tests/benchmarking/synthetic/test_make_example_datasets.py | Updates example dataset tests to use source-native column schema. |
| tests/benchmarking/synthetic/test_ground_truth.py | Updates ground-truth tests to use source-native column schema. |
| tests/benchmarking/synthetic/test_generator.py | Updates generator tests to use source-native column schema and turbine id column. |
| tests/benchmarking/synthetic/sources/test_hill_of_towie.py | Splits tests between source-native reshape and v0 on-ramp conversion. |
| tests/benchmarking/synthetic/sources/test_hill_of_towie_cache.py | Adds offline tests for per-(year,turbine) parquet caching behavior. |
| tests/benchmarking/harness/test_scoring.py | Updates harness scoring tests to use HOT_COLUMNS and drop DataColumns. |
| tests/benchmarking/harness/test_replicates.py | Updates replicate tests to use HOT_COLUMNS and turbine id column. |
| tests/benchmarking/harness/stubs.py | Updates stub/oracle helpers to use seam-provided turbine column + source-native power. |
| tests/benchmarking/baselines/test_v0_binned.py | Adjusts v0 baseline tests for source-native input + conversion on-ramp. |
| tests/benchmarking/baselines/test_toggle_end_to_end.py | Updates end-to-end toggle test wiring for configured naive method + HOT_COLUMNS. |
| tests/benchmarking/baselines/test_naive_ratio.py | Reworks naive-ratio tests to prove no wind_up imports and use configured columns. |
| benchmarking/synthetic/upgrades.py | Refactors upgrades to be keyed by ColumnSchema (defaulting to HoT schema). |
| benchmarking/synthetic/sources/hill_of_towie.py | Loader now keeps wtc_* tags; adds scada_wide_to_long + v0-only long_to_wind_up_format; adds per-turbine-year cache. |
| benchmarking/synthetic/schema.py | New ColumnSchema dataclass defining semantic roles for source-native columns. |
| benchmarking/synthetic/plots.py | Plotting now selects columns via ColumnSchema instead of DataColumns. |
| benchmarking/synthetic/ground_truth.py | Ground-truth uplift now selects columns via ColumnSchema instead of DataColumns. |
| benchmarking/synthetic/generator.py | Generator now operates on source-native long SCADA and threads ColumnSchema through outputs. |
| benchmarking/synthetic/__init__.py | Re-exports ColumnSchema and HOT_COLUMNS from the synthetic package API. |
| benchmarking/harness/scoring.py | Threads ColumnSchema through replicate-building; injects turbine_col into MethodInput. |
| benchmarking/harness/replicates.py | Adds columns parameter and subsets by schema’s turbine column. |
| benchmarking/harness/method.py | Extends MethodInput to carry the turbine-identifier column name. |
| benchmarking/harness/example_hot_study.py | Updates oracle/example wiring to use source-native HOT_COLUMNS. |
| benchmarking/baselines/v0_binned.py | Converts source-native long SCADA to wind-up format inside the v0 baseline only. |
| benchmarking/baselines/naive_ratio.py | Makes naive method source-agnostic via configured active-power column + seam-provided turbine column. |
| benchmarking/baselines/inspect_v0_run.py | Updates inspection tool to provide turbine_col and use source-native schema. |
| benchmarking/baselines/inspect_naive.py | Updates inspection tool to configure naive method with source-native power col + turbine_col. |
| benchmarking/baselines/example_toggle_study.py | Updates example toggle study to configure naive method with source-native power col. |
| benchmarking/baselines/example_prepost_study.py | Updates example prepost study to configure naive method with source-native power col. |
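The harness/method seam described for `naive_ratio.py` can be illustrated as follows, assuming a naive estimator that compares the test-to-reference power ratio after an upgrade with the same ratio before it. The `MethodInput` fields, `estimate_uplift`, and the ratio formula are assumptions for illustration only; the point is that the method knows just its configured power column and reads the turbine column from the seam.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MethodInput:
    """Hypothetical seam payload: long rows plus the turbine-id column name."""

    pre: list
    post: list
    test_turbine: str
    ref_turbines: tuple
    turbine_col: str  # injected by the harness, never hard-coded in a method


@dataclass(frozen=True)
class NaiveRatioMethod:
    """Assumed naive estimator: change in test/reference mean-power ratio."""

    power_col: str  # source-native active-power column, e.g. a wtc_* tag

    def _ratio(self, rows, inp: MethodInput) -> float:
        col = inp.turbine_col
        test = mean(r[self.power_col] for r in rows if r[col] == inp.test_turbine)
        ref = mean(r[self.power_col] for r in rows if r[col] in inp.ref_turbines)
        return test / ref

    def estimate_uplift(self, inp: MethodInput) -> float:
        # Fractional uplift of the test turbine relative to the reference turbines.
        return self._ratio(inp.post, inp) / self._ratio(inp.pre, inp) - 1.0


pre = [{"StationId": "T01", "wtc_ActPower_mean": 100.0}, {"StationId": "T02", "wtc_ActPower_mean": 100.0}]
post = [{"StationId": "T01", "wtc_ActPower_mean": 110.0}, {"StationId": "T02", "wtc_ActPower_mean": 100.0}]
inp = MethodInput(pre, post, "T01", ("T02",), "StationId")
uplift = NaiveRatioMethod(power_col="wtc_ActPower_mean").estimate_uplift(inp)
```

Since neither column name is baked into the class, the same method runs unchanged on any source whose schema supplies a power column and a turbine column.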
Fully remove wind-up v0 method data column assumptions from the rest of the benchmarking code.