feat(ci): G3 perf regression gate + allowlist governance docs #293
Merged
3 commits
@@ -0,0 +1,80 @@

```yaml
# SPDX-License-Identifier: Apache-2.0
# © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
#
# Auto-update perf-baseline.json on main after merges that touch DET-critical
# or DET-important code. Creates a PR with the new baseline so it's reviewed
# (never force-pushes or commits directly to main).
name: Update perf baseline

on:
  push:
    branches: [main]

permissions:
  contents: write
  pull-requests: write

concurrency:
  group: perf-baseline-update
  cancel-in-progress: true

jobs:
  update-baseline:
    name: Update perf baseline
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check if perf-relevant files changed
        id: changed
        run: |
          CHANGED=$(git diff --name-only HEAD~1..HEAD -- 'crates/**/*.rs' 'crates/**/Cargo.toml' || true)
          if [ -z "$CHANGED" ]; then
            echo "skip=true" >> "$GITHUB_OUTPUT"
            echo "No Rust source changes — skipping baseline update"
          else
            echo "skip=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Setup Rust
        if: steps.changed.outputs.skip != 'true'
        uses: dtolnay/rust-toolchain@stable

      - name: Run benchmarks
        if: steps.changed.outputs.skip != 'true'
        run: |
          cargo bench -p warp-benches --bench materialization_hotpath -- --output-format bencher | tee perf.log

      - name: Generate baseline JSON
        id: generate
        if: steps.changed.outputs.skip != 'true'
        run: |
          node scripts/generate_perf_baseline.cjs perf.log > perf-baseline-new.json
          if diff -q perf-baseline.json perf-baseline-new.json >/dev/null 2>&1; then
            echo "Baseline unchanged — no PR needed"
            echo "skip=true" >> "$GITHUB_OUTPUT"
          else
            mv perf-baseline-new.json perf-baseline.json
            echo "skip=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Create baseline PR
        if: steps.changed.outputs.skip != 'true' && steps.generate.outputs.skip != 'true'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          BRANCH="chore/perf-baseline-$(date +%Y%m%d)-$(git rev-parse --short HEAD)"
          git checkout -b "$BRANCH"
          git add perf-baseline.json
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git commit -m "chore(perf): update perf-baseline.json from $(git rev-parse --short HEAD)"
          git push origin "$BRANCH"
          gh pr create \
            --title "chore(perf): update perf baseline" \
            --body "Auto-generated baseline update from main push $(git rev-parse HEAD~1)." \
            --base main \
            --head "$BRANCH"
```
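The workflow pipes the bench log through `scripts/generate_perf_baseline.cjs`, which is not included in this diff. As a hypothetical stand-in, assuming the generator writes the same `{ "<bench_name>": <median_ns> }` object that the regression gate reads, the sketch below parses bencher lines and serializes them with sorted keys and a trailing newline, so the byte-level `diff -q` step is not tripped by key order alone. The function name `bencherToBaseline` and the sample benchmark names are illustrative only.

```javascript
// Hypothetical sketch of a baseline generator; the real
// scripts/generate_perf_baseline.cjs is not shown in this PR.
"use strict";

// Same bencher line format the gate parses:
// "test <name> ... bench: <N> ns/iter (+/- <M>)"
const BENCHER_RE = /^test\s+(\S+)\s+\.\.\.\s+bench:\s+([\d,]+)\s+ns\/iter/gm;

function bencherToBaseline(text) {
  const results = {};
  for (const m of text.matchAll(BENCHER_RE)) {
    // Strip thousands separators such as "2,400" before converting.
    results[m[1]] = Number(m[2].replace(/,/g, ""));
  }
  // Rebuild the object with sorted keys so identical results always
  // serialize to identical bytes.
  const sorted = {};
  for (const name of Object.keys(results).sort()) {
    sorted[name] = results[name];
  }
  return JSON.stringify(sorted, null, 2) + "\n";
}

// Example: two benchmarks reported out of alphabetical order.
const log = [
  "test zeta/apply ... bench:      2,400 ns/iter (+/- 31)",
  "test alpha/load ... bench:        812 ns/iter (+/- 9)",
].join("\n");

process.stdout.write(bencherToBaseline(log));
// {
//   "alpha/load": 812,
//   "zeta/apply": 2400
// }
```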
@@ -0,0 +1,2 @@

```json
{
}
```
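The second file in this diff is an empty JSON object. If it is the initial `perf-baseline.json` (its path is not shown here), no benchmark has a baseline entry yet, so the gate reports every result as NEW, the MISSING check has nothing to iterate, and the run passes. A minimal sketch of that lookup, mirroring the `base == null` branch of the gate script; `statusFor` and the benchmark name are hypothetical.

```javascript
"use strict";

// Sketch of the gate's per-benchmark classification (mirrors the
// `base == null` and `deltaPct > threshold` checks in the gate script).
function statusFor(baseline, name, cur, threshold) {
  const base = baseline[name];
  if (base == null) return "NEW"; // no entry: nothing to regress against
  const deltaPct = ((cur - base) / base) * 100;
  return deltaPct > threshold ? "REGRESSED" : "OK";
}

// The committed file parses to an empty object.
const emptyBaseline = JSON.parse("{\n}\n");

// "example/bench" is a hypothetical benchmark name.
console.log(statusFor(emptyBaseline, "example/bench", 1234, 15)); // NEW
```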
@@ -0,0 +1,169 @@

```javascript
#!/usr/bin/env node
// SPDX-License-Identifier: Apache-2.0
// © James Ross Ω FLYING•ROBOTS <https://github.com/flyingrobots>
//
// G3 perf regression gate: compare current criterion bencher output against a
// git-tracked baseline and fail if any benchmark regresses beyond the allowed
// threshold.
//
// Usage:
//   node scripts/check_perf_regression.cjs <baseline.json> <current.log> [--threshold 15]
//
// Baseline format (perf-baseline.json):
//   { "<bench_name>": <median_ns>, ... }
//
// Current format (criterion --output-format bencher):
//   test <bench_name> ... bench: <N> ns/iter (+/- <M>)
//
// Exit codes:
//   0 — no regressions above threshold
//   1 — one or more regressions above threshold
//   2 — usage error, or no benchmark results in the current output

"use strict";

const fs = require("fs");

const USAGE = "Usage: node scripts/check_perf_regression.cjs <baseline.json> <current.log> [--threshold <percent>]";

function parseArgs(argv) {
  const args = argv.slice(2);
  let threshold = 15;
  const positional = [];

  for (let i = 0; i < args.length; i++) {
    if (args[i] === "--threshold" && i + 1 < args.length) {
      threshold = Number(args[++i]);
      if (!Number.isFinite(threshold) || threshold <= 0) {
        console.error("ERROR: --threshold must be a positive number");
        process.exit(2);
      }
    } else if (args[i].startsWith("-")) {
      console.error(`ERROR: unknown flag: ${args[i]}`);
      console.error(USAGE);
      process.exit(2);
    } else {
      positional.push(args[i]);
    }
  }

  if (positional.length !== 2) {
    console.error(USAGE);
    process.exit(2);
  }

  return { baselinePath: positional[0], currentPath: positional[1], threshold };
}

/** Parse criterion bencher output into { name: median_ns } */
function parseBencherOutput(text) {
  const results = {};
  // Format: "test <name> ... bench: <N> ns/iter (+/- <M>)"
  const re = /^test\s+(\S+)\s+\.\.\.\s+bench:\s+([\d,]+)\s+ns\/iter/gm;
  let match;
  while ((match = re.exec(text)) !== null) {
    const name = match[1];
    const ns = Number(match[2].replace(/,/g, ""));
    results[name] = ns;
  }
  return results;
}

function main() {
  const { baselinePath, currentPath, threshold } = parseArgs(process.argv);

  if (!fs.existsSync(baselinePath)) {
    console.log(`No baseline found at ${baselinePath}.`);
    console.log("G3: SKIP (no baseline to compare against)");
    process.exit(0);
  }

  const baseline = JSON.parse(fs.readFileSync(baselinePath, "utf-8"));
  const currentText = fs.readFileSync(currentPath, "utf-8");
  const current = parseBencherOutput(currentText);

  const benchNames = Object.keys(current);
  if (benchNames.length === 0) {
    console.error("ERROR: no benchmark results found in current output");
    process.exit(2);
  }

  console.log(`G3 perf regression gate (threshold: ${threshold}%)`);
  console.log("─".repeat(72));

  const report = [];
  let regressions = 0;

  for (const name of benchNames) {
    const cur = current[name];
    const base = baseline[name];

    if (base == null) {
      report.push({ name, cur, base: null, delta: null, status: "NEW" });
      continue;
    }

    const deltaPct = ((cur - base) / base) * 100;
    const regressed = deltaPct > threshold;
    if (regressed) regressions++;

    report.push({
      name,
      cur,
      base,
      delta: deltaPct,
      status: regressed ? "REGRESSED" : "OK",
    });
  }

  // Fail when baseline benchmarks disappear from the current run.
  // This prevents silent bypass of regression enforcement via benchmark
  // renames/removals. To resolve: update perf-baseline.json to remove
  // the stale entry (via the baseline update workflow or manually).
  for (const name of Object.keys(baseline)) {
    if (current[name] == null) {
      regressions++;
      report.push({ name, cur: null, base: baseline[name], delta: null, status: "MISSING" });
    }
  }

  // Print table
  const nameWidth = Math.max(12, ...report.map((r) => r.name.length));
  const header = `${"Benchmark".padEnd(nameWidth)} ${"Baseline".padStart(12)} ${"Current".padStart(12)} ${"Delta".padStart(8)} Status`;
  console.log(header);
  console.log("─".repeat(header.length));

  for (const r of report) {
    const baseStr = r.base != null ? `${r.base} ns` : "—";
    const curStr = r.cur != null ? `${r.cur} ns` : "—";
    const deltaStr = r.delta != null ? `${r.delta > 0 ? "+" : ""}${r.delta.toFixed(1)}%` : "—";
    const statusStr =
      r.status === "REGRESSED" ? `FAIL (>${threshold}%)` :
      r.status === "MISSING" ? "FAIL (missing)" :
      r.status;
    console.log(
      `${r.name.padEnd(nameWidth)} ${baseStr.padStart(12)} ${curStr.padStart(12)} ${deltaStr.padStart(8)} ${statusStr}`
    );
  }

  console.log("─".repeat(header.length));

  // Write structured report
  const reportObj = {
    threshold_pct: threshold,
    benchmarks: report,
    regressions,
    passed: regressions === 0,
  };
  fs.writeFileSync("perf-report.json", JSON.stringify(reportObj, null, 2) + "\n");
  console.log("\nWrote perf-report.json");

  if (regressions > 0) {
    console.error(`\nG3: FAILED — ${regressions} benchmark(s) regressed beyond ${threshold}% threshold`);
    process.exit(1);
  }

  console.log(`\nG3: PASSED — all benchmarks within ${threshold}% of baseline`);
}

main();
```
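To make the gate's arithmetic concrete, the sketch below runs two made-up bencher lines through the same regex as `parseBencherOutput` and applies the same `((cur - base) / base) * 100` delta against the default 15% threshold. The benchmark names, timings, and the helper name `parse` are invented for illustration.

```javascript
"use strict";

// Same pattern as parseBencherOutput in check_perf_regression.cjs.
const re = /^test\s+(\S+)\s+\.\.\.\s+bench:\s+([\d,]+)\s+ns\/iter/gm;

function parse(text) {
  const out = {};
  for (const m of text.matchAll(re)) out[m[1]] = Number(m[2].replace(/,/g, ""));
  return out;
}

// Illustrative data: hypothetical benchmark names and timings.
const baseline = { "hot/apply": 1000, "hot/load": 2000 };
const current = parse([
  "test hot/apply ... bench:      1,200 ns/iter (+/- 40)",
  "test hot/load ... bench:       2,100 ns/iter (+/- 25)",
].join("\n"));

const threshold = 15; // the script's default --threshold
const rows = Object.entries(current).map(([name, cur]) => {
  const base = baseline[name];
  const delta = ((cur - base) / base) * 100;
  return { name, base, cur, delta: delta.toFixed(1), regressed: delta > threshold };
});

for (const r of rows) {
  console.log(`${r.name}: ${r.base} -> ${r.cur} ns (${r.delta}%) ${r.regressed ? "FAIL" : "OK"}`);
}
// hot/apply: 1000 -> 1200 ns (20.0%) FAIL
// hot/load: 2000 -> 2100 ns (5.0%) OK
```

The comparison is a strict `>`, so only deltas above the threshold count as regressions; speedups produce negative deltas and always pass.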
Iterating only over `Object.keys(current)` means the gate never checks benchmarks that exist in `perf-baseline.json` but are missing from the new output, so a renamed or removed benchmark can silently bypass regression enforcement and still report `G3: PASSED`. This undermines the accuracy of the regression gate in exactly the cases where benchmark coverage changes, so the comparison should include baseline-only entries (at least as a hard failure or an explicit review-required state).
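The failure mode described in this comment can be reproduced in a few lines. Assuming a benchmark renamed from `old/apply` to `new/apply` (hypothetical names and timings), a loop over the current results alone finds no regressions, while also walking the baseline keys, as the merged script's MISSING check does, surfaces the stale entry. `countRegressions` and its `checkMissing` flag are illustrative, not part of the script.

```javascript
"use strict";

// Hypothetical rename: the baseline knows "old/apply", the new run reports "new/apply".
const oldBaseline = { "old/apply": 1000 };
const newRun = { "new/apply": 5000 }; // 5x slower, but has no baseline entry

function countRegressions(baseline, current, threshold, checkMissing) {
  let regressions = 0;
  for (const [name, cur] of Object.entries(current)) {
    const base = baseline[name];
    if (base == null) continue; // NEW: nothing to compare against
    if (((cur - base) / base) * 100 > threshold) regressions++;
  }
  if (checkMissing) {
    // Baseline entries absent from the current run count as failures.
    for (const name of Object.keys(baseline)) {
      if (current[name] == null) regressions++;
    }
  }
  return regressions;
}

console.log(countRegressions(oldBaseline, newRun, 15, false)); // 0: passes silently
console.log(countRegressions(oldBaseline, newRun, 15, true)); // 1: flagged as missing
```

The trade-off is that every intentional rename or removal now needs a matching baseline edit, which the merged script's comment suggests making via the baseline update workflow or by hand.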