Skip to content

feat: add Monte Carlo forecast compliance test suite (P1–P13) and fix fixture AIC gap#40126

Merged
pelikhan merged 7 commits into
mainfrom
copilot/formal-spec-forecast-compliance-fixtures-readme
Jun 19, 2026
Merged

feat: add Monte Carlo forecast compliance test suite (P1–P13) and fix fixture AIC gap#40126
pelikhan merged 7 commits into
mainfrom
copilot/formal-spec-forecast-compliance-fixtures-readme

Conversation

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

The canonical run_summary_minimal.json fixture set total_effective_tokens but omitted total_aic. Since forecast.go:593 gates on runAIC ≤ 0 → continue, every fixture-backed run was silently skipped, producing no Monte Carlo input for T-FC-031–T-FC-040.

Fixture fix

Added "total_aic": 0.0054 to token_usage_summary in specs/forecast-compliance-fixtures/run_summary_minimal.json, closing the field-name gap between what the fixture published and what the engine reads (TokenUsage.TotalAIC).

Formal test suite

New pkg/cli/forecast_compliance_fixtures_formal_test.go — 13 predicates mapped to the formal model in specs/forecast-compliance-fixtures/README.md:

Predicate Coverage
P1 FixtureFieldMapping Fixture JSON keys match the fields the engine reads at runtime
P2 BernoulliSuccess successCount=0 collapses MC mean to zero regardless of λ
P3 ObservedRateFormula λ = (n/h) × periodDays table-driven
P4 YieldFormula yield = successRate × observedRunsPerPeriod
P5 MonteCarloIterations result.Iterations == monteCarloIterations (10 000)
P6 ZeroLambdaNilResult λ≤0, NaN, ±Inf → nil (R-MC-001/004)
P7 ZeroAICExclusion nil/empty observations → nil (R-MC-011/032)
P8 ReliabilityThreshold n<10 → IsReliable=false; n≥10 → true (R-MC-030)
P9 PoissonBranchCrossover useNormalApproximationForPoisson: exact boundary at λ=15 (R-FC-060)
P10 DurationDerivation Duration = UpdatedAt − StartedAt replicates forecast.go:573-574
P11 FlagValidation_Days days ∉ {7,30}RunForecast returns error with correct message
P12 FixtureAICGap total_aic > 0 post-fix; guards against regression of the gap
P13 FixtureJSONConformance All required top-level, run, and token_usage_summary fields present

Copilot AI and others added 2 commits June 18, 2026 18:33
- Add total_aic: 0.0054 to specs/forecast-compliance-fixtures/run_summary_minimal.json
  to close the fixture gap where the engine reads TotalAIC but the fixture
  only set total_effective_tokens (runAIC ≤ 0 → continue at forecast.go:593)

- Add pkg/cli/forecast_compliance_fixtures_formal_test.go with 13 predicates
  covering T-FC-031–T-FC-040: fixture field mapping, Bernoulli success model,
  observed rate / yield formulas, Monte Carlo trial count, zero-λ nil guard,
  zero-AIC exclusion, reliability threshold, Poisson branch crossover,
  duration derivation, days flag validation, fixture AIC gap, and JSON conformance

Closes #40114

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Update README for forecast compliance fixtures and Monte Carlo tests feat: add Monte Carlo forecast compliance test suite (P1–P13) and fix fixture AIC gap Jun 18, 2026
Copilot AI requested a review from pelikhan June 18, 2026 18:35
@pelikhan pelikhan marked this pull request as ready for review June 19, 2026 00:14
Copilot AI review requested due to automatic review settings June 19, 2026 00:14

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a forecast fixture/schema mismatch that caused fixture-backed Monte Carlo forecasts to be skipped, and adds a formalized compliance test suite (P1–P13) to prevent regressions and validate key Monte Carlo/forecast invariants.

Changes:

  • Add token_usage_summary.total_aic to the canonical run_summary_minimal.json fixture so AIC observations aren’t silently excluded.
  • Introduce a new formal compliance test file covering predicates P1–P13 for Monte Carlo forecast behavior and fixture conformance.
Show a summary per file
File Description
specs/forecast-compliance-fixtures/run_summary_minimal.json Adds the missing total_aic field so fixture runs produce usable AIC observations for forecasting.
pkg/cli/forecast_compliance_fixtures_formal_test.go Adds formal predicate-based unit tests validating Monte Carlo behavior, forecast derivations, flag validation, and fixture field presence.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment on lines +333 to +343
// Allowed values must NOT return a flag-validation error.
for _, days := range []int{7, 30} {
cfg := ForecastConfig{Days: days, Period: "month", JSONOutput: true, SampleSize: 10}
err := RunForecast(cfg)
// The call may fail for other reasons (no GitHub auth, no workflows), but must
// not fail with the days-validation error.
if err != nil {
assert.NotContains(t, err.Error(), "invalid days value",
"P11: days=%d is valid and must not trigger the days-validation error", days)
}
}
@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR adds significant changes to pkg/ (416 new lines in pkg/cli/forecast_compliance_fixtures_formal_test.go, above the 100-line business-logic threshold) but does not have a linked Architecture Decision Record (ADR).

📄 Draft ADR committed: docs/adr/40126-formal-compliance-test-suite-for-forecast-engine.md — review and complete it before merging.

🔒 This PR cannot merge until an ADR is linked in the PR body.

📋 What to do next
  1. Review the draft ADR committed to your branch — it was generated from the PR diff and follows the repo's existing formal-conformance-suite precedent (ADR-38166).
  2. Complete / correct the sections — in particular confirm the decision framing (adopting a P1–P13 predicate-mapped compliance suite for the Monte Carlo forecast engine) and verify the alternatives reflect what you actually weighed.
  3. Commit the finalized ADR to docs/adr/ on your branch.
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-40126: Formal Compliance Test Suite for the Monte Carlo Forecast Engine

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

❓ Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does — here, why the forecast engine is pinned by a formal predicate suite rather than ad-hoc unit tests, and why the fixture total_aic field gap was closed. Future contributors (and your future self) will thank you.

📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number.

🔒 This PR cannot merge until an ADR is linked in the PR body.

🏗️ ADR gate enforced by Design Decision Gate 🏗️ ·

@pelikhan

Copy link
Copy Markdown
Collaborator

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking observations — fixture fix is correct, test suite has quality gaps.

The one-line fixture change ("total_aic": 0.0054) correctly closes the gap that caused forecast.go:593 to silently skip every fixture-backed run. That change is good and should merge.

Review findings summary

Reliability

  • runtime.Caller(0) fragility (line 35): breaks P1/P12/P13 if any build step passes -trimpath; use a package-relative path instead since go test guarantees the working directory.

Test correctness gaps

  • P3/P4 are arithmetic tautologies (lines 104–148): neither test calls a production function; wantLambda/wantYield entries are defined as the same formula being tested, so they cannot detect engine regressions.
  • P10 reimplements rather than tests (lines 309–316): the test copies the exact condition from forecast.go:573-574 inline, so a bug in those lines would not be caught.

Clarity

  • P9 uses assert.InDelta(delta=0) (line 253) for a typed constant — assert.Equal is the right tool here.

P2, P5, P6, P7, P8, P9, P11 (invalid-days path), P12, P13 all exercise real engine code and are solid.

🔎 Code quality review by PR Code Quality Reviewer

func fixtureDir(t *testing.T) string {
t.Helper()
_, thisFile, _, ok := runtime.Caller(0)
require.True(t, ok, "runtime.Caller must return a valid file path")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime.Caller(0) breaks fixture discovery when built with -trimpath: if the Go toolchain strips source paths (e.g. go test -trimpath or a reproducible-build CI step), thisFile becomes a module-relative virtual path and filepath.Dir resolves to a non-existent directory, causing every fixture-backed test (P1, P12, P13) to fail with a misleading "fixture file must be readable" error.

💡 Suggested fix

Drop the runtime import and use the package working directory that go test guarantees:

func fixtureDir(t *testing.T) string {
    t.Helper()
    return filepath.Join("..", "..", "specs", "forecast-compliance-fixtures")
}

go test always sets the working directory to the package directory (pkg/cli/), so a relative path of ../../specs/forecast-compliance-fixtures is stable and works with any build flags. The runtime.Caller approach is fragile once -trimpath enters the picture.

Comment on lines +104 to +118
func TestFormal_P3_ObservedRateFormula(t *testing.T) {
cases := []struct {
sampledRuns int
historyDays int
periodDays int
wantLambda float64
}{
{sampledRuns: 10, historyDays: 30, periodDays: 30, wantLambda: 10.0},
{sampledRuns: 10, historyDays: 30, periodDays: 7, wantLambda: 10.0 / 30.0 * 7},
{sampledRuns: 21, historyDays: 7, periodDays: 7, wantLambda: 21.0},
{sampledRuns: 0, historyDays: 30, periodDays: 30, wantLambda: 0.0},
}
for _, tc := range cases {
lambda := float64(tc.sampledRuns) / float64(tc.historyDays) * float64(tc.periodDays)
assert.InDelta(t, tc.wantLambda, lambda, 1e-9,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 and P4 are arithmetic tautologies — they test the formula in isolation, not the engine: both tests compute the formula inline in test code and compare against wantXxx values that are themselves defined as the same formula (e.g. wantLambda: 10.0 / 30.0 * 7 is identical to what the loop computes). These tests can never fail regardless of what forecast.go does. If forecastWorkflow were changed to compute ObservedRunsPerPeriod incorrectly (e.g. off-by-one on historyDays), P3 and P4 would remain green.

💡 Why this matters

The test title implies it verifies the engine's observed-rate formula, but neither P3 nor P4 calls any production function. They verify that (a/b)*c == expected in pure test arithmetic, which is trivially true.

To actually guard against engine regressions, the tests need to invoke forecastWorkflow (or a helper that computes ObservedRunsPerPeriod) with a stubbed GitHub API, then assert on the result's ObservedRunsPerPeriod field. That is admittedly harder, but the current tests provide false confidence — a label of 'P3: formal compliance' that doesn't back it up.

At minimum, the tautological wantLambda / wantYield entries should be replaced with hardcoded expected values derived independently of the formula, and the test should acknowledge it is validating formula semantics only (not engine integration).

Comment on lines +309 to +316
// Replicate the derivation logic from forecast.go:573-574.
if r.Duration == 0 && !r.StartedAt.IsZero() && !r.UpdatedAt.IsZero() {
r.Duration = r.UpdatedAt.Sub(r.StartedAt)
}
assert.Equal(t, tc.wantDuration, r.Duration,
"P10: %s: Duration must equal UpdatedAt − StartedAt", tc.description)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P10 reimplements the engine derivation rather than testing it: the test copies the exact condition from forecast.go:573-574 and applies it directly to a WorkflowRun struct. It asserts that time.Sub works correctly — which it always does — but never calls forecastWorkflow or any function that actually contains the production derivation. If the guard condition (r.Duration == 0 && !r.StartedAt.IsZero() && !r.UpdatedAt.IsZero()) were wrong in forecast.go, this test would still pass because the test re-runs the same copied condition.

💡 Impact and fix direction

The test comment even says "Replicate the derivation logic from forecast.go:573-574" — this is a red flag. A compliance test should exercise the production path, not copy it.

The duration derivation is called inside forecastWorkflow (line 573-574), which requires a stubbed GitHub API to call. If that's too expensive here, the test should at least be retitled to clarify it validates the formula semantics in isolation (TestFormal_P10_DurationArithmetic) and add a TODO noting that the engine integration path is not covered.

// Specification reference: R-FC-060; forecast_montecarlo.go poissonNormalApproximationThreshold
func TestFormal_P9_PoissonBranchCrossover(t *testing.T) {
assert.InDelta(t, 15.0, poissonNormalApproximationThreshold, 0,
"P9: Poisson crossover threshold must equal 15")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert.InDelta(delta=0) for an exact constant equality is misleading: using InDelta with a delta of 0 is semantically identical to assert.Equal but hides the intent. Readers will scan for the delta value to understand the tolerance being allowed, see 0, and wonder if it is intentional or a copy-paste error. For a constant value like poissonNormalApproximationThreshold (a typed float64 constant, not a computed value), assert.Equal is the correct assertion.

💡 Suggested fix
// Replace:
assert.InDelta(t, 15.0, poissonNormalApproximationThreshold, 0,
    "P9: Poisson crossover threshold must equal 15")

// With:
assert.Equal(t, 15.0, poissonNormalApproximationThreshold,
    "P9: Poisson crossover threshold must equal 15")

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /grill-with-docs — commenting with improvement suggestions. No blocking issues; the fixture fix is correct and the test suite is a valuable addition.

📋 Key Themes & Highlights

Key Themes

  • Tautological tests (P3, P4): Two predicates compute their formulas inline in the test body rather than calling production code, making them always-green spec documentation rather than regression guards.
  • Logic duplication (P10): The test copies forecast.go:573-574 verbatim instead of calling an extractable helper — future changes to the production condition won't break P10.
  • Error-string inconsistency (P11): The positive and negative assertions use different substrings, creating a latent fragility.
  • Documentation drift: The file header references a Formal Model section that doesn't exist in the README, and the README's "How to Run" section is now stale.

Positive Highlights

  • P12 + fixture fix is the clearest value in this PR: closes a silent data-gap that caused every fixture-backed run to be skipped at forecast.go:593. The regression guard is well-placed.
  • P6 and P9 cover the hardest-to-spot boundary conditions (λ=NaN/±Inf, Poisson crossover at exactly 15) with crisp table-driven cases.
  • P8 tests both sides of the reliability threshold with exact boundary values (n=9 vs n=10) — a textbook Arrange/Act/Assert structure.
  • ✅ All 13 predicate docstrings include formal predicate notation and spec-reference cross-links, which makes auditing significantly easier.
  • ✅ The fixture change is a one-liner with a clear rationale. P12 + P13 together ensure it can't regress silently.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

{sampledRuns: 0, historyDays: 30, periodDays: 30, wantLambda: 0.0},
}
for _, tc := range cases {
lambda := float64(tc.sampledRuns) / float64(tc.historyDays) * float64(tc.periodDays)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] This test re-implements the production formula inline rather than calling any production code — so it will always pass regardless of what forecastWorkflow actually computes.

The assertion lambda == wantLambda is mathematically tautological: wantLambda is the result of the same expression. If forecast.go:647–648 is ever changed (e.g., a rounding step or an integer-division bug), P3 would still pass.

💡 Suggestion

Extract the lambda computation into a testable helper:

// Extract from forecast.go lines 647-648
func observedRunsPerPeriod(sampledRuns, historyDays, periodDays int) float64 {
    return (float64(sampledRuns) / float64(historyDays)) * float64(periodDays)
}

// P3 test then calls the production helper
lambda := observedRunsPerPeriod(tc.sampledRuns, tc.historyDays, tc.periodDays)

If extraction is not feasible, at minimum add a historyDays=0 case to document and verify that the engine guards against divide-by-zero / NaN upstream.

}
for _, tc := range cases {
successRate := float64(tc.successCount) / float64(tc.totalRuns)
yield := successRate * tc.lambda

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Same concern as P3: both successRate and yield are computed inline in the test, so this assertion verifies a mathematical identity (a/b)*c == (a/b)*c rather than exercising production code.

The production successRate is computed inside runMonteCarlo at line 107 of forecast_montecarlo.go. P2 already provides stronger coverage by calling runMonteCarlo end-to-end. Consider whether P4 adds value over P2, or whether it should instead be a cross-check that a given (successCount, observations, lambda) triple produces a MeanProjectedAIC that lies within an expected range derived from successRate × lambda × meanAIC.

💡 Suggestion

Either remove P4 (since the formula is trivially true) or replace the inline computation with an integration-style check:

// Verify yield influences MC output proportionally
rng := rand.New(rand.NewSource(42))
mc100 := runMonteCarlo(aicObs, 10 /* 100% success */, 10.0, rng)
rng2 := rand.New(rand.NewSource(42))
mc50 := runMonteCarlo(aicObs, 5 /* 50% success */, 10.0, rng2)
assert.InDelta(t, mc100.MeanProjectedAIC*0.5, mc50.MeanProjectedAIC, mc100.MeanProjectedAIC*0.1,
    "P4: 50%% success should yield ~half the mean AIC of 100%% success")

}
// Replicate the derivation logic from forecast.go:573-574.
if r.Duration == 0 && !r.StartedAt.IsZero() && !r.UpdatedAt.IsZero() {
r.Duration = r.UpdatedAt.Sub(r.StartedAt)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The comment on line 309 says "Replicate the derivation logic from forecast.go:573-574" — and that is exactly the problem. This test copies production code verbatim rather than calling it, so a change to the production condition (e.g., adding && !r.UpdatedAt.Before(r.StartedAt) to guard negative durations) would leave P10 silently passing with the old logic.

This also misses an important edge case: updatedAt < startedAt (e.g., a clock skew or data corruption scenario). The current code would compute a negative duration without any guard.

💡 Suggestion

Extract computeDuration(r *WorkflowRun) from forecast.go:571–574 so P10 can call the real helper:

func computeDuration(r *WorkflowRun) {
    if r.Duration == 0 && !r.StartedAt.IsZero() && !r.UpdatedAt.IsZero() {
        r.Duration = r.UpdatedAt.Sub(r.StartedAt)
    }
}

// P10 test:
computeDuration(&r)

Also add a negative-duration case:

{
    startedAt:    time.Date(2026, 5, 1, 12, 0, 0, 0, time.UTC),
    updatedAt:    time.Date(2026, 5, 1, 11, 0, 0, 0, time.UTC), // before start
    description:  "inverted timestamps (clock skew / bad data)",
}

// The call may fail for other reasons (no GitHub auth, no workflows), but must
// not fail with the days-validation error.
if err != nil {
assert.NotContains(t, err.Error(), "invalid days value",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The negative check for valid days values looks for "invalid days value", but the positive check for invalid days values (line 329) asserts the error contains "must be 7 or 30". These are different strings.

If the production error message changes to something like "invalid days value: must be 7 or 30", this negative assertion could fail for the wrong reason or silently stop being useful if the message format changes in the opposite direction.

💡 Fix

Mirror the same substring used in the positive check:

// Valid days must NOT return the days-validation error
if err != nil {
    assert.NotContains(t, err.Error(), "must be 7 or 30",
        "P11: days=%d is valid and must not trigger the days-validation error", days)
}

"P7: nil AIC observations must return nil (R-MC-011)")

// Empty observations → runMonteCarlo must return nil.
assert.Nil(t, runMonteCarlo([]int{}, 0, 5.0, rng),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] P7 tests nil and []int{} but misses the case of a non-empty slice where all AIC values are zero: []int{0, 0, 0}. According to forecast.go:593, the engine filters runAIC ≤ 0 → continue at the call site before populating the aicObservations slice. If that filter is accidentally removed or bypassed, runMonteCarlo([]int{0,0,0}, ...) would receive a non-nil, non-empty slice and proceed — covering this in P7 would catch the regression.

💡 Suggestion
// If the engine ever passes zero-AIC observations through, MC should still return
// a near-zero (not nil) result — or document explicitly if nil is expected here.
rng3 := rand.New(rand.NewSource(42))
mc := runMonteCarlo([]int{0, 0, 0}, 3, 5.0, rng3)
// Assert the expected behaviour — either nil (if the engine guards inside runMonteCarlo)
// or a near-zero mean (if filtering is the caller's responsibility).
assert.InDelta(t, 0.0, mc.MeanProjectedAIC, 1e-9,
    "P7: all-zero AIC observations → MC mean must be zero")

This also serves as a regression guard for the fix in P12.

// - TLA+ invariants: P1, P5, P7, P8, P9
// - F* pre/post conditions: P2, P3, P4, P6, P10, P11
// - Z3-SMT schema gap: P12, P13

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The comment references a "Formal Model" section of issue #40114 and classifies predicates by TLA+, F*, and Z3-SMT, but specs/forecast-compliance-fixtures/README.md does not contain a Formal Model section or any TLA+/F*/Z3 content. This creates a documentation trail that leads nowhere.

Additionally, the README's "How to Run Compliance Tests" section still points to forecast_montecarlo_test.go and forecast_test.go and uses go test -run TestMonteCarlo, but the new tests use the TestFormal_* prefix in a new file — the README needs to be updated to stay accurate.

💡 Two small fixes
  1. Either update specs/forecast-compliance-fixtures/README.md to add the Formal Model section (or a link to the issue) so the reference resolves, or simplify this header comment to remove the TLA+/F*/Z3 classification since those artifacts don't exist in the repo.

  2. Add to the README's "How to Run Compliance Tests":

# Run the formal compliance predicates (P1–P13):
go test -v -run "TestFormal_" ./pkg/cli/

And update the file list to include forecast_compliance_fixtures_formal_test.go.

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test quality analysis by Test Quality Sentinel ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 81/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). No coding-guideline violations detected (build tag present, no mock libraries, all assertions carry descriptive messages).

…ix stale frontmatter hash

The 9441310 commit on main updated the steps in daily-security-observability.md
but manually patched the lock file without recompiling. Since gh-aw workflow steps
are part of the frontmatter, the frontmatter_hash in the lock file became stale
(a869be8d... vs correct 0951baf3...). The CI checks out the merge commit (PR branch
merged with main), sees the updated md but old lock hash, and TestHashConsistencyAcrossLockFiles
fails. Fix: pull the updated md from main and recompile to get the correct hash.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

@copilot fix https://github.com/github/gh-aw/actions/runs/27797818411/job/82261367508

Fixed in commit 9e7c1d6.

The root cause: commit 9441310fb4 on main updated daily-security-observability.md steps (which live inside the frontmatter in gh-aw workflows) but manually patched the lock file instead of recompiling — leaving the frontmatter_hash metadata stale (a869be8d...). The CGO CI checks out the merge commit (PR branch + main), sees the updated .md computing to 0951baf3... but the lock file still holding a869be8d..., and TestHashConsistencyAcrossLockFiles fails.

Fix: pulled the updated daily-security-observability.md from main into this branch and recompiled to produce a lock file with the correct frontmatter_hash: 0951baf3....

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot merge main and recompile

Copilot AI and others added 2 commits June 19, 2026 01:21
…orecast-compliance-fixtures-readme

# Conflicts:
#	.github/workflows/daily-security-observability.lock.yml

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…versions (0.27.4→0.27.6)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

@copilot merge main and recompile

Done. Merged origin/main (253a786), resolved the lock file conflict by recompiling — the daily-security-observability.lock.yml manifest is now updated with the latest container pins (gh-aw-firewall 0.27.4→0.27.6, gh-aw-mcpg v0.3.26→v0.3.27) while keeping the correct frontmatter hash (0951baf3). All 250 workflows compiled cleanly and make agent-report-progress passes.

@github-actions github-actions Bot mentioned this pull request Jun 19, 2026
@pelikhan pelikhan merged commit 3235a3c into main Jun 19, 2026
29 checks passed
@pelikhan pelikhan deleted the copilot/formal-spec-forecast-compliance-fixtures-readme branch June 19, 2026 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants