Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/daily-formal-spec-verifier.lock.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 8 additions & 1 deletion .github/workflows/daily-formal-spec-verifier.md
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,13 @@ Use existing types, functions, and interfaces from the codebase where possible (

Create exactly one issue using the `create_issue` safe output.

### Output Contract (Required)

1. Draft the title and body locally first if needed, but emit exactly one final `create_issue` safe output only after the full payload is complete.
2. Do **not** use `bash`, `cli-proxy`, or the `safeoutputs` CLI to create the issue or inspect the tool schema. Emit the safe output directly with `title` and `body` arguments.
3. Never retry `create_issue` with empty, placeholder, or partial arguments.
4. If the quality checks below cannot be met, emit `report_incomplete` directly as a safe output instead of `create_issue`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contradictory primary directive: the unchanged instruction above this section ("Create exactly one issue using the create_issue safe output.") is unconditional; adding Output Contract #4 as a subordinate clause does not override it, and an agent reasoning top-down may always choose create_issue.

💡 Suggested fix

Change the standalone instruction (~line 175) from:

Create exactly one issue using the `create_issue` safe output.

to:

Emit exactly one final safe output: use `create_issue` if the quality bar is met, or `report_incomplete` if it cannot be met.

This makes both allowed paths explicit at the top level rather than introducing an exception seven lines later inside a sub-section, which an agent can reasonably interpret as advisory rather than authoritative.


### Issue format

Title: `[formal-spec] <SpecFileName> — Formal model & test suite — <YYYY-MM-DD>`
Expand Down Expand Up @@ -239,7 +246,7 @@ Before emitting `create_issue`, verify the body:
- The generated test file compiles without errors (review for syntax mistakes).
- Is at least 1200 characters long.

If these checks cannot be met, emit `report_incomplete` instead of `create_issue`.
If these checks cannot be met, emit `report_incomplete` directly as a safe output instead of `create_issue`.

---

Expand Down
29 changes: 29 additions & 0 deletions pkg/workflow/prompts_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -324,6 +324,35 @@ func TestDailyCavemanOptimizerUsesConcreteClaudeModelsForExperiment(t *testing.T
}
}

func TestDailyFormalSpecVerifierDefinesDirectSafeOutputContract(t *testing.T) {
repoRoot, err := findRepoRoot()
if err != nil {
t.Fatalf("Failed to find repo root: %v", err)
}

workflowFile := filepath.Join(repoRoot, ".github", "workflows", "daily-formal-spec-verifier.md")
content, err := os.ReadFile(workflowFile)
if err != nil {
t.Fatalf("Failed to read workflow file: %v", err)
}

workflow := string(content)
requiredContract := "Draft the title and body locally first if needed, but emit exactly one final `create_issue` safe output only after the full payload is complete."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The full-sentence verbatim string makes this assertion brittle — a typo fix or minor reword in the workflow breaks the test even when the contract semantics are unchanged.

💡 Consider a shorter key-phrase

A shorter key substring captures the intent with less fragility:

// Instead of copying the full sentence:
requiredContract := "emit exactly one final `create_issue` safe output only after the full payload is complete"

This is a pattern-level observation; the existing convention in this file does use full sentences. Worth discussing as a team norm if the tests start drifting out of sync with minor prompt rewords.

if !strings.Contains(workflow, requiredContract) {
t.Fatal("Expected daily-formal-spec-verifier workflow to require a single final create_issue safe output")
}

noShellGuidance := "Do **not** use `bash`, `cli-proxy`, or the `safeoutputs` CLI to create the issue or inspect the tool schema. Emit the safe output directly with `title` and `body` arguments."
if !strings.Contains(workflow, noShellGuidance) {
t.Fatal("Expected daily-formal-spec-verifier workflow to forbid bash/CLI safe-output invocation")
}

reportIncompleteGuidance := "If the quality checks below cannot be met, emit `report_incomplete` directly as a safe output instead of `create_issue`."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage gap: this assertion covers Output Contract item #4 ("quality checks below") but the quality-bar paragraph updated at line ~249 ("these checks") uses different wording and is never verified — a silent revert of that line would leave all three assertions green.

💡 Suggested fix

Add a fourth assertion after line 353 targeting the quality-bar wording specifically:

qualityBarFallback := "If these checks cannot be met, emit `report_incomplete` directly as a safe output instead of `create_issue`."
if !strings.Contains(workflow, qualityBarFallback) {
    t.Fatal("Expected daily-formal-spec-verifier quality bar to require direct report_incomplete fallback")
}

This closes the gap: the existing reportIncompleteGuidance check locks in the Output Contract section; this new check locks in the quality-bar paragraph update. Together they ensure both locations stay consistent.

if !strings.Contains(workflow, reportIncompleteGuidance) {
t.Fatal("Expected daily-formal-spec-verifier workflow to require direct report_incomplete fallback")
}
Comment on lines +339 to +353
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Rule 3 (Never retry \create_issue` with empty, placeholder, or partial arguments`) has no assertion here — if it is accidentally removed or rephrased, no test will catch it.

💡 Suggested addition (before the closing `}`)
noRetryGuidance := "Never retry `create_issue` with empty, placeholder, or partial arguments."
if !strings.Contains(workflow, noRetryGuidance) {
	t.Fatal("Expected daily-formal-spec-verifier workflow to forbid retrying create_issue with partial args")
}

Rules 1, 2, and 4 are already covered; this closes the gap for rule 3.


func TestDailyCacheStrategyAnalyzerUsesCodexCompatibleModelsForExperiment(t *testing.T) {
repoRoot, err := findRepoRoot()
if err != nil {
Expand Down
Loading