Use failure-specific issue titles in conclusion job #38615

Merged
pelikhan merged 5 commits into main from copilot/customize-agent-failure-titles
Jun 11, 2026

Conversation

Copilot AI commented Jun 11, 2026

Contributor

This PR makes failure issues from the conclusion job more specific by deriving titles from the detected failure mode instead of always emitting [aw] <workflow> failed. It improves triage signal and issue scanability without changing existing fallback behavior.

  • What changed

    • Added buildFailureIssueTitle(...) in actions/setup/js/handle_agent_failure.cjs.
    • Replaced static title generation with failure-aware title selection.
    • Kept existing generic fallback when no specific condition matches.
  • Title mapping added

    • Timeout → [aw] <workflow> timed out
    • Missing safe outputs → [aw] <workflow> produced no safe outputs
    • Report incomplete → [aw] <workflow> reported incomplete result
    • Plus targeted variants for budget/rate-limit and selected infra/auth failure signals, with priority ordering.
  • Test updates

    • Extended actions/setup/js/handle_agent_failure.test.cjs with focused assertions for:
      • timeout title
      • missing safe outputs title
      • report_incomplete title
    • Tightened env var cleanup in new test paths to avoid cross-test leakage.

const issueTitle = buildFailureIssueTitle({
  workflowName: sanitizedWorkflowName,
  isTimedOut,
  hasMissingSafeOutputs,
  hasReportIncomplete,
  hasMissingTool,
  hasMissingData,
  hasCacheMissMisconfiguration,
  hasToolDenialsExceeded,
  hasAppTokenMintingFailed,
  hasLockdownCheckFailed,
  hasStaleLockFileFailed,
  hasDailyAICExceeded,
  aiCreditsRateLimitError,
  maxAICreditsExceeded,
});
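
For readers who want a picture of how the title selection might work, here is a minimal sketch of a first-match-wins rule table. It is not the merged implementation: the function name `buildFailureIssueTitleSketch`, the `TITLE_RULES` table, the workflow name `nightly-triage`, and in particular the rule order are assumptions made for illustration. Only the flag names and title suffixes are taken from this PR, and only a subset of them is shown.

```javascript
// Sketch only: the rule order below is an assumption, not the merged code.
// Flag names come from the call site above; suffixes are titles quoted in this PR.
const TITLE_RULES = [
  ["isTimedOut", "timed out"],
  ["hasToolDenialsExceeded", "exceeded tool denial limit"],
  ["hasCacheMissMisconfiguration", "has cache-memory path mismatch"],
  ["hasReportIncomplete", "reported incomplete result"],
  ["hasMissingSafeOutputs", "produced no safe outputs"],
  ["hasMissingTool", "is missing required tool"],
  ["hasMissingData", "is missing required data"],
];

// Hypothetical stand-in for buildFailureIssueTitle.
function buildFailureIssueTitleSketch(options) {
  // The first rule whose flag is truthy wins, so array order is the priority order.
  const rule = TITLE_RULES.find(([flag]) => Boolean(options[flag]));
  // With no matching flag, the existing generic title is kept.
  const suffix = rule ? rule[1] : "failed";
  return `[aw] ${options.workflowName} ${suffix}`;
}

console.log(buildFailureIssueTitleSketch({ workflowName: "nightly-triage", isTimedOut: true }));
// prints "[aw] nightly-triage timed out"
console.log(buildFailureIssueTitleSketch({ workflowName: "nightly-triage" }));
// prints "[aw] nightly-triage failed"
```

Keeping the rules in one ordered table makes the priority explicit and easy to review, at the cost of a small indirection compared with a chain of `if` statements.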

Copilot AI and others added 3 commits June 11, 2026 13:10
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "Customize agent failure issue titles in conclusion handler" to "Use failure-specific issue titles in conclusion job" Jun 11, 2026
Copilot AI requested a review from pelikhan June 11, 2026 13:26
@pelikhan pelikhan marked this pull request as ready for review June 11, 2026 13:29
Copilot AI review requested due to automatic review settings June 11, 2026 13:29
@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

🧠 Matt Pocock Skills Reviewer was skipped during the skills-based review.

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #38615 does not have the implementation label and has 0 new lines of code in business logic directories (threshold 100).

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

🧪 Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the conclusion-job failure issue creation flow to emit failure-specific GitHub issue titles (derived from detected failure signals) instead of always using a generic "[aw] <workflow> failed" title, improving triage and scanability.

Changes:

  • Added buildFailureIssueTitle(...) to derive a more specific issue title based on prioritized failure signals (timeout, missing safe outputs, report incomplete, budget/rate-limit, etc.).
  • Switched main() to use buildFailureIssueTitle(...) when creating/updating failure issues.
  • Expanded handle_agent_failure.test.cjs with assertions for the new timeout / missing-safe-outputs / report-incomplete title variants, plus tighter env cleanup.
Show a summary per file
| File | Description |
| --- | --- |
| actions/setup/js/handle_agent_failure.cjs | Adds failure-aware issue title builder and wires it into the failure-issue creation path. |
| actions/setup/js/handle_agent_failure.test.cjs | Adds focused tests validating the new title selection behavior and improves env cleanup. |
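
The env cleanup mentioned for the test file can be pictured with a small save-and-restore helper. This is a sketch under assumptions, not the code in `handle_agent_failure.test.cjs`: the helper name `withEnv` and the variable `EXAMPLE_AGENT_TIMED_OUT` are hypothetical, and the helper only handles synchronous callbacks (an async test body would need an `await` inside the `try`).

```javascript
// Hypothetical helper: run fn with temporary env vars, then restore the prior state.
function withEnv(overrides, fn) {
  const saved = {};
  for (const key of Object.keys(overrides)) {
    saved[key] = process.env[key]; // undefined when the variable was not set
    process.env[key] = overrides[key];
  }
  try {
    return fn();
  } finally {
    // Runs even if fn throws, so a failing test cannot leak state into the next one.
    for (const key of Object.keys(overrides)) {
      if (saved[key] === undefined) delete process.env[key];
      else process.env[key] = saved[key];
    }
  }
}

// EXAMPLE_AGENT_TIMED_OUT is an illustrative name, not a variable from this PR.
const seen = withEnv({ EXAMPLE_AGENT_TIMED_OUT: "true" }, () => process.env.EXAMPLE_AGENT_TIMED_OUT);
console.log(seen);
```

Deleting a variable that was previously unset matters here: assigning `undefined` to a `process.env` key stores the string "undefined" rather than clearing it.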

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment on lines +239 to +243
if (options.isTimedOut) return `[aw] ${workflowName} timed out`;
if (options.hasToolDenialsExceeded) return `[aw] ${workflowName} exceeded tool denial limit`;
if (options.hasCacheMissMisconfiguration) return `[aw] ${workflowName} has cache-memory path mismatch`;
if (options.hasReportIncomplete) return `[aw] ${workflowName} reported incomplete result`;
if (options.hasMissingSafeOutputs) return `[aw] ${workflowName} produced no safe outputs`;
if (options.hasStaleLockFileFailed) return `[aw] ${workflowName} has stale lock file`;
@github-actions github-actions Bot mentioned this pull request Jun 11, 2026
@github-actions

Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 3 test(s): 3 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (3 tests analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 3 |
| ✅ Design tests (behavioral contracts) | 3 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 3 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (ratio 1.78, under 2:1 threshold) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| uses a precise timeout title when the agent times out | actions/setup/js/handle_agent_failure.test.cjs:754 | ✅ Design | None |
| uses a precise missing safe outputs title | actions/setup/js/handle_agent_failure.test.cjs:787 | ✅ Design | None |
| uses a precise report incomplete title | actions/setup/js/handle_agent_failure.test.cjs:818 | ✅ Design | None |

Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 3 tests (vitest)

No Go test files were modified in this PR.

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 3 new tests verify observable behavioral contracts — specifically the precise issue title strings produced for distinct failure classes (timed_out, missing safe outputs, report_incomplete). GitHub API calls are mocked via vi.fn() (external I/O — legitimate). Test inflation ratio is 1.78 (under the 2:1 threshold).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27350333029

🧪 Test quality analysis by Test Quality Sentinel · 244.9 AIC · ⌖ 19.3 AIC

@github-actions github-actions Bot left a comment

Contributor

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 new tests verify behavioral contracts for the precise failure-title selection logic.

@github-actions github-actions Bot left a comment

Contributor

REQUEST_CHANGES — one blocking correctness bug, one maintainability gap

🔴 Blocking: cascade detection broken for all new precise titles

FAILURE_TITLE_PATTERN (/^\[aw\] .+ failed$/) and the findRecentFailureIssues search query ("failed" in:title) were not updated to match the 13 titles now produced by buildFailureIssueTitle. 11 of those 13 titles do not contain "failed", and the two that do (failed to mint GitHub App token, failed lockdown check) end mid-sentence so they also fail the $ anchor.

Result: a cascade of timed out, produced no safe outputs, exceeded budget, etc. issues will never be counted by cascade detection, silently suppressing the rollup that exists specifically to catch systemic failures.

Fix: relax FAILURE_TITLE_PATTERN to /^\[aw\] .+/ and drop the "failed" keyword from the cascade search query. See inline comment.
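
The mismatch is easy to reproduce in isolation. The snippet below runs both regular expressions quoted in this review against a handful of titles from this PR; the constant names `STRICT_PATTERN` and `RELAXED_PATTERN` and the workflow name `nightly-triage` are made up for the example.

```javascript
// Patterns as quoted in this review; the constant names are illustrative.
const STRICT_PATTERN = /^\[aw\] .+ failed$/; // current FAILURE_TITLE_PATTERN
const RELAXED_PATTERN = /^\[aw\] .+/; // proposed replacement

// A few of the titles discussed in this PR, with a hypothetical workflow name.
const titles = [
  "[aw] nightly-triage failed",
  "[aw] nightly-triage timed out",
  "[aw] nightly-triage produced no safe outputs",
  "[aw] nightly-triage failed lockdown check",
];

const strictMatches = titles.filter(title => STRICT_PATTERN.test(title));
const relaxedMatches = titles.filter(title => RELAXED_PATTERN.test(title));

console.log(strictMatches); // only the generic "... failed" title
console.log(strictMatches.length, relaxedMatches.length); // prints 1 4
```

The relaxed pattern accepts any title that starts with `[aw] `, so after this change the label and date filters in the search query do most of the narrowing.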

🟡 Maintainability: `buildFailureIssueTitle` not exported; 10 of 13 branches uncovered

buildFailureMatchCategories (the direct sibling) is exported and unit-tested. buildFailureIssueTitle is not exported, so its branches can only be exercised through the heavyweight main() integration path. The three new tests cover only isTimedOut, hasMissingSafeOutputs, and hasReportIncomplete; the highest-priority branches (hasDailyAICExceeded, maxAICreditsExceeded, ...) have zero coverage for the actual title strings they emit. A single typo in those strings goes undetected. See inline comment.

🔎 Code quality review by PR Code Quality Reviewer · ⌖ 13.4 AIC

if (options.hasMissingSafeOutputs) return `[aw] ${workflowName} produced no safe outputs`;
if (options.hasMissingTool) return `[aw] ${workflowName} is missing required tool`;
if (options.hasMissingData) return `[aw] ${workflowName} is missing required data`;
return `[aw] ${workflowName} failed`;

Contributor

Cascade detection is now blind to 11 of 13 new titles: FAILURE_TITLE_PATTERN at line 2072 (/^\[aw\] .+ failed$/) matches only the generic fallback, and the cascade search query at line 2124 hardcodes "failed" in:title. Every new precise title except the two failed * ones will be silently skipped by findRecentFailureIssues, so a flood of timed out, produced no safe outputs, exceeded budget, etc. issues will never trigger cascade rollup.

💡 Suggested fix

Update the constant and its comment (line 2072):

/** Matches all individual failure issue titles produced by handle_agent_failure */
const FAILURE_TITLE_PATTERN = /^\[aw\] .+/;

And the search query at line 2124 — drop the "failed" keyword so new-style titles are returned by the GitHub search API:

const searchQuery = `repo:${owner}/${repo} is:issue is:open label:agentic-workflows "[aw]" in:title created:>=${since}`;

Of the 13 possible titles this function returns, 11 do not contain the word "failed" at all, and the two that do (failed to mint ... / failed lockdown check) end mid-title so they also fail the $ anchor. Both the search filter and the regex need to widen.

* @param {boolean} options.maxAICreditsExceeded
* @returns {string}
*/
function buildFailureIssueTitle(options) {

Contributor

buildFailureIssueTitle is not exported, leaving 10 of 13 title branches untested at the unit level: its parallel function buildFailureMatchCategories is exported and can be exercised directly; buildFailureIssueTitle cannot. The three added integration tests cover only isTimedOut, hasMissingSafeOutputs, and hasReportIncomplete. The highest-priority branches — hasDailyAICExceeded, maxAICreditsExceeded, aiCreditsRateLimitError, hasAppTokenMintingFailed, hasLockdownCheckFailed, hasStaleLockFileFailed, hasToolDenialsExceeded, hasCacheMissMisconfiguration, hasMissingTool, hasMissingData — have no coverage for their title strings.

💡 Suggested fix

Add buildFailureIssueTitle to module.exports alongside buildFailureMatchCategories:

module.exports = {
  // ...
  buildFailureMatchCategories,
  buildFailureIssueTitle,   // add this
  FAILURE_CATEGORIES_PATH,
};

Then add lightweight unit tests that call buildFailureIssueTitle directly with each flag set in isolation, similar to how buildFailureMatchCategories is tested, so a typo in any branch title is caught immediately instead of at runtime.
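
A rough shape for those per-flag tests, written as a plain table-driven check so it runs without a test runner. The stand-in `titleStandIn` reproduces only three of the branches quoted in this PR, and the workflow name `wf` is arbitrary; in the repository the same table would more likely feed vitest cases against the exported `buildFailureIssueTitle`.

```javascript
// Stand-in with three of the branches quoted in this PR; the real function has more.
function titleStandIn(options) {
  const { workflowName } = options;
  if (options.isTimedOut) return `[aw] ${workflowName} timed out`;
  if (options.hasMissingTool) return `[aw] ${workflowName} is missing required tool`;
  if (options.hasMissingData) return `[aw] ${workflowName} is missing required data`;
  return `[aw] ${workflowName} failed`;
}

// One row per flag, each set in isolation, plus the fallback with no flags.
const cases = [
  [{ isTimedOut: true }, "[aw] wf timed out"],
  [{ hasMissingTool: true }, "[aw] wf is missing required tool"],
  [{ hasMissingData: true }, "[aw] wf is missing required data"],
  [{}, "[aw] wf failed"],
];

// Collect every row whose actual title differs from the expected string.
const mismatches = cases
  .map(([flags, expected]) => ({ expected, actual: titleStandIn({ workflowName: "wf", ...flags }) }))
  .filter(({ expected, actual }) => expected !== actual);

console.log(mismatches.length === 0 ? "all title branches match" : mismatches);
```

Because each row asserts the full title string, a typo in any single branch shows up as one failing row instead of surfacing later in a created issue.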

@pelikhan

Collaborator

@copilot run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 11, 2026

Contributor Author

@copilot run pr-finisher skill

Completed. I ran the PR-finisher flow, addressed in-scope review feedback, and pushed commit f223fd8 with the fixes.

@pelikhan pelikhan merged commit 92bb513 into main Jun 11, 2026
13 checks passed
@pelikhan pelikhan deleted the copilot/customize-agent-failure-titles branch June 11, 2026 14:31

3 participants