feat: split triage and pr-fixer into separate GHA workflows #1177

Merged: Gkrumbach07 merged 2 commits into main from feat/triage-and-pr-fixer-gha on Apr 2, 2026
@Gkrumbach07 Gkrumbach07 commented Apr 2, 2026

Summary

  • Split the monolithic pr-fixer.yml into two focused workflows with distinct cadences
  • triage.yml — daily issue discovery and session creation
  • pr-fixer.yml — 30 min PR management with session reuse and circuit breaker

Changes

triage.yml (new)

  • Daily cron (8am UTC weekdays) + manual dispatch
  • Queries Jira + GH issues for untriaged work (newest first, max 5/cycle)
  • Creates child Implement sessions (opus) for actionable issues
  • Skips Jira gracefully if MCP tool unavailable
  • Labels: adds ai-triaged to issues, child sessions add ai-managed to PRs

pr-fixer.yml (rewritten)

  • fix-single: triggered only by @ambient-fix PR comment (was also label + manual)
  • fix-batch: 30 min cron + manual dispatch, single orchestrator session manages all ai-managed PRs
    • Reads frontmatter for session ID linkage
    • Checks for changes since last_action (ignores bot commits)
    • Sends messages to existing sessions (reuse) instead of creating new ones
    • Circuit breaker: 3 retries then ai-needs-human label
  • Removed old matrix-based fix-batch/fix-each pattern
  • Unified label: ai-managed (was agent-managed)
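The session linkage above depends on reading key/value pairs back out of the PR body. A minimal sketch of that lookup, assuming the frontmatter is a single-line HTML comment of the form `<!-- acp:session_id=... source=... last_action=... retry_count=... -->` as it appears in the orchestrator prompt quoted in the review; the function name `frontmatter_get` is hypothetical and is not part of the workflow:

```shell
# Print the value of one key from the acp frontmatter comment in a PR body.
# Prints nothing when the comment or the key is absent.
# Assumes simple alphanumeric keys and values without spaces.
frontmatter_get() {
  # $1 = PR body text, $2 = key name (for example session_id or retry_count)
  printf '%s\n' "$1" |
    sed -n "s/.*<!-- acp[^>]*[: ]$2=\([^ ]*\).*/\1/p" |
    head -n 1
}
```

The `[: ]` guard requires the key to follow either the `acp:` prefix or a space, so a key such as `source` would not match inside a longer word. In a workflow step the body could come from `gh pr view <number> --json body --jq '.body'`, as the reviewed step does.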

Both workflows

  • Use ambient-action@v0.0.3 with inactivity timeout (timeout: '60' = 60s idle auto-stop)
  • Model: claude-opus-4-6

Test plan

  • Manual dispatch triage.yml — verify it discovers issues and creates sessions
  • Comment @ambient-fix on a PR — verify fix-single triggers
  • Manual dispatch pr-fixer.yml (no PR number) — verify batch orchestrator runs
  • Verify ai-managed label triggers batch pickup on next cycle
  • Verify circuit breaker fires after 3 retries

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • PR fixer runs more frequently (every 30 minutes on weekdays), uses an upgraded model, and includes improved session state and retry handling.
    • Concurrency controls added to coordinate fixer and triage runs.
  • New Features
    • Added an automated Issue Triage workflow to regularly or manually triage GitHub and Jira items, label/comment, and create follow-up investigation sessions with session summaries.

- triage.yml: daily cron (8am UTC weekdays), discovers untriaged
  Jira/GH issues, creates Implement sessions (max 5/cycle)
- pr-fixer.yml: 30 min cron, manages all ai-managed PRs with
  session reuse, change detection, and circuit breaker
  - fix-single: triggered only by @ambient-fix comment
  - fix-batch: scheduled + manual dispatch
- Both use ambient-action@v0.0.3 with inactivity timeout (60s)
- Unified label: ai-managed (was agent-managed)
- Removed old matrix-based batch pattern
- Skip Jira gracefully if MCP tool unavailable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 98c80567-e61c-4ddb-8e9f-b6ebc0587c6f

📥 Commits

Reviewing files that changed from the base of the PR and between ef7e2c0 and c97a503.

📒 Files selected for processing (2)
  • .github/workflows/pr-fixer.yml
  • .github/workflows/triage.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/triage.yml

📝 Walkthrough

PR fixer workflow reworked to run fixes from issue comments and a single orchestrator batch job using session-frontmatter prompts and Claude Opus; batching now uses an orchestrator prompt with retry/circuit-breaker logic. Added a scheduled Issue Triage workflow that queries Jira/GitHub and spawns child investigation/fix sessions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| PR Fixer Workflow (.github/workflows/pr-fixer.yml) | Replaced pull_request:labeled and label/workflow-dispatch logic with issue_comment-only invocation for fix-single. Consolidated batch design into a single fix-batch orchestrator job; removed matrix fan-out and agent-managed label workflow. Bumped ambient-code/ambient-action to v0.0.3, model → claude-opus-4-6, increased timeouts (25→60s, 30→60s). Added concurrency group, session/frontmatter-driven prompts (writes <!-- acp:session_id=... -->), retry circuit-breaker (retry_count >= 3), and schedule cadence changed to every 30 minutes (weekdays). |
| Issue Triage Workflow (.github/workflows/triage.yml) | New workflow scheduled weekdays 08:00 UTC (and manual dispatch). Uses ambient-code/ambient-action@v0.0.3 with claude-opus-4-6 prompt to fetch up to five untriaged Jira items (fallback to GitHub), skip externally-managed PRs, create child investigation/fix sessions, label items ai-triaged, and comment with session links. Includes a Session summary step that writes outputs to $GITHUB_STEP_SUMMARY. |

Sequence Diagram(s)

sequenceDiagram
    participant IssueComment as Issue Comment Event
    participant GitHubAPI as GitHub API
    participant PRBody as PR Body Parser
    participant AmbientAction as ambient-action v0.0.3
    participant Claude as Claude Opus Model
    participant SessionMgmt as Session Management

    IssueComment->>GitHubAPI: event payload (issue_comment)
    GitHubAPI-->>PRBody: fetch PR/issue body
    PRBody->>PRBody: extract acp:session_id and frontmatter
    PRBody-->>AmbientAction: invoke fix-single with session context
    AmbientAction->>Claude: run session-driven fix prompt
    Claude->>SessionMgmt: read/create session state
    Claude-->>AmbientAction: return fix operations & updated frontmatter
    AmbientAction->>GitHubAPI: apply PR body updates/comments
    GitHubAPI-->>AmbientAction: confirm update
sequenceDiagram
    participant Schedule as Scheduler (cron/dispatch)
    participant AmbientAction as ambient-action v0.0.3
    participant Jira as Jira (MCP)
    participant GitHub as GitHub Issues
    participant Claude as Claude Opus Model
    participant ChildSession as Child Session Creator

    Schedule->>AmbientAction: trigger triage job
    AmbientAction->>Jira: query untriaged items (max 5)
    alt Jira available
        Jira-->>AmbientAction: return items
    else Jira unavailable
        AmbientAction->>GitHub: query untriaged issues
        GitHub-->>AmbientAction: return items
    end
    AmbientAction->>Claude: run triage orchestrator prompt
    Claude->>GitHub: check for ai-managed PRs / existing sessions
    Claude->>ChildSession: create investigation/fix child sessions as needed
    ChildSession-->>Claude: return session name & UID
    Claude->>GitHub: add `ai-triaged` label and comment with session link
    GitHub-->>AmbientAction: confirm updates
    AmbientAction->>Schedule: write session summary to step summary
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title follows Conventional Commits format (feat: prefix) and accurately describes the main change: splitting a monolithic workflow into two separate workflows (triage.yml and pr-fixer.yml). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Performance And Algorithmic Complexity | ✅ Passed | Workflow changes use bounded iteration patterns with explicit limits (200 PRs, 5 items/cycle) and no unbounded N+1, O(n²)+, or caching issues. |
| Security And Secret Handling | ✅ Passed | Both workflows securely handle authentication using GitHub secrets mechanism with automatic masking. No plaintext credentials, hardcoded tokens, API keys, or injection vulnerabilities detected. |
| Kubernetes Resource Safety | ✅ Passed | PR modifies only GitHub Actions workflow files, which are CI/CD configurations not subject to Kubernetes resource safety checks. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-fixer.yml:
- Around line 50-61: The existing session lookup step (id: existing) writes
session_id and source to GITHUB_OUTPUT but those outputs are never consumed;
update later steps to read steps.existing.outputs.session_id and
steps.existing.outputs.source where session continuity is needed, and stop
unconditionally resetting retry_count to 0 — instead only initialize retry_count
when no session_id was found (e.g., if: steps.existing.outputs.session_id ==
''), or pass the previous retry_count through the workflow when a session
exists; in short, wire the existing step's outputs into the logic that decides
session continuation and remove the hard-reset of retry_count so an `@ambient-fix`
resumes the prior session/retry state.
- Around line 8-13: The scheduled workflow can overlap with long-running runs;
update the workflow to prevent concurrent runs by adding a top-level concurrency
block (e.g., concurrency: group: "pr-fixer-${{ github.ref }}"
cancel-in-progress: false) so new triggers (from schedule or workflow_dispatch)
will queue until the current run finishes, or alternatively change the schedule
cron to an interval >= the max run time (e.g., hourly) to avoid overlap; apply
this change near the existing schedule and workflow_dispatch entries so all
triggers are serialized.
- Around line 141-173: The workflow currently increments retry_count
unconditionally after sending a message ("After sending: increment
retry_count..."), which treats normal churn as failure; change this so
retry_count is incremented only when the agent fails to make progress (no new
external commits, no new/updated review comments, CI still failing, merge
conflicts unresolved) or when the agent explicitly reports a failure state, and
reset or leave retry_count unchanged when the agent's action produced a
meaningful change; update the logic around the session management step that
references retry_count/last_action/ai-needs-human (the "After sending" block and
any code that updates frontmatter like <!-- acp:session_id=... last_action=...
retry_count=... -->) to base increments on a success flag or a comparison of
repo state before/after the agent run rather than incrementing unconditionally.
- Around line 68-71: The workflow currently uses the mutable tag "uses:
ambient-code/ambient-action@v0.0.3" in two places (the "Fix PR" step and the
other step later) which weakens supply-chain security; update both "uses:
ambient-code/ambient-action@v0.0.3" occurrences to pin the action to the
corresponding full commit SHAs (replace `@v0.0.3` with @<commit-sha> for each
occurrence), ensuring you fetch the exact commit SHAs from the
ambient-code/ambient-action repository and update both instances consistently.
- Around line 131-133: The gh CLI call that lists PRs uses the default of 30
results; update the Run command that invokes `gh pr list --repo ${{
github.repository }} --state open --label ai-managed --search "draft:false"` to
include an explicit limit (e.g. add `--limit 1000`) so all matching PRs are
returned; modify the `gh pr list` invocation to `gh pr list ... --search
"draft:false" --limit 1000` (or another appropriate numeric limit) to ensure no
PRs are silently omitted.

In @.github/workflows/triage.yml:
- Around line 3-10: Add a top-level concurrency guard to the triage workflow to
serialize scheduled and manual runs: insert a concurrency block (e.g.,
concurrency: group: 'triage-${{ github.workflow }}' cancel-in-progress: false)
so that runs triggered by schedule and workflow_dispatch cannot overlap and will
queue instead of racing; update the .github/workflows/triage.yml file around the
existing on: schedule and workflow_dispatch sections to include this concurrency
configuration.
- Around line 20-23: The workflow step named "Run triage orchestrator" (id:
session) currently pins the action as ambient-code/ambient-action@v0.0.3;
replace the tag with the specific full commit SHA for
ambient-code/ambient-action (e.g.,
ambient-code/ambient-action@<full-commit-sha>) so the action is immutably
pinned; update the uses value in that step accordingly and commit the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae0efb6c-68d1-4b14-bfb5-4c0fbac18bc4

📥 Commits

Reviewing files that changed from the base of the PR and between 4b928a0 and ef7e2c0.

📒 Files selected for processing (2)
  • .github/workflows/pr-fixer.yml
  • .github/workflows/triage.yml

Comment on lines +50 to +61
- name: Check for existing session
  if: steps.fork_check.outputs.skip != 'true'
  id: existing
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    # Read PR body and extract session_id from frontmatter
    BODY=$(gh pr view ${{ steps.pr.outputs.number }} --repo "${{ github.repository }}" --json body --jq '.body')
    SESSION_ID=$(echo "$BODY" | grep -oP 'acp:session_id=\K[^ ]+' | head -1 || echo "")
    SOURCE=$(echo "$BODY" | grep -oP 'source=\K[^ ]+' | head -1 || echo "")
    echo "session_id=$SESSION_ID" >> $GITHUB_OUTPUT
    echo "source=$SOURCE" >> $GITHUB_OUTPUT

⚠️ Potential issue | 🟠 Major

The existing-session lookup currently has no effect.

Line 60 stores session_id, but nothing later reads steps.existing.outputs.session_id. Line 89 also hard-resets retry_count=0, so an @ambient-fix on an already-managed PR loses its prior session/retry state instead of continuing it.

Also applies to: 81-90
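One way the reviewer's suggestion could be wired is sketched below: when a session ID was already found in the PR body, the frontmatter keeps that ID and its retry count, and the count is initialized to 0 only for a brand-new session. The function name `build_frontmatter` and its argument order are hypothetical. Note that the follow-up commit on this PR took a different route and removed the unused lookup step from fix-single.

```shell
# Build the frontmatter comment, preserving prior state when a session exists.
# Args: existing_session_id existing_retry_count new_session_id source now
build_frontmatter() {
  local existing_id="$1"
  local existing_retries="$2"
  local new_id="$3"
  local source_key="$4"
  local now="$5"
  local session_id retries

  if [ -n "$existing_id" ]; then
    # Resume: keep the linked session and whatever retry count it had.
    session_id="$existing_id"
    retries="${existing_retries:-0}"
  else
    # First contact: start a fresh session with a zeroed counter.
    session_id="$new_id"
    retries=0
  fi

  printf '<!-- acp:session_id=%s source=%s last_action=%s retry_count=%s -->\n' \
    "$session_id" "$source_key" "$now" "$retries"
}
```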

Comment on lines 68 to +71
      - name: Fix PR
        if: steps.fork_check.outputs.skip != 'true'
        id: session
-       uses: ambient-code/ambient-action@v0.0.2
+       uses: ambient-code/ambient-action@v0.0.3

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /dev/null 2>&1; pwd > /dev/null; cat -n ".github/workflows/pr-fixer.yml" | head -130

Repository: ambient-code/platform

Length of output: 6302


Pin both ambient-code/ambient-action references to full commit SHAs instead of @v0.0.3.

Semantic version tags are mutable and weaken supply-chain security for a workflow that modifies PR state and accesses secrets. Replace with pinned commit SHAs at lines 71 and 123.
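A small check like the following could flag tag-pinned actions before they reach review. It is a sketch under stated assumptions, not part of this PR: the function name `unpinned_actions` is hypothetical, it assumes unquoted `uses:` values, and it treats only a 40-character lowercase hex ref as pinned.

```shell
# List every "uses:" reference in a workflow file that is not pinned to a
# full 40-character commit SHA. Local actions (./path) and docker://
# references are skipped. Exits non-zero when nothing is unpinned.
unpinned_actions() {
  # $1 = path to a workflow file
  grep -E '^[[:space:]-]*uses:[[:space:]]*' "$1" |
    sed -E 's/^[[:space:]-]*uses:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' |
    grep -Ev '^(\./|docker://)' |
    grep -Ev '@[0-9a-f]{40}$'
}
```

Running `unpinned_actions .github/workflows/pr-fixer.yml` against the reviewed revision would be expected to list the `@v0.0.3` references the comment points at. The commit SHA that corresponds to the tag still has to be looked up in the ambient-code/ambient-action repository.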

Comment on lines +141 to +173
## 2. Circuit breaker
If retry_count >= 3: comment "AI was unable to resolve after 3 attempts. Needs human attention.",
add ai-needs-human label, remove ai-managed label. Skip.

## 3. Check for changes since last_action
Ignore commits authored by the bot. Only look for:
- New commits by someone other than the bot
- New or updated review comments
- New CI failures
- Merge conflicts from base branch changes

If nothing changed → skip entirely.

## 4. Something changed — act
- CI failing → send message to existing session with CI logs
- New review comments → send message with the comments
- Merge conflicts → send message to rebase
- New external commits → send message to review and ensure CI passes

## 5. Session management
Before sending a message, check session status:
- Running → send the message
- Stopped → restart (reuse), then send
- Not found → create new session with this prompt:
"You are maintaining an open pull request.
PR: <URL> Source issue: <KEY> (if known)
1. Check out the PR branch.
2. Resolve merge conflicts, fix CI failures, address review comments.
3. Push fixes. Do not merge/close/force-push.
4. Write frontmatter: <!-- acp:session_id=$AGENTIC_SESSION_NAME source=<KEY> last_action=<NOW> retry_count=0 -->
5. If broken beyond repair, comment and stop."

After sending: increment retry_count and update last_action in frontmatter.

⚠️ Potential issue | 🟠 Major

The circuit breaker currently counts normal churn as failure.

Line 173 increments retry_count after every message, regardless of outcome. Three ordinary rounds of new comments/CI churn will trip ai-needs-human even if the agent is making progress.

Suggested prompt tweak
-            After sending: increment retry_count and update last_action in frontmatter.
+            After sending:
+            - If the PR is still broken after your attempted fix, increment retry_count.
+            - If the PR is healthy again, reset retry_count to 0.
+            - Update last_action in frontmatter.
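The counting rule in the suggested tweak can be stated as a pair of small functions. This is an illustrative sketch only: in the workflow the rule lives in the orchestrator prompt rather than in shell, the function names are hypothetical, and how a PR is judged "healthy" or "broken" after an attempt is an assumption left to the caller.

```shell
# Compute the next retry_count after one fix attempt.
# Args: current retry_count, PR state after the attempt ("healthy" or anything else).
next_retry_count() {
  case "$2" in
    healthy) echo 0 ;;                      # progress made: reset the counter
    *)       echo $(( ${1:-0} + 1 )) ;;     # still broken: count one failed attempt
  esac
}

# Succeeds (exit 0) when the breaker should trip and the PR is handed to a
# human, matching the "retry_count >= 3" rule in the prompt.
breaker_tripped() {
  [ "${1:-0}" -ge 3 ]
}
```

With this rule, three rounds of ordinary review comments that the agent resolves each time leave the counter at 0, while three consecutive failed attempts trip the breaker.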
Comment on lines +20 to +23
- name: Run triage orchestrator
  id: session
  uses: ambient-code/ambient-action@v0.0.3
  with:

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /
find . -name "triage.yml" -type f 2>/dev/null | head -20

Repository: ambient-code/platform

Length of output: 113


🏁 Script executed:

cat -n .github/workflows/triage.yml

Repository: ambient-code/platform

Length of output: 4347


🏁 Script executed:

# Check if there are any exposed secrets in the workflow
rg -n "(GITHUB_TOKEN|secrets\.|env\.)" .github/workflows/triage.yml -A 2

Repository: ambient-code/platform

Length of output: 351


Pin ambient-action to a full commit SHA.

Version tags like @v0.0.3 are mutable and can be retagged, allowing behavior changes without triggering another review. Replace with the commit SHA. Secrets and permissions are correctly scoped.

- Add concurrency guards to prevent overlapping runs
- Remove unused existing-session lookup step from fix-single
- Add --limit 200 to gh pr list in batch orchestrator
- Fix circuit breaker: only increment retry_count on failure,
  reset to 0 when PR is healthy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Gkrumbach07 Gkrumbach07 merged commit 54b5803 into main Apr 2, 2026
12 checks passed
@Gkrumbach07 Gkrumbach07 deleted the feat/triage-and-pr-fixer-gha branch April 2, 2026 21:32