
[aw-failures] P1: Code Simplifier exhausts api-proxy invocation cap (maxRuns 50/50) → CAPIError 429, 6/6 fail + unclassified #39199

@github-actions


Raise the api-proxy maxRuns cap for Code Simplifier (or cut its sub-agent fan-out below 50), and add a classifier flag for the 50/50 invocation cap — it causes 100% of this workflow's failures and is currently bucketed as an unclassified exit-1.

Problem statement

Code Simplifier hits the api-proxy per-run LLM invocation-count cap (maxRuns: 50). Once 50 model invocations are consumed, every subsequent request returns CAPIError: 429 — Maximum LLM invocations exceeded (50 / 50). The Copilot harness retries 3× (all 429), then gives up and the Execute GitHub Copilot CLI step exits 1. This is distinct from AIC credit-budget exhaustion — AIC was only 650.154 / 1000 (Daily workflow AIC guardrail exceeded: false).
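Why retrying cannot help once the cap is saturated can be illustrated with a small model. This is only a sketch under stated assumptions: `CappedProxy`, `InvocationCapExceeded`, and `invoke_with_retries` are hypothetical names, not the api-proxy's or the Copilot harness's real code, and the model assumes a rejected request does not consume an invocation slot.

```python
class InvocationCapExceeded(Exception):
    """Hypothetical stand-in for CAPIError 429 'Maximum LLM invocations exceeded'."""


class CappedProxy:
    """Toy model of a per-run invocation cap (maxRuns). Not the real api-proxy."""

    def __init__(self, max_runs: int = 50):
        self.max_runs = max_runs
        self.used = 0       # invocations consumed in this run
        self.rejected = 0   # requests refused after the cap was reached

    def invoke(self) -> str:
        if self.used >= self.max_runs:
            self.rejected += 1
            raise InvocationCapExceeded(
                f"429 Maximum LLM invocations exceeded ({self.used} / {self.max_runs})"
            )
        self.used += 1
        return "ok"


def invoke_with_retries(proxy: CappedProxy, retries: int = 3) -> str:
    """One initial attempt plus `retries` retries; re-raises the last error."""
    last_error = None
    for _ in range(1 + retries):
        try:
            return proxy.invoke()
        except InvocationCapExceeded as err:
            # The counter never resets within a run, so waiting and retrying
            # sees exactly the same saturated state.
            last_error = err
    raise last_error
```

In this model the first 50 calls succeed and the 51st fails on every attempt, which matches the observed pattern of all retries returning 429 before the step exits 1.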

Affected workflows and run IDs

  • Code Simplifier (.github/workflows/code-simplifier.lock.yml) — 6/6 consecutive failures.

Probable root cause

The workflow fans out into many sub-agents (multiple read_agent / scout / scope-filter sub-agent calls observed), exhausting the api-proxy maxRuns: 50 invocation cap (defined in the awf-config apiProxy, separate from maxAiCredits: 1000). The retry loop cannot recover because the cap is per-run and already saturated. Compounding: the conclusion classifier has no flag for the 50/50 cap — GH_AW_AGENTIC_ENGINE_TIMEOUT, GH_AW_AI_CREDITS_RATE_LIMIT_ERROR, GH_AW_UNKNOWN_MODEL_AI_CREDITS, GH_AW_INFERENCE_ACCESS_ERROR, GH_AW_MODEL_NOT_SUPPORTED_ERROR are all false, so the failure is silently bucketed as a generic exit-1.

Proposed remediation

  1. Reduce fan-out OR raise the cap — either lower Code Simplifier's sub-agent count so a normal run stays under 50 invocations, or raise maxRuns for this workflow to a level its legitimate fan-out needs.
  2. Add a classifier flag for CAPIError 429 "Maximum LLM invocations exceeded" (e.g. GH_AW_INVOCATION_CAP_EXCEEDED) so these failures are categorized and distinguishable from AIC-budget and engine-timeout failures in future investigations.
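The classifier flag in item 2 could be derived from the error signature alone. The following is a minimal sketch, assuming the classifier has the agent step's log text available as a string; `classify_invocation_cap` is a hypothetical helper, and `GH_AW_INVOCATION_CAP_EXCEEDED` is the flag name suggested above rather than an existing one.

```python
import re

# Matches the signature seen in the failing runs, tolerating spacing and
# punctuation differences between "429" and the message, e.g.
#   "CAPIError: 429 Maximum LLM invocations exceeded (50 / 50)"
INVOCATION_CAP_RE = re.compile(
    r"429\D*Maximum LLM invocations exceeded\s*\((\d+)\s*/\s*(\d+)\)"
)


def classify_invocation_cap(log_text: str) -> dict:
    """Return a hypothetical classifier result for the invocation-cap failure."""
    match = INVOCATION_CAP_RE.search(log_text)
    if match is None:
        return {"GH_AW_INVOCATION_CAP_EXCEEDED": False, "used": None, "cap": None}
    used, cap = int(match.group(1)), int(match.group(2))
    return {"GH_AW_INVOCATION_CAP_EXCEEDED": True, "used": used, "cap": cap}
```

Capturing the `used` and `cap` numbers alongside the boolean would let a report state which cap was hit without re-reading the log.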

Success criteria / verification

  • Code Simplifier completes without Maximum LLM invocations exceeded (50/50) for ≥3 consecutive scheduled runs.
  • A 50/50 invocation-cap failure (if it recurs) is surfaced as a dedicated classified flag rather than a generic exit-1.
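The first criterion can be checked mechanically. This is a rough sketch, assuming each scheduled run is available as a dict with a `conclusion` string and its `log` text, ordered newest first; the record shape and the name `meets_success_criterion` are illustrative assumptions, not an existing API.

```python
CAP_SIGNATURE = "Maximum LLM invocations exceeded"


def meets_success_criterion(runs_newest_first: list, required: int = 3) -> bool:
    """True if the most recent `required` runs succeeded without the cap error."""
    if len(runs_newest_first) < required:
        return False  # not enough scheduled runs yet to judge
    recent = runs_newest_first[:required]
    return all(
        run["conclusion"] == "success" and CAP_SIGNATURE not in run["log"]
        for run in recent
    )
```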

Parent: #29109. Analyzed runs: 27488668377, 27456907583.
Related to #29109

Generated by 🔍 [aw] Failure Investigator (6h) · 343.9 AIC · ⌖ 12.7 AIC · ⊞ 4.5K ·

  • expires on Jun 21, 2026, 12:24 AM UTC-08:00

Still 100% failing — raise maxRuns above 50 or cut Code Simplifier's tool-call volume; 8-day outage continues.

Fresh recurrence (6h failure sweep, 2026-06-16):

  • Run §27594887412: agent job failed at "Execute GitHub Copilot CLI".
  • Confirmed signature: CAPIError: 429 Maximum LLM invocations exceeded (50 / 50), still failing after 5 retries (total retry wait 87.57s); awf-config shows apiProxy.maxRuns: 50.
  • Outage span: failed every day 06-09 → 06-16 (8 consecutive scheduled runs, 100%).

Remediation unchanged: increase the api-proxy maxRuns cap for this workflow or reduce its per-run invocation count (fewer tool round-trips / tighter prompt).

Generated by 🔍 [aw] Failure Investigator (6h) · 263.8 AIC · ⌖ 12.1 AIC · ⊞ 4.5K ·
