fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits #5271
Two related token-budget fixes:

1. Terminal hard caps (effective_tokens, max_runs, max_cache_misses, ai_credits) now reject with HTTP 403 instead of 429. LLM SDK clients treat 429 as a transient rate limit and retry-storm against a cap that never recovers, exhausting the run budget until the step times out. 403 is non-retryable, so the agent stops cleanly. The per-IP rate limiter keeps returning 429 (with Retry-After) since it is recoverable.
2. AI-credit calculation is now provider-aware. Anthropic reports input_tokens as the NON-cached input only (cache_read/cache_creation are additive), whereas OpenAI reports it as the TOTAL with cache as a subset. The old code always subtracted cache from input, over-subtracting cache and under-counting fresh input for Anthropic. provider is now threaded through applyAiCreditsUsage -> calculateAiCredits.

Provider string literals in the new code use centralized constants from the new provider-names module (named to avoid colliding with the providers/ adapter directory).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
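The status split in fix 1 can be sketched as follows. This is a minimal illustration, not the proxy's code: `rejectionFor` and `shouldRetry` are hypothetical names, the `'rate_limit'` reason is an assumed label for the per-IP limiter, and `shouldRetry` models a common SDK convention (retry 429 and 5xx, give up on other 4xx) rather than any specific SDK's logic.

```javascript
// Hypothetical sketch: choose the HTTP rejection for a guard.
// The cap names come from this PR; the helper names are illustrative only.
const TERMINAL_CAPS = new Set([
  'effective_tokens',
  'max_runs',
  'max_cache_misses',
  'ai_credits',
]);

function rejectionFor(reason, retryAfterSeconds = 1) {
  if (TERMINAL_CAPS.has(reason)) {
    // Terminal: the budget cannot recover within this run, so answer with a
    // non-retryable status and no Retry-After hint.
    return { status: 403, headers: {} };
  }
  if (reason === 'rate_limit') {
    // Recoverable: the per-IP window resets, so 429 + Retry-After is accurate.
    return { status: 429, headers: { 'Retry-After': String(retryAfterSeconds) } };
  }
  throw new Error(`unknown rejection reason: ${reason}`);
}

// Simplified model of a typical SDK retry policy (an assumption, not the
// exact logic of any particular SDK): retry 429 and 5xx, give up on other 4xx.
function shouldRetry(status) {
  return status === 429 || status >= 500;
}
```

With this split, `shouldRetry(rejectionFor('max_runs').status)` is `false`, so a client stops after a single rejection instead of backing off against a cap that will never reset, while a rate-limited client still gets a retryable 429 with a wait hint.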
✅ Coverage Check Passed
Overall Coverage
📁 Per-file Coverage Changes (1 file)
Coverage comparison generated by
Pull request overview
Updates the api-proxy sidecar’s token-budget enforcement behavior to be non-retryable for terminal caps (switching hard-cap responses from HTTP 429 to 403) and fixes AI-credit accounting for providers whose input_tokens semantics are additive (notably Anthropic).
Changes:
- Switch terminal hard-cap guards (effective tokens / max runs / max cache misses / AI credits) to return HTTP 403 instead of 429, including WebSocket upgrade handling and docs/spec/schema updates.
- Thread `provider` through token-budget logging so the AI-credit calculation can be provider-aware.
- Add regression tests for Anthropic AI-credit accounting and update guard/WebSocket tests for the new 403 behavior.
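For the WebSocket upgrade path mentioned in the first change, a Node.js `'upgrade'` handler receives a raw socket rather than a `ServerResponse`, so the rejection has to be serialized by hand. A minimal sketch, assuming a hypothetical `rejectUpgrade` helper (not the proxy's actual function); it follows the end-then-destroy pattern used by the `ws` library so the response is flushed before the socket closes.

```javascript
const http = require('node:http');

// Hypothetical helper (name and signature are illustrative, not the proxy's API).
// An 'upgrade' handler gets a raw socket, so the status line, headers and
// body are written manually before the connection is torn down.
function rejectUpgrade(socket, status, message, extraHeaders = {}) {
  const body = JSON.stringify({ error: message });
  const headers = {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body),
    Connection: 'close',
    ...extraHeaders,
  };
  const headerLines = Object.entries(headers)
    .map(([name, value]) => `${name}: ${value}`)
    .join('\r\n');
  const response =
    `HTTP/1.1 ${status} ${http.STATUS_CODES[status]}\r\n${headerLines}\r\n\r\n${body}`;
  // end() flushes the response; destroy the socket once it has been written.
  socket.once('finish', () => socket.destroy());
  socket.end(response);
}
```

Inside `server.on('upgrade', (req, socket, head) => { ... })`, a terminal cap would call `rejectUpgrade(socket, 403, ...)` with no `Retry-After`, while the per-IP limiter would call `rejectUpgrade(socket, 429, ..., { 'Retry-After': '5' })`.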
| File | Description |
|---|---|
| src/awf-config-schema.json | Updates schema descriptions to document hard caps returning HTTP 403. |
| docs/awf-config.schema.json | Mirrors schema description updates for published docs. |
| docs/awf-config-spec.md | Spec update: hard-cap rejection semantics (HTTP 403) for HTTP + WebSocket paths. |
| docs/api-proxy-sidecar.md | Docs update: budget exhaustion detection and rationale for 403 vs 429. |
| containers/api-proxy/token-budget-log.js | Threads provider into AI-credits usage computation. |
| containers/api-proxy/server.websocket.test.js | Updates WebSocket guard assertions from 429 to 403. |
| containers/api-proxy/server.token-guards.test.js | Updates HTTP guard assertions and related test docs to 403. |
| containers/api-proxy/provider-names.js | Adds centralized provider name constants for safer comparisons. |
| containers/api-proxy/guards/common-guard-checks.js | Changes hard-cap guard status codes to 403 with rationale comments. |
| containers/api-proxy/guards/ai-credits-guard.test.js | Adds Anthropic input-token semantics regression tests; updates cached-token test naming. |
| containers/api-proxy/guards/ai-credits-guard.js | Makes AI-credit calculation provider-aware (Anthropic additive input semantics). |
| containers/api-proxy/Dockerfile | Ensures new provider-names.js is copied into the image. |
Copilot's findings
- Files reviewed: 12/12 changed files
- Comments generated: 2
```js
const nonCachedInput = provider === PROVIDER_ANTHROPIC
  ? reportedInput
  : Math.max(0, reportedInput - cacheReadTokens - cacheWriteTokens);
```
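A minimal sketch of the input accounting, generalizing the snippet above to the rule stated in the PR summary (skip cache subtraction for both Anthropic and Copilot). The constant values and the usage-record field names (`input_tokens`, `cache_read_tokens`, `cache_write_tokens`) are assumptions for illustration; the real `provider-names` module and the providers' raw usage fields may differ.

```javascript
// Hypothetical constant values; the PR's provider-names module may differ.
const PROVIDER_ANTHROPIC = 'anthropic';
const PROVIDER_COPILOT = 'copilot';
const PROVIDER_OPENAI = 'openai';

// Providers whose input_tokens already excludes cache reads/writes
// (cache fields are additive rather than a subset of input).
const ADDITIVE_INPUT_PROVIDERS = new Set([PROVIDER_ANTHROPIC, PROVIDER_COPILOT]);

// Fresh (non-cached) input tokens for one normalized usage record.
function nonCachedInputTokens(provider, usage) {
  const reportedInput = usage.input_tokens ?? 0;
  const cacheReadTokens = usage.cache_read_tokens ?? 0;
  const cacheWriteTokens = usage.cache_write_tokens ?? 0;
  return ADDITIVE_INPUT_PROVIDERS.has(provider)
    ? reportedInput
    : Math.max(0, reportedInput - cacheReadTokens - cacheWriteTokens);
}

// Pre-fix behaviour, for comparison: always subtract cache from input.
function legacyNonCachedInputTokens(usage) {
  const reportedInput = usage.input_tokens ?? 0;
  const cacheTokens = (usage.cache_read_tokens ?? 0) + (usage.cache_write_tokens ?? 0);
  return Math.max(0, reportedInput - cacheTokens);
}
```

For a request with 200 fresh tokens and 800 cache-read tokens, an OpenAI-style record reports `input_tokens: 1000` and an Anthropic-style record reports `input_tokens: 200`. Both yield 200 fresh tokens under the provider-aware rule, whereas the legacy subtraction turns the Anthropic record into `max(0, 200 - 800) = 0`.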
```diff
 /**
- * Tests for proxyRequest guards: effective token limit (429) and
- * max-runs limit (429).
+ * Tests for proxyRequest guards: effective token limit (403) and
+ * max-runs limit (403).
  *
  * Extracted from server.proxy.test.js.
```
@copilot address review feedback

Addressed in 8b0ee05.
✅ Contribution Check completed successfully!

✅ Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

🔑 Smoke Copilot PAT: PAT auth validated. All systems operational. ✅

✅ Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

✅ Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

✅ Build Test Suite completed successfully!

🔌 Smoke Services: All services reachable! ✅

Chroot tests passed! Smoke Chroot: All security and functionality tests succeeded.

❌ Smoke Claude failed

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

✅ Smoke Gemini completed. All facets verified. 💎

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟
🔐 Smoke Test: Copilot PAT Auth — PASS
@lpcox — all checks passed.
Smoke Test: Copilot BYOK (Direct Mode) ✅ PASS
Mode: Direct BYOK (COPILOT_PROVIDER_API_KEY) → api-proxy sidecar → api.githubcopilot.com
🔬 Smoke Test Results
PR: fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits
Overall: FAIL — Pre-computed test data was not passed to the agent (workflow template variables were not substituted).
fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits
✅ MCP connectivity
Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
Overall: PASS
🔭 Smoke Test: API Proxy OpenTelemetry Tracing
All scenarios pass or are in expected state. OTEL integration is fully wired.
@lpcox Smoke Test Results:
Smoke Test: GitHub Actions Services Connectivity
Overall: ❌ FAIL —
Chroot Version Comparison Results
Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:

```yaml
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
```

See Network Configuration for more information.
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Gemini Engine Validation Results
Overall Status: FAIL

Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:

```yaml
network:
  allowed:
    - defaults
    - "localhost"
```

See Network Configuration for more information.
The maxRuns:2 cap was too tight for the smoke prompt: the agent routinely burns its 2 invocations on a planning turn plus a parallel capability-probe before emitting its safe output, then hits the cap and fails. Bump max-turns (which drives apiProxy.maxRuns) to 5 so the smoke test has headroom to complete. Recompiled the lock file and updated the workflow test assertions accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Two related token-budget fixes to the api-proxy sidecar.
1. Terminal hard caps return `403` instead of `429`

The hard-cap guards (`effective_tokens`, `max_runs`, `max_cache_misses`, `ai_credits`) previously rejected with HTTP 429. LLM SDK clients (Anthropic, OpenAI, claude-code) treat `429` as a transient rate limit and retry with backoff, but these caps are terminal and never recover, so the agent retry-storms (up to ~10×) against a wall until the step hits its 10-minute timeout. This manifested as:

> API Error: Request rejected (429) · Maximum LLM invocations exceeded (3 / 2).

Switching these to `403 Forbidden` (non-retryable) makes the agent stop cleanly. The per-IP rate limiter in `websocket-proxy.js` correctly stays `429` (with `Retry-After`) because that limit is recoverable.

2. Provider-aware AI-credit calculation (Anthropic/Copilot `input_tokens`)

Anthropic and Copilot report `input_tokens` as the non-cached input only (with cache token fields reported separately and additive), whereas OpenAI reports it as the total with cached tokens as a subset. The old `calculateAiCredits` always subtracted cache from input: correct for OpenAI, but for Anthropic/Copilot it can over-subtract, under-counting genuinely fresh input. `provider` is now threaded through `applyAiCreditsUsage → calculateAiCredits`, and cache subtraction is skipped for Anthropic and Copilot.

Provider string comparisons in the new code use centralized constants from a new `provider-names` module (deliberately not named `providers` to avoid colliding with the `providers/` adapter directory).

Testing
- `containers/api-proxy`: pass (including Anthropic and Copilot AI-credit regression coverage, plus 429→403 guard/WebSocket assertions)

Docs / schema

- `docs/awf-config-spec.md`, `docs/api-proxy-sidecar.md`: hard-cap status updated 429→403 (with rationale note)
- `docs/awf-config.schema.json` + `src/awf-config-schema.json`: regenerated, in sync

🤖 Generated with GitHub Copilot CLI