fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits #5271

Merged: lpcox merged 3 commits into main from fix/hard-cap-403-and-anthropic-input-credits on Jun 19, 2026
@lpcox lpcox commented Jun 19, 2026

Collaborator

Summary

Two related token-budget fixes to the api-proxy sidecar.

1. Terminal hard caps return 403 instead of 429

The hard-cap guards (effective_tokens, max_runs, max_cache_misses, ai_credits) previously rejected with HTTP 429. LLM SDK clients (Anthropic, OpenAI, claude-code) treat 429 as a transient rate-limit and retry with backoff — but these caps are terminal and never recover, so the agent retry-storms (up to ~10×) against a wall until the step hits its 10-minute timeout.
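
The retry storm described above can be illustrated with a minimal sketch. The retry policy here is a hypothetical simplification (`isRetryable`, `callWithRetries`, and the limit of 10 retries are assumptions for illustration, not any SDK's actual implementation), and backoff delays are omitted.

```javascript
// Hypothetical retry policy, loosely modeled on the behavior described above:
// 429 and 5xx are treated as transient; other 4xx statuses are treated as final.
function isRetryable(status) {
  return status === 429 || status >= 500;
}

// Calls `send` until it succeeds, returns a non-retryable status, or
// exhausts `maxRetries`. Backoff sleeps are left out to keep the sketch synchronous.
function callWithRetries(send, maxRetries) {
  let attempts = 0;
  while (true) {
    attempts += 1;
    const status = send();
    if (status < 400) return { status, attempts };
    if (!isRetryable(status) || attempts > maxRetries) return { status, attempts };
  }
}

// A terminal cap never recovers, so every attempt gets the same answer.
const capAs429 = callWithRetries(() => 429, 10);
const capAs403 = callWithRetries(() => 403, 10);

console.log(capAs429.attempts); // 11: the first call plus 10 retries
console.log(capAs403.attempts); // 1: the client gives up immediately
```

With a real client each of those retries would also wait out a backoff interval, which is how a cap that can never clear ends up consuming the step's timeout.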

This manifested as:
> API Error: Request rejected (429) · Maximum LLM invocations exceeded (3 / 2).

Switching these to 403 Forbidden (non-retryable) makes the agent stop cleanly. The per-IP rate limiter in websocket-proxy.js correctly stays 429 (with Retry-After) because that limit is recoverable.
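
A minimal sketch of that split between terminal and recoverable limits follows. The cap names come from the summary above; `buildRejection`, its parameters, and the response shape are hypothetical and are not taken from the sidecar's code.

```javascript
// Terminal caps named in the PR summary; these never recover within a run.
const TERMINAL_CAPS = new Set([
  'effective_tokens',
  'max_runs',
  'max_cache_misses',
  'ai_credits',
]);

// Builds a rejection descriptor. `limit` and `retryAfterSeconds` are
// hypothetical parameter names used only for this sketch.
function buildRejection(limit, message, retryAfterSeconds = 1) {
  if (TERMINAL_CAPS.has(limit)) {
    // 403: non-retryable, so SDK clients stop instead of backing off.
    return { status: 403, headers: {}, body: { error: message } };
  }
  // Recoverable limits (for example the per-IP rate limiter) stay 429
  // and advertise when to try again.
  return {
    status: 429,
    headers: { 'Retry-After': String(retryAfterSeconds) },
    body: { error: message },
  };
}

console.log(buildRejection('max_runs', 'Maximum LLM invocations exceeded (3 / 2).').status); // 403
console.log(buildRejection('per_ip_rate_limit', 'Too many requests', 30).headers); // { 'Retry-After': '30' }
```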

2. Provider-aware AI-credit calculation (Anthropic/Copilot input_tokens)

Anthropic and Copilot report input_tokens as the non-cached input only (with cache token fields reported separately and additive), whereas OpenAI reports it as the total with cached tokens as a subset. The old calculateAiCredits always subtracted cache from input — correct for OpenAI, but for Anthropic/Copilot it can over-subtract, under-counting genuinely fresh input. provider is now threaded through applyAiCreditsUsage → calculateAiCredits, and cache subtraction is skipped for Anthropic and Copilot.
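
The difference in `input_tokens` semantics can be checked with a short worked example. This is a sketch under stated assumptions: the provider string values, the `usage` field names, and `nonCachedInputTokens` are illustrative, while the `Math.max(0, ...)` subtraction mirrors the expression quoted in the review further down.

```javascript
// Provider identifiers; the string values are assumptions for this sketch.
const PROVIDER_ANTHROPIC = 'anthropic';
const PROVIDER_COPILOT = 'copilot';
const PROVIDER_OPENAI = 'openai';

// Providers whose input_tokens already excludes cached tokens.
const ADDITIVE_INPUT_PROVIDERS = new Set([PROVIDER_ANTHROPIC, PROVIDER_COPILOT]);

// Returns the fresh (non-cached) input token count for a usage record.
function nonCachedInputTokens(provider, usage) {
  const reportedInput = usage.inputTokens ?? 0;
  const cacheReadTokens = usage.cacheReadTokens ?? 0;
  const cacheWriteTokens = usage.cacheWriteTokens ?? 0;
  if (ADDITIVE_INPUT_PROVIDERS.has(provider)) return reportedInput;
  return Math.max(0, reportedInput - cacheReadTokens - cacheWriteTokens);
}

// Same request seen through both conventions: 1,000 fresh + 4,000 cache-read tokens.
const anthropicUsage = { inputTokens: 1000, cacheReadTokens: 4000 };
const openaiUsage = { inputTokens: 5000, cacheReadTokens: 4000 };

console.log(nonCachedInputTokens(PROVIDER_ANTHROPIC, anthropicUsage)); // 1000
console.log(nonCachedInputTokens(PROVIDER_OPENAI, openaiUsage)); // 1000
// Applying the subset rule to additive usage reproduces the old under-count:
console.log(nonCachedInputTokens(PROVIDER_OPENAI, anthropicUsage)); // 0
```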

Provider string comparisons in the new code use centralized constants from a new provider-names module (deliberately not named providers to avoid colliding with the providers/ adapter directory).

Testing

  • containers/api-proxy: pass (including Anthropic and Copilot AI-credit regression coverage, plus 429→403 guard/websocket assertions)
  • Repo TS: build/type checks pass for this change set

Docs / schema

  • docs/awf-config-spec.md, docs/api-proxy-sidecar.md: hard-cap status updated 429→403 (with rationale note)
  • docs/awf-config.schema.json + src/awf-config-schema.json: regenerated, in sync

🤖 Generated with GitHub Copilot CLI

Two related token-budget fixes:

1. Terminal hard caps (effective_tokens, max_runs, max_cache_misses,
   ai_credits) now reject with HTTP 403 instead of 429. LLM SDK clients
   treat 429 as a transient rate-limit and retry-storm against a cap that
   never recovers, exhausting the run budget until the step times out.
   403 is non-retryable, so the agent stops cleanly. The per-IP rate
   limiter keeps returning 429 (with Retry-After) since it is recoverable.

2. AI-credit calculation is now provider-aware. Anthropic reports
   input_tokens as the NON-cached input only (cache_read/cache_creation
   are additive), whereas OpenAI reports it as the TOTAL with cache as a
   subset. The old code always subtracted cache from input, over-subtracting
   cache and under-counting fresh input for Anthropic. provider is now
   threaded through applyAiCreditsUsage -> calculateAiCredits.

Provider string literals in the new code use centralized constants from
the new provider-names module (named to avoid colliding with the
providers/ adapter directory).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 19, 2026 02:40
@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Documentation Preview

Documentation build failed for this PR. View logs.

Built from commit be65350

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 97.54% | 97.58% | 📈 +0.04% |
| Statements | 97.47% | 97.50% | 📈 +0.03% |
| Functions | 98.85% | 98.85% | ➡️ +0.00% |
| Branches | 92.87% | 92.91% | 📈 +0.04% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Contributor

Pull request overview

Updates the api-proxy sidecar’s token-budget enforcement behavior to be non-retryable for terminal caps (switching hard-cap responses from HTTP 429 to 403) and fixes AI-credit accounting for providers whose input_tokens semantics are additive (notably Anthropic).

Changes:

  • Switch terminal hard-cap guards (effective tokens / max runs / max cache misses / AI credits) to return HTTP 403 instead of 429, including WebSocket upgrade handling and docs/spec/schema updates.
  • Thread provider through token-budget logging so AI-credits calculation can be provider-aware.
  • Add regression tests for Anthropic AI-credit accounting and update guard/websocket tests for the new 403 behavior.
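
For the WebSocket upgrade path, a rejection has to be written to the socket as a raw HTTP response before any upgrade completes. The following is a minimal sketch of how such a response could be framed. `formatUpgradeRejection` and the JSON body shape are hypothetical, and the sidecar's actual handler may differ.

```javascript
// Reason phrases for the two statuses discussed in this PR.
const REASONS = { 403: 'Forbidden', 429: 'Too Many Requests' };

// Formats a complete HTTP/1.1 response suitable for writing to a raw socket
// in place of a "101 Switching Protocols" handshake reply.
function formatUpgradeRejection(status, message, extraHeaders = {}) {
  const body = JSON.stringify({ error: message });
  const headers = {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body),
    Connection: 'close',
    ...extraHeaders,
  };
  const headerLines = Object.entries(headers).map(([k, v]) => `${k}: ${v}`);
  // Status line, headers, a blank line, then the body, all CRLF-delimited.
  return [`HTTP/1.1 ${status} ${REASONS[status]}`, ...headerLines, '', body].join('\r\n');
}

// Terminal cap: 403 with no Retry-After.
const hardCap = formatUpgradeRejection(403, 'AI credits exhausted');
// Recoverable per-IP limit: 429 with Retry-After.
const rateLimited = formatUpgradeRejection(429, 'Too many requests', { 'Retry-After': 5 });

console.log(hardCap.split('\r\n')[0]); // HTTP/1.1 403 Forbidden
console.log(rateLimited.includes('Retry-After: 5')); // true
```

In a Node.js `upgrade` event handler, the resulting string would typically be passed to `socket.write(...)` followed by `socket.destroy()`; that wiring is assumed here rather than taken from the PR.
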

| File | Description |
| --- | --- |
| src/awf-config-schema.json | Updates schema descriptions to document hard caps returning HTTP 403. |
| docs/awf-config.schema.json | Mirrors schema description updates for published docs. |
| docs/awf-config-spec.md | Spec update: hard-cap rejection semantics (HTTP 403) for HTTP + WebSocket paths. |
| docs/api-proxy-sidecar.md | Docs update: budget exhaustion detection and rationale for 403 vs 429. |
| containers/api-proxy/token-budget-log.js | Threads provider into AI-credits usage computation. |
| containers/api-proxy/server.websocket.test.js | Updates WebSocket guard assertions from 429 to 403. |
| containers/api-proxy/server.token-guards.test.js | Updates HTTP guard assertions and related test docs to 403. |
| containers/api-proxy/provider-names.js | Adds centralized provider name constants for safer comparisons. |
| containers/api-proxy/guards/common-guard-checks.js | Changes hard-cap guard status codes to 403 with rationale comments. |
| containers/api-proxy/guards/ai-credits-guard.test.js | Adds Anthropic input-token semantics regression tests; updates cached-token test naming. |
| containers/api-proxy/guards/ai-credits-guard.js | Makes AI-credit calculation provider-aware (Anthropic additive input semantics). |
| containers/api-proxy/Dockerfile | Ensures new provider-names.js is copied into the image. |

Copilot's findings

  • Files reviewed: 12/12 changed files
  • Comments generated: 2

Comment on lines +184 to +186
const nonCachedInput = provider === PROVIDER_ANTHROPIC
  ? reportedInput
  : Math.max(0, reportedInput - cacheReadTokens - cacheWriteTokens);
Comment on lines 1 to 5
  /**
-  * Tests for proxyRequest guards: effective token limit (429) and
-  * max-runs limit (429).
+  * Tests for proxyRequest guards: effective token limit (403) and
+  * max-runs limit (403).
   *
   * Extracted from server.proxy.test.js.
@lpcox

lpcox commented Jun 19, 2026

Collaborator Author

@copilot address review feedback

Copilot AI commented Jun 19, 2026

Contributor

> @copilot address review feedback

Addressed in 8b0ee05.

  • Updated AI credits input handling so provider === copilot is treated like Anthropic (no cache subtraction from input_tokens).
  • Added a Copilot regression test in guards/ai-credits-guard.test.js.
  • Updated the server.token-guards.test.js header comment to reflect full guard coverage.

Copilot AI changed the title from "fix(api-proxy): 403 for terminal caps; fix Anthropic input credits" to "fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits" on Jun 19, 2026
Copilot finished work on behalf of lpcox June 19, 2026 02:54
@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Contribution Check completed successfully!

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

🔑 Smoke Copilot PAT: PAT auth validated. All systems operational. ✅

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

🔌 Smoke Services — All services reachable! ✅

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Smoke Claude failed

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

Smoke Gemini completed. All facets verified. 💎

Smoke test progress

@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

Contributor

🔐 Smoke Test: Copilot PAT Auth — PASS

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | |
| GitHub.com HTTP | ✅ (200) |
| File write/read | |

@lpcox — all checks passed.
Auth mode: PAT (COPILOT_GITHUB_TOKEN)

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Contributor

Smoke Test: Copilot BYOK (Direct Mode) ✅ PASS

Tests:

  • ✅ GitHub MCP connectivity (PR fetching works)
  • ✅ GitHub.com HTTP connectivity (HTTP 200)
  • ✅ File write/read in agent (verified)
  • ✅ BYOK inference path (responding to prompt = api-proxy working)

Mode: Direct BYOK (COPILOT_PROVIDER_API_KEY) → api-proxy sidecar → api.githubcopilot.com

Author: @lpcox | PR: #5271

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Contributor

🔬 Smoke Test Results

PR: fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits
Author: @lpcox

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | |
| GitHub.com HTTP connectivity | ❌ (pre-step data not available; template vars unexpanded) |
| File write/read | ❌ (pre-step data not available; template vars unexpanded) |

Overall: FAIL — Pre-computed test data was not passed to the agent (workflow template variables were not substituted).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Contributor

@lpcox

fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits

✅ MCP connectivity
✅ HTTP check
✅ File I/O
✅ Inference BYOK

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)

Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Contributor

🔭 Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Result | Details |
| --- | --- | --- |
| 1. Module Loading | | otel.js loads; exports: startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled (+ testing helpers). isEnabled() returns true. |
| 2. Test Suite | | 39/39 tests pass (otel.test.js) |
| 3. Env Var Forwarding | | api-proxy-service-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME |
| 4. Token Tracker Integration | | onUsage callback present in token-tracker-http.js (line 324); confirmed OTEL hook point |
| 5. OTEL Diagnostics | ℹ️ | No OTLP endpoint configured → spans fall back to FileSpanExporter (/var/log/api-proxy/otel.jsonl). No live container run in this workflow, so no span exports observed. |

All scenarios pass or are in expected state. OTEL integration is fully wired.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

@lpcox Smoke Test Results:

  • GitHub MCP Testing: ✅
  • GitHub.com Connectivity: ✅
  • File Write/Read Test: ✅
  • BYOK Inference Test: ✅
    Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra
    Overall: PASS

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Contributor

Smoke Test: GitHub Actions Services Connectivity

| Check | Result |
| --- | --- |
| Redis PING | ❌ (no response / timeout) |
| PostgreSQL pg_isready | ❌ (no response) |
| PostgreSQL SELECT 1 | ❌ (no response / timeout) |

Overall: ❌ FAIL. host.docker.internal is not reachable from this runner environment.

🔌 Service connectivity validated by Smoke Services

@github-actions

Contributor

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
| --- | --- | --- | --- |
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.16.0 | v22.22.3 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

@github-actions

Contributor
  • fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits
  • fix(containers): apt install fallback to archive.ubuntu.com
  • ci(smoke): add token-usage sanity checks to smoke workflows
  • GitHub reads: ✅
  • Safe-input GH query: ✅
  • Playwright title: ✅
  • Temp file: ✅
  • Build: ✅
  • Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color ok ✅ PASS
Go env ok ✅ PASS
Go uuid ok ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5271 · 48 AIC · ⊞ 7.5K

@github-actions

Contributor

Gemini Engine Validation Results

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

The maxRuns:2 cap was too tight for the smoke prompt: the agent
routinely burns its 2 invocations on a planning turn plus a parallel
capability-probe before emitting its safe output, then hits the cap and
fails. Bump max-turns (which drives apiProxy.maxRuns) to 5 so the smoke
test has headroom to complete. Recompiled the lock file and updated the
workflow test assertions accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
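
The headroom problem in the commit message above can be sketched with a simple run counter. `createMaxRunsGuard` and `simulate` are hypothetical names for illustration, the three-turn sequence is taken from the commit message, and the message format follows the error quoted in the PR summary.

```javascript
// Hypothetical run counter; the real guard lives in the api-proxy sidecar
// and may be structured differently.
function createMaxRunsGuard(maxRuns) {
  let runs = 0;
  return function checkRun() {
    runs += 1;
    if (runs > maxRuns) {
      return {
        allowed: false,
        status: 403,
        message: `Maximum LLM invocations exceeded (${runs} / ${maxRuns}).`,
      };
    }
    return { allowed: true };
  };
}

// The smoke prompt's typical sequence per the commit message:
// a planning turn, a parallel capability probe, then the safe output.
const turns = ['planning', 'capability-probe', 'safe-output'];

function simulate(maxRuns) {
  const check = createMaxRunsGuard(maxRuns);
  return turns.map((turn) => ({ turn, ...check() }));
}

console.log(simulate(2)[2].message); // Maximum LLM invocations exceeded (3 / 2).
console.log(simulate(5).every((r) => r.allowed)); // true
```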
@lpcox lpcox merged commit 6de3216 into main Jun 19, 2026
29 of 30 checks passed
@lpcox lpcox deleted the fix/hard-cap-403-and-anthropic-input-credits branch June 19, 2026 03:25