fix(api-proxy): map OpenAI Responses API cached tokens to cache_read #5262
The token normalizer recognized cached prompt tokens from the Chat Completions API (usage.prompt_tokens_details.cached_tokens) and Anthropic (cache_read_input_tokens), but not the OpenAI Responses API (/responses), which reports them under usage.input_tokens_details.cached_tokens as an object property. Because extractCacheReadTokens only treated input_tokens_details as a token-entry array, Responses API cache reads silently fell through and were recorded as cache_read_tokens: 0.

Agents using the /responses endpoint (e.g. codex) with heavy automatic prompt caching had their cache hits completely unreported, which also skews AI-credits accounting since the guard prices the non-cached input as input_tokens - cache_read_tokens.

Fix extractCacheReadTokens to read input_tokens_details.cached_tokens directly. This covers both the buffered JSON and SSE streaming paths (both route through extractCacheReadTokens).

Adds regression tests for the JSON, streaming, and normalizeUsage paths using the real Responses API usage shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR fixes AWF’s api-proxy token normalization for the OpenAI Responses API (/responses) so cached prompt tokens are correctly reported as cache reads, improving token-usage reporting and downstream AI-credits accounting accuracy.
Changes:
- Extend `extractCacheReadTokens()` to recognize Responses API cached tokens at `usage.input_tokens_details.cached_tokens`.
- Add regression tests covering buffered JSON parsing, SSE streaming parsing, and `normalizeUsage()` for the Responses API cached-tokens shape.
Show a summary per file
| File | Description |
|---|---|
| containers/api-proxy/token-parsers.js | Adds Responses API handling for cached tokens and updates normalization docs to reflect the new source field. |
| containers/api-proxy/token-tracker.parsing.test.js | Adds regression tests for JSON, SSE, and normalization paths using the real Responses API usage shape. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
✅ Coverage Check Passed
Overall Coverage
📁 Per-file Coverage Changes (1 files)
Coverage comparison generated by |
Add a regression test reproducing the exact final-chunk shape from gh-aw run 27784259295: a Copilot `/responses` streaming response that arrives as a chat.completion.chunk carrying both prompt_tokens_details.cached_tokens and the authoritative per-type split in copilot_usage.token_details. That run reported cache_read_tokens: 0 despite ~1.43M cached reads across 28 requests; this locks in that the copilot_usage breakdown drives the exact input/cache_read split. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
✅ Copilot review passed with no inline comments. @lpcox Add the |
|
✅ Contribution Check completed successfully! |
|
❌ Smoke Claude failed Smoke test invocation received with no actionable user request. No GitHub actions taken. |
|
✅ Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓 |
|
📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅ |
|
🔑 Smoke Copilot PAT reports failed. PAT auth path may have issues... |
|
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤 |
|
❌ Smoke Copilot BYOK AOAI (Entra) reports failed. AOAI BYOK (Entra) mode investigation needed... |
|
✅ Smoke Gemini completed. All facets verified. 💎 |
|
Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded. |
|
✅ Build Test Suite completed successfully! |
|
✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟 |
|
✅ Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓 |
|
🔌 Smoke Services — Service connectivity failed |
Replace the single Copilot /responses regression sample with a data-driven test.each over all 28 real requests captured from gh-aw run 27784259295 (chronological; cache reads grow as the prompt is re-sent). Each request asserts the exact input/cache_read/output split from the upstream copilot_usage.token_details, and that input + cache_read reconstructs the lumped prompt_tokens. A final aggregate test confirms the parser recovers the full 1,426,432 cache-read tokens that the run had reported as 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔭 Smoke Test: API Proxy OpenTelemetry Tracing
All scenarios pass. OTEL tracing integration is healthy.
|
Smoke Test: Copilot BYOK (Direct) Mode ✅
All tests passed. Running in direct BYOK mode (
Overall: PASS
|
|
fix(api-proxy): map OpenAI Responses API cached tokens to cache_read: ✅
|
🔬 Smoke Test Results
PR: fix(api-proxy): map OpenAI Responses API cached tokens to cache_read
Overall: FAIL — workflow template variables were not expanded; pre-step smoke data missing.
|
🔍 Chroot Smoke Test Results
Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environments.
|
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
See Network Configuration for more information.
|
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
|
Gemini Engine Smoke Test Results
Overall status: PASS
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "localhost"
See Network Configuration for more information.
|
Problem
The api-proxy token normalizer reported `cache_read_tokens: 0` for agents using the OpenAI Responses API (/responses, e.g. codex), even when the model heavily used automatic prompt caching.

Root cause: `extractCacheReadTokens()` recognized cached prompt tokens from:
- `usage.cache_read_input_tokens`
- `usage.prompt_tokens_details.cached_tokens`
- `{ token_type: "cache_read", token_count }`

…but not the OpenAI Responses API, which reports them under `usage.input_tokens_details.cached_tokens` as an object property. Since the function only treated `input_tokens_details` as a token-entry array, the value silently fell through and was recorded as `0`.

Real-world evidence
In CI run 27784201719 (gpt-5.4-mini via codex), the `usage` artifact reported `cache_read_tokens: 0` for all 13 requests. But the codex CLI's own `turn.completed` telemetry showed:

{"input_tokens":707301,"cached_input_tokens":672256,"output_tokens":12096,"reasoning_output_tokens":7715}

That's a ~95% cache-hit rate completely unreported by AWF.
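The ~95% figure can be checked directly from the telemetry quoted above. The snippet below is only a sanity check of that arithmetic; `cacheHitRate` is a hypothetical helper written for this illustration and is not part of AWF or the codex CLI.

```javascript
// Figures copied verbatim from the codex `turn.completed` telemetry above.
const telemetry = {
  input_tokens: 707301,
  cached_input_tokens: 672256,
  output_tokens: 12096,
  reasoning_output_tokens: 7715,
};

// Hypothetical helper: fraction of input tokens that were served from cache.
function cacheHitRate({ input_tokens, cached_input_tokens }) {
  return input_tokens > 0 ? cached_input_tokens / input_tokens : 0;
}

const hitRate = cacheHitRate(telemetry);
// Input tokens that were NOT served from cache.
const uncachedInput = telemetry.input_tokens - telemetry.cached_input_tokens;

console.log(`${(hitRate * 100).toFixed(1)}% cached, ${uncachedInput} uncached input tokens`);
// prints "95.0% cached, 35045 uncached input tokens"
```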
Fix
`extractCacheReadTokens()` now reads `input_tokens_details.cached_tokens` directly. This covers both the buffered-JSON and SSE streaming paths (both route through the same function).

No double-counting risk: the AI-credits guard already treats `input_tokens` as the total (including cache reads) and prices the remainder as `input_tokens - cache_read_tokens - cache_write_tokens` (`guards/ai-credits-guard.js`). Recovering the cache-read count makes AIC accounting more accurate, applying the discounted cached-input rate to the cached portion.

Tests
Added regression tests using the real Responses API usage shape for:
- `extractUsageFromJson` (buffered)
- `extractUsageFromSseLine` (streaming)
- `normalizeUsage`

Full api-proxy suite: 1244 passed.
Fixes the root cause behind #5203.
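
For readers without the diff open, the behavior described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code in `containers/api-proxy/token-parsers.js`: the names `extractCacheReadTokensSketch` and `uncachedInputTokens` are hypothetical, the order in which the shapes are checked is an assumption, clamping the remainder at zero is an assumption, and the sample usage values are made up.

```javascript
// Illustrative sketch only: a hypothetical stand-in for extractCacheReadTokens().
// The real function may check these shapes in a different order or handle more cases.
function extractCacheReadTokensSketch(usage) {
  if (!usage || typeof usage !== "object") return 0;

  // Anthropic: usage.cache_read_input_tokens
  if (Number.isFinite(usage.cache_read_input_tokens)) {
    return usage.cache_read_input_tokens;
  }

  // OpenAI Chat Completions: usage.prompt_tokens_details.cached_tokens
  const promptDetails = usage.prompt_tokens_details;
  if (promptDetails && Number.isFinite(promptDetails.cached_tokens)) {
    return promptDetails.cached_tokens;
  }

  const inputDetails = usage.input_tokens_details;

  // Token-entry array: [{ token_type: "cache_read", token_count }, ...]
  if (Array.isArray(inputDetails)) {
    return inputDetails
      .filter((e) => e && e.token_type === "cache_read" && Number.isFinite(e.token_count))
      .reduce((sum, e) => sum + e.token_count, 0);
  }

  // OpenAI Responses API: usage.input_tokens_details.cached_tokens as an object
  // property. This is the case the PR describes as previously falling through to 0.
  if (inputDetails && Number.isFinite(inputDetails.cached_tokens)) {
    return inputDetails.cached_tokens;
  }

  return 0;
}

// Hypothetical helper mirroring the remainder the PR attributes to the AI-credits
// guard: input_tokens - cache_read_tokens - cache_write_tokens.
function uncachedInputTokens({ input_tokens = 0, cache_read_tokens = 0, cache_write_tokens = 0 }) {
  // Clamping at zero is an assumption of this sketch, not a documented guard behavior.
  return Math.max(0, input_tokens - cache_read_tokens - cache_write_tokens);
}

// Made-up values in the Responses API usage shape.
const responsesUsage = {
  input_tokens: 1000,
  output_tokens: 50,
  input_tokens_details: { cached_tokens: 800 },
};

const cacheRead = extractCacheReadTokensSketch(responsesUsage);
const uncached = uncachedInputTokens({
  input_tokens: responsesUsage.input_tokens,
  cache_read_tokens: cacheRead,
});

console.log({ cacheRead, uncached }); // { cacheRead: 800, uncached: 200 }
```

Because `input_tokens` is treated as the total, subtracting the recovered cache-read count leaves only the uncached portion to be priced at the full input rate, which is why recovering it does not double-count.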