fix(api-proxy): map OpenAI Responses API cached tokens to cache_read #5262

Merged
lpcox merged 3 commits into main from fix/openai-cached-tokens-normalization
Jun 18, 2026

@lpcox

@lpcox lpcox commented Jun 18, 2026

Collaborator

Problem

The api-proxy token normalizer reported cache_read_tokens: 0 for agents using the OpenAI Responses API (/responses, e.g. codex), even when the model heavily used automatic prompt caching.

Root cause: extractCacheReadTokens() recognized cached prompt tokens from:

  • Anthropic — usage.cache_read_input_tokens
  • OpenAI Chat Completions / Copilot — usage.prompt_tokens_details.cached_tokens
  • Token-entry arrays — { token_type: "cache_read", token_count }

…but not the OpenAI Responses API, which reports them under usage.input_tokens_details.cached_tokens as an object property. Since the function only treated input_tokens_details as a token-entry array, the value silently fell through and was recorded as 0.

Real-world evidence

In CI run 27784201719 (gpt-5.4-mini via codex), the usage artifact reported cache_read_tokens: 0 for all 13 requests. But the codex CLI's own turn.completed telemetry showed:

{"input_tokens":707301,"cached_input_tokens":672256,"output_tokens":12096,"reasoning_output_tokens":7715}

That's a ~95% cache-hit rate completely unreported by AWF.
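
The hit rate follows directly from the telemetry line above. A minimal sketch of the arithmetic, where `cacheHitRate` is a hypothetical helper used only for illustration and is not part of api-proxy:

```javascript
// Counts copied from the codex CLI turn.completed telemetry quoted above.
const turn = {
  input_tokens: 707301,
  cached_input_tokens: 672256,
  output_tokens: 12096,
  reasoning_output_tokens: 7715,
};

// Hypothetical helper: fraction of input tokens served from the prompt cache.
function cacheHitRate({ input_tokens, cached_input_tokens }) {
  return input_tokens > 0 ? cached_input_tokens / input_tokens : 0;
}

const rate = cacheHitRate(turn);
console.log(`${(rate * 100).toFixed(1)}% of input tokens were cache reads`); // 95.0%
```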

Fix

extractCacheReadTokens() now reads input_tokens_details.cached_tokens directly. This covers both the buffered-JSON and SSE streaming paths (both route through the same function).
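
A minimal sketch of the lookup described above, not the actual code in containers/api-proxy/token-parsers.js. The name `extractCacheReadTokensSketch`, the `asCount` helper, and the precedence between sources are assumptions made for illustration; only the field paths come from this PR.

```javascript
// Hypothetical helper: accept only finite, non-negative numbers as token counts.
function asCount(value) {
  return typeof value === 'number' && Number.isFinite(value) && value >= 0 ? value : null;
}

// Illustrative stand-in for extractCacheReadTokens(); precedence order is assumed.
function extractCacheReadTokensSketch(usage) {
  if (!usage || typeof usage !== 'object') return 0;

  // Anthropic: usage.cache_read_input_tokens
  const anthropic = asCount(usage.cache_read_input_tokens);
  if (anthropic !== null) return anthropic;

  // OpenAI Chat Completions / Copilot: usage.prompt_tokens_details.cached_tokens
  const chat = asCount(usage.prompt_tokens_details?.cached_tokens);
  if (chat !== null) return chat;

  const details = usage.input_tokens_details;
  if (Array.isArray(details)) {
    // Token-entry array: sum every { token_type: "cache_read", token_count } entry.
    return details
      .filter((entry) => entry && entry.token_type === 'cache_read')
      .reduce((sum, entry) => sum + (asCount(entry.token_count) ?? 0), 0);
  }

  // OpenAI Responses API: cached_tokens is an object property (the case this PR adds).
  return asCount(details?.cached_tokens) ?? 0;
}

console.log(extractCacheReadTokensSketch({
  input_tokens: 100,
  input_tokens_details: { cached_tokens: 80 },
})); // 80
```

Checking `Array.isArray` before reading the object property keeps the existing token-entry path intact while letting the Responses API shape through.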

No double-counting risk: the AI-credits guard already treats input_tokens as the total (including cache reads) and prices the remainder as input_tokens - cache_read_tokens - cache_write_tokens (guards/ai-credits-guard.js). Recovering the cache-read count makes AIC accounting more accurate, applying the discounted cached-input rate to the cached portion.
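
The pricing split can be sketched as follows. This is an illustration under assumptions, not the code in guards/ai-credits-guard.js: `splitInputTokens` is a hypothetical name, and clamping the remainder at zero is an added safety choice. The counts reuse the codex telemetry quoted earlier, with cache writes assumed to be 0.

```javascript
// Hypothetical helper: input_tokens is treated as the total, including cache reads.
function splitInputTokens({ input_tokens, cache_read_tokens = 0, cache_write_tokens = 0 }) {
  // Remainder priced at the full input rate; clamped at 0 (assumption).
  const uncached = Math.max(0, input_tokens - cache_read_tokens - cache_write_tokens);
  return { uncached, cache_read_tokens, cache_write_tokens };
}

// With cache_read_tokens recorded as 0, every input token is priced at the full rate.
const before = splitInputTokens({ input_tokens: 707301, cache_read_tokens: 0 });
// With the cache-read count recovered, only the remainder is priced at the full rate.
const after = splitInputTokens({ input_tokens: 707301, cache_read_tokens: 672256 });

console.log(before.uncached); // 707301
console.log(after.uncached);  // 35045
```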

Tests

Added regression tests using the real Responses API usage shape for:

  • extractUsageFromJson (buffered)
  • extractUsageFromSseLine (streaming)
  • normalizeUsage
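
A regression check in the spirit of the streaming test listed above might look like the sketch below. `parseUsageFromSseLine` is a hypothetical stand-in, since the real signature of extractUsageFromSseLine is not shown in this PR, and the token counts are illustrative rather than captured from a run. The event layout (a `response.completed` event carrying `response.usage`) follows the public Responses API streaming format.

```javascript
// Hypothetical stand-in for the SSE parsing path; not the api-proxy implementation.
function parseUsageFromSseLine(line) {
  if (!line.startsWith('data:')) return null;
  const payload = line.slice('data:'.length).trim();
  if (!payload || payload === '[DONE]') return null;

  let event;
  try {
    event = JSON.parse(payload);
  } catch {
    return null; // Ignore partial or non-JSON lines.
  }

  // Usage may sit at the top level or under response (Responses API).
  const usage = event.usage ?? event.response?.usage;
  if (!usage) return null;

  return {
    input_tokens: usage.input_tokens ?? 0,
    output_tokens: usage.output_tokens ?? 0,
    cache_read_tokens: usage.input_tokens_details?.cached_tokens ?? 0,
  };
}

// Illustrative counts only.
const sseLine = 'data: ' + JSON.stringify({
  type: 'response.completed',
  response: {
    usage: {
      input_tokens: 1000,
      input_tokens_details: { cached_tokens: 900 },
      output_tokens: 50,
      output_tokens_details: { reasoning_tokens: 20 },
      total_tokens: 1050,
    },
  },
});

const parsed = parseUsageFromSseLine(sseLine);
console.log(parsed.cache_read_tokens); // 900
```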

Full api-proxy suite: 1244 passed.

Fixes the root cause behind #5203.

The token normalizer recognized cached prompt tokens from the Chat
Completions API (usage.prompt_tokens_details.cached_tokens) and Anthropic
(cache_read_input_tokens), but not the OpenAI Responses API (/responses),
which reports them under usage.input_tokens_details.cached_tokens as an
object property.

Because extractCacheReadTokens only treated input_tokens_details as a
token-entry array, Responses API cache reads silently fell through and were
recorded as cache_read_tokens: 0. Agents using the /responses endpoint
(e.g. codex) with heavy automatic prompt caching had their cache hits
completely unreported, which also skews AI-credits accounting since the
guard prices the non-cached input as input_tokens - cache_read_tokens.

Fix extractCacheReadTokens to read input_tokens_details.cached_tokens
directly. This covers both the buffered JSON and SSE streaming paths
(both route through extractCacheReadTokens). Adds regression tests for the
JSON, streaming, and normalizeUsage paths using the real Responses API
usage shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 19:56

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes AWF’s api-proxy token normalization for the OpenAI Responses API (/responses) so cached prompt tokens are correctly reported as cache reads, improving token-usage reporting and downstream AI-credits accounting accuracy.

Changes:

  • Extend extractCacheReadTokens() to recognize Responses API cached tokens at usage.input_tokens_details.cached_tokens.
  • Add regression tests covering buffered JSON parsing, SSE streaming parsing, and normalizeUsage() for the Responses API cached-tokens shape.

Show a summary per file

| File | Description |
| --- | --- |
| containers/api-proxy/token-parsers.js | Adds Responses API handling for cached tokens and updates normalization docs to reflect the new source field. |
| containers/api-proxy/token-tracker.parsing.test.js | Adds regression tests for JSON, SSE, and normalization paths using the real Responses API usage shape. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 97.57% | 97.61% | 📈 +0.04% |
| Statements | 97.50% | 97.54% | 📈 +0.04% |
| Functions | 98.84% | 98.84% | ➡️ +0.00% |
| Branches | 92.95% | 92.98% | 📈 +0.03% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Add a regression test reproducing the exact final-chunk shape from
gh-aw run 27784259295: a Copilot `/responses` streaming response that
arrives as a chat.completion.chunk carrying both
prompt_tokens_details.cached_tokens and the authoritative per-type split
in copilot_usage.token_details. That run reported cache_read_tokens: 0
despite ~1.43M cached reads across 28 requests; this locks in that the
copilot_usage breakdown drives the exact input/cache_read split.
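
A sketch of how such a chunk could be split, under loudly stated assumptions: `splitCopilotUsage` is a hypothetical name, copilot_usage is assumed to sit at the top level of the chunk, token_details is assumed to be an array of { token_type, token_count } entries, and the type names "input" and "output" are guesses ("cache_read" is the only type named in this PR). It is not the parser's actual code.

```javascript
// Hypothetical helper; field layout and type names are assumptions (see lead-in).
function splitCopilotUsage(chunk) {
  const usage = chunk.usage ?? {};
  const entries = chunk.copilot_usage?.token_details;

  if (Array.isArray(entries) && entries.length > 0) {
    // Preferred path: the per-type breakdown is treated as authoritative.
    const byType = {};
    for (const { token_type, token_count } of entries) {
      byType[token_type] = (byType[token_type] ?? 0) + token_count;
    }
    return {
      input: byType.input ?? 0,
      cache_read: byType.cache_read ?? 0,
      output: byType.output ?? usage.completion_tokens ?? 0,
    };
  }

  // Fallback: derive the split from the lumped prompt_tokens and cached_tokens.
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    input: Math.max(0, (usage.prompt_tokens ?? 0) - cached),
    cache_read: cached,
    output: usage.completion_tokens ?? 0,
  };
}

// Illustrative chunk with made-up counts.
const split = splitCopilotUsage({
  object: 'chat.completion.chunk',
  usage: { prompt_tokens: 1500, completion_tokens: 55, prompt_tokens_details: { cached_tokens: 1200 } },
  copilot_usage: {
    token_details: [
      { token_type: 'input', token_count: 300 },
      { token_type: 'cache_read', token_count: 1200 },
      { token_type: 'output', token_count: 55 },
    ],
  },
});
console.log(split); // { input: 300, cache_read: 1200, output: 55 }
```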

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Contributor

✅ Copilot review passed with no inline comments.

@lpcox Add the ready-for-aw label to this PR to trigger agentic CI smoke tests.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Contribution Check completed successfully!

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Claude failed

Smoke test invocation received with no actionable user request. No GitHub actions taken.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔑 Smoke Copilot PAT reports failed. PAT auth path may have issues...

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK AOAI (Entra) reports failed. AOAI BYOK (Entra) mode investigation needed...

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Gemini completed. All facets verified. 💎

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔌 Smoke Services — Service connectivity failed ⚠️

Replace the single Copilot /responses regression sample with a
data-driven test.each over all 28 real requests captured from gh-aw run
27784259295 (chronological; cache reads grow as the prompt is re-sent).
Each request asserts the exact input/cache_read/output split from the
upstream copilot_usage.token_details, and that input + cache_read
reconstructs the lumped prompt_tokens. A final aggregate test confirms
the parser recovers the full 1,426,432 cache-read tokens that the run
had reported as 0.
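
The per-request invariant and the aggregate check can be sketched without a test runner. The rows below are synthetic and are NOT the 28 captured requests; `checkRows` is a hypothetical helper, and the real suite uses test.each rather than a plain loop.

```javascript
// Synthetic rows for illustration only: cache reads grow as the prompt is re-sent.
const rows = [
  { prompt_tokens: 1200, input: 1200, cache_read: 0, output: 40 },
  { prompt_tokens: 1500, input: 300, cache_read: 1200, output: 55 },
  { prompt_tokens: 1900, input: 400, cache_read: 1500, output: 30 },
];

// Hypothetical helper: verify each row, then return the total cache-read count.
function checkRows(requests) {
  let totalCacheRead = 0;
  for (const [index, row] of requests.entries()) {
    // input + cache_read must reconstruct the lumped prompt_tokens.
    if (row.input + row.cache_read !== row.prompt_tokens) {
      throw new Error(`row ${index}: input + cache_read !== prompt_tokens`);
    }
    totalCacheRead += row.cache_read;
  }
  return totalCacheRead;
}

const totalCacheRead = checkRows(rows);
console.log(totalCacheRead); // 2700
```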

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Contributor

🔭 Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Status | Notes |
| --- | --- | --- |
| Module Loading | | otel.js loads OK; exports startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled + test helpers; isEnabled: true (file exporter active) |
| Test Suite | | 59 passed, 0 failed across 2 suites (otel.test.js, otel-fanout.test.js) |
| Env Var Forwarding | | api-proxy-service-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME; observability-environment.ts also auto-forwards all OTEL_* vars |
| Token Tracker Integration | | onUsage callback present in token-tracker-http.js (OTEL hook point confirmed) |
| OTEL Diagnostics | ℹ️ | No OTLP endpoint configured; spans gracefully fall back to FileSpanExporter (/var/log/api-proxy/otel.jsonl) |

All scenarios pass. OTEL tracing integration is healthy.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

Smoke Test: Copilot BYOK (Direct) Mode ✅

All tests passed. Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY) via api-proxy → api.githubcopilot.com.

  • ✅ GitHub MCP connectivity (PR data retrieved)
  • ✅ github.com HTTP 200
  • ✅ File write/read (workspace accessible)
  • ✅ BYOK inference active (you are reading this response)

Overall: PASS

@lpcox

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Contributor

fix(api-proxy): map OpenAI Responses API cached tokens to cache_read: ✅
fix(api-proxy): write placeholder token-usage record when usage extraction fails: ✅
GitHub.com connectivity: ✅
File write/read test: ✅
Direct BYOK inference mode: ✅
Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
Overall: PASS
@lpcox

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Contributor

🔬 Smoke Test Results

PR: fix(api-proxy): map OpenAI Responses API cached tokens to cache_read
Author: @lpcox

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | |
| GitHub.com HTTP connectivity | ❌ (pre-step data unavailable) |
| File write/read | ❌ (pre-step data unavailable) |

Overall: FAIL — workflow template variables were not expanded; pre-step smoke data missing.

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Contributor

🔍 Chroot Smoke Test Results

| Runtime | Host Version | Chroot Version | Match? |
| --- | --- | --- | --- |
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.16.0 | v22.22.3 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

@github-actions

Contributor
  • 5254 fix(api-proxy): copy token-tracker-shared + otel modules into image (fixes AIC=0) ✅
  • 5253 fix(api-proxy): use copilot_usage token_details for accurate cache split ✅
  • GitHub title check ✅
  • File write/read ✅
  • Discussion comment ✅
  • Build ✅
  • Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Contributor

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
| --- | --- | --- | --- | --- |
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | 1/1 passed | ✅ PASS |
| Go | env | | 1/1 passed | ✅ PASS |
| Go | uuid | | 1/1 passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | All passed | ✅ PASS |
| Node.js | execa | | All passed | ✅ PASS |
| Node.js | p-limit | | All passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5262 · 46.6 AIC · ⊞ 7.5K ·

@github-actions

Contributor

Gemini Engine Smoke Test Results

  • GitHub MCP Testing: ✅
    • PR 1: fix(api-proxy): copy token-tracker-shared + otel modules into image (fixes AIC=0)
    • PR 2: fix(api-proxy): use copilot_usage token_details for accurate cache split
  • GitHub.com Connectivity: ✅
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@lpcox lpcox merged commit 8a3d323 into main Jun 18, 2026
22 of 23 checks passed
@lpcox lpcox deleted the fix/openai-cached-tokens-normalization branch June 18, 2026 21:15

2 participants