fix(api-proxy): use copilot_usage token_details for accurate cache split #5253
Claude-via-Copilot responses report a flattened usage object where prompt_tokens lumps fresh input together with cache-write tokens, and cache_creation_input_tokens is absent. The authoritative per-type split (input / cache_read / cache_write / output) lives only in the sibling copilot_usage.token_details array.

Parse copilot_usage.token_details (in both non-streaming JSON and the SSE final chunk) and prefer it over the lumped prompt_tokens so cache-write tokens are recorded and billed correctly instead of being mis-counted as plain input. Plain OpenAI responses without copilot_usage are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Coverage Check Passed
Overall Coverage
📁 Per-file Coverage Changes (1 files)
Pull request overview
This PR fixes token accounting for Claude models served via the GitHub Copilot OpenAI-compatible /chat/completions endpoint by preferring the authoritative copilot_usage.token_details breakdown over the flattened usage.prompt_tokens, restoring correct cache-write vs fresh-input attribution in the api-proxy’s usage normalization.
Changes:
- Add extractCopilotUsageBreakdown() to parse copilot_usage.token_details (top-level or response-nested) into normalized usage fields.
- Integrate the Copilot breakdown into both non-streaming JSON parsing (extractUsageFromJson) and streaming final-chunk parsing (extractUsageFromSseLine), and drop lumped prompt_tokens when appropriate.
- Add unit tests covering the breakdown extraction and integration paths.
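The breakdown extraction described in the first bullet could look roughly like the sketch below. It is not the PR's actual implementation: the function name `sketchCopilotBreakdown`, the mapping table, and the field name `cache_read_input_tokens` are assumptions made for illustration; only `token_type`, `token_count`, `input_tokens`, and `cache_creation_input_tokens` appear in the PR text.

```javascript
// Hypothetical mapping from Copilot token_type values to normalized usage
// field names. A Map avoids accidental hits on Object.prototype keys.
// NOTE: cache_read_input_tokens is an assumed field name.
const TOKEN_TYPE_TO_FIELD = new Map([
  ['input', 'input_tokens'],
  ['output', 'output_tokens'],
  ['cache_read', 'cache_read_input_tokens'],
  ['cache_write', 'cache_creation_input_tokens'],
]);

// Sketch only: returns a partial usage object, or null when no usable
// copilot_usage.token_details array is present.
function sketchCopilotBreakdown(json) {
  if (!json || typeof json !== 'object') return null;

  // Accept copilot_usage at the top level or nested under `response`.
  const copilotUsage =
    json.copilot_usage || (json.response && json.response.copilot_usage);
  const details = copilotUsage && copilotUsage.token_details;
  if (!Array.isArray(details)) return null;

  const breakdown = {};
  for (const entry of details) {
    const field = entry && TOKEN_TYPE_TO_FIELD.get(entry.token_type);
    // Skip unknown token types and non-numeric counts.
    if (!field || !Number.isFinite(entry.token_count)) continue;
    breakdown[field] = (breakdown[field] || 0) + entry.token_count;
  }
  return Object.keys(breakdown).length > 0 ? breakdown : null;
}
```

Returning null for a missing or empty breakdown lets a caller fall through to the plain `usage` object, which matches the stated goal that plain OpenAI responses stay unaffected.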
Show a summary per file
| File | Description |
|---|---|
| containers/api-proxy/token-parsers.js | Adds Copilot-specific usage breakdown parsing and integrates it into JSON + SSE parsing paths. |
| containers/api-proxy/token-tracker.js | Re-exports the new breakdown helper via the token-tracker facade for tests/consumers. |
| containers/api-proxy/token-tracker.parsing.test.js | Adds unit tests validating Copilot breakdown extraction and end-to-end normalization behavior. |
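For the streaming path mentioned in the table, a minimal sketch of pulling usage data out of a single SSE line might look like this. The function name `sketchUsageFromSseLine` and its return shape are hypothetical and are not taken from token-parsers.js; the sketch assumes the usual `data: <json>` framing with a `[DONE]` sentinel.

```javascript
// Sketch only: inspect one SSE line and return any usage-related objects
// found in its JSON payload, or null if the line carries none.
function sketchUsageFromSseLine(line) {
  if (typeof line !== 'string' || !line.startsWith('data:')) return null;

  const payload = line.slice('data:'.length).trim();
  // The stream terminator and empty keep-alive lines carry no usage.
  if (payload === '' || payload === '[DONE]') return null;

  let json;
  try {
    json = JSON.parse(payload);
  } catch {
    return null; // Partial or malformed chunk: ignore it.
  }
  if (!json || typeof json !== 'object') return null;

  const usage = json.usage || null;
  // copilot_usage may sit at the top level or under `response`.
  const copilotUsage =
    json.copilot_usage ||
    (json.response && json.response.copilot_usage) ||
    null;

  if (!usage && !copilotUsage) return null;
  return { usage, copilotUsage };
}
```

In this shape, intermediate chunks without usage return null, so only the final chunk contributes to the recorded totals.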
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 3

    const copilotBreakdown = extractCopilotUsageBreakdown(json);
    if (copilotBreakdown) {
      const merged = { ...(result.usage || {}), ...copilotBreakdown };
      // Drop the lumped prompt_tokens so normalizeUsage uses the accurate
      // input_tokens instead of input+cache_write.
      if (copilotBreakdown.input_tokens !== undefined) {
        delete merged.prompt_tokens;
      }
      result.usage = merged;
    }

    const copilotBreakdown = extractCopilotUsageBreakdown(json);
    if (copilotBreakdown) {
      result.usage = { ...result.usage, ...copilotBreakdown };
      if (copilotBreakdown.input_tokens !== undefined) {
        delete result.usage.prompt_tokens;
      }
    }

      test('uses copilot_usage even when the flattened usage object is absent', () => {
        const body = Buffer.from(JSON.stringify({
          model: 'claude-sonnet-4.6',
          copilot_usage: {
            token_details: [
              { token_type: 'input', token_count: 200 },
              { token_type: 'output', token_count: 10 },
              { token_type: 'cache_write', token_count: 99 },
            ],
          },
        }));
        expect(normalizeUsage(extractUsageFromJson(body).usage)).toEqual({
          input_tokens: 200,
          output_tokens: 10,
          cache_read_tokens: 0,
          cache_write_tokens: 99,
          reasoning_tokens: 0,
        });
      });
    });

@copilot address review feedback
… but no input

When copilot_usage.token_details provides cache_write but omits input, prompt_tokens (= input + cache_write) would be kept alongside cache_creation_input_tokens, causing normalizeUsage to double-count cache_write.

Fix: infer input_tokens = max(0, prompt_tokens - cache_creation_input_tokens) and drop prompt_tokens in both the non-streaming and streaming paths. Add regression tests for this edge case in both paths.
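The merge rule in this commit message can be sketched as a small helper. This illustrates the described behavior and is not the code from the commit; the name `mergeCopilotBreakdown` is hypothetical, and the sketch assumes `prompt_tokens` equals fresh input plus cache-write tokens, as the message states.

```javascript
// Sketch only: merge a Copilot breakdown into a flattened usage object
// without letting cache-write tokens be counted twice.
function mergeCopilotBreakdown(usage, breakdown) {
  if (!breakdown) return usage;

  const merged = { ...(usage || {}), ...breakdown };

  if (breakdown.input_tokens !== undefined) {
    // The breakdown is authoritative: drop the lumped prompt_tokens.
    delete merged.prompt_tokens;
  } else if (
    breakdown.cache_creation_input_tokens !== undefined &&
    Number.isFinite(merged.prompt_tokens)
  ) {
    // Edge case: cache_write reported but input omitted.
    // Infer fresh input and clamp at zero in case the counts disagree.
    merged.input_tokens = Math.max(
      0,
      merged.prompt_tokens - breakdown.cache_creation_input_tokens
    );
    delete merged.prompt_tokens;
  }
  return merged;
}
```

With the counts quoted in the PR description, `mergeCopilotBreakdown({ prompt_tokens: 16396 }, { cache_creation_input_tokens: 12539 })` yields `input_tokens: 3857` and no `prompt_tokens`, so a downstream normalizer sees each token exactly once.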
Addressed in commit Both parsing paths now handle the edge case: when

✅ Copilot review passed with no inline comments. @lpcox Add the

❌ Smoke Claude failed No user request was provided in this turn, only system reminders and project context. No action taken.

✅ Contribution Check completed successfully!

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

✅ Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

🔌 Smoke Services: Service connectivity failed

❌ Smoke Copilot BYOK AOAI (api-key) reports failed. AOAI BYOK (api-key) mode investigation needed...

Chroot tests failed Smoke Chroot failed - See logs for details.

✅ Smoke Gemini completed. All facets verified. 💎

Smoke test completed with FAIL status. Connectivity and MCP tools were unavailable. File operations passed.

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

✅ Build Test Suite completed successfully!

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

❌ Smoke Copilot BYOK reports failed. BYOK mode investigation needed...
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:

    network:
      allowed:
        - defaults
        - "registry.npmjs.org"

See Network Configuration for more information.

🔥 Smoke Test: Copilot PAT Auth — FAIL
Overall: FAIL — pre-step outputs not available; file test could not be verified. Auth mode: PAT (COPILOT_GITHUB_TOKEN) | PR author: @lpcox

🔬 Smoke Test Results
PR: fix(api-proxy): use copilot_usage token_details for accurate cache split
Overall: PARTIAL — MCP confirmed working; pre-computed step outputs (
@lpcox Smoke Test Results:
🔍 Smoke Test: API Proxy OpenTelemetry Tracing
All scenarios pass. OTEL tracing integration is healthy on this PR.
Smoke Test Results
Overall status: FAIL

Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:

    network:
      allowed:
        - defaults
        - "localhost"

See Network Configuration for more information.

🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — ✅ PASS
Problem

For Claude models served through the GitHub Copilot OpenAI-compatible endpoint (/chat/completions via the api-proxy Copilot port), upstream reports a flattened usage object where prompt_tokens lumps fresh input together with cache-write tokens, and cache_creation_input_tokens is absent. The authoritative per-type split lives only in the sibling copilot_usage.token_details.

The parser previously read only usage, so it recorded input_tokens = 16396 and cache_write_tokens = 0. The 12,539 cache-write tokens (billed at a higher rate than fresh input on Claude) were silently mis-counted as plain input, a cost-fidelity bug.

Fix

- Add extractCopilotUsageBreakdown() to parse copilot_usage.token_details (top-level or nested under response) into normalized fields.
- In extractUsageFromJson (non-streaming) and the OpenAI/Copilot SSE final-chunk branch, prefer this breakdown and drop the lumped prompt_tokens.
- Plain OpenAI responses without copilot_usage are unaffected.

Before → After (real run shape)

Tests

16 new unit tests in token-tracker.parsing.test.js; full api-proxy suite green (1238 passed).

Scope note

Addresses the cache-split fidelity gap found while investigating the AIC=0 regression. prompt_tokens extraction itself provably works for this shape, so this is independent of the separate "no token-usage record at all" symptom (which still needs api-proxy container logs to pin down).

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com
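To make the cost-fidelity point in the Problem section concrete, the sketch below compares the record before and after the fix using the two token counts quoted there. The fresh-input figure is derived under the assumption that prompt_tokens is exactly input plus cache-write. The helper `promptSideCost` and the `placeholderRates` values are hypothetical and do not represent real pricing; they exist only to show that the gap grows with the rate difference.

```javascript
// Sketch only: relative prompt-side cost for one request.
// `rates` is caller-supplied; nothing here encodes real pricing.
function promptSideCost(usage, rates) {
  const input = usage.input_tokens || 0;
  const cacheWrite = usage.cache_write_tokens || 0;
  return input * rates.input + cacheWrite * rates.cacheWrite;
}

// Counts quoted in the PR description (what the old parser recorded).
const recordedBefore = { input_tokens: 16396, cache_write_tokens: 0 };

// Derived split, ASSUMING prompt_tokens = input + cache_write.
const recordedAfter = {
  input_tokens: 16396 - 12539,
  cache_write_tokens: 12539,
};

// PLACEHOLDER rates in arbitrary units, for illustration only.
const placeholderRates = { input: 1, cacheWrite: 1.25 };

// How much the old record under-states cost under these placeholder rates.
const underCount =
  promptSideCost(recordedAfter, placeholderRates) -
  promptSideCost(recordedBefore, placeholderRates);

console.log(recordedAfter.input_tokens, underCount);
```

The total token count is identical in both records; only the attribution changes, which is why the bug is invisible in token totals and shows up only once per-type rates are applied.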