fix(api-proxy): use copilot_usage token_details for accurate cache split #5253

Merged
lpcox merged 2 commits into main from fix/copilot-usage-token-details
Jun 18, 2026

@lpcox lpcox commented Jun 18, 2026

Collaborator

Problem

For Claude models served through the GitHub Copilot OpenAI-compatible endpoint (/chat/completions via the api-proxy Copilot port), upstream reports a flattened usage object where prompt_tokens lumps fresh input together with cache-write tokens, and cache_creation_input_tokens is absent:

"usage": {
  "prompt_tokens": 16396,            // = input (3857) + cache_write (12539)
  "completion_tokens": 362,
  "prompt_tokens_details": { "cached_tokens": 0 }   // cache_read only
}

The authoritative per-type split lives only in the sibling copilot_usage.token_details:

"copilot_usage": { "token_details": [
  { "token_type": "input",       "token_count": 3857 },
  { "token_type": "cache_read",  "token_count": 0 },
  { "token_type": "cache_write", "token_count": 12539 },
  { "token_type": "output",      "token_count": 362 }
] }

The parser previously read only usage, so it recorded input_tokens = 16396 and cache_write_tokens = 0. The 12,539 cache-write tokens (billed at a higher rate than fresh input on Claude) were silently mis-counted as plain input — a cost-fidelity bug.

Fix

  • Add extractCopilotUsageBreakdown() to parse copilot_usage.token_details (top-level or nested under response) into normalized fields.
  • In extractUsageFromJson (non-streaming) and the OpenAI/Copilot SSE final-chunk branch, prefer this breakdown and drop the lumped prompt_tokens.
  • Plain OpenAI/Copilot responses without copilot_usage are unaffected.
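
A minimal sketch of what a helper like extractCopilotUsageBreakdown() could look like, based only on the payload shape shown under Problem. It is not the merged implementation: the output field names (cache_read_input_tokens, cache_creation_input_tokens, output_tokens) and the handling of unknown token types are assumptions for illustration.

```javascript
// Hypothetical sketch, not the code merged in this PR.
// Maps copilot_usage.token_details entries onto flat usage fields.
// The target field names below are assumptions.
const TOKEN_TYPE_TO_FIELD = new Map([
  ['input', 'input_tokens'],
  ['output', 'output_tokens'],
  ['cache_read', 'cache_read_input_tokens'],
  ['cache_write', 'cache_creation_input_tokens'],
]);

function extractCopilotUsageBreakdown(json) {
  if (!json || typeof json !== 'object') return null;
  // token_details may sit at the top level or nested under `response`.
  const copilotUsage = json.copilot_usage ?? json.response?.copilot_usage;
  const details = copilotUsage?.token_details;
  if (!Array.isArray(details)) return null;

  const breakdown = {};
  for (const entry of details) {
    const field = TOKEN_TYPE_TO_FIELD.get(entry?.token_type);
    const count = entry?.token_count;
    // Skip unknown token types and non-numeric counts.
    if (!field || !Number.isFinite(count)) continue;
    breakdown[field] = (breakdown[field] ?? 0) + count;
  }
  // Return null when nothing usable was found so callers can fall back to `usage`.
  return Object.keys(breakdown).length > 0 ? breakdown : null;
}

// Payload shape taken from the Problem section above.
const sample = {
  copilot_usage: {
    token_details: [
      { token_type: 'input', token_count: 3857 },
      { token_type: 'cache_read', token_count: 0 },
      { token_type: 'cache_write', token_count: 12539 },
      { token_type: 'output', token_count: 362 },
    ],
  },
};
console.log(extractCopilotUsageBreakdown(sample));
```

Returning null for a missing or empty token_details array is one way to keep plain OpenAI/Copilot responses on their existing path.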

Before → After (real run shape)

| field | before | after |
| --- | --- | --- |
| input_tokens | 16396 | 3857 |
| cache_write_tokens | 0 | 12539 |
| cache_read_tokens | 0 | 0 |
| output_tokens | 362 | 362 |
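
The before/after numbers can be reproduced with a small sketch. Here normalizeUsageSketch() is a hypothetical stand-in for the proxy's normalizeUsage (its real mapping is not shown in this PR), and the breakdown field names are assumed.

```javascript
// Hypothetical stand-in for normalizeUsage; the real mapping may differ.
function normalizeUsageSketch(usage = {}) {
  return {
    input_tokens: usage.input_tokens ?? usage.prompt_tokens ?? 0,
    output_tokens: usage.output_tokens ?? usage.completion_tokens ?? 0,
    cache_read_tokens:
      usage.cache_read_input_tokens ?? usage.prompt_tokens_details?.cached_tokens ?? 0,
    cache_write_tokens: usage.cache_creation_input_tokens ?? 0,
  };
}

// Prefer the per-type breakdown and drop the lumped prompt_tokens.
function preferBreakdown(usage, breakdown) {
  if (!breakdown) return usage;
  const merged = { ...(usage || {}), ...breakdown };
  if (breakdown.input_tokens !== undefined) delete merged.prompt_tokens;
  return merged;
}

// Flattened usage object from the Problem section.
const flattenedUsage = {
  prompt_tokens: 16396, // input (3857) + cache_write (12539)
  completion_tokens: 362,
  prompt_tokens_details: { cached_tokens: 0 },
};

// Breakdown as it might be derived from copilot_usage.token_details (assumed field names).
const breakdown = {
  input_tokens: 3857,
  cache_read_input_tokens: 0,
  cache_creation_input_tokens: 12539,
  output_tokens: 362,
};

const before = normalizeUsageSketch(flattenedUsage);
const after = normalizeUsageSketch(preferBreakdown(flattenedUsage, breakdown));
console.log({ before, after });
```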

Tests

16 new unit tests in token-tracker.parsing.test.js; full api-proxy suite green (1238 passed).

Scope note

Addresses the cache-split fidelity gap found while investigating the AIC=0 regression. prompt_tokens extraction itself provably works for this shape, so this is independent of the separate "no token-usage record at all" symptom (which still needs api-proxy container logs to pin down).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Claude-via-Copilot responses report a flattened usage object where
prompt_tokens lumps fresh input together with cache-write tokens, and
cache_creation_input_tokens is absent. The authoritative per-type split
(input / cache_read / cache_write / output) lives only in the sibling
copilot_usage.token_details array.

Parse copilot_usage.token_details (in both non-streaming JSON and the SSE
final chunk) and prefer it over the lumped prompt_tokens so cache-write
tokens are recorded and billed correctly instead of being mis-counted as
plain input. Plain OpenAI responses without copilot_usage are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 15:52
@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
| --- | --- | --- | --- |
| Lines | 97.57% | 97.61% | 📈 +0.04% |
| Statements | 97.50% | 97.54% | 📈 +0.04% |
| Functions | 98.84% | 98.84% | ➡️ +0.00% |
| Branches | 92.95% | 92.98% | 📈 +0.03% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
| --- | --- | --- |
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes token accounting for Claude models served via the GitHub Copilot OpenAI-compatible /chat/completions endpoint by preferring the authoritative copilot_usage.token_details breakdown over the flattened usage.prompt_tokens, restoring correct cache-write vs fresh-input attribution in the api-proxy’s usage normalization.

Changes:

  • Add extractCopilotUsageBreakdown() to parse copilot_usage.token_details (top-level or response-nested) into normalized usage fields.
  • Integrate the Copilot breakdown into both non-streaming JSON parsing (extractUsageFromJson) and streaming final-chunk parsing (extractUsageFromSseLine), and drop lumped prompt_tokens when appropriate.
  • Add unit tests covering the breakdown extraction and integration paths.
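
For the streaming path, the idea can be sketched as follows. This is an illustrative outline under assumptions, not the PR's extractUsageFromSseLine: the helper names usageFromSseLineSketch and breakdownFromTokenDetails are hypothetical, and so are the normalized field names.

```javascript
// Hypothetical helper: per-type counts from copilot_usage.token_details.
function breakdownFromTokenDetails(json) {
  const details = (json.copilot_usage ?? json.response?.copilot_usage)?.token_details;
  if (!Array.isArray(details)) return null;
  const fields = new Map([
    ['input', 'input_tokens'],
    ['output', 'output_tokens'],
    ['cache_read', 'cache_read_input_tokens'],
    ['cache_write', 'cache_creation_input_tokens'],
  ]);
  const out = {};
  for (const entry of details) {
    const field = fields.get(entry?.token_type);
    if (field && Number.isFinite(entry.token_count)) out[field] = entry.token_count;
  }
  return Object.keys(out).length > 0 ? out : null;
}

// Hypothetical sketch of final-chunk handling for one SSE line.
function usageFromSseLineSketch(line) {
  if (!line.startsWith('data:')) return null;
  const payload = line.slice('data:'.length).trim();
  if (payload === '' || payload === '[DONE]') return null;

  let json;
  try {
    json = JSON.parse(payload);
  } catch {
    return null; // Ignore malformed lines in this sketch.
  }
  if (!json || typeof json !== 'object') return null;

  const breakdown = breakdownFromTokenDetails(json);
  if (!json.usage && !breakdown) return null; // Ordinary content chunk.

  const usage = { ...(json.usage || {}), ...(breakdown || {}) };
  // Drop the lumped prompt_tokens once an accurate input count is available.
  if (breakdown && breakdown.input_tokens !== undefined) delete usage.prompt_tokens;
  return usage;
}

const finalChunk = 'data: ' + JSON.stringify({
  choices: [],
  usage: { prompt_tokens: 16396, completion_tokens: 362 },
  copilot_usage: {
    token_details: [
      { token_type: 'input', token_count: 3857 },
      { token_type: 'cache_write', token_count: 12539 },
      { token_type: 'output', token_count: 362 },
    ],
  },
});
console.log(usageFromSseLineSketch(finalChunk));
```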
Show a summary per file

| File | Description |
| --- | --- |
| containers/api-proxy/token-parsers.js | Adds Copilot-specific usage breakdown parsing and integrates it into JSON + SSE parsing paths. |
| containers/api-proxy/token-tracker.js | Re-exports the new breakdown helper via the token-tracker facade for tests/consumers. |
| containers/api-proxy/token-tracker.parsing.test.js | Adds unit tests validating Copilot breakdown extraction and end-to-end normalization behavior. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Comment on lines +257 to +266
const copilotBreakdown = extractCopilotUsageBreakdown(json);
if (copilotBreakdown) {
  const merged = { ...(result.usage || {}), ...copilotBreakdown };
  // Drop the lumped prompt_tokens so normalizeUsage uses the accurate
  // input_tokens instead of input+cache_write.
  if (copilotBreakdown.input_tokens !== undefined) {
    delete merged.prompt_tokens;
  }
  result.usage = merged;
}
Comment on lines +343 to +349
const copilotBreakdown = extractCopilotUsageBreakdown(json);
if (copilotBreakdown) {
  result.usage = { ...result.usage, ...copilotBreakdown };
  if (copilotBreakdown.input_tokens !== undefined) {
    delete result.usage.prompt_tokens;
  }
}
Comment on lines +657 to +676
  test('uses copilot_usage even when the flattened usage object is absent', () => {
    const body = Buffer.from(JSON.stringify({
      model: 'claude-sonnet-4.6',
      copilot_usage: {
        token_details: [
          { token_type: 'input', token_count: 200 },
          { token_type: 'output', token_count: 10 },
          { token_type: 'cache_write', token_count: 99 },
        ],
      },
    }));
    expect(normalizeUsage(extractUsageFromJson(body).usage)).toEqual({
      input_tokens: 200,
      output_tokens: 10,
      cache_read_tokens: 0,
      cache_write_tokens: 99,
      reasoning_tokens: 0,
    });
  });
});
@lpcox

lpcox commented Jun 18, 2026

Collaborator Author

@copilot address review feedback

… but no input

When copilot_usage.token_details provides cache_write but omits input,
prompt_tokens (= input + cache_write) would be kept alongside
cache_creation_input_tokens, causing normalizeUsage to double-count
cache_write.

Fix: infer input_tokens = max(0, prompt_tokens - cache_creation_input_tokens)
and drop prompt_tokens in both the non-streaming and streaming paths.

Add regression tests for this edge case in both paths.
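
The inference described above can be sketched as a small reconciliation step. This is an illustration under assumptions, not the committed code: reconcilePromptTokens is a hypothetical name, and the breakdown is assumed to carry cache-write tokens as cache_creation_input_tokens.

```javascript
// Hypothetical sketch of the edge-case handling: when the breakdown has
// cache_write but no input, derive input from the lumped prompt_tokens.
function reconcilePromptTokens(usage, breakdown) {
  const merged = { ...(usage || {}), ...(breakdown || {}) };
  if (!breakdown) return merged;

  let haveAccurateInput = breakdown.input_tokens !== undefined;
  const cacheWrite = breakdown.cache_creation_input_tokens;

  if (!haveAccurateInput && cacheWrite !== undefined && Number.isFinite(merged.prompt_tokens)) {
    // prompt_tokens = input + cache_write, so subtract and clamp at zero.
    merged.input_tokens = Math.max(0, merged.prompt_tokens - cacheWrite);
    haveAccurateInput = true;
  }

  // Keeping prompt_tokens next to cache_creation_input_tokens would let a
  // downstream normalizer count the cache-write tokens twice.
  if (haveAccurateInput) delete merged.prompt_tokens;
  return merged;
}

const inferred = reconcilePromptTokens(
  { prompt_tokens: 16396, completion_tokens: 362 },
  { cache_creation_input_tokens: 12539 },
);
console.log(inferred.input_tokens); // 16396 - 12539 = 3857

const clamped = reconcilePromptTokens(
  { prompt_tokens: 100 },
  { cache_creation_input_tokens: 150 },
);
console.log(clamped.input_tokens); // clamped to 0
```

The clamp guards against an upstream inconsistency where the reported cache-write count exceeds prompt_tokens.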

Copilot AI commented Jun 18, 2026

Contributor

@copilot address review feedback

Addressed in commit fix(api-proxy): infer input_tokens when copilot_usage has cache_write but no input.

Both parsing paths now handle the edge case: when copilot_usage.token_details provides cache_write but omits input, the code infers input_tokens = max(0, prompt_tokens - cache_creation_input_tokens) and drops prompt_tokens, preventing the double-count in normalizeUsage. Two regression tests lock in the behavior (non-streaming + streaming).

@github-actions

Contributor

✅ Copilot review passed with no inline comments.

@lpcox Add the ready-for-aw label to this PR to trigger agentic CI smoke tests.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Claude failed

No user request was provided in this turn — only system reminders and project context. No action taken.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Contribution Check completed successfully!

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔌 Smoke Services — Service connectivity failed ⚠️

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK AOAI (api-key) reports failed. AOAI BYOK (api-key) mode investigation needed...

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Chroot tests failed Smoke Chroot failed - See logs for details.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Gemini completed. All facets verified. 💎

Smoke test completed with FAIL status. Connectivity and MCP tools were unavailable. File operations passed.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK reports failed. BYOK mode investigation needed...

@github-actions

Contributor
  • Add comprehensive gVisor firewall comparison workflow: ✅
  • refactor: extract provider env var constants to a shared module: ✅
  • GitHub title check: ✅
  • Smoke-test file write/read: ✅
  • npm ci && npm run build: ✅
  • Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Contributor

🔥 Smoke Test: Copilot PAT Auth — FAIL

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | ✅ Connected (PR list retrieved) |
| GitHub.com HTTP | ✅ HTTP 200 |
| File write/read | ❌ Pre-step data unavailable (template vars unsubstituted) |

Overall: FAIL — pre-step outputs not available; file test could not be verified.

Auth mode: PAT (COPILOT_GITHUB_TOKEN) | PR author: @lpcox

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Contributor

🔬 Smoke Test Results

PR: fix(api-proxy): use copilot_usage token_details for accurate cache split
Author: @lpcox

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | ✅ PASS |
| GitHub.com HTTP | ⚠️ N/A (pre-step outputs not injected) |
| File write/read | ⚠️ N/A (pre-step outputs not injected) |

Overall: PARTIAL — MCP confirmed working; pre-computed step outputs (SMOKE_HTTP_CODE, SMOKE_FILE_CONTENT, SMOKE_FILE_PATH) were not substituted (workflow template issue).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Contributor

@lpcox Smoke Test Results:
GitHub MCP Testing: ✅
GitHub.com Connectivity: ✅
File Write/Read Test: ✅
BYOK Inference Test: ✅
Running in direct BYOK mode (github-oidc + AzureEntra + COPILOT_PROVIDER_BASE_URL) via api-proxy → Foundry (o4-mini-aw).
Overall Status: PASS

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Contributor

🔍 Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Result | Notes |
| --- | --- | --- |
| 1. Module Loading | | otel.js loads; exports startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled + 7 test helpers; isEnabled() = true |
| 2. Test Suite | | 59 passed, 0 failed — 2 suites (otel.test.js, otel-fanout.test.js) |
| 3. Env Var Forwarding | | api-proxy-service-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME to the api-proxy container |
| 4. Token Tracker Integration | | onUsage callback present at line 283 of token-tracker-http.js with JSDoc at line 374 |
| 5. OTEL Diagnostics | | FileSpanExporter fallback active (no OTLP endpoint configured → writes to /var/log/api-proxy/otel.jsonl); no external OTLP export expected in this run |

All scenarios pass. OTEL tracing integration is healthy on this PR.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

Smoke Test Results

  • GitHub MCP Testing: ❌ (Tools not found)
  • GitHub.com Connectivity: ❌ (SSL connect error 35)
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions

Contributor

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
| --- | --- | --- | --- | --- |
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | 1/1 passed | ✅ PASS |
| Go | env | | 1/1 passed | ✅ PASS |
| Go | uuid | | 1/1 passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | passed | ✅ PASS |
| Node.js | execa | | passed | ✅ PASS |
| Node.js | p-limit | | passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5253

@lpcox lpcox merged commit 41cf5ac into main Jun 18, 2026
90 of 105 checks passed
@lpcox lpcox deleted the fix/copilot-usage-token-details branch June 18, 2026 18:26
3 participants