Skip to content

Agent-side AI-credits calculator under-reports AIC for Anthropic (subtracts cache_read from input_tokens) #40205

@lpcox

Description

@lpcox

Summary

The agent-side AI-credits (AIC) calculator in actions/setup/js/model_costs.cjs under-reports AI credits for Anthropic models. It recomputes AIC from raw token counts and incorrectly subtracts cache_read_tokens from Anthropic's input_tokens. Anthropic already reports input_tokens exclusive of cached tokens (cache_read_input_tokens / cache_creation_input_tokens are reported separately and are additive), so the subtraction zeroes out the genuinely-fresh input credit.

This produces a persistent drift between:

  • agent_usage.json / GH_AW_AIC (computed by the harness), and
  • the firewall api-proxy's authoritative ai_credits_total in token-usage.jsonl.

The verify_token_usage smoke job surfaces this as a non-blocking warning:

ai_credits drift: last ai_credits_total is 45.971025, agent_usage reports 44.719.

Root cause

actions/setup/js/model_costs.cjs

// line ~203, computeInferenceCostUSD()
const effectiveInput = providerIncludesCacheReadsInInput(provider)
  ? Math.max(input - cacheRead, 0)
  : input;
// line ~98
function providerIncludesCacheReadsInInput(provider) {
  switch (normalizeProvider(provider)) {
    case "":
    case "anthropic":      // <-- incorrect for Anthropic
    case "openai":
    case "azure-openai":
    case "azure_openai":
      return true;
    default:
      return false;
  }
}

Provider token semantics differ:

Provider input_tokens semantics Subtract cache from input?
OpenAI / Azure OpenAI Total input; cached tokens are a subset ✅ Yes (avoid double-count)
Anthropic Non-cached input only; cache_read + cache_creation are separate & additive No

Returning true for anthropic (and the empty/default "" case, which assumes OpenAI semantics) makes the harness subtract cache_read from an input_tokens value that never included it, so for cache-heavy Anthropic requests effectiveInput collapses to 0 and the fresh-input credit is dropped.

This is the mirror image of a fix just landed in the firewall api-proxy (githubnext/gh-aw-firewall PR #5271), where calculateAiCredits was made provider-aware so it does not subtract cache for Anthropic. The proxy is now correct; the harness recompute is not, hence the drift.

Reproduction & evidence

Run: Smoke Claude — https://github.com/github/gh-aw-firewall/actions/runs/27803071455 (engine: claude, model resolved to claude-opus-4-7, gh-aw harness v0.79.6)

Proxy token-usage.jsonl (4 successful Anthropic requests, authoritative per-response AIC):

# input output cache_read cache_write ai_credits_this_response
1 2498 22 0 0 1.304
2 5 145 0 59981 37.853125
3 1 82 59981 221 3.342675
4 1 135 60202 197 3.471225
Σ 2505 384 120183 60399 45.971025

opus-4-7 pricing ($/M): input 5.00, cache_read 0.50, cache_write 6.25, output 25.00. Each per-response value above checks out exactly, and the cumulative proxy total is 45.971025.

agent_usage.json from the same run agrees on raw tokens (input 2505, output 384, cache_read 120183, cache_write 60399) but reports ai_credits: 44.719.

Reconciling the difference with the buggy formula (Anthropic, subtract cache_read → effectiveInput = max(2505 − 120183, 0) = 0):

input:       0       × 5.00 / 10000 = 0
cache_read:  120183  × 0.50 / 10000 = 6.00915
cache_write: 60399   × 6.25 / 10000 = 37.749375
output:      384      × 25.00 / 10000 = 0.96
total                                = 44.718525  ≈ 44.719   ✅ matches agent_usage

The exact gap (45.971025 − 44.719 ≈ 1.2525) equals the dropped fresh-input credit (2505 × 5.00 / 10000). This confirms the bug is solely the Anthropic cache-read subtraction.

Impact

  • GH_AW_AIC, agent_usage.json.ai_credits, the step-summary token table, and any OTEL ai_credits derived from the harness under-report AIC for Anthropic models whenever cache reads are present (i.e., almost every multi-turn Claude run).
  • Causes a recurring, confusing verify_token_usage drift warning even when the proxy accounting is correct.
  • AIC-budget reasoning that relies on the harness number (rather than the proxy's) is skewed low for Anthropic.

Suggested fixes

Option A (minimal/targeted): In providerIncludesCacheReadsInInput, return false for anthropic so Anthropic input is not cache-subtracted. (Reconsider the empty/default "" case too — it currently assumes OpenAI semantics.)

Option B (preferred, more robust): When the proxy token-usage.jsonl already carries authoritative ai_credits_this_response / ai_credits_total, trust those values instead of recomputing AIC from raw tokens in the harness. The api-proxy computes AIC at request time with full provider-aware pricing and is the source of truth; recomputing downstream invites exactly this kind of divergence.

Either fix eliminates the drift. Option B additionally future-proofs against provider/pricing logic drifting between the proxy and the harness.

References

  • Harness AIC formula: actions/setup/js/model_costs.cjs (computeInferenceCostUSD ~L203, providerIncludesCacheReadsInInput ~L98)
  • Harness recompute call sites: actions/setup/js/parse_mcp_gateway_log.cjs (computeInferenceAIC, L128–146), actions/setup/js/parse_token_usage.cjs (writes agent_usage.json)
  • Corresponding proxy-side fix: githubnext/gh-aw-firewall PR [go-fan] Go Module Review: spf13/cobra #5271 (provider-aware calculateAiCredits)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions