## Context
AgentV captures detailed token usage from both the Copilot CLI (via ACP `usage_update`) and Claude SDK (`input_tokens`, `output_tokens`, `cache_read_input_tokens`) providers. However, the OTel exporter only includes aggregate trace-level counts as root span attributes, not per-message/per-LLM-call token breakdowns.
## Current behavior
In `otel-exporter.ts`, the root span gets:

- `agentv.trace.event_count` — total tool calls
- `agentv.trace.cost_usd` — total cost
- `agentv.trace.llm_call_count` — LLM call count

Child `gen_ai.generation` spans get model name, duration, and content — but no token usage.
## Proposal
Add token usage attributes to each LLM child span using the exact GenAI semantic convention names:
```typescript
// In exportMessage(), for assistant messages:
if (msg.tokenUsage) {
  span.setAttribute('gen_ai.usage.input_tokens', msg.tokenUsage.inputTokens);
  span.setAttribute('gen_ai.usage.output_tokens', msg.tokenUsage.outputTokens);
  if (msg.tokenUsage.cacheCreationInputTokens != null) {
    span.setAttribute('gen_ai.usage.cache_creation.input_tokens', msg.tokenUsage.cacheCreationInputTokens);
  }
  if (msg.tokenUsage.cacheReadInputTokens != null) {
    span.setAttribute('gen_ai.usage.cache_read.input_tokens', msg.tokenUsage.cacheReadInputTokens);
  }
}
```
Note: the GenAI spec has no cost attribute (`gen_ai.usage.cost` does not exist). Cost should remain as `agentv.trace.cost_usd` on the root span only.
## Data flow prerequisite
The `Message` type in `packages/core/src/evaluation/providers/types.ts` needs `tokenUsage` if not already present. Check what each provider currently captures:

| Provider | Token data available | Where captured |
| --- | --- | --- |
| Claude SDK | `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens` | `claude.ts` — from `result.usage` |
| Copilot CLI | `used` (input tokens), no output breakdown | `copilot-cli.ts` — from ACP `usage_update` |
| Copilot SDK | `input_tokens`, `output_tokens` | `copilot-sdk.ts` — from SDK response |
| Azure/AI SDK | Varies by model | `ai-sdk.ts` — from Vercel AI SDK response |
If `Message.tokenUsage` doesn't exist, add it as an optional field and propagate it from providers.
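As a sketch, the optional field and a provider-side normalizer could look like the following. The `ProviderTokenUsage` field names here are assumptions for illustration, not taken from the codebase:

```typescript
// Hypothetical shape for ProviderTokenUsage; field names are illustrative.
interface ProviderTokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheCreationInputTokens?: number;
  cacheReadInputTokens?: number;
}

// Example: normalizing a Claude-style snake_case usage payload
// into the shared camelCase shape attached to Message.
function toTokenUsage(usage: {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}): ProviderTokenUsage {
  return {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    cacheReadInputTokens: usage.cache_read_input_tokens,
    cacheCreationInputTokens: usage.cache_creation_input_tokens,
  };
}
```

Keeping the field optional means providers that report no usage simply leave it undefined, and the exporter can skip the attributes entirely.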
Files to modify
packages/core/src/evaluation/providers/types.ts — Add tokenUsage?: ProviderTokenUsage to Message type (if missing)
packages/core/src/evaluation/providers/claude.ts — Attach per-message token usage to Message objects
packages/core/src/evaluation/providers/copilot-cli.ts — Same
packages/core/src/observability/otel-exporter.ts — Read msg.tokenUsage in exportMessage() and set attributes
- Tests — Verify token attributes appear on child spans
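Since Copilot CLI reports only input tokens, the exporter change needs to tolerate partial usage data. A minimal sketch of that idea — the helper name and the `SpanLike` type are hypothetical, not the project's actual API:

```typescript
// Minimal span interface for the sketch; the real code would use the OTel Span type.
interface SpanLike {
  setAttribute(key: string, value: number): void;
}

// All fields optional so providers with partial data (e.g. Copilot CLI,
// which only reports input tokens via ACP usage_update) still work.
interface ProviderTokenUsage {
  inputTokens?: number;
  outputTokens?: number;
  cacheCreationInputTokens?: number;
  cacheReadInputTokens?: number;
}

// Set GenAI usage attributes, skipping any field the provider did not report.
function setUsageAttributes(span: SpanLike, usage: ProviderTokenUsage): void {
  const attrs: Record<string, number | undefined> = {
    'gen_ai.usage.input_tokens': usage.inputTokens,
    'gen_ai.usage.output_tokens': usage.outputTokens,
    'gen_ai.usage.cache_creation.input_tokens': usage.cacheCreationInputTokens,
    'gen_ai.usage.cache_read.input_tokens': usage.cacheReadInputTokens,
  };
  for (const [key, value] of Object.entries(attrs)) {
    if (value != null) span.setAttribute(key, value);
  }
}
```

This keeps the exporter agnostic to which subset of counters a given provider delivers.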
## Acceptance criteria

- Child LLM spans carry `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` when available
- Cache attributes are set when present: `gen_ai.usage.cache_creation.input_tokens`, `gen_ai.usage.cache_read.input_tokens`
- `agentv.trace.cost_usd` remains unchanged on the root span (no `gen_ai.usage.cost` — it doesn't exist in the spec)

## References

- GenAI semantic conventions, `gen_ai.usage.*` section
- `packages/core/src/observability/otel-exporter.ts:178-253`

## Testing Approach

### Unit Tests (InMemorySpanExporter)

```typescript
const exporter = new InMemorySpanExporter();
// Run a mock eval with known token counts (e.g., the mock provider returns
// tokenUsage: { input: 100, output: 50, cached: 20 })
const spans = exporter.getFinishedSpans();
const genSpan = spans.find(s => s.attributes['gen_ai.operation.name'] === 'chat');
expect(genSpan?.attributes['gen_ai.usage.input_tokens']).toBe(100);
expect(genSpan?.attributes['gen_ai.usage.output_tokens']).toBe(50);
expect(genSpan?.attributes['gen_ai.usage.cache_read.input_tokens']).toBe(20);
```

### What to Assert

- Token attributes appear on child `gen_ai.chat` spans (not just the root)
- Root-span aggregates (`agentv.*`) remain unchanged