Summary
The Pi Coding Agent provider should return execution metrics (tokenUsage, costUsd, durationMs) in its ProviderResponse so they can be used by evaluators and included in evaluation results.
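For reference, the three metrics could be modeled as optional fields, because cost in particular may be unavailable when no pricing information is known. This is a sketch only. The type names ProviderTokenUsage and ProviderResponseMetrics are hypothetical. The real ProviderResponse type in packages/core is not shown in this issue and may already define these fields differently. The values in the example are illustrative.

```typescript
// Hypothetical types: only the metric fields named in this issue are shown.
// The real ProviderResponse in packages/core may differ.
interface ProviderTokenUsage {
  input: number;
  output: number;
  cached?: number; // present only when Pi reports cached tokens
}

interface ProviderResponseMetrics {
  tokenUsage?: ProviderTokenUsage;
  costUsd?: number; // omitted when no pricing info is available
  durationMs?: number;
}

// Illustrative response that reports tokens and duration but no cost.
const example: ProviderResponseMetrics = {
  tokenUsage: { input: 1200, output: 350 },
  durationMs: 4821,
};
```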
Current State
The provider already captures some of this data but doesn't surface it:
- usage data is extracted from Pi's agent_end event and stored in metadata.usage on output messages
- Execution duration is tracked for logging but not returned
- Cost is not calculated
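Cost is not calculated today. Deriving costUsd would need per-token pricing for whichever model Pi runs. Here is a minimal sketch, assuming prices are expressed in USD per million tokens and that cached tokens are counted separately from input tokens. Providers differ on that second point, so it should be checked against Pi's usage data. ModelPricing and computeCostUsd are hypothetical names, not part of the existing code.

```typescript
// Hypothetical pricing shape (USD per one million tokens). Real prices depend on
// the model Pi is configured with and are not specified in this issue.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cachedPerMTok?: number;
}

interface TokenUsage {
  input: number;
  output: number;
  cached?: number; // assumed to be counted separately from `input`
}

// Returns the cost in USD, or undefined when no pricing is known, so that
// costUsd can stay optional in the response. Cached tokens are ignored when
// no cached price is given.
function computeCostUsd(
  usage: TokenUsage,
  pricing: ModelPricing | undefined,
): number | undefined {
  if (!pricing) return undefined;
  // Accumulate cost scaled by one million, then divide once at the end.
  let scaledCost =
    usage.input * pricing.inputPerMTok + usage.output * pricing.outputPerMTok;
  if (usage.cached !== undefined && pricing.cachedPerMTok !== undefined) {
    scaledCost += usage.cached * pricing.cachedPerMTok;
  }
  return scaledCost / 1_000_000;
}
```

For example, with illustrative prices of 3 and 15 USD per million input and output tokens, computeCostUsd({ input: 1_000_000, output: 500_000 }, { inputPerMTok: 3, outputPerMTok: 15 }) returns 10.5. Returning undefined rather than 0 when pricing is missing keeps "unknown cost" distinguishable from "free".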
Proposed Changes
In packages/core/src/evaluation/providers/pi-coding-agent.ts:
- Parse token usage from the agent_end event:
// In extractOutputMessages or a new function; agentEndEvent is the parsed agent_end event
const usage = agentEndEvent.usage; // { input_tokens, output_tokens, ... }
const tokenUsage = {
  input: usage.input_tokens,
  output: usage.output_tokens,
  cached: usage.cached_tokens, // if available
};
- Measure the wall-clock duration of the Pi invocation:
const startTime = Date.now();
// ... execute Pi ...
const durationMs = Date.now() - startTime;
- Return the metrics in the ProviderResponse:
return {
  raw: { ... },
  outputMessages,
  tokenUsage,
  durationMs,
  // costUsd: optional, requires pricing info
};
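Taken together, the three steps above could look roughly like the following. This is a self-contained sketch under stated assumptions. The event shape (type, usage with input_tokens / output_tokens / cached_tokens) follows the field names sketched in this issue and is not confirmed against Pi's actual output. PiEvent, MetricsResponse, extractTokenUsage, invokeWithMetrics and the runPi callback are hypothetical stand-ins for the real provider code.

```typescript
// Hypothetical shapes: Pi's real event and response types are not shown in this issue.
interface PiUsage {
  input_tokens: number;
  output_tokens: number;
  cached_tokens?: number;
}

interface PiEvent {
  type: string;
  usage?: PiUsage;
}

interface TokenUsage {
  input: number;
  output: number;
  cached?: number;
}

interface MetricsResponse {
  raw: { events: PiEvent[] };
  tokenUsage?: TokenUsage;
  durationMs: number;
}

// Find the agent_end event and map Pi's snake_case usage fields onto tokenUsage.
function extractTokenUsage(events: PiEvent[]): TokenUsage | undefined {
  const agentEnd = events.find((e) => e.type === "agent_end");
  const usage = agentEnd?.usage;
  if (!usage) return undefined;
  return {
    input: usage.input_tokens,
    output: usage.output_tokens,
    cached: usage.cached_tokens, // undefined when Pi does not report it
  };
}

// Run Pi via the injected runPi callback and time it with wall-clock milliseconds.
async function invokeWithMetrics(
  runPi: () => Promise<PiEvent[]>,
): Promise<MetricsResponse> {
  const startTime = Date.now();
  const events = await runPi();
  const durationMs = Date.now() - startTime;
  return { raw: { events }, tokenUsage: extractTokenUsage(events), durationMs };
}
```

Passing runPi as a parameter lets a test substitute a stub that resolves a fixed event list. Date.now() matches the proposal above. performance.now() is monotonic and would not be affected by system clock adjustments, but it returns a fractional value that may need rounding.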
Files to Update
packages/core/src/evaluation/providers/pi-coding-agent.ts
Acceptance Criteria
- tokenUsage is extracted from Pi's usage data and returned in the response
- durationMs is calculated from wall-clock execution time
Labels
enhancement, good-first-issue