Trace Timestamps & Persistence
Goal
Add startTime/endTime to trace types and persist full traces to disk via a --trace flag.
Scope
1. Add startTime/endTime to core type interfaces
Replace timestamp with startTime/endTime on ToolCall, OutputMessage, and ProviderResponse. Add startTime/endTime/llmCallCount to TraceSummary and ExecutionMetrics. Since we have few users, replace timestamp directly (no soft deprecation).
Files:
- packages/core/src/evaluation/providers/types.ts
- packages/core/src/evaluation/trace.ts
- All references to old timestamp field across providers and tests
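As a rough illustration of the type change, the sketch below shows what the timing fields might look like. Only the type names and the startTime, endTime, durationMs, and llmCallCount fields come from this issue; the other fields (tool, role, toolCalls), the optionality of each field, and the use of ISO 8601 strings are assumptions.

```typescript
// Hypothetical shapes. Field names other than startTime, endTime,
// durationMs, and llmCallCount are assumed for illustration.
interface ToolCall {
  tool: string; // assumed field name
  startTime?: string; // assumed ISO 8601; replaces the old `timestamp`
  endTime?: string;
  durationMs?: number;
}

interface OutputMessage {
  role: string; // assumed field name
  toolCalls?: ToolCall[]; // assumed field name
  startTime?: string;
  endTime?: string;
}

interface TraceSummary {
  startTime?: string;
  endTime?: string;
  llmCallCount?: number;
}

// Example value using the new fields instead of a single timestamp.
const exampleMessage: OutputMessage = {
  role: "assistant",
  startTime: "2024-01-01T00:00:00.000Z",
  endTime: "2024-01-01T00:00:02.000Z",
  toolCalls: [
    {
      tool: "search",
      startTime: "2024-01-01T00:00:00.500Z",
      endTime: "2024-01-01T00:00:01.500Z",
    },
  ],
};

const exampleSummary: TraceSummary = {
  startTime: exampleMessage.startTime,
  endTime: exampleMessage.endTime,
  llmCallCount: 1,
};
```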
2. Update computeTraceSummary to derive timing from spans
- Derive startTime/endTime from message boundaries (earliest start, latest end)
- Compute toolDurations from startTime/endTime when durationMs is not provided
- Count llmCallCount from assistant messages
Files:
- packages/core/src/evaluation/trace.ts
- New: packages/core/test/evaluation/trace-summary.test.ts
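The three derivation rules in this step could be sketched roughly as follows. This is not the actual computeTraceSummary; summarizeTiming, TimingSummary, and the message and tool call shapes are hypothetical, and timestamps are assumed to be ISO 8601 strings that Date.parse accepts.

```typescript
// Hypothetical shapes; only startTime, endTime, durationMs, toolDurations,
// and llmCallCount are named in the issue.
interface ToolCall {
  tool: string;
  startTime?: string;
  endTime?: string;
  durationMs?: number;
}

interface OutputMessage {
  role: string;
  startTime?: string;
  endTime?: string;
  toolCalls?: ToolCall[];
}

interface TimingSummary {
  startTime?: string;
  endTime?: string;
  llmCallCount: number;
  toolDurations: Record<string, number[]>;
}

function summarizeTiming(messages: OutputMessage[]): TimingSummary {
  let earliest: number | undefined;
  let latest: number | undefined;
  let llmCallCount = 0;
  const toolDurations: Record<string, number[]> = {};

  for (const message of messages) {
    // Each assistant message is counted as one LLM call.
    if (message.role === "assistant") llmCallCount += 1;

    // Track the earliest start and latest end across all messages.
    if (message.startTime !== undefined) {
      const t = Date.parse(message.startTime);
      if (!Number.isNaN(t) && (earliest === undefined || t < earliest)) earliest = t;
    }
    if (message.endTime !== undefined) {
      const t = Date.parse(message.endTime);
      if (!Number.isNaN(t) && (latest === undefined || t > latest)) latest = t;
    }

    for (const call of message.toolCalls ?? []) {
      // Prefer an explicit durationMs; otherwise fall back to endTime - startTime.
      let duration = call.durationMs;
      if (duration === undefined && call.startTime !== undefined && call.endTime !== undefined) {
        const diff = Date.parse(call.endTime) - Date.parse(call.startTime);
        if (!Number.isNaN(diff)) duration = diff;
      }
      if (duration !== undefined) {
        const list = toolDurations[call.tool] ?? [];
        list.push(duration);
        toolDurations[call.tool] = list;
      }
    }
  }

  return {
    startTime: earliest === undefined ? undefined : new Date(earliest).toISOString(),
    endTime: latest === undefined ? undefined : new Date(latest).toISOString(),
    llmCallCount,
    toolDurations,
  };
}
```

Tool calls that carry neither durationMs nor both timestamps are skipped here rather than recorded as zero, which is one possible choice and not something this issue specifies.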
3. Add --trace flag and TraceWriter for trace persistence
- Add --trace CLI flag that writes full outputMessages to .agentv/traces/ as JSONL
- Add outputMessages to EvaluationResult (optional, stripped before results output)
- TraceWriter writes JSONL trace records with spans derived from outputMessages
Files:
- New: apps/cli/src/commands/eval/trace-writer.ts
- apps/cli/src/commands/eval/index.ts
- apps/cli/src/commands/eval/run-eval.ts
- packages/core/src/evaluation/types.ts
- packages/core/src/evaluation/orchestrator.ts
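One way the span derivation and JSONL serialization might look is sketched below. It covers only the step of turning outputMessages into a single JSONL line, not the file writing or the CLI wiring. The record layout (evalId, spans), the Span shape, and the helper names toSpans and toJsonlLine are all hypothetical; the issue does not define the trace record format.

```typescript
// Hypothetical message shapes, matching the timing fields from step 1.
interface ToolCall {
  tool: string;
  startTime?: string;
  endTime?: string;
}

interface OutputMessage {
  role: string;
  startTime?: string;
  endTime?: string;
  toolCalls?: ToolCall[];
}

// Hypothetical span and record layout; not defined by the issue.
interface Span {
  kind: "message" | "tool";
  name: string;
  startTime?: string;
  endTime?: string;
}

interface TraceRecord {
  evalId: string;
  spans: Span[];
}

// One span per message, followed by one span per tool call in that message.
function toSpans(messages: OutputMessage[]): Span[] {
  const spans: Span[] = [];
  for (const message of messages) {
    spans.push({
      kind: "message",
      name: message.role,
      startTime: message.startTime,
      endTime: message.endTime,
    });
    for (const call of message.toolCalls ?? []) {
      spans.push({
        kind: "tool",
        name: call.tool,
        startTime: call.startTime,
        endTime: call.endTime,
      });
    }
  }
  return spans;
}

// JSONL means one JSON document per line, so each record ends with a newline.
function toJsonlLine(evalId: string, messages: OutputMessage[]): string {
  const record: TraceRecord = { evalId, spans: toSpans(messages) };
  return JSON.stringify(record) + "\n";
}
```

A writer could then append each returned line to a file under .agentv/traces/, for example with appendFile from node:fs/promises. The file naming scheme is left open here.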
Out of scope
Aggregate threshold checks (max_total_duration_ms, max_llm_calls, max_tool_calls) are handled by #103 (execution_metrics evaluator), not this issue.
Related