## Summary

Add a built-in `--export-otel` CLI flag that exports AgentV evaluation traces to any OpenTelemetry-compatible backend (Langfuse, Braintrust, Confident AI, Jaeger, Grafana Tempo, Datadog, etc.) via OTLP/HTTP.
## Motivation

AgentV already captures rich structured traces (`OutputMessage[]` with `ToolCall[]`, timing, token usage) and computes a `TraceSummary` for every eval run. Users want to send these traces to observability platforms for:
- Debugging agent execution flows visually (span trees)
- Monitoring tool call patterns, latency, and costs across runs
- Comparing agent configurations in platform dashboards
- Integrating with existing LLMOps tooling (Langfuse, Braintrust, Datadog)
### Industry evidence

- Braintrust uses OTel-compatible spans for unified offline/online tracing and accepts OTLP ingestion
- Langfuse (v3.22+) accepts OTLP/HTTP at `/api/public/otel`
- Google ADK-js uses OpenTelemetry as its native tracing layer
- LangWatch converts traces to OTel format
- GenAI semantic conventions (`gen_ai.*`) are standardized across the industry
## Stale PRs superseded by this issue

| PR | What it did | Status | Why superseded |
| --- | --- | --- | --- |
| #136 | Standalone example: JSONL → OTLP/HTTP export script (Confident/Langfuse) | Draft, stale since Jan 9 | Good proof-of-concept but consumer-side only; this issue proposes built-in CLI support |
| #92 | Langfuse-specific SDK integration (openspec proposal + design doc) | Draft, stale since Jan 1 | Vendor-locked to Langfuse SDK; this issue proposes vendor-neutral OTel export |
Key insight from reviewing the stale PRs: PR #92 proposed using the Langfuse SDK directly (a `--langfuse` flag), which creates vendor lock-in. PR #136 used the standard OTel SDK (`@opentelemetry/exporter-trace-otlp-http`) and mapped to multiple backends — this is the right approach. This issue takes #136's OTel-native approach and makes it a built-in CLI feature.
## Design

### Principle: OTel-native, vendor-neutral

Use `@opentelemetry/exporter-trace-otlp-http` directly — no vendor-specific SDKs. Backend configuration is handled entirely via standard OTel environment variables and optional AgentV backend presets.
This aligns with AgentV's architecture principles:
- No vendor lock-in — any OTLP-compatible backend works
- CLI-first — single flag enables export
- Plugin-friendly — custom exporters can extend via code_judge pattern
### OutputMessage → OTel Span Mapping

| AgentV Concept | OTel Span | Attributes |
| --- | --- | --- |
| Eval run (per test case) | Root trace span | `agentv.test_id`, `agentv.target`, `agentv.dataset`, `agentv.score` |
| Assistant message | Child span (`gen_ai.generation`) | `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` |
| ToolCall | Child span (`gen_ai.tool`) | `gen_ai.tool.name`, `gen_ai.tool.call.id` |
| EvaluatorResult | Span event or attribute | `agentv.evaluator.name`, `agentv.evaluator.score`, `agentv.evaluator.verdict` |
Span hierarchy:

```text
Trace: agentv.eval (test_id="case-001")
├── Span: gen_ai.generation (assistant response)
│   └── attributes: model, token usage, cost
├── Span: gen_ai.tool (tool="search")
│   └── attributes: tool name, duration
├── Span: gen_ai.tool (tool="read_file")
│   └── attributes: tool name, duration
└── Events: evaluator scores attached to root span
```
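The attribute mapping in the table above can be sketched as a pure attribute-building step, independent of the OTel SDK. The `OutputMessage` and `ToolCall` field shapes below are assumptions for illustration (the real AgentV types may differ); only the `gen_ai.*` attribute keys come from the mapping table.

```typescript
// Sketch: building gen_ai.* span attributes from AgentV trace data.
// Field names (role, model, usage, toolCalls) are assumed for illustration.
type ToolCall = { id: string; name: string };
type OutputMessage = {
  role: "assistant" | "user" | "tool";
  model?: string;
  usage?: { inputTokens: number; outputTokens: number };
  toolCalls?: ToolCall[];
};

type SpanAttrs = Record<string, string | number>;

// One gen_ai.generation attribute set per assistant message.
function generationAttrs(msg: OutputMessage): SpanAttrs {
  const attrs: SpanAttrs = {};
  if (msg.model) attrs["gen_ai.request.model"] = msg.model;
  if (msg.usage) {
    attrs["gen_ai.usage.input_tokens"] = msg.usage.inputTokens;
    attrs["gen_ai.usage.output_tokens"] = msg.usage.outputTokens;
  }
  return attrs;
}

// One gen_ai.tool child-span attribute set per tool call.
function toolAttrs(call: ToolCall): SpanAttrs {
  return { "gen_ai.tool.name": call.name, "gen_ai.tool.call.id": call.id };
}
```

The exporter would then attach these attribute records to the corresponding child spans under the root `agentv.eval` span.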
## CLI Interface

```bash
# Export to any OTLP backend
agentv eval tests.yaml --export-otel

# Backend is configured via standard OTel env vars
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel \
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64>" \
agentv eval tests.yaml --export-otel

# Or use built-in backend presets for convenience
agentv eval tests.yaml --export-otel --otel-backend langfuse
agentv eval tests.yaml --export-otel --otel-backend braintrust
```
## Backend Presets (Convenience Only)

Presets auto-configure `OTEL_EXPORTER_OTLP_ENDPOINT` and auth headers from well-known env vars:

| Preset | Endpoint | Auth Env Vars |
| --- | --- | --- |
| `langfuse` | `https://cloud.langfuse.com/api/public/otel/v1/traces` | `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY` → Basic Auth |
| `braintrust` | Via `BRAINTRUST_API_KEY` | `BRAINTRUST_API_KEY` |
| `confident` | `https://otel.confident-ai.com/v1/traces` | `CONFIDENT_API_KEY` → `x-confident-api-key` header |
| (none) | Uses `OTEL_EXPORTER_OTLP_ENDPOINT` directly | Uses `OTEL_EXPORTER_OTLP_HEADERS` directly |
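Preset resolution could be a small pure function over the environment, using the endpoints and header names from the table above. This is a sketch, not the actual AgentV implementation; the `braintrust` preset is omitted here because its endpoint is derived from `BRAINTRUST_API_KEY` rather than a fixed URL.

```typescript
// Sketch: resolve a backend preset to an OTLP endpoint + headers.
// Endpoints and header names come from the preset table; the resolution
// logic itself is an assumption for illustration.
type OtlpConfig = { endpoint: string; headers: Record<string, string> };

function resolvePreset(preset: string, env: Record<string, string>): OtlpConfig {
  switch (preset) {
    case "langfuse": {
      // Langfuse OTLP auth: Basic auth over "publicKey:secretKey".
      const token = Buffer.from(
        `${env.LANGFUSE_PUBLIC_KEY}:${env.LANGFUSE_SECRET_KEY}`,
      ).toString("base64");
      return {
        endpoint: "https://cloud.langfuse.com/api/public/otel/v1/traces",
        headers: { Authorization: `Basic ${token}` },
      };
    }
    case "confident":
      return {
        endpoint: "https://otel.confident-ai.com/v1/traces",
        headers: { "x-confident-api-key": env.CONFIDENT_API_KEY },
      };
    default:
      // No preset: defer entirely to the standard OTel env vars.
      return { endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT, headers: {} };
  }
}
```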
## Privacy: Content Capture

Following PR #92's design (and Azure SDK / Google ADK patterns):

- Default: do NOT export message content or tool inputs/outputs — only metadata, timing, and scores
- Opt-in: the `--otel-capture-content` flag or `AGENTV_OTEL_CAPTURE_CONTENT=true` includes full content
- Rationale: traces may contain PII, secrets, or proprietary code
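The default-off behavior amounts to a redaction pass before spans are built. A minimal sketch, assuming hypothetical field names (`content`, `toolInput`, `toolOutput`) for the payload-bearing parts of a trace message:

```typescript
// Sketch: drop payload fields unless content capture is explicitly enabled.
// Field names are assumptions; only metadata (role, timing) survives export
// by default.
type TraceMessage = {
  role: string;
  content?: string;
  toolInput?: unknown;
  toolOutput?: unknown;
  durationMs?: number;
};

function redactForExport(msg: TraceMessage, captureContent: boolean): TraceMessage {
  if (captureContent) return msg;
  // Rest-destructuring keeps only the fields safe to export by default.
  const { content, toolInput, toolOutput, ...metadata } = msg;
  return metadata;
}
```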
## Error Handling

- Export failures log a warning and do NOT fail the evaluation
- Pending spans are flushed before the CLI exits (with a timeout)
- Missing credentials when `--export-otel` is set → warning, proceed without export
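The first two points combine into a best-effort flush wrapper: race the flush against a timeout, and downgrade any failure to a warning. A sketch, where `flush` stands in for the span processor's flush call and all names are assumptions:

```typescript
// Sketch: flush pending spans with a timeout; never fail the eval.
async function flushWithTimeout(
  flush: () => Promise<void>,
  timeoutMs: number,
  warn: (msg: string) => void,
): Promise<void> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("flush timed out")), timeoutMs),
  );
  try {
    await Promise.race([flush(), timeout]);
  } catch (err) {
    // Export problems are logged, never thrown: the eval result stands.
    warn(`OTel export: ${(err as Error).message}`);
  }
}
```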
## Implementation Plan

### Phase 1: Core OTel Exporter

- Add `@opentelemetry/api`, `@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/sdk-trace-node`, `@opentelemetry/resources`, and `@opentelemetry/semantic-conventions` as optional dependencies
- Create `packages/core/src/observability/otel-exporter.ts` — converts `EvaluationResult` + `OutputMessage[]` → OTel spans
- Create `packages/core/src/observability/types.ts` — `TraceExporter` interface
- Wire into the eval orchestrator: after each test case completes, if `--export-otel` is set, export the result
- Flush on eval completion
### Phase 2: Backend Presets

- Add preset config for Langfuse, Braintrust, and Confident AI
- `--otel-backend <name>` maps to endpoint + auth header construction
### Phase 3: Content Control + Polish

- Content filtering (strip message content / tool I/O when capture is disabled)
- Documentation + example
## Acceptance Criteria

- `--export-otel` flag sends OTLP/HTTP traces to the configured endpoint
- `gen_ai.*` semantic conventions used for LLM and tool attributes
- `TraceSummary` metrics exported as root span attributes
- `--otel-capture-content` enables full content export
- Works with the `--trace` flag (full output messages) and without it (uses `TraceSummary`)
## Effort Estimate

3-5 days (reuses mapping patterns from PR #136)
## Relation to Existing Work

- `--trace` flag (merged in #186, "feat(cli): add --trace flag for persisting execution traces (#172)"): already persists `OutputMessage[]` to JSONL; OTel export reuses the same data.
- `TraceSummary` (merged in #185, "feat(core): add span-based timing to trace types (#172)"): compact trace metadata always available — used as span attributes even without `--trace`.