## Summary

Add a built-in `--export-otel` CLI flag that exports AgentV evaluation traces to any OpenTelemetry-compatible backend (Langfuse, Braintrust, Confident AI, Jaeger, Grafana Tempo, Datadog, etc.) via OTLP/HTTP.
## Motivation

AgentV already captures rich structured traces (`OutputMessage[]` with `ToolCall[]`, timing, token usage) and computes a `TraceSummary` for every eval run. Users want to send these traces to observability platforms for:
- Debugging agent execution flows visually (span trees)
- Monitoring tool call patterns, latency, and costs across runs
- Comparing agent configurations in platform dashboards
- Integrating with existing LLMOps tooling (Langfuse, Braintrust, Datadog)
### Industry evidence

- Braintrust uses OTel-compatible spans for unified offline/online tracing and accepts OTLP ingestion
- Langfuse (v3.22+) accepts OTLP/HTTP at `/api/public/otel`
- Google ADK-js uses OpenTelemetry as its native tracing layer
- LangWatch converts traces to OTel format
- GenAI semantic conventions (`gen_ai.*`) are standardized across the industry
## Stale PRs superseded by this issue

| PR | What it did | Status | Why superseded |
| --- | --- | --- | --- |
| #136 | Standalone example: JSONL → OTLP/HTTP export script (Confident/Langfuse) | Draft, stale since Jan 9 | Good proof-of-concept but consumer-side only; this issue proposes built-in CLI support |
| #92 | Langfuse-specific SDK integration (openspec proposal + design doc) | Draft, stale since Jan 1 | Vendor-locked to Langfuse SDK; this issue proposes vendor-neutral OTel export |
Key insight from reviewing the stale PRs: PR #92 proposed using the Langfuse SDK directly (a `--langfuse` flag), which creates vendor lock-in. PR #136 used the standard OTel SDK (`@opentelemetry/exporter-trace-otlp-http`) and mapped to multiple backends — this is the right approach. This issue takes #136's OTel-native approach and makes it a built-in CLI feature.
## Design

### Principle: OTel-native, vendor-neutral

Use `@opentelemetry/exporter-trace-otlp-http` directly — no vendor-specific SDKs. Backend configuration is handled entirely via standard OTel environment variables and optional AgentV backend presets.
This aligns with AgentV's architecture principles:
- No vendor lock-in — any OTLP-compatible backend works
- CLI-first — single flag enables export
- Plugin-friendly — custom exporters can extend via code_judge pattern
### OutputMessage → OTel Span Mapping

| AgentV Concept | OTel Span | Attributes |
| --- | --- | --- |
| Eval run (per test case) | Root trace span | `agentv.test_id`, `agentv.target`, `agentv.dataset`, `agentv.score` |
| Assistant message | Child span (`gen_ai.generation`) | `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` |
| ToolCall | Child span (`gen_ai.tool`) | `gen_ai.tool.name`, `gen_ai.tool.call.id` |
| EvaluatorResult | Span event or attribute | `agentv.evaluator.name`, `agentv.evaluator.score`, `agentv.evaluator.verdict` |
Span hierarchy:

```text
Trace: agentv.eval (test_id="case-001")
├── Span: gen_ai.generation (assistant response)
│   └── attributes: model, token usage, cost
├── Span: gen_ai.tool (tool="search")
│   └── attributes: tool name, duration
├── Span: gen_ai.tool (tool="read_file")
│   └── attributes: tool name, duration
└── Events: evaluator scores attached to root span
```
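The attribute mapping in the table above can be sketched as a pure attribute-building step, independent of the OTel SDK. The `OutputMessage` and `ToolCall` field shapes below are assumptions for illustration (the real AgentV types may differ); only the `gen_ai.*` attribute keys come from the mapping table.

```typescript
// Sketch: building gen_ai.* span attributes from AgentV trace data.
// Field names (role, model, usage, toolCalls) are assumed for illustration.
type ToolCall = { id: string; name: string };
type OutputMessage = {
  role: "assistant" | "user" | "tool";
  model?: string;
  usage?: { inputTokens: number; outputTokens: number };
  toolCalls?: ToolCall[];
};

type SpanAttrs = Record<string, string | number>;

// One gen_ai.generation attribute set per assistant message.
function generationAttrs(msg: OutputMessage): SpanAttrs {
  const attrs: SpanAttrs = {};
  if (msg.model) attrs["gen_ai.request.model"] = msg.model;
  if (msg.usage) {
    attrs["gen_ai.usage.input_tokens"] = msg.usage.inputTokens;
    attrs["gen_ai.usage.output_tokens"] = msg.usage.outputTokens;
  }
  return attrs;
}

// One gen_ai.tool child-span attribute set per tool call.
function toolAttrs(call: ToolCall): SpanAttrs {
  return { "gen_ai.tool.name": call.name, "gen_ai.tool.call.id": call.id };
}
```

The exporter would then attach these attribute records to the corresponding child spans under the root `agentv.eval` span.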
## CLI Interface

```bash
# Export to any OTLP backend
agentv eval tests.yaml --export-otel

# Backend is configured via standard OTel env vars
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel \
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64>" \
agentv eval tests.yaml --export-otel

# Or use built-in backend presets for convenience
agentv eval tests.yaml --export-otel --otel-backend langfuse
agentv eval tests.yaml --export-otel --otel-backend braintrust
```
## Backend Presets (Convenience Only)

Presets auto-configure `OTEL_EXPORTER_OTLP_ENDPOINT` and auth headers from well-known env vars:

| Preset | Endpoint | Auth Env Vars |
| --- | --- | --- |
| `langfuse` | `https://cloud.langfuse.com/api/public/otel/v1/traces` | `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY` → Basic Auth |
| `braintrust` | Via `BRAINTRUST_API_KEY` | `BRAINTRUST_API_KEY` |
| `confident` | `https://otel.confident-ai.com/v1/traces` | `CONFIDENT_API_KEY` → `x-confident-api-key` header |
| (none) | Uses `OTEL_EXPORTER_OTLP_ENDPOINT` directly | Uses `OTEL_EXPORTER_OTLP_HEADERS` directly |
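Preset resolution could be a small pure function over the environment, using the endpoints and header names from the table above. This is a sketch, not the actual AgentV implementation; the `braintrust` preset is omitted here because its endpoint is derived from `BRAINTRUST_API_KEY` rather than a fixed URL.

```typescript
// Sketch: resolve a backend preset to an OTLP endpoint + headers.
// Endpoints and header names come from the preset table; the resolution
// logic itself is an assumption for illustration.
type OtlpConfig = { endpoint: string; headers: Record<string, string> };

function resolvePreset(preset: string, env: Record<string, string>): OtlpConfig {
  switch (preset) {
    case "langfuse": {
      // Langfuse OTLP auth: Basic auth over "publicKey:secretKey".
      const token = Buffer.from(
        `${env.LANGFUSE_PUBLIC_KEY}:${env.LANGFUSE_SECRET_KEY}`,
      ).toString("base64");
      return {
        endpoint: "https://cloud.langfuse.com/api/public/otel/v1/traces",
        headers: { Authorization: `Basic ${token}` },
      };
    }
    case "confident":
      return {
        endpoint: "https://otel.confident-ai.com/v1/traces",
        headers: { "x-confident-api-key": env.CONFIDENT_API_KEY },
      };
    default:
      // No preset: defer entirely to the standard OTel env vars.
      return { endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT, headers: {} };
  }
}
```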
## Privacy: Content Capture

Following PR #92's design (and Azure SDK / Google ADK patterns):

- Default: do NOT export message content or tool inputs/outputs — only metadata, timing, and scores
- Opt-in: the `--otel-capture-content` flag or `AGENTV_OTEL_CAPTURE_CONTENT=true` includes full content
- Rationale: traces may contain PII, secrets, or proprietary code
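The default-off behavior amounts to a redaction pass before spans are built. A minimal sketch, assuming hypothetical field names (`content`, `toolInput`, `toolOutput`) for the payload-bearing parts of a trace message:

```typescript
// Sketch: drop payload fields unless content capture is explicitly enabled.
// Field names are assumptions; only metadata (role, timing) survives export
// by default.
type TraceMessage = {
  role: string;
  content?: string;
  toolInput?: unknown;
  toolOutput?: unknown;
  durationMs?: number;
};

function redactForExport(msg: TraceMessage, captureContent: boolean): TraceMessage {
  if (captureContent) return msg;
  // Rest-destructuring keeps only the fields safe to export by default.
  const { content, toolInput, toolOutput, ...metadata } = msg;
  return metadata;
}
```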
## Error Handling

- Export failures log a warning and do NOT fail the evaluation
- Pending spans are flushed before the CLI exits (with a timeout)
- Missing credentials when `--export-otel` is set → warning, proceed without export
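The first two points combine into a best-effort flush wrapper: race the flush against a timeout, and downgrade any failure to a warning. A sketch, where `flush` stands in for the span processor's flush call and all names are assumptions:

```typescript
// Sketch: flush pending spans with a timeout; never fail the eval.
async function flushWithTimeout(
  flush: () => Promise<void>,
  timeoutMs: number,
  warn: (msg: string) => void,
): Promise<void> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("flush timed out")), timeoutMs),
  );
  try {
    await Promise.race([flush(), timeout]);
  } catch (err) {
    // Export problems are logged, never thrown: the eval result stands.
    warn(`OTel export: ${(err as Error).message}`);
  }
}
```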
## Implementation Plan

### Phase 1: Core OTel Exporter

- Add `@opentelemetry/api`, `@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/sdk-trace-node`, `@opentelemetry/resources`, and `@opentelemetry/semantic-conventions` as optional dependencies
- Create `packages/core/src/observability/otel-exporter.ts` — converts `EvaluationResult` + `OutputMessage[]` → OTel spans
- Create `packages/core/src/observability/types.ts` — `TraceExporter` interface
- Wire into the eval orchestrator: after each test case completes, if `--export-otel` is set, export the result
- Flush on eval completion
### Phase 2: Backend Presets

- Add preset config for Langfuse, Braintrust, and Confident AI
- `--otel-backend <name>` maps to endpoint + auth header construction
### Phase 3: Content Control + Polish

- Content filtering (strip message content / tool I/O when capture is disabled)
- Documentation + example
## Acceptance Criteria

- `--export-otel` flag sends OTLP/HTTP traces to the configured endpoint
- `gen_ai.*` semantic conventions used for LLM and tool attributes
- `TraceSummary` metrics exported as root span attributes
- `--otel-capture-content` enables full content export
- Works with the `--trace` flag (full output messages) and without it (uses `TraceSummary`)
## Effort Estimate

3-5 days (reuses mapping patterns from PR #136)
## Relation to Existing Work

- `--trace` flag (merged in #186, "feat(cli): add --trace flag for persisting execution traces (#172)"): already persists `OutputMessage[]` to JSONL; OTel export reuses the same data.
- `TraceSummary` (merged in #185, "feat(core): add span-based timing to trace types (#172)"): compact trace metadata always available — used as span attributes even without `--trace`.