Problem
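AgentV currently writes results as JSONL. CI/CD tools expect JUnit XML for test reports: Jenkins and GitLab CI render it out of the box, and GitHub Actions can display it through third-party actions. A single JSON file with aggregate stats would also let scripts enforce quality gates (for example, a minimum pass rate) without re-aggregating the JSONL.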
Proposal
Add a repeatable -o flag that infers the output format from the file extension:
agentv run --target my-agent evals/ \
  -o results.jsonl \
  -o results.json \
  -o results.xml
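Because -o may appear several times in one command, the CLI has to collect every occurrence rather than keep only the last one. The sketch below shows that behaviour with a hand-rolled loop; collectOutputs is a hypothetical name, and how AgentV's actual argument parser handles repeated flags is an assumption not covered by this proposal.

```typescript
// Hypothetical helper: AgentV's real argument parser is not described in this
// proposal. This only illustrates the "repeatable -o" behaviour.
function collectOutputs(argv: string[]): string[] {
  const outputs: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "-o") continue;
    const path = argv[i + 1];
    // A missing value, or another flag in its place, is treated as an error.
    if (path === undefined || path.startsWith("-")) {
      throw new Error("-o requires a file path");
    }
    outputs.push(path);
    i++; // skip the path we just consumed
  }
  return outputs;
}

const exampleArgv = [
  "run", "--target", "my-agent", "evals/",
  "-o", "results.jsonl", "-o", "results.json", "-o", "results.xml",
];
console.log(collectOutputs(exampleArgv).join(", "));
// prints: results.jsonl, results.json, results.xml
```

Each collected path would then be passed through the extension-based format inference, so one run can write all three files.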
Formats
- .jsonl: current default format, unchanged
- .json: a single object with aggregate stats plus per-case results
- .xml: JUnit XML for CI tools
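The extension-to-format mapping listed above can be expressed as a small lookup table. This is a minimal sketch, assuming the format is decided by the final extension only and compared case-insensitively; OutputFormat, inferFormat and extensionOf are hypothetical names, not existing AgentV APIs.

```typescript
// Hypothetical names; the proposal fixes only the extension-to-format mapping.
type OutputFormat = "jsonl" | "json" | "junit";

const FORMAT_BY_EXTENSION = new Map<string, OutputFormat>([
  [".jsonl", "jsonl"],
  [".json", "json"],
  [".xml", "junit"],
]);

// Return the final ".ext" of the file name (lower-cased), or "" if there is none.
function extensionOf(path: string): string {
  const base = path.slice(Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\")) + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

function inferFormat(path: string): OutputFormat {
  const format = FORMAT_BY_EXTENSION.get(extensionOf(path));
  if (format === undefined) {
    throw new Error(`Cannot infer output format for "${path}"; use .jsonl, .json, or .xml`);
  }
  return format;
}

for (const file of ["results.jsonl", "results.json", "out/results.XML"]) {
  console.log(`${file} -> ${inferFormat(file)}`);
}
// prints:
// results.jsonl -> jsonl
// results.json -> json
// out/results.XML -> junit
```

Rejecting an unknown extension (for example results.html) instead of silently falling back to JSONL is a design assumption here; the proposal does not specify that case.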
JSON Format
{
  "evalId": "run-2026-02-19-001",
  "stats": {
    "total": 25,
    "passed": 22,
    "failed": 3,
    "pass_rate": 0.88
  },
  "results": [...]
}
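The aggregate stats follow directly from the per-case results: failed = total - passed and pass_rate = passed / total, so 22 of 25 gives 0.88. The sketch below assumes each case carries at least an id and a boolean passed flag (the real per-case shape is elided as [...] above); summarize, passesGate and the threshold values are hypothetical.

```typescript
// Hypothetical per-case shape: the proposal elides the "results" entries.
interface CaseResult {
  id: string;
  passed: boolean;
}

// Field names mirror the example above (evalId, stats.pass_rate, results).
interface RunSummary {
  evalId: string;
  stats: { total: number; passed: number; failed: number; pass_rate: number };
  results: CaseResult[];
}

function summarize(evalId: string, results: CaseResult[]): RunSummary {
  const total = results.length;
  const passed = results.filter((r) => r.passed).length;
  return {
    evalId,
    stats: {
      total,
      passed,
      failed: total - passed,
      // An empty run reports 0 rather than NaN (0 / 0).
      pass_rate: total === 0 ? 0 : passed / total,
    },
    results,
  };
}

// A quality gate is then a single comparison against the aggregate.
function passesGate(summary: RunSummary, minPassRate: number): boolean {
  return summary.stats.pass_rate >= minPassRate;
}

// 25 synthetic cases, the first 3 failing, to reproduce the example's stats.
const cases: CaseResult[] = Array.from({ length: 25 }, (_, i) => ({
  id: `case-${i + 1}`,
  passed: i >= 3,
}));
const summary = summarize("run-2026-02-19-001", cases);
console.log(JSON.stringify(summary.stats));
// prints: {"total":25,"passed":22,"failed":3,"pass_rate":0.88}
console.log(passesGate(summary, 0.85), passesGate(summary, 0.9));
// prints: true false
```

In CI, a step would read the JSON file, apply a check like passesGate, and exit non-zero when it fails.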
JUnit XML
<testsuites name="agentv-eval" tests="25" failures="3">
  <testsuite name="eval-name">
    <testcase name="case-id" classname="eval-file"/>
  </testsuite>
</testsuites>
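Producing the JUnit XML comes down to counting failures and emitting one testcase per eval case, with a failure child for cases that did not pass. The sketch below is a hypothetical serializer for a single suite; the JUnitCase shape, the default failure message and the use of the eval file as classname (taken from the example above) are assumptions, and a real implementation might use an XML library instead of string templates.

```typescript
// Hypothetical shape; the proposal does not specify how cases map to suites.
interface JUnitCase {
  id: string;       // written as testcase/@name
  evalFile: string; // written as testcase/@classname
  passed: boolean;
  message?: string; // failure reason, if any
}

// Escape the five XML special characters for use in attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function toJUnit(suiteName: string, cases: JUnitCase[]): string {
  const failures = cases.filter((c) => !c.passed).length;
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuites name="agentv-eval" tests="${cases.length}" failures="${failures}">`,
    `  <testsuite name="${escapeXml(suiteName)}" tests="${cases.length}" failures="${failures}">`,
  ];
  for (const c of cases) {
    const attrs = `name="${escapeXml(c.id)}" classname="${escapeXml(c.evalFile)}"`;
    if (c.passed) {
      lines.push(`    <testcase ${attrs}/>`);
    } else {
      // Failed cases get a <failure> child so CI tools mark them red.
      lines.push(`    <testcase ${attrs}>`);
      lines.push(`      <failure message="${escapeXml(c.message ?? "failed")}"/>`);
      lines.push("    </testcase>");
    }
  }
  lines.push("  </testsuite>", "</testsuites>");
  return lines.join("\n");
}

console.log(toJUnit("eval-name", [
  { id: "case-1", evalFile: "eval-file", passed: true },
  { id: "case-2", evalFile: "eval-file", passed: false, message: "score below threshold" },
]));
```

Escaping matters because case IDs and failure messages are free text; an unescaped & or < would make the report unparseable for the CI tool.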
Why Not HTML?
Per design principle #1 ("CLI wrappers that consume AgentV JSON/JSONL output for post-processing"), HTML reports are a presentation concern that belongs outside the core. A separate viewer can render reports from the JSONL/JSON output instead.
Design Principles Alignment
- ✅ Lightweight Core: two additional serialization formats, no UI
- ✅ Industry Standard: JUnit XML is the de facto test-report format across CI systems
- ✅ Non-Breaking Extension: existing output is unchanged; -o is purely additive
Acceptance Criteria
- -o can be passed multiple times to write several outputs in a single run