Objective
Let AgentV sync eval benchmark repos on demand via Studio UI, following an ArgoCD-style model where git is the source of truth and sync is an explicit action.
Current Problem
Today, getting a benchmark repo up to date is left to the caller. Users script ad-hoc clones/pulls per environment. There's no unified way to say "this benchmark lives in a git repo" and have Studio manage it.
Design
1. Add source to benchmark entries
Extend the existing benchmark registry at ~/.agentv/benchmarks.yaml with an optional source field. Values in source.url and source.ref are interpolated with the existing interpolateEnv() from packages/core/src/evaluation/interpolation.ts.
    benchmarks:
      - id: eval-benchmarks
        name: Eval Benchmarks
        path: evals
        source:
          url: ${{ BENCHMARK_REPO_URL }}
          ref: ${{ BENCHMARK_REPO_REF:-main }}
        added_at: "2026-03-20T10:00:00Z"
- source is optional. If absent, path is used as-is (current behaviour).
- No sync field. If source exists, the benchmark is git-backed and syncable.
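As a rough illustration of how the two placeholder forms in the example above could resolve, here is a minimal sketch. It is not the real interpolateEnv(): the function name interpolateSketch, the BenchmarkEntry and BenchmarkSource types, and the choice to throw when a variable is unset and has no default are all assumptions made for this example.

```typescript
// Hypothetical shape of one registry entry; field names follow the YAML above.
interface BenchmarkSource {
  url: string;
  ref: string;
}

interface BenchmarkEntry {
  id: string;
  name: string;
  path: string;
  source?: BenchmarkSource; // absent => local benchmark, path used as-is
  added_at?: string;
}

// Illustrative stand-in for interpolateEnv(); the real signature may differ.
// Supports ${{ NAME }} and ${{ NAME:-default }}.
function interpolateSketch(
  value: string,
  env: Record<string, string | undefined>,
): string {
  return value.replace(
    /\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*?))?\s*\}\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const resolved = env[name];
      if (resolved !== undefined && resolved !== "") return resolved;
      if (fallback !== undefined) return fallback;
      // Assumption: an unset variable without a default is an error.
      throw new Error(`Environment variable ${name} is not set`);
    },
  );
}

// Resolve both source fields of an entry; entries without source pass through.
function resolveSource(
  entry: BenchmarkEntry,
  env: Record<string, string | undefined>,
): BenchmarkEntry {
  if (entry.source === undefined) return entry;
  return {
    ...entry,
    source: {
      url: interpolateSketch(entry.source.url, env),
      ref: interpolateSketch(entry.source.ref, env),
    },
  };
}
```

With only BENCHMARK_REPO_URL set in the environment, the example entry would resolve to that URL and fall back to main for the ref.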
2. Sync as explicit action (ArgoCD model)
- Studio UI: Benchmarks screen shows a "Sync" button for git-backed benchmarks. Clicking it runs a one-shot git clone --depth 1 (first sync) or git pull --ff-only (subsequent syncs).
- CLI: agentv benchmark sync <id> triggers the same one-shot sync.
- Docker/CI: Run agentv benchmark sync as a pre-step in the container entrypoint or CI pipeline.
No background daemon. No continuous mode. No git-sync dependency.
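The entry points above all reduce to picking which registry entries a single sync run should touch. A minimal sketch of that selection follows. The names RegistryEntry and selectSyncTargets are hypothetical, and the behaviour when no id is given (sync every git-backed benchmark) is an assumption inferred from the Docker/CI usage, not something the design states.

```typescript
// Hypothetical, trimmed-down registry entry for this sketch.
interface RegistryEntry {
  id: string;
  path: string;
  source?: { url: string; ref: string };
}

// Returns the entries that one explicit sync invocation should process.
// Nothing here schedules or repeats work: the caller runs it once and exits.
function selectSyncTargets(
  entries: RegistryEntry[],
  id?: string,
): RegistryEntry[] {
  if (id === undefined) {
    // Assumption: no id means "all git-backed benchmarks".
    return entries.filter((entry) => entry.source !== undefined);
  }
  const match = entries.find((entry) => entry.id === id);
  if (match === undefined) {
    throw new Error(`Unknown benchmark: ${id}`);
  }
  if (match.source === undefined) {
    // Local benchmarks have no source, so there is nothing to sync.
    throw new Error(`Benchmark ${id} has no source and cannot be synced`);
  }
  return [match];
}
```

Rejecting an id that points at a local benchmark mirrors the UI, where such entries show no "Sync" button.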
3. Behaviour
| State | Action |
|---|---|
| No source | Local benchmark. No sync button. Path used as-is. |
| Has source, first time | git clone --depth 1 --filter=blob:none to path. |
| Has source, already cloned | git pull --ff-only from source.ref. |
| Docker/CI | agentv benchmark sync in entrypoint or script. |
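The table can be read as a small decision function. The sketch below maps each state to the git arguments it implies, without executing anything. The names SyncInput, SyncPlan, and planSync are hypothetical. Passing source.ref to clone via --branch and pulling from a remote named origin are assumptions; the design only specifies the flags shown in the table.

```typescript
interface GitSource {
  url: string;
  ref: string;
}

interface SyncInput {
  path: string;
  source?: GitSource;
  alreadyCloned: boolean; // e.g. whether path already contains a checkout
}

type SyncPlan =
  | { kind: "local" } // no source: nothing to run
  | { kind: "clone"; args: string[] }
  | { kind: "pull"; args: string[] };

// Pure mapping from benchmark state to the git invocation for one sync.
function planSync(input: SyncInput): SyncPlan {
  if (input.source === undefined) {
    return { kind: "local" };
  }
  const { url, ref } = input.source;
  if (!input.alreadyCloned) {
    return {
      kind: "clone",
      // Assumption: --branch selects source.ref at clone time.
      args: ["clone", "--depth", "1", "--filter=blob:none", "--branch", ref, url, input.path],
    };
  }
  return {
    kind: "pull",
    // Assumption: the remote is named "origin".
    args: ["-C", input.path, "pull", "--ff-only", "origin", ref],
  };
}
```

Keeping the plan separate from execution makes the state table easy to unit test; a caller could hand plan.args to something like Node's child_process.execFileSync("git", plan.args) for the actual run.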
4. Interaction with existing eval workspace.repos
Individual .eval.yaml files can declare additional repos via workspace.repos. This is per-eval, inline, and unrelated to the benchmark registry. The two coexist:
- Benchmark source: project-level "where does this benchmark live." Drives sync.
- Eval workspace.repos: per-eval "this test needs this additional repo." Used at eval runtime.
Non-Goals
- Continuous sync / background daemon.
- git-sync dependency or binary distribution.
- Two-way sync, conflict resolution, or write-back to source.
- Auto-sync on interval.
Acceptance Criteria
- A benchmark with source.url + source.ref can be synced to path via the Studio UI "Sync" button or agentv benchmark sync.
- agentv benchmark sync does git clone --depth 1 (first time) or git pull --ff-only (subsequent).
- Benchmarks without source continue to work unchanged.
- ${{ ENV_VAR }} interpolation works in source.url and source.ref.