feat: Eval benchmark repo sync to remote targets #1232

@christso

Description

Objective

Let AgentV sync eval benchmark repos on demand via Studio UI, following an ArgoCD-style model where git is the source of truth and sync is an explicit action.

Current Problem

Today, getting a benchmark repo up to date is left to the caller. Users script ad-hoc clones/pulls per environment. There's no unified way to say "this benchmark lives in a git repo" and have Studio manage it.

Design

1. Add source to benchmark entries

Extend the existing benchmark registry at ~/.agentv/benchmarks.yaml with an optional source field. Values in source are resolved with the existing interpolateEnv() from packages/core/src/evaluation/interpolation.ts, so the URL and ref can come from environment variables.

benchmarks:
  - id: eval-benchmarks
    name: Eval Benchmarks
    path: evals
    source:
      url: ${{ BENCHMARK_REPO_URL }}
      ref: ${{ BENCHMARK_REPO_REF:-main }}
    added_at: "2026-03-20T10:00:00Z"

  • source is optional. If absent, path is used as-is (current behaviour).
  • There is no separate sync field. If source exists, the benchmark is git-backed and syncable.
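
The entry shape described above could be modelled roughly as follows. This is a minimal sketch, not AgentV's actual types: the interface names and the isGitBacked helper are hypothetical, and only the field names (id, name, path, source.url, source.ref, added_at) come from the YAML example.

```typescript
// Hypothetical types for entries in ~/.agentv/benchmarks.yaml.
// Field names follow the YAML example; the type names are assumptions.
interface BenchmarkSource {
  url: string; // git remote, after ${{ ... }} interpolation
  ref: string; // ref to sync to, e.g. "main"
}

interface BenchmarkEntry {
  id: string;
  name: string;
  path: string;
  source?: BenchmarkSource; // absent means a plain local benchmark
  added_at: string;
}

// Hypothetical helper: an entry is git-backed (and syncable) exactly when
// it has a source. There is no separate sync flag to check.
function isGitBacked(
  entry: BenchmarkEntry,
): entry is BenchmarkEntry & { source: BenchmarkSource } {
  return entry.source !== undefined;
}

const localEntry: BenchmarkEntry = {
  id: "local-benchmarks",
  name: "Local Benchmarks",
  path: "evals",
  added_at: "2026-03-20T10:00:00Z",
};

const gitEntry: BenchmarkEntry = {
  ...localEntry,
  id: "eval-benchmarks",
  name: "Eval Benchmarks",
  source: { url: "https://example.com/org/evals.git", ref: "main" },
};

console.log(isGitBacked(localEntry), isGitBacked(gitEntry));
```

Deriving "syncable" from the presence of source keeps the registry free of a second flag that could drift out of step with it.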

2. Sync as explicit action (ArgoCD model)

  • Studio UI: The Benchmarks screen shows a "Sync" button for git-backed benchmarks. Clicking it runs a one-shot git clone --depth 1 (first sync) or git pull --ff-only (subsequent syncs).
  • CLI: agentv benchmark sync <id> triggers the same one-shot sync.
  • Docker/CI: Run agentv benchmark sync as a pre-step in the container entrypoint or CI pipeline.

No background daemon. No continuous mode. No git-sync dependency.

3. Behaviour

| State | Action |
| --- | --- |
| No source | Local benchmark. No sync button. Path used as-is. |
| Has source, first time | git clone --depth 1 --filter=blob:none to path. |
| Has source, already cloned | git pull --ff-only from source.ref. |
| Docker/CI | agentv benchmark sync in entrypoint or script. |
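
The state-to-action mapping above can be sketched as a pure function that returns the git arguments to run. This is an illustrative sketch under stated assumptions, not the implementation: planSync and SyncPlan are hypothetical names, the alreadyCloned flag stands in for whatever check the real code uses (for example, whether path contains a .git directory), and passing source.ref to clone via --branch and to pull via origin <ref> is an assumption the issue does not specify.

```typescript
// Hypothetical minimal entry shape; only path and source matter here.
interface SyncableEntry {
  path: string;
  source?: { url: string; ref: string };
}

type SyncPlan =
  | { kind: "none" } // local benchmark: nothing to sync
  | { kind: "clone"; args: string[] } // first sync
  | { kind: "pull"; args: string[] }; // subsequent syncs

// Decide which git invocation a one-shot sync should run.
function planSync(entry: SyncableEntry, alreadyCloned: boolean): SyncPlan {
  if (!entry.source) {
    return { kind: "none" };
  }
  const { url, ref } = entry.source;
  if (!alreadyCloned) {
    // Assumption: ref is a branch or tag, since --branch rejects commit SHAs.
    return {
      kind: "clone",
      args: ["clone", "--depth", "1", "--filter=blob:none", "--branch", ref, url, entry.path],
    };
  }
  // Fast-forward only: the sync fails rather than merging if histories diverged.
  return {
    kind: "pull",
    args: ["-C", entry.path, "pull", "--ff-only", "origin", ref],
  };
}

const entry: SyncableEntry = {
  path: "evals",
  source: { url: "https://example.com/org/evals.git", ref: "main" },
};
console.log(planSync(entry, false).kind, planSync(entry, true).kind);
```

Keeping the decision separate from process execution means the Studio button, the CLI command, and a container entrypoint could share one code path, with the args handed to something like Node's child_process.execFile("git", args).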

4. Interaction with existing eval workspace.repos

Individual .eval.yaml files can declare additional repos via workspace.repos. This is per-eval, inline, and unrelated to the benchmark registry. The two coexist:

  • Benchmark source — project-level "where does this benchmark live." Drives sync.
  • Eval workspace.repos — per-eval "this test needs this additional repo." Used at eval runtime.

Acceptance Criteria

  • A benchmark entry with source.url + source.ref can be synced to path via Studio UI "Sync" button or agentv benchmark sync.
  • agentv benchmark sync does git clone --depth 1 (first time) or git pull --ff-only (subsequent).
  • Existing benchmark entries without source continue to work unchanged.
  • ${{ ENV_VAR }} interpolation works in source.url and source.ref.
  • Studio benchmarks screen shows sync button for git-backed benchmarks.
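
To make the interpolation criterion concrete, here is a small stand-in for the substitution it describes. It is not the existing interpolateEnv() (whose signature and behaviour are not shown in this issue); interpolateSketch is a hypothetical function, and both the :-default handling and the choice to throw on an unset variable are assumptions modelled on the YAML example and on shell conventions.

```typescript
// Hypothetical stand-in for ${{ NAME }} / ${{ NAME:-default }} substitution.
// Assumptions: a default applies when the variable is unset or empty, and an
// unset variable with no default is treated as an error.
function interpolateSketch(
  value: string,
  env: Record<string, string | undefined>,
): string {
  const pattern = /\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*?))?\s*\}\}/g;
  return value.replace(pattern, (_match, name: string, fallback?: string) => {
    const resolved = env[name];
    if (resolved !== undefined && resolved !== "") {
      return resolved;
    }
    if (fallback !== undefined) {
      return fallback;
    }
    throw new Error(`Environment variable ${name} is not set`);
  });
}

const exampleEnv = { BENCHMARK_REPO_URL: "https://example.com/org/evals.git" };
const resolvedUrl = interpolateSketch("${{ BENCHMARK_REPO_URL }}", exampleEnv);
const resolvedRef = interpolateSketch("${{ BENCHMARK_REPO_REF:-main }}", exampleEnv);
console.log(resolvedUrl, resolvedRef);
```

With only BENCHMARK_REPO_URL set, the ref falls back to main, which mirrors the intent of the registry example.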

Non-Goals

  • Continuous sync / background daemon.
  • git-sync dependency or binary distribution.
  • Two-way sync, conflict resolution, or write-back to source.
  • Auto-sync on interval.

Metadata

  • Assignees: No one assigned
  • Labels: core (Anything pertaining to core functionality of AgentV)
  • Type: No type
  • Projects: Status: Done
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests