## Summary

Add an `evaluate()` programmatic API so agentv can be used as a library, not just a CLI. This is the foundation for language SDKs, CI integrations, and AI agent consumption.
## Motivation

AgentV is currently CLI-only. To use it programmatically, users must shell out to `agentv run` and parse JSONL output. This is fragile and prevents:
- Type-safe integration in TypeScript projects
- Language SDKs (Python, C#) which need a programmatic API to wrap
- AI agents using agentv as a composable primitive
- Custom runners with different output formats
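To make the fragility concrete, here is a sketch of the shell-out workflow the proposal replaces. The JSONL field names (`test_id`, `verdict`) are assumptions for illustration, not the CLI's actual schema:

```ts
// Sketch of the status quo: capture `agentv run` stdout and parse JSONL by hand.
interface ParsedResult {
  test_id: string;
  verdict: string;
}

function parseJsonlOutput(stdout: string): ParsedResult[] {
  // Every non-empty line must be valid JSON — a single stray log line or
  // partial write breaks the whole parse, and nothing here is type-checked.
  return stdout
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ParsedResult);
}

// What the CLI's stdout might look like:
const sampleStdout = `{"test_id":"capital","verdict":"pass"}\n{"test_id":"math","verdict":"fail"}\n`;
const parsed = parseJsonlOutput(sampleStdout);
console.log(parsed.length); // 2
```

The `as ParsedResult` cast is exactly the problem: the compiler cannot verify anything about the CLI's output, which is what a typed `evaluate()` fixes.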
Evidence from research:
- Promptfoo: `promptfoo.evaluate({ tests, assert, providers })` — single function, config mirrors YAML
- Azure SDK: `evaluate(data=..., evaluators=..., target=...)` — same pattern
- DeepEval: `evaluate(test_cases, metrics)` — same pattern, Python
- Mastra: programmatic `agent.evaluate()` method
- nem035/agentevals: `describe/it/expect` plus `run()` — dual CLI + programmatic
## Proposed Design

### Naming

Per #328 naming decisions, the programmatic API is `evaluate()` (not `runEvaluation()`) to match Promptfoo's mental model. The config shape mirrors the YAML — users can translate between them 1:1.

### TypeScript API
```ts
import { evaluate } from "@agentv/core";

// Option 1: Pure code (no YAML needed) — config mirrors YAML structure
const results = await evaluate({
  tests: [
    {
      id: "capital",
      input: "What is the capital of France?",
      expected_output: "Paris",
      assert: [
        { type: "contains", value: "Paris" },
        { type: "llm_judge", prompt: "Is this geographically correct?" },
      ],
    },
  ],
  target: { provider: "claude_agent" },
});

// Option 2: Load from YAML (existing workflow, programmatic)
const yamlResults = await evaluate({
  specFile: "./evals/EVAL.yaml",
  target: "mock_agent",
  filter: "specific-test",
});

// Results are typed
for (const result of results.results) {
  console.log(result.test_id, result.verdict, result.scores);
}
```
### Package location

`evaluate()` lives in `@agentv/core` — it needs the orchestrator, providers, and registry. This is a heavy dependency, and that's honest: users who need programmatic evaluation need the engine.

- `@agentv/eval` → `defineAssertion()`, `defineCodeJudge()` (lightweight, zod only)
- `@agentv/core` → `evaluate()` (heavy, needs engine)
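To illustrate why the lightweight package can stay engine-free, here is a hypothetical sketch of a `defineAssertion()`-style helper. Plain TypeScript stands in for the zod schema, and everything beyond the `defineAssertion` name is an assumption:

```ts
// Hypothetical sketch: a definition-only API that needs no orchestrator.
// The heavy engine would consume these definitions at run time.
interface AssertionContext {
  output: string;
  expected?: string;
}

interface AssertionDef {
  name: string;
  check: (ctx: AssertionContext) => { pass: boolean; score: number };
}

function defineAssertion(def: AssertionDef): AssertionDef {
  // In the real package, zod validation of `def` would live here.
  return def;
}

const containsExpected = defineAssertion({
  name: "contains_expected",
  check: ({ output, expected }) => {
    const pass = expected !== undefined && output.includes(expected);
    return { pass, score: pass ? 1 : 0 };
  },
});

const result = containsExpected.check({
  output: "Paris is the capital.",
  expected: "Paris",
});
console.log(result.pass); // true
```

Because nothing here touches providers or the registry, a consumer can depend on the light package alone and only pull in `@agentv/core` when they actually run evaluations.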
### Return Type

```ts
interface EvalRunResult {
  results: EvalCaseResult[];
  summary: {
    total: number;
    passed: number;
    failed: number;
    borderline: number;
    duration_ms: number;
    cost_usd: number;
  };
}
```
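To make the shape concrete, here is a hedged sketch of how the `summary` could be aggregated from individual case results. The `EvalCaseResult` fields below are assumptions inferred from fields used elsewhere in this proposal:

```ts
// Sketch: fold EvalCaseResult[] into the proposed summary shape.
// Verdict values and field names are illustrative assumptions.
interface EvalCaseResult {
  test_id: string;
  verdict: "pass" | "fail" | "borderline";
  duration_ms: number;
  cost_usd: number;
}

interface EvalSummary {
  total: number;
  passed: number;
  failed: number;
  borderline: number;
  duration_ms: number;
  cost_usd: number;
}

function summarize(results: EvalCaseResult[]): EvalSummary {
  return results.reduce<EvalSummary>(
    (acc, r) => ({
      total: acc.total + 1,
      passed: acc.passed + (r.verdict === "pass" ? 1 : 0),
      failed: acc.failed + (r.verdict === "fail" ? 1 : 0),
      borderline: acc.borderline + (r.verdict === "borderline" ? 1 : 0),
      duration_ms: acc.duration_ms + r.duration_ms,
      cost_usd: acc.cost_usd + r.cost_usd,
    }),
    { total: 0, passed: 0, failed: 0, borderline: 0, duration_ms: 0, cost_usd: 0 },
  );
}

const summary = summarize([
  { test_id: "a", verdict: "pass", duration_ms: 120, cost_usd: 0.01 },
  { test_id: "b", verdict: "fail", duration_ms: 80, cost_usd: 0.02 },
  { test_id: "c", verdict: "pass", duration_ms: 100, cost_usd: 0.01 },
]);
console.log(summary.passed, summary.failed); // 2 1
```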
## Architecture

Extract the evaluation orchestration logic from the CLI command handler into a standalone function:

```
apps/cli/src/commands/run.ts            (CLI layer)
└── calls evaluate() from @agentv/core  (library layer)
    ├── loadEvalSpec()   — parse YAML/JSONL
    ├── resolveTarget()  — provider registry lookup
    ├── runTests()       — parallel execution
    └── scoreResults()   — assertion pipeline
```

The CLI becomes a thin wrapper around the library API.
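The extraction above can be sketched as a straight-line pipeline. Every function body here is a placeholder (the real `runTests()` would call providers in parallel, the real `loadEvalSpec()` would read files); only the composition mirrors the proposal:

```ts
// Stub sketch of the proposed library-layer pipeline.
interface EvalSpec { tests: { id: string; input: string; expected?: string }[]; }
interface Target { provider: string; }
interface CaseResult { test_id: string; verdict: "pass" | "fail"; }

// Real version: parse YAML/JSONL from disk.
function loadEvalSpec(spec: EvalSpec): EvalSpec { return spec; }
// Real version: provider registry lookup.
function resolveTarget(name: string): Target { return { provider: name }; }
// Real version: parallel execution against the resolved provider.
function runTests(spec: EvalSpec, _target: Target): { id: string; output: string }[] {
  return spec.tests.map((t) => ({ id: t.id, output: t.expected ?? "" }));
}
// Real version: full assertion pipeline.
function scoreResults(
  spec: EvalSpec,
  outputs: { id: string; output: string }[],
): CaseResult[] {
  return outputs.map((o) => {
    const expected = spec.tests.find((t) => t.id === o.id)?.expected;
    const pass = expected !== undefined && o.output.includes(expected);
    return { test_id: o.id, verdict: pass ? "pass" : "fail" };
  });
}

// The standalone function the CLI would wrap.
function evaluateSketch(spec: EvalSpec, targetName: string): CaseResult[] {
  const loaded = loadEvalSpec(spec);
  const target = resolveTarget(targetName);
  return scoreResults(loaded, runTests(loaded, target));
}

const demoResults = evaluateSketch(
  { tests: [{ id: "capital", input: "Capital of France?", expected: "Paris" }] },
  "mock_agent",
);
console.log(demoResults[0].verdict); // pass
```

Because the CLI would call the same function, CLI and library behavior cannot drift apart.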
## Acceptance Criteria

- `evaluate()` exported from `@agentv/core`
- Config uses the `assert` key (matching YAML), not `evaluators`
- Returns `EvalRunResult` with summary statistics
- CLI `run` command calls `evaluate()` internally
- Streaming callback: `onResult?: (result: EvalCaseResult) => void`
- `examples/programmatic/` showing library usage

## Research References

- Promptfoo — `evaluate()` API, config mirrors YAML
- DeepEval — `evaluate()` function
- Azure SDK — `evaluate()` function