feat(core): add claude provider using @anthropic-ai/claude-agent-sdk

## Summary

Replace the current `claude-code` provider (subprocess scraper) with a new `claude` provider using the official [`@anthropic-ai/claude-agent-sdk`](https://platform.claude.com/docs/en/agent-sdk/typescript). This follows the same pattern as the `copilot` provider (PR #211) which replaced the subprocess-based `copilot-cli` with the typed SDK.

## Motivation

The current `claude-code` provider spawns the Claude CLI as a subprocess and scrapes stdout. The Agent SDK provides:
- **Structured messages** via `AsyncGenerator<SDKMessage>` - typed `SDKAssistantMessage`, `SDKResultMessage`
- **Token usage & cost** from `SDKResultMessage.usage`, `modelUsage`, `total_cost_usd`
- **Duration** from `SDKResultMessage.duration_ms`
- **Tool call tracking** via assistant message content blocks or `PostToolUse` hooks
- **Permission control** via `permissionMode: 'bypassPermissions'` (no interactive prompts)
- **Model selection** via `options.model`
- **Budget control** via `options.maxBudgetUsd` and `options.maxTurns`
- **Abort support** via `options.abortController`

## Competitive Advantage

AgentV's SDK-based providers extract **structured tool call traces** as `outputMessages`, which enables `tool_trajectory` and `tool_call_f1` evaluation. Most eval frameworks treat coding agents as text-in/text-out black boxes. AgentV's SDK integrations give evaluators access to:

- **Every tool call** the agent made (name, arguments, result)
- **Token usage breakdown** (input, output, cached)
- **Cost in USD** (from the SDK, not estimated)
- **Duration** (SDK-reported, not wall-clock)
- **Turn count** and model breakdown

This structured data lets AgentV evaluate how the agent worked (which tools it called, in what order, and at what cost), not just its final output.

## SDK API Overview

```typescript
import { query } from '@anthropic-ai/claude-agent-sdk';

const q = query({
  prompt: 'What is 2+2?',
  options: {
    model: 'claude-sonnet-4-5-20250929',
    permissionMode: 'bypassPermissions',
    allowDangerouslySkipPermissions: true,
    cwd: '/path/to/workspace',
    systemPrompt: 'custom system prompt',
    maxTurns: 50,
    maxBudgetUsd: 1.0,
    abortController: new AbortController(),
  }
});

for await (const message of q) {
  if (message.type === 'assistant') {
    // message.message contains APIAssistantMessage with content blocks
    // Extract tool_use blocks → ToolCall[]
  }
  if (message.type === 'result') {
    // message.result - final text
    // message.usage - { input_tokens, output_tokens, cache_* }
    // message.modelUsage - per-model breakdown with costUSD
    // message.total_cost_usd - total cost
    // message.duration_ms - total duration
    // message.num_turns - number of turns
  }
}
```
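As a rough illustration of the "extract `tool_use` blocks → `ToolCall[]`" step noted in the comments above, the sketch below maps content blocks to tool calls. The block types are local stand-ins that mirror the documented `text` / `tool_use` shapes rather than imports from the SDK, and the `ToolCall` fields (`id`, `tool`, `input`) are hypothetical, since AgentV's actual type may differ.

```typescript
// Local stand-ins for the content block shapes (assumed; not imported from the SDK).
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

// Hypothetical AgentV-side shape; the real ToolCall type may have other fields.
interface ToolCall {
  id: string;
  tool: string;
  input: unknown;
}

// Keep only tool_use blocks and convert each one to a ToolCall.
function extractToolCalls(content: ContentBlock[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const block of content) {
    if (block.type === 'tool_use') {
      calls.push({ id: block.id, tool: block.name, input: block.input });
    }
  }
  return calls;
}

// Concatenate the text blocks, preserving their order.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((block): block is TextBlock => block.type === 'text')
    .map((block) => block.text)
    .join('');
}
```

In the provider, these helpers would be applied to `message.message.content` for each message whose `type` is `'assistant'`.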

## Implementation Plan

Follow the same pattern as the `copilot` provider (see PR #211):

### Files to modify
1. **`types.ts`** - Rename `claude-code` → `claude` in `ProviderKind`, add `claude-code` as alias
2. **`targets.ts`** - Update resolved config and target resolution
3. **`index.ts`** - Wire new provider, remove old
4. **`targets-validator.ts`** - Update settings validation
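One possible way to keep `claude-code` working as an alias is to normalize the configured name before target resolution. The sketch below is an assumption about how that could look: the `ProviderKind` union here is a hypothetical two-member subset, and `resolveProviderKind` is an illustrative helper name, not an existing function in `types.ts` or `targets.ts`.

```typescript
// Hypothetical subset of ProviderKind; the real union in types.ts has more members.
type ProviderKind = 'claude' | 'copilot';

const CANONICAL_KINDS: readonly ProviderKind[] = ['claude', 'copilot'];

// Legacy names mapped to their canonical provider kind.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const PROVIDER_ALIASES = new Map<string, ProviderKind>([['claude-code', 'claude']]);

// Resolve a user-supplied provider name to its canonical kind.
function resolveProviderKind(raw: string): ProviderKind {
  const key = raw.trim().toLowerCase();
  const canonical = CANONICAL_KINDS.find((kind) => kind === key);
  if (canonical !== undefined) return canonical;
  const alias = PROVIDER_ALIASES.get(key);
  if (alias !== undefined) return alias;
  throw new Error(`Unknown provider kind: ${raw}`);
}
```

With this shape, existing target files that say `claude-code` resolve to the new provider without edits, and unknown names still fail loudly.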

### Files to create
1. **`claude.ts`** - New provider using `@anthropic-ai/claude-agent-sdk`
   - Lazy-load SDK (same pattern as copilot provider)
   - Use `query()` with `permissionMode: 'bypassPermissions'`
   - Map `SDKAssistantMessage` content blocks → `ToolCall[]` and `OutputMessage[]`
   - Extract `usage`, `total_cost_usd`, `duration_ms` from `SDKResultMessage`
2. **`claude-log-tracker.ts`** - Log tracker (rename from claude-code variant)
3. **`claude.test.ts`** - Unit tests with mocked SDK
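The "lazy-load SDK" item can be sketched as a memoized loader around a dynamic import. This is a generic pattern under the assumption that the copilot provider does something similar; the helper name `lazy` is hypothetical.

```typescript
// Wrap an async loader so it runs at most once; later calls reuse the same promise.
// If loading fails, the cache is cleared so a later call can retry.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader().catch((err: unknown) => {
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}

// Intended use in claude.ts (shown as a comment so this sketch has no SDK dependency):
//   const loadSdk = lazy(() => import('@anthropic-ai/claude-agent-sdk'));
//   const { query } = await loadSdk();
```

Deferring the import this way means users who never select the `claude` provider do not pay the SDK's load cost, and a missing package only surfaces as an error when the provider is actually invoked.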

### Files to delete
1. **`claude-code.ts`** - Old subprocess provider
2. **`claude-code-log-tracker.ts`** - Old log tracker
3. **`claude-code.test.ts`** - Old tests (if the file exists)

### Dependencies
- Add `@anthropic-ai/claude-agent-sdk` to `packages/core/package.json` and `apps/cli/package.json`
- Add it to the `external` list in the CLI's `tsup.config.ts` so it is not bundled

## Key Design Decisions

- **Naming**: `claude` (canonical), `claude-code` (alias for backward compat)
- **Permissions**: Use `bypassPermissions` + `allowDangerouslySkipPermissions` for unattended eval
- **Tool calls**: Extract `tool_use` blocks from `SDKAssistantMessage.message.content`; pair each with its `tool_result` block (expected in the following user-type message) by `tool_use_id`
- **Streaming**: Iterate the async generator, collect all messages, extract result at end
- **Structured output**: Map every `tool_use` content block to `ToolCall` for trajectory evaluation
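The streaming decision above (iterate the generator, collect everything, read the result at the end) might look roughly like the sketch below. The message types are simplified local stand-ins limited to the fields named in this issue, and `RunSummary`, `collect`, and `summarize` are hypothetical names rather than existing AgentV or SDK APIs.

```typescript
// Simplified stand-ins for SDK messages; only fields mentioned in this issue are modeled.
interface UsageLike {
  input_tokens: number;
  output_tokens: number;
}
type MessageLike =
  | { type: 'assistant'; message: { content: unknown[] } }
  | {
      type: 'result';
      result?: string;
      usage: UsageLike;
      total_cost_usd: number;
      duration_ms: number;
      num_turns: number;
    }
  | { type: 'system' };

// Hypothetical provider-side summary of one run.
interface RunSummary {
  finalText: string;
  assistantMessages: number;
  usage?: UsageLike;
  costUsd?: number;
  durationMs?: number;
  numTurns?: number;
}

// Drain any async iterable (such as the generator returned by query()) into an array.
async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of stream) out.push(item);
  return out;
}

// Walk the collected messages once; the result message supplies the final metrics.
function summarize(messages: MessageLike[]): RunSummary {
  const summary: RunSummary = { finalText: '', assistantMessages: 0 };
  for (const msg of messages) {
    if (msg.type === 'assistant') {
      summary.assistantMessages += 1;
    } else if (msg.type === 'result') {
      summary.finalText = msg.result ?? '';
      summary.usage = msg.usage;
      summary.costUsd = msg.total_cost_usd;
      summary.durationMs = msg.duration_ms;
      summary.numTurns = msg.num_turns;
    }
  }
  return summary;
}
```

Collecting first and summarizing afterwards keeps the mapping logic synchronous and easy to unit-test with plain arrays, at the cost of holding all messages in memory for the duration of a run.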

## References

- [TypeScript Agent SDK docs](https://platform.claude.com/docs/en/agent-sdk/typescript)
- PR #211 - `copilot` provider implementation (pattern to follow)
