Summary
Replace the current claude-code provider (a subprocess scraper) with a new claude provider built on the official @anthropic-ai/claude-agent-sdk. This follows the same pattern as the copilot provider (PR #211), which replaced the subprocess-based copilot-cli provider with the typed SDK.
Motivation
The current claude-code provider spawns the Claude CLI as a subprocess and scrapes stdout. The Agent SDK provides:
- Structured messages via AsyncGenerator<SDKMessage>, with typed SDKAssistantMessage and SDKResultMessage
- Token usage and cost from SDKResultMessage.usage, modelUsage, and total_cost_usd
- Duration from SDKResultMessage.duration_ms
- Tool call tracking via assistant message content blocks or PostToolUse hooks
- Permission control via permissionMode: 'bypassPermissions' (no interactive prompts)
- Model selection via options.model
- Budget control via options.maxBudgetUsd and options.maxTurns
- Abort support via options.abortController
Competitive Advantage
AgentV's SDK-based providers extract structured tool call traces as outputMessages — enabling tool_trajectory and tool_call_f1 evaluation. Most eval frameworks treat coding agents as text-in/text-out black boxes. AgentV's SDK integrations give evaluators access to:
- Every tool call the agent made (name, arguments, result)
- Token usage breakdown (input, output, cached)
- Cost in USD (from the SDK, not estimated)
- Duration (SDK-reported, not wall-clock)
- Turn count and model breakdown
This structured data is what makes AgentV uniquely suited for evaluating coding agents — you can evaluate HOW the agent worked, not just its final output.
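To make the idea of a trace-based metric concrete, here is a minimal sketch of an F1 score over tool names. The metric definition (multiset overlap of tool names), the function name toolCallF1, and the input shape are all assumptions made for illustration; AgentV's actual tool_call_f1 evaluator may be defined differently.

```typescript
// Hypothetical sketch: F1 over tool names, treating each list as a multiset.
// This is NOT AgentV's implementation; it only illustrates the idea.
function toolCallF1(expected: string[], actual: string[]): number {
  if (expected.length === 0 && actual.length === 0) return 1;
  if (expected.length === 0 || actual.length === 0) return 0;

  // Count how many times each expected tool name is still unmatched.
  const remaining = new Map<string, number>();
  for (const name of expected) {
    remaining.set(name, (remaining.get(name) ?? 0) + 1);
  }

  // Each actual call can match at most one expected call with the same name.
  let matched = 0;
  for (const name of actual) {
    const left = remaining.get(name) ?? 0;
    if (left > 0) {
      matched += 1;
      remaining.set(name, left - 1);
    }
  }

  if (matched === 0) return 0;
  const precision = matched / actual.length;
  const recall = matched / expected.length;
  return (2 * precision * recall) / (precision + recall);
}

// The agent made two Bash calls that were not expected and skipped Edit:
// precision = 1/3, recall = 1/2.
const score = toolCallF1(['Read', 'Edit'], ['Read', 'Bash', 'Bash']);
console.log(score.toFixed(2)); // prints "0.40"
```

A text-in/text-out harness cannot compute a score like this at all, because it never sees the list of calls.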
SDK API Overview
import { query } from '@anthropic-ai/claude-agent-sdk';

const q = query({
  prompt: 'What is 2+2?',
  options: {
    model: 'claude-sonnet-4-5-20250929',
    permissionMode: 'bypassPermissions',
    allowDangerouslySkipPermissions: true,
    cwd: '/path/to/workspace',
    systemPrompt: 'custom system prompt',
    maxTurns: 50,
    maxBudgetUsd: 1.0,
    abortController: new AbortController(),
  },
});

for await (const message of q) {
  if (message.type === 'assistant') {
    // message.message contains APIAssistantMessage with content blocks
    // Extract tool_use blocks → ToolCall[]
  }
  if (message.type === 'result') {
    // message.result - final text
    // message.usage - { input_tokens, output_tokens, cache_* }
    // message.modelUsage - per-model breakdown with costUSD
    // message.total_cost_usd - total cost
    // message.duration_ms - total duration
    // message.num_turns - number of turns
  }
}
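Once the messages have been collected, the metrics can be pulled out with a small reducer. The sketch below is a rough illustration under stated assumptions: RunMessage is a simplified stand-in that models only the fields named in the comments above (it is not the SDK's own type), RunSummary and summarizeRun are hypothetical names, and result is treated as optional on the assumption that a run ending in an error may carry no final text.

```typescript
// Simplified stand-ins for the SDK message shapes (assumption): only the
// fields named in the overview are modeled here.
type RunMessage =
  | { type: 'assistant'; message: { content: unknown[] } }
  | {
      type: 'result';
      result?: string; // assumed optional: error results may have no text
      usage: { input_tokens: number; output_tokens: number };
      total_cost_usd: number;
      duration_ms: number;
      num_turns: number;
    };

// Hypothetical summary shape for what the provider hands to evaluators.
interface RunSummary {
  text: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  durationMs: number;
  turns: number;
}

// Return the metrics from the last result message, or undefined if the run
// ended without one (for example, after an abort).
function summarizeRun(messages: RunMessage[]): RunSummary | undefined {
  let summary: RunSummary | undefined;
  for (const m of messages) {
    if (m.type === 'result') {
      summary = {
        text: m.result ?? '',
        inputTokens: m.usage.input_tokens,
        outputTokens: m.usage.output_tokens,
        costUsd: m.total_cost_usd,
        durationMs: m.duration_ms,
        turns: m.num_turns,
      };
    }
  }
  return summary;
}
```

In the provider, the input array would be filled by the for await loop over query(); keeping the reducer synchronous makes it easy to unit-test without the SDK.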
Implementation Plan
Follow the same pattern as the copilot provider (see PR #211):
Files to modify
types.ts - Rename claude-code → claude in ProviderKind, add claude-code as alias
targets.ts - Update resolved config and target resolution
index.ts - Wire new provider, remove old
targets-validator.ts - Update settings validation
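The rename with a backward-compatible alias could be handled by a small normalization step during target resolution. This is a sketch only: PROVIDER_KINDS lists a hypothetical subset of kinds, and PROVIDER_ALIASES and resolveProviderKind are illustrative names, not existing AgentV code.

```typescript
// Hypothetical subset of provider kinds; the real ProviderKind union in
// types.ts has more members.
const PROVIDER_KINDS = ['claude', 'copilot'] as const;
type ProviderKind = (typeof PROVIDER_KINDS)[number];

// Legacy names that should keep working in existing target files.
const PROVIDER_ALIASES = new Map<string, ProviderKind>([
  ['claude-code', 'claude'],
]);

function isProviderKind(value: string): value is ProviderKind {
  return (PROVIDER_KINDS as readonly string[]).includes(value);
}

// Map a user-supplied provider name to its canonical kind, or throw if the
// name is neither a known kind nor a known alias.
function resolveProviderKind(name: string): ProviderKind {
  const normalized = name.trim().toLowerCase();
  const canonical = PROVIDER_ALIASES.get(normalized) ?? normalized;
  if (isProviderKind(canonical)) {
    return canonical;
  }
  throw new Error(`Unknown provider: ${name}`);
}

console.log(resolveProviderKind('claude-code')); // prints "claude"
```

Resolving the alias in one place means the rest of the code only ever sees the canonical claude kind.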
Files to create
claude.ts - New provider using @anthropic-ai/claude-agent-sdk
- Lazy-load SDK (same pattern as copilot provider)
- Use query() with permissionMode: 'bypassPermissions'
- Map SDKAssistantMessage content blocks → ToolCall[] and OutputMessage[]
- Extract usage, total_cost_usd, duration_ms from SDKResultMessage
claude-log-tracker.ts - Log tracker (rename from claude-code variant)
claude.test.ts - Unit tests with mocked SDK
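One way to lazy-load the SDK is a memoized dynamic import, so the package is loaded only when a claude target actually runs. The helper below is a generic sketch; the name lazy and the stand-in loader are assumptions for illustration, and the copilot provider's actual lazy-loading code may look different.

```typescript
// Generic memoized loader: the loader runs on first use and its promise is
// cached, so later calls reuse the same module instance.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader().catch((err: unknown) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// In claude.ts this might look like (not executed here):
//   const loadSdk = lazy(() => import('@anthropic-ai/claude-agent-sdk'));
//   const { query } = await loadSdk();

// Stand-in loader so the sketch runs without the SDK installed.
let loads = 0;
const loadFake = lazy(async () => {
  loads += 1;
  return { query: () => 'stub' };
});
```

The same helper also gives unit tests a seam: a test can pass a loader that returns a mocked query() instead of importing the real package.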
Files to delete
claude-code.ts - Old subprocess provider
claude-code-log-tracker.ts - Old log tracker
claude-code.test.ts - Old tests (if the file exists)
Dependencies
- Add @anthropic-ai/claude-agent-sdk to packages/core/package.json and apps/cli/package.json
- Add to CLI's tsup.config.ts external list
Key Design Decisions
- Naming: claude (canonical), claude-code (alias for backward compat)
- Permissions: Use bypassPermissions + allowDangerouslySkipPermissions for unattended eval
- Tool calls: Extract from SDKAssistantMessage.message.content blocks (type tool_use / tool_result)
- Streaming: Iterate the async generator, collect all messages, extract result at end
- Structured output: Map every tool_use content block to ToolCall for trajectory evaluation
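The tool-call mapping described above can be sketched as a single pass that pairs each tool_use block with its tool_result by id. ContentBlock is a simplified stand-in for the SDK's block types, and the ToolCall fields and the extractToolCalls name are hypothetical; AgentV's real ToolCall type may differ. In the Anthropic Messages API, tool_result blocks are sent in the user-role message that follows the tool call, so this sketch assumes the caller passes blocks gathered in order across the whole transcript.

```typescript
// Simplified stand-ins for the content block shapes (assumption).
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; tool_use_id: string; content: unknown };

// Hypothetical AgentV-side shape; the real ToolCall type may differ.
interface ToolCall {
  id: string;
  tool: string;
  input: unknown;
  output?: unknown;
}

// Build one ToolCall per tool_use block, in call order, and attach the
// output from the tool_result block that references the same id.
function extractToolCalls(blocks: ContentBlock[]): ToolCall[] {
  const calls = new Map<string, ToolCall>();
  for (const block of blocks) {
    if (block.type === 'tool_use') {
      calls.set(block.id, { id: block.id, tool: block.name, input: block.input });
    } else if (block.type === 'tool_result') {
      const call = calls.get(block.tool_use_id);
      if (call) call.output = block.content;
    }
  }
  // Map preserves insertion order, so the trajectory order is kept.
  return Array.from(calls.values());
}
```

A call whose result never arrives (for example, because the run was aborted) is still returned, with output left undefined, so the trajectory stays complete.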
References
- copilot provider implementation (pattern to follow)