Summary
Replace the current codex provider (CLI subprocess via spawn) with a new provider using the official @openai/codex SDK. This follows the same pattern as the copilot provider (PR #211) and the planned claude provider (#213).
Motivation
The current Codex provider spawns the codex CLI as a subprocess and scrapes its JSONL output. The SDK provides:
- Structured responses with typed messages
- Token usage from response metadata
- Cost tracking from SDK-reported usage
- Tool call extraction as structured data (not scraped text)
- Abort/timeout control via SDK options
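A minimal sketch of how abort/timeout control could be wired around an SDK call. It assumes the SDK call can accept an AbortSignal; `runWithTimeout` and the `run` callback are hypothetical names, not part of @openai/codex, so the exact option name should be checked against the SDK's own types.

```typescript
// Hypothetical helper: runs an abortable async call and aborts it after `timeoutMs`.
// Assumption: the wrapped SDK call honors the AbortSignal passed to `run`.
async function runWithTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  externalSignal?: AbortSignal,
): Promise<T> {
  const controller = new AbortController();
  // Propagate a caller-supplied abort (e.g. the whole eval run being cancelled).
  const onExternalAbort = () => controller.abort(externalSignal?.reason);
  if (externalSignal?.aborted) {
    controller.abort(externalSignal.reason);
  } else {
    externalSignal?.addEventListener("abort", onExternalAbort, { once: true });
  }
  const timer = setTimeout(
    () => controller.abort(new Error(`Timed out after ${timeoutMs} ms`)),
    timeoutMs,
  );
  try {
    return await run(controller.signal);
  } finally {
    // Always clean up so a finished call does not leave a timer or listener behind.
    clearTimeout(timer);
    externalSignal?.removeEventListener("abort", onExternalAbort);
  }
}
```

Keeping the timeout in one wrapper means every provider call gets the same cancellation behavior regardless of which SDK option ultimately carries the signal.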
Competitive Advantage
Like the Claude and Copilot SDK providers, this gives AgentV structured access to agent internals — tool calls, token usage, cost — enabling tool_trajectory, tool_call_f1, and execution_metrics evaluators to work out of the box. Most eval frameworks treat Codex as a text-in/text-out black box.
Implementation Plan
Follow the same pattern as the copilot provider (PR #211):
Files to modify
- types.ts: update ProviderKind to add codex-sdk, or alias the existing codex kind
- targets.ts: update the resolved config and target resolution
- index.ts: wire up the new provider
- targets-validator.ts: update settings validation
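One way to reconcile the "add codex-sdk or alias codex" option with the naming decision below is to keep a single canonical kind and normalize aliases during target resolution. A sketch under assumptions: the union is an illustrative subset, and `PROVIDER_KIND_ALIASES` and `normalizeProviderKind` are hypothetical names, not the current contents of types.ts.

```typescript
// Illustrative subset of ProviderKind; the real union in types.ts has more members.
type ProviderKind = "codex" | "codex-cli" | "copilot" | "claude";

// Hypothetical alias table: "codex-sdk" resolves to the canonical SDK-backed "codex".
const PROVIDER_KIND_ALIASES: Readonly<Record<string, ProviderKind>> = {
  "codex-sdk": "codex",
};

const KNOWN_KINDS = new Set<string>(["codex", "codex-cli", "copilot", "claude"]);

// Maps a user-supplied provider string to a canonical ProviderKind, or throws.
function normalizeProviderKind(raw: string): ProviderKind {
  const key = raw.trim().toLowerCase();
  const aliased = PROVIDER_KIND_ALIASES[key];
  if (aliased) return aliased;
  if (KNOWN_KINDS.has(key)) return key as ProviderKind;
  throw new Error(`Unknown provider kind: ${raw}`);
}
```

With this shape, existing configs that say codex keep working and pick up the SDK provider, while the validator only ever sees canonical kinds.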
Files to create
- codex-sdk.ts: new provider using the @openai/codex SDK
  - Lazy-load the SDK (same pattern as the copilot/claude providers)
  - Extract structured tool calls → ToolCall[] and OutputMessage[]
  - Extract token usage, cost, and duration from the SDK response
- codex-sdk.test.ts: unit tests with a mocked SDK
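The copilot/claude lazy-load code is not reproduced in this issue; a generic sketch of the idea is to import the SDK only on first use and cache the promise, so targets that never use codex skip the import and a missing optional dependency surfaces as a clear error. `createLazyLoader` is a hypothetical helper name.

```typescript
// Hypothetical helper: memoizes an async module loader so the import runs at most once.
function createLazyLoader<T>(load: () => Promise<T>, name: string): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = load().catch((err: unknown) => {
        // Reset the cache so a later call can retry (e.g. after installing the package).
        cached = undefined;
        const detail = err instanceof Error ? err.message : String(err);
        throw new Error(`Failed to load ${name}: ${detail}`);
      });
    }
    return cached;
  };
}
```

In the provider, the loader would presumably be something like `createLazyLoader(() => import("@openai/codex"), "@openai/codex")`; in codex-sdk.test.ts the same seam makes it easy to pass a mocked SDK instead.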
Files to update/delete
- codex.ts: keep as a fallback or deprecate
- codex-log-tracker.ts: update for SDK events
Dependencies
- Add @openai/codex to packages/core/package.json
- Add it to the CLI's tsup.config.ts external list
Key Design Decisions
- Naming: codex becomes the canonical (SDK-backed) provider; the old CLI provider remains available as codex-cli if needed
- Tool calls: extract structured tool-use data for trajectory evaluation
- Backward compatibility: existing codex configs should work with the new provider
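To make the tool-call decision concrete, here is a sketch of turning an ordered list of SDK items into tool calls and messages. The item shapes are assumptions loosely modeled on the kinds of events an agent SDK reports (shell commands, MCP tool calls, agent messages) and must be checked against the real SDK types; `ToolCall` and `OutputMessage` here are minimal hypothetical stand-ins for AgentV's types, and `extractTrajectory` is a hypothetical name.

```typescript
// Assumed, simplified item shapes; verify against the SDK's real type definitions.
type SdkItem =
  | { type: "agent_message"; id: string; text: string }
  | { type: "command_execution"; id: string; command: string }
  | { type: "mcp_tool_call"; id: string; server: string; tool: string; arguments?: unknown };

// Hypothetical minimal stand-ins for AgentV's ToolCall / OutputMessage.
interface ToolCall { id: string; tool: string; input: unknown }
interface OutputMessage { role: "assistant"; content: string; toolCalls: ToolCall[] }

// Returns tool calls in call order, plus messages carrying the calls made before them.
function extractTrajectory(items: readonly SdkItem[]): { toolCalls: ToolCall[]; messages: OutputMessage[] } {
  const toolCalls: ToolCall[] = [];
  const messages: OutputMessage[] = [];
  let pending: ToolCall[] = [];
  for (const item of items) {
    switch (item.type) {
      case "command_execution": {
        // Shell commands are recorded under a single hypothetical "shell" tool name.
        const call: ToolCall = { id: item.id, tool: "shell", input: { command: item.command } };
        toolCalls.push(call);
        pending.push(call);
        break;
      }
      case "mcp_tool_call": {
        const call: ToolCall = { id: item.id, tool: `${item.server}/${item.tool}`, input: item.arguments ?? {} };
        toolCalls.push(call);
        pending.push(call);
        break;
      }
      case "agent_message":
        // Attach the tool calls made since the previous message to this message.
        messages.push({ role: "assistant", content: item.text, toolCalls: pending });
        pending = [];
        break;
    }
  }
  // Trailing calls with no following message still need a home for trajectory evaluators.
  if (pending.length > 0) messages.push({ role: "assistant", content: "", toolCalls: pending });
  return { toolCalls, messages };
}
```

Keeping a flat, ordered `toolCalls` list alongside the messages means evaluators such as tool_trajectory can read call order directly without re-walking the message list.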
Supersedes
This supersedes #99 (execution metrics for Codex) — the SDK provider will return metrics natively.
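A sketch of how "metrics natively" could be assembled into the execution metrics #99 asked for. It assumes the SDK reports input/output token counts; the field names, the `Pricing` table, and `buildExecutionMetrics` are all hypothetical. If the SDK reports cost directly, that value should be used instead of the derived one.

```typescript
// Assumed usage shape; field names are illustrative, not confirmed SDK fields.
interface SdkUsage { inputTokens: number; outputTokens: number }

// Hypothetical per-million-token prices, e.g. from AgentV target config.
interface Pricing { inputPerMillion: number; outputPerMillion: number }

interface ExecutionMetrics {
  tokensIn: number;
  tokensOut: number;
  durationMs: number;
  costUsd?: number;
}

// Combines SDK-reported usage with wall-clock timing and, optionally, a price table.
function buildExecutionMetrics(
  usage: SdkUsage,
  startedAt: number,
  endedAt: number,
  pricing?: Pricing,
): ExecutionMetrics {
  const metrics: ExecutionMetrics = {
    tokensIn: usage.inputTokens,
    tokensOut: usage.outputTokens,
    durationMs: Math.max(0, endedAt - startedAt),
  };
  if (pricing) {
    metrics.costUsd =
      (usage.inputTokens * pricing.inputPerMillion + usage.outputTokens * pricing.outputPerMillion) / 1_000_000;
  }
  return metrics;
}
```

For example, 2,000 input tokens at 1 USD per million plus 500 output tokens at 4 USD per million gives (2,000 + 2,000) / 1,000,000 = 0.004 USD.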
References
- copilot provider implementation (pattern to follow)
- claude provider using the Agent SDK (same pattern)
- packages/core/src/evaluation/providers/codex.ts