Unpin the dead Codex alpha model in Daily Cache Strategy Analyzer: it 404s on every run, and the fallback resolves to the same dead model.
Severity: P1. 100% agent-job outage for any Codex workflow pinned to `gpt-5-codex-alpha-2025-11-07`.
Problem statement
The Codex CLI agent fails every turn with `404 Not Found: Model not found gpt-5-codex-alpha-2025-11-07` from the api-proxy (172.30.0.30/redacted). The harness burns through all 4 retries (5 reconnect attempts each), and the `agent` job exits with status 1. The configured fallback `--model gpt-5-codex` does NOT recover: codex logs `WARN Unknown model gpt-5-codex is used. This will use fallback model metadata`, and the request still resolves server-side to the dead `gpt-5-codex-alpha-2025-11-07`, returning 404 again.
Affected workflows and run IDs
- Daily Cache Strategy Analyzer (`.github/workflows/daily-cache-strategy-analyzer.lock.yml`): run 27571281247
- Any other Codex-engine workflow pinned to the `gpt-5-codex-alpha-2025-11-07` alpha model is exposed to the same outage.
Probable root cause
The alpha model id `gpt-5-codex-alpha-2025-11-07` was removed or renamed on the inference backend. The model pin (and the api-proxy fallback mapping) still point at it, so both the primary model and the `gpt-5-codex` fallback resolve to a model that no longer exists.
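To make the double failure described above concrete, here is a minimal sketch of how an alias map can send both the pinned model and its fallback to the same dead id, and how a simple "does this resolve to a served model?" check would catch it. `FALLBACK_MAP`, `LIVE_MODELS`, `resolve_model`, `is_served` and the placeholder live model name are all hypothetical illustrations, not the api-proxy's actual configuration or code.

```python
# Hypothetical sketch: FALLBACK_MAP and LIVE_MODELS are illustrative
# stand-ins, not the api-proxy's real configuration.
DEAD_ALPHA = "gpt-5-codex-alpha-2025-11-07"

# Alias map as described in the issue: the generic "gpt-5-codex" name
# still points at the removed alpha id.
FALLBACK_MAP = {"gpt-5-codex": DEAD_ALPHA}

# Models the backend currently serves (placeholder name, assumption).
LIVE_MODELS = {"example-live-codex-model"}


def resolve_model(name: str, alias_map: dict[str, str]) -> str:
    """Follow alias entries until reaching a concrete id, guarding against cycles."""
    seen = set()
    while name in alias_map:
        if name in seen:
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = alias_map[name]
    return name


def is_served(name: str, alias_map: dict[str, str], live: set[str]) -> bool:
    """True if the name resolves to a model the backend actually serves."""
    return resolve_model(name, alias_map) in live


if __name__ == "__main__":
    for requested in (DEAD_ALPHA, "gpt-5-codex"):
        resolved = resolve_model(requested, FALLBACK_MAP)
        served = is_served(requested, FALLBACK_MAP, LIVE_MODELS)
        print(f"{requested} -> {resolved} served={served}")
```

Run as-is, both the pinned id and the `gpt-5-codex` alias resolve to `gpt-5-codex-alpha-2025-11-07` and print `served=False`, mirroring the 404-on-fallback behaviour in the logs. A startup check along these lines could flag a broken mapping before any agent turn is attempted.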
Proposed remediation
- Repoint the workflow/engine `model` from `gpt-5-codex-alpha-2025-11-07` to a currently served Codex model id.
- Fix the api-proxy fallback map so that `gpt-5-codex` resolves to a live model instead of re-resolving to the dead alpha id.
- Make the harness classify a 404 model-not-found response as `isInvalidModelError=true` (it currently logs `isInvalidModelError=false`) so that it fails fast with a clear classification instead of exhausting all 4 retries.
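The fail-fast behaviour in the last bullet could look roughly like the sketch below. It assumes the harness sees an HTTP status code and an error message string for each failed request; `ModelRequestError`, `is_invalid_model_error`, `run_with_retries` and the message pattern are hypothetical names, not the harness's real API. Only the retry budget of 4 comes from the logs.

```python
import re
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical pattern modelled on the logged message
# "404 Not Found: Model not found <id>".
_MODEL_NOT_FOUND = re.compile(r"model not found", re.IGNORECASE)


class ModelRequestError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def is_invalid_model_error(err: ModelRequestError) -> bool:
    """Classify a 404 whose body says the model does not exist as permanent."""
    return err.status == 404 and bool(_MODEL_NOT_FOUND.search(err.message))


def run_with_retries(call: Callable[[], T], max_attempts: int = 4) -> tuple[T, int]:
    """Retry transient failures, but re-raise invalid-model errors immediately.

    Returns the call's result together with the number of attempts used.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return call(), attempt
        except ModelRequestError as err:
            # A missing model is permanent: retrying cannot bring it back.
            # Anything else is retried until the budget is spent.
            if is_invalid_model_error(err) or attempt >= max_attempts:
                raise
```

With this classification, a request against the dead alpha id raises on attempt 1, while a transient error such as a 503 is still retried up to the budget. That split is what the success criteria ask for: a clear, immediate failure for a dead model id without weakening recovery from genuine network hiccups.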
Success criteria / verification
- The next scheduled Daily Cache Strategy Analyzer run reaches the agent turn without a 404, and the `agent` job concludes with `success`.
- A deliberately pinned dead model id is classified as `isInvalidModelError=true` and fails on attempt 1 (no 4× retry storm).
Parent: #39344. Analyzed run: 27571281247.
Related to #39344
Generated by 🔍 [aw] Failure Investigator (6h) · 572.8 AIC · ⌖ 11.7 AIC · ⊞ 4.5K · ◷