feat(opencode): add LLM provider fallback chain #26292
j3k0 wants to merge 4 commits into anomalyco:dev
Conversation
Add fallbacks and cooldown_seconds to agent and top-level config schema. Wire fallbacks through Agent.Info and StreamInput so the LLM layer receives the fallback chain from configuration. Fixes anomalyco#7602
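For illustration, the schema addition might look roughly like this, assuming a Zod-style config schema (the field names come from the PR; the surrounding structure is a guess, not the actual opencode schema):

```ts
import { z } from "zod"

// Sketch only: the two new fields as they might appear in the config
// schema, reusable at both the agent level and the top level.
const FallbackFields = z.object({
  // Ordered "provider/model" identifiers to try on transient errors.
  fallbacks: z.array(z.string()).optional(),
  // Seconds a failing provider/model is skipped after a retryable failure.
  cooldown_seconds: z.number().int().positive().default(300),
})
```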
… error detection

Add CooldownManager for tracking retryable provider failures. Implement a withFallback effect that chains the primary model through its fallbacks on transient errors, with a configurable cooldown duration. Detect stream-level errors (e.g. an overloaded provider returning 200 with error JSON) by peeking at the first chunk before proxying the stream. Clear the cooldown on a successful fallback to avoid stale entries. Show a toast notification in the TUI when a fallback is triggered.
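The "peek at the first chunk" step could be a small async-generator wrapper; `peekForError` and `isErrorChunk` are illustrative names, not the PR's actual API:

```ts
// Sketch of stream-level error detection: inspect the first chunk to catch
// a provider that returns HTTP 200 with an error payload, then proxy the
// rest of the stream untouched.
async function* peekForError<T>(
  stream: AsyncIterable<T>,
  isErrorChunk: (chunk: T) => boolean,
): AsyncGenerator<T> {
  const it = stream[Symbol.asyncIterator]()
  const first = await it.next()
  if (first.done) return
  if (isErrorChunk(first.value)) {
    // Thrown here, the error is indistinguishable from a failed request,
    // so the fallback chain can handle it like any other transient error.
    throw new Error("provider returned an error inside a 200 stream")
  }
  yield first.value
  while (true) {
    const next = await it.next()
    if (next.done) return
    yield next.value
  }
}
```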
Tests: CooldownManager put/get/clear/expiry behaviour. Config: validate the fallbacks array and cooldown_seconds at both the agent and top level.
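Based on the behaviours those tests name, a plausible shape for the manager (a sketch assuming an in-memory map keyed by "provider/model"; the real class may differ):

```ts
// Sketch of a CooldownManager: put/get/clear with time-based expiry.
class CooldownManager {
  private until = new Map<string, number>()

  // Start (or refresh) a cooldown for the given provider/model key.
  put(key: string, seconds: number, now = Date.now()) {
    this.until.set(key, now + seconds * 1000)
  }

  // True while the key is still cooling down; expired entries are pruned.
  get(key: string, now = Date.now()): boolean {
    const expiry = this.until.get(key)
    if (expiry === undefined) return false
    if (now >= expiry) {
      this.until.delete(key)
      return false
    }
    return true
  }

  // Called after a successful fallback so stale entries don't linger.
  clear(key: string) {
    this.until.delete(key)
  }
}
```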
The following comment was made by an LLM; it may be inaccurate: Based on the search results, I found several related PRs that address similar functionality.

Potential Related PRs

These PRs address related concerns around provider fallback chains, transient error handling, and retry logic, though they appear to be separate implementations in different components. PR #26292 appears to be the consolidated, comprehensive implementation of this functionality.
Thanks for updating your PR! It now meets our contributing guidelines. 👍
… classification

When a fallback model succeeds, update the assistant message modelID and providerID so events, logs, and billing attribute to the correct provider. Publish an llm.fallback.used bus event and show an info toast. Block weekly/monthly quota rate limits from retrying: they won't resolve with backoff and can take days to reset.
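The quota-blocking rule amounts to an error classification step; a sketch, with the status codes and pattern match as assumptions rather than the PR's exact logic:

```ts
// Sketch: decide whether a provider failure should fall through the chain.
function isRetryable(status: number, body: string): boolean {
  if (status >= 500) return true // overload / server errors: retry elsewhere
  if (status === 429) {
    // Weekly/monthly quota limits can take days to reset, so neither
    // backoff nor a cooldown-and-fallback pass will help: fail instead.
    if (/weekly|monthly/i.test(body)) return false
    return true // ordinary rate limit: transient, chain to the next model
  }
  return false
}
```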
Issue for this PR
Closes #7602
Type of change
What does this PR do?
Adds a configurable fallback chain so that when a provider returns a transient error (rate limit, overload, 5xx), OpenCode automatically retries on the next model in the chain instead of failing the session.
{ "model": "anthropic/claude-sonnet-4-20250514", "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"], "cooldown_seconds": 300 }fallbackscan be set at the top level or per-agent.cooldown_secondsdefaults to 300 — after a retryable failure, that provider/model is skipped for the cooldown duration so you don't wait on retries to an overloaded provider.Why built-in instead of a proxy: cheaper providers are unreliable, and routing through LiteLLM degrades tool-call quality. When a provider gets overloaded, falling through immediately is faster than retrying the same one.
How it works:
- On a transient error, each model in `fallbacks` is tried in order
- A failing provider/model is put on cooldown for `cooldown_seconds` and skipped during that window
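Putting the pieces together, the selection loop might look like this (a sketch reusing the hypothetical CooldownManager above; the real signature in the PR will differ):

```ts
// Sketch of the fallback chain: try the primary model, then each fallback
// in order, skipping anything still on cooldown.
async function withFallback<T>(
  models: string[],                    // primary first, then fallbacks
  cooldowns: CooldownManager,
  cooldownSeconds: number,
  attempt: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("all models skipped or failed")
  for (const model of models) {
    if (cooldowns.get(model)) continue // still cooling down: skip it
    try {
      const result = await attempt(model)
      cooldowns.clear(model)           // success clears stale entries
      return result
    } catch (err) {
      // A real implementation would rethrow non-retryable errors here
      // (see the classification sketch above) instead of chaining on.
      lastError = err
      cooldowns.put(model, cooldownSeconds)
    }
  }
  throw lastError
}
```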
How did you verify your code works?

- Unit tests for `CooldownManager` (put/get/clear/expiry) and for config validation (the `fallbacks` array and `cooldown_seconds`)
- `bun typecheck` passes for all 12 packages in the monorepo
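For flavor, the expiry test might look like this, assuming bun:test and the CooldownManager sketch above (the PR's actual test file will differ):

```ts
import { describe, expect, test } from "bun:test"

describe("CooldownManager", () => {
  test("entries expire after the cooldown window", () => {
    const cooldowns = new CooldownManager()
    const now = Date.now()
    cooldowns.put("openai/gpt-4.1", 300, now)
    expect(cooldowns.get("openai/gpt-4.1", now)).toBe(true)
    // Just past the 300 s window, the entry should read as expired.
    expect(cooldowns.get("openai/gpt-4.1", now + 300_001)).toBe(false)
  })
})
```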
Screenshots / recordings

N/A — no UI changes visible in screenshots (the toast is a runtime notification)
Checklist
Comparison with related PRs
Reviewed #24369, #26192, #24013, #18443, and the closed #13189:
- A `resolveFallbackChain` utility: a minor convenience we can add later.

Key differences in our approach: