
feat(opencode): add LLM provider fallback chain #26292

Open
j3k0 wants to merge 4 commits into anomalyco:dev from j3k0:feat/llm-fallback

Conversation


j3k0 commented May 8, 2026

Issue for this PR

Closes #7602

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds a configurable fallback chain so that when a provider returns a transient error (rate limit, overload, 5xx), OpenCode automatically retries on the next model in the chain instead of failing the session.

{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}

fallbacks can be set at the top level or per-agent. cooldown_seconds defaults to 300 — after a retryable failure, that provider/model is skipped for the cooldown duration so you don't wait on retries to an overloaded provider.
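For illustration, here is what a hypothetical per-agent override might look like; the `agent` nesting and the agent name `review` are assumptions for the sketch, not taken from this PR:

```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1"],
  "cooldown_seconds": 300,
  "agent": {
    "review": {
      "model": "deepseek/deepseek-v4",
      "fallbacks": ["openai/gpt-4.1"]
    }
  }
}
```

Under this reading, a per-agent fallbacks list would take precedence over the top-level one for that agent.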

Why built-in instead of a proxy: cheaper providers are unreliable, and routing through LiteLLM degrades tool-call quality. When a provider gets overloaded, falling through immediately is faster than retrying the same one.

How it works:

  1. On a retryable error (5xx, rate limit, overload), the next model in fallbacks is tried
  2. Failed providers are put on cooldown for cooldown_seconds and skipped during that window
  3. On success, the winning provider's cooldown is cleared so it's immediately available next time
  4. Stream-level errors (provider returning 200 with an error body) are detected by peeking at the first chunk
  5. When a fallback model succeeds, model attribution is updated so events, logs, and billing reflect the actual provider used
  6. A toast notification appears in the TUI when fallback triggers and when a fallback succeeds
  7. Weekly/monthly quota rate limits are classified as non-retryable (won't resolve with backoff)
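Steps 1–3 can be sketched in TypeScript. This is an illustrative reconstruction from the description above, not the PR's actual code: the names CooldownManager and withFallback appear in the PR, but every signature and detail here is an assumption.

```typescript
// Illustrative sketch only; signatures and internals are assumptions.
class CooldownManager {
  private until = new Map<string, number>();
  constructor(private cooldownMs: number) {}

  // Put a model on cooldown after a retryable failure.
  put(model: string, now: number = Date.now()): void {
    this.until.set(model, now + this.cooldownMs);
  }

  // True while the cooldown window is open; expired entries are dropped.
  isCoolingDown(model: string, now: number = Date.now()): boolean {
    const t = this.until.get(model);
    if (t === undefined) return false;
    if (now >= t) {
      this.until.delete(model);
      return false;
    }
    return true;
  }

  // Clear the cooldown, e.g. after the model succeeds (step 3 above).
  clear(model: string): void {
    this.until.delete(model);
  }
}

// Walk the chain (primary first, then fallbacks), skipping models on
// cooldown. A real implementation would also classify errors as
// retryable vs. non-retryable before cooling a model down.
async function withFallback<T>(
  chain: string[],
  cooldowns: CooldownManager,
  attempt: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of chain) {
    if (cooldowns.isCoolingDown(model)) continue;
    try {
      const result = await attempt(model);
      cooldowns.clear(model); // winner is immediately available next time
      return result;
    } catch (err) {
      lastError = err;
      cooldowns.put(model);
    }
  }
  throw lastError ?? new Error("all models in the chain are on cooldown");
}
```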

How did you verify that your code works?

  • Unit tests for CooldownManager (put/get/clear/expiry) and config validation (fallbacks array and cooldown_seconds)
  • Ran this in production for a week of daily work without issues
  • bun typecheck passes for all 12 packages in the monorepo

Screenshots / recordings

N/A — no UI changes visible in screenshots (toast is a runtime notification)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Comparison with related PRs

Reviewed #24369, #26192, #24013, #18443, and the closed #13189:

Key differences in our approach:

  • Cooldown beats session state: it remembers what has failed, not what has succeeded, so there is no "sticky" fallback
  • Stream error detection via first-chunk peek catches 200-with-error responses
  • retry-after header parsing respects provider-suggested backoff
  • No dedup, by design: the primary can be retried after falling through
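The retry-after handling mentioned above could be parsed roughly as follows. This is a hypothetical helper, not the PR's code; the only grounded facts are that Retry-After may be delta-seconds or an HTTP-date (RFC 9110):

```typescript
// Hypothetical helper: convert a Retry-After header value into a
// cooldown in milliseconds, or undefined if absent/unparsable.
function parseRetryAfterMs(
  header: string | null,
  now: number = Date.now(),
): number | undefined {
  if (!header) return undefined;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000); // delta-seconds form
  const date = Date.parse(header); // HTTP-date form
  if (!Number.isNaN(date)) return Math.max(0, date - now);
  return undefined;
}
```

A value parsed this way could override the default cooldown_seconds for the failing provider.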

j3k0 added 3 commits May 8, 2026 08:30
Add fallbacks and cooldown_seconds to agent and top-level config schema.
Wire fallbacks through Agent.Info and StreamInput so the LLM layer
receives the fallback chain from configuration.

Fixes anomalyco#7602
… error detection

Add CooldownManager for tracking retryable provider failures.
Implement withFallback effect that chains primary model through
fallbacks on transient errors, with configurable cooldown duration.
Detect stream-level errors (e.g. overloaded provider returning 200
with error JSON) by peeking at the first chunk before proxying the
stream. Clear cooldown on successful fallback to avoid stale entries.
Show toast notification in TUI when fallback is triggered.
CooldownManager: put/get/clear/expiry behaviour.
Config: validate fallbacks array and cooldown_seconds at agent and top level.
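The first-chunk peek described in the commit above could look like this sketch; the function name, the error-body shape, and the string-chunk stream are all illustrative assumptions:

```typescript
// Hypothetical sketch: some providers return HTTP 200 with an error
// JSON body instead of a real stream. Pull the first chunk, classify
// it, then re-emit it ahead of the rest of the stream.
async function* peekForStreamError(
  stream: AsyncIterable<string>,
): AsyncGenerator<string> {
  const it = stream[Symbol.asyncIterator]();
  const first = await it.next();
  if (!first.done) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(first.value);
    } catch {
      // not JSON: treat as a normal stream chunk
    }
    if (parsed && typeof parsed === "object" && "error" in (parsed as object)) {
      // surface as a retryable error so the fallback chain can take over
      throw new Error(`stream-level provider error: ${first.value}`);
    }
    yield first.value; // re-emit the peeked chunk, then proxy the rest
  }
  for (let r = await it.next(); !r.done; r = await it.next()) yield r.value;
}
```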
github-actions Bot added the needs:compliance label (auto-closes the issue after 2 hours) on May 8, 2026

github-actions Bot commented May 8, 2026

The following comment was generated by an LLM and may be inaccurate:

Based on the search results, I found several related PRs that address similar functionality:

Potential Related PRs

  1. PR #26192 - fix(session): add fallback retry handling and harden pre-push bun path

    • Related because it adds fallback retry handling in the session layer, which is closely related to the fallback chain feature
  2. PR #24369 - feat(processor): add model fallback chain when retries are exhausted

    • Similar feature but for the processor - implements a fallback chain mechanism when retries are exhausted
  3. PR #24013 - fix(opencode): stop retrying non-transient rate limits

    • Related to distinguishing transient errors (rate limits, 5xx) that should trigger fallbacks
  4. PR #18443 - fix(retry): retry transient 429 responses even when provider marks non-retryable

    • Related to handling transient 429 rate limit errors, a key trigger for the fallback chain

These PRs address related concerns around provider fallback chains, transient error handling, and retry logic, though they appear to be separate implementations in different components. PR #26292 appears to be the consolidated, comprehensive implementation of this functionality.

github-actions Bot removed the needs:compliance label on May 8, 2026

github-actions Bot commented May 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

…assification

When a fallback model succeeds, update the assistant message modelID
and providerID so events, logs, and billing attribute to the correct
provider. Publish a llm.fallback.used bus event and show an info toast.

Block weekly/monthly quota rate limits from retrying — they won't
resolve with backoff and can take days to reset.


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Native Model Fallback / Failover Support

1 participant