
feat(opencode): add LLM provider fallback chain #26292

Open
j3k0 wants to merge 4 commits into anomalyco:dev from j3k0:feat/llm-fallback

Conversation


j3k0 commented May 8, 2026

Issue for this PR

Closes #7602

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds a configurable fallback chain so that when a provider returns a transient error (rate limit, overload, 5xx), OpenCode automatically retries on the next model in the chain instead of failing the session.

{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}

fallbacks can be set at the top level or per-agent. cooldown_seconds defaults to 300 — after a retryable failure, that provider/model is skipped for the cooldown duration so you don't wait on retries to an overloaded provider.
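For illustration, here is what a hypothetical per-agent override might look like; the `agent` nesting and the agent name `review` are assumptions for the sketch, not taken from this PR:

```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1"],
  "cooldown_seconds": 300,
  "agent": {
    "review": {
      "model": "deepseek/deepseek-v4",
      "fallbacks": ["openai/gpt-4.1"]
    }
  }
}
```

Under this reading, a per-agent fallbacks list would take precedence over the top-level one for that agent.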

Why built-in instead of a proxy: cheaper providers are unreliable, and routing through LiteLLM degrades tool-call quality. When a provider gets overloaded, falling through immediately is faster than retrying the same one.

How it works:

  1. On a retryable error (5xx, rate limit, overload), the next model in fallbacks is tried
  2. Failed providers are put on cooldown for cooldown_seconds and skipped during that window
  3. On success, the winning provider's cooldown is cleared so it's immediately available next time
  4. Stream-level errors (provider returning 200 with an error body) are detected by peeking at the first chunk
  5. When a fallback model succeeds, model attribution is updated so events, logs, and billing reflect the actual provider used
  6. A toast notification appears in the TUI when fallback triggers and when a fallback succeeds
  7. Weekly/monthly quota rate limits are classified as non-retryable (won't resolve with backoff)
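Steps 1–3 can be sketched in TypeScript. This is an illustrative reconstruction from the description above, not the PR's actual code: the names CooldownManager and withFallback appear in the PR, but every signature and detail here is an assumption.

```typescript
// Illustrative sketch only; signatures and internals are assumptions.
class CooldownManager {
  private until = new Map<string, number>();
  constructor(private cooldownMs: number) {}

  // Put a model on cooldown after a retryable failure.
  put(model: string, now: number = Date.now()): void {
    this.until.set(model, now + this.cooldownMs);
  }

  // True while the cooldown window is open; expired entries are dropped.
  isCoolingDown(model: string, now: number = Date.now()): boolean {
    const t = this.until.get(model);
    if (t === undefined) return false;
    if (now >= t) {
      this.until.delete(model);
      return false;
    }
    return true;
  }

  // Clear the cooldown, e.g. after the model succeeds (step 3 above).
  clear(model: string): void {
    this.until.delete(model);
  }
}

// Walk the chain (primary first, then fallbacks), skipping models on
// cooldown. A real implementation would also classify errors as
// retryable vs. non-retryable before cooling a model down.
async function withFallback<T>(
  chain: string[],
  cooldowns: CooldownManager,
  attempt: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of chain) {
    if (cooldowns.isCoolingDown(model)) continue;
    try {
      const result = await attempt(model);
      cooldowns.clear(model); // winner is immediately available next time
      return result;
    } catch (err) {
      lastError = err;
      cooldowns.put(model);
    }
  }
  throw lastError ?? new Error("all models in the chain are on cooldown");
}
```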

How did you verify that your code works?

  • Unit tests for CooldownManager (put/get/clear/expiry) and config validation (fallbacks array and cooldown_seconds)
  • Ran this in production for a week of daily work without issues
  • bun typecheck passes for all 12 packages in the monorepo

Screenshots / recordings

N/A — no UI changes visible in screenshots (toast is a runtime notification)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Comparison with related PRs

Reviewed #24369, #26192, #24013, #18443, and the closed #13189:

Key differences in our approach:

  • Cooldown beats session state: it remembers what has failed, not what has succeeded, so there is no "sticky" fallback
  • Stream error detection via first-chunk peek catches 200-with-error responses
  • retry-after header parsing respects provider-suggested backoff
  • No dedup, by design: the primary can be retried after falling through
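The retry-after handling mentioned above could be parsed roughly as follows. This is a hypothetical helper, not the PR's code; the only grounded facts are that Retry-After may be delta-seconds or an HTTP-date (RFC 9110):

```typescript
// Hypothetical helper: convert a Retry-After header value into a
// cooldown in milliseconds, or undefined if absent/unparsable.
function parseRetryAfterMs(
  header: string | null,
  now: number = Date.now(),
): number | undefined {
  if (!header) return undefined;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000); // delta-seconds form
  const date = Date.parse(header); // HTTP-date form
  if (!Number.isNaN(date)) return Math.max(0, date - now);
  return undefined;
}
```

A value parsed this way could override the default cooldown_seconds for the failing provider.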

j3k0 added 3 commits May 8, 2026 08:30
Add fallbacks and cooldown_seconds to agent and top-level config schema.
Wire fallbacks through Agent.Info and StreamInput so the LLM layer
receives the fallback chain from configuration.

Fixes anomalyco#7602
… error detection

Add CooldownManager for tracking retryable provider failures.
Implement withFallback effect that chains primary model through
fallbacks on transient errors, with configurable cooldown duration.
Detect stream-level errors (e.g. overloaded provider returning 200
with error JSON) by peeking at the first chunk before proxying the
stream. Clear cooldown on successful fallback to avoid stale entries.
Show toast notification in TUI when fallback is triggered.
CooldownManager: put/get/clear/expiry behaviour.
Config: validate fallbacks array and cooldown_seconds at agent and top level.
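The first-chunk peek described in the commit above could look like this sketch; the function name, the error-body shape, and the string-chunk stream are all illustrative assumptions:

```typescript
// Hypothetical sketch: some providers return HTTP 200 with an error
// JSON body instead of a real stream. Pull the first chunk, classify
// it, then re-emit it ahead of the rest of the stream.
async function* peekForStreamError(
  stream: AsyncIterable<string>,
): AsyncGenerator<string> {
  const it = stream[Symbol.asyncIterator]();
  const first = await it.next();
  if (!first.done) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(first.value);
    } catch {
      // not JSON: treat as a normal stream chunk
    }
    if (parsed && typeof parsed === "object" && "error" in (parsed as object)) {
      // surface as a retryable error so the fallback chain can take over
      throw new Error(`stream-level provider error: ${first.value}`);
    }
    yield first.value; // re-emit the peeked chunk, then proxy the rest
  }
  for (let r = await it.next(); !r.done; r = await it.next()) yield r.value;
}
```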
github-actions Bot added the needs:compliance label (auto-closes the issue after 2 hours) on May 8, 2026

github-actions Bot commented May 8, 2026

The following comment was generated by an LLM and may be inaccurate:

Based on the search results, I found several related PRs that address similar functionality:

Potential Related PRs

  1. PR #26192 - fix(session): add fallback retry handling and harden pre-push bun path

    • Related because it adds fallback retry handling in the session layer, which is closely related to the fallback chain feature
  2. PR #24369 - feat(processor): add model fallback chain when retries are exhausted

    • Similar feature but for the processor - implements a fallback chain mechanism when retries are exhausted
  3. PR #24013 - fix(opencode): stop retrying non-transient rate limits

    • Related to distinguishing transient errors (rate limits, 5xx) that should trigger fallbacks
  4. PR #18443 - fix(retry): retry transient 429 responses even when provider marks non-retryable

    • Related to handling transient 429 rate limit errors, a key trigger for the fallback chain

These PRs address related concerns around provider fallback chains, transient error handling, and retry logic, though they appear to be separate implementations in different components. PR #26292 appears to be the consolidated, comprehensive implementation of this functionality.

github-actions Bot removed the needs:compliance label on May 8, 2026

github-actions Bot commented May 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

…assification

When a fallback model succeeds, update the assistant message modelID
and providerID so events, logs, and billing attribute to the correct
provider. Publish a llm.fallback.used bus event and show an info toast.

Block weekly/monthly quota rate limits from retrying — they won't
resolve with backoff and can take days to reset.


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Native Model Fallback / Failover Support

1 participant