
fix(session): retry empty stream truncations with attempt cap#26167

Open
edevil wants to merge 1 commit into anomalyco:dev from edevil:fix/empty-other-stream-truncation

Conversation

@edevil
Contributor

@edevil edevil commented May 7, 2026

Issue for this PR

Closes #26170
Related #21727

Type of change

  • Bug fix

What does this PR do?

When an upstream provider stream ends without a proper stop_reason, the AI
SDK emits a fallback finishReason: "other" with zero output tokens.
opencode previously accepted this as a normal end-of-step, persisting a
truncated message with no error and no retry. The user got a half-finished
response and had to manually re-prompt.

This PR detects the truncation pattern at the session-processor layer and
surfaces it as a retryable APIError, capped at 3 attempts.

The trigger condition

When the upstream provider stream is cut mid-generation, the AI SDK emits:

{ type: "text-delta", delta: "..." }
{ type: "text-delta", delta: "..." }   // ← upstream stream cuts here
{ type: "finish", finishReason: "other", usage: { outputTokens: 0 } }
//                                                ↑ AI SDK's "no stop reason was given" fallback

opencode's session processor receives the finish-step with
value.finishReason === "other" and usage.tokens.output === 0. Pre-fix,
the processor accepts that as a legitimate end-of-step.
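The check itself is small. A minimal sketch of the trigger condition, assuming an event shape like the AI SDK's finish-step payload (the `FinishStep` interface and `isEmptyOtherTruncation` name are illustrative, not opencode's actual identifiers):

```typescript
// Hypothetical shape of the finish-step event the processor inspects.
interface FinishStep {
  finishReason: string
  usage: { outputTokens: number }
}

function isEmptyOtherTruncation(step: FinishStep): boolean {
  // "other" alone can be a legitimate model stop; the zero reported
  // output tokens of the AI SDK's fallback finish are what distinguish
  // a severed upstream stream from a stop the model actually chose.
  return step.finishReason === "other" && step.usage.outputTokens === 0
}
```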

Symptom (real-world evidence)

I found more than a dozen instances of this exact bug pattern across my own
opencode session database, spanning two providers (anthropic, openai)
and four models (gpt-5.3-codex, claude-opus-4-6, claude-opus-4-7,
claude-haiku-4-5). All exhibit the same shape:

// assistant message stored after the truncation
{
  "role": "assistant",
  "providerID": "anthropic",
  "modelID": "claude-opus-4-7",
  "finish": "other",
  "tokens": { "input": 0, "output": 0, "reasoning": 0, "cache": {...} },
  "cost": 0
}
// the corresponding step-finish part
{
  "type": "step-finish",
  "reason": "other",
  "tokens": { "input": 0, "output": 0, "reasoning": 0 },
  "cost": 0
}

Mid-stream cut, not a model decision: in one diagnostic example, the
reasoning text literally ends mid-word — "...really just wrapping the existing whichlang::detect_language() functi". The upstream stream was
severed before the next chunk arrived.

User-visible behavior pre-fix: the session stores a half-finished
message with no error, no retry, no recovery. In one observed session the
user manually re-prompted ~111s later, succeeded for 3 turns, hit the bug
again, re-prompted again — the "session degradation" pattern users report
in #16214.

The fix

Three small changes:

  1. processor.ts — Detect finishReason="other" with zero output
    tokens on finish-step and fail the stream with a retryable APIError
    tagged metadata.code = "EmptyOther".

  2. retry.ts — Cap EmptyOther retries at 3 attempts so a misbehaving
    provider can't loop forever. Other retryable classifications keep their
    existing unbounded behaviour.

  3. message-v2.ts — Add case APIError.isInstance(e) to fromError
    that converts the class instance to its wire form, so the structured
    message and metadata reach the TUI instead of being wrapped in a generic
    UnknownError whose payload is the JSON-stringified original.
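Changes 2 and 3 could look roughly like the following sketch. Every name here (`shouldRetry`, `EMPTY_OTHER_MAX_ATTEMPTS`, the `APIError` shape, the wire format) is an assumption for illustration, not opencode's actual API:

```typescript
// -- change 2: cap EmptyOther retries at 3 attempts -------------------
const EMPTY_OTHER_MAX_ATTEMPTS = 3

function shouldRetry(code: string | undefined, attempt: number): boolean {
  // Only the EmptyOther classification is capped; other retryable
  // classifications keep their existing unbounded behaviour.
  if (code === "EmptyOther") return attempt < EMPTY_OTHER_MAX_ATTEMPTS
  return true
}

// -- change 3: convert an APIError instance to its wire form ----------
class APIError extends Error {
  constructor(
    message: string,
    public metadata: Record<string, string> = {},
  ) {
    super(message)
  }
  static isInstance(e: unknown): e is APIError {
    return e instanceof APIError
  }
}

type WireError = {
  name: string
  data: { message: string; metadata?: Record<string, string> }
}

function fromError(e: unknown): WireError {
  if (APIError.isInstance(e)) {
    // Preserve the structured message and metadata so they reach the TUI.
    return { name: "APIError", data: { message: e.message, metadata: e.metadata } }
  }
  // Pre-fix behaviour: a generic UnknownError whose payload is the
  // JSON-stringified original.
  return { name: "UnknownError", data: { message: JSON.stringify(e) } }
}
```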

Scope: why processor-layer instead of provider-layer

Related #21727 catches a similar truncation pattern at the
@ai-sdk/openai-compatible provider's flush() callback, which works only
for OpenAI-compatible providers. This PR catches the same condition one
layer up, in the session processor, where it applies to all AI-SDK
providers — including Anthropic direct, Bedrock, and Vertex. The instances
I observed include Anthropic-direct cases that #21727 cannot reach. The
two PRs are independent and complementary; either order of merge is fine.

How did you verify your code works?

  • 3 new tests in retry.test.ts:
    • EmptyOther is recognized as retryable.
    • Retry policy stops after 3 attempts on EmptyOther.
    • APIError class instances thrown via Effect.fail round-trip through
      fromError correctly (preserving data.message and metadata.code).
  • All existing tests pass (bun test test/session/retry.test.ts — 31 pass).
  • bun typecheck adds no new errors.

Other user-visible issues this likely helps

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Detect AI SDK `finishReason="other"` with zero output as upstream stream
truncation. Surface as retryable `APIError` tagged `EmptyOther`. Cap
retries at 3 attempts so misbehaving providers can't loop forever.

Add a `fromError` case for `APIError` class instances so the structured
message and metadata are preserved on the assistant message instead of
being wrapped in a generic `UnknownError` whose payload is the
JSON-stringified original.
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

The following comment was made by an LLM, it may be inaccurate:

Results

Found 1 related PR:

  1. #21727 - fix: handle stream interruption for OpenAI-compatible providers
    • This PR is explicitly mentioned in the current PR's description as complementary. It catches the same truncation pattern in the @ai-sdk/openai-compatible provider's flush() callback, while #26167 catches it at the session processor layer, where it applies to all AI-SDK providers. The description notes the two are independent and either merge order is fine.

Note: PR #26167 is the current PR being analyzed, so it correctly appears in search results but is not a duplicate of itself.

No other duplicate PRs found addressing the same issue.



Development

Successfully merging this pull request may close these issues.

Provider stream truncation (finishReason="other" with zero output) silently accepted, persisting half-finished assistant messages
