
fix(session): retry empty stream truncations with attempt cap#26167

Open
edevil wants to merge 1 commit into anomalyco:dev from edevil:fix/empty-other-stream-truncation

Conversation

@edevil
Contributor

@edevil edevil commented May 7, 2026

Issue for this PR

Closes #26170
Related #21727

Type of change

  • Bug fix

What does this PR do?

When an upstream provider stream ends without a proper stop_reason, the AI
SDK emits a fallback finishReason: "other" with zero output tokens.
opencode previously accepted this as a normal end-of-step, persisting a
truncated message with no error and no retry. The user got a half-finished
response and had to manually re-prompt.

This PR detects the truncation pattern at the session-processor layer and
surfaces it as a retryable APIError, capped at 3 attempts.

The trigger condition

When the upstream provider stream is cut mid-generation, the AI SDK emits:

{ type: "text-delta", delta: "..." }
{ type: "text-delta", delta: "..." }   // ← upstream stream cuts here
{ type: "finish", finishReason: "other", usage: { outputTokens: 0 } }
//                                                ↑ AI SDK's "no stop reason was given" fallback

opencode's session processor receives the finish-step with
value.finishReason === "other" and usage.tokens.output === 0. Pre-fix,
the processor accepts that as a legitimate end-of-step.
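The check itself is small. A minimal sketch of the trigger condition, assuming an event shape like the AI SDK's finish-step payload (the `FinishStep` interface and `isEmptyOtherTruncation` name are illustrative, not opencode's actual identifiers):

```typescript
// Hypothetical shape of the finish-step event the processor inspects.
interface FinishStep {
  finishReason: string
  usage: { outputTokens: number }
}

function isEmptyOtherTruncation(step: FinishStep): boolean {
  // "other" alone can be a legitimate model stop; the zero reported
  // output tokens of the AI SDK's fallback finish are what distinguish
  // a severed upstream stream from a stop the model actually chose.
  return step.finishReason === "other" && step.usage.outputTokens === 0
}
```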

Symptom (real-world evidence)

I found more than a dozen instances of this exact bug pattern across my own
opencode session database, spanning two providers (anthropic, openai)
and four models (gpt-5.3-codex, claude-opus-4-6, claude-opus-4-7,
claude-haiku-4-5). All exhibit the same shape:

// assistant message stored after the truncation
{
  "role": "assistant",
  "providerID": "anthropic",
  "modelID": "claude-opus-4-7",
  "finish": "other",
  "tokens": { "input": 0, "output": 0, "reasoning": 0, "cache": {...} },
  "cost": 0
}
// the corresponding step-finish part
{
  "type": "step-finish",
  "reason": "other",
  "tokens": { "input": 0, "output": 0, "reasoning": 0 },
  "cost": 0
}

Mid-stream cut, not a model decision: in one diagnostic example, the
reasoning text literally ends mid-word — "...really just wrapping the existing whichlang::detect_language() functi". The upstream stream was
severed before the next chunk arrived.

User-visible behavior pre-fix: the session stores a half-finished
message with no error, no retry, no recovery. In one observed session the
user manually re-prompted ~111s later, succeeded for 3 turns, hit the bug
again, re-prompted again — the "session degradation" pattern users report
in #16214.

The fix

Three small changes:

  1. processor.ts — Detect finishReason="other" with zero output
    tokens on finish-step and fail the stream with a retryable APIError
    tagged metadata.code = "EmptyOther".

  2. retry.ts — Cap EmptyOther retries at 3 attempts so a misbehaving
    provider can't loop forever. Other retryable classifications keep their
    existing unbounded behaviour.

  3. message-v2.ts — Add case APIError.isInstance(e) to fromError
    that converts the class instance to its wire form, so the structured
    message and metadata reach the TUI instead of being wrapped in a generic
    UnknownError whose payload is the JSON-stringified original.
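Changes 2 and 3 could look roughly like the following sketch. Every name here (`shouldRetry`, `EMPTY_OTHER_MAX_ATTEMPTS`, the `APIError` shape, the wire format) is an assumption for illustration, not opencode's actual API:

```typescript
// -- change 2: cap EmptyOther retries at 3 attempts -------------------
const EMPTY_OTHER_MAX_ATTEMPTS = 3

function shouldRetry(code: string | undefined, attempt: number): boolean {
  // Only the EmptyOther classification is capped; other retryable
  // classifications keep their existing unbounded behaviour.
  if (code === "EmptyOther") return attempt < EMPTY_OTHER_MAX_ATTEMPTS
  return true
}

// -- change 3: convert an APIError instance to its wire form ----------
class APIError extends Error {
  constructor(
    message: string,
    public metadata: Record<string, string> = {},
  ) {
    super(message)
  }
  static isInstance(e: unknown): e is APIError {
    return e instanceof APIError
  }
}

type WireError = {
  name: string
  data: { message: string; metadata?: Record<string, string> }
}

function fromError(e: unknown): WireError {
  if (APIError.isInstance(e)) {
    // Preserve the structured message and metadata so they reach the TUI.
    return { name: "APIError", data: { message: e.message, metadata: e.metadata } }
  }
  // Pre-fix behaviour: a generic UnknownError whose payload is the
  // JSON-stringified original.
  return { name: "UnknownError", data: { message: JSON.stringify(e) } }
}
```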

Scope: why processor-layer instead of provider-layer

Related #21727 catches a similar truncation pattern at the
@ai-sdk/openai-compatible provider's flush() callback, which works only
for OpenAI-compatible providers. This PR catches the same condition one
layer up, in the session processor, where it applies to all AI-SDK
providers — including Anthropic direct, Bedrock, and Vertex. The instances
I observed include Anthropic-direct cases that #21727 cannot reach. The
two PRs are independent and complementary; either order of merge is fine.

How did you verify your code works?

  • 3 new tests in retry.test.ts:
    • EmptyOther is recognized as retryable.
    • Retry policy stops after 3 attempts on EmptyOther.
    • APIError class instances thrown via Effect.fail round-trip through
      fromError correctly (preserving data.message and metadata.code).
  • All existing tests pass (bun test test/session/retry.test.ts — 31 pass).
  • bun typecheck adds no new errors.

Other user-visible issues this likely helps

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Detect AI SDK `finishReason="other"` with zero output as upstream stream
truncation. Surface as retryable `APIError` tagged `EmptyOther`. Cap
retries at 3 attempts so misbehaving providers can't loop forever.

Add a `fromError` case for `APIError` class instances so the structured
message and metadata are preserved on the assistant message instead of
being wrapped in a generic `UnknownError` whose payload is the
JSON-stringified original.
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Contributor

github-actions Bot commented May 7, 2026

The following comment was made by an LLM, it may be inaccurate:

Results

Found 1 related PR:

  1. #21727 - fix: handle stream interruption for OpenAI-compatible providers
    • This PR is explicitly mentioned in the current PR's description as complementary. It catches the same truncation pattern in the @ai-sdk/openai-compatible provider's flush() callback, while #26167 catches it at the session processor layer, where it applies to all AI-SDK providers. The description notes the two are independent and either merge order is fine.

Note: PR #26167 is the current PR being analyzed, so it correctly appears in search results but is not a duplicate of itself.

No other duplicate PRs found addressing the same issue.



Development

Successfully merging this pull request may close these issues.

Provider stream truncation (finishReason="other" with zero output) silently accepted, persisting half-finished assistant messages
