fix(opencode): reconnect on network disruptions (VPN switch, SSE timeout, connection reset)#19116

Draft
davidprokopec wants to merge 6 commits into anomalyco:dev from davidprokopec:fix/vpn-switch-reconnect


@davidprokopec davidprokopec commented Mar 25, 2026

Issue for this PR

Closes #17574, #15350, #17099, #15247
Relates to #19100, #17648, #15393

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

When I switch VPN or my network drops briefly, opencode falls apart in a few ways: MCP tool calls fail and the server never reconnects, the session processor treats network errors the same as API errors (wrong retry strategy), and there's zero UI feedback that anything is wrong — it just looks stuck.

This PR adds a dedicated network error handling path.

Session processor — Network errors (ECONNRESET, ETIMEDOUT, SSE read timeout, etc.) now get their own retry loop, separate from the existing API retry logic. It's bounded at 5 retries with exponential backoff capped at 5s. If the stream already produced chunks before dying, it cleans up partial text/reasoning parts and incomplete tool calls before retrying, so you don't get duplicate or corrupt output. If all 5 retries fail, the session stops with the error instead of looping forever.
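A minimal sketch of the bounded network retry described above. It assumes a 500 ms base delay (the PR only states the 5-retry limit and the 5s cap), and `retryOnNetworkError`, `networkBackoffMs`, and the `isNetwork`/`cleanup`/`wait` callbacks are hypothetical names, not the actual processor code.

```typescript
// Limits from the PR: 5 retries, backoff capped at 5 s.
// BASE_DELAY_MS is an assumption; the PR does not state a base delay.
const MAX_NETWORK_RETRIES = 5;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 5_000;

// Delay before retry number `retry` (1-based): 500, 1000, 2000, 4000, 5000 ms.
function networkBackoffMs(retry: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (retry - 1), MAX_DELAY_MS);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: runs `stream`, retrying only network errors.
// `cleanup` should discard partial text/reasoning parts and incomplete
// tool calls so a retried stream does not duplicate output.
async function retryOnNetworkError<T>(
  stream: () => Promise<T>,
  isNetwork: (err: unknown) => boolean,
  cleanup: () => Promise<void>,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await stream();
    } catch (err) {
      // Non-network errors are left to the existing API retry logic; once
      // the retry budget is spent the error propagates and the session stops.
      if (!isNetwork(err) || retry >= MAX_NETWORK_RETRIES) throw err;
      await cleanup();
      await wait(networkBackoffMs(retry + 1));
    }
  }
}
```

Injecting `wait` keeps the loop testable without real sleeps, and keeping this budget separate from the API retry path means a non-network error falls straight through to the existing logic on the first failure.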

MCP tool calls — Remote MCP servers now auto-reconnect on network errors. When a callTool request fails with a network error, it closes the dead client, creates a new connection, and retries the call once. This only applies to type: "remote" configs; local servers are excluded because their failure mode is different. isNetworkError() explicitly excludes auth errors and HTTP status codes, so it only catches actual transport failures.
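The reconnect-once path for remote MCP servers could look roughly like the sketch below. `McpClient`, `McpServerEntry`, `connect`, and `callToolWithReconnect` are hypothetical stand-ins chosen for illustration; they are not the MCP SDK's types or the code in packages/opencode/src/mcp/index.ts.

```typescript
// Hypothetical minimal client shape; not the MCP SDK's actual types.
interface McpClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
  close(): Promise<void>;
}

interface McpServerEntry {
  type: "remote" | "local";
  client: McpClient;
  connect: () => Promise<McpClient>; // opens a fresh connection
}

// Calls a tool. On a network error from a remote server, closes the dead
// client, reconnects, and retries exactly once. Local servers and
// non-network errors rethrow immediately.
async function callToolWithReconnect(
  server: McpServerEntry,
  name: string,
  args: Record<string, unknown>,
  isNetworkError: (err: unknown) => boolean,
): Promise<unknown> {
  try {
    return await server.client.callTool(name, args);
  } catch (err) {
    if (server.type !== "remote" || !isNetworkError(err)) throw err;
    // Closing a dead transport may itself fail; ignore that.
    await server.client.close().catch(() => {});
    server.client = await server.connect();
    return server.client.callTool(name, args); // single retry, no loop
  }
}
```

In this sketch, if the fresh connection also fails, that error propagates to the caller, so a genuinely unreachable server cannot trap the tool call in a reconnect loop.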

Error classification — The old code only recognized ECONNRESET. Now it also catches ETIMEDOUT, ENETUNREACH, EHOSTUNREACH, ENOTFOUND, EPIPE, and ECONNREFUSED. "SSE read timed out" is now treated as a retryable error as well; previously it was silently dropped.
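A sketch of these classification rules, assuming errors expose Node-style `code` strings and that HTTP and auth failures carry a numeric `status`. This `isNetworkError` is an illustration that folds the "SSE read timed out" message check into one function; the PR's actual helper and the session retry classifier may be split differently.

```typescript
// Transport-level error codes named in the PR.
const NETWORK_ERROR_CODES = new Set([
  "ECONNRESET",
  "ETIMEDOUT",
  "ENETUNREACH",
  "EHOSTUNREACH",
  "ENOTFOUND",
  "EPIPE",
  "ECONNREFUSED",
]);

// Hypothetical classifier: true only for transport failures.
function isNetworkError(err: unknown, depth = 0): boolean {
  if (typeof err !== "object" || err === null || depth > 3) return false;
  const e = err as { code?: unknown; status?: unknown; message?: unknown; cause?: unknown };
  // Any HTTP status (including 401/403 auth failures) means the server
  // answered, so this is an API error, not a network error.
  if (typeof e.status === "number") return false;
  if (typeof e.code === "string" && NETWORK_ERROR_CODES.has(e.code)) return true;
  if (typeof e.message === "string" && e.message.includes("SSE read timed out")) return true;
  // Some fetch implementations (e.g. Node's undici) wrap the socket error in `cause`.
  return isNetworkError(e.cause, depth + 1);
}
```

Checking `status` first is what keeps an auth failure from ever being retried as a transport error, even if it also happens to carry a socket-style `code`.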

Provider chunk timeout — Was a no-op when chunkTimeout wasn't set in config. Now defaults to 30s so SSE stalls are always detected instead of hanging forever.
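One way to get an always-on stall detector is to race each `next()` of the stream against a timer. `withChunkTimeout`, `raceWithTimeout`, and reusing the "SSE read timed out" message are assumptions for this sketch, not the provider code; a real implementation would also abort the underlying HTTP request (for example with an `AbortController`) when the timer fires.

```typescript
// Default applied when no chunkTimeout is configured (30 s per the PR).
const DEFAULT_CHUNK_TIMEOUT_MS = 30_000;

// Resolves with `p`, or rejects if `p` has not settled within `ms`.
function raceWithTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("SSE read timed out")), ms);
  });
  // Clear the timer either way so it cannot fire after the race is decided.
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Yields chunks from `source`, failing if the wait for the next chunk
// exceeds `timeoutMs`. The timer only covers waiting for a chunk, not the
// time the consumer spends processing the previous one.
async function* withChunkTimeout<T>(
  source: AsyncIterable<T>,
  timeoutMs: number = DEFAULT_CHUNK_TIMEOUT_MS,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    const next = await raceWithTimeout(it.next(), timeoutMs);
    if (next.done) return;
    yield next.value;
  }
}
```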

UI — New "reconnecting" session status shows in the TUI prompt bar with the attempt number and error message. Debounced by 1s so quick reconnects don't flicker.
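The 1s debounce can be sketched as a small wrapper around the status renderer. The `SessionStatus` union (including the `idle` and `busy` variants) and `createStatusDebouncer` are hypothetical; the actual status type and TUI wiring may differ.

```typescript
// Hypothetical status shape; only "reconnecting" with attempt/message
// comes from the PR description.
type SessionStatus =
  | { type: "idle" }
  | { type: "busy" }
  | { type: "reconnecting"; attempt: number; message: string };

// Returns an `update` function that forwards statuses to `render`, except
// that "reconnecting" is only shown once it has persisted for `delayMs`.
// Reconnects that finish sooner never reach the screen, so nothing flickers.
function createStatusDebouncer(
  render: (status: SessionStatus) => void,
  delayMs = 1_000,
): (status: SessionStatus) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let latest: SessionStatus | undefined; // newest reconnecting status, not yet shown
  let visible = false; // whether "reconnecting" is currently on screen

  return (status) => {
    if (status.type === "reconnecting") {
      // Already visible: update the attempt counter in place.
      if (visible) return render(status);
      latest = status;
      if (timer === undefined) {
        timer = setTimeout(() => {
          timer = undefined;
          visible = true;
          if (latest) render(latest);
        }, delayMs);
      }
      return;
    }
    // Any other status cancels a pending reconnect display.
    clearTimeout(timer);
    timer = undefined;
    visible = false;
    render(status);
  };
}
```

Once the indicator is visible, later attempts render immediately; otherwise a run of quick retries would keep resetting the timer and the attempt counter would never appear.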

How did you verify your code works?

  • Added two integration tests in reconnection.test.ts:
    • SSE timeout on first stream → reconnect → success, verifying partial parts get cleaned up
    • All 5 network retries exhausted → session stops with error and idle status
  • Manually tested by toggling VPN on/off during active sessions — MCP tools reconnect and streaming resumes after the reconnect status clears

Screenshots / recordings

  • TODO: will add

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions
Contributor

Hey! Your PR title "Fix: restart stream on vpn change, connection drop, ..." doesn't follow conventional commit format.

Please update it to start with one of:

  • feat: or feat(scope): new feature
  • fix: or fix(scope): bug fix
  • docs: or docs(scope): documentation changes
  • chore: or chore(scope): maintenance tasks
  • refactor: or refactor(scope): code refactoring
  • test: or test(scope): adding or updating tests

Where scope is the package name (e.g., app, desktop, opencode).

See CONTRIBUTING.md for details.

@davidprokopec davidprokopec changed the title Fix: restart stream on vpn change, connection drop, ... fix(opencode): reconnect on network disruptions (VPN switch, SSE timeout, connection reset) Mar 25, 2026
@davidprokopec davidprokopec force-pushed the fix/vpn-switch-reconnect branch from d00a1b5 to 6c41971 Compare March 25, 2026 13:08
@herjarsa

Confirming this fixes issue #21199

I can confirm this exact problem: the desktop app goes offline after a consistent period of time on Windows 11 with a local server (SSE). The issue manifests as:

  • UI completely blocks (clicks do nothing)
  • Server LED must be green again before UI responds
  • Happens consistently every session

This PR addresses all the symptoms we see:

  1. Network error retry loop with bounded attempts (5 max)
  2. MCP tool auto-reconnect for remote servers
  3. Better error classification (ETIMEDOUT, ECONNRESET, etc.)
  4. 30s chunk timeout default
  5. UI reconnecting status with attempt counter

Question for the reviewer: the PR is still in draft state. Is anything blocking it from being marked ready for review? All CI checks pass. Happy to help with testing or reviews.

@herjarsa

Testing Report

I attempted to cherry-pick and test this PR locally. Here's what I found:

The Good

  • All CI checks pass (8/8) ✅
  • The code changes are well-structured and comprehensive
  • New reconnection tests cover the key scenarios

The Problem ⚠️

The PR is based on an outdated branch and cannot merge cleanly.

When I tried to merge origin/dev into this PR, there were conflicts in 6 files:

  • packages/opencode/src/mcp/index.ts
  • packages/opencode/src/provider/provider.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/status.ts
  • packages/sdk/js/src/v2/gen/types.gen.ts

The test files also reference APIs (@opencode-ai/core/* imports, Session/MessageV2 methods) that have changed significantly since March 25.

Recommendation

The PR needs to be rebased onto the latest origin/dev to resolve these conflicts. After a rebase, the tests should be updated to match the current API signatures.

Would you like help with the rebase? I can test the updated version once it's ready.

@herjarsa

PR Analysis - Tests Running

I ran the reconnection tests locally on branch test/pr-19116-reconnect.

Test Results: Tests fail due to missing @opencode-ai/core module (workspace dependency issue), not due to the PR changes themselves. This is a pre-existing workspace setup issue.

Critical Finding: The PR has merge conflicts (mergeStateStatus: CONFLICTING). The merge commit 2e473d0 exists locally but hasn't been pushed because conflicts need resolution.

PR Status Summary:

  • All 8 CI checks PASS
  • However, the PR is in DRAFT state with CONFLICTING status
  • The merge commit exists only in local branch test/pr-19116-reconnect

Question: Do you want me to help resolve the merge conflicts so this PR can be finalized?

@herjarsa

Conflict Analysis

I tested merging the PR branch (fix/vpn-switch-reconnect) against origin/dev and confirmed the conflicts. Here's what's conflicting:

Conflicting files (5):

  1. packages/opencode/src/mcp/index.ts
  2. packages/opencode/src/provider/provider.ts
  3. packages/opencode/src/session/message-v2.ts
  4. packages/opencode/src/session/processor.ts
  5. packages/sdk/js/src/v2/gen/types.gen.ts

Auto-merged successfully (5):

  1. bun.lock
  2. packages/opencode/src/cli/cmd/tui/component/prompt/index.tsx
  3. packages/opencode/src/session/status.ts
  4. packages/opencode/test/session/reconnection.test.ts (new file)
  5. packages/opencode/test/session/retry.test.ts

The conflicts are in the core logic files. This PR needs a rebase against current dev. Are you planning to update the PR soon, or would you like help with the rebase?


Development

Successfully merging this pull request may close these issues.

Frequent occurrence: error: sse read timed out

2 participants