Motivation
When an LLM provider stalls mid-stream (no tokens for minutes), the subagent appears "running" but is actually frozen. The for-await loop in processor.ts hangs indefinitely waiting for the next SSE chunk. The abort signal is only checked after a token arrives — if no token arrives, the check never runs. The only safety net is a 30-minute timeout, meaning zombie agents waste up to 30 minutes before the system notices.
Proposed Solution
Add a lastTokenTime timestamp that updates on every text-delta, reasoning-delta, and tool-call event in the processor stream loop. At the start of each iteration (after the existing input.abort.throwIfAborted() check at line 56), check Date.now() - lastTokenTime against a configurable stall timeout (default 3 minutes). If exceeded, throw an error: "LLM stream stalled: no tokens received for 3 minutes".
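The timeout lookup and the elapsed-time check described above could look roughly like the sketch below. The helper names `resolveStallTimeoutMs` and `assertNotStalled` are hypothetical and not part of processor.ts; only `lastTokenTime`, `OPENCODE_STALL_TIMEOUT_MS`, the 3-minute default, and the error text come from this issue. Deriving the minute count in the message from the configured timeout is also an assumption.

```typescript
// Hypothetical helpers for illustration; not the actual processor.ts code.
const DEFAULT_STALL_TIMEOUT_MS = 3 * 60 * 1000;

// Read the timeout from an env-like record. Fall back to the 3-minute
// default when the variable is missing, empty, or not a positive number.
function resolveStallTimeoutMs(env: Record<string, string | undefined>): number {
  const parsed = Number(env["OPENCODE_STALL_TIMEOUT_MS"]);
  return isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_STALL_TIMEOUT_MS;
}

// Throw when more than timeoutMs has elapsed since lastTokenTime.
// `now` is injectable so the check can be tested without real waiting.
function assertNotStalled(
  lastTokenTime: number,
  timeoutMs: number,
  now: number = Date.now(),
): void {
  if (now - lastTokenTime > timeoutMs) {
    // Report the configured timeout in minutes, to one decimal place at most.
    const minutes = Number((timeoutMs / 60000).toFixed(1));
    throw new Error(`LLM stream stalled: no tokens received for ${minutes} minutes`);
  }
}
```

In a Node process, `resolveStallTimeoutMs(process.env)` would pick up the environment override once at startup, and `assertNotStalled(lastTokenTime, timeoutMs)` would sit next to the existing abort check.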
Key hook locations in processor.ts:
- Line 56: stall check goes after input.abort.throwIfAborted()
- Line 81: case "reasoning-delta", update lastTokenTime = Date.now()
- Lines 134-264: tool-call handlers, update lastTokenTime (tool calls indicate the LLM is active)
- Line 338: case "text-delta", update lastTokenTime = Date.now()
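Put together, the hook locations listed above might be wired into the stream loop as in the following sketch. The `StreamEvent` union, the `consumeStream` function, and the injectable `now` clock are assumptions made for illustration; the real loop in processor.ts handles more event types and richer payloads.

```typescript
// Assumed event shape; the real stream emits more event types than these.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "reasoning-delta"; text: string }
  | { type: "tool-call"; toolName: string }
  | { type: "finish" };

async function consumeStream(
  stream: AsyncIterable<StreamEvent>,
  abort: AbortSignal,
  stallTimeoutMs: number,
  now: () => number = Date.now,
): Promise<string> {
  let lastTokenTime = now();
  let text = "";
  for await (const event of stream) {
    abort.throwIfAborted();
    // Stall check, placed right after the existing abort check.
    if (now() - lastTokenTime > stallTimeoutMs) {
      const minutes = stallTimeoutMs / 60000;
      throw new Error(`LLM stream stalled: no tokens received for ${minutes} minutes`);
    }
    switch (event.type) {
      case "reasoning-delta":
        lastTokenTime = now();
        break;
      case "tool-call":
        // Tool calls indicate the LLM is still active.
        lastTokenTime = now();
        break;
      case "text-delta":
        lastTokenTime = now();
        text += event.text;
        break;
      default:
        // Other events do not count as token activity in this sketch.
        break;
    }
  }
  return text;
}
```

In this sketch the check runs each time the loop body is entered, so the error is raised when the next event is delivered after a gap longer than the timeout.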
Important: This will NOT gracefully abort the hung HTTP connection (SSE reads cannot be interrupted mid-byte). It will however surface the problem — the session fails visibly instead of hanging indefinitely, and Pulse/check_task can take corrective action.
Fork Manifest Requirement
This issue modifies the subagent monitoring system introduced by the async-tasks fork feature. Upon completion, update .fork-features/manifest.json entry async-tasks:
- modifiedFiles: Add packages/opencode/src/session/processor.ts
- criticalCode: Add lastTokenTime, OPENCODE_STALL_TIMEOUT_MS, LLM stream stalled
- absorptionSignals: Add stall.*detector, stream.*stall, lastTokenTime
This ensures sync-time agents understand the stall detection logic and can verify it survives upstream merges.
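As a rough illustration of the manifest update, the sketch below merges the three sets of values into an entry and then checks a source file for the critical strings. The `ForkFeatureEntry` shape, the helper names, and the idea that `criticalCode` entries are matched as plain substrings are all assumptions; only the field names and the values being added come from this issue.

```typescript
// Assumed entry shape; the real manifest.json may carry additional fields.
interface ForkFeatureEntry {
  modifiedFiles: string[];
  criticalCode: string[];
  absorptionSignals: string[];
}

// Append values while skipping ones that are already present.
function mergeUnique(existing: string[], additions: string[]): string[] {
  return existing.concat(additions).filter((value, index, all) => all.indexOf(value) === index);
}

// Return a copy of the entry with the stall-detection values added.
function addStallDetection(entry: ForkFeatureEntry): ForkFeatureEntry {
  return {
    modifiedFiles: mergeUnique(entry.modifiedFiles, [
      "packages/opencode/src/session/processor.ts",
    ]),
    criticalCode: mergeUnique(entry.criticalCode, [
      "lastTokenTime",
      "OPENCODE_STALL_TIMEOUT_MS",
      "LLM stream stalled",
    ]),
    absorptionSignals: mergeUnique(entry.absorptionSignals, [
      "stall.*detector",
      "stream.*stall",
      "lastTokenTime",
    ]),
  };
}

// List the critical strings that no longer appear in a source file, which is
// one plausible way to verify the logic survived an upstream merge.
function missingCriticalCode(entry: ForkFeatureEntry, source: string): string[] {
  return entry.criticalCode.filter((needle) => source.indexOf(needle) === -1);
}
```

An empty result from `missingCriticalCode` for processor.ts would suggest the stall detection code is still present after a sync.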
Quality Gates (Non-Negotiable)
Acceptance Criteria
- lastTokenTime tracked per session in the processor stream loop
- Configurable stall timeout (OPENCODE_STALL_TIMEOUT_MS)
- Log instrumentation: warn-level log on stall detection with session ID and elapsed time
Definition of Done