Skip to content

Release lock for interruptable client requests#237

Open
mishushakov wants to merge 13 commits intomainfrom
fix-interrupt
Open

Release lock for interruptable client requests#237
mishushakov wants to merge 13 commits intomainfrom
fix-interrupt

Conversation

@mishushakov
Copy link
Member

@mishushakov mishushakov commented Mar 24, 2026

fixes #213

@cursor
Copy link

cursor bot commented Mar 24, 2026

PR Summary

Medium Risk
Changes the execution streaming/cancellation path to send periodic keepalives and trigger a Jupyter kernel interrupt on client disconnect, which could affect concurrency and cleanup behavior under load or slow networks.

Overview
Fixes stuck kernels after client-side timeouts by adding periodic keepalive yields during streaming and interrupting the Jupyter kernel when the HTTP stream is cancelled/disconnected.

ContextWebSocket.execute now shields a REST interrupt call (/api/kernels/{id}/interrupt) on CancelledError/GeneratorExit and ensures execution state is cleaned up in a finally block. Adds regression tests in JS and Python (sync + async) that simulate a timeout and verify a subsequent execution completes, plus a patch changeset entry.

Written by Cursor Bugbot for commit dfc32d2. This will update automatically on new commits. Configure here.

@mishushakov mishushakov enabled auto-merge (squash) March 24, 2026 19:11
@mishushakov mishushakov disabled auto-merge March 25, 2026 09:02
Copy link
Member

@jakubno jakubno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a correct fix. The executions in jupyter are in serial. What you would need to do here is to interrupt the execution on client disconnect

@jakubno jakubno self-assigned this Mar 25, 2026
@ValentaTomas ValentaTomas removed their request for review March 25, 2026 16:27
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@mishushakov mishushakov requested a review from jakubno March 25, 2026 21:33
// This simulates a client disconnect: the SDK aborts the connection,
// which should trigger the server to interrupt the kernel (#213).
await expect(
sandbox.runCode('import time; time.sleep(30)', { timeoutMs: 3_000 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this longer than test timeout, so we're sure it was cancelled

Suggested change
sandbox.runCode('import time; time.sleep(30)', { timeoutMs: 3_000 })
sandbox.runCode('import time; time.sleep(300)', { timeoutMs: 3_000 })

'@e2b/code-interpreter-template': patch
---

release lock before stream completion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changeset is obsolete

# This simulates a client disconnect: the SDK closes the connection,
# which should trigger the server to interrupt the kernel (#213).
with pytest.raises(TimeoutException):
await async_sandbox.run_code("import time; time.sleep(30)", timeout=3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
await async_sandbox.run_code("import time; time.sleep(30)", timeout=3)
await async_sandbox.run_code("import time; time.sleep(300)", timeout=3)

# This simulates a client disconnect: the SDK closes the connection,
# which should trigger the server to interrupt the kernel (#213).
with pytest.raises(TimeoutException):
sandbox.run_code("import time; time.sleep(30)", timeout=3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sandbox.run_code("import time; time.sleep(30)", timeout=3)
sandbox.run_code("import time; time.sleep(300)", timeout=3)

while True:
output = await queue.get()
try:
output = await asyncio.wait_for(queue.get(), timeout=KEEPALIVE_INTERVAL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add the comment above with explanation for the keepalive interval 🙏🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

asyncio.Lock in messaging.py not released on client disconnect → cascading timeouts

2 participants