[miniflare] Fix Browser Run Windows race condition #13734
petebacondarwin wants to merge 1 commit into main
🦋 Changeset detected. Latest commit: 56293c3. The changes in this PR will be included in the next version bump.
✅ All changesets look good
I've now thoroughly reviewed all three changed files. Let me analyze the code for any actionable issues:
Everything checks out. The two-layer defense (probe + retry) is well-reasoned, the implementation is correct, and the change is appropriately scoped. LGTM
create-cloudflare
@cloudflare/kv-asset-handler
miniflare
@cloudflare/pages-shared
@cloudflare/unenv-preset
@cloudflare/vite-plugin
@cloudflare/vitest-pool-workers
@cloudflare/workers-editor-shared
@cloudflare/workers-utils
wrangler
Force-pushed from 0d5e987 to 0719495.
lgtm, but we should replace Browser Rendering with Browser Run.
Force-pushed from 0719495 to 69cda5f.
Force-pushed from 69cda5f to 3910c49.
Force-pushed from 3910c49 to 7a370f0.
"oxc.path.oxfmt": "node_modules/.bin/oxfmt",
"oxc.path.oxlint": "node_modules/.bin/oxlint",
I couldn't get format-on-save to work in VS Code because the extension couldn't find the binaries.
And this was a suggested workaround.
Not really relevant to this.
workers-devprod left a comment
Codeowners reviews satisfied
Wait for Chrome's DevTools port to actually accept connections before returning from launchBrowser, and retry transient connection failures when the binding worker reaches Chrome. Fixes flaky ConnectEx (#1225) and WSARecv (#64) errors observed when running miniflare browser rendering tests on Windows CI.
Force-pushed from 7a370f0 to 56293c3.
Fixes the flaky Tests (Windows, packages-and-tools) CI failures observed in run 25108579707/job 73576107174, where 11 of 21 tests in packages/miniflare/test/plugins/browser/index.spec.ts failed even with { retry: 3 }.

When Miniflare launches Chrome for Browser Run bindings, it returns the WebSocket endpoint as soon as Chrome prints its "DevTools listening on ws://..." banner. On Windows the underlying listening socket is occasionally not yet accepting connections at that point, so the first request from workerd to Chrome fails with "kj/async-io-win32.c++:281: failed: ConnectEx(): #1225 The remote computer refused the network connection." The error propagates up through /v1/acquire as the response body "Error: The remote computer refused…", which the user worker then fails to JSON-parse. The surface the test sees is: SyntaxError: Unexpected token 'E', "Error: The"... is not valid JSON.

This PR addresses the issue at two layers:

- launchBrowser (packages/miniflare/src/plugins/browser-rendering/index.ts). After waitForLineOutput resolves with the WS endpoint, probe Chrome's /json/version HTTP endpoint with retry/backoff (25ms → 250ms, ~5s total) until it accepts a connection. This addresses the root cause of ConnectEx (#1225).
- Binding worker (packages/miniflare/src/workers/browser-rendering/binding.worker.ts). A new fetchWithConnectRetry helper retries on substring-matched transient workerd/kj errors (connection refused, remote computer refused, network name is no longer available, network connection lost, disconnected) and is applied to setSessionInfoRoute, #proxyRawWebSocket, and #proxyJsonRequest. This catches WSARecv (#64) and similar mid-fetch socket failures that the readiness probe alone cannot prevent.

The fix is internal: there are no API or behaviour changes for users beyond the elimination of the spurious connection errors. On macOS the probe succeeds on the first attempt, adding no measurable overhead (browser test suite still completes in ~9–10s locally over 4 consecutive runs).
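For readers who want a concrete picture of the two layers, here is a minimal sketch. It is not the code from this PR: the helper names (isTransientConnectError, backoffDelays, retryWithBackoff, versionUrlFor, waitForDevTools) are hypothetical, and the doubling growth factor is an assumption, since the description only gives the 25ms and 250ms bounds and the ~5s budget. Only the substring list and the /json/version path come from the description above.

```typescript
// Illustrative sketch only; names and structure are assumptions, not the PR's code.

// Substrings listed in the PR description as markers of a transient failure.
const TRANSIENT_ERROR_SUBSTRINGS = [
  "connection refused",
  "remote computer refused",
  "network name is no longer available",
  "network connection lost",
  "disconnected",
];

// Case-insensitive substring match against the error message.
function isTransientConnectError(error: unknown): boolean {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return TRANSIENT_ERROR_SUBSTRINGS.some((needle) => message.includes(needle));
}

// Delay schedule: start at initialMs, double up to maxMs (doubling is assumed),
// and stop before the summed delays would exceed budgetMs.
function backoffDelays(initialMs = 25, maxMs = 250, budgetMs = 5000): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  let total = 0;
  while (total + delay <= budgetMs) {
    delays.push(delay);
    total += delay;
    delay = Math.min(delay * 2, maxMs);
  }
  return delays;
}

// Run `operation`, sleeping and retrying while `shouldRetry` accepts the error.
// After the schedule is exhausted, one final attempt lets the error propagate.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  delays: number[] = backoffDelays()
): Promise<T> {
  for (const delay of delays) {
    try {
      return await operation();
    } catch (error) {
      if (!shouldRetry(error)) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  return operation();
}

// Derive the DevTools HTTP endpoint from the WebSocket endpoint Chrome prints.
function versionUrlFor(wsEndpoint: string): string {
  return `http://${new URL(wsEndpoint).host}/json/version`;
}

// Layer 1: readiness probe. Any response counts as "accepting connections",
// so every failure is retried until the budget runs out.
async function waitForDevTools(wsEndpoint: string): Promise<void> {
  await retryWithBackoff(() => fetch(versionUrlFor(wsEndpoint)), () => true);
}

// Layer 2: retry only errors that look transient, e.g.
// retryWithBackoff(() => fetch(url), isTransientConnectError)
```

The probe retries on any error because a refused connection surfaces differently depending on the runtime, while the binding-side retry is deliberately narrow so that genuine failures still reach the caller quickly.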
Tests
Public documentation
Additional testing: Ran pnpm -F miniflare test test/plugins/browser/index.spec.ts four consecutive times locally on macOS; all 21 tests pass each time. The failure mode being fixed is Windows-specific and not reproducible on macOS/Linux without artificial fault injection.
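As a rough idea of what artificial fault injection could look like, the sketch below simulates the failure surface described in this PR without Chrome or Windows. Everything in it is hypothetical (makeFlakyConnect, acquireBody, and the sessionId field are illustration names, not miniflare APIs): a fake connect step fails a fixed number of times with the Windows error text, and the error text is returned as the response body, which is what makes the caller's JSON parse fail.

```typescript
// Hypothetical fault injector; not part of miniflare or this PR.

// Returns a connect function that throws for the first `failures` calls,
// then returns `result` on every later call.
function makeFlakyConnect<T>(failures: number, result: T): () => T {
  let calls = 0;
  return () => {
    calls += 1;
    if (calls <= failures) {
      throw new Error("The remote computer refused the network connection.");
    }
    return result;
  };
}

// Mimics the path described in the PR: on failure the error text itself
// becomes the response body instead of JSON.
function acquireBody(connect: () => { sessionId: string }): string {
  try {
    return JSON.stringify(connect());
  } catch (error) {
    return `Error: ${(error as Error).message}`;
  }
}

const connect = makeFlakyConnect(1, { sessionId: "example" });
const firstBody = acquireBody(connect); // error text, so JSON.parse would throw
const secondBody = acquireBody(connect); // valid JSON once the fault clears
```

Parsing firstBody with JSON.parse throws a SyntaxError, which matches the symptom the tests reported, while secondBody parses normally.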