fix: bypass X11 for scroll using CDP pixel-precise deltas#193
fix: bypass X11 for scroll using CDP pixel-precise deltas#193hiroTamada wants to merge 1 commit intomainfrom
Conversation
| for _, key := range *body.HoldKeys { | ||
| keyupArgs = append(keyupArgs, "keyup", key) | ||
| } | ||
| if _, err := defaultXdoTool.Run(ctx, keyupArgs...); err != nil { |
There was a problem hiding this comment.
Deferred keyup uses request context, may leave keys stuck
High Severity
The deferred keyup cleanup in doScroll calls defaultXdoTool.Run(ctx, keyupArgs...) using the request context. Every other cleanup path in this file (lines 174, 696, 969, 984) uses context.Background() specifically so that keys are released even when the context is cancelled. If the CDP operation times out or the request is cancelled, the modifier keys will remain stuck in the "down" state.
Additional Locations (1)
| fetch(url, { | ||
| method: 'POST', | ||
| headers: { 'Content-Type': 'application/json' }, | ||
| body: JSON.stringify({ x: sx, y: sy, delta_x: -dx, delta_y: -dy }), |
There was a problem hiding this comment.
Scroll delta negation inverts scroll_invert setting semantics
Medium Severity
The fetch body sends delta_x: -dx, delta_y: -dy, negating the accumulated deltas. CDP's Input.dispatchMouseEvent with mouseWheel uses the same sign convention as the browser's WheelEvent (positive deltaY = scroll down). This extra negation reverses the effective meaning of the scroll_invert toggle: when scroll_invert is false, scrolling is actually inverted, and when true (the default), it produces natural scrolling — the opposite of pre-existing behavior and what the setting name implies.
| } | ||
| next.ServeHTTP(w, r) | ||
| }) | ||
| }, |
There was a problem hiding this comment.
CORS wildcard exposes all API endpoints to cross-origin
Medium Severity
The CORS middleware sets Access-Control-Allow-Origin: * on every route of the API server, not just /live-view/scroll. This exposes all sensitive endpoints (click_mouse, type_text, press_key, scroll, process management, etc.) to cross-origin requests from any website. A malicious page could remotely control the browser session.
X11 scroll events are discrete button clicks (button 4/5), each producing a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling impossible through the neko → X11 path. This change bypasses X11 entirely for scroll by: 1. Adding DispatchMouseWheelEvent to the CDP client, which sends pixel-precise mouseWheel events directly to Chromium 2. Migrating the REST API doScroll handler from xdotool button clicks to CDP mouseWheel events 3. Adding a lightweight POST /live-view/scroll endpoint that accepts float pixel deltas for the live view client 4. Updating the live view client (video.vue) to POST scroll deltas directly to the kernel-images API (port 444) instead of sending through neko's data channel (which goes through X11) Made-with: Cursor
15b86ef to
60d29a0
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
There are 6 total unresolved issues (including 3 from previous reviews).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| }, 100) | ||
| const now = Date.now() | ||
| if (now - this._scrollLastSendTime < 50) { | ||
| return |
There was a problem hiding this comment.
Throttled scroll events silently lost at gesture end
Medium Severity
The scroll throttle accumulates deltas in _scrollAccX/_scrollAccY and only sends when 50ms have passed since the last send. When the user stops scrolling, any deltas accumulated during the final throttle window are never flushed — there's no trailing-edge timer to dispatch them. This silently drops the tail end of every scroll gesture, causing noticeable under-scrolling.
Additional Locations (1)
|
|
||
| log.Info("dispatching CDP mouseWheel", "x", body.X, "y", body.Y, "deltaX", deltaXPx, "deltaY", deltaYPx) | ||
| if err := client.DispatchMouseWheelEvent(cdpCtx, body.X, body.Y, deltaXPx, deltaYPx); err != nil { | ||
| return &executionError{msg: fmt.Sprintf("CDP mouseWheel failed: %s", err)} |
There was a problem hiding this comment.
Hold keys ignored in CDP scroll dispatch
High Severity
The doScroll function sends hold_keys as xdotool keydown/keyup (X11 events) but dispatches the scroll via CDP Input.dispatchMouseEvent. CDP synthetic events use their own explicit modifiers parameter (defaulting to 0) and do not inherit X11 platform keyboard state. Since modifiers is never set in the CDP call, modifier+scroll combinations like Ctrl+scroll for zoom are broken.
| http.Error(w, "cdp dial failed", http.StatusInternalServerError) | ||
| return | ||
| } | ||
| defer client.Close() |
There was a problem hiding this comment.
Per-scroll WebSocket connection causes high overhead
Medium Severity
Both HandlePixelScroll and doScroll open a new CDP WebSocket connection for every scroll event — involving a full handshake, target discovery, session attach, event dispatch, detach, and close. The client sends scroll events at up to 20Hz, meaning ~20 WebSocket lifecycles per second. This adds substantial latency and resource pressure to every scroll gesture.


Summary
X11 scroll events are discrete button clicks (button 4/5), each producing a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling impossible through the neko → X11 path.
This PR bypasses X11 entirely for scroll by sending pixel-precise
mouseWheelevents directly to Chromium via CDP.Changes
server/lib/cdpclient/cdpclient.go— AddDispatchMouseWheelEventthat sendsInput.dispatchMouseEvent(typemouseWheel) with float deltaX/deltaY to a page targetserver/cmd/api/api/computer.go— MigratedoScrollfrom xdotool button clicks to CDP mouseWheel; addHandlePixelScrollendpoint for the live view clientserver/cmd/api/main.go— RegisterPOST /live-view/scrollroute and add CORS middlewareimages/chromium-headful/client/src/components/video.vue— Replace neko data channel scroll with directfetchto the kernel-images API (port 444) using raw pixel deltasTest plan
ENABLE_WEBRTC=true— smooth trackpad scrolling workshold_keys+ scroll (e.g., Ctrl+scroll for zoom) still worksNote
Medium Risk
Changes the scroll input path end-to-end (client + server) and adds a new unauthenticated CORS-enabled endpoint, which could affect input behavior and expose a new surface if not constrained by deployment/networking.
Overview
Switches live-view scrolling from X11/
xdotool-style discrete ticks to pixel-precise scroll delivered directly to Chromium via CDP.The client now accumulates/throttles wheel deltas and
POSTs them to a newPOST /live-view/scrollendpoint, while the API’s existingScrollaction is migrated to dispatch CDPmouseWheelevents (keepinghold_keysviaxdotool). A new CORS middleware is added to the API router to allow the live-view client to call this endpoint cross-origin, andcdpclientgainsDispatchMouseWheelEventto sendInput.dispatchMouseEventwith float deltas.Written by Cursor Bugbot for commit 60d29a0. This will update automatically on new commits. Configure here.