Skip to content

fix: bypass X11 for scroll using CDP pixel-precise deltas#193

Open
hiroTamada wants to merge 1 commit intomainfrom
hiro/cdp-scroll-bypass-x11
Open

fix: bypass X11 for scroll using CDP pixel-precise deltas#193
hiroTamada wants to merge 1 commit intomainfrom
hiro/cdp-scroll-bypass-x11

Conversation

@hiroTamada
Copy link
Copy Markdown
Contributor

@hiroTamada hiroTamada commented Mar 27, 2026

Summary

X11 scroll events are discrete button clicks (button 4/5), each producing a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling impossible through the neko → X11 path.

This PR bypasses X11 entirely for scroll by sending pixel-precise mouseWheel events directly to Chromium via CDP.

Changes

  • server/lib/cdpclient/cdpclient.go — Add DispatchMouseWheelEvent that sends Input.dispatchMouseEvent (type mouseWheel) with float deltaX/deltaY to a page target
  • server/cmd/api/api/computer.go — Migrate doScroll from xdotool button clicks to CDP mouseWheel; add HandlePixelScroll endpoint for the live view client
  • server/cmd/api/main.go — Register POST /live-view/scroll route and add CORS middleware
  • images/chromium-headful/client/src/components/video.vue — Replace neko data channel scroll with direct fetch to the kernel-images API (port 444) using raw pixel deltas

Test plan

  • Tested locally in Docker with ENABLE_WEBRTC=true — smooth trackpad scrolling works
  • Deployed as unikernel to KraftCloud dev metro — confirmed smooth scroll
  • Verify scroll works in headless mode (direct API calls)
  • Verify hold_keys + scroll (e.g., Ctrl+scroll for zoom) still works

Note

Medium Risk
Changes the scroll input path end-to-end (client + server) and adds a new unauthenticated CORS-enabled endpoint, which could affect input behavior and expose a new surface if not constrained by deployment/networking.

Overview
Switches live-view scrolling from X11/xdotool-style discrete ticks to pixel-precise scroll delivered directly to Chromium via CDP.

The client now accumulates/throttles wheel deltas and POSTs them to a new POST /live-view/scroll endpoint, while the API’s existing Scroll action is migrated to dispatch CDP mouseWheel events (keeping hold_keys via xdotool). A new CORS middleware is added to the API router to allow the live-view client to call this endpoint cross-origin, and cdpclient gains DispatchMouseWheelEvent to send Input.dispatchMouseEvent with float deltas.

Written by Cursor Bugbot for commit 60d29a0. This will update automatically on new commits. Configure here.

for _, key := range *body.HoldKeys {
keyupArgs = append(keyupArgs, "keyup", key)
}
if _, err := defaultXdoTool.Run(ctx, keyupArgs...); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred keyup uses request context, may leave keys stuck

High Severity

The deferred keyup cleanup in doScroll calls defaultXdoTool.Run(ctx, keyupArgs...) using the request context. Every other cleanup path in this file (lines 174, 696, 969, 984) uses context.Background() specifically so that keys are released even when the context is cancelled. If the CDP operation times out or the request is cancelled, the modifier keys will remain stuck in the "down" state.

Additional Locations (1)
Fix in Cursor Fix in Web

fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ x: sx, y: sy, delta_x: -dx, delta_y: -dy }),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scroll delta negation inverts scroll_invert setting semantics

Medium Severity

The fetch body sends delta_x: -dx, delta_y: -dy, negating the accumulated deltas. CDP's Input.dispatchMouseEvent with mouseWheel uses the same sign convention as the browser's WheelEvent (positive deltaY = scroll down). This extra negation reverses the effective meaning of the scroll_invert toggle: when scroll_invert is false, scrolling is actually inverted, and when true (the default), it produces natural scrolling — the opposite of pre-existing behavior and what the setting name implies.

Fix in Cursor Fix in Web

}
next.ServeHTTP(w, r)
})
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CORS wildcard exposes all API endpoints to cross-origin

Medium Severity

The CORS middleware sets Access-Control-Allow-Origin: * on every route of the API server, not just /live-view/scroll. This exposes all sensitive endpoints (click_mouse, type_text, press_key, scroll, process management, etc.) to cross-origin requests from any website. A malicious page could remotely control the browser session.

Fix in Cursor Fix in Web

X11 scroll events are discrete button clicks (button 4/5), each producing
a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling
impossible through the neko → X11 path.

This change bypasses X11 entirely for scroll by:

1. Adding DispatchMouseWheelEvent to the CDP client, which sends
   pixel-precise mouseWheel events directly to Chromium
2. Migrating the REST API doScroll handler from xdotool button clicks
   to CDP mouseWheel events
3. Adding a lightweight POST /live-view/scroll endpoint that accepts
   float pixel deltas for the live view client
4. Updating the live view client (video.vue) to POST scroll deltas
   directly to the kernel-images API (port 444) instead of sending
   through neko's data channel (which goes through X11)

Made-with: Cursor
@hiroTamada hiroTamada force-pushed the hiro/cdp-scroll-bypass-x11 branch from 15b86ef to 60d29a0 Compare March 27, 2026 20:42
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

There are 6 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

}, 100)
const now = Date.now()
if (now - this._scrollLastSendTime < 50) {
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throttled scroll events silently lost at gesture end

Medium Severity

The scroll throttle accumulates deltas in _scrollAccX/_scrollAccY and only sends when 50ms have passed since the last send. When the user stops scrolling, any deltas accumulated during the final throttle window are never flushed — there's no trailing-edge timer to dispatch them. This silently drops the tail end of every scroll gesture, causing noticeable under-scrolling.

Additional Locations (1)
Fix in Cursor Fix in Web


log.Info("dispatching CDP mouseWheel", "x", body.X, "y", body.Y, "deltaX", deltaXPx, "deltaY", deltaYPx)
if err := client.DispatchMouseWheelEvent(cdpCtx, body.X, body.Y, deltaXPx, deltaYPx); err != nil {
return &executionError{msg: fmt.Sprintf("CDP mouseWheel failed: %s", err)}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold keys ignored in CDP scroll dispatch

High Severity

The doScroll function sends hold_keys as xdotool keydown/keyup (X11 events) but dispatches the scroll via CDP Input.dispatchMouseEvent. CDP synthetic events use their own explicit modifiers parameter (defaulting to 0) and do not inherit X11 platform keyboard state. Since modifiers is never set in the CDP call, modifier+scroll combinations like Ctrl+scroll for zoom are broken.

Fix in Cursor Fix in Web

http.Error(w, "cdp dial failed", http.StatusInternalServerError)
return
}
defer client.Close()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per-scroll WebSocket connection causes high overhead

Medium Severity

Both HandlePixelScroll and doScroll open a new CDP WebSocket connection for every scroll event — involving a full handshake, target discovery, session attach, event dispatch, detach, and close. The client sends scroll events at up to 20Hz, meaning ~20 WebSocket lifecycles per second. This adds substantial latency and resource pressure to every scroll gesture.

Additional Locations (1)
Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant