fix(sandbox): make peer-binary resolution netns-robust #1302

Draft
laitingsheng wants to merge 1 commit into NVIDIA:main from laitingsheng:fix/1471-peer-binary-netns-fallback

Conversation

@laitingsheng
Contributor

Summary

Peer-binary resolution scanned only /proc/<entrypoint_pid>/net/tcp{,6} to find the socket inode for a TCP peer. If entrypoint_pid was stale (process exited, PID recycled) or sat in a different netns than the one actually carrying the connection — which is what happens in nested-container setups like macOS Docker Desktop with k3s — the scan returned "No ESTABLISHED TCP connection found" and the proxy denied the request with binary=-. Add a (local_port, remote_port) filter and a netns-deduped /proc walk fallback so the inode is found wherever the connection actually lives.
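
The netns-deduped walk described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names `one_pid_per_netns` and `pids_with_netns` are hypothetical, and the sketch assumes Linux, where `stat` on `/proc/<pid>/ns/net` reports an inode number that identifies the namespace.

```rust
use std::collections::HashSet;
use std::fs;
use std::os::unix::fs::MetadataExt;

/// Hypothetical helper: keep the first PID seen for each distinct netns inode,
/// so that only one /proc/<pid>/net/tcp needs scanning per namespace.
fn one_pid_per_netns(pids: impl IntoIterator<Item = (u32, u64)>) -> Vec<u32> {
    let mut seen = HashSet::new();
    pids.into_iter()
        .filter(|&(_, netns_inode)| seen.insert(netns_inode))
        .map(|(pid, _)| pid)
        .collect()
}

/// Hypothetical helper: walk /proc and pair each numeric PID with the inode
/// of its network namespace. PIDs that exit mid-walk or that we may not
/// inspect are skipped silently.
fn pids_with_netns() -> Vec<(u32, u64)> {
    let Ok(entries) = fs::read_dir("/proc") else {
        return Vec::new();
    };
    entries
        .flatten()
        .filter_map(|entry| entry.file_name().to_str()?.parse::<u32>().ok())
        .filter_map(|pid| {
            // fs::metadata follows the ns symlink; the resulting inode
            // number is the same for every process in that namespace.
            let meta = fs::metadata(format!("/proc/{pid}/ns/net")).ok()?;
            Some((pid, meta.ino()))
        })
        .collect()
}
```

Calling `one_pid_per_netns(pids_with_netns())` would yield the list of PIDs whose `net/tcp` tables are worth scanning. Keeping the dedup step separate from the `/proc` walk makes it testable without a live process tree.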

Related Issue

Fixes NVIDIA/NemoClaw#1471.

Changes

  • crates/openshell-sandbox/src/procfs.rs: parse_proc_net_tcp now matches connections by both local_port == peer_port and rem_port == remote_port. On primary-PID miss, walk /proc, dedup by the netns inode of each PID via stat /proc/<pid>/ns/net, and scan one /proc/<pid>/net/tcp per distinct netns. Socket inodes are kernel-global, so a match found in any netns resolves to the correct FD-holding processes downstream. New helpers scan_pid_net_tcp and parse_hex_port.
  • crates/openshell-sandbox/src/proxy.rs: stop discarding client.local_addr(); thread the proxy's accepted local address through evaluate_opa_tcp → resolve_process_identity → resolve_tcp_peer_socket_owners, for both the CONNECT and the forward-proxy paths.
  • crates/openshell-sandbox/src/bypass_monitor.rs: pass the kmsg event's dst_port so bypass-attempt identity resolution uses the same filter.
  • crates/openshell-sandbox/src/procfs.rs (tests): two new tests — one asserts the fallback walks /proc and finds the connection when entrypoint_pid is a PID that doesn't exist; the other asserts the remote-port filter rejects a stale match when only the local port collides.
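
To illustrate the `(local_port, remote_port)` filter from the first bullet, here is a minimal sketch of matching a row in a `/proc/<pid>/net/tcp` table. It is not the PR's `parse_proc_net_tcp`: the names `hex_port` and `find_socket_inode` and their signatures are assumptions made for the example. It relies on the standard table layout, in which the whitespace-separated fields are the slot, local address, remote address, state, and so on, with the socket inode as the tenth field and `01` meaning ESTABLISHED.

```rust
/// Hypothetical helper: parse the hex port after the last ':' in an
/// `ADDR:PORT` field such as `0100007F:1F90`.
fn hex_port(field: &str) -> Option<u16> {
    let (_, port) = field.rsplit_once(':')?;
    u16::from_str_radix(port, 16).ok()
}

/// Hypothetical helper: return the socket inode of the first ESTABLISHED row
/// whose local port equals `peer_port` and whose remote port equals
/// `remote_port`. `table` is the full text of a /proc/<pid>/net/tcp file.
fn find_socket_inode(table: &str, peer_port: u16, remote_port: u16) -> Option<u64> {
    table.lines().skip(1).find_map(|line| {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Need at least the inode column, and only ESTABLISHED (01) rows count.
        if fields.len() < 10 || fields[3] != "01" {
            return None;
        }
        let local = hex_port(fields[1])?;
        let remote = hex_port(fields[2])?;
        if local == peer_port && remote == remote_port {
            fields[9].parse::<u64>().ok()
        } else {
            None
        }
    })
}
```

Matching on both ports is what makes it safe to scan tables from other namespaces: a row that merely shares the local port but talks to a different remote port is skipped instead of being returned as a stale match.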

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@copy-pr-bot

copy-pr-bot Bot commented May 11, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@laitingsheng laitingsheng force-pushed the fix/1471-peer-binary-netns-fallback branch from a5c9cea to 37b1707 Compare May 11, 2026 06:03
`/proc/<entrypoint_pid>/net/tcp` is the only source the proxy queried
to find the socket inode for a TCP peer. If the entrypoint PID was
stale (process died, PID recycled) or sat in a different netns than
the connection's actual netns — which is what happens in nested
container setups such as macOS Docker Desktop with k3s — the scan
returned "No ESTABLISHED TCP connection found" and the proxy denied
the request with `binary=-`.

Match the connection by `(local_port, remote_port)` so the search can
safely walk other netns. On primary-PID miss, walk `/proc` deduping by
the netns inode of each PID and scan one `/proc/<pid>/net/tcp` per
distinct netns. Socket inodes are kernel-global, so the inode found
in any netns resolves to the same FD-holding processes downstream.

Thread `client.local_addr()` (already captured but unused) through the
call chain so `evaluate_opa_tcp` and the bypass monitor can pass the
destination port the kernel sees on the sandbox side.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng force-pushed the fix/1471-peer-binary-netns-fallback branch from 37b1707 to 1753e60 Compare May 11, 2026 07:13
Development

Successfully merging this pull request may close these issues.

[Bug] OpenShell egress proxy "failed to resolve peer binary" — blocks all new CONNECT tunnels including Telegram
