feat: transparent routing through agent tunnel #1741
irvingouj@Devolutions (irvingoujAtDevolution) wants to merge 2 commits into master
Force-pushed from 8638365 to ad3d3a0
Force-pushed from 76acc25 to b70e278
Pull request overview
Adds transparent target-based routing through the QUIC agent tunnel so the gateway can automatically forward connections via an enrolled agent when the destination matches advertised subnets/domains.
Changes:
- Introduces a shared agent-tunnel routing pipeline (`resolve_route`/`try_route`) and wires it into forwarding (WS TCP/TLS), RDP clean path, and KDC proxy.
- Extends route advertisements to support IPv4+IPv6 subnets and normalized domain suffix matching (longest domain suffix wins).
- Updates RDP clean-path server connection logic to support both TCP and QUIC transports via a concrete `ServerTransport` enum (to preserve `Send`).
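The "longest domain suffix wins" rule from the overview can be illustrated with a small, std-only sketch. The function names `matches_suffix` and `longest_suffix_match` are hypothetical and are not taken from the PR; the normalization shown (lowercasing and dropping a trailing dot) is an assumption about what "normalized" means here.

```rust
/// Returns true when `host` equals `suffix` or ends with `.suffix`.
/// Both sides are normalized: lowercased, trailing dot removed (assumed rules).
fn matches_suffix(host: &str, suffix: &str) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let suffix = suffix.trim_end_matches('.').to_ascii_lowercase();
    if suffix.is_empty() {
        return false;
    }
    // Require a label boundary so "notcontoso.local" does not match "contoso.local".
    host == suffix || host.ends_with(&format!(".{suffix}"))
}

/// Picks the advertised domain with the longest matching suffix, if any.
fn longest_suffix_match<'a>(host: &str, advertised: &[&'a str]) -> Option<&'a str> {
    advertised
        .iter()
        .copied()
        .filter(|d| matches_suffix(host, d))
        .max_by_key(|d| d.trim_end_matches('.').len())
}
```

With `contoso.local` and `eu.contoso.local` both advertised, a lookup for `dc1.eu.contoso.local` selects `eu.contoso.local`, because it is the more specific suffix.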
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| devolutions-gateway/src/rdp_proxy.rs | Updates Kerberos send function signature usage for CredSSP network requests. |
| devolutions-gateway/src/rd_clean_path.rs | Splits clean-path into authorization vs connect; adds TCP/QUIC ServerTransport for server side. |
| devolutions-gateway/src/proxy.rs | Tightens transport bounds to require Send for both sides. |
| devolutions-gateway/src/generic_client.rs | Integrates agent-tunnel routing into generic TCP forwarding path. |
| devolutions-gateway/src/api/rdp.rs | Plumbs agent_tunnel_handle into the RDP handler path. |
| devolutions-gateway/src/api/kdc_proxy.rs | Adds optional agent-tunnel routing to KDC proxy send path and generalizes reply reading. |
| devolutions-gateway/src/api/fwd.rs | Plumbs agent_tunnel_handle into WS forwarder and routes via tunnel when matched. |
| devolutions-gateway/src/agent_tunnel/routing.rs | New shared routing pipeline + unit tests. |
| devolutions-gateway/src/agent_tunnel/registry.rs | Adds target matching helpers and agent lookup by subnet/domain specificity; moves to IpNetwork. |
| devolutions-gateway/src/agent_tunnel/mod.rs | Exposes new routing module. |
| devolutions-agent/src/tunnel_helpers.rs | Extends tunnel target parsing/resolution to support IPv6 and IpNetwork. |
| devolutions-agent/src/tunnel.rs | Switches advertised subnets to IpNetwork and domains to normalized DomainName. |
| crates/agent-tunnel-proto/src/stream.rs | Refactors framing helpers placement and control stream split types. |
| crates/agent-tunnel-proto/src/lib.rs | Re-exports DomainName. |
| crates/agent-tunnel-proto/src/control.rs | Introduces DomainName and changes subnet advertisement type to IpNetwork (IPv4+IPv6). |

        target_addr: &str,
    ) -> Result<Option<(TunnelStream, Arc<AgentPeer>)>> {
        let Some(handle) = handle else {
            return Ok(None);

try_route returns Ok(None) when handle is None, even if explicit_agent_id is set. That silently falls back to direct connect for tokens that explicitly require routing via a specific agent (jet_agent_id), which contradicts the claim semantics and can bypass intended network boundaries. Consider returning an error when explicit_agent_id.is_some() but the tunnel handle is not configured.
Suggested change:

    -        return Ok(None);
    +        return match explicit_agent_id {
    +            Some(id) => Err(anyhow!(
    +                "agent {id} specified in token requires agent tunnel routing, but no tunnel handle is configured"
    +            )),
    +            None => Ok(None),
    +        };


    // Route via agent tunnel: explicit agent_id → subnet → domain → direct
    let first_target = targets.first();
    let target_str = format!("{}:{}", first_target.host(), first_target.port());

    if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
        agent_tunnel_handle.as_deref(),
        claims.jet_agent_id,
        first_target.host(),
        claims.jet_aid,
        &target_str,
    )

This routing path only considers targets.first() and ignores alternate targets even though ConnectionMode::Fwd targets are defined as “should be tried in order”. It also builds target_str as host:port, which will be malformed for IPv6 (missing brackets) and can break agent-side parsing. Consider iterating over targets in order and using candidate.as_addr() for the tunnel target string.
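To make the IPv6 concern concrete, here is a minimal sketch of bracket-aware target formatting using only the standard library. `format_target` is a hypothetical helper written for illustration; the `as_addr()` method the reviewer recommends is not shown in this conversation, so this is not a description of its implementation.

```rust
use std::net::IpAddr;

/// Formats a tunnel target as `host:port`, wrapping IPv6 literals in brackets.
/// Host names and IPv4 literals are passed through unchanged.
fn format_target(host: &str, port: u16) -> String {
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V6(v6)) => format!("[{v6}]:{port}"),
        _ => format!("{host}:{port}"),
    }
}
```

A plain `format!("{}:{}", host, port)` would turn `fd00::1` and port 3389 into `fd00::1:3389`, which a parser cannot split unambiguously into address and port.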

    let first_target = targets.first();
    let target_str = format!("{}:{}", first_target.host(), first_target.port());

    if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
        agent_tunnel_handle.as_deref(),
        claims.jet_agent_id,
        first_target.host(),
        claims.jet_aid,
        &target_str,
    )
    .await
    .map_err(ForwardError::BadGateway)?
    {
        let selected_target = first_target.clone();
        span.record("target", selected_target.to_string());

        let info = SessionInfo::builder()
            .id(claims.jet_aid)
            .application_protocol(claims.jet_ap)
            .details(ConnectionModeDetails::Fwd {
                destination_host: selected_target,
            })
            .time_to_live(claims.jet_ttl)
            .recording_policy(claims.jet_rec)
            .filtering_policy(claims.jet_flt)
            .build();

        let server_addr: SocketAddr = "0.0.0.0:0".parse().expect("valid placeholder");

        return Proxy::builder()
            .conf(conf)
            .session_info(info)
            .address_a(client_addr)
            .transport_a(client_stream)
            .address_b(server_addr)
            .transport_b(server_stream)
            .sessions(sessions)
            .subscriber_tx(subscriber_tx)
            .disconnect_interest(DisconnectInterest::from_reconnection_policy(claims.jet_reuse))
            .build()
            .select_dissector_and_forward()
            .await
            .context("encountered a failure during agent tunnel traffic proxying")
            .map_err(ForwardError::Internal);

This routing path only uses targets.first() and ignores alternate targets even though forwarding targets are meant to be tried in order. Also, format!("{}:{}", host, port) will be invalid for IPv6 targets (missing []), causing tunnel connect parsing to fail. Consider iterating over targets and using candidate.as_addr() when calling try_route/connect_via_agent.
Suggested change:

    -    let first_target = targets.first();
    -    let target_str = format!("{}:{}", first_target.host(), first_target.port());
    -    if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
    -        agent_tunnel_handle.as_deref(),
    -        claims.jet_agent_id,
    -        first_target.host(),
    -        claims.jet_aid,
    -        &target_str,
    -    )
    -    .await
    -    .map_err(ForwardError::BadGateway)?
    -    {
    -        let selected_target = first_target.clone();
    -        span.record("target", selected_target.to_string());
    -        let info = SessionInfo::builder()
    -            .id(claims.jet_aid)
    -            .application_protocol(claims.jet_ap)
    -            .details(ConnectionModeDetails::Fwd {
    -                destination_host: selected_target,
    -            })
    -            .time_to_live(claims.jet_ttl)
    -            .recording_policy(claims.jet_rec)
    -            .filtering_policy(claims.jet_flt)
    -            .build();
    -        let server_addr: SocketAddr = "0.0.0.0:0".parse().expect("valid placeholder");
    -        return Proxy::builder()
    -            .conf(conf)
    -            .session_info(info)
    -            .address_a(client_addr)
    -            .transport_a(client_stream)
    -            .address_b(server_addr)
    -            .transport_b(server_stream)
    -            .sessions(sessions)
    -            .subscriber_tx(subscriber_tx)
    -            .disconnect_interest(DisconnectInterest::from_reconnection_policy(claims.jet_reuse))
    -            .build()
    -            .select_dissector_and_forward()
    -            .await
    -            .context("encountered a failure during agent tunnel traffic proxying")
    -            .map_err(ForwardError::Internal);
    +    for candidate in &targets {
    +        let target_addr = candidate.as_addr();
    +        if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
    +            agent_tunnel_handle.as_deref(),
    +            claims.jet_agent_id,
    +            candidate.host(),
    +            claims.jet_aid,
    +            &target_addr,
    +        )
    +        .await
    +        .map_err(ForwardError::BadGateway)?
    +        {
    +            let selected_target = candidate.clone();
    +            span.record("target", selected_target.to_string());
    +            let info = SessionInfo::builder()
    +                .id(claims.jet_aid)
    +                .application_protocol(claims.jet_ap)
    +                .details(ConnectionModeDetails::Fwd {
    +                    destination_host: selected_target,
    +                })
    +                .time_to_live(claims.jet_ttl)
    +                .recording_policy(claims.jet_rec)
    +                .filtering_policy(claims.jet_flt)
    +                .build();
    +            let server_addr: SocketAddr = "0.0.0.0:0".parse().expect("valid placeholder");
    +            return Proxy::builder()
    +                .conf(conf)
    +                .session_info(info)
    +                .address_a(client_addr)
    +                .transport_a(client_stream)
    +                .address_b(server_addr)
    +                .transport_b(server_stream)
    +                .sessions(sessions)
    +                .subscriber_tx(subscriber_tx)
    +                .disconnect_interest(DisconnectInterest::from_reconnection_policy(claims.jet_reuse))
    +                .build()
    +                .select_dissector_and_forward()
    +                .await
    +                .context("encountered a failure during agent tunnel traffic proxying")
    +                .map_err(ForwardError::Internal);
    +        }


     async fn send_network_request(request: &NetworkRequest) -> anyhow::Result<Vec<u8>> {
         let target_addr = TargetAddr::parse(request.url.as_str(), Some(88))?;

    -    send_krb_message(&target_addr, &request.data)
    +    send_krb_message(&target_addr, &request.data, None)
             .await
             .map_err(|err| anyhow::Error::msg("failed to send KDC message").context(err))

send_network_request always calls send_krb_message(..., None), which means CredSSP’s Kerberos network requests in the RDP credential-injection path can never use the agent-tunnel routing pipeline (even when the gateway is configured with an agent tunnel and the target/KDC are only reachable via an agent). If transparent routing is intended for this path too, consider plumbing agent_tunnel_handle into RdpProxy and passing it through here.

    // Route through agent tunnel if available, otherwise connect directly.
    let first_target = targets.first();
    let target_str = format!("{}:{}", first_target.host(), first_target.port());

    let (mut server_stream, server_addr, selected_target): (ServerTransport, SocketAddr, &TargetAddr) =
        match crate::agent_tunnel::routing::try_route(
            agent_tunnel_handle.map(AsRef::as_ref),
            claims.jet_agent_id,
            first_target.host(),
            claims.jet_aid,
            &target_str,
        )

connect_rdp_server routes based only on targets.first() and builds the tunnel target_str as host:port, which is incorrect for IPv6 (missing brackets) and ignores alternate targets (the token’s targets are intended to be tried in order). Consider iterating over all targets and using candidate.as_addr() for the tunnel target string, so both routing and failover behave consistently.

    registry.register(Arc::clone(&peer_a));

    std::thread::sleep(std::time::Duration::from_millis(10));

    let peer_b = make_peer("agent-b");
    let id_b = peer_b.agent_id;
    let subnet_b: ipnetwork::IpNetwork = "10.2.0.0/16".parse().expect("valid test subnet");
    peer_b.update_routes(1, vec![subnet_b], vec![domain("contoso.local")]);

This test relies on thread::sleep(Duration::from_millis(10)) to create a deterministic ordering by SystemTime. On some platforms (notably Windows) timer/time resolution can be >=10–16ms, making this flaky. Prefer a deterministic ordering mechanism for the test (or use a larger margin / inject the timestamp) so the assertion doesn’t depend on wall-clock granularity.
Suggested change:

    -    registry.register(Arc::clone(&peer_a));
    -    std::thread::sleep(std::time::Duration::from_millis(10));
    -    let peer_b = make_peer("agent-b");
    -    let id_b = peer_b.agent_id;
    -    let subnet_b: ipnetwork::IpNetwork = "10.2.0.0/16".parse().expect("valid test subnet");
    -    peer_b.update_routes(1, vec![subnet_b], vec![domain("contoso.local")]);
    +    peer_a.last_seen.store(1, Ordering::Release);
    +    registry.register(Arc::clone(&peer_a));
    +    let peer_b = make_peer("agent-b");
    +    let id_b = peer_b.agent_id;
    +    let subnet_b: ipnetwork::IpNetwork = "10.2.0.0/16".parse().expect("valid test subnet");
    +    peer_b.update_routes(1, vec![subnet_b], vec![domain("contoso.local")]);
    +    peer_b.last_seen.store(2, Ordering::Release);


    pub struct RouteAdvertisementState {
        /// Monotonically increasing epoch within an agent process lifetime.
        pub epoch: u64,
        /// IPv4 subnets this agent can reach.

The field comment says “IPv4 subnets”, but the type is now Vec<IpNetwork> and the protocol advertises IPv4 and IPv6. Update the comment to avoid misleading documentation for route matching behavior.
Suggested change:

    -    /// IPv4 subnets this agent can reach.
    +    /// IP subnets this agent can reach (IPv4 or IPv6).


         protocol_version: u16,
         /// Monotonically increasing epoch within this agent process lifetime.
         epoch: u64,
    -    /// Reachable IPv4 subnets.
    -    subnets: Vec<Ipv4Network>,
    +    /// Reachable subnets (IPv4 and IPv6).
    +    subnets: Vec<IpNetwork>,
         /// DNS domains this agent can resolve, with source tracking.
         domains: Vec<DomainAdvertisement>,

RouteAdvertise.subnets now supports IPv6 (IpNetwork). Peers built against the previous IPv4-only format won’t be able to deserialize/interpret IPv6 subnets, even though the protocol_version is still 1. If mixed agent/gateway versions are possible, consider bumping the protocol version/ALPN or gating IPv6 advertisements behind a version negotiation to avoid silent incompatibility.
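One way to read the "gate IPv6 advertisements behind version negotiation" suggestion is sketched below. Everything here is hypothetical: the `Subnet` enum is a stand-in for `ipnetwork::IpNetwork` so the example needs no external crate, and `IPV6_MIN_VERSION = 2` is an assumed value, since the PR keeps `protocol_version` at 1 and defines no such constant.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Stand-in for `ipnetwork::IpNetwork`: an address plus a prefix length.
#[derive(Debug, Clone, PartialEq)]
enum Subnet {
    V4(Ipv4Addr, u8),
    V6(Ipv6Addr, u8),
}

/// Assumed first protocol version whose peers can decode IPv6 subnets.
const IPV6_MIN_VERSION: u16 = 2;

/// Returns the subnets that are safe to advertise to a peer that negotiated
/// `negotiated_version`: IPv6 entries are dropped for older peers.
fn subnets_for_peer(all: &[Subnet], negotiated_version: u16) -> Vec<Subnet> {
    all.iter()
        .filter(|s| negotiated_version >= IPV6_MIN_VERSION || matches!(s, Subnet::V4(..)))
        .cloned()
        .collect()
}
```

Filtering on the sender side keeps older peers working with the IPv4 routes they understand, at the cost of those peers never learning the IPv6 routes.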

    let kdc_target = kdc_addr.to_string();

    if let Some((mut stream, _agent)) = crate::agent_tunnel::routing::try_route(
        agent_tunnel_handle,
        None,
        kdc_addr.host(),
        uuid::Uuid::new_v4(),
        &kdc_target,
    )

kdc_target is built with kdc_addr.to_string(), but TargetAddr string form includes the scheme prefix (e.g. tcp://host:88). The agent tunnel target parser expects host:port/[v6]:port, so this will make tunneled KDC routing fail (and can prevent the intended direct fallback). Use kdc_addr.as_addr() (or format using host/port with IPv6 brackets) for the target_addr passed to try_route/connect_via_agent.
Force-pushed from b70e278 to 80aed20
Force-pushed from 80aed20 to f323f30
Builds on #1738 (core infrastructure). Follow-up PRs will add the Windows/Linux installer integration, gateway webapp agent management UI, Docker deployment, and Playwright E2E harness.

Transparent routing:
- `crates/agent-tunnel/src/routing.rs`: `RoutingDecision` pipeline: explicit `jet_agent_id` from the JWT → subnet match → domain suffix match (longest wins) → direct connect. Single `try_route` entry point consumed by all gateway proxy paths.
- `crates/agent-tunnel/src/registry.rs`: `find_agents_for(host)` + `RouteAdvertisementState::matches_target()` do the lookup in one spot; offline agents are skipped.
- Gateway proxy integration: `api/fwd.rs`, `api/kdc_proxy.rs`, `api/rdp.rs`, `rd_clean_path.rs`, `generic_client.rs`, `rdp_proxy.rs` all call `try_route` before falling through to direct TCP.
- Tests: `agent-tunnel/src/integration_test.rs` (2 full-stack QUIC E2E), `tests/agent_tunnel_registry.rs` (13), `tests/agent_tunnel_routing.rs` (8).

Agent-side certificate renewal:
- `enrollment.rs`: `is_cert_expiring(cert_path, threshold_days)` and `generate_csr_from_existing_key(key_path, agent_name)`. The key never changes across renewals; the gateway just signs a new cert with the same public key.
- `tunnel.rs`: on connect, if the cert is within 15 days of expiry, the agent sends a `CertRenewalRequest` control message with a new CSR, waits for `CertRenewalResponse::Success`, writes the renewed cert and CA, and reconnects.
- `agent-tunnel/src/listener.rs`: gateway-side handler signs the CSR via `CaManager::sign_agent_csr` and returns the new cert chain. (Stub replaced: master's handler emitted a debug log and dropped the message.)

QUIC endpoint override:
- `enrollment.rs`: new `quic_endpoint_override: Option<String>` parameter on `enroll_agent`; if set, it overrides the endpoint returned by the enroll API. Needed because the gateway's `quic_endpoint` is derived from `conf.hostname`, which in a containerized deployment is often the container ID (not routable from outside).
- `main.rs`: new `--quic-endpoint` CLI flag and `jet_quic_endpoint` JWT claim; precedence is CLI flag > JWT claim > enroll API response.

Agent-side routing primitives:
- `tunnel_helpers.rs`: `Target::Ip` / `Target::Domain` enum parsed from the gateway's `ConnectRequest::target`, `resolve_target` (domain → DNS), `connect_to_target` (happy-eyeballs).

Tests: 22 agent-tunnel lib + 3 proto version + 24 proto control + 11 proto session + 13 registry + 8 routing integration + 64 gateway lib, all green. Zero clippy warnings; nightly fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
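The endpoint precedence stated in the commit message (CLI flag > JWT claim > enroll API response) reduces to a chain of `Option` fallbacks. The sketch below uses a hypothetical function name and plain `String` values; it is not the code in `main.rs` or `enrollment.rs`.

```rust
/// Picks the QUIC endpoint the agent should dial.
/// Precedence: CLI flag, then JWT claim, then the value from the enroll API.
fn pick_quic_endpoint(
    cli_flag: Option<String>,
    jwt_claim: Option<String>,
    api_response: String,
) -> String {
    cli_flag.or(jwt_claim).unwrap_or(api_response)
}
```

In a containerized deployment where the enroll API reports a container ID as the host, passing a routable address through the flag or the claim overrides it without changing the gateway configuration.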
Force-pushed from f323f30 to 3c49f7f
`run_single_connection` previously returned `Ok(())` on both graceful
shutdown and successful cert renewal. The outer reconnect loop treated
`Ok(())` as "task done forever", so after a renewal the agent exited
and never reconnected with the new cert.
Split the return with `ConnectionOutcome::{Shutdown, CertRenewed}`;
renewal now reconnects immediately (bypassing backoff), shutdown still
exits the task. Also wrap the `CertRenewalResponse` recv in a 30s
timeout so a stalled gateway cannot hang the agent indefinitely.
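The control flow described in this commit message can be sketched with the standard library alone. This is a synchronous simplification under stated assumptions: the agent's real loop is not shown here, the `String` error type and the `backoff` parameter are placeholders, and `recv_with_timeout` uses a std channel to stand in for the timeout around the `CertRenewalResponse` receive.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Why a single connection ended without an error.
#[derive(Debug, PartialEq)]
enum ConnectionOutcome {
    Shutdown,
    CertRenewed,
}

/// Outer reconnect loop: stop on shutdown, reconnect immediately after a
/// renewal, and sleep for `backoff` after an error.
/// Returns how many connection attempts were made.
fn reconnect_loop(
    mut run_single_connection: impl FnMut() -> Result<ConnectionOutcome, String>,
    backoff: Duration,
) -> usize {
    let mut attempts = 0;
    loop {
        attempts += 1;
        match run_single_connection() {
            Ok(ConnectionOutcome::Shutdown) => return attempts,
            // A renewed cert means the next attempt should start right away.
            Ok(ConnectionOutcome::CertRenewed) => continue,
            Err(_) => std::thread::sleep(backoff),
        }
    }
}

/// Waits for a response but gives up after `timeout`, so a stalled peer
/// cannot block the caller forever.
fn recv_with_timeout<T>(rx: &mpsc::Receiver<T>, timeout: Duration) -> Result<T, RecvTimeoutError> {
    rx.recv_timeout(timeout)
}
```

Distinguishing the two success cases in the type is what lets the outer loop treat them differently; with a bare `Ok(())` the loop has no way to tell a renewal from a shutdown.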
Summary
Transparent routing through QUIC agent tunnel (PR 2 of 4, stacked on #1738).
When a connection target matches an agent's advertised subnets or domains, the gateway automatically routes through the QUIC tunnel instead of connecting directly.
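The matching order (explicit agent, then subnet, then longest domain suffix, then direct) can be sketched as follows. This is a simplified illustration, not the PR's `try_route`: it handles IPv4 only, identifies agents by a string name rather than a UUID, ignores whether an agent is online, and the `Route` and `Agent` types are hypothetical.

```rust
use std::net::Ipv4Addr;

/// Result of the lookup; variant names are illustrative.
#[derive(Debug, PartialEq)]
enum Route<'a> {
    ViaAgent(&'a str),
    Direct,
}

/// One agent's advertised routes (IPv4-only to keep the sketch short).
struct Agent<'a> {
    name: &'a str,
    subnets: Vec<(Ipv4Addr, u8)>,
    domains: Vec<&'a str>,
}

/// True when `ip` falls inside `network/prefix`.
fn subnet_contains(network: Ipv4Addr, prefix: u8, ip: Ipv4Addr) -> bool {
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - u32::from(prefix.min(32))) };
    u32::from(network) & mask == u32::from(ip) & mask
}

/// Applies the order: explicit agent → subnet → longest domain suffix → direct.
fn resolve<'a>(explicit: Option<&'a str>, host: &str, agents: &'a [Agent<'a>]) -> Route<'a> {
    if let Some(name) = explicit {
        return Route::ViaAgent(name);
    }
    if let Ok(ip) = host.parse::<Ipv4Addr>() {
        let hit = agents
            .iter()
            .find(|a| a.subnets.iter().any(|&(n, p)| subnet_contains(n, p, ip)));
        return hit.map_or(Route::Direct, |a| Route::ViaAgent(a.name));
    }
    let host = host.to_ascii_lowercase();
    agents
        .iter()
        .flat_map(|a| a.domains.iter().map(move |d| (a.name, *d)))
        .filter(|(_, d)| host == *d || host.ends_with(&format!(".{d}")))
        .max_by_key(|(_, d)| d.len())
        .map_or(Route::Direct, |(name, _)| Route::ViaAgent(name))
}
```

A target that matches nothing resolves to `Route::Direct`, which corresponds to the gateway connecting on its own as it did before this PR.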
Depends on: #1738 (must merge first)
Changes
- `ServerTransport` enum (`Tcp`/`Quic`) in `rd_clean_path.rs` for RDP tunnel support
PR stack
🤖 Generated with Claude Code