feat: transparent routing through agent tunnel #1741

Draft
irvingouj@Devolutions (irvingoujAtDevolution) wants to merge 2 commits into master from feat/quic-tunnel-2-routing

Conversation

@irvingoujAtDevolution
Contributor

Summary

Transparent routing through QUIC agent tunnel (PR 2 of 4, stacked on #1738).

When a connection target matches an agent's advertised subnets or domains, the gateway automatically routes through the QUIC tunnel instead of connecting directly.

Depends on: #1738 (must merge first)

Changes

  • Routing pipeline: explicit agent_id → subnet match → domain suffix (longest wins) → direct
  • Integrated into all proxy paths: RDP (clean path), SSH, VNC, ARD, KDC proxy
  • ServerTransport enum (Tcp/Quic) in rd_clean_path.rs for RDP tunnel support
  • 7 routing unit tests
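
The routing order listed above can be sketched as follows. This is a minimal illustration, not the PR's code: `AgentRoutes`, `Route`, `resolve_route`, and `sample_agents` are hypothetical names, subnets are limited to IPv4 `(address, prefix)` pairs to keep the example dependency-free, and an explicit agent ID is assumed to be trusted without a registry lookup.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical, simplified view of one agent's advertised routes.
struct AgentRoutes {
    id: u32,
    /// IPv4 subnets as (network address, prefix length).
    subnets: Vec<(Ipv4Addr, u8)>,
    /// Lowercase domain suffixes such as "contoso.local".
    domains: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Route {
    ViaAgent(u32),
    Direct,
}

/// True when `ip` falls inside `net/prefix`.
fn subnet_contains(net: Ipv4Addr, prefix: u8, ip: Ipv4Addr) -> bool {
    let prefix = u32::from(prefix.min(32));
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    (u32::from(net) & mask) == (u32::from(ip) & mask)
}

/// True when `host` equals `suffix` or ends with `.suffix`.
fn domain_matches(suffix: &str, host: &str) -> bool {
    host == suffix || host.ends_with(&format!(".{suffix}"))
}

/// explicit agent ID -> subnet match -> longest domain suffix -> direct.
fn resolve_route(agents: &[AgentRoutes], explicit_agent_id: Option<u32>, host: &str) -> Route {
    if let Some(id) = explicit_agent_id {
        return Route::ViaAgent(id);
    }
    if let Ok(IpAddr::V4(ip)) = host.parse::<IpAddr>() {
        let hit = agents
            .iter()
            .find(|a| a.subnets.iter().any(|&(net, prefix)| subnet_contains(net, prefix, ip)));
        if let Some(agent) = hit {
            return Route::ViaAgent(agent.id);
        }
    }
    let host = host.to_ascii_lowercase();
    agents
        .iter()
        .flat_map(|a| a.domains.iter().map(move |d| (a.id, d.as_str())))
        .filter(|&(_, d)| domain_matches(d, &host))
        .max_by_key(|&(_, d)| d.len())
        .map_or(Route::Direct, |(id, _)| Route::ViaAgent(id))
}

/// Two illustrative agents: one with a subnet and a broad domain, one with a narrower domain.
fn sample_agents() -> Vec<AgentRoutes> {
    vec![
        AgentRoutes {
            id: 1,
            subnets: vec![(Ipv4Addr::new(10, 1, 0, 0), 16)],
            domains: vec!["contoso.local".to_owned()],
        },
        AgentRoutes {
            id: 2,
            subnets: vec![],
            domains: vec!["corp.contoso.local".to_owned()],
        },
    ]
}

fn main() {
    let agents = sample_agents();
    for host in ["10.1.2.3", "db.corp.contoso.local", "example.com"] {
        println!("{host} -> {:?}", resolve_route(&agents, None, host));
    }
}
```

Matching on a leading dot keeps `notcontoso.local` from matching the `contoso.local` suffix.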

PR stack

  1. Protocol + Tunnel Core (feat: initial implementation of QUIC agent tunnel #1738)
  2. Transparent Routing (this PR)
  3. Auth + Webapp
  4. Deployment + Installer

🤖 Generated with Claude Code

@irvingoujAtDevolution
Contributor Author

⚠️ Not ready to merge — depends on #1738. Will rebase and mark ready once #1738 is merged.

Contributor

Copilot AI left a comment

Pull request overview

Adds transparent target-based routing through the QUIC agent tunnel so the gateway can automatically forward connections via an enrolled agent when the destination matches advertised subnets/domains.

Changes:

  • Introduces a shared agent-tunnel routing pipeline (resolve_route/try_route) and wires it into forwarding (WS TCP/TLS), RDP clean path, and KDC proxy.
  • Extends route advertisements to support IPv4+IPv6 subnets and normalized domain suffix matching (longest domain suffix wins).
  • Updates RDP clean-path server connection logic to support both TCP and QUIC transports via a concrete ServerTransport enum (to preserve Send).
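
As a rough illustration of the concrete-enum approach in the last bullet: an enum over two concrete stream types is `Send` whenever both variants are, with no trait object involved. The sketch below uses blocking `std::io` traits and in-memory stand-ins (`TcpLike`, `QuicLike`, both hypothetical); the gateway's real `ServerTransport` presumably wraps async stream types instead.

```rust
use std::io::{self, Cursor, Read, Write};

/// Stand-ins for the real TCP and QUIC stream types (assumptions for this sketch).
struct TcpLike(Cursor<Vec<u8>>);
struct QuicLike(Cursor<Vec<u8>>);

/// One concrete type covering both server-side transports.
enum ServerTransport {
    Tcp(TcpLike),
    Quic(QuicLike),
}

impl Read for ServerTransport {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            ServerTransport::Tcp(s) => s.0.read(buf),
            ServerTransport::Quic(s) => s.0.read(buf),
        }
    }
}

impl Write for ServerTransport {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            ServerTransport::Tcp(s) => s.0.write(buf),
            ServerTransport::Quic(s) => s.0.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            ServerTransport::Tcp(s) => s.0.flush(),
            ServerTransport::Quic(s) => s.0.flush(),
        }
    }
}

/// Compiles only if `T` is `Send`.
fn assert_send<T: Send>() {}

fn main() {
    assert_send::<ServerTransport>();

    let mut transport = ServerTransport::Quic(QuicLike(Cursor::new(b"hello".to_vec())));
    let mut received = String::new();
    transport
        .read_to_string(&mut received)
        .expect("in-memory read cannot fail");
    println!("{received}");
}
```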

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| devolutions-gateway/src/rdp_proxy.rs | Updates Kerberos send function signature usage for CredSSP network requests. |
| devolutions-gateway/src/rd_clean_path.rs | Splits clean-path into authorization vs connect; adds TCP/QUIC ServerTransport for server side. |
| devolutions-gateway/src/proxy.rs | Tightens transport bounds to require Send for both sides. |
| devolutions-gateway/src/generic_client.rs | Integrates agent-tunnel routing into generic TCP forwarding path. |
| devolutions-gateway/src/api/rdp.rs | Plumbs agent_tunnel_handle into the RDP handler path. |
| devolutions-gateway/src/api/kdc_proxy.rs | Adds optional agent-tunnel routing to KDC proxy send path and generalizes reply reading. |
| devolutions-gateway/src/api/fwd.rs | Plumbs agent_tunnel_handle into WS forwarder and routes via tunnel when matched. |
| devolutions-gateway/src/agent_tunnel/routing.rs | New shared routing pipeline + unit tests. |
| devolutions-gateway/src/agent_tunnel/registry.rs | Adds target matching helpers and agent lookup by subnet/domain specificity; moves to IpNetwork. |
| devolutions-gateway/src/agent_tunnel/mod.rs | Exposes new routing module. |
| devolutions-agent/src/tunnel_helpers.rs | Extends tunnel target parsing/resolution to support IPv6 and IpNetwork. |
| devolutions-agent/src/tunnel.rs | Switches advertised subnets to IpNetwork and domains to normalized DomainName. |
| crates/agent-tunnel-proto/src/stream.rs | Refactors framing helpers placement and control stream split types. |
| crates/agent-tunnel-proto/src/lib.rs | Re-exports DomainName. |
| crates/agent-tunnel-proto/src/control.rs | Introduces DomainName and changes subnet advertisement type to IpNetwork (IPv4+IPv6). |

target_addr: &str,
) -> Result<Option<(TunnelStream, Arc<AgentPeer>)>> {
let Some(handle) = handle else {
return Ok(None);

Copilot AI Apr 14, 2026

try_route returns Ok(None) when handle is None, even if explicit_agent_id is set. That silently falls back to direct connect for tokens that explicitly require routing via a specific agent (jet_agent_id), which contradicts the claim semantics and can bypass intended network boundaries. Consider returning an error when explicit_agent_id.is_some() but the tunnel handle is not configured.

Suggested change
- return Ok(None);
+ return match explicit_agent_id {
+     Some(id) => Err(anyhow!(
+         "agent {id} specified in token requires agent tunnel routing, but no tunnel handle is configured"
+     )),
+     None => Ok(None),
+ };

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +126
// Route via agent tunnel: explicit agent_id → subnet → domain → direct
let first_target = targets.first();
let target_str = format!("{}:{}", first_target.host(), first_target.port());

if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
agent_tunnel_handle.as_deref(),
claims.jet_agent_id,
first_target.host(),
claims.jet_aid,
&target_str,
)

Copilot AI Apr 14, 2026

This routing path only considers targets.first() and ignores alternate targets even though ConnectionMode::Fwd targets are defined as “should be tried in order”. It also builds target_str as host:port, which will be malformed for IPv6 (missing brackets) and can break agent-side parsing. Consider iterating over targets in order and using candidate.as_addr() for the tunnel target string.
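
To illustrate the bracket issue, here is a small sketch of a formatter that wraps IPv6 literals before appending the port. `format_target_addr` is a hypothetical helper written for this example; the comment's own recommendation is to call the existing `as_addr()` instead.

```rust
use std::net::IpAddr;

/// Hypothetical helper: builds `host:port`, bracketing IPv6 literals.
fn format_target_addr(host: &str, port: u16) -> String {
    match host.parse::<IpAddr>() {
        // IPv6 literals must be wrapped so the port separator is unambiguous.
        Ok(IpAddr::V6(v6)) => format!("[{v6}]:{port}"),
        // IPv4 literals and DNS names are left untouched.
        _ => format!("{host}:{port}"),
    }
}

fn main() {
    for (host, port) in [("2001:db8::1", 3389), ("10.0.0.5", 22), ("srv.contoso.local", 5900)] {
        println!("{}", format_target_addr(host, port));
    }
}
```

The naive `format!("{}:{}", host, port)` yields `::1:3389` for the loopback address, which the standard `SocketAddr` parser rejects.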

Comment on lines +237 to +280
let first_target = targets.first();
let target_str = format!("{}:{}", first_target.host(), first_target.port());

if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
agent_tunnel_handle.as_deref(),
claims.jet_agent_id,
first_target.host(),
claims.jet_aid,
&target_str,
)
.await
.map_err(ForwardError::BadGateway)?
{
let selected_target = first_target.clone();
span.record("target", selected_target.to_string());

let info = SessionInfo::builder()
.id(claims.jet_aid)
.application_protocol(claims.jet_ap)
.details(ConnectionModeDetails::Fwd {
destination_host: selected_target,
})
.time_to_live(claims.jet_ttl)
.recording_policy(claims.jet_rec)
.filtering_policy(claims.jet_flt)
.build();

let server_addr: SocketAddr = "0.0.0.0:0".parse().expect("valid placeholder");

return Proxy::builder()
.conf(conf)
.session_info(info)
.address_a(client_addr)
.transport_a(client_stream)
.address_b(server_addr)
.transport_b(server_stream)
.sessions(sessions)
.subscriber_tx(subscriber_tx)
.disconnect_interest(DisconnectInterest::from_reconnection_policy(claims.jet_reuse))
.build()
.select_dissector_and_forward()
.await
.context("encountered a failure during agent tunnel traffic proxying")
.map_err(ForwardError::Internal);

Copilot AI Apr 14, 2026

This routing path only uses targets.first() and ignores alternate targets even though forwarding targets are meant to be tried in order. Also, format!("{}:{}", host, port) will be invalid for IPv6 targets (missing []), causing tunnel connect parsing to fail. Consider iterating over targets and using candidate.as_addr() when calling try_route/connect_via_agent.

Suggested change
- let first_target = targets.first();
- let target_str = format!("{}:{}", first_target.host(), first_target.port());
- if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
-     agent_tunnel_handle.as_deref(),
-     claims.jet_agent_id,
-     first_target.host(),
-     claims.jet_aid,
-     &target_str,
- )
- .await
- .map_err(ForwardError::BadGateway)?
- {
-     let selected_target = first_target.clone();
-     span.record("target", selected_target.to_string());
-     let info = SessionInfo::builder()
-         .id(claims.jet_aid)
-         .application_protocol(claims.jet_ap)
-         .details(ConnectionModeDetails::Fwd {
-             destination_host: selected_target,
-         })
-         .time_to_live(claims.jet_ttl)
-         .recording_policy(claims.jet_rec)
-         .filtering_policy(claims.jet_flt)
-         .build();
-     let server_addr: SocketAddr = "0.0.0.0:0".parse().expect("valid placeholder");
-     return Proxy::builder()
-         .conf(conf)
-         .session_info(info)
-         .address_a(client_addr)
-         .transport_a(client_stream)
-         .address_b(server_addr)
-         .transport_b(server_stream)
-         .sessions(sessions)
-         .subscriber_tx(subscriber_tx)
-         .disconnect_interest(DisconnectInterest::from_reconnection_policy(claims.jet_reuse))
-         .build()
-         .select_dissector_and_forward()
-         .await
-         .context("encountered a failure during agent tunnel traffic proxying")
-         .map_err(ForwardError::Internal);
+ for candidate in &targets {
+     let target_addr = candidate.as_addr();
+     if let Some((server_stream, _agent)) = crate::agent_tunnel::routing::try_route(
+         agent_tunnel_handle.as_deref(),
+         claims.jet_agent_id,
+         candidate.host(),
+         claims.jet_aid,
+         &target_addr,
+     )
+     .await
+     .map_err(ForwardError::BadGateway)?
+     {
+         let selected_target = candidate.clone();
+         span.record("target", selected_target.to_string());
+         let info = SessionInfo::builder()
+             .id(claims.jet_aid)
+             .application_protocol(claims.jet_ap)
+             .details(ConnectionModeDetails::Fwd {
+                 destination_host: selected_target,
+             })
+             .time_to_live(claims.jet_ttl)
+             .recording_policy(claims.jet_rec)
+             .filtering_policy(claims.jet_flt)
+             .build();
+         let server_addr: SocketAddr = "0.0.0.0:0".parse().expect("valid placeholder");
+         return Proxy::builder()
+             .conf(conf)
+             .session_info(info)
+             .address_a(client_addr)
+             .transport_a(client_stream)
+             .address_b(server_addr)
+             .transport_b(server_stream)
+             .sessions(sessions)
+             .subscriber_tx(subscriber_tx)
+             .disconnect_interest(DisconnectInterest::from_reconnection_policy(claims.jet_reuse))
+             .build()
+             .select_dissector_and_forward()
+             .await
+             .context("encountered a failure during agent tunnel traffic proxying")
+             .map_err(ForwardError::Internal);
+     }

Comment on lines 637 to 642
  async fn send_network_request(request: &NetworkRequest) -> anyhow::Result<Vec<u8>> {
      let target_addr = TargetAddr::parse(request.url.as_str(), Some(88))?;

-     send_krb_message(&target_addr, &request.data)
+     send_krb_message(&target_addr, &request.data, None)
          .await
          .map_err(|err| anyhow::Error::msg("failed to send KDC message").context(err))

Copilot AI Apr 14, 2026

send_network_request always calls send_krb_message(..., None), which means CredSSP’s Kerberos network requests in the RDP credential-injection path can never use the agent-tunnel routing pipeline (even when the gateway is configured with an agent tunnel and the target/KDC are only reachable via an agent). If transparent routing is intended for this path too, consider plumbing agent_tunnel_handle into RdpProxy and passing it through here.

Comment on lines +304 to +315
// Route through agent tunnel if available, otherwise connect directly.
let first_target = targets.first();
let target_str = format!("{}:{}", first_target.host(), first_target.port());

let (mut server_stream, server_addr, selected_target): (ServerTransport, SocketAddr, &TargetAddr) =
match crate::agent_tunnel::routing::try_route(
agent_tunnel_handle.map(AsRef::as_ref),
claims.jet_agent_id,
first_target.host(),
claims.jet_aid,
&target_str,
)

Copilot AI Apr 14, 2026

connect_rdp_server routes based only on targets.first() and builds the tunnel target_str as host:port, which is incorrect for IPv6 (missing brackets) and ignores alternate targets (the token’s targets are intended to be tried in order). Consider iterating over all targets and using candidate.as_addr() for the tunnel target string, so both routing and failover behave consistently.

Comment on lines +287 to +294
registry.register(Arc::clone(&peer_a));

std::thread::sleep(std::time::Duration::from_millis(10));

let peer_b = make_peer("agent-b");
let id_b = peer_b.agent_id;
let subnet_b: ipnetwork::IpNetwork = "10.2.0.0/16".parse().expect("valid test subnet");
peer_b.update_routes(1, vec![subnet_b], vec![domain("contoso.local")]);

Copilot AI Apr 14, 2026

This test relies on thread::sleep(Duration::from_millis(10)) to create a deterministic ordering by SystemTime. On some platforms (notably Windows) timer/time resolution can be >=10–16ms, making this flaky. Prefer a deterministic ordering mechanism for the test (or use a larger margin / inject the timestamp) so the assertion doesn’t depend on wall-clock granularity.

Suggested change
- registry.register(Arc::clone(&peer_a));
- std::thread::sleep(std::time::Duration::from_millis(10));
- let peer_b = make_peer("agent-b");
- let id_b = peer_b.agent_id;
- let subnet_b: ipnetwork::IpNetwork = "10.2.0.0/16".parse().expect("valid test subnet");
- peer_b.update_routes(1, vec![subnet_b], vec![domain("contoso.local")]);
+ peer_a.last_seen.store(1, Ordering::Release);
+ registry.register(Arc::clone(&peer_a));
+ let peer_b = make_peer("agent-b");
+ let id_b = peer_b.agent_id;
+ let subnet_b: ipnetwork::IpNetwork = "10.2.0.0/16".parse().expect("valid test subnet");
+ peer_b.update_routes(1, vec![subnet_b], vec![domain("contoso.local")]);
+ peer_b.last_seen.store(2, Ordering::Release);

pub struct RouteAdvertisementState {
/// Monotonically increasing epoch within an agent process lifetime.
pub epoch: u64,
/// IPv4 subnets this agent can reach.

Copilot AI Apr 14, 2026

The field comment says “IPv4 subnets”, but the type is now Vec<IpNetwork> and the protocol advertises IPv4 and IPv6. Update the comment to avoid misleading documentation for route matching behavior.

Suggested change
- /// IPv4 subnets this agent can reach.
+ /// IP subnets this agent can reach (IPv4 or IPv6).

Comment on lines 55 to 61
  protocol_version: u16,
  /// Monotonically increasing epoch within this agent process lifetime.
  epoch: u64,
- /// Reachable IPv4 subnets.
- subnets: Vec<Ipv4Network>,
+ /// Reachable subnets (IPv4 and IPv6).
+ subnets: Vec<IpNetwork>,
  /// DNS domains this agent can resolve, with source tracking.
  domains: Vec<DomainAdvertisement>,

Copilot AI Apr 14, 2026

RouteAdvertise.subnets now supports IPv6 (IpNetwork). Peers built against the previous IPv4-only format won’t be able to deserialize/interpret IPv6 subnets, even though the protocol_version is still 1. If mixed agent/gateway versions are possible, consider bumping the protocol version/ALPN or gating IPv6 advertisements behind a version negotiation to avoid silent incompatibility.
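
One way the gating idea could look, as a sketch only: filter IPv6 entries out of an advertisement when the peer's negotiated version predates IPv6 support. The constant `IPV6_MIN_PROTOCOL_VERSION`, the value 2, the `(IpAddr, u8)` subnet representation, and both function names are assumptions; the PR itself keeps `protocol_version` at 1 and defines no such negotiation.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical: first protocol version whose peers understand IPv6 subnets.
const IPV6_MIN_PROTOCOL_VERSION: u16 = 2;

/// Simplified subnet: (network address, prefix length).
type Subnet = (IpAddr, u8);

/// Returns the subnets that a peer speaking `peer_version` can interpret.
fn subnets_for_peer(peer_version: u16, subnets: &[Subnet]) -> Vec<Subnet> {
    subnets
        .iter()
        .copied()
        .filter(|(addr, _)| addr.is_ipv4() || peer_version >= IPV6_MIN_PROTOCOL_VERSION)
        .collect()
}

/// One IPv4 and one IPv6 subnet for demonstration.
fn sample_subnets() -> Vec<Subnet> {
    vec![
        (IpAddr::V4(Ipv4Addr::new(10, 2, 0, 0)), 16),
        (IpAddr::V6(Ipv6Addr::new(0xfd00, 0, 0, 0, 0, 0, 0, 0)), 8),
    ]
}

fn main() {
    let subnets = sample_subnets();
    println!("v1 peer sees {} subnet(s)", subnets_for_peer(1, &subnets).len());
    println!("v2 peer sees {} subnet(s)", subnets_for_peer(2, &subnets).len());
}
```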

Comment on lines +167 to +175
let kdc_target = kdc_addr.to_string();

if let Some((mut stream, _agent)) = crate::agent_tunnel::routing::try_route(
agent_tunnel_handle,
None,
kdc_addr.host(),
uuid::Uuid::new_v4(),
&kdc_target,
)

Copilot AI Apr 14, 2026

kdc_target is built with kdc_addr.to_string(), but TargetAddr string form includes the scheme prefix (e.g. tcp://host:88). The agent tunnel target parser expects host:port/[v6]:port, so this will make tunneled KDC routing fail (and can prevent the intended direct fallback). Use kdc_addr.as_addr() (or format using host/port with IPv6 brackets) for the target_addr passed to try_route/connect_via_agent.

Base automatically changed from feat/quic-tunnel-1-core to master April 21, 2026 16:44
Builds on #1738 (core infrastructure). Follow-up PRs will add the
Windows/Linux installer integration, gateway webapp agent
management UI, Docker deployment, and Playwright E2E harness.

Transparent routing:

- `crates/agent-tunnel/src/routing.rs`: `RoutingDecision` pipeline —
  explicit `jet_agent_id` from the JWT → subnet match → domain
  suffix match (longest wins) → direct connect. Single `try_route`
  entry point consumed by all gateway proxy paths.
- `crates/agent-tunnel/src/registry.rs`: `find_agents_for(host)` +
  `RouteAdvertisementState::matches_target()` do the lookup in one
  spot; offline agents are skipped.
- Gateway proxy integration: `api/fwd.rs`, `api/kdc_proxy.rs`,
  `api/rdp.rs`, `rd_clean_path.rs`, `generic_client.rs`, `rdp_proxy.rs`
  all call `try_route` before falling through to direct TCP.
- Tests: `agent-tunnel/src/integration_test.rs` (2 full-stack QUIC
  E2E), `tests/agent_tunnel_registry.rs` (13),
  `tests/agent_tunnel_routing.rs` (8).

Agent-side certificate renewal:

- `enrollment.rs`: `is_cert_expiring(cert_path, threshold_days)` and
  `generate_csr_from_existing_key(key_path, agent_name)` — the key
  never changes across renewals, the gateway just signs a new cert
  with the same public key.
- `tunnel.rs`: on connect, if the cert is within 15 days of expiry,
  the agent sends a `CertRenewalRequest` control message with a new
  CSR, waits for `CertRenewalResponse::Success`, writes the renewed
  cert and CA, and reconnects.
- `agent-tunnel/src/listener.rs`: gateway-side handler signs the
  CSR via `CaManager::sign_agent_csr` and returns the new cert chain.
  (Stub replaced: master's handler emitted a debug log and dropped
  the message.)
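
The expiry check can be sketched in isolation. The function below is an assumption-laden simplification: unlike `is_cert_expiring(cert_path, threshold_days)`, it takes an already-parsed `not_after` time and an explicit `now`, so no certificate parsing is shown, and treating "exactly at the threshold" as expiring is a choice made for this example.

```rust
use std::time::{Duration, SystemTime};

const SECONDS_PER_DAY: u64 = 24 * 60 * 60;

/// Hypothetical helper: true when the certificate expires within
/// `threshold_days` of `now`, or has already expired.
fn is_expiring(not_after: SystemTime, now: SystemTime, threshold_days: u64) -> bool {
    let threshold = Duration::from_secs(threshold_days * SECONDS_PER_DAY);
    match not_after.duration_since(now) {
        Ok(remaining) => remaining <= threshold,
        // `not_after` is earlier than `now`: the certificate is already expired.
        Err(_) => true,
    }
}

fn main() {
    let now = SystemTime::now();
    let not_after = now + Duration::from_secs(10 * SECONDS_PER_DAY);
    println!("renew? {}", is_expiring(not_after, now, 15));
}
```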

QUIC endpoint override:

- `enrollment.rs`: new `quic_endpoint_override: Option<String>`
  parameter on `enroll_agent` — if set, overrides the endpoint
  returned by the enroll API. Needed because the gateway's
  `quic_endpoint` is derived from `conf.hostname`, which in a
  containerized deployment is often the container ID (not routable
  from outside).
- `main.rs`: new `--quic-endpoint` CLI flag and `jet_quic_endpoint`
  JWT claim; precedence is CLI flag > JWT claim > enroll API
  response.
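
The stated precedence maps naturally onto an `Option` chain. A minimal sketch, with `resolve_quic_endpoint` and its parameter names invented for illustration:

```rust
/// Hypothetical helper: CLI flag > JWT claim > enroll API response.
fn resolve_quic_endpoint(
    cli_flag: Option<&str>,
    jwt_claim: Option<&str>,
    enroll_response: &str,
) -> String {
    cli_flag.or(jwt_claim).unwrap_or(enroll_response).to_owned()
}

fn main() {
    let endpoint = resolve_quic_endpoint(None, Some("gw.example.com:4433"), "3f9c2a1b7d:4433");
    println!("{endpoint}");
}
```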

Agent-side routing primitives:

- `tunnel_helpers.rs`: `Target::Ip` / `Target::Domain` enum parsed
  from the gateway's `ConnectRequest::target`, `resolve_target`
  (domain → DNS), `connect_to_target` (happy-eyeballs).
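
A sketch of how a `host:port` / `[v6]:port` target string might be split into the two cases. Only the variant names `Target::Ip` and `Target::Domain` come from the commit message; the payload types, `parse_target`, and the rejection rules are assumptions for this example, and DNS resolution is not shown.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
enum Target {
    /// IP literal with port, e.g. `10.0.0.5:22` or `[::1]:3389`.
    Ip(SocketAddr),
    /// DNS name and port, to be resolved on the agent side.
    Domain(String, u16),
}

/// Hypothetical parser for `host:port` and `[v6]:port` strings.
fn parse_target(s: &str) -> Option<Target> {
    // IPv4 and bracketed IPv6 literals are handled by the standard parser.
    if let Ok(addr) = s.parse::<SocketAddr>() {
        return Some(Target::Ip(addr));
    }
    let (host, port) = s.rsplit_once(':')?;
    let port = port.parse::<u16>().ok()?;
    // Reject unbracketed IPv6 and scheme-prefixed input such as `tcp://host:88`.
    if host.is_empty() || host.contains(':') || host.contains('/') {
        return None;
    }
    Some(Target::Domain(host.to_owned(), port))
}

fn main() {
    for input in ["[::1]:3389", "dc01.contoso.local:88", "tcp://dc01.contoso.local:88"] {
        println!("{input} -> {:?}", parse_target(input));
    }
}
```

Under these rules a scheme-prefixed string is rejected rather than misread as a domain.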

Tests: 22 agent-tunnel lib + 3 proto version + 24 proto control +
11 proto session + 13 registry + 8 routing integration + 64 gateway
lib, all green. Zero clippy warnings; nightly fmt clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`run_single_connection` previously returned `Ok(())` on both graceful
shutdown and successful cert renewal. The outer reconnect loop treated
`Ok(())` as "task done forever", so after a renewal the agent exited
and never reconnected with the new cert.

Split the return with `ConnectionOutcome::{Shutdown, CertRenewed}`;
renewal now reconnects immediately (bypassing backoff), shutdown still
exits the task. Also wrap the `CertRenewalResponse` recv in a 30s
timeout so a stalled gateway cannot hang the agent indefinitely.
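
The control flow described above can be sketched synchronously. `ConnectionOutcome::{Shutdown, CertRenewed}` comes from the commit message; `Step`, `next_step`, and `run_loop` are hypothetical, the mapping of errors to a backoff retry is an assumption, and the actual sleeping and the 30s receive timeout are omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnectionOutcome {
    Shutdown,
    CertRenewed,
}

/// What the outer loop does after one connection ends (hypothetical type).
#[derive(Debug, PartialEq)]
enum Step {
    ReconnectImmediately,
    ReconnectAfterBackoff,
    Exit,
}

fn next_step(result: &Result<ConnectionOutcome, String>) -> Step {
    match result {
        // Graceful shutdown ends the task.
        Ok(ConnectionOutcome::Shutdown) => Step::Exit,
        // A renewed certificate should be used right away, skipping backoff.
        Ok(ConnectionOutcome::CertRenewed) => Step::ReconnectImmediately,
        // Assumed: connection errors go through the usual backoff.
        Err(_) => Step::ReconnectAfterBackoff,
    }
}

/// Runs connections until one reports shutdown, recording each decision.
fn run_loop(mut run_single_connection: impl FnMut() -> Result<ConnectionOutcome, String>) -> Vec<Step> {
    let mut steps = Vec::new();
    loop {
        let step = next_step(&run_single_connection());
        let done = step == Step::Exit;
        steps.push(step);
        if done {
            return steps;
        }
    }
}

fn main() {
    let mut script = vec![Ok(ConnectionOutcome::CertRenewed), Ok(ConnectionOutcome::Shutdown)].into_iter();
    let steps = run_loop(move || script.next().expect("script ends with Shutdown"));
    println!("{steps:?}");
}
```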