Release/v0.7.0#287

Open
pigri wants to merge 245 commits into main from release/v0.7.0

Conversation

@pigri
Contributor

@pigri pigri commented May 6, 2026

No description provided.

pigri and others added 30 commits April 27, 2026 21:53
…l to limit update frequency to once every 7 days
Switch synapse to consume the capture-only dendrite at
../dendrite-0427/crates/dendrite via [patch], drop the firewall and
bpf-windows feature flags that were removed from dendrite, and stub the
Windows xdp_blocker types (real impl arrives in PR 2 from
migrate-to-synapse/bpf_windows_blockers/).

- Cargo.toml: dendrite -> local path dep with the new feature set
  (drop firewall, bpf-windows; keep capture, ndis-capture; add
  windows-afxdp-capture, ssh-collector). [patch] redirects the dendrite
  git refs in synapse-events and occipital to the same local path so
  the workspace resolves a single dendrite version.
- crates/synapse-events: remove firewall-bridge feature and the dead
  PacketMatcher impl on dendrite::firewall::FingerprintMatcher (gone).
- crates/occipital: keep the firewall feature as a placeholder so the
  cfg gates in tui.rs don't warn; PR 2 will rewire them to a synapse-
  internal matcher.
- src/platform/windows/xdp_blocker.rs: replace the dendrite re-export
  with stub types (XdpBlockMode, WindowsXdpBlocker) that always error
  at runtime. Keeps thalamus_ids_windows.rs compiling; PR 2 replaces.
- src/platform/windows/mod.rs: point xdp_blocker tests at the local stub.
- src/bin/afpacket_diag.rs: convert to a thin Linux/Windows dispatcher.
  Linux body moves to src/afpacket_diag_impl.rs (outside src/bin/ so
  cargo's auto-bin-discovery doesn't pick it up). Fixes pre-existing
  Windows compile errors caused by AF_PACKET / sockaddr_ll / setsid.

cargo check passes on Windows. Functional changes (subscribe to
dendrite::CaptureEvent, delete the polling code in fingerprint_writer
and tcp_fingerprint) are PR1.1 -- still to come.
… (CaptureBackend)

- Add src/capture/ndis_subscriber.rs: Windows-only NdisSubscriber that merges all
  NDIS adapter CaptureEnvelope streams and routes to fingerprint log + corpus_callosum
- Replace the 52-line inline capture.run(callback) block in main.rs with a 10-line
  NdisSubscriber::start() / std::thread::spawn(subscriber.run()) call
- Log message updated: "NDIS fingerprint capture started (CaptureBackend)"
- Tested on Azure VM: 20 fingerprint log entries, same JA4T/JA4TS hashes as before
…e-to-synapse

PR 2 of the synapse refactor. Copies the staged enforcement code from
dendrite-0427/migrate-to-synapse/bpf_windows_blockers/ into
src/platform/windows/firewall/ and wires it in:

- New module src/platform/windows/firewall/ with DatapathController,
  WindowsXdpBlocker (FFI), WindowsEbpfBlocker (FFI), and their C shims
- WindowsDatapathConfig/Mode/Preference config types defined in firewall/mod.rs
- build.rs: cc::Build for xdp_block_shim.c and ebpf_block_shim.c (Windows only)
- Cargo.toml: add cc = "1" to [build-dependencies]
- thalamus_ids_windows.rs: import path updated to firewall::xdp_blocker
- Deleted stub src/platform/windows/xdp_blocker.rs (replaced by real impl)

Verified: cargo build --release succeeds; synapse agent starts cleanly on
Azure VM with NDIS capture active and no regressions.
Follow-up to PR 2 commit (3a98a3e): stage the stub file deletion that
was overlooked in the explicit git add, and record the cc = "1"
build-dependency in Cargo.lock.
…napse-core

Step 1 of PR 3 (per-OS workspace split):

- Add three new workspace members: synapse-core, synapse-linux, synapse-windows
- Move src/core/cli.rs → crates/synapse-core/src/core/cli.rs
  - Inline CaptchaProvider enum (was imported from WAF captcha module)
  - Inline FirewallMode enum (was duplicated in firewall + noop modules)
  - Gate ProxyConfig::to_app_config on cfg(all(unix, feature = "proxy"))
- src/core/mod.rs: re-export synapse_core::core::cli so existing crate::core::cli paths
  compile unchanged throughout the rest of the codebase
- captcha.rs: replace CaptchaProvider definition with re-export from synapse-core
- firewall/mod.rs + firewall_noop.rs: replace FirewallMode definition with re-export
- parceyaml.rs: gate the to_app_config shortcut on cfg(all(unix, feature = "proxy"))

cargo check --workspace passes on Windows.
Move three OS-agnostic modules from the root crate into synapse-core:
- src/utils/maxmind.rs -> synapse-core/src/utils/maxmind.rs
- src/security/geoip/mod.rs -> synapse-core/src/security/geoip/mod.rs
- src/security/threat/mod.rs -> synapse-core/src/security/threat/mod.rs

Root crate stubs replaced with pub use synapse_core::* re-exports so all
existing crate::security::geoip / crate::security::threat / crate::utils::maxmind
paths continue to resolve without touching any caller.

Added corpus-callosum, maxminddb, memmap2, arc-swap, chrono, dashmap,
pingora-memory-cache, and tokio to synapse-core dependencies.

cargo check --workspace passes on Windows.
Move three more OS-agnostic modules from the root crate into synapse-core:
- src/utils/http_client.rs    -> synapse-core/src/utils/http_client.rs
- src/platform/agent_status.rs -> synapse-core/src/platform/agent_status.rs
- src/platform/authcheck.rs    -> synapse-core/src/platform/authcheck.rs

Root stubs replaced with pub use synapse_core::* re-exports.
Added reqwest, sha2, hex deps to synapse-core.
cargo check --workspace passes on Windows.
…imports

Direct path dep `../../dendrite-0427/crates/dendrite` resolved incorrectly
(landed inside synapse-0427/ subtree). Switch to the git source intercepted
by the workspace [patch] entry, matching how synapse-events/occipital do it.

Also removes the now-unused `use serde::{Deserialize, Serialize}` from
firewall/mod.rs and firewall_noop.rs — those derives moved to synapse-core.
Moves the following OS-agnostic modules into synapse-core, replacing root
files with one-line pub-use stubs so existing crate:: paths resolve unchanged:

- src/utils/path_sanitize.rs → crates/synapse-core/src/utils/path_sanitize.rs
- src/utils/state.rs → crates/synapse-core/src/utils/state.rs
- src/utils/tls_fingerprint.rs → crates/synapse-core/src/utils/tls_fingerprint.rs
- src/utils/tls_client_hello.rs → crates/synapse-core/src/utils/tls_client_hello.rs
- src/logger/fingerprint_log.rs → crates/synapse-core/src/logger/fingerprint_log.rs

tls_fingerprint and tls_client_hello use dendrite::Ja4/JA4S types; wired via
the git+patch dep added in the previous commit.
Moves worker/manager.rs (WorkerManager, PeriodicWorker, PeriodicTask, etc.)
into synapse-core — it has no platform deps, only std + tokio. Root
src/worker/manager.rs becomes a one-line pub-use stub so all existing
crate::worker:: paths resolve unchanged.
…core

Both workers only reference types that are already in synapse-core:
http_client, worker framework, security::threat, utils::path_sanitize.
Root files become pub-use stubs.
Pure-std IP/CIDR parsing helpers — no platform deps.
Phase 1 of the synapse → amigdala integration plan
(amigdala/docs/synapse-integration.md). Lossy broadcast fan-out from
the FingerprintLogger; new amigdala_bridge module subscribes,
translates each FingerprintLogEntry into amigdala::JaEvent(s), and
pumps them into a caller-supplied Reactor. On a hit, the returned
ReactorAction is forwarded to a user-wired install closure (typically
firewall.block_ip_src_port).

Constraints honoured:
* fingerprints.log path stays untouched. Broadcast happens after the
  file send.
* Broadcast is bounded (1024 entries). Slow consumers see Lagged(n)
  and skip — never backpressure the writer.
* Optional dep gated behind amigdala-reactor feature; default builds
  unchanged.
* JA4T / JA4TS deliberately not emitted via reactor — those are
  enforced at the kernel/daemon layer, redundant via this path.

Components:
* src/logger/fingerprint_log.rs — FINGERPRINT_BROADCAST OnceLock,
  subscribe_fingerprints() public API, log_fingerprint() fans out.
* src/amigdala_bridge.rs (new, feature-gated) — run() consumer loop,
  entries_to_events() converter with 4 unit tests covering kind
  dispatch, source 5-tuple carry-through, unset-fields skip, and
  invalid-IP handling.
* Cargo.toml — optional amigdala = { path = "../amigdala", features
  = ["reactor"] }; new amigdala-reactor feature.
Phase 3 of the synapse → amigdala integration plan. JA4L's c
component — the app-handshake half-RTT — is the meaningful signal
for distinguishing real distant clients from colocated bots. dendrite
already supports `Ja4l::with_app_handshake(...)`; this surfaces it
through synapse's `Ja4lMeasurement` wrapper.

Changes:
* `Ja4lMeasurement` gains optional D / E / F timestamps
  (app_hello_time / app_response_time / app_ack_time) plus matching
  setters (set_app_hello / set_app_response / set_app_ack).
* `fingerprint_client` / `fingerprint_server` now branch on
  availability: 3-segment `{tcp_rtt}_{ttl}_{app_rtt}` when D / E / F
  are all set, 2-segment `{tcp_rtt}_{ttl}` otherwise — preserves
  back-compat with existing 2-segment consumers.
* New tests: 3-segment emission with reference values from dendrite,
  partial-app-handshake falls back to 2-segment.

Capture-site wiring (where in the TLS path D / E / F get recorded)
is the next PR — this commit ships the API only so the call sites
have a stable target to plumb into.
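
The 2-segment / 3-segment branching described above can be sketched roughly as follows. This is a minimal illustration, not synapse's code: `Ja4lSketch`, its field names, and the half-RTT arithmetic are assumptions made for the example; only the `{tcp_rtt}_{ttl}` and `{tcp_rtt}_{ttl}_{app_rtt}` output shapes come from the commit message.

```rust
/// Hypothetical stand-in for synapse's `Ja4lMeasurement` (field names assumed).
#[derive(Default)]
pub struct Ja4lSketch {
    pub tcp_rtt_us: u64,
    pub ttl: u8,
    // Optional D / E / F timestamps in microseconds.
    pub app_hello_us: Option<u64>,
    pub app_response_us: Option<u64>,
    pub app_ack_us: Option<u64>,
}

impl Ja4lSketch {
    /// Returns the app-handshake half-RTT only when D, E and F are all set.
    /// The (F - E) / 2 formula is an assumption for illustration.
    fn app_half_rtt(&self) -> Option<u64> {
        match (self.app_hello_us, self.app_response_us, self.app_ack_us) {
            (Some(_d), Some(e), Some(f)) => Some(f.saturating_sub(e) / 2),
            _ => None,
        }
    }

    /// 3-segment output when the app handshake is complete, 2-segment otherwise.
    pub fn fingerprint_client(&self) -> String {
        match self.app_half_rtt() {
            Some(app) => format!("{}_{}_{}", self.tcp_rtt_us, self.ttl, app),
            None => format!("{}_{}", self.tcp_rtt_us, self.ttl),
        }
    }
}
```

A partially populated measurement (for example D and E set but not F) takes the `None` arm, so existing 2-segment consumers keep seeing the shape they already parse.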
Phase 2 of the synapse → amigdala integration plan. After the XDP
skeleton loads, pin the dendrite-provided `ja4_client_events`
ringbuf and `ja4_client_events_dropped` per-CPU counter at stable
paths under `/sys/fs/bpf/dendrite/` per interface. Out-of-process
amigdala consumers (BpfRingbufSource::open_pinned) can then
subscribe alongside our existing in-process consumer.

Lifecycle:
* Pin happens once per attached XDP interface, after the skel loads
  and the in-process ja4_store consumer is registered.
* `remove_file` on the pin path before pinning so a crashed prior
  run's leftover (libbpf doesn't unpin on its own, fresh pin would
  fail EEXIST) doesn't block startup.
* Pin paths are per-interface (`ja4_client_events_<iface>`) so
  multiple XDP attachments don't collide.
* Best-effort: pin failures log a warning but don't affect the
  in-process consumer or the firewall path.
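
The stale-pin handling in the lifecycle above might look something like this sketch. The helper names (`pin_path`, `clear_stale_pin`) are hypothetical; the actual pin call goes through libbpf and is not shown. Only the directory and the `ja4_client_events_<iface>` naming come from the commit message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Namespace named in the commit message.
pub const PIN_DIR: &str = "/sys/fs/bpf/dendrite";

/// Builds a per-interface pin path such as `<base>/ja4_client_events_eth0`.
pub fn pin_path(base: &Path, map_name: &str, iface: &str) -> PathBuf {
    base.join(format!("{map_name}_{iface}"))
}

/// Removes a leftover pin from a crashed run. A missing file is not an error,
/// because a clean previous shutdown leaves nothing behind.
pub fn clear_stale_pin(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}
```

A caller would run `clear_stale_pin` immediately before pinning and, per the best-effort rule, log any error as a warning rather than abort startup.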

Pin paths intentionally use the `/sys/fs/bpf/dendrite/` namespace
because the BPF program comes from dendrite — synapse loads it but
doesn't own its identity. amigdala's docs/synapse-integration.md
refers to the same path.

This is the throughput story: NFQUEUE caps at ~100k pps, the BPF
ringbuf path scales with TLS flow-arrival rate which is multi-Mpps
on a single core. Producers and consumers no longer need to share
a process.
Phase 4 of the synapse → amigdala integration plan. Adds a thin
plumbing layer for pushing JA4T / JA4TS fingerprint patterns into
amigdala's userland NDIS enforcement daemon on Windows. Where the
patterns come from (config file, gen0sec platform API, threat
feed) is intentionally out of scope — synapse provides the wire,
operators bring the policy.

* `src/amigdala_ndis_feed.rs` (new, gated `cfg(target_os =
  "windows", feature = "amigdala-ndis")`) — `NdisFeed` shared
  handle wrapping `Arc<Mutex<NdisFirewall>>`. Exposes
  block / unblock / allow / unallow for JA4T and JA4TS, plus
  `replace_ja4t_blocklist(&[String])` for snapshot-style feeds
  (flush + reinstall in one operation).
* `Cargo.toml` — new `amigdala-ndis` feature pulling
  `amigdala/windows-ndis`. Moved the optional `amigdala` dep out
  of `[target.'cfg(unix)'.dependencies]` (where it was
  unreachable on Windows) into the top-level `[dependencies]` so
  both `amigdala-reactor` (Linux + Windows) and `amigdala-ndis`
  (Windows-only) can resolve it.

Verified with `cargo check --no-default-features --features
amigdala-ndis` on the Windows MSVC test host — full build clean.
Linux default + `--features amigdala-reactor` builds also clean.
rustc 1.95+ defaults to rust-lld for x86_64-unknown-linux-gnu.
lld is strict about symbol resolution and rejects the openssl-sys
vendored OpenSSL because of an inconsistency in its assembly:

    rust-lld: error: undefined symbol: bn_sqrx8x_internal
    >>> referenced by x86_64-mont.s:781
    >>>                libcrypto-lib-x86_64-mont.o
    >>> did you mean: bn_sqr8x_internal
    >>> defined in: libcrypto-lib-x86_64-mont5.o

The two assembly modules in vendored openssl-sys were compiled with
different BMI2 feature flags — `x86_64-mont.s` calls into the
`bn_sqrx8x_internal` MULX variant but `x86_64-mont5.s` only ships
the non-MULX `bn_sqr8x_internal`. lld reports it; bfd ld resolves
the same set of objects without complaining (older, more lenient
symbol-matching).

`cargo build` linked OK accidentally because of object ordering;
`cargo test --bin synapse` consistently failed. Pinning bfd via
.cargo/config.toml's `[target.x86_64-unknown-linux-gnu]` table
unblocks all test builds without affecting Windows / aarch64
targets.

Verified: 4 amigdala_bridge tests + 3 JA4L tests + 174 other
synapse tests run after the fix.
…code

Migrates the fingerprint enforcement code from dendrite's migrate-to-synapse/
into src/security/firewall/fingerprint/:
- matcher.rs: userland FingerprintMatcher for JA4/JA4S/JA4H/JA4L/JA4SSH/JA4X/JA4D/JA4D6
- rules.rs: BpfFingerprint, Action, BlockRule
- error.rs: error enum
- bpf/xdp_filter.bpf.c: kernel XDP filter for JA4T/JA4TS

Also adds decode_l3 branch in thalamus_ids_windows so the IDS pipeline can
decode raw IP-layer frames (needed for the BLOCK_APPLIED rule path verified
on Azure).
Splits synapse into three crates:
- synapse-linux: Linux entry point (BPF/XDP, nftables, iptables, daemonize)
- synapse-windows: Windows entry point (ETW, service, AF_XDP, NDIS)
- synapse (root binary): 19-line dispatcher that calls the per-OS run()

Both per-OS crates compile the same logic by including src/app.rs via
include!(), giving each its own crate context where crate:: paths resolve
correctly without a 138-site mass rename. The root build.rs is gone — each
per-OS crate has its own build script for SkeletonBuilder / cc::Build.

Cargo.toml gates the per-OS crate as a target dependency so cargo build
on Linux pulls only synapse-linux, and on Windows pulls only synapse-windows.

Verified on Azure VM: NDIS capture emits JA4T fingerprints, IDS rule
blocks port 22 with BLOCK_APPLIED log, traffic flows again after
synapse exits.
NdisSubscriber was only forwarding ja4t and ja4ts via the CaptureBackend
trait, which only carries those two variants. Bypass CaptureBackend and
consume FingerprintEvent directly so JA4, JA4S, JA4L, and JA4LS are also
written to fingerprints.log and stored in corpus_callosum.

Also normalises flow direction: when only JA4TS is present the event
src/dst is server/client; swap them so the log entry is always keyed on
(client, server).

Verified on Azure: single_source=0 after fix (was 4 before).
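
As a rough illustration of the direction normalisation, the sketch below keys every event on (client, server). `FlowEventSketch` is a hypothetical reduction of the real event to the two TCP fingerprints; the actual `FingerprintEvent` carries more kinds.

```rust
use std::net::SocketAddr;

/// Hypothetical, reduced event: only the fields needed to show the swap.
#[derive(Debug, Clone, PartialEq)]
pub struct FlowEventSketch {
    pub src: SocketAddr,
    pub dst: SocketAddr,
    pub ja4t: Option<String>,
    pub ja4ts: Option<String>,
}

/// Returns (client, server). When only JA4TS is present the packet travelled
/// server -> client, so src/dst are swapped before building the key.
pub fn client_server_key(ev: &FlowEventSketch) -> (SocketAddr, SocketAddr) {
    if ev.ja4t.is_none() && ev.ja4ts.is_some() {
        (ev.dst, ev.src)
    } else {
        (ev.src, ev.dst)
    }
}
```

With this keying, the SYN-side and SYN/ACK-side observations of one flow collapse onto the same log key instead of producing two single-source entries.
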
… block

Phase 1 of the synapse → amigdala bridge: make the lossy
`subscribe_fingerprints()` broadcast (already shipped) actually
drive a kernel firewall on `Reactor` hits.

* `amigdala_bridge::spawn_with_iptables` / `spawn_with_nftables`
  build an amigdala firewall with `with_ja4(...)` (which spawns
  the NFQUEUE daemon and exposes a `BlockedJa4Sets` /
  `Reactor`), wrap it in `Arc<Mutex<...>>`, and spawn the bridge
  task with an apply closure that calls `block_ip` /
  `block_ip_src_port` on each `ReactorAction`. Generic over
  `FirewallBackend` so both backends share one task body.
* main.rs: spawn the iptables-backed bridge alongside the
  Linux `FingerprintWriter`, gated on `feature =
  "amigdala-reactor"`. Init failure logs and continues — we
  don't take down synapse if iptables happens to be missing.
* Cargo.toml: amigdala dep gains the `ja4` feature so
  `with_ja4(...)` and `Reactor` are in scope.

End-to-end shape after this commit:
  uprobe / wire capture
      → FingerprintLogEntry
        → log_fingerprint() (file write + broadcast)
          → amigdala_bridge::run() (subscriber)
            → entries_to_events()  (`JaEvent` per kind)
              → reactor.consume()  (returns `ReactorAction`)
                → apply closure  (locks `Arc<Mutex<Firewall>>`)
                  → kernel rule install  (iptables chain)

Other JA4-family kinds (JA4 / JA4S / JA4L / JA4LS / JA4SSH /
JA4H / JA4X) now have a working kernel-enforcement path on
the host where synapse runs the capture: the writer broadcasts
→ bridge converts → reactor matches against the configured
blocked set → SYNs from the offending source IP get dropped at
the netfilter layer. JA4T / JA4TS stay on the inline kernel-
match path (iptables / nftables decompose into kernel match
clauses, dendrite XDP for amigdala v0.10's xdp backend) — the
bridge intentionally skips those kinds in `entries_to_events`.

Bridge unit tests (4/4) pass; a full live test belongs in
amigdala's external matrix harness (next change there).
Three changes that together make synapse drive amigdala's XDP
firewall instead of iptables, and let an external test harness
install pre-blocks without exposing a control plane:

* `amigdala_bridge::spawn_with_xdp(iface, shutdown)` — symmetric
  to `spawn_with_iptables`, but builds an `amigdala::firewall::
  xdp::XdpFirewall` (hillock TC + dendrite XDP coexisting on the
  same iface) and feeds the Reactor's `BlockIp` actions into
  hillock's IpFilterManager. JA4 family kinds whose capture is
  uprobe-driven (JA4H, JA4X) or post-decrypt (JA4, JA4S) all
  flow to *kernel* TC drops instead of stopping at userland.
* `entries_to_events` now emits `Ja4ts` events. The kind was
  excluded by an outdated comment claiming "kernel-inline only"
  — now that amigdala routes Ja4ts through the userland set,
  the bridge has to forward it.
* `preblock_from_env`: parse `AMIGDALA_PREBLOCK=kind=value,...`
  on startup and call `block_fingerprint` for each entry. Used
  by amigdala's `test-xdp-all-ja4.sh` to install a known-bad
  fingerprint before driving traffic.
* main.rs: spawn the XDP-backed bridge alongside the Linux
  FingerprintWriter; iface comes from `config.network.iface`
  (falls back to enp35s0). Init failure logs and continues.
* `Cargo.toml`: amigdala dep gains `xdp` feature; `[patch]`
  redirect to local `/root/dendrite` so source-level changes to
  the dendrite collectors actually land in the build (the prior
  git rev meant nothing I edited was reaching the BPF skel).
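
Parsing the `AMIGDALA_PREBLOCK=kind=value,...` format could be sketched as below. The function names and the decision to silently skip malformed entries are assumptions for illustration; the real `preblock_from_env` then calls `block_fingerprint` per entry, which is not reproduced here.

```rust
/// Splits "kind=value,kind=value" into (kind, value) pairs.
/// Entries without '=' or with an empty side are skipped (an assumed policy).
pub fn parse_preblock(spec: &str) -> Vec<(String, String)> {
    spec.split(',')
        .filter_map(|entry| {
            let (kind, value) = entry.trim().split_once('=')?;
            let (kind, value) = (kind.trim(), value.trim());
            if kind.is_empty() || value.is_empty() {
                return None;
            }
            Some((kind.to_ascii_lowercase(), value.to_string()))
        })
        .collect()
}

/// Reads the variable named in the commit message; unset means no pre-blocks.
pub fn preblock_entries_from_env() -> Vec<(String, String)> {
    std::env::var("AMIGDALA_PREBLOCK")
        .map(|spec| parse_preblock(&spec))
        .unwrap_or_default()
}
```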

Verified by amigdala's `tools/test-xdp-all-ja4.sh` — every JA4
family kind drops real cross-machine SYNs at the kernel after
this commit. Wire trace included; per-backend run signature
differs (TC vs XDP, post-handshake vs SYN-time) and the
verdict logic in the harness accepts both shapes.
pigri and others added 30 commits May 18, 2026 07:16
…r-node engine

Heavy signature IDS no longer has to run inside every scaled proxy
replica; ruleset cost becomes O(nodes) instead of O(proxy replicas).

P0 (no-behaviour-change foundation):
- synapse-events: `HttpTxn` — the serde producer<->consumer boundary
  (mirrors `SignatureEngine::inspect_http` inputs + content_type).
- synapse-proxy/thalamus_post_tls: split into pure `build_http_txn()`
  + `inspect_txn()`; inline composes them (byte-identical to legacy).
- synapse-core: `IdsConfig.post_tls { mode, socket }`, `#[serde(default)]`
  so an absent key = `inline` = exact legacy behaviour.

P1 (offload transport + per-node consumer):
- synapse-eventbridge/ids_l7: framed UDS transport — a bounded,
  non-blocking, drop-on-backpressure `L7Sink` (proxy) and
  `start_l7_ingest_server` (agent). Frame `[u32 LE len][serde_json
  HttpTxn]`; the server chmods the socket 0666 so a non-root proxy can
  connect (node-local IPC; shared-gid tightening is a later refinement).
- thalamus_post_tls: `offload` builds the sink ONLY (no engine/ruleset
  in the proxy); `is_active()` dispatch — inline inspects (can block),
  offload fire-and-forgets the txn and returns None (never blocks the
  request path on IDS).
- synapse-idp/start_l7_ingest: ONE per-node `SignatureEngine`
  consuming the UDS; wired in synapse-app for agent + ids.enabled +
  post_tls.mode==offload.
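
The `[u32 LE len][payload]` framing can be sketched over any `Read` / `Write` pair. This is an illustration only: the payload is treated as opaque bytes (the real transport carries a `serde_json`-encoded `HttpTxn`), and the 1 MiB `FRAME_MAX` value is an assumption, not the value used in `ids_l7`.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a single frame; the real limit may differ.
pub const FRAME_MAX: usize = 1 << 20;

/// Writes one frame: 4-byte little-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > FRAME_MAX {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)
}

/// Reads one frame, rejecting oversized length prefixes before allocating.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    if len > FRAME_MAX {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the length prefix before allocating matters on the agent side, since the socket is world-connectable and a bogus prefix should not trigger a multi-gigabyte allocation.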

Verified in k3s: proxy non-root with zero ruleset even at 2 replicas;
agent runs one node-wide engine; a sqlmap-UA request through the
ruleset-less proxy is detected out-of-band ("ET SCAN Sqlmap SQL
Injection Scan", sid 2008538); WAF suite 26/26 unaffected. Canonical
`cargo check --features proxy` + clippy (new crates) + synapse-events
tests green.

offload is detect/alert only (no inline block — by design, IDS must
never add latency/failure to serving). L7-alert telemetry/enforcer
parity and scale/throughput are P2/P3.
The decoupled L7-IDS consumer now treats detections exactly like the
XDP IDS worker, so existing OTLP/eventbridge consumers and kernel
enforcement need no L7-specific handling:

- Telemetry/eventbridge parity: emit via the same
  `synapse_blocking_log::Builder` path the XDP IDS uses
  (BlockLayer::Ids / RateLimit; BlockAction::Notice for
  observation, Block/Ratelimit when enforced). L7 detections now
  appear as `synapse.block.events.ids` (service.name=synapse,
  layer=ids) identically to XDP-IDS alerts.
- Enforcement parity: a block/drop/reject action under
  `enforce_block` calls
  `synapse_security::runtime_blocklist::block_ip_runtime(src_ip)`
  — the same runtime blocklist the agent's XDP firewall consults —
  so future packets from the offender drop at the NIC.

Verified in k3s (16GB/8vCPU VM): dual full-ET engines (XDP-IDS +
L7-ingest, 49604 rules each) up in ~48s, agent Running/0-restarts
(no OOM); sqlmap UA via the offload proxy and a /__l7drop request
both surfaced at the otel-collector as synapse.block.events.ids
(action notice and block); enforce path logged
`action=block blocked=true`; proxy still non-root with zero
ruleset; WAF 26/26.

Wire-level packet drop verification remains the documented
XDP-east-west topology caveat (out of scope); P2 proves the
code-path + telemetry/enforce parity. P3 = scale/throughput +
optional dedicated IDS tier.
…al-unicast src IPs

P3 surfaced a remotely-triggerable DoS in the decoupled post-TLS IDS:
in `offload` the L7 engine's `src_ip` is the *proxy-observed* client,
which behind NAT / LB / CDN / kube-proxy is routinely a shared infra
address. P2's enforce-parity then `block_ip_runtime`'d it, so one
`/__l7drop` request banned the shared k3s/SLIRP ingress IP and
blackholed every client behind it (WAF 9/25). The parity code was
correct; the offload input is the untrustworthy part.

Refinement:

- New `synapse_security::runtime_blocklist::is_ban_safe_ip` — the
  single source of truth for "may this IP be pushed into a kernel
  firewall". Global-unicast only; rejects loopback / RFC1918 / CGNAT
  (100.64.0.0/10) / link-local / broadcast / documentation /
  unspecified / multicast / IPv6 ULA & link-local. Fails closed.
- New `PostTlsIdsConfig.enforce` (`#[serde(default)]` = false),
  deliberately distinct from `ids.enforce_block`: `offload` now
  defaults to pure detect/alert regardless of `enforce_block`. IP
  enforcement is an explicit opt-in.
- `thalamus_l7_ingest`: enforcement keys off `post_tls.enforce`, and
  even when enabled every ban is hard-gated by `is_ban_safe_ip`. A
  shared / non-global-unicast src is logged and downgraded to
  detect-only (telemetry still emits, as Notice).
- `waf_fw_offload::is_safe_to_block` now delegates to the shared
  helper (one implementation; also gains the CGNAT coverage it
  previously lacked). The own-listener exemption stays proxy-local.
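
A guard along the lines of `is_ban_safe_ip` might be sketched with `std::net` alone. This is not the synapse-security implementation: the function name carries a `_sketch` suffix to make that clear, and the treatment of IPv4-mapped IPv6 addresses is an assumption added for the example.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn v4_is_ban_safe(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    // 100.64.0.0/10 (CGNAT) has no stable std helper, so check it by hand.
    let cgnat = o[0] == 100 && (o[1] & 0xc0) == 64;
    !(ip.is_unspecified()
        || ip.is_loopback()
        || ip.is_private()
        || ip.is_link_local()
        || ip.is_broadcast()
        || ip.is_documentation()
        || ip.is_multicast()
        || cgnat)
}

fn v6_is_ban_safe(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    let ula = (first & 0xfe00) == 0xfc00; // fc00::/7
    let link_local = (first & 0xffc0) == 0xfe80; // fe80::/10
    !(ip.is_unspecified() || ip.is_loopback() || ip.is_multicast() || ula || link_local)
}

/// True only for addresses that look like global unicast.
pub fn is_ban_safe_ip_sketch(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4_is_ban_safe(v4),
        // Assumption: judge IPv4-mapped addresses by their embedded IPv4 value.
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => v4_is_ban_safe(v4),
            None => v6_is_ban_safe(v6),
        },
    }
}
```

Under this sketch the `10.42.0.1` ingress address from the verification run is refused, while a public address such as `8.8.8.8` passes.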

Gates: `cargo check --features proxy` ✓; `-p synapse-app --features
proxy,amygdala-reactor` ✓; `cargo test -p synapse-security
runtime_blocklist::` 2/2 ✓.

Verified in k3s with the adversarial config (`post_tls.enforce:true`
+ the `/__l7drop` drop rule — the exact P3-blackhole setup): alert
fires `blocked=false`, guard refuses banning `10.42.0.1`, runtime
blocklist gains 0 shared IPs, otel still receives `sid:9000009` as
`synapse.action=notice`, proxy still non-root/zero-ruleset, and
WAF 26/26 — ingress is no longer taken down.
The [patch.gen0sec] thalamus = { path = "../thalamus" } line was left
uncommented on this branch (amygdala/cortex/dendrite are all commented
per the section's own guidance: workspace deps resolve from the
gen0sec registry; uncomment only for local source-level edits).

It pinned the build to a local ../thalamus checkout that is 1 commit
ahead of thalamus origin/main (ca275a5 'perf(rules): force ContiguousNFA
Aho-Corasick prefilter' — perf-only, unreleased, no API change), so
the branch did not build standalone / in CI. Comment it out; Cargo.lock
re-resolves thalamus 0.0.4 from the gen0sec registry. cargo build
--features proxy,bpf --bin synapse: clean against the registry crate.
…ert set)

With acme.enabled=false the TLS listener was bound to a one-time
startup snapshot of the certificate set: start.rs built
`TlsSettings::with_callbacks(Box::new((*certs).clone()))` once, and
the inotify cert watcher rebuilt `Certificates` into the
`certificates_arc` ArcSwap (start.rs:430) that the live listener's
TlsAccept never consulted. Result: a cert added or rotated in the
certificates dir after boot was invisible until a process restart
(only certs present at startup, and the default, were ever served).

  - tls.rs: extract the SNI cert-selection body of
    `impl TlsAccept for Certificates::certificate_callback` into an
    inherent `Certificates::apply_to_ssl(&self, &mut SslRef)` so a
    dynamic resolver can delegate to the CURRENT set; the trait impl
    now just calls it. Add `tls_settings_from_accept(Box<dyn
    TlsAccept>, grade)` factoring the grade/ALPN wiring so any
    resolver can be supplied without duplicating it
    (create_tls_settings_with_sni now calls it too — behavior
    unchanged).
  - start.rs: new `DynamicCertificates { store, fallback }` TlsAccept
    that, per handshake, `load_full()`s the live `certificates_arc`
    ArcSwap and delegates to that `Certificates::apply_to_ssl`
    (falling back to the startup set only if the store is somehow
    empty — never after init). The TLS listener is built from it
    instead of a snapshot clone. Cheap, lock-free, in-flight
    handshakes unaffected, behaviorally identical when the set is
    unchanged.
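
The snapshot-versus-live-store difference can be illustrated with a std-only sketch. Note the substitution: the commit uses an `ArcSwap` and pingora's `TlsAccept`; here a `RwLock<Arc<_>>` stands in for the store and certificates are plain strings, so every type and method name below is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical certificate set: SNI name -> certificate label.
pub struct CertSetSketch {
    pub by_sni: HashMap<String, String>,
    pub default_cert: String,
}

impl CertSetSketch {
    /// Picks the cert for an SNI name, falling back to the default.
    pub fn select(&self, sni: &str) -> &str {
        self.by_sni.get(sni).map(String::as_str).unwrap_or(&self.default_cert)
    }
}

/// Resolver that consults the current set on every handshake.
pub struct DynamicCertsSketch {
    store: RwLock<Arc<CertSetSketch>>,
}

impl DynamicCertsSketch {
    pub fn new(initial: CertSetSketch) -> Self {
        Self { store: RwLock::new(Arc::new(initial)) }
    }

    /// Cheap snapshot of the current set; in-flight users keep their Arc.
    pub fn current(&self) -> Arc<CertSetSketch> {
        Arc::clone(&self.store.read().unwrap_or_else(|e| e.into_inner()))
    }

    /// Called by the watcher after a re-scan of the certificates directory.
    pub fn replace(&self, next: CertSetSketch) {
        *self.store.write().unwrap_or_else(|e| e.into_inner()) = Arc::new(next);
    }

    /// Per-handshake lookup against whatever set is live right now.
    pub fn cert_for(&self, sni: &str) -> String {
        self.current().select(sni).to_string()
    }
}
```

The bug described above corresponds to holding one `current()` snapshot forever; the fix corresponds to calling `cert_for` (and therefore `current()`) on each handshake.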

`watch_folder`'s event filter is deliberately left unchanged: it
already re-scans on Create / Modify(Data) / Remove, which is exactly
what an in-place cert write (truncate+write, no rename — the pattern
synapse-operator uses) produces. Atomic tmp+rename would land as
Modify(Name) and still be missed, but no first-party writer does that;
that broadening is intentionally out of scope here.

No new dependencies (arc-swap/async-trait already present in
synapse-proxy). cargo build --features proxy,bpf --bin synapse: clean.

k3s e2e (synapse-operator projecting Ingress/Gateway TLS Secrets into
an operator-owned certs dir): 12/12 — distinct cert served per SNI for
domains added AFTER synapse booted; cert rotation served immediately
with the SAME pod (no restart, no SIGHUP); pruning a cert falls SNI
back to the configured default; default + public domain unaffected.
WAF-branch commit 3f10782 added a pinned-map reconcile
(synapse-access-rules: snapshot_kernel_banned_rules) that calls
SYNAPSEFirewall::banned_ipv4_entries()/banned_ipv6_entries(). Those
inherent accessors exist only on the BPF firewall (firewall/mod.rs);
the noop SYNAPSEFirewall (firewall_noop.rs, selected for
not(all(unix, feature="bpf")) — Windows / --no-default-features)
never got them, so the Windows release-windows --no-default-features
build failed: E0599 no method named banned_ipv4_entries. Pre-existing
on this branch; unrelated to the thalamus patch hygiene commit (the
registry thalamus 0.0.4 compiled fine).

Add empty-Vec stubs mirroring the real signatures (no kernel
banned_ips map without BPF ⇒ empty reconcile baseline), matching how
firewall_noop no-ops every other firewall method. Verified:
cargo check -p synapse-access-rules --no-default-features is clean.
The cert-fix import additions in start.rs weren't rustfmt-canonical
(import ordering / cfg-attr placement at lines 13/22/28), failing the
Formatting job and the Windows job's 'Check formatting' step. nightly
rustfmt --edition 2024, this file only; no logic change.
Pre-existing on feature/waf-response-phase (WAF WIP; unrelated to the
thalamus patch-hygiene, the live-cert-reload merge, or the noop
firewall stub). cargo clippy --workspace -- -D warnings and cargo doc
-- -D warnings gate these:

  - synapse-eventbridge/src/ids_l7.rs:58  needless_return  → tail expr
  - synapse-eventbridge/src/ids_l7.rs:76  manual_is_multiple_of
      n % 1000 == 0  →  n.is_multiple_of(1000)
  - synapse-waf/src/actions/penalty_sync.rs:43  public doc intra-link
      to private DEFAULT_CHANNEL  →  plain code span (no link)

Tool-suggested, behaviour-preserving. Verified locally:
clippy -p synapse-eventbridge --all-targets -- -D warnings clean;
RUSTDOCFLAGS=-D warnings cargo doc -p synapse-waf clean; fmt clean.
…ound 2)

Pre-existing WAF-branch lint debt under cargo clippy --workspace
--all-targets -- -D warnings (and the Windows clippy step). Unrelated
to thalamus / live-cert-reload / noop-firewall.

  - synapse-eventbridge/src/ids_l7.rs: FRAME_MAX/SINK_CAP are used
    only in the #[cfg(unix)] sink path; on non-unix (Windows) they
    were dead_code → -D warnings error. Gate both consts #[cfg(unix)].
  - synapse-waf/src/actions/penalty_box.rs, synapse-waf/src/
    wirefilter.rs, synapse-security/src/runtime_blocklist.rs: new test
    modules use .unwrap() (clippy.toml disallowed-methods). Apply the
    repo's established #[allow(clippy::disallowed_methods)] convention
    (10+ such sites on release/v0.7.0) at the #[cfg(test)] mod level.

Verified locally: cargo clippy --locked --workspace --all-targets --
-D warnings is CLEAN; cargo fmt --all -- --check clean.
Two related fixes from the gcp-nlp-l4 demo memory-reduction pass:

1. BPF map sizes in synapse-security/firewall/bpf:
   - xdp_afxdp_tail.bpf.c, xdp_maps.h: tighter per-CPU + per-flow maps;
     NO_PREALLOC where shape allows; bounded LRU caps for synapse-side
     allowlists/denylists.
   - bpf_utils.rs / bpf_utils_noop.rs: matching userspace knobs and the
     reference-count bookkeeping that pinned maps need to survive a
     pod restart cleanly. Pinned-map staleness across rebuilds was the
     bug that initially masked these savings.

2. network.iface: "auto" now selects ONLY physical UP uplink interface(s).
   Loopback and CNI/virtual/tunnel devices (veth*, lxc*, cilium*, gke*,
   docker*, br-*, vxlan*, …) are excluded; bond* and VLAN sub-interfaces
   (eth0.100) are kept. The previous "auto" behaviour attached XDP +
   capture to *every* up interface on a node, which exploded memory on
   Cilium-equipped GKE nodes (dozens of lxc*/cilium_* veths each get
   their own XDP program copy + their own per-CPU JA4 maps) and broke
   Cilium's datapath. With the filter, the agent attaches only to the
   node uplink, keeping memory at the ~30-50 MB baseline.

Cargo.toml: uncomment the dendrite path overrides so the matching
dendrite-side map shrinks reach this build. Drop the overrides if the
dendrite branch `feat/bpf-map-shrink` is not checked out under ../dendrite.

docs/CONFIGURATION.md + docs/KUBERNETES.md: document the new auto-filter
semantics and the "synapse refuses to clobber a foreign XDP program"
safety. .gitignore: ignore *_token files used during local docker builds.
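
The `"auto"` interface filter could be approximated by a name-prefix check like the sketch below. The prefix list is copied from the commit message, which itself ends in "…", so it is known to be incomplete; the function name and the boolean inputs are assumptions, and how synapse actually detects "physical" devices is not shown.

```rust
/// Prefixes named in the commit message; the real list is longer.
const VIRTUAL_PREFIXES: &[&str] = &["veth", "lxc", "cilium", "gke", "docker", "br-", "vxlan"];

/// Returns true when an interface looks like a node uplink worth attaching to.
/// `is_up` and `is_loopback` are assumed to come from the OS interface flags.
pub fn is_uplink_candidate(name: &str, is_up: bool, is_loopback: bool) -> bool {
    if !is_up || is_loopback {
        return false;
    }
    // bond* and VLAN sub-interfaces such as eth0.100 match no prefix, so they pass.
    !VIRTUAL_PREFIXES.iter().any(|p| name.starts_with(p))
}
```
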
… UDS bridge

Three independent demo patches that unlock the WAF feature surface
behind GCP Global LB and on the edge-passthrough path. All shipped
and verified end-to-end against gcp-nlp-l4 (test-a harness PASS 37/0,
edge UDS bridge 30 FpObserved/5s observed). Empty defaults preserve
current behaviour; each is opt-in via config.

--- A. effective_client_ip — XFF-aware per-real-client rate-limit & ip.src

`rate_limit::check_rate_limit` keys on the L4 socket peer IP. Behind
any L7 LB that address is the LB's own IP, so per-IP counters never
accumulate per real client. Patch:

- synapse-utils/src/xff.rs (new) — effective_client_ip(peer, headers,

  trusted_proxies) -> IpAddr. Returns first XFF token when peer is in

  the trusted CIDR list; else returns peer (spoofing-defence). 9 unit

  tests cover empty/untrusted/valid/malformed/IPv6/multi-range/loopback.

- synapse-core/src/core/cli.rs — ProxyConfig.trusted_proxies: Vec<String>

  (CIDR strings, default empty).

- synapse-utils/src/structs.rs — AppConfig.trusted_proxies: Vec<(IpAddr,u8)>

  parsed once at config load via parse_ip_or_cidr; warns + skips bad

  entries.

- synapse-proxy/src/proxyhttp.rs — three call-site swaps: WAF context

  build (signal.ip.src), request-phase ratelimit bucket key,

  response-phase ratelimit bucket key. invoke_smart_firewall_block

  retains socket_addr.ip() — kernel drops must target real L4 peer.
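
The trust rule in patch A can be sketched as follows. This is not the code in synapse-utils/src/xff.rs: the function name `effective_client_ip_sketch`, the `in_cidr` helper, and passing the X-Forwarded-For value as `Option<&str>` (instead of a header map) are assumptions made to keep the example dependency-free.

```rust
use std::net::IpAddr;

/// True when `ip` falls inside `net/prefix`. Mixed address families never match.
fn in_cidr(ip: IpAddr, net: IpAddr, prefix: u8) -> bool {
    match (ip, net) {
        (IpAddr::V4(a), IpAddr::V4(n)) => {
            let p = u32::from(prefix.min(32));
            let mask = if p == 0 { 0 } else { u32::MAX << (32 - p) };
            (u32::from(a) & mask) == (u32::from(n) & mask)
        }
        (IpAddr::V6(a), IpAddr::V6(n)) => {
            let p = u32::from(prefix.min(128));
            let mask = if p == 0 { 0 } else { u128::MAX << (128 - p) };
            (u128::from(a) & mask) == (u128::from(n) & mask)
        }
        _ => false,
    }
}

/// Returns the first XFF token only when the socket peer is a trusted proxy;
/// otherwise (untrusted peer, missing or malformed header) returns the peer.
pub fn effective_client_ip_sketch(
    peer: IpAddr,
    xff: Option<&str>,
    trusted_proxies: &[(IpAddr, u8)],
) -> IpAddr {
    let peer_trusted = trusted_proxies
        .iter()
        .any(|&(net, prefix)| in_cidr(peer, net, prefix));
    if !peer_trusted {
        // Spoofing defence: a direct client cannot choose its own identity.
        return peer;
    }
    xff.and_then(|v| v.split(',').next())
        .and_then(|tok| tok.trim().parse::<IpAddr>().ok())
        .unwrap_or(peer)
}
```

With an empty `trusted_proxies` list no peer is trusted, so the function always returns the peer, which matches the "empty defaults preserve current behaviour" note above.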

--- B. trust_lb_tls_headers — populate signal.ja4 from x-client-tls-ja4

When a TLS-terminating LB injects the JA4 in a request header,

pipe that header into the WAF wirefilter context so the existing

signal.ja4-keyed rules work behind LBs that decrypt upstream.

- synapse-core/src/core/cli.rs — ProxyConfig.trust_lb_tls_headers: bool

  (default false). Deployment-shape gate — only flip on when fronted

  by a known LB that owns the header; on a direct-attached proxy a

  client could spoof the header.

- synapse-waf/src/wirefilter.rs — TRUST_LB_TLS_HEADERS: OnceLock<bool>

  + set_trust_lb_tls_headers() setter. populate_request_context now

  reads `x-client-tls-ja4` into signal.ja4 when the flag is on,

  falling back to empty string. 43 existing WAF tests still pass.

- synapse-app/src/lib.rs — startup wire-up calls the setter from

  config.proxy.trust_lb_tls_headers.
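
A rough sketch of the set-once gate in patch B, assuming the flag is written a single time at startup and read per request. The names `TRUST_LB_TLS_HEADERS` and `set_trust_lb_tls_headers` come from the description above; the bodies, and the `ja4_from_lb_header` helper that takes the header value as `Option<&str>`, are illustrative assumptions.

```rust
use std::sync::OnceLock;

static TRUST_LB_TLS_HEADERS: OnceLock<bool> = OnceLock::new();

/// Called once from startup wiring. Later calls are ignored because a
/// OnceLock keeps the first value it was given.
pub fn set_trust_lb_tls_headers(value: bool) {
    let _ = TRUST_LB_TLS_HEADERS.set(value);
}

/// An unset flag behaves like the documented default (false).
fn trust_lb_tls_headers() -> bool {
    TRUST_LB_TLS_HEADERS.get().copied().unwrap_or(false)
}

/// `header` stands in for the value of `x-client-tls-ja4`, if present.
/// Returns the string that would be placed into signal.ja4.
pub fn ja4_from_lb_header(header: Option<&str>) -> String {
    if trust_lb_tls_headers() {
        // Header missing: fall back to the empty string.
        header.unwrap_or("").to_string()
    } else {
        // Gate off: never trust a client-controllable header.
        String::new()
    }
}
```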

--- C. fp_uds_subscriber — cross-process FpObserved bridge over UDS

Lets a co-located synapse-agent (XDP, captures JA4 etc.) ship its

observations to a TLS-terminating synapse-proxy that has no XDP of

its own. Most plumbing already existed in synapse-eventbridge —

send_fp_event() had a FP_SOCKET_CLIENTS broadcast path but no

listener was registering with it. Two-line publisher fix + a new

subscriber module:

- synapse-eventbridge/src/event_server.rs — set_fp_socket_clients()

  call so the existing event server fans out FpObserved alongside

  HTTP and Packet.

- synapse-eventbridge/src/fp_uds_subscriber.rs (new) — proxy-side

  client: connect to UDS, read JSON-line SocketEvent::Fp envelopes,

  republish onto the local bus via send_fp_event. Reconnects on

  EPIPE / connect failure with 2 s backoff. Unix-only (Windows stub).

  2 unit tests: end-to-end pump + missing-socket retry safety.

- synapse-eventbridge/src/lib.rs — pub mod fp_uds_subscriber + a

  pub use synapse_events::{FpObserved, FpSource} re-export.

- synapse-eventbridge/Cargo.toml — tempfile dev-dep for the test.

- synapse-core/src/core/cli.rs — ProxyConfig.fp_event_bridge: { socket_path:

  String } (empty = subscriber disabled, default).

- synapse-app/src/lib.rs — startup spawns the subscriber thread when

  the socket_path is non-empty AND is_agent_mode is false, with a

  shutdown atomic tied to the main shutdown_rx watch.

- synapse-security/src/utils/fingerprint/tcp_fingerprint.rs — the BPF

  ClientHello-store path previously emitted nothing on the FpObserved

  bus (only synapse-app::kernel_pump did, and only for PacketEvent).

  Surfaced during edge-cluster deployment as a silent bridge. Added

  the missing send_fp_event() call right after the existing

  "JA4: stored eBPF-captured ClientHello fingerprint" log line so

  BPF-sourced captures are visible to cross-process subscribers.
…pin reuse

When BPF map shapes change between releases (e.g. the

AFXDP_FLOW_DENY_MAX shrink from 1048576 → 65536 in commit 7334be2),

libbpf reuse-by-name fails OpenSkel::load() with EINVAL because

kernel 6.12+ strictly validates pinned-map properties (max_entries,

map_flags, value_size) on reuse. The agent then silently falls back

to the XDP/BPF-only backend, AFXDP_FW_HANDLE stays None, and

`synapse_smart_firewall::ja4_reload::apply_ja4_kernel_blocks`

no-ops — kind: ja4 entries in smart_firewall_rules.block never

reach the kernel.

Surgical EINVAL recovery on `.load()`:

- synapse-security/src/utils/bpf_utils.rs (+38 LOC):

  `unpin_stale_pinned_maps(pin_root, names)` — best-effort

  remove_file for each name under pin_root; silent on NotFound,

  WARN on other I/O errors. Surgical name list (not

  remove_dir_all) is required because pin_root may host other

  pins that the same process owns and reuses (e.g.

  xdp_link_<iface>).

- synapse-app/src/lib.rs (refactor load_afxdp_tail_program):

  factor the builder→open→load triple into a closure so the retry

  can re-leak a fresh MaybeUninit<OpenObject> (libbpf consumes it

  per attempt). On EINVAL (string-match os error 22 or

  Invalid argument — libbpf-rs surfaces no structured kind),

  invoke the helper with TAIL_PIN_NAMES = [xsks_map, flow_deny,

  afxdp_intercept_stats] (the three LIBBPF_PIN_BY_NAME maps that

  xdp_afxdp_tail.bpf.c declares) and retry once. Any non-EINVAL

  error or a second EINVAL bubbles to the caller.
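
The recovery shape above can be sketched with the standard library alone. The name `unpin_stale_pinned_maps` is taken from the description; its return value, the `looks_like_einval` and `load_with_einval_retry` helpers, and the `Result<T, String>` error type are assumptions standing in for the libbpf-rs types, which are not reproduced here.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Best-effort removal of the named pins under `pin_root`.
/// Silent on NotFound, warns on other I/O errors. Returns how many were removed.
pub fn unpin_stale_pinned_maps(pin_root: &Path, names: &[&str]) -> usize {
    let mut removed = 0;
    for name in names {
        match fs::remove_file(pin_root.join(name)) {
            Ok(()) => removed += 1,
            Err(e) if e.kind() == ErrorKind::NotFound => {}
            Err(e) => eprintln!("WARN: failed to unpin {name}: {e}"),
        }
    }
    removed
}

/// String match, because the description says no structured kind is surfaced.
pub fn looks_like_einval(msg: &str) -> bool {
    msg.contains("os error 22") || msg.contains("Invalid argument")
}

/// Runs `load` once. On an EINVAL-looking error, unpins the named maps and
/// retries exactly once. Any other error, or a second failure, is returned.
pub fn load_with_einval_retry<T>(
    pin_root: &Path,
    names: &[&str],
    mut load: impl FnMut() -> Result<T, String>,
) -> Result<T, String> {
    match load() {
        Err(e) if looks_like_einval(&e) => {
            unpin_stale_pinned_maps(pin_root, names);
            load()
        }
        other => other,
    }
}
```

Because only the listed names are removed, an unrelated pin in the same directory (the xdp_link_<iface> case mentioned above) is left in place.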

Mirrors the B.4 retry shape in xdp_pipeline (lib.rs:1226-1260) but

diverges in cleanup mechanism: B.4 uses remove_dir_all, which is

safe THERE because B.4 runs before xdp_link_<iface> is pinned;

the tail loader runs AFTER, so a directory-wide wipe would orphan

the live XDP attachment. Unifying both call sites is a deferred

follow-up.

Verified live on the gcp-nlp-l4 edge cluster: pre-fix, the agent

logged `afxdp tail program load failed: ... (os error 22)`; after

manually removing the stale /sys/fs/bpf/synapse/firewall/flow_deny

pin AND enabling firewall.amygdala.enabled=true in config-agent.yaml,

agent logs `amygdala_bridge started (afxdp backend, iface=eth0)`,

`ja4 kernel blocks reloaded: +6 / -0 (total now 6)`. With this

commit, future map-shape changes self-heal without operator

`rm` intervention.
…les loaded

BPF tail program (xdp_afxdp_tail.bpf.c) used to unconditionally redirect

every TLS ClientHello to the userspace amygdala worker for JA4 inspection.

When no `kind: ja4` rules are loaded, BlockedJa4 is empty, so the worker

has no DROP verdict to issue; its default "allow" path bounces the frame

back out via AF_XDP TX. On a host-firewall topology (the packet's dst IP

is THIS host), that ships the frame back to the upstream gateway, which

black-holes it. Net result: every external TLS handshake silently times

out whenever the AF_XDP backend is running but the rule set is empty.

Verified live on the gcp-nlp-l4 edge cluster (a.g0s.dev): curl from

external Linux clients to :443 timed out for hours, while in-cluster

TLS to the same node:443 succeeded; scaling synapse-agent to 0 replicas

immediately restored end-to-end connectivity. The agent BPF stack was

the sole source of the drop, despite UC4/UC7 rule maps being empty.

Fix — a runtime kill-switch in the BPF tail:

- New `afxdp_intercept_cfg` ARRAY map (1 entry, pinned-by-name).

  Key 0 holds a u32 `enabled` flag.

- Tail program reads the flag AFTER the flow_deny LRU check (so

  residual blocked flows still drop during rule removal) and

  BEFORE the AF_XDP redirect. If the lookup misses OR returns 0,

  return XDP_PASS — packets flow into the local kernel stack

  unmolested.

- Default state: map empty → BPF lookup returns NULL → treated

  as enabled=0 → pass. Fresh agent boot is fail-safe.
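
The ordering of the checks for a TLS ClientHello can be modelled in userspace Rust as below. This is a model of the decision order only, not the BPF C in xdp_afxdp_tail.bpf.c; the `Verdict` enum and `tail_verdict` function are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Drop,
    Pass,
    RedirectToUserspace,
}

/// `flow_denied`: whether the flow_deny LRU lookup hit.
/// `intercept_flag`: result of looking up key 0 in afxdp_intercept_cfg
/// (None models a lookup miss, e.g. an empty map on cold boot).
pub fn tail_verdict(flow_denied: bool, intercept_flag: Option<u32>) -> Verdict {
    // 1. Residual blocked flows drop first, even while rules are being removed.
    if flow_denied {
        return Verdict::Drop;
    }
    // 2. Kill-switch: a miss or 0 means "no JA4 rules loaded", so pass.
    match intercept_flag {
        None | Some(0) => Verdict::Pass,
        // 3. Only an explicit non-zero flag sends the frame to the worker.
        Some(_) => Verdict::RedirectToUserspace,
    }
}
```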

Userspace plumbing (3 files):

- synapse-utils/src/hooks.rs (+35 LOC): new hook pair

  `set_afxdp_intercept_enabled_hook` + `invoke_afxdp_intercept_enabled(bool)`

  mirroring the existing ja4_kernel_block_reload hook pattern.

- synapse-smart-firewall/src/ja4_reload.rs: after each BlockedJa4

  diff applies, calls `invoke_afxdp_intercept_enabled(!wanted.is_empty())`

  so the flag tracks rule-set non-emptiness on every config reload.

  Plus +1 dep in Cargo.toml (synapse-utils).

- synapse-app/src/lib.rs (+37 LOC): registers the hook handler.

  Opens `/sys/fs/bpf/synapse/firewall/afxdp_intercept_cfg` via

  libbpf_rs::MapHandle::from_pinned_path and updates key 0. Quietly

  skips when the pin isn't available yet (early startup, before the

  AF_XDP tail program has loaded).

Fail-safe properties:

- Cold boot (no rules) → traffic flows.

- Tail program not loaded → hook no-ops cleanly.

- AF_XDP backend not built → no hook registered, no-op.

- Rule removal mid-flight → flow_deny check still drops residual

  blocked flows (it runs BEFORE the kill-switch).

- Stale enabled=1 after worker death → bpf_redirect_map falls back

  to its third-arg XDP_PASS when no AF_XDP socket is bound, so the

  redirect is safe in the absence of a userspace consumer.

Follow-up: same bug exists in amygdala's standalone-attach variant

(amygdala/src/bpf/afxdp_intercept/intercept.bpf.c — used when

afxdp_attach_mode != "shared"). Will land as a parallel commit in

amygdala.
After agent restart, the pinned `afxdp_intercept_cfg` BPF map
survives but the in-memory `previous` HashSet in apply_ja4_kernel_blocks
is fresh-empty. If the new YAML has no `kind: ja4` entries, the diff
comes out empty (wanted={} = previous={}) and the function returns
early WITHOUT updating the kill-switch flag, leaving a stale
enabled=1 from the previous generation. The BPF tail then redirects
ClientHellos to a worker whose BlockedJa4 set is also fresh-empty,
so every TLS handshake stalls.

Resync the kill-switch unconditionally before the early-return. The
map update is a single syscall writing a 4-byte value and is
idempotent, so it is cheap to always run.
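
A small model of the fix, assuming the diff is computed over sets of JA4 strings and the kill-switch is exposed as a callback. `apply_ja4_blocks_sketch` and its signature are hypothetical; only the ordering (resync first, early return second) reflects the change described above.

```rust
use std::collections::HashSet;

/// Returns (added, removed). `set_intercept_enabled` stands in for the hook
/// that writes key 0 of the pinned kill-switch map.
pub fn apply_ja4_blocks_sketch(
    previous: &mut HashSet<String>,
    wanted: &HashSet<String>,
    mut set_intercept_enabled: impl FnMut(bool),
) -> (usize, usize) {
    // Resync unconditionally: the pinned flag may be stale from an earlier
    // process generation even when the in-memory diff is empty.
    set_intercept_enabled(!wanted.is_empty());

    if *previous == *wanted {
        // Early return is now safe because the flag was already written.
        return (0, 0);
    }
    let added = wanted.difference(previous).count();
    let removed = previous.difference(wanted).count();
    *previous = wanted.clone();
    (added, removed)
}
```

Placing the flag write after the early return would reproduce the stall: both sets start empty after a restart, the function exits, and a stale enabled=1 stays in the map.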
…d-proxy topologies

When a synapse-proxy sits behind another synapse-proxy with a
GFE/L7 LB in between (client → edge synapse → Tier-2 LB → Tier-2
synapse), the Tier-2 LB injects X-Client-Tls-Ja4 carrying the
EDGE synapse's JA4 (the edge synapse is the TLS client the LB
terminated), not the original client's. Patch B as shipped read
X-Client-Tls-Ja4 unconditionally, so signal.ja4 at Tier-2 came out
as the edge's JA4, which is worthless for client identification.

Edge synapse-proxy already forwards the real client's JA4 via
X-JA4 / X-JA4-Raw headers when forward_fingerprints=true (set in
proxyhttp.rs:1733). Prefer those upstream-synapse-forwarded
headers over the LB-injected one. Both are still gated by the
trust_lb_tls_headers flag so directly-exposed proxies can't be
spoofed.

Also populate signal.ja4_raw_unsorted from X-JA4-Raw, which was
always empty before.
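
The header precedence can be sketched as a pure function. The `Ja4Signals` struct and `ja4_signals_from_headers` are hypothetical names, header values are passed as `Option<&str>` for simplicity, and treating an empty header as absent is an assumption rather than documented behaviour.

```rust
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Ja4Signals {
    pub ja4: String,
    pub ja4_raw_unsorted: String,
}

/// Treats missing and blank header values the same way (assumption).
fn non_empty(v: Option<&str>) -> Option<&str> {
    v.map(str::trim).filter(|s| !s.is_empty())
}

pub fn ja4_signals_from_headers(
    trust_lb_tls_headers: bool,
    x_ja4: Option<&str>,
    x_ja4_raw: Option<&str>,
    x_client_tls_ja4: Option<&str>,
) -> Ja4Signals {
    // Gate off: ignore every fingerprint header, since a client could forge them.
    if !trust_lb_tls_headers {
        return Ja4Signals::default();
    }
    Ja4Signals {
        // Prefer the edge-forwarded client JA4 over the LB-injected one.
        ja4: non_empty(x_ja4)
            .or(non_empty(x_client_tls_ja4))
            .unwrap_or("")
            .to_string(),
        ja4_raw_unsorted: non_empty(x_ja4_raw).unwrap_or("").to_string(),
    }
}
```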
The captcha-client init block (init_captcha_client +
start_cache_cleanup_task) lived inside the api_configured
branch. LOCAL MODE deployments (proxy.captcha.{site_key,
secret_key,jwt_secret} populated from a YAML file with no
API key) therefore had Turnstile/hCaptcha keys loaded but
the client never wired up: startup_complete logged
`captcha_client: false`, and any WAF rule with action:
captcha failed at request time.

Move the init below the if/else so it runs for both modes.
WafAction::Challenge was passing peer_addr.ip() to
validate_captcha_token + generate_captcha_token. Behind a
GFE LB (or any front-end that rotates peers) the L4 peer
changes per request, so the JWT issued on attempt 1 fails
the ip_address check on attempt 2: captcha loops forever
even when Turnstile solving succeeds and the token is marked
validated in Redis.

Use synapse_utils::xff::effective_client_ip against
self.config.trusted_proxies (same pattern Patch A uses for
signal.ip.src). The XFF first-hop is stable across GFE/NLB
rebalancing while still spoof-resistant for direct clients.
Three layered hardenings on the captcha bearer cookie:

1. JA4 binding. `CaptchaClaims.ja4_fingerprint` (already part of
   the struct, previously always None) is now populated at issue
   time from `_ctx.tls_fingerprint.ja4` with `X-Client-Tls-Ja4`
   fallback for LB-terminated TLS. `validate_token` compares the
   claim against the request-time JA4 and rejects on mismatch.
   Defeats replay-from-same-NAT-with-spoofed-UA attackers who
   can mimic IP+UA but not the victim's TLS stack signature.

2. Cookie Max-Age now tracks the JWT exp via the new
   `captcha_token_ttl_seconds()` helper, instead of hardcoding
   3600. Drops the default TTL from 2h (7200) to 10min (600) in
   the demo config, which shrinks the replay window without
   breaking typical sessions.

3. Opt-in one-shot mode via `proxy.captcha.one_shot: true`.
   After a successful cookie-bypass, `revoke_captcha_token` is
   called on the JTI, blacklisting it in Redis. The next request
   through the same `challenge`-action rule re-challenges. For
   sensitive paths only; off by default so normal session UX is
   unchanged.
Global proxy.captcha.one_shot is a single switch; ops need finer
control to keep normal session UX on most challenge-action rules
while making specific sensitive paths (password reset, payment
confirm) revoke-after-first-bypass.

WafResult grows a `one_shot: bool` field, parsed from the rule's
`config.oneShot` JSON at compile time (only honored when action
is Challenge; ignored elsewhere). proxyhttp.rs combines per-rule
+ global flags via OR so either knob can turn it on.
…imit field

Per-rule Challenge configs without a rateLimit field (e.g. just
{ "oneShot": true }) used to spam ERROR-level "rateLimit field not
found" logs at compile time. The rule still worked but the log
was noise. Pre-filter for the field before invoking from_json.
Mechanical cleanup so this branch is wellness-check-clean before
opening the PR:

  * cargo fmt --all — reformats 8 files touched earlier in the
    session (xff helpers, fp_uds_subscriber, AF_XDP killswitch,
    captcha JA4 binding, etc.). Whitespace + trailing-comma
    canonicalisation only; no behaviour change.

  * Replace two bare `.unwrap()` calls in test code that the
    repo bans via `clippy::disallowed_methods`:

      - synapse-eventbridge/src/fp_uds_subscriber.rs (test
        publisher) — `.unwrap()` on a hard-coded `1.2.3.4`
        IpAddr parse → `.expect("hard-coded literal IPv4
        parses")`.

      - synapse-utils/src/xff.rs (test fixture builder) —
        `.unwrap()` on `HeaderValue::from_str` of a test XFF
        literal → multi-line `.expect("test XFF literal must
        be valid header value")`.

  * Add the missing `one_shot: false` field to five `WafResult
    { … }` struct literals in
    synapse-worker/src/access_log.rs — a fall-out from the
    earlier `feat(waf): per-rule one_shot opt-in via
    rule.config.oneShot` commit which extended the struct shape
    but missed these five test-only construction sites.

Verified locally on Linux with CI-exact flags:

    cargo fmt -- --check                                           OK
    cargo clippy --locked --workspace --all-targets -- -D warnings OK
    cargo test --locked --workspace --lib                          OK
    (33 tests in synapse-worker — including the 5 access_log
    sites — all green)

Windows leg is covered by the matrix job (.github/workflows/
windows-build.yaml) once the branch is pushed.
…ling checkout

The `[patch.gen0sec]` block had four `dendrite{,-core,-linux,-windows}
= { path = "../dendrite/crates/<name>" }` entries left uncommented from
local source-edit work. CI runners check out only the synapse repo,
not the dendrite sibling, so cargo fails on PR #324 with:

    error: failed to load source for dependency `dendrite`
    Caused by: unable to update D:\a\synapse\dendrite\crates\dendrite
    Caused by: failed to read `…\dendrite\crates\dendrite\Cargo.toml`
    Caused by: The system cannot find the path specified.

(visible on the Windows MSI build job — step 7 `Build static exe
(no-default-features)`, run 26232111796). Same failure mode would
hit the Linux build/test jobs once they reached the cargo invocation.

Comment all four lines back out so the registry-version dendrite
0.1.2 resolves cleanly. Cargo.lock is refreshed at the same time to
drop any path-source references and pin firmly to the registry
revision. Local dev that needs the source-level dendrite edits
uncomments these lines temporarily — the comment block now spells
out the gotcha explicitly so future runs do not repeat it.

Verified locally with CI-exact flags after the change:

    cargo check --workspace                                        OK
    cargo clippy --locked --workspace --all-targets -- -D warnings OK
…ilds

The HashMap is populated + drained inside `#[cfg(all(unix, feature
= "bpf"))]` blocks. On the Linux runner those blocks compile so the
variable is live. On the Windows MSI / windows-build.yaml job the
cfg gate elides every reader, leaving the declaration unused →
`-D warnings` trips the windows-build clippy step at lib.rs:918.

Match the established pattern a few lines further down where
`bpf_startup_ok` (same shape: declared unconditionally, read only
under the cfg gate) carries `#[allow(unused_mut, unused_variables)]`.
Cheaper than wrapping the declaration in another cfg block — the
binding is needed when the cfg is on, and the allow keeps the diff
narrow.
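
A compact illustration of the pattern, assuming a binding that is only touched inside a cfg-gated block. The function `startup_sketch` and the variable names are hypothetical, and the `bpf` feature is not declared in this standalone sketch, so the gated block is compiled out here just as it is on the Windows job.

```rust
use std::collections::HashMap;

// The sketch declares no `bpf` feature, so newer compilers may flag the cfg name.
#[allow(unexpected_cfgs)]
pub fn startup_sketch() -> usize {
    // Declared unconditionally; written and read only under the cfg gate.
    // Without the allow, builds where the gate is off see an unused binding.
    #[allow(unused_mut, unused_variables)]
    let mut per_iface_state: HashMap<String, u32> = HashMap::new();

    #[allow(unused_mut)]
    let mut attached = 0usize;

    #[cfg(all(unix, feature = "bpf"))]
    {
        per_iface_state.insert("eth0".to_string(), 1);
        attached = per_iface_state.len();
    }

    attached
}
```

The alternative, gating the declaration itself, would need the same cfg expression repeated on the `let`, which is the wider diff the message argues against.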

Verified locally (Linux):
    cargo fmt -- --check                                           OK
    cargo clippy --locked --workspace --all-targets -- -D warnings OK

Windows fix to be verified on the next CI run.
The multi_variant_subscriber_writes_distinct_event_types test spawns
a tokio task that holds a `BufWriter` against a real on-disk file
and depends on the `tokio::time::interval(100ms)` flush tick firing
before the test reads the file back. Under `cargo miri test
--no-default-features` (the wellness-check.yaml miri job, run on
synapse-core / -events / -eventbridge / -log) the runtime/timer
combination + Miri's lack of faithful FS modelling means the
BufWriter is never flushed before the assertion runs, so the test
sees `lines == []` and fails:

    thread '…multi_variant_subscriber_writes_distinct_event_types'
    panicked at crates/synapse-core/src/logger/eventbridge_log.rs:185:9:
    expected a fingerprint line, got: []

Apply the same `#[cfg_attr(miri, ignore)]` gate the repo already
uses in 8 sites (`utils::http_client` for FFI-touching tests,
`core::cli` for serial_test-using tests). The test still runs as
part of the regular `cargo test --workspace --lib` job (verified
locally: passes in 0.25s).

The unit-test coverage of the broadcast → file pipeline is not lost
— `cargo test` covers the real path. Miri's value is on the pure
in-memory logic in this same module set (e.g. WAF rule plumbing,
JA4 parsers), which still runs.
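
For reference, the gate looks like this on a pair of tests. The `write_events` helper and both test names are hypothetical; the point is only that the in-memory test keeps running under Miri while the test that touches the real filesystem is skipped there and still runs under plain `cargo test`.

```rust
use std::io::{self, Write};

/// Writes one event per line and flushes. Generic over the sink so the same
/// logic can be exercised in memory or against a real file.
pub fn write_events<W: Write>(mut out: W, events: &[&str]) -> io::Result<()> {
    for e in events {
        writeln!(out, "{e}")?;
    }
    out.flush()
}

#[test]
fn in_memory_path_runs_under_miri() {
    let mut buf = Vec::new();
    write_events(&mut buf, &["fingerprint"]).expect("in-memory write");
    assert_eq!(buf, b"fingerprint\n");
}

#[test]
#[cfg_attr(miri, ignore)] // real filesystem access; skipped only under Miri
fn on_disk_path_is_skipped_under_miri() {
    let path = std::env::temp_dir().join("miri_gate_sketch.log");
    let file = std::fs::File::create(&path).expect("create temp file");
    write_events(file, &["fingerprint"]).expect("disk write");
    let text = std::fs::read_to_string(&path).expect("read back");
    assert_eq!(text, "fingerprint\n");
}
```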
…under miri

Second wave of the same class of Miri incompatibility — the first
thing `start_fp_uds_subscriber`'s background thread does is
`UnixStream::connect(&socket_path)`, which calls
`libc::socket(AF_UNIX, SOCK_STREAM, 0)`. Miri models only AF_INET
and AF_INET6 sockets and aborts:

    error: unsupported operation: socket: domain 0x1 is unsupported,
           only AF_INET and AF_INET6 are allowed.
       at fp_uds_subscriber.rs:88 (run_loop)
       spawned from fp_uds_subscriber.rs:67 (start_fp_uds_subscriber)
       thread name: fp-uds-sub

Apply the same `#[cfg_attr(miri, ignore)]` gate, matching the
convention now spanning four files in this PR (eventbridge_log,
utils::http_client, core::cli, and now fp_uds_subscriber).

The UDS bridge still has coverage via:
  * `cargo test --workspace --lib` (wellness-check unit-tests job)
  * The e2e-root job (real socket end-to-end with `socat`)
  * Live verification in the gcp-nlp-l4 demo cluster
    (30 envelopes/5 s observed against the agent socket)

The remaining Miri scope still exercises the pure-Rust eventbridge
broadcast plumbing + the JSON decode path in `run_loop`'s body
when called directly without the AF_UNIX dependency.
Picks up `fix(afxdp): kill-switch for standalone intercept
program — parity with synapse tail` (gen0sec/amygdala#5,
published as gen0sec/amygdala@v0.1.1 on 2026-05-21).

Pairs with the synapse-side AF_XDP kill-switch commits already on
this branch (2b3d329, a94ae0f, 7bff008): synapse owns the
`xdp_afxdp_tail` program, amygdala owns the standalone
`AttachedIntercept` program, both now `XDP_PASS` when no JA4 rules
are loaded so the kernel intercept does not stall traffic when
the userspace consumer has nothing to drain.

Touched files:
  * crates/synapse-idp/Cargo.toml             (Windows / NDIS path)
  * crates/synapse-smart-firewall/Cargo.toml  (Linux + Windows)
  * crates/synapse-security/Cargo.toml        (Reactor + JA4)
  * Cargo.toml                                (workspace-deps comment)
  * Cargo.lock                                (registry pin + checksum)

Verified locally with CI-exact flags:
    cargo fmt -- --check                                           OK
    cargo clippy --locked --workspace --all-targets -- -D warnings OK
    cargo test --locked --workspace --lib                          OK
This pub fn had zero callers anywhere in the workspace. The L3/L4
enforcement path (apply_rules / apply_rules_nftables /
apply_rules_iptables) only reads rule.block.{ips,country,asn} and
never consults rule.allow.*, so the helper was unreachable from
the kernel ban path.

Note: access_rules.allow remains a config schema field used by
the TUI event bus (synapse-app, terminal_client) for display
transport only. No L3/L4 allow-precedence semantics are implied.
Operators who need allow-precedence should rely on proxy-mode
WAF rules instead.
…es + iptables)

Before this change the agent topology silently accepted
access_rules.allow.* entries in config but never enforced them. The
XDP program, nftables chain, and iptables chain only consulted
access_rules.block.*; an IP in only the allow list got no protection
from a runtime ban, and an IP in both allow and block was blocked.
Operators expecting allow-precedence (e.g. "always let our corp VPN
through even if threat-intel flags an IP in that range") had no way
to express it in agent mode — only the proxy / WAF topology had a
working allow path (wirefilter L7 rules).

This wires allow as a true override at every kernel-side layer:

XDP path
- New IPv4 / IPv6 allow-list LPM-trie maps in xdp_maps.h.
  Same pinning + size as the existing banned_ips maps so they
  survive a process restart and grow lazily.
- The XDP program now checks allowed_ips BEFORE banned_ips: a hit
  short-circuits to XDP_PASS, defeating any banned-ip entry from
  config or from amygdala / WAF runtime ban.
- SYNAPSEFirewall::{allow_ip,unallow_ip,allow_ipv6,unallow_ipv6}
  push deltas to the LPM maps;  mirror
   for the pinned-map reconcile on
  startup.

nftables path
-  sets created in the synapse table, with
   rules inserted before the existing drop rules in
  . nft evaluates rules in order so accept wins.

iptables path
- Allow rules go in as  at position 1 of
  SYNAPSE_BLOCK so they run before any DROP rule and the packet
  falls back to INPUT for the rest of the host policy.

apply_rules (every backend) now parses access_rules.allow.{ips,
country,asn} alongside block, diffs against per-process previous
state, and pushes the deltas via the new Firewall-trait methods.
Allow state is reconciled against the live kernel map once per
process (same crash-survivability pattern as block).

Userspace predicate updated:
-  is restored (the dead-function
  removal from the previous commit is reverted — it stops being
  dead the moment we wire it).
-  checks allow FIRST, so a
  runtime ban (DashSet) on an allow-listed IP also reports
  not-blocked.
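
The allow-first ordering of the userspace predicate can be sketched as below. `AccessState` and `is_blocked` are hypothetical names, matching is reduced to exact IPs (the real rules also cover CIDR, country and ASN), and plain `HashSet`s stand in for the concurrent set mentioned above.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

#[derive(Default)]
pub struct AccessState {
    pub allow: HashSet<IpAddr>,
    pub block: HashSet<IpAddr>,
    /// Runtime bans, e.g. pushed by amygdala or the WAF.
    pub runtime_bans: HashSet<IpAddr>,
}

impl AccessState {
    /// Allow is checked FIRST, so it overrides both config blocks and
    /// runtime bans for the same address.
    pub fn is_blocked(&self, ip: IpAddr) -> bool {
        if self.allow.contains(&ip) {
            return false;
        }
        self.block.contains(&ip) || self.runtime_bans.contains(&ip)
    }
}
```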

Tests (synapse-access-rules, 8 new):
- direct IPv4 match, CIDR match, country list, ASN list
- allow overrides block at same IP
- allow beats runtime blocklist
- block-only IP is still blocked
- IPv6 CIDR allow overrides IPv6 block
Serialized with serial_test because they mutate the
process-global Config singleton.

Wellness check (matches .github/workflows/wellness-check.yaml) all
green locally: fmt, machete, clippy --workspace --all-targets
(default + classifier feature), cargo doc with -D warnings, and
cargo test --workspace --lib.
One-shot example binary that wires Config -> Logs SDK -> OTLP/HTTP
the same way Synapse does in production and emits five synthetic
BlockEvents covering every (layer, action) pair the block-counter
buckets on. Used to validate the full Synapse -> telemetry-api ->
kafka -> workflow chain locally.

Run from the repo root:

  cargo run --example e2e_emit \
    --manifest-path crates/synapse-telemetry/Cargo.toml -- \
    <telemetry-api-url> <api-key>

The example uses block_on + tokio::time::sleep so the
BatchLogProcessor's flush task actually runs before the runtime
tears down (std::thread::sleep on a current-thread runtime would
block the executor and silently drop the batch).
Until now the OTLP Logs and Metrics exporters reported themselves to
the platform as 'OTel OTLP Exporter Rust/<otel-version>' — the
opentelemetry-otlp default. The worker's config-poll client (built
via synapse_core::utils::http_client) already sends
'Synapse/<CARGO_PKG_VERSION>'. Two outbound clients, two agent
identities — a paper cut in access-log correlation and an awkward
extra entry for any allowlist the platform side maintains.

Unified: a new crate::otlp::apply_synapse_headers helper sets
User-Agent to synapse_user_agent() ('Synapse/<version>') on every
OTLP exporter builder before threading the caller's headers
(typically Authorization) through. logs_sdk::init and metrics::init
both go through it. A caller-supplied User-Agent in OtlpConfig.headers
still overrides — useful when a tenant routes through a proxy that
requires a specific string.

Workaround for an upstream bug: opentelemetry-otlp 0.27's
WithHttpConfig::with_headers uses Option::iter_mut().zip(headers)
where the Option has at most 1 element. The zip yields exactly one
pair regardless of how many entries the caller supplies, so every
header after the first is silently dropped. apply_synapse_headers
walks around that by calling with_headers once per single-entry
HashMap, so each call actually inserts. Re-evaluate when we move
past 0.27.
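
The zip behaviour is easy to reproduce with the standard library. The two functions below mimic the pattern as described in this message; they are not copied from opentelemetry-otlp, and the names `insert_via_zip` and `insert_one_at_a_time` are hypothetical.

```rust
use std::collections::HashMap;

/// Mimics the described pattern: the Option iterator yields at most one
/// `&mut HashMap`, so zipping it with the headers stops after one pair.
pub fn insert_via_zip(
    target: &mut Option<HashMap<String, String>>,
    headers: HashMap<String, String>,
) {
    for (map, (k, v)) in target.iter_mut().zip(headers) {
        map.insert(k, v);
    }
}

/// Workaround shape: call the same routine once per single-entry map so
/// every header lands.
pub fn insert_one_at_a_time(
    target: &mut Option<HashMap<String, String>>,
    headers: HashMap<String, String>,
) {
    for (k, v) in headers {
        let mut single = HashMap::new();
        single.insert(k, v);
        insert_via_zip(target, single);
    }
}
```

Passing User-Agent and Authorization together through the first function keeps only one of them (which one depends on HashMap iteration order), while the second keeps both.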

Verified end-to-end on the local stack: the
synapse-telemetry::e2e_emit example POSTs to telemetry-api, the
access log now reports user_agent='Synapse/0.1.0' (was 'OTel OTLP
Exporter Rust/0.27.0'), Authorization is still validated, the
event reaches Kafka, and the workflow-service block counter writes
the bucket with every dimension.