Conversation
Merkle verification never tied the 16 candidate pub_keys to the pool midpoint's actual close group on the network. An attacker could generate 16 ML-DSA keypairs locally, point every reward_address at a self-owned wallet, fund the on-chain merkle payment, and receive their own rewards back.

Add a closeness defence: for each MerklePaymentProof, the verifier queries the DHT for the closest peers to the winner_pool's midpoint address and requires a majority of the 16 candidate pub_keys to hash to PeerIds that appear in that returned set. The check hooks in between the cheap signature check and the on-chain RPC, so a forged pool is rejected without paying for an on-chain round-trip. Every storing node that receives the proof runs the same check against the single winner_pool.

PaymentVerifier grows an optional Arc<P2PNode>, attached at startup via attach_p2p_node; node.rs, devnet.rs, and the e2e testnet harness all wire it after P2PNode construction. Unit tests that don't exercise merkle verification keep a None handle and log a warning.

New e2e test test_attack_merkle_pay_yourself_fabricated_pool fails on main (the verifier accepts 16 attacker-generated keys that all reward the payer) and passes with the fix.
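A minimal sketch of the membership count at the heart of that check. Everything here is illustrative: the real verifier works with the project's PeerId and DHT types rather than raw byte arrays, and the threshold is supplied by the caller.

```rust
use std::collections::HashSet;

/// Stand-in for a PeerId: the real check derives PeerIds by hashing the
/// 16 ML-DSA candidate pub_keys carried in the MerklePaymentProof.
type PeerIdBytes = [u8; 32];

/// `network_close_peers` is what the DHT returned for the winner_pool's
/// midpoint address; `candidate_peer_ids` are derived from the candidate
/// pub_keys. The pool only passes if enough candidates are peers the
/// network itself places in that neighbourhood.
fn candidates_are_close(
    network_close_peers: &HashSet<PeerIdBytes>,
    candidate_peer_ids: &[PeerIdBytes; 16],
    required_matches: usize,
) -> bool {
    let matched = candidate_peer_ids
        .iter()
        .filter(|id| network_close_peers.contains(*id))
        .count();
    matched >= required_matches
}
```

An attacker who fabricates all 16 keys has no influence over what the DHT returns for the midpoint, so the count stays near zero unless the attacker also occupies that neighbourhood (the Sybil caveat documented further down).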
Security review flagged three issues with the initial closeness check:
1. **Fail-open on missing P2PNode** was unsafe in production: any startup
path that forgot to call attach_p2p_node would silently disable the
defence and only log a warning. Fail CLOSED in release builds — the
warn->error and Ok->Err flip surfaces the wiring bug as a rejected
PUT rather than a silent accept. Test-utils builds keep fail-open
so unit tests don't need a real DHT.
2. **9/16 majority was too permissive**: an attacker controlling 9
   neighbourhood peers could fabricate the remaining 7 and still pass.
   Raise to 13/16 — that tolerates normal routing-table skew (1-3 peers)
   while capping fabricated candidates at 3.
3. **Per-chunk Kademlia lookups were a DoS amplifier**: a 256-chunk
batch sharing one winner_pool would trigger 256 iterative lookups,
and N concurrent forged-pool PUTs for the same hash would trigger N
parallel lookups. Add:
- closeness_pass_cache: pool_hash -> () so within-batch repeats
and cross-connection replays are free after the first check.
- inflight_closeness: pool_hash -> Notify so concurrent PUTs for
the same pool coalesce to a single lookup behind a drop-guarded
waker.
Also bump the per-lookup timeout from 10s to 60s. Iterative Kademlia
lookups with dial cascades of 20-30s per unresponsive peer were timing
out under normal churn at 10s — false-rejecting honest proofs. 60s
keeps DoS bounded while being wide enough for real-world convergence.
Document the known Sybil-grinding limitation: midpoint_proof.address()
is BLAKE3 of attacker-controllable inputs, so an attacker who also
runs Sybil DHT nodes can grind it to land in their own neighbourhood.
Closing that gap needs a Sybil-resistance layer or an
attacker-uncontrolled midpoint binding (on-chain VRF / block hash).
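A condensed sketch of how the fail-open/fail-closed flip and the thresholds above might look. The constant names follow the review text (CANDIDATE_CLOSENESS_REQUIRED also appears in the review comments further down); the gate function, its signature, and the use of cfg! instead of #[cfg] blocks are illustrative simplifications, not the verifier's actual API.

```rust
/// 13 of the 16 candidate pub_keys must map onto peers the DHT itself
/// returned for the pool midpoint.
const CANDIDATE_CLOSENESS_REQUIRED: usize = 13;

/// Per-lookup budget: wide enough for iterative Kademlia convergence under
/// churn, narrow enough to keep forged-pool DoS bounded.
const CLOSENESS_LOOKUP_TIMEOUT_SECS: u64 = 60;

/// `dht` is whatever attach_p2p_node installed; None means the wiring step
/// was skipped somewhere in startup.
fn closeness_gate<T>(dht: Option<T>) -> Result<Option<T>, String> {
    match dht {
        Some(handle) => Ok(Some(handle)),
        // Test-utils / unit-test builds keep the old fail-open behaviour:
        // skip the check and warn, so tests don't need a live DHT.
        None if cfg!(any(test, feature = "test-utils")) => {
            eprintln!("warning: no P2PNode attached; skipping merkle closeness check");
            Ok(None)
        }
        // Release builds fail CLOSED: a forgotten attach_p2p_node call shows
        // up as a rejected PUT instead of a silently disabled defence.
        None => Err("merkle closeness defence is not wired (no P2PNode attached)".into()),
    }
}
```

The two caches from item 3 (closeness_pass_cache and inflight_closeness) sit behind this gate; their single-flight slot is sketched after the next review note.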
… single-flight

Second concurrency review flagged three real problems with the single-flight closeness check:

1. **Lost-wakeup race**. `notify.notified().await` creates the Notified future AFTER the inflight lock is released. If the leader calls `notify_waiters()` between the waiter's lock drop and the future's first poll, the notification is lost and the waiter parks forever — there was no outer timeout around the wait. Fixed by switching to `Arc<Notify>::notified_owned()`, which snapshots the `notify_waiters` counter at call time; the counter check on the first poll resolves the future immediately even if the leader already fired.

2. **LRU eviction orphans waiters**. When `DEFAULT_POOL_CACHE_CAPACITY` (1000) unique pool_hashes are in flight, LruCache evicts the oldest `Notify` entry. The leader's drop guard did `slot.pop(&pool_hash)`, which, for an evicted entry, popped nothing and called `notify_waiters` on — nothing. Worse, a subsequent leader for the same pool_hash inserts a fresh `Notify`, which the original leader then erroneously pops. Both failure modes orphan waiters. Fixed by having the leader own an `Arc<ClosenessSlot>` independent of the cache and calling `notify_waiters` on THAT; the pop only runs if `Arc::ptr_eq` confirms the cache still holds our slot.

3. **Failure-path waiters each re-run the lookup**. On leader failure the pass cache wasn't written, waiters woke, saw a cache miss, and all became new leaders running their own Kademlia lookups. That defeats the documented DoS bound of "N forged PUTs cost at most one lookup". Fixed by sharing the leader's `Result<(), String>` through an `Arc<ClosenessSlot>` with a `OnceLock`; waiters read the stored result directly and skip re-verification on both success AND failure.

Other changes:
- Loud startup error log if a binary is accidentally built with the `test-utils` feature, since that flips the closeness defence to fail-open when P2PNode isn't attached.
- Two new unit tests exercise the cache short-circuit and the concurrent-waiter single-flight path.
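A minimal illustration of the slot those three fixes converge on, using only std and tokio primitives. ClosenessSlot here is a stand-in, not the real struct: the Arc::ptr_eq-guarded cache pop is only described in a comment, and the waiter uses a plain notified()-then-recheck ordering, which gives the same lost-wakeup guarantee as the notified_owned() call the commit describes.

```rust
use std::sync::{Arc, OnceLock};
use tokio::sync::Notify;

/// Illustrative single-flight slot: the leader publishes exactly one result
/// and every waiter reads it, on success AND on failure, so a pool_hash
/// costs at most one Kademlia lookup regardless of concurrency.
#[derive(Default)]
struct ClosenessSlot {
    result: OnceLock<Result<(), String>>,
    notify: Notify,
}

impl ClosenessSlot {
    /// Leader side: store the outcome, then wake every waiter parked on THIS
    /// slot. The leader holds its own Arc to the slot, so this still works if
    /// the LRU map has evicted or replaced the entry in the meantime; the
    /// real code additionally pops the cache entry only when Arc::ptr_eq
    /// confirms the cached slot is still ours.
    fn publish(&self, outcome: Result<(), String>) {
        let _ = self.result.set(outcome);
        self.notify.notify_waiters();
    }

    /// Waiter side: create the Notified future FIRST, then re-check the
    /// published result, then await. A publish landing between the re-check
    /// and the await still wakes us, because the future already existed when
    /// notify_waiters() ran. None means the leader went away without
    /// publishing (cancellation); the caller decides whether to retry.
    async fn wait(&self) -> Option<Result<(), String>> {
        let parked = self.notify.notified();
        if let Some(early) = self.result.get() {
            return Some(early.clone());
        }
        parked.await;
        self.result.get().cloned()
    }
}
```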
Third review pass found two remaining issues:

1. **Unbounded retry loop on leader cancellation cascade**. The loop retries forever if every leader gets cancelled before publishing. Add `MAX_LEADER_RETRIES = 4` so waiters return a visible error rather than spinning through indefinite cancellation cycles.

2. **Single-flight test didn't actually exercise the waiter path**. The existing concurrent-readers test relies on timing and might run both calls serially through the leader path. Add `closeness_waiter_reads_leaders_published_failure`, which seeds the inflight slot, spawns a waiter task, yields until the waiter is parked on `notified_owned().await`, then externally publishes a failure + notifies — proving the waiter reads the leader's published result directly without running its own inner check.
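The retry cap itself is just a bounded loop around the leader-or-waiter choice; the subtler part is making the waiter path deterministic in a test. A sketch of that shape, written against the illustrative ClosenessSlot above rather than the verifier's real inflight map and helpers:

```rust
/// Illustrative cap on how many times a caller will re-enter the
/// leader-election loop after a leader is cancelled without publishing.
const MAX_LEADER_RETRIES: usize = 4;

#[tokio::test]
async fn waiter_reads_externally_published_failure_sketch() {
    use std::sync::Arc;

    // Park a waiter on a seeded slot, then publish a failure from "outside"
    // and check the waiter reports that failure without running any lookup
    // of its own.
    let slot = Arc::new(ClosenessSlot::default());

    let waiter = {
        let slot = Arc::clone(&slot);
        tokio::spawn(async move { slot.wait().await })
    };
    // On the default current-thread test runtime this lets the waiter run up
    // to its notified().await before we publish. Even if it didn't, the
    // result re-check inside wait() makes the outcome the same.
    tokio::task::yield_now().await;

    slot.publish(Err("lookup failed".to_string()));

    let seen = waiter.await.expect("waiter task panicked");
    assert_eq!(seen, Some(Err("lookup failed".to_string())));
}
```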
Point saorsa-core at the saorsa-labs/saorsa-core rc-2026.4.2 branch and refresh Cargo.lock to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(payment): block merkle pay-yourself via DHT closeness check
The previous bump set the version to 0.24.0-rc.1 by mistake; this restores the intended 0.11.0 RC series for rc-2026.4.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pick up the latest saorsa-core commit on the rc-2026.4.2 branch and refresh Cargo.lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch saorsa-core from the RC branch git reference back to the published 0.24.0 release on crates.io (which pulls in the published saorsa-transport 0.33.0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Release promotion to ant-node 0.11.0: this PR switches saorsa-core back to the crates.io 0.24.0 release, refreshes the lockfile, and adds a merkle-payment “pay-yourself” defence that verifies candidate closeness against the live DHT, wiring P2PNode into the verifier across the runtime startup paths.
Changes:
- Bump ant-node to 0.11.0, update saorsa-core to 0.24.0 (crates.io), and refresh Cargo.lock.
- Add merkle candidate “closeness” verification in PaymentVerifier with caching/single-flight to reduce lookup cost.
- Wire P2PNode into the payment verifier in production/devnet/e2e startup and add an e2e test for the fabricated-candidate-pool (“pay-yourself”) attack.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/payment/verifier.rs | Implements DHT-based merkle candidate closeness checks plus caching/single-flight and associated unit tests. |
| src/node.rs | Attaches the live P2PNode to the payment verifier during node startup. |
| src/devnet.rs | Attaches P2PNode to the payment verifier for devnet nodes. |
| tests/e2e/testnet.rs | Attaches P2PNode to the payment verifier in the e2e test harness startup. |
| tests/e2e/merkle_payment.rs | Adds an e2e attack test for fabricated merkle candidate pools (“pay-yourself”). |
| Cargo.toml | Bumps the crate version and switches saorsa-core from a git dependency to 0.24.0 on crates.io. |
| Cargo.lock | Lockfile refresh consistent with the dependency/version updates. |
    // list counts as unmatched.
    let network_set: std::collections::HashSet<PeerId> =
        network_peers.iter().map(|n| n.peer_id).collect();
    let matched = candidate_peer_ids
The closeness match count can be artificially inflated by duplicate candidates. matched is computed by iterating all candidate PeerIds and counting set membership, but it does not enforce that candidate pub_key/PeerId values are unique. An attacker who runs (or grinds) a single real close peer could repeat that same pub_key across many of the 16 candidates and satisfy the CANDIDATE_CLOSENESS_REQUIRED threshold without having 13 distinct close peers. Consider rejecting pools with duplicate derived PeerIds and/or counting unique matches (e.g., intersect the set of candidate PeerIds with the network set).
Suggested change (replace the lines above with):

    // list counts as unmatched. Candidate pub_keys / PeerIds must also be
    // unique, otherwise duplicate entries could artificially inflate the
    // closeness match count.
    let network_set: std::collections::HashSet<PeerId> =
        network_peers.iter().map(|n| n.peer_id).collect();
    let candidate_set: std::collections::HashSet<PeerId> =
        candidate_peer_ids.iter().copied().collect();
    if candidate_set.len() != candidate_peer_ids.len() {
        debug!(
            "Merkle closeness rejected: duplicate candidate PeerIds for pool midpoint {}",
            hex::encode(pool_address.0),
        );
        return Err(Error::Payment(
            "Merkle candidate pool rejected: candidate pub_keys must derive to distinct \
             PeerIds. Duplicate candidates are not allowed."
                .into(),
        ));
    }
    let matched = candidate_set
    /// In-flight closeness lookups, keyed by pool hash. Lets concurrent PUTs
    /// for the same pool coalesce onto a single Kademlia lookup AND share
    /// its result — on both success and failure — which bounds `DoS`
    /// amplification to one lookup per unique `pool_hash` regardless of
    /// concurrency.
    inflight_closeness: Mutex<LruCache<PoolHash, Arc<ClosenessSlot>>>,
inflight_closeness is stored in an LruCache, which can evict entries under pressure. If an inflight slot for a given pool_hash is evicted before the leader finishes, new callers won’t find it and will start a second lookup for the same pool_hash, breaking the single-flight/DoS-amplification guarantee described in the comment. Using a non-evicting map for inflight state (and removing on completion), or a dedicated structure sized by actual concurrency, would preserve single-flight semantics for a pool_hash.
Suggested change (replace the lines above with):

    /// In-flight closeness lookups, keyed by pool hash. This must be a
    /// non-evicting map so an active leader's slot cannot disappear before it
    /// publishes its result; entries are removed explicitly on completion.
    /// Lets concurrent PUTs for the same pool coalesce onto a single Kademlia
    /// lookup AND share its result — on both success and failure — which
    /// bounds `DoS` amplification to one lookup per unique `pool_hash`
    /// regardless of concurrency.
    inflight_closeness:
        Mutex<std::collections::HashMap<PoolHash, Arc<ClosenessSlot>>>,
    async fn closeness_single_flight_concurrent_readers_share_one_verification() {
        // Two concurrent callers for the same pool_hash should produce the
        // same outcome, and the cache should end up populated exactly once.
        // We use the test-utils fail-open path to short-circuit the inner
        // DHT lookup; the purpose of this test is the single-flight
        // plumbing, not the lookup itself.
        let verifier = Arc::new(create_test_verifier());
        let pool_hash = [0x77u8; 32];
        let pool = MerklePaymentCandidatePool {
            midpoint_proof: fake_midpoint_proof(),
            candidate_nodes: make_candidate_nodes(1_700_000_000),
        };

        let v1 = Arc::clone(&verifier);
        let p1 = pool.clone();
        let v2 = Arc::clone(&verifier);
        let p2 = pool.clone();

        let (r1, r2) = tokio::join!(
            async move { v1.verify_merkle_candidate_closeness(&p1, pool_hash).await },
            async move { v2.verify_merkle_candidate_closeness(&p2, pool_hash).await },
        );

        assert_eq!(r1.is_ok(), r2.is_ok(), "concurrent callers must agree");
        assert!(
            r1.is_ok(),
            "both callers must succeed on the test-utils path"
        );
This test is intended to validate the single-flight behavior under concurrency, but on the current code path the inner check can complete immediately (fail-open under cfg(test) when no P2PNode is attached). In that case tokio::join! may not actually overlap the two calls, so the test can pass even if coalescing is broken. Consider adding an explicit synchronization point (e.g., barrier/notify) so one task becomes the leader and blocks before publishing, while the second task is forced into the waiter path and must observe the leader’s published result.
    // 4-leaf tree → depth 2 (log2(4)). Prices are all 1024 in
    // build_candidate_nodes, so per-node payment = 1024 * 2^2 / 2 = 2048.
    let depth: u8 = 2;
    let per_node = Amount::from(4096u64);
The comment says the per-node payment should be 1024 * 2^2 / 2 = 2048, but per_node is set to 4096. Overpaying may still make the proof pass (since verification checks >= expected_per_node), but the mismatch makes the test harder to reason about and can hide regressions in the payment formula. Consider either setting per_node to 2048 or updating the comment to match the intended overpayment.
Suggested change (replace the last line above with):

    let per_node = Amount::from(2048u64);
Summary
- Promote rc-2026.4.2 to main
- Bump ant-node from 0.11.0-rc.2 to 0.11.0
- Switch saorsa-core from the RC branch git dependency back to the published crates.io 0.24.0 release (which pulls in saorsa-transport 0.33.0)
- Refresh Cargo.lock

🤖 Generated with Claude Code