revert: "fix: avoid improper dual way connection attempts" + tests #7289
This reverts a57a811 from PR dashpay#6967.

`relayMembers` and `connections` in EnsureQuorumConnections are not interchangeable.

* `connections`: who this node should connect TO. For each pair (A, B) only the deterministic-outbound side is listed, so the pair results in one TCP connection.
* `relayMembers`: who this node should ask to push recovered sigs. For every already-connected MN in the set we send QSENDRECSIGS=true, and the peer then flips m_wants_recsigs=true on its side. For the handshake to happen in both directions, the set must list every other quorum member -- not just the outbound half.

After a57a811, only the outbound half is listed, so on the inbound half of each pair m_wants_recsigs stays false. RelayRecoveredSig only pushes QSIGREC to peers with m_wants_recsigs=true, so half of all proactive recovered-sig pushes are silently dropped.

This only triggers with spork21 on (IsAllMembersConnectedEnabled returns true). In this case both the path that uses the half-mesh outbound subset and the path that relies on proactive QSIGREC are active.

This fixes the functional test `feature_llmq_signing.py --spork21`, which times out in wait_for_sigs ~60% of the time while the non-spork21 variant passes on the same CI job.
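The effect described above can be illustrated with a toy model. This is only a sketch and not the code in `src/llmq/utils.cpp`: members are plain indices, `CountPushLinks` is a hypothetical helper, and the rule "the lower index is the outbound side" is an assumption standing in for the real deterministic selection. It counts the directed links on which a node would proactively push a recovered sig, that is, links where the peer has announced QSENDRECSIGS.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// (holder, sender): `holder` has flagged `sender` as wanting recovered sigs,
// so `holder` will push QSIGREC to `sender`.
using Link = std::pair<std::size_t, std::size_t>;

// Toy model of the QSENDRECSIGS handshake in a fully connected quorum of n members.
// ASSUMPTION: for a pair (a, b) the lower index is the outbound side.
// relay_all_members == true  models relayMembers = every other member.
// relay_all_members == false models relayMembers = outbound-only connections.
std::size_t CountPushLinks(std::size_t n, bool relay_all_members)
{
    std::set<Link> wants_recsigs;
    for (std::size_t me = 0; me < n; ++me) {
        for (std::size_t peer = 0; peer < n; ++peer) {
            if (peer == me) continue;
            // Every pair shares one TCP connection, so the peer is reachable
            // either way; only relay-set membership decides whether we announce.
            const bool is_outbound = me < peer;
            const bool in_relay_set = relay_all_members || is_outbound;
            if (in_relay_set) {
                // `me` sends QSENDRECSIGS=true; `peer` flips its flag for `me`.
                wants_recsigs.emplace(peer, me);
            }
        }
    }
    return wants_recsigs.size();
}
```

Under these assumptions, `CountPushLinks(4, true)` returns 12 (every directed link is usable) and `CountPushLinks(4, false)` returns 6, which matches the "half of all proactive pushes" figure in the description.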
thepastaclaw
left a comment
Code Review
Clean revert restoring the distinction between outbound connections and full-mesh relayMembers in EnsureQuorumConnections, fixing intermittent feature_llmq_signing.py --spork21 failures. No correctness regressions identified. Two non-blocking suggestions about test coverage and documentation.
Reviewed commit: 0e33aa2
🟡 1 suggestion(s) | 💬 1 nitpick(s)
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/llmq/utils.cpp`:
- [SUGGESTION] lines 803-826: No regression guard against re-introducing the conflation bug
This revert fixes an intermittent failure in `feature_llmq_signing.py --spork21` by restoring the relayMembers logic, but adds no targeted assertion that would catch the same regression in the future. The bug is silent — signatures still arrive eventually via fallback paths, just slowly enough to flake the test — so the existing functional test only catches it ~60% of the time per the commit message. A more direct check (e.g. that `m_wants_recsigs` is set on both sides of every quorum pair, or that the relay-members set is strictly a superset of the outbound-only connections set when spork21 is active) would prevent the same simplification from being merged again. The original PR #6967 was merged precisely because the existing tests didn't reliably catch the conflation.
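One possible shape for the set-level check suggested above, written as a standalone sketch. The names `HashSet`, `RelayCoversConnections` and `RelayIsFullMesh` are hypothetical, and `std::string` stands in for the proTxHash type; nothing here is taken from the Dash Core code base.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>

// Stand-in for Uint256HashSet; proTxHashes are modelled as strings (assumption).
using HashSet = std::unordered_set<std::string>;

// True when every outbound connection target is also a relay member.
bool RelayCoversConnections(const HashSet& connections, const HashSet& relay_members)
{
    return std::all_of(connections.begin(), connections.end(),
                       [&](const std::string& h) { return relay_members.count(h) > 0; });
}

// True when relay_members is exactly "all quorum members except ourselves",
// which is what the all-members-connected branch is expected to produce.
bool RelayIsFullMesh(const HashSet& members, const std::string& my_hash, const HashSet& relay_members)
{
    if (relay_members.count(my_hash) > 0) return false;
    for (const auto& m : members) {
        if (m != my_hash && relay_members.count(m) == 0) return false;
    }
    // Equal sizes (after discounting ourselves) rule out extra, non-member entries.
    const std::size_t self = members.count(my_hash) > 0 ? 1 : 0;
    return relay_members.size() + self == members.size();
}
```

Note that a plain superset test is not enough on its own: if `relayMembers` is set equal to `connections`, `RelayCoversConnections` still returns true. In this sketch it is `RelayIsFullMesh` that fails for the conflated case, so that is the check that would act as the regression guard.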
```cpp
Uint256HashSet connections;
Uint256HashSet relayMembers;
if (isMember) {
    connections = GetQuorumConnections(llmqParams, sporkman, util_params, myProTxHash, /*onlyOutbound=*/true);
    // If all-members-connected is enabled for this quorum type, leverage the full-mesh
    // connections for low-latency recovered sig propagation by treating all members as
    // relay members (instead of the ring-based subset). This ensures peers will send
    // QSENDRECSIGS to each other across the full mesh and set m_wants_recsigs widely.
    if (IsAllMembersConnectedEnabled(llmqParams.type, sporkman)) {
        for (const auto& dmn : members) {
            if (dmn->proTxHash != myProTxHash) {
                relayMembers.emplace(dmn->proTxHash);
            }
        }
    } else {
        relayMembers = GetQuorumRelayMembers(llmqParams, util_params, myProTxHash, true);
    }
} else {
    auto cindexes = CalcDeterministicWatchConnections(llmqParams.type, util_params.m_base_index, members.size(), 1);
    for (auto idx : cindexes) {
        connections.emplace(members[idx]->proTxHash);
    }
    relayMembers = connections;
}
```
💬 Nitpick: Document why connections and relayMembers must differ
The detailed rationale for keeping connections (outbound-only half-mesh) separate from relayMembers (every other quorum member, used to drive QSENDRECSIGS handshakes bidirectionally) lives only in the revert's commit message. The existing comment at lines 807-810 explains the all-members-connected branch but not the broader invariant. A short in-source comment near line 803 noting that the two sets are intentionally distinct — and why conflating them breaks recovered-sig propagation on the inbound side of each pair — would make the trap visible to future readers and reduce the chance of the same simplification being attempted again.
If relayMembers gets conflated with the outbound-only connections set, one direction of each MN-MN pair never sends QSENDRECSIGS, the inbound side keeps m_wants_recsigs=false, and half of all proactive QSIGREC pushes are silently dropped, which may be visible only as slow or flaky signing.
thepastaclaw
left a comment
Code Review
Clean revert of a problematic change in EnsureQuorumConnections, restoring the correct distinction between outbound-only connections and full-mesh relayMembers. The new assert_qsendrecsigs_symmetric functional test provides a deterministic regression guard for the bidirectional QSENDRECSIGS handshake. No remaining correctness issues.
Reviewed commit: c1cdb75
thepastaclaw
left a comment
No blockers from me on the current head; remaining comments are advisory suggestions/nits.
…attempts" + tests

c1cdb75 test: assert QSENDRECSIGS handshake is symmetric under spork21 (Konstantin Akimov)
0e33aa2 revert: "fix: avoid improper dual way connection attempts" (Konstantin Akimov)

Pull request description:

## Issue being fixed or feature implemented
dashpay#7255

Intermittent failure at `feature_llmq_signing.py --spork21`

      File "feature_llmq_signing.py", line 111, in run_test
        wait_for_sigs(True, False, True, 15)
      File "feature_llmq_signing.py", line 60, in wait_for_sigs
        self.wait_until(lambda: check_sigs(hasrecsigs, isconflicting1, isconflicting2), timeout = timeout)
    AssertionError: Predicate '' not true after <timeout> seconds

## What was done?
This reverts a57a811 from PR dashpay#6967.

`relayMembers` and `connections` in EnsureQuorumConnections are not interchangeable.

* `connections`: who this node should connect TO. For each pair (A, B) only the deterministic-outbound side is listed, so the pair results in one TCP connection.
* `relayMembers`: who this node should ask to push recovered sigs. For every already-connected MN in the set we send QSENDRECSIGS=true, and the peer then flips m_wants_recsigs=true on its side. For the handshake to happen in both directions, the set must list every other quorum member -- not just the outbound half.

After a57a811, only the outbound half is listed, so on the inbound half of each pair m_wants_recsigs stays false. RelayRecoveredSig only pushes QSIGREC to peers with m_wants_recsigs=true, so half of all proactive recovered-sig pushes are silently dropped.

This only triggers with spork21 on (IsAllMembersConnectedEnabled returns true). In this case both the path that uses the half-mesh outbound subset and the path that relies on proactive QSIGREC are active.

## How Has This Been Tested?
This fixes the functional test `feature_llmq_signing.py --spork21`, which times out in wait_for_sigs. It reduced the failure rate on my localhost from ~50% to almost 0.

Added a new functional test in feature_llmq_signing.py for the --spork21 case, which fails as expected without this patch.

## Breaking Changes
N/A

## Checklist:
- [x] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have added or updated relevant unit/integration/functional/e2e tests
- [ ] I have made corresponding changes to the documentation
- [x] I have assigned this pull request to a milestone

ACKs for top commit:
  UdjinM6:
    utACK c1cdb75

Tree-SHA512: 81730edf32317f3ac81a35ef72dbe556b03120cc91eba3b809af252ea2909d04561efa8bcbbb8bf73d4cf91508b74f9869c5e7946d904fd9c9b21401bae36381

(cherry picked from commit 8c42e82)
293f778 doc: trim v23.1.3 release note test detail (PastaClaw)
2c0015a chore: prepare v23.1.3 release (PastaClaw)
d08fdd7 Merge #7312: fix: intermittent incorrect logging of CheckQueue for invalid blocks (pasta)
4616073 Merge #7289: revert: "fix: avoid improper dual way connection attempts" + tests (pasta)
5e0af0b Merge #7279: test: RPC coverage linter + related fixes for listaddressbalances (pasta)
5bc0bd5 Merge #7293: fix: keep sending ISDLOCK invs to non-MN peers with watchquorums (pasta)
1b52c85 Merge #7271: Merge bitcoin-core/gui#555: Restore Send button when using external signer (pasta)

Pull request description:

# Release Preparation for v23.1.3

Prepares Dash Core v23.1.3 on the `v23.1.x` branch.

## Changes
- **Version bump:** 23.1.2 → 23.1.3 (`configure.ac`)
- **Backports included:**
  - dash#7271 — Restore Send button when using external signer
  - dash#7279 — RPC coverage linter + `listaddressbalances` fixes
  - dash#7289 — Revert improper dual-way connection attempts + tests
  - dash#7293 — Keep sending ISDLOCK invs to non-MN peers with `watchquorums`
  - dash#7312 — Fix intermittent incorrect logging of CheckQueue for invalid blocks
- **Release notes:** Archived v23.1.2 notes and wrote v23.1.3 notes
- **Flatpak metainfo:** Added v23.1.3 release entry dated 2026-05-15
- **Manpages:** Regenerated for v23.1.3 / May 2026
- **Chainparams:** Refreshed mainnet `nMinimumChainWork`, `defaultAssumeValid`, `checkpointData`, and `chainTxData`; testnet is intentionally unchanged for this patch release

## Chainparams source data
Mainnet was updated from a synced node using a chainlocked block two blocks back from tip:
- Height: 2471728
- Hash: `000000000000001a19ad7270422a00f86123ea94e0b295a3a796d6861bd7b032`
- Chainwork: `00000000000000000000000000000000000000000000b9040746437784aaec47`
- `getchaintxstats 17280`:
  - `time`: 1778832687
  - `txcount`: 69379403
  - `txrate`: 0.1476929741159368

Testnet chainparams are left unchanged from v23.1.x.

## Backport label follow-up
Once this PR lands, the `backport-candidate-23.1.x` label can be removed from dash#7271, dash#7279, dash#7289, dash#7293, and dash#7312.

Existing older candidate labels that appear already included in the v23.1 line and may be stale:
- `backport-candidate-22.1.x`: dash#6879
- `backport-candidate-23.0.x`: dash#7064, dash#7069, dash#7087, dash#7126

## Validation
- `git diff --check upstream/v23.1.x..HEAD`
- `python3 test/lint/lint-whitespace.py`
- `python3 test/lint/lint-files.py`
- `python3 test/lint/lint-python.py` (skipped Python linting because `flake8` is not installed)
- Pre-PR code review gate: no significant issues found; recommendation: ship

ACKs for top commit:
  UdjinM6:
    utACK 293f778
  UdjinM6:
    re-utACK 293f778

Tree-SHA512: 1052f3ecb7cdc9b0e035c25c34a083d6e913d7080bc8128f90a0a15cbb51c6199112274b82fcf431125badcaaf2bfa64adff0f4215d5f702a6f0916025029f4b
## Issue being fixed or feature implemented
#7255

Intermittent failure at `feature_llmq_signing.py --spork21`

## What was done?
This reverts a57a811 from PR #6967.

`relayMembers` and `connections` in EnsureQuorumConnections are not interchangeable.

* `connections`: who this node should connect TO. For each pair (A, B) only the deterministic-outbound side is listed, so the pair results in one TCP connection.
* `relayMembers`: who this node should ask to push recovered sigs. For every already-connected MN in the set we send QSENDRECSIGS=true, and the peer then flips m_wants_recsigs=true on its side. For the handshake to happen in both directions, the set must list every other quorum member -- not just the outbound half.

After a57a811, only the outbound half is listed, so on the inbound half of each pair m_wants_recsigs stays false. RelayRecoveredSig only pushes QSIGREC to peers with m_wants_recsigs=true, so half of all proactive recovered-sig pushes are silently dropped.

This only triggers with spork21 on (IsAllMembersConnectedEnabled returns true). In this case both the path that uses the half-mesh outbound subset and the path that relies on proactive QSIGREC are active.

## How Has This Been Tested?
This fixes the functional test `feature_llmq_signing.py --spork21`, which times out in wait_for_sigs. It reduced the failure rate on my localhost from ~50% to almost 0.

Added a new functional test in feature_llmq_signing.py for the --spork21 case, which fails as expected without this patch.

## Breaking Changes
N/A

## Checklist: