perf: cache x11 block header hash, reduce reindex hashes from 4->2 per block#6610
perf: cache x11 block header hash, reduce reindex hashes from 4->2 per block#6610PastaPastaPasta wants to merge 2 commits into
Conversation
WalkthroughThe changes introduce an optional caching mechanism for block header hashes within the Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📥 CommitsReviewing files that changed from the base of the PR and between 10b0bc8c4db109d3dfbd7c20e8d8e9476e1ab0b7 and 3e760e2. 📒 Files selected for processing (10)
🚧 Files skipped from review as they are similar to previous changes (10)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Nice! 👍
develop: This PR: |
|
How about 1628c17 instead of 10b0bc8c4db109d3dfbd7c20e8d8e9476e1ab0b7? |
10b0bc8 to
3e760e2
Compare
| SERIALIZE_METHODS(CBlockHeader, obj) { READWRITE(obj.nVersion, obj.hashPrevBlock, obj.hashMerkleRoot, obj.nTime, obj.nBits, obj.nNonce); } | ||
| SERIALIZE_METHODS(CBlockHeader, obj) { | ||
| READWRITE(obj.nVersion, obj.hashPrevBlock, obj.hashMerkleRoot, obj.nTime, obj.nBits, obj.nNonce); | ||
| obj.cached_hash.SetNull(); |
There was a problem hiding this comment.
I think it may reset only in case of read event
|
See tsan error: https://gitlab.com/dashpay/dash/-/jobs/9770587678 |
…shing sources 0c5e295 chore: retain only Skein512, remove other variants (Kittywhiskers Van Gogh) 6fa5ddc chore: retain only Simd512, remove other variants (Kittywhiskers Van Gogh) 2c5b641 chore: retain only Shavite512, remove other variants (Kittywhiskers Van Gogh) 9035b5f chore: retain only Luffa512, remove other variants (Kittywhiskers Van Gogh) 11c0517 chore: retain only Keccak512, remove other variants (Kittywhiskers Van Gogh) 9fa0bc9 chore: retain only Jh512, remove other variants (Kittywhiskers Van Gogh) 9229220 chore: retain only Groestl512, remove other variants (Kittywhiskers Van Gogh) 822e75d chore: retain only Echo512, remove other variants (Kittywhiskers Van Gogh) 79f3078 chore: retain only Cubehash512, remove other variants (Kittywhiskers Van Gogh) 05e53e6 chore: retain only Bmw512, remove other variants (Kittywhiskers Van Gogh) 974b330 chore: retain only Blake512, remove other variants (Kittywhiskers Van Gogh) 4a31cfb chore: remove unused big-endian AES tables (Kittywhiskers Van Gogh) d0f94c3 chore: remove Doxygen macro blocks (Kittywhiskers Van Gogh) 79b9c87 chore: remove commented out code (Kittywhiskers Van Gogh) Pull request description: ## Motivation Dash utilizes a daisy-chain of 11 hash algorithms for its proof of work termed X11. The library that provided the implementation of the underlying hash algorithms is `sphlib` by Thomas Pornin ([source](https://web.archive.org/web/20180428002946/http://www.saphir2.com/sphlib/), Internet Archive). The library has been a part of Dash Core since inception (f164aea) and does what it says on the tin quite well. Though, it's been a solid decade since and performance profiling has shown that proof of work hashing takes up a not-insignificant amount of time. As an alternative to (or alongside) [dash#6610](#6610), we intend to work on improving the performance of X11 while maintaining readability and auditability. To begin with that, we are removing variants that Dash _doesn't_ use, namely, non 512-bit variants of the constituent algorithms used and subsequent pull requests will be integrating the library's contents with primitives available in Dash Core to allow for attributable and reasonable performance improvements. ## Breaking Changes None expected. ## Checklist - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)** - [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)** - [x] I have made corresponding changes to the documentation **(note: N/A)** - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: PastaPastaPasta: utACK 0c5e295 UdjinM6: utACK 0c5e295 knst: utACK 0c5e295 Tree-SHA512: fb6cd94bcb01b9434d83c35509f5bd2ed902c918bf58d7027af87643abcd62a937aa94070d73c7a2c4e9177809194369b71630e353df87e0432dac503e8e0d72
Review GateCommit:
|
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The X11 hash caching achieves a meaningful reindex speedup, but the implementation has two fundamental issues: the mutable cache introduces an unsynchronized write in a const method (data race under concurrent GetHash() calls), and cache invalidation is incomplete because use_cache=false still writes the cache and header fields remain publicly mutable without clearing it.
Reviewed commit: 3e760e2
🔴 2 blocking
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/primitives/block.cpp`:
- [BLOCKING] lines 13-21: Unsynchronized mutable write in const GetHash() — data race
`GetHash()` is a `const` method that writes `cached_hash` (a `mutable uint256`) with no lock or atomic protection. Before this PR, `GetHash()` was pure and concurrent calls on the same header were safe. Now, two threads calling `GetHash()` on the same `CBlockHeader` for the first time race on a 32-byte write — undefined behavior per C++.
Since `CBlock`/`CBlockHeader` instances are routinely shared (including via `shared_ptr<const CBlock>`), this is a real regression. Consider using `std::atomic` with a flag, or `std::call_once`, or external synchronization.
- [BLOCKING] lines 15-20: `use_cache=false` still writes the cache — stale hashes after header mutation
`GetHash(false)` bypasses the cache *read* but still assigns `cached_hash` on line 19. Callers using `use_cache=false` during nonce search (to get a fresh hash after mutating `nNonce`) still poison the cache with each iteration's hash. After the loop, a plain `GetHash()` returns the stale cached value from the last nonce tried, not the final header state.
The same issue affects `getblocktemplate` which mutates `nTime`/`nNonce`/`nVersion` on a cached template without clearing the hash. `CBlock::GetBlockHeader()` also copies `cached_hash` into the returned header, propagating stale state.
Fix: `use_cache=false` should compute and return without writing the cache, or header mutation should clear the cache.
| uint256 CBlockHeader::GetHash(bool use_cache) const | ||
| { | ||
| std::vector<unsigned char> vch(80); | ||
| CVectorWriter ss(SER_GETHASH, PROTOCOL_VERSION, vch, 0); | ||
| ss << *this; | ||
| return HashX11((const char *)vch.data(), (const char *)vch.data() + vch.size()); | ||
| if (!use_cache || cached_hash.IsNull()) { | ||
| std::vector<unsigned char> vch(80); | ||
| CVectorWriter ss(SER_GETHASH, PROTOCOL_VERSION, vch, 0); | ||
| ss << *this; | ||
| cached_hash = HashX11((const char *)vch.data(), (const char *)vch.data() + vch.size()); | ||
| } | ||
| return cached_hash; |
There was a problem hiding this comment.
🔴 Blocking: Unsynchronized mutable write in const GetHash() — data race
GetHash() is a const method that writes cached_hash (a mutable uint256) with no lock or atomic protection. Before this PR, GetHash() was pure and concurrent calls on the same header were safe. Now, two threads calling GetHash() on the same CBlockHeader for the first time race on a 32-byte write — undefined behavior per C++.
Since CBlock/CBlockHeader instances are routinely shared (including via shared_ptr<const CBlock>), this is a real regression. Consider using std::atomic with a flag, or std::call_once, or external synchronization.
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/primitives/block.cpp`:
- [BLOCKING] lines 13-21: Unsynchronized mutable write in const GetHash() — data race
`GetHash()` is a `const` method that writes `cached_hash` (a `mutable uint256`) with no lock or atomic protection. Before this PR, `GetHash()` was pure and concurrent calls on the same header were safe. Now, two threads calling `GetHash()` on the same `CBlockHeader` for the first time race on a 32-byte write — undefined behavior per C++.
Since `CBlock`/`CBlockHeader` instances are routinely shared (including via `shared_ptr<const CBlock>`), this is a real regression. Consider using `std::atomic` with a flag, or `std::call_once`, or external synchronization.
| if (!use_cache || cached_hash.IsNull()) { | ||
| std::vector<unsigned char> vch(80); | ||
| CVectorWriter ss(SER_GETHASH, PROTOCOL_VERSION, vch, 0); | ||
| ss << *this; | ||
| cached_hash = HashX11((const char *)vch.data(), (const char *)vch.data() + vch.size()); | ||
| } |
There was a problem hiding this comment.
🔴 Blocking: use_cache=false still writes the cache — stale hashes after header mutation
GetHash(false) bypasses the cache read but still assigns cached_hash on line 19. Callers using use_cache=false during nonce search (to get a fresh hash after mutating nNonce) still poison the cache with each iteration's hash. After the loop, a plain GetHash() returns the stale cached value from the last nonce tried, not the final header state.
The same issue affects getblocktemplate which mutates nTime/nNonce/nVersion on a cached template without clearing the hash. CBlock::GetBlockHeader() also copies cached_hash into the returned header, propagating stale state.
Fix: use_cache=false should compute and return without writing the cache, or header mutation should clear the cache.
source: ['codex']
🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/primitives/block.cpp`:
- [BLOCKING] lines 15-20: `use_cache=false` still writes the cache — stale hashes after header mutation
`GetHash(false)` bypasses the cache *read* but still assigns `cached_hash` on line 19. Callers using `use_cache=false` during nonce search (to get a fresh hash after mutating `nNonce`) still poison the cache with each iteration's hash. After the loop, a plain `GetHash()` returns the stale cached value from the last nonce tried, not the final header state.
The same issue affects `getblocktemplate` which mutates `nTime`/`nNonce`/`nVersion` on a cached template without clearing the hash. `CBlock::GetBlockHeader()` also copies `cached_hash` into the returned header, propagating stale state.
Fix: `use_cache=false` should compute and return without writing the cache, or header mutation should clear the cache.
3e760e2 to
4434123
Compare
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
|
Pls see f6faab6, ae3f6a6 and d8c6891. This should fix tsan. But I'm not sure if all this extra complexity in such a critical part is a good tradeoff for performance gains. How about an alternative approach UdjinM6@f7c43a2? Headers only reindex on testnet finishes in basically the same time as this PR for me. |
|
Do you want to make you own PR for UdjinM6@f7c43a2? that approach seems fine |
|
See #7272 |
…to avoid redundant X11 computations 6e138f0 review: document known_hash safety and fix test to exercise reindex path (UdjinM6) 01af278 refactor: return std::optional<uint256> from ReadBlockFromDisk(FlatFilePos) (UdjinM6) 89d8c45 assert: use ASSERT_IF_DEBUG for known_hash invariants (UdjinM6) 01e72cb test: exercise CheckBlock and AcceptBlock with pre-computed known_hash (UdjinM6) ec87196 assert: add debug-only invariant to verify known_hash matches block hash (UdjinM6) f7c43a2 perf: pass pre-computed block hash through reindex path to avoid redundant X11 (UdjinM6) Pull request description: ## Issue being fixed or feature implemented During reindex, the X11 block header hash was computed multiple times for the same block: once in `LoadExternalBlockFile`, again in `AcceptBlockHeader`, again in `CheckBlock`, and more for out-of-order blocks. See #6610 for more info. ## What was done? Thread the already-computed hash through `AcceptBlock`, `AcceptBlockHeader`, `CheckBlock`, and `ReadBlockFromDisk` to eliminate redundant X11 computations (from 3-5 per block down to 1). ## How Has This Been Tested? Reindexed on testnet with `--stopatheight=1` to confirm performance is on par with #6610. Then competed reindex on testnet with no issues. Tests are green too. ## Breaking Changes n/a ## Checklist: - [ ] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [ ] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ Top commit has no ACKs. Tree-SHA512: 473fb7bf177c721f8d0f50e04394015554e779b26b1f004e52efb7803d267a13692a0f6a31426f084f1fbc25d2db44a56b0621fe7ed1b65c9c49ed322d98f1e9
Issue being fixed or feature implemented
We currently hash the header up to 4 times per block we are loading from disk; drop this down to 2 times
before:

./src/dashd --printtoconsole --testnet --nowallet --reindex --stopatheight=1 76.34s user 6.15s system 95% cpu 1:25.98 total
after:
./src/dashd --printtoconsole --testnet --nowallet --reindex --stopatheight=1 62.87s user 5.70s system 95% cpu 1:11.52 total

Shaves ~20-25% off of header reindex times.
Calculated as ((76.34-62.87)/76.34)=17.6%, but this includes the overhead of startup / shutdown, so it's likely higher.
What was done?
How Has This Been Tested?
Reindex headers; should probably do a full reindex
Breaking Changes
Checklist:
Go over all the following points, and put an
xin all the boxes that apply.