[BUG] Fix CAGRA search recall with a graph built by NN Descent#819
[BUG] Fix CAGRA search recall with a graph built by NN Descent#819enp1s0 wants to merge 55 commits intorapidsai:mainfrom
Conversation
| @@ -829,36 +830,29 @@ void GnndGraph<Index_t>::sample_graph_new(InternalID_t<Index_t>* new_neighbors, | |||
| template <typename Index_t> | |||
| void GnndGraph<Index_t>::init_random_graph() | |||
There was a problem hiding this comment.
Could you add a docstring to describe what properties the random graph is expected to have? What is ensured by the current PR to fix the low recall issue? E.g. is it guaranteed to be fully connected?
There was a problem hiding this comment.
Sorry for the very late reply.
Actually, I'm not sure about the exact requirements for graph initialization since the paper doesn’t describe them.
However, I’ve updated the initialization process to ensure connectivity and added a comment about it.
ad5bf28 to
8a52d05
Compare
KyleFromNVIDIA
left a comment
There was a problem hiding this comment.
All CMake/CI/packaging changes have been removed, so approving.
|
@enp1s0 there still seems to be a C++ test failure in the int8 CAGRA test. Can you take a look? |
@cjnolet moved the kernel for KDE from cuml to cuvs. This PR is needed for rapidsai/cuml#7833 Authors: - Severin Dicks (https://github.com/Intron7) - Anupam (https://github.com/aamijar) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1915
**Description of changes** cuVS Bench documentation updates to align with the pluggable benchmark API: - **index.rst**: Fix datasets.yaml path and document the `constraints` field in algorithm YAML; add Recommended (orchestrator API) and Legacy CLI workflow; fix intro typo. - **param_tuning.rst**: Add Benchmark modes (sweep vs tune) and tune-mode example. - **pluggable_backend.rst** (new): Document config loader + backend pair, run flow, what the loader produces, steps to add a backend, and a minimal Elasticsearch-style example. Add to cuvs_bench toctree. Authors: - Joseph Nke (https://github.com/jnke2016) - Corey J. Nolet (https://github.com/cjnolet) - Anupam (https://github.com/aamijar) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#1891
Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs. Closes rapidsai#1767 Authors: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) - Anupam (https://github.com/aamijar) Approvers: - Anupam (https://github.com/aamijar) - Corey J. Nolet (https://github.com/cjnolet) - Bradley Dice (https://github.com/bdice) URL: rapidsai#1908
Adding cagra mem usage for building with the NN Descent algorithm. Improving formatting of existing docs. Previous: <img width="824" height="565" alt="Screenshot 2026-04-08 at 4 04 55 PM" src="https://github.com/user-attachments/assets/0d18de28-e5c6-4ee5-8f90-8896c6e4c059" /> Updated in the PR: <img width="807" height="565" alt="Screenshot 2026-04-08 at 4 05 18 PM" src="https://github.com/user-attachments/assets/1f3d0eaa-4c90-4a1d-ac18-32f3df0a612c" /> Authors: - Jinsol Park (https://github.com/jinsolp) Approvers: - Anupam (https://github.com/aamijar) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#2000
📝 WalkthroughSummary by CodeRabbit
WalkthroughReworks NN-Descent initialization/finalization: replaces construction-time asserts with runtime checks, replaces shuffled init with deterministic stride-based neighbor generation, compacts/deduplicates rows and fills remaining slots with pseudo-random candidates, adds complete-graph fast path, and exposes a new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
cpp/src/neighbors/detail/nn_descent.cuh (2)
1681-1714: Minor: hoist thei == idxcheck out of the inner dedup loops.Inside both the dedup pass (lines 1685–1689) and the random-fill loop (lines 1706–1711), the predicate
i == idxis rechecked on every iteration of the innerexi_jloop, even though neitherinoridxchanges within that loop. It's not a correctness bug — the outer loops still discard self-edges — but moving the check out lets each inner loop short-circuit immediately and makes the intent clearer.♻️ Suggested rewrite
- for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) { - size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id(); - - bool dup = false; - for (size_t exi_j = 0; exi_j < out_j; exi_j++) { - if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx || i == idx) { - dup = true; - break; - } - } - if (!dup) { - output_neighbor_list_ptr[out_j] = idx; - out_j++; - } - } + for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) { + size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id(); + if (idx == i) { continue; } // skip self-loops once + bool dup = false; + for (size_t exi_j = 0; exi_j < out_j; exi_j++) { + if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx) { + dup = true; + break; + } + } + if (!dup) { output_neighbor_list_ptr[out_j++] = idx; } + }The same simplification applies to the random-fill
do { … } while (dup)loop below.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1681 - 1714, The inner dedup loops redundantly re-evaluate the invariant check "i == idx" on every iteration; in functions/loops handling neighbor initialization (the for loop over in_j reading graph_.h_graph[i * graph_.node_degree + in_j].id() and the random-fill do...while that uses cuvs::neighbors::cagra::detail::device::xorshift64), hoist the self-edge test out of the exi_j inner loops: first check if i==idx and skip adding immediately if true, otherwise run the duplicate-scan over output_neighbor_list_ptr; apply the same change inside both the initial dedup pass (using out_j and exi_j) and the random-fill loop (using j and exi_j) so the inner loop only tests against previous entries and short-circuits sooner.
1713-1713: UseIndex_tinstead ofintfor the cast.
output_neighbor_list_ptrisIndex_t*and the analogous assignment at line 1692 relies on the implicitsize_t → Index_tconversion. Hard-codinginthere is inconsistent and will silently truncate ifIndex_tis ever widened (e.g. toint64_t/uint32_t).- output_neighbor_list_ptr[j] = static_cast<int>(idx); + output_neighbor_list_ptr[j] = static_cast<Index_t>(idx);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/neighbors/detail/nn_descent.cuh` at line 1713, The assignment to output_neighbor_list_ptr uses an explicit cast to int which is inconsistent and may truncate values; replace the hard-coded int cast with Index_t (i.e., cast idx to Index_t) so the line uses the same type as output_neighbor_list_ptr and matches the implicit size_t→Index_t behavior used elsewhere (see the analogous assignment that relies on implicit conversion). Ensure you reference output_neighbor_list_ptr, Index_t, and idx when making the change.cpp/tests/neighbors/ann_cagra.cuh (1)
1269-1281: Typosplite_ratioand duplicatedatabase0_size/database1_sizedeclarations.
splite_ratiois misspelled (should besplit_ratio), anddatabase0_size/database1_sizeare redeclared later at lines 1348–1350 inside the inner block (with the correctly spelledsplit_ratio). The values are the same (0.55), so behavior is identical, but the duplication is fragile — if someone updates one constant, the other will silently diverge. Recommend hoisting the existing declarations or removing the redundant inner copies.♻️ Proposed cleanup
- const double splite_ratio = 0.55; - const std::size_t database0_size = ps.n_rows * splite_ratio; - const std::size_t database1_size = ps.n_rows - database0_size; + const double split_ratio = 0.55; + const std::size_t database0_size = ps.n_rows * split_ratio; + const std::size_t database1_size = ps.n_rows - database0_size; // Dataset size must be larger than both 33 and the intermediate graph degree if (ps.build_algo == graph_build_algo::NN_DESCENT && ((database0_size < 33) || raft::round_up_safe(database0_size, 32lu) <= ps.graph_degree * 3lu)) GTEST_SKIP(); if (ps.build_algo == graph_build_algo::NN_DESCENT && ((database1_size < 33) || raft::round_up_safe(database1_size, 32lu) <= ps.graph_degree * 3lu)) GTEST_SKIP();Then remove the redeclarations later (lines 1348–1350):
- const double split_ratio = 0.55; - const std::size_t database0_size = ps.n_rows * split_ratio; - const std::size_t database1_size = ps.n_rows - database0_size; - auto database0_view = raft::make_device_matrix_view<const DataT, int64_t>(🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/neighbors/ann_cagra.cuh` around lines 1269 - 1281, The test defines the same split constant twice with a typo: `splite_ratio` (and corresponding `database0_size`/`database1_size`) at the top, while later the correctly named `split_ratio` and duplicate `database0_size`/`database1_size` are redeclared; remove the duplication and fix the typo by consolidating to a single `split_ratio` declaration (or rename the top `splite_ratio` to `split_ratio`) and use those single `database0_size` and `database1_size` variables everywhere (remove the later redeclared variables) so the values cannot diverge.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1213-1226: In init_random_graph(), replace the local uint32_t id
(and ensure extended_nrows used in arithmetic) with a 64-bit type (e.g.,
uint64_t or Index_t) so the calculation id = (i + 1) * num_segments + seg_id is
performed without 32-bit overflow/truncation; update the arithmetic in the
initialization block and the while loop condition (while (id >= nrow || id ==
i)) to use the 64-bit id/extended_nrows, and only narrow/cast to uint32_t when
storing the final neighbor index into the neighbor arrays to preserve bounds
checks and correctness of neighbor selection.
---
Nitpick comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1681-1714: The inner dedup loops redundantly re-evaluate the
invariant check "i == idx" on every iteration; in functions/loops handling
neighbor initialization (the for loop over in_j reading graph_.h_graph[i *
graph_.node_degree + in_j].id() and the random-fill do...while that uses
cuvs::neighbors::cagra::detail::device::xorshift64), hoist the self-edge test
out of the exi_j inner loops: first check if i==idx and skip adding immediately
if true, otherwise run the duplicate-scan over output_neighbor_list_ptr; apply
the same change inside both the initial dedup pass (using out_j and exi_j) and
the random-fill loop (using j and exi_j) so the inner loop only tests against
previous entries and short-circuits sooner.
- Line 1713: The assignment to output_neighbor_list_ptr uses an explicit cast to
int which is inconsistent and may truncate values; replace the hard-coded int
cast with Index_t (i.e., cast idx to Index_t) so the line uses the same type as
output_neighbor_list_ptr and matches the implicit size_t→Index_t behavior used
elsewhere (see the analogous assignment that relies on implicit conversion).
Ensure you reference output_neighbor_list_ptr, Index_t, and idx when making the
change.
In `@cpp/tests/neighbors/ann_cagra.cuh`:
- Around line 1269-1281: The test defines the same split constant twice with a
typo: `splite_ratio` (and corresponding `database0_size`/`database1_size`) at
the top, while later the correctly named `split_ratio` and duplicate
`database0_size`/`database1_size` are redeclared; remove the duplication and fix
the typo by consolidating to a single `split_ratio` declaration (or rename the
top `splite_ratio` to `split_ratio`) and use those single `database0_size` and
`database1_size` variables everywhere (remove the later redeclared variables) so
the values cannot diverge.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 54725a7b-94e0-4e19-a782-f3997e5863a5
📒 Files selected for processing (2)
cpp/src/neighbors/detail/nn_descent.cuhcpp/tests/neighbors/ann_cagra.cuh
| for (uint64_t i = 0; i < nrow; i++) { | ||
| // Generate a starting index. The node ((i + 1) % nrow) will be included in the neighbor list | ||
| // of node i. This rule guarantees the connectivity of the graph. | ||
| uint32_t id; | ||
| if ((i + 1) % num_segments == seg_id) { | ||
| id = i + 1; | ||
| if (id >= nrow) { id = seg_id; } | ||
| } else { | ||
| id = (i + 1) * num_segments + seg_id; | ||
| } | ||
|
|
||
| for (uint32_t j = 0; j < actual_segment_size; j++) { | ||
| while (id >= nrow || id == i) { | ||
| id = (id + stride * num_segments) % extended_nrows; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify that GNND/build instantiations only use 32-bit Index_t today.
rg -nP --type=cpp --type=cuda -C2 '\bGNND<\s*[^,>]+,\s*[^>]+>'
rg -nP --type=cpp --type=cuda -C2 '\bnn_descent::detail::build<'Repository: rapidsai/cuvs
Length of output: 115
🏁 Script executed:
# Search for GNND and build instantiations, avoiding unsupported --type=cuda
rg -nP -g '*.cu' -g '*.cuh' -g '*.hpp' -g '*.cpp' 'GNND<\s*[^,>]+,\s*[^>]+>' -C2Repository: rapidsai/cuvs
Length of output: 5691
🏁 Script executed:
# Search for nn_descent::detail::build instantiations
rg -nP -g '*.cu' -g '*.cuh' -g '*.hpp' -g '*.cpp' 'nn_descent::detail::build<' -C2Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Check the actual code at lines 1213-1226 to confirm the issue
sed -n '1210,1230p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 812
🏁 Script executed:
# Check Index_t definition and usage
rg -n 'using Index_t' cpp/src/neighbors/detail/nn_descent.cuh | head -5
rg -n 'Index_t' cpp/src/neighbors/detail/nn_descent.cuh | head -20Repository: rapidsai/cuvs
Length of output: 1360
🏁 Script executed:
# Find the init_random_graph function definition and see num_segments constraints
sed -n '1190,1240p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1940
🏁 Script executed:
# Find where num_segments is set/defined in build_config or elsewhere
rg -n 'num_segments\s*=' cpp/src/neighbors/detail/nn_descent.cuh | head -15Repository: rapidsai/cuvs
Length of output: 236
🏁 Script executed:
# Check the extended_nrows computation and stride
sed -n '1195,1210p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 839
🏁 Script executed:
# Check what node_degree and segment_size are set to
sed -n '1150,1175p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1395
🏁 Script executed:
# Look for segment_size definition in the context
rg -n 'segment_size\s*=' cpp/src/neighbors/detail/nn_descent.cuh | head -10Repository: rapidsai/cuvs
Length of output: 214
🏁 Script executed:
# Check the GnndGraph class member definitions to understand the types
sed -n '1080,1150p' cpp/src/neighbors/detail/nn_descent.cuh | head -70Repository: rapidsai/cuvs
Length of output: 2757
🏁 Script executed:
# Find where segment_size is defined and its typical value
rg -n 'segment_size' cpp/src/neighbors/detail/nn_descent.cuh | head -20Repository: rapidsai/cuvs
Length of output: 1391
🏁 Script executed:
# Look at the GNND class members to understand segment_size type
sed -n '1100,1140p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1229
🏁 Script executed:
# Check a concrete call site to see what segment_size values are used
sed -n '1750,1770p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 771
🏁 Script executed:
# Find the GnndGraph constructor signature to see segment_size parameter
sed -n '1130,1155p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1130
🏁 Script executed:
# Find where GnndGraph is instantiated to see what segment_size is used
rg -n 'GnndGraph<' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 541
🏁 Script executed:
# Look at the build function that creates GnndGraph
sed -n '1492,1530p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1753
🏁 Script executed:
# Search for segment_size as a class member or constant
sed -n '1000,1130p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 4920
🏁 Script executed:
# Search for static segment_size or constexpr
rg -n 'segment_size\s*=' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 214
🏁 Script executed:
# Search for segment_size as a static or class member in GnndGraph
sed -n '1040,1090p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1930
🏁 Script executed:
# Search more broadly for segment_size definition
rg -n 'static.*segment_size|const.*segment_size' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 406
🏁 Script executed:
# Look for the full GnndGraph class definition
sed -n '1040,1080p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1279
🏁 Script executed:
# Search for GnndGraph class definition and member variables
sed -n '920,1000p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 2975
🏁 Script executed:
# Look for the class declaration
rg -n 'class GnndGraph|struct GnndGraph' cpp/src/neighbors/detail/nn_descent.cuh -A 50 | head -80Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Search for GnndGraph directly
rg -n 'GnndGraph' cpp/src/neighbors/detail/nn_descent.cuh | head -30Repository: rapidsai/cuvs
Length of output: 541
🏁 Script executed:
# Look at the header section of the file to find GnndGraph class
head -n 1200 cpp/src/neighbors/detail/nn_descent.cuh | tail -n 300 | grep -n 'GnndGraph\|segment_size\|static'Repository: rapidsai/cuvs
Length of output: 931
🏁 Script executed:
# Check if segment_size is defined as a constexpr or static constant
grep -n 'constexpr.*segment_size\|static.*segment_size\|= raft::warp_size' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 723
🏁 Script executed:
# Search for segment_size definition in the entire file
rg -n 'segment_size' cpp/src/neighbors/detail/nn_descent.cuh | grep -E 'static|const|=.*raft::warp_size'Repository: rapidsai/cuvs
Length of output: 406
🏁 Script executed:
# Look for the class declaration with members
sed -n '1,200p' cpp/src/neighbors/detail/nn_descent.cuh | grep -A 100 'template.*class GnndGraph\|struct GnndGraph'Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Check the header file for GnndGraph definition
fd -e hpp cpp/src/neighbors/detail | xargs grep -l 'GnndGraph'Repository: rapidsai/cuvs
Length of output: 461
🏁 Script executed:
# Search for GnndGraph class definition more carefully
rg -n 'class GnndGraph' cpp/src/neighbors/detail/nn_descent.cuh -B 5 -A 30Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Try to find where segment_size is declared as a member
rg -n 'size_t.*segment_size|int.*segment_size' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 730
🏁 Script executed:
# Check the entire constructor to understand the members
sed -n '1139,1170p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1657
🏁 Script executed:
# Search for segment_size as a static member or constexpr in the whole class
rg -n 'static.*segment_size|constexpr.*segment_size' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 268
🏁 Script executed:
# Look at a larger section around the constructor - it might be a template parameter or static
sed -n '1100,1150p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1754
🏁 Script executed:
# Try searching in header files
fd . cpp/src/neighbors --full-path -e h -e hpp | xargs grep -l 'GnndGraph'Repository: rapidsai/cuvs
Length of output: 102
🏁 Script executed:
# Look at the header file for GnndGraph class definition
cat cpp/src/neighbors/detail/nn_descent_gnnd.hpp | head -300Repository: rapidsai/cuvs
Length of output: 10333
Potential uint32_t overflow / truncation for large datasets.
In init_random_graph() (lines 1213–1226), the calculation (i + 1) * num_segments + seg_id can overflow uint32_t even at current Index_t = int (32-bit) sizes. With segment_size = 32 (static constexpr), num_segments = node_degree / 32. For example, with nrow ≈ 2.1 billion (near int max) and num_segments = 64, the expression evaluates to ~1.37e11, which exceeds UINT32_MAX (~4.3e9). The result is implicitly truncated when assigned to uint32_t id, causing the while loop's bounds check (id >= nrow) to be bypassed incorrectly—a truncated value may pass the check when it should not. This affects neighbor selection correctness.
Promote id and extended_nrows to uint64_t (or Index_t) to preserve the full calculation and eliminate implicit narrowing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1213 - 1226, In
init_random_graph(), replace the local uint32_t id (and ensure extended_nrows
used in arithmetic) with a 64-bit type (e.g., uint64_t or Index_t) so the
calculation id = (i + 1) * num_segments + seg_id is performed without 32-bit
overflow/truncation; update the arithmetic in the initialization block and the
while loop condition (while (id >= nrow || id == i)) to use the 64-bit
id/extended_nrows, and only narrow/cast to uint32_t when storing the final
neighbor index into the neighbor arrays to preserve bounds checks and
correctness of neighbor selection.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
cpp/src/neighbors/detail/nn_descent.cuh (1)
1681-1714: Minor cleanups in the dedup + random-fill loops.A few small refinements in the rewritten shrink path:
- The
i == idxcheck on lines 1686 and 1707 is loop-invariant w.r.t.exi_jand should be evaluated once before the inner loop (or the duplicate stored as a sentinel) — currently re-evaluated every inner iteration.- Line 1692 implicitly narrows
size_t idxtointvia assignment tooutput_neighbor_list_ptr[out_j], while line 1713 uses an explicitstatic_cast<int>. Make these consistent.- The random-fill
do { ... } while (dup);loop has no upper bound on retries. While the complete-graph case is now handled by the fast path, near-complete cases (e.g.,node_degreeclose tonrow_-1) can still drive expected attempts very high; consider a fallback (e.g., a deterministic sweep over remaining ids) if this becomes a hot path under those configurations.♻️ Sketch
- // Copy neighbor list while removing duplicates. - for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) { - size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id(); - - bool dup = false; - for (size_t exi_j = 0; exi_j < out_j; exi_j++) { - if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx || i == idx) { - dup = true; - break; - } - } - if (!dup) { - output_neighbor_list_ptr[out_j] = idx; - out_j++; - } - } + // Copy neighbor list while removing duplicates. + for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) { + size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id(); + if (idx == i) { continue; } // hoisted self check + + bool dup = false; + for (size_t exi_j = 0; exi_j < out_j; exi_j++) { + if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx) { + dup = true; + break; + } + } + if (!dup) { + output_neighbor_list_ptr[out_j] = static_cast<int>(idx); + out_j++; + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1681 - 1714, The dedup + random-fill loops in nn_descent.cuh repeatedly evaluate the invariant i == idx inside the inner loops and mix implicit/explicit integer narrowing; move the i == idx check out of the inner for-loops (compute a bool is_self = (i==idx) before the inner loop or check once per candidate) to avoid redundant checks, make all writes into output_neighbor_list_ptr use the same explicit cast (use static_cast<int>(idx)) for consistency, and add a retry cap to the xorshift64-based do/while in the random-fill path (e.g., max_attempts = 64); if the cap is exceeded, fall back to a deterministic sweep over node ids to pick the next non-duplicate non-self neighbors. Ensure these changes touch the loops that reference build_config_.node_degree, output_neighbor_list_ptr, nrow_, and cuvs::neighbors::cagra::detail::device::xorshift64.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1748-1758: The fast-path that fills the complete graph and returns
bypasses distance computation, so add a guard at the start of that branch to
reject callers who requested distances: check the same flag used elsewhere
(params.return_distances or index_params.return_distances) and if true
throw/return an error (e.g., throw std::invalid_argument or use the existing
error reporting mechanism) with a clear message indicating complete-graph path
does not support return_distances; reference idx.graph(), idx.distances(),
graph_degree, dataset.extent(0) and nnd.build in the message so maintainers can
locate the relevant logic.
- Around line 1224-1234: The post-store advance uses nrow modulo which breaks
the id % num_segments congruence; change the wrap from using nrow to
extended_nrows in the inner loop's post-store update so the sequence preserves
id % num_segments across iterations (i.e., after storing into h_graph at
store_index, update id with (id + num_segments * stride) % extended_nrows
instead of % nrow) to maintain the segmentation invariant relied on by
update_graph() and sample_graph().
---
Nitpick comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1681-1714: The dedup + random-fill loops in nn_descent.cuh
repeatedly evaluate the invariant i == idx inside the inner loops and mix
implicit/explicit integer narrowing; move the i == idx check out of the inner
for-loops (compute a bool is_self = (i==idx) before the inner loop or check once
per candidate) to avoid redundant checks, make all writes into
output_neighbor_list_ptr use the same explicit cast (use static_cast<int>(idx))
for consistency, and add a retry cap to the xorshift64-based do/while in the
random-fill path (e.g., max_attempts = 64); if the cap is exceeded, fall back to
a deterministic sweep over node ids to pick the next non-duplicate non-self
neighbors. Ensure these changes touch the loops that reference
build_config_.node_degree, output_neighbor_list_ptr, nrow_, and
cuvs::neighbors::cagra::detail::device::xorshift64.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: eabdd134-b3cc-4043-83b5-01ee3c40950e
📒 Files selected for processing (1)
cpp/src/neighbors/detail/nn_descent.cuh
| for (uint32_t j = 0; j < actual_segment_size; j++) { | ||
| while (id >= nrow || id == i) { | ||
| id = (id + stride * num_segments) % extended_nrows; | ||
| } | ||
|
|
||
| h_neighbor_list[j].id_with_flag() = | ||
| j < (rand_seq.size() - self_in_this_seg) && size_t(id) < nrow | ||
| ? id | ||
| : std::numeric_limits<Index_t>::max(); | ||
| h_dist_list[j] = std::numeric_limits<DistData_t>::max(); | ||
| idx++; | ||
| const auto store_index = i * node_degree + seg_id * segment_size + j; | ||
| h_graph[store_index].id_with_flag() = id; | ||
| h_dists.data_handle()[store_index] = std::numeric_limits<DistData_t>::max(); | ||
|
|
||
| id = (id + num_segments * stride) % nrow; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm update_graph and sample_graph rely on `id % num_segments` to address segment slots.
rg -nP -C3 '%\s*num_segments' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1085
🏁 Script executed:
#!/bin/bash
# Look at the full context around lines 1224-1234
sed -n '1200,1250p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1927
🏁 Script executed:
#!/bin/bash
# Find extended_nrows calculation
rg -nP 'extended_nrows\s*=' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 92
🏁 Script executed:
#!/bin/bash
# Find stride calculation and how it relates to num_segments, nrow
rg -nP 'stride\s*=' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 105
🏁 Script executed:
#!/bin/bash
# Find update_graph and sample_graph implementations to confirm they use id % num_segments
rg -nP -A10 'void update_graph\(|void sample_graph\(' cpp/src/neighbors/detail/nn_descent.cuh | head -100Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Search for where h_graph is read, especially in update_graph and sample_graph
rg -nP 'update_graph|sample_graph' cpp/src/neighbors/detail/nn_descent.cuh | head -20Repository: rapidsai/cuvs
Length of output: 678
🏁 Script executed:
#!/bin/bash
# Find where the segment-based indexing is used - look for how segments are accessed
rg -nP -B5 -A5 'seg_idx.*segment_size|segment_id.*segment_size' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 781
🏁 Script executed:
#!/bin/bash
# Look at the actual fix more carefully - check if storing values in [nrow, extended_nrows) would break anything
rg -nP 'h_graph\[.*\]' cpp/src/neighbors/detail/nn_descent.cuh | head -30Repository: rapidsai/cuvs
Length of output: 334
🏁 Script executed:
#!/bin/bash
# Check if any code assumes id < nrow or validates boundaries when reading h_graph
rg -nP -B3 -A3 'id\(\)|\.id\(\)' cpp/src/neighbors/detail/nn_descent.cuh | grep -A3 -B3 'nrow\|>= nrow\|< nrow' | head -50Repository: rapidsai/cuvs
Length of output: 756
🏁 Script executed:
#!/bin/bash
# Look at the full sample_graph implementation to see how it uses h_graph entries
sed -n '1240,1285p' cpp/src/neighbors/detail/nn_descent.cuhRepository: rapidsai/cuvs
Length of output: 1838
Fix modulus mismatch in post-store advance to preserve the segmentation invariant.
The inner while loop (line 1226) wraps with % extended_nrows, preserving id % num_segments because extended_nrows is a multiple of num_segments. However, the post-store advance (line 1233) wraps with % nrow, which is not guaranteed to be a multiple of num_segments. When this advance crosses nrow, the congruence class shifts, causing subsequent neighbors stored into segment seg_id to violate the invariant id % num_segments == seg_id that is relied upon by update_graph() at line 1297, which uses seg_idx = new_neighb_id.id() % num_segments to determine segment placement.
Concrete example: nrow=100, num_segments=6, extended_nrows=102. Sequence: 6 → 36 → 66 → 96 → 26. The final value lands in segment 0's slot, but 26 % 6 == 2, not 0.
Use extended_nrows consistently for the post-store advance. This preserves the congruence class; values in [nrow, extended_nrows) will be wrapped back into [0, nrow) by the next iteration's while loop and are safely skipped downstream by sample_graph() (line 1260), which explicitly filters neighbor.id() >= nrow.
Suggested fix
- id = (id + num_segments * stride) % nrow;
+ id = (id + num_segments * stride) % extended_nrows;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (uint32_t j = 0; j < actual_segment_size; j++) { | |
| while (id >= nrow || id == i) { | |
| id = (id + stride * num_segments) % extended_nrows; | |
| } | |
| h_neighbor_list[j].id_with_flag() = | |
| j < (rand_seq.size() - self_in_this_seg) && size_t(id) < nrow | |
| ? id | |
| : std::numeric_limits<Index_t>::max(); | |
| h_dist_list[j] = std::numeric_limits<DistData_t>::max(); | |
| idx++; | |
| const auto store_index = i * node_degree + seg_id * segment_size + j; | |
| h_graph[store_index].id_with_flag() = id; | |
| h_dists.data_handle()[store_index] = std::numeric_limits<DistData_t>::max(); | |
| id = (id + num_segments * stride) % nrow; | |
| } | |
| for (uint32_t j = 0; j < actual_segment_size; j++) { | |
| while (id >= nrow || id == i) { | |
| id = (id + stride * num_segments) % extended_nrows; | |
| } | |
| const auto store_index = i * node_degree + seg_id * segment_size + j; | |
| h_graph[store_index].id_with_flag() = id; | |
| h_dists.data_handle()[store_index] = std::numeric_limits<DistData_t>::max(); | |
| id = (id + num_segments * stride) % extended_nrows; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1224 - 1234, The
post-store advance uses nrow modulo which breaks the id % num_segments
congruence; change the wrap from using nrow to extended_nrows in the inner
loop's post-store update so the sequence preserves id % num_segments across
iterations (i.e., after storing into h_graph at store_index, update id with (id
+ num_segments * stride) % extended_nrows instead of % nrow) to maintain the
segmentation invariant relied on by update_graph() and sample_graph().
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/neighbors/detail/nn_descent.cuh (1)
1636-1669:⚠️ Potential issue | 🔴 CriticalDedup/backfill can desynchronize
output_graphandoutput_distances.When
return_distances=true, Lines 1636-1669 copy the firstoutput_graph_degreedistances before Lines 1673-1714 remove duplicate ids and inject random replacements. That means callers can receive neighbor ids that no longer match the returned distances.Please compact
(id, dist)pairs together here, and give synthetic fillers a corresponding distance value as well (recomputed or a sentinel such asmax()).Also applies to: 1673-1714
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1636 - 1669, The code copies distances into output_distances (graph_h_dists -> output_dist_view) before the dedup/backfill that modifies output_graph, so ids and distances can desynchronize; fix by treating ids and distances as paired data: either move the device copy of distances to after the dedup/backfill that runs on output_graph, or perform the same dedup/backfill logic on the host/device distance buffer (graph_h_dists/output_dist_view) in tandem with the id buffer so any removed/filled neighbor id has its distance updated; when injecting synthetic replacements in the dedup/backfill path, assign a corresponding distance (recompute real distance or set a sentinel like std::numeric_limits<DistData_t>::max()) and ensure the post-processing (output_dist_view mapping) operates on the final paired distances.
♻️ Duplicate comments (1)
cpp/src/neighbors/detail/nn_descent.cuh (1)
1224-1233:⚠️ Potential issue | 🟠 MajorKeep the post-store wrap on
extended_nrows.Line 1233 still wraps with
% nrow, so once the sequence crossesnrowthe stored ids can switchid % num_segmentsclasses.update_graph()later routes by that modulus, so this can place neighbors into the wrong segment.Suggested fix
- id = (id + num_segments * stride) % nrow; + id = (id + num_segments * stride) % extended_nrows;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1224 - 1233, The loop advances id but the final wrap uses % nrow which causes ids to change their id % num_segments class and misroute in update_graph(); fix by wrapping with extended_nrows instead of nrow when updating id at the end of the loop (the code that updates id after writing to h_graph/h_dists). Locate the block using variables id, stride, num_segments, extended_nrows, nrow, store_index, segment_size and change the post-store modulus to extended_nrows so stored ids remain in the correct segment class for update_graph().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1636-1669: The code copies distances into output_distances
(graph_h_dists -> output_dist_view) before the dedup/backfill that modifies
output_graph, so ids and distances can desynchronize; fix by treating ids and
distances as paired data: either move the device copy of distances to after the
dedup/backfill that runs on output_graph, or perform the same dedup/backfill
logic on the host/device distance buffer (graph_h_dists/output_dist_view) in
tandem with the id buffer so any removed/filled neighbor id has its distance
updated; when injecting synthetic replacements in the dedup/backfill path,
assign a corresponding distance (recompute real distance or set a sentinel like
std::numeric_limits<DistData_t>::max()) and ensure the post-processing
(output_dist_view mapping) operates on the final paired distances.
---
Duplicate comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1224-1233: The loop advances id but the final wrap uses % nrow
which causes ids to change their id % num_segments class and misroute in
update_graph(); fix by wrapping with extended_nrows instead of nrow when
updating id at the end of the loop (the code that updates id after writing to
h_graph/h_dists). Locate the block using variables id, stride, num_segments,
extended_nrows, nrow, store_index, segment_size and change the post-store
modulus to extended_nrows so stored ids remain in the correct segment class for
update_graph().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 77e8a5b1-2d04-4569-a2f5-0ba787f7ccaf
📒 Files selected for processing (1)
cpp/src/neighbors/detail/nn_descent.cuh
This PR addresses an unexpected low recall issue in the CAGRA search with a graph generated by NN Descent.
For that, it updates the initial NN Descent graph generation so that all indices are included, whereas some nodes are not found in the branch-25.10 implementation.
(itopk=256)
This PR also makes the following changes: