Skip to content

[BUG] Fix CAGRA search recall with a graph built by NN Descent#819

Open
enp1s0 wants to merge 55 commits intorapidsai:mainfrom
enp1s0:cagra-fix-nnd-init
Open

[BUG] Fix CAGRA search recall with a graph built by NN Descent#819
enp1s0 wants to merge 55 commits intorapidsai:mainfrom
enp1s0:cagra-fix-nnd-init

Conversation

@enp1s0
Copy link
Copy Markdown
Member

@enp1s0 enp1s0 commented Apr 12, 2025

This PR addresses an unexpected low recall issue in the CAGRA search with a graph generated by NN Descent.
For that, it updates the initial NN Descent graph generation so that all indices are included, whereas some nodes are not found in the branch-25.10 implementation.

figure

(itopk=256)

This PR also makes the following changes:

  • Adds a workaround to remove duplicate node indices from the kNN result
  • Adds test cases for different graph degrees

@enp1s0 enp1s0 requested a review from a team as a code owner April 12, 2025 16:00
@enp1s0 enp1s0 self-assigned this Apr 12, 2025
@github-actions github-actions Bot added the cpp label Apr 12, 2025
@enp1s0 enp1s0 added bug Something isn't working non-breaking Introduces a non-breaking change cpp and removed cpp labels Apr 12, 2025
@cjnolet cjnolet moved this to In Progress in Unstructured Data Processing Apr 22, 2025
Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @enp1s0 for this PR! Looks good overall, I have just one request to clarify the expectations for the random graph.

@@ -829,36 +830,29 @@ void GnndGraph<Index_t>::sample_graph_new(InternalID_t<Index_t>* new_neighbors,
template <typename Index_t>
void GnndGraph<Index_t>::init_random_graph()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a docstring to describe what properties the random graph is expected to have? What is ensured by the current PR to fix the low recall issue? E.g. is it guaranteed to be fully connected?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the very late reply.
Actually, I'm not sure about the exact requirements for graph initialization since the paper doesn’t describe them.
However, I’ve updated the initialization process to ensure connectivity and added a comment about it.

@tfeher tfeher changed the base branch from branch-25.06 to branch-25.08 May 26, 2025 08:36
@enp1s0 enp1s0 force-pushed the cagra-fix-nnd-init branch from ad5bf28 to 8a52d05 Compare July 18, 2025 22:24
@enp1s0 enp1s0 requested review from a team as code owners September 6, 2025 08:52
@enp1s0 enp1s0 requested a review from bdice September 6, 2025 08:52
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Sep 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@enp1s0 enp1s0 changed the base branch from branch-25.08 to branch-25.10 September 6, 2025 08:53
Copy link
Copy Markdown
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All CMake/CI/packaging changes have been removed, so approving.

@cjnolet cjnolet changed the base branch from branch-25.10 to main November 3, 2025 17:06
@cjnolet
Copy link
Copy Markdown
Member

cjnolet commented Apr 9, 2026

@enp1s0 there still seems to be a C++ test failure in the int8 CAGRA test. Can you take a look?

AyodeAwe and others added 11 commits April 10, 2026 12:00
@cjnolet moved the kernel for KDE from cuml to cuvs. This PR is needed for rapidsai/cuml#7833

Authors:
  - Severin Dicks (https://github.com/Intron7)
  - Anupam (https://github.com/aamijar)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1915
**Description of changes**

cuVS Bench documentation updates to align with the pluggable benchmark API:

- **index.rst**: Fix datasets.yaml path and document the `constraints` field in algorithm YAML; add Recommended (orchestrator API) and Legacy CLI workflow; fix intro typo.
- **param_tuning.rst**: Add Benchmark modes (sweep vs tune) and tune-mode example.
- **pluggable_backend.rst** (new): Document config loader + backend pair, run flow, what the loader produces, steps to add a backend, and a minimal Elasticsearch-style example. Add to cuvs_bench toctree.

Authors:
  - Joseph Nke (https://github.com/jnke2016)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Anupam (https://github.com/aamijar)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1891
Following on the work of rapidsai/rmm#2244 and rapidsai/cuml#7725, this enables general instructions and configuration to enable coderabbit codereviews on cuvs.

Closes rapidsai#1767

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Anupam (https://github.com/aamijar)

Approvers:
  - Anupam (https://github.com/aamijar)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#1908
Adding cagra mem usage for building with the NN Descent algorithm.
Improving formatting of existing docs.


Previous:
<img width="824" height="565" alt="Screenshot 2026-04-08 at 4 04 55 PM" src="https://github.com/user-attachments/assets/0d18de28-e5c6-4ee5-8f90-8896c6e4c059" />


Updated in the PR:

<img width="807" height="565" alt="Screenshot 2026-04-08 at 4 05 18 PM" src="https://github.com/user-attachments/assets/1f3d0eaa-4c90-4a1d-ac18-32f3df0a612c" />

Authors:
  - Jinsol Park (https://github.com/jinsolp)

Approvers:
  - Anupam (https://github.com/aamijar)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#2000
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Strengthened runtime validation and improved removal of duplicate/self neighbors during graph construction.
  • Performance

    • Early-exit optimization for complete-graph cases to skip unnecessary work.
    • More efficient neighbor-list compaction and deterministic initialization for predictable startup.
  • Refactor

    • Reworked neighbor generation and sanitization for consistency and reliability.
  • Tests

    • Tests extended with a graph-degree parameter, propagated into index construction and test heuristics; evaluation thresholds and dataset-skip rules adjusted accordingly.

Walkthrough

Reworks NN-Descent initialization/finalization: replaces construction-time asserts with runtime checks, replaces shuffled init with deterministic stride-based neighbor generation, compacts/deduplicates rows and fills remaining slots with pseudo-random candidates, adds complete-graph fast path, and exposes a new graph_degree test parameter with related adjustments.

Changes

Cohort / File(s) Summary
NN-Descent Algorithm Core
cpp/src/neighbors/detail/nn_descent.cuh
Replaces divisibility asserts with RAFT_EXPECTS runtime validation; rewrites GnndGraph::init_random_graph() to deterministic stride-based neighbor assignment using extended_nrows and std::gcd; removes a destructor assert(h_graph == nullptr); changes final neighbor sanitization to per-row deduplication (treating self as duplicate) and pseudo-random filling that rejects duplicates/self; adds early-exit fast path when requested graph is complete and distances are not requested.
Test Infrastructure & Parameters
cpp/tests/neighbors/ann_cagra.cuh
Adds graph_degree field to AnnCagraInputs (public); prints and propagates graph_degree into cagra::index_params (sets intermediate_graph_degree = graph_degree * 2); introduces dataset-size skip heuristics based on graph_degree; heuristically scales min_recall downward for low degrees; updates all test parameter generators to include the new degree parameter.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main bug being fixed (CAGRA search recall with NN Descent graphs) and accurately summarizes the primary change in the PR.
Description check ✅ Passed The description is directly related to the changeset, explaining the low recall issue, the NN Descent graph generation fix, duplicate removal workaround, and new test cases.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
cpp/src/neighbors/detail/nn_descent.cuh (2)

1681-1714: Minor: hoist the i == idx check out of the inner dedup loops.

Inside both the dedup pass (lines 1685–1689) and the random-fill loop (lines 1706–1711), the predicate i == idx is rechecked on every iteration of the inner exi_j loop, even though neither i nor idx changes within that loop. It's not a correctness bug — the outer loops still discard self-edges — but moving the check out lets each inner loop short-circuit immediately and makes the intent clearer.

♻️ Suggested rewrite
-    for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) {
-      size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id();
-
-      bool dup = false;
-      for (size_t exi_j = 0; exi_j < out_j; exi_j++) {
-        if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx || i == idx) {
-          dup = true;
-          break;
-        }
-      }
-      if (!dup) {
-        output_neighbor_list_ptr[out_j] = idx;
-        out_j++;
-      }
-    }
+    for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) {
+      size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id();
+      if (idx == i) { continue; }  // skip self-loops once
+      bool dup = false;
+      for (size_t exi_j = 0; exi_j < out_j; exi_j++) {
+        if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx) {
+          dup = true;
+          break;
+        }
+      }
+      if (!dup) { output_neighbor_list_ptr[out_j++] = idx; }
+    }

The same simplification applies to the random-fill do { … } while (dup) loop below.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1681 - 1714, The inner
dedup loops redundantly re-evaluate the invariant check "i == idx" on every
iteration; in functions/loops handling neighbor initialization (the for loop
over in_j reading graph_.h_graph[i * graph_.node_degree + in_j].id() and the
random-fill do...while that uses
cuvs::neighbors::cagra::detail::device::xorshift64), hoist the self-edge test
out of the exi_j inner loops: first check if i==idx and skip adding immediately
if true, otherwise run the duplicate-scan over output_neighbor_list_ptr; apply
the same change inside both the initial dedup pass (using out_j and exi_j) and
the random-fill loop (using j and exi_j) so the inner loop only tests against
previous entries and short-circuits sooner.

1713-1713: Use Index_t instead of int for the cast.

output_neighbor_list_ptr is Index_t* and the analogous assignment at line 1692 relies on the implicit size_t → Index_t conversion. Hard-coding int here is inconsistent and will silently truncate if Index_t is ever widened (e.g. to int64_t / uint32_t).

-      output_neighbor_list_ptr[j] = static_cast<int>(idx);
+      output_neighbor_list_ptr[j] = static_cast<Index_t>(idx);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` at line 1713, The assignment to
output_neighbor_list_ptr uses an explicit cast to int which is inconsistent and
may truncate values; replace the hard-coded int cast with Index_t (i.e., cast
idx to Index_t) so the line uses the same type as output_neighbor_list_ptr and
matches the implicit size_t→Index_t behavior used elsewhere (see the analogous
assignment that relies on implicit conversion). Ensure you reference
output_neighbor_list_ptr, Index_t, and idx when making the change.
cpp/tests/neighbors/ann_cagra.cuh (1)

1269-1281: Typo splite_ratio and duplicate database0_size/database1_size declarations.

splite_ratio is misspelled (should be split_ratio), and database0_size/database1_size are redeclared later at lines 1348–1350 inside the inner block (with the correctly spelled split_ratio). The values are the same (0.55), so behavior is identical, but the duplication is fragile — if someone updates one constant, the other will silently diverge. Recommend hoisting the existing declarations or removing the redundant inner copies.

♻️ Proposed cleanup
-    const double splite_ratio        = 0.55;
-    const std::size_t database0_size = ps.n_rows * splite_ratio;
-    const std::size_t database1_size = ps.n_rows - database0_size;
+    const double split_ratio         = 0.55;
+    const std::size_t database0_size = ps.n_rows * split_ratio;
+    const std::size_t database1_size = ps.n_rows - database0_size;
     // Dataset size must be larger than both 33 and the intermediate graph degree
     if (ps.build_algo == graph_build_algo::NN_DESCENT &&
         ((database0_size < 33) ||
          raft::round_up_safe(database0_size, 32lu) <= ps.graph_degree * 3lu))
       GTEST_SKIP();
     if (ps.build_algo == graph_build_algo::NN_DESCENT &&
         ((database1_size < 33) ||
          raft::round_up_safe(database1_size, 32lu) <= ps.graph_degree * 3lu))
       GTEST_SKIP();

Then remove the redeclarations later (lines 1348–1350):

-        const double split_ratio         = 0.55;
-        const std::size_t database0_size = ps.n_rows * split_ratio;
-        const std::size_t database1_size = ps.n_rows - database0_size;
-
         auto database0_view = raft::make_device_matrix_view<const DataT, int64_t>(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/neighbors/ann_cagra.cuh` around lines 1269 - 1281, The test defines
the same split constant twice with a typo: `splite_ratio` (and corresponding
`database0_size`/`database1_size`) at the top, while later the correctly named
`split_ratio` and duplicate `database0_size`/`database1_size` are redeclared;
remove the duplication and fix the typo by consolidating to a single
`split_ratio` declaration (or rename the top `splite_ratio` to `split_ratio`)
and use those single `database0_size` and `database1_size` variables everywhere
(remove the later redeclared variables) so the values cannot diverge.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1213-1226: In init_random_graph(), replace the local uint32_t id
(and ensure extended_nrows used in arithmetic) with a 64-bit type (e.g.,
uint64_t or Index_t) so the calculation id = (i + 1) * num_segments + seg_id is
performed without 32-bit overflow/truncation; update the arithmetic in the
initialization block and the while loop condition (while (id >= nrow || id ==
i)) to use the 64-bit id/extended_nrows, and only narrow/cast to uint32_t when
storing the final neighbor index into the neighbor arrays to preserve bounds
checks and correctness of neighbor selection.

---

Nitpick comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1681-1714: The inner dedup loops redundantly re-evaluate the
invariant check "i == idx" on every iteration; in functions/loops handling
neighbor initialization (the for loop over in_j reading graph_.h_graph[i *
graph_.node_degree + in_j].id() and the random-fill do...while that uses
cuvs::neighbors::cagra::detail::device::xorshift64), hoist the self-edge test
out of the exi_j inner loops: first check if i==idx and skip adding immediately
if true, otherwise run the duplicate-scan over output_neighbor_list_ptr; apply
the same change inside both the initial dedup pass (using out_j and exi_j) and
the random-fill loop (using j and exi_j) so the inner loop only tests against
previous entries and short-circuits sooner.
- Line 1713: The assignment to output_neighbor_list_ptr uses an explicit cast to
int which is inconsistent and may truncate values; replace the hard-coded int
cast with Index_t (i.e., cast idx to Index_t) so the line uses the same type as
output_neighbor_list_ptr and matches the implicit size_t→Index_t behavior used
elsewhere (see the analogous assignment that relies on implicit conversion).
Ensure you reference output_neighbor_list_ptr, Index_t, and idx when making the
change.

In `@cpp/tests/neighbors/ann_cagra.cuh`:
- Around line 1269-1281: The test defines the same split constant twice with a
typo: `splite_ratio` (and corresponding `database0_size`/`database1_size`) at
the top, while later the correctly named `split_ratio` and duplicate
`database0_size`/`database1_size` are redeclared; remove the duplication and fix
the typo by consolidating to a single `split_ratio` declaration (or rename the
top `splite_ratio` to `split_ratio`) and use those single `database0_size` and
`database1_size` variables everywhere (remove the later redeclared variables) so
the values cannot diverge.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54725a7b-94e0-4e19-a782-f3997e5863a5

📥 Commits

Reviewing files that changed from the base of the PR and between f2bffb6 and 7dddecb.

📒 Files selected for processing (2)
  • cpp/src/neighbors/detail/nn_descent.cuh
  • cpp/tests/neighbors/ann_cagra.cuh

Comment on lines +1213 to +1226
for (uint64_t i = 0; i < nrow; i++) {
// Generate a starting index. The node ((i + 1) % nrow) will be included in the neighbor list
// of node i. This rule guarantees the connectivity of the graph.
uint32_t id;
if ((i + 1) % num_segments == seg_id) {
id = i + 1;
if (id >= nrow) { id = seg_id; }
} else {
id = (i + 1) * num_segments + seg_id;
}

for (uint32_t j = 0; j < actual_segment_size; j++) {
while (id >= nrow || id == i) {
id = (id + stride * num_segments) % extended_nrows;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that GNND/build instantiations only use 32-bit Index_t today.
rg -nP --type=cpp --type=cuda -C2 '\bGNND<\s*[^,>]+,\s*[^>]+>'
rg -nP --type=cpp --type=cuda -C2 '\bnn_descent::detail::build<'

Repository: rapidsai/cuvs

Length of output: 115


🏁 Script executed:

# Search for GNND and build instantiations, avoiding unsupported --type=cuda
rg -nP -g '*.cu' -g '*.cuh' -g '*.hpp' -g '*.cpp' 'GNND<\s*[^,>]+,\s*[^>]+>' -C2

Repository: rapidsai/cuvs

Length of output: 5691


🏁 Script executed:

# Search for nn_descent::detail::build instantiations
rg -nP -g '*.cu' -g '*.cuh' -g '*.hpp' -g '*.cpp' 'nn_descent::detail::build<' -C2

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

# Check the actual code at lines 1213-1226 to confirm the issue
sed -n '1210,1230p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 812


🏁 Script executed:

# Check Index_t definition and usage
rg -n 'using Index_t' cpp/src/neighbors/detail/nn_descent.cuh | head -5
rg -n 'Index_t' cpp/src/neighbors/detail/nn_descent.cuh | head -20

Repository: rapidsai/cuvs

Length of output: 1360


🏁 Script executed:

# Find the init_random_graph function definition and see num_segments constraints
sed -n '1190,1240p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1940


🏁 Script executed:

# Find where num_segments is set/defined in build_config or elsewhere
rg -n 'num_segments\s*=' cpp/src/neighbors/detail/nn_descent.cuh | head -15

Repository: rapidsai/cuvs

Length of output: 236


🏁 Script executed:

# Check the extended_nrows computation and stride
sed -n '1195,1210p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 839


🏁 Script executed:

# Check what node_degree and segment_size are set to
sed -n '1150,1175p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1395


🏁 Script executed:

# Look for segment_size definition in the context
rg -n 'segment_size\s*=' cpp/src/neighbors/detail/nn_descent.cuh | head -10

Repository: rapidsai/cuvs

Length of output: 214


🏁 Script executed:

# Check the GnndGraph class member definitions to understand the types
sed -n '1080,1150p' cpp/src/neighbors/detail/nn_descent.cuh | head -70

Repository: rapidsai/cuvs

Length of output: 2757


🏁 Script executed:

# Find where segment_size is defined and its typical value
rg -n 'segment_size' cpp/src/neighbors/detail/nn_descent.cuh | head -20

Repository: rapidsai/cuvs

Length of output: 1391


🏁 Script executed:

# Look at the GNND class members to understand segment_size type
sed -n '1100,1140p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1229


🏁 Script executed:

# Check a concrete call site to see what segment_size values are used
sed -n '1750,1770p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 771


🏁 Script executed:

# Find the GnndGraph constructor signature to see segment_size parameter
sed -n '1130,1155p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1130


🏁 Script executed:

# Find where GnndGraph is instantiated to see what segment_size is used
rg -n 'GnndGraph<' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 541


🏁 Script executed:

# Look at the build function that creates GnndGraph
sed -n '1492,1530p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1753


🏁 Script executed:

# Search for segment_size as a class member or constant
sed -n '1000,1130p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 4920


🏁 Script executed:

# Search for static segment_size or constexpr
rg -n 'segment_size\s*=' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 214


🏁 Script executed:

# Search for segment_size as a static or class member in GnndGraph
sed -n '1040,1090p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1930


🏁 Script executed:

# Search more broadly for segment_size definition
rg -n 'static.*segment_size|const.*segment_size' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 406


🏁 Script executed:

# Look for the full GnndGraph class definition
sed -n '1040,1080p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1279


🏁 Script executed:

# Search for GnndGraph class definition and member variables
sed -n '920,1000p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 2975


🏁 Script executed:

# Look for the class declaration
rg -n 'class GnndGraph|struct GnndGraph' cpp/src/neighbors/detail/nn_descent.cuh -A 50 | head -80

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

# Search for GnndGraph directly
rg -n 'GnndGraph' cpp/src/neighbors/detail/nn_descent.cuh | head -30

Repository: rapidsai/cuvs

Length of output: 541


🏁 Script executed:

# Look at the header section of the file to find GnndGraph class
head -n 1200 cpp/src/neighbors/detail/nn_descent.cuh | tail -n 300 | grep -n 'GnndGraph\|segment_size\|static'

Repository: rapidsai/cuvs

Length of output: 931


🏁 Script executed:

# Check if segment_size is defined as a constexpr or static constant
grep -n 'constexpr.*segment_size\|static.*segment_size\|= raft::warp_size' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 723


🏁 Script executed:

# Search for segment_size definition in the entire file
rg -n 'segment_size' cpp/src/neighbors/detail/nn_descent.cuh | grep -E 'static|const|=.*raft::warp_size'

Repository: rapidsai/cuvs

Length of output: 406


🏁 Script executed:

# Look for the class declaration with members
sed -n '1,200p' cpp/src/neighbors/detail/nn_descent.cuh | grep -A 100 'template.*class GnndGraph\|struct GnndGraph'

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

# Check the header file for GnndGraph definition
fd -e hpp cpp/src/neighbors/detail | xargs grep -l 'GnndGraph'

Repository: rapidsai/cuvs

Length of output: 461


🏁 Script executed:

# Search for GnndGraph class definition more carefully
rg -n 'class GnndGraph' cpp/src/neighbors/detail/nn_descent.cuh -B 5 -A 30

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

# Try to find where segment_size is declared as a member
rg -n 'size_t.*segment_size|int.*segment_size' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 730


🏁 Script executed:

# Check the entire constructor to understand the members
sed -n '1139,1170p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1657


🏁 Script executed:

# Search for segment_size as a static member or constexpr in the whole class
rg -n 'static.*segment_size|constexpr.*segment_size' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 268


🏁 Script executed:

# Look at a larger section around the constructor - it might be a template parameter or static
sed -n '1100,1150p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1754


🏁 Script executed:

# Try searching in header files
fd . cpp/src/neighbors --full-path -e h -e hpp | xargs grep -l 'GnndGraph'

Repository: rapidsai/cuvs

Length of output: 102


🏁 Script executed:

# Look at the header file for GnndGraph class definition
cat cpp/src/neighbors/detail/nn_descent_gnnd.hpp | head -300

Repository: rapidsai/cuvs

Length of output: 10333


Potential uint32_t overflow / truncation for large datasets.

In init_random_graph() (lines 1213–1226), the calculation (i + 1) * num_segments + seg_id can overflow uint32_t even at current Index_t = int (32-bit) sizes. With segment_size = 32 (static constexpr), num_segments = node_degree / 32. For example, with nrow ≈ 2.1 billion (near int max) and num_segments = 64, the expression evaluates to ~1.37e11, which exceeds UINT32_MAX (~4.3e9). The result is implicitly truncated when assigned to uint32_t id, causing the while loop's bounds check (id >= nrow) to be bypassed incorrectly—a truncated value may pass the check when it should not. This affects neighbor selection correctness.

Promote id and extended_nrows to uint64_t (or Index_t) to preserve the full calculation and eliminate implicit narrowing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1213 - 1226, In
init_random_graph(), replace the local uint32_t id (and ensure extended_nrows
used in arithmetic) with a 64-bit type (e.g., uint64_t or Index_t) so the
calculation id = (i + 1) * num_segments + seg_id is performed without 32-bit
overflow/truncation; update the arithmetic in the initialization block and the
while loop condition (while (id >= nrow || id == i)) to use the 64-bit
id/extended_nrows, and only narrow/cast to uint32_t when storing the final
neighbor index into the neighbor arrays to preserve bounds checks and
correctness of neighbor selection.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
cpp/src/neighbors/detail/nn_descent.cuh (1)

1681-1714: Minor cleanups in the dedup + random-fill loops.

A few small refinements in the rewritten shrink path:

  1. The i == idx check on lines 1686 and 1707 is loop-invariant w.r.t. exi_j and should be evaluated once before the inner loop (or the duplicate stored as a sentinel) — currently re-evaluated every inner iteration.
  2. Line 1692 implicitly narrows size_t idx to int via assignment to output_neighbor_list_ptr[out_j], while line 1713 uses an explicit static_cast<int>. Make these consistent.
  3. The random-fill do { ... } while (dup); loop has no upper bound on retries. While the complete-graph case is now handled by the fast path, near-complete cases (e.g., node_degree close to nrow_-1) can still drive expected attempts very high; consider a fallback (e.g., a deterministic sweep over remaining ids) if this becomes a hot path under those configurations.
♻️ Sketch
-    // Copy neighbor list while removing duplicates.
-    for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) {
-      size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id();
-
-      bool dup = false;
-      for (size_t exi_j = 0; exi_j < out_j; exi_j++) {
-        if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx || i == idx) {
-          dup = true;
-          break;
-        }
-      }
-      if (!dup) {
-        output_neighbor_list_ptr[out_j] = idx;
-        out_j++;
-      }
-    }
+    // Copy neighbor list while removing duplicates.
+    for (size_t in_j = 0; in_j < build_config_.node_degree; in_j++) {
+      size_t idx = graph_.h_graph[i * graph_.node_degree + in_j].id();
+      if (idx == i) { continue; }  // hoisted self check
+
+      bool dup = false;
+      for (size_t exi_j = 0; exi_j < out_j; exi_j++) {
+        if (static_cast<decltype(idx)>(output_neighbor_list_ptr[exi_j]) == idx) {
+          dup = true;
+          break;
+        }
+      }
+      if (!dup) {
+        output_neighbor_list_ptr[out_j] = static_cast<int>(idx);
+        out_j++;
+      }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1681 - 1714, The dedup
+ random-fill loops in nn_descent.cuh repeatedly evaluate the invariant i == idx
inside the inner loops and mix implicit/explicit integer narrowing; move the i
== idx check out of the inner for-loops (compute a bool is_self = (i==idx)
before the inner loop or check once per candidate) to avoid redundant checks,
make all writes into output_neighbor_list_ptr use the same explicit cast (use
static_cast<int>(idx)) for consistency, and add a retry cap to the
xorshift64-based do/while in the random-fill path (e.g., max_attempts = 64); if
the cap is exceeded, fall back to a deterministic sweep over node ids to pick
the next non-duplicate non-self neighbors. Ensure these changes touch the loops
that reference build_config_.node_degree, output_neighbor_list_ptr, nrow_, and
cuvs::neighbors::cagra::detail::device::xorshift64.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1748-1758: The fast-path that fills the complete graph and returns
bypasses distance computation, so add a guard at the start of that branch to
reject callers who requested distances: check the same flag used elsewhere
(params.return_distances or index_params.return_distances) and if true
throw/return an error (e.g., throw std::invalid_argument or use the existing
error reporting mechanism) with a clear message indicating complete-graph path
does not support return_distances; reference idx.graph(), idx.distances(),
graph_degree, dataset.extent(0) and nnd.build in the message so maintainers can
locate the relevant logic.
- Around line 1224-1234: The post-store advance uses nrow modulo which breaks
the id % num_segments congruence; change the wrap from using nrow to
extended_nrows in the inner loop's post-store update so the sequence preserves
id % num_segments across iterations (i.e., after storing into h_graph at
store_index, update id with (id + num_segments * stride) % extended_nrows
instead of % nrow) to maintain the segmentation invariant relied on by
update_graph() and sample_graph().

---

Nitpick comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1681-1714: The dedup + random-fill loops in nn_descent.cuh
repeatedly evaluate the invariant i == idx inside the inner loops and mix
implicit/explicit integer narrowing; move the i == idx check out of the inner
for-loops (compute a bool is_self = (i==idx) before the inner loop or check once
per candidate) to avoid redundant checks, make all writes into
output_neighbor_list_ptr use the same explicit cast (use static_cast<int>(idx))
for consistency, and add a retry cap to the xorshift64-based do/while in the
random-fill path (e.g., max_attempts = 64); if the cap is exceeded, fall back to
a deterministic sweep over node ids to pick the next non-duplicate non-self
neighbors. Ensure these changes touch the loops that reference
build_config_.node_degree, output_neighbor_list_ptr, nrow_, and
cuvs::neighbors::cagra::detail::device::xorshift64.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eabdd134-b3cc-4043-83b5-01ee3c40950e

📥 Commits

Reviewing files that changed from the base of the PR and between 7dddecb and 5f22011.

📒 Files selected for processing (1)
  • cpp/src/neighbors/detail/nn_descent.cuh

Comment on lines +1224 to 1234
for (uint32_t j = 0; j < actual_segment_size; j++) {
while (id >= nrow || id == i) {
id = (id + stride * num_segments) % extended_nrows;
}

h_neighbor_list[j].id_with_flag() =
j < (rand_seq.size() - self_in_this_seg) && size_t(id) < nrow
? id
: std::numeric_limits<Index_t>::max();
h_dist_list[j] = std::numeric_limits<DistData_t>::max();
idx++;
const auto store_index = i * node_degree + seg_id * segment_size + j;
h_graph[store_index].id_with_flag() = id;
h_dists.data_handle()[store_index] = std::numeric_limits<DistData_t>::max();

id = (id + num_segments * stride) % nrow;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm update_graph and sample_graph rely on `id % num_segments` to address segment slots.
rg -nP -C3 '%\s*num_segments' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1085


🏁 Script executed:

#!/bin/bash
# Look at the full context around lines 1224-1234
sed -n '1200,1250p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1927


🏁 Script executed:

#!/bin/bash
# Find extended_nrows calculation
rg -nP 'extended_nrows\s*=' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 92


🏁 Script executed:

#!/bin/bash
# Find stride calculation and how it relates to num_segments, nrow
rg -nP 'stride\s*=' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 105


🏁 Script executed:

#!/bin/bash
# Find update_graph and sample_graph implementations to confirm they use id % num_segments
rg -nP -A10 'void update_graph\(|void sample_graph\(' cpp/src/neighbors/detail/nn_descent.cuh | head -100

Repository: rapidsai/cuvs

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Search for where h_graph is read, especially in update_graph and sample_graph
rg -nP 'update_graph|sample_graph' cpp/src/neighbors/detail/nn_descent.cuh | head -20

Repository: rapidsai/cuvs

Length of output: 678


🏁 Script executed:

#!/bin/bash
# Find where the segment-based indexing is used - look for how segments are accessed
rg -nP -B5 -A5 'seg_idx.*segment_size|segment_id.*segment_size' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 781


🏁 Script executed:

#!/bin/bash
# Look at the actual fix more carefully - check if storing values in [nrow, extended_nrows) would break anything
rg -nP 'h_graph\[.*\]' cpp/src/neighbors/detail/nn_descent.cuh | head -30

Repository: rapidsai/cuvs

Length of output: 334


🏁 Script executed:

#!/bin/bash
# Check if any code assumes id < nrow or validates boundaries when reading h_graph
rg -nP -B3 -A3 'id\(\)|\.id\(\)' cpp/src/neighbors/detail/nn_descent.cuh | grep -A3 -B3 'nrow\|>= nrow\|< nrow' | head -50

Repository: rapidsai/cuvs

Length of output: 756


🏁 Script executed:

#!/bin/bash
# Look at the full sample_graph implementation to see how it uses h_graph entries
sed -n '1240,1285p' cpp/src/neighbors/detail/nn_descent.cuh

Repository: rapidsai/cuvs

Length of output: 1838


Fix modulus mismatch in post-store advance to preserve the segmentation invariant.

The inner while loop (line 1226) wraps with % extended_nrows, preserving id % num_segments because extended_nrows is a multiple of num_segments. However, the post-store advance (line 1233) wraps with % nrow, which is not guaranteed to be a multiple of num_segments. When this advance crosses nrow, the congruence class shifts, causing subsequent neighbors stored into segment seg_id to violate the invariant id % num_segments == seg_id that is relied upon by update_graph() at line 1297, which uses seg_idx = new_neighb_id.id() % num_segments to determine segment placement.

Concrete example: nrow=100, num_segments=6, extended_nrows=102. Sequence: 6 → 36 → 66 → 96 → 26. The final value lands in segment 0's slot, but 26 % 6 == 2, not 0.

Use extended_nrows consistently for the post-store advance. This preserves the congruence class; values in [nrow, extended_nrows) will be wrapped back into [0, nrow) by the next iteration's while loop and are safely skipped downstream by sample_graph() (line 1260), which explicitly filters neighbor.id() >= nrow.

Suggested fix
-        id = (id + num_segments * stride) % nrow;
+        id = (id + num_segments * stride) % extended_nrows;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (uint32_t j = 0; j < actual_segment_size; j++) {
while (id >= nrow || id == i) {
id = (id + stride * num_segments) % extended_nrows;
}
h_neighbor_list[j].id_with_flag() =
j < (rand_seq.size() - self_in_this_seg) && size_t(id) < nrow
? id
: std::numeric_limits<Index_t>::max();
h_dist_list[j] = std::numeric_limits<DistData_t>::max();
idx++;
const auto store_index = i * node_degree + seg_id * segment_size + j;
h_graph[store_index].id_with_flag() = id;
h_dists.data_handle()[store_index] = std::numeric_limits<DistData_t>::max();
id = (id + num_segments * stride) % nrow;
}
for (uint32_t j = 0; j < actual_segment_size; j++) {
while (id >= nrow || id == i) {
id = (id + stride * num_segments) % extended_nrows;
}
const auto store_index = i * node_degree + seg_id * segment_size + j;
h_graph[store_index].id_with_flag() = id;
h_dists.data_handle()[store_index] = std::numeric_limits<DistData_t>::max();
id = (id + num_segments * stride) % extended_nrows;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1224 - 1234, The
post-store advance uses nrow modulo which breaks the id % num_segments
congruence; change the wrap from using nrow to extended_nrows in the inner
loop's post-store update so the sequence preserves id % num_segments across
iterations (i.e., after storing into h_graph at store_index, update id with (id
+ num_segments * stride) % extended_nrows instead of % nrow) to maintain the
segmentation invariant relied on by update_graph() and sample_graph().

Comment thread cpp/src/neighbors/detail/nn_descent.cuh
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/neighbors/detail/nn_descent.cuh (1)

1636-1669: ⚠️ Potential issue | 🔴 Critical

Dedup/backfill can desynchronize output_graph and output_distances.

When return_distances=true, Lines 1636-1669 copy the first output_graph_degree distances before Lines 1673-1714 remove duplicate ids and inject random replacements. That means callers can receive neighbor ids that no longer match the returned distances.

Please compact (id, dist) pairs together here, and give synthetic fillers a corresponding distance value as well (recomputed or a sentinel such as max()).

Also applies to: 1673-1714

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1636 - 1669, The code
copies distances into output_distances (graph_h_dists -> output_dist_view)
before the dedup/backfill that modifies output_graph, so ids and distances can
desynchronize; fix by treating ids and distances as paired data: either move the
device copy of distances to after the dedup/backfill that runs on output_graph,
or perform the same dedup/backfill logic on the host/device distance buffer
(graph_h_dists/output_dist_view) in tandem with the id buffer so any
removed/filled neighbor id has its distance updated; when injecting synthetic
replacements in the dedup/backfill path, assign a corresponding distance
(recompute real distance or set a sentinel like
std::numeric_limits<DistData_t>::max()) and ensure the post-processing
(output_dist_view mapping) operates on the final paired distances.
♻️ Duplicate comments (1)
cpp/src/neighbors/detail/nn_descent.cuh (1)

1224-1233: ⚠️ Potential issue | 🟠 Major

Keep the post-store wrap on extended_nrows.

Line 1233 still wraps with % nrow, so once the sequence crosses nrow the stored ids can switch id % num_segments classes. update_graph() later routes by that modulus, so this can place neighbors into the wrong segment.

Suggested fix
-        id = (id + num_segments * stride) % nrow;
+        id = (id + num_segments * stride) % extended_nrows;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/neighbors/detail/nn_descent.cuh` around lines 1224 - 1233, The loop
advances id but the final wrap uses % nrow which causes ids to change their id %
num_segments class and misroute in update_graph(); fix by wrapping with
extended_nrows instead of nrow when updating id at the end of the loop (the code
that updates id after writing to h_graph/h_dists). Locate the block using
variables id, stride, num_segments, extended_nrows, nrow, store_index,
segment_size and change the post-store modulus to extended_nrows so stored ids
remain in the correct segment class for update_graph().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1636-1669: The code copies distances into output_distances
(graph_h_dists -> output_dist_view) before the dedup/backfill that modifies
output_graph, so ids and distances can desynchronize; fix by treating ids and
distances as paired data: either move the device copy of distances to after the
dedup/backfill that runs on output_graph, or perform the same dedup/backfill
logic on the host/device distance buffer (graph_h_dists/output_dist_view) in
tandem with the id buffer so any removed/filled neighbor id has its distance
updated; when injecting synthetic replacements in the dedup/backfill path,
assign a corresponding distance (recompute real distance or set a sentinel like
std::numeric_limits<DistData_t>::max()) and ensure the post-processing
(output_dist_view mapping) operates on the final paired distances.

---

Duplicate comments:
In `@cpp/src/neighbors/detail/nn_descent.cuh`:
- Around line 1224-1233: The loop advances id but the final wrap uses % nrow
which causes ids to change their id % num_segments class and misroute in
update_graph(); fix by wrapping with extended_nrows instead of nrow when
updating id at the end of the loop (the code that updates id after writing to
h_graph/h_dists). Locate the block using variables id, stride, num_segments,
extended_nrows, nrow, store_index, segment_size and change the post-store
modulus to extended_nrows so stored ids remain in the correct segment class for
update_graph().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 77e8a5b1-2d04-4569-a2f5-0ba787f7ccaf

📥 Commits

Reviewing files that changed from the base of the PR and between 5f22011 and 5386cab.

📒 Files selected for processing (1)
  • cpp/src/neighbors/detail/nn_descent.cuh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci CMake cpp non-breaking Introduces a non-breaking change Python

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

9 participants