fix(#598): pre-boost occurrence cap stops mention-dense tooling files saturating the path prior #599

Merged
justrach merged 2 commits into release/0.2.5825 from fix/issue-598-tooling-saturation
Jun 10, 2026

Conversation

@justrach

Owner

Fixes #598 — the last evidenced ranking failure mode from the audit rounds.

The ×0.5 tooling prior (#557) multiplies the raw per-line occurrence count, so a mention-dense file still outscores the implementation: in a live run, codedb search capture returned benchmarks/search-shootout/shootout.py in every top-8 slot. A naive total-score cap (the doc-penalty approach) would destroy eponymy: codedb search install must keep ranking install/install.sh first.

Fix: cap the occurrence BASE at 2.0 for tooling paths before the stem/symbol boosts. Density can't dominate (6 mentions → 2.0×0.5=1.0, below any 2-mention source line), while the +15 stem boost applies after the cap so eponymous lookups still win (cap 2 + 15 = 17 → ×0.5 = 8.5).
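
The arithmetic above can be checked with a minimal sketch. It assumes the score for one matching line is built in three steps (cap the occurrence base, add the stem boost, apply the prior) and models nothing else. `lineScore` and the constant names are hypothetical; only the values 2.0, +15 and ×0.5 come from this description.

```zig
const std = @import("std");

// Values taken from the PR description; the names are hypothetical.
const tooling_occurrence_cap: f32 = 2.0;
const stem_boost: f32 = 15.0;
const tooling_prior: f32 = 0.5;

/// Sketch of the scoring order for one matching line. This is not the real
/// rerankSignalScore: only the cap -> boost -> prior ordering is modelled.
fn lineScore(occurrences: f32, stem_match: bool, is_tooling_path: bool) f32 {
    var score = occurrences;
    // Cap the occurrence base first, so density alone cannot accumulate.
    if (is_tooling_path) score = @min(score, tooling_occurrence_cap);
    // The stem boost lands after the cap, so eponymous files keep it in full.
    if (stem_match) score += stem_boost;
    // The multiplicative tooling prior is applied last.
    if (is_tooling_path) score *= tooling_prior;
    return score;
}

test "cap before boost keeps eponymy and defeats density" {
    const dense_bench = lineScore(6.0, false, true); // min(6, 2) * 0.5 = 1.0
    const source_line = lineScore(2.0, false, false); // 2.0, no prior applied
    const eponymous = lineScore(2.0, true, true); // (2 + 15) * 0.5 = 8.5

    try std.testing.expectEqual(@as(f32, 1.0), dense_bench);
    try std.testing.expectEqual(@as(f32, 8.5), eponymous);
    try std.testing.expect(source_line > dense_bench);
    try std.testing.expect(eponymous > source_line);
}
```

Saved to a file, the block runs under zig test. Without the cap, the dense bench line would score 6.0 × 0.5 = 3.0 and beat the 2-mention source line at 2.0, which is the failure reported in #598.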

Failing test committed red (dense bench file ranked first; eponymy case pinned in the same test). Full suite green.

🤖 Generated with Claude Code

justrach and others added 2 commits June 10, 2026 18:23
…ath prior

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The x0.5 tooling prior is multiplicative against the raw per-line
count, so a bench script repeating a term six times (6.0x0.5=3.0)
still beat the implementation's 2.0. Cap the occurrence base at 2.0
for tooling paths BEFORE the stem/symbol boosts: density cannot
dominate, while an eponymous lookup (query 'install' ->
install/install.sh) keeps its +15 stem boost and still ranks first.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e8c65a20c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/explore.zig
Comment on lines +2870 to +2873
const is_tooling_path = pathHasSegment(r.path, "bench") or pathHasSegment(r.path, "benchmarks") or
    pathHasSegment(r.path, "scripts") or pathHasSegment(r.path, "website") or
    pathHasSegment(r.path, "install");
if (is_tooling_path) score = @min(score, 2.0);
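
The path check in the lines above can be illustrated with a small sketch. The real pathHasSegment is not shown in this thread, so `hasSegment` below is a hypothetical stand-in that assumes whole-component matching on '/' separators; `isToolingPath` is likewise a hypothetical wrapper. The five segment names are copied from the diff.

```zig
const std = @import("std");

/// Hypothetical stand-in for pathHasSegment: true when `segment` equals one
/// whole '/'-separated component of `path`. Whole-component matching is an
/// assumption, not something the diff confirms.
fn hasSegment(path: []const u8, segment: []const u8) bool {
    var start: usize = 0;
    var i: usize = 0;
    while (i <= path.len) : (i += 1) {
        // A component ends at each '/' and at the end of the path.
        if (i == path.len or path[i] == '/') {
            if (std.mem.eql(u8, path[start..i], segment)) return true;
            start = i + 1;
        }
    }
    return false;
}

/// The five tooling segments are copied from the diff under review.
fn isToolingPath(path: []const u8) bool {
    const segments = [_][]const u8{ "bench", "benchmarks", "scripts", "website", "install" };
    for (segments) |segment| {
        if (hasSegment(path, segment)) return true;
    }
    return false;
}

test "tooling paths are matched by whole segment" {
    try std.testing.expect(isToolingPath("benchmarks/search-shootout/shootout.py"));
    try std.testing.expect(isToolingPath("install/install.sh"));
    try std.testing.expect(!isToolingPath("src/explore.zig"));
    try std.testing.expect(!isToolingPath("src/installer.zig"));
}
```

Under this assumption, both files named in the PR description (benchmarks/search-shootout/shootout.py and install/install.sh) count as tooling paths, while a source file whose name merely contains "install" does not.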


P2: Apply the cap to the MCP fast path too

This cap only affects results that flow through searchContentAuto/rerankSignalScore; the MCP handler first tries renderPlainSearch for single-token searches with no glob/compact options (src/mcp.zig:1777-1782), and that renderer has its own Tier 0 ordering based on raw hit counts without this tooling cap. In that common codedb_search path, a dense bench/... or benchmarks/... file still sorts ahead of an implementation file for queries like capture, so the issue this change is meant to fix remains visible to MCP clients unless the same cap/prior is applied there or the fast path is bypassed.


@github-actions


Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---:|---:|---:|---:|---|
| codedb_bundle | 116103 | 108320 | -6.70% | -7783 | OK |
| codedb_changes | 12006 | 11564 | -3.68% | -442 | OK |
| codedb_context | 1225172 | 1278733 | +4.37% | +53561 | OK |
| codedb_deps | 326 | 365 | +11.96% | +39 | NOISE |
| codedb_edit | 42182 | 48424 | +14.80% | +6242 | NOISE |
| codedb_find | 10444 | 10956 | +4.90% | +512 | OK |
| codedb_hot | 27725 | 26855 | -3.14% | -870 | OK |
| codedb_outline | 37923 | 42437 | +11.90% | +4514 | NOISE |
| codedb_read | 19995 | 17944 | -10.26% | -2051 | OK |
| codedb_search | 30278 | 28465 | -5.99% | -1813 | OK |
| codedb_snapshot | 73016 | 72562 | -0.62% | -454 | OK |
| codedb_status | 10556 | 10035 | -4.94% | -521 | OK |
| codedb_symbol | 50306 | 48157 | -4.27% | -2149 | OK |
| codedb_tree | 51631 | 45276 | -12.31% | -6355 | OK |
| codedb_word | 13212 | 12650 | -4.25% | -562 | OK |
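
The statuses in the report above are consistent with a two-threshold rule, sketched below. This is one reading inferred from the rows, not the CI script itself. It assumes that only slowdowns count (speedups beyond 10% are marked OK in the report), that both comparisons are strict, and that a row exceeding both thresholds would fail. `classify`, `Status` and the `fail` value are hypothetical names; no failing row appears in this report.

```zig
const std = @import("std");

// Thresholds quoted in the report header.
const pct_threshold: f64 = 10.0;
const abs_threshold_ns: f64 = 50000.0;

// `fail` is an assumed third status; the report only shows OK and NOISE.
const Status = enum { ok, noise, fail };

/// Hypothetical classifier: a row needs to exceed both the percentage and
/// the absolute threshold to fail; exceeding only the percentage is noise.
fn classify(base_ns: f64, head_ns: f64) Status {
    const abs_delta = head_ns - base_ns;
    const pct = abs_delta / base_ns * 100.0;
    if (pct <= pct_threshold) return .ok; // includes every speedup
    if (abs_delta <= abs_threshold_ns) return .noise;
    return .fail;
}

test "rows from the report classify as shown" {
    // codedb_deps: +11.96% but only +39 ns.
    try std.testing.expectEqual(Status.noise, classify(326.0, 365.0));
    // codedb_context: +53561 ns but only +4.37%.
    try std.testing.expectEqual(Status.ok, classify(1225172.0, 1278733.0));
    // codedb_read: a -10.26% speedup.
    try std.testing.expectEqual(Status.ok, classify(19995.0, 17944.0));
}
```

The codedb_context row shows why both thresholds are needed: its absolute delta exceeds 50,000 ns, yet at +4.37% it stays OK.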

@justrach merged commit 7ba17e1 into release/0.2.5825 Jun 10, 2026
2 checks passed
@justrach deleted the fix/issue-598-tooling-saturation branch June 10, 2026 10:27