fix(#598): pre-boost occurrence cap stops mention-dense tooling files saturating the path prior by justrach · Pull Request #599 · justrach/codedb

justrach · 2026-06-10T10:24:01Z

Fixes #598 — the last evidenced ranking failure mode from the audit rounds.

The ×0.5 tooling prior (#557) is multiplicative against the raw per-line occurrence count, so a sufficiently mention-dense file shrugs it off: in a live run, codedb search capture returned benchmarks/search-shootout/shootout.py in every top-8 slot. A naive total-score cap (the doc-penalty approach) would destroy eponymy: codedb search install must keep ranking install/install.sh first.

Fix: cap the occurrence BASE at 2.0 for tooling paths before the stem/symbol boosts. Density can't dominate (6 mentions → 2.0×0.5=1.0, below any 2-mention source line), while the +15 stem boost applies after the cap so eponymous lookups still win (cap 2 + 15 = 17 → ×0.5 = 8.5).
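
The arithmetic above can be checked with a small sketch. This is an illustration in Python, not the project's Zig code: the function name `line_score`, its parameters, and the `capped` switch are hypothetical, and the real ranker presumably combines further signals that are not modelled here. Only the constants (cap 2.0, stem boost +15, prior ×0.5) and the order of operations come from the description.

```python
# Hypothetical model of the scoring order described in this PR.
# Constants are taken from the PR text; everything else is assumed.
TOOLING_PRIOR = 0.5    # multiplicative prior for tooling paths (#557)
OCCURRENCE_CAP = 2.0   # pre-boost cap on the occurrence base (this PR)
STEM_BOOST = 15.0      # boost for an eponymous (stem) match


def line_score(occurrences, is_tooling, stem_match, capped=True):
    """Return an illustrative score for one matching line."""
    base = float(occurrences)
    # 1. Cap the raw occurrence base for tooling paths, before any boost.
    if is_tooling and capped:
        base = min(base, OCCURRENCE_CAP)
    # 2. Apply the stem boost after the cap, so eponymy survives.
    if stem_match:
        base += STEM_BOOST
    # 3. Apply the multiplicative tooling prior last.
    if is_tooling:
        base *= TOOLING_PRIOR
    return base


# Before the fix: six mentions in a bench script beat a 2-mention source line.
assert line_score(6, True, False, capped=False) == 3.0
# After the fix: the same bench line drops below the source line's 2.0.
assert line_score(6, True, False) == 1.0
assert line_score(2, False, False) == 2.0
# Eponymous tooling file: (2 + 15) * 0.5 = 8.5, still well ahead.
assert line_score(6, True, True) == 8.5
```

Capping before the boost rather than after is what separates this from a total-score cap: a cap applied at step 3 would also flatten the 8.5 that keeps install/install.sh first.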

Failing test committed red (dense bench file ranked first; eponymy case pinned in the same test). Full suite green.

🤖 Generated with Claude Code

The x0.5 tooling prior is multiplicative against the raw per-line count, so a bench script repeating a term six times (6.0x0.5=3.0) still beat the implementation's 2.0. Cap the occurrence base at 2.0 for tooling paths BEFORE the stem/symbol boosts: density cannot dominate, while an eponymous lookup (query 'install' -> install/install.sh) keeps its +15 stem boost and still ranks first.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8e8c65a20c

chatgpt-codex-connector · 2026-06-10T10:26:10Z

+        const is_tooling_path = pathHasSegment(r.path, "bench") or pathHasSegment(r.path, "benchmarks") or
+            pathHasSegment(r.path, "scripts") or pathHasSegment(r.path, "website") or
+            pathHasSegment(r.path, "install");
+        if (is_tooling_path) score = @min(score, 2.0);
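
As a reading aid, the path test in the hunk above might behave like the following sketch. It is written in Python rather than Zig, and it assumes that `pathHasSegment` matches a whole path component exactly; the actual helper is not shown in this diff and may differ (for example in case handling or separator normalisation). The names `path_has_segment`, `is_tooling_path`, and `TOOLING_SEGMENTS` are hypothetical.

```python
# Hypothetical Python stand-in for the Zig helper used in the diff.
# Assumption: a "segment" is one whole component between path separators.
TOOLING_SEGMENTS = ("bench", "benchmarks", "scripts", "website", "install")


def path_has_segment(path, segment):
    """True if `segment` equals one whole component of `path`."""
    return segment in path.replace("\\", "/").split("/")


def is_tooling_path(path):
    """True if any component of `path` is one of the tooling segments."""
    return any(path_has_segment(path, s) for s in TOOLING_SEGMENTS)


assert is_tooling_path("benchmarks/search-shootout/shootout.py")
assert is_tooling_path("install/install.sh")   # matched via the directory
assert not is_tooling_path("src/mcp.zig")
assert not is_tooling_path("src/benchmarks_util.zig")  # substring only
```

Under this assumption install/install.sh counts as a tooling path because of its directory, which is why the stem boost has to be applied after the cap for the eponymous lookup to keep winning.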


Apply the cap to the MCP fast path too

This cap only affects results that flow through searchContentAuto/rerankSignalScore; the MCP handler first tries renderPlainSearch for single-token searches with no glob/compact options (src/mcp.zig:1777-1782), and that renderer has its own Tier 0 ordering based on raw hit counts without this tooling cap. In that common codedb_search path, a dense bench/... or benchmarks/... file still sorts ahead of an implementation file for queries like capture, so the issue this change is meant to fix remains visible to MCP clients unless the same cap/prior is applied there or the fast path is bypassed.

github-actions · 2026-06-10T10:26:54Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| `codedb_bundle` | 116103 | 108320 | -6.70% | -7783 | OK |
| `codedb_changes` | 12006 | 11564 | -3.68% | -442 | OK |
| `codedb_context` | 1225172 | 1278733 | +4.37% | +53561 | OK |
| `codedb_deps` | 326 | 365 | +11.96% | +39 | NOISE |
| `codedb_edit` | 42182 | 48424 | +14.80% | +6242 | NOISE |
| `codedb_find` | 10444 | 10956 | +4.90% | +512 | OK |
| `codedb_hot` | 27725 | 26855 | -3.14% | -870 | OK |
| `codedb_outline` | 37923 | 42437 | +11.90% | +4514 | NOISE |
| `codedb_read` | 19995 | 17944 | -10.26% | -2051 | OK |
| `codedb_search` | 30278 | 28465 | -5.99% | -1813 | OK |
| `codedb_snapshot` | 73016 | 72562 | -0.62% | -454 | OK |
| `codedb_status` | 10556 | 10035 | -4.94% | -521 | OK |
| `codedb_symbol` | 50306 | 48157 | -4.27% | -2149 | OK |
| `codedb_tree` | 51631 | 45276 | -12.31% | -6355 | OK |
| `codedb_word` | 13212 | 12650 | -4.25% | -562 | OK |
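
The Status column is consistent with a two-threshold rule, sketched below in Python. This is inferred from the report text and the rows shown, not taken from the CI script: the function name `classify`, the `FAIL` label, and the strictness of the comparisons at the exact thresholds are assumptions.

```python
# Inferred sketch of the regression classification; not the actual CI code.
PCT_THRESHOLD = 10.0        # percent, from "Thresholds: 10.00%"
ABS_THRESHOLD_NS = 50_000   # nanoseconds, from "50,000 ns absolute delta"


def classify(base_ns, head_ns):
    """Classify one benchmark row from its base and head timings."""
    delta_ns = head_ns - base_ns
    delta_pct = 100.0 * delta_ns / base_ns
    if delta_pct <= PCT_THRESHOLD:
        return "OK"      # speedups and small slowdowns pass
    if delta_ns <= ABS_THRESHOLD_NS:
        return "NOISE"   # large percentage, tiny absolute cost
    return "FAIL"        # assumed label for a real regression


assert classify(326, 365) == "NOISE"          # codedb_deps: +11.96%, +39 ns
assert classify(1225172, 1278733) == "OK"     # codedb_context: +4.37%
assert classify(19995, 17944) == "OK"         # codedb_read: -10.26% (faster)
```

Requiring both thresholds keeps sub-microsecond tools such as `codedb_deps` from failing CI on jitter, while a row like `codedb_context`, whose absolute delta exceeds 50,000 ns, still passes because its percentage change is small.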

justrach and others added 2 commits June 10, 2026 18:23

test(#598): failing — mention-dense tooling files saturate past the p…

d2ccfe4

…ath prior Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 10, 2026

justrach merged commit 7ba17e1 into release/0.2.5825 Jun 10, 2026
2 checks passed

justrach deleted the fix/issue-598-tooling-saturation branch June 10, 2026 10:27

justrach mentioned this pull request Jun 10, 2026

rank: mention-dense tooling files saturate past the path prior (occurrence count needs a pre-boost cap) #598

Closed
