feat(#596,#597): ContentCache byte budget; store lock hardening + data-log compaction #602

Merged: justrach merged 3 commits into release/0.2.5825 from feat/cache-byte-budget-store-hardening on Jun 11, 2026

@justrach

Owner

Implements both parked enhancements. Closes #596, closes #597. Independent of #601 (no shared files); branched off the release tip.

#596 — ContentCache byte budget (src/hot_cache.zig)

The cache bounded entries (4096 slots), not bytes — values are file contents, so worst-case retained bytes were unbounded. Now:

  • owned_bytes accounting through put/update/evict/remove/clear, surfaced in Stats.
  • Byte budget (default 256MB, field-configurable): a put that would exceed it runs a global second-chance sweep over owned entries until the incoming value fits. Holes are safe: since the probe-window fix in #584, get()/putImpl() scan the full probe window, so a globally evicted slot can't strand anything. The in-window collision eviction is untouched.
  • Per-entry ceiling (default 8MB): oversized owned values are not cached at all, and any stale entry for that key is dropped so get() can't serve outdated content (search re-reads from disk on miss).
  • Borrowed values exempt (snapshot adoption): they cost no budget and budget sweeps skip them — evicting them would free nothing.

Four new tests: budget eviction, ceiling + stale-drop, borrowed exemption, accounting through update/remove/clear.
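The four rules above can be modeled in a few lines. This is a minimal sketch in Python, not the Zig code in src/hot_cache.zig: a dict stands in for the open-addressed slot table, and the names `ByteBudgetCache`, `referenced`, and `_make_room` are hypothetical. It assumes the second-chance bit is set on get() and that new entries start unreferenced.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: bytes
    owned: bool = True
    referenced: bool = False  # second-chance bit (assumed to be set on get())


class ByteBudgetCache:
    """Toy model of the byte budget; a dict replaces the real slot table."""

    def __init__(self, max_bytes=256 * 1024 * 1024, max_entry_bytes=8 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.max_entry_bytes = max_entry_bytes
        self.entries = {}
        self.owned_bytes = 0

    def get(self, key):
        e = self.entries.get(key)
        if e is None:
            return None
        e.referenced = True
        return e.value

    def remove(self, key):
        e = self.entries.pop(key, None)
        if e is not None and e.owned:
            self.owned_bytes -= len(e.value)

    def put(self, key, value, owned=True):
        # Release the old value first; this also drops a stale entry when the
        # new value turns out to be too large to cache.
        self.remove(key)
        if owned:
            if len(value) > self.max_entry_bytes:
                return False  # over the per-entry ceiling: not cached at all
            self._make_room(len(value))
            self.owned_bytes += len(value)
        self.entries[key] = Entry(value, owned)
        return True

    def _make_room(self, incoming):
        # Global second-chance sweep over owned entries only.
        while self.owned_bytes + incoming > self.max_bytes:
            victim = None
            for key, e in self.entries.items():
                if not e.owned:
                    continue  # borrowed: evicting it would free nothing
                if e.referenced:
                    e.referenced = False  # spare it once
                    continue
                victim = key
                break
            if victim is None:
                if not any(e.owned for e in self.entries.values()):
                    return  # nothing left that could free budget
                continue  # all bits are now cleared; the next pass evicts
            self.remove(victim)
```

In this model a borrowed value never touches `owned_bytes`, so a cache full of snapshot-adopted entries cannot trigger a sweep, and a sweep cannot evict them.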

#597 — store hardening (src/store.zig)

  1. No unlocked diff writes. Previously, appendVersion proceeded without the advisory lock when acquisition failed, so two processes could compute the same end_pos and interleave writes. Now a failed lock skips the diff persist; the version still records with data_offset = null.
  2. Data-log compaction. Cross-lifetime growth was already solved by truncate-on-open (#367); within one daemon lifetime, however, versions trimmed by max_versions orphaned their diff bytes forever. appendVersion now checks for compaction once the log is past a 16MB floor, with exponential back-off; when at least half the file is orphaned, live ranges are rewritten to the front in ascending offset order (each buffered whole, so overlapping source and destination ranges are safe), version offsets are fixed in place, and the tail is truncated. Bailing midway leaves a consistent log. The test drives max_versions trimming and asserts that the file shrinks, offsets are rewritten, and both surviving diffs read back intact.
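The compaction step in item 2 can be sketched as follows. This is an illustrative Python model over an in-memory `bytearray`, not the code in src/store.zig: the function name `compact`, the dict-based version records, and the `floor` parameter are assumptions, live ranges are assumed not to overlap each other, and the exponential back-off between checks is left out.

```python
FLOOR_BYTES = 16 * 1024 * 1024  # floor quoted in the PR text


def compact(log: bytearray, live: list, floor: int = FLOOR_BYTES) -> int:
    """Move live ranges to the front of `log`; return the bytes reclaimed.

    `live` holds version records shaped like {"offset": int, "len": int}
    (a hypothetical stand-in for the real version structs). Offsets are
    updated in place.
    """
    if len(log) <= floor:
        return 0  # below the floor: not worth checking
    live_bytes = sum(v["len"] for v in live)
    if live_bytes * 2 > len(log):
        return 0  # less than half of the file is orphaned

    write_pos = 0
    # Ascending offset order guarantees write_pos <= v["offset"], so a write
    # can only land on bytes that are orphaned or already copied.
    for v in sorted(live, key=lambda v: v["offset"]):
        start, length = v["offset"], v["len"]
        chunk = bytes(log[start:start + length])  # buffer the whole range first
        log[write_pos:write_pos + length] = chunk
        v["offset"] = write_pos  # fix the recorded offset
        write_pos += length

    reclaimed = len(log) - write_pos
    del log[write_pos:]  # truncate the tail
    return reclaimed
```

Buffering each range before writing it is what makes a range that overlaps its own destination safe; the ascending order is what keeps later ranges from being overwritten before they are read.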

Verification

zig build test --summary all: 799/799, exit 0 (the four cache tests are inline in hot_cache.zig and run in every binary that transitively imports it, same as the existing inline cache tests).

🤖 Generated with Claude Code

justrach and others added 2 commits June 10, 2026 22:15
…d values refused

Entry-count capacity stays; on top of it the cache now tracks
owned_bytes and enforces a byte budget (default 256MB) for owned values.
A put that would exceed it runs a global second-chance sweep over owned
entries until the new value fits — holes are safe because get()/putImpl()
scan the full probe window (#584). Owned values above max_entry_bytes
(default 8MB) are not cached at all; any stale entry for that key is
dropped so get() can't serve outdated content, and search re-reads from
disk on the miss. Borrowed (snapshot-adopted) values are exempt from
both, and budget sweeps skip them — evicting them frees no budget.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ction

appendVersion: when the advisory lock can't be acquired, skip the diff
persist instead of writing unlocked (two processes could interleave at
the same end_pos). The version still records with data_offset = null.
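The skip-on-failed-lock rule can be illustrated with a small Unix-only Python sketch. It assumes a non-blocking `flock`-style advisory lock, which may differ from the locking primitive the store actually uses, and `append_diff` is a hypothetical name. Returning `None` corresponds to recording the version with data_offset = null.

```python
import fcntl
import os


def append_diff(path, diff: bytes):
    """Append `diff` under an exclusive advisory lock.

    Returns the offset the diff was written at, or None when the lock could
    not be acquired (the caller then records the version without a data
    offset instead of writing unlocked).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return None  # lock unavailable: skip the persist entirely
        # Measure the end position only while holding the lock, so two
        # writers can never pick the same end_pos.
        end_pos = os.fstat(fd).st_size
        os.pwrite(fd, diff, end_pos)
        return end_pos
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

The trade-off is the one the commit message states: a contended append loses its persisted diff, but the log never contains interleaved bytes.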

Data log: versions trimmed by max_versions orphan their diff bytes and
the within-lifetime log grew forever (cross-lifetime growth was already
solved by truncate-on-open, #367). appendVersion now checks past a 16MB
floor with exponential back-off; when at least half the file is
orphaned, live ranges are rewritten to the front in ascending offset
order (each buffered whole, so overlap-safe), offsets fixed in place,
and the tail truncated. Bailing midway leaves a consistent log.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions


Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---:|---:|---:|---:|---|
| codedb_bundle | 78173 | 76882 | -1.65% | -1291 | OK |
| codedb_changes | 6022 | 6485 | +7.69% | +463 | OK |
| codedb_context | 1140237 | 962989 | -15.54% | -177248 | OK |
| codedb_deps | 294 | 314 | +6.80% | +20 | OK |
| codedb_edit | 37275 | 36501 | -2.08% | -774 | OK |
| codedb_find | 5363 | 5298 | -1.21% | -65 | OK |
| codedb_hot | 15637 | 13674 | -12.55% | -1963 | OK |
| codedb_outline | 27385 | 28099 | +2.61% | +714 | OK |
| codedb_read | 22128 | 13656 | -38.29% | -8472 | OK |
| codedb_search | 29165 | 27746 | -4.87% | -1419 | OK |
| codedb_snapshot | 67387 | 69540 | +3.19% | +2153 | OK |
| codedb_status | 4788 | 5028 | +5.01% | +240 | OK |
| codedb_symbol | 43748 | 39740 | -9.16% | -4008 | OK |
| codedb_tree | 32168 | 32936 | +2.39% | +768 | OK |
| codedb_word | 8189 | 7415 | -9.45% | -774 | OK |
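One reading of the two thresholds, consistent with the rows above, is sketched below in Python. The CI script itself is not shown in this thread, so the function name `classify`, the `FAIL` label, the treatment of speed-ups as always OK, and the strict comparisons at the boundaries are all assumptions.

```python
def classify(base_ns, head_ns, pct_threshold=10.0, abs_threshold_ns=50_000):
    """Classify one benchmark row under an assumed reading of the report."""
    delta = head_ns - base_ns
    pct = 100.0 * delta / base_ns
    if delta <= 0 or pct <= pct_threshold:
        return "OK"  # faster, or slower by no more than the percentage threshold
    if delta <= abs_threshold_ns:
        return "NOISE"  # large percentage, but too few nanoseconds to fail CI
    return "FAIL"  # hypothetical label for a real regression
```

Under this reading, codedb_context (-15.54%, -177248 ns) is OK because it got faster, while a hypothetical slowdown from 294 ns to 400 ns would be NOISE: over 10%, but far below 50,000 ns.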

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 794d376fd2


Comment thread src/store.zig
}
write_pos += len;
}
try log.setLength(io, write_pos);


P2: Preserve ranges written by other processes

When two codedb processes share the same data.log, this compaction only treats offsets present in this process's in-memory self.files as live; ranges appended by another process are not in that map. Even though the write path re-stats under the file lock to account for another process appending, this setLength can truncate those other locked appends once this process reaches the compaction threshold, leaving the other process's recorded data_offsets pointing past EOF or at rewritten bytes.


@justrach justrach merged commit 8d1ae7d into release/0.2.5825 Jun 11, 2026
1 check passed
@github-actions


Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---:|---:|---:|---:|---|
| codedb_bundle | 110097 | 112868 | +2.52% | +2771 | OK |
| codedb_changes | 11287 | 11759 | +4.18% | +472 | OK |
| codedb_context | 1068081 | 1068985 | +0.08% | +904 | OK |
| codedb_deps | 338 | 357 | +5.62% | +19 | OK |
| codedb_edit | 36089 | 35279 | -2.24% | -810 | OK |
| codedb_find | 9992 | 10084 | +0.92% | +92 | OK |
| codedb_hot | 29128 | 26555 | -8.83% | -2573 | OK |
| codedb_outline | 37337 | 37407 | +0.19% | +70 | OK |
| codedb_read | 17444 | 18174 | +4.18% | +730 | OK |
| codedb_search | 28892 | 28761 | -0.45% | -131 | OK |
| codedb_snapshot | 72760 | 73307 | +0.75% | +547 | OK |
| codedb_status | 10075 | 9799 | -2.74% | -276 | OK |
| codedb_symbol | 52313 | 52889 | +1.10% | +576 | OK |
| codedb_tree | 58100 | 55857 | -3.86% | -2243 | OK |
| codedb_word | 12280 | 12613 | +2.71% | +333 | OK |
