feat(#596,#597): ContentCache byte budget; store lock hardening + data-log compaction #602
…d values refused

Entry-count capacity stays; on top of it the cache now tracks owned_bytes and enforces a byte budget (default 256MB) for owned values. A put that would exceed it runs a global second-chance sweep over owned entries until the new value fits; holes are safe because get()/putImpl() scan the full probe window (#584). Owned values above max_entry_bytes (default 8MB) are not cached at all; any stale entry for that key is dropped so get() can't serve outdated content, and search re-reads from disk on the miss. Borrowed (snapshot-adopted) values are exempt from both, and budget sweeps skip them because evicting them frees no budget.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ction

appendVersion: when the advisory lock can't be acquired, skip the diff persist instead of writing unlocked (two processes could interleave at the same end_pos). The version still records with data_offset = null.

Data log: versions trimmed by max_versions orphan their diff bytes and the within-lifetime log grew forever (cross-lifetime growth was already solved by truncate-on-open, #367). appendVersion now checks past a 16MB floor with exponential back-off; when at least half the file is orphaned, live ranges are rewritten to the front in ascending offset order (each buffered whole, so overlap-safe), offsets are fixed in place, and the tail is truncated. Bailing midway leaves a consistent log.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
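The compaction pass described in this commit can be modelled roughly as follows. This is a minimal single-process sketch in Python, not the Zig code in src/store.zig: the `Version` record, `orphaned_bytes`, `should_compact`, and `compact` are hypothetical names, the log is an in-memory `bytearray` rather than a file, live ranges are assumed not to overlap one another, and the exponential back-off between checks is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    # Hypothetical record: where this version's diff lives in the log.
    data_offset: Optional[int]  # None when the diff was never persisted
    length: int


def orphaned_bytes(log: bytearray, versions: list) -> int:
    """Bytes in the log that no surviving version points at."""
    live = sum(v.length for v in versions if v.data_offset is not None)
    return len(log) - live


def should_compact(log: bytearray, versions: list, floor: int = 16 * 1024 * 1024) -> bool:
    """Compact only past the size floor and when at least half the log is orphaned."""
    return len(log) >= floor and 2 * orphaned_bytes(log, versions) >= len(log)


def compact(log: bytearray, versions: list) -> None:
    """Slide live ranges to the front in ascending offset order, then truncate."""
    live = sorted(
        (v for v in versions if v.data_offset is not None),
        key=lambda v: v.data_offset,
    )
    write_pos = 0
    for v in live:
        # Buffer the whole range before writing, so a destination that
        # overlaps the source cannot corrupt bytes that are still unread.
        chunk = bytes(log[v.data_offset:v.data_offset + v.length])
        log[write_pos:write_pos + v.length] = chunk
        v.data_offset = write_pos  # fix the offset in place
        write_pos += v.length
    del log[write_pos:]  # drop the orphaned tail
```

Because ranges are processed in ascending order, `write_pos` never passes the start of the range being moved, so every range that has not been moved yet is still intact at its recorded offset. The model tracks only the offsets known to one process; it says nothing about ranges appended by another process sharing the same log.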
Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 794d376fd2
```zig
        }
        write_pos += len;
    }
    try log.setLength(io, write_pos);
```
Preserve ranges written by other processes
When two codedb processes share the same data.log, this compaction only treats offsets present in this process's in-memory self.files as live; ranges appended by another process are not in that map. Even though the write path re-stats under the file lock to account for another process appending, this setLength can truncate those other locked appends once this process reaches the compaction threshold, leaving the other process's recorded data_offsets pointing past EOF or at rewritten bytes.
…e-byte-budget-store-hardening
Implements both parked enhancements. Closes #596, closes #597. Independent of #601 (no shared files); branched off the release tip.
#596 — ContentCache byte budget (src/hot_cache.zig)
The cache bounded entries (4096 slots), not bytes — values are file contents, so worst-case retained bytes were unbounded. Now:
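- `owned_bytes` accounting through put/update/evict/remove/clear, surfaced in `Stats`.
- Byte budget (default 256MB) for owned values: a put that would exceed it runs a global second-chance sweep over owned entries until the new value fits. `get()`/`putImpl()` scan the full probe window, so a globally-evicted slot can't strand anything. The in-window collision eviction is untouched.
- Owned values above `max_entry_bytes` (default 8MB) are not cached at all; any stale entry for that key is dropped so `get()` can't serve outdated content (search re-reads from disk on miss).

Four new tests: budget eviction, ceiling + stale-drop, borrowed exemption, accounting through update/remove/clear.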
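The byte-budget behaviour can be modelled roughly as below. This is a simplified Python sketch, not the Zig code in src/hot_cache.zig: the class `ByteBudgetCache`, its fields other than `owned_bytes` and `max_entry_bytes`, and the `referenced` bit are hypothetical, a plain dict stands in for the open-addressed slot table and its probe window, and the assumption that new entries start with their second-chance bit set is mine.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: bytes
    owned: bool              # borrowed (snapshot-adopted) values are exempt
    referenced: bool = True  # second-chance bit; assumed set on insert and on hit


class ByteBudgetCache:
    def __init__(self, byte_budget=256 * 1024 * 1024, max_entry_bytes=8 * 1024 * 1024):
        if max_entry_bytes > byte_budget:
            raise ValueError("max_entry_bytes must not exceed byte_budget")
        self.byte_budget = byte_budget
        self.max_entry_bytes = max_entry_bytes
        self.entries = {}     # key -> Entry; dict order stands in for slot order
        self.owned_bytes = 0  # bytes held by owned entries only

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.referenced = True
        return entry.value

    def remove(self, key):
        entry = self.entries.pop(key, None)
        if entry is not None and entry.owned:
            self.owned_bytes -= len(entry.value)

    def put(self, key, value, owned=True):
        # Drop the previous entry first, so a refused put never leaves stale content.
        self.remove(key)
        if owned:
            if len(value) > self.max_entry_bytes:
                return False  # too large to cache; callers re-read from disk
            self._make_room(len(value))
            self.owned_bytes += len(value)
        self.entries[key] = Entry(value, owned)
        return True

    def _make_room(self, needed):
        # Global second-chance sweep over owned entries only: evicting a
        # borrowed entry would free no budget, so those are skipped.
        while self.owned_bytes + needed > self.byte_budget:
            candidates = [k for k, e in self.entries.items() if e.owned]
            if not candidates:
                return
            for key in candidates:
                if self.owned_bytes + needed <= self.byte_budget:
                    return
                entry = self.entries[key]
                if entry.referenced:
                    entry.referenced = False  # spare it once
                else:
                    self.remove(key)
```

The sweep terminates because each pass either clears reference bits or removes entries, so after at most two passes over the owned entries the new value fits or nothing owned is left.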
#597 — store hardening (src/store.zig)
- `appendVersion` proceeded without the advisory lock when acquisition failed, so two processes could compute the same `end_pos` and interleave writes. Now a failed lock skips the diff persist; the version still records with `data_offset = null`.
- Versions trimmed by `max_versions` orphaned their diff bytes forever. `appendVersion` now checks past a 16MB floor with exponential back-off; when at least half the file is orphaned, live ranges are rewritten to the front in ascending offset order (each buffered whole, so overlap-safe), version offsets are fixed in place, and the tail is truncated. Bailing midway leaves a consistent log. A test drives `max_versions` trimming and asserts the file shrinks, offsets are rewritten, and both surviving diffs read back intact.

Verification
`zig build test --summary all`: 799/799, exit 0 (the four cache tests are inline in hot_cache.zig and run in every binary that transitively imports it, same as the existing inline cache tests).

🤖 Generated with Claude Code