perf(#564): defer the symbol index on snapshot fast-load (−33% load time, −43MB footprint on openclaw)#565

Merged: justrach merged 1 commit into release/0.2.5825 from perf/load-pass-c on Jun 10, 2026

@justrach (Owner)

Fixes #564.

What

Snapshot fast-load (Pass C) eagerly built the global symbol index for every restored file. Plain content search never reads it, so every one-shot CLI query paid the inserts and their heap for nothing. This PR defers it using the codebase's existing lazy pattern (word index, #539):

  • markSymbolIndexIncomplete() before Pass C; rebuildSymbolIndexFor no-ops while deferred
  • ensureSymbolIndex() builds from outlines on first use, hooked at every reader entry: findSymbol, findAllSymbols, searchSymbols, renderSymbols, resolveCallees, findCallPath, buildCallCentrality
  • ensureCallGraph refuses to build from a deferred index, so an empty call graph can never be cached. Ranking paths skip the centrality boost until the index exists; the snapshot-restored centrality section is unaffected.
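
The deferral pattern in the list above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions, not the code in this PR: the `Explorer` and `Entry` types, the fixed 16-entry buffer, the `deferred` flag, and the sample symbol names are all hypothetical; only the function names are borrowed from the description. It assumes Zig 0.11 or newer (indexed `for` syntax).

```zig
const std = @import("std");

// Hypothetical, simplified model: none of these types are codedb's own.
const Entry = struct { name: []const u8, file: usize };

const Explorer = struct {
    // Per-file outlines restored from the snapshot (symbol names only).
    outlines: []const []const []const u8,
    // Stand-in for the global symbol index: a small fixed buffer (max 16 symbols).
    entries: [16]Entry = undefined,
    count: usize = 0,
    deferred: bool = false,

    fn markSymbolIndexIncomplete(self: *Explorer) void {
        self.count = 0;
        self.deferred = true;
    }

    // Per-file rebuild used by the load pass; a no-op while deferred.
    fn rebuildSymbolIndexFor(self: *Explorer, file: usize) void {
        if (self.deferred) return;
        for (self.outlines[file]) |name| {
            self.entries[self.count] = .{ .name = name, .file = file };
            self.count += 1;
        }
    }

    // Builds the whole index from outlines the first time a reader needs it.
    fn ensureSymbolIndex(self: *Explorer) void {
        if (!self.deferred) return;
        self.deferred = false;
        for (self.outlines, 0..) |_, file| self.rebuildSymbolIndexFor(file);
    }

    // Reader entry point: ensure first, then query.
    fn findSymbol(self: *Explorer, name: []const u8) ?usize {
        self.ensureSymbolIndex();
        for (self.entries[0..self.count]) |e| {
            if (std.mem.eql(u8, e.name, name)) return e.file;
        }
        return null;
    }

    // Reports whether a call graph may be built; false while the index is deferred,
    // so a graph derived from an empty index is never produced or cached.
    fn ensureCallGraph(self: *Explorer) bool {
        return !self.deferred;
    }
};

pub fn main() void {
    const file0 = [_][]const u8{ "GatewayClient", "connect" };
    const file1 = [_][]const u8{"parse"};
    const outlines = [_][]const []const u8{ &file0, &file1 };

    var ex = Explorer{ .outlines = &outlines };
    ex.markSymbolIndexIncomplete();
    // Fast-load pass: every per-file rebuild is skipped.
    for (outlines, 0..) |_, file| ex.rebuildSymbolIndexFor(file);
    std.debug.assert(ex.count == 0);
    std.debug.assert(!ex.ensureCallGraph());

    // The first reader call triggers the one-time build.
    const hit = ex.findSymbol("GatewayClient");
    std.debug.assert(hit != null and hit.? == 0);
    std.debug.assert(ex.count == 3);
    std.debug.assert(ex.ensureCallGraph());
}
```

In this model the load pass pays nothing for symbols, and the cost moves to the first reader that calls ensureSymbolIndex, which is the trade the PR describes.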

Also extends the gated CODEDB_LOAD_PROFILE profiler with insert sub-phases (content/deps/symidx/store) and per-phase maxrss attribution. This instrumentation is what surfaced the problem.

Measured (openclaw, 13,654 files, ReleaseFast, warm cache, 3 runs)

| metric | before | after |
| --- | --- | --- |
| snapshot load | ~60ms | ~40ms |
| symidx during load | 18–20ms | 0.2ms |
| Pass C heap growth | +62.5MB | +20.5MB |
| one-shot search max RSS | 244MB | 200MB |
| one-shot search phys footprint | 132.7MB | 89.2MB (−33%) |
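
As a quick check, the headline figures in the title follow from the measured values (the load times are approximate, so the first ratio is approximate as well):

$$
\frac{60 - 40}{60} \approx 33\%, \qquad
\frac{132.7 - 89.2}{132.7} = \frac{43.5}{132.7} \approx 33\%, \qquad
62.5 - 20.5 = 42\ \text{MB}, \qquad
244 - 200 = 44\ \text{MB}
$$

The −43MB in the title corresponds to the 43.5MB drop in physical footprint.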

The first symbol or caller query pays a one-time ensure cost of ~18ms. Verified live: codedb symbol GatewayClient returns the correct result on the lazy path.

Tests

  • New issue-564 test: symbol_index.count() == 0 after fast-load, then first findAllSymbols builds on demand and answers correctly (fails red on release/0.2.5825)
  • issue-537b (call edges after restore) now exercises the resolveCallees ensure-hook — caught a missing hook during development
  • Full suite green (731 tests)

🤖 Generated with Claude Code

… maxrss load profiling

Pass C eagerly rebuilt the global symbol index for every restored file
even though plain content search never reads it. Mark it incomplete
before Pass C, no-op per-file rebuilds while deferred, and build it
from outlines on first use via ensureSymbolIndex (mirrors the lazy
word-index rebuild) at every reader entry: findSymbol, findAllSymbols,
searchSymbols, renderSymbols, resolveCallees, findCallPath,
buildCallCentrality. ensureCallGraph refuses to build from a deferred
index so an empty call graph is never cached.

Also extends the gated CODEDB_LOAD_PROFILE with insert sub-phases
(content/deps/symidx/store) and per-phase maxrss attribution, which is
how this was found.

openclaw (13,654 files, ReleaseFast, warm): load 60ms -> 40ms; Pass C
heap +62.5MB -> +20.5MB; one-shot search phys footprint 132.7MB ->
89.2MB (-33%), max RSS 244MB -> 200MB.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions


Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | --- | --- | --- | --- | --- |
| codedb_bundle | 71965 | 69617 | -3.26% | -2348 | OK |
| codedb_changes | 6526 | 6134 | -6.01% | -392 | OK |
| codedb_context | 1105029 | 1099856 | -0.47% | -5173 | OK |
| codedb_deps | 271 | 283 | +4.43% | +12 | OK |
| codedb_edit | 39797 | 38786 | -2.54% | -1011 | OK |
| codedb_find | 5310 | 4826 | -9.11% | -484 | OK |
| codedb_hot | 14478 | 13103 | -9.50% | -1375 | OK |
| codedb_outline | 26771 | 26794 | +0.09% | +23 | OK |
| codedb_read | 12610 | 12282 | -2.60% | -328 | OK |
| codedb_search | 24841 | 25037 | +0.79% | +196 | OK |
| codedb_snapshot | 67689 | 69121 | +2.12% | +1432 | OK |
| codedb_status | 5051 | 4935 | -2.30% | -116 | OK |
| codedb_symbol | 36491 | 38213 | +4.72% | +1722 | OK |
| codedb_tree | 27467 | 13735 | -49.99% | -13732 | OK |
| codedb_word | 7481 | 7082 | -5.33% | -399 | OK |
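
The status column can be read as a two-threshold rule. The sketch below is a guess at that rule, not the CI script itself: the `classify` function and `Status` enum are hypothetical, the thresholds are treated as strict, and it assumes only slowdowns are flagged (consistent with codedb_tree being reported OK at -49.99%).

```zig
const std = @import("std");

const Status = enum { ok, noise, regression };

// Thresholds quoted in the report.
const pct_threshold: i64 = 10; // percent
const abs_threshold_ns: i64 = 50_000;

// Hypothetical classifier; integer math avoids float conversions.
fn classify(base_ns: i64, head_ns: i64) Status {
    const delta = head_ns - base_ns;
    // Assumption: improvements and unchanged timings are never flagged.
    if (delta <= 0 or base_ns <= 0) return .ok;
    // delta / base > pct_threshold / 100, rearranged to stay in integers.
    const exceeds_pct = delta * 100 > base_ns * pct_threshold;
    if (!exceeds_pct) return .ok;
    // Percentage exceeded but the absolute change is too small to fail CI.
    if (delta < abs_threshold_ns) return .noise;
    return .regression;
}

pub fn main() void {
    // Rows taken from the report.
    std.debug.assert(classify(27467, 13735) == .ok); // codedb_tree, faster
    std.debug.assert(classify(36491, 38213) == .ok); // codedb_symbol, +4.72%
    // Hypothetical rows, not from the report.
    std.debug.assert(classify(271, 310) == .noise);
    std.debug.assert(classify(1_000_000, 1_200_000) == .regression);
}
```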

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 084c4ff27d

Comment thread src/snapshot.zig
// no-ops and ensureSymbolIndex builds it from outlines on first
// symbol/caller/callpath use. Plain search never needs it, so one-shot
// CLI queries skip the inserts and their heap entirely.
explorer.markSymbolIndexIncomplete();

[P2] Populate symbols before serving JSON snapshots

When a project is fast-loaded from a snapshot, this leaves explorer.symbol_index empty until a symbol/caller API happens to call ensureSymbolIndex(). The MCP/HTTP snapshot path still calls snapshot_json.buildSnapshot, which iterates explorer.symbol_index directly under a shared lock without ensuring it first, so codedb_snapshot/GET /snapshot responses for restored projects now contain an empty symbol_index and can cache that incomplete JSON until the seq changes. Please ensure the symbol index before producing the JSON snapshot, or keep that exporter outline-based.
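
The hazard can be reproduced in a toy model. Everything here is hypothetical (the `Explorer` fields and `exportSymbolCount` are stand-ins, not snapshot_json.buildSnapshot): an exporter that reads the index through a read-only view cannot trigger the lazy build itself, so the ensure has to happen before that view is taken.

```zig
const std = @import("std");

// Hypothetical model of the hazard; not codedb's actual types.
const Explorer = struct {
    outline_symbols: usize, // symbols recoverable from outlines
    indexed_symbols: usize = 0, // symbols currently in the index
    deferred: bool = true, // state right after a snapshot fast-load

    fn ensureSymbolIndex(self: *Explorer) void {
        if (!self.deferred) return;
        self.indexed_symbols = self.outline_symbols;
        self.deferred = false;
    }
};

// Exporter that reads the index directly through a read-only pointer,
// as it would while holding only a shared lock.
fn exportSymbolCount(ex: *const Explorer) usize {
    return ex.indexed_symbols;
}

pub fn main() void {
    var ex = Explorer{ .outline_symbols = 3 };
    // Without an ensure, the export sees an empty index.
    std.debug.assert(exportSymbolCount(&ex) == 0);
    // Ensuring before taking the read-only view fixes the export.
    ex.ensureSymbolIndex();
    std.debug.assert(exportSymbolCount(&ex) == 3);
}
```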
