perf(#564): defer the symbol index on snapshot fast-load (−33% load time, −43MB footprint on openclaw) #565
… maxrss load profiling

Pass C eagerly rebuilt the global symbol index for every restored file even though plain content search never reads it. Mark it incomplete before Pass C, no-op per-file rebuilds while deferred, and build it from outlines on first use via ensureSymbolIndex (mirrors the lazy word-index rebuild) at every reader entry: findSymbol, findAllSymbols, searchSymbols, renderSymbols, resolveCallees, findCallPath, buildCallCentrality. ensureCallGraph refuses to build from a deferred index so an empty call graph is never cached.

Also extends the gated CODEDB_LOAD_PROFILE with insert sub-phases (content/deps/symidx/store) and per-phase maxrss attribution, which is how this was found.

openclaw (13,654 files, ReleaseFast, warm): load 60ms -> 40ms; Pass C heap +62.5MB -> +20.5MB; one-shot search phys footprint 132.7MB -> 89.2MB (-33%), max RSS 244MB -> 200MB.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
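The deferral described in this commit message can be modeled in a few lines. The following is a minimal sketch in Python, not the project's code: the snake_case names mirror the identifiers above, while the `Explorer` fields, `restore_file`, and the file paths are hypothetical stand-ins chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Explorer:
    """Toy model of a deferred symbol index (field names are assumptions)."""

    # path -> symbol names in that file (stands in for per-file outlines)
    outlines: dict[str, list[str]] = field(default_factory=dict)
    # symbol name -> paths defining it (stands in for the global symbol index)
    symbol_index: dict[str, list[str]] = field(default_factory=dict)
    symbol_index_complete: bool = True

    def mark_symbol_index_incomplete(self) -> None:
        """Defer the index: later per-file rebuilds become no-ops."""
        self.symbol_index.clear()
        self.symbol_index_complete = False

    def rebuild_symbol_index_for(self, path: str) -> None:
        """Index one file's symbols, unless the index is deferred."""
        if not self.symbol_index_complete:
            return  # deferred: ensure_symbol_index covers this file later
        for name in self.outlines.get(path, []):
            paths = self.symbol_index.setdefault(name, [])
            if path not in paths:
                paths.append(path)

    def restore_file(self, path: str, symbols: list[str]) -> None:
        """Hypothetical restore step: store the outline, then try to index."""
        self.outlines[path] = symbols
        self.rebuild_symbol_index_for(path)

    def ensure_symbol_index(self) -> None:
        """Build the whole index from outlines on first use."""
        if self.symbol_index_complete:
            return
        self.symbol_index_complete = True  # set first so rebuilds are live
        for path in self.outlines:
            self.rebuild_symbol_index_for(path)

    def find_all_symbols(self, name: str) -> list[str]:
        """Reader entry point: always ensure before reading."""
        self.ensure_symbol_index()
        return self.symbol_index.get(name, [])


explorer = Explorer()
explorer.mark_symbol_index_incomplete()            # before the bulk restore
explorer.restore_file("src/gateway.ts", ["GatewayClient", "connect"])
explorer.restore_file("src/main.ts", ["main"])
deferred_count = len(explorer.symbol_index)        # nothing indexed yet
hits = explorer.find_all_symbols("GatewayClient")  # first use builds the index
```

A caller that only does plain content search never reaches `ensure_symbol_index`, so in this model it never pays for the index at all.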
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 084c4ff27d
// no-ops and ensureSymbolIndex builds it from outlines on first
// symbol/caller/callpath use. Plain search never needs it, so one-shot
// CLI queries skip the inserts and their heap entirely.
explorer.markSymbolIndexIncomplete();
Populate symbols before serving JSON snapshots
When a project is fast-loaded from a snapshot, this leaves explorer.symbol_index empty until a symbol/caller API happens to call ensureSymbolIndex(). The MCP/HTTP snapshot path still calls snapshot_json.buildSnapshot, which iterates explorer.symbol_index directly under a shared lock without ensuring it first, so codedb_snapshot/GET /snapshot responses for restored projects now contain an empty symbol_index and can cache that incomplete JSON until the seq changes. Please ensure the symbol index before producing the JSON snapshot, or keep that exporter outline-based.
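The failure mode in this comment, and the first of the two suggested fixes, can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the project's exporter: `build_snapshot_unsafe`, `build_snapshot`, the `Explorer` fields, and the file path are hypothetical, and locking is left out entirely.

```python
import json


class Explorer:
    def __init__(self, outlines):
        self.outlines = outlines            # path -> list of symbol names
        self.symbol_index = {}              # name -> list of paths, built lazily
        self.symbol_index_complete = False  # state right after a fast-load

    def ensure_symbol_index(self):
        """Build the index from outlines once; later calls are no-ops."""
        if self.symbol_index_complete:
            return
        for path, names in self.outlines.items():
            for name in names:
                self.symbol_index.setdefault(name, []).append(path)
        self.symbol_index_complete = True


def build_snapshot_unsafe(explorer):
    """Reads the index directly, so it exports nothing after a fast-load."""
    return json.dumps({"symbol_index": explorer.symbol_index}, sort_keys=True)


def build_snapshot(explorer):
    """Ensures the index before exporting it."""
    explorer.ensure_symbol_index()
    return json.dumps({"symbol_index": explorer.symbol_index}, sort_keys=True)


restored = Explorer({"src/gateway.ts": ["GatewayClient"]})
stale = build_snapshot_unsafe(restored)  # empty symbol_index in the JSON
fresh = build_snapshot(restored)         # populated after the ensure step
```

Because the ensure step mutates the index, a real exporter that holds a shared lock would presumably need to ensure before taking that lock, or take an exclusive one; the sketch does not model that.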
Fixes #564.
What
Snapshot fast-load (Pass C) eagerly built the global symbol index for every restored file. Plain content search never reads it, so every one-shot CLI query paid the inserts and their heap for nothing. This PR defers it using the codebase's existing lazy pattern (word index, #539):
- markSymbolIndexIncomplete() before Pass C; rebuildSymbolIndexFor no-ops while deferred
- ensureSymbolIndex() builds from outlines on first use, hooked at every reader entry: findSymbol, findAllSymbols, searchSymbols, renderSymbols, resolveCallees, findCallPath, buildCallCentrality
- ensureCallGraph refuses to build from a deferred index, so an empty call graph can never be cached (ranking paths simply skip the centrality boost until the index exists; the snapshot-restored centrality section is unaffected)

Also extends the gated CODEDB_LOAD_PROFILE profiler with insert sub-phases (content/deps/symidx/store) and per-phase maxrss attribution, the instrumentation that found this.

Measured (openclaw, 13,654 files, ReleaseFast, warm cache, 3 runs)
First symbol/caller use pays a one-time ~18ms ensure; verified live: codedb symbol GatewayClient is correct on the lazy path.

Tests
- issue-564 test: symbol_index.count() == 0 after fast-load, then the first findAllSymbols builds on demand and answers correctly (fails red on release/0.2.5825)
- issue-537b (call edges after restore) now exercises the resolveCallees ensure-hook; it caught a missing hook during development

🤖 Generated with Claude Code
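The ensureCallGraph guard from the description can be sketched the same way. This Python model is illustrative only: it assumes centrality is a simple caller count and uses hypothetical fields (`call_edges`, `call_graph`) and a hypothetical `rank` method to stand in for the ranking paths.

```python
class Explorer:
    def __init__(self):
        self.symbol_index_complete = False  # deferred after fast-load
        self.call_edges = {}                # caller -> callees, filled by the index build
        self.call_graph = None              # cached centrality, None until built

    def ensure_symbol_index(self):
        if not self.symbol_index_complete:
            # Stand-in for the real build from outlines.
            self.call_edges = {"main": ["connect"], "retry": ["connect"]}
            self.symbol_index_complete = True

    def ensure_call_graph(self):
        """Build and cache the call graph; refuse while the index is deferred."""
        if self.call_graph is not None:
            return True
        if not self.symbol_index_complete:
            return False                    # never cache a graph built from nothing
        counts = {}
        for callees in self.call_edges.values():
            for callee in callees:
                counts[callee] = counts.get(callee, 0) + 1
        self.call_graph = counts
        return True

    def rank(self, name, base_score):
        """Ranking path: skip the centrality boost if no graph is available."""
        if not self.ensure_call_graph():
            return base_score
        return base_score + self.call_graph.get(name, 0)


explorer = Explorer()
before = explorer.rank("connect", 1.0)       # index deferred: no boost
cached_while_deferred = explorer.call_graph  # still None, nothing was cached
explorer.ensure_symbol_index()               # e.g. the first symbol query
after = explorer.rank("connect", 1.0)        # boosted by its two callers
```

The point of returning early without touching the cache is that a later call, made after the index exists, still gets a chance to build the real graph.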