
fix: switch scanJsonlFile and parseSessionFile to readSessionLines to prevent OOM #132

Merged
iamtoruk merged 1 commit into getagentseal:main from maucher:fix/streaming-oom-readSessionLines
Apr 22, 2026

Conversation

Contributor

@maucher maucher commented Apr 22, 2026

Fixes #131

Problem

readViaStream (the code path for files ≥ 8 MB introduced in #67) reassembles the entire file into a single string via chunks.join('\n'), giving the same peak allocation as a plain readFile. Callers then do content.split('\n'), creating a second full copy. With FILE_READ_CONCURRENCY = 16 and files up to 128 MB, theoretical peak heap usage is ~6 GB — right where the crash lands:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
...
v8::internal::JsonParser<unsigned short>::ParseJson
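
The allocation pattern described above can be sketched as follows. This is a hypothetical simplification, not the actual readViaStream or scanJsonlFile source; the function names and the shape of the caller are illustrative:

```typescript
// Hypothetical sketch of the pattern described above, not the project's code.
// Chunks accumulate, then get joined into one string, then split again --
// three full-file-sized allocations live at once at peak.
async function readViaStreamSketch(
  stream: AsyncIterable<string>
): Promise<string> {
  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk); // the whole file accumulates here
  }
  return chunks.join('\n'); // reassembles the full file in memory
}

async function scanSketch(stream: AsyncIterable<string>): Promise<number> {
  const content = await readViaStreamSketch(stream);
  const lines = content.split('\n'); // second full-size copy
  return lines.length;
}
```

With 16 such callers running concurrently on files up to 128 MB, the peak adds up roughly as 16 × 128 MB × ~3 copies, which is the ~6 GB figure above.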

Fix

readSessionLines already exists in src/fs-utils.ts as a proper async generator that yields one line at a time and never holds the full file in memory. This PR switches the two hot-path callers to iterate it directly:

  • scanJsonlFile in src/optimize.ts
  • parseSessionFile in src/parser.ts

No concurrency change needed — with true line-by-line streaming, 16 concurrent files each hold only one line at a time.
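
For illustration, a minimal line-at-a-time async generator in the spirit of readSessionLines might look like this. The name `linesOf` and the signature are placeholders; the real implementation lives in src/fs-utils.ts:

```typescript
// Illustrative sketch, not the actual readSessionLines implementation:
// only the current partial line is buffered; each complete line is yielded
// as soon as its newline arrives.
async function* linesOf(
  chunks: AsyncIterable<string>
): AsyncGenerator<string> {
  let buffer = '';
  for await (const chunk of chunks) {
    buffer += chunk;
    let idx: number;
    while ((idx = buffer.indexOf('\n')) !== -1) {
      yield buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
    }
  }
  if (buffer.length > 0) yield buffer; // trailing line without a newline
}
```

A caller then replaces `for (const line of content.split('\n'))` with `for await (const line of ...)`, so peak memory per file is bounded by the longest single line rather than the file size.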

Tests

Two new tests added to tests/optimize-fs.test.ts:

  • Spy test — confirms readSessionLines is called and readSessionFile is not called by scanJsonlFile
  • 500-entry correctness test — verifies all entries in a large multi-line file are processed without truncation
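
The idea behind the second test can be sketched framework-free. This is a simplified stand-in built on Node's readline, not the actual test from tests/optimize-fs.test.ts:

```typescript
import { Readable } from 'node:stream';
import * as readline from 'node:readline';

// Simplified stand-in for the 500-entry correctness test: stream JSONL
// entries line by line and verify none are dropped or reordered.
async function countStreamedEntries(total: number): Promise<number> {
  const entries = Array.from({ length: total }, (_, i) =>
    JSON.stringify({ id: i, payload: 'x'.repeat(32) })
  );
  const rl = readline.createInterface({
    input: Readable.from([entries.join('\n')]),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let count = 0;
  for await (const line of rl) {
    const entry = JSON.parse(line);
    if (entry.id !== count) throw new Error(`entry ${count} out of order`);
    count++;
  }
  return count;
}
```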

Checklist

  • npm run build passes
  • npm test passes (323 tests, 0 failures)
  • No Claude/Anthropic co-author trailers in commits

… prevent OOM

readViaStream (used for files ≥8 MB) reconstructs the full file as a
single string via chunks.join('\n'), giving the same peak allocation as
readFile. Callers then call content.split('\n'), creating a second copy.
With FILE_READ_CONCURRENCY=16 and files up to 128 MB this can exhaust
the V8 heap (~6 GB theoretical peak).

readSessionLines already exists as a proper async generator that yields
one line at a time. Switch both hot-path callers to iterate it directly
so the full file string is never held in memory.

Adds two tests: a spy test confirming readSessionLines is called (not
readSessionFile), and a 500-entry correctness test.

Fixes getagentseal#131
@maucher maucher marked this pull request as ready for review April 22, 2026 10:15
@iamtoruk iamtoruk merged commit bc54f85 into getagentseal:main Apr 22, 2026
2 of 3 checks passed
webrulon pushed a commit to webrulon/codeburn that referenced this pull request Apr 22, 2026
OOM streaming fix (getagentseal#132), compact menubar mode (getagentseal#133), keychain
credential fix + App Nap hardening (getagentseal#134).

Development

Successfully merging this pull request may close these issues.

OOM crash in scanJsonlFile / parseSessionFile: readViaStream loads entire file into memory despite using streams

2 participants