feat(xml): add parseXmlRecords #7111

Open

tomas-zijdemans wants to merge 6 commits into denoland:main from tomas-zijdemans:record-stream

Conversation

@tomas-zijdemans (Contributor) commented Apr 25, 2026

Adds parseXmlRecords: an async generator that adapts SAX-style XML event callbacks into a typed AsyncGenerator<T> of records. Mirrors the function shape of its sibling parseXmlStream, but yields records as they're parsed so callers can iterate large feeds with per-record backpressure without buffering the whole document.

```ts
for await (const item of parseXmlRecords<Item>(source, (emit) => ({
  onEndElement(name) { if (name === "item") emit(buildItem()); },
  // ...build state in the other callbacks
}))) {
  // process item
}
```

For pipeThrough composition, wrap with ReadableStream.from(parseXmlRecords(...)).

@github-actions github-actions Bot added the xml label Apr 25, 2026
codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 94.62%. Comparing base (a496da2) to head (3593ea3).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| xml/parse_records.ts | 94.44% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main    #7111   +/-   ##
=======================================
  Coverage   94.61%   94.62%
=======================================
  Files         634      636    +2
  Lines       51801    51827   +26
  Branches     9329     9334    +5
=======================================
+ Hits        49011    49039   +28
+ Misses       2216     2214    -2
  Partials      574      574
```


@bartlomieju (Member) left a comment

The implementation is correct and well-tested — SAX init mirrors parseXmlStream, TransformStream auto-handles errors in transform/flush, and the start try/catch guards the factory. A few concerns:

Thin wrapper over parseXmlStream

The user still writes the exact same SAX callback boilerplate (tracking insideItem, accumulating text, emitting on end element). The only difference is emit(record) vs items.push(record). The real value is stream composability (pipeThrough), but users can already achieve this with a small wrapper around parseXmlStream:

```ts
new ReadableStream<Item>({
  async start(controller) {
    await parseXmlStream(source, {
      // ...accumulate `item` state in the other callbacks
      onEndElement(name) { if (name === "item") controller.enqueue(item); },
    });
    controller.close();
  },
});
```

Is the convenience of XmlRecordStream enough to justify a new public API surface? The createCallbacks factory pattern is also unusual — most TransformStream subclasses in std take declarative options. This one takes a callback factory that returns callbacks, so the user is essentially writing a SAX handler with an extra indirection layer rather than getting a higher-level abstraction.

Duplicated initialization logic

Lines 93–106 of record_stream.ts duplicate the option extraction and XmlTokenizer/XmlEventParser construction from parseXmlStream (lines 67–71 of parse_stream.ts). If defaults change in one place, they'd need to change in both.

Backpressure test may be flaky

The test "pauses upstream pulls while downstream is blocked" uses setTimeout with 5ms and 30ms delays to detect backpressure behavior. This is timing-sensitive and may flake on slow CI runners.

Chunk-level backpressure limitation

As noted in the JSDoc, all records from a single input chunk are enqueued synchronously. If one chunk produces many records, they all buffer at once. This is inherent to the synchronous SAX model but worth a note in user-facing docs since "stream" implies per-record backpressure.
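A toy transform illustrates the limitation (hypothetical stand-in, no real XML: `recordSplitter` plays the role of the SAX-driven parser). `controller.enqueue` never waits, so every record parsed from one chunk lands in the readable queue within a single `transform` call:

```typescript
// Hypothetical stand-in for the SAX-driven parser: each input chunk
// yields several records, all enqueued synchronously.
const recordSplitter = new TransformStream<string, string>({
  transform(chunk, controller) {
    for (const record of chunk.split(";")) {
      controller.enqueue(record); // no await possible between records
    }
  },
});

const source = new ReadableStream<string>({
  start(controller) {
    controller.enqueue("a;b;c"); // one chunk, three records
    controller.enqueue("d");
    controller.close();
  },
});

const seen: string[] = [];
await source
  .pipeThrough(recordSplitter)
  .pipeTo(new WritableStream({ write(r) { seen.push(r); } }));
// "a", "b", and "c" were all buffered before the writer consumed any.
```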

@tomas-zijdemans (Contributor, Author) commented

Thin wrapper over parseXmlStream / unusual factory pattern

Yep. You're right that the class shape was the wrong abstraction here, and the createCallbacks factory only existed to thread emit through a class constructor. I've replaced XmlRecordStream with parseXmlRecords, an async generator that mirrors parseXmlStream's shape (function, not class). This gives a substantially smaller surface.

For users who want pipeThrough composition, it's a one-liner: ReadableStream.from(parseXmlRecords(...)).

The SAX boilerplate is unchanged; that's fundamental to event-driven parsing. The goal here isn't to abstract that away but to provide a small, idiomatic adapter that turns a stream of XML chunks into a stream of records with per-record backpressure.

Duplicated initialization logic

Good catch. Extracted to _pipeline.ts (createXmlPipeline) and used by both parseXmlStream and parseXmlRecords.

Backpressure test may be flaky

Agreed. The test no longer uses fixed setTimeout delays.

Chunk-level backpressure limitation

The async generator fixes the consumer-side concern: records are yielded one-by-one, so a slow consumer applies backpressure per record. JSDoc on parseXmlRecords now states this explicitly so callers know to size input chunks accordingly.
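The shape of that generator can be sketched minimally; `toyParseRecords` is a hypothetical illustration that swaps the real SAX pipeline for a synchronous split-based parser:

```typescript
// Sketch of the async-generator shape: parse each chunk synchronously,
// then yield the resulting records one at a time. The generator suspends
// between yields, which is what gives per-record consumer backpressure.
async function* toyParseRecords<T>(
  source: AsyncIterable<string>,
  parseChunk: (chunk: string, emit: (record: T) => void) => void,
): AsyncGenerator<T> {
  for await (const chunk of source) {
    const pending: T[] = [];
    parseChunk(chunk, (record) => pending.push(record));
    yield* pending; // a slow consumer stalls the loop here
  }
}

async function* chunks() {
  yield "a;b";
  yield "c";
}

const records: string[] = [];
for await (
  const record of toyParseRecords<string>(chunks(), (chunk, emit) => {
    for (const part of chunk.split(";")) emit(part);
  })
) {
  records.push(record);
}
```

Note the input-side caveat still applies: all records from one chunk are collected before the first yield, so the per-record guarantee is on the consumer side only.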

@tomas-zijdemans changed the title from feat(xml): add XmlRecordStream to feat(xml): add parseXmlRecords on Apr 28, 2026