feat(xml): add parseXmlRecords #7111
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##             main    #7111    +/-  ##
========================================
  Coverage   94.61%   94.62%
========================================
  Files         634      636       +2
  Lines       51801    51827      +26
  Branches     9329     9334       +5
========================================
+ Hits        49011    49039      +28
+ Misses       2216     2214       -2
  Partials      574      574
```
bartlomieju left a comment:
The implementation is correct and well-tested — SAX init mirrors parseXmlStream, TransformStream auto-handles errors in transform/flush, and the start try/catch guards the factory. A few concerns:
Thin wrapper over parseXmlStream
The user still writes the exact same SAX callback boilerplate (tracking insideItem, accumulating text, emitting on end element). The only difference is emit(record) vs items.push(record). The real value is stream composability (pipeThrough), but users can already achieve this with a small wrapper around parseXmlStream:
```typescript
new ReadableStream({
  async start(controller) {
    let item: unknown; // accumulated by onStartElement/onText callbacks (elided)
    await parseXmlStream(source, {
      onEndElement(name) {
        if (name === "item") controller.enqueue(item);
      },
    });
    controller.close();
  },
});
```

Is the convenience of XmlRecordStream enough to justify a new public API surface? The createCallbacks factory pattern is also unusual — most TransformStream subclasses in std take declarative options. This one takes a callback factory that returns callbacks, so the user is essentially writing a SAX handler with an extra indirection layer rather than getting a higher-level abstraction.
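To make the "declarative options" point concrete, here is a toy sketch of that shape (all names here are hypothetical illustrations, not the PR's API): a TransformStream subclass that is configured with a plain options object, so the class owns the event wiring instead of accepting a callback factory.

```typescript
// Toy illustration of the declarative-options TransformStream shape.
// Hypothetical names; the real std classes take analogous plain options.
interface RecordSplitterOptions {
  delimiter: string; // record boundary in the incoming text
}

class RecordSplitterStream extends TransformStream<string, string> {
  constructor({ delimiter }: RecordSplitterOptions) {
    let buffer = "";
    super({
      transform(chunk, controller) {
        buffer += chunk;
        const parts = buffer.split(delimiter);
        buffer = parts.pop() ?? ""; // keep the trailing partial record
        for (const part of parts) controller.enqueue(part);
      },
      flush(controller) {
        if (buffer) controller.enqueue(buffer); // emit the final record
      },
    });
  }
}
```

The caller states *what* a record is (`{ delimiter: "|" }`) rather than supplying *how* to detect one, which is the inversion the review is asking about.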
Duplicated initialization logic
Lines 93–106 of record_stream.ts duplicate the option extraction and XmlTokenizer/XmlEventParser construction from parseXmlStream (lines 67–71 of parse_stream.ts). If defaults change in one place, they'd need to change in both.
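One way to resolve the duplication is to hoist the shared setup into a single helper that both entry points call. This is a hypothetical sketch only: `initParserPipeline` and the `ParseOptions` shape are invented for illustration, and `XmlTokenizer`/`XmlEventParser` are stubbed here so the sketch is self-contained.

```typescript
// Hypothetical sketch: one helper owns option defaults and pipeline wiring,
// so parseXmlStream and the record API cannot drift apart.
interface ParseOptions {
  trimText?: boolean; // invented option for illustration
}

// Stub stand-ins for the real tokenizer/parser classes.
class XmlTokenizer {
  constructor(readonly opts: Required<ParseOptions>) {}
}
class XmlEventParser {
  constructor(readonly tokenizer: XmlTokenizer) {}
}

function initParserPipeline(options: ParseOptions = {}) {
  // Defaults are resolved in exactly one place.
  const resolved: Required<ParseOptions> = {
    trimText: options.trimText ?? true,
  };
  const tokenizer = new XmlTokenizer(resolved);
  return { tokenizer, parser: new XmlEventParser(tokenizer) };
}
```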
Backpressure test may be flaky
The test "pauses upstream pulls while downstream is blocked" uses setTimeout with 5ms and 30ms delays to detect backpressure behavior. This is timing-sensitive and may flake on slow CI runners.
Chunk-level backpressure limitation
As noted in the JSDoc, all records from a single input chunk are enqueued synchronously. If one chunk produces many records, they all buffer at once. This is inherent to the synchronous SAX model but worth a note in user-facing docs since "stream" implies per-record backpressure.
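The limitation is easy to demonstrate with a generic TransformStream (this is a minimal sketch of the general behavior, not the PR's code): a synchronous transform that expands one chunk into several records enqueues all of them before returning, so a `highWaterMark` of 1 on the readable side cannot stop records from the same chunk piling up in the queue.

```typescript
// One synchronous transform() call can enqueue many records at once;
// the readable-side highWaterMark of 1 only throttles *future* chunks,
// not records already produced from the current chunk.
const expand = new TransformStream<string, string>(
  {
    transform(chunk, controller) {
      // A synchronous SAX-style parser emits every record it finds
      // in this chunk before returning.
      for (const record of chunk.split(",")) controller.enqueue(record);
    },
  },
  undefined,
  { highWaterMark: 1 },
);
```

Writing a single chunk `"a,b,c,d"` leaves four records buffered in the readable queue, which is exactly the chunk-level granularity the JSDoc warns about.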
Yep. You're right that the class shape was the wrong abstraction here. The SAX boilerplate is unchanged; that's fundamental to event-driven parsing. But the goal here isn't to abstract that away, it's to give a small idiomatic adapter that turns "stream of XML chunks" into "stream of records" with per-record backpressure.
Good catch. Extracted to
Agreed. The test no longer uses fixed setTimeout delays.
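One deterministic pattern for this kind of test (a sketch, not the PR's actual test code) is to count upstream pulls directly instead of racing timers: with a `highWaterMark` of 0, the source is only pulled when the consumer actually reads, so the test can assert an exact pull count at each step.

```typescript
// Deterministic backpressure probe: counts pulls instead of using timers.
function countingSource(chunks: string[]) {
  let pulls = 0;
  const stream = new ReadableStream<string>(
    {
      pull(controller) {
        pulls++;
        if (chunks.length === 0) controller.close();
        else controller.enqueue(chunks.shift()!);
      },
    },
    { highWaterMark: 0 }, // pull only when the consumer reads
  );
  return { stream, pullCount: () => pulls };
}
```

Because every pull is driven by an explicit `read()`, the assertions cannot flake on a slow CI runner.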
The async generator fixes the consumer-side concern: records are yielded one-by-one, so a slow consumer applies backpressure per record.
parseXmlRecords
Adds `parseXmlRecords`: an async generator that adapts SAX-style XML event callbacks into a typed `AsyncGenerator<T>` of records. Mirrors the function shape of its sibling `parseXmlStream`, but yields records as they're parsed so callers can iterate large feeds with per-record backpressure without buffering the whole document.

For pipeThrough composition, wrap with `ReadableStream.from(parseXmlRecords(...))`.
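A consumption sketch under stated assumptions: `parseXmlRecords` is the PR's API and isn't available here, so `fakeParseXmlRecords` below is a stand-in that just yields pre-built records to make the two consumption styles runnable. `toStream` is a hypothetical helper for environments without `ReadableStream.from`.

```typescript
// Stand-in for parseXmlRecords; the real function parses XML chunks and
// yields one record per matching element.
async function* fakeParseXmlRecords(): AsyncGenerator<{ title: string }> {
  yield { title: "first" };
  yield { title: "second" };
}

// Style 1: direct iteration. A slow consumer applies backpressure per
// record, since the generator only advances on each `next()`.
async function collectTitles(): Promise<string[]> {
  const titles: string[] = [];
  for await (const record of fakeParseXmlRecords()) {
    titles.push(record.title);
  }
  return titles;
}

// Style 2: pipeThrough composition. ReadableStream.from covers this in
// Deno and recent Node; a manual wrapper works everywhere.
function toStream<T>(gen: AsyncGenerator<T>): ReadableStream<T> {
  return new ReadableStream<T>({
    async pull(controller) {
      const { value, done } = await gen.next();
      if (done) controller.close();
      else controller.enqueue(value);
    },
  });
}
```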