Closed
Description
The WARC file data/whirlwind.warc uses '\n' (LF) instead of '\r\n' (CRLF) as the line terminator in its WARC and HTTP headers, although the WARC specification requires CRLF. As a result, strict parsers fail to read the file:
```
$> java -jar jwarc.jar ls data/whirlwind.warc
Exception in thread "main" java.io.UncheckedIOException: org.netpreserve.jwarc.ParsingException: invalid WARC record at position 8: WARC/1.0<-- HERE -->\nWARC-Type: warcinfo\nWARC-Date: 2024-05-... (offset 0 in whirlwind.warc)
	at org.netpreserve.jwarc.WarcReader$1.hasNext(WarcReader.java:357)
	at org.netpreserve.jwarc.tools.ListTool.main(ListTool.java:12)
	at org.netpreserve.jwarc.tools.WarcTool.main(WarcTool.java:40)
Caused by: org.netpreserve.jwarc.ParsingException: invalid WARC record at position 8: WARC/1.0<-- HERE -->\nWARC-Type: warcinfo\nWARC-Date: 2024-05-... (offset 0 in whirlwind.warc)
	at org.netpreserve.jwarc.WarcParser.parse(WarcParser.java:356)
	at org.netpreserve.jwarc.WarcReader.next(WarcReader.java:181)
	at org.netpreserve.jwarc.WarcReader$1.hasNext(WarcReader.java:355)
	... 2 more
```
Same for:
- data/whirlwind.warc.wat
- data/whirlwind.warc.wet
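To check a file for this problem without a full WARC parser, it is enough to look at how the first line (the "WARC/1.0" version line) is terminated. The sketch below is a hypothetical helper, not part of jwarc. It assumes an uncompressed WARC file (a .warc.gz file would first need to be wrapped in a java.util.zip.GZIPInputStream), inspects only the first 64 bytes (an arbitrary limit), and reports whether the first newline is preceded by '\r'.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: detects the line terminator used on the first line of a WARC file.
public class WarcLineEndingCheck {

    /** CRLF is what the WARC spec requires; LF indicates a bare '\n'; NONE means no newline was found. */
    public enum LineEnding { CRLF, LF, NONE }

    /** Finds the first '\n' in data and checks whether the byte before it is '\r'. */
    public static LineEnding firstLineEnding(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                return (i > 0 && data[i - 1] == '\r') ? LineEnding.CRLF : LineEnding.LF;
            }
        }
        return LineEnding.NONE;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] head;
            // Assumes an uncompressed WARC file; only the first 64 bytes are read (Java 11+ readNBytes).
            try (InputStream in = Files.newInputStream(Paths.get(arg))) {
                head = in.readNBytes(64);
            }
            System.out.println(arg + ": " + firstLineEnding(head));
        }
    }
}
```

Running it with Java 11+ as `java WarcLineEndingCheck.java data/whirlwind.warc data/whirlwind.warc.wat data/whirlwind.warc.wet` should report LF for the files listed above, judging from the error output; a spec-conforming file would report CRLF. Checking only the first line is a shortcut: a file could in principle mix terminators, so a thorough check would scan every header line.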