Is your feature request related to a problem or challenge?
DataFusion currently supports registering files in the Arrow IPC file format as tables:
ctx.register_arrow("my_table", "file.arrow", ArrowReadOptions::default())
    .await
    .unwrap();
ctx.sql("SELECT * FROM my_table LIMIT 10")
    .await
    .unwrap()
    .show()
    .await
    .unwrap();
You can also just reference the file path from SQL in datafusion-cli:
> SELECT * FROM 'file.arrow' LIMIT 10;
You cannot, however, do the same with files in the Arrow IPC stream format. You get the error:
called `Result::unwrap()` on an `Err` value: ArrowError(ParseError("Arrow file does not contain correct footer"), None)
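For context on the footer error, here is a minimal sketch of how the two encodings can be told apart from their leading bytes. It assumes the layout in the Arrow columnar format specification: the file format begins with the `ARROW1` magic and ends with a footer, while the stream format begins with the `0xFFFFFFFF` continuation marker and has no footer. The `IpcKind` enum and `sniff_ipc_kind` function are hypothetical names used for illustration only. They are not part of DataFusion or arrow-rs, and this is not necessarily how either library detects the format.

```rust
use std::io::{self, Read};

/// Which Arrow IPC encoding a byte source appears to use.
/// (Hypothetical type for this sketch; not a DataFusion or arrow-rs API.)
#[derive(Debug, PartialEq, Eq)]
pub enum IpcKind {
    /// Starts with the `ARROW1` magic; readers expect a footer at the end.
    File,
    /// Starts with the 0xFFFFFFFF continuation marker; there is no footer.
    Stream,
    /// Neither prefix matched (a legacy stream without the marker, or not Arrow).
    Unknown,
}

/// Classify a source by inspecting at most its first six bytes.
/// (Hypothetical helper; it does not validate the rest of the data.)
pub fn sniff_ipc_kind<R: Read>(mut source: R) -> io::Result<IpcKind> {
    let mut prefix = [0u8; 6];
    let mut filled = 0;

    // `read` may return fewer bytes than requested, so loop until the
    // buffer is full or the source reports end of input.
    while filled < prefix.len() {
        let n = source.read(&mut prefix[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    let prefix = &prefix[..filled];

    if prefix.starts_with(b"ARROW1") {
        Ok(IpcKind::File)
    } else if prefix.starts_with(&[0xFFu8; 4]) {
        Ok(IpcKind::Stream)
    } else {
        Ok(IpcKind::Unknown)
    }
}
```

Passing a `std::fs::File` opened on a stream-format file would be expected to yield `IpcKind::Stream`. A reader that only understands the file format has no footer to find in such a file, which is consistent with the error above.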
Describe the solution you'd like
I would love it if register_arrow supported files in the Arrow IPC stream format, or if an equivalent function were added for that purpose. Additionally, it would be great if datafusion-cli could reference such files by path, the same way it can for files in the Arrow IPC file format.
Describe alternatives you've considered
- Convert from the stream format to the file format and then query as shown above.
- Read all the record batches into memory and then register them as a MemTable.
- Add a new StreamProvider impl and use a StreamTable.
There are probably others, too, but none as simple as just being able to register the arrow file with register_arrow or referencing the file directly in datafusion-cli.
Additional context
I'm interested in taking a crack at this feature but, assuming y'all are interested in it, I would love some implementation guidance.
Thanks for your time!