Is your feature request related to a problem or challenge?
This is a tracking epic for a collection of features related to writing data.
The basic idea is better / full support for writing data:
- to multiple (possibly partitioned by value) files
- to different file types (Parquet, CSV, JSON, Avro, Arrow)
- in a streaming fashion (the input doesn't need to be entirely buffered)
- from SQL (via `INSERT`, `INSERT INTO`, `COPY`, etc.)
- streaming to a target `object_store` (e.g. multi-part S3 upload)
This is partially supported today programmatically (see `SessionContext::write_csv`, etc.)
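For the SQL path, a sketch of what writing data via `COPY ... TO` could look like. This is illustrative only: the exact option syntax has changed across DataFusion versions, and the subtasks below track several of those changes.

```sql
-- Copy the full contents of a table to a single Parquet file.
-- Option/format syntax is illustrative and version-dependent.
COPY my_table TO 'output/my_table.parquet' STORED AS PARQUET;

-- Copy a query result to a CSV file.
COPY (SELECT a, b FROM my_table WHERE a > 10)
  TO 'output/filtered.csv' STORED AS CSV;
```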
Subtasks:
- `COPY ... TO` statement #5654
- `DataFrame.write_*` to use `LogicalPlan::Write` #5076
- `CopyOptions` for controlling copy behavior #7322
- `allow_single_file_parallelism` by default to write out parquet files in parallel #7590
- `Dictionary(UInt16, Utf8)` #7891
- `COPY` command #8493
- `SINGLE_FILE_OUTPUT` option from COPY statement #8621
- `FileType` enum and replace with a trait #8657
- `DataFrame::write` command #9237