
Support writing hive-style partitioned files in DataFrame::write command #9237


Description

@alamb

Is your feature request related to a problem or challenge?

@Omega359 asked on Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1207458257874984970

Q: Is there a way to write out a dataframe to parquet with hive-style partitioning without having to create a table provider? I am pretty sure that a ListingTableProvider or a custom table provider would work, but that seems like a ton of config for this
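For reference, the ListingTable workaround mentioned in the question looks roughly like the sketch below (the table name my_table, the path, and the toy schema are illustrative, and the exact setup may vary by DataFusion version):

use std::sync::Arc;

use datafusion::arrow::array::{Int32Array, StringArray};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Describe the partitioned layout: `col_a` is a hive-style partition
    // column, so it is encoded in directory names, not stored in the files.
    let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
        .with_table_partition_cols(vec![("col_a".to_string(), DataType::Utf8)]);

    // The file schema (without the partition column) is passed explicitly,
    // since the target directory may be empty and schema inference would fail.
    let file_schema = Arc::new(Schema::new(vec![Field::new(
        "value",
        DataType::Int32,
        false,
    )]));
    ctx.register_listing_table(
        "my_table",
        "/tmp/my_table",
        listing_options,
        Some(file_schema),
        None,
    )
    .await?;

    // A toy DataFrame containing both the partition column and a data column.
    let batch = RecordBatch::try_new(
        Arc::new(Schema::new(vec![
            Field::new("col_a", DataType::Utf8, false),
            Field::new("value", DataType::Int32, false),
        ])),
        vec![
            Arc::new(StringArray::from(vec!["foo", "zoo"])),
            Arc::new(Int32Array::from(vec![1, 2])),
        ],
    )?;
    let df = ctx.read_batch(batch)?;

    // Writing through the registered table lets the provider handle the
    // hive-style partitioning.
    df.write_table("my_table", DataFrameWriteOptions::new()).await?;
    Ok(())
}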

Describe the solution you'd like

I would like to be able to use DataFrame::write_parquet and the other write APIs to write partitioned files.

I suggest adding the table_partition_cols option from ListingOptions as one of the options on DataFrameWriteOptions (https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrameWriteOptions.html).

The way to specify partition information would then be as described in ListingOptions::with_table_partition_cols.

That would look something like this:

let options = DataFrameWriteOptions::new()
    .with_table_partition_cols(vec![
        ("col_a".to_string(), DataType::Utf8),
    ]);

// Write the DataFrame to parquet, producing hive-style partitioned files like:
// /tmp/my_table/col_a=foo/12345.parquet (rows with 'foo' in col_a)
// ...
// /tmp/my_table/col_a=zoo/12345.parquet (rows with 'zoo' in col_a)
df.write_parquet("/tmp/my_table", options, None).await?;
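One possible shape for that change, sketched under the assumption that the option mirrors ListingOptions::with_table_partition_cols (the field layout here is illustrative, not the actual struct definition):

use datafusion::arrow::datatypes::DataType;

/// Illustrative sketch only: a partition-column option added to
/// DataFrameWriteOptions, mirroring ListingOptions::table_partition_cols.
#[derive(Default)]
pub struct DataFrameWriteOptions {
    /// Whether to overwrite existing data (existing option, shown for context)
    pub overwrite: bool,
    /// Proposed: (column name, type) pairs used for hive-style partitioning
    pub table_partition_cols: Vec<(String, DataType)>,
}

impl DataFrameWriteOptions {
    pub fn new() -> Self {
        Self::default()
    }

    /// Proposed builder method, matching the shape of
    /// ListingOptions::with_table_partition_cols
    pub fn with_table_partition_cols(
        mut self,
        table_partition_cols: Vec<(String, DataType)>,
    ) -> Self {
        self.table_partition_cols = table_partition_cols;
        self
    }
}

The write path could then pass these columns through to the same partitioning machinery the listing table write path already uses.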

Describe alternatives you've considered

No response

Additional context

Possibly related to #8493


Labels

enhancement (New feature or request)
