
Inconsistent value for data_page_max_rows setting in DataFusion ParquetOptions and in ArrowWriterOptions #11367

@alamb


Describe the bug

@wiedld pointed out that the default value of data_page_max_rows is different in DataFusion than in arrow-rs

DataFusion's ParquetOptions (https://docs.rs/datafusion/latest/datafusion/common/config/struct.ParquetOptions.html) sets it to usize::MAX

As of apache/arrow-rs#5957, which was released in 51.1.0, arrow-rs sets it to 20,000

To Reproduce

N/A

Expected behavior

The DataFusion defaults should be the same as the arrow-rs defaults (see https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriterOptions.html), unless there is a good reason to deviate

In this case, I think we should go with the arrow-rs value

Ideally a fix for this ticket would both

  1. Change the defaults of ParquetOptions to match the default in ArrowWriterOptions
  2. Write a test that verifies the default values are the same
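
A rough sketch of what such a check could look like is below. It is not DataFusion or arrow-rs code: `UpstreamWriterDefaults`, `EngineParquetDefaults`, and `check_defaults_match` are hypothetical stand-ins that model only the one setting discussed here, using the values quoted above (20,000 and usize::MAX). A real test would presumably compare the actual option types from both crates instead.

```rust
// Hypothetical stand-in for the arrow-rs side; only the one setting
// discussed in this issue is modeled.
#[derive(Debug, Clone, PartialEq)]
struct UpstreamWriterDefaults {
    data_page_max_rows: usize,
}

impl Default for UpstreamWriterDefaults {
    fn default() -> Self {
        // Value this issue reports for arrow-rs.
        Self { data_page_max_rows: 20_000 }
    }
}

// Hypothetical stand-in for the DataFusion side.
#[derive(Debug, Clone, PartialEq)]
struct EngineParquetDefaults {
    data_page_max_rows: usize,
}

impl Default for EngineParquetDefaults {
    fn default() -> Self {
        // Proposed fix (item 1): take the upstream default rather than
        // hard-coding usize::MAX.
        Self {
            data_page_max_rows: UpstreamWriterDefaults::default().data_page_max_rows,
        }
    }
}

/// Returns Ok(()) when the two defaults agree, otherwise a message that
/// names the mismatching setting and both values.
fn check_defaults_match(
    engine: &EngineParquetDefaults,
    upstream: &UpstreamWriterDefaults,
) -> Result<(), String> {
    if engine.data_page_max_rows == upstream.data_page_max_rows {
        Ok(())
    } else {
        Err(format!(
            "data_page_max_rows differs: engine={} upstream={}",
            engine.data_page_max_rows, upstream.data_page_max_rows
        ))
    }
}

fn main() {
    let upstream = UpstreamWriterDefaults::default();

    // Item 2: the defaults on both sides should be the same.
    check_defaults_match(&EngineParquetDefaults::default(), &upstream)
        .expect("defaults should match");

    // The value described in this issue (usize::MAX) is flagged as a mismatch.
    let old = EngineParquetDefaults { data_page_max_rows: usize::MAX };
    assert!(check_defaults_match(&old, &upstream).is_err());
}
```

Deriving the DataFusion-side default from the upstream one, rather than repeating the literal, is one way to keep the two from drifting apart again; the explicit comparison still catches the case where someone later hard-codes a value.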

Additional context

No response

Labels: bug
