Change FileScanConfig.table_partition_cols from (String, DataType) to Fields #7890
Dandandan merged 5 commits into apache:main
@alamb and @crepererum
     pub limit: Option<usize>,
     /// The partitioning columns
-    pub table_partition_cols: Vec<(String, DataType)>,
+    pub table_partition_cols: Vec<Field>,
Here is the key change
You probably want to use FieldRef not Field
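To illustrate the suggestion: in arrow, FieldRef is an alias for Arc<Field>, so cloning one bumps a reference count instead of copying the field. The sketch below uses a reduced stand-in Field type and a hypothetical to_field_refs helper (neither is the real arrow or DataFusion code), and it does not imply that this PR adopted FieldRef.

```rust
use std::sync::Arc;

// Stand-in for arrow's `Field`, reduced to what this sketch needs (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

// Mirrors arrow's alias `FieldRef = Arc<Field>`.
type FieldRef = Arc<Field>;

/// Hypothetical helper: wrap owned fields in reference-counted handles.
fn to_field_refs(fields: Vec<Field>) -> Vec<FieldRef> {
    fields.into_iter().map(Arc::new).collect()
}

fn main() {
    let cols = to_field_refs(vec![Field {
        name: "date".to_string(),
        nullable: false,
    }]);

    // Cloning a FieldRef shares the allocation instead of copying the name.
    let shared = Arc::clone(&cols[0]);
    assert!(Arc::ptr_eq(&cols[0], &shared));
    assert_eq!(Arc::strong_count(&shared), 2);
    assert!(!shared.nullable);
}
```

The trade-off is one extra pointer indirection per access in exchange for cheap clones when the same field is shared between schemas.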
     let partition_idx = idx - self.file_schema.fields().len();
-    let (name, dtype) = &self.table_partition_cols[partition_idx];
-    table_fields.push(Field::new(name, dtype.to_owned(), false));
+    table_fields.push(self.table_partition_cols[partition_idx].to_owned());
And this is where we convert table_partition_cols to Field
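A minimal sketch of the index arithmetic in this hunk, assuming reduced stand-in DataType and Field types and a hypothetical free function project_fields (the real code is a method on FileScanConfig). Projection indices below the file schema length select a file column; the rest select a partition column, which is now copied as a whole Field.

```rust
// Stand-ins for arrow's `DataType` and `Field`, reduced for this sketch (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Hypothetical function modelled on the hunk above.
/// Panics if a projection index is past the last partition column.
fn project_fields(
    file_fields: &[Field],
    table_partition_cols: &[Field],
    projection: &[usize],
) -> Vec<Field> {
    let mut table_fields = Vec::with_capacity(projection.len());
    for &idx in projection {
        if idx < file_fields.len() {
            table_fields.push(file_fields[idx].clone());
        } else {
            let partition_idx = idx - file_fields.len();
            // The partition column is already a Field, so nothing is rebuilt.
            table_fields.push(table_partition_cols[partition_idx].to_owned());
        }
    }
    table_fields
}

fn main() {
    let file_fields = vec![
        Field::new("a", DataType::Int64, true),
        Field::new("b", DataType::Utf8, true),
    ];
    let partition_cols = vec![Field::new("date", DataType::Utf8, true)];

    // Index 2 is past the two file columns, so it maps to partition column 0.
    let projected = project_fields(&file_fields, &partition_cols, &[0, 2]);
    assert_eq!(projected[0].name, "a");
    assert_eq!(projected[1].name, "date");
    assert!(projected[1].nullable);
}
```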
alamb left a comment
Thank you @NGA-TRAN -- I think the idea of passing a real Field as the partition column makes a lot of sense and that this PR does it very nicely 👍
I had a few code improvement suggestions, but nothing I think is required to merge this.
Thanks again
| ) | ||
| } | ||
|
|
||
| fn config_for_proj_with_field_tab_part( |
I find this name confusing given the three letter abbreviations and I don't think this is common elsewhere in the DataFusion codebase.
How about something like
- fn config_for_proj_with_field_tab_part(
+ fn config_for_projection_with_partition_fields(
Or maybe instead you could change config_for_projection to take table_partition_cols: Vec<Field>, and make a function like
/// Convert all (name, data type) pairs to Fields
fn partition_cols(table_partition_cols: Vec<(&str, DataType)>) -> Vec<Field> {
    table_partition_cols
        .into_iter()
        .map(|(name, dtype)| Field::new(name, dtype, false))
        .collect()
}
And then convert the call sites of config_for_projection to be config_for_projection(.., partition_cols(..))
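For reference, a self-contained version of the suggested helper can be sketched as follows. DataType and Field here are reduced stand-ins for the arrow types, and the column names are made up for illustration. The pairs are consumed with into_iter so that each name is passed to Field::new as a plain &str.

```rust
// Stand-ins for arrow's `DataType` and `Field`, reduced for this sketch (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int32,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: impl Into<String>, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.into(), data_type, nullable }
    }
}

/// Convert (name, data type) pairs into non-nullable partition Fields.
fn partition_cols(table_partition_cols: Vec<(&str, DataType)>) -> Vec<Field> {
    table_partition_cols
        .into_iter()
        .map(|(name, dtype)| Field::new(name, dtype, false))
        .collect()
}

fn main() {
    // Hypothetical partition columns, as a test call site might pass them.
    let cols = partition_cols(vec![("year", DataType::Int32), ("region", DataType::Utf8)]);
    assert_eq!(cols.len(), 2);
    assert_eq!(cols[0], Field::new("year", DataType::Int32, false));
    assert_eq!(cols[1].name, "region");
}
```

Keeping the tuple form at the call sites leaves existing tests short, while the config itself only ever sees Fields.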
I implemented your second suggestion @alamb . Thanks
FileScanConfig.table_partition_cols from (String, DataType) to Fields
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
I have addressed all the comments. Thanks @alamb

Thanks @NGA-TRAN
Which issue does this PR close?
Closes #7875
Rationale for this change
Currently, FileScanConfig.table_partition_cols has data type Vec<(String, DataType)>, which stores only each column's name and data type. A column can carry more information, such as nullable and extra metadata. Thus, when we convert table_partition_cols to Fields, all other information of a field ends up either empty or default.
We want table_partition_cols to be a vector of Fields in the first place, so that when we need to store a Field we won't lose any information.
FYI: IOx needs this requirement.
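The information loss described above can be sketched with a round trip through the old pair representation. This is a minimal illustration, assuming a reduced stand-in Field type with a metadata map; to_pair and from_pair are hypothetical helpers, and the metadata key is made up.

```rust
use std::collections::HashMap;

// Stand-ins for arrow's `DataType` and `Field`, reduced for this sketch (assumption).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Old representation: only the name and the data type survive.
fn to_pair(field: &Field) -> (String, DataType) {
    (field.name.clone(), field.data_type.clone())
}

/// Rebuilding a Field from the pair has to fall back to defaults.
fn from_pair(pair: &(String, DataType)) -> Field {
    Field {
        name: pair.0.clone(),
        data_type: pair.1.clone(),
        nullable: false,
        metadata: HashMap::new(),
    }
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("source".to_string(), "catalog".to_string());
    let original = Field {
        name: "region".to_string(),
        data_type: DataType::Utf8,
        nullable: true,
        metadata,
    };

    // Nullability and metadata are gone after the round trip.
    let round_tripped = from_pair(&to_pair(&original));
    assert_ne!(original, round_tripped);
    assert!(!round_tripped.nullable);
    assert!(round_tripped.metadata.is_empty());
}
```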
What changes are included in this PR?
Replace the data type of FileScanConfig.table_partition_cols from Vec<(String, DataType)> to Vec<Field>.
Are these changes tested?
Yes
Are there any user-facing changes?
The API to create FileScanConfig now needs a vector of Fields for table_partition_cols. In most places it is an empty vector, which means it is not used.