GH-43594: [C++] Remove std::optional from arrow::ArrayStatistics::is_{min,max}_exact#43595
GH-43594: [C++] Remove std::optional from arrow::ArrayStatistics::is_{min,max}_exact#43595kou merged 1 commit intoapache:mainfrom
Conversation
|
|
There was a problem hiding this comment.
This looks good to me, but IMO is_min_exact = false might still has exact statistics but the reader cannot gurantee that, since apache/parquet-format#216 is new in 2.10 :-(
There was a problem hiding this comment.
Are the followings correct?
- Parquet 2.9 or earlier data don't have
is_min_value_exact/is_max_value_exact - Parquet 2.9 or earlier data use only exact min/max
- Parquet 2.10 or later data use exact min/max or non-exact min/max
- Parquet 2.10 or later data may use exact min/max without
is_min_value_exact/is_max_value_exact
You're focusing on the 2. case, right? Can our Parquet reader detect Parquet version? If so, can we always set true to is_min_exact/is_max_exact for Parquet 2.9 or earlier?
There was a problem hiding this comment.
Sorry for late replying
Parquet 2.9 or earlier data don't have is_min_value_exact/is_max_value_exact
Yes
Parquet 2.9 or earlier data use only exact min/max
I guess no, I'll send a mail to maillist to make it sure
Parquet 2.10 or later data use exact min/max or non-exact min/max
Yes
Parquet 2.10 or later data may use exact min/max without is_min_value_exact/is_max_value_exact
right
There was a problem hiding this comment.
If so, can we always set true to is_min_exact/is_max_exact for Parquet 2.9 or earlier?
Hmmm I'll try to make it clear
There was a problem hiding this comment.
Anyway we can mention "exact=false" can also means is exact, lol
Or we can denote that the parquet-c++ output is exact.
There was a problem hiding this comment.
OK. Let's use exact=true for Apache Parquet C++ output.
…s::is_{min,max}_exact
We don't need "unknown" state. If they aren't set, we can process they
are not exact.
abcfb72 to
8331bf5
Compare
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit b80a51a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
We don't need "unknown" state. If they aren't set, we can process they are not exact.
What changes are included in this PR?
Remove
std::optionalfromarrow::ArrayStatistics::is_{min,max}_exact.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.
std::optionalfromarrow::ArrayStatistics::is_{min,max}_exact#43594