consistent logical & physical NTILE return types #8270
Merged
alamb merged 1 commit into apache:main on Nov 21, 2023
msirek
approved these changes
Nov 20, 2023
Contributor
The fix looks good.
It might be good to fix some other bugs in ntile as well. Should a separate issue be opened for these, or are they worth addressing in this PR?
DataFusion CLI v33.0.0
❯ create table t1 (a int);
0 rows in set. Query took 0.005 seconds.
❯ insert into t1 values (1),(2),(3);
+-------+
| count |
+-------+
| 3 |
+-------+
1 row in set. Query took 0.006 seconds.
-- Do these results make sense? All other databases return ntile values 1,2,3.
-- Tested at https://dbfiddle.uk/
❯ select ntile(9223377) OVER(ORDER BY a) from t1;
+--------------------------------------------------------------------------------------------------------+
| NTILE(Int64(9223377)) ORDER BY [t1.a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
+--------------------------------------------------------------------------------------------------------+
| 1 |
| 3074460 |
| 6148919 |
+--------------------------------------------------------------------------------------------------------+
-- This should return a regular error instead of an internal error
❯ select ntile(9223372036854775809) OVER(ORDER BY a) from t1;
Internal error: Cannot convert UInt64(9223372036854775809) to i64.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
-- This should not panic and crash the datafusion cli
❯ select ntile(-922337203685477580) OVER(ORDER BY a) from t1;
thread 'main' panicked at /home/ms/git/arrow-datafusion/datafusion/physical-expr/src/window/ntile.rs:100:23:
attempt to multiply with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/home/ms/git/arrow-datafusion/datafusion-cli (ntile_output_type ✔) ᐅ
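For reference, the behaviour the session above asks for can be sketched in plain Rust. This is only an illustration, not DataFusion's implementation: `ntile_sketch` is a hypothetical helper, and it assumes the usual SQL convention that `n` buckets are filled as evenly as possible, with earlier buckets taking any leftover rows. The sketch avoids multiplying by `n`, so a huge argument cannot overflow, and it returns an ordinary error for a non-positive argument instead of panicking.

```rust
/// Hypothetical helper (not part of DataFusion): assigns NTILE bucket
/// numbers to `num_rows` already-ordered rows.
fn ntile_sketch(n: i64, num_rows: usize) -> Result<Vec<u64>, String> {
    // Reject zero and negative arguments with a regular error.
    if n <= 0 {
        return Err(format!("NTILE requires a positive integer, got {}", n));
    }
    let rows = num_rows as u64;
    // More buckets than rows behaves like one bucket per row, so clamp.
    let buckets = (n as u64).min(rows);
    if buckets == 0 {
        // No rows, nothing to label.
        return Ok(Vec::new());
    }
    let base = rows / buckets; // minimum rows per bucket
    let extra = rows % buckets; // the first `extra` buckets get one more row
    let mut out = Vec::with_capacity(num_rows);
    for bucket in 1..=buckets {
        let size = base + u64::from(bucket <= extra);
        for _ in 0..size {
            out.push(bucket);
        }
    }
    Ok(out)
}
```

With three rows, `ntile_sketch(9223377, 3)` yields buckets 1, 2, 3, which is what the other databases mentioned above return, and `ntile_sketch(-922337203685477580, 3)` yields an `Err` rather than a panic.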
alamb
approved these changes
Nov 20, 2023
Contributor
Thanks @msirek. I recommend we file a ticket and fix these issues in a follow-on PR. My rationale is that this PR is strictly better than main (even though you have identified other areas where it can be improved).
Which issue does this PR close?
Closes #7639.
Rationale for this change
Currently, when the logical plan is built, the return type of `ntile` is considered to be `UInt32`, while the physical `ntile` expression returns `UInt64`. This leads to incompatible schemas between the created memory table and the record batches on insertion.
FWIW: one more schema incompatibility remains (non-breaking in the case of CTAS, though): the nullability of output fields, which should be fixed in #7638.
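A toy model of the mismatch, for illustration only: the `DataType`, `Field`, and `check_insert` names below are hypothetical stand-ins written with the standard library alone, not the Arrow or DataFusion types. The sketch assumes a table whose schema was derived from the logical plan and a batch whose schema comes from the physical expression.

```rust
// Hypothetical stand-ins for illustration; not the Arrow/DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    UInt32,
    UInt64,
}

struct Field {
    name: &'static str,
    data_type: DataType,
}

/// Rejects a batch whose column types differ from the table's schema.
fn check_insert(table: &[Field], batch: &[Field]) -> Result<(), String> {
    if table.len() != batch.len() {
        return Err("column count mismatch".to_string());
    }
    for (t, b) in table.iter().zip(batch) {
        if t.data_type != b.data_type {
            return Err(format!(
                "column '{}': table expects {:?}, batch has {:?}",
                t.name, t.data_type, b.data_type
            ));
        }
    }
    Ok(())
}
```

In this model, a table column typed `UInt32` (the logical type) rejects a batch column typed `UInt64` (the physical type); once both sides agree on `UInt64`, the check passes.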
What changes are included in this PR?
The return type of the `ntile` logical expression is now `UInt64`.
Are these changes tested?
Unit test for return type & minimal reproducer from the issue in sqllogictests
Are there any user-facing changes?
The return type of the `ntile` logical expression is now `UInt64`.