Fix ambiguous reference when aliasing in combination with ORDER BY#8425
Fix ambiguous reference when aliasing in combination with ORDER BY#8425alamb merged 41 commits intoapache:mainfrom
ORDER BY#8425Conversation
# Conflicts: # datafusion/physical-plan/src/joins/hash_join_utils.rs
# Conflicts: # datafusion/physical-plan/src/joins/hash_join_utils.rs
alamb
left a comment
There was a problem hiding this comment.
Thank you @Asura7969 . This is a really nice fix 🕵️ 💯
The code in this PR looks good to me, though I do wonder abut the relation field (see inline comment)
Also, I tested the original query locally and this PR does indeed fix the problem 🎉
DataFusion CLI v33.0.0
❯ CREATE TABLE t (a INTEGER);
0 rows in set. Query took 0.018 seconds.
❯ SELECT t.a AS a FROM t ORDER BY t.a;
0 rows in set. Query took 0.011 seconds.
❯
Can you please add some tests to ensure we don't break it accidentally in the future (the only tests that were changed are related to subqueries and I worry someone will accidentally break remove the fix code and not realize we introduced a regression)?
These should probably be end to end sqllogic tests:
https://github.com/apache/arrow-datafusion/tree/main/datafusion/sqllogictest
Perhaps added to https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/select.slt
| )?; | ||
| let expr = col.alias(self.normalizer.normalize(alias)); | ||
| let name = self.normalizer.normalize(alias); | ||
| let expr = match &col { |
There was a problem hiding this comment.
Can you please add a comment here about what this is doing? I think it is avoiding adding an alias if the column name is the same.
Do we have to worry about relation names here too (like what if column.relation is non null?)
There was a problem hiding this comment.
I understand, but I didn't expect the scenario where column.relation is inconsistent🤔
There was a problem hiding this comment.
I was thinking of a query like
EXPLAIN select a as a FROM table1 CROSS JOIN table2 order by a
Where both table1 and table2 have a column named a. I think we should add a test for this case too -- I'll try and do so later today
I would expect the test to fail
There was a problem hiding this comment.
as you expected
DataFusion CLI v33.0.0
❯ CREATE TABLE table1 (a int) as values(1);
0 rows in set. Query took 0.037 seconds.
❯ CREATE TABLE table2 (a int) as values(1);
0 rows in set. Query took 0.012 seconds.
❯ EXPLAIN select a as a FROM table1 CROSS JOIN table2 order by a;
Schema error: Ambiguous reference to unqualified field a| let expected = "Sort: CAST(first_name AS first_name AS Int32) ASC NULLS LAST\ | ||
| \n Projection: first_name AS first_name\ | ||
| \n Projection: person.first_name AS first_name\ | ||
| let expected = "Sort: CAST(person.first_name AS Int32) ASC NULLS LAST\ |
|
fyi @Jesse-Bakker |
| } | ||
|
|
||
| #[test] | ||
| fn test_avoid_add_alias() { |
There was a problem hiding this comment.
Ensure logical consistency
alamb
left a comment
There was a problem hiding this comment.
Thanks @Asura7969 -- there is just one last thing I worry about with multiple relation names that I think we can resolve with some more tests.
| )?; | ||
| let expr = col.alias(self.normalizer.normalize(alias)); | ||
| let name = self.normalizer.normalize(alias); | ||
| let expr = match &col { |
There was a problem hiding this comment.
I was thinking of a query like
EXPLAIN select a as a FROM table1 CROSS JOIN table2 order by a
Where both table1 and table2 have a column named a. I think we should add a test for this case too -- I'll try and do so later today
I would expect the test to fail
|
|
||
| # ambiguous column references in on join | ||
| query error DataFusion error: Schema error: Ambiguous reference to unqualified field a | ||
| EXPLAIN select a as a FROM table1 t1 CROSS JOIN table1 t2 order by a |
alamb
left a comment
There was a problem hiding this comment.
Thank you for sticking with this @Asura7969 -- looks good to me
…pache#8425) * Minor: Improve the document format of JoinHashMap * ambiguous reference * ignore aliases * fix * fix * add test * add test * add test * add test
Which issue does this PR close?
Closes #8391.
Rationale for this change
What changes are included in this PR?
When the field alias is consistent with the original name, cancel the alias
Are these changes tested?
Overwrite existing tests
Are there any user-facing changes?