Describe the bug
Background
In the SQL standard, RESPECT NULLS and IGNORE NULLS are options to be set for the lead, lag, first_value, last_value and nth_value window functions.
Some window functions permit a null_treatment clause that specifies how to handle NULL values when calculating results. This clause is optional. It is part of the SQL standard, but the MySQL implementation permits only RESPECT NULLS (which is also the default).
MySQL
The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS. Likewise, the standard's FROM FIRST or FROM LAST option for nth_value is not implemented: only the default FROM FIRST behavior is supported. (You can achieve the result of FROM LAST by reversing the ORDER BY ordering.)
Postgres
While MySQL and Postgres don't support the IGNORE NULLS options, there are systems that support it for those functions:
Bug
DataFusion permits queries like
SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t;
and
SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t;
and in fact has tests for them.
This doesn't conform with how most engines work with these options.
A more "standard" version of these IGNORE NULLS would look like
SELECT FIRST_VALUE(column1) FILTER (WHERE column1 IS NOT NULL) FROM t;
That being said, part of the weirdness here is that DataFusion defines first_value both as an aggregate function and as a window function.
Other engines treat first_value solely as a window function, and provide something like any_value (Trino, Postgres, Snowflake).
To Reproduce
See tests for RESPECT NULLS and IGNORE NULLS in the codebase
|
query I |
|
SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t; |
|
---- |
|
NULL |
|
|
|
query I |
|
SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t; |
|
---- |
|
3 |
Expected behavior
To better conform with standard SQL behaviour, at the SQL level:
RESPECT NULLS and IGNORE NULLS should not be valid options for aggregate functions.
RESPECT NULLS and IGNORE NULLS should be only valid for the following window functions: lead, lag, first_value, last_value, nth_value
Additional context
No response
Describe the bug
Background
In the SQL standard, RESPECT NULLS and IGNORE NULLS are options to be set for the
lead,lag,first_value,last_valueandnth_valuewindow functions.MySQL
Postgres
While MySQL and Postgres don't support the IGNORE NULLS options, there are systems that support it for those functions:
Bug
DataFusion permits queries like
and
and in fact has tests for them.
This doesn't conform with how most engines work with these options.
A more "standard" version of these IGNORE NULLS would look like
That being said, part of the weirdness here is that DataFusion defines
first_valueboth as an aggregate function and as a window function.Other engines treat
first_valuesolely as a window function, and provide something likeany_value(Trino, Postgres, Snowflake).To Reproduce
See tests for RESPECT NULLS and IGNORE NULLS in the codebase
datafusion/datafusion/sqllogictest/test_files/aggregate.slt
Lines 5866 to 5874 in dd0fd88
Expected behavior
To better conform with standard SQL behaviour, at the SQL level:
RESPECT NULLSandIGNORE NULLSshould not be valid options for aggregate functions.RESPECT NULLSandIGNORE NULLSshould be only valid for the following window functions:lead,lag,first_value,last_value,nth_valueAdditional context
No response