Skip to content

(RESPECT NULLS / IGNORE NULLS is syntax for window functions, not aggregate functions #15006

@vbarua

Description

@vbarua

Describe the bug

Background

In the SQL standard, RESPECT NULLS and IGNORE NULLS are options to be set for the lead, lag, first_value, last_value and nth_value window functions.

Some window functions permit a null_treatment clause that specifies how to handle NULL values when calculating results. This clause is optional. It is part of the SQL standard, but the MySQL implementation permits only RESPECT NULLS (which is also the default).

MySQL

The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, first_value, last_value, and nth_value. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS. Likewise, the standard's FROM FIRST or FROM LAST option for nth_value is not implemented: only the default FROM FIRST behavior is supported. (You can achieve the result of FROM LAST by reversing the ORDER BY ordering.)

Postgres

While MySQL and Postgres don't support the IGNORE NULLS options, there are systems that support it for those functions:

Bug

DataFusion permits queries like

SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t; 

and

SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t; 

and in fact has tests for them.

This doesn't conform with how most engines work with these options.

A more "standard" version of these IGNORE NULLS would look like

SELECT FIRST_VALUE(column1) FILTER (WHERE column1 IS NOT NULL) FROM t; 

That being said, part of the weirdness here is that DataFusion defines first_value both as an aggregate function and as a window function.

Other engines treat first_value solely as a window function, and provide something like any_value (Trino, Postgres, Snowflake).

To Reproduce

See tests for RESPECT NULLS and IGNORE NULLS in the codebase

query I
SELECT FIRST_VALUE(column1) RESPECT NULLS FROM t;
----
NULL
query I
SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t;
----
3

Expected behavior

To better conform with standard SQL behaviour, at the SQL level:

  1. RESPECT NULLS and IGNORE NULLS should not be valid options for aggregate functions.
  2. RESPECT NULLS and IGNORE NULLS should be only valid for the following window functions: lead, lag, first_value, last_value, nth_value

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions