Calling with_column twice generates an error when the second call uses a window function #12425

@Michael-J-Ward

Describe the bug

Calling with_column twice generates an error when the second column is a window expression.

    df
        .with_column("foo", <normal_expr>)
        .with_column("bar", <window_expr>)

Because "foo" does not have a qualifier, the second call to with_column aliases it to the new column name as well, so two expressions in the projection end up with the same name.

    let mut fields: Vec<Expr> = plan
        .schema()
        .iter()
        .map(|(qualifier, field)| {
            if field.name() == name {
                col_exists = true;
                new_column.clone()
            } else if window_func && qualifier.is_none() {
                col(Column::from((qualifier, field))).alias(name)
            } else {
                col(Column::from((qualifier, field)))
            }
        })
        .collect();

Error: Plan("Projections require unique expression names but the expression \"s AS r\" at position 3 and \"row_number() ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS r\" at position 4 have the same name. Consider aliasing (\"AS\") one of them.")
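To make the name collision easier to see, here is a minimal, dependency-free sketch that mimics only the naming outcome of the closure above. It is not DataFusion code: `Field`, `projected_names`, `first_duplicate`, and the qualifier `"t"` used in the usage note are hypothetical, and the step that appends the new column when no existing field matches is an assumption (consistent with the positions 3 and 4 reported in the error).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema entry: (optional qualifier, field name).
type Field = (Option<&'static str>, &'static str);

/// Mimics the output names produced by the `.map(...)` closure quoted above.
fn projected_names(fields: &[Field], name: &str, window_func: bool) -> Vec<String> {
    let mut col_exists = false;
    let mut names: Vec<String> = fields
        .iter()
        .map(|(qualifier, field)| {
            if *field == name {
                col_exists = true;
                name.to_string() // existing column is replaced by the new one
            } else if window_func && qualifier.is_none() {
                name.to_string() // `.alias(name)`: unqualified field takes the new name
            } else {
                field.to_string() // passed through unchanged
            }
        })
        .collect();
    if !col_exists {
        // Assumption: the new column is appended when no field matched `name`.
        names.push(name.to_string());
    }
    names
}

/// Returns the first name that occurs more than once, if any.
fn first_duplicate(names: &[String]) -> Option<&str> {
    let mut seen = HashSet::new();
    names.iter().map(String::as_str).find(|n| !seen.insert(*n))
}
```

With fields `[(Some("t"), "c1"), (Some("t"), "c2"), (Some("t"), "c3"), (None, "s")]`, calling `projected_names(&fields, "r", true)` yields the names `c1, c2, c3, r, r`, and `first_duplicate` reports `r`. Passing `window_func = false` keeps `s` intact and yields the unique names `c1, c2, c3, s, r`.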

To Reproduce

Update test_window_function_with_column to first call with_column with any expression.

For example:

    // Test issue: https://github.com/apache/datafusion/issues/11982
    // Window function was creating unwanted projection when using with_column() method.
    #[tokio::test]
    async fn test_window_function_with_column() -> Result<()> {
        let df = test_table().await?.select_columns(&["c1", "c2", "c3"])?;
        let ctx = SessionContext::new();
        let df_impl = DataFrame::new(ctx.state(), df.plan.clone());
        let func = row_number().alias("row_num");

        // This first `with_column` results in a column without a `qualifier` 
        let df_impl = df_impl.with_column("s", col("c2") + col("c3"))?;

        // This second `with_column` currently applies the `"r"` alias to both the column above and the window function.
        // It should instead add a single new column `r` that holds the window function results.
        let df = df_impl.with_column("r", func)?.limit(0, Some(2))?;
        // Expected once the bug is fixed: c1, c2, c3, s, r
        assert_eq!(5, df.schema().fields().len());

        let df_results = df.clone().collect().await?;
        assert_batches_sorted_eq!(
            [
                "+----+----+-----+-----+---+",
                "| c1 | c2 | c3  | s   | r |",
                "+----+----+-----+-----+---+",
                "| c  | 2  | 1   | 3   | 1 |",
                "| d  | 5  | -40 | -35 | 2 |",
                "+----+----+-----+-----+---+",
            ],
            &df_results
        );

        Ok(())
    }

Expected behavior

I would expect the second call to succeed and the final dataframe to have the columns c1, c2, c3, s, and r.

Additional context

#12000 introduced the window_func && qualifier.is_none() conditional shown above.
