Skip to content

SQL case, the subquery plan inside the sub query expression does not get a chance to run all the optimizer rules #5771

@jiangzhx

Description

@jiangzhx

Describe the bug

dataframe: TableScan node added a unexpected projection projection=[a]

dataframe:

+--------------+-------------------------------------------------------+
| plan_type    | plan                                                  |
+--------------+-------------------------------------------------------+
| logical_plan | Filter: EXISTS (<subquery>)                           |
|              |   Subquery:                                           |
|              |     Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]] |
|              |       TableScan: t2 projection=[a]                    |
|              |   TableScan: t1 projection=[a, b]                     |
+--------------+-------------------------------------------------------+

sql

+--------------+---------------------------------------------------------+
| plan_type    | plan                                                    |
+--------------+---------------------------------------------------------+
| logical_plan | Filter: EXISTS (<subquery>)                             |
|              |   Subquery:                                             |
|              |     Projection: COUNT(UInt8(1))                         |
|              |       Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]] |
|              |         TableScan: t2                                   |
|              |   TableScan: t1 projection=[a, b]                       |
+--------------+---------------------------------------------------------+

To Reproduce


#[tokio::test]
async fn test_count_wildcard_on_where_exist() -> Result<()> {
    // you can find create_join_context at datafusion/core/tests/dataframe.rs
    let ctx = create_join_context()?;
    let sql_results = ctx
        .sql("SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2)")
        .await?
        .explain(false, false)?
        .collect()
        .await?;

    let df_results = ctx
        .table("t1")
        .await?
        .filter(Expr::Exists {
            subquery: Subquery {
                subquery: Arc::new(
                    ctx.table("t2")
                        .await?
                        .aggregate(vec![], vec![count(Expr::Wildcard)])?
                        .select(vec![count(Expr::Wildcard)])?
                        .into_optimized_plan()?,
                ),
                outer_ref_columns: vec![],
            },
            negated: false,
        })?
        .select(vec![col("a"), col("b")])?
        .explain(false, false)?
        .collect()
        .await?;

    assert_eq!(
        pretty_format_batches(&sql_results)?.to_string(),
        pretty_format_batches(&df_results)?.to_string()
    );
    Ok(())
}

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions