Evaluating InList without columns always returns a record batch with 1 row (causing panic "selection contains less than the number of selected rows") #8600

@alamb

Describe the bug

Evaluating InList without columns always returns a record batch with 1 row, regardless of how many rows the input batch has.

To Reproduce

This test will fail: it evaluates <const> IN <list> on a RecordBatch of three rows, but the returned batch has a single row.

    #[test]
    fn in_list_no_cols() -> Result<()> {
        // test logic when the in_list expression doesn't have any columns
        let schema = Schema::new(vec![Field::new(
            "a",
            DataType::Int32,
            true,
        )]);
        let a = Int32Array::from(vec![
            Some(1),
            Some(2),
            None,
        ]);
        let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::new(a)])?;

        let list = vec![
            lit(ScalarValue::from(1i32)),
            lit(ScalarValue::from(6i32)),
        ];

        // 1 IN (1, 6)
        let expr = lit(ScalarValue::Int32(Some(1)));
        in_list!(
            batch,
            list.clone(),
            &false,
            // should have three outputs, as the input batch has three rows
            vec![Some(true), Some(true), Some(true)],
            expr,
            &schema
        );

        // 2 IN (1, 6)
        let expr = lit(ScalarValue::Int32(Some(2)));
        in_list!(
            batch,
            list.clone(),
            &false,
            // should have three outputs, as the input batch has three rows
            vec![Some(false), Some(false), Some(false)],
            expr,
            &schema
        );

        // NULL IN (1, 6)
        let expr = lit(ScalarValue::Int32(None));
        in_list!(
            batch,
            list.clone(),
            &false,
            // should have three outputs, as the input batch has three rows
            vec![None, None, None],
            expr,
            &schema
        );

        Ok(())
    }

Expected behavior

The returned record batch should have the same number of rows as the input batch (three in the case above).
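To make the expected behavior concrete, here is a minimal sketch in plain Rust that models it without any Arrow or DataFusion types. The function names `const_in_list` and `evaluate_for_batch` are hypothetical and exist only for this illustration; NULL is modeled as `None`. The point is just that a constant result has to be repeated once per input row.

```rust
/// SQL-style `value IN (list)` for a single constant value.
/// NULL is modeled as `None`. Hypothetical helper, not a DataFusion API.
fn const_in_list(value: Option<i32>, list: &[Option<i32>]) -> Option<bool> {
    let v = value?; // NULL IN (...) is NULL
    if list.iter().any(|item| *item == Some(v)) {
        Some(true)
    } else if list.iter().any(|item| item.is_none()) {
        // No match, but the list contains a NULL: the result is unknown (NULL)
        None
    } else {
        Some(false)
    }
}

/// Evaluate the constant expression against a batch of `num_rows` rows.
/// The single result is repeated so the output has one entry per input row.
fn evaluate_for_batch(
    value: Option<i32>,
    list: &[Option<i32>],
    num_rows: usize,
) -> Vec<Option<bool>> {
    vec![const_in_list(value, list); num_rows]
}
```

With a three-row batch and the list (1, 6), this sketch yields three `Some(true)` values for the constant 1, three `Some(false)` values for 2, and three `None` values for NULL, matching the expectations in the test above.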

Additional context

I couldn't create a reproducer at the SQL level because expressions like <const> IN <list> are folded to a constant at planning time.

We actually hit this in IOx, where it shows up as the panic "selection contains less than the number of selected rows":

thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-49.0.0/src/arrow/arrow_reader/selection.rs:308:17:
selection contains less than the number of selected rows

What was happening is that we had a query like this:

set datafusion.execution.parquet.pushdown_filters = true;

SELECT x from table WHERE x IN ('1','2','3','8' )

We were reading from multiple parquet files, one of which did not have the column x while another did. In this case the IN list predicate was still being applied, but it returned a 1-row selection to the parquet reader.
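The mismatch can be sketched with a toy filter step in plain Rust. This is not the parquet reader's code; `apply_filter` is a hypothetical function that only illustrates why a consumer of a predicate result needs one mask entry per row, and why a 1-row mask against a multi-row batch cannot be applied.

```rust
/// Keep the rows whose mask entry is `Some(true)`.
/// Returns an error when the mask does not have exactly one entry per row.
/// Hypothetical helper for illustration only.
fn apply_filter<T: Clone>(rows: &[T], mask: &[Option<bool>]) -> Result<Vec<T>, String> {
    if mask.len() != rows.len() {
        return Err(format!(
            "filter mask has {} entries but the batch has {} rows",
            mask.len(),
            rows.len()
        ));
    }
    Ok(rows
        .iter()
        .zip(mask)
        // NULL (`None`) and false both mean the row is not selected
        .filter(|(_, keep)| **keep == Some(true))
        .map(|(row, _)| row.clone())
        .collect())
}
```

Calling `apply_filter` with three rows and a one-entry mask returns the error, which is the toy analogue of the predicate handing a 1-row selection to the reader.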

Labels: bug
