Describe the bug
Evaluating `InList` without columns always returns a record batch with 1 row.
To Reproduce
This test fails: it evaluates `<const> IN <list>` on a `RecordBatch` of three rows, but the returned batch has a single row.
```rust
#[test]
fn in_list_no_cols() -> Result<()> {
    // test logic when the in_list expression doesn't have any columns
    let schema = Schema::new(vec![Field::new("a", DataType::Int32, true)]);
    let a = Int32Array::from(vec![Some(1), Some(2), None]);
    let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::new(a)])?;
    let list = vec![lit(ScalarValue::from(1i32)), lit(ScalarValue::from(6i32))];

    // 1 IN (1, 6)
    let expr = lit(ScalarValue::Int32(Some(1)));
    in_list!(
        batch,
        list.clone(),
        &false,
        // should have three outputs, as the input batch has three rows
        vec![Some(true), Some(true), Some(true)],
        expr,
        &schema
    );

    // 2 IN (1, 6)
    let expr = lit(ScalarValue::Int32(Some(2)));
    in_list!(
        batch,
        list.clone(),
        &false,
        // should have three outputs, as the input batch has three rows
        vec![Some(false), Some(false), Some(false)],
        expr,
        &schema
    );

    // NULL IN (1, 6)
    let expr = lit(ScalarValue::Int32(None));
    in_list!(
        batch,
        list.clone(),
        &false,
        // should have three outputs, as the input batch has three rows
        vec![None, None, None],
        expr,
        &schema
    );

    Ok(())
}
```
Expected behavior
The returned record batch should have the same number of rows as the input batch (three in the case above).
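The expected behavior can be illustrated with a small self-contained model. `ColumnarValue` and `eval_in_list` below are simplified, hypothetical stand-ins (not DataFusion's actual types or functions); the sketch only assumes that an expression with no column references evaluates to a single scalar, which must then be expanded to one entry per input row rather than to one entry total.

```rust
/// Hypothetical, simplified model of an expression result: either one value
/// per row, or a single scalar that applies to every row. Not DataFusion's API.
#[derive(Debug, Clone)]
enum ColumnarValue {
    Array(Vec<Option<i32>>),
    Scalar(Option<i32>),
}

impl ColumnarValue {
    /// Materialize the value as an array with `num_rows` entries,
    /// repeating a scalar once per row.
    fn into_array(self, num_rows: usize) -> Vec<Option<i32>> {
        match self {
            ColumnarValue::Array(values) => values,
            ColumnarValue::Scalar(v) => vec![v; num_rows],
        }
    }
}

/// Evaluate `value IN (list)` with SQL NULL semantics for a batch of
/// `num_rows` rows. A scalar left-hand side is expanded to `num_rows`
/// (not to 1), so the output has exactly one entry per input row.
fn eval_in_list(value: ColumnarValue, list: &[Option<i32>], num_rows: usize) -> Vec<Option<bool>> {
    let list_has_null = list.iter().any(|v| v.is_none());
    value
        .into_array(num_rows)
        .into_iter()
        .map(|v| match v {
            // NULL IN (...) is NULL
            None => None,
            Some(x) if list.contains(&Some(x)) => Some(true),
            // no match: NULL if the list contains a NULL, otherwise false
            Some(_) if list_has_null => None,
            Some(_) => Some(false),
        })
        .collect()
}
```

With three rows, `eval_in_list(ColumnarValue::Scalar(Some(1)), &[Some(1), Some(6)], 3)` yields three `Some(true)` entries, matching the expectations in the test above; expanding the scalar with `into_array(1)` instead would reproduce the single-row result described in this issue.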
Additional context
I couldn't create a reproducer at the SQL level because expressions like `<const> IN <list>` are folded to a constant during planning.
We actually hit this in IOx, where it shows up as "selection contains less than the number of selected rows":
```
thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parquet-49.0.0/src/arrow/arrow_reader/selection.rs:308:17:
selection contains less than the number of selected rows
```
What was happening is that we had a query like this:
```sql
set datafusion.execution.parquet.pushdown_filters = true;
SELECT x FROM table WHERE x IN ('1', '2', '3', '8');
```
We were reading from multiple parquet files, one of which did not have the column `x` while another did. In this case the IN list predicate was still applied, but it returned a 1-row selection to the parquet reader.
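Why a 1-row result turns into a panic can be sketched with a small model: roughly speaking, the parquet reader turns the pushed-down predicate's boolean result into runs of rows to select or skip, and those runs must cover every row that was read. `Run`, `mask_to_runs`, and `check_runs` are hypothetical names for illustration only, not the parquet crate's API.

```rust
/// A run of consecutive rows that are either all selected or all skipped
/// (hypothetical type, loosely modeled on select/skip row selection).
#[derive(Debug, PartialEq)]
struct Run {
    skip: bool,
    row_count: usize,
}

/// Convert a predicate result (one entry per row) into select/skip runs.
/// Only rows where the predicate is true are selected; false and NULL are skipped.
fn mask_to_runs(mask: &[Option<bool>]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for v in mask {
        let skip = *v != Some(true);
        // Extend the previous run if it has the same select/skip state.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { skip, row_count: 1 });
    }
    runs
}

/// Check that the runs cover exactly the rows that were read. A predicate
/// that returned fewer entries than rows yields a selection that is too short.
fn check_runs(runs: &[Run], rows_read: usize) -> Result<(), String> {
    let covered: usize = runs.iter().map(|r| r.row_count).sum();
    if covered == rows_read {
        Ok(())
    } else {
        Err(format!("selection covers {covered} rows but {rows_read} rows were read"))
    }
}
```

In this model, a three-row batch evaluated correctly produces runs covering three rows, while the single-entry result from the buggy `InList` evaluation covers only one row, which is the kind of mismatch the panic above reports.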