Skip to content

Predicates on to_timestamp do not work as expected with "naive" timestamp strings #765

@alamb

Description

@alamb

Describe the bug
Given a TimestampNanosecondArray which pretty-prints as follows:

    // +---------------------+
    // | time                |
    // +---------------------+
    // | 2021-07-20 23:28:50 |
    // | 2021-07-20 23:30:30 |
    // +---------------------+

Queries involving a predicate such as time < to_timestamp('2021-07-20 23:29:30') do not filter any rows (even though they should filter the row with 2021-07-20 23:30:30)

To Reproduce

async fn datafusion_reproducer() {
    let array = TimestampNanosecondArray::from(vec![1626823730000000000, 1626823830000000000]);
    let array: ArrayRef = Arc::new(array);
    println!("array[0]: {:?}", array_value_to_string(&array, 0).unwrap());
    println!("array[1]: {:?}", array_value_to_string(&array, 1).unwrap());

    let batch = RecordBatch::try_from_iter(vec![("time", array)]).unwrap();
    let table = MemTable::try_new(batch.schema(), vec![vec![batch]]).unwrap();
    let table = Arc::new(table);

    // select * from t
    // +---------------------+
    // | time                |
    // +---------------------+
    // | 2021-07-20 23:28:50 |
    // | 2021-07-20 23:30:30 |
    // +---------------------+
    run_query(table.clone(), "select * from t").await;

    // Using the following predicate should result in a single row,
    // but instead results in both
    //
    // select * from t where time < to_timestamp('2021-07-20 23:29:30')
    // +---------------------+
    // | time                |
    // +---------------------+
    // | 2021-07-20 23:28:50 |
    // | 2021-07-20 23:30:30 |
    // +---------------------+
    run_query(table.clone(), "select * from t where time < to_timestamp('2021-07-20 23:29:30')").await;



    // explain select * from t where time < to_timestamp('2021-07-20 23:29:30')
    // +---------------+---------------------------------------------------------------+
    // | plan_type     | plan                                                          |
    // +---------------+---------------------------------------------------------------+
    // | logical_plan  | Projection: #t.time                                           |
    // |               |   Filter: #t.time Lt TimestampNanosecond(1626838170000000000) |
    // |               |     TableScan: t projection=Some([0])                         |
    // | physical_plan | ProjectionExec: expr=[time@0 as time]                         |
    // |               |   CoalesceBatchesExec: target_batch_size=4096                 |
    // |               |     FilterExec: time@0 < 1626838170000000000                  |
    // |               |       RepartitionExec: partitioning=RoundRobinBatch(16)       |
    // |               |         MemoryExec: partitions=1, partition_sizes=[1]         |
    // +---------------+---------------------------------------------------------------+

    run_query(table.clone(), "explain select * from t where time < to_timestamp('2021-07-20 23:29:30')").await;
}

#[allow(dead_code)]
async fn run_query(csvdata: Arc<dyn TableProvider>, query: &str)  {


    let mut ctx = ExecutionContext::new();
    ctx.register_table("t", csvdata).unwrap();

    let results = ctx.sql(query)
        .unwrap()
        .collect()
        .await
        .unwrap();

    let pretty = pretty_format_batches(&results).unwrap();
    println!("{}\n{}", query, pretty);
}

Expected behavior
The query should produce a single row with timestamp 2021-07-20 23:28:50

However the actual query returns both rows

Additional context
We saw this in IOx: https://github.com/influxdata/influxdb_iox/issues/2071

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions