
Sometimes Filters are not repartitioned when they could be #4967

@alamb


Describe the bug

We previously had a plan like the one below, in which a RepartitionExec was inserted before the FilterExec to increase parallelism.

However, after upgrading DataFusion, the RepartitionExec is no longer there. I think this is a slightly worse plan, because the filter can no longer be evaluated in parallel.

FilterExec: tag@2 = A
  RepartitionExec: partitioning=RoundRobinBatch(4)  <--- This RepartitionExec has been removed
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/00000000-0000-0000-0000-000000000000.parquet]]}, predicate=tag = Dictionary(Int32, Utf8("A")), pruning_predicate=tag_min@0 <= A AND A <= tag_max@1, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
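To illustrate why the missing RepartitionExec matters: SortPreservingMergeExec merges its inputs into a single sorted partition, so without a repartition above it the filter runs as one task. With `RoundRobinBatch(4)`, whole batches are dealt out across four partitions and four copies of the filter can run concurrently. The following is a minimal, hypothetical sketch in plain Rust using only the standard library (no DataFusion APIs); `Batch`, `round_robin` and `parallel_filter` are illustrative names, and an Arrow `RecordBatch` is modeled as a vector of tag strings.

```rust
use std::thread;

/// Hypothetical stand-in for an Arrow `RecordBatch`: one row per tag value.
type Batch = Vec<String>;

/// Deal whole batches out to `n` partitions in round-robin order,
/// conceptually similar to `RoundRobinBatch(n)` partitioning.
fn round_robin(batches: Vec<Batch>, n: usize) -> Vec<Vec<Batch>> {
    let mut partitions: Vec<Vec<Batch>> = vec![Vec::new(); n];
    for (i, batch) in batches.into_iter().enumerate() {
        // Batch i goes to partition i mod n.
        partitions[i % n].push(batch);
    }
    partitions
}

/// Evaluate the predicate `tag = target` on every partition in its own thread,
/// returning the number of matching rows per partition.
fn parallel_filter(partitions: Vec<Vec<Batch>>, target: &str) -> Vec<usize> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|part| {
            let target = target.to_string();
            // Each partition is filtered independently, like one FilterExec task per partition.
            thread::spawn(move || {
                part.iter()
                    .flat_map(|batch| batch.iter())
                    .filter(|tag| **tag == target)
                    .count()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("filter thread panicked"))
        .collect()
}
```

For example, five input batches split with `round_robin(batches, 4)` put batches 0 and 4 in partition 0 and one batch in each of the others; `parallel_filter(parts, "A")` then filters all four partitions concurrently. With `n = 1` (the plan after the upgrade) the same work happens on a single thread.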

To Reproduce
I am working on a reproducer.

Expected behavior
A RepartitionExec should be added before the FilterExec when doing so increases the parallelism of the filter.

Additional context
We found this while upgrading IOx:

https://github.com/influxdata/influxdb_iox/pull/6603 -- see https://github.com/influxdata/influxdb_iox/pull/6603/files#r1072606494

Labels: bug (Something isn't working), optimizer (Optimizer rules)
