Parallelize Paimon split execution in DataFusion #168

@QuakeWang

Search before asking

  • I searched in the issues and found nothing similar.

Description

The current DataFusion integration reads all planned Paimon splits in a single DataFusion execution partition.

At the moment, PaimonTableScan reports UnknownPartitioning(1) and performs scan.plan() inside ExecutionPlan::execute(). This has two drawbacks:

  1. It prevents DataFusion from scheduling independent Paimon splits across threads, so a scan is read serially no matter how many splits were planned.
  2. It makes the execution model harder to extend, because planning is mixed into per-partition execution.

This task is to parallelize the DataFusion scan path by exposing one execution partition per Paimon split, or per pre-grouped split set, while preserving the current read semantics.
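
To make the intended shape concrete, here is a minimal std-only sketch, not the actual integration code. `Split`, `PartitionedScan`, and `read_all_parallel` are hypothetical stand-ins invented for illustration; the real types live in the Paimon and DataFusion crates and are not used here. The idea is that splits are planned once when the scan node is built, grouped into partitions, and `execute(partition)` only reads the splits assigned to that partition. In a real plan node the partition count would presumably be reported through DataFusion's `Partitioning::UnknownPartitioning(n)` instead of `UnknownPartitioning(1)`.

```rust
use std::thread;

/// Hypothetical stand-in for one planned Paimon split (not the real type).
#[derive(Debug, Clone, PartialEq)]
struct Split {
    id: usize,
    rows: Vec<i64>,
}

/// Hypothetical scan node: the splits are planned before construction,
/// so `execute` never has to call a planner.
struct PartitionedScan {
    partitions: Vec<Vec<Split>>,
}

impl PartitionedScan {
    /// Group already-planned splits round-robin into at most `target`
    /// partitions. With `target >= planned.len()` this yields one
    /// partition per split.
    fn new(planned: Vec<Split>, target: usize) -> Self {
        let n = target.max(1).min(planned.len().max(1));
        let mut partitions = vec![Vec::new(); n];
        for (i, split) in planned.into_iter().enumerate() {
            partitions[i % n].push(split);
        }
        Self { partitions }
    }

    /// Number of execution partitions this node would report.
    fn partition_count(&self) -> usize {
        self.partitions.len()
    }

    /// Read only the splits that belong to `partition`.
    fn execute(&self, partition: usize) -> Vec<i64> {
        self.partitions[partition]
            .iter()
            .flat_map(|s| s.rows.iter().copied())
            .collect()
    }
}

/// Drive every partition on its own thread, roughly what the engine's
/// scheduler is free to do once more than one partition is exposed.
fn read_all_parallel(scan: &PartitionedScan) -> Vec<Vec<i64>> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..scan.partition_count())
            .map(|p| s.spawn(move || scan.execute(p)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Round-robin grouping is just one possible choice for the "pre-grouped split set" case. Whatever grouping is used, the rows read across all partitions should be the same set that the single-partition scan produces today, which is what "preserving the current read semantics" would mean in this sketch.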

Willingness to contribute

  • I'm willing to submit a PR!
