Parallelize Paimon split execution in DataFusion #168
Search before asking
- I searched in the issues and found nothing similar.
Description
The current DataFusion integration reads all planned Paimon splits in a single DataFusion execution partition.
At the moment, PaimonTableScan reports UnknownPartitioning(1) and performs scan.plan() inside ExecutionPlan::execute(). This has two drawbacks:
- It prevents DataFusion from scheduling independent Paimon splits across threads
- It makes the execution model harder to extend, because planning is mixed into per-partition execution
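As a rough illustration of the alternative, here is a std-only Rust sketch in which planning happens once when the scan node is built and `execute(partition)` only reads the splits assigned to that partition. `Split`, `PlannedScan`, and `run_all` are hypothetical names invented for this sketch. They are not the Paimon or DataFusion types, and the real `ExecutionPlan::execute()` returns a record batch stream rather than a `Vec`.

```rust
use std::thread;

/// Hypothetical stand-in for a planned Paimon split: just the rows it would yield.
#[derive(Clone, Debug)]
struct Split {
    rows: Vec<i64>,
}

/// Hypothetical scan node. Planning has already happened by the time it exists,
/// so it only carries the split groups, one group per execution partition.
struct PlannedScan {
    partitions: Vec<Vec<Split>>,
}

impl PlannedScan {
    /// Plan once, up front: here, one execution partition per split.
    fn new(planned_splits: Vec<Split>) -> Self {
        let partitions = planned_splits.into_iter().map(|s| vec![s]).collect();
        PlannedScan { partitions }
    }

    /// Number of partitions the scheduler may run independently.
    fn partition_count(&self) -> usize {
        self.partitions.len()
    }

    /// Reads only the splits assigned to `partition`; no planning happens here.
    fn execute(&self, partition: usize) -> Vec<i64> {
        self.partitions[partition]
            .iter()
            .flat_map(|s| s.rows.iter().copied())
            .collect()
    }
}

/// Runs every partition on its own thread and returns results in partition order.
fn run_all(scan: &PlannedScan) -> Vec<Vec<i64>> {
    thread::scope(|scope| {
        let handles: Vec<_> = (0..scan.partition_count())
            .map(|p| scope.spawn(move || scan.execute(p)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("partition thread panicked"))
            .collect()
    })
}
```

Calling `run_all(&PlannedScan::new(splits))` yields one result vector per split. In the actual integration the threads would be supplied by DataFusion's scheduler rather than spawned by the scan itself.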
This task is to parallelize the DataFusion scan path by exposing one execution partition per Paimon split, or per pre-grouped split set, while preserving the current read semantics.
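One possible way to build the "pre-grouped split set" variant is sketched below in std-only Rust. It is an assumption for illustration, not the design proposed in this issue: `SplitMeta`, its `bytes` size estimate, and `group_splits` are hypothetical, and the greedy largest-first balancing is just one choice. It also assumes row order across splits is not part of the read semantics; if it is, contiguous order-preserving chunks would be needed instead.

```rust
/// Hypothetical split descriptor: an id plus an estimated size in bytes.
#[derive(Clone, Debug, PartialEq)]
struct SplitMeta {
    id: u32,
    bytes: u64,
}

/// Groups splits into at most `target_partitions` groups of similar total size.
/// Greedy strategy: take the largest splits first and put each one into the
/// group that currently has the smallest total.
fn group_splits(mut splits: Vec<SplitMeta>, target_partitions: usize) -> Vec<Vec<SplitMeta>> {
    // Never create more groups than splits, and always create at least one.
    let n = target_partitions.max(1).min(splits.len().max(1));

    // Largest first; ties broken by id so the result is deterministic.
    splits.sort_by(|a, b| b.bytes.cmp(&a.bytes).then(a.id.cmp(&b.id)));

    let mut groups: Vec<Vec<SplitMeta>> = vec![Vec::new(); n];
    let mut load = vec![0u64; n];

    for split in splits {
        // Index of the lightest group so far (first one wins on ties).
        let (idx, _) = load
            .iter()
            .enumerate()
            .min_by_key(|(_, l)| **l)
            .expect("n is at least 1");
        load[idx] += split.bytes;
        groups[idx].push(split);
    }
    groups
}
```

Every split lands in exactly one group, so the union of rows read is unchanged; only the assignment of splits to partitions differs. The number of groups would plausibly come from the session's target partition count, though that wiring is outside this sketch.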
Willingness to contribute
- I'm willing to submit a PR!