Is your feature request related to a problem or challenge?
JOB (Join Order Benchmark) was proposed by a research team from TUM in the paper "How Good Are Query Optimizers, Really?".
It is used by HyPer, DuckDB, and CedarDB, and it is part of DuckDB's regression test suite. It is a good benchmark for testing join ordering and join operators.
I think adding this test suite would also help with improvements like those discussed in #7955.
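To illustrate why join ordering is worth benchmarking, here is a minimal sketch, not taken from the JOB paper or from DataFusion. It assumes a simplified cost model (the sum of intermediate result sizes of a left-deep plan). The table names come from the IMDB schema, but the row counts and predicate selectivities are made up, and `Table`, `Predicate`, and `plan_cost` are hypothetical names used only for this example.

```rust
/// A base table with a made-up row count (not real IMDB statistics).
struct Table {
    name: &'static str,
    rows: u64,
}

/// A join predicate between two tables (by index). On average, one out of
/// every `keeps_one_in` row pairs satisfies it (hypothetical selectivity).
struct Predicate {
    left: usize,
    right: usize,
    keeps_one_in: u64,
}

fn example_tables() -> Vec<Table> {
    vec![
        Table { name: "title", rows: 1_000 },
        Table { name: "cast_info", rows: 10_000 },
        Table { name: "name", rows: 100 },
    ]
}

fn example_predicates() -> Vec<Predicate> {
    vec![
        Predicate { left: 0, right: 1, keeps_one_in: 1_000 },
        Predicate { left: 1, right: 2, keeps_one_in: 100 },
    ]
}

/// Cost of a left-deep plan: the sum of all intermediate result sizes.
fn plan_cost(tables: &[Table], preds: &[Predicate], order: &[usize]) -> u64 {
    let mut joined = vec![order[0]];
    let mut card = tables[order[0]].rows;
    let mut cost = 0;
    for &next in &order[1..] {
        // Combine every predicate that links `next` to an already joined table.
        // With no such predicate the divisor stays 1, i.e. a cross product.
        let divisor: u64 = preds
            .iter()
            .filter(|p| {
                (p.left == next && joined.contains(&p.right))
                    || (p.right == next && joined.contains(&p.left))
            })
            .map(|p| p.keeps_one_in)
            .product();
        card = card * tables[next].rows / divisor;
        cost += card;
        joined.push(next);
    }
    cost
}

fn main() {
    let tables = example_tables();
    let preds = example_predicates();
    for order in [[0usize, 1, 2], [0, 2, 1]] {
        let names: Vec<&str> = order.iter().map(|&i| tables[i].name).collect();
        println!("{} -> cost {}", names.join(" JOIN "), plan_cost(&tables, &preds, &order));
    }
}
```

With these assumed numbers, joining title with cast_info first costs 20000, while starting with the title and name cross product costs 110000, even though both plans return the same final result.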
Describe the solution you'd like
JOB uses the IMDB dataset. The data is provided in csv.gz format and represents real-world data, making it ideal for testing DataFusion.
Tasks:
- Convert the dataset from csv.gz format to Parquet.
- Integrate the benchmark into dfbench.
Once everything is set up, we will be able to easily run benchmarks using the following command:
cargo run --bin dfbench --imdb --query=5
I would like to work on this!
Can someone help me understand the usual process for adding a third-party license in an Apache project?
cc @jayzhan211 @alamb
Describe alternatives you've considered
No response
Additional context
No response