Is your feature request related to a problem or challenge?
Previously, by default, EXPLAIN ANALYZE will show all available metrics inside an operator. It can get quite verbose for some operator.
In datafusion-cli:
> CREATE EXTERNAL TABLE IF NOT EXISTS lineitem
STORED AS parquet
LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem';
0 row(s) fetched.
Elapsed 0.000 seconds.
> explain analyze select *
from lineitem
where l_orderkey = 3000000;
The parquet reader includes a large number of low-level details:
metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms]
I believe only a subset of it is commonly used, for example output_rows, metadata_load_time, and how many file/row-group/pages are pruned, and it would better to only display the most common ones by default.
After #18098, the EXPLAIN ANALYZE detail level can be controlled through an option
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.
> explain analyze select * from generate_series(10000) as t1(v1) order by v1 desc;
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=10001, elapsed_compute=100µs] |
| | ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=10001, elapsed_compute=1.166µs] |
| | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=10000, batch_size=8192], metrics=[output_rows=10001, elapsed_compute=43µs] |
| | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
> set datafusion.explain.analyze_level = dev;
0 row(s) fetched.
Elapsed 0.000 seconds.
> explain analyze select * from generate_series(10000) as t1(v1) order by v1 desc;
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=10001, elapsed_compute=222.043µs, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, batches_split=2] |
| | ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=10001, elapsed_compute=2.584µs] |
| | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=10000, batch_size=8192], metrics=[output_rows=10001, elapsed_compute=162.625µs] |
| | |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.
Now only output_rows and elapsed_compute are included in the summary analyze level, and the dev level is the default.
The goal of this issue is, for each operator:
- Investigate what's the most commonly used metrics for high-level insights, postgres
and DuckDB's result can be used as reference
- Set those common metrics to
summary type
And finally set summary analyze level to default.
Describe the solution you'd like
Identify which metrics should be included in the summary level for all operators.
Describe alternatives you've considered
No response
Additional context
No response
Is your feature request related to a problem or challenge?
Previously, by default,
EXPLAIN ANALYZEwill show all available metrics inside an operator. It can get quite verbose for some operator.In
datafusion-cli:The parquet reader includes a large number of low-level details:
I believe only a subset of it is commonly used, for example output_rows, metadata_load_time, and how many file/row-group/pages are pruned, and it would better to only display the most common ones by default.
After #18098, the
EXPLAIN ANALYZEdetail level can be controlled through an optionNow only
output_rowsandelapsed_computeare included in thesummaryanalyze level, and thedevlevel is the default.The goal of this issue is, for each operator:
and DuckDB's result can be used as reference
summarytypeAnd finally set
summaryanalyze level to default.Describe the solution you'd like
Identify which metrics should be included in the
summarylevel for all operators.Describe alternatives you've considered
No response
Additional context
No response