Add small column on empty projection by ch-sc · Pull Request #7833 · apache/datafusion

ch-sc · 2023-10-16T12:11:13Z

Which issue does this PR close?

Improves #3214.

Rationale for this change

If a projection is empty, we currently add the first column of the input schema, since some parts of DataFusion still rely on having at least one column. Instead of selecting the first column, these changes aim to select a column with a smaller memory size. The memory size is estimated from the column's data type.
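
To illustrate the idea, here is a minimal sketch and not the code in this PR. `SimpleType`, `estimated_bits`, and `smallest_column` are hypothetical names, the enum is a simplified stand-in for Arrow's `DataType`, and the size assigned to `Utf8` is an assumed heuristic.

```rust
/// Simplified stand-in for Arrow's `DataType` (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum SimpleType {
    Boolean,
    Int32,
    Int64,
    Utf8,
}

/// Rough per-value size in bits.
/// The value for `Utf8` is an assumed heuristic, since the type is variable-sized.
pub fn estimated_bits(t: &SimpleType) -> usize {
    match t {
        SimpleType::Boolean => 1,
        SimpleType::Int32 => 32,
        SimpleType::Int64 => 64,
        SimpleType::Utf8 => 2048,
    }
}

/// Index of the column with the smallest estimated size,
/// or `None` if the schema has no columns.
/// On ties, the first such column wins.
pub fn smallest_column(schema: &[(String, SimpleType)]) -> Option<usize> {
    schema
        .iter()
        .enumerate()
        .min_by_key(|(_, (_, t))| estimated_bits(t))
        .map(|(i, _)| i)
}
```

With a schema of `(a: Utf8, b: Int64, c: Boolean)`, this sketch would pick `c` rather than the first column `a`.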

What changes are included in this PR?

Are these changes tested?

Basic unit tests for new logic are included. All tests that involve query planning and empty projections execute this code.

Are there any user-facing changes?

Dandandan · 2023-10-17T09:21:12Z

-// Get the projection exprs from columns in the order of the schema
+/// Accumulate the memory size of a data type measured in bits.
+///
+/// Nested types are traversed and increment `nesting` on every level.


Can we add a comment saying that variable-sized types are estimated using some heuristics?

Makes sense. Added a comment about variable-sized types. Feel free to rephrase if you think something is missing.

Dandandan · 2023-10-17T09:22:43Z

+        LargeList(f) => nested_size(f.data_type(), nesting),
+        Struct(fields) => fields
+            .iter()
+            .map(|f| nested_size(f.data_type(), nesting))


In principle we could project a sub-field from a struct instead of the entire struct (all columns).

Good idea, I will play around with it. Though it sounds like a rare edge case to me where no other "smaller" type would be present in the schema!?

Yeah indeed :)
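
For readers following this thread, the nested traversal could look roughly like the sketch below. This is an illustration under assumptions, not the PR's implementation: `Ty` is a hypothetical, simplified type enum, the bit sizes are assumed, and where exactly `nesting` is incremented is a guess, since the diff excerpt does not show it.

```rust
/// Hypothetical, simplified type tree (stand-in for Arrow's `DataType`).
#[derive(Debug, Clone)]
pub enum Ty {
    Boolean,
    Int32,
    Utf8,
    List(Box<Ty>),
    Struct(Vec<Ty>),
}

/// Accumulate an estimated size in bits, bumping `nesting`
/// once for every nested level that is traversed.
pub fn nested_size(ty: &Ty, nesting: &mut usize) -> usize {
    match ty {
        Ty::Boolean => 1,
        Ty::Int32 => 32,
        // Variable-sized type: assumed heuristic value.
        Ty::Utf8 => 2048,
        Ty::List(inner) => {
            *nesting += 1;
            nested_size(inner, nesting)
        }
        Ty::Struct(fields) => {
            *nesting += 1;
            // A struct costs the sum of all of its fields.
            fields.iter().map(|f| nested_size(f, nesting)).sum()
        }
    }
}
```

Because a struct is charged for every field, it only wins the comparison when no cheaper flat column exists, which matches the "rare edge case" point above.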

Dandandan

Awesome, @ch-sc! I left a few comments.

This will yield some nice performance improvements for SELECT COUNT(*) FROM [source] queries, even without solving #3214.

Dandandan · 2023-10-18T11:06:46Z

The change seems non-controversial and has some good tests, so merging seems fine.

Thank you @ch-sc 😊

ch-sc added 2 commits October 16, 2023 13:45

Find small column when projection is empty

c196ba2

Merge branch 'main' of github.com:apache/arrow-datafusion into add-sm…

08d1558

…all-column-on-empty-projection

github-actions Bot added the optimizer (Optimizer rules), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels Oct 16, 2023

ch-sc added 3 commits October 16, 2023 14:26

clippy

9cb3241

fix comment

0807354

fix avro.slt test

05d2179

Dandandan reviewed Oct 17, 2023

Comment thread datafusion/optimizer/src/push_down_projection.rs Outdated

Dandandan reviewed Oct 17, 2023

Comment thread datafusion/optimizer/src/push_down_projection.rs Outdated

Dandandan reviewed Oct 17, 2023

Comment thread datafusion/sqllogictest/test_files/avro.slt

Dandandan reviewed Oct 17, 2023

Dandandan approved these changes Oct 17, 2023

ch-sc added 2 commits October 18, 2023 11:16

use min_by

f648cce

clippy

cf77e80

Dandandan approved these changes Oct 18, 2023

Dandandan merged commit 7acd883 into apache:main Oct 18, 2023

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed

Dandandan merged 7 commits into apache:main from ch-sc:add-small-column-on-empty-projection
