[opt](serde) Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#38245) #38810
Merged
yiguolei merged 2 commits into apache:branch-2.1 on Aug 5, 2024
… without repeated deserialization. (apache#37377)

## Proposed changes

Since the value of the partition column is fixed when querying the partition table, we can deserialize the value only once and then repeatedly insert the value into the block.

```sql
-- In Hive:
CREATE TABLE parquet_partition_tb (
    col1 STRING,
    col2 INT,
    col3 DOUBLE
)
PARTITIONED BY (
    partition_col1 STRING,
    partition_col2 INT
)
STORED AS PARQUET;

insert into parquet_partition_tb partition (partition_col1="hello", partition_col2=1) values ("word", 2, 2.3);
insert into parquet_partition_tb partition (partition_col1="hello", partition_col2=1) select col1, col2, col3 from parquet_partition_tb where partition_col1="hello" and partition_col2=1;
-- Repeat the `insert into xxx select xxx` operation several times.

-- In Doris, before:
mysql> select count(partition_col1) from parquet_partition_tb;
+-----------------------+
| count(partition_col1) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (3.24 sec)

mysql> select count(partition_col2) from parquet_partition_tb;
+-----------------------+
| count(partition_col2) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (3.34 sec)

-- In Doris, after:
mysql> select count(partition_col1) from parquet_partition_tb;
+-----------------------+
| count(partition_col1) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (0.79 sec)

mysql> select count(partition_col2) from parquet_partition_tb;
+-----------------------+
| count(partition_col2) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (0.51 sec)
```

## Summary:

Test SQL: `select count(partition_col) from tbl;`

Number of rows: 33554432

| Partition column type | Before (sec) | After (sec) |
|---|---|---|
| boolean | 3.96 | 0.47 |
| tinyint | 3.39 | 0.47 |
| smallint | 3.14 | 0.50 |
| int | 3.34 | 0.51 |
| bigint | 3.61 | 0.51 |
| float | 4.59 | 0.51 |
| double | 4.60 | 0.55 |
| decimal(5,2) | 3.96 | 0.61 |
| date | 5.80 | 0.52 |
| timestamp | 7.68 | 0.52 |
| string | 3.24 | 0.79 |

Issue Number: close #xxx
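The idea of deserializing once and inserting many times can be illustrated with a minimal sketch. This is not the Doris implementation: `IntColumn`, `fill_by_repeated_parse`, and `fill_by_single_parse` are hypothetical names chosen for illustration, and a plain `std::vector` stands in for a block column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an INT block column; not a Doris class.
struct IntColumn {
    std::vector<int32_t> data;
};

// Baseline: the same partition-value text is parsed once per row.
inline void fill_by_repeated_parse(IntColumn& col, const std::string& text,
                                   std::size_t rows) {
    for (std::size_t i = 0; i < rows; ++i) {
        col.data.push_back(static_cast<int32_t>(std::stol(text)));
    }
}

// Optimized: parse the text once, then append `rows` copies of the value.
inline void fill_by_single_parse(IntColumn& col, const std::string& text,
                                 std::size_t rows) {
    const int32_t value = static_cast<int32_t>(std::stol(text));
    col.data.insert(col.data.end(), rows, value);
}
```

Both functions leave the column in the same state; the second simply moves the parsing cost out of the per-row loop, which is the effect the timings above measure.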
…rom_fixed_json (apache#38245)

## Proposed changes

Fix a bug in `DataTypeNullableSerDe.deserialize_column_from_fixed_json`.

The expected behavior of `deserialize_column_from_fixed_json` is to `insert` n values into the column. However, the `DataTypeNullableSerDe` implementation of this function calls `resize` on the null_map column with n, which sets its length to n instead of inserting n values into it. Because the function is only used by `_fill_partition_columns` in the parquet/orc reader and is not called repeatedly for a single `get_next_block`, the bug was masked.

Before PR: apache#37377
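The difference between `resize` and `insert` described above can be shown with a small sketch. It assumes a simplified nullable column made of two `std::vector`s; `NullableIntColumn`, `fill_fixed_with_resize`, and `fill_fixed_with_insert` are hypothetical names, not Doris APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nullable column: nested values plus a parallel null map.
struct NullableIntColumn {
    std::vector<int32_t> nested;
    std::vector<uint8_t> null_map;  // 0 = not null, 1 = null
};

// Mirrors the bug: the nested column grows by `rows`, but the null map is
// resized to `rows` in total, so it only matches on the first call.
inline void fill_fixed_with_resize(NullableIntColumn& col, int32_t value,
                                   std::size_t rows) {
    col.nested.insert(col.nested.end(), rows, value);
    col.null_map.resize(rows, 0);
}

// Mirrors the fix: both columns grow by `rows` on every call.
inline void fill_fixed_with_insert(NullableIntColumn& col, int32_t value,
                                   std::size_t rows) {
    col.nested.insert(col.nested.end(), rows, value);
    col.null_map.insert(col.null_map.end(), rows, 0);
}
```

When each function is called once on an empty column, the two behave identically, which is consistent with the bug staying hidden. A second call on the same column leaves the `resize` variant with a null map shorter than the nested data.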
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.

run buildall

clang-tidy review says "All clean, LGTM! 👍"

TeamCity be ut coverage result:
## Proposed changes

Pick PR #38575 and the fix for its bug, #38245.