[fix](parquet)Fix the BE core dump when reading parquet unsigned types (#39926) #40123
Merged
morningman merged 1 commit into apache:branch-2.1 on Aug 29, 2024
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
Backport of #39926.
Proposed changes
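Doris has no unsigned integer types, so parquet unsigned integer columns are mapped to a wider signed Doris type; for example, a parquet UInt32 column is read as a Doris BIGINT (Int64).
When reading such a file, the byte width of the values stored in parquet did not match the byte width of the mapped Doris type (e.g. 4 bytes for UInt32 vs. 8 bytes for Int64), which caused the BE to core dump.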
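To make the width mismatch concrete, the following is a minimal C++ sketch of the byte arithmetic. The helper names `bytes_consumed` and `would_overrun` are hypothetical and do not come from the Doris code base; the sketch only assumes that a reader walks a contiguous buffer one fixed-width element at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration, not Doris code: the number of bytes a reader
// consumes when it decodes `num_values` values of width `elem_width`.
constexpr std::size_t bytes_consumed(std::size_t num_values, std::size_t elem_width) {
    return num_values * elem_width;
}

// True when decoding with `decode_width` needs more bytes than the buffer
// actually holds, i.e. the reader would run past the end of the data.
constexpr bool would_overrun(std::size_t num_values, std::size_t stored_width,
                             std::size_t decode_width) {
    return bytes_consumed(num_values, decode_width) >
           bytes_consumed(num_values, stored_width);
}

// 1024 UInt32 values occupy 4096 bytes; decoding them as Int64 would touch 8192.
static_assert(bytes_consumed(1024, sizeof(std::uint32_t)) == 4096, "stored size");
static_assert(bytes_consumed(1024, sizeof(std::int64_t)) == 8192, "decoded size");
static_assert(would_overrun(1024, sizeof(std::uint32_t), sizeof(std::int64_t)),
              "decoding 4-byte values as 8-byte values reads past the buffer");
static_assert(!would_overrun(1024, sizeof(std::uint32_t), sizeof(std::uint32_t)),
              "decoding with the stored width stays in bounds");
```

Reading twice as many bytes as the buffer holds is undefined behavior in C++, which is consistent with a crash rather than a clean error.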
Fix:
The reader now decodes values using the byte width stored in parquet and then converts them to the mapped Doris type.
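The "decode with the stored width, then widen" approach can be sketched as a small template. This is an illustration under stated assumptions, not the actual Doris reader: `decode_and_widen` is a hypothetical name, `Src` is the width the values occupy in the parquet buffer, `Dst` is the mapped Doris type, and the host is assumed to be little-endian (parquet PLAIN encoding stores integers little-endian).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical sketch of the fix: decode each value with the width it has in
// the parquet buffer (Src), then widen it to the mapped Doris type (Dst).
// Assumes a little-endian host, matching parquet's PLAIN encoding.
template <typename Src, typename Dst>
std::vector<Dst> decode_and_widen(const unsigned char* data, std::size_t num_bytes) {
    static_assert(std::is_unsigned<Src>::value, "Src is the parquet unsigned type");
    static_assert(sizeof(Dst) > sizeof(Src), "Dst must be wider than Src");
    const std::size_t num_values = num_bytes / sizeof(Src);  // count by stored width
    std::vector<Dst> out;
    out.reserve(num_values);
    for (std::size_t i = 0; i < num_values; ++i) {
        Src raw;
        // Read exactly sizeof(Src) bytes, never more, so the buffer is not overrun.
        std::memcpy(&raw, data + i * sizeof(Src), sizeof(Src));
        out.push_back(static_cast<Dst>(raw));  // value-preserving widening
    }
    return out;
}
```

For a UInt32 column this would be instantiated as `decode_and_widen<std::uint32_t, std::int64_t>(buf, len)`; the element count is derived from the 4-byte stored width, so the loop never reads past `len`.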
Mapping (parquet -> Doris):
UInt8 -> Int16 (SMALLINT)
UInt16 -> Int32 (INT)
UInt32 -> Int64 (BIGINT)
UInt64 -> Int128 (LARGEINT)
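The mapping can also be expressed as a compile-time type trait, which makes it easy to check that every target type covers the full unsigned range. This is a hypothetical sketch (`DorisTypeFor` and `mapping_is_lossless` are illustrative names, not Doris APIs), and it models Int128 with `__int128`, a GCC/Clang extension available on 64-bit targets.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical compile-time encoding of the mapping table (not Doris code).
// Int128 is modeled with __int128, a GCC/Clang extension on 64-bit targets.
template <typename ParquetUnsigned>
struct DorisTypeFor;  // no primary definition: unmapped types fail to compile

template <> struct DorisTypeFor<std::uint8_t>  { using type = std::int16_t; };  // SMALLINT
template <> struct DorisTypeFor<std::uint16_t> { using type = std::int32_t; };  // INT
template <> struct DorisTypeFor<std::uint32_t> { using type = std::int64_t; };  // BIGINT
template <> struct DorisTypeFor<std::uint64_t> { using type = __int128; };      // LARGEINT

template <typename U>
using doris_type_t = typename DorisTypeFor<U>::type;

// The target is a signed type twice as wide as the source, and the largest
// unsigned value stays positive after conversion, so no value is lost.
template <typename U>
constexpr bool mapping_is_lossless() {
    return sizeof(doris_type_t<U>) == 2 * sizeof(U) &&
           static_cast<doris_type_t<U>>(std::numeric_limits<U>::max()) > 0;
}

static_assert(mapping_is_lossless<std::uint8_t>(), "UInt8 -> Int16");
static_assert(mapping_is_lossless<std::uint16_t>(), "UInt16 -> Int32");
static_assert(mapping_is_lossless<std::uint32_t>(), "UInt32 -> Int64");
static_assert(mapping_is_lossless<std::uint64_t>(), "UInt64 -> Int128");
```

Choosing a type exactly twice as wide is the smallest signed type that can hold every value of the unsigned source, which is why each row of the table doubles the width.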