[fix](merge-on-write) segcompaction should process delete bitmap if necessary #38369
Merged
zhannngchen merged 5 commits into apache:master on Aug 1, 2024
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
Contributor
Author
|
run buildall |
Contributor
Author
|
run buildall |
6865700 to
7759455
Compare
Contributor
Author
|
run buildall |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 39487 ms |
TPC-DS: Total hot run time: 171977 ms |
ClickBench: Total hot run time: 30.94 s |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 39742 ms |
TPC-DS: Total hot run time: 174282 ms |
ClickBench: Total hot run time: 30.2 s |
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
dataroaring
pushed a commit
that referenced
this pull request
Aug 3, 2024
## Proposed changes Issue Number: close #xxx introduced by #38369
feiniaofeiafei
pushed a commit
to feiniaofeiafei/doris
that referenced
this pull request
Aug 9, 2024
…ecessary (apache#38369)
## Proposed changes
Issue Number: close #xxx
When loading data into a unique key table with a sequence column, some rows in the current load job might be marked as deleted because they have a lower sequence value. If such a load job produces many segments, segcompaction might be triggered. Segcompaction currently does not process the delete bitmap, which causes a data correctness issue.
For example:
1. The current load job initially has 4 segments, and some rows are marked as deleted because of the sequence column.
2. After segcompaction, if the delete bitmap is not processed, its content still corresponds to the old segment layout, so rows 7, 14, and 15 are not correctly marked as deleted on the newly generated segment 1.
3. This PR converts the old delete bitmap to fit the new segment layout. It uses an approach similar to base/cumulative compaction to convert delete bitmaps from the old layout to the new one, but the rowid conversion is simpler.
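The conversion described in the three steps above can be illustrated with a small sketch. This is not the Doris implementation (which is C++ in the BE); it is a toy model under stated assumptions: segments are plain sorted lists of keys, the delete bitmap is a dict from segment id to a set of row ids, compaction keeps every row and only changes its position, and the names `compact_segments` and `convert_delete_bitmap` are hypothetical.

```python
import heapq


def compact_segments(segments):
    """Merge sorted segments into one segment (toy model of segcompaction).

    Returns the merged key list and a rowid map:
    (old_segment_id, old_row_id) -> new_row_id in the merged segment.
    Assumption: no rows are dropped, only their positions change.
    """
    streams = [
        [(key, seg_id, row_id) for row_id, key in enumerate(keys)]
        for seg_id, keys in enumerate(segments)
    ]
    merged = []
    rowid_map = {}
    # heapq.merge yields rows in key order; ties are broken by segment id.
    for key, seg_id, row_id in heapq.merge(*streams):
        rowid_map[(seg_id, row_id)] = len(merged)
        merged.append(key)
    return merged, rowid_map


def convert_delete_bitmap(old_bitmap, rowid_map, new_seg_id):
    """Re-express delete marks from the old layout in the new layout."""
    new_rows = set()
    for seg_id, rows in old_bitmap.items():
        for row_id in rows:
            new_rows.add(rowid_map[(seg_id, row_id)])
    return {new_seg_id: new_rows}


# Two segments from one load job; key "c" appears in both, and the copy
# in segment 0 is assumed to lose on sequence value, so it is marked deleted.
segments = [["a", "c"], ["b", "c"]]
old_bitmap = {0: {1}}

merged, rowid_map = compact_segments(segments)
new_bitmap = convert_delete_bitmap(old_bitmap, rowid_map, new_seg_id=0)
print(merged, new_bitmap)
```

Without the conversion step, the mark `{0: {1}}` would point at row 1 of the merged segment, which holds key `"b"` in this toy layout, so the wrong row would be hidden and the stale copy of `"c"` would stay visible.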
dataroaring
pushed a commit
that referenced
this pull request
Aug 11, 2024
…ecessary (#38369)
dataroaring
pushed a commit
that referenced
this pull request
Aug 11, 2024
## Proposed changes Issue Number: close #xxx introduced by #38369
dataroaring
pushed a commit
that referenced
this pull request
Aug 16, 2024
…ecessary (#38369)
dataroaring
pushed a commit
that referenced
this pull request
Aug 16, 2024
## Proposed changes Issue Number: close #xxx introduced by #38369
zhannngchen
added a commit
to zhannngchen/incubator-doris
that referenced
this pull request
Aug 21, 2024
…ecessary (apache#38369)
zhannngchen
added a commit
to zhannngchen/incubator-doris
that referenced
this pull request
Aug 21, 2024
## Proposed changes Issue Number: close #xxx introduced by apache#38369
yiguolei
pushed a commit
that referenced
this pull request
Aug 21, 2024
zhannngchen
added a commit
to zhannngchen/incubator-doris
that referenced
this pull request
Aug 22, 2024
…ecessary (apache#38369) (apache#39707) Issue Number: close #xxx cherry-pick apache#38369 and apache#38800
zhannngchen
added a commit
that referenced
this pull request
Aug 22, 2024
Closed
mongo360
pushed a commit
to mongo360/doris
that referenced
this pull request
Dec 11, 2024
…ecessary (apache#38369) (apache#39749) cherry-pick apache#38369 and apache#38800
16 tasks
zhannngchen
pushed a commit
that referenced
this pull request
Jul 17, 2025
… input segments before converting delete bitmaps on them (#53198) ### What problem does this PR solve? When the table has a sequence column, a load may generate delete bitmap marks on its own rowset. Segment compaction should wait for delete bitmap generation to finish before converting these delete bitmaps (#38369).
zclllyybb
pushed a commit
to zclllyybb/doris
that referenced
this pull request
Feb 28, 2026
… input segments before converting delete bitmaps on them (apache#53198) ### What problem does this PR solve? When the table has a sequence column, a load may generate delete bitmap marks on its own rowset. Segment compaction should wait for delete bitmap generation to finish before converting these delete bitmaps (apache#38369).
zclllyybb
pushed a commit
to zclllyybb/doris
that referenced
this pull request
Mar 1, 2026
… input segments before converting delete bitmaps on them (apache#53198) ### What problem does this PR solve? When the table has a sequence column, a load may generate delete bitmap marks on its own rowset. Segment compaction should wait for delete bitmap generation to finish before converting these delete bitmaps (apache#38369).
Proposed changes
Issue Number: close #xxx
When loading data into a unique key table with a sequence column, some rows in the current load job might be marked as deleted because they have a lower sequence value.
If such a load job produces many segments, segcompaction might be triggered. Segcompaction currently does not process the delete bitmap, which causes a data correctness issue.
For example: