[improve](txn insert) txn insert support write to one table many times #32980
Merged
dataroaring merged 6 commits into apache:master, May 7, 2024
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
Contributor
Author
|
run buildall |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 37510 ms |
TPC-DS: Total hot run time: 181883 ms |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 37824 ms |
TPC-DS: Total hot run time: 182424 ms |
ClickBench: Total hot run time: 28.9 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
dataroaring
reviewed
Apr 1, 2024
Contributor
Author
|
run buildall |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 38520 ms |
TPC-DS: Total hot run time: 181653 ms |
ClickBench: Total hot run time: 30.1 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 38735 ms |
TPC-DS: Total hot run time: 181749 ms |
ClickBench: Total hot run time: 31.14 s |
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 41325 ms |
TPC-DS: Total hot run time: 186442 ms |
Contributor
Author
|
run p0 |
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
Contributor
Author
|
run buildall |
Contributor
Author
|
run cloud_p1 |
ByteYue
pushed a commit
to ByteYue/doris
that referenced
this pull request
May 15, 2024
dataroaring
pushed a commit
that referenced
this pull request
Jun 5, 2024
## Proposed changes

### Purpose

The user doc: https://doris.apache.org/zh-CN/docs/dev/data-operate/import/transaction-load-manual

We have supported insert into select (#31666), update (#33034) and delete (#33100) in transaction load. PR #32980 lets one txn write more than one rowset to one partition. This PR implements the cloud mode of #32980.

### Implementation

#### sub_txn_id

See #32980.

#### Meta service supports commit txn

This process is generally the same as commit_txn. The difference is that a partition's version increases by 1 for each sub txn that writes to it.

One example: suppose the table, partition, tablet and version info is:

```
--------------------------------------------
| table | partition | tablet  | version |
--------------------------------------------
| t1    | t1_p1     | t1_p1.1 | 1       |
| t1    | t1_p1     | t1_p1.2 | 1       |
| t1    | t1_p2     | t1_p2.1 | 2       |
| t2    | t2_p3     | t2_p3.1 | 3       |
| t2    | t2_p4     | t2_p4.1 | 4       |
--------------------------------------------
```

Now we commit a txn with 3 sub txns, and the tablets are:

* sub_txn1: t1_p1.1, t1_p1.2, t1_p2.1
* sub_txn2: t2_p3.1
* sub_txn3: t1_p1.1, t1_p1.2

On commit, the partition versions change as follows:

* sub_txn1: t1_p1(1 -> 2), t1_p2(2 -> 3)
* sub_txn2: t2_p3(3 -> 4)
* sub_txn3: t1_p1(2 -> 3)

After commit, the partition versions are:

* t1: t1_p1(3), t1_p2(3)
* t2: t2_p3(4), t2_p4(4)

#### Meta service supports generating sub_txn_id by `begin_sub_txn`
dataroaring
pushed a commit
that referenced
this pull request
Jun 7, 2024
Proposed changes
Purpose
We have supported insert into select (#31666), update (#33034) and delete (#33100) in transaction load. One problem remains: a partition can be written only once in one transaction, because the current transaction mechanism supports publishing only one version per partition. This PR solves that problem.
In other words, this PR supports writing to one table many times in one transaction, like:
Current implementation
In Doris, one transaction is associated with one txn_id.
BE uses this txn_id to record the load info (partition_id, tablet, DeltaWriter, ...) in txn_manager. If one partition is written twice in one txn, this info in BE may be overwritten.
When FE commits the txn, it calculates a new partition version. A version corresponds to one Rowset, but multiple loads in one txn generate multiple Rowsets.
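The overwrite problem can be illustrated with a toy model. This is only a sketch: `TxnManagerModel`, `record_load`, and the rowset names are hypothetical and are not the real BE `txn_manager` API. It assumes the load info is stored in a map keyed by txn_id and partition.

```python
class TxnManagerModel:
    """Toy stand-in for BE load bookkeeping; NOT the real txn_manager API."""

    def __init__(self):
        # (txn_id, partition_id) -> rowset name (hypothetical layout)
        self._loads = {}

    def record_load(self, txn_id, partition_id, rowset):
        # A second write under the same key replaces the first entry.
        self._loads[(txn_id, partition_id)] = rowset

    def rowsets(self, partition_id):
        # All rowsets currently tracked for one partition.
        return sorted(r for (_, p), r in self._loads.items() if p == partition_id)


# Two loads into partition p1 under the same txn_id.
manager = TxnManagerModel()
manager.record_load(100, "p1", "rowset_a")
manager.record_load(100, "p1", "rowset_b")

# Only the second load is still tracked; the first was overwritten.
tracked = manager.rowsets("p1")
print(tracked)
```

With a single key per transaction and partition, the second load silently replaces the first, which is the situation described above.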
New implementation
Introduction of sub_txn_id
To solve the above problem, the basic idea is to separate the txn_id in FE and BE. For multiple loads in one txn, we use sub_txn_id to distinguish the load for BE.
One example: suppose table t has 2 partitions, p1 and p2. The current version of p1 is 3, p2 is 4.
2. sub_txn_id1 = txn_id;
* sub_txn_id1: p1(4), p2(5)
* sub_txn_id2: p1(5)
* sub_txn_id3: p1(6), p2(6)
publish_task:
* use sub_txn_id to submit publish version tasks to BE
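The version numbers in the example above can be reproduced with a short sketch. It assumes each sub transaction bumps the version of every partition it writes by 1, in sub transaction order. The function name `assign_versions` and the numeric ids 1000 to 1002 are hypothetical; only the rule `sub_txn_id1 = txn_id` and the partition versions come from the description.

```python
def assign_versions(visible_versions, sub_txns):
    """Give each sub txn the next version of every partition it writes.

    visible_versions: dict of partition name -> current visible version
    sub_txns: list of (sub_txn_id, [partition names]) in load order
    Returns (per-sub-txn plan, final partition versions).
    """
    latest = dict(visible_versions)  # copy so the input is not mutated
    plan = []
    for sub_txn_id, partitions in sub_txns:
        assigned = {}
        for partition in partitions:
            latest[partition] += 1
            assigned[partition] = latest[partition]
        plan.append((sub_txn_id, assigned))
    return plan, latest


txn_id = 1000  # hypothetical id
sub_txns = [
    (txn_id, ["p1", "p2"]),  # sub_txn_id1 = txn_id
    (1001, ["p1"]),          # sub_txn_id2 (hypothetical id)
    (1002, ["p1", "p2"]),    # sub_txn_id3 (hypothetical id)
]
plan, final_versions = assign_versions({"p1": 3, "p2": 4}, sub_txns)
for sub_txn_id, assigned in plan:
    print(sub_txn_id, assigned)
print(final_versions)
```

Each entry in `plan` corresponds to one publish version task, so BE can tell the loads apart even though they belong to the same transaction.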
FE Meta
In addition, this PR changes the storage of TransactionState in bdbje to JSON format to make it compatible.
Isolation Level
Doris provides the READ COMMITTED isolation level. Please note the following:
* In a transaction, each statement reads the data that was committed at the time the statement began executing.
* In a transaction, each statement cannot read the modifications made by other statements within the same transaction.
Please notice:
For the delete command, there are 2 implementations: one is a delete condition, the other is an insert.
If the delete condition is committed after the insert, the delete applies to the inserted data, for example:
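The ordering rule can be modelled with a small sketch. It is an illustration under assumptions, not Doris code: `visible_rows` is a hypothetical helper, each operation carries a commit version, and a delete condition is assumed to affect only rows committed at lower versions (the reverse case is an assumption, not stated in this PR).

```python
def visible_rows(operations):
    """Replay operations in commit-version order and return surviving rows.

    operations: list of (version, kind, payload), where kind is "insert"
    (payload is a list of rows) or "delete" (payload is a predicate).
    """
    rows = []
    for _version, kind, payload in sorted(operations, key=lambda op: op[0]):
        if kind == "insert":
            rows.extend(payload)
        elif kind == "delete":
            # The delete condition filters rows committed before it.
            rows = [row for row in rows if not payload(row)]
    return rows


def is_two(row):
    # Hypothetical delete condition: id = 2
    return row == 2


# Delete condition committed after the insert: it applies to the insert.
delete_after = visible_rows([(1, "insert", [1, 2, 3]), (2, "delete", is_two)])

# Delete condition committed before the insert (assumed behaviour):
# the later insert is not affected.
delete_before = visible_rows([(1, "delete", is_two), (2, "insert", [1, 2, 3])])

print(delete_after, delete_before)
```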
User doc
apache/doris-website#604