[Feature](partial update) Support flexible partial update in stream load with json files #39756

Merged
dataroaring merged 29 commits into apache:master from bobhan1:mow-flexible-partial-update
Oct 10, 2024

bobhan1 commented Aug 22, 2024

This PR adds the ability to update a different set of columns for each row within a single stream load.
Doc: apache/doris-website#1140

Example

MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL, 
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true"); 
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+

test1.json:

{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}

curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load

MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
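
The result above can be reproduced with a small model of the merge semantics. This is only a sketch of the behavior visible in the example, not the Doris implementation. It assumes that rows are applied in file order, that columns a row does not list keep their current value for an existing key, and that for a new key they take the column default (or NULL when the column is nullable and has no default). The names `apply_flexible_update`, `DEFAULTS`, and `COLUMNS` are made up for illustration.

```python
import json

COLUMNS = ["k", "v1", "v2", "v3", "v4", "v5"]

# Defaults taken from the CREATE TABLE in the example; None models SQL NULL.
# v3 is NOT NULL with no default, so it has no entry here and a new row
# must supply it explicitly.
DEFAULTS = {"v1": None, "v2": 9876, "v4": 1234, "v5": None}


def apply_flexible_update(table, json_lines):
    """Merge each JSON row into `table` (a dict keyed by k), touching only the columns it names."""
    for line in json_lines.strip().splitlines():
        update = json.loads(line)
        key = update["k"]
        if key in table:
            # Existing key: overwrite the listed columns, keep the rest.
            table[key].update(update)
        else:
            # New key: every unlisted column needs a default (assumed behavior).
            missing = [c for c in COLUMNS if c not in update and c not in DEFAULTS]
            if missing:
                raise ValueError(f"no value or default for columns {missing}")
            table[key] = {**DEFAULTS, **update}
    return table


# Initial contents: rows 0..5 with every column equal to the key.
table = {n: {c: n for c in COLUMNS} for n in range(6)}

TEST1_JSON = """
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
"""

apply_flexible_update(table, TEST1_JSON)
for key in sorted(table):
    print([table[key][c] for c in COLUMNS])
```

In this example the two rows for `k = 1` (and likewise for `k = 2`) touch disjoint columns, so the outcome does not depend on the order in which they are merged. How Doris resolves rows in one load that update the same column of the same key is not shown here.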

doris-robot commented

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp

bobhan1 force-pushed the mow-flexible-partial-update branch from bd91d77 to 40a5580 on August 22, 2024 04:06

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp

bobhan1 changed the title from "[Draft](partial update) Support flexible partial update in stream load with json files" to "[Feature](partial update) Support flexible partial update in stream load with json files" on Aug 22, 2024

bobhan1 force-pushed the mow-flexible-partial-update branch from 21b47f7 to edb8bf3 on August 22, 2024 07:50

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

bobhan1 force-pushed the mow-flexible-partial-update branch from edb8bf3 to a82fc76 on August 22, 2024 10:59

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

bobhan1 force-pushed the mow-flexible-partial-update branch 3 times, most recently from 001ea2a to 830813c on August 23, 2024 02:54

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/exec/tablet_info.h

bobhan1 force-pushed the mow-flexible-partial-update branch from 830813c to 3e7e9f5 on August 23, 2024 03:02

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/partial_update_info.h

bobhan1 force-pushed the mow-flexible-partial-update branch 12 times, most recently from 92da59e to ccec48f on August 23, 2024 06:59

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp

bobhan1 force-pushed the mow-flexible-partial-update branch from ccec48f to 139660e on August 23, 2024 07:17

bobhan1 commented Oct 8, 2024

run buildall

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp
Comment thread be/src/olap/partial_update_info.cpp
Comment thread be/src/olap/rowset/segment_v2/segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp
Comment thread be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp
return Status::OK();
}

Status VerticalSegmentWriter::_merge_rows_for_sequence_column(

warning: function '_merge_rows_for_sequence_column' exceeds recommended size/complexity thresholds [readability-function-size]

Status VerticalSegmentWriter::_merge_rows_for_sequence_column(
                              ^
Additional context

be/src/olap/rowset/segment_v2/vertical_segment_writer.cpp:830: 91 lines including whitespace and comments (threshold 80)

Status VerticalSegmentWriter::_merge_rows_for_sequence_column(
                              ^

doris-robot commented

TeamCity be ut coverage result:
Function Coverage: 37.23% (9638/25888)
Line Coverage: 28.54% (79918/280040)
Region Coverage: 28.01% (41327/147559)
Branch Coverage: 24.62% (21055/85508)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2569de09e81dd504a3267b5e991c552ca678fbb0_2569de09e81dd504a3267b5e991c552ca678fbb0/report/index.html

bobhan1 commented Oct 9, 2024

run cloud_p0

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp (outdated)

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

bobhan1 commented Oct 9, 2024

run buildall

bobhan1 commented Oct 9, 2024

run buildall

github-actions bot left a comment

clang-tidy made some suggestions

Comment thread be/src/olap/base_tablet.cpp

doris-robot commented

TeamCity be ut coverage result:
Function Coverage: 37.25% (9645/25895)
Line Coverage: 28.56% (79987/280104)
Region Coverage: 27.99% (41350/147725)
Branch Coverage: 24.60% (21047/85574)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e5b703dbd9f6361ac4bc3456066488cc78091f2f_e5b703dbd9f6361ac4bc3456066488cc78091f2f/report/index.html

dataroaring left a comment

LGTM

github-actions bot commented

PR approved by at least one committer and no changes requested.

zhannngchen left a comment

LGTM

bobhan1 mentioned this pull request Oct 14, 2024

Carl-Zhou-CN commented

@dataroaring @zhannngchen Hi, is there any plan to release this PR? What risks are involved?

bobhan1 commented Jun 25, 2025

@Carl-Zhou-CN the feature will probably be in doris 4.0

Carl-Zhou-CN commented

> @Carl-Zhou-CN the feature will probably be in doris 4.0

Thank you very much for your response. I currently need this functionality urgently. If I merge and use it myself, what issues should I be aware of?

bobhan1 commented Jun 25, 2025

> > @Carl-Zhou-CN the feature will probably be in doris 4.0
>
> Thank you very much for your response. I currently need this functionality urgently. If I merge and use it myself, what issues should I be aware of?

This PR has problems, and some of them are fixed in #41701.
This feature may require high IOPS and I/O throughput.

Carl-Zhou-CN commented

> > > @Carl-Zhou-CN the feature will probably be in doris 4.0
> >
> > Thank you very much for your response. I currently need this functionality urgently. If I merge and use it myself, what issues should I be aware of?
>
> This PR has problems, and some of them are fixed in #41701. This feature may require high IOPS and I/O throughput.

Ok. Thank you.

Labels

approved, dev/3.1.0-merged, reviewed

6 participants