[improvement](mtmv) Materialized view partition track supports date_trunc and optimize the fail reason #35562
Merged
morrySnow merged 14 commits into apache:master on Jun 5, 2024
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 40704 ms
TPC-DS: Total hot run time: 170257 ms
ClickBench: Total hot run time: 30.02 s
run buildall
TPC-H: Total hot run time: 39956 ms
TPC-DS: Total hot run time: 171459 ms
ClickBench: Total hot run time: 30.74 s
force-pushed from b0f6bf8 to 706d12b
run buildall
TPC-H: Total hot run time: 39818 ms
TPC-DS: Total hot run time: 169663 ms
ClickBench: Total hot run time: 30.13 s
morrySnow reviewed May 30, 2024
run buildall
TPC-H: Total hot run time: 42107 ms
TPC-DS: Total hot run time: 172696 ms
ClickBench: Total hot run time: 30.17 s
morrySnow previously approved these changes May 31, 2024
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
run buildall
TPC-DS: Total hot run time: 170818 ms
ClickBench: Total hot run time: 31.17 s
zddr reviewed Jun 5, 2024
  )
  AS
- SELECT date_trunc(`k2`,'miniute') as month_alias, * FROM ${tableName};
+ SELECT date_trunc(`k2`,'miniute') as miniute_alias, * FROM ${tableName};
zddr approved these changes Jun 5, 2024
morrySnow approved these changes Jun 5, 2024
dataroaring pushed a commit that referenced this pull request Jun 7, 2024
… optimize the fail reason (#35562)

This depends on #34781.

1. Materialized view partition tracking supports date_trunc, and the reported failure reason is improved.
2. It supports creating a partitioned materialized view as follows. This materialized view is updated partition by partition, at day granularity:

CREATE MATERIALIZED VIEW mv_6
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by(date_trunc(date_alias, 'day'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT
  date_trunc(t1.L_SHIPDATE, 'hour') as date_alias,
  t2.O_ORDERDATE,
  t1.L_QUANTITY,
  t2.O_ORDERSTATUS,
  count(distinct case when t1.L_SUPPKEY > 0 then t2.O_ORDERSTATUS else null end) as cnt_1
from
  (select * from lineitem where L_SHIPDATE in ('2017-01-30')) t1
  left join
  (select * from orders where O_ORDERDATE in ('2017-01-30')) t2
  on t1.L_ORDERKEY = t2.O_ORDERKEY
group by
  t1.L_SHIPDATE,
  t2.O_ORDERDATE,
  t1.L_QUANTITY,
  t2.O_ORDERSTATUS;
seawinde added a commit to seawinde/doris that referenced this pull request Jun 7, 2024
… optimize the fail reason (apache#35562)
This was referenced Jun 12, 2024
seawinde added a commit to seawinde/doris that referenced this pull request Jun 20, 2024
… optimize the fail reason (apache#35562)
seawinde added a commit to seawinde/doris that referenced this pull request Jun 20, 2024
… optimize the fail reason (apache#35562)
morrySnow pushed a commit that referenced this pull request Jun 21, 2024
…by (#36175)

This was introduced by #35562. After that PR, creating a partitioned materialized view like the following fails with the message: Unable to find a suitable base table for partitioning

CREATE MATERIALIZED VIEW mvName
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(month_alias, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
  'replication_num' = '1'
)
AS
SELECT date_trunc(`k2`,'day') AS month_alias, k3, count(*)
FROM tableName
GROUP BY date_trunc(`k2`,'day'), k3;

This PR supports creating a partitioned materialized view when `date_trunc` appears in the GROUP BY clause.
morrySnow pushed a commit that referenced this pull request Jun 21, 2024
… rewrite by partition rolled up mv (#36414)

This was introduced by #35562. When a materialized view's partitions are rolled up by date_trunc and the base table adds a new partition, a query that is successfully rewritten to use the partitioned materialized view loses the new partition's data. This PR fixes the problem.

For example, given the following materialized view definition:

CREATE MATERIALIZED VIEW roll_up_mv
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by (date_trunc(`col1`, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select
  date_trunc(`l_shipdate`, 'day') as col1,
  l_shipdate, o_orderdate, l_partkey, l_suppkey,
  sum(o_totalprice) as sum_total
from lineitem
left join orders
  on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;

if the following insert command is run:

insert into lineitem values
(1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-11-21', '2023-11-21', '2023-11-21', 'a', 'b', 'yyyyyyyyy');

then the following query does not return the data of the 2023-11-21 partition:

select
  date_trunc(`l_shipdate`, 'day') as col1,
  l_shipdate, o_orderdate, l_partkey, l_suppkey,
  sum(o_totalprice) as sum_total
from lineitem
left join orders
  on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
…by (#36175)
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
… rewrite by partition rolled up mv (#36414)
seawinde added a commit to seawinde/doris that referenced this pull request Jun 27, 2024
… optimize the fail reason (apache#35562)
morrySnow pushed a commit that referenced this pull request Jul 5, 2024
seawinde added a commit to seawinde/doris that referenced this pull request Jul 11, 2024
…by (apache#36175)
seawinde added a commit to seawinde/doris that referenced this pull request Jul 11, 2024
… rewrite by partition rolled up mv (apache#36414)
morrySnow pushed a commit that referenced this pull request Sep 13, 2024
…m both side of join (#40485)

This was introduced by #35562. If the partitioned materialized view definition is as follows:

CREATE MATERIALIZED VIEW mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (upgrade_day)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select t1.upgrade_day, t2.batch_no, count(*)
from test2 t2
join test1 t1 on t1.upgrade_day = t2.upgrade_day
group by t1.upgrade_day, t2.batch_no;

the materialized view's related partition table should be `test1`, but it is currently `test2`. This PR fixes this.
seawinde added a commit to seawinde/doris that referenced this pull request Sep 13, 2024
…m both side of join (apache#40485)
seawinde added a commit to seawinde/doris that referenced this pull request Sep 14, 2024
…m both side of join (apache#40485)
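Proposed changes
This depends on #34781.
1. Materialized view partition tracking supports date_trunc, and the reported failure reason is improved.
2. It supports creating a partitioned materialized view whose partition expression applies date_trunc to an alias that is itself a date_trunc expression, for example partition by(date_trunc(date_alias, 'day')) where date_alias is date_trunc(t1.L_SHIPDATE, 'hour'). Such a materialized view is updated partition by partition, at day granularity.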
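As a rough illustration of why an alias truncated to the hour can still be partitioned by day, the sketch below applies date_trunc at both granularities. The timestamp literal is made up for illustration and does not come from this PR; the argument order (value first, unit second) follows the usage in the PR's own statements.

```sql
-- Illustrative only: the timestamp literal is an assumption, not taken from the PR.
-- Truncating to the hour drops minutes and seconds but keeps the date part.
SELECT date_trunc('2017-01-30 13:45:10', 'hour');

-- Truncating the hour-level value again to the day gives the day-level value
-- that a day-partitioned materialized view would be keyed on.
SELECT date_trunc(date_trunc('2017-01-30 13:45:10', 'hour'), 'day');
```

Under these assumptions the first statement is expected to return 2017-01-30 13:00:00 and the second 2017-01-30 00:00:00, so every hour-level value of a given date maps to the same day partition.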