[feat](nereids) add merge aggregate rule#31811
Thank you for your contribution to Apache Doris.

run buildall

run buildall
```java
private Plan mergeTwoAggregate(Plan plan) {
    LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
    LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) outerAgg.child();
```
Suggested change:

```diff
- private Plan mergeTwoAggregate(Plan plan) {
-     LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
-     LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) outerAgg.child();
+ private Plan mergeTwoAggregate(LogicalAggregate<LogicalAggregate<Plan>> outerAgg) {
+     LogicalAggregate<Plan> innerAgg = outerAgg.child();
```
```java
Map<ExprId, AggregateFunction> innerAggExprIdToAggFunc = innerAgg.getOutputExpressions().stream()
        .filter(expr -> (expr instanceof Alias) && (expr.child(0) instanceof AggregateFunction))
        .collect(Collectors.toMap(NamedExpression::getExprId, value -> (AggregateFunction) value.child(0)));
```
nit: pass a merge function to `Collectors.toMap` in case of duplicate keys.
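For context: the two-argument `Collectors.toMap(keyMapper, valueMapper)` throws `IllegalStateException` when two elements map to the same key, while the three-argument overload resolves the collision with a merge function. A minimal standalone sketch; `AliasedFunc` is a hypothetical stand-in for an `Alias` wrapping an `AggregateFunction`, and "keep the first value" is just one possible merge policy.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Hypothetical stand-in for Alias(AggregateFunction): (exprId, function name).
    record AliasedFunc(int exprId, String func) {}

    /** Two-argument toMap: throws IllegalStateException on a duplicate key. */
    static Map<Integer, String> withoutMerge(List<AliasedFunc> exprs) {
        return exprs.stream().collect(Collectors.toMap(AliasedFunc::exprId, AliasedFunc::func));
    }

    /** Three-argument toMap: the merge function keeps the first value seen for a key. */
    static Map<Integer, String> withMerge(List<AliasedFunc> exprs) {
        return exprs.stream()
                .collect(Collectors.toMap(AliasedFunc::exprId, AliasedFunc::func, (first, second) -> first));
    }

    public static void main(String[] args) {
        List<AliasedFunc> exprs = List.of(new AliasedFunc(1, "sum"), new AliasedFunc(1, "sum"));
        System.out.println(withMerge(exprs)); // {1=sum}
        try {
            withoutMerge(exprs);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```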
```java
private Plan mergeAggProjectAgg(Plan plan) {
    LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
    LogicalProject<Plan> project = (LogicalProject<Plan>) outerAgg.child();
    LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) project.child();
```
Suggested change:

```java
private Plan mergeAggProjectAgg(LogicalAggregate<LogicalProject<LogicalAggregate<Plan>>> outerAgg) {
    LogicalProject<LogicalAggregate<Plan>> project = outerAgg.child();
    LogicalAggregate<Plan> innerAgg = project.child();
```
```java
if (innerFunc.isDistinct()) {
    return false;
}
```
Isn't an inner DISTINCT aggregate fine if the outer group-by keys are exactly the same as the inner keys?
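The reasoning behind the question: when the outer keys equal the inner keys, every outer group receives exactly one inner row, so an outer `sum`/`min`/`max` just passes the inner DISTINCT result through. With coarser outer keys this breaks, because distinct counts from different groups cannot simply be added. A minimal in-memory model of both cases; the `Row` type and method names are hypothetical, and the empty-input case is ignored.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DistinctSameKeysDemo {
    // Hypothetical input row: group key k and value x.
    record Row(String k, int x) {}

    /** Inner aggregate: SELECT k, count(DISTINCT x) AS c FROM t GROUP BY k. */
    static Map<String, Long> innerCountDistinct(List<Row> rows) {
        Map<String, Set<Integer>> distinct = new TreeMap<>();
        for (Row r : rows) {
            distinct.computeIfAbsent(r.k(), key -> new HashSet<>()).add(r.x());
        }
        Map<String, Long> counts = new TreeMap<>();
        distinct.forEach((k, values) -> counts.put(k, (long) values.size()));
        return counts;
    }

    /** Outer aggregate with the same keys: SELECT k, sum(c) FROM (inner) GROUP BY k. */
    static Map<String, Long> outerSumSameKeys(Map<String, Long> inner) {
        Map<String, Long> out = new TreeMap<>();
        // Each outer group receives exactly one inner row, so sum(c) equals c.
        inner.forEach((k, c) -> out.merge(k, c, Long::sum));
        return out;
    }

    /** Outer aggregate without keys (coarser grouping): SELECT sum(c) FROM (inner), non-empty input only. */
    static long outerSumNoKeys(Map<String, Long> inner) {
        long total = 0;
        for (long c : inner.values()) {
            total += c;
        }
        return total;
    }

    /** What a merged count(DISTINCT x) without GROUP BY would return. */
    static long globalCountDistinct(List<Row> rows) {
        Set<Integer> values = new HashSet<>();
        for (Row r : rows) {
            values.add(r.x());
        }
        return values.size();
    }
}
```

With rows `(a,1), (a,1), (a,2), (b,1)`, the same-key case returns `{a=2, b=1}` either way, while the keyless case gives 3 for the two-level plan but 2 for a merged `count(DISTINCT x)`.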
```java
    return false;
}
// support sum(sum),min(min),max(max),sum(count),any_value(any_value)
if (!(outerFunc.getName().equals("sum") && innerFunc.getName().equals("count"))
```
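The comment above lists the outer/inner function pairs the rule supports. As a sketch of that whitelist, under the assumption that each pair maps to the single function used after merging (for example `sum(count(x))` becomes `count(x)`), a lookup table could look like this. The `"outer(inner)"` string key and the class name are hypothetical; this is not the code in `MergeAggregate.java`.

```java
import java.util.Map;
import java.util.Optional;

class MergeablePairs {
    // Supported pairs from the code comment, keyed as "outer(inner)" (hypothetical encoding),
    // mapped to the aggregate function the merged plan would use.
    private static final Map<String, String> MERGED = Map.of(
            "sum(sum)", "sum",
            "min(min)", "min",
            "max(max)", "max",
            "sum(count)", "count",
            "any_value(any_value)", "any_value");

    /** Returns the merged function name, or empty if the pair cannot be merged. */
    static Optional<String> mergedFunction(String outer, String inner, boolean innerDistinct) {
        if (innerDistinct) {
            // Mirrors the conservative isDistinct() check in the snippet above.
            return Optional.empty();
        }
        return Optional.ofNullable(MERGED.get(outer + "(" + inner + ")"));
    }
}
```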
Rewriting sum(count()) to count() changes the expression's nullability when the outer aggregate is a scalar aggregate (no GROUP BY). We need to wrap the final expression in the nullable() function so that it stays nullable.
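Why the original expression is nullable: in SQL, a scalar `sum` over zero input rows returns NULL, whereas `count` never returns NULL. A small model of the two plans, with Java `null` standing in for SQL NULL; the class and method names are hypothetical, and the sketch only illustrates the nullability difference the comment refers to.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ScalarSumCountDemo {
    /** SELECT sum(c) FROM (SELECT k, count(*) AS c FROM t GROUP BY k); null models SQL NULL. */
    static Long sumOfCounts(List<String> keys) {
        Map<String, Long> perKey = new TreeMap<>();
        for (String k : keys) {
            perKey.merge(k, 1L, Long::sum);
        }
        Long total = null; // scalar sum over zero inner rows is NULL
        for (long c : perKey.values()) {
            total = (total == null) ? c : total + c;
        }
        return total;
    }

    /** SELECT count(*) FROM t: never NULL, returns 0 on empty input. */
    static long countAll(List<String> keys) {
        return keys.size();
    }
}
```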
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/MergeAggregate.java
```groovy
sql "sync"
```

```groovy
qt_maxMax_minMin_sumSum_sumCount """
```
Add some shape checks or unit tests to make sure the merge-aggregate rule works as intended.
run buildall

run buildall

TPC-H: Total hot run time: 37964 ms
TPC-DS: Total hot run time: 187125 ms
ClickBench: Total hot run time: 30.81 s

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/MergeAggregate.java
Introduced by #31811. For SQL like this:

```sql
select col1, col2 from (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2;
```

Transformation description: while optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern.

Before transformation:

```
LogicalAggregate
+-- LogicalProject
    +-- LogicalAggregate
```

After transformation:

```
LogicalProject
+-- LogicalAggregate
```

Before the transformation, the projections of the LogicalProject were `a AS col1, a AS col2`, and the outer aggregate's group-by keys were `col1, col2`. After the transformation, the aggregate's group-by keys became `a, a`, and the projections remained `a AS col1, a AS col2`.

Problem: when building the project's projections, the group-by keys `a, a` had to be mapped back to `a AS col1, a AS col2`. The old code used the slot as the map key and the alias in the projections as the map value, which does not account for several aliases referring to the same slot.

Solution: the new code takes the exprId of each original outer aggregate group-by expression and searches the original project's projections for the NamedExpression with the same exprId. Those expressions are placed into the new projections, so the correct aliases are preserved and the bug is resolved.
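To make the failure concrete, here is a small model of both lookups for the query above. `Alias` is a hypothetical stand-in holding its own exprId, its name, and the slot it wraps; the outer group-by keys are represented by the exprIds of `col1` and `col2`. Keying by slot lets `col2` overwrite `col1`, while keying by exprId keeps both aliases. This is a sketch of the idea, not the actual Doris code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AliasLookupDemo {
    // Hypothetical stand-in for an Alias: its own exprId, its name, and the slot it wraps.
    record Alias(int exprId, String name, String childSlot) {}

    /** Old approach: map slot -> alias. Two aliases of slot "a" collapse into one entry. */
    static List<String> rebuildBySlot(List<Alias> projections, List<Integer> outerGroupByExprIds) {
        Map<Integer, String> exprIdToSlot = new HashMap<>();
        Map<String, Alias> slotToAlias = new HashMap<>();
        for (Alias p : projections) {
            exprIdToSlot.put(p.exprId(), p.childSlot());
            slotToAlias.put(p.childSlot(), p); // a later alias of the same slot overwrites the earlier one
        }
        List<String> out = new ArrayList<>();
        for (int id : outerGroupByExprIds) {
            // After merging, each group-by key has become the underlying slot ("a", "a").
            out.add(slotToAlias.get(exprIdToSlot.get(id)).name());
        }
        return out;
    }

    /** New approach: find the projection whose exprId matches the original outer group-by key. */
    static List<String> rebuildByExprId(List<Alias> projections, List<Integer> outerGroupByExprIds) {
        Map<Integer, Alias> exprIdToAlias = new HashMap<>();
        for (Alias p : projections) {
            exprIdToAlias.put(p.exprId(), p);
        }
        List<String> out = new ArrayList<>();
        for (int id : outerGroupByExprIds) {
            out.add(exprIdToAlias.get(id).name());
        }
        return out;
    }
}
```

With projections `a AS col1` (exprId 1) and `a AS col2` (exprId 2), the slot-keyed version yields `[col2, col2]` and the exprId-keyed version yields `[col1, col2]`.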
…e members (#36145)

This bug was introduced by #31811. `innerAggExprIdToAggFunc` was a member field of MergeAggregate, which is wrong: rules like MergeAggregate are single instances, so when the same rule is applied to different sub-plans, the applications affect each other through the shared field. This PR changes `innerAggExprIdToAggFunc` into a local variable, which fixes the bug. No regression case was added because the problem does not reproduce deterministically; it requires the same rule to be applied to multiple plans at the same time.
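The hazard can be shown without threads. In the sketch below, a `Runnable` callback deterministically simulates a second application of the same rule instance between building the lookup map and reading it; in the optimizer that overlap would come from concurrent or interleaved rule applications. Both rule classes are hypothetical simplifications, not MergeAggregate itself.

```java
import java.util.HashMap;
import java.util.Map;

class RuleStateDemo {
    /** Buggy pattern: the lookup map is a field of a shared (single-instance) rule. */
    static final class StatefulRule {
        private final Map<Integer, String> exprIdToFunc = new HashMap<>();

        String rewrite(int exprId, String func, Runnable interleaved) {
            exprIdToFunc.clear();
            exprIdToFunc.put(exprId, func);
            interleaved.run(); // another application of the same instance happens here
            return exprIdToFunc.get(exprId);
        }
    }

    /** Fixed pattern: the map is a local variable, private to each invocation. */
    static final class StatelessRule {
        String rewrite(int exprId, String func, Runnable interleaved) {
            Map<Integer, String> exprIdToFunc = new HashMap<>();
            exprIdToFunc.put(exprId, func);
            interleaved.run();
            return exprIdToFunc.get(exprId);
        }
    }

    static String runStateful() {
        StatefulRule rule = new StatefulRule();
        return rule.rewrite(1, "sum", () -> rule.rewrite(2, "max", () -> { }));
    }

    static String runStateless() {
        StatelessRule rule = new StatelessRule();
        return rule.rewrite(1, "sum", () -> rule.rewrite(2, "max", () -> { }));
    }

    public static void main(String[] args) {
        System.out.println(runStateful());  // null: the interleaved call cleared the shared map
        System.out.println(runStateless()); // sum
    }
}
```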
…on all (#41613) (#41909)

Introduced by #31811 and #39450.

```sql
select count(1) from(select 3, 6 union all select 1, 3) t
```

Wrong LogicalUnion plan:

```sql
LogicalUnion( qualifier=ALL, outputs=[3#6], regularChildrenOutputs=[], constantExprsList=[[], []], hasPushedFilter=false
```

This SQL reports an error in EXPLAIN: the LogicalUnion's outputs contain a slot, but the union has no children and an empty constantExprsList, which column pruning set incorrectly. This PR fixes it by handling the case where the required columns are empty: it keeps the min slot and the constant expressions corresponding to it.
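The invariant behind the fix: outputs and every constant row must be pruned with the same column indices, and at least one column must survive so that `count(1)` still sees every row. A minimal sketch of that idea over plain lists; the `UnionPruneSketch` class is hypothetical, and the choice of index 0 is a placeholder assumption standing in for Doris's "min slot" selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class UnionPruneSketch {
    /** Pruned union: kept output names plus one projected row per original constant row. */
    record Pruned(List<String> outputs, List<List<Object>> constantRows) {}

    static Pruned prune(List<String> outputs, List<List<Object>> constantRows, Set<String> required) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < outputs.size(); i++) {
            if (required.contains(outputs.get(i))) {
                keep.add(i);
            }
        }
        if (keep.isEmpty() && !outputs.isEmpty()) {
            // Nothing is required (e.g. count(1)), but the rows must survive, so keep one column.
            // Assumption: index 0 stands in for the "min slot" Doris actually picks.
            keep.add(0);
        }
        List<String> newOutputs = new ArrayList<>();
        for (int i : keep) {
            newOutputs.add(outputs.get(i));
        }
        List<List<Object>> newRows = new ArrayList<>();
        for (List<Object> row : constantRows) {
            List<Object> projected = new ArrayList<>();
            for (int i : keep) {
                projected.add(row.get(i)); // same indices as the outputs, so arity stays consistent
            }
            newRows.add(projected);
        }
        return new Pruned(newOutputs, newRows);
    }
}
```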
Related PR: #31811

Problem Summary:

Before this PR: the outputExpressions of the LogicalAggregate generated by MergeAggregate contained a duplicated column. This bug does not produce wrong results, but the LogicalAggregate outputs the group-by key twice.

After this PR: the problem is fixed.
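One way to remove such duplicates while preserving output order is to keep only the first expression seen for each exprId. The sketch below assumes duplicates are identified by exprId; `NamedExpr` is a hypothetical stand-in for `NamedExpression`, and the actual fix in the PR may be structured differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupOutputsDemo {
    // Hypothetical stand-in for a NamedExpression: (exprId, display name).
    record NamedExpr(int exprId, String name) {}

    /** Keeps the first occurrence of each exprId; LinkedHashMap preserves output order. */
    static List<NamedExpr> dedupByExprId(List<NamedExpr> outputs) {
        Map<Integer, NamedExpr> seen = new LinkedHashMap<>();
        for (NamedExpr e : outputs) {
            seen.putIfAbsent(e.exprId(), e);
        }
        return new ArrayList<>(seen.values());
    }
}
```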
aggregate can be merged to
This PR adds an RBO rule to perform this transformation.
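As a sanity check of the transformation, here is an in-memory comparison for the `sum(sum)` case: a two-level plan grouping by `(a, b)` and then by `a` gives the same result as a single aggregate grouped by `a`. The sketch assumes the outer group-by keys are a subset of the inner keys; the rule's exact preconditions are in `MergeAggregate.java`, and the `Row` type is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MergeAggregateCheck {
    // Hypothetical input row for table t(a, b, x).
    record Row(String a, String b, long x) {}

    /** SELECT a, sum(s) FROM (SELECT a, b, sum(x) AS s FROM t GROUP BY a, b) GROUP BY a. */
    static Map<String, Long> twoLevel(List<Row> rows) {
        Map<List<String>, Long> inner = new HashMap<>();
        for (Row r : rows) {
            inner.merge(List.of(r.a(), r.b()), r.x(), Long::sum);
        }
        Map<String, Long> outer = new TreeMap<>();
        inner.forEach((key, s) -> outer.merge(key.get(0), s, Long::sum));
        return outer;
    }

    /** Merged plan: SELECT a, sum(x) FROM t GROUP BY a. */
    static Map<String, Long> merged(List<Row> rows) {
        Map<String, Long> out = new TreeMap<>();
        for (Row r : rows) {
            out.merge(r.a(), r.x(), Long::sum);
        }
        return out;
    }
}
```

For rows `(x,p,1), (x,q,2), (x,p,3), (y,p,4)` both plans return `{x=6, y=4}`; `min(min)` and `max(max)` can be checked the same way.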