[Enhancement](doris-future) Support REGR_ARGX, REGR_ARGY, REGR_COUNT and REGR_R2 aggregate functions#61352
Conversation
…and REGR_R2 aggregate functions
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 26608 ms |
TPC-DS: Total hot run time: 168152 ms |
FE UT Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
| // Keep `syy` to preserve regr_intercept's historical serialized state layout. | ||
| static constexpr size_t sy_level = 2; |
There was a problem hiding this comment.
Older versions used 2 here. It might be possible to reduce this to 1,
but doing so could change the serialized aggregate-state layout and potentially break deserialize() compatibility.
|
/review |
Code Review Summary — PR #61352Scope: This PR adds 4 new REGR aggregate functions ( Minor note: PR title contains a typo — "REGR_ARGX, REGR_ARGY" should be "REGR_AVGX, REGR_AVGY". Critical Checkpoint Conclusions
Verdict: No issues found. Clean, well-structured addition with good test coverage and backward-compatible serialization. |
|
|
||
| public static final String COUNT = "count"; | ||
|
|
||
| public static final String REGR_AVGX = "regr_avgx"; |
There was a problem hiding this comment.
remove these useless code
| if (!yType.isNumericType() && !yType.isNullType()) { | ||
| throw new AnalysisException("regr_avgx requires numeric for first parameter: " + toSql()); | ||
| } | ||
| if (!xType.isNumericType() && !xType.isNullType()) { | ||
| throw new AnalysisException("regr_avgx requires numeric for second parameter: " + toSql()); | ||
| } |
There was a problem hiding this comment.
maybe we should support string type expr as a parameter
There was a problem hiding this comment.
Thanks, supported in a later commit.
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 27284 ms |
TPC-DS: Total hot run time: 168864 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 26836 ms |
TPC-DS: Total hot run time: 167562 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 29176 ms |
TPC-DS: Total hot run time: 180138 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run nonConcurrent |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
# Conflicts: # be/src/exprs/aggregate/aggregate_function_regr_union.cpp # be/src/exprs/aggregate/aggregate_function_regr_union.h
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
There was a problem hiding this comment.
Summary: The unified BE implementation and FE registrations look consistent for REGR_AVGX, REGR_AVGY, REGR_COUNT, and REGR_R2, and I did not find a blocking correctness or compatibility issue in the patch.
Critical checkpoints:
- Goal and correctness: The PR goal is to add the remaining REGR aggregate functions on top of the unified implementation. The BE factory registration, FE builtin registration, Nereids wrappers, and regression suites are all present, so the change appears to achieve that goal.
- Scope and focus: The change is reasonably focused on REGR aggregates plus their regression coverage. The BE refactor also removes duplicated code without widening scope beyond this family.
- Concurrency: No new concurrency or locking paths are introduced.
- Lifecycle and initialization: No special lifecycle or static-initialization risk found.
- Configuration: No new config items.
- Compatibility: The existing REGR serialization layouts appear preserved for the pre-existing functions, including the explicit preservation of regr_intercept's historical state layout. FE/BE function names, arity, and result nullability expectations are aligned.
- Parallel code paths: Both BE execution registration and FE builtin/Nereids registration were updated for the full REGR family touched by this PR.
- Special conditions and error handling: Preconditions are straightforward and consistent with adjacent aggregate implementations; no silent-failure pattern stood out.
- Test coverage: Regression suites were added for REGR_AVGX, REGR_AVGY, REGR_COUNT, and REGR_R2, and related existing REGR suites were updated. Coverage includes empty input, null filtering, literals, coercion, and negative signature cases. I did not rerun tests locally in this review.
- Observability / transaction / persistence / FE-BE variable passing: Not applicable here.
- Performance: The unified implementation removes duplication and does not show an obvious hot-path regression.
- Other issues: None found during this review.
Residual risk: the added regression cases explicitly run with Nereids enabled, so classic-planner execution is not directly exercised by the new tests; I did not find evidence of a registration gap, but that planner path remains less directly covered than Nereids.
…#3544) ## Versions Add REGR aggregate function docs for apache/doris#61352 - [x] dev - [x] 4.x - [ ] 3.x - [ ] 2.1 ## Languages - [x] Chinese - [x] English ## Docs Checklist - [x] Checked by AI - [ ] Test Cases Built
What problem does this PR solve?
Issue Number: #38974, #38976
Related PR: #55940
Problem Summary:
This PR completes the remaining statistical regression aggregate functions (
REGR_*) by adding support forREGR_AVGX,REGR_AVGY,REGR_COUNT, andREGR_R2, based on the unified Regr aggregate functionapproach introduced in #55940.
Release note
None
Check List (For Author)
Test
Behavior changed:
REGR_AVGX,REGR_AVGY,REGR_COUNT, andREGR_R2aggregate functionsDoes this need documentation?
Check List (For Reviewer who merge this PR)