
[Enhancement](doris-future) Support REGR_ARGX, REGR_ARGY, REGR_COUNT and REGR_R2 aggregate functions#61352

Merged
zclllyybb merged 13 commits into apache:master from JoverZhang:regr_fn
Apr 14, 2026

Conversation

@JoverZhang
Contributor

What problem does this PR solve?

Issue Number: #38974, #38976

Related PR: #55940

Problem Summary:

This PR completes the remaining statistical regression aggregate functions (REGR_*) by adding support for REGR_AVGX, REGR_AVGY, REGR_COUNT, and REGR_R2, based on the unified Regr aggregate function approach introduced in #55940.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
      • Add support for REGR_AVGX, REGR_AVGY, REGR_COUNT, and REGR_R2 aggregate functions
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JoverZhang
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26608 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 05fdd3c6c117c5d61cd436dd464703066c64cb7a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17637	4476	4300	4300
q2	q3	10645	763	508	508
q4	4680	359	249	249
q5	7541	1181	1010	1010
q6	182	174	148	148
q7	788	848	668	668
q8	9383	1446	1303	1303
q9	4881	4704	4687	4687
q10	6332	1909	1647	1647
q11	489	271	245	245
q12	784	577	466	466
q13	18050	2928	2163	2163
q14	232	235	218	218
q15	q16	749	737	668	668
q17	708	859	433	433
q18	5984	5311	5209	5209
q19	1304	980	618	618
q20	532	473	384	384
q21	4532	1810	1384	1384
q22	342	497	300	300
Total cold run time: 95775 ms
Total hot run time: 26608 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4807	4638	4634	4634
q2	q3	3919	4376	3843	3843
q4	870	1214	812	812
q5	4051	4372	4332	4332
q6	176	170	144	144
q7	1778	1652	1503	1503
q8	2487	2741	2589	2589
q9	7515	7284	7409	7284
q10	3754	3933	3561	3561
q11	533	453	431	431
q12	521	605	483	483
q13	2847	3169	2321	2321
q14	277	327	413	327
q15	q16	848	778	740	740
q17	1173	1285	1313	1285
q18	7262	6919	6768	6768
q19	892	846	909	846
q20	2070	2113	1960	1960
q21	3936	3480	3261	3261
q22	461	427	378	378
Total cold run time: 50177 ms
Total hot run time: 47502 ms

@doris-robot

TPC-DS: Total hot run time: 168152 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 05fdd3c6c117c5d61cd436dd464703066c64cb7a, data reload: false

query5	4321	636	526	526
query6	332	217	200	200
query7	4210	474	272	272
query8	346	254	238	238
query9	8723	2728	2691	2691
query10	523	377	341	341
query11	7039	5085	4912	4912
query12	192	128	134	128
query13	1271	456	363	363
query14	5657	3711	3477	3477
query14_1	2835	2840	2838	2838
query15	209	192	176	176
query16	965	463	442	442
query17	1109	737	631	631
query18	2446	458	346	346
query19	216	209	179	179
query20	142	130	127	127
query21	215	152	114	114
query22	13279	13979	14409	13979
query23	16515	16011	15537	15537
query23_1	16113	15760	15578	15578
query24	7258	1639	1233	1233
query24_1	1256	1233	1233	1233
query25	574	486	482	482
query26	1262	269	143	143
query27	2835	493	323	323
query28	4467	1869	1843	1843
query29	836	555	463	463
query30	296	230	188	188
query31	1016	942	859	859
query32	77	69	71	69
query33	507	325	272	272
query34	886	872	528	528
query35	650	674	593	593
query36	1068	1077	993	993
query37	149	100	83	83
query38	2952	2977	2841	2841
query39	858	832	802	802
query39_1	816	797	799	797
query40	230	153	133	133
query41	65	62	57	57
query42	262	252	254	252
query43	239	248	220	220
query44	
query45	199	188	179	179
query46	860	974	624	624
query47	2133	2137	2024	2024
query48	333	319	243	243
query49	647	456	383	383
query50	685	295	222	222
query51	4078	4078	4150	4078
query52	267	278	262	262
query53	309	337	285	285
query54	305	274	272	272
query55	103	93	82	82
query56	319	328	315	315
query57	1972	1759	1645	1645
query58	285	276	273	273
query59	2804	3023	2789	2789
query60	347	339	343	339
query61	159	158	151	151
query62	643	584	537	537
query63	328	279	276	276
query64	5014	1269	995	995
query65	
query66	1455	457	369	369
query67	24186	24289	24159	24159
query68	
query69	397	311	284	284
query70	955	971	935	935
query71	341	303	296	296
query72	2837	2624	2423	2423
query73	531	551	318	318
query74	9585	9558	9406	9406
query75	2841	2724	2482	2482
query76	2270	1025	664	664
query77	355	382	297	297
query78	10889	11120	10401	10401
query79	2930	838	595	595
query80	1734	618	532	532
query81	580	258	218	218
query82	988	149	117	117
query83	320	254	237	237
query84	255	123	93	93
query85	927	507	448	448
query86	435	311	300	300
query87	3108	3151	3023	3023
query88	3577	2660	2643	2643
query89	431	374	341	341
query90	1949	175	171	171
query91	162	162	154	154
query92	80	74	68	68
query93	1295	829	503	503
query94	657	302	284	284
query95	571	401	321	321
query96	630	516	225	225
query97	2483	2457	2383	2383
query98	244	224	216	216
query99	980	988	920	920
Total cold run time: 252110 ms
Total hot run time: 168152 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 4.30% (4/93) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.62% (19708/37452)
Line Coverage 36.21% (184055/508327)
Region Coverage 32.33% (142023/439246)
Branch Coverage 33.54% (62104/185172)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.55% (26229/36658)
Line Coverage 54.33% (275257/506614)
Region Coverage 51.55% (228536/443295)
Branch Coverage 52.96% (98329/185654)

Comment on lines +91 to +92
// Keep `syy` to preserve regr_intercept's historical serialized state layout.
static constexpr size_t sy_level = 2;
Contributor Author

Older versions used 2 here. It might be possible to reduce this to 1, but doing so could change the serialized aggregate-state layout and potentially break deserialize() compatibility.
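To illustrate why the level matters, here is a hypothetical sketch. It assumes that `sy_level` counts how many y moments are written (1 = Sy only, 2 = Sy and Syy) and that fields are written back to back as raw values with N last, matching the field order quoted in the review (Sx, Sy, Sxx, Syy, Sxy, N). It is not Doris's serialization code; `YMomentState`, `append_pod`, and `read_pod` are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Append the raw bytes of a trivially copyable value.
template <typename T>
void append_pod(std::vector<char>& buf, const T& value) {
    const char* bytes = reinterpret_cast<const char*>(&value);
    buf.insert(buf.end(), bytes, bytes + sizeof(T));
}

// Read a trivially copyable value at `offset` and advance it; throws on a short buffer.
template <typename T>
T read_pod(const std::vector<char>& buf, std::size_t& offset) {
    if (offset + sizeof(T) > buf.size()) throw std::out_of_range("truncated aggregate state");
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    offset += sizeof(T);
    return value;
}

// Hypothetical state: SyLevel 1 writes Sy; SyLevel 2 also writes Syy. N is written last.
template <int SyLevel>
struct YMomentState {
    double sy = 0;
    double syy = 0;
    std::uint64_t n = 0;

    void serialize(std::vector<char>& buf) const {
        append_pod(buf, sy);
        if constexpr (SyLevel >= 2) append_pod(buf, syy);
        append_pod(buf, n);
    }

    void deserialize(const std::vector<char>& buf) {
        std::size_t offset = 0;
        sy = read_pod<double>(buf, offset);
        if constexpr (SyLevel >= 2) syy = read_pod<double>(buf, offset);
        n = read_pod<std::uint64_t>(buf, offset);
    }
};
```

With this layout, a level-1 reader given bytes from a level-2 writer silently decodes the Syy bytes as N, and a level-2 reader given level-1 bytes runs off the end of the buffer. During a rolling upgrade, old and new backends exchange these states, so keeping the historical level avoids both failure modes.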

@zclllyybb
Contributor

/review

@zclllyybb zclllyybb self-assigned this Mar 16, 2026
@github-actions
Contributor

Code Review Summary — PR #61352

Scope: This PR adds 4 new REGR aggregate functions (REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_R2) and refactors the existing 5 (REGR_SLOPE, REGR_INTERCEPT, REGR_SXX, REGR_SYY, REGR_SXY) into a unified template framework using RegrFunctionKind enum + RegrTraits specializations.
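A minimal sketch of the enum-plus-traits pattern described above, restricted to REGR_AVGX, REGR_AVGY, and REGR_COUNT for brevity. `RegrFunctionKind`, `RegrTraits`, `k_num_moments`, and the `std::tuple<UInt64, std::array<Float64, k_num_moments>>` versus `std::tuple<UInt64>` storage choice are taken from the review text; `RegrState`, `add`, and `result` are hypothetical, and the real Doris template certainly differs in detail. The point is that `if constexpr` lets one template drop the moment array entirely for REGR_COUNT.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <tuple>
#include <type_traits>

// Subset of kinds for illustration; the real enum covers all nine REGR functions.
enum class RegrFunctionKind { regr_avgx, regr_avgy, regr_count };

// Each specialization declares how many Float64 moments its kind needs (illustrative values).
template <RegrFunctionKind K>
struct RegrTraits;
template <>
struct RegrTraits<RegrFunctionKind::regr_avgx> { static constexpr std::size_t k_num_moments = 1; };  // Sx
template <>
struct RegrTraits<RegrFunctionKind::regr_avgy> { static constexpr std::size_t k_num_moments = 1; };  // Sy
template <>
struct RegrTraits<RegrFunctionKind::regr_count> { static constexpr std::size_t k_num_moments = 0; };  // N only

// Hypothetical state: a count, plus a fixed-size moment array only when needed.
template <RegrFunctionKind K>
struct RegrState {
    static constexpr std::size_t k_num_moments = RegrTraits<K>::k_num_moments;
    using Storage = std::conditional_t<k_num_moments == 0, std::tuple<std::uint64_t>,
                                       std::tuple<std::uint64_t, std::array<double, k_num_moments>>>;
    Storage data {};

    std::uint64_t count() const { return std::get<0>(data); }

    // Called only for rows where both y and x are non-NULL.
    void add([[maybe_unused]] double y, [[maybe_unused]] double x) {
        std::get<0>(data) += 1;
        if constexpr (K == RegrFunctionKind::regr_avgx) {
            std::get<1>(data)[0] += x;
        } else if constexpr (K == RegrFunctionKind::regr_avgy) {
            std::get<1>(data)[0] += y;
        }
        // regr_count: both branches are discarded, leaving only the counter.
    }

    // REGR_COUNT yields 0 on empty input; REGR_AVGX/REGR_AVGY yield NULL (std::nullopt).
    auto result() const {
        if constexpr (K == RegrFunctionKind::regr_count) {
            return static_cast<std::int64_t>(count());
        } else {
            if (count() == 0) return std::optional<double> {};
            return std::optional<double> {std::get<1>(data)[0] / static_cast<double>(count())};
        }
    }
};
```

The two return types also mirror the FE/BE contract discussed later: a non-nullable BIGINT for REGR_COUNT and a nullable DOUBLE for the averaging functions.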

Minor note: PR title contains a typo — "REGR_ARGX, REGR_ARGY" should be "REGR_AVGX, REGR_AVGY".

Critical Checkpoint Conclusions

  1. Goal achieved? Yes. The PR adds 4 new statistical regression functions and consolidates all 9 regr functions into a single unified template. Regression tests cover all new functions.

  2. Modification minimal and focused? Yes. The refactoring of existing functions into the new template is well-motivated — it avoids duplicating the add/merge/serialize logic for each function.

  3. Concurrency? Not applicable — aggregate functions are per-partition state with no shared mutable data.

  4. Lifecycle / static initialization? No concerns — no static state, no cross-TU dependencies.

  5. Configuration items? None added.

  6. Incompatible changes / serialization compatibility? Verified safe. The serialization format (field order: Sx, Sy, Sxx, Syy, Sxy, N, each conditional) is identical between old aggregate_function_regr_union.h and new aggregate_function_regr.h for all 5 existing functions. The regr_intercept intentionally keeps sy_level=2 to preserve its historical serialized state layout (documented in comment at line 91). Rolling upgrades are safe.

  7. Parallel code paths? The old aggregate_function_regr_union.h/.cpp is fully deleted and replaced. No stale parallel paths remain.

  8. Special conditional checks? The regr_r2 get_result() has three-branch logic matching the SQL standard: NULL when N=0 or VAR_POP(x)=0, 1.0 when VAR_POP(y)=0, otherwise Sxy²/(Sxx·Syy). This is mathematically correct and matches PostgreSQL behavior. The regr_count state (k_num_moments=0) correctly compiles down to only tracking n.

  9. Test coverage? Good. Each of the 4 new functions has a dedicated regression test with:

    • Empty table (NULL/0), group by on empty table
    • Integer and double inputs
    • Group by with multiple groups
    • Non-nullable path (non_nullable())
    • Literal arguments
    • All-null groups (NULL/0)
    • Error cases (string params, array params) using test{exception} pattern
    • order_qt_ prefixed queries and ORDER BY for deterministic output
    • Tables dropped before use
  10. FE-BE consistency? Verified:

    • FE: RegrAvgx, RegrAvgy, and RegrR2 are AlwaysNullable and return DoubleType; BE returns nullable Float64. Consistent.
    • FE: RegrCount is a NotNullableAggregateFunction with resultForEmptyInput()=0 and returns BigIntType; BE returns non-nullable Int64 via the regr_count specialization. Consistent.
    • All 4 new functions registered in BuiltinAggregateFunctions.java, FunctionSet.java, and BE's register_aggregate_function_regr().
    • All 4 visitor methods added to AggregateFunctionVisitor.java, delegating to visitAggregateFunction() — consistent with existing regr functions.
  11. Observability? Not needed — aggregate functions don't require additional metrics or logging.

  12. Transaction/persistence? Not applicable.

  13. Performance? The template design with if constexpr compiles away unused moment tracking (e.g., regr_count only tracks N with zero overhead). The online Welford/Youngs-Cramer algorithms are numerically stable and O(1) per row. No redundant copies or allocations. The use of std::tuple<UInt64, std::array<Float64, k_num_moments>> vs std::tuple<UInt64> for regr_count is a nice touch that minimizes state size.

  14. Other issues? None found. The code follows existing patterns, the FE Java classes follow the newer style (consistent with RegrSxx/RegrSyy/RegrSxy from the related PR [fix](regr) Use Youngs-Cramer for REGR_SLOPE/INTERCEPT to align with PG #55940), and all mathematical formulas are correct.

Verdict: No issues found. Clean, well-structured addition with good test coverage and backward-compatible serialization.


public static final String COUNT = "count";

public static final String REGR_AVGX = "regr_avgx";
Contributor

Please remove this unused code.

Comment on lines +62 to +67
if (!yType.isNumericType() && !yType.isNullType()) {
throw new AnalysisException("regr_avgx requires numeric for first parameter: " + toSql());
}
if (!xType.isNumericType() && !xType.isNullType()) {
throw new AnalysisException("regr_avgx requires numeric for second parameter: " + toSql());
}
Contributor

Maybe we should also support string-typed expressions as parameters.

Contributor Author

Thanks, supported in a later commit.

@JoverZhang
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 3.25% (4/123) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 27284 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 60edfc4880fec339a5f7385b02406fa592dd6f56, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17635	4453	4335	4335
q2	q3	10644	808	560	560
q4	4678	356	254	254
q5	7540	1238	1027	1027
q6	182	171	145	145
q7	781	848	688	688
q8	10040	1483	1350	1350
q9	5267	4856	4747	4747
q10	6296	1937	1642	1642
q11	457	255	245	245
q12	698	581	466	466
q13	18033	2957	2211	2211
q14	227	233	209	209
q15	q16	730	735	676	676
q17	733	838	474	474
q18	5994	5409	5237	5237
q19	1099	978	629	629
q20	529	489	381	381
q21	4334	1843	1719	1719
q22	444	368	289	289
Total cold run time: 96341 ms
Total hot run time: 27284 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4731	4603	4647	4603
q2	q3	3963	4345	3871	3871
q4	910	1210	799	799
q5	4110	4430	4357	4357
q6	188	178	146	146
q7	1758	1636	1519	1519
q8	2522	2748	2651	2651
q9	7769	7404	7341	7341
q10	3761	3962	3612	3612
q11	548	480	451	451
q12	524	611	446	446
q13	2891	3227	2412	2412
q14	308	319	286	286
q15	q16	737	784	747	747
q17	1193	1354	1368	1354
q18	7316	6778	6648	6648
q19	927	945	926	926
q20	2134	2224	2062	2062
q21	4080	3523	3377	3377
q22	531	435	402	402
Total cold run time: 50901 ms
Total hot run time: 48010 ms

@doris-robot

TPC-DS: Total hot run time: 168864 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 60edfc4880fec339a5f7385b02406fa592dd6f56, data reload: false

query5	4339	638	519	519
query6	338	250	213	213
query7	4207	469	266	266
query8	336	244	231	231
query9	8695	2779	2757	2757
query10	539	389	364	364
query11	7032	5101	4874	4874
query12	200	128	126	126
query13	1284	473	345	345
query14	5761	3903	3519	3519
query14_1	2851	2878	2882	2878
query15	212	193	174	174
query16	956	478	455	455
query17	882	726	619	619
query18	2435	453	369	369
query19	210	205	184	184
query20	136	129	123	123
query21	210	127	108	108
query22	13208	14095	15069	14095
query23	16240	15817	15956	15817
query23_1	15797	15760	15598	15598
query24	7562	1611	1215	1215
query24_1	1225	1222	1215	1215
query25	520	463	409	409
query26	1248	272	154	154
query27	2758	484	300	300
query28	4448	1872	1863	1863
query29	828	590	477	477
query30	298	222	189	189
query31	1010	952	874	874
query32	76	75	68	68
query33	516	340	289	289
query34	879	872	523	523
query35	629	688	599	599
query36	1094	1133	979	979
query37	138	105	83	83
query38	2928	2944	2847	2847
query39	855	834	809	809
query39_1	809	787	797	787
query40	231	152	137	137
query41	63	59	57	57
query42	259	257	255	255
query43	242	246	224	224
query44	
query45	196	192	182	182
query46	885	979	610	610
query47	2099	2103	2070	2070
query48	302	322	231	231
query49	631	469	381	381
query50	685	287	225	225
query51	4066	4049	4054	4049
query52	264	264	253	253
query53	300	339	285	285
query54	297	268	270	268
query55	89	87	80	80
query56	328	320	304	304
query57	1931	1739	1693	1693
query58	279	285	305	285
query59	2805	2948	2752	2752
query60	333	334	316	316
query61	145	146	143	143
query62	618	588	549	549
query63	319	280	285	280
query64	5009	1253	991	991
query65	
query66	1464	449	370	370
query67	24252	24351	24146	24146
query68	
query69	420	313	287	287
query70	869	962	868	868
query71	344	308	312	308
query72	2799	2682	2538	2538
query73	561	550	322	322
query74	9589	9594	9404	9404
query75	2879	2766	2480	2480
query76	2303	1045	684	684
query77	385	402	325	325
query78	10884	11077	10435	10435
query79	1942	785	584	584
query80	1543	648	582	582
query81	547	256	233	233
query82	979	158	118	118
query83	338	295	254	254
query84	296	130	106	106
query85	955	556	437	437
query86	405	306	292	292
query87	3135	3115	3018	3018
query88	3584	2688	2692	2688
query89	468	370	343	343
query90	2001	186	179	179
query91	169	172	138	138
query92	76	73	73	73
query93	998	847	504	504
query94	646	316	274	274
query95	597	337	316	316
query96	638	521	229	229
query97	2435	2505	2393	2393
query98	236	230	251	230
query99	1032	973	905	905
Total cold run time: 249341 ms
Total hot run time: 168864 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.64% (19703/37433)
Line Coverage 36.19% (184040/508474)
Region Coverage 32.37% (142031/438797)
Branch Coverage 33.55% (62103/185117)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (26180/36639)
Line Coverage 54.24% (274889/506761)
Region Coverage 51.36% (227435/442846)
Branch Coverage 52.86% (98115/185599)

@JoverZhang
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 3.25% (4/123) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 26836 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6a39c60705f3cea0dbbd488333151a9d916a7cfd, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17640	4506	4292	4292
q2	q3	10648	826	538	538
q4	4703	358	250	250
q5	7897	1232	1021	1021
q6	207	176	147	147
q7	829	846	671	671
q8	10279	1474	1345	1345
q9	5370	4682	4667	4667
q10	6329	1938	1625	1625
q11	452	253	247	247
q12	750	583	465	465
q13	18047	2954	2188	2188
q14	224	249	210	210
q15	q16	741	734	666	666
q17	736	851	448	448
q18	5930	5426	5224	5224
q19	1284	974	601	601
q20	529	494	373	373
q21	4799	1961	1585	1585
q22	396	349	273	273
Total cold run time: 97790 ms
Total hot run time: 26836 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4688	4592	4721	4592
q2	q3	4017	4330	3871	3871
q4	878	1202	768	768
q5	4103	4606	4328	4328
q6	190	175	139	139
q7	1734	1670	1594	1594
q8	2496	2762	2653	2653
q9	7576	7664	7391	7391
q10	3760	3985	3566	3566
q11	518	455	414	414
q12	494	701	482	482
q13	2798	3276	2329	2329
q14	288	295	272	272
q15	q16	710	775	721	721
q17	1236	1347	1407	1347
q18	7288	6878	6546	6546
q19	901	923	912	912
q20	2059	2155	2060	2060
q21	4010	3530	3479	3479
q22	446	436	383	383
Total cold run time: 50190 ms
Total hot run time: 47847 ms

@doris-robot

TPC-DS: Total hot run time: 167562 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6a39c60705f3cea0dbbd488333151a9d916a7cfd, data reload: false

query5	4344	630	498	498
query6	329	217	216	216
query7	4217	479	265	265
query8	339	241	225	225
query9	8741	2687	2697	2687
query10	543	383	334	334
query11	6946	5056	4854	4854
query12	191	132	124	124
query13	1283	462	351	351
query14	5633	3759	3478	3478
query14_1	2845	2872	2824	2824
query15	209	192	178	178
query16	986	476	467	467
query17	894	759	629	629
query18	2458	453	358	358
query19	217	212	189	189
query20	135	130	130	130
query21	218	140	112	112
query22	13199	13291	13101	13101
query23	15692	15485	15433	15433
query23_1	15795	15772	15830	15772
query24	7751	1677	1307	1307
query24_1	1344	1287	1296	1287
query25	637	520	482	482
query26	1337	282	165	165
query27	2951	555	321	321
query28	4944	1918	1935	1918
query29	913	594	492	492
query30	303	231	199	199
query31	1028	952	855	855
query32	85	72	67	67
query33	515	330	284	284
query34	912	886	512	512
query35	632	689	583	583
query36	1087	1142	972	972
query37	132	95	84	84
query38	2912	2954	2895	2895
query39	864	836	813	813
query39_1	788	795	794	794
query40	234	153	135	135
query41	67	58	58	58
query42	258	253	251	251
query43	234	246	215	215
query44	
query45	200	194	183	183
query46	912	1003	623	623
query47	2115	2114	2595	2114
query48	312	322	222	222
query49	636	461	366	366
query50	694	281	210	210
query51	4034	3976	3991	3976
query52	258	262	255	255
query53	287	328	284	284
query54	304	291	257	257
query55	94	88	87	87
query56	317	314	312	312
query57	1945	1820	1627	1627
query58	285	270	272	270
query59	2746	2931	2740	2740
query60	347	340	321	321
query61	148	181	146	146
query62	616	585	523	523
query63	314	280	273	273
query64	5127	1249	1003	1003
query65	
query66	1461	448	357	357
query67	24180	24234	24307	24234
query68	
query69	411	309	280	280
query70	994	989	975	975
query71	348	306	299	299
query72	2773	2608	2423	2423
query73	536	547	314	314
query74	9632	9538	9418	9418
query75	2865	2737	2462	2462
query76	2307	1022	679	679
query77	361	369	316	316
query78	11024	11184	10448	10448
query79	1660	817	569	569
query80	1295	624	533	533
query81	573	257	226	226
query82	997	152	114	114
query83	338	263	248	248
query84	302	113	100	100
query85	898	480	481	480
query86	416	308	293	293
query87	3141	3105	2995	2995
query88	3564	2668	2629	2629
query89	423	372	340	340
query90	1995	182	182	182
query91	166	156	134	134
query92	75	77	66	66
query93	989	861	493	493
query94	631	317	291	291
query95	588	402	317	317
query96	637	506	227	227
query97	2471	2490	2441	2441
query98	234	227	226	226
query99	1000	994	912	912
Total cold run time: 250170 ms
Total hot run time: 167562 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.69% (19740/37461)
Line Coverage 36.24% (184370/508803)
Region Coverage 32.40% (142277/439074)
Branch Coverage 33.58% (62220/185295)

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 1, 2026
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 29176 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 21f0eaeeed212aca8b4f76d1b22ee8c9f8ec69df, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17678	3714	3766	3714
q2	q3	10672	863	605	605
q4	4676	457	363	363
q5	7458	1348	1137	1137
q6	198	169	137	137
q7	926	945	749	749
q8	9386	1409	1290	1290
q9	5514	5295	5249	5249
q10	6324	2033	1775	1775
q11	485	280	281	280
q12	873	683	502	502
q13	18081	2961	2178	2178
q14	286	288	254	254
q15	q16	889	851	785	785
q17	1032	1142	749	749
q18	6540	5643	5547	5547
q19	1237	1250	1045	1045
q20	620	536	408	408
q21	4864	2401	2064	2064
q22	450	387	345	345
Total cold run time: 98189 ms
Total hot run time: 29176 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4378	4399	4316	4316
q2	q3	4601	4771	4142	4142
q4	2063	2088	1347	1347
q5	4934	5039	5324	5039
q6	208	171	137	137
q7	1961	1775	1594	1594
q8	3307	2993	3021	2993
q9	8228	8487	8187	8187
q10	4685	4576	4218	4218
q11	598	455	496	455
q12	651	699	480	480
q13	2617	3130	2439	2439
q14	312	301	272	272
q15	q16	754	755	700	700
q17	1299	1274	1217	1217
q18	7856	6996	7037	6996
q19	1127	1152	1111	1111
q20	2295	2238	1947	1947
q21	6001	5342	4788	4788
q22	591	531	421	421
Total cold run time: 58466 ms
Total hot run time: 52799 ms

@doris-robot

TPC-DS: Total hot run time: 180138 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 21f0eaeeed212aca8b4f76d1b22ee8c9f8ec69df, data reload: false

query5	4339	649	514	514
query6	374	232	211	211
query7	4219	598	344	344
query8	351	266	235	235
query9	8767	3944	3925	3925
query10	520	400	340	340
query11	6685	5494	5132	5132
query12	193	138	131	131
query13	1274	626	450	450
query14	5732	5194	4783	4783
query14_1	4177	4167	4175	4167
query15	224	207	189	189
query16	1100	469	358	358
query17	1416	789	663	663
query18	2539	546	407	407
query19	260	237	221	221
query20	141	130	132	130
query21	224	151	124	124
query22	13984	14856	14314	14314
query23	17984	17073	16802	16802
query23_1	16944	17053	16915	16915
query24	7718	1802	1409	1409
query24_1	1334	1373	1377	1373
query25	658	494	449	449
query26	1263	324	175	175
query27	2691	650	377	377
query28	4446	1889	1894	1889
query29	983	679	546	546
query30	300	231	189	189
query31	1102	1039	931	931
query32	106	75	70	70
query33	539	357	288	288
query34	1197	1131	686	686
query35	746	787	679	679
query36	1243	1229	1097	1097
query37	155	102	83	83
query38	3095	3021	2981	2981
query39	907	896	863	863
query39_1	829	833	828	828
query40	236	161	143	143
query41	63	60	59	59
query42	272	270	272	270
query43	320	321	275	275
query44	
query45	211	199	193	193
query46	1124	1267	827	827
query47	2326	2326	2236	2236
query48	407	403	291	291
query49	641	533	428	428
query50	723	294	216	216
query51	4385	4325	4193	4193
query52	280	285	264	264
query53	324	362	290	290
query54	331	295	291	291
query55	103	97	89	89
query56	334	338	319	319
query57	1773	1797	1591	1591
query58	291	275	271	271
query59	2903	3001	2720	2720
query60	331	348	328	328
query61	157	160	151	151
query62	672	630	564	564
query63	316	278	266	266
query64	5390	1440	1103	1103
query65	
query66	1499	474	367	367
query67	24254	24281	24089	24089
query68	
query69	458	351	301	301
query70	1035	1018	933	933
query71	363	327	311	311
query72	2975	2751	2417	2417
query73	793	827	456	456
query74	9835	9747	9570	9570
query75	3580	3357	2975	2975
query76	2319	1134	790	790
query77	399	427	350	350
query78	11245	11373	10735	10735
query79	1554	1079	789	789
query80	935	786	674	674
query81	478	282	251	251
query82	1361	162	119	119
query83	344	288	263	263
query84	249	146	118	118
query85	954	533	462	462
query86	487	336	318	318
query87	3323	3183	3120	3120
query88	3617	2689	2705	2689
query89	518	412	383	383
query90	1861	183	179	179
query91	180	193	140	140
query92	77	76	72	72
query93	929	908	527	527
query94	628	331	298	298
query95	667	465	339	339
query96	1093	762	342	342
query97	2647	2669	2563	2563
query98	237	224	227	224
query99	1065	1062	965	965
Total cold run time: 259787 ms
Total hot run time: 180138 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.89% (20030/37873)
Line Coverage 36.49% (188005/515242)
Region Coverage 32.70% (145742/445684)
Branch Coverage 33.89% (63874/188458)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.86% (26656/37092)
Line Coverage 54.78% (281388/513693)
Region Coverage 51.76% (232799/449797)
Branch Coverage 53.24% (100639/189024)

@JoverZhang
Contributor Author

run nonConcurrent

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.88% (26661/37092)
Line Coverage 54.79% (281455/513693)
Region Coverage 51.80% (232990/449797)
Branch Coverage 53.27% (100699/189024)

morrySnow previously approved these changes Apr 2, 2026
# Conflicts:
#	be/src/exprs/aggregate/aggregate_function_regr_union.cpp
#	be/src/exprs/aggregate/aggregate_function_regr_union.h
@JoverZhang JoverZhang dismissed stale reviews from morrySnow and zclllyybb via db2b4e8 April 14, 2026 00:42
@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Apr 14, 2026
@JoverZhang
Contributor Author

run buildall

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 14, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 6.85% (22/321) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.04% (20145/37979)
Line Coverage 36.59% (189418/517735)
Region Coverage 32.83% (146992/447797)
Branch Coverage 33.98% (64373/189460)

@zclllyybb
Contributor

/review

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 97.51% (313/321) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.60% (27375/37193)
Line Coverage 57.25% (295488/516151)
Region Coverage 54.50% (246292/451929)
Branch Coverage 56.13% (106673/190042)

Contributor

@github-actions github-actions Bot left a comment

Summary: The unified BE implementation and FE registrations look consistent for REGR_AVGX, REGR_AVGY, REGR_COUNT, and REGR_R2, and I did not find a blocking correctness or compatibility issue in the patch.

Critical checkpoints:

  • Goal and correctness: The PR goal is to add the remaining REGR aggregate functions on top of the unified implementation. The BE factory registration, FE builtin registration, Nereids wrappers, and regression suites are all present, so the change appears to achieve that goal.
  • Scope and focus: The change is reasonably focused on REGR aggregates plus their regression coverage. The BE refactor also removes duplicated code without widening scope beyond this family.
  • Concurrency: No new concurrency or locking paths are introduced.
  • Lifecycle and initialization: No special lifecycle or static-initialization risk found.
  • Configuration: No new config items.
  • Compatibility: The existing REGR serialization layouts appear preserved for the pre-existing functions, including the explicit preservation of regr_intercept's historical state layout. FE/BE function names, arity, and result nullability expectations are aligned.
  • Parallel code paths: Both BE execution registration and FE builtin/Nereids registration were updated for the full REGR family touched by this PR.
  • Special conditions and error handling: Preconditions are straightforward and consistent with adjacent aggregate implementations; no silent-failure pattern stood out.
  • Test coverage: Regression suites were added for REGR_AVGX, REGR_AVGY, REGR_COUNT, and REGR_R2, and related existing REGR suites were updated. Coverage includes empty input, null filtering, literals, coercion, and negative signature cases. I did not rerun tests locally in this review.
  • Observability / transaction / persistence / FE-BE variable passing: Not applicable here.
  • Performance: The unified implementation removes duplication and does not show an obvious hot-path regression.
  • Other issues: None found during this review.

Residual risk: the added regression cases explicitly run with Nereids enabled, so classic-planner execution is not directly exercised by the new tests; I did not find evidence of a registration gap, but that planner path remains less directly covered than Nereids.

@zclllyybb zclllyybb merged commit e984a38 into apache:master Apr 14, 2026
30 of 32 checks passed
zclllyybb pushed a commit to apache/doris-website that referenced this pull request Apr 14, 2026
…#3544)

## Versions 

Add REGR aggregate function docs for
apache/doris#61352

- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1

## Languages

- [x] Chinese
- [x] English

## Docs Checklist

- [x] Checked by AI
- [ ] Test Cases Built
@JoverZhang JoverZhang deleted the regr_fn branch April 14, 2026 04:24

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x dev/4.1.x-conflict reviewed

7 participants