[fix](ES Catalog)Check isArray before parse json to array (#39104)#39275

Merged
xiaokang merged 1 commit into apache:branch-2.0 from qidaye:pick_is_array_check_2.0
Aug 14, 2024

Conversation

@qidaye (Contributor) commented Aug 13, 2024

## Proposed changes

Backport of #39104.

Elasticsearch does not have an explicit array type, but any of its
fields can contain [zero or more
values](https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html).
When such a field holds a single value and is mapped to an array type in Doris,
parsing it causes a segmentation fault.
This change adds an `isArray` check before the JSON value is parsed as an array.
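
To illustrate the idea only, here is a minimal Python sketch of checking the JSON type before treating a value as an array. It is not the Doris backend code: the function name `es_field_to_array` is hypothetical, and wrapping a single value in a one-element list is an assumed policy for this example, not necessarily what the patch does when the check fails.

```python
import json


def es_field_to_array(raw_json: str) -> list:
    """Parse one Elasticsearch field value and return it as a list.

    Hypothetical helper for illustration; the wrapping policy is an assumption.
    """
    value = json.loads(raw_json)
    # Check the JSON type first: the field may hold an array or a single value.
    if isinstance(value, list):
        return value
    # A missing value (JSON null) is treated here as an empty array.
    if value is None:
        return []
    # A single value is wrapped instead of being read as if it were an array.
    return [value]


print(es_field_to_array('["a", "b"]'))
print(es_field_to_array('"a"'))
```

A stricter variant could raise an error in the non-array branch instead of wrapping; either way, the type check runs before any array access.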

Issue Number: close apache#39102
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@qidaye (Contributor, Author) commented Aug 13, 2024

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 50362 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 53aeec1847f59150677766b8c8774a2b1624ac80, data reload: false

------ Round 1 ----------------------------------
q1	17862	4436	4382	4382
q2	2082	157	158	157
q3	10456	1942	1901	1901
q4	10285	1258	1332	1258
q5	8602	3950	3991	3950
q6	239	127	126	126
q7	2056	1640	1592	1592
q8	9324	2763	2760	2760
q9	11001	10504	10576	10504
q10	8639	3527	3515	3515
q11	419	252	262	252
q12	480	303	313	303
q13	18338	4003	4023	4003
q14	365	326	331	326
q15	519	456	465	456
q16	689	577	579	577
q17	1150	966	972	966
q18	7226	6943	6928	6928
q19	1683	1553	1545	1545
q20	563	313	314	313
q21	4474	4157	4158	4157
q22	488	391	392	391
Total cold run time: 116940 ms
Total hot run time: 50362 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4356	4321	4398	4321
q2	320	226	226	226
q3	4195	4186	4159	4159
q4	2773	2757	2760	2757
q5	7219	7095	7176	7095
q6	244	124	123	123
q7	3297	2884	2801	2801
q8	4361	4476	4502	4476
q9	17451	17206	17163	17163
q10	4303	4300	4293	4293
q11	749	706	713	706
q12	1026	862	883	862
q13	7437	3728	3762	3728
q14	446	417	440	417
q15	503	465	459	459
q16	736	694	685	685
q17	3829	3808	3917	3808
q18	8856	8698	8855	8698
q19	1708	1720	1673	1673
q20	2391	2141	2081	2081
q21	8558	8554	8466	8466
q22	1011	892	925	892
Total cold run time: 85769 ms
Total hot run time: 79889 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.73% (8107/21485)
Line Coverage: 29.37% (66446/226212)
Region Coverage: 28.87% (34274/118710)
Branch Coverage: 24.76% (17621/71174)
Coverage Report: http://coverage.selectdb-in.cc/coverage/53aeec1847f59150677766b8c8774a2b1624ac80_53aeec1847f59150677766b8c8774a2b1624ac80/report/index.html

@doris-robot

TPC-DS: Total hot run time: 203266 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 53aeec1847f59150677766b8c8774a2b1624ac80, data reload: false

query1	920	388	433	388
query2	6529	2156	2168	2156
query3	6926	206	194	194
query4	20015	17966	17953	17953
query5	19734	6518	6554	6518
query6	296	227	230	227
query7	4167	305	326	305
query8	257	254	248	248
query9	3181	2739	2670	2670
query10	436	301	305	301
query11	11281	10743	10720	10720
query12	130	79	72	72
query13	5604	646	665	646
query14	18323	13722	13526	13526
query15	355	223	237	223
query16	6455	286	269	269
query17	1696	1459	877	877
query18	2304	430	427	427
query19	212	154	151	151
query20	76	81	83	81
query21	196	102	100	100
query22	5186	5107	4980	4980
query23	32819	32010	32119	32010
query24	6979	6595	6533	6533
query25	540	429	452	429
query26	532	164	163	163
query27	1895	302	295	295
query28	6158	2314	2292	2292
query29	2968	2796	2763	2763
query30	242	169	166	166
query31	922	736	728	728
query32	72	63	60	60
query33	406	262	247	247
query34	851	475	482	475
query35	1141	929	940	929
query36	1354	1144	1264	1144
query37	88	60	61	60
query38	3093	2958	2936	2936
query39	1385	1350	1332	1332
query40	212	95	97	95
query41	40	37	36	36
query42	90	81	90	81
query43	639	564	616	564
query44	1139	728	732	728
query45	243	237	237	237
query46	1231	945	972	945
query47	1817	1807	1807	1807
query48	987	682	665	665
query49	634	391	390	390
query50	863	577	611	577
query51	4739	4696	4831	4696
query52	90	77	80	77
query53	443	325	320	320
query54	2678	2472	2492	2472
query55	91	77	92	77
query56	222	223	204	204
query57	1186	1062	1064	1062
query58	220	212	214	212
query59	3460	3241	3400	3241
query60	222	196	194	194
query61	99	103	100	100
query62	899	450	454	450
query63	492	339	352	339
query64	2329	1529	1500	1500
query65	3639	3595	3610	3595
query66	823	384	382	382
query67	15587	15442	15477	15442
query68	8804	662	658	658
query69	575	335	344	335
query70	1693	1650	1292	1292
query71	427	312	318	312
query72	6596	3523	3494	3494
query73	744	332	319	319
query74	6305	5865	5896	5865
query75	5482	3671	3796	3671
query76	5360	1157	1198	1157
query77	913	262	268	262
query78	12808	13584	11997	11997
query79	10041	667	686	667
query80	787	405	408	405
query81	490	243	239	239
query82	653	98	100	98
query83	163	138	132	132
query84	266	72	73	72
query85	799	325	321	321
query86	333	301	294	294
query87	3243	3073	3008	3008
query88	4781	2322	2342	2322
query89	393	318	284	284
query90	1914	212	208	208
query91	166	129	124	124
query92	57	53	52	52
query93	4373	596	559	559
query94	683	215	214	214
query95	1114	1091	1061	1061
query96	670	338	335	335
query97	6620	6377	6369	6369
query98	182	185	172	172
query99	2672	886	947	886
Total cold run time: 310945 ms
Total hot run time: 203266 ms

@doris-robot

ClickBench: Total hot run time: 30.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 53aeec1847f59150677766b8c8774a2b1624ac80, data reload: false

query1	0.03	0.02	0.03
query2	0.07	0.02	0.02
query3	0.24	0.04	0.05
query4	1.79	0.06	0.06
query5	0.54	0.53	0.52
query6	1.24	0.61	0.59
query7	0.02	0.01	0.02
query8	0.03	0.03	0.02
query9	0.53	0.50	0.47
query10	0.53	0.55	0.53
query11	0.12	0.09	0.09
query12	0.12	0.10	0.09
query13	0.63	0.62	0.62
query14	0.79	0.77	0.79
query15	0.80	0.76	0.76
query16	0.36	0.36	0.35
query17	1.02	1.02	1.03
query18	0.23	0.27	0.25
query19	1.91	1.78	1.77
query20	0.01	0.00	0.00
query21	15.46	0.57	0.59
query22	1.88	2.32	1.73
query23	17.21	0.97	0.83
query24	3.89	0.88	1.80
query25	0.38	0.10	0.05
query26	0.53	0.15	0.15
query27	0.04	0.03	0.04
query28	8.60	0.75	0.72
query29	12.64	2.32	2.22
query30	0.61	0.55	0.53
query31	2.81	0.40	0.38
query32	3.38	0.49	0.51
query33	3.13	3.04	3.10
query34	15.28	4.82	4.83
query35	4.87	4.88	4.86
query36	1.09	1.01	1.02
query37	0.06	0.04	0.05
query38	0.04	0.02	0.02
query39	0.02	0.02	0.01
query40	0.16	0.14	0.14
query41	0.06	0.01	0.01
query42	0.02	0.01	0.02
query43	0.02	0.02	0.02
Total cold run time: 103.19 s
Total hot run time: 30.56 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 53aeec1847f59150677766b8c8774a2b1624ac80 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.6 seconds inserted 10000000 Rows, about 462K ops/s

@xiaokang xiaokang merged commit adfef1d into apache:branch-2.0 Aug 14, 2024
@qidaye qidaye deleted the pick_is_array_check_2.0 branch August 14, 2024 08:07
mongo360 pushed a commit to mongo360/doris that referenced this pull request Dec 11, 2024
3 participants