
[feature](Load)(step1) support nereids load, add load grammar #23485

Merged

morningman merged 25 commits into apache:master from wsjz:nereids_load on Sep 20, 2023

Conversation

@wsjz
Contributor

@wsjz wsjz commented Aug 25, 2023

Proposed changes

Support the Nereids LOAD grammar.

We will convert the BROKER LOAD statement into an INSERT INTO clause:

  1. Rename broker load to bulk load.
  2. Add the LOAD grammar to the Nereids optimizer.
  3. Convert it into an INSERT INTO clause with a table-valued function.
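The three steps above amount to a string-level rewrite. The sketch below is a hedged illustration only: the class name `LoadToInsertSketch`, the `rewrite` method, and the `s3` table-valued function are hypothetical stand-ins, not the actual Doris planner classes.

```java
// Hypothetical sketch: a bulk LOAD over a file source becomes
// INSERT INTO ... SELECT from a table-valued function reading the same source.
// Names here are illustrative, not the real Doris implementation.
public class LoadToInsertSketch {
    // Build the INSERT INTO text for a single-source bulk load.
    // "s3" stands in for whichever table-valued function matches the source.
    static String rewrite(String targetTable, String uri, String format) {
        return "INSERT INTO " + targetTable
                + " SELECT * FROM s3('uri' = '" + uri + "', 'format' = '" + format + "')";
    }
}
```

The real conversion operates on the parsed plan rather than on strings, but the target shape of the generated statement is the same.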

#24221


@github-actions
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck' found no issues.

shfmt errors

'shfmt' returned error 1, finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt -w filename


Comment threads (outdated) on:
- fe/fe-core/src/main/java/org/apache/doris/analysis/BulkDesc.java
- fe/fe-core/src/main/java/org/apache/doris/analysis/BulkLoadDataDesc.java
- fe/fe-core/src/main/java/org/apache/doris/analysis/NereidsLoadStmt.java
- fe/fe-core/src/main/java/org/apache/doris/analysis/BulkStorageDesc.java
private String rewriteByPrecedingFilter(Expression whereExpr, Expression precedingFilterExpr) {
    return whereExpr.toSql();
}
Contributor

I am not sure that toSql() recovers the original SQL string exactly.
How about just saving the original SQL string when parsing the LOAD statement?
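The suggestion above, keeping the verbatim text captured at parse time instead of regenerating it from the AST, can be sketched as follows. This is an illustrative sketch under assumed names (`ParsedFilter` is hypothetical, not an actual Doris class):

```java
// Illustrative sketch: store the verbatim SQL text captured by the parser
// and return it unchanged, instead of reconstructing it with an
// AST-to-SQL round trip that may not be lossless.
public class ParsedFilter {
    private final String originalSql; // exact text the user wrote

    public ParsedFilter(String originalSql) {
        this.originalSql = originalSql;
    }

    // No round-trip loss: returns the original text byte-for-byte.
    public String toSql() {
        return originalSql;
    }
}
```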

Contributor Author

No effect.

@wsjz
Contributor Author

wsjz commented Sep 4, 2023

run buildall

@wsjz wsjz marked this pull request as ready for review September 5, 2023 06:14
@wsjz wsjz changed the title [feature](Load)support nereids load [feature](Load)(step1)support nereids load, add load grammar Sep 5, 2023
@wsjz
Contributor Author

wsjz commented Sep 5, 2023

run buildall

@wsjz
Contributor Author

wsjz commented Sep 7, 2023

run buildall

@wsjz
Contributor Author

wsjz commented Sep 7, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.87 seconds
stream load tsv: 527 seconds loaded 74807831229 Bytes, about 135 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17161920596 Bytes

@wsjz
Contributor Author

wsjz commented Sep 8, 2023

run buildall

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.19 seconds
stream load tsv: 619 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162623572 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run p0

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.61 seconds
stream load tsv: 613 seconds loaded 74807831229 Bytes, about 116 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17162186050 Bytes

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.45 seconds
stream load tsv: 620 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162257137 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 49.27 seconds
stream load tsv: 616 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162495225 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.09 seconds
stream load tsv: 596 seconds loaded 74807831229 Bytes, about 119 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162110312 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.78 seconds
stream load tsv: 595 seconds loaded 74807831229 Bytes, about 119 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162426406 Bytes

Comment threads on fe/fe-core/src/main/antlr4/org/apache/doris/nereids/DorisParser.g4 (two outdated)
@wsjz
Contributor Author

wsjz commented Sep 19, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 48.44 seconds
stream load tsv: 600 seconds loaded 74807831229 Bytes, about 118 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162456442 Bytes

@wsjz
Contributor Author

wsjz commented Sep 20, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.57 seconds
stream load tsv: 616 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162201019 Bytes

Contributor

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 20, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 677b342 into apache:master Sep 20, 2023

Labels

approved: Indicates a PR has been approved by one committer.
reviewed


5 participants