
[feature](Load)(step1) support nereids load, add load grammar #23485

Merged

morningman merged 25 commits into apache:master from wsjz:nereids_load on Sep 20, 2023

Conversation

@wsjz
Contributor

@wsjz wsjz commented Aug 25, 2023

Proposed changes

Support the Nereids LOAD grammar.

We will convert the BROKER LOAD statement into an INSERT INTO clause:

  1. Rename broker load to bulk load.
  2. Add the LOAD grammar to the Nereids optimizer.
  3. Convert it into an INSERT INTO clause with a table-valued function.
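The three steps above amount to a string-level rewrite. The sketch below is a hedged illustration only: the class name `LoadToInsertSketch`, the `rewrite` method, and the `s3` table-valued function are hypothetical stand-ins, not the actual Doris planner classes.

```java
// Hypothetical sketch: a bulk LOAD over a file source becomes
// INSERT INTO ... SELECT from a table-valued function reading the same source.
// Names here are illustrative, not the real Doris implementation.
public class LoadToInsertSketch {
    // Build the INSERT INTO text for a single-source bulk load.
    // "s3" stands in for whichever table-valued function matches the source.
    static String rewrite(String targetTable, String uri, String format) {
        return "INSERT INTO " + targetTable
                + " SELECT * FROM s3('uri' = '" + uri + "', 'format' = '" + format + "')";
    }
}
```

The real conversion operates on the parsed plan rather than on strings, but the target shape of the generated statement is the same.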

#24221


@github-actions
Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck' found no issues.

shfmt errors

'shfmt' returned error 1, finding the following formatting issues:

----------
--- bin/start_be.sh.orig
+++ bin/start_be.sh
@@ -82,13 +82,13 @@
 preload_jars+=("java-udf")
 
 for preload_jar_dir in "${preload_jars[@]}"; do
-  for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
-    if [[ -z "${DORIS_CLASSPATH}" ]]; then
-        export DORIS_CLASSPATH="${f}"
-    else
-        export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
-    fi
-  done
+    for f in "${DORIS_HOME}/lib/java_extensions/${preload_jar_dir}"/*.jar; do
+        if [[ -z "${DORIS_CLASSPATH}" ]]; then
+            export DORIS_CLASSPATH="${f}"
+        else
+            export DORIS_CLASSPATH="${DORIS_CLASSPATH}:${f}"
+        fi
+    done
 done
 
 if [[ -d "${DORIS_HOME}/lib/hadoop_hdfs/" ]]; then
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt -w filename


Comment threads (outdated) on:
- fe/fe-core/src/main/java/org/apache/doris/analysis/BulkDesc.java
- fe/fe-core/src/main/java/org/apache/doris/analysis/BulkLoadDataDesc.java
- fe/fe-core/src/main/java/org/apache/doris/analysis/NereidsLoadStmt.java
- fe/fe-core/src/main/java/org/apache/doris/analysis/BulkStorageDesc.java
private String rewriteByPrecedingFilter(Expression whereExpr, Expression precedingFilterExpr) {
    return whereExpr.toSql();
}
Contributor

I am not sure that toSql() recovers the original SQL string exactly.
How about just saving the original SQL string when parsing the LOAD statement?
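The suggestion above, keeping the verbatim text captured at parse time instead of regenerating it from the AST, can be sketched as follows. This is an illustrative sketch under assumed names (`ParsedFilter` is hypothetical, not an actual Doris class):

```java
// Illustrative sketch: store the verbatim SQL text captured by the parser
// and return it unchanged, instead of reconstructing it with an
// AST-to-SQL round trip that may not be lossless.
public class ParsedFilter {
    private final String originalSql; // exact text the user wrote

    public ParsedFilter(String originalSql) {
        this.originalSql = originalSql;
    }

    // No round-trip loss: returns the original text byte-for-byte.
    public String toSql() {
        return originalSql;
    }
}
```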

Contributor Author

No effect.

@wsjz
Contributor Author

wsjz commented Sep 4, 2023

run buildall

@wsjz wsjz marked this pull request as ready for review September 5, 2023 06:14
@wsjz wsjz changed the title [feature](Load)support nereids load [feature](Load)(step1)support nereids load, add load grammar Sep 5, 2023
@wsjz
Contributor Author

wsjz commented Sep 5, 2023

run buildall

@wsjz
Contributor Author

wsjz commented Sep 7, 2023

run buildall

@wsjz
Contributor Author

wsjz commented Sep 7, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.87 seconds
stream load tsv: 527 seconds loaded 74807831229 Bytes, about 135 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17161920596 Bytes

@wsjz
Contributor Author

wsjz commented Sep 8, 2023

run buildall

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.19 seconds
stream load tsv: 619 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162623572 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run p0

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.61 seconds
stream load tsv: 613 seconds loaded 74807831229 Bytes, about 116 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17162186050 Bytes

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.45 seconds
stream load tsv: 620 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162257137 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 49.27 seconds
stream load tsv: 616 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162495225 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.09 seconds
stream load tsv: 596 seconds loaded 74807831229 Bytes, about 119 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162110312 Bytes

@wsjz
Contributor Author

wsjz commented Sep 18, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.78 seconds
stream load tsv: 595 seconds loaded 74807831229 Bytes, about 119 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162426406 Bytes

Comment threads on fe/fe-core/src/main/antlr4/org/apache/doris/nereids/DorisParser.g4 (two outdated)
@wsjz
Contributor Author

wsjz commented Sep 19, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 48.44 seconds
stream load tsv: 600 seconds loaded 74807831229 Bytes, about 118 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162456442 Bytes

@wsjz
Contributor Author

wsjz commented Sep 20, 2023

run buildall

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 46.57 seconds
stream load tsv: 616 seconds loaded 74807831229 Bytes, about 115 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162201019 Bytes

Contributor

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 20, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 677b342 into apache:master Sep 20, 2023

Labels

approved: Indicates a PR has been approved by one committer.
reviewed


5 participants