[SPARK-44066][SQL] Support positional parameters in Scala/Java sql()#41568

Closed
MaxGekk wants to merge 25 commits into
apache:masterfrom
MaxGekk:parametrized-query-pos-param

Conversation

Member

@MaxGekk MaxGekk commented Jun 13, 2023

What changes were proposed in this pull request?

In the PR, I propose to extend the SparkSession API by overloading the sql method:

  def sql(sqlText: String, args: Array[_]): DataFrame

which accepts an array of Java/Scala objects that can be converted to SQL literal expressions.

The first argument sqlText may contain positional parameters (?) in the positions of constants such as literal values. A value can also be a Column of a literal expression, in which case it is taken as is.

For example:

  spark.sql(
    sqlText = "SELECT * FROM tbl WHERE date > ? LIMIT ?",
    args = Array(LocalDate.of(2023, 6, 15), 100))

The new sql() method parses the input SQL statement and replaces the positional parameters with the literal values.
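As the description notes, a value can also be a Column of a literal expression. A hedged sketch of that case (assuming a running SparkSession named spark and an existing table tbl; not from the PR itself):

```scala
import java.time.LocalDate
import org.apache.spark.sql.functions.lit

// Same query as the example above, but the arguments are wrapped in Columns
// of literal expressions, which the binder takes as is.
val df = spark.sql(
  "SELECT * FROM tbl WHERE date > ? LIMIT ?",
  Array(lit(LocalDate.of(2023, 6, 15)), lit(100)))
```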

Why are the changes needed?

  1. To conform to the SQL standard and the JDBC/ODBC protocol.

  2. To improve user experience with Spark SQL via

    • Using Spark as a remote service (microservice).
    • Writing SQL code that will power reports, dashboards, charts, and other data presentation solutions that need to account for criteria modifiable by users through an interface.
    • Building a generic integration layer based on the SQL API. The goal is to expose managed data to a wide application ecosystem with a microservice architecture. It is only natural in such a setup to ask for modular and reusable SQL code that can be executed repeatedly with different parameter values.
  3. To achieve feature parity with other systems that support positional parameters.

Does this PR introduce any user-facing change?

No, the changes extend the existing API.

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *AnalysisSuite"
$ build/sbt "test:testOnly *PlanParserSuite"
$ build/sbt "test:testOnly *ParametersSuite"

and the affected test suites:

$ build/sbt "sql/testOnly *QueryExecutionErrorsSuite"

@MaxGekk MaxGekk changed the title [WIP][SQL] Support positional parameters in parameterized query [WIP][SPARK-44066][SQL] Support positional parameters in parameterized query Jun 15, 2023
@MaxGekk MaxGekk changed the title [WIP][SPARK-44066][SQL] Support positional parameters in parameterized query [SPARK-44066][SQL] Support positional parameters in Scala/Java sql() Jun 15, 2023
@MaxGekk MaxGekk marked this pull request as ready for review June 15, 2023 14:59
CTERelationDef.curId.set(0)
val expected1 = parsePlan("WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT 10").analyze
comparePlans(actual1, expected1)
// Ignore unused arguments

Can we add more negative test cases?

  • Error out when there are fewer arguments than positional placeholders (?)
  • Error out when the query text contains both named and positional params

Member Author

@MaxGekk MaxGekk Jun 15, 2023


Error out when there are fewer arguments than positional placeholders (?)

@entong Such a test has already been added, see test("non-substituted positional parameters") in ParametersSuite, please.
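A hedged sketch of that negative case (the actual test body in ParametersSuite may differ; spark and ScalaTest's intercept are assumed to be in scope): passing fewer arguments than there are ? placeholders leaves a parameter unbound, so analysis fails with the UNBOUND_SQL_PARAMETER error class mentioned later in this thread.

```scala
import org.apache.spark.sql.AnalysisException

// Two placeholders but only one argument: the second '?' stays unbound.
val e = intercept[AnalysisException] {
  spark.sql("SELECT ? < ?", Array(1)).collect()
}
assert(e.getErrorClass == "UNBOUND_SQL_PARAMETER")
```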

Member Author


Error out when the query text contains both named and positional params

I added such a test: "mixing of positional and named parameters". Thank you for the proposal, @entong. BTW, the existing API doesn't allow passing both named and positional arguments, but the query might contain both kinds of parameters. The test checks that.

@MaxGekk MaxGekk requested a review from cloud-fan June 15, 2023 18:31
case p @ NameParameterizedQuery(child, args) if !child.containsPattern(UNRESOLVED_WITH) =>
checkArgs(args)
val res = bind(child) { case NamedParameter(name) if args.contains(name) => args(name) }
res.copyTagsFrom(p)
Contributor


Do we really need this? The function bind calls resolveExpressionsWithPruning, which automatically propagates the tree node tags.

Contributor


it was probably an oversight...

Member Author


I removed it.

/**
* The expression represents a positional parameter that should be replaced by a literal.
*
* @param pos A unique position of the parameter in a SQL query.
Contributor


Suggested change
* @param pos A unique position of the parameter in a SQL query.
* @param pos A unique position of the parameter in a SQL query text.

Member Author


done


val positions = scala.collection.mutable.Set.empty[Int]
bind(child) { case p @ PosParameter(pos) => positions.add(pos); p }
val posToIndex = positions.toSeq.sorted.zipWithIndex.toMap
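The three lines above can be illustrated in plain Scala (no Spark needed): the unique ? offsets collected while binding are sorted and zipped with indices, so the i-th placeholder by text position is bound to args(i). The offsets below are hypothetical:

```scala
// Hypothetical character offsets of '?' markers gathered during bind().
val positions = scala.collection.mutable.Set(42, 7, 23)
// Sort by offset and assign argument indices in textual order.
val posToIndex = positions.toSeq.sorted.zipWithIndex.toMap
// The earliest '?' binds to args(0), the next to args(1), and so on.
assert(posToIndex == Map(7 -> 0, 23 -> 1, 42 -> 2))
```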
Contributor


shall we fail earlier if the number of parameters does not match the number of actual arguments?

Member Author


Do you mean exact matching? The current approach is consistent with named parameters, where the map can contain arguments not used in the query. This can open the door to additional use cases, from my point of view.

Contributor


Not an exact match. I mean when the number of arguments is too small.

Contributor


maybe it's fine to fail with unbound parameters.

exception = intercept[AnalysisException] {
spark.sql("select :param1, ?", Map("param1" -> 1))
},
errorClass = "UNBOUND_SQL_PARAMETER",
Contributor


shall we throw a better error message, saying that we found positional parameters while binding named parameters?

Member Author

MaxGekk commented Jun 22, 2023

Merging to master. Thank you, @entong @cloud-fan for review.

@MaxGekk MaxGekk closed this in 1b4048b Jun 22, 2023
dongjoon-hyun pushed a commit that referenced this pull request Jun 25, 2023
### What changes were proposed in this pull request?
In the PR, I propose to add a sequence of literal expressions to
1. Proto API: `SqlCommand` and the `SQL` relation
2. Scala connect API: `SparkSession.sql`

This PR is a follow up of #41568.

### Why are the changes needed?
Currently `SparkSession.sql` in Spark Connect doesn't support parameterized SQL with positional parameters. The changes achieve feature parity with the Scala/Java/PySpark APIs.

### Does this PR introduce _any_ user-facing change?
No, the changes just extend the existing API.

### How was this patch tested?
By running new test:
```
$ build/sbt "test:testOnly *.ClientE2ETestSuite"
```

Closes #41698 from MaxGekk/parameterized-query-pos-param-proto.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
MaxGekk pushed a commit that referenced this pull request Nov 26, 2024
…sql` API GA

### What changes were proposed in this pull request?

This PR aims to make `Parameterized SQL queries` of `SparkSession.sql` API GA in Apache Spark 4.0.0.

### Why are the changes needed?

Apache Spark has supported `Parameterized SQL queries` because they are very convenient for users.
- #38864 (Since Spark 3.4.0)
- #41568 (Since Spark 3.5.0)

It's time to make it GA by removing the `Experimental` tags, since this feature has served well for a long time.

### Does this PR introduce _any_ user-facing change?

No, there is no behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48965 from dongjoon-hyun/SPARK-50422.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
dongjoon-hyun added a commit to apache/spark-connect-swift that referenced this pull request May 3, 2025
### What changes were proposed in this pull request?

This PR aims to support `Parameterized SQL queries` in `sql` API.

### Why are the changes needed?

For feature parity, we had better support this GA feature.

- apache/spark#38864 (Since Spark 3.4.0)
- apache/spark#40623 (Since Spark 3.4.0)
- apache/spark#41568 (Since Spark 3.5.0)
- apache/spark#48965 (GA Since Spark 4.0.0)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #103 from dongjoon-hyun/SPARK-51986.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>