[SPARK-44066][SQL] Support positional parameters in Scala/Java sql()#41568

Closed
MaxGekk wants to merge 25 commits into
apache:masterfrom
MaxGekk:parametrized-query-pos-param

Conversation

Member

@MaxGekk MaxGekk commented Jun 13, 2023

What changes were proposed in this pull request?

In the PR, I propose to extend the SparkSession API by overloading the sql method:

  def sql(sqlText: String, args: Array[_]): DataFrame

which accepts an array of Java/Scala objects that can be converted to SQL literal expressions.

The first argument sqlText may contain positional parameters (?) in the positions of constants such as literal values. A value can also be a Column of a literal expression, in which case it is taken as is.

For example:

  spark.sql(
    sqlText = "SELECT * FROM tbl WHERE date > ? LIMIT ?",
    args = Array(LocalDate.of(2023, 6, 15), 100))

The new sql() method parses the input SQL statement and replaces the positional parameters with the literal values.
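As the description notes, a value can also be a Column of a literal expression. A hedged sketch of that case (assuming a running SparkSession named spark and an existing table tbl; not from the PR itself):

```scala
import java.time.LocalDate
import org.apache.spark.sql.functions.lit

// Same query as the example above, but the arguments are wrapped in Columns
// of literal expressions, which the binder takes as is.
val df = spark.sql(
  "SELECT * FROM tbl WHERE date > ? LIMIT ?",
  Array(lit(LocalDate.of(2023, 6, 15)), lit(100)))
```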

Why are the changes needed?

  1. To conform to the SQL standard and the JDBC/ODBC protocol.

  2. To improve user experience with Spark SQL via

    • Using Spark as a remote service (microservice).
    • Writing SQL code that will power reports, dashboards, charts, and other data presentation solutions that need to account for criteria modifiable by users through an interface.
    • Building a generic integration layer based on the SQL API. The goal is to expose managed data to a wide application ecosystem with a microservice architecture. It is only natural in such a setup to ask for modular and reusable SQL code that can be executed repeatedly with different parameter values.
  3. To achieve feature parity with other systems that support positional parameters.

Does this PR introduce any user-facing change?

No, the changes extend the existing API.

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *AnalysisSuite"
$ build/sbt "test:testOnly *PlanParserSuite"
$ build/sbt "test:testOnly *ParametersSuite"

and the affected test suites:

$ build/sbt "sql/testOnly *QueryExecutionErrorsSuite"

@MaxGekk MaxGekk changed the title [WIP][SQL] Support positional parameters in parameterized query [WIP][SPARK-44066][SQL] Support positional parameters in parameterized query Jun 15, 2023
@MaxGekk MaxGekk changed the title [WIP][SPARK-44066][SQL] Support positional parameters in parameterized query [SPARK-44066][SQL] Support positional parameters in Scala/Java sql() Jun 15, 2023
@MaxGekk MaxGekk marked this pull request as ready for review June 15, 2023 14:59
CTERelationDef.curId.set(0)
val expected1 = parsePlan("WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT 10").analyze
comparePlans(actual1, expected1)
// Ignore unused arguments

Can we add more negative test cases?

  • Error out when there are fewer arguments than positional placeholders (?)
  • Error out when the query text contains both named and positional params

Member Author

@MaxGekk MaxGekk Jun 15, 2023


Error out when there are fewer arguments than positional placeholders (?)

@entong Such a test has already been added, see test("non-substituted positional parameters") in ParametersSuite, please.
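A hedged sketch of that negative case (the actual test body in ParametersSuite may differ; spark and ScalaTest's intercept are assumed to be in scope): passing fewer arguments than there are ? placeholders leaves a parameter unbound, so analysis fails with the UNBOUND_SQL_PARAMETER error class mentioned later in this thread.

```scala
import org.apache.spark.sql.AnalysisException

// Two placeholders but only one argument: the second '?' stays unbound.
val e = intercept[AnalysisException] {
  spark.sql("SELECT ? < ?", Array(1)).collect()
}
assert(e.getErrorClass == "UNBOUND_SQL_PARAMETER")
```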

Member Author


Error out when the query text contains both named and positional params

I added such a test: "mixing of positional and named parameters". Thank you for the proposal, @entong. BTW, the existing API doesn't allow passing both named and positional arguments, but the query might contain both kinds of parameters. The test checks that.

@MaxGekk MaxGekk requested a review from cloud-fan June 15, 2023 18:31
case p @ NameParameterizedQuery(child, args) if !child.containsPattern(UNRESOLVED_WITH) =>
checkArgs(args)
val res = bind(child) { case NamedParameter(name) if args.contains(name) => args(name) }
res.copyTagsFrom(p)
Contributor


Do we really need this? The function bind calls resolveExpressionsWithPruning, which automatically propagates the tree node tags.

Contributor


it was probably an oversight...

Member Author


I removed it.

/**
* The expression represents a positional parameter that should be replaced by a literal.
*
* @param pos A unique position of the parameter in a SQL query.
Contributor


Suggested change
* @param pos A unique position of the parameter in a SQL query.
* @param pos A unique position of the parameter in a SQL query text.

Member Author


done


val positions = scala.collection.mutable.Set.empty[Int]
bind(child) { case p @ PosParameter(pos) => positions.add(pos); p }
val posToIndex = positions.toSeq.sorted.zipWithIndex.toMap
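The three lines above can be illustrated in plain Scala (no Spark needed): the unique ? offsets collected while binding are sorted and zipped with indices, so the i-th placeholder by text position is bound to args(i). The offsets below are hypothetical:

```scala
// Hypothetical character offsets of '?' markers gathered during bind().
val positions = scala.collection.mutable.Set(42, 7, 23)
// Sort by offset and assign argument indices in textual order.
val posToIndex = positions.toSeq.sorted.zipWithIndex.toMap
// The earliest '?' binds to args(0), the next to args(1), and so on.
assert(posToIndex == Map(7 -> 0, 23 -> 1, 42 -> 2))
```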
Contributor


shall we fail earlier if the number of parameters does not match the number of actual arguments?

Member Author


Do you mean exact matching? The current approach is consistent with named parameters, where the map can contain arguments not used in the query. This can open the door to additional use cases, from my point of view.

Contributor


Not an exact match. I mean when the number of arguments is too small.

Contributor


maybe it's fine to fail with unbound parameters.

exception = intercept[AnalysisException] {
spark.sql("select :param1, ?", Map("param1" -> 1))
},
errorClass = "UNBOUND_SQL_PARAMETER",
Contributor


shall we throw a better error message, saying that we found positional parameters while binding named parameters?

Member Author

MaxGekk commented Jun 22, 2023

Merging to master. Thank you, @entong @cloud-fan for review.

@MaxGekk MaxGekk closed this in 1b4048b Jun 22, 2023
dongjoon-hyun pushed a commit that referenced this pull request Jun 25, 2023
### What changes were proposed in this pull request?
In the PR, I propose to add a sequence of literal expressions to
1. Proto API: `SqlCommand` and the `SQL` relation
2. Scala connect API: `SparkSession.sql`

This PR is a follow up of #41568.

### Why are the changes needed?
Currently `SparkSession.sql` in Spark Connect doesn't support parameterized SQL with positional parameters. The changes achieve feature parity with the Scala/Java/PySpark APIs.

### Does this PR introduce _any_ user-facing change?
No, the changes just extend the existing API.

### How was this patch tested?
By running new test:
```
$ build/sbt "test:testOnly *.ClientE2ETestSuite"
```

Closes #41698 from MaxGekk/parameterized-query-pos-param-proto.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
MaxGekk pushed a commit that referenced this pull request Nov 26, 2024
…sql` API GA

### What changes were proposed in this pull request?

This PR aims to make `Parameterized SQL queries` of `SparkSession.sql` API GA in Apache Spark 4.0.0.

### Why are the changes needed?

Apache Spark has supported `Parameterized SQL queries` because they are very convenient for users.
- #38864 (Since Spark 3.4.0)
- #41568 (Since Spark 3.5.0)

It's time to make it GA by removing the `Experimental` tags, since this feature has served well for a long time.

### Does this PR introduce _any_ user-facing change?

No, there is no behavior change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48965 from dongjoon-hyun/SPARK-50422.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
dongjoon-hyun added a commit to apache/spark-connect-swift that referenced this pull request May 3, 2025
### What changes were proposed in this pull request?

This PR aims to support `Parameterized SQL queries` in `sql` API.

### Why are the changes needed?

For feature parity, we had better support this GA feature.

- apache/spark#38864 (Since Spark 3.4.0)
- apache/spark#40623 (Since Spark 3.4.0)
- apache/spark#41568 (Since Spark 3.5.0)
- apache/spark#48965 (GA Since Spark 4.0.0)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #103 from dongjoon-hyun/SPARK-51986.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>