[SPARK-44066][SQL] Support positional parameters in Scala/Java sql() #41568

MaxGekk wants to merge 25 commits into master

Conversation
```scala
CTERelationDef.curId.set(0)
val expected1 = parsePlan("WITH a AS (SELECT 1 c) SELECT * FROM a LIMIT 10").analyze
comparePlans(actual1, expected1)
// Ignore unused arguments
```
Can we add more negative test cases?
- Error out when there are fewer arguments than positional placeholders (`?`)
- Error out when the query text contains both named and positional params
> Error out when there are fewer arguments than positional placeholders (`?`)

@entong Such a test has been added already; please see test("non-substituted positional parameters") in ParametersSuite.
> Error out when the query text contains both named and positional params

I added such a test: "mixing of positional and named parameters". Thank you for the proposal, @entong. BTW, the existing API doesn't allow passing both named and positional arguments, but the query text might contain both kinds of parameters; the test checks that.
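To see both negative cases end to end, here is a small standalone sketch (my own illustration, not the suite's verbatim code; it assumes Spark 3.5+, where both `sql` overloads exist, and the `UNBOUND_SQL_PARAMETER` error class referenced later in this thread):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// 1. Fewer arguments than positional placeholders: the second `?` stays unbound.
try spark.sql("SELECT ?, ?", Array(1)).show()
catch { case e: AnalysisException => println(e.getErrorClass) } // expect UNBOUND_SQL_PARAMETER

// 2. Named arguments supplied while the text also contains a positional `?`.
try spark.sql("SELECT :param1, ?", Map("param1" -> 1)).show()
catch { case e: AnalysisException => println(e.getErrorClass) } // expect UNBOUND_SQL_PARAMETER
```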
```scala
case p @ NameParameterizedQuery(child, args) if !child.containsPattern(UNRESOLVED_WITH) =>
  checkArgs(args)
  val res = bind(child) { case NamedParameter(name) if args.contains(name) => args(name) }
  res.copyTagsFrom(p)
```
do we really need this? the function bind calls resolveExpressionsWithPruning, which automatically propagates the tree node tags

:-) Wenchen, you added this in the PR https://github.com/apache/spark/pull/40333/files#diff-0c6239dcd44391b82e8a2b0a1bc3c6210daae448be9caa3c81675922fab9699cR106. What was the reason?

it was probably an oversight...
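If it helps to see the tag-propagation point concretely, here is a toy sketch in plain Scala (hypothetical names; not Spark's actual TreeNode or resolveExpressionsWithPruning) of a transform helper that copies tags itself, which is what makes a caller-side copyTagsFrom redundant:

```scala
// Toy node with a tag map; Spark's TreeNode tags are similar in spirit.
case class Node(name: String, children: Seq[Node] = Nil,
                tags: Map[String, String] = Map.empty)

// Hypothetical transform that propagates tags from the old node to the
// new one, mimicking what resolveExpressionsWithPruning is said to do above.
def transformWithTags(n: Node)(rule: PartialFunction[Node, Node]): Node = {
  val withNewChildren = n.copy(children = n.children.map(transformWithTags(_)(rule)))
  val rewritten = rule.applyOrElse(withNewChildren, identity[Node])
  rewritten.copy(tags = rewritten.tags ++ n.tags) // tag propagation built in
}

val plan = Node("query", Seq(Node("param")), tags = Map("origin" -> "line 1"))
val bound = transformWithTags(plan) { case Node("param", c, t) => Node("literal", c, t) }
assert(bound.tags("origin") == "line 1") // no caller-side copyTagsFrom needed
```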
```scala
/**
 * The expression represents a positional parameter that should be replaced by a literal.
 *
 * @param pos An unique position of the parameter in a SQL query.
```

Suggested change:

```diff
- * @param pos An unique position of the parameter in a SQL query.
+ * @param pos An unique position of the parameter in a SQL query text.
```
```scala
val positions = scala.collection.mutable.Set.empty[Int]
bind(child) { case p @ PosParameter(pos) => positions.add(pos); p }
val posToIndex = positions.toSeq.sorted.zipWithIndex.toMap
```
shall we fail earlier if the number of parameters does not match the number of actual arguments?

Do you mean exact matching? The current approach is consistent with named parameters, where the map can contain arguments that are not used in the query. This can open a door for additional use cases, from my point of view.

not an exact match; only when there are too few arguments for the parameters.

maybe it's fine to fail with unbound parameters.
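To see concretely what the snippet above computes, here is a standalone sketch (plain Scala, not the Catalyst code itself; character offsets of `?` stand in for the parser-provided positions used above):

```scala
val sqlText = "SELECT * FROM tbl WHERE date > ? LIMIT ?"
val args = Seq("2023-06-15", 100)

// Collect the offsets of all positional parameters, standing in for bind()'s traversal.
val positions = scala.collection.mutable.Set.empty[Int]
var i = sqlText.indexOf('?')
while (i >= 0) { positions.add(i); i = sqlText.indexOf('?', i + 1) }

// Sorting the offsets and zipping with indexes binds the first `?` to args(0),
// the second to args(1), and so on, regardless of the traversal order.
val posToIndex = positions.toSeq.sorted.zipWithIndex.toMap
positions.toSeq.sorted.foreach { pos =>
  println(s"parameter at offset $pos -> args(${posToIndex(pos)}) = ${args(posToIndex(pos))}")
}
```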
```scala
exception = intercept[AnalysisException] {
  spark.sql("select :param1, ?", Map("param1" -> 1))
},
errorClass = "UNBOUND_SQL_PARAMETER",
```
shall we throw a better error message, saying that we found positional parameters while binding named parameters?
Merging to master. Thank you, @entong @cloud-fan, for the review.
### What changes were proposed in this pull request?
In the PR, I propose to add a sequence of literal expressions to
1. Proto API: `SqlCommand` and the `SQL` relation
2. Scala connect API: `SparkSession.sql`

This PR is a follow-up of #41568.

### Why are the changes needed?
Currently `SparkSession.sql` in Spark Connect doesn't support parameterized SQL with positional parameters. The changes allow achieving feature parity with the Scala/Java/PySpark APIs.

### Does this PR introduce _any_ user-facing change?
No, the changes just extend the existing API.

### How was this patch tested?
By running the new test:
```
$ build/sbt "test:testOnly *.ClientE2ETestSuite"
```

Closes #41698 from MaxGekk/parameterized-query-pos-param-proto.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
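For context, a minimal sketch of the client-side usage this follow-up enables (my own example; it assumes the Spark Connect Scala client on the classpath and a Connect server at the placeholder address):

```scala
import org.apache.spark.sql.SparkSession

// Connect to a Spark Connect server; the address is a placeholder.
val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()

// Positional arguments now travel through the proto API to the server.
spark.sql("SELECT * FROM range(10) WHERE id > ? LIMIT ?", Array(3, 5)).show()
```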
…sql` API GA

### What changes were proposed in this pull request?
This PR aims to make `Parameterized SQL queries` of the `SparkSession.sql` API GA in Apache Spark 4.0.0.

### Why are the changes needed?
Apache Spark has supported `Parameterized SQL queries` because they are very convenient for users.
- #38864 (Since Spark 3.4.0)
- #41568 (Since Spark 3.5.0)

It's time to make this feature GA by removing the `Experimental` tags, since it has been serving well for a long time.

### Does this PR introduce _any_ user-facing change?
No, there is no behavior change.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #48965 from dongjoon-hyun/SPARK-50422.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
### What changes were proposed in this pull request?
This PR aims to support `Parameterized SQL queries` in the `sql` API.

### Why are the changes needed?
For feature parity, we had better support this GA feature.
- apache/spark#38864 (Since Spark 3.4.0)
- apache/spark#40623 (Since Spark 3.4.0)
- apache/spark#41568 (Since Spark 3.5.0)
- apache/spark#48965 (GA since Spark 4.0.0)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #103 from dongjoon-hyun/SPARK-51986.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
In the PR, I propose to extend the SparkSession API and overload the `sql` method with `def sql(sqlText: String, args: Array[_]): DataFrame`, which accepts an array of Java/Scala objects that can be converted to SQL literal expressions.

The first argument `sqlText` might have positional parameters in the positions of constants such as literal values. A value can also be a `Column` of a literal expression, in which case it is taken as is. For example:

```scala
spark.sql(
  sqlText = "SELECT * FROM tbl WHERE date > ? LIMIT ?",
  args = Array(LocalDate.of(2023, 6, 15), 100))
```

The new `sql()` method parses the input SQL statement and replaces the positional parameters by the literal values.
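To illustrate the `Column` case, a small sketch of my own (not from the PR; it assumes Spark 3.5+) passing a `Column` literal built with `functions.lit` as one of the positional arguments:

```scala
import java.time.LocalDate
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// The first `?` binds to a Column literal (taken as is), the second to an Int.
spark.sql(
  sqlText = "SELECT ? AS d, ? AS n",
  args = Array(lit(LocalDate.of(2023, 6, 15)), 100)
).show()
```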
### Why are the changes needed?
- To conform to the SQL standard and the JDBC/ODBC protocol.
- To improve user experience with Spark SQL via
- To achieve feature parity with other systems that support positional parameters.
### Does this PR introduce _any_ user-facing change?
No, the changes extend the existing API.
### How was this patch tested?
By running the new tests:

and the affected test suites: