
[SPARK-41666][PYTHON] Support parameterized SQL by sql() #39183

Closed
MaxGekk wants to merge 8 commits into apache:master from MaxGekk:parameterized-sql-pyspark-dict


@MaxGekk (Member) commented Dec 22, 2022

What changes were proposed in this pull request?

In this PR, I propose to extend the sql() method in PySpark to support parameterized SQL queries (see #38864) by adding a new parameter, args, of type Dict[str, str]. This parameter maps named parameters that can occur in the input SQL query (written as :name) to SQL literals such as 1, INTERVAL '1-1' YEAR TO MONTH, and DATE'2022-12-22' (see the documentation of supported literals).

For example:

    >>> spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": "7"}).show()
    +---+
    | id|
    +---+
    |  8|
    |  9|
    +---+
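The contract of args can be pictured with a small pure-Python sketch. It is a hypothetical illustration, not PySpark's implementation: the helpers named_markers and check_args are invented for this example, and the regular expression is a naive stand-in for Spark's SQL parser (it would also match text inside string literals or comments). It only shows the shape described above: every :name marker in the query needs a Dict[str, str] entry whose value is the text of a SQL literal.

```python
import re
from typing import Dict, List

# Naive pattern for named parameter markers such as :minId.
# Assumption: Spark finds markers with its SQL parser; this regex is only a
# stand-in and would also match text inside string literals or comments.
_MARKER = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def named_markers(query: str) -> List[str]:
    """Return the distinct parameter names used in the query, in first-seen order."""
    names: List[str] = []
    for name in _MARKER.findall(query):
        if name not in names:
            names.append(name)
    return names


def check_args(query: str, args: Dict[str, str]) -> None:
    """Hypothetical check: args is Dict[str, str] and binds every marker."""
    for key, value in args.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"args must be Dict[str, str], got {key!r}: {value!r}")
    unbound = [name for name in named_markers(query) if name not in args]
    if unbound:
        raise ValueError(f"unbound parameters: {unbound}")


query = "SELECT * FROM range(10) WHERE id > :minId"
check_args(query, {"minId": "7"})  # passes: "7" is the text of the literal 7
print(named_markers(query))  # ['minId']
```

In a real session the same dictionary is passed straight to sql(), as in spark.sql(query, args={"minId": "7"}); the values stay strings because each one stands for a SQL literal rather than a Python value.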

Closes #39159

Why are the changes needed?

To achieve feature parity with the Scala/Java API and provide PySpark users with the same feature.

Does this PR introduce any user-facing change?

No, it shouldn't.

How was this patch tested?

Checked the examples locally and ran the tests:

$ python/run-tests --modules=pyspark-sql --parallelism=1

Review thread on the INVALID_SQL_ARG error message:

     "INVALID_SQL_ARG" : {
       "message" : [
    -    "The argument <name> of `sql()` is invalid. Consider to replace it by a SQL literal statement."
    +    "The argument <name> of `sql()` is invalid. Consider to replace it by a SQL literal."
@MaxGekk (Member Author): I have changed this to address the review comment at #38864 (comment)
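The INVALID_SQL_ARG message suggests that each value in args must be a SQL literal. Below is a rough, hypothetical sketch of such a check: the function names are invented, and the regular expressions cover only the literal forms named in the PR description plus quoted strings, whereas Spark parses each value with its own SQL parser and accepts many more literal kinds. The error text reuses the revised message template with <name> filled in.

```python
import re
from typing import Dict

# Hypothetical subset of SQL literal syntax; Spark's parser accepts far more.
_LITERAL_PATTERNS = [
    re.compile(r"\d+"),                                    # 1, 42
    re.compile(r"'(?:[^'\\]|\\.)*'"),                      # 'Steven'
    re.compile(r"DATE\s*'\d{4}-\d{2}-\d{2}'", re.I),       # DATE'2022-12-22'
    re.compile(r"INTERVAL\s*'\d+-\d+'\s+YEAR\s+TO\s+MONTH", re.I),  # INTERVAL '1-1' YEAR TO MONTH
]


def looks_like_literal(text: str) -> bool:
    """True if the whole (stripped) string matches one of the toy literal patterns."""
    stripped = text.strip()
    return any(p.fullmatch(stripped) for p in _LITERAL_PATTERNS)


def validate_sql_args(args: Dict[str, str]) -> None:
    """Raise ValueError, worded like the INVALID_SQL_ARG template, for non-literals."""
    for name, value in args.items():
        if not looks_like_literal(value):
            raise ValueError(
                f"The argument {name} of `sql()` is invalid. "
                "Consider to replace it by a SQL literal."
            )


validate_sql_args({"minId": "7", "d": "DATE'2022-12-22'"})  # accepted
```

In this toy, a value such as "id > 7" is rejected because it is an expression rather than one of the recognized literal forms.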

@MaxGekk marked this pull request as ready for review December 22, 2022 18:32
@MaxGekk changed the title from [WIP][SPARK-41666][PYTHON] Support parameterized SQL by sql() to [SPARK-41666][PYTHON] Support parameterized SQL by sql() Dec 22, 2022
@MaxGekk requested a review from HyukjinKwon December 22, 2022 18:38
@HyukjinKwon (Member) left a comment

Comment thread python/pyspark/sql/session.py
Comment thread python/pyspark/pandas/sql_formatter.py
@MaxGekk (Member Author) commented Dec 23, 2022

Merging to master. Thank you, @HyukjinKwon, for the review.

@MaxGekk closed this in a1c727f Dec 23, 2022