Databricks SQL operators #21363

Merged
potiuk merged 11 commits into apache:main from alexott:databricks-sql-operator
Feb 27, 2022

alexott commented Feb 6, 2022

Contributor

This PR adds new operators to the Databricks provider:

  • DatabricksSqlOperator, which executes SQL commands against Databricks SQL Endpoints and Databricks clusters.
  • DatabricksCopyIntoOperator (built on top of DatabricksSqlOperator), which imports data into Databricks tables.
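
To illustrate the idea of a copy operator layered on top of a SQL operator, here is a minimal sketch of how a COPY INTO statement could be assembled from a few parameters before being handed to whatever executes the SQL. This is not the code from this PR: the helper name `build_copy_into_sql`, its parameters, the list of accepted formats, and the choice to reject values that would need escaping are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Assumption for this sketch: the set of file formats accepted by COPY INTO.
SUPPORTED_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"}


def _quote(text: str) -> str:
    """Wrap text in single quotes; refuse input that would need escaping."""
    if "'" in text or "\\" in text:
        raise ValueError(f"unsupported character in literal: {text!r}")
    return f"'{text}'"


def build_copy_into_sql(
    table_name: str,
    file_location: str,
    file_format: str,
    format_options: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a COPY INTO statement as one string (hypothetical helper)."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    parts = [
        f"COPY INTO {table_name}",
        f"FROM {_quote(file_location)}",
        f"FILEFORMAT = {fmt}",
    ]
    if format_options:
        # Render each option as a quoted key/value pair.
        rendered = ", ".join(
            f"{_quote(key)} = {_quote(value)}"
            for key, value in format_options.items()
        )
        parts.append(f"FORMAT_OPTIONS ({rendered})")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical table and location, used only for demonstration.
    print(build_copy_into_sql("sales.orders", "/mnt/landing/orders/", "csv", {"header": "true"}))
```

Keeping the statement builder separate from execution means the generated SQL can be checked in unit tests without a live endpoint.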

These operators use the same connection as the other Databricks operators, although it is open to discussion whether a dedicated connection would make sense, since we could then customize it further with specific input fields, etc.

Another possible improvement: make the databricks-sql-connector dependency optional, but I'm not sure how to do that correctly in Airflow.

closes: #21030
closes: #21376

@alexott alexott marked this pull request as draft February 6, 2022 14:58
@alexott alexott changed the title from "[WIP-do-not-merge] Databricks SQL operator" to "Databricks SQL operator" Feb 6, 2022
Comment thread airflow/providers/databricks/hooks/databricks_base.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks_base.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks_sql.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks_sql.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks_sql.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks_sql.py Outdated

alexott commented Feb 6, 2022

Contributor Author

Thank you for the review, @pateash, but this is still far from a reviewable state; more refactoring is coming.

Comment thread setup.py Outdated
@alexott alexott force-pushed the databricks-sql-operator branch from da9a71f to 29bded7 Compare February 13, 2022 13:55
@alexott alexott marked this pull request as ready for review February 13, 2022 19:05
Comment thread airflow/providers/databricks/hooks/databricks_base.py Outdated
Comment thread airflow/providers/databricks/hooks/databricks_base.py Outdated
Comment thread docs/apache-airflow-providers-databricks/connections/databricks.rst Outdated
Comment thread docs/apache-airflow-providers-databricks/connections/databricks.rst Outdated
Comment thread docs/apache-airflow-providers-databricks/operators.rst Outdated
@alexott alexott force-pushed the databricks-sql-operator branch 2 times, most recently from e74f505 to 74d2e87 Compare February 20, 2022 10:57
@alexott alexott requested a review from mik-laj February 20, 2022 11:23
@alexott alexott changed the title from "Databricks SQL operator" to "Databricks SQL operators" Feb 20, 2022

alexott commented Feb 21, 2022

Contributor Author

@potiuk Jarek - would it be possible to review the changes?


potiuk commented Feb 26, 2022

Member

You need to rebase @alexott

alexott and others added 10 commits February 27, 2022 11:01
No documentation & tests yet
Still need to fix existing tests & add tests for Databricks SQL hook &
operator
This includes:
* identifying SQL Endpoint by name
* allow writing results to a CSV/JSON/JSONL file
* fix tests for DatabricksHook
* address most of the comments
…rator

Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
Split documentation for operators into separate pages & add more content
and examples.
@alexott alexott force-pushed the databricks-sql-operator branch from d95f421 to 958c6be Compare February 27, 2022 10:22

alexott commented Feb 27, 2022

Contributor Author

@potiuk done. Thank you for the review.


potiuk commented Feb 27, 2022

Member

Tests are failing though :(

@alexott alexott force-pushed the databricks-sql-operator branch from 958c6be to 855aee4 Compare February 27, 2022 12:31

alexott commented Feb 27, 2022

Contributor Author

🤦 I forgot that the tests refer to requests, which was moved into another file...
Tests are green now, @potiuk.

@potiuk potiuk merged commit 27d19e7 into apache:main Feb 27, 2022
@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 28, 2022

Development

Successfully merging this pull request may close these issues.

  • Databricks: Operator to load data into Delta tables
  • Databricks: Add support for execution of SQL commands against clusters/SQL endpoints

6 participants