Add query tags: DatabricksSqlSensor, DatabricksPartitionSensor #68704

Merged
eladkal merged 3 commits into
apache:mainfrom
cruseakshay:feature/68582-databricks-query-tags
Jun 25, 2026
Conversation

@cruseakshay cruseakshay commented Jun 18, 2026

Contributor

Extends the session-level query-tags instrumentation added in #66895 to the remaining Databricks operators and sensors that send queries to Databricks:

  • DatabricksSqlSensor
  • DatabricksPartitionSensor
  • DatabricksSQLStatementsOperator
  • DatabricksSQLStatementsSensor

Closes: #68582

What changed

Two mechanisms are used because these components talk to Databricks in two different ways:

  • DatabricksSqlSensor and DatabricksPartitionSensor go through DatabricksSqlHook, so they reuse the existing QUERY_TAGS session-parameter plumbing from #66895 (Add session-level query tags to Databricks SQL operators). The merged tags are set on the hook before the query runs.
  • DatabricksSQLStatementsOperator and DatabricksSQLStatementsSensor use the REST Statement Execution API (/api/2.0/sql/statements/), which does not accept session_configuration. The API exposes a native query_tags field ([{"key": ..., "value": ...}]), so tags are injected directly into the request body. For the sensor, tags are only applied when it submits a new statement; if a statement_id is passed in, nothing is submitted and no tags are attached.
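
As a rough illustration of the second path, the sketch below shows how a tag mapping could be turned into the `[{"key": ..., "value": ...}]` list shape and placed in a statement request body. The function names (`to_query_tag_list`, `build_statement_body`, `sensor_submit_body`) are hypothetical and are not the helpers added by this PR. The treatment of `None` values (emitting a key-only entry) is an assumption made for the example, not confirmed behavior.

```python
from __future__ import annotations

from typing import Any


def to_query_tag_list(tags: dict[str, str | None]) -> list[dict[str, str]]:
    """Convert a tag mapping into a list of {"key": ..., "value": ...} entries."""
    tag_list: list[dict[str, str]] = []
    for key, value in tags.items():
        entry = {"key": key}
        # Assumption: a None value is sent as a key-only tag (no "value" field).
        if value is not None:
            entry["value"] = value
        tag_list.append(entry)
    return tag_list


def build_statement_body(
    statement: str,
    warehouse_id: str,
    tags: dict[str, str | None] | None,
) -> dict[str, Any]:
    """Build a request body for the statements endpoint, adding tags if present."""
    body: dict[str, Any] = {"statement": statement, "warehouse_id": warehouse_id}
    if tags:
        body["query_tags"] = to_query_tag_list(tags)
    return body


def sensor_submit_body(
    statement: str,
    warehouse_id: str,
    tags: dict[str, str | None] | None,
    statement_id: str | None = None,
) -> dict[str, Any] | None:
    """Return a body only when a new statement has to be submitted."""
    # With an existing statement_id nothing is submitted, so no tags are attached.
    if statement_id is not None:
        return None
    return build_statement_body(statement, warehouse_id, tags)


body = build_statement_body("SELECT 1", "wh-123", {"team": "data", "adhoc": None})
```

The early return in `sensor_submit_body` mirrors the sensor behavior described above: tagging is tied to submission, so polling an already-running statement leaves its tags untouched.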

Each component gains two parameters mirroring the existing SQL operators:

  • query_tags: dict[str, str | None] | None — user-supplied tags, templated

  • include_airflow_query_tags: bool = True — merge in Airflow context tags:

    • dag_id
    • task_id
    • run_id
    • try_number
    • map_index

User-supplied tags win on key collision.
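
The merge rule can be sketched as follows. This is a minimal illustration, not the code in utils/query_tags.py: `merge_query_tags` and `AIRFLOW_TAG_KEYS` are hypothetical names, the context is modeled as a plain dict, and stringifying the context values is an assumption.

```python
from __future__ import annotations

from typing import Any

# The Airflow context keys listed above.
AIRFLOW_TAG_KEYS = ("dag_id", "task_id", "run_id", "try_number", "map_index")


def merge_query_tags(
    user_tags: dict[str, str | None] | None,
    airflow_context: dict[str, Any],
    include_airflow_query_tags: bool = True,
) -> dict[str, str | None]:
    """Merge Airflow context tags with user tags; user tags win on collision."""
    merged: dict[str, str | None] = {}
    if include_airflow_query_tags:
        for key in AIRFLOW_TAG_KEYS:
            if key in airflow_context:
                # Assumption: context values are stringified before tagging.
                merged[key] = str(airflow_context[key])
    # Applied last, so a user-supplied key overrides the Airflow one.
    merged.update(user_tags or {})
    return merged


context = {
    "dag_id": "daily_etl",
    "task_id": "wait_for_partition",
    "run_id": "manual__2026-06-18",
    "try_number": 1,
    "map_index": -1,
}
tags = merge_query_tags({"team": "analytics", "task_id": "custom"}, context)
```

Here `tags["task_id"]` ends up as `"custom"` because the user mapping is applied after the context tags, while the other context keys pass through unchanged.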

The Airflow-context tag logic that previously lived in operators/databricks_sql.py is extracted to a shared utils/query_tags.py module:

  • get_airflow_query_tags
  • build_query_tags
  • dict_to_query_tag_list

This lets the operators and sensors above share one implementation with the existing SQL operators.

This is a pure relocation: DatabricksSqlOperator and DatabricksCopyIntoOperator behavior is unchanged.

Was generative AI tooling used to co-author this PR?
  • Yes, Claude Code

Generated-by: Claude Code, following the [guidelines]

…ricksSQLStatementsOperator, DatabricksSQLStatementsSensor
@cruseakshay cruseakshay force-pushed the feature/68582-databricks-query-tags branch from 03f70ea to af92c2e Compare June 19, 2026 07:58
@cruseakshay cruseakshay marked this pull request as ready for review June 22, 2026 04:55
eladkal commented Jun 22, 2026

Contributor

cc @moomindani for review

@eladkal eladkal requested a review from jroachgolf84 June 24, 2026 14:35

@moomindani moomindani left a comment

Contributor

Checked the following:

  • This PR completes query-tag coverage for all SQL-executing components in the Databricks provider. The remaining operators (SubmitRun, RunNow, CreateJobs, NotebookOperator, etc.) use the Jobs API, which doesn't expose query_tags, so they are out of scope.
  • The two injection paths — session parameter via DatabricksSqlHook and native query_tags field via Statement Execution API — match the Databricks architecture (query tags docs, Statement Execution API).
  • Parameter naming, types, defaults, and merge logic are consistent with the existing DatabricksSqlOperator / DatabricksCopyIntoOperator implementation from #66895.
  • The relocation from databricks_sql.py to utils/query_tags.py is a pure move with no behavior change.

One minor issue noted inline.


Drafted-by: Claude Code (Opus 4.8); reviewed by @moomindani before posting

Comment thread providers/databricks/src/airflow/providers/databricks/sensors/databricks.py Outdated
Co-Authored-By: Otto <noreply@astronomer.io>
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Jun 25, 2026

@jroachgolf84 jroachgolf84 left a comment

Collaborator

Took some time last night and this morning to review again - LGTM.

@eladkal eladkal merged commit 4bfafae into apache:main Jun 25, 2026
81 checks passed

Contributor

A post-merge comment for follow-up:
This PR introduced a utils module, which we didn't have in the first PR that added query tags (#66895).
Given that, it is worth considering a refactor that moves query-tag-related utilities such as _format_query_tag_value from the hook to the utils file (and possibly checking whether more housekeeping is needed).

Successfully merging this pull request may close these issues.

Improve DatabricksSqlSensor, DatabricksSqlStatementsSensor and DatabricksSqlStatementsOperators with query tags