Improve SnowflakeSqlApiOperator error message on query status check failure#66642
Conversation
There was a problem hiding this comment.
I think this is an interesting one to think more about. So, the exception that is being thrown is from a failure when hitting the Snowflake API, not a failure of the query.
I'm not sure that it makes sense to remove the query and mark the statement as being failed. For example, what if your creds expire? This would cause the status check to fail even if the statement is running correctly. Your unit test captures this scenario (connection timing out) which doesn't mean that the query failed.
|
@justinpakzad - I think it might make sense to raise a more specific exception rather than just a |
Hey, thanks for the review. To address your first comment - that's a good point, slipped over my head a bit that there is a distinction between a query failure and other types of errors that don't necessarily mean there was a failure on the query end (as you mentioned). Are you suggesting we just revert it back but maybe with something like: try:
statement_status = self._hook.get_sql_api_query_status(query_id)
except Exception as e:
raise RuntimeError(f"Failed to get status for query {query_id}: {e}") from e |
|
Yes, I think that would make sense! |
c03ce37 to
6348406
Compare
jroachgolf84
left a comment
There was a problem hiding this comment.
Thanks for the change, this LGTM!
|
@justinpakzad This PR has had no author response for 15 days since the last maintainer comment, and the branch now has failing CI. Closing to keep the queue clean. When you're ready to resume, please rebase onto the current Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting |
|
@potiuk, not sure why this got closed, I was awaiting a review (it got tagged yesterday as ready for review) and would have happily just rebased (nothing was failing to begin with, just outdated branch). Anyways, was a tiny change so not really that important but maybe just a reminder next time instead of instant closure. |
My mistake, sorry |
…#368) Sweep 4 (stale ready-for-review label) in stale-sweeps.md proposes `strip-ready-label` (4a, healthy branch) or `close` (4b, rotted branch) when a PR carries the `ready for maintainer review` label, a maintainer commented ≥ 7 days ago, and the author has been silent since. The "maintainer commented ≥ 7 days ago" check was naive: it counted ANY maintainer comment older than 7 days, including ones that predate the `ready for maintainer review` label-add. So a PR that was promoted to ready on day N could be closed by Sweep 4b on day N+2 — because the maintainer comment that initially nudged the contributor (and that the contributor's eventual fix addressed, leading to the promotion) was now > 7 days old and the contributor was, by definition, "silent" since. This is backwards. When the `ready for maintainer review` label is added, the queue moves to the maintainers; the author has nothing they need to reply to. Pre-label maintainer comments are part of the conversation that *got* the PR to ready, not evidence of silence on something the author still owes. == Fix == Sweep 4's "Common trigger" now requires the qualifying maintainer comment to have come AFTER the label was added: last_maintainer_comment_at > ready_label_added_at `ready_label_added_at` is the most recent `LabeledEvent { label.name == "ready for maintainer review" }` timestamp from the PR's `timelineItems`. If no maintainer has commented post-promotion, the sweep doesn't fire at all — the queue is on the maintainers, not the contributor. The new guard mirrors the "after the label-add timestamp" pattern that classify-and-act.md row F4 already uses for regression detection, so the convention is consistent across the skill. == Live incident motivating the fix == apache/airflow#66642 — contributor was promoted to `ready for maintainer review` on 2026-05-24, closed by this skill on 2026-05-26 with *"no author response for 15 days since the last maintainer comment"*. The cited maintainer comment was from 2026-05-11, well before the promotion. Contributor pushed back politely; PR reopened 2026-05-28; this is the skill-side fix. == Verification == `skill-and-tool-validate` exits 0. Generated-by: Claude Code (Opus 4.7)
6348406 to
605de3c
Compare
No worries at all! Thanks for re-opening. |
|
@potiuk, can you give this a review please? Thanks. |
605de3c to
1d0a7c2
Compare
potiuk
left a comment
There was a problem hiding this comment.
LGTM — RuntimeError with the query id and from e chaining is a clear improvement over ValueError({...}), which used the wrong type and buried the cause in a dict. It's consistent with the RuntimeError already raised a few lines up for the error status, and nothing was catching the old exception, so no behavior breakage. Nice that you added a test for it. Thanks!
Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting
Replace ValueError with RuntimeError on query status check failure in poll_on_queries, including the query id and original error for better debuggability. Added a new test to ensure a RuntimeError is raised when status check fails.
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.