
Fix Supervisor crash in Stackdriver remote log IO #68295

Draft: 23tae wants to merge 1 commit into apache:main from 23tae:fix-gcl-supervisor-crash



@23tae (Contributor) commented Jun 9, 2026

This PR prevents the Airflow 3 Supervisor process from crashing entirely when transient network or IAM errors occur during Stackdriver log transmission.

Description

This PR addresses Bug 3 described in #68240.

In the new Airflow 3 Task SDK architecture, the StackdriverRemoteLogIO handler runs within the Supervisor process rather than the task process itself. If an exception is raised during _transport.send(), it propagates upwards and crashes the entire Supervisor process, disrupting task monitoring.

Key changes

  • Exception Handling: Wrapped the _transport.send() call in a try...except Exception block. When a transmission fails, the exception is caught and a warning is logged via the internal _logger. A single failed log upload therefore no longer takes down the Supervisor process.

Verification Results

I have verified the changes using prek and breeze.

  • Static Checks (Prek): Passed
  • Unit Tests (Breeze): Passed

Related: #68240


Was generative AI tooling used to co-author this PR?
  • Yes

Generated-by: Antigravity following the guidelines



Note

✅ Ready for review · @23tae @potiuk · 2026-06-12 13:30 UTC

Thanks @23tae — all checks are green and this PR is marked ready for maintainer review. The ball is with the maintainers now; a maintainer will take the next look.

Automated triage — may be imperfect.

@shahar1 (Contributor) left a comment

Please resolve conflicts

23tae force-pushed the fix-gcl-supervisor-crash branch from baa54c6 to dbde7e6 on June 17, 2026 07:48
@23tae (Contributor, Author) commented Jun 17, 2026

@shahar1 Conflicts resolved. Thanks!

23tae requested a review from shahar1 on June 17, 2026 09:09
@eladkal (Contributor) commented Jun 17, 2026

Does this PR solve #68240?

@23tae (Contributor, Author) commented Jun 17, 2026

23tae marked this pull request as draft on June 18, 2026 04:43
@23tae (Contributor, Author) commented Jun 18, 2026

Putting this on hold. The try-except guard here may only be effective for SyncTransport users and not for the default BackgroundThreadTransport. I'm waiting for the stack trace from the issue author to confirm (#68240 (comment)).
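The concern about BackgroundThreadTransport can be illustrated with a self-contained sketch. It assumes a transport whose send() only enqueues a message while a separate worker thread performs the upload. SketchBackgroundTransport and FailingUploader are hypothetical and do not reproduce how google-cloud-logging's BackgroundThreadTransport actually handles errors. The sketch only shows that a try/except around an enqueue-only send() cannot observe a failure that happens later on another thread.

```python
import queue
import threading


class FailingUploader:
    """Hypothetical stand-in for a network upload rejected by IAM."""

    def upload(self, message: str) -> None:
        raise PermissionError(f"upload rejected: {message!r}")


class SketchBackgroundTransport:
    """Hypothetical transport: send() only enqueues; a worker thread uploads."""

    def __init__(self, uploader: FailingUploader) -> None:
        self._uploader = uploader
        self._queue: queue.Queue = queue.Queue()
        self.worker_errors: list[BaseException] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def send(self, message: str) -> None:
        # Returns immediately; nothing here can see the later upload failure.
        self._queue.put(message)

    def _run(self) -> None:
        while True:
            message = self._queue.get()
            if message is None:  # sentinel pushed by flush_and_stop()
                return
            try:
                self._uploader.upload(message)
            except Exception as exc:
                # The failure surfaces here, on the worker thread,
                # not in the code that called send().
                self.worker_errors.append(exc)

    def flush_and_stop(self) -> None:
        self._queue.put(None)
        self._worker.join()


caught_by_caller = False
transport = SketchBackgroundTransport(FailingUploader())
try:
    transport.send("task log line")
except Exception:
    caught_by_caller = True
transport.flush_and_stop()
print(caught_by_caller, len(transport.worker_errors))  # False 1
```

Here caught_by_caller stays False even though the upload failed, because the exception is raised on the worker thread. Under that assumption, a guard around send() would only help when the transport uploads synchronously, as SyncTransport does.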


Labels

  • area:logging
  • area:providers
  • provider:google: Google (including GCP) related issues
  • ready for maintainer review: Set after triaging when all criteria pass.


4 participants