Warn when supervisor-side mask_secret IPC send fails#67503
Conversation
|
Gentle nudge — this and a few related task-SDK / execution-API hardening PRs are green and have been waiting on review (9 days): #67503, #67504, #67505, #67506, #67628. They're small, self-contained fixes. @ashb @kaxil @amoghrajesh — would appreciate a look when you have a moment. Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting |
ashb
left a comment
There was a problem hiding this comment.
Fairly sure this is a false positive.
0cb1598 to
6752065
Compare
|
Fair point that in a full supervisor crash the warning can't be delivered — stdout/stderr are the supervisor's sockets, as you say. But And that's the case that actually bites: under You're right that the original test (a generic Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting |
No, I don't think it can. |
005f319 to
9230e4b
Compare
|
To lay out where this stands and a way to make it stronger. Current approach. The PR replaces the old On the "the types already prevent this" point. You're right that Proposal to make it stronger. Rather than just reporting the gap, we can mostly close it by coercing the non-JSON iterables to a ipc_value = list(secret) if isinstance(secret, (set, frozenset, tuple)) else secret
comms.send(MaskSecret(value=ipc_value, name=name))That makes the two paths agree for the common case (a Happy to push the coercion + a test asserting the supervisor now receives the masked list, if you're on board — or keep it warning-only if you'd prefer the smaller change. Which do you want? Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting |
476e2de to
56be332
Compare
|
@ashb — gentle nudge on this one. It's now rebased on The open question from my last comment is still the decision point: Two ways forward, your call:
Which would you prefer? Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting |
56be332 to
46e438e
Compare
|
Thanks @ashb — good catch on the leak risk especially. Addressed all three and rebased onto main:
Since the warning is now provably leak-safe and only fires on a genuine registration failure, hopefully this addresses the concern — happy to drop it entirely if you'd still rather not carry it. |
``mask_secret()`` mirrors the mask to the supervisor process so any forwarded logs are masked there too. The send was wrapped in a bare ``suppress(Exception)``, so a transient IPC failure was silently swallowed. When ``sending_to_supervisor=True`` the local task drops its own ``mask_logs`` processor (it relies on the supervisor to do the masking) — so a silent send failure could leave the secret unmasked in supervisor-level logs. Replace the bare suppress with a targeted try/except that emits a warning to the dedicated ``airflow.logging.mask_secret`` logger when the supervisor is available but the send fails. The "no supervisor context" case (called outside task execution) stays silent — there is no supervisor to notify.
Per @ashb's review: - The warning previously used exc_info=True. A failure to build MaskSecret can raise an error whose text reprs the value, so on this path (where local mask_logs is already off) the warning could itself leak the secret. Now we record only error_type=type(e).__name__ -- no traceback, message, or value. - Reworded the 'task-execution context' comment to avoid clashing with the SDK's 'context' meaning. - Reworked the tests to prove the secret value never appears in the warning (the _Unserializable repr embeds a canary that the assertion checks is absent), covering both the build-failure and broken-socket paths. Generated-by: Claude Opus 4.8 (1M context)
46e438e to
9cb7ad5
Compare
|
Possibly still worth merging it now @ashb ? |
mask_secret()mirrors the mask to the supervisor process so any forwarded logs are masked there too. The send atlog.py:281was wrapped in a baresuppress(Exception), so a transient IPC failure was silently swallowed.When
sending_to_supervisor=Truethe local task drops its ownmask_logsprocessor (it relies on the supervisor to do the masking — seelog.pylines 72-73 and 115-118) — so a silent send failure could leave the secret unmasked in supervisor-level logs. Operators had no signal that this had happened.Reported as F-001 in the
apache/tooling-agentsL3 task-sdk sweep0920c77.Change
Replace the bare
suppresswith a targetedtry/exceptthat emits a warning to a dedicatedairflow.logging.mask_secretstructlog logger when the supervisor is available but the send fails. The "no supervisor context" case (mask_secretcalled outside task execution, noSUPERVISOR_COMMS) stays silent — there is no supervisor to notify.Test plan
test_warns_when_supervisor_send_fails— patchestask_runner.SUPERVISOR_COMMSto a Mock whosesendraises; captures structlog output and asserts thesupervisor_mask_secret_failedevent lands with the secret name and exc_info.test_silent_when_supervisor_send_succeeds— success path emits no warning,comms.sendcalled once.test_silent_when_no_supervisor_context— outside a task-execution context the function is a no-op.prek run ruffclean.prek run mypy-task-sdkclean.test_log.pysuite: 8 passed.Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.7) following the guidelines