Skip to content

Surface remote-log upload failures via structured warnings#66571

Merged
vatsrahul1001 merged 2 commits into
apache:mainfrom
potiuk:fix/tasksdk-remote-log-load-warning
May 19, 2026
Merged

Surface remote-log upload failures via structured warnings#66571
vatsrahul1001 merged 2 commits into
apache:mainfrom
potiuk:fix/tasksdk-remote-log-load-warning

Conversation

@potiuk

@potiuk potiuk commented May 7, 2026

Copy link
Copy Markdown
Member

Summary

upload_to_remote() silently returned in three failure paths (handler load failure, log-path resolution exception, handler.upload exception) with no signal to the operator that logs were not reaching the remote system. The supervisor's _upload_logs already logs the outermost exception, but the inner paths inside upload_to_remote() were silent — so a misconfigured remote handler or a transient remote-system outage would degrade silently while local-only logs continued.

Fix

Add a dedicated airflow.logging.remote structlog logger and emit a log.warning at each of the three failure paths with the TI id and the underlying error string. No behaviour change otherwise — failures still fall through softly so a bad remote handler doesn't abort the task lifecycle.

Reported by

L3 ASVS sweep — apache/tooling-agents#24 (FINDING-004).


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.7)

Generated-by: Claude Code (Opus 4.7) following the guidelines

`upload_to_remote()` silently returned in three failure paths (handler
load failure, log-path resolution exception, handler.upload exception)
with no signal to the operator that logs were not reaching the remote
system. The supervisor's `_upload_logs` already logs the outermost
exception, but the inner paths inside `upload_to_remote()` were silent
— so a misconfigured remote handler or a transient remote-system
outage would degrade silently while local-only logs continued.

Add a dedicated `airflow.logging.remote` structlog logger and emit a
`log.warning` at each of the three failure paths with the TI id and
the underlying error string. No behaviour change otherwise — failures
still fall through softly so a bad remote handler doesn't abort the
task lifecycle.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-004).
@potiuk potiuk force-pushed the fix/tasksdk-remote-log-load-warning branch from c33b0a4 to 313bc16 Compare May 17, 2026 19:39
@potiuk

potiuk commented May 17, 2026

Copy link
Copy Markdown
Member Author

I'd love to get this one merged — and would love it in 3.2.2 if it's not too late. cc @vatsrahul1001 (3.2.2 RM)


Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting

@potiuk potiuk added this to the Airflow 3.2.2 milestone May 17, 2026
Comment thread task-sdk/src/airflow/sdk/log.py Outdated

@Lee-W Lee-W left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking nit

Comment thread task-sdk/src/airflow/sdk/log.py Outdated
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
@vatsrahul1001

Copy link
Copy Markdown
Contributor

@potiuk can you address comments?

@amoghrajesh amoghrajesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also have a test pls?

Comment thread task-sdk/src/airflow/sdk/log.py Outdated
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
Comment thread task-sdk/src/airflow/sdk/log.py Outdated
- Use ti.id directly (available since 3.0) instead of getattr fallback.
- Pass exc_info=exc to structlog so the traceback is preserved on
  path-resolution and upload failures.
- Trim past-tense from the docstring comment.
- Add tests covering the three failure paths and the silent
  success / no-path cases.
@potiuk

potiuk commented May 19, 2026

Copy link
Copy Markdown
Member Author

Addressed in 9c39993:

  • @Lee-W — switched to str(ti.id) (3 places) and exc_info=exc for both except paths so the traceback is preserved.
  • @amoghrajesh — trimmed the past-tense sentence from the comment; also added task-sdk/tests/task_sdk/test_log.py with 5 tests for upload_to_remote() covering the three failure paths plus silent-no-path and silent-success.

Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting

@vatsrahul1001 vatsrahul1001 merged commit ef87426 into apache:main May 19, 2026
113 checks passed
@github-actions

Copy link
Copy Markdown
Contributor

Backport successfully created: v3-2-test

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

vatsrahul1001 pushed a commit that referenced this pull request May 19, 2026
#66571) (#67172)

* Surface remote-log upload failures via structured warnings

`upload_to_remote()` silently returned in three failure paths (handler
load failure, log-path resolution exception, handler.upload exception)
with no signal to the operator that logs were not reaching the remote
system. The supervisor's `_upload_logs` already logs the outermost
exception, but the inner paths inside `upload_to_remote()` were silent
— so a misconfigured remote handler or a transient remote-system
outage would degrade silently while local-only logs continued.

Add a dedicated `airflow.logging.remote` structlog logger and emit a
`log.warning` at each of the three failure paths with the TI id and
the underlying error string. No behaviour change otherwise — failures
still fall through softly so a bad remote handler doesn't abort the
task lifecycle.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-004).

* Address review feedback on remote-log upload warnings

- Use ti.id directly (available since 3.0) instead of getattr fallback.
- Pass exc_info=exc to structlog so the traceback is preserved on
  path-resolution and upload failures.
- Trim past-tense from the docstring comment.
- Add tests covering the three failure paths and the silent
  success / no-path cases.
(cherry picked from commit ef87426)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit that referenced this pull request May 20, 2026
#66571) (#67172)

* Surface remote-log upload failures via structured warnings

`upload_to_remote()` silently returned in three failure paths (handler
load failure, log-path resolution exception, handler.upload exception)
with no signal to the operator that logs were not reaching the remote
system. The supervisor's `_upload_logs` already logs the outermost
exception, but the inner paths inside `upload_to_remote()` were silent
— so a misconfigured remote handler or a transient remote-system
outage would degrade silently while local-only logs continued.

Add a dedicated `airflow.logging.remote` structlog logger and emit a
`log.warning` at each of the three failure paths with the TI id and
the underlying error string. No behaviour change otherwise — failures
still fall through softly so a bad remote handler doesn't abort the
task lifecycle.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-004).

* Address review feedback on remote-log upload warnings

- Use ti.id directly (available since 3.0) instead of getattr fallback.
- Pass exc_info=exc to structlog so the traceback is preserved on
  path-resolution and upload failures.
- Trim past-tense from the docstring comment.
- Add tests covering the three failure paths and the silent
  success / no-path cases.
(cherry picked from commit ef87426)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit that referenced this pull request May 20, 2026
#66571) (#67172)

* Surface remote-log upload failures via structured warnings

`upload_to_remote()` silently returned in three failure paths (handler
load failure, log-path resolution exception, handler.upload exception)
with no signal to the operator that logs were not reaching the remote
system. The supervisor's `_upload_logs` already logs the outermost
exception, but the inner paths inside `upload_to_remote()` were silent
— so a misconfigured remote handler or a transient remote-system
outage would degrade silently while local-only logs continued.

Add a dedicated `airflow.logging.remote` structlog logger and emit a
`log.warning` at each of the three failure paths with the TI id and
the underlying error string. No behaviour change otherwise — failures
still fall through softly so a bad remote handler doesn't abort the
task lifecycle.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-004).

* Address review feedback on remote-log upload warnings

- Use ti.id directly (available since 3.0) instead of getattr fallback.
- Pass exc_info=exc to structlog so the traceback is preserved on
  path-resolution and upload failures.
- Trim past-tense from the docstring comment.
- Add tests covering the three failure paths and the silent
  success / no-path cases.
(cherry picked from commit ef87426)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit that referenced this pull request May 21, 2026
#66571) (#67172)

* Surface remote-log upload failures via structured warnings

`upload_to_remote()` silently returned in three failure paths (handler
load failure, log-path resolution exception, handler.upload exception)
with no signal to the operator that logs were not reaching the remote
system. The supervisor's `_upload_logs` already logs the outermost
exception, but the inner paths inside `upload_to_remote()` were silent
— so a misconfigured remote handler or a transient remote-system
outage would degrade silently while local-only logs continued.

Add a dedicated `airflow.logging.remote` structlog logger and emit a
`log.warning` at each of the three failure paths with the TI id and
the underlying error string. No behaviour change otherwise — failures
still fall through softly so a bad remote handler doesn't abort the
task lifecycle.

Reported by the L3 ASVS sweep at apache/tooling-agents#24 (FINDING-004).

* Address review feedback on remote-log upload warnings

- Use ti.id directly (available since 3.0) instead of getattr fallback.
- Pass exc_info=exc to structlog so the traceback is preserved on
  path-resolution and upload failures.
- Trim past-tense from the docstring comment.
- Add tests covering the three failure paths and the silent
  success / no-path cases.
(cherry picked from commit ef87426)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@ashb

ashb commented Jun 4, 2026

Copy link
Copy Markdown
Member

Two issues:

One: This happens for every task when remote logging is not configured.

Two: this only appears in the worker logs, which isn't visible to 99% of Airflow users. Was that your intention with this log, or were you expecting it to be visible in task logs?

@ashb

ashb commented Jun 9, 2026

Copy link
Copy Markdown
Member

@potiuk ^^

@potiuk

potiuk commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

@ashb you were right — sorry for the unnecessary noise. The warnings fired for every task on installs without remote logging (a missing handler is the default, not a failure) and only reached worker stdout, not the task logs operators actually read.

I tried a more targeted fix — silencing the no-handler case and routing genuine upload failures through the task's own logger so they'd show up in task logs — but it added more complexity than the defense-in-depth was worth. Reverting the whole change is the best solution for now: #68371. We can revisit surfacing real remote-upload failures later with a design that doesn't trip on the common no-remote-logging case.


Drafted-by: Claude Code (Opus 4.8); reviewed by @potiuk before posting

ashb pushed a commit that referenced this pull request Jun 11, 2026
…66571)" (#68371)

This reverts commit ef87426.

The remote-log upload warnings fired for every task on installs without
remote logging configured (a missing handler is the default, not a
failure) and were emitted through a dedicated "airflow.logging.remote"
logger that only reaches worker stdout, not the task logs operators
actually read. The whole change is withdrawn rather than patched.
imrichardwu pushed a commit to imrichardwu/airflow that referenced this pull request Jun 16, 2026
…pache#66571)" (apache#68371)

This reverts commit ef87426.

The remote-log upload warnings fired for every task on installs without
remote logging configured (a missing handler is the default, not a
failure) and were emitted through a dedicated "airflow.logging.remote"
logger that only reaches worker stdout, not the task logs operators
actually read. The whole change is withdrawn rather than patched.
dingo4dev pushed a commit to dingo4dev/airflow that referenced this pull request Jun 16, 2026
…pache#66571)" (apache#68371)

This reverts commit ef87426.

The remote-log upload warnings fired for every task on installs without
remote logging configured (a missing handler is the default, not a
failure) and were emitted through a dedicated "airflow.logging.remote"
logger that only reaches worker stdout, not the task logs operators
actually read. The whole change is withdrawn rather than patched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants