Skip to content

Harden S3 sync target paths#67847

Merged
shahar1 merged 2 commits into
apache:mainfrom
dfgvaetyj3456356-hash:security/object-storage-sync-path-traversal
Jun 17, 2026
Merged

Harden S3 sync target paths#67847
shahar1 merged 2 commits into
apache:mainfrom
dfgvaetyj3456356-hash:security/object-storage-sync-path-traversal

Conversation

@dfgvaetyj3456356-hash

@dfgvaetyj3456356-hash dfgvaetyj3456356-hash commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

S3Hook.sync_to_local_dir() derives local target paths from object keys after removing the configured prefix. Current upstream already contains the analogous GCS guard; this PR applies the same containment check to S3 sync targets.

The change resolves each candidate S3 download target and rejects keys that would land outside the requested local directory before creating parent directories or downloading content.

Regression coverage adds a traversal-shaped S3 object key under the requested prefix.

Validation performed locally:

  • python -m py_compile providers\amazon\src\airflow\providers\amazon\aws\hooks\s3.py providers\amazon\tests\unit\amazon\aws\hooks\test_s3.py
  • python -m ruff check providers\amazon\src\airflow\providers\amazon\aws\hooks\s3.py providers\amazon\tests\unit\amazon\aws\hooks\test_s3.py
  • git diff --check -- providers\amazon\src\airflow\providers\amazon\aws\hooks\s3.py providers\amazon\tests\unit\amazon\aws\hooks\test_s3.py

The focused pytest command could not run in this local checkout because the Airflow pytest plugin dependency time_machine is not installed in the current environment.

Gen-AI disclosure:

  • This PR was prepared with assistance from OpenAI Codex.
  • I reviewed the generated code and tests for relevance, correctness, and scope before submitting.
  • I ran the checks listed above locally, and I take responsibility for the final submitted changes.

Important

🛠️ Maintainer triage note for @dfgvaetyj3456356-hash · by @potiuk · 2026-06-12 11:23 UTC

Some review feedback from @shahar1 is waiting on you:

  • You've pushed new commits since @shahar1 requested changes, but the review hasn't been followed up.

The ball is in your court — you've been assigned to this PR. Address the outstanding comments (reply or push a fix), then ping the reviewer for a re-review.

Automated triage — may be imperfect; a maintainer takes the next look.

@boring-cyborg boring-cyborg Bot added area:providers provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues labels Jun 1, 2026
@boring-cyborg

boring-cyborg Bot commented Jun 1, 2026

Copy link
Copy Markdown

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example Dag that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@dfgvaetyj3456356-hash dfgvaetyj3456356-hash force-pushed the security/object-storage-sync-path-traversal branch from 8a12b7c to 04d72c6 Compare June 1, 2026 14:17
@dfgvaetyj3456356-hash dfgvaetyj3456356-hash changed the title Harden object storage sync target paths Harden S3 sync target paths Jun 1, 2026
shahar1
shahar1 previously requested changes Jun 5, 2026

@shahar1 shahar1 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the PR's description, including adding declaration for using AI according to the guidelines.

@dfgvaetyj3456356-hash dfgvaetyj3456356-hash force-pushed the security/object-storage-sync-path-traversal branch from 4370590 to 2c4e1f0 Compare June 5, 2026 08:00
@dfgvaetyj3456356-hash

Copy link
Copy Markdown
Contributor Author

Thanks for the review. I updated the PR description to include the required Gen-AI disclosure and refreshed the validation notes.

I also rebased the branch onto current main and force-pushed clean history, so the PR now contains only the two intended noreply-authored commits.

Local validation after the rebase:

  • python -m py_compile providers\amazon\src\airflow\providers\amazon\aws\hooks\s3.py providers\amazon\tests\unit\amazon\aws\hooks\test_s3.py
  • python -m ruff check providers\amazon\src\airflow\providers\amazon\aws\hooks\s3.py providers\amazon\tests\unit\amazon\aws\hooks\test_s3.py
  • git diff --check -- providers\amazon\src\airflow\providers\amazon\aws\hooks\s3.py providers\amazon\tests\unit\amazon\aws\hooks\test_s3.py

@dfgvaetyj3456356-hash

Copy link
Copy Markdown
Contributor Author

Hi @shahar1, I updated the PR description with the Gen-AI disclosure and validation notes, then rebased/pushed the clean branch. The visible checks are green now. Could you take another look when you have a chance?

@shahar1 shahar1 dismissed their stale review June 12, 2026 13:32

PR's description was fixed

@shahar1 shahar1 self-requested a review June 12, 2026 13:33
@shahar1 shahar1 merged commit 28ca0b6 into apache:main Jun 17, 2026
94 checks passed
@boring-cyborg

boring-cyborg Bot commented Jun 17, 2026

Copy link
Copy Markdown

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:amazon AWS/Amazon - related issues provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants