
Backfill partitioned Dags by partition-date range #67537

Merged
Lee-W merged 11 commits into apache:main from astronomer:dags-backfill-partition-range on Jun 15, 2026

Lee-W (Member) commented May 26, 2026

Why

Partition-based timetables advance along a partition-date axis rather than a logical-date interval, so the existing backfill path, which enumerates logical-date intervals, could not express "backfill these partitions". Rather than expose a separate set of partition flags, this PR teaches backfill to recognise a partitioned timetable and reinterpret the --from-date / --to-date range the user already passes as a partition-date range, so there is no new option to learn.

What

  • New timetable method iter_partition_dagrun_infos (airflow-core/src/airflow/timetables/base.py): the base Timetable raises NotImplementedError by default; CronPartitionTimetable (timetables/trigger.py) implements it. Walking the partition-date axis, it yields one DagRunInfo per cron tick in the half-open interval [earliest_partition_date, latest_partition_date), each with partition_date=tick, partition_key set to the tick's label in the timetable timezone, run_after=partition_date, and data_interval=None.
  • _get_info_list (models/backfill.py) dispatches on dag.timetable.partitioned: partitioned Dags enumerate over [from_date.date(), to_date.date() + 1 day) via iter_partition_dagrun_infos (resolve_day_bound snaps to the local-midnight boundary; the +1 day makes --to-date inclusive); non-partitioned Dags keep the original logical-date interval path.
  • _format_key formats the partition key in the timetable timezone, fixing cross-timezone labels — e.g. an Asia/Taipei midnight partition is labelled with its local date instead of the previous UTC day.
  • The CLI --from-date and --to-date arguments are now required (required=True); their help text notes that for a partitioned Dag the range is the partition-date range (detected automatically).
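The iter_partition_dagrun_infos and _format_key bullets can be illustrated with a small standard-library sketch. It assumes a daily-at-midnight cron (`0 0 * * *`) so the ticks can be enumerated without a cron parser, and uses a fixed UTC+8 offset as a stand-in for Asia/Taipei. `PartitionRunInfo`, `format_key`, and `iter_daily_partitions` are hypothetical names, not Airflow's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional


@dataclass(frozen=True)
class PartitionRunInfo:
    """Hypothetical stand-in for DagRunInfo, holding only the fields the PR names."""

    partition_date: datetime  # UTC instant of the cron tick
    partition_key: str  # label of the tick in the timetable timezone
    run_after: datetime
    data_interval: Optional[tuple] = None  # partition runs carry no data interval


def format_key(tick: datetime, tz: timezone) -> str:
    # Label by the local calendar date in the timetable timezone, not the UTC date.
    return tick.astimezone(tz).date().isoformat()


def iter_daily_partitions(
    earliest: datetime, latest: datetime, tz: timezone
) -> Iterator[PartitionRunInfo]:
    """Yield one run per local-midnight tick in the half-open range [earliest, latest)."""
    local = earliest.astimezone(tz)
    tick = local.replace(hour=0, minute=0, second=0, microsecond=0)
    if tick < local:  # earliest is mid-day: the first tick is the next local midnight
        tick += timedelta(days=1)
    while tick < latest:  # latest itself is excluded
        utc_tick = tick.astimezone(timezone.utc)
        yield PartitionRunInfo(
            partition_date=utc_tick,
            partition_key=format_key(utc_tick, tz),
            run_after=utc_tick,  # run_after = partition_date
        )
        tick += timedelta(days=1)
```

Iterating from 2026-06-01 to 2026-06-04 at local midnight in UTC+8 yields the keys "2026-06-01", "2026-06-02", and "2026-06-03". The first tick's partition_date is 2026-05-31 16:00 UTC, which a UTC-based label would have called "2026-05-31"; that is the cross-timezone mislabel the _format_key change addresses.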
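The _get_info_list dispatch and its [from_date.date(), to_date.date() + 1 day) window can be sketched as follows. `partition_window` and `backfill_range` are hypothetical helpers; snapping each day to midnight in the timetable timezone is assumed to approximate what the PR calls resolve_day_bound.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Tuple


def partition_window(
    from_date: datetime, to_date: datetime, tz: timezone
) -> Tuple[datetime, datetime]:
    """Map inclusive --from-date/--to-date onto a half-open window of local midnights."""
    start_day = from_date.date()  # wall-clock date in the value's own timezone
    end_day = to_date.date() + timedelta(days=1)  # +1 day makes --to-date inclusive
    # Snap both calendar days to midnight in the timetable timezone.
    return (
        datetime.combine(start_day, time.min, tzinfo=tz),
        datetime.combine(end_day, time.min, tzinfo=tz),
    )


def backfill_range(
    partitioned: bool, from_date: datetime, to_date: datetime, tz: timezone
) -> Tuple[datetime, datetime]:
    """Hypothetical dispatch mirroring the description of _get_info_list."""
    if partitioned:
        # Partition-date axis: [from_date.date(), to_date.date() + 1 day)
        return partition_window(from_date, to_date, tz)
    # Logical-date axis: the original interval path receives the dates unchanged.
    return from_date, to_date
```

Because the upper bound is exclusive, passing --to-date 2026-06-03 still includes the 2026-06-03 partition, and the time of day on either bound does not matter once it is snapped to midnight.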
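The CLI change amounts to marking both range arguments as mandatory. The argparse sketch below is illustrative only: Airflow's real CLI declares its arguments through its own wrappers, and `build_parser`, the program name, and the use of `datetime.fromisoformat` for parsing are assumptions.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; Airflow's real CLI defines its arguments differently."""
    parser = argparse.ArgumentParser(prog="backfill-create-sketch")
    parser.add_argument(
        "--from-date",
        required=True,  # omitting it is now a usage error
        type=datetime.fromisoformat,
        help="Start of the backfill range. For a partitioned Dag this is the "
        "partition-date range (detected automatically).",
    )
    parser.add_argument(
        "--to-date",
        required=True,
        type=datetime.fromisoformat,
        help="End of the backfill range. For a partitioned Dag this is the "
        "partition-date range (detected automatically).",
    )
    return parser
```

With this setup, `build_parser().parse_args(["--from-date", "2026-06-01"])` exits with a usage error because --to-date is missing, so the same pair of flags is always supplied whether or not the Dag is partitioned.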

closes: #65922


Was generative AI tooling used to co-author this PR?
  • Yes: Claude Code (Opus 4.8)
    Generated-by: Claude Code (Opus 4.8) following the guidelines


Lee-W force-pushed the dags-backfill-partition-range branch 5 times, most recently from 44f05d4 to c004a94 (May 29, 2026 14:35)
Lee-W changed the title from "feat(cli): support partition-date range for backfill create" to "Support partitioned Dag backfills with mutually exclusive CLI selectors" (May 29, 2026)
Lee-W force-pushed the dags-backfill-partition-range branch 2 times, most recently from 0b1dcaa to 693aca6 (June 4, 2026 00:59)
Lee-W added this to the Airflow 3.3.0 milestone (Jun 5, 2026)
Lee-W force-pushed the dags-backfill-partition-range branch from 693aca6 to d510489 (June 5, 2026 06:22)
Lee-W moved this to In Progress in AIP-76 Asset Partitioning (Jun 5, 2026)
Lee-W force-pushed the dags-backfill-partition-range branch 5 times, most recently from e3da6eb to 783d201 (June 8, 2026 13:12)
Lee-W changed the title from "Support partitioned Dag backfills with mutually exclusive CLI selectors" to "Backfill partitioned Dags by partition-date range" (Jun 8, 2026)
Lee-W force-pushed the dags-backfill-partition-range branch 3 times, most recently from 2534af9 to 3391c38 (June 8, 2026 14:32)
Lee-W marked this pull request as ready for review (June 8, 2026 14:43)
Lee-W requested a review from uranusjr (June 10, 2026 13:30)
Review comment thread: airflow-core/src/airflow/serialization/definitions/dag.py
Lee-W force-pushed the dags-backfill-partition-range branch from 276d17c to 69fc2e2 (June 11, 2026 06:29)
Review comment thread: airflow-core/src/airflow/models/backfill.py (outdated)
Review comment thread: airflow-core/src/airflow/timetables/base.py
Lee-W force-pushed the dags-backfill-partition-range branch from 69fc2e2 to 5e3d3c4 (June 12, 2026 08:28)
Lee-W requested a review from uranusjr (June 12, 2026 08:29)
Lee-W force-pushed the dags-backfill-partition-range branch from 5e3d3c4 to 67f8049 (June 12, 2026 10:13)
Review comment thread: airflow-core/src/airflow/models/backfill.py (outdated)
Review comment thread: airflow-core/src/airflow/timetables/base.py (outdated)
Lee-W added 11 commits June 15, 2026 15:34
* partitioned Dags backfill via --partition-date-start / --partition-date-end
* non-partitioned Dags backfill via --from-date / --to-date
* mixed or missing selectors raise dedicated ValueError subclasses

Dates are compared as wall-clock dates in the timezone of the provided
value. The REST API does not yet expose partition selectors.
…form

The Run Backfill form now renders partition_date_start / partition_date_end
for partitioned Dags (and from/to for the rest, switched on
timetable_partitioned), and both the create and dry-run paths send the
matching selectors so a partitioned-Dag backfill no longer fails. The
partition window is validated start <= end on the client.
…o-date

Backfilling a partitioned Dag now uses --from-date/--to-date, interpreted
as the partition-date range when the Dag's timetable is partitioned. The
separate --partition-date-start/--end flags, the mutual-exclusion
validation, and the partition-specific API request and UI form fields are
removed -- the partition vs date-range behavior is detected from the
timetable, so one date range is passed regardless of Dag type.
… string

partition_key is a formatted label; relabeling it (e.g. formatting in the timetable timezone) made backfill dedup miss already-scheduled runs and create duplicate partition runs on upgrade. Dedup now matches on partition_date — the format-independent UTC instant of the tick, shared by scheduled and backfill runs for the same partition.

Adds a query-level and an end-to-end regression test covering a historical run keyed with the old UTC-instant label.
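A minimal sketch of why matching on partition_date survives a relabel while matching on partition_key does not. The old label format is an assumption (the commit only says "the old UTC-instant label"), a fixed UTC+8 offset stands in for Asia/Taipei, and `old_label`, `new_label`, and `ticks_to_create` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

TAIPEI = timezone(timedelta(hours=8))  # stand-in for the Asia/Taipei timetable timezone


def old_label(tick: datetime) -> str:
    # Hypothetical pre-upgrade label: the tick's UTC instant as an ISO string.
    return tick.astimezone(timezone.utc).isoformat()


def new_label(tick: datetime) -> str:
    # Post-upgrade label: the tick's local date in the timetable timezone.
    return tick.astimezone(TAIPEI).date().isoformat()


def ticks_to_create(
    candidates: Iterable[datetime], existing_partition_dates: Iterable[datetime]
) -> List[datetime]:
    """Keep only ticks with no existing run, matching on partition_date, not the label."""
    seen = {d.astimezone(timezone.utc) for d in existing_partition_dates}
    return [t for t in candidates if t.astimezone(timezone.utc) not in seen]
```

For the tick at 2026-05-31 16:00 UTC the two labels differ ("2026-05-31T16:00:00+00:00" versus "2026-06-01"), so a key-based lookup would miss the historical run and schedule a duplicate, whereas the instant-based lookup skips it regardless of how either run was labelled.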
…tion

Pin that asset-partitioned Dags (PartitionedAssetTimetable, periodic=False) are
rejected from backfill and dry-run as non-periodic (HTTP 422) rather than reaching
the partition iterator. Adds a param to the two existing non-periodic rejection tests.
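The ordering described in this commit (reject non-periodic timetables before any partition handling) can be sketched as below. `TimetableFlags`, `NonPeriodicBackfillError`, and `choose_backfill_path` are hypothetical; the commit states that PartitionedAssetTimetable has periodic=False, while the partitioned=True flag used in the test is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimetableFlags:
    """Hypothetical stand-in exposing only the two timetable attributes used here."""

    periodic: bool
    partitioned: bool


class NonPeriodicBackfillError(ValueError):
    """Hypothetical error; the commit says the API answers this case with HTTP 422."""

    http_status = 422


def choose_backfill_path(timetable: TimetableFlags) -> str:
    # Periodicity is checked first, so an asset-partitioned timetable
    # (periodic=False) is rejected before it can reach the partition iterator.
    if not timetable.periodic:
        raise NonPeriodicBackfillError("Backfill requires a periodic timetable")
    return "partition" if timetable.partitioned else "logical"
```

Putting the periodicity check ahead of the partitioned check means a timetable that is partitioned but not periodic never falls through to the partition-date path.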
Lee-W force-pushed the dags-backfill-partition-range branch from 67f8049 to 2172d1a (June 15, 2026 07:36)
Lee-W requested a review from uranusjr (June 15, 2026 07:37)

Development

Successfully merging this pull request may close these issues.

airflow dags backfill: partition-range + ordering + concurrency

4 participants