Skip to content

Change default log filename template to include map_index#21495

Merged
ashb merged 12 commits into
apache:mainfrom
astronomer:mapped-task-logs
Feb 15, 2022
Merged

Change default log filename template to include map_index#21495
ashb merged 12 commits into
apache:mainfrom
astronomer:mapped-task-logs

Conversation

@ashb

@ashb ashb commented Feb 10, 2022

Copy link
Copy Markdown
Member

With the recently added LogTemplate mechanism old TIs will still use the format they had at creation time (with the change here to ensure that we create a LogTemplate row for the just-upgraded-in-place) so the logs can still be viewed in the UI.

Unfortuantely the only way I can find to have map_index only appear for mapped tasks, but also to retain full control of the filename to users was to put a conditional in the template, which I am not happy about.

And since it was now getting quite "deep" I have chosen to "label" the components in the "hive partition style".

I'll need to test this change with Elasticsearch too ES working now.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

Comment thread airflow/config_templates/config.yml Outdated
@ashb ashb requested review from XD-DENG and kaxil as code owners February 10, 2022 16:52
@ashb

ashb commented Feb 11, 2022

Copy link
Copy Markdown
Member Author

Example of current layout:

$ tree ~/airflow/logs
/home/ash/airflow/logs
├── dag_id=maptest
│   ├── run_id=backfill__2022-02-10T00:00:00+00:00
│   │   ├── task_id=consumer
│   │   │   ├── map_index=0
│   │   │   │   └── attempt=1.log
│   │   │   ├── map_index=1
│   │   │   │   └── attempt=1.log
│   │   │   └── map_index=2
│   │   │       └── attempt=1.log
│   │   └── task_id=make_list
│   │       └── attempt=1.log
└── scheduler
    ├── 2022-02-10
    └── latest -> /home/ash/airflow/logs/scheduler/2022-02-10

@ashb ashb changed the title 🚧 Change default log filename template to include map_index Change default log filename template to include map_index Feb 11, 2022
@ashb ashb force-pushed the mapped-task-logs branch 2 times, most recently from 6dddeda to 3028972 Compare February 14, 2022 15:03
Comment thread airflow/config_templates/config.yml Outdated
Comment thread airflow/models/mappedoperator.py Outdated

@uranusjr uranusjr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally lgtm except two minor issues

Comment thread airflow/decorators/base.py Outdated
Comment on lines 312 to 313

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be None?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, cos of this in BaseOperator init:

        self.inlets: List = []
        self.outlets: List = []

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not. I was wrong and these aren't needed at all actually.

Comment thread airflow/models/taskinstance.py Outdated
Comment on lines 1839 to 1841

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was setting task.params here unintended or simply unnecessary?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget the error, but setting it here was giving some kind of error (I think from it being set twice on the same in-memory object, once in the supervisor, and once in the actual runner.)

So I took the approach that get_* should never have any sideffects!

Comment thread airflow/sensors/smart_sensor.py Outdated
Comment thread docs/apache-airflow/logging-monitoring/logging-tasks.rst Outdated
Comment thread docs/apache-airflow/logging-monitoring/logging-tasks.rst Outdated
@ashb

ashb commented Feb 15, 2022

Copy link
Copy Markdown
Member Author

@kaxil @uranusjr Final(?) review please?

ashb added 7 commits February 15, 2022 11:19
With the recently added LogTemplate mechanism old TIs will still use the
format they had at creation time (with the change here to ensure that we
create a LogTemplate row for the just-upgraded-in-place) so the logs can
still be viewed in the UI

And since it was now getting quite "deep" I have chosen to "label" the
components in the "hive partition style"
The `except AttributeError` was _also_ catching more than just "this
handler doesn't have a set_context attribute", but also errors from
calling that function which lead to hard-to-track-down errors (missing
inlets/outlets)

I have also changed `get_template_context` to a side-effect-free
function, so it no longer mutates task.params!
And instead of having to duplicate the default config value in to
configuration.py for the update process, change it to get the new
default value out of the loaded default_airflow.cfg
It has been broken in main for a while (it works fine in 2.2.x series),
but because we were catching _all_ AttributeErrors we never noticed.
ashb and others added 5 commits February 15, 2022 11:19
Since the log_id is never really visible to users I have taken the
easier approach of just always including the map_index, even for
unmapped tasks.
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
@github-actions github-actions Bot added the full tests needed We need to run full set of tests for this PR to merge label Feb 15, 2022
@github-actions

Copy link
Copy Markdown
Contributor

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@ashb ashb merged commit 2258e13 into apache:main Feb 15, 2022
@ashb ashb deleted the mapped-task-logs branch February 15, 2022 12:06
@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 28, 2022
potiuk added a commit that referenced this pull request Apr 7, 2022
@jedcunningham jedcunningham added this to the Airflow 2.3.0 milestone Apr 26, 2022
potiuk added a commit that referenced this pull request Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dynamic-task-mapping AIP-42 changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants