Localize the run_id in dagrun #17502
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
|
I kind of feel maybe we should just use this behaviour all the time. The run ID is only for identifying the run, and using UTC has no advantage besides being easy to implement. |
|
Either way though, you need some tests to ensure this works. |
I agree that the
I've tested the code changes, setting this value to |
|
@uranusjr wrote about adding unit tests. There is also one serious caveat that should definitely be covered by the unit tests: the DST change. UTC is not only easy to implement but also guarantees conflict avoidance. With local time, if you have an hourly scheduled DAG, you are guaranteed a conflict once a year: when the clock moves backwards, two runs get the same run id. So unless you make sure that the timezone is included and reflected properly, you might have a problem. I think in this case the timezone offset will be different when you use isoformat, but it definitely needs testing (by which I mean unit testing), so that we can avoid regressions and so that you have shown that you consciously thought about and tested that case. Also, I think it would be great to have parameterized unit |
I see, it makes sense that the run_id may conflict if the system timezone is changed. But for our usage, we really need a local run_id to display so that we can find the specific dag run. |
|
I think local_run_id might be misleading: if people see different values in the UI and in the DB, that might be quite confusing. Plus it adds an unnecessary field in the DB. The run_id is used throughout the whole UI and it would be quite a change to use the 'local' version all over the UI. And it would be even harder to test. I agree with @uranusjr that always using the local timezone settings is a good idea, but it has to be thoroughly unit-tested. |
Agree. |
|
I'd like to know: in daily life, when local times conflict during the daylight saving time switch, how do you distinguish them? |
|
I think the timezone part will change. For example, if you’re in England, 2am without DST (Greenwich Mean Time) would be |
If so, I think it will work fine if we can make sure the DST offset is in the
This would be a bit difficult to determine unless the timezone implementation correctly implements |
|
I think |
|
@uranusjr I think you may have missed the PR comment message.
There was a problem hiding this comment.
This can have unintended consequences -- we will have to look closely.
There was a problem hiding this comment.
Sure. Here are the checks I made before deciding to change this.
From reading the source code, I see two risks in this change:
- As there is a UNIQUE_KEY(dag_id, run_id) in the database, we must make sure that this pair of values stays unique.
- If the user changes the server config localize_dag_run_id from True to False, or changes the default_timezone frequently, the dag run_id may conflict.
For the first risk, I use pendulum to make sure the local times are unique during the DST change. I also tried to run the full tests and added the test code to the PR.
For the second risk, I think it is a rare case. Even if the user changes the default_timezone, the dag run_id is accurate to six decimal places, so it is really hard to make the run_id conflict.
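To make the first risk concrete, here is a minimal sketch of how a unique key on (dag_id, run_id) rejects a repeated run_id. It assumes an in-memory SQLite table that is only a simplified stand-in for the real dag_run table; the helper `insert_run` and the sample IDs are hypothetical and not part of Airflow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the dag_run table: only the columns of the unique key.
conn.execute(
    "CREATE TABLE dag_run ("
    "dag_id TEXT NOT NULL, run_id TEXT NOT NULL, UNIQUE (dag_id, run_id))"
)


def insert_run(dag_id, run_id):
    """Hypothetical helper: True if the row was stored, False if the key rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dag_run (dag_id, run_id) VALUES (?, ?)",
                (dag_id, run_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The same local wall-clock time without an offset collides on the second insert.
first = insert_run("example_dag", "scheduled__2021-10-31T01:30:00")
duplicate = insert_run("example_dag", "scheduled__2021-10-31T01:30:00")
# Including the UTC offset in the run_id keeps the row distinct.
distinct = insert_run("example_dag", "scheduled__2021-10-31T01:30:00+00:00")
```

In this sketch the second insert fails with an integrity error, which is why a localized run_id has to carry enough information (such as the offset) to stay unique.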
There was a problem hiding this comment.
It should be based on the dag timezone, not the default timezone.
There was a problem hiding this comment.
That makes sense.
But we would have to change the method from a static method to a class method.
I will change it tomorrow.
There was a problem hiding this comment.
Oh hmmm. It can't even be a class method. Maybe we should just pass the date in the "right" TZ here already?
There was a problem hiding this comment.
@ashb
Just to confirm: is the dag timezone the value that is initialized here?
Line 326 in e99624d
Can we use the tzinfo in the execution_date directly? Or do we need to add a parameter dag_timezone to the method generate_run_id and set the value when calling this method? Or should we convert the date to the dag timezone outside the method?
There was a problem hiding this comment.
@ashb
I checked the code: we cannot use the tzinfo in the execution_date directly, because it has already been converted to UTC.

So we have two ways to implement this:
- change the method generate_run_id from a static method to a class method
- add a parameter timezone to the method generate_run_id, with settings.TIMEZONE as the default value.
Which one do you prefer? Or is there something I missed that would give us a better choice?
Let me know what you think.
Thanks
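A rough sketch of the second option, for illustration only. It is not the code in this PR: `DEFAULT_TIMEZONE` is an assumed stand-in for settings.TIMEZONE (fixed here at UTC+08:00), the function is a plain module-level function rather than a method on the model, and the parameter is named `tz` so that it does not shadow the standard library `timezone` class.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumed stand-in for settings.TIMEZONE: a fixed UTC+08:00 offset.
DEFAULT_TIMEZONE = timezone(timedelta(hours=8))


def generate_run_id(
    run_type: str, execution_date: datetime, tz: tzinfo = DEFAULT_TIMEZONE
) -> str:
    """Convert the (UTC) execution date to the given timezone, then format it."""
    return f"{run_type}__{execution_date.astimezone(tz).isoformat()}"


# The execution date arrives already converted to UTC.
execution_date = datetime(2021, 9, 8, 11, 1, 2, 22226, tzinfo=timezone.utc)

local_run_id = generate_run_id("manual", execution_date)
utc_run_id = generate_run_id("manual", execution_date, tz=timezone.utc)
print(local_run_id)  # manual__2021-09-08T19:01:02.022226+08:00
print(utc_run_id)    # manual__2021-09-08T11:01:02.022226+00:00
```

With this shape the caller could pass the dag timezone explicitly, and the method would not need access to the class or instance.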
There was a problem hiding this comment.
I don't see the need for this suffix -- when it's in DST the timezone offset will already be different (+00:00 vs +01:00__DST in your example for UK)
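A small sketch of why the offset alone is enough. It assumes two fixed offsets standing in for UK summer time (+01:00) and winter time (+00:00) around the clock change on 2021-10-31, instead of a real timezone database; `make_run_id` is a hypothetical helper, not Airflow's generate_run_id.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Europe/London on either side of the change.
BST = timezone(timedelta(hours=1))  # summer time, UTC+01:00
GMT = timezone(timedelta(0))        # winter time, UTC+00:00


def make_run_id(prefix, local_dt, with_offset=True):
    """Hypothetical helper: build a run id from a localized datetime."""
    if not with_offset:
        local_dt = local_dt.replace(tzinfo=None)  # drop the offset on purpose
    return f"{prefix}__{local_dt.isoformat()}"


# Two hourly runs, one hour apart in UTC; UK clocks went back at 01:00 UTC.
before = datetime(2021, 10, 31, 0, 30, tzinfo=timezone.utc).astimezone(BST)
after = datetime(2021, 10, 31, 1, 30, tzinfo=timezone.utc).astimezone(GMT)

with_offset = [make_run_id("scheduled", d) for d in (before, after)]
without_offset = [make_run_id("scheduled", d, with_offset=False) for d in (before, after)]

print(with_offset)     # same wall-clock time, different offsets
print(without_offset)  # identical strings, so these would conflict
```

Both runs show 01:30 on the wall clock, but the isoformat output differs in the offset, so the two run ids stay distinct without any extra suffix.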
There was a problem hiding this comment.
Actually, I don't know the habits of people who live with DST, because we don't use DST in China.
I just thought it would be much clearer if we added __DST.
I will remove it tomorrow.
|
I committed the new changes.
Actually, I'm still not sure about the second change; we can continue the discussion about it here. Thanks a lot |
db0d4c5 to
66740a3
|
rebased to latest main |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
|
rebase to the latest main. |
Why I commit this change:
airflow features, I found that the dag run_id is generated using the UTC time, but actually, it is not convenient for us to use the UTC time because we are using local time.
run_id is manual__2021-09-08T11:01:02.022226+08:00, but after a few hours, when I want to find this run, I have to convert the local time to UTC time to get the specific dag_run. If the run_id is generated using local time, the run_id will be manual__2021-09-08T19:01:02.022226+08:00, and I can easily find this dag_run.
How do I change:
localize_dag_run_id in the core config and the default value is False. It has no effect if the user doesn't care about the dag run_id format. Users who, like me, want to use the local time to generate the run_id can set it to True.
Result: