Removes the next_dagrun_create_after reset #21214
This needs more thought and a deeper review than I can give right now. I think that by removing this we might have undone the fix for which this change was introduced in the first place.
I checked this. Let's reconstruct the sequence of events.
ephraimbuddy
left a comment
We need a test that creates a dagrun and then, while that dagrun is still running, has the scheduler loop again and checks that it does not create more dagruns.
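For illustration, the shape of such a test might look like the sketch below. It uses a toy in-memory model rather than Airflow's scheduler or test fixtures; `ToyDag` and `scheduler_loop` are hypothetical names, and the `max_active_runs` check is an assumed simplification of the real logic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDag:
    """Hypothetical stand-in for a DAG; not an Airflow class."""
    max_active_runs: int = 1
    running: list = field(default_factory=list)  # ids of runs still running


def scheduler_loop(dag: ToyDag, next_run_id: int) -> bool:
    """One toy scheduler pass: create a run only if below max_active_runs."""
    if len(dag.running) >= dag.max_active_runs:
        return False  # at the limit: do not create another run
    dag.running.append(next_run_id)
    return True


dag = ToyDag(max_active_runs=1)
created_first = scheduler_loop(dag, next_run_id=1)   # creates the first run
created_second = scheduler_loop(dag, next_run_id=2)  # first run is still running
```

A real test would drive the actual scheduler job twice and assert on the number of DagRun rows; the toy version only shows the two-pass structure.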
ephraimbuddy
left a comment
This works fine in that it doesn't create more runs when max_active_runs is reached, but it didn't resolve the linked issue.
Can you work on fixing the linked issue too? I like that next_dagrun_create_after is no longer nullified.
This fix resolves the linked issue. I have added a new test to explain how it works:
There is a gap in launches: Pay attention to the same
I did load testing today, and it turned out that the scheduler now consumes a lot more CPU, since it constantly selects all the necessary DAGs in the
@avkirilishin, can you test with
Hi @avkirilishin, I debugged this issue and it turned out that the problem is that airflow/airflow/jobs/scheduler_job.py Line 1060 in 1d170f8 is returning a wrong figure. You can verify this. I have not been able to pin down why it's returning a wrong figure, but if you change this line of code, airflow/airflow/jobs/scheduler_job.py Line 1088 in 1d170f8, to if self._should_update_dag_next_dagruns(dag, dag_model, active_runs-1): the problem will be gone.
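To make the arithmetic of the suggested `active_runs-1` change concrete, here is a small sketch. The guard below is an assumed, simplified shape (refuse the update when the count is at or above max_active_runs), not the actual body of _should_update_dag_next_dagruns, and `should_update_next_dagruns` is a hypothetical name. It also assumes the reported count is one too high, which matches the "wrong figure" described above without explaining its cause.

```python
def should_update_next_dagruns(total_active_runs: int, max_active_runs: int) -> bool:
    """Assumed guard shape: allow the update only below the limit."""
    return total_active_runs < max_active_runs


max_active_runs = 1
active_runs = 1  # assumption: the count is one higher than the true value

# Passing the raw count blocks the update, because 1 >= 1.
blocked = should_update_next_dagruns(active_runs, max_active_runs)

# Passing active_runs - 1 lets the update through, because 0 < 1.
adjusted = should_update_next_dagruns(active_runs - 1, max_active_runs)
```

The subtraction only helps when the count really is off by one, so it is a workaround under that assumption rather than a general fix.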
@ephraimbuddy I tested your suggestion. The problem has not gone away. The DAG gets stuck after the first launch.
Sorry, I have updated my comment; I gave the wrong permalink. You should change line 1088: airflow/airflow/jobs/scheduler_job.py Line 1088 in 1d170f8
This is a good change, and I think we need to apply it. But on its own it won't work if someone sets the DagRun state manually (Success or Failed).
It worked for me. Can you test it too?
With main plus "active_runs-1" or with this branch plus "active_runs-1"? With main plus "active_runs-1", the manually changed DagRun state leads to a hang for me (with
It still works for me in main, not sure why. I would suggest we implement
But we can (and we do it) calculate the current running DagRun count at any time. Why do we need a new column? Maybe this way:
?
I want to avoid https://github.com/apache/airflow/pull/21214/files#diff-62c8e300ee91e0d59f81e0ea5d30834f04db71ae74f2e155a10b51056b00b59bR2874. That query is run a lot, and the conditional check plus passing a list of IDs to look into won't be efficient in large deployments.
Is there anything else I can do? And should I close this pull request or not?
If you don't mind, you can apply the
You can do it in a new PR if you want.
I have added the PR: #21413
Can we add airflow/tests/jobs/test_scheduler_job.py Lines 2927 to 2961 in 039ba52 I will remove
Done

closes: #21083
closes: #19901
The next_dagrun_create_after reset causes a problem (details in the issue). But now we check _should_update_dag_next_dagruns before the calculate_dagrun_date_fields call, so we can remove the reset.
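As a rough illustration of the difference between a guard that resets the field and one that only reports whether an update is allowed, consider the toy comparison below. `ToyDagModel`, `guard_with_reset`, `guard_without_reset` and `update_next_dagrun` are hypothetical names, and both guards are assumed simplifications, not Airflow's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class ToyDagModel:
    """Hypothetical stand-in for the DAG model row; not an Airflow class."""
    next_dagrun_create_after: Optional[datetime]


def guard_with_reset(model: ToyDagModel, active_runs: int, max_active_runs: int) -> bool:
    """Assumed old behaviour: nullify the field when at the limit."""
    if active_runs >= max_active_runs:
        model.next_dagrun_create_after = None
        return False
    return True


def guard_without_reset(model: ToyDagModel, active_runs: int, max_active_runs: int) -> bool:
    """Assumed new behaviour: only report whether an update is allowed."""
    return active_runs < max_active_runs


def update_next_dagrun(
    model: ToyDagModel,
    active_runs: int,
    max_active_runs: int,
    new_value: datetime,
    guard: Callable[[ToyDagModel, int, int], bool],
) -> None:
    """Run the guard first; recompute the date field only if it passes."""
    if guard(model, active_runs, max_active_runs):
        model.next_dagrun_create_after = new_value


original = datetime(2022, 1, 1)
later = datetime(2022, 1, 2)

old_style = ToyDagModel(next_dagrun_create_after=original)
update_next_dagrun(old_style, 1, 1, later, guard_with_reset)

new_style = ToyDagModel(next_dagrun_create_after=original)
update_next_dagrun(new_style, 1, 1, later, guard_without_reset)
```

In this toy model, the resetting guard leaves the field empty at the limit, while the non-resetting guard leaves the previous value in place.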