Try to move "dangling" rows in upgradedb #18953
Merged
uranusjr merged 5 commits on Oct 14, 2021
Force-pushed from cf3d9a0 to 277abdc
Force-pushed from 4e2e6aa to dda518a
Instead of failing loudly on invalid records (which happens far too often), this attempts to move the offending rows to another table and carry on with the migration if possible. The table holding the dangling rows is created with `CREATE TABLE ... AS SELECT ...`, so it may be missing some indexes and constraints; since it is only meant as temporary storage, this is probably not a big deal. If the copy succeeds, the dangling rows are automatically deleted from the original table so the migration can continue. Additionally, this commit removes the upgrade check on TaskFail and adds a check on TaskReschedule. This is because TaskFail is not actually migrated in 2.2, while TaskReschedule is; we concluded that this was likely a typo during implementation rather than an intentional choice.
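The copy-then-delete approach described above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not the actual migration code: the `dag_run`/`task_instance` schemas are simplified and hypothetical, and `move_dangling_rows`, `DANGLING_TI_FILTER` and the `dangling_task_instance` table name are made up for the example. The real migration runs against whichever database backend Airflow is configured with.

```python
import sqlite3

# Hypothetical filter on a simplified schema: a task_instance row is "dangling"
# when no dag_run row shares its dag_id and execution_date.
DANGLING_TI_FILTER = (
    "NOT EXISTS (SELECT 1 FROM dag_run"
    " WHERE dag_run.dag_id = task_instance.dag_id"
    " AND dag_run.execution_date = task_instance.execution_date)"
)


def move_dangling_rows(conn: sqlite3.Connection, source: str, target: str, where: str) -> int:
    """Copy rows of `source` matching `where` into a new table `target`, then delete them.

    Hypothetical helper. Returns the number of rows moved. Table names and the
    filter are interpolated into SQL, so they must come from trusted code only.
    """
    conn.execute("BEGIN")
    try:
        # CREATE TABLE ... AS SELECT copies columns and data only; indexes and
        # constraints are not carried over, which is acceptable for temporary storage.
        conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE {where}")
        moved = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
        # Only reached if the copy succeeded: remove the offending rows so the
        # rest of the migration can proceed.
        conn.execute(f"DELETE FROM {source} WHERE {where}")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # leave both tables untouched on failure
        raise
    conn.execute("COMMIT")
    return moved


if __name__ == "__main__":
    # isolation_level=None: transactions are managed explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT);
        CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT);
        INSERT INTO dag_run VALUES ('example', '2021-10-01');
        INSERT INTO task_instance VALUES
            ('t1', 'example', '2021-10-01'),
            ('t2', 'example', '2021-10-02');
        """
    )
    print(move_dangling_rows(conn, "task_instance", "dangling_task_instance", DANGLING_TI_FILTER))
```

Running the script prints `1`: `t2` has no matching `dag_run`, so it ends up in `dangling_task_instance` while `t1` stays in place. Wrapping the copy and the delete in one transaction is a choice made for this sketch, so that a failed copy never deletes anything.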
Force-pushed from dda518a to f73720a
Comment by the PR author (Member):
I just added some code to show the alert in the web UI. This should be ready.
Comment (Contributor):
The PR most likely needs to run the full matrix of tests because it modifies core parts of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with `--force-with-lease`.
jedcunningham approved these changes on Oct 14, 2021
jedcunningham (Member) left a comment:
I tested this locally as well, looks good.
Comment by the PR author (Member):
Also, I somehow forgot to push the fix to the DB query count test earlier :( Waiting for CI to pass...
kaxil approved these changes on Oct 14, 2021
jedcunningham pushed a commit that referenced this pull request on Oct 14, 2021:
(cherry picked from commit f967ca9)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request on Oct 27, 2021:
(cherry picked from commit f967ca9)
jedcunningham pushed a commit to astronomer/airflow that referenced this pull request on Oct 28, 2021:
(cherry picked from commit f967ca9)
potiuk added a commit to potiuk/airflow that referenced this pull request on Nov 10, 2021:
In Airflow 2.2.2 we introduced a fix in apache#18953 where the corrupted data was moved to a separate table. However, some of our users (rightly) might not have the context. We've never had anything like that before, so users who treat the Airflow DB as a black box might be confused about what the error means and what they should do in this case. As apache#19440 (converted into discussion apache#19444) and apache#19421 show, the message is a bit unclear to users. This PR attempts to improve that: it adds an `upgrading` section to our documentation and makes the message link to it, so that rather than asking questions in the issues, users can find context and answers in our docs. It also guides users who treat the Airflow DB as a "black box" on how they can use their own tools and `airflow db shell` to fix the problem.
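Before dropping a moved table, it can help to check what it contains. The sketch below lists tables whose names start with a given prefix and counts their rows, using `sqlite3` as a stand-in for whatever database client or `airflow db shell` session you use. The `_airflow_moved` prefix, the `find_moved_tables` helper and the `_airflow_moved_demo` table are assumptions for illustration; confirm the actual table names in the upgrading documentation for your Airflow version.

```python
import sqlite3
from typing import Dict

# Assumed prefix, for illustration only; confirm the real naming for your version.
MOVED_TABLE_PREFIX = "_airflow_moved"


def find_moved_tables(conn: sqlite3.Connection, prefix: str = MOVED_TABLE_PREFIX) -> Dict[str, int]:
    """Hypothetical helper: return {table_name: row_count} for tables starting with `prefix`."""
    names = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        # Filter in Python: in SQL LIKE, "_" is a single-character wildcard
        # and would match more tables than intended.
        if name.startswith(prefix)
    ]
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task_instance (task_id TEXT);
        CREATE TABLE _airflow_moved_demo (task_id TEXT);  -- hypothetical moved table
        INSERT INTO _airflow_moved_demo VALUES ('t2'), ('t3');
        """
    )
    print(find_moved_tables(conn))
```

Once you have decided the rows are not needed (or have copied them elsewhere), the table can be removed with a plain `DROP TABLE` in your database client.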
potiuk added a commit that referenced this pull request on Nov 10, 2021:
* Improve message and documentation around moved data
kaxil pushed a commit that referenced this pull request on Nov 11, 2021:
* Improve message and documentation around moved data (cherry picked from commit de43fb3)
Fix #18894.