Add db clean CLI command for purging old data #20838
Love this @dstandish ❤️
Perhaps you mean "Splits comma-separated string and returns the result in a list"?
Force-pushed from cadb6ae to f6e8c04.
Okie doke, I think this is ready for a look. Had to write a lot of tests. Think I have decent coverage. I have toyed with the idea of moving this to the
Force-pushed from 78bd301 to d1291e8.
Title changed from "maintenance cleanup CLI command for purging old data" to "db clean CLI command for purging old data".
Renamed. I welcome opinions on the naming.
Force-pushed from 74bc174 to 9130721.
Probably easier to understand if this logic is put into sorted(..., key=...) instead. This class does not need to be generally sortable (and the sorting logic isn’t obvious either).
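A minimal sketch of that suggestion, using a hypothetical TableConfig dataclass (the class and its field names are assumptions for illustration, not the PR's actual code). The ordering is stated at the call site instead of in comparison methods on the class:

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class TableConfig:
    # Hypothetical stand-in for the class under review; fields are illustrative.
    table_name: str
    recency_column: str


configs = [
    TableConfig("task_instance", "start_date"),
    TableConfig("dag_run", "start_date"),
    TableConfig("log", "dttm"),
]

# The sort order is spelled out where it is used, so the class needs no __lt__.
ordered = sorted(configs, key=attrgetter("table_name"))
print([c.table_name for c in ordered])  # ['dag_run', 'log', 'task_instance']
```

With this shape there is no comparison method on the class that could mix up self and other.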
Shouldn't this reference the other arg instead of self twice?
ha, yup!
Force-pushed from f8ea2a6 to b2b77a4.
Static checks :) ?
Sorry, should be fixed now.
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Force-pushed from 1aeb431 to b66c7a8.
Must supply "purge before date".
Can optionally provide table list.
Dry run will only print the number of rows meeting criteria.
If not dry run, will require the user to confirm before deleting.
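The flow in the four points above can be sketched roughly as follows. This is not the PR's implementation: purge_before, its parameters, and the sqlite3 backing store are all assumptions chosen to keep the example self-contained.

```python
import sqlite3
from typing import Callable


def purge_before(
    conn: sqlite3.Connection,
    table: str,
    ts_column: str,
    before: str,
    dry_run: bool = True,
    confirm: Callable[[str], bool] = lambda prompt: False,
) -> int:
    """Report rows older than `before`; delete only if not a dry run and confirmed."""
    # Identifiers cannot be bound as SQL parameters. In this sketch they are
    # assumed to come from a trusted allow-list, never from raw user input.
    where = f"FROM {table} WHERE {ts_column} < ?"
    (count,) = conn.execute(f"SELECT COUNT(*) {where}", (before,)).fetchone()
    print(f"{table}: {count} row(s) older than {before}")
    if dry_run:
        return 0  # dry run only prints the count
    if not confirm(f"Delete {count} row(s) from {table}?"):
        print("Cancelled; nothing deleted.")
        return 0
    deleted = conn.execute(f"DELETE {where}", (before,)).rowcount
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, dttm TEXT)")
conn.executemany(
    "INSERT INTO log (dttm) VALUES (?)",
    [("2021-01-01",), ("2021-06-01",), ("2022-01-01",)],
)

# Dry run: reports the matching rows and deletes nothing.
dry = purge_before(conn, "log", "dttm", "2021-12-31")
# Real run: the confirm callback stands in for the interactive prompt.
real = purge_before(conn, "log", "dttm", "2021-12-31", dry_run=False, confirm=lambda p: True)
(remaining,) = conn.execute("SELECT COUNT(*) FROM log").fetchone()
```

Passing the confirmation step in as a callable keeps the delete path testable without a terminal.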
example dry run output:
Notes:
I didn't add docs since it appears that CLI docs are mostly automated and the command is pretty intuitive.
One thing that's maybe a bit non-obvious (though very sensible) that I'll highlight here is that for DagRuns the last scheduled dag run is always retained. This is to ensure continuity with scheduled dag runs.
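As a rough illustration of that retention rule, here is a sketch against a toy sqlite3 table. It assumes "last" means the most recent scheduled run per dag_id, ordered by start_date. The schema and column choices are simplified assumptions, not the real DagRun model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    "id INTEGER PRIMARY KEY, dag_id TEXT, run_type TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run (dag_id, run_type, start_date) VALUES (?, ?, ?)",
    [
        ("a", "scheduled", "2021-01-01"),  # old and not the latest: deleted
        ("a", "scheduled", "2021-02-01"),  # old, but latest scheduled run of "a": kept
        ("b", "scheduled", "2021-03-01"),  # old and not the latest: deleted
        ("b", "scheduled", "2022-03-01"),  # newer than the cutoff: kept
    ],
)

cutoff = "2022-01-01"
# Delete rows older than the cutoff, except each dag's most recent scheduled run.
conn.execute(
    """
    DELETE FROM dag_run
    WHERE start_date < :cutoff
      AND id NOT IN (
          SELECT d.id FROM dag_run AS d
          WHERE d.run_type = 'scheduled'
            AND d.start_date = (
                SELECT MAX(start_date) FROM dag_run
                WHERE dag_id = d.dag_id AND run_type = 'scheduled'
            )
      )
    """,
    {"cutoff": cutoff},
)
remaining = conn.execute(
    "SELECT dag_id, start_date FROM dag_run ORDER BY id"
).fetchall()
print(remaining)  # [('a', '2021-02-01'), ('b', '2022-03-01')]
```

Dag "a" keeps one run even though every one of its runs is older than the cutoff.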
The other thing that's maybe non-obvious is that we have foreign key relationships and they have on delete cascade built in to the model. This means that if your cleanup run deletes a dag run, it will also delete all of its associated TIs, even if you didn't ask to clean up the TI table.

I have toyed with the idea of moving this to the airflow db subcommand for now, since it's very much db-specific and we don't have any other maintenance commands at the moment. If we add a lot of maintenance-y stuff later we can always move the commands. I welcome opinions on this.
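To make the cascade point concrete, here is a small sqlite3 sketch. The two-table schema is a deliberately simplified assumption (the real models and their foreign keys differ); it only demonstrates how ON DELETE CASCADE removes child rows that were never targeted directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dag_run (id INTEGER PRIMARY KEY, run_id TEXT)")
conn.execute(
    """
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        task_id TEXT,
        dag_run_id INTEGER REFERENCES dag_run (id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO dag_run (id, run_id) VALUES (1, 'old'), (2, 'new')")
conn.execute(
    "INSERT INTO task_instance (task_id, dag_run_id) "
    "VALUES ('t1', 1), ('t2', 1), ('t3', 2)"
)

# Only dag_run is targeted, yet the task instances of run 1 disappear too.
conn.execute("DELETE FROM dag_run WHERE id = 1")
left = conn.execute("SELECT task_id FROM task_instance").fetchall()
print(left)  # [('t3',)]
```

So a cleanup limited to the dag run table can still shrink the TI table.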
The last thing I should point out is, the verbose mode doesn't do anything right now. Initially I thought we might print out the rows that were deleted (similar to how they are printed to logs in the "maintenance dags" that inspired this effort). But I don't think it's actually that helpful to do and it could be a lot of data which could actually be harmful since it would crowd out other output. So I've left the verbose option but it doesn't do anything. We could leave it, remove it, or add some kind of verbose output. Similarly, I thought we might want to print all the to-be-deleted rows in the dry run, but for the same reason I decided not to do that. Though I've left some print rows logic in there in case we want to enable that.