Track completed triggers until deleted from database #19546
andrewgodwin
left a comment
I think this should work - but not entirely sure if it will be watertight, as what if the trigger is re-assigned to the subthread from the DB after it's deleted? Maybe it would be better to keep a separate datastructure of "recently completed trigger IDs" and use that to stop launching duplicates, rather than tying it to the triggers dict entry itself?
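A minimal sketch of the "recently completed trigger IDs" idea, using a toy runner rather than Airflow's real TriggerRunner. The class `ToyRunner`, its `complete` method, and the `launch_count` counter are hypothetical names introduced only for illustration.

```python
class ToyRunner:
    """Toy stand-in for a trigger runner; not Airflow's real TriggerRunner."""

    def __init__(self):
        self.triggers = {}               # trigger_id -> state of running triggers
        self.recently_completed = set()  # IDs that finished but may still be in the DB
        self.launch_count = {}           # trigger_id -> launches (illustration only)

    def complete(self, trigger_id):
        # The trigger fired: drop it from the running dict, but remember its ID.
        self.triggers.pop(trigger_id, None)
        self.recently_completed.add(trigger_id)

    def update_triggers(self, requested_ids):
        # Launch only IDs that are neither running nor recently completed.
        new_ids = set(requested_ids) - set(self.triggers) - self.recently_completed
        for trigger_id in new_ids:
            self.triggers[trigger_id] = "running"
            self.launch_count[trigger_id] = self.launch_count.get(trigger_id, 0) + 1


runner = ToyRunner()
runner.update_triggers({1})  # the DB assigns trigger 1
runner.complete(1)           # trigger 1 fires before its DB row is deleted
runner.update_triggers({1})  # the DB still lists trigger 1; no duplicate launch
```

Because the guard lives in its own set, it does not depend on whether the entry in `triggers` still exists or has been re-created.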
a8505d6 to 73f216d
        self.runner.update_triggers(set(ids))

    @provide_session
    def purge_completed_triggers_list(self, session):
it would be nice if we didn't have to query the database here but i don't think it can be avoided.
i explored updating clean_unused to return the ids deleted. then we could pass those to purge_completed and purge those rows. but this won't work because clean_unused does not filter w.r.t. triggerer_id, so other triggerer instances could delete the rows belonging to this triggerer, so completed_triggers would grow over time.
on the bright side, we're querying a PK, so it shouldn't be that expensive.
alternatively we could change clean_unused to take a triggerer id and delete only the rows for that triggerer, then the above approach would probably work.
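A rough sketch of that alternative, using a plain dict as a stand-in for the trigger table. The function below is hypothetical: its `triggerer_id` parameter and its return value are assumptions for illustration, not the signature of Airflow's `Trigger.clean_unused`.

```python
def clean_unused(rows, triggerer_id, still_needed):
    """Delete this triggerer's unneeded rows from a toy table; return their IDs.

    `rows` maps trigger_id -> owning triggerer_id (a stand-in for the table).
    """
    deleted = {
        trigger_id
        for trigger_id, owner in rows.items()
        if owner == triggerer_id and trigger_id not in still_needed
    }
    for trigger_id in deleted:
        del rows[trigger_id]
    return deleted


rows = {1: "A", 2: "A", 3: "B"}  # triggers 1 and 2 belong to triggerer A
completed_triggers = {1, 2}      # triggerer A's in-memory set

deleted = clean_unused(rows, triggerer_id="A", still_needed={2})
completed_triggers.difference_update(deleted)  # no extra query needed
```

Since each triggerer only deletes its own rows, every deleted ID is reported back to the instance that holds it in `completed_triggers`.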
- def purge_completed_triggers_list(self, session):
+ def purge_completed_triggers(self, session):
the reason i put _list there is because without it, it could signify "purge completed triggers from the database"
but it's only purging from the instance attribute completed_triggers. the actual deletion of triggers from the database is done in Trigger.clean_unused
what do you think?
        details["name"],
    )
    self.failed_triggers.append(trigger_id)
    self.completed_triggers.add(trigger_id)
It appears that completed_triggers will grow in an unbounded fashion with this code over time - I would suggest a datastructure that is self-limiting. My initial idea is a dict with the IDs as keys and the datetime they were inserted as values, and then just remove all keys whose values are more than 5 minutes old in the first part of purge_completed_triggers.
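A minimal sketch of such a self-limiting structure, assuming a 5-minute window. The class name `CompletedTriggers` and its methods are hypothetical, and the `now` argument exists only so the expiry can be exercised without waiting.

```python
from datetime import datetime, timedelta, timezone

COMPLETED_TTL = timedelta(minutes=5)  # the 5-minute window suggested above


class CompletedTriggers:
    """Hypothetical self-limiting record of completed trigger IDs."""

    def __init__(self, ttl=COMPLETED_TTL):
        self.ttl = ttl
        self._inserted_at = {}  # trigger_id -> datetime the ID was recorded

    def add(self, trigger_id, now=None):
        self._inserted_at[trigger_id] = now or datetime.now(timezone.utc)

    def purge_expired(self, now=None):
        # Remove every ID recorded more than `ttl` ago.
        now = now or datetime.now(timezone.utc)
        expired = [tid for tid, ts in self._inserted_at.items() if now - ts > self.ttl]
        for tid in expired:
            del self._inserted_at[tid]

    def __contains__(self, trigger_id):
        return trigger_id in self._inserted_at

    def __len__(self):
        return len(self._inserted_at)


start = datetime(2021, 11, 1, 12, 0, tzinfo=timezone.utc)
completed = CompletedTriggers()
completed.add(1, now=start)
completed.add(2, now=start + timedelta(minutes=4))
# At start + 6 min, ID 1 is 6 minutes old (expired) and ID 2 is 2 minutes old (kept).
completed.purge_expired(now=start + timedelta(minutes=6))
```

The trade-off is that the size bound comes from the clock rather than from the database, so an ID can expire while its row still exists if cleanup takes longer than the TTL.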
There was a problem hiding this comment.
how do you envision it growing? the items should only be in completed_triggers for basically one or two loop cycles. it goes submit_event (into completed) -> Trigger.clean_unused (purged from db) -> purge_completed_triggers_list (purged from completed)
but maybe you're saying that in the wild the code may not behave as intended, and triggers will accumulate in the DB, and therefore they'll accumulate in the completed_triggers set? but if they accumulate in the db they'll keep getting recreated and eventually that would be an issue in itself because not only would they exist as ids in a set but they would be running.
using a TTL approach as you have suggested would avoid a db query though.
LMK your thoughts.
There was a problem hiding this comment.
@andrewgodwin gentle nudge here. If you want to go with time-based expiration I have no objection and am happy to rework this. I do recognize that my solution adds a query, but if you don't mind explaining, I'm having trouble seeing the unbounded growth.
There was a problem hiding this comment.
I guess I just don't trust difference_update quite enough; I missed it in my first read-through for "what in here cleans out the datastructure?"
If you're happy there are no edge cases where it can leak entries, and you write a test to prove so, I think my objection here is void.
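One way such a test could look, sketched against plain sets rather than the real job class. `purge_completed` and `test_no_leak` are hypothetical names, and `ids_in_db` stands in for the result of the primary-key query discussed earlier.

```python
def purge_completed(completed_triggers, ids_in_db):
    """Drop completed IDs whose rows no longer exist in the database."""
    gone = completed_triggers - set(ids_in_db)
    completed_triggers.difference_update(gone)


def test_no_leak():
    completed = set()
    db = set()
    for trigger_id in range(1000):
        db.add(trigger_id)              # trigger row created
        completed.add(trigger_id)       # trigger fires, event submitted
        purge_completed(completed, db)
        assert trigger_id in completed  # still guarded while the row exists
        db.discard(trigger_id)          # cleanup deletes the row
        purge_completed(completed, db)
    assert completed == set()           # nothing left behind


test_no_leak()
```

This only covers the happy path where every row is eventually deleted; a row that is never cleaned up would keep its ID in the set for as long as it stays in the database.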
kaxil
left a comment
Approach lgtm, please add tests
superseded by #20699
Commonly a trigger will complete and be deleted from the TriggerRunner.triggers dictionary attr before the TriggerJob has had a chance to submit_event. So TriggerRunner thinks it needs to create the trigger [again], which it does.

One way to resolve this is to have TriggerJob mark is_submitted after successful completion of submit_event, and then have the cleanup process wait for this. But I'm not sure if perhaps we will need to implement monitoring / timeouts if for some reason the event is never submitted; in that case we'd want to ensure that the trigger is re-run.

Should fix #18392

If you think this approach looks OK I'll proceed with tests.