Pop caching#107
Merged
Merged
Conversation
The cleanup and restore code doesn't even work so the tests are not currently idempotent anyways. This code is basically just dead weight that makes the tests harder to maintain. After this patch, it is a precondition for starting the test script that the containers are started with fresh volumes (i.e. that you just did a podman volume prune command before bringing the containers up), and after the tests run, the volumes are left in and unspecified state. You really shouldn't be running the tests on an instance with important/ production data in the volumes, but if you really want to, you should make a meta script that uses the existing backup and restore scripts to save the volumes brings the containers down, prunes the volumes, brings them back up, runs the tests brings them down and then restores the volumes.
The folder is called `pop` but the actual program is `pop3`. Just like the first time we encountered ENOENT errors on exec from local build artifacts contaminating in the container images, this took me way too long to debug. Fixes: 5893239 ("add .dockerignore to mask any local build artifacts")
Fixes: 175e5a5 ("submatrix: initial version")
When we added the grades db volume while creating the denis container we forgot to add it to the list of volumes to back up used by the backup script. Fixes: 64e35fd ("denis: automate denis using container technology")
Since denis needs access to the email data folders we need to introduce a dependency link between it and smtp so that smtp is started beforehand and has an opportunity to create the folders within the volumes with the appropriate user and group before denis starts.
Separating the volumes allows finer grained control over which containers have the ability view the data and edit it. pop only needs access to only the mail, while denis needs access to both the mail and logs.
We can just `chdir` into the mail folder and use regular open to avoid this global variable. While we are at it, we can clean up the code for loading the email since we aren't using different unix groups for access control so the call to `accessat` is redundant.
This saves an additional syscall during the execution of `TOP` (moving it to the startup), but when the maildrop data is cached, it will no longer need to be done for each email for each session.
We don't even really need to support more than one session per connection let alone over 200, but we certain did not need to support 64k of them. Shrinking this field also makes the fixed ID length 30 bytes which is small enough to allow shrinking the email structure in pop to make the maildrop more compact.
now that smtp ids are 30 bytes, we can fit all of the data needed in 48 bytes which is a 25% savings on the overall maildrop size.
The type will be shared with other code so moving to a .h file is helpful.
The current model for loading email in the pop server is very inefficient: we have to open the mail directory iterate over and stat every file in the directory to collect metadata about the emails (regardless of how many the user actually wants to access) and then throw all the data away when the client disconnects. This was essentially necessary to implement the unix permissions based system of access control, but as that is not even working / being used anymore, it is just a massive waste of time. As an alternative, the array of emails structures that would be built by the pop program by reading the list of mail can be built by a separate program and stored in a file that is `mmap`ed into memory by the pop program on startup. This offers an essentially O(1) vs O(n) speed improvement on the existing startup procedure where n is the number of emails in the maildrop with the only downside that this journal file must be updated periodically to give access to new emails. This however is not a serious issue, because email is read more often than sent (this is especially true in a mailing list where every email sent is read by as many people as are on the list), and the need to update the file can be used as an opportunity to inject access control logic and hold back email from being visible until appropriate opening the door for a more flexible replacement to the prior permissions scheme in addition to the substantial performance upside. The new program has two modes of operation. It can create a new empty mail journal file which is used during building the container to initialize the journal, and a second mode where a folder containing email and a temporary file name is provided and the program reads all the mail and creates a new journal file with all the data in the temporary location before swapping it in to replace the real journal atomically once everything is settled. This swap procedure ensures that existing pop programs that opened the old file will still be holding a reference to its inode as the backing for their mmap region and will never see the new content. Once all of the pop programs that were running at the time of the swap exit, the file will be released. Meanwhile, newly spawned pop programs will resolve the path to the new journal file inode and access the new emails. There is no current provision for automatically updating the journal file at this time, so currently the tests are hardcoded to manually trigger an update after sending email, but if this system were to be deployed, a cron job could be installed to `podman exec` the journal update program every five minutes to provide essentially the same behavior as the prior version where email was available instantly. A more automated approach is coming in subsequent patches.
The requirements are the same as imposed in the smtp server, so valid accounts will already follow them. We want to be able to look up an extra attribute on the journal file based on the username to implement access controls per user, so adding these limits will ensure that the usernames are sane to use in kernel xattr names.
If a student has not made an initial submission when the deadline closes we will want to be able to block them from seeing the other students initial submissions once they are released for peer review. A list of 'delinquent' students who have not submitted will need to be found by checking the denis database in order to exclude them from peer review anyways, and these students' usernames can be passed to the new `restrict_access` program with the `-d` flag to deny them access to all future emails until such time as the restriction is removed by invoking the script with their the `-a` flag and their username to re-allow access. To implement this, the loading of emails in pop is moved to after the user logs in so their username can be passed to the function that fills the maildrop. This function now needs to try looking up a limit xattr specific to that user first, and then fall back to the generic limit if no such xattr is found. Whichever limit is in effect is then used for the same size calculations and mmap as before. The program to initialize the journal with mail needs to be aware of these user specific limits and make sure to copy the xattrs to the new journal file it creates before its atomic swap into place. The tests are expanded to verify that this mechanism does indeed work correctly by simulating a user who has been marked as 'delinquent' trying to access the inbox and verifying that they cannot see mail until their access restriction has been lifted.
b886f5a to
66f4f00
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overhaul of how mail is loaded into pop that offers substantial speed improvements and a more flexible system of access controls that will allow us to restrict access to new mail per student.
Also various miscellaneous fixes discovered while implementing these changes.