fix: use atomic writes for state.json to prevent corruption#600
fix: use atomic writes for state.json to prevent corruption#600MayankSharmaCSE wants to merge 1 commit intourunc-dev:mainfrom
Conversation
Signed-off-by: mayanksharmaCSE <mayanksharmacse1@gmail.com>
✅ Deploy Preview for urunc canceled.
|
There was a problem hiding this comment.
Pull request overview
This PR hardens unikontainer state persistence by replacing direct os.WriteFile() updates with an atomic temp-file-and-rename approach to reduce the risk of state.json corruption on mid-write termination (Fixes #598).
Changes:
- Added
atomicWriteFile()helper for atomic file updates via temp file + sync + rename. - Refactored
saveContainerState()(state.json) andwritePidFile()to useatomicWriteFile(). - Added unit tests covering basic
atomicWriteFile()behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
pkg/unikontainers/utils.go |
Introduces atomicWriteFile() and refactors writePidFile() to use it |
pkg/unikontainers/unikontainers.go |
Switches saveContainerState() to atomic writes |
pkg/unikontainers/utils_test.go |
Adds unit tests for atomicWriteFile() |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return fmt.Errorf("failed to close temp file: %w", closeErr) | ||
| } | ||
|
|
||
| return os.Rename(tmpName, path) |
There was a problem hiding this comment.
atomicWriteFile returns os.Rename(tmpName, path) directly. If the rename fails, the temp file is left behind and the returned error lacks context about the target path. Consider removing the temp file on rename failure and wrapping the error so callers get a clearer failure reason.
| return os.Rename(tmpName, path) | |
| if err := os.Rename(tmpName, path); err != nil { | |
| _ = os.Remove(tmpName) | |
| return fmt.Errorf("failed to rename temp file %q to %q: %w", tmpName, path, err) | |
| } | |
| return nil |
| dir := filepath.Dir(path) | ||
| tmpName := filepath.Join(dir, "."+filepath.Base(path)+".tmp") | ||
|
|
||
| f, err := os.OpenFile(tmpName, os.O_RDWR|os.O_CREATE|os.O_TRUNC|os.O_SYNC, perm) |
There was a problem hiding this comment.
atomicWriteFile opens the temp file with O_SYNC and also calls f.Sync(). This is redundant and can significantly slow down writes (O_SYNC already forces synchronous writes). Consider dropping O_SYNC and relying on an explicit f.Sync() (or vice versa) to keep the durability semantics while avoiding extra overhead.
| f, err := os.OpenFile(tmpName, os.O_RDWR|os.O_CREATE|os.O_TRUNC|os.O_SYNC, perm) | |
| f, err := os.OpenFile(tmpName, os.O_RDWR|os.O_CREATE|os.O_TRUNC, perm) |
| t.Run("fails on invalid directory", func(t *testing.T) { | ||
| t.Parallel() | ||
| target := filepath.Join("/nonexistent/dir", "state.json") | ||
|
|
||
| err := atomicWriteFile(target, []byte("data"), 0o644) | ||
| assert.Error(t, err) | ||
| }) |
There was a problem hiding this comment.
TestAtomicWriteFile covers the success path and a create-temp failure, but it doesn’t cover the rename-failure path (e.g., when the target exists as a directory or permissions prevent replacing it). Adding a test for rename failure is useful to verify temp-file cleanup and to prevent accumulating stray temp files over time.
|
|
||
| stateName := filepath.Join(u.BaseDir, stateFilename) | ||
| return os.WriteFile(stateName, data, 0o644) //nolint: gosec | ||
| return atomicWriteFile(stateName, data, 0o644) //nolint: gosec |
There was a problem hiding this comment.
saveContainerState still has a //nolint: gosec suppression on the atomicWriteFile call. Since this is no longer a direct os.WriteFile with a literal permission, the suppression may be unnecessary and could hide future issues on this line. Consider removing the nolint (or moving it to the specific line that triggers gosec, if any).
| return atomicWriteFile(stateName, data, 0o644) //nolint: gosec | |
| return atomicWriteFile(stateName, data, 0o644) |
| dir := filepath.Dir(path) | ||
| tmpName := filepath.Join(dir, "."+filepath.Base(path)+".tmp") | ||
|
|
||
| f, err := os.OpenFile(tmpName, os.O_RDWR|os.O_CREATE|os.O_TRUNC|os.O_SYNC, perm) | ||
| if err != nil { |
There was a problem hiding this comment.
atomicWriteFile uses a deterministic temp path ("."+base+".tmp") and opens it with O_TRUNC (no O_EXCL / unique name). If two goroutines/processes call atomicWriteFile for the same target concurrently, they can share the same temp inode; one caller may rename it to the final path while the other still has the FD open and continues writing, which can corrupt the final file (defeating the purpose of atomic writes). Use a unique temp file per call (e.g., os.CreateTemp in the same dir with a prefix) and then rename that unique file into place (optionally chmod to perm).
Description
This PR fixes the non-atomic state.json writes in
saveContainerState()that could lead to file corruption if the urunc process is killed mid-write.Problem
saveContainerState()inpkg/unikontainers/unikontainers.gowas usingos.WriteFile()directlyurunc start,kill,delete) would fail with JSON parse errorsSolution
atomicWriteFile()helper inpkg/unikontainers/utils.gothat:saveContainerState()to use the new helperwritePidFile()to use the same helper for consistencyatomicWriteFile()Files changed
pkg/unikontainers/utils.go- AddedatomicWriteFile(), refactoredwritePidFile()pkg/unikontainers/unikontainers.go- ChangedsaveContainerState()to use atomic writespkg/unikontainers/utils_test.go- AddedTestAtomicWriteFilewith 4 test casesRelated issues
How was this tested?
atomicWriteFile()covering:writePidFilestill works correctly (existing test passes)go vetpasses for Linux targetgolangci-lintpasses with no new warningsLLM usage
Claude Opus 4.7 (Anthropic) assisted with analysis, implementation, and code review.
Checklist
make lint).make test_ctr,make test_nerdctl,make test_docker,make test_crictl).