mountinfo: read the full file instead of capping at 1 MiB#817
mountinfo: read the full file instead of capping at 1 MiB#817IgorOhrimenko wants to merge 1 commit into
Conversation
GetMounts and GetProcMounts read /proc/<pid>/mountinfo via util.ReadFileNoStat,
which caps reads at 1 MiB (io.LimitReader). On hosts with a very large number of
mounts (busy container/Kubernetes nodes routinely have a 1.0-1.2 MiB mountinfo)
the file is truncated mid-line; parseMountInfoString then fails on the corrupted
last line ("Too few fields in mount string" / "couldn't find separator") and the
whole parse returns an error, so no mounts are reported at all.
The 1 MiB cap was already bumped once for this exact reason (commit f74c35e,
"Bump max buffer size to 1024kb to handle larger mountinfo files", 512 KiB ->
1 MiB); raising it again only moves the ceiling. mountinfo has no meaningful
upper size bound, and ReadFileNoStat's own doc note says "For files larger than
this, a scanner should be used". Read the file in full instead.
Add a readMountInfo helper and switch the four mountinfo accessors to it;
ReadFileNoStat is left unchanged for its other (small pseudo-file) callers.
Includes a regression test that a >1 MiB mountinfo is read and parsed in full.
Surfaced via prometheus/node_exporter#3530: node_exporter >= 1.10.1 delegates
mountinfo parsing to this package, so its filesystem collector silently loses
all node_filesystem_* metrics on such hosts.
Signed-off-by: Igor Ohrimenko <igor.ohrimenko@travelata.ru>
|
This was intentionally changed to fix #225. |
|
Thanks! #225 (hyphen inside a field, e.g. #228 also switched the read to So this keeps #228's parser as-is and only removes the size cap on the read. If you'd prefer a different shape (e.g. keep |
|
Some real-world before/after from our fleet, in case it's useful. Environment: Kubernetes worker nodes, ~400 pods each → ~1800 mountinfo entries / ~1.16 MB mountinfo, i.e. just over the 1 MiB cap. node_exporter 1.11.1 (stock), which reads mountinfo through this package. Before — fraction of scrapes where
During pod-churn bursts individual nodes sat at 100% failure for tens of minutes — After — rolling out a node_exporter built against a procfs that reads the full mountinfo: 0 failures out of 5160 scrapes across all 8 nodes over the next 10 h, and the nodes that were at 100% at rollout dropped to 0 the moment the new binary started. |
|
The problem is we specifically avoid using multiple read calls as the kernel does not really lock the contents of proc files. We have often seen corrupt reads when using things like |
|
If the real concern is non-atomic proc reads (fair — seq_file can shift between reads regardless of buffer size), the robust fix is to make the parser tolerant of a single malformed line instead of failing the whole file. I'm happy to fold that in here, so a torn read degrades to "one missing mount this scrape" rather than "all mounts gone". Want me to add it? |
What
GetMounts/GetProcMountsread/proc/<pid>/mountinfoviautil.ReadFileNoStat, which caps reads at 1 MiB (io.LimitReader). On hostswith many mounts the file is larger than 1 MB, so it is truncated mid-line.
parseMountInfoStringthen fails on the corrupted last line(
Too few fields in mount string/couldn't find separator) andparseMountInforeturns that error — so no mounts are returned at all.A node running a few hundred containers is enough to cross the limit: e.g.
~380 Kubernetes pods produce ~1800 mount entries and a mountinfo of ~1.2 MB,
and it only grows from there.
Why not just raise the cap
The 1 MiB limit was already bumped once for exactly this reason —
f74c35e "Bump max buffer size to 1024kb to handle larger mountinfo files"
(512 KiB → 1 MiB). Raising it again only moves the ceiling.
mountinfohas nomeaningful upper size bound, and
ReadFileNoStat's own doc comment says"For files larger than this, a scanner should be used." So this reads the
file in full instead.
Change
readMountInfohelper that reads the whole file (os.Open+io.ReadAll), and switch the four mountinfo accessors (GetMounts,GetProcMounts,FS.GetMounts,FS.GetProcMounts) to it.util.ReadFileNoStatis left unchanged for its other (small pseudo-file)callers.
(and that
ReadFileNoStatwould truncate it).Downstream impact
Surfaced via prometheus/node_exporter#3530. node_exporter ≥ 1.10.1 delegates
mountinfo parsing to this package (since prometheus/node_exporter#3452), so its
filesystem collector drops all
node_filesystem_*metrics on nodes whosemountinfo exceeds 1 MB.
Note
readMountInforeads the file into a[]byteso the existingparseMountInfo([]byte)is reused unchanged. Streaming the*os.Filestraightinto the scanner would also work; I kept the
[]bytepath for a minimal diff.