mountinfo: read the full file instead of capping at 1 MiB by IgorOhrimenko · Pull Request #817 · prometheus/procfs

IgorOhrimenko · 2026-06-01T20:17:08Z

What

GetMounts / GetProcMounts read /proc/<pid>/mountinfo via
util.ReadFileNoStat, which caps reads at 1 MiB (io.LimitReader). On hosts
with many mounts the file is larger than 1 MiB, so it is truncated mid-line.
parseMountInfoString then fails on the truncated last line
("Too few fields in mount string" / "couldn't find separator") and
parseMountInfo returns that error, so no mounts are returned at all.

A node running a few hundred containers is enough to cross the limit: e.g.
~380 Kubernetes pods produce ~1800 mount entries and a mountinfo of ~1.2 MB,
and it only grows from there.

Why not just raise the cap

The 1 MiB limit was already bumped once for exactly this reason —
f74c35e "Bump max buffer size to 1024kb to handle larger mountinfo files"
(512 KiB → 1 MiB). Raising it again only moves the ceiling. mountinfo has no
meaningful upper size bound, and ReadFileNoStat's own doc comment says
"For files larger than this, a scanner should be used." So this reads the
file in full instead.

Change

Add a readMountInfo helper that reads the whole file (os.Open +
io.ReadAll), and switch the four mountinfo accessors (GetMounts,
GetProcMounts, FS.GetMounts, FS.GetProcMounts) to it.
util.ReadFileNoStat is left unchanged for its other (small pseudo-file)
callers.
Add a regression test that a >1 MiB mountinfo is read and parsed in full
(and that ReadFileNoStat would truncate it).

Downstream impact

Surfaced via prometheus/node_exporter#3530. node_exporter ≥ 1.10.1 delegates
mountinfo parsing to this package (since prometheus/node_exporter#3452), so its
filesystem collector drops all node_filesystem_* metrics on nodes whose
mountinfo exceeds 1 MiB.

Note

readMountInfo reads the file into a []byte so the existing
parseMountInfo([]byte) is reused unchanged. Streaming the *os.File straight
into the scanner would also work; I kept the []byte path for a minimal diff.

GetMounts and GetProcMounts read /proc/<pid>/mountinfo via util.ReadFileNoStat, which caps reads at 1 MiB (io.LimitReader). On hosts with a very large number of mounts (busy container/Kubernetes nodes routinely have a 1.0-1.2 MiB mountinfo) the file is truncated mid-line; parseMountInfoString then fails on the corrupted last line ("Too few fields in mount string" / "couldn't find separator") and the whole parse returns an error, so no mounts are reported at all.

The 1 MiB cap was already bumped once for this exact reason (commit f74c35e, "Bump max buffer size to 1024kb to handle larger mountinfo files", 512 KiB -> 1 MiB); raising it again only moves the ceiling. mountinfo has no meaningful upper size bound, and ReadFileNoStat's own doc note says "For files larger than this, a scanner should be used". Read the file in full instead.

Add a readMountInfo helper and switch the four mountinfo accessors to it; ReadFileNoStat is left unchanged for its other (small pseudo-file) callers. Includes a regression test that a >1 MiB mountinfo is read and parsed in full.

Surfaced via prometheus/node_exporter#3530: node_exporter >= 1.10.1 delegates mountinfo parsing to this package, so its filesystem collector silently loses all node_filesystem_* metrics on such hosts.

Signed-off-by: Igor Ohrimenko <igor.ohrimenko@travelata.ru>

SuperQ · 2026-06-01T20:32:42Z

This was intentionally changed to fix #225.

IgorOhrimenko · 2026-06-01T21:21:56Z

Thanks! #225 (hyphen inside a field, e.g. /media/isa/300A-6F09) was fixed in #228 by rewriting parseMountInfoString to locate the - separator by position from the end. This PR doesn't touch parseMountInfoString at all — the parser, and therefore the #225 fix, is unchanged.

#228 also switched the read to util.ReadFileNoStat, but that's orthogonal to the hyphen handling: it caps reads at 1 MiB (io.LimitReader), which silently truncates mountinfo on hosts with a few hundred containers (~380 k8s pods ≈ 1800 entries ≈ 1.2 MB), and then the parse fails on the truncated last line. The cap was already bumped 512 KiB → 1 MiB once for this same reason (f74c35e), so raising it again just moves the ceiling.

So this keeps #228's parser as-is and only removes the size cap on the read. If you'd prefer a different shape (e.g. keep ReadFileNoStat for everything else and add an unbounded variant just for mountinfo, or fold this into the regexp-parser rewrite you mentioned in #225), happy to adjust.

IgorOhrimenko · 2026-06-02T05:57:35Z

Some real-world before/after from our fleet, in case it's useful.

Environment: Kubernetes worker nodes, ~400 pods each → ~1800 mountinfo entries / ~1.16 MB mountinfo, i.e. just over the 1 MiB cap. node_exporter 1.11.1 (stock), which reads mountinfo through this package.

Before — fraction of scrapes where node_scrape_collector_success{collector="filesystem"} was 0, per node over 24h:

node	failed scrapes (24h)
1	42.8%
2	42.6%
3	30.9%
4	22.1%
5	21.7%
6	21.2%
7	18.9%
8	10.8%

During pod-churn bursts individual nodes sat at 100% failure for tens of minutes — node_filesystem_* completely absent from those nodes for that whole window.

After — rolling out a node_exporter built against a procfs that reads the full mountinfo: 0 failures out of 5160 scrapes across all 8 nodes over the next 10 h, and the nodes that were at 100% at rollout dropped to 0 the moment the new binary started.

SuperQ · 2026-06-02T06:13:38Z

The problem is we specifically avoid using multiple read calls as the kernel does not really lock the contents of proc files.

We have often seen corrupt reads when using things like io.ReadAll to walk the file.

IgorOhrimenko · 2026-06-02T07:54:00Z

ReadFileNoStat is itself io.ReadAll(io.LimitReader(f, 1MiB)), so the current mountinfo path already does the multi-read io.ReadAll walk — this PR keeps the exact same read mechanics and only drops the cap. The 1 MiB limit doesn't make the read atomic; it just stops at 1 MiB, so on a >1 MiB mountinfo you get a guaranteed truncated final record every scrape, which is strictly worse than the occasional race.

If the real concern is non-atomic proc reads (fair — seq_file can shift between reads regardless of buffer size), the robust fix is to make the parser tolerant of a single malformed line instead of failing the whole file. I'm happy to fold that in here, so a torn read degrades to "one missing mount this scrape" rather than "all mounts gone". Want me to add it?

IgorOhrimenko marked this pull request as ready for review June 1, 2026 20:22

IgorOhrimenko wants to merge 1 commit into prometheus:master from IgorOhrimenko:fix/mountinfo-large-files
