Add actual disk utilization metric by component#170

Open: sjmiller609 wants to merge 4 commits into main from codex/disk-utilization-metric


sjmiller609 (Collaborator) commented Mar 26, 2026

Summary

  • add hypeman_disk_utilization_bytes for actual filesystem usage by storage component
  • keep hypeman_resources_disk_breakdown_bytes as the existing allocation/provisioned view
  • add tests for sparse-file accounting, snapshot classification, and cached refresh behavior

Testing

  • go test ./lib/diskutilization ./lib/resources
  • validated the new collector on dev-yul-hypeman-1 against the live filesystem without deploying Hypeman changes there first
  • live server usage matched essentially exactly:
    • images: 658.6 GiB
    • oci_cache: 108.3 GiB
    • volumes: 1.9 GiB
    • rootfs_overlays: 89.3 GiB
    • volume_overlays: 0 GiB
    • snapshot_uncompressed: 288.2 GiB
    • snapshot_compressed: 86.4 GiB
    • snapshot_other: 0 GiB
  • all buckets matched exactly except rootfs_overlays, which differed by only 208 KiB across separate runs, consistent with normal live filesystem drift
  • runtime on that host was about 100-120 ms per collection; scrapes remain cheap because the metric is measured on the refresh loop and served from cached in-memory values

Note

Medium Risk
Introduces a new filesystem-walking collector in the resource monitoring refresh loop; bugs or performance regressions could affect monitoring refresh latency and metric correctness, but it does not change core scheduling or data paths.

Overview
Adds a new lib/diskutilization collector that measures the bytes actually allocated on the filesystem (sparse-aware via stat.Blocks) for Hypeman storage components, including snapshot classification (compressed/uncompressed/other).

Integrates this into resource monitoring by caching the collected breakdown in the monitoring snapshot and exporting it as a new Prometheus/OTel gauge, hypeman_disk_utilization_bytes. The existing hypeman_resources_disk_breakdown_bytes stays as the provisioned/allocation view. Tests cover sparse-file accounting, snapshot classification, and that scrapes read cached values until the next refresh.

Written by Cursor Bugbot for commit f58e598.

@sjmiller609 marked this pull request as ready for review March 26, 2026 18:35
@sjmiller609 requested a review from hiroTamada March 26, 2026 18:35