feat(ipam): implement IPClass — policy layer for claiming IP space#72

Draft
scotwells wants to merge 2 commits into main from feat/ip-class-impl

@scotwells (Contributor)

What this adds

Implements IPClass — the cluster-scoped, platform-owned allocation-policy resource proposed in the IPClass enhancement (docs/enhancements/ip-class.md). It's the direct analog of a Kubernetes StorageClass: a class names a kind of address space and the rules for handing it out (provisioner, IP family, allocation strategy, allowed/default prefix lengths, reclaim policy, visibility). Pools advertise the classes they back via spec.classNames; claims select a class via spec.className.
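To make the relationship between the three resources concrete, here is a minimal Go sketch of how the fields described above might hang together. Every type, field name, and example value is a hypothetical illustration and not the PR's generated API types; only the ideas (a class carries policy, a pool lists `classNames`, a claim names one `className`) come from the description.

```go
package main

// All names below are hypothetical illustrations, not the PR's generated types.

// IPClassSpec mirrors the policy knobs listed in the description.
type IPClassSpec struct {
	Provisioner          string
	IPFamily             string // e.g. "IPv4" or "IPv6" (assumed values)
	Strategy             string // allocation strategy; concrete values are not given in the PR
	AllowedPrefixLengths []int
	DefaultPrefixLength  int
	ReclaimPolicy        string // e.g. "Delete" or "Retain" (assumed values)
	Visibility           string
}

// IPPoolSpec advertises which classes the pool backs.
type IPPoolSpec struct {
	CIDR       string
	ClassNames []string
}

// IPClaimSpec selects a class by name; PrefixLength is an assumed field.
type IPClaimSpec struct {
	ClassName    string
	PrefixLength int
}

// Backs reports whether the pool advertises the given class name.
func (p IPPoolSpec) Backs(className string) bool {
	for _, n := range p.ClassNames {
		if n == className {
			return true
		}
	}
	return false
}
```

With these stand-ins, a pool declared with `ClassNames: []string{"private-v4"}` would satisfy `Backs("private-v4")`, which is the only thing a consumer's claim needs to agree on with the platform.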

A claim naming a class (or falling through to the default class) resolves the class, folds its policy into the claim, and picks a backing pool across the caller's project and the platform scope — returning the CIDR synchronously. The allocation records its class for provenance. Existing poolRef/poolSelector paths are unchanged; poolSelector on the claim is deprecated in favor of the class.
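The resolution flow in the paragraph above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the types, the `Free` flag standing in for real allocation, the scope labels, and the "caller's project first, then platform" ordering are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified types; the real API types in the PR are richer.
type Class struct {
	Name                 string
	Default              bool
	AllowedPrefixLengths []int
	DefaultPrefixLength  int
}

type Pool struct {
	Name       string
	Scope      string // "project" or "platform" (assumed labels)
	ClassNames []string
	Free       bool // stand-in for "has room for this claim"
}

type Claim struct {
	ClassName    string
	PrefixLength int // 0 means "let the class decide"
}

// resolveClass returns the named class, or the default class when the claim names none.
func resolveClass(claim Claim, classes []Class) (Class, error) {
	for _, c := range classes {
		if (claim.ClassName != "" && c.Name == claim.ClassName) ||
			(claim.ClassName == "" && c.Default) {
			return c, nil
		}
	}
	return Class{}, errors.New("no matching or default class")
}

// foldPolicy applies class defaults to the claim and rejects disallowed prefix lengths.
func foldPolicy(claim Claim, c Class) (Claim, error) {
	claim.ClassName = c.Name
	if claim.PrefixLength == 0 {
		claim.PrefixLength = c.DefaultPrefixLength
	}
	for _, l := range c.AllowedPrefixLengths {
		if l == claim.PrefixLength {
			return claim, nil
		}
	}
	return claim, fmt.Errorf("prefix length /%d not allowed by class %q", claim.PrefixLength, c.Name)
}

// pickPool searches the caller's project first, then platform scope (assumed order).
func pickPool(className string, pools []Pool) (Pool, error) {
	for _, scope := range []string{"project", "platform"} {
		for _, p := range pools {
			if p.Scope != scope || !p.Free {
				continue
			}
			for _, n := range p.ClassNames {
				if n == className {
					return p, nil
				}
			}
		}
	}
	return Pool{}, fmt.Errorf("no pool backs class %q", className)
}
```

Calling `resolveClass`, `foldPolicy`, and `pickPool` in sequence on an empty `Claim{}` models the "falling through to the default class" case: the claim ends up carrying the default class's name and default prefix length before a pool is chosen.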

Consumers no longer need to know pool names or label vocabulary: they claim by class name, and the same manifest is portable across environments.

Included

  • API types + full codegen (IPClass, IPPoolSpec.ClassNames, IPClaimSpec.ClassName, IPAllocationSpec.ClassName).
  • Cluster-scoped ipclass registry, apiserver wiring, validation, defaulting, single-default-class enforcement.
  • Claim resolution (caller + platform scope), classNames index, provenance, a low-cardinality class metric label.
  • milo-ipam class list/class show + prefix claim --class.
  • IAM ProtectedResource + role verbs, examples/ipclass/.
  • Unit tests, chainsaw e2e suites, a k6 class-claim throughput script, and store race-regression tests.
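As an illustration of the single-default-class enforcement listed above, a validation check could look something like the sketch below. The `ClassMeta` type and the `IsDefault` field are hypothetical; the PR does not say here how a class is marked as default (field or annotation), so treat that as an assumption.

```go
package main

import "fmt"

// ClassMeta is a hypothetical, minimal view of an IPClass for this check.
type ClassMeta struct {
	Name      string
	IsDefault bool
}

// validateSingleDefault rejects a candidate that would create a second default class.
func validateSingleDefault(existing []ClassMeta, candidate ClassMeta) error {
	if !candidate.IsDefault {
		return nil
	}
	for _, c := range existing {
		// Updating the current default in place is fine; a different default is not.
		if c.IsDefault && c.Name != candidate.Name {
			return fmt.Errorf("class %q is already the default; %q cannot also be default",
				c.Name, candidate.Name)
		}
	}
	return nil
}
```

A check of this shape would run on both create and update, which is why the loop tolerates the candidate matching the existing default by name.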

Validation (live kind deployment)

  • e2e: 15/16 chainsaw suites pass, including both new IPClass suites (claim-by-class + provenance, default-class, error paths, immutability, backward-compat, and the consumer→platform-owned-class path). The one failing suite is the pre-existing tracing suite, which needs the optional observability stack and is unrelated to this change.
  • Smoke: class → backing pool → synchronous bind confirmed end-to-end on the live apiserver (server-derived family/reclaim + provenance).
  • Perf: 3/4 thresholds pass: prefix-claim-throughput 147/s (p95 54ms), class-claim-throughput 93.9/s (p95 130ms), pool-exhaustion deny (p95 68ms); all read latencies pass. The one failing threshold, read-latency success rate, is blocked only by a pre-existing, arch-independent apiserver stability bug, not by this change: heap corruption under high-concurrency LIST load (~7–9k req/s), tracked in #71.

Scope

Fully backward compatible and additive. Claiming by class name is the standard path; poolSelector on claims is deprecated; poolRef remains as an advanced escape hatch. Cross-project claiming and catalog-driven per-project class distribution are separate follow-ups.

Known issue (decoupled)

The perf read-success-rate gap comes from a pre-existing, architecture-independent apiserver heap-corruption crash under sustained high-concurrency LISTs. It is already documented on main via the MaxConns=10 mitigation, it involves zero IPAM frames, and the -race regression tests in this PR show our read/store/convert/watch paths to be race-free. Tracked in #71. Not introduced by this PR.
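For readers unfamiliar with the pattern, a race-regression test of this kind typically hammers a shared read path from many goroutines and relies on `go test -race` to flag any unsynchronized access. The sketch below shows only the general shape; `item`, `convert`, and `hammer` are hypothetical stand-ins and not the store or codec code exercised in this PR.

```go
package main

import "sync"

// item is a hypothetical stand-in for a stored object.
type item struct{ Name, ClassName string }

// convert returns a converted copy and never mutates its input,
// which is the property a -race run would check under concurrency.
func convert(in []item) []item {
	out := make([]item, len(in))
	for i, it := range in {
		out[i] = item{Name: it.Name, ClassName: it.ClassName + "-v1"}
	}
	return out
}

// hammer runs convert from many goroutines over one shared slice and
// returns how many conversions completed.
func hammer(shared []item, workers, rounds int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				_ = convert(shared) // read-only access to the shared slice
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return done
}
```

If `convert` wrote to its input instead of copying, the race detector would report the conflicting accesses as soon as two workers overlapped, which is what makes this shape useful as a regression guard.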

Status

Draft. Depends conceptually on the enhancement proposal (#70).

scotwells added 2 commits July 2, 2026 08:21
Introduces IPClass, a cluster-scoped, platform-owned policy resource that
names a kind of address space and the rules for handing it out (provisioner,
ip family, allocation strategy, allowed/default prefix lengths, reclaim
policy, visibility). Pools advertise the classes they back via
spec.classNames; claims select a class via spec.className.

An IPClaim naming a class (or falling through to the default class) resolves
the class, folds its policy into the claim, and picks a backing pool across
the caller's project and platform scopes, returning the CIDR synchronously.
The allocation records its class for provenance. Existing poolRef/poolSelector
paths are unchanged; poolSelector on the claim is deprecated in favor of the
class. Adds the class dimension to allocation metrics, a milo-ipam 'class'
command surface and 'prefix claim --class', IAM protected-resource + role
verbs, examples, chainsaw e2e suites, and a k6 class-claim perf script.

Cross-project claiming via projectRef and per-project class distribution via
the service catalog are deferred to follow-ups.
…sk, and store race-regression tests

- test/e2e/ip-class-platform-scope: a consumer project claiming a
  platform-owned IPClass by spec.className binds to the platform pool
  across the caller+platform scope, with IPAllocation provenance and no
  use-grant required (validates the class resolution scoping decision).
- Taskfile: test/load:class-throughput target for the class-based claim
  k6 script.
- Race-regression tests over the store GetList decode->convert->encode
  and watch paths, plus the apiserver codec, all clean under -race —
  guarding the shared serving path IPClass conversion runs through.