ITK's CI matrix currently runs 17 build jobs per PR, spread across GitHub Actions and Azure Pipelines. Several of these jobs are strict subsets of others (notably, every `*Python` Azure job is a superset of its non-Python sibling on the same OS/arch), and the `MinSizeRel` config used by all per-PR Azure Linux/Windows lanes tests an optimization profile almost no end user actually ships. The redundancy has real costs. Per-branch ccache/sccache entries multiply with job count and routinely push against the GitHub Actions per-repo cache size limit. Each extra job is also one more chance for a transient external fetch (the SWIG tarball mirror, pixi solver retries, ExternalData hosts) to fail.
This issue proposes consolidating the matrix from 17 → ~12 build jobs while preserving every coverage axis we currently care about (Win/Mac/Linux × x86_64/arm64 × Python × legacy-removed × C++20 × MSVC v142/v143 × shared/static).
Current matrix (what each job tests)
| # | Pipeline / Job | OS | Arch | Std / Build | Libs | Python | Legacy | Notes |
|---|---|---|---|---|---|---|---|---|
| CI_01 | GH `arm.yml` `linux-arm` | ubuntu-24.04-arm | arm64 | Release | static | — | — | Only native arm64 Linux |
| CI_02 | GH `arm.yml` `macos-rosetta` | macos-15 | x86_64 | Release | static | — | — | Only x86_64 macOS |
| CI_03 | GH `arm.yml` `macos-py` | macos-15 | arm64 | Release | static | 3.11 | — | Overlaps Az `MacOSPython` |
| CI_04 | GH `pixi.yml` `linux` | ubuntu-22.04 | x86_64 | pixi | (pixi) | — | — | pixi-managed toolchain |
| CI_05 | GH `pixi.yml` `windows` | windows-2022 | x86_64 | pixi | (pixi) | — | — | pixi-managed toolchain |
| CI_06 | GH `pixi.yml` `macos` | macos-15 | arm64 | pixi | (pixi) | — | — | pixi-managed toolchain |
| CI_08 | Az `Linux` | ubuntu-22.04 | x86_64 | C++17 / MinSizeRel | static | — | — | subset of CI_11 |
| CI_09 | Az `LinuxLegacyRemoved` | ubuntu-22.04 | x86_64 | C++17 / MinSizeRel | static | — | LEGACY_REMOVE=ON | only legacy-removed lane |
| CI_10 | Az `LinuxCxx20` | ubuntu-24.04 | x86_64 | C++20 / MinSizeRel | static | — | — | only C++20 lane |
| CI_11 | Az `LinuxPython` | ubuntu-22.04 | x86_64 | C++17 / MinSizeRel | static | 3.10 | — | |
| CI_12 | Az `MacOS` | macos-15 | arm64 | Release | shared | — | — | subset of CI_13 modulo shared-libs |
| CI_13 | Az `MacOSPython` | macos-15 | arm64 | Release | static | 3.10 | — | |
| CI_14 | Az `Windows` | windows-2022 | x86_64 | MinSizeRel | shared | — | — | subset of CI_15 |
| CI_15 | Az `WindowsPython` | windows-2022 | x86_64 | MinSizeRel | shared | 3.11 | — | |
| CI_16 | Az Batch v143 | windows-2022 | x86_64 | Release | shared | — | — | rolling/batch only |
| CI_17 | Az Batch v142 | windows-2022 | x86_64 | Release | shared | — | — | only v142 toolset |
Recommendations
A. Drop strict-subset jobs (3 deletions)
Per the rule "a Python build is a superset of a non-Python build on the same OS/arch/toolchain": if a `*Python` job is green, the corresponding non-Python job adds no signal.
- Delete `AzurePipelinesLinux.yml` job `Linux` (CI_08). It is a strict subset of `LinuxPython` (CI_11).
- Delete `AzurePipelinesWindows.yml` (CI_14). It is a strict subset of `WindowsPython` (CI_15).
- Delete `AzurePipelinesMacOS.yml` (CI_12). It overlaps `MacOSPython` (CI_13) except for shared vs. static libraries. To keep one shared-library macOS build, set `BUILD_SHARED_LIBS=ON` in `MacOSPython` instead of maintaining a separate lane.
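The subset rule can be applied mechanically. Below is a minimal sketch that encodes the relevant rows of the matrix table. It flags any non-Python job whose OS, arch, build config and library type exactly match some Python job. The `Job` type and the helper names are hypothetical and are not part of ITK's CI tooling. Like the rule itself, the sketch assumes that a Python lane builds and tests everything its non-Python sibling does.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Job:
    """Build-relevant axes of one CI job, copied from the matrix table."""
    name: str
    os: str
    arch: str
    build: str                     # e.g. "C++17 / MinSizeRel"
    libs: str                      # "static" or "shared"
    python: Optional[str] = None   # None means no Python wrapping


def is_covered_by(plain: Job, wrapped: Job) -> bool:
    """True if `wrapped` is a Python job with otherwise identical settings."""
    return (
        plain.python is None
        and wrapped.python is not None
        and (plain.os, plain.arch, plain.build, plain.libs)
        == (wrapped.os, wrapped.arch, wrapped.build, wrapped.libs)
    )


def redundant_jobs(matrix: List[Job]) -> List[str]:
    """Names of non-Python jobs that some Python job in `matrix` covers."""
    return [a.name for a in matrix if any(is_covered_by(a, b) for b in matrix)]


MATRIX = [
    Job("CI_08 Linux", "ubuntu-22.04", "x86_64", "C++17 / MinSizeRel", "static"),
    Job("CI_11 LinuxPython", "ubuntu-22.04", "x86_64", "C++17 / MinSizeRel", "static", "3.10"),
    Job("CI_12 MacOS", "macos-15", "arm64", "Release", "shared"),
    Job("CI_13 MacOSPython", "macos-15", "arm64", "Release", "static", "3.10"),
    Job("CI_14 Windows", "windows-2022", "x86_64", "MinSizeRel", "shared"),
    Job("CI_15 WindowsPython", "windows-2022", "x86_64", "MinSizeRel", "shared", "3.11"),
]

# Proposed change: build MacOSPython with BUILD_SHARED_LIBS=ON.
PROPOSED_MATRIX = [
    replace(j, libs="shared") if j.name == "CI_13 MacOSPython" else j for j in MATRIX
]

print(redundant_jobs(MATRIX))           # ['CI_08 Linux', 'CI_14 Windows']
print(redundant_jobs(PROPOSED_MATRIX))  # ['CI_08 Linux', 'CI_12 MacOS', 'CI_14 Windows']
```

CI_12 only becomes a strict subset once `MacOSPython` switches to shared libraries. That is why deleting it is paired with `BUILD_SHARED_LIBS=ON`.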
B. Merge orthogonal Linux axes (1 deletion)
`LinuxLegacyRemoved` (CI_09) and `LinuxCxx20` (CI_10) are both Ubuntu / gcc / MinSizeRel / static / no-Python lanes. Each differs from the plain `Linux` configuration by exactly one CMake flag (CI_10 also runs on ubuntu-24.04). The two flags are orthogonal, so they can share a lane:
- Combine them into a single `LinuxLegacyRemovedCxx20` job on ubuntu-24.04 with both `ITK_LEGACY_REMOVE=ON` and `CMAKE_CXX_STANDARD=20`. Failure-mode separation is rarely needed in CI; local reproduction handles the rare bisect.
- Set `BUILD_EXAMPLES:BOOL=ON` on this same lane. Examples then get build coverage (they are currently `OFF` everywhere on Azure) without burdening the Python wrapping jobs, which already dominate per-PR wall time. This makes `LinuxLegacyRemovedCxx20` the comprehensive non-Python signal: legacy removal + C++20 + examples + `MinSizeRel` canary, all in one job.
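The merged lane's configuration can be written out as CMake cache entries. This is a sketch only. The build type, `CMAKE_CXX_STANDARD`, `ITK_LEGACY_REMOVE` and `BUILD_EXAMPLES` come from the proposal. `BUILD_SHARED_LIBS:BOOL=OFF` and `ITK_WRAP_PYTHON:BOOL=OFF` are assumed to mirror the existing static, no-Python lanes. The `configure_args` helper and the `ITK` / `ITK-build` directory names are hypothetical.

```python
from typing import Dict, List

# Cache entries for the proposed LinuxLegacyRemovedCxx20 lane (illustrative).
MERGED_LANE: Dict[str, str] = {
    "CMAKE_BUILD_TYPE:STRING": "MinSizeRel",  # the single MinSizeRel canary
    "CMAKE_CXX_STANDARD:STRING": "20",        # axis from LinuxCxx20 (CI_10)
    "ITK_LEGACY_REMOVE:BOOL": "ON",           # axis from LinuxLegacyRemoved (CI_09)
    "BUILD_EXAMPLES:BOOL": "ON",              # examples on the non-Python lane
    "BUILD_SHARED_LIBS:BOOL": "OFF",          # static, like both source lanes (assumed)
    "ITK_WRAP_PYTHON:BOOL": "OFF",            # no Python wrapping here (assumed)
}


def configure_args(cache: Dict[str, str], source_dir: str = "ITK",
                   build_dir: str = "ITK-build") -> List[str]:
    """Return a `cmake -S <src> -B <build> -D<entry>=<value> ...` argv list."""
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    args += [f"-D{key}={value}" for key, value in cache.items()]
    return args


print(" ".join(configure_args(MERGED_LANE)))
```

Keeping the lane as one dict of cache entries makes it easy to compare against the two lanes it replaces: each of CI_09 and CI_10 contributes exactly one entry.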
C. Standardize on Release, keep one MinSizeRel canary
`MinSizeRel` (`-Os`) tests an optimization profile our users overwhelmingly do not ship. They ship `Release` (`-O3`) or `RelWithDebInfo`. Currently 6 of the 8 per-PR Azure jobs (CI_08–CI_11, CI_14, CI_15) use `MinSizeRel`. The historical reason (small artifacts on free-tier runners) no longer applies on current Azure/GitHub images.
- Switch `LinuxPython` and `WindowsPython` to `Release`. `MacOS*` already uses `Release`.
- Keep the merged `LinuxLegacyRemovedCxx20` on `MinSizeRel` as a single canary that the unusual optimizer config still builds.
- Consider using the freed budget to add one `Debug` lane (asserts on, no `NDEBUG`). It catches a more useful class of bugs (failed assertions) than `MinSizeRel` does.
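The `NDEBUG` point can be checked against CMake's default per-configuration C++ flags for GCC and Clang (`CMAKE_CXX_FLAGS_<CONFIG>_INIT`). The sketch below assumes the stock CMake defaults. ITK or a toolchain file may append to or override them. MSVC uses different flags, for example `/O1` rather than `-Os` for `MinSizeRel`.

```python
# Stock CMake defaults for GNU/Clang C++ (assumed; projects may override them).
DEFAULT_GNU_FLAGS = {
    "Debug": "-g",
    "Release": "-O3 -DNDEBUG",
    "RelWithDebInfo": "-O2 -g -DNDEBUG",
    "MinSizeRel": "-Os -DNDEBUG",
}


def asserts_enabled(config: str) -> bool:
    """assert() is compiled in only when NDEBUG is not defined."""
    return "-DNDEBUG" not in DEFAULT_GNU_FLAGS[config].split()


print([c for c in DEFAULT_GNU_FLAGS if asserts_enabled(c)])  # ['Debug']
```

Of the four standard configurations, only `Debug` leaves `assert()` active. That is the bug class the optional lane targets.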
D. Python-version spread (no deletions, just policy)
Python wrapping coverage doesn't need to be duplicated per OS. It is sufficient to stagger the Python version across the surviving Python jobs so that their union covers our supported range:
- Linux Python 3.10 → keep 3.10
- macOS Python 3.10 → bump to 3.12
- Windows Python 3.11 → keep 3.11
This way each supported Python version gets exposure on at least one OS, instead of two Azure jobs both running 3.10.
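A small check makes the policy concrete. It reports which supported versions no lane exercises, and which versions are run by more than one lane. It covers only the three Azure Python lanes from this section. The supported range `{3.10, 3.11, 3.12}` is an assumption for illustration, and all names are hypothetical.

```python
from typing import Dict, List, Set

CURRENT = {"LinuxPython": "3.10", "MacOSPython": "3.10", "WindowsPython": "3.11"}
PROPOSED = {"LinuxPython": "3.10", "MacOSPython": "3.12", "WindowsPython": "3.11"}
# Assumed supported range for illustration; substitute ITK's actual policy.
SUPPORTED = {"3.10", "3.11", "3.12"}


def version_key(v: str):
    """Sort "3.9" before "3.10" by comparing numeric components."""
    return tuple(int(part) for part in v.split("."))


def uncovered(lanes: Dict[str, str], supported: Set[str]) -> List[str]:
    """Supported versions that no lane exercises."""
    return sorted(supported - set(lanes.values()), key=version_key)


def duplicated(lanes: Dict[str, str]) -> Dict[str, List[str]]:
    """Versions exercised by more than one lane."""
    by_version: Dict[str, List[str]] = {}
    for lane, version in lanes.items():
        by_version.setdefault(version, []).append(lane)
    return {v: names for v, names in by_version.items() if len(names) > 1}


print(uncovered(CURRENT, SUPPORTED), duplicated(CURRENT))
# ['3.12'] {'3.10': ['LinuxPython', 'MacOSPython']}
print(uncovered(PROPOSED, SUPPORTED), duplicated(PROPOSED))
# [] {}
```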
E. Keep as-is
- `arm.yml` jobs CI_01 and CI_02 provide the only native arm64 Linux coverage and the only x86_64 macOS coverage.
- `pixi.yml` jobs CI_04–CI_06 give fast PR signal with a different toolchain provenance from Azure. They are an intentional cross-check.
- `AzurePipelinesBatch` v143 + v142 (CI_16, CI_17) provide the only MSVC v142 coverage. They run only on integration, not per-PR, so they are cheap.
Net result: 17 → 12 PR build jobs
| | Before | After |
|---|---|---|
| Build jobs per PR | 17 | 12 (−5) |
| `MinSizeRel` jobs | 6 | 1 |
| `Release` jobs (excluding pixi) | 7 | 8 |
| Legacy-removed lanes | 1 | 1 |
| C++20 lanes | 1 | 1 (merged with legacy) |
| Python wrapping coverage (OSes) | 3 | 3 |
| Native arm64 coverage | yes | yes |
| x86_64 macOS coverage | yes | yes |
| MSVC v142 coverage | yes (batch) | yes (batch) |
| Shared-lib coverage on each OS | yes | yes |
Estimated cache and compute savings
These are order-of-magnitude estimates; concrete numbers will need a measurement pass on a representative PR.
Cache footprint (ccache / sccache / pixi cache combined):
- Per-job cache size for ITK on a warm build is typically 0.5–2 GB depending on platform (Windows/MSVC sccache largest, pixi-managed Linux smaller). Call it ~1.2 GB average.
- GitHub Actions for ITK enforces a 45 GB per-repo cache cap with LRU eviction. We currently push past it routinely, which is why warm builds frequently regress to cold-cache wall times on busy days.
- Removing 3 jobs (CI_08, CI_12, CI_14) should free ~3–4 GB of steady-state cache pressure, which means measurably fewer LRU evictions on active PR queues. Switching to `Release` changes per-job cache size only marginally. The object count is unchanged, and `-O3` objects tend to be somewhat larger than `-Os` ones.
- Azure pipeline caches are sized differently, but the same logic applies; expect ~5–7 GB less active cache surface across both systems.
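The ~3–4 GB figure follows from the per-job sizes quoted above. The back-of-the-envelope check below uses only numbers from this section. The full per-job range gives a wider band around the typical value.

```python
REMOVED_JOBS = 3               # CI_08, CI_12, CI_14
PER_JOB_GB = (0.5, 2.0)        # warm per-job cache size range quoted above
TYPICAL_PER_JOB_GB = 1.2       # "call it ~1.2 GB average"

freed_low = REMOVED_JOBS * PER_JOB_GB[0]
freed_high = REMOVED_JOBS * PER_JOB_GB[1]
freed_typical = REMOVED_JOBS * TYPICAL_PER_JOB_GB

print(f"freed: {freed_low:.1f}-{freed_high:.1f} GB, typical ~{freed_typical:.1f} GB")
# freed: 1.5-6.0 GB, typical ~3.6 GB
```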
Compute time (PR end-to-end wall clock):
- The dropped Azure jobs (CI_08, CI_12, CI_14) each run ~45–75 minutes, including provisioning, configuration, build, and test. Together they account for ~2.25–3.75 hours (call it ~3 hours) of agent time per PR.
- Merging CI_09 + CI_10 saves another job's startup and configure overhead, even though the build itself is similar in size: ≈ 30–45 minutes per PR.
- Switching `MinSizeRel` → `Release` on the surviving jobs is approximately wall-clock-neutral for the build (slightly more inlining, slightly less code-size optimization). Tests run noticeably faster under `Release` due to better vectorization, so expect ~5–10% faster ctest phases on the affected jobs.
- Total per-PR wall-clock saved on the critical path: ~30–60 minutes (jobs run in parallel, so the saving comes from removing slow lanes, not from summing).
- Total per-PR agent-minutes saved (billable / queue-pressure metric): ~3.5–4 hours.
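Summing the per-job ranges above gives the full band for agent time saved per PR. The ~3.5–4 hour estimate lies inside this band. The sketch ignores the ctest speedup from `Release`, and the helper is illustrative only.

```python
DROPPED_JOBS = 3                 # CI_08, CI_12, CI_14
DROPPED_JOB_MINUTES = (45, 75)   # per dropped job, end to end
MERGE_SAVING_MINUTES = (30, 45)  # startup + configure saved by merging CI_09 + CI_10


def agent_hours_saved():
    """(low, high) agent-hours saved per PR, from the per-job ranges above."""
    low = DROPPED_JOBS * DROPPED_JOB_MINUTES[0] + MERGE_SAVING_MINUTES[0]
    high = DROPPED_JOBS * DROPPED_JOB_MINUTES[1] + MERGE_SAVING_MINUTES[1]
    return low / 60, high / 60


print(agent_hours_saved())  # (2.75, 4.5)
```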
Flake-rate reduction:
- Each Azure job pulls SWIG tarball, ExternalData, pixi solver, gcc/clang from apt, etc. Empirically, the per-job flake rate from external fetches is on the order of 1–3% when mirrors are healthy and spikes much higher during outages.
- With 17 jobs and p = 1–3%, the probability that at least one job hits a transient fetch failure on a given PR is roughly `1 − (1 − p)^17` ≈ 16–40%.
- With 12 jobs, the same calculation (`1 − (1 − p)^12`) gives ≈ 11–31%. That is a meaningful drop in spurious red CI without any change to actual test coverage.
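The percentages above come from the formula `1 − (1 − p)^n`, evaluated at both ends of the per-job flake range. The formula assumes each job fails independently with the same probability p. Correlated failures, such as a full mirror outage, hit every job at once, so job count matters less in that case.

```python
def p_any_flake(p: float, jobs: int) -> float:
    """P(at least one of `jobs` independent jobs flakes), each with probability p."""
    return 1 - (1 - p) ** jobs


for jobs in (17, 12):
    low, high = p_any_flake(0.01, jobs), p_any_flake(0.03, jobs)
    print(f"{jobs} jobs: {low:.0%}-{high:.0%}")
# 17 jobs: 16%-40%
# 12 jobs: 11%-31%
```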
Caveat: all numbers assume current runner hardware and current cache infrastructure behavior. A short measurement pass (one PR before, one after the consolidation) would confirm.
Suggested rollout
- Land the deletion of `AzurePipelinesLinux.yml`, `AzurePipelinesWindows.yml` and `AzurePipelinesMacOS.yml` in one commit. This is trivially revertible.
- In a second commit, merge `LinuxLegacyRemoved` + `LinuxCxx20` → `LinuxLegacyRemovedCxx20` on ubuntu-24.04, with `BUILD_EXAMPLES:BOOL=ON` to give the comprehensive non-Python lane examples coverage.
- In a third commit, switch `LinuxPython` and `WindowsPython` from `MinSizeRel` → `Release`.
- (Optional) In a fourth commit, add a single `LinuxDebug` lane using the budget freed by steps 1–3.
- Tag a maintainer to confirm that the Azure-side pipeline definitions on `dev.azure.com/itkrobotmacospython` are updated to match the deleted YAML files.