fix(elfpatch): self-patch opt-in + bin/lib auto-classify + cross-platform polish#11

Merged
Sunrisepeak merged 1 commit into main from
fix/elfpatch-self-patch-and-platform-handling
May 2, 2026

@Sunrisepeak
Member

Summary

  • Bug A (P0) — Rule 1 (self-patch) is now opt-in. Auto-self-patching glibc breaks ld-linux/libc.so.6 program-header invariants → segfault at execve+1 (SEGV_MAPERR @ 0x8). Provider hooks that genuinely need it must opt in via `elfpatch.set({ self_patch = true })`.
  • Bug B (P1) — `_patch_elf` fallback + declarative bins paths now use `_has_pt_interp()` to skip `--set-interpreter` on shared libraries (legitimately have no INTERP segment). Eliminates `exec failed (code=1)` log noise; auto-classifies bin vs lib without filename heuristics.
  • Cross-platform polish
    • Windows early-bail in `M._apply` (PE has no INTERP/RPATH analog).
    • macOS rpath-only auto-trigger when ≥1 dep declares `libdirs` (Mach-O has no INTERP, so loader-based predicate never fired before).
    • Documented platform support matrix in `patch_elf_loader_rpath` header.

Why this matters

`xlings 0.4.11` shipped predicate-driven elfpatch, but on Linux a fresh install of any glibc-dependent xpkg produces binaries that segfault immediately. Reproduced in an isolated `XLINGS_HOME`:

```sh
$ xlings install openssl # fresh install, predicate fires, glibc self-patches
$ openssl version
Segmentation fault (core dumped)
```

Root cause traced: Rule 1 patched libc.so.6 INTERP, falsifying offsets baked into glibc's static RO data (link_map, _rtld_global). After this PR, the same flow:

```sh
$ xlings install openssl
$ openssl version
OpenSSL 3.1.5 30 Jan 2024 (Library: OpenSSL 3.1.5 30 Jan 2024)
```

Cross-platform behavior matrix (post-fix)

| Platform | Tool | INTERP | RPATH | Predicate auto-fire |
| --- | --- | --- | --- | --- |
| Linux | patchelf | ✅ `--set-interpreter` | ✅ `--set-rpath` | When ≥1 runtime dep declares `exports.runtime.loader` |
| macOS | install_name_tool | — (no Mach-O analog) | ✅ `-add_rpath` | When ≥1 runtime dep declares `exports.runtime.libdirs` |
| Windows | | | | Skipped at entry; PE has no analog |
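
As a rough illustration of the matrix above, the following Python sketch builds the command lines each platform would need for a single file. It is not the module's code (which is Lua and not shown here): the function name, the platform strings, and the argument shapes are hypothetical. Only the tool flags (`--set-interpreter`, `--set-rpath`, `-add_rpath`) come from the matrix. Nothing is executed; the function only returns argv lists.

```python
from typing import List, Optional


def build_patch_commands(
    platform: str,
    target: str,
    loader: Optional[str],
    rpath_dirs: List[str],
) -> List[List[str]]:
    """Return the argv lists that would patch `target` (hypothetical helper)."""
    if platform == "windows":
        # PE has no INTERP/RPATH analog, so there is nothing to run.
        return []
    if platform == "macos":
        # Mach-O has no INTERP; install_name_tool adds one rpath per call.
        return [["install_name_tool", "-add_rpath", d, target] for d in rpath_dirs]
    if platform == "linux":
        cmds: List[List[str]] = []
        if loader:
            cmds.append(["patchelf", "--set-interpreter", loader, target])
        if rpath_dirs:
            # patchelf takes the whole search path as one colon-joined string.
            cmds.append(["patchelf", "--set-rpath", ":".join(rpath_dirs), target])
        return cmds
    raise ValueError(f"unknown platform: {platform}")
```

Returning argv lists instead of running the tools keeps the sketch testable without patchelf or install_name_tool installed.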

Test plan

  • `xpkg_executor_test` 22/22 pass (Linux, macOS, Windows test cases all exercise existing legacy + new predicate paths)
  • E2E in isolated `XLINGS_HOME`: `xlings install openssl` from scratch
    • glibc payload: untouched (libc.so.6 INTERP unchanged from tarball)
    • openssl bin: INTERP rewrites to xim-x-glibc loader, RUNPATH = correct closure
    • libssl.so.3: no `--set-interpreter` attempt (silent), RUNPATH set
    • `openssl version` → exits 0
  • CI: cross-platform GHA matrix
  • Tag v0.0.34 → consumed by xlings 0.4.12 hotfix

Code-level documentation added

Comprehensive comments at:

  • `_resolve_predicate` — full predicate decision tree with reasoning per rule
  • `_has_pt_interp` — discriminator semantics (no `.so` filename heuristics)
  • `_patch_elf` fallback — bin/lib auto-classification
  • `_patch_elf_executables` — graceful fall-through when bins/ contains a `.so`
  • `patch_elf_loader_rpath` — platform support matrix
  • `M._apply` — Windows early-bail with clear platform-support-matrix doc

…form polish

The 0.4.11 predicate-driven elfpatch shipped with three issues that
together made fresh installs of any glibc-dependent xpkg unusable on
Linux. This patch addresses all of them and tightens the cross-platform
contract.

## Bug A (P0): self-patch segfault

Rule 1 in `_resolve_predicate` (a loader-provider auto-patches its own
payload via `self_exports.loader`) breaks ld-linux / libc.so.6
program-header invariants. patchelf's --set-interpreter on libc.so.6
re-lays out the .interp segment, which falsifies offsets baked into
glibc's static RO data (link_map, _rtld_global). Any consumer linked
against the patched glibc segfaults at execve+1 with SEGV_MAPERR @ 0x8,
before any application code runs.

Repro (isolated XLINGS_HOME, fresh):
  $ xlings install openssl
  $ openssl version
  Segmentation fault (core dumped)

Fix: self-patch is now opt-in. Provider hooks that genuinely need it
(rare — usually the install hook pre-relocates internal absolute paths)
must call `elfpatch.set({ self_patch = true })` explicitly. The
exports.runtime.loader field is now correctly treated as metadata for
*consumers*, not a self-patch trigger.
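
The opt-in rule can be modelled in a few lines. This Python sketch is only an illustration of the decision, not the Lua code in `_resolve_predicate`: the `Package` type and its fields are hypothetical stand-ins for `exports.runtime.loader` and for whatever `elfpatch.set({ ... })` records. It also assumes a provider still needs a declared loader for self-patching to make sense.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Package:
    name: str
    # Stand-in for exports.runtime.loader (hypothetical field name).
    self_loader: Optional[str] = None
    # Stand-in for options recorded by elfpatch.set({...}) (hypothetical).
    elfpatch_opts: dict = field(default_factory=dict)


def should_self_patch(pkg: Package) -> bool:
    """Self-patch only on explicit opt-in; declaring a loader is not enough."""
    opted_in = pkg.elfpatch_opts.get("self_patch") is True
    return opted_in and pkg.self_loader is not None


glibc = Package("glibc", self_loader="lib64/ld-linux-x86-64.so.2")
custom = Package("custom-libc", self_loader="lib/ld.so",
                 elfpatch_opts={"self_patch": True})
print(should_self_patch(glibc), should_self_patch(custom))
```

Under these assumptions a glibc-like provider that only declares a loader is left untouched, while a provider that opts in explicitly is still patched.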

## Bug B (P1): patchelf --set-interpreter noise on shared libraries

Fallback scan and declarative bins/libs paths attempted
--set-interpreter on every ELF found, including .so files. patchelf
exits 1 on shared libs (they legitimately have no PT_INTERP), spamming
the install log with `exec failed (code=1)` for each library.

Fix: route each file through `_has_pt_interp()` (already present in
the module) and skip --set-interpreter on files without an INTERP
segment. RPATH is still applied — that's the natural patch for shared
libs anyway. No more false-positive errors in the install log.

This also discriminates by ELF format rather than filename heuristics
(`.so`), so PIE binaries with unusual names still get classified
correctly as executables.
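
For readers unfamiliar with the discriminator, here is a minimal Python sketch of a PT_INTERP check that reads the ELF program headers directly. How `_has_pt_interp()` is actually implemented is not shown in this PR, so treat this as an independent illustration; the function name and the choice to take raw bytes are assumptions. The header offsets and `PT_INTERP = 3` come from the ELF format.

```python
import struct

PT_INTERP = 3  # program-header type holding the interpreter path


def has_pt_interp(data: bytes) -> bool:
    """Return True if the ELF image in `data` has a PT_INTERP program header."""
    if len(data) < 0x34 or data[:4] != b"\x7fELF":
        return False  # too short or not an ELF file
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little-endian
    if is64:
        if len(data) < 0x40:
            return False
        (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    for i in range(phnum):
        off = phoff + i * phentsize
        if off + 4 > len(data):
            break  # truncated program-header table
        # p_type is the first 4 bytes of a program header in both classes.
        (p_type,) = struct.unpack_from(endian + "I", data, off)
        if p_type == PT_INTERP:
            return True
    return False
```

A caller would pass the file contents, for example `has_pt_interp(open(path, "rb").read())`, and attempt `--set-interpreter` only when the result is true. Because the check looks at program headers, a PIE executable with a `.so`-like name is still classified as an executable.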

## Cross-platform polish

- Windows early-bail in `M._apply`: PE has no INTERP/RPATH analog (DLL
  search is governed by the Windows loader: same-dir → System32 →
  PATH). Previously the predicate ran the full resolve + closure
  computation only to be no-op'd at the dispatch layer. Now skip at
  the entry point with a clear debug log.

- macOS rpath-only auto-trigger: Mach-O has no INTERP, so deps on
  macOS shouldn't declare `loader`. Previously the predicate refused
  to fire without a loader, so consumers on macOS got no auto-RPATH.
  Now: when no loader candidate exists but ≥1 dep declared `libdirs`,
  fire rpath-only on macOS (Linux deliberately doesn't have this
  fallback — RPATH-only on Linux without managing INTERP would leave
  binaries pointing at build-host glibc).

- `patch_elf_loader_rpath` dispatch: added a header comment documenting
  the platform support matrix so future maintainers can find this in
  one place.
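
The platform gating described in the bullets above can be summarised as a small decision function. This Python sketch is an approximation for illustration only: the function name, the mode strings, and the dict-shaped deps are hypothetical, and the real predicate in `_resolve_predicate` has more rules than are shown here. What happens if a macOS dep does declare a loader is not specified in this PR, so the sketch does not special-case it.

```python
from typing import Dict, List, Optional


def resolve_mode(platform: str, deps: List[Dict]) -> Optional[str]:
    """Return 'loader+rpath', 'rpath-only', or None when the predicate stays silent."""
    if platform == "windows":
        # Early bail: skip resolve + closure work entirely on PE targets.
        return None
    has_loader = any(d.get("loader") for d in deps)
    has_libdirs = any(d.get("libdirs") for d in deps)
    if has_loader:
        return "loader+rpath"
    if platform == "macos" and has_libdirs:
        # Mach-O has no INTERP, so libdirs alone are enough to fire.
        return "rpath-only"
    # Linux without a loader: do not fire. RPATH alone would leave
    # binaries pointing at the build host's glibc.
    return None


deps = [{"libdirs": ["lib"]}]
print(resolve_mode("linux", deps), resolve_mode("macos", deps))
```

The asymmetry is deliberate in the description above: the same libdirs-only dependency set fires on macOS but not on Linux.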

## Tests

- xpkg_executor_test 22/22 pass (incl. all platform-specific cases).
- E2E (isolated XLINGS_HOME, Linux fresh install openssl):
    glibc payload: untouched (libc.so.6 INTERP unchanged from tarball)
    openssl bin:   INTERP=xim-x-glibc/lib64/ld-linux, RUNPATH=closure
    libssl.so.3:   no INTERP attempt (silent), RUNPATH=closure
    `openssl version` → OpenSSL 3.1.5 30 Jan 2024, exit 0
@Sunrisepeak
Member Author

The Linux CI failure here is the very segfault this PR fixes: xlings 0.4.11's broken predicate elfpatch poisons glibc during `xlings install`, then `xmake --version` crashes with SIGSEGV @ 0x8 (exit 139). macOS and Windows CI are green, the libxpkg unit tests pass 22/22, and an isolated-XLINGS_HOME E2E run with the patched binary verified that `openssl version` runs cleanly. Re-running this CI after xlings 0.4.12 ships will turn it green (chicken-and-egg: 0.4.12 depends on this libxpkg fix being released as v0.0.34).

@Sunrisepeak Sunrisepeak merged commit f40d939 into main May 2, 2026
2 of 3 checks passed
Sunrisepeak added a commit to d2learn/xlings that referenced this pull request May 2, 2026
…olish (#255)

Bumps VERSION 0.4.11 → 0.4.12 + add_requires("mcpplibs-xpkg 0.0.34").

Hotfix for the predicate-driven elfpatch regression in 0.4.11. Three
issues that together made fresh installs of any glibc-dependent xpkg
unusable on Linux:

  * Bug A (P0): Rule 1 self-patch was unconditional. A loader-provider
    declaring `exports.runtime.loader` got auto-self-patched, which
    rewrites `libc.so.6`'s `.interp` segment and falsifies offsets baked
    into glibc's static RO data (link_map / _rtld_global). Any consumer
    linked against the patched glibc segfaulted at execve+1 with
    SEGV_MAPERR @ 0x8 — before any application code ran.

    Repro on 0.4.11 (isolated XLINGS_HOME):
      $ xlings install openssl
      $ openssl version
      Segmentation fault (core dumped)

    Fix: self-patch is now opt-in. Provider hooks that need it must call
    `elfpatch.set({ self_patch = true })` explicitly. The
    exports.runtime.loader field is now correctly treated as metadata
    *for consumers*, not a self-patch trigger. Most providers (glibc
    included) pre-relocate their own payload at install time; the
    install hook (e.g. glibc.lua's __relocate) is the right place for
    that.

  * Bug B (P1): patchelf --set-interpreter noise on shared libraries.
    Fallback scan + declarative bins paths attempted --set-interpreter
    on every ELF found, including .so files. patchelf exits 1 on shared
    libs (no PT_INTERP) → log spam `exec failed (code=1)` for each
    library. Now uses the existing _has_pt_interp() helper to skip
    --set-interpreter on files without an INTERP segment; RPATH still
    applied. Auto bin/lib classification works on PIE binaries with
    unusual names too — no `.so` filename heuristic.

  * Cross-platform polish:
      - Windows early-bail in M._apply: PE has no INTERP/RPATH analog
        (DLL search is governed by Windows loader: same dir → System32
        → PATH). Previously the predicate ran the full resolve + closure
        only to be no-op'd at dispatch.
      - macOS rpath-only auto-trigger when ≥1 dep declares libdirs
        (Mach-O has no INTERP, so deps shouldn't declare loader; old
        gate refused to fire without a loader → consumers on macOS got
        no auto-RPATH). Linux deliberately doesn't have this fallback —
        RPATH-only on Linux without managing INTERP would leave
        binaries pointing at build-host glibc.
      - Comprehensive cross-platform comments + support matrix doc in
        elfpatch.lua.

Migration: 0.4.11 → 0.4.12 is binary-compatible. The in-place self-update
path handles upgrade. Any payload installed under 0.4.11 that's known
working stays as-is (patched binaries are functional; only the
glibc-self-patch case was broken, and 0.4.11's segfault made those
installs fail to mark complete in many cases anyway).

Verified locally with isolated XLINGS_HOME:
  - libxpkg unit tests 22/22 pass (Linux/macOS/Windows cases)
  - fresh openssl install: glibc payload untouched, openssl bin INTERP
    points at xim-x-glibc loader, RUNPATH = correct closure,
    `openssl version` exits 0

libxpkg PR: openxlings/libxpkg#11 (merged, tagged v0.0.34)