Runtime ELF patching and rtld_audit removal #810

Open
wdcui wants to merge 17 commits into main from wdcui/pr4-runtime-patch

@wdcui
Member

@wdcui wdcui commented Apr 25, 2026

This PR replaces the rtld_audit-based syscall interception with runtime ELF patching during mmap. When a PROT_EXEC segment is mapped, the shim now patches syscall instructions in place and places trampoline stubs in a dynamically allocated region near the code.
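The "near the code" requirement presumably comes from branch reach. Below is a minimal sketch of that check, assuming a patched site reaches its stub with a 5-byte `jmp rel32`; the PR description does not state the encoding, so that is an assumption. `rel32_displacement` is a hypothetical helper, not a function from this PR.

```rust
use std::convert::TryFrom;

/// Length of an x86-64 `jmp rel32` instruction (opcode E9 plus a 4-byte displacement).
const JMP_REL32_LEN: u64 = 5;

/// Returns the rel32 displacement for a jump placed at `site` that targets
/// `stub`, or `None` if the stub is too far away to encode in 32 bits.
/// Hypothetical helper for illustration only.
fn rel32_displacement(site: u64, stub: u64) -> Option<i32> {
    // The displacement is relative to the address of the next instruction.
    let next = site.checked_add(JMP_REL32_LEN)?;
    let delta = (stub as i128) - (next as i128);
    i32::try_from(delta).ok()
}
```

A stub more than about 2 GiB away from the patch site yields `None`, which would correspond to the "too-far trampoline" case mentioned in the review.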

@wdcui
Member Author

wdcui commented Apr 25, 2026

This PR is ready for review. Thanks!

@sangho2
Contributor

sangho2 commented Apr 27, 2026

Agents have found a number of vulnerable code paths, including ones that do not rewrite executable pages. We should fix all of them, and perhaps merge #805 before this PR. I'm adding them below to ease agent-driven code patching. I'll add a "manual" review later.

  • do_mmap_file() only fails the mapping if maybe_patch_exec_segment() returns false (litebox_shim_linux/src/syscalls/mm.rs:129-140). The runtime patching path logs and returns true for trampoline allocation failure, too-far trampoline, entry write failure, mprotect failures, read/writeback failures, and rewriter errors (:692-724, :747-770, :773-889). That leaves the executable mapping live with original syscall bytes.
  • patch_code_segment() can mutate code_buf by replacing unpatchable syscalls with traps and return empty stubs plus skipped addresses (litebox_syscall_rewriter/src/lib.rs:448-452, :753-765). The Linux mmap caller only writes code_buf back inside Ok(stubs) if !stubs.is_empty() (litebox_shim_linux/src/syscalls/mm.rs:810-864), while Ok(_) does nothing (:878-880). Trap-only rewrites are therefore dropped. In mixed cases, the trap bytes are written back because stubs are non-empty, but skipped sites are only logged and the mapping still succeeds.
  • Runtime rewriting is gated on the original mmap prot containing PROT_EXEC (litebox_shim_linux/src/syscalls/mm.rs:118-134). mprotect dispatch goes straight to sys_mprotect (litebox_shim_linux/src/lib.rs:641-642), which delegates directly to common mprotect (litebox_shim_linux/src/syscalls/mm.rs:375-381). Since mappings get VM_MAYEXEC by default, a non-exec file mapping can later become exec without rewrite.
  • AOT rewriting filters actual executable text sections (litebox_syscall_rewriter/src/lib.rs:183-194, :219-230, :282-312). Runtime rewriting passes the whole mapped executable range to patch_code_segment() (litebox_shim_linux/src/syscalls/mm.rs:773-795) and writes the whole buffer back (:863-864). That can patch bytes in executable PT_LOAD padding/data that AOT would not touch.
  • For unpatched binaries, ElfParsedFile::load() only bumps info.brk (litebox_common_linux/src/loader.rs:487-491); it does not map or reserve a VMA. The Linux loader loads the interpreter before set_initial_brk(info.brk) (litebox_shim_linux/src/loader/elf.rs:240-250), and ET_DYN reservation only reserves the ELF load span, not the extra runtime-trampoline gap (litebox_common_linux/src/loader.rs:398-405). Nuance: brk expansion checks overlaps (litebox/src/mm/mod.rs:343-356), so it should not silently map over an interpreter VMA above the current break. But because the unreserved gap is below the artificially bumped initial break, later mappings can occupy it, runtime trampoline allocation can collide/fail, and brk shrink can treat that gap as heap-owned and unmap mappings inside it (:323-340).
  • Untrusted trampoline trailer disables runtime rewriting. parse_trampoline() treats a trailer with TRAMPOLINE_MAGIC and trampoline_size == 0 as success, interpreting it as “rewriter checked this binary and found no syscall instructions,” without authenticating that claim (litebox_common_linux/src/loader.rs:270-306). The Linux loader accepts that result and proceeds (litebox_shim_linux/src/loader/elf.rs:182-190), while has_trampoline() remains false so it only reserves runtime space (:200-209). Later, the mmap-time runtime hook independently calls check_trampoline_magic(), which only checks the final 32-byte magic and returns pre_patched = true even for trampoline_size == 0 (litebox_shim_linux/src/syscalls/mm.rs:541-561). maybe_patch_exec_segment() then takes the state.pre_patched branch, skips mapping when size is zero, and returns true without scanning or rewriting the executable segment (:599-664). A malicious unpatched ELF can therefore append a fake LITEBOX0 trailer with zero size and keep raw syscall instructions executable.
  • OP-TEE accepts UnpatchedBinary (litebox_shim_optee/src/loader/elf.rs:205-208), maps ELF bytes through anonymous mappings plus copy (:107-145), and OP-TEE mmap rejects file-backed mappings (litebox_shim_optee/src/syscalls/mm.rs:78-88). The Linux runtime mmap rewriter is not present there, so the comment that mmap-time patching will handle it is wrong. Caveat: on LVBS, raw syscall is still routed through the platform syscall callback, but OP-TEE-on-Linux-userland has the same raw-host-syscall concern as the Linux userland shim.
  • Remapping the same file offset leaves fresh unpatched code (litebox_shim_linux/src/syscalls/mm.rs:730). patched_offsets is keyed only by file offset. A second mmap() of the same executable file offset creates new bytes from the original file, but the rewriter skips it as "already patched." That gives another direct raw-syscall mapping.
  • ET_DYN base address inference is unreliable (litebox_shim_linux/src/syscalls/mm.rs:142, litebox_shim_linux/src/syscalls/mm.rs:501, litebox_shim_linux/src/syscalls/mm.rs:587). The runtime state uses the first observed mapping address as the ELF base. If the first mapping is not the offset-0/base mapping, trampoline addresses are computed incorrectly, causing skipped rewriting, failed jumps, or wrong mappings.
  • Trampoline metadata is less validated in the mmap path than in the loader (litebox_shim_linux/src/syscalls/mm.rs:558, litebox_shim_linux/src/syscalls/mm.rs:603, litebox_shim_linux/src/syscalls/mm.rs:610). The mmap path does not validate alignment, bounds, file_offset + size == header_offset, overflow, or sane address ranges before MAP_FIXED. This can clobber guest mappings or amplify the forged-header bypass.

Comment thread litebox_shim_optee/src/loader/elf.rs Outdated
Comment thread litebox_shim_optee/src/loader/elf.rs Outdated
@jaybosamiya-ms jaybosamiya-ms added the expmt:shadow-kiln Tag to quickly find the different PRs as part of the "shadow kiln" experiment. label Apr 28, 2026
@CvvT
Contributor

CvvT commented Apr 28, 2026

@sangho2 and I discussed this and have an alternative idea: how about we hook open to rewrite all ELF files, so that mmap only needs to load the trampoline code? This may avoid issues such as a syscall instruction that happens to straddle two mmap regions (unlikely, though) and is therefore missed by the rewriter.
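The straddling case can be illustrated with a minimal sketch. It assumes a naive byte scan for the two-byte `syscall` opcode (0F 05) applied to each mapped region independently; `count_syscall_opcodes` and `straddle_demo` are hypothetical names, not functions from this PR.

```rust
/// Counts `syscall` opcodes (0F 05) with a naive byte scan of a single region.
/// Hypothetical helper for illustration only.
fn count_syscall_opcodes(region: &[u8]) -> usize {
    region
        .windows(2)
        .filter(|w| w[0] == 0x0F && w[1] == 0x05)
        .count()
}

/// Returns (matches when the two regions are scanned separately,
/// matches when the same bytes are scanned as one buffer).
fn straddle_demo() -> (usize, usize) {
    // `nop; syscall; ret`, split so that 0F ends one region and 05 starts the next.
    let first = [0x90u8, 0x0F];
    let second = [0x05u8, 0xC3];
    let separate = count_syscall_opcodes(&first) + count_syscall_opcodes(&second);

    let mut joined = first.to_vec();
    joined.extend_from_slice(&second);
    (separate, count_syscall_opcodes(&joined))
}
```

Scanning the regions separately finds nothing, while scanning the whole file finds the instruction, which is the gap that rewriting at open time would close.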

One potential issue with this design is that if we are going to use a broker for fs, the changes to the ELF files will be reflected on the host as well (unless we have some sort of overlay fs).

@wdcui
Member Author

wdcui commented Apr 28, 2026

I thought about this before. What if I just open an elf file for binary analysis?

@sangho2
Contributor

sangho2 commented Apr 28, 2026

> I thought about this before. What if I just open an elf file for binary analysis?

mmap does have more corner cases than open. To tackle all of them (or to avoid regressions due to the lack of rtld), we might need to add a bunch of new code.

@wdcui
Member Author

wdcui commented Apr 29, 2026

I worked with an agent and fixed some of the issues. Here is a summary.

Context: Syscall rewriting is for reliability, NOT security. Security is enforced by kernel-level monitors (seccomp). Adversarial binary scenarios are out of scope.

================================================================================

#1 — Runtime patching returns true on failures, leaving unpatched exec mappings (Medium)

Status: No action.

do_mmap_file() only fails the mapping if maybe_patch_exec_segment() returns false. The runtime patching path logs and returns true for: trampoline allocation failure, too-far trampoline, entry write failure, mprotect failures, read/writeback failures, and rewriter errors. This leaves the executable mapping live with original syscall bytes.

Rationale: Returning false would fail the mmap (OutOfMemory), breaking legitimate programs that happen to have unmappable trampoline regions. Since rewriting is for reliability not security, letting the program continue with unintercepted syscalls is the less-harmful failure mode. The two options are:

  1. Fail the mmap → kills the program
  2. Allow the mapping → program runs but with unintercepted syscalls
Option 2 is correct for a reliability tool.

================================================================================

#2 — Trap-only rewrites dropped when stubs.is_empty() (High)

Status: FIXED (commit 7143622).

patch_code_segment() can mutate code_buf by replacing unpatchable syscalls with ICEBP;HLT traps and return empty stubs. The Ok(stubs) if !stubs.is_empty() arm wrote code_buf back, but the Ok(_) arm did nothing — trap bytes were silently discarded.

Fix: The Ok(_) arm now writes back code_buf if it differs from the original code, then falls through to restore RX protections.
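The shape of that fix can be sketched as follows. This is a simplified model, not the code from commit 7143622: `Stubs`, `Action`, and `decide` are hypothetical names, and the real arms perform the writeback and mprotect calls rather than returning a value.

```rust
/// Hypothetical stand-in for the rewriter's output: trampoline stub bytes.
type Stubs = Vec<u8>;

/// What the mmap caller should do after the rewriter returns.
#[derive(Debug, PartialEq)]
enum Action {
    /// Stubs were produced: write back the code and install the stubs.
    WriteCodeAndStubs,
    /// No stubs, but code_buf was mutated (e.g. trap bytes): write it back.
    WriteCodeOnly,
    /// Nothing changed, or the rewriter failed.
    Nothing,
}

fn decide(original: &[u8], code_buf: &[u8], result: &Result<Stubs, ()>) -> Action {
    match result {
        Ok(stubs) if !stubs.is_empty() => Action::WriteCodeAndStubs,
        // Before the fix this arm did nothing, so trap-only rewrites were lost.
        Ok(_) if code_buf != original => Action::WriteCodeOnly,
        _ => Action::Nothing,
    }
}
```

For example, with `original = [0x0F, 0x05]` and a `code_buf` that was rewritten to `[0xF1, 0xF4]` (ICEBP; HLT) with no stubs, `decide` selects `WriteCodeOnly`.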

================================================================================

#3 — mprotect bypass: non-exec → exec without rewriting (High)

Status: No action in this PR. Worth filing as follow-up issue.

mprotect dispatch (lib.rs:641-643) goes straight to sys_mprotect with no special handling. Since mappings get VM_MAYEXEC by default, a non-exec file mapping can later become exec without rewrite.

Rationale: Fixing this requires intercepting mprotect, tracking file-backed VMAs, and running the rewriter at mprotect time — a significant architectural change beyond the scope of this PR. Only exploitable by adversarial code (deliberately mapping non-exec then flipping to exec), which is out of scope for a reliability tool.
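For a follow-up issue, the condition a hooked mprotect would need to detect might look like the sketch below. It only models the decision; the VMA tracking and the rewrite itself are the hard part and are not shown. `needs_rewrite_on_mprotect` is a hypothetical name, and `PROT_EXEC` uses the Linux value.

```rust
/// Linux value of PROT_EXEC.
const PROT_EXEC: u32 = 0x4;

/// True when a protection change turns a non-executable mapping executable.
fn gains_exec(old_prot: u32, new_prot: u32) -> bool {
    old_prot & PROT_EXEC == 0 && new_prot & PROT_EXEC != 0
}

/// Hypothetical predicate: a file-backed mapping that gains PROT_EXEC was never
/// seen by the mmap-time hook, so it would have to be rewritten at mprotect time.
fn needs_rewrite_on_mprotect(old_prot: u32, new_prot: u32, file_backed: bool) -> bool {
    file_backed && gains_exec(old_prot, new_prot)
}
```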

================================================================================

#4 — Runtime rewriting patches entire mapped exec range, not just .text (Low)

Status: No action.

AOT rewriting filters actual executable text sections. Runtime rewriting passes the whole mapped executable range to patch_code_segment() and writes the whole buffer back, potentially patching bytes in executable PT_LOAD padding/data that AOT would not touch.

Rationale: The rewriter scans for actual 0F 05 (syscall) opcodes, so it only patches real syscall instructions. False positives (0F 05 appearing as data constants within executable PT_LOAD segments) are vanishingly rare in practice, and the consequence (a corrupted data constant) is benign compared to leaving a real syscall unpatched.
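To make the false-positive case concrete, here is a minimal sketch of a byte-level scan. It assumes the scan matches raw 0F 05 bytes without decoding instruction boundaries, which may not be how the real rewriter works; `syscall_offsets` and `false_positive_demo` are hypothetical names.

```rust
/// Returns the offsets of every 0F 05 byte pair in `buf` (naive scan).
/// Hypothetical helper for illustration only.
fn syscall_offsets(buf: &[u8]) -> Vec<usize> {
    buf.windows(2)
        .enumerate()
        .filter(|(_, w)| w[0] == 0x0F && w[1] == 0x05)
        .map(|(i, _)| i)
        .collect()
}

/// Builds `syscall; ret` followed by a 4-byte little-endian data constant
/// 0x0000050F, as might sit in the padding of an executable PT_LOAD.
fn false_positive_demo() -> Vec<usize> {
    let mut segment = vec![0x0F, 0x05, 0xC3];
    segment.extend_from_slice(&0x0000_050Fu32.to_le_bytes());
    syscall_offsets(&segment)
}
```

The scan reports offset 0 (the real instruction) and offset 3 (the data constant), so a patch applied to every match would also overwrite the constant.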

================================================================================

#5 — Unpatched binary brk/VMA reservation gap (Medium)

Status: No action in this PR. Worth filing as follow-up issue.

For unpatched binaries, load() bumps info.brk but doesn't map or reserve a VMA for the trampoline gap. The unreserved gap below the artificially bumped initial break can be occupied by later mappings, causing runtime trampoline allocation to collide/fail. brk shrink could also treat that gap as heap-owned and unmap mappings inside it.

Rationale: Runtime trampoline allocation uses MAP_FIXED_NOREPLACE with fallback, so collisions result in degraded patching (comment #1 applies — patching silently doesn't happen but the program continues). The brk-shrink-unmapping scenario is theoretical — real programs don't shrink brk into this region. Low risk but worth tracking.

================================================================================

#6 — Fake LITEBOX0 trailer bypasses runtime rewriting (Critical per reviewer)

Status: No action.

A malicious unpatched ELF can append a fake LITEBOX0 trailer with trampoline_size=0. parse_trampoline() treats this as "rewriter checked this binary and found no syscall instructions." check_trampoline_magic() returns pre_patched=true even for size=0. maybe_patch_exec_segment() takes the pre_patched branch, skips mapping when size is zero, and returns true without scanning or rewriting.

Rationale: This is the clearest out-of-scope item. Rewriting is for reliability, not security. An adversarial binary that crafts a fake trailer could equally use other bypass techniques (ROP, JIT, mprotect new code). Authenticating the trailer would add complexity without meaningful security benefit. Security enforcement belongs at the kernel level (seccomp).

================================================================================

#7 — OP-TEE accepts UnpatchedBinary without runtime patching (Low)

Status: FIXED (commit 44107cb).

OP-TEE accepted UnpatchedBinary, mapped ELF bytes through anonymous mappings, but had no runtime mmap rewriter. The comment that mmap-time patching would handle it was wrong for OP-TEE.

Fix: OP-TEE's FileAndParsed::new now calls parsed.parse_trampoline(...) with ? propagation, so UnpatchedBinary errors bubble up as ENOEXEC. The load_mapped helper was removed; OP-TEE calls parsed.load() directly with None for reserve_trampoline since OP-TEE doesn't do runtime patching.

================================================================================

#8 — Remapping same file offset skips rewriting (High)

Status: FIXED (commit 7143622).

patched_offsets was keyed only by file offset. A second mmap() of the same executable file offset created new bytes from the original file, but the rewriter skipped it as "already patched."

Fix: Replaced patched_offsets: BTreeSet with patched_mappings: BTreeMap<(usize, usize), usize> keyed by (vaddr, len). Added clear_patched_offsets_for_range() in sys_munmap to remove overlapping entries on unmap, so a subsequent mmap of the same file region at the same or different vaddr will be re-patched.
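A minimal sketch of that bookkeeping follows, assuming the map value is the file offset of the mapping (the comment above does not say what the value holds). `PatchState`, `record`, `is_patched`, and `clear_range` are hypothetical names standing in for the real state and for clear_patched_offsets_for_range().

```rust
use std::collections::BTreeMap;

/// Hypothetical model of the per-file patch state.
#[derive(Default)]
struct PatchState {
    /// Keyed by (vaddr, len); the value is assumed to be the file offset.
    patched_mappings: BTreeMap<(usize, usize), usize>,
}

impl PatchState {
    /// Remembers that the mapping at `vaddr..vaddr + len` has been patched.
    fn record(&mut self, vaddr: usize, len: usize, file_offset: usize) {
        self.patched_mappings.insert((vaddr, len), file_offset);
    }

    /// True only if this exact mapping was patched and is still mapped.
    fn is_patched(&self, vaddr: usize, len: usize) -> bool {
        self.patched_mappings.contains_key(&(vaddr, len))
    }

    /// Called on munmap: drops every entry that overlaps `start..start + len`.
    fn clear_range(&mut self, start: usize, len: usize) {
        let end = start.saturating_add(len);
        self.patched_mappings.retain(|&(vaddr, mlen), _| {
            let mapping_end = vaddr.saturating_add(mlen);
            // Keep only entries that lie entirely outside the unmapped range.
            mapping_end <= start || vaddr >= end
        });
    }
}
```

After a munmap that overlaps a recorded mapping, `is_patched` returns false for it, so a later mmap of the same file region goes through the rewriter again.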

================================================================================

#9 — ET_DYN base address inference unreliable (Medium)

Status: No action.

The runtime state uses the first observed mapping address as the ELF base. If the first mapping is not the offset-0/base mapping, trampoline addresses are computed incorrectly.

Rationale: Every real-world dynamic linker maps offset 0 first. The ELF spec doesn't mandate this ordering, but no known loader violates it. The risk is purely theoretical.

================================================================================

#10 — Trampoline metadata less validated in mmap path (Medium)

Status: No action.

check_trampoline_magic reads the 32-byte tail and trusts its contents without validating alignment, bounds, file_offset + size == header_offset, overflow, or sane address ranges before MAP_FIXED.

Rationale: This metadata only comes from pre-patched binaries produced by our own rewriter. Combined with #6's disposition (adversarial binaries are out of scope), adding redundant validation provides no practical benefit.
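If the validation is ever revisited, the checks listed above could be sketched roughly as below. The `Trailer` fields and their meaning are assumptions made for illustration; only the 32-byte trailer length and the `file_offset + size == header_offset` relation come from the review comment.

```rust
const PAGE_SIZE: u64 = 4096;
/// The trailer is described above as the final 32 bytes of the file.
const TRAILER_LEN: u64 = 32;

/// Hypothetical decoded trailer; the real field layout may differ.
struct Trailer {
    file_offset: u64,
    size: u64,
    vaddr: u64,
}

#[derive(Debug, PartialEq)]
enum TrailerError {
    Misaligned,
    Overflow,
    OutOfBounds,
}

/// Checks alignment, bounds, and overflow before the values reach MAP_FIXED.
fn validate(t: &Trailer, file_len: u64) -> Result<(), TrailerError> {
    if t.file_offset % PAGE_SIZE != 0 || t.vaddr % PAGE_SIZE != 0 {
        return Err(TrailerError::Misaligned);
    }
    // The header is assumed to start TRAILER_LEN bytes before end of file.
    let header_offset = file_len
        .checked_sub(TRAILER_LEN)
        .ok_or(TrailerError::OutOfBounds)?;
    let end = t
        .file_offset
        .checked_add(t.size)
        .ok_or(TrailerError::Overflow)?;
    if end != header_offset {
        return Err(TrailerError::OutOfBounds);
    }
    // The mapped address range must not wrap around.
    t.vaddr.checked_add(t.size).ok_or(TrailerError::Overflow)?;
    Ok(())
}
```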

================================================================================

Summary:

@sangho2
Contributor

sangho2 commented Apr 29, 2026

  • do_mmap_file() only fails the mapping if maybe_patch_exec_segment() returns false (litebox_shim_linux/src/syscalls/mm.rs:129-140). The runtime patching path logs and returns true for trampoline allocation failure, too-far trampoline, entry write failure, mprotect failures, read/writeback failures, and rewriter errors (:692-724, :747-770, :773-889). That leaves the executable mapping live with original syscall bytes.
  • patch_code_segment() can mutate code_buf by replacing unpatchable syscalls with traps and return empty stubs plus skipped addresses (litebox_syscall_rewriter/src/lib.rs:448-452, :753-765). The Linux mmap caller only writes code_buf back inside Ok(stubs) if !stubs.is_empty() (litebox_shim_linux/src/syscalls/mm.rs:810-864), while Ok(_) does nothing (:878-880). Trap-only rewrites are therefore dropped. In mixed cases, the trap bytes are written back because stubs are non-empty, but skipped sites are only logged and the mapping still succeeds.
  • Runtime rewriting is gated on the original mmap prot containing PROT_EXEC (litebox_shim_linux/src/syscalls/mm.rs:118-134). mprotect dispatch goes straight to sys_mprotect (litebox_shim_linux/src/lib.rs:641-642), which delegates directly to common mprotect (litebox_shim_linux/src/syscalls/mm.rs:375-381). Since mappings get VM_MAYEXEC by default, a non-exec file mapping can later become exec without rewrite.
  • AOT rewriting filters actual executable text sections (litebox_syscall_rewriter/src/lib.rs:183-194, :219-230, :282-312). Runtime rewriting passes the whole mapped executable range to patch_code_segment() (litebox_shim_linux/src/syscalls/mm.rs:773-795) and writes the whole buffer back (:863-864). That can patch bytes in executable PT_LOAD padding/data that AOT would not touch.
  • For unpatched binaries, ElfParsedFile::load() only bumps info.brk (litebox_common_linux/src/loader.rs:487-491); it does not map or reserve a VMA. The Linux loader loads the interpreter before set_initial_brk(info.brk (litebox_shim_linux/src/loader/elf.rs:240-250), and ET_DYN reservation only reserves the ELF load span, not the extra runtime-trampoline gap (litebox_common_linux/src/loader.rs:398-405). Nuance: brk expansion checks overlaps (litebox/src/mm/mod.rs:343-356), so it should not silently map over an interpreter VMA above the current break. But because the unreserved gap is below the artificially bumped initial break, later mappings can occupy it, runtime trampoline allocation can collide/fail, and brk shrink can treat that gap as heap-owned and unmap mappings inside it (:323-340).
  • Untrusted trampoline trailer disables runtime rewriting. parse_trampoline() treats a trailer with TRAMPOLINE_MAGIC and trampoline_size == 0 as success, interpreting it as “rewriter checked this binary and found no syscall instructions,” without authenticating that claim (litebox_common_linux/src/loader.rs:270-306). The Linux loader accepts that result and proceeds (litebox_shim_linux/src/loader/elf.rs:182-190), while has_trampoline() remains false so it only reserves runtime space (:200-209). Later, the mmap-time runtime hook independently calls check_trampoline_magic(), which only checks the final 32-byte magic and returns pre_patched = true even for trampoline_size == 0 (litebox_shim_linux/src/syscalls/mm.rs:541-561). maybe_patch_exec_segment() then takes the state.pre_patched branch, skips mapping when size is zero, and returns true without scanning or rewriting the executable segment (:599-664). A malicious unpatched ELF can therefore append a fake LITEBOX0 trailer with zero size and keep raw syscall instructions executable.
  • OP-TEE accepts UnpatchedBinary (litebox_shim_optee/src/loader/elf.rs:205-208), maps ELF bytes through anonymous mappings plus copy (:107-145), and OP-TEE mmap rejects file-backed mappings (litebox_shim_optee/src/syscalls/mm.rs:78-88). The Linux runtime mmap rewriter is not present there, so the comment that mmap-time patching will handle it is wrong. Caveat: on LVBS, raw syscall is still routed through the platform syscall callback, but OP-TEE-on-Linux-userland has the same raw-host-syscall concern as the Linux userland shim.
  • Remapping the same file offset leaves fresh unpatched code (litebox_shim_linux/src/syscalls/mm.rs:730). patched_offsets is keyed only by file offset. A second mmap() of the same executable file offset creates new bytes from the original file, but the rewriter skips it as "already patched." That gives another direct raw-syscall mapping.
  • ET_DYN base address inference is unreliable (litebox_shim_linux/src/syscalls/mm.rs:142, litebox_shim_linux/src/syscalls/mm.rs:501, litebox_shim_linux/src/syscalls/mm.rs:587). The runtime state uses the first observed mapping address as the ELF base. If the first mapping is not the offset-0/base mapping, trampoline addresses are computed incorrectly, causing skipped rewriting, failed jumps, or wrong mappings.
  • Trampoline metadata is less validated in the mmap path than in the loader (litebox_shim_linux/src/syscalls/mm.rs:558, litebox_shim_linux/src/syscalls/mm.rs:603, litebox_shim_linux/src/syscalls/mm.rs:610). The mmap path does not validate alignment, bounds, file_offset + size == header_offset, overflow, or sane address ranges before MAP_FIXED. This can clobber guest mappings or amplify the forged-header bypass.

I worked an agent and fixed some of the issues. Here is a summary.

Context: Syscall rewriting is for reliability, NOT security. Security is enforced by kernel-level monitors (seccomp). Adversarial binary scenarios are out of scope.

================================================================================

#1 — Runtime patching returns true on failures, leaving unpatched exec mappings (Medium)

Status: No action.

do_mmap_file() only fails the mapping if maybe_patch_exec_segment() returns false. The runtime patching path logs and returns true for: trampoline allocation failure, too-far trampoline, entry write failure, mprotect failures, read/writeback failures, and rewriter errors. This leaves the executable mapping live with original syscall bytes.

Rationale: Returning false would fail the mmap (OutOfMemory), breaking legitimate programs that happen to have unmappable trampoline regions. Since rewriting is for reliability not security, letting the program continue with unintercepted syscalls is the less-harmful failure mode. The two options are:

  1. Fail the mmap → kills the program
  2. Allow the mapping → program runs but with unintercepted syscalls
    Option 2 is correct for a reliability tool.

================================================================================

#2 — Trap-only rewrites dropped when stubs.is_empty() (High)

Status: FIXED (commit 7143622).

patch_code_segment() can mutate code_buf by replacing unpatchable syscalls with ICEBP;HLT traps and return empty stubs. The Ok(stubs) if !stubs.is_empty() arm wrote code_buf back, but the Ok(_) arm did nothing — trap bytes were silently discarded.

Fix: The Ok(_) arm now writes back code_buf if it differs from the original code, then falls through to restore RX protections.

================================================================================

#3 — mprotect bypass: non-exec → exec without rewriting (High)

Status: No action in this PR. Worth filing as follow-up issue.

mprotect dispatch (lib.rs:641-643) goes straight to sys_mprotect with no special handling. Since mappings get VM_MAYEXEC by default, a non-exec file mapping can later become exec without rewrite.

Rationale: Fixing this requires intercepting mprotect, tracking file-backed VMAs, and running the rewriter at mprotect time — a significant architectural change beyond the scope of this PR. Only exploitable by adversarial code (deliberately mapping non-exec then flipping to exec), which is out of scope for a reliability tool.

================================================================================

#4 — Runtime rewriting patches entire mapped exec range, not just .text (Low)

Status: No action.

AOT rewriting filters actual executable text sections. Runtime rewriting passes the whole mapped executable range to patch_code_segment() and writes the whole buffer back, potentially patching bytes in executable PT_LOAD padding/data that AOT would not touch.

Rationale: The rewriter scans for actual 0F 05 (syscall) opcodes, so it only patches real syscall instructions. False positives (0F 05 appearing as data constants within executable PT_LOAD segments) are vanishingly rare in practice, and the consequence (a corrupted data constant) is benign compared to leaving a real syscall unpatched.

================================================================================

#5 — Unpatched binary brk/VMA reservation gap (Medium)

Status: No action in this PR. Worth filing as follow-up issue.

For unpatched binaries, load() bumps info.brk but doesn't map or reserve a VMA for the trampoline gap. The unreserved gap below the artificially bumped initial break can be occupied by later mappings, causing runtime trampoline allocation to collide/fail. brk shrink could also treat that gap as heap-owned and unmap mappings inside it.

Rationale: Runtime trampoline allocation uses MAP_FIXED_NOREPLACE with fallback, so collisions result in degraded patching (comment #1 applies — patching silently doesn't happen but the program continues). The brk-shrink-unmapping scenario is theoretical — real programs don't shrink brk into this region. Low risk but worth tracking.

================================================================================

#6 — Fake LITEBOX0 trailer bypasses runtime rewriting (Critical per reviewer)

Status: No action.

A malicious unpatched ELF can append a fake LITEBOX0 trailer with trampoline_size=0. parse_trampoline() treats this as "rewriter checked this binary and found no syscall instructions." check_trampoline_magic() returns pre_patched=true even for size=0. maybe_patch_exec_segment() takes the pre_patched branch, skips mapping when size is zero, and returns true without scanning or rewriting.

Rationale: This is the clearest out-of-scope item. Rewriting is for reliability, not security. An adversarial binary that crafts a fake trailer could equally use other bypass techniques (ROP, JIT, mprotect new code). Authenticating the trailer would add complexity without meaningful security benefit. Security enforcement belongs at the kernel level (seccomp).

================================================================================

#7 — OP-TEE accepts UnpatchedBinary without runtime patching (Low)

Status: FIXED (commit 44107cb).

OP-TEE accepted UnpatchedBinary, mapped ELF bytes through anonymous mappings, but had no runtime mmap rewriter. The comment that mmap-time patching would handle it was wrong for OP-TEE.

Fix: OP-TEE's FileAndParsed::new now calls parsed.parse_trampoline(...) with ? propagation, so UnpatchedBinary errors bubble up as ENOEXEC. The load_mapped helper was removed; OP-TEE calls parsed.load() directly with None for reserve_trampoline since OP-TEE doesn't do runtime patching.

================================================================================

#8 — Remapping same file offset skips rewriting (High)

Status: FIXED (commit 7143622).

patched_offsets was keyed only by file offset. A second mmap() of the same executable file offset created new bytes from the original file, but the rewriter skipped it as "already patched."

Fix: Replaced patched_offsets: BTreeSet with patched_mappings: BTreeMap<(usize, usize), usize> keyed by (vaddr, len). Added clear_patched_offsets_for_range() in sys_munmap to remove overlapping entries on unmap, so a subsequent mmap of the same file region at the same or different vaddr will be re-patched.
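A minimal sketch of the bookkeeping described in this fix, assuming a plain `BTreeMap` and half-open ranges. `PatchedMappings`, `insert`, `is_patched`, and `clear_range` are hypothetical names for illustration and are not the shim's actual API:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the shim's tracking table.
/// Key: (vaddr, len) of a patched mapping. Value: the file offset it was mapped from.
pub struct PatchedMappings {
    map: BTreeMap<(usize, usize), usize>,
}

impl PatchedMappings {
    pub fn new() -> Self {
        Self { map: BTreeMap::new() }
    }

    /// Record that [vaddr, vaddr + len) was patched from `file_offset`.
    pub fn insert(&mut self, vaddr: usize, len: usize, file_offset: usize) {
        self.map.insert((vaddr, len), file_offset);
    }

    /// True only if this exact mapping (not merely this file offset) was patched.
    pub fn is_patched(&self, vaddr: usize, len: usize) -> bool {
        self.map.contains_key(&(vaddr, len))
    }

    /// Drop every entry overlapping [addr, addr + len), as an munmap of that range would.
    pub fn clear_range(&mut self, addr: usize, len: usize) {
        let end = addr.saturating_add(len);
        self.map.retain(|&(start, mlen), _| {
            let mend = start.saturating_add(mlen);
            // Keep entries that end at or before `addr`, or start at or after `end`.
            mend <= addr || start >= end
        });
    }
}
```

Because the key is the mapping rather than the file offset, a fresh mmap of the same file region after an unmap is no longer mistaken for already-patched code.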

================================================================================

#9 — ET_DYN base address inference unreliable (Medium)

Status: No action.

The runtime state uses the first observed mapping address as the ELF base. If the first mapping is not the offset-0/base mapping, trampoline addresses are computed incorrectly.

Rationale: Every real-world dynamic linker maps offset 0 first. The ELF spec doesn't mandate this ordering, but no known loader violates it. The risk is purely theoretical.
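For reference, base inference can be made independent of mapping order by matching the mmap's file offset against the PT_LOAD segments. The sketch below assumes 4 KiB pages and a simplified `LoadSegment` type; both are illustrative assumptions, not the shim's actual types:

```rust
const PAGE_SIZE: u64 = 4096; // assumed page size

/// Round `x` down to a page boundary.
fn align_down(x: u64) -> u64 {
    x & !(PAGE_SIZE - 1)
}

/// Hypothetical minimal view of a PT_LOAD program header.
#[derive(Clone, Copy)]
pub struct LoadSegment {
    pub p_offset: u64,
    pub p_vaddr: u64,
}

/// Derive the ET_DYN load base from a single observed mmap.
/// Returns None if no PT_LOAD segment starts at this (page-aligned) file offset.
pub fn derive_load_base(
    mapped_addr: u64,
    file_offset: u64,
    segments: &[LoadSegment],
) -> Option<u64> {
    segments
        .iter()
        .find(|s| align_down(s.p_offset) == file_offset)
        // base = where the segment landed minus where the ELF says it lives.
        .and_then(|s| mapped_addr.checked_sub(align_down(s.p_vaddr)))
}
```

With this approach the first observed mapping does not need to be the offset-0 mapping; any PT_LOAD mapping yields the same base.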

================================================================================

#10 — Trampoline metadata less validated in mmap path (Medium)

Status: No action.

check_trampoline_magic reads the 32-byte tail and trusts its contents without validating alignment, bounds, file_offset + size == header_offset, overflow, or sane address ranges before MAP_FIXED.

Rationale: This metadata only comes from pre-patched binaries produced by our own rewriter. Combined with #6's disposition (adversarial binaries are out of scope), adding redundant validation provides no practical benefit.
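For scale, the checks the reviewer lists amount to roughly the following. This is a sketch only: the `Trailer` fields, the page-alignment requirement, and the `user_addr_max` bound are assumptions, since the real trailer layout is not shown in this thread:

```rust
const PAGE_SIZE: u64 = 4096; // assumed page size

/// Hypothetical parsed form of the trailer; the real field layout may differ.
pub struct Trailer {
    pub trampoline_vaddr: u64,
    pub file_offset: u64,
    pub size: u64,
}

#[derive(Debug, PartialEq)]
pub enum TrailerError {
    Misaligned,
    Overflow,
    OutOfBounds,
    AddressOutOfRange,
}

/// Validate trailer metadata before it is used for a MAP_FIXED mapping.
/// `header_offset` is the file offset at which the trailer itself starts.
pub fn validate_trailer(
    t: &Trailer,
    header_offset: u64,
    user_addr_max: u64,
) -> Result<(), TrailerError> {
    // Alignment: both the target address and the file offset must be page-aligned.
    if t.trampoline_vaddr % PAGE_SIZE != 0 || t.file_offset % PAGE_SIZE != 0 {
        return Err(TrailerError::Misaligned);
    }
    // Bounds: the trampoline data must end exactly where the trailer begins.
    let file_end = t.file_offset.checked_add(t.size).ok_or(TrailerError::Overflow)?;
    if file_end != header_offset {
        return Err(TrailerError::OutOfBounds);
    }
    // Address range: the mapped region must stay below the assumed user-space limit.
    let vaddr_end = t.trampoline_vaddr.checked_add(t.size).ok_or(TrailerError::Overflow)?;
    if vaddr_end > user_addr_max {
        return Err(TrailerError::AddressOutOfRange);
    }
    Ok(())
}
```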

================================================================================

Summary:

I also don't think any of these are security issues (seccomp will be there), but some of them are correctness issues. In that sense, I'm a bit surprised that the agent treats the rewriter as a reliability tool. Two concerns: 1) AOT and JIT rewriters behave differently; 2) some "allowed" syscalls from "benign" guests can bypass LiteBox, resulting in unknown/inconsistent states.

wdcui added 10 commits April 29, 2026 20:22
Replace the rtld_audit LD_AUDIT-based syscall interception with runtime
ELF patching during mmap. When a PROT_EXEC segment is mapped, the shim
now patches syscall instructions in-place and places trampoline stubs
in a dynamically-allocated region near the code.

Key changes:
- Add patch_code_segment() public API to syscall rewriter for runtime use
- Add ElfPatchState/ElfPatchCache for per-fd tracking of patch state
- Add maybe_patch_exec_segment() called from do_mmap_file for PROT_EXEC
- Add init_elf_patch_state() to parse ELF headers and detect pre-patched
  binaries via trampoline magic at file tail
- Add finalize_elf_patch() on fd close to clean up trampoline mappings
- Add reserve_trampoline parameter to ElfParsedFile::load() to bump brk
  past the runtime trampoline region for unpatched binaries
- Add UnpatchedBinary error variant for loader trampoline parsing
- Remove litebox_rtld_audit/ (C LD_AUDIT library)
- Remove rtld_audit.so packaging from litebox_packager and runner crates
- Remove LD_AUDIT environment variable injection from runners
- Remove build.rs files that compiled rtld_audit.so

Mirror the linux shim's parse_trampoline/load_mapped pattern: tolerate
UnpatchedBinary errors and reserve trampoline space for runtime patching.

- Write back code_buf when patch_code_segment produces trap-only
  replacements (ICEBP;HLT) with no trampoline stubs. Previously the
  modified buffer was silently discarded.
- Replace patched_offsets (BTreeSet<file_offset>) with patched_mappings
  (BTreeMap<(vaddr, len), offset>) and clear overlapping entries on
  munmap so that re-mapped regions are re-patched with fresh file bytes.

OP-TEE does not support runtime ELF patching, so there is no need for
the load_mapped helper or the UnpatchedBinary error handling. Revert
to calling parsed.load() directly with None for reserve_trampoline.

…ecutable

Previously, runtime syscall rewriting only triggered on mmap with
PROT_EXEC. A non-exec file mapping could later gain PROT_EXEC via
mprotect without being scanned by the rewriter.

Now sys_mprotect intercepts transitions to PROT_EXEC, finds overlapping
tracked file mappings, clamps to the mprotect range, and runs the
rewriter on the intersection. Re-running the rewriter on already-patched
code is safe (instruction decoder uses proper length decoding), so the
patched_ranges guard is a performance optimization only.

Also separates file_mappings (BTreeSet tracking non-exec mmaps) from
patched_ranges (BTreeSet tracking what has been rewritten) for clarity.
@wdcui wdcui force-pushed the wdcui/pr4-runtime-patch branch from 6e43fb6 to b607ad9 Compare April 30, 2026 03:23
@wdcui
Member Author

wdcui commented Apr 30, 2026

@CvvT the latest commit adds support for the mprotect path.

@sangho2
Contributor

sangho2 commented Apr 30, 2026

It appears that we need trap fallbacks perhaps for both code segment and mapped segment. Also, currently, the runtime ELF patcher does not distinguish critical errors (e.g., mprotect/mmap failures) from expected errors (e.g., decoding failures). Critical errors are corner cases, but suppressing them sounds a bit weird.
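One way to make that distinction explicit is to classify failures up front and let the class decide the response. The sketch below assumes infrastructure failures are fatal and rewriting failures fall back to traps; the `PatchFailure` and `Action` types are hypothetical and not part of the PR:

```rust
/// Hypothetical classification of runtime patching failures.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PatchFailure {
    // Infrastructure failures: the address space is in an unknown state.
    Mprotect,
    Read,
    Write,
    // Rewriting failures: the code is intact but could not be redirected.
    TrampolineAlloc,
    TrampolineTooFar,
    CursorOverflow,
    Expand,
}

#[derive(Debug, PartialEq)]
pub enum Action {
    /// Abort: continuing could leave unpatched syscalls executable.
    Fatal,
    /// Replace every syscall in the segment with a trap instead of a trampoline jump.
    TrapFallback,
}

/// Map each failure to the response it should trigger.
pub fn classify(f: PatchFailure) -> Action {
    match f {
        PatchFailure::Mprotect | PatchFailure::Read | PatchFailure::Write => Action::Fatal,
        PatchFailure::TrampolineAlloc
        | PatchFailure::TrampolineTooFar
        | PatchFailure::CursorOverflow
        | PatchFailure::Expand => Action::TrapFallback,
    }
}
```

An exhaustive `match` means a newly added failure kind cannot be silently ignored; the compiler forces a decision about which class it belongs to.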

wdcui added 6 commits April 30, 2026 16:17
- Add trap_all_syscalls_in_code() to rewriter: uses proper disassembly
  to find real syscall instructions and replace with ICEBP;HLT traps
- Add apply_trap_fallback() helper for runtime patching failures
- Infrastructure failures (mprotect/read/write) now panic instead of
  silently continuing with unpatched syscalls
- Rewriting failures (trampoline alloc, distance, cursor overflow,
  expand) now apply trap fallback so syscalls trap instead of escaping
- Fix deadlock: sys_munmap -> sys_munmap_raw inside maybe_patch_exec_segment
  (caller holds elf_patch_cache lock, sys_munmap would re-lock via
  clear_file_mappings_for_range)
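The deadlock fix follows a common pattern: code that already holds a non-reentrant lock must call the variant that does not take it. A minimal sketch using `std::sync::Mutex`, where the `Shim` type and its methods are simplified stand-ins rather than the real shim code:

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Simplified stand-in for the shim state (hypothetical).
pub struct Shim {
    cache: Mutex<BTreeSet<usize>>, // tracked mapping addresses
    unmapped: Mutex<Vec<usize>>,   // records raw unmaps, for illustration only
}

impl Shim {
    pub fn new() -> Self {
        Self { cache: Mutex::new(BTreeSet::new()), unmapped: Mutex::new(Vec::new()) }
    }

    pub fn track(&self, addr: usize) {
        self.cache.lock().unwrap().insert(addr);
    }

    pub fn unmapped(&self) -> Vec<usize> {
        self.unmapped.lock().unwrap().clone()
    }

    /// Public path: takes the cache lock to drop tracking, then unmaps.
    pub fn munmap(&self, addr: usize) {
        // The temporary guard is released at the end of this statement.
        self.cache.lock().unwrap().remove(&addr);
        self.munmap_raw(addr);
    }

    /// Raw path: never touches the cache lock.
    fn munmap_raw(&self, addr: usize) {
        self.unmapped.lock().unwrap().push(addr);
    }

    /// Runs with the cache lock held, like the patching path.
    pub fn patch_failed(&self, trampoline_addr: usize) {
        let mut cache = self.cache.lock().unwrap();
        // Calling self.munmap() here would try to lock `cache` again on the
        // same thread, which blocks forever (or panics) with std::sync::Mutex.
        cache.remove(&trampoline_addr);
        self.munmap_raw(trampoline_addr);
    }
}
```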
…ation

- Remove LoadFilter type, set_load_filter(), and load_filter field from
  LinuxShimBuilder/GlobalState — no callers remain after rtld removal
- Fix init_elf_patch_state base address: derive load base from file_offset
  by matching against PT_LOAD p_offset (was incorrectly using mapped_addr
  directly, which is wrong when the first mmap is not the lowest segment)
- Use align_down (page-floor) for offset matching to match kernel behavior
- Pre-patched ET_DYN panics if base cannot be determined (JMPs are
  hardcoded); unpatched falls back to mapped_addr as hint (resilient)

…comment

- Replace #[allow(clippy::cast_possible_truncation)] with explicit
  .truncate() calls via TruncateExt trait (u64 -> usize)
- Remove pub from ElfPatchState fields (struct is pub(crate), fields
  only accessed within the defining module)
- Add comment explaining overlapping file_mappings entries are safe
- Add compile_error! guard for non-64-bit targets

Replace manual byte-offset indexing (ehdr_buf[32..40], ph[8..16], etc.)
with typed struct access via object::elf::{FileHeader64, ProgramHeader64}.
This addresses CvvT's review suggestion to use the object crate.
@wdcui
Member Author

wdcui commented May 1, 2026

Thank you, @sangho2 and @CvvT, for the detailed review. I made code changes based on your comments. Please take a look when you get a chance. You can ignore the CI failure; I think it will go away once we merge the other PR, #813.

Contributor

@sangho2 sangho2 left a comment

Thank you!


// Allocate RW region at the trampoline address. Use MAP_FIXED
// because the code already contains JMPs to this exact address
// and we MUST map here. The region may already be reserved as
Contributor

dlopen() might be another issue.

Member Author

@sangho2 can you explain why dlopen() might be another issue? Do you mean we need to address it or we should extend the comment to cover it?

/// UNIX domain socket address table
unix_addr_table: litebox::sync::RwLock<Platform, syscalls::unix::UnixAddrTable<FS>>,
/// Per-process collection of ELF patching state for runtime syscall rewriting.
elf_patch_cache: litebox::sync::Mutex<Platform, syscalls::mm::ElfPatchCache>,
Contributor

We should re-consider this fd-as-a-key approach when we support fork.

Member Author

Tracking this in #821.

Comment on lines +730 to +735
let Some(code_owned) = mapped_addr.to_owned_slice(len) else {
    panic!("fatal: failed to read code segment for trap fallback");
};
let mut code_buf = code_owned.into_vec();
let code_vaddr = mapped_addr.as_usize() as u64;
let count = litebox_syscall_rewriter::trap_all_syscalls_in_code(&mut code_buf, code_vaddr)
Contributor

nit: this redundantly allocates and copies the code buffer. In-place patching might be possible now.

Member Author

MutPtr only exposes to_owned_slice and copy_from_slice right now. To fix this, we will need to extend MutPtr. Okay to leave it as is? @sangho2

Comment on lines +969 to +979
let Some(code_owned) = mapped_addr.to_owned_slice(len) else {
    let _ = self.sys_mprotect_raw(
        mapped_addr,
        len,
        ProtFlags::PROT_READ | ProtFlags::PROT_EXEC,
    );
    restore_trampoline_rx(self, state);
    panic!("fatal: failed to read code segment for patching");
};
let mut code_buf = code_owned.into_vec();
let original_code = code_buf.clone();
Contributor

nit: this also redundantly allocates and copies the code buffer. In-place patching may be possible now.

The trampoline is already restored to RX by restore_trampoline_rx at
the end of every try_patch_mmap call. The mprotect in finalize_elf_patch
was a no-op for the committed case. Simplify the function to only unmap
unused trampolines and clear the cache entry.
@github-actions

github-actions Bot commented May 5, 2026

🤖 SemverChecks 🤖 ⚠️ Potential breaking API changes detected ⚠️

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_variant_added.ron

Failed in:
  variant ElfParseError:UnpatchedBinary in /home/runner/work/litebox/litebox/litebox_common_linux/src/loader.rs:130

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/method_parameter_count_changed.ron

Failed in:
  litebox_common_linux::loader::ElfParsedFile::load takes 2 parameters in /home/runner/work/litebox/litebox/target/semver-checks/git-main/8ee150eb7abdac7fc82fd8ef456a4147aa3ee06a/litebox_common_linux/src/loader.rs:365, but now takes 3 parameters in /home/runner/work/litebox/litebox/litebox_common_linux/src/loader.rs:375

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/inherent_method_missing.ron

Failed in:
  LinuxShimBuilder::set_load_filter, previously in file /home/runner/work/litebox/litebox/target/semver-checks/git-main/8ee150eb7abdac7fc82fd8ef456a4147aa3ee06a/litebox_shim_linux/src/lib.rs:181
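For context on the first lint: `enum_variant_added` fires because downstream code may match the enum exhaustively. Marking an enum `#[non_exhaustive]` forces external matches to carry a wildcard arm, so later variants stop being breaking. The sketch below uses a hypothetical two-variant subset of `ElfParseError` and the Linux errno values; note that adding the attribute to an existing public enum is itself a breaking change, so whether to do so is a maintainer decision:

```rust
/// Hypothetical subset of the loader's error enum, for illustration only.
#[non_exhaustive]
#[derive(Debug)]
pub enum ElfParseError {
    BadFormat,
    UnpatchedBinary,
}

/// Map a parse error to a Linux errno value.
/// The wildcard arm is what keeps callers compiling when variants are added.
pub fn to_errno(e: &ElfParseError) -> i32 {
    match e {
        ElfParseError::UnpatchedBinary => 8, // ENOEXEC
        _ => 22,                             // EINVAL
    }
}
```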

Contributor

@CvvT CvvT left a comment

LGTM, thanks!

Labels

expmt:shadow-kiln Tag to quickly find the different PRs as part of the "shadow kiln" experiment.

4 participants