Conversation
This PR is ready for review. Thanks!
Agents have found a bunch of vulnerable code paths that do not rewrite executable pages, and more. We should fix all of them, and we should perhaps merge #805 before this PR. I've added them below to ease agent-driven code patching. I'll "manually" add a review later.
@sangho2 and I discussed it and have an alternative idea: how about we hook

One potential issue with this design is that if we are going to use a broker for fs, the changes to the ELF files are reflected to the host as well (unless we have some sort of overlay fs).
I thought about this before. What if I just open an ELF file for binary analysis?
mmap does have more corner cases than open. To tackle all of them (or to avoid regression due to a lack of rtld), we might need to add a bunch of new code.
I worked with an agent and fixed some of the issues. Here is a summary.

Context: Syscall rewriting is for reliability, NOT security. Security is enforced by kernel-level monitors (seccomp). Adversarial binary scenarios are out of scope.

================================================================================
#1 — Runtime patching returns true on failures, leaving unpatched exec mappings (Medium)

Status: No action. do_mmap_file() only fails the mapping if maybe_patch_exec_segment() returns false. The runtime patching path logs and returns true for: trampoline allocation failure, too-far trampoline, entry write failure, mprotect failures, read/writeback failures, and rewriter errors. This leaves the executable mapping live with original syscall bytes.

Rationale: Returning false would fail the mmap (OutOfMemory), breaking legitimate programs that happen to have unmappable trampoline regions. Since rewriting is for reliability, not security, letting the program continue with unintercepted syscalls is the less harmful failure mode. The two options are:
================================================================================
#2 — Trap-only rewrites dropped when stubs.is_empty() (High)

Status: FIXED (commit 7143622). patch_code_segment() can mutate code_buf by replacing unpatchable syscalls with ICEBP;HLT traps and return empty stubs. The Ok(stubs) if !stubs.is_empty() arm wrote code_buf back, but the Ok(_) arm did nothing — trap bytes were silently discarded.

Fix: The Ok(_) arm now writes back code_buf if it differs from the original code, then falls through to restore RX protections.

================================================================================
#3 — mprotect bypass: non-exec → exec without rewriting (High)

Status: No action in this PR. Worth filing as a follow-up issue. mprotect dispatch (lib.rs:641-643) goes straight to sys_mprotect with no special handling. Since mappings get VM_MAYEXEC by default, a non-exec file mapping can later become exec without rewrite.

Rationale: Fixing this requires intercepting mprotect, tracking file-backed VMAs, and running the rewriter at mprotect time — a significant architectural change beyond the scope of this PR. Only exploitable by adversarial code (deliberately mapping non-exec then flipping to exec), which is out of scope for a reliability tool.

================================================================================
#4 — Runtime rewriting patches entire mapped exec range, not just .text (Low)

Status: No action. AOT rewriting filters actual executable text sections. Runtime rewriting passes the whole mapped executable range to patch_code_segment() and writes the whole buffer back, potentially patching bytes in executable PT_LOAD padding/data that AOT would not touch.

Rationale: The rewriter scans for actual 0F 05 (syscall) opcodes, so it only patches real syscall instructions.
False positives (0F 05 appearing as data constants within executable PT_LOAD segments) are vanishingly rare in practice, and the consequence (a corrupted data constant) is benign compared to leaving a real syscall unpatched.

================================================================================
#5 — Unpatched binary brk/VMA reservation gap (Medium)

Status: No action in this PR. Worth filing as a follow-up issue. For unpatched binaries, load() bumps info.brk but doesn't map or reserve a VMA for the trampoline gap. The unreserved gap below the artificially bumped initial break can be occupied by later mappings, causing runtime trampoline allocation to collide or fail. brk shrink could also treat that gap as heap-owned and unmap mappings inside it.

Rationale: Runtime trampoline allocation uses MAP_FIXED_NOREPLACE with fallback, so collisions result in degraded patching (comment #1 applies — patching silently doesn't happen but the program continues). The brk-shrink-unmapping scenario is theoretical — real programs don't shrink brk into this region. Low risk but worth tracking.

================================================================================
#6 — Fake LITEBOX0 trailer bypasses runtime rewriting (Critical per reviewer)

Status: No action. A malicious unpatched ELF can append a fake LITEBOX0 trailer with trampoline_size=0. parse_trampoline() treats this as "rewriter checked this binary and found no syscall instructions." check_trampoline_magic() returns pre_patched=true even for size=0. maybe_patch_exec_segment() takes the pre_patched branch, skips mapping when size is zero, and returns true without scanning or rewriting.

Rationale: This is the clearest out-of-scope item. Rewriting is for reliability, not security. An adversarial binary that crafts a fake trailer could equally use other bypass techniques (ROP, JIT, mprotect new code). Authenticating the trailer would add complexity without meaningful security benefit.
Security enforcement belongs at the kernel level (seccomp).

================================================================================
#7 — OP-TEE accepts UnpatchedBinary without runtime patching (Low)

Status: FIXED (commit 44107cb). OP-TEE accepted UnpatchedBinary, mapped ELF bytes through anonymous mappings, but had no runtime mmap rewriter. The comment that mmap-time patching would handle it was wrong for OP-TEE.

Fix: OP-TEE's FileAndParsed::new now calls parsed.parse_trampoline(...) with ? propagation, so UnpatchedBinary errors bubble up as ENOEXEC. The load_mapped helper was removed; OP-TEE calls parsed.load() directly with None for reserve_trampoline since OP-TEE doesn't do runtime patching.

================================================================================
#8 — Remapping same file offset skips rewriting (High)

Status: FIXED (commit 7143622). patched_offsets was keyed only by file offset. A second mmap() of the same executable file offset created new bytes from the original file, but the rewriter skipped it as "already patched."

Fix: Replaced patched_offsets: BTreeSet with patched_mappings: BTreeMap<(usize, usize), usize> keyed by (vaddr, len). Added clear_patched_offsets_for_range() in sys_munmap to remove overlapping entries on unmap, so a subsequent mmap of the same file region at the same or a different vaddr will be re-patched.

================================================================================
#9 — ET_DYN base address inference unreliable (Medium)

Status: No action. The runtime state uses the first observed mapping address as the ELF base. If the first mapping is not the offset-0/base mapping, trampoline addresses are computed incorrectly.

Rationale: Every real-world dynamic linker maps offset 0 first. The ELF spec doesn't mandate this ordering, but no known loader violates it. The risk is purely theoretical.
================================================================================
#10 — Trampoline metadata less validated in mmap path (Medium)

Status: No action. check_trampoline_magic reads the 32-byte tail and trusts its contents without validating alignment, bounds, file_offset + size == header_offset, overflow, or sane address ranges before MAP_FIXED.

Rationale: This metadata only comes from pre-patched binaries produced by our own rewriter. Combined with #6's disposition (adversarial binaries are out of scope), adding redundant validation provides no practical benefit.

================================================================================
Summary:
I also don't think any of these are security issues (seccomp will be there), but some of them are correctness issues. In that sense, I'm a bit surprised that the agent thinks the rewriter is a reliability tool. Two concerns: 1) the AOT and JIT rewriters behave differently; 2) some "allowed" syscalls from "benign" guests can bypass LiteBox, resulting in unknown/inconsistent states.
Replace the rtld_audit LD_AUDIT-based syscall interception with runtime ELF patching during mmap. When a PROT_EXEC segment is mapped, the shim now patches syscall instructions in-place and places trampoline stubs in a dynamically-allocated region near the code.

Key changes:
- Add patch_code_segment() public API to syscall rewriter for runtime use
- Add ElfPatchState/ElfPatchCache for per-fd tracking of patch state
- Add maybe_patch_exec_segment() called from do_mmap_file for PROT_EXEC
- Add init_elf_patch_state() to parse ELF headers and detect pre-patched binaries via trampoline magic at file tail
- Add finalize_elf_patch() on fd close to clean up trampoline mappings
- Add reserve_trampoline parameter to ElfParsedFile::load() to bump brk past the runtime trampoline region for unpatched binaries
- Add UnpatchedBinary error variant for loader trampoline parsing
- Remove litebox_rtld_audit/ (C LD_AUDIT library)
- Remove rtld_audit.so packaging from litebox_packager and runner crates
- Remove LD_AUDIT environment variable injection from runners
- Remove build.rs files that compiled rtld_audit.so
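The "trampoline magic at file tail" detection can be sketched as follows. The 8-byte LITEBOX0 magic and 32-byte trailer size come from comments #6 and #10 in this thread; where the magic sits inside the trailer is an assumption made here for illustration, and the function name is invented.

```rust
// Sketch only: detect a pre-patched binary by its trailer magic.
// ASSUMPTION: the magic occupies the last 8 bytes of the 32-byte trailer;
// the real layout is defined by the rewriter, not by this sketch.
const TRAILER_MAGIC: &[u8; 8] = b"LITEBOX0";
const TRAILER_LEN: usize = 32;

fn has_trampoline_trailer(file: &[u8]) -> bool {
    file.len() >= TRAILER_LEN && file.ends_with(TRAILER_MAGIC)
}

fn main() {
    let mut patched = vec![0u8; 64]; // stand-in for ELF bytes
    patched.extend_from_slice(&[0u8; 24]); // trailer fields (sizes/offsets)
    patched.extend_from_slice(TRAILER_MAGIC);
    assert!(has_trampoline_trailer(&patched));
    assert!(!has_trampoline_trailer(&[0u8; 64]));
    println!("ok");
}
```

A fake trailer passes this check too, which is exactly the #6 scenario; the PR deliberately leaves that out of scope.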
Mirror the linux shim's parse_trampoline/load_mapped pattern: tolerate UnpatchedBinary errors and reserve trampoline space for runtime patching.
- Write back code_buf when patch_code_segment produces trap-only replacements (ICEBP;HLT) with no trampoline stubs. Previously the modified buffer was silently discarded.
- Replace patched_offsets (BTreeSet<file_offset>) with patched_mappings (BTreeMap<(vaddr, len), offset>) and clear overlapping entries on munmap so that re-mapped regions are re-patched with fresh file bytes.
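The (vaddr, len)-keyed bookkeeping above can be sketched like this; clear_range is an illustrative stand-in for clear_patched_offsets_for_range, using the usual half-open overlap test.

```rust
use std::collections::BTreeMap;

// (vaddr, len) -> file offset, as described in the commit message.
type PatchedMappings = BTreeMap<(usize, usize), usize>;

fn overlaps(a_start: usize, a_len: usize, b_start: usize, b_len: usize) -> bool {
    a_start < b_start + b_len && b_start < a_start + a_len
}

// On munmap, drop every patched entry that overlaps the unmapped range,
// so a later mmap of the same file region gets re-patched.
fn clear_range(map: &mut PatchedMappings, start: usize, len: usize) {
    map.retain(|&(vaddr, vlen), _| !overlaps(vaddr, vlen, start, len));
}

fn main() {
    let mut m: PatchedMappings = BTreeMap::new();
    m.insert((0x1000, 0x2000), 0); // patched at 0x1000..0x3000
    m.insert((0x5000, 0x1000), 0x4000); // patched at 0x5000..0x6000

    // munmap of 0x2000..0x4000 overlaps only the first entry
    clear_range(&mut m, 0x2000, 0x2000);
    assert!(!m.contains_key(&(0x1000, 0x2000)));
    assert!(m.contains_key(&(0x5000, 0x1000)));
    println!("ok");
}
```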
OP-TEE does not support runtime ELF patching, so there is no need for the load_mapped helper or the UnpatchedBinary error handling. Revert to calling parsed.load() directly with None for reserve_trampoline.
…ecutable

Previously, runtime syscall rewriting only triggered on mmap with PROT_EXEC. A non-exec file mapping could later gain PROT_EXEC via mprotect without being scanned by the rewriter. Now sys_mprotect intercepts transitions to PROT_EXEC, finds overlapping tracked file mappings, clamps to the mprotect range, and runs the rewriter on the intersection.

Re-running the rewriter on already-patched code is safe (the instruction decoder uses proper length decoding), so the patched_ranges guard is a performance optimization only. Also separates file_mappings (BTreeSet tracking non-exec mmaps) from patched_ranges (BTreeSet tracking what has been rewritten) for clarity.
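The clamping step can be sketched with half-open (start, len) ranges; intersect is an illustrative helper, not the shim's API, returning the overlapping (start, len) if any.

```rust
// Clamp a tracked file mapping to the mprotect()ed range: the rewriter
// then runs only on the intersection of the two ranges.
fn intersect(a: (usize, usize), b: (usize, usize)) -> Option<(usize, usize)> {
    let start = a.0.max(b.0);
    let end = (a.0 + a.1).min(b.0 + b.1);
    (start < end).then(|| (start, end - start))
}

fn main() {
    // mapping covers 0x1000..0x5000; mprotect(PROT_EXEC) covers 0x3000..0x8000
    assert_eq!(
        intersect((0x1000, 0x4000), (0x3000, 0x5000)),
        Some((0x3000, 0x2000))
    );
    // disjoint ranges produce no work for the rewriter
    assert_eq!(intersect((0x1000, 0x1000), (0x3000, 0x1000)), None);
    println!("ok");
}
```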
6e43fb6 to b607ad9
@CvvT the latest commit added support for the mprotect path.
It appears that we need trap fallbacks, perhaps for both the code segment and the mapped segment. Also, the runtime ELF patcher currently does not distinguish critical errors (e.g., mprotect/mmap failures) from expected errors (e.g., decoding failures). Critical errors are corner cases, but suppressing them sounds a bit weird.
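One way to make that critical-vs-expected split explicit is to partition the error type along those lines. This is a sketch with invented variant names, not the patcher's actual error enum.

```rust
// Sketch: classify patcher errors so callers can panic on infrastructure
// failures but degrade gracefully on expected rewriting failures.
#[derive(Debug)]
enum PatchError {
    // Critical: the address space may be left inconsistent.
    MprotectFailed,
    MmapFailed,
    // Expected: the input simply could not be rewritten.
    DecodeFailed,
    TrampolineTooFar,
}

impl PatchError {
    fn is_critical(&self) -> bool {
        matches!(self, PatchError::MprotectFailed | PatchError::MmapFailed)
    }
}

fn main() {
    assert!(PatchError::MprotectFailed.is_critical());
    assert!(!PatchError::DecodeFailed.is_critical());
    println!("ok");
}
```

The follow-up commit below takes exactly this policy: infrastructure failures panic, rewriting failures fall back to traps.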
- Add trap_all_syscalls_in_code() to rewriter: uses proper disassembly to find real syscall instructions and replace them with ICEBP;HLT traps
- Add apply_trap_fallback() helper for runtime patching failures
- Infrastructure failures (mprotect/read/write) now panic instead of silently continuing with unpatched syscalls
- Rewriting failures (trampoline alloc, distance, cursor overflow, expand) now apply trap fallback so syscalls trap instead of escaping
- Fix deadlock: sys_munmap -> sys_munmap_raw inside maybe_patch_exec_segment (caller holds the elf_patch_cache lock, sys_munmap would re-lock via clear_file_mappings_for_range)
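The byte substitution behind the trap fallback looks like this. The real trap_all_syscalls_in_code() uses proper disassembly to find instruction boundaries; this naive scan is only to illustrate replacing each 2-byte syscall (0F 05) with ICEBP;HLT (F1 F4) so the instruction traps instead of escaping.

```rust
// Sketch only: naive scan-and-replace of syscall opcodes with trap bytes.
// A real implementation must walk instructions with a length decoder so
// 0F 05 bytes inside longer instructions are not misidentified.
fn trap_syscalls_naive(code: &mut [u8]) -> usize {
    let mut count = 0;
    let mut i = 0;
    while i + 1 < code.len() {
        if code[i] == 0x0F && code[i + 1] == 0x05 {
            code[i] = 0xF1; // ICEBP
            code[i + 1] = 0xF4; // HLT
            count += 1;
            i += 2;
        } else {
            i += 1;
        }
    }
    count
}

fn main() {
    // mov eax, 60 ; syscall
    let mut code = vec![0xB8, 0x3C, 0x00, 0x00, 0x00, 0x0F, 0x05];
    assert_eq!(trap_syscalls_naive(&mut code), 1);
    assert_eq!(&code[5..], &[0xF1, 0xF4]);
    println!("ok");
}
```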
…ation

- Remove LoadFilter type, set_load_filter(), and load_filter field from LinuxShimBuilder/GlobalState — no callers remain after rtld removal
- Fix init_elf_patch_state base address: derive the load base from file_offset by matching against PT_LOAD p_offset (was incorrectly using mapped_addr directly, which is wrong when the first mmap is not the lowest segment)
- Use align_down (page-floor) for offset matching to match kernel behavior
- Pre-patched ET_DYN panics if the base cannot be determined (JMPs are hardcoded); unpatched falls back to mapped_addr as a hint (resilient)
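The align_down (page-floor) used for offset matching is the standard power-of-two trick; a minimal sketch:

```rust
// Page-floor alignment: clear the low bits. Matching a mapping's
// file_offset against PT_LOAD p_offset at page granularity mirrors how
// the kernel maps segments at page boundaries.
const PAGE_SIZE: u64 = 4096;

fn align_down(x: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    x & !(align - 1)
}

fn main() {
    assert_eq!(align_down(0x1234, PAGE_SIZE), 0x1000);
    assert_eq!(align_down(0x1000, PAGE_SIZE), 0x1000);
    // a sub-page p_offset and an offset-0 mmap match at page granularity
    let (p_offset, file_offset) = (0x10u64, 0x0u64);
    assert_eq!(
        align_down(p_offset, PAGE_SIZE),
        align_down(file_offset, PAGE_SIZE)
    );
    println!("ok");
}
```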
…comment

- Replace #[allow(clippy::cast_possible_truncation)] with explicit .truncate() calls via the TruncateExt trait (u64 -> usize)
- Remove pub from ElfPatchState fields (struct is pub(crate), fields only accessed within the defining module)
- Add a comment explaining that overlapping file_mappings entries are safe
- Add a compile_error! guard for non-64-bit targets
Replace manual byte-offset indexing (ehdr_buf[32..40], ph[8..16], etc.)
with typed struct access via object::elf::{FileHeader64, ProgramHeader64}.
This addresses CvvT's review suggestion to use the object crate.
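To show why typed access beats manual byte-offset indexing, here is a stdlib-only stand-in for the object crate's FileHeader64 (only two fields shown; real ELF headers have more, and object's API differs). Reading e_phoff by name cannot silently drift the way a hardcoded buf[32..40] slice can.

```rust
// Sketch: a typed view over an ELF64 header vs. raw byte slicing.
// NOT the object crate's API; just the idea the review suggestion is after.
use std::convert::TryInto;

fn read_u64_le(buf: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(buf[off..off + 8].try_into().unwrap())
}

struct FileHeader64 {
    e_phoff: u64, // program header table offset (bytes 32..40 in ELF64)
    e_shoff: u64, // section header table offset (bytes 40..48)
}

impl FileHeader64 {
    fn parse(buf: &[u8]) -> Self {
        // The offsets live in exactly one place instead of at every use site.
        FileHeader64 {
            e_phoff: read_u64_le(buf, 32),
            e_shoff: read_u64_le(buf, 40),
        }
    }
}

fn main() {
    let mut ehdr = [0u8; 64];
    ehdr[32..40].copy_from_slice(&0x40u64.to_le_bytes());
    ehdr[40..48].copy_from_slice(&0x1000u64.to_le_bytes());
    let h = FileHeader64::parse(&ehdr);
    assert_eq!(h.e_phoff, 0x40);
    assert_eq!(h.e_shoff, 0x1000);
    println!("ok");
}
```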
| // Allocate RW region at the trampoline address. Use MAP_FIXED
| // because the code already contains JMPs to this exact address
| // and we MUST map here. The region may already be reserved as |
dlopen() might be another issue.
@sangho2 can you explain why dlopen() might be another issue? Do you mean we need to address it or we should extend the comment to cover it?
| /// UNIX domain socket address table
| unix_addr_table: litebox::sync::RwLock<Platform, syscalls::unix::UnixAddrTable<FS>>,
| /// Per-process collection of ELF patching state for runtime syscall rewriting.
| elf_patch_cache: litebox::sync::Mutex<Platform, syscalls::mm::ElfPatchCache>, |
We should reconsider this fd-as-a-key approach when we support fork.
| let Some(code_owned) = mapped_addr.to_owned_slice(len) else {
|     panic!("fatal: failed to read code segment for trap fallback");
| };
| let mut code_buf = code_owned.into_vec();
| let code_vaddr = mapped_addr.as_usize() as u64;
| let count = litebox_syscall_rewriter::trap_all_syscalls_in_code(&mut code_buf, code_vaddr)
nit: this redundantly allocates and copies code buffers. In-place patching might be possible now.
MutPtr only exposes to_owned_slice and copy_from_slice right now. To fix this, we will need to extend MutPtr. Okay to leave it as is? @sangho2
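A hypothetical shape for that MutPtr extension: a scoped mutable-slice accessor would let the patcher rewrite bytes in place instead of round-tripping through to_owned_slice() + copy_from_slice(). DemoPtr and with_mut_slice are invented names; litebox's MutPtr would need its own safety story around guest memory.

```rust
// Sketch only: a scoped in-place accessor, as a possible MutPtr extension.
struct DemoPtr {
    buf: Vec<u8>, // stand-in for the guest memory behind the pointer
}

impl DemoPtr {
    /// Run `f` on the underlying bytes without an intermediate copy.
    fn with_mut_slice<R>(&mut self, len: usize, f: impl FnOnce(&mut [u8]) -> R) -> Option<R> {
        self.buf.get_mut(..len).map(f)
    }
}

fn main() {
    let mut p = DemoPtr {
        buf: vec![0x0F, 0x05, 0x90],
    };
    // Patch in place: replace the syscall bytes with ICEBP;HLT.
    let patched = p.with_mut_slice(2, |code| {
        code.copy_from_slice(&[0xF1, 0xF4]);
        code.len()
    });
    assert_eq!(patched, Some(2));
    assert_eq!(p.buf, vec![0xF1, 0xF4, 0x90]);
    println!("ok");
}
```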
| let Some(code_owned) = mapped_addr.to_owned_slice(len) else {
|     let _ = self.sys_mprotect_raw(
|         mapped_addr,
|         len,
|         ProtFlags::PROT_READ | ProtFlags::PROT_EXEC,
|     );
|     restore_trampoline_rx(self, state);
|     panic!("fatal: failed to read code segment for patching");
| };
| let mut code_buf = code_owned.into_vec();
| let original_code = code_buf.clone();
nit: this redundantly allocates and copies code buffers. In-place patching may be possible now.
The trampoline is already restored to RX by restore_trampoline_rx at the end of every try_patch_mmap call. The mprotect in finalize_elf_patch was a no-op for the committed case. Simplify the function to only unmap unused trampolines and clear the cache entry.
|
🤖 SemverChecks 🤖
This PR replaces the rtld_audit-based syscall interception with runtime ELF patching during mmap. When a PROT_EXEC segment is mapped, the shim now patches syscall instructions in-place and places trampoline stubs in a dynamically-allocated region near the code.