Skip to content

Re-encode IP-relative instructions when relocating into trampoline#813

Merged
wdcui merged 6 commits into
mainfrom
wdcui/pr2-ip-relative
May 6, 2026
Merged

Re-encode IP-relative instructions when relocating into trampoline#813
wdcui merged 6 commits into
mainfrom
wdcui/pr2-ip-relative

Conversation

@wdcui
Copy link
Copy Markdown
Member

@wdcui wdcui commented Apr 25, 2026

This PR improves the syscall rewriter by handling IP-relative instructions in pre or post syscall case (detect RIP-relative operands and re-encode them at the correct trampoline virtual address).

@wdcui wdcui marked this pull request as ready for review April 25, 2026 17:46
@wdcui
Copy link
Copy Markdown
Member Author

wdcui commented Apr 25, 2026

This PR is ready for review, but it should NOT be merged untill PR #811 is merged so we can remove the temporary x86 test change in this PR.

Comment thread litebox_syscall_rewriter/tests/snapshot_tests.rs
@sangho2 sangho2 added must-not-merge:blocked-on-other-changes Other changes/PRs to be handled first. Label not needed for non-main changes. labels Apr 27, 2026
@jaybosamiya-ms jaybosamiya-ms added the expmt:shadow-kiln Tag to quickly find the different PRs as part of the "shadow kiln" experiment. label Apr 28, 2026
@wdcui wdcui force-pushed the wdcui/pr2-ip-relative branch from 0453491 to 415d3f9 Compare April 30, 2026 04:19
@wdcui
Copy link
Copy Markdown
Member Author

wdcui commented Apr 30, 2026

@sangho2 now the x86 removal PR is merged, I guess we can remove the "must-not-merge" tag?

@sangho2
Copy link
Copy Markdown
Contributor

sangho2 commented Apr 30, 2026

@sangho2 now the x86 removal PR is merged, I guess we can remove the "must-not-merge" tag?

Sure. Let me remove it.

@sangho2 sangho2 removed the must-not-merge:blocked-on-other-changes Other changes/PRs to be handled first. Label not needed for non-main changes. label Apr 30, 2026
@CvvT
Copy link
Copy Markdown
Contributor

CvvT commented Apr 30, 2026

A potential issue found by agent:

Medium: [lib.rs:353-357] and [lib.rs:837-841] now allow the x86_64 scan to cross all outgoing control transfers, including call/indirect call. Re-encoding fixes the call target, but it cannot preserve the architectural return address that call pushes. A relocated call will push an address inside the trampoline instead of the original fallthrough address, which can break call/pop PIC patterns, stack inspection, unwinding/profiling, or callees that otherwise observe their caller. I’d keep Call/IndirectCall as relocation boundaries unless the trampoline explicitly synthesizes the original return address.

Should we prefer rewriting instructions without jmp and when neither before- nor after- hooks works, we rewrite jmp?

@wdcui
Copy link
Copy Markdown
Member Author

wdcui commented Apr 30, 2026

A potential issue found by agent:

Medium: [lib.rs:353-357] and [lib.rs:837-841] now allow the x86_64 scan to cross all outgoing control transfers, including call/indirect call. Re-encoding fixes the call target, but it cannot preserve the architectural return address that call pushes. A relocated call will push an address inside the trampoline instead of the original fallthrough address, which can break call/pop PIC patterns, stack inspection, unwinding/profiling, or callees that otherwise observe their caller. I’d keep Call/IndirectCall as relocation boundaries unless the trampoline explicitly synthesizes the original return address.

Should we prefer rewriting instructions without jmp and when neither before- nor after- hooks works, we rewrite jmp?

Maybe just ignore call instructions? @CvvT

@CvvT
Copy link
Copy Markdown
Contributor

CvvT commented May 1, 2026

A potential issue found by agent:

Medium: [lib.rs:353-357] and [lib.rs:837-841] now allow the x86_64 scan to cross all outgoing control transfers, including call/indirect call. Re-encoding fixes the call target, but it cannot preserve the architectural return address that call pushes. A relocated call will push an address inside the trampoline instead of the original fallthrough address, which can break call/pop PIC patterns, stack inspection, unwinding/profiling, or callees that otherwise observe their caller. I’d keep Call/IndirectCall as relocation boundaries unless the trampoline explicitly synthesizes the original return address.

Should we prefer rewriting instructions without jmp and when neither before- nor after- hooks works, we rewrite jmp?

Maybe just ignore call instructions? @CvvT

Sounds reasonable to me.

@wdcui
Copy link
Copy Markdown
Member Author

wdcui commented May 1, 2026

@CvvT @sangho2, thoughts about this PR?

Copy link
Copy Markdown
Contributor

@CvvT CvvT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

One comment and one question: Should we prefer rewriting instructions without jmp (if it is more stable to rewrite instructions without jmp)?

Comment thread litebox_syscall_rewriter/src/lib.rs Outdated
@wdcui
Copy link
Copy Markdown
Member Author

wdcui commented May 4, 2026

Should we prefer rewriting instructions without jmp and when neither before- nor after- hooks works, we rewrite jmp?

As we discussed offline, we don't need this optimization in this PR right now.

@wdcui wdcui force-pushed the wdcui/pr2-ip-relative branch from f85f0e2 to 73e4e80 Compare May 5, 2026 02:14
@wdcui
Copy link
Copy Markdown
Member Author

wdcui commented May 5, 2026

@sangho2 would you like to take another look at this PR or it's okay for me to merge it? Thanks!

Copy link
Copy Markdown
Contributor

@sangho2 sangho2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable in general. Seems that there are still some corner cases, but we could fix them later.

wdcui added 6 commits May 5, 2026 17:06
When the syscall rewriter copies pre-syscall or post-syscall instructions
into the trampoline, any RIP-relative memory operands become incorrect
because the instruction is now at a different virtual address. Detect
this with is_ip_rel_memory_operand and re-encode affected instructions
via iced_x86::Encoder at the correct trampoline IP.

If re-encoding fails (e.g. the instruction changes size), the pre-syscall
path falls back to hook_syscall_and_after; the post-syscall path rolls
back the trampoline data to a checkpoint and returns
InsufficientBytesBeforeOrAfter so the syscall is trapped instead.
Extract encode_instructions_for_trampoline() and reencode_instructions_at()
so both pre-syscall and post-syscall paths use the same encode-first,
append-on-success pattern. This eliminates the trampoline_data checkpoint/
rollback mechanism and fixes an O(n) skip_while scan by using direct
index-based slicing from the backward/forward scan loops.
Instead of only re-encoding instructions with RIP-relative memory
operands and raw-copying the rest, always run all relocated instructions
through the encoder. This correctly handles IP-relative branch targets
(call/jmp/jcc) in addition to RIP-relative memory, and allows the
backward/forward scans to cross outgoing control transfers on x86_64
since the encoder fixes up relative displacements automatically.

The x86_32 scan paths retain the control-transfer break since the
encoder is 64-bit only; x86_32 support is removed in a separate PR.
Call/IndirectCall instructions must not be relocated into the trampoline
because the return address pushed on the stack would point into the
trampoline instead of the original code, breaking unwinding, profiling,
and call/pop PIC patterns. Remove the now-unused arch parameter from
hook_syscall_and_after.
Both loops previously included the syscall instruction itself in their
iteration range and used code-equality guards (inst_id != i / code() !=
syscall_inst.code()) to skip the flow-control check on it. This was
necessary because iced-x86 classifies syscall as FlowControl::Call.

Refactor:
- Backward loop: start at (0..i) instead of (0..=i), and hoist the
  control-transfer-target check for the syscall address outside the loop.
- Forward loop (hook_syscall_and_after): skip(inst_index + 1) instead of
  skip(inst_index), removing both code-equality guards.

Behavior is preserved: the syscall is still included in the replaced
byte range, and real call instructions still terminate the scan.
@wdcui wdcui force-pushed the wdcui/pr2-ip-relative branch from 673535f to b6e7001 Compare May 6, 2026 00:06
@wdcui wdcui enabled auto-merge May 6, 2026 00:07
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🤖 SemverChecks 🤖 No breaking API changes detected

Note: this does not mean API is unchanged, or even that there are no breaking changes; simply, none of the detections triggered.

@wdcui wdcui added this pull request to the merge queue May 6, 2026
Merged via the queue into main with commit 7b5140d May 6, 2026
13 checks passed
@wdcui wdcui deleted the wdcui/pr2-ip-relative branch May 6, 2026 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

expmt:shadow-kiln Tag to quickly find the different PRs as part of the "shadow kiln" experiment.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants