Fix ARM64 interface dispatch cache torn read #126346

Merged
MichalStrehovsky merged 3 commits into dotnet:main from MichalStrehovsky:fix/arm64-interface-dispatch-torn-read
Apr 8, 2026

Conversation

@MichalStrehovsky
Member

On ARM64, the CHECK_CACHE_ENTRY macro read m_pInstanceType and m_pTargetCode from a cache entry using two separate ldr instructions separated by a control dependency (cmp/bne). ARM64's weak memory model does not order loads across control dependencies, so the hardware can speculatively satisfy the second load (target) before the first (type) commits. When a concurrent thread atomically populates the entry via stlxp/casp (UpdateCacheEntryAtomically), the reader can observe the new m_pInstanceType but the old m_pTargetCode (0), then br to address 0.

Fix by using ldp to load both fields in a single instruction (single-copy atomic on FEAT_LSE2 / ARMv8.4+ hardware), plus a cbz guard to catch torn reads on pre-LSE2 hardware where ldp pair atomicity is not architecturally guaranteed.

Fixes #126345
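
As a rough illustration of the description above, here is a minimal C++ sketch of the two probe shapes. It is not the runtime's assembly: `CacheEntry`, `probe_unguarded`, and `probe_guarded` are hypothetical names, relaxed atomic loads stand in for plain `ldr`/`ldp`, and the zero check mirrors the `cbz` guard. It assumes an entry only ever goes from empty (0, 0) to populated.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of one dispatch cache entry. Field names follow the
// PR description; the real layout lives in the runtime sources.
struct CacheEntry {
    std::atomic<std::uintptr_t> m_pInstanceType{0};
    std::atomic<std::uintptr_t> m_pTargetCode{0};
};

// Old shape: compare the type, then load the target separately.
// Nothing orders the second load after the first, so a reader may pair
// the new type with a stale target of 0 and report it as a hit.
inline bool probe_unguarded(const CacheEntry& e, std::uintptr_t type,
                            std::uintptr_t& target) {
    assert(type != 0);  // a zero type never describes a real object
    if (e.m_pInstanceType.load(std::memory_order_relaxed) != type)
        return false;   // cache miss
    target = e.m_pTargetCode.load(std::memory_order_relaxed);  // may be 0
    return true;
}

// New shape: read both fields up front, then treat a zero target as a miss.
// C++ has no portable pair load, so two relaxed loads stand in for ldp here;
// the zero check is what keeps this version from returning a null target.
inline bool probe_guarded(const CacheEntry& e, std::uintptr_t type,
                          std::uintptr_t& target) {
    assert(type != 0);
    const std::uintptr_t seenType =
        e.m_pInstanceType.load(std::memory_order_relaxed);
    const std::uintptr_t seenTarget =
        e.m_pTargetCode.load(std::memory_order_relaxed);
    if (seenType != type)
        return false;   // cache miss
    if (seenTarget == 0)
        return false;   // torn read: fall back to the slow path
    target = seenTarget;
    return true;
}
```

A caller that gets `false` from `probe_guarded` would continue to the next entry or the slow path, so a torn observation costs one extra miss rather than a branch to address 0.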

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke, @dotnet/ilc-contrib
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

Fixes a potential torn-read race in the ARM64 cached interface dispatch fast-path that could lead to branching to address 0 when reading a concurrently populated cache entry.

Changes:

  • Replace two independent loads of cache-entry fields with a single ldp pair load to avoid reordering across control dependencies on ARM64.
  • Add a cbz guard on the loaded target to treat observed torn reads (type updated, target still 0) as a cache miss on pre-LSE2 hardware.
  • Mirror the changes in both the GAS (.S) and ARMASM (.asm) implementations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/coreclr/runtime/arm64/StubDispatch.S | Updates ARM64 stub macro to use ldp + cbz to avoid torn cache-entry reads. |
| src/coreclr/runtime/arm64/StubDispatch.asm | Same logic as above for the ARMASM variant to keep implementations consistent. |

Comment thread src/coreclr/runtime/arm64/StubDispatch.S Outdated
Comment thread src/coreclr/runtime/arm64/StubDispatch.asm Outdated
Instead of emitting an add instruction per cache entry when the ldp
offset exceeds [-512,504], rebase x9 once when the threshold is
crossed. This keeps the per-entry probe to a single ldp for all
entries in the 32/64 slot stubs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
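
The rebasing idea in this commit message can be sketched in C++ as a small planner. This is only an illustration under stated assumptions: `plan_probes`, `ProbeStep`, the 16-byte entry size, and the starting offset are hypothetical, and the real macros may lay things out differently. The one hard fact used is the `ldp` immediate range for 64-bit register pairs, [-512, 504] in multiples of 8.

```cpp
#include <cassert>
#include <vector>

constexpr int kLdpMaxImm = 504;  // largest ldp immediate for 64-bit pairs
constexpr int kEntrySize = 16;   // assumption: two 8-byte pointers per entry

struct ProbeStep {
    bool rebase;  // true if one add to the base register precedes this probe
    int  imm;     // ldp immediate used for this probe
};

// Plans the probes for `entryCount` entries, the first of which sits
// `firstOffset` bytes past the original base register value.
std::vector<ProbeStep> plan_probes(int entryCount, int firstOffset) {
    assert(firstOffset >= 0 && firstOffset % 8 == 0);  // ldp immediates are multiples of 8
    std::vector<ProbeStep> steps;
    int baseAdjust = 0;  // bytes already added to the base register
    for (int i = 0; i < entryCount; ++i) {
        const int offset = firstOffset + i * kEntrySize;  // from the original base
        bool rebase = false;
        if (offset - baseAdjust > kLdpMaxImm) {
            baseAdjust = offset;  // a single add moves the base to this entry
            rebase = true;
        }
        steps.push_back({rebase, offset - baseAdjust});
    }
    return steps;
}
```

With these assumptions and a starting offset of 0, 32 entries need no rebase and 64 entries need exactly one, so every probe stays a single `ldp` and the `add` is paid once rather than per entry.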
Comment thread src/coreclr/runtime/arm64/StubDispatch.asm
@MichalStrehovsky
Member Author

/azp run runtime-nativeaot-outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread src/coreclr/runtime/arm64/StubDispatch.S
Comment thread src/coreclr/runtime/arm64/StubDispatch.asm
Apple ARM64 platforms all have FEAT_LSE2, which makes ldp single-copy
atomic for 16-byte aligned pairs. The cbz torn-read guard is
unnecessary there.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@MichalStrehovsky
Member Author

I also looked at whether the new scheme over at #123252 would have this issue in the monomorphic path (which looks similar to this one), and it seems like it doesn't because of its ldar use. I guess ldar could be another option here.
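
For reference, the ldar option can be sketched in C++ as an acquire load on the type field, which compilers typically lower to `ldar` (or `ldapr`) on ARM64. The names `DispatchEntry`, `publish`, and `probe_acquire` are hypothetical, and the writer here uses a release store on the type rather than the runtime's atomic pair write, so this shows the general acquire/release pattern, not the code in either PR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical entry layout, mirroring the field names used in this PR.
struct DispatchEntry {
    std::atomic<std::uintptr_t> m_pInstanceType{0};
    std::atomic<std::uintptr_t> m_pTargetCode{0};
};

// Writer (assumption: simplified): store the target first, then publish
// the type with release semantics so the target is visible to any reader
// that observes the type.
inline void publish(DispatchEntry& e, std::uintptr_t type, std::uintptr_t target) {
    assert(type != 0 && target != 0);
    e.m_pTargetCode.store(target, std::memory_order_relaxed);
    e.m_pInstanceType.store(type, std::memory_order_release);
}

// Reader: the acquire load keeps the later target load from being
// satisfied before the type load, so a matching type implies the
// published target is seen.
inline bool probe_acquire(const DispatchEntry& e, std::uintptr_t type,
                          std::uintptr_t& target) {
    assert(type != 0);
    if (e.m_pInstanceType.load(std::memory_order_acquire) != type)
        return false;  // cache miss
    target = e.m_pTargetCode.load(std::memory_order_relaxed);
    return true;
}
```

The trade-off relative to the `ldp` + `cbz` approach is an ordering constraint on every probe instead of an extra zero check.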

@MichalStrehovsky
Member Author

/azp run runtime-nativeaot-outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Member Author

/ba-g the GC test issue is unrelated and I fixed it. the leg timeouts are not interesting because they're not ARM64

@MichalStrehovsky MichalStrehovsky merged commit 0bf5e88 into dotnet:main Apr 8, 2026
121 of 135 checks passed
@MichalStrehovsky MichalStrehovsky deleted the fix/arm64-interface-dispatch-torn-read branch April 8, 2026 03:04
@MichalStrehovsky
Member Author

/backport to release/10.0

@github-actions
Contributor

github-actions bot commented Apr 9, 2026

Started backporting to release/10.0 (link to workflow run)

radekdoulik pushed a commit to radekdoulik/runtime that referenced this pull request Apr 9, 2026
JulieLeeMSFT pushed a commit that referenced this pull request Apr 17, 2026
Backport of #126346 to release/10.0

/cc @MichalStrehovsky

## Customer Impact

- [x] Customer reported
- [ ] Found internally

Reported by a first party. Torn read in ARM64 interface dispatch can
cause a dispatch to null.

An interface dispatch cell is a pair of a MethodTable and a code address. If
the MethodTable matches the type of `this`, we call the code address. The
pair is written atomically; however, the read is not atomic, and on ARM64 a
torn read can happen, matching the MethodTable but still seeing a 0 code
address.

## Regression

- [ ] Yes
- [x] No

Not a regression; this bug has existed ever since ARM64 was added to
.NET Native for UWP apps in 2017 or so.

## Testing

This is a race condition that requires a lot of luck to hit. Testing is
"code review" basically.

## Risk

The risk is low: instead of loading two pointers individually, we load
them together and add an extra check for the race condition case.

**IMPORTANT**: If this backport is for a servicing release, please
verify that:

- For .NET 8 and .NET 9: The PR target branch is `release/X.0-staging`,
not `release/X.0`.
- For .NET 10+: The PR target branch is `release/X.0` (no `-staging`
suffix).

## Package authoring no longer needed in .NET 9

**IMPORTANT**: Starting with .NET 9, you no longer need to edit a NuGet
package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older
versions.

---------

Co-authored-by: Michal Strehovský <MichalStrehovsky@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Crash in interface dispatch on ARM64

4 participants