
Add AvxVnni.V512 hardware intrinsics#128365

Open
jamesburton wants to merge 31 commits into
dotnet:mainfrom
jamesburton:feature/avxvnni.v512

Conversation

@jamesburton

@jamesburton jamesburton commented May 19, 2026


Summary

  • Add the managed AvxVnni.V512 surface for AVX-512 VNNI VPDPBUSD/VPDPWSSD intrinsics.
  • Wire CPUID AVX512-VNNI detection into the runtime/JIT AVXVNNI_V512 instruction-set flag.
  • Extend VPDP codegen and LSRA handling to cover the new V512 intrinsic IDs.

Closes #86849

Validation

  • ./build.cmd clr+libs -c Release -arch x64
  • ./build.cmd clr -c Release -arch x64
  • Strix Halo hardware probe:
    • AvxVnni.V512.IsSupported == True
    • VPDPBUSD-zmm : got=160 want=160 OK
    • VPDPWSSD-zmm : got=8 want=8 OK
    • VPDPBUSDS-zmm: got=2147483647 ... OK
  • JIT disassembly includes EVEX-512 forms:
    • vpdpbusd zmm0, zmm1, ...
    • vpdpwssd zmm0, zmm1, ...
    • vpdpbusds zmm0, zmm1, ...

Compatibility note

Existing R2R images that record Avx512Vnni now map to the more precise AVXVNNI_V512 JIT ISA instead of the broader AVX512v3 bucket. This is intentional: AVX512-VNNI is a distinct CPUID feature and the new managed API requires that specific capability, while retaining the existing R2R numeric value (Avx512Vnni = 79).

Copilot AI review requested due to automatic review settings May 19, 2026 12:39
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 19, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label May 19, 2026

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds AVX-VNNI 512-bit (AVX512-VNNI) hardware intrinsics support exposed as the new AvxVnni.V512 nested class, wiring it through the JIT, runtime, CPU feature detection, R2R metadata, and adding a sample test project.

Changes:

  • New AvxVnni.V512 nested class with MultiplyWideningAndAdd/MultiplyWideningAndAddSaturate overloads (byte/sbyte and short/short) on Vector512<int>.
  • New AVXVNNI_V512 instruction set plumbed through CorInfo/JIT/R2R/cpufeatures, with AVX512 as its implication and AVXVNNI as its parent ISA.
  • New test project (AvxVnni_V512) plus a handwritten sample test that exercises VPDPBUSD/VPDPBUSDS/VPDPWSSD on 512-bit vectors.
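For readers less familiar with VNNI, the per-lane arithmetic behind these overloads can be sketched as a scalar reference model. This is an illustration based on the documented behavior of VPDPBUSD, VPDPBUSDS, and VPDPWSSD, not code from this PR; the function names (`dotBytesLane`, `dotWordsLane`, `dotBytes512`) are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Scalar model of one 32-bit lane of VPDPBUSD / VPDPBUSDS.
// a: four unsigned bytes, b: four signed bytes, acc: the running 32-bit sum.
inline int32_t dotBytesLane(int32_t acc, const uint8_t* a, const int8_t* b, bool saturate)
{
    int64_t sum = acc; // widen so the exact sum is representable
    for (int i = 0; i < 4; ++i)
    {
        sum += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }

    if (saturate)
    {
        // The saturating form clamps the final sum to the int32 range.
        const int64_t lo = std::numeric_limits<int32_t>::min();
        const int64_t hi = std::numeric_limits<int32_t>::max();
        sum = (sum < lo) ? lo : (sum > hi) ? hi : sum;
        return static_cast<int32_t>(sum);
    }

    // The non-saturating form keeps the low 32 bits (two's-complement wraparound).
    return static_cast<int32_t>(static_cast<uint32_t>(sum));
}

// Scalar model of one lane of VPDPWSSD: two signed 16-bit products per lane.
inline int32_t dotWordsLane(int32_t acc, const int16_t* a, const int16_t* b)
{
    int64_t sum = acc;
    sum += static_cast<int32_t>(a[0]) * static_cast<int32_t>(b[0]);
    sum += static_cast<int32_t>(a[1]) * static_cast<int32_t>(b[1]);
    return static_cast<int32_t>(static_cast<uint32_t>(sum));
}

// 512-bit form: 16 independent lanes, each consuming 4 bytes of a and b.
inline std::array<int32_t, 16> dotBytes512(std::array<int32_t, 16> acc,
                                           const std::array<uint8_t, 64>& a,
                                           const std::array<int8_t, 64>& b,
                                           bool saturate)
{
    for (int lane = 0; lane < 16; ++lane)
    {
        acc[lane] = dotBytesLane(acc[lane], &a[lane * 4], &b[lane * 4], saturate);
    }
    return acc;
}
```

A model like this could serve as the "want" side of a hardware probe: compute the expected lanes in scalar code and compare them against what the intrinsic returns.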

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxVnni.cs | Adds AvxVnni.V512 nested class with the 4 VNNI 512-bit intrinsics. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxVnni.PlatformNotSupported.cs | PNSE stubs for new V512 APIs. |
| src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs | Reference assembly entries for the new V512 APIs. |
| src/native/minipal/cpufeatures.h | Adds XArchIntrinsicConstants_Avx512Vnni bit. |
| src/native/minipal/cpufeatures.c | Detects AVX512-VNNI via CPUID leaf 7 ECX bit 11. |
| src/coreclr/inc/corinfoinstructionset.h | Inserts new AVXVNNI_V512 enum value (shifts subsequent values), implication and PNSE mapping for R2R. |
| src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs | Mirrors the new enum value and its forward/reverse implications. |
| src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt | Defines AVXVNNI_V512 instruction set and AVX512 implication. |
| src/coreclr/tools/Common/Compiler/HardwareIntrinsicHelpers.cs | Adds Avx512Vnni flag bit and mapping from instruction set. |
| src/coreclr/tools/Common/Internal/Runtime/ReadyToRunInstructionSetHelper.cs | Maps new ISA to existing Avx512Vnni R2R set. |
| src/coreclr/tools/Common/InstructionSetHelpers.cs | Adds avxvnni_v512 to optimistic set on Intel. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Defines NI list entries for the new V512 intrinsics. |
| src/coreclr/jit/hwintrinsicxarch.cpp | Adds AVXVNNI→AVXVNNI_V512 mapping for V512 versioning. |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Includes new NIs in the special codegen path. |
| src/coreclr/jit/hwintrinsic.cpp | Adds ISA range entry for AVXVNNI_V512. |
| src/coreclr/jit/lsraxarch.cpp | Adds LSRA handling for new NIs. |
| src/coreclr/jit/compiler.cpp | Enables AVXVNNI_V512 when EnableAVX512v3 config is set. |
| src/coreclr/vm/codeman.cpp | Sets the JIT compile flag based on CPU detection. |
| src/coreclr/inc/jiteeversionguid.h | Bumps JIT/EE GUID. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/* | New test project + handwritten sample test. |
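The CPUID check mentioned for cpufeatures.c (leaf 7, ECX bit 11) can be sketched as a pure bit test. This is a minimal illustration, not the minipal code; `hasAvx512VnniBit` and `kLeaf7EcxAvx512Vnni` are hypothetical names, and the caller is assumed to have already read ECX from CPUID leaf 7, subleaf 0.

```cpp
#include <cassert>
#include <cstdint>

// CPUID.(EAX=7, ECX=0):ECX bit 11 is the AVX512_VNNI feature flag.
constexpr uint32_t kLeaf7EcxAvx512Vnni = 1u << 11;

// Hypothetical helper: takes the ECX value already read from CPUID leaf 7,
// subleaf 0, and reports whether the AVX512-VNNI bit is set.
constexpr bool hasAvx512VnniBit(uint32_t leaf7Ecx)
{
    return (leaf7Ecx & kLeaf7EcxAvx512Vnni) != 0;
}
```

On GCC or Clang the register value could come from `__get_cpuid_count` in `<cpuid.h>`, and on MSVC from `__cpuidex` in `<intrin.h>`. A real detection path would presumably also confirm AVX-512 foundation support and OS-enabled ZMM state before reporting the feature; the sketch deliberately leaves that out.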

Comment thread src/coreclr/vm/codeman.cpp Outdated
Comment thread src/coreclr/jit/compiler.cpp
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/AvxVnni_V512SampleTest.cs Outdated
Comment thread src/coreclr/tools/Common/Compiler/HardwareIntrinsicHelpers.cs
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/Program.AvxVnni_V512.cs Outdated
Copilot AI review requested due to automatic review settings May 19, 2026 12:45
@jamesburton

Author

@dotnet-policy-service agree

@jamesburton

Author

@copilot apply changes based on the comments in this thread

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.

Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/Program.AvxVnni_V512.cs Outdated
Comment thread src/coreclr/jit/compiler.cpp
Comment thread src/coreclr/jit/hwintrinsicxarch.cpp Outdated
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/AvxVnni_V512SampleTest.cs Outdated
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/AvxVnni_V512SampleTest.cs Outdated
@jamesburton jamesburton force-pushed the feature/avxvnni.v512 branch 2 times, most recently from 674a16a to ae37848 Compare May 19, 2026 14:18
Copilot AI review requested due to automatic review settings May 19, 2026 14:18

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/jit/compiler.cpp
Comment thread src/coreclr/vm/codeman.cpp Outdated
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/AvxVnni_V512SampleTest.cs Outdated
Copilot AI review requested due to automatic review settings May 19, 2026 14:44
@tannergooding

Member

Could you resolve any Copilot comments that have already been addressed? That would make it easier to review and to see what is or isn't pending.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.

Comment thread src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt Outdated
Comment thread src/coreclr/jit/jitconfigvalues.h Outdated
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/AvxVnni_V512SampleTest.cs Outdated
@jamesburtonfnz

jamesburtonfnz commented May 19, 2026 via email


@jamesburton jamesburton force-pushed the feature/avxvnni.v512 branch 2 times, most recently from ef626af to b7bc6af Compare May 19, 2026 15:20
Copilot AI review requested due to automatic review settings May 19, 2026 15:20

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

Comment thread src/coreclr/tools/Common/Compiler/HardwareIntrinsicHelpers.cs Outdated
Comment thread src/coreclr/inc/clrconfigvalues.h Outdated
Comment thread src/coreclr/inc/corinfoinstructionset.h Outdated
Copilot AI review requested due to automatic review settings May 19, 2026 18:09

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

Comment thread src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt Outdated
Comment thread src/coreclr/tools/Common/Internal/Runtime/ReadyToRunInstructionSetHelper.cs Outdated
Comment thread src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/AvxVnni_V512SampleTest.cs Outdated
@jamesburton jamesburton force-pushed the feature/avxvnni.v512 branch from feaccd8 to 3677bdf Compare May 19, 2026 18:22
Copilot AI review requested due to automatic review settings May 19, 2026 18:53

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.

Comment thread src/coreclr/jit/jitconfigvalues.h Outdated
Comment thread src/coreclr/tools/Common/Compiler/HardwareIntrinsicHelpers.cs Outdated
Comment thread src/coreclr/inc/jiteeversionguid.h Outdated
Copilot AI review requested due to automatic review settings June 10, 2026 14:11

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

Comment thread src/coreclr/jit/hwintrinsicxarch.cpp
Comment thread src/coreclr/jit/hwintrinsiclistxarch.h
Comment thread src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs
@JulieLeeMSFT

Member

@jamesburton, please address copilot reviews.

@JulieLeeMSFT JulieLeeMSFT added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 10, 2026
@jamesburton

Author

@JulieLeeMSFT all 3 Copilot threads on the latest pass have replies; the current state of each:

  1. CorInfoInstructionSet.cs:1783, V512_X64 mapping (thread): resolved. These branches are emitted by the upstream InstructionSetGenerator uniformly for any V512-bearing ISA (as is already the case for Avx10v1/Avx10v2 in the same file); GetNestedType("V512_X64") returns null for AvxVnni, so the branch is inert. @tannergooding asked earlier to revert InstructionSetGenerator.cs to upstream and not special-case the template.

  2. hwintrinsicxarch.cpp:279 — base-class lookup → AVX512v3 (thread) — awaiting @tannergooding. The lookupInstructionSet dispatch I added per his earlier review (returning AVX512v3 for the base AvxVnni class on AVX-512-VNNI-only hardware) has a real gap: the AVX512v3 MultiplyWideningAndAdd[Saturate] entries have a fixed simdSize=64, so a Vector128/Vector256 call would resolve to the V512 entry and tryLookupSimdSize would report 64 instead of deriving from args. The AVXVNNIINT/AVXVNNIINT_V512 precedent Tanner pointed to doesn't trip this because both of those entries use simdSize=-1. I outlined two fixes for Tanner in the thread:

    • (1) change the two AVX512v3 MultiplyWideningAndAdd[Saturate] entries to simdSize=-1 — matches the AVXVNNIINT precedent, smallest change
    • (2) add an AVX512v3 → AVXVNNI implication in InstructionSetDesc.txt and revert the lookupInstructionSet dispatch

    I'm holding off on a speculative push since the PR is already approved.

  3. hwintrinsiclistxarch.h:1077, same simdSize=64 concern from the table side (thread): awaiting @tannergooding via cross-reference to item 2 above.
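The simdSize concern in items 2 and 3 can be illustrated with a toy model of a table entry that either carries a fixed size or defers to the call's arguments. This is a simplified sketch, not the JIT's actual table or its tryLookupSimdSize logic; `IntrinsicEntry` and `resolveSimdSize` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a hardware-intrinsic table entry.
struct IntrinsicEntry
{
    const char* name;
    int         simdSize; // fixed size in bytes, or -1 meaning "derive from the arguments"
};

// Returns the SIMD size used for a call: the table's fixed size when one is
// recorded, otherwise the size of the vector argument actually passed.
constexpr int resolveSimdSize(const IntrinsicEntry& entry, int argVectorBytes)
{
    return (entry.simdSize != -1) ? entry.simdSize : argVectorBytes;
}
```

With a fixed `simdSize` of 64, a call that passes a 16-byte vector still resolves to 64 in this model, which is the mismatch described above; with -1 the same call resolves to 16.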

CI on 85e6adf854c: 146 SUCCESS / 4 FAILURE (all triaged as infra: 1 Helix-submission task failure + 3 wasm-build flakes — same pattern observed on prior heads, none code-classified). APPROVED, MERGEABLE, BLOCKED by required checks pending.

@tannergooding — could you confirm option (1) vs (2) when you get a moment? I can push either inside the hour.

@tannergooding

Member

@jamesburton I had already responded that option 1 was the correct one here: #128365 (comment)

@jamesburton

Author

@tannergooding @JulieLeeMSFT — retracting my earlier A/B question on the AVX512v3 simdSize vs implication choice. The lookupInstructionSet dispatch shape was already reviewed and approved by Tanner, and his review noted the JIT "already has the right logic setup in emitxarch and other places" for the VEX/EVEX path — that covers the case Copilot was flagging. I've resolved both threads as designed-and-approved.

Only the two original Tanner-authored threads remain open (PRRT_kwDODI9FZc6H5CMR test coverage, PRRT_kwDODI9FZc6H-Znd codeman/dispatch design) and both have my "done, matches your suggested shape" replies — those are Tanner's to resolve when convenient.

Apologies for the extra noise.

@dotnet-policy-service dotnet-policy-service Bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 11, 2026
jamesburton added a commit to jamesburton/runtime that referenced this pull request Jun 12, 2026
The JIT asserts (hwintrinsic.cpp:1120) that HARDWARE_INTRINSIC entries
within an ISA range are sorted alphabetically by method name. The
AVX512_BF16 block had MultiplyWideningAndAdd before ConvertToBFloat16,
which fails strcmp ordering and crashes crossgen2 during corelib R2R
generation. Reorder so ConvertToBFloat16 is first (and update the
FIRST_NI / LAST_NI markers accordingly). Caught by a clean dev-branch
build that combined this PR with dotnet#128365.
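The ordering requirement described in this commit message can be sketched as a strcmp-based sortedness check. This is an illustration of the invariant only, not the assert in hwintrinsic.cpp; `isSortedByName` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Returns true when the method names are in non-decreasing strcmp order,
// which is the property the commit message says the JIT asserts per ISA range.
inline bool isSortedByName(const std::vector<const char*>& names)
{
    return std::is_sorted(names.begin(), names.end(),
                          [](const char* a, const char* b) { return std::strcmp(a, b) < 0; });
}
```

Under this check, a block listing MultiplyWideningAndAdd before ConvertToBFloat16 is rejected, and swapping the two entries makes it pass.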
@jamesburton

Author

Heads-up — a downstream consumer testing on Zen5 / Strix Halo just hit AvxVnni.V512.IsSupported == false despite AvxVnni.IsSupported == true and AVX-512-VNNI being present in CPUID.

Root cause is in Compiler::V512VersionOfIsa (hwintrinsicxarch.cpp), which has no case for InstructionSet_AVXVNNI or InstructionSet_AVX512v3. The resolution path for AvxVnni.V512.IsSupported is:

lookupIsa("V512", "AvxVnni", ...) -> V512VersionOfIsa(AVXVNNI or AVX512v3)
  -> default -> InstructionSet_NONE -> IsSupported = false

The v3-fallback in lookupInstructionSet (commit d322db36213) correctly routes AvxVnni to either AVXVNNI (dedicated CPUID bit) or AVX512v3 (VNNI via AVX512v3 only), but V512VersionOfIsa doesn't know to lift either of those to AVX512v3 for the nested V512 class. Net effect: the V512 nested class is unreachable.

The minimal fix (verified on Zen5 in the local dev integration tree):

case InstructionSet_AVXVNNI:
case InstructionSet_AVX512v3:
{
    // AvxVnni.V512 lifts under AVX512v3 (which implies AVX-512-VNNI).
    return InstructionSet_AVX512v3;
}

After this, on Zen5:

Avx512F.IsSupported       = True
AvxVnni.IsSupported       = True
AvxVnni.V512.IsSupported  = True  (was: False)
VPDPBUSD512 lane0 = 24 (expect 24)  FUNCTIONAL_OK=True
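The resolution path and the effect of the missing case can be modeled in a few lines. This is a reduced sketch under stated assumptions: the `Isa` enum, the `Cpu` struct, and every function name here are hypothetical stand-ins for lookupInstructionSet, V512VersionOfIsa, and compSupportsHWIntrinsic, not the JIT's real signatures.

```cpp
#include <cassert>

// Hypothetical, reduced ISA set for illustration only.
enum class Isa { NONE, AVXVNNI, AVX512v3, OTHER };

// Toy description of what the running CPU reports.
struct Cpu
{
    bool avxVnni;  // dedicated AVX-VNNI CPUID bit
    bool avx512v3; // AVX-512 bucket that includes AVX512-VNNI
};

inline bool cpuSupports(const Cpu& cpu, Isa isa)
{
    switch (isa)
    {
        case Isa::AVXVNNI:  return cpu.avxVnni;
        case Isa::AVX512v3: return cpu.avx512v3;
        default:            return false;
    }
}

// Models the v3-fallback: prefer AVXVNNI, otherwise route to AVX512v3.
inline Isa lookupAvxVnni(const Cpu& cpu)
{
    return cpuSupports(cpu, Isa::AVXVNNI) ? Isa::AVXVNNI : Isa::AVX512v3;
}

// Before the fix: no case for AVXVNNI or AVX512v3, so both hit the default.
inline Isa v512VersionBefore(Isa /*enclosing*/)
{
    return Isa::NONE;
}

// After the fix: both enclosing ISAs lift to AVX512v3.
inline Isa v512VersionAfter(Isa enclosing)
{
    switch (enclosing)
    {
        case Isa::AVXVNNI:
        case Isa::AVX512v3:
            return Isa::AVX512v3;
        default:
            return Isa::NONE;
    }
}

// IsSupported for the nested V512 class: the lifted ISA must be known AND
// supported by the running CPU (the downstream gate).
inline bool v512IsSupported(const Cpu& cpu, Isa (*v512VersionOf)(Isa))
{
    const Isa lifted = v512VersionOf(lookupAvxVnni(cpu));
    return lifted != Isa::NONE && cpuSupports(cpu, lifted);
}
```

In this model a CPU with both features reports false before the fix and true after it, while a CPU with AVX-VNNI but no AVX-512 still reports false after the fix because the downstream support check rejects AVX512v3.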

Two questions:

  1. Want me to fold this into the PR (re-triggers approval), or land it as a separate follow-up after this merges?
  2. Is the case for the lone InstructionSet_AVXVNNI value safe — i.e. is there any hardware shape where AvxVnni.V512.IsSupported = true would be wrong if the enclosing resolves to AVXVNNI rather than AVX512v3? My read is no (the JIT's downstream compSupportsHWIntrinsic(AVX512v3) check correctly returns false on AVXVNNI-only-no-AVX512 hardware like Tiger Lake), but flagging in case I'm missing a case.

PR is currently DIRTY against main so something will move it regardless; happy to fold this in along with a fresh rebase if you'd prefer.

@tannergooding

Member

Should be handled in this PR, especially while it's waiting for secondary sign-off. It would be good to ensure the same scenario is also fixed for AVXVNNIINT, as I'm guessing it also exists there.

The check should already be guarded, but adding an assert wouldn't hurt.

jamesburton and others added 2 commits June 15, 2026 15:45
# Conflicts:
#	src/coreclr/inc/jiteeversionguid.h
Per @tannergooding: AvxVnni.V512.IsSupported was returning false even on
AVX-512-VNNI hardware because V512VersionOfIsa had no case for
InstructionSet_AVXVNNI or InstructionSet_AVX512v3. lookupIsa would
recursively resolve "AvxVnni" to one of those (via the v3-fallback in
lookupInstructionSet), then V512VersionOfIsa fell through to
default -> InstructionSet_NONE, so the IsSupported intrinsic returned
false on every machine — the nested V512 class was simply unreachable.

This change:

* V512VersionOfIsa: map both AVXVNNI and AVX512v3 to AVX512v3 for the
  AvxVnni.V512 lift. AVX512v3 carries the EVEX-encoded VPDPBUSD /
  VPDPWSSD on ZMM, and the caller's downstream
  compSupportsHWIntrinsic(InstructionSet_AVX512v3) gates the result on
  the running CPU — so Tiger Lake (AVXVNNI without AVX-512) correctly
  reports AvxVnni.V512.IsSupported == false.

* lookupIsa: when dispatching className "V512", assert that
  V512VersionOfIsa returned a known ISA whenever the enclosing ISA was
  itself successfully resolved. NONE/ILLEGAL enclosing ISAs are
  legitimately non-V512-capable; any other enclosing resolving to NONE
  here means the V512VersionOfIsa table is missing a case (which would
  otherwise silently make IsSupported false — the original symptom).

The analogous scenario for AVXVNNIINT was investigated and is
structurally fine: AvxVnniInt8 / AvxVnniInt16 both resolve to either
AVXVNNIINT or AVXVNNIINT_V512 via the existing v3-style fallback in
lookupInstructionSet, and V512VersionOfIsa already maps both to
AVXVNNIINT_V512. No code change needed for that path; the new assert
will catch any future regression.

Verified on Zen5 / Strix Halo:

  Avx512F.IsSupported       = True
  AvxVnni.IsSupported       = True
  AvxVnni.V512.IsSupported  = True   (was: False)
  VPDPBUSD512 lane0 = 24 (expect 24)  FUNCTIONAL_OK=True

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jamesburton

Author

Done — folded the fix in and addressed both follow-up asks:

Pushed 2cb06c72086 (on top of the merge-with-main 1c62727b76f that resolves the DIRTY state, JIT-EE GUID bumped to 21bf6983-fc9b-4d33-8583-d5b90a7ea60b):

  1. V512VersionOfIsa: map both AVXVNNI and AVX512v3 to AVX512v3 for the AvxVnni.V512 lift (was falling through to default -> InstructionSet_NONE).
  2. lookupIsa: added the defensive assert at the "V512" dispatch — fires if V512VersionOfIsa returns NONE for a successfully-resolved enclosing ISA. This is the kind of guard that would have caught the original bug.

AVXVNNIINT scenario: investigated and found it's structurally already correct. AvxVnniInt8 / AvxVnniInt16 resolve to either AVXVNNIINT or AVXVNNIINT_V512 via the existing v3-style fallback in lookupInstructionSet, and V512VersionOfIsa already maps both to AVXVNNIINT_V512. No code change needed there; the new assert will catch any future regression of that flavor.

Verified on Zen5 / Strix Halo (clr+libs+packs Release rebuild, full Core_Root regen):

Avx512F.IsSupported       = True
AvxVnni.IsSupported       = True
AvxVnni.V512.IsSupported  = True   (was: False)
VPDPBUSD512 lane0 = 24 (expect 24)  FUNCTIONAL_OK=True

CI re-running on the fresh head.

The defensive assert added in the prior commit fired during ILC crossgen
of the X64Avx512 / X64Avx512_VectorT512 NativeAOT smoke tests (linux-x64
and windows-x64 Debug NativeAOT jobs, exit code 134 / SIGABRT). It
asserted that V512VersionOfIsa returning NONE was a bug when the
enclosing ISA was successfully resolved — but there are legitimate ISA
dispatches where the enclosing is valid and V512 simply doesn't exist
for it, e.g. classes that only have a non-V512 form. The default-NONE
return is the right behavior for those paths (IsSupported correctly
reports false).

Keep the V512VersionOfIsa AVXVNNI / AVX512v3 case (the actual fix);
drop just the over-eager guard at the dispatch site.

AvxVnni.V512.IsSupported still verified True on Zen5 / Strix Halo with
VPDPBUSD512 lane0 = 24.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 15, 2026 15:58
@jamesburton

Author

Hotfix pushed (700e933cb2e).

The defensive assert from 2cb06c72086 was over-aggressive — it fired during ILC crossgen of the X64Avx512 and X64Avx512_VectorT512 NativeAOT smoke tests (linux-x64 and windows-x64 Debug NativeAOT, exit code 134 / SIGABRT). There are legitimate dispatches where the enclosing ISA is successfully resolved and V512 simply doesn't exist for it; the default-NONE return is the right behavior for those paths.

Dropped the assert at the lookupIsa "V512" dispatch site. The actual fix (the V512VersionOfIsa AVXVNNI / AVX512v3 case) is unchanged; AvxVnni.V512.IsSupported still verifies True on Zen5 with VPDPBUSD512 lane0 = 24.

CI re-running on the new head.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

Comment on lines 282 to 293
  {
      if (className[7] == '\0')
      {
-         return InstructionSet_AVXVNNI;
+         if (compSupportsHWIntrinsic(InstructionSet_AVXVNNI))
+         {
+             return InstructionSet_AVXVNNI;
+         }
+         else
+         {
+             return InstructionSet_AVX512v3;
+         }
      }
Comment on lines 1836 to +1845
  if (nestedTypeName == "X64")
      return InstructionSet.X64_AVXVNNI_X64;
  else
-     return InstructionSet.X64_AVXVNNI;
+ if (nestedTypeName == "V512_X64")
+     return InstructionSet.X64_AVX512v3_X64;
+ else
+ if (nestedTypeName == "V512")
+     return InstructionSet.X64_AVX512v3;
+ else
+     return InstructionSet.X64_AVXVNNI;
Comment on lines 1838 to +1845
  else
-     return InstructionSet.X64_AVXVNNI;
+ if (nestedTypeName == "V512_X64")
+     return InstructionSet.X64_AVX512v3_X64;
+ else
+ if (nestedTypeName == "V512")
+     return InstructionSet.X64_AVX512v3;
+ else
+     return InstructionSet.X64_AVXVNNI;
@tannergooding

Member

CC. @dhartglassMSFT for secondary review

@tannergooding tannergooding requested review from dhartglassMSFT and removed request for kg June 15, 2026 19:25
The V512VersionOfIsa wire-up correctly routes AvxVnni.V512 lookups
through AVX512v3 (via lookupIsa), and binarySearchId(AVX512v3,
"MultiplyWideningAndAdd") resolves to NI_AVX512v3_MultiplyWideningAndAdd
(the dedicated EVEX-encoded V512 entry that already had its codegen and
LSRA cases). Pre-fix, that lookup path was unreachable because
V512VersionOfIsa returned NONE for the AVXVNNI / AVX512v3 enclosing.

The default branch in Lowering::ContainCheckHWIntrinsic's 3-operand
SimpleSIMD path asserted that the intrinsicId was DivRem or in the
FIRST_NI_AVXVNNI..LAST_NI_AVXVNNIINT_V512 range. NI_AVX512v3_M*
intrinsics are declared earlier in hwintrinsiclistxarch.h (around line
1075), so their NI values fall below FIRST_NI_AVXVNNI and the range
check fails. ILC SIGABRTs (exit 134) on the X64Avx512 /
X64Avx512_VectorT512 NativeAOT smoke tests which now exercise
AvxVnni.V512.MultiplyWideningAndAdd as a real intrinsic.

Add the two NI_AVX512v3 multiply-widening NIs to the assert. Behavior is
unchanged (TryMakeSrcContainedOrRegOptional is the right containment
pattern — same as the VNNI variants).

AvxVnni.V512.IsSupported still verified True on Zen5 with VPDPBUSD512
lane0 = 24.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jamesburton

Author

Pushed 80cb14ce181.

Root cause of the persistent NativeAOT Debug failures (exit 134 SIGABRT in ILC during X64Avx512 / X64Avx512_VectorT512 smoke tests): the V512VersionOfIsa fix correctly enabled the AVX512v3 lookup path, which resolves AvxVnni.V512.MultiplyWideningAndAdd to NI_AVX512v3_MultiplyWideningAndAdd (the dedicated EVEX entry at hwintrinsiclistxarch.h:1075-1076). Pre-fix that path was unreachable, so the test compiled the call as a regular method (the recursive [Intrinsic] stub) and ILC never reached the actual lowering path.

Now that the lookup resolves, lowering's default 3-operand SimpleSIMD branch asserted the intrinsicId was DivRem or in FIRST_NI_AVXVNNI..LAST_NI_AVXVNNIINT_V512. NI_AVX512v3_M* is declared earlier in the table so its NI value falls below the range — assert fires, ILC SIGABRTs.

LSRA (lsraxarch.cpp:2751-2752) and codegen (hwintrinsiccodegenxarch.cpp:905-906) already had explicit cases for NI_AVX512v3_MultiplyWideningAndAdd* — just the lowering assert was missing them. Added the two NIs to the assert, no behavior change (TryMakeSrcContainedOrRegOptional is the right containment pattern).

AvxVnni.V512.IsSupported = True + VPDPBUSD512 lane0 = 24 still verified on Zen5.
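Why the range check rejected the AVX512v3 entries can be shown with a small enum model. The layout below is hypothetical and heavily reduced; `NI_DivRem`, `acceptedBefore`, and `acceptedAfter` are illustrative names, and only the relative declaration order matters.

```cpp
#include <cassert>

// Hypothetical ID layout mirroring the description: the AVX512v3 entries are
// declared before the AVXVNNI..AVXVNNIINT_V512 block, so they compare lower.
enum NamedIntrinsic
{
    NI_DivRem, // stand-in name for the DivRem intrinsic
    NI_AVX512v3_MultiplyWideningAndAdd,
    NI_AVX512v3_MultiplyWideningAndAddSaturate,
    FIRST_NI_AVXVNNI,
    NI_AVXVNNI_MultiplyWideningAndAdd = FIRST_NI_AVXVNNI,
    NI_AVXVNNI_MultiplyWideningAndAddSaturate,
    LAST_NI_AVXVNNIINT_V512 = NI_AVXVNNI_MultiplyWideningAndAddSaturate
};

// Shape of the original assert condition: DivRem or anything in the VNNI range.
inline bool acceptedBefore(NamedIntrinsic id)
{
    return id == NI_DivRem || (id >= FIRST_NI_AVXVNNI && id <= LAST_NI_AVXVNNIINT_V512);
}

// Shape of the amended condition: additionally accept the two AVX512v3 IDs
// that sit outside the contiguous range.
inline bool acceptedAfter(NamedIntrinsic id)
{
    return acceptedBefore(id) || id == NI_AVX512v3_MultiplyWideningAndAdd ||
           id == NI_AVX512v3_MultiplyWideningAndAddSaturate;
}
```

The point of the sketch is that a contiguous-range check silently depends on declaration order in the table, so IDs declared outside the block need to be listed explicitly.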



Development

Successfully merging this pull request may close these issues.

[API Proposal]: Add support for AVX-512 VNNI hardware instructions

5 participants