Add AvxVnni.V512 hardware intrinsics#128365
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds AVX-VNNI 512-bit (AVX512-VNNI) hardware intrinsics support exposed as the new AvxVnni.V512 nested class, wiring it through the JIT, runtime, CPU feature detection, R2R metadata, and adding a sample test project.
Changes:
- New
AvxVnni.V512nested class withMultiplyWideningAndAdd/MultiplyWideningAndAddSaturateoverloads (byte/sbyte and short/short) onVector512<int>. - New
AVXVNNI_V512instruction set plumbed through CorInfo/JIT/R2R/cpufeatures, with AVX512 as its implication and AVXVNNI as its parent ISA. - New test project (
AvxVnni_V512) plus a handwritten sample test that exercises VPDPBUSD/VPDPBUSDS/VPDPWSSD on 512-bit vectors.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxVnni.cs | Adds AvxVnni.V512 nested class with the 4 VNNI 512-bit intrinsics. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/AvxVnni.PlatformNotSupported.cs | PNSE stubs for new V512 APIs. |
| src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs | Reference assembly entries for the new V512 APIs. |
| src/native/minipal/cpufeatures.h | Adds XArchIntrinsicConstants_Avx512Vnni bit. |
| src/native/minipal/cpufeatures.c | Detects AVX512-VNNI via CPUID leaf 7 ECX bit 11. |
| src/coreclr/inc/corinfoinstructionset.h | Inserts new AVXVNNI_V512 enum value (shifts subsequent values), implication and PNSE mapping for R2R. |
| src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs | Mirrors the new enum value and its forward/reverse implications. |
| src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt | Defines AVXVNNI_V512 instruction set and AVX512 implication. |
| src/coreclr/tools/Common/Compiler/HardwareIntrinsicHelpers.cs | Adds Avx512Vnni flag bit and mapping from instruction set. |
| src/coreclr/tools/Common/Internal/Runtime/ReadyToRunInstructionSetHelper.cs | Maps new ISA to existing Avx512Vnni R2R set. |
| src/coreclr/tools/Common/InstructionSetHelpers.cs | Adds avxvnni_v512 to optimistic set on Intel. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Defines NI list entries for the new V512 intrinsics. |
| src/coreclr/jit/hwintrinsicxarch.cpp | Adds AVXVNNI→AVXVNNI_V512 mapping for V512 versioning. |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Includes new NIs in the special codegen path. |
| src/coreclr/jit/hwintrinsic.cpp | Adds ISA range entry for AVXVNNI_V512. |
| src/coreclr/jit/lsraxarch.cpp | Adds LSRA handling for new NIs. |
| src/coreclr/jit/compiler.cpp | Enables AVXVNNI_V512 when EnableAVX512v3 config is set. |
| src/coreclr/vm/codeman.cpp | Sets the JIT compile flag based on CPU detection. |
| src/coreclr/inc/jiteeversionguid.h | Bumps JIT/EE GUID. |
| src/tests/JIT/HardwareIntrinsics/X86_Avx/AvxVnni_V512/* | New test project + handwritten sample test. |
|
@dotnet-policy-service agree |
|
@copilot apply changes based on the comments in this thread |
674a16a to
ae37848
Compare
|
-- Could you resolve any copilot comments that have already been addressed to help make it easier to review and know what is or isn't pending? |
|
Sure ... I'm actively looping through the feedback and will update each as they are resolved ... I see it is quite zealous and the stream of updates keeps a constant rebasing challenge but hopefully should be able to tidy this up enough today. I have identified 4 more related issues that I hope to apply that are related to this to unlock even further performance with some of these newer operators.
…________________________________
From: Tanner Gooding ***@***.***>
Sent: 19 May 2026 15:47
To: dotnet/runtime ***@***.***>
Cc: James Burton ***@***.***>; Comment ***@***.***>
Subject: Re: [dotnet/runtime] Add AvxVnni.V512 hardware intrinsics (PR #128365)
[https://avatars.githubusercontent.com/u/10487869?s=20&v=4]tannergooding left a comment (dotnet/runtime#128365)<#128365 (comment)>
-- Could you resolve any copilot comments that have already been addressed to help make it easier to review and know what is or isn't pending?
—
Reply to this email directly, view it on GitHub<#128365 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/BIV6FFYGY6EGX5KSJF5UDND43RXXHAVCNFSM6AAAAACZEJ6OKCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DIOBYHE2DCMBQGU>.
Triage notifications on the go with GitHub Mobile for iOS<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you commented.Message ID: ***@***.***>
|
ef626af to
b7bc6af
Compare
feaccd8 to
3677bdf
Compare
5f7c1fc to
bad990a
Compare
|
@jamesburton, please address copilot reviews. |
|
@JulieLeeMSFT all 3 Copilot threads on the latest pass have replies; the current state of each:
CI on @tannergooding — could you confirm option (1) vs (2) when you get a moment? I can push either inside the hour. |
|
@jamesburton I had already responded that option 1 was the correct one here: #128365 (comment) |
|
@tannergooding @JulieLeeMSFT — retracting my earlier A/B question on the AVX512v3 simdSize vs implication choice. The lookupInstructionSet dispatch shape was already reviewed and approved by Tanner, and his review noted the JIT "already has the right logic setup in Only the two original Tanner-authored threads remain open ( Apologies for the extra noise. |
The JIT asserts (hwintrinsic.cpp:1120) that HARDWARE_INTRINSIC entries within an ISA range are sorted alphabetically by method name. The AVX512_BF16 block had MultiplyWideningAndAdd before ConvertToBFloat16, which fails strcmp ordering and crashes crossgen2 during corelib R2R generation. Reorder so ConvertToBFloat16 is first (and update the FIRST_NI / LAST_NI markers accordingly). Caught by a clean dev-branch build that combined this PR with dotnet#128365.
|
Heads-up — a downstream consumer testing on Zen5 / Strix Halo just hit Root cause is in The v3-fallback in The minimal fix (verified on Zen5 in the local dev integration tree): case InstructionSet_AVXVNNI:
case InstructionSet_AVX512v3:
{
// AvxVnni.V512 lifts under AVX512v3 (which implies AVX-512-VNNI).
return InstructionSet_AVX512v3;
}After this, on Zen5: Two questions:
PR is currently |
|
Should be handled in this PR, especially while its waiting for secondary sign-off. It would be good to ensure the same scenario is also fixed for AVXVNNIINT, as I'm guessing it also exists there. The check should already be guarded, but adding an assert wouldn't hurt. |
# Conflicts: # src/coreclr/inc/jiteeversionguid.h
Per @tannergooding: AvxVnni.V512.IsSupported was returning false even on AVX-512-VNNI hardware because V512VersionOfIsa had no case for InstructionSet_AVXVNNI or InstructionSet_AVX512v3. lookupIsa would recursively resolve "AvxVnni" to one of those (via the v3-fallback in lookupInstructionSet), then V512VersionOfIsa fell through to default -> InstructionSet_NONE, so the IsSupported intrinsic returned false on every machine — the nested V512 class was simply unreachable. This change: * V512VersionOfIsa: map both AVXVNNI and AVX512v3 to AVX512v3 for the AvxVnni.V512 lift. AVX512v3 carries the EVEX-encoded VPDPBUSD / VPDPWSSD on ZMM, and the caller's downstream compSupportsHWIntrinsic(InstructionSet_AVX512v3) gates the result on the running CPU — so Tiger Lake (AVXVNNI without AVX-512) correctly reports AvxVnni.V512.IsSupported == false. * lookupIsa: when dispatching className "V512", assert that V512VersionOfIsa returned a known ISA whenever the enclosing ISA was itself successfully resolved. NONE/ILLEGAL enclosing ISAs are legitimately non-V512-capable; any other enclosing resolving to NONE here means the V512VersionOfIsa table is missing a case (which would otherwise silently make IsSupported false — the original symptom). The analogous scenario for AVXVNNIINT was investigated and is structurally fine: AvxVnniInt8 / AvxVnniInt16 both resolve to either AVXVNNIINT or AVXVNNIINT_V512 via the existing v3-style fallback in lookupInstructionSet, and V512VersionOfIsa already maps both to AVXVNNIINT_V512. No code change needed for that path; the new assert will catch any future regression. Verified on Zen5 / Strix Halo: Avx512F.IsSupported = True AvxVnni.IsSupported = True AvxVnni.V512.IsSupported = True (was: False) VPDPBUSD512 lane0 = 24 (expect 24) FUNCTIONAL_OK=True Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Done — folded the fix in and addressed both follow-up asks: Pushed
AVXVNNIINT scenario: investigated and found it's structurally already correct. Verified on Zen5 / Strix Halo (clr+libs+packs Release rebuild, full Core_Root regen): CI re-running on the fresh head. |
The defensive assert added in the prior commit fired during ILC crossgen of the X64Avx512 / X64Avx512_VectorT512 NativeAOT smoke tests (linux-x64 and windows-x64 Debug NativeAOT jobs, exit code 134 / SIGABRT). It asserted that V512VersionOfIsa returning NONE was a bug when the enclosing ISA was successfully resolved — but there are legitimate ISA dispatches where the enclosing is valid and V512 simply doesn't exist for it, e.g. classes that only have a non-V512 form. The default-NONE return is the right behavior for those paths (IsSupported correctly reports false). Keep the V512VersionOfIsa AVXVNNI / AVX512v3 case (the actual fix); drop just the over-eager guard at the dispatch site. AvxVnni.V512.IsSupported still verified True on Zen5 / Strix Halo with VPDPBUSD512 lane0 = 24. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Hotfix pushed ( The defensive assert from Dropped the assert at the CI re-running on the new head. |
| { | ||
| if (className[7] == '\0') | ||
| { | ||
| return InstructionSet_AVXVNNI; | ||
| if (compSupportsHWIntrinsic(InstructionSet_AVXVNNI)) | ||
| { | ||
| return InstructionSet_AVXVNNI; | ||
| } | ||
| else | ||
| { | ||
| return InstructionSet_AVX512v3; | ||
| } | ||
| } |
| if (nestedTypeName == "X64") | ||
| return InstructionSet.X64_AVXVNNI_X64; | ||
| else | ||
| return InstructionSet.X64_AVXVNNI; | ||
| if (nestedTypeName == "V512_X64") | ||
| return InstructionSet.X64_AVX512v3_X64; | ||
| else | ||
| if (nestedTypeName == "V512") | ||
| return InstructionSet.X64_AVX512v3; | ||
| else | ||
| return InstructionSet.X64_AVXVNNI; |
| else | ||
| return InstructionSet.X64_AVXVNNI; | ||
| if (nestedTypeName == "V512_X64") | ||
| return InstructionSet.X64_AVX512v3_X64; | ||
| else | ||
| if (nestedTypeName == "V512") | ||
| return InstructionSet.X64_AVX512v3; | ||
| else | ||
| return InstructionSet.X64_AVXVNNI; |
|
CC. @dhartglassMSFT for secondary review |
The V512VersionOfIsa wire-up correctly routes AvxVnni.V512 lookups through AVX512v3 (via lookupIsa), and binarySearchId(AVX512v3, "MultiplyWideningAndAdd") resolves to NI_AVX512v3_MultiplyWideningAndAdd (the dedicated EVEX-encoded V512 entry that already had its codegen and LSRA cases). Pre-fix, that lookup path was unreachable because V512VersionOfIsa returned NONE for the AVXVNNI / AVX512v3 enclosing. The default branch in Lowering::ContainCheckHWIntrinsic's 3-operand SimpleSIMD path asserted that the intrinsicId was DivRem or in the FIRST_NI_AVXVNNI..LAST_NI_AVXVNNIINT_V512 range. NI_AVX512v3_M* intrinsics are declared earlier in hwintrinsiclistxarch.h (around line 1075), so their NI values fall below FIRST_NI_AVXVNNI and the range check fails. ILC SIGABRTs (exit 134) on the X64Avx512 / X64Avx512_VectorT512 NativeAOT smoke tests which now exercise AvxVnni.V512.MultiplyWideningAndAdd as a real intrinsic. Add the two NI_AVX512v3 multiply-widening NIs to the assert. Behavior is unchanged (TryMakeSrcContainedOrRegOptional is the right containment pattern — same as the VNNI variants). AvxVnni.V512.IsSupported still verified True on Zen5 with VPDPBUSD512 lane0 = 24. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Pushed Root cause of the persistent NativeAOT Debug failures (exit 134 SIGABRT in ILC during Now that the lookup resolves, lowering's default 3-operand SimpleSIMD branch asserted the intrinsicId was DivRem or in LSRA (
|
Summary - Add the managed
AvxVnni.V512surface for AVX-512 VNNI VPDPBUSD/VPDPWSSD intrinsics. - Wire CPUID AVX512-VNNI detection into the runtime/JITAVXVNNI_V512instruction-set flag. - Extend VPDP codegen and LSRA handling to cover the new V512 intrinsic IDs. Closes #86849 ## Validation -./build.cmd clr+libs -c Release -arch x64-./build.cmd clr -c Release -arch x64- Strix Halo hardware probe: -AvxVnni.V512.IsSupported == True-VPDPBUSD-zmm : got=160 want=160 OK-VPDPWSSD-zmm : got=8 want=8 OK-VPDPBUSDS-zmm: got=2147483647 ... OK- JIT disassembly includes EVEX-512 forms: -vpdpbusd zmm0, zmm1, ...-vpdpwssd zmm0, zmm1, ...-vpdpbusds zmm0, zmm1, ...Compatibility note
Existing R2R images that record
Avx512Vnninow map to the more preciseAVXVNNI_V512JIT ISA instead of the broaderAVX512v3bucket. This is intentional: AVX512-VNNI is a distinct CPUID feature and the new managed API requires that specific capability, while retaining the existing R2R numeric value (Avx512Vnni = 79).