Skip to content

JIT: Use ValueTuple argument for SIMD type lookup for AdvSimd.Store*#107099

Merged
amanasifkhalid merged 4 commits into
dotnet:mainfrom
amanasifkhalid:flag-valuetuple-intrinsics
Aug 29, 2024
Merged

JIT: Use ValueTuple argument for SIMD type lookup for AdvSimd.Store*#107099
amanasifkhalid merged 4 commits into
dotnet:mainfrom
amanasifkhalid:flag-valuetuple-intrinsics

Conversation

@amanasifkhalid

@amanasifkhalid amanasifkhalid commented Aug 28, 2024

Copy link
Copy Markdown
Contributor

Follow-up to #106765. Store and StoreVectorAndZip, like StoreSelectedScalar, have variants that operate on ValueTuples of Vectors, and in the case of the former intrinsic, the SIMD size has to be determined at runtime since it can operate on 64- and 128-bit Vectors. Thus, the JIT needs to determine the Vector size used by the intrinsic overloads' ValueTuple argument.

There are a few other intrinsics (like LoadAndInsertScalar) that also operate on ValueTuples, and the JIT currently differentiates between these variants with dedicated NamedIntrinsics (LoadAndInsertScalarVector64x2, LoadAndInsertScalarVector64x3, etc). I think these can be unified into one NamedIntrinsic and use the ValueTuple lookup logic, in case we want to clean these up.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 28, 2024
@amanasifkhalid

Copy link
Copy Markdown
Contributor Author

@tannergooding I did a quick parse of the intrinsics that take ValueTuples, and none of them need to look up their SIMD sizes at runtime, so are there any correctness issues we need to fix here? I only see potential for cleanup with this new functionality.

Comment thread src/coreclr/jit/hwintrinsiclistarm64.h Outdated
HARDWARE_INTRINSIC(AdvSimd, SignExtendWideningUpper, 16, 1, {INS_sxtl2, INS_invalid, INS_sxtl2, INS_invalid, INS_sxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(AdvSimd, SqrtScalar, 8, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_fsqrt, INS_fsqrt}, HW_Category_SIMD, HW_Flag_SIMDScalar)
HARDWARE_INTRINSIC(AdvSimd, Store, 8, 2, {INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_invalid, INS_invalid, INS_st1_2regs, INS_invalid}, HW_Category_MemoryStore, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
HARDWARE_INTRINSIC(AdvSimd, Store, 8, 2, {INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_invalid, INS_invalid, INS_st1_2regs, INS_invalid}, HW_Category_MemoryStore, HW_Flag_BaseTypeFromSecondArg|HW_Flag_BaseTypeFromValueTupleArg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SIMD size here isn't correct as the size is not always 8. Consider, for example:

/// <summary>
/// <para>void vst1q_u8 (uint8_t * ptr, uint8x16_t val)</para>
/// <para> A32: VST1.8 { Dd, Dd+1 }, [Rn]</para>
/// <para> A64: ST1 { Vt.16B }, [Xn]</para>
/// </summary>
public static unsafe void Store(byte* address, Vector128<byte> source) { throw new PlatformNotSupportedException(); }

or in the case of AdvSimd_Arm64:

/// <summary> A64: ST1 { Vn.16B, Vn+1.16B }, [Xn]</summary>
public static unsafe void Store(byte* address, (Vector128<byte> Value1, Vector128<byte> Value2) value) { throw new PlatformNotSupportedException(); }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so this one needs to look up the size at runtime, too. Let me fix that.

Comment thread src/coreclr/jit/hwintrinsiclistarm64.h Outdated
HARDWARE_INTRINSIC(AdvSimd_Arm64, ShiftRightLogicalRoundedNarrowingSaturateScalar, 8, 2, {INS_uqrshrn, INS_uqrshrn, INS_uqrshrn, INS_uqrshrn, INS_uqrshrn, INS_uqrshrn, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_ShiftRightByImmediate, HW_Flag_HasImmediateOperand|HW_Flag_SIMDScalar)
HARDWARE_INTRINSIC(AdvSimd_Arm64, Sqrt, -1, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_fsqrt, INS_fsqrt}, HW_Category_SIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(AdvSimd_Arm64, Store, 16, 2, {INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs}, HW_Category_MemoryStore, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
HARDWARE_INTRINSIC(AdvSimd_Arm64, Store, -1, 2, {INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs}, HW_Category_MemoryStore, HW_Flag_BaseTypeFromSecondArg|HW_Flag_BaseTypeFromValueTupleArg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this one only has overloads dealing with Vector128

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right; fixed.

@tannergooding tannergooding left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I believe this is all the places that are using tuple and that had been changed from .NET 8, but I notably did not compare line by line to find any other places that may have gotten incorrectly changed

@amanasifkhalid

Copy link
Copy Markdown
Contributor Author

LGTM. I believe this is all the places that are using tuple and that had been changed from .NET 8, but I notably did not compare line by line to find any other places that may have gotten incorrectly changed

There is also LoadAndInsertScalar, but that one is special-cased such that during importation, we convert the NamedIntrinsic to one specifying the ValueTuple shape. VectorTableLookup also has a ValueTuple arg, but it doesn't have an immediate argument, and it has a special path in the importer anyway, so I don't think anything else needs to be fixed?

@amanasifkhalid

Copy link
Copy Markdown
Contributor Author

@tannergooding should this be backported to .NET 9, since we fixed the SIMD size lookup for AdvSimd.Store? I see some logic during codegen that relies on the SIMD size being correct to pass the right parameters to the emitter.

jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
@github-actions github-actions Bot locked and limited conversation to collaborators Oct 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants