[TIR][LLVM] Expose TIR api for llvm sext/zext and trunc native type converter intrinsics #15960
cbalint13 wants to merge 1 commit into apache:main
Conversation
Other than lack of hardware support, are there any cases where we wouldn't want to apply these intrinsics? If there aren't, I'm wondering if
To sum up the needs here:
A bit longer: The ultimate goal, once #15918 is done, would be to have solid int8 support (again, not overflowing on x86!) that works even on older hardware (not everyone has VNNI or AVX-512), to make LLMs and other quantized models happier.
You mean CreateIntCast() generic int and CreateIntCast() for truncate, right?
The main thing I'm wondering is whether the support for these conversions should be done through an LLVM-specific intrinsic, or through a change in Though, looking at LLVM's implementation of
Ah, that makes sense. So if I understand correctly, we are currently relying on LLVM's choice of intrinsic for performing integer-to-integer casts, but we want to be able to override that choice of intrinsic by explicitly specifying it.
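For readers following the thread, the default selection being discussed can be sketched roughly as below. This is a minimal model, assuming LLVM's generic integer cast helper picks its opcode purely from the source width, destination width, and signedness. The function name `choose_int_cast` is hypothetical and is not part of TVM or LLVM.

```python
def choose_int_cast(src_bits: int, dst_bits: int, is_signed: bool) -> str:
    """Model of how a generic integer-to-integer cast picks its opcode.

    Hypothetical helper for illustration only; it mirrors the usual rule:
    same width -> bitcast, narrowing -> trunc, widening -> sext or zext.
    """
    if src_bits == dst_bits:
        return "bitcast"  # same width: the bit pattern is unchanged
    if src_bits > dst_bits:
        return "trunc"    # narrowing: high bits are dropped
    # widening: signedness decides how the new high bits are filled
    return "sext" if is_signed else "zext"


# uint8 -> uint16 widens with zeros, int8 -> int16 replicates the sign bit,
# and any 16 -> 8 conversion truncates regardless of signedness.
widen_unsigned = choose_int_cast(8, 16, is_signed=False)
widen_signed = choose_int_cast(8, 16, is_signed=True)
narrow = choose_int_cast(16, 8, is_signed=True)
```

Under this model the caller never names the opcode directly, which is the behavior the explicit operators in this PR are meant to override.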
This PR exposes new TIR API operators bound to their native LLVM intrinsic counterparts.
It adds the ability to emit native CPU intrinsics for atomic type conversions of vectors in tensorizers.
Changes
- zextend, sextend, truncate for type conversions.
- atomic_add mapping to proper LLVM intrinsic guaranteed (best-effort) to lower to single instruction.
Rationale
Some highly efficient CPU instructions for data type manipulation of whole vectors are not exposed by LLVM as intrinsics.
As a substitute, LLVM offers "higher level functions" with guarantees that they will emit exactly the right instruction on the CPU.
Example
On x86 we want to expand a vector from uint8x16 -> uint16x16, or perhaps sign-extend to int16x16.
In order to do this, the pmovzxbw and pmovsxbw instructions are needed, which are not exposed by LLVM directly.
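To make the difference between the conversions concrete, here is a small pure-Python sketch of what zero-extension, sign-extension, and truncation do to each 8-bit lane. The helper names `zext_lane`, `sext_lane`, `trunc_lane`, and `as_bits` are hypothetical and only model the arithmetic; they are not the TIR operators from this PR, and a four-lane list stands in for a full vector.

```python
def zext_lane(value: int, src_bits: int) -> int:
    """Zero-extend: keep the low src_bits as unsigned; new high bits are 0."""
    return value & ((1 << src_bits) - 1)


def sext_lane(value: int, src_bits: int) -> int:
    """Sign-extend: replicate bit (src_bits - 1) into the new high bits."""
    value &= (1 << src_bits) - 1
    sign_bit = 1 << (src_bits - 1)
    return (value ^ sign_bit) - sign_bit


def trunc_lane(value: int, dst_bits: int) -> int:
    """Truncate: drop everything above the low dst_bits."""
    return value & ((1 << dst_bits) - 1)


def as_bits(value: int, bits: int) -> int:
    """Two's-complement bit pattern of a (possibly negative) value."""
    return value & ((1 << bits) - 1)


# Four 8-bit lanes given as raw bit patterns.
vec = [0x00, 0x7F, 0x80, 0xFF]

zext16 = [zext_lane(v, 8) for v in vec]        # lanes read as unsigned
sext16 = [sext_lane(v, 8) for v in vec]        # lanes read as signed
sext16_bits = [as_bits(v, 16) for v in sext16]  # 16-bit patterns after sext
round_trip = [trunc_lane(v, 8) for v in sext16_bits]
```

The lanes 0x80 and 0xFF are where the two extensions diverge: zero-extension yields 128 and 255, while sign-extension yields -128 and -1. Truncating the sign-extended lanes back to 8 bits recovers the original bit patterns.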
The new zextend (non-sign, zero aware) and sextend (sign aware) functions can now do this.
Notes
A more complete example with real usage of these new TIR operators in a tensorization process can be seen here.
This also allows more TOPI/MS data type conversions to leverage precise control over the atomic CPU instructions involved.
This PR is an indispensable part of #15918, an effort towards int8 tensorization coverage on x86.
Cc: @Lunderberg, @junrushao, @masahi, @vinx13, @ekalda, @lhutton1, @quic-sanirudh, @kparzysz-quic