[CODEGEN] ARM Popcount lowering rule and codegen updates #1235
tqchen merged 5 commits into apache:master
…ing and accessing vectors
There is an unintended change that reverts the HalideIR submodule to an older version. Please update the submodule to the latest version. You can do this by running git pull inside the HalideIR folder.
  int num_elems = static_cast<int>(vec->getType()->getVectorNumElements());
  if (extent == num_elems && begin == 0) return vec;
- CHECK_LT(begin + extent, num_elems);
+ CHECK_LT(begin + extent, num_elems+1);
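The effect of relaxing this bound can be seen in a small standalone sketch. It assumes that the slice covers lanes [begin, begin + extent) and that the first CHECK_LT line is the one being replaced; `SliceFitsNew` and `SliceFitsOld` are hypothetical helpers written for illustration, not TVM functions.

```cpp
#include <cassert>

// Hypothetical helper (not TVM code): mirrors the updated check,
// CHECK_LT(begin + extent, num_elems + 1), i.e. begin + extent <= num_elems.
bool SliceFitsNew(int begin, int extent, int num_elems) {
  return begin + extent < num_elems + 1;
}

// Hypothetical helper (not TVM code): mirrors the stricter check,
// CHECK_LT(begin + extent, num_elems).
bool SliceFitsOld(int begin, int extent, int num_elems) {
  return begin + extent < num_elems;
}
```

For example, taking the upper half of an 8-lane vector (begin = 4, extent = 4) touches lanes 4 through 7, which are all in range. The stricter condition rejects that slice, while the relaxed one accepts it and still rejects begin = 5, extent = 4.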
  return CodeGenCPU::CreateIntrinsic(op);
}

Expr CodeGenARM::ARMPopcount(const Call *call) {
We will need a regression test for this rule. Please add a test case for ARM popcount in a new file, tests/python/unittest/test_codegen_arm.py.
Since we don't have an ARM device to verify on, what we can do is dump the asm file (maybe we can patch GetSource in the LLVM module to support get_source("asm")) and verify that the NEON sequence is as expected.
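The kind of check described here, looking for an expected instruction sequence in dumped assembly text, could be sketched as follows. This is only an illustration in C++, not the Python test requested above: `ContainsInOrder` is a hypothetical helper, and the mnemonics one would search for (such as vcnt.8 followed by vpaddl.u8) are an assumption about what the lowering emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true when every entry of `needles` occurs in
// `text`, each one starting after the end of the previous match.
bool ContainsInOrder(const std::string& text,
                     const std::vector<std::string>& needles) {
  std::size_t pos = 0;
  for (const std::string& needle : needles) {
    const std::size_t found = text.find(needle, pos);
    if (found == std::string::npos) return false;
    pos = found + needle.size();  // continue searching after this match
  }
  return true;
}
```

Matching on mnemonics in order, rather than comparing the whole dump, keeps such a test robust against register allocation and scheduling differences between LLVM versions.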
  ::llvm::Intrinsic::ID vpaddu_id = ::llvm::Intrinsic::arm_neon_vpaddlu;

  Type uint8_type = Type(e.type().code(), 8, e.type().bits() * e.type().lanes() / 8);
Move this type definition after the fallback guard, and add a comment noting that the division is always exact.
Add a comment explaining what this specific NEON instruction sequence does.
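As a rough illustration of the pattern under discussion, the sketch below emulates it in scalar C++ for 32-bit lanes. It assumes the sequence is: reinterpret the vector as bits * lanes / 8 unsigned bytes, count the set bits per byte, then apply pairwise widening adds (the role of vpaddlu) until the lanes are back at the original width. The PR excerpt does not show the full lowering, so this is a reading of it, and all function names here are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar analogue of a per-byte population count.
std::vector<std::uint32_t> CountBitsPerByte(const std::vector<std::uint8_t>& bytes) {
  std::vector<std::uint32_t> out(bytes.size());
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    std::uint32_t count = 0;
    for (unsigned v = bytes[i]; v != 0; v >>= 1) count += v & 1u;
    out[i] = count;
  }
  return out;
}

// Scalar analogue of a pairwise widening add: adjacent elements are summed,
// halving the element count (8 -> 16 bits, then 16 -> 32 bits on real lanes).
std::vector<std::uint32_t> PairwiseAddLong(const std::vector<std::uint32_t>& in) {
  std::vector<std::uint32_t> out(in.size() / 2);
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = in[2 * i] + in[2 * i + 1];
  return out;
}

// Emulates popcount on each 32-bit lane using the byte-wise pattern.
std::vector<std::uint32_t> EmulatedPopcount32(const std::vector<std::uint32_t>& lanes) {
  // 32 * lanes / 8 bytes: the division is exact because 32 is a multiple of 8.
  std::vector<std::uint8_t> bytes(lanes.size() * sizeof(std::uint32_t));
  if (!lanes.empty()) std::memcpy(bytes.data(), lanes.data(), bytes.size());
  std::vector<std::uint32_t> acc = CountBitsPerByte(bytes);
  acc = PairwiseAddLong(acc);  // byte counts -> 16-bit partial sums
  acc = PairwiseAddLong(acc);  // 16-bit partial sums -> one count per lane
  return acc;
}
```

The result does not depend on byte order, because the four bytes of each lane are always summed together. A 4-lane input becomes 16 bytes, matching the bits * lanes / 8 lane count used for uint8_type.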
Thanks, this is merged!
Nice! |
TVM compiler changes for low precision operators