Improve AArch64 depthwise convolution through smlal/smlal2 intrinsic #6711
Merged
FrozenGene merged 6 commits into apache:main on Nov 3, 2020
Contributor
Author
Member
I like this change. May I ask two questions?
Contributor
Author
Hi @FrozenGene
Member
For Ansor, we could add tensorize support (TensorCore needs tensorize too). However, if we could do this in LLVM / TIR, Ansor could support it easily. If we could lift it into a generic pass, I think it would benefit other hardware platforms too.
mbaret
requested changes
Oct 28, 2020
added 3 commits
October 29, 2020 21:56
- Added an intrinsic to load a single int16x8 vector and produce two int32x4 output vectors through smlal/smlal2 instructions
- Changed the NHWC depthwise schedule to accommodate the aforementioned intrinsic

Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57
4a7b7c1 to
efb8ef9
added 3 commits
October 29, 2020 22:44
mbaret
approved these changes
Oct 30, 2020
Contributor
mbaret
left a comment
LGTM now. I agree with @FrozenGene that this would be best implemented nearer the LLVM codegen for reusability, but I think this is a good start and demonstrates a worthwhile benefit to performance.
Contributor
Author
Hi @mbaret, thanks for the review! @FrozenGene, any update on this?
FrozenGene
approved these changes
Nov 3, 2020
Member
trevor-m
pushed a commit
to trevor-m/tvm
that referenced
this pull request
Dec 2, 2020
…pache#6711)
* Improve depthwise convolution through smlal/smlal2 intrinsic
  - Added an intrinsic to load a single int16x8 vector and produce two int32x4 output vectors through smlal/smlal2 instructions
  - Changed the NHWC depthwise schedule to accommodate the aforementioned intrinsic
  Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57
* Address review comments
* Rebasing - 2
* Rebasing - 3
* Rebasing - 3
* Fix linting
trevor-m
pushed a commit
to trevor-m/tvm
that referenced
this pull request
Dec 4, 2020
trevor-m
pushed a commit
to neo-ai/tvm
that referenced
this pull request
Dec 4, 2020
- Added an intrinsic to load a single int16x8 vector and produce two
  int32x4 output vectors through smlal/smlal2 instructions
- Changed the NHWC depthwise schedule to accommodate the aforementioned
  intrinsic
Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57