[QNN][Relay][Topi] Add qnn.dense with weight layout #13854
echuraev merged 2 commits into apache:main
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
This commit improves the performance of the qnn.mul operation without QNN canonicalization.
Force-pushed from 116a60d to f55a0c9
This is cool, thanks a lot. I already went through this a little earlier, but I'll go through the PR more closely.
quic-sanirudh left a comment
Looks good to me apart from the one question about the schedule.
Also, one small thought, and I might be wrong here, but I see that the nn version of this op is called nn.contrib_dense_pack, so should we name this qnn.contrib_dense_pack?
    sch: Schedule
        The computation schedule for the op.
    """
    return default_schedule(outs)
Is there a plan to add a vrmpy tensorized schedule separately, or is that going to be dependent on MetaSchedule?
I have such plans, but with low priority. For now, the most interesting part is the MetaScheduler work.
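For context, the fallback being discussed amounts to building a plain TE schedule over the outputs. A minimal sketch, assuming TE-based TOPI conventions (the helper name `_fallback_schedule` is ours, not the PR's; the PR's code calls `default_schedule`):

```python
from tvm import te

def _fallback_schedule(outs):
    """Sketch of a default TOPI schedule: a plain TE schedule over the
    output ops, with no tiling, vectorization, or vrmpy tensorization."""
    outs = [outs] if isinstance(outs, te.tensor.Tensor) else list(outs)
    return te.create_schedule([x.op for x in outs])
```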
Yes, I can rename this operator to be aligned with its analog.
    # Shift kernel if necessary.
    if kernel_dtype == "int8":
        # Compute (QA + 128) and (zp_a + 128)
        kernel, kernel_zero_point = _shift(kernel, kernel_zero_point, "uint8")
Why do we want to convert the kernel to uint8? This would result in a non-zero zero point and introduce additional computation, which could have been avoided if the kernel stayed in int8.
The main goal of the int8 --> uint8 conversion is to enable the use of the faster vrmpy u8u8i32 intrinsic instead of vrmpy u8i8i32 for qnn.dense/qnn.conv2d.
Yes, you're absolutely right, we have some overhead from such a conversion. That's why I have commented out this code and skip the i8 -> u8 conversion for weights. Right now I do not see any performance improvement because of the overhead.
I will enable this code if I manage to get better performance.
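To make the discussion concrete, here is a minimal NumPy sketch of the i8 -> u8 shift (our own illustration, not the TVM implementation; the helper only mirrors the `_shift` call in the diff above):

```python
import numpy as np

def shift_to_uint8(tensor, zero_point):
    """Reinterpret int8 quantized data as uint8 by adding 128.

    Values and zero point shift together, so the dequantized result
    scale * (q - zp) is unchanged."""
    shifted = (tensor.astype(np.int16) + 128).astype(np.uint8)
    return shifted, zero_point + 128

kernel = np.array([-128, -5, 0, 127], dtype=np.int8)
u8_kernel, u8_zp = shift_to_uint8(kernel, zero_point=3)

# Same real-valued weights before and after; the extra add per element
# is the overhead mentioned above.
assert np.array_equal(
    kernel.astype(np.int32) - 3,
    u8_kernel.astype(np.int32) - u8_zp,
)
```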
Force-pushed from f55a0c9 to 7e45dda
def test_qnn_legalize_qnn_conv2d_non_scalar_qnn_params():
    """
    Test QNN legalization for qnn.dense op for Hexagon target when kernel zero point and kernel
typo?: qnn.dense->qnn.conv2d
def test_qnn_legalize_qnn_conv2d_non_scalar_qnn_params():
    """
    Test QNN legalization for qnn.dense op for Hexagon target when kernel zero point and kernel
    scale are not scalars.
Could you please elaborate on what you mean by the kernel zero point and scale not being scalars?
So, by "not scalar" I mean a constant vector of scalars. For example:
relay.const([val1, val2 ... valN])
This is a more interesting case for testing because legalization pads the weights, and the weights dimension, the scale/zero-point vector, and the 'channels' attribute must stay aligned.
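For illustration, a minimal sketch of constructing qnn.dense with such non-scalar QNN params (our own example; the shapes and values are made up):

```python
import numpy as np
from tvm import relay

out_channels = 3
data = relay.var("data", shape=(2, 4), dtype="uint8")
kernel = relay.var("kernel", shape=(out_channels, 4), dtype="int8")

# Per-channel (vector) kernel zero point and scale instead of scalars:
dense = relay.qnn.op.dense(
    data,
    kernel,
    input_zero_point=relay.const(0, "int32"),
    kernel_zero_point=relay.const(np.array([1, 2, 3], dtype="int32")),
    input_scale=relay.const(0.5, "float32"),
    kernel_scale=relay.const(np.array([0.25, 0.5, 0.75], dtype="float32")),
    units=out_channels,
)
```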
This commit adds a new Relay operation, "qnn.contrib_dense_pack", that supports different weight layouts (nn.dense and qnn.dense do not support this attribute). The new operation is a full analog of the "nn.contrib_dense_pack" operation, but in QNN space.
Force-pushed from 7e45dda to 15e85e5
* [Hexagon][QNN] Improve performance of qnn.mul

  This commit improves the performance of the qnn.mul operation without QNN canonicalization.

* [QNN][Relay][Topi] Add qnn.dense with weight layout

  This commit adds a new Relay operation, "qnn.contrib_dense_pack", that supports different weight layouts (nn.dense and qnn.dense do not support this attribute). The new operation is a full analog of the "nn.contrib_dense_pack" operation, but in QNN space.
This commit adds a new Relay operation, `qnn.contrib_dense_pack`, that supports different weight layouts (`nn.dense` and `qnn.dense` do not support this attribute). The new operation is a full analog of the `nn.contrib_dense_pack` operation, but in QNN space.

With this PR, the current QNN Dense can achieve a ~10x performance gain on the Hexagon target without QNN canonicalization (through the use of the `vrmpy` intrinsic).

This PR also includes a slight performance improvement for `qnn.mul` (without QNN canonicalization).
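As background on what a weight layout buys here, a minimal NumPy sketch of packing a plain (N, C) dense weight into blocks (our own illustration; the 32x4 block sizes are an assumption, chosen because vrmpy-style intrinsics consume 4-byte groups across vector lanes):

```python
import numpy as np

# Illustration only: repack an (N, C) weight into a blocked layout of
# shape (N // 32, C // 4, 32, 4), often written "NC32n4c". Blocking
# keeps the elements a vector intrinsic consumes together contiguous.
N, C = 64, 128
rng = np.random.default_rng(0)
weight_nc = rng.integers(-128, 128, size=(N, C), dtype=np.int8)

packed = (
    weight_nc.reshape(N // 32, 32, C // 4, 4)  # split N and C into blocks
    .transpose(0, 2, 1, 3)                     # -> (N // 32, C // 4, 32, 4)
)
assert packed.shape == (N // 32, C // 4, 32, 4)
```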