[FQ2I] Support converting dense -> add to qnn.dense -> add -> requantize #13578
masahi merged 12 commits into apache:main
Conversation
Icemist left a comment:

LGTM, one small code remark.
```diff
  # Y = alpha * A * B + beta * C
  alpha = float(attr.get("alpha", 1.0))
- beta = float(attr.get("beta", 1.0))
+ beta = float(attr.get("beta"))
```
I would keep the original line, `.get('beta', 1.0)`, since you cannot call `float()` on `None`, which `attr.get` can return.
Then below on L1409, you can just change `if beta is None` to `if 'beta' not in attr.keys()`, or something similar.
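The concern can be illustrated with a minimal sketch (the `attr` dict and `use_beta` flag here are hypothetical, standing in for the ONNX attribute handling, not the actual frontend code):

```python
# Hypothetical Gemm attributes: "beta" is absent, as it is optional in ONNX.
attr = {"alpha": 0.5}

# float(attr.get("beta")) raises TypeError because .get() returns None here.
try:
    beta = float(attr.get("beta"))
except TypeError:
    beta = None

# The suggested pattern: keep the 1.0 default so float() always gets a number,
# and test for the attribute's presence explicitly instead of testing for None.
beta = float(attr.get("beta", 1.0))  # always a float, never None
use_beta = "beta" in attr            # explicit presence check
```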
Though the change on L1409 might not be needed, since if beta == 1, the multiply can be removed by constant folding.
Constant folding doesn't work when beta is multiplying an output of QNN ops, since we cannot fold over them. The model in #13545 has `multiply(1f, dequantize(bias))` after dense, which was also causing some issues.
Moved `float(beta)` to the `else` block of `if beta is None`.
Actually, the whole purpose of this change was to avoid multiplying by 1.0, since `multiply(1f, dequantize(bias))` would be converted to `qnn.mul(quantize(1), bias)` by FQ2I. So I restored the original code. cc @Icemist
An alternative would be to add algebraic simplification to the `SimplifyExpr` pass.
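The restored handling can be sketched as follows (a hypothetical helper, not the actual frontend code; it assumes `beta` defaults to 1.0 as in the original line). The point is to never emit the multiply when it would be a no-op, so FQ2I never sees `multiply(1f, ...)`:

```python
def scale_bias(bias, attr):
    """Apply beta * bias, skipping the multiply entirely when beta == 1.0.

    Hypothetical sketch: multiplying by the constant 1.0 would later be
    rewritten by FQ2I into qnn.mul(quantize(1), bias), which we want to
    avoid. `bias` is a plain float here for illustration; in Relay it
    would be an expression and the last line would build relay.multiply.
    """
    beta = float(attr.get("beta", 1.0))
    if beta == 1.0:
        return bias      # no multiply emitted at all
    return beta * bias   # stands in for multiply(const(beta), bias)
```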
… `requantize` (apache#13578)

* wip
* hack to convert size-1 scale and zp tensors to scalar
* fix to binary op fast path
* check output zp
* add assert
* add comment
* lint
* clean up beta handling
* use regular binary op only for 32 bit add (bias addition)
* do float(beta) when we know that beta is not None
* restore original beta handling code to avoid mul by 1
* add comment on overflow
Closes #13545
The pattern of `dense -> add`, where the add is really bias addition, can often appear as the result of converting the ONNX `Gemm` op: see `tvm/python/tvm/relay/frontend/onnx.py`, line 1409 at `edfeba5`.
Currently, FQ2I tries to convert this `add` to `qnn.add`. But if this add is being used for bias addition, the `out_t.scale` and `out_t.zero_point` variables in `fake_quantization_to_integer.py`, which are used to initialize the output scale and zero point of the QNN binary operators, can be tensors rather than scalars. QNN binary operators do not support such output qparams, which led to the error reported in #13545.

For this reason, apparently we haven't supported converting `dense -> add` to `qnn.dense -> add -> requantize` in FQ2I when `add` is a bias add. The pattern `dense -> nn.bias_add` can be converted to `qnn.dense -> nn.bias_add -> requantize`, but we never use `nn.bias_add` after `dense`. So I added a code path in the FQ2I QNN binary op converter to identify such patterns and use regular binary ops rather than QNN ones.
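The fallback described above can be sketched roughly as follows (hypothetical helper names; the real converter lives in `fake_quantization_to_integer.py` and operates on Relay expressions, not strings):

```python
import numpy as np

def is_scalar_qparam(scale, zero_point):
    """True when both qparams are scalars (or size-1 tensors), i.e. usable
    as the output qparams of a QNN binary op. Hypothetical helper."""
    return np.asarray(scale).size == 1 and np.asarray(zero_point).size == 1

def convert_binary(op_name, out_scale, out_zp):
    """Choose between a QNN binary op and a regular one (sketch only).

    Per-channel (tensor) output qparams are not supported by QNN binary
    ops, so bias-add style patterns fall back to the regular op, with a
    requantize following, mirroring qnn.dense -> add -> requantize.
    """
    if is_scalar_qparam(out_scale, out_zp):
        return f"qnn.{op_name}"  # e.g. qnn.add, with scalar output qparams
    return op_name               # regular add; requantize handles the rest
```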
cc @AndrewZhaoLuo @Icemist @elvin-n