[QNN] Optimize requantize for power of 2 and fix dequantize for per-channel quantized input #6675
Conversation
@ZihengJiang Can you please review?
Force-pushed from ec78ab9 to 3e41e7d
(is_upward_rounding
    ? FixedPointMultiply(scaled_int32_t, fixed_point_multiplier, shift)
    : FixedPointMultiplyToNearest(scaled_int32_t, double_multiplier, input_shape));
if (is_upward_rounding && fixed_point_multiplier == (1 << 30)) {
Why must the fixed_point_multiplier be (1 << 30)?
So, we use frexp to represent a floating point number. It gives a float significand which is in [0.5, 1). For a power of 2, it is always 0.5. We convert the float significand into a fixed-point 32-bit integer, where the decimal point is between the first and second bits. 0.5 in this representation = 1 << 30.
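For illustration, a minimal standalone sketch of this encoding (not TVM code; it assumes the Q31 convention of scaling the frexp significand by 2^31):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // frexp decomposes a double into significand * 2^exp with the
  // significand in [0.5, 1). For any power-of-2 input (here 0.25 = 2^-2),
  // the significand is exactly 0.5.
  int exp;
  double significand = std::frexp(0.25, &exp);  // significand = 0.5, exp = -1

  // Encode the significand as a Q31 fixed-point int32 by scaling with 2^31.
  // 0.5 * 2^31 == 1 << 30, which is the value the requantize lowering
  // compares against to detect a power-of-2 multiplier.
  int32_t fixed_point_multiplier =
      static_cast<int32_t>(std::round(significand * (1LL << 31)));
  std::printf("significand=%f exp=%d multiplier=%d (1 << 30 = %d)\n",
              significand, exp, fixed_point_multiplier, 1 << 30);
  return 0;
}
```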
@anijain2305, can you add a small one-line comment regarding (1 << 30)? These days, aside from float32, many other float types are floating around.
@cbalint13 Added a comment, can you PTAL?
Force-pushed from 5825c48 to da11098
Can you describe the bug you are fixing for dequantize in the commit message, or link to an actual bug report? Ramana
dequantize in the main branch fails when the input is per-channel quantized. There is no bug report; I found this while working on quantizing models in Relay.
Cool, could the description above say something like "Fix dequantize for per-channel quantized input" instead of just "fixing a bug"?
giuseros left a comment
A few comments from me as well before this goes in!
// Power of 2 is determined by fixed_point_multiplier == 1 << 30. In the case of a power
// of 2, the fixed point multiplier represents a float value of 0.5, which in this fixed
// point representation is 1 << 30.
scaled_int32_t = PowerOfTwoMultiply(scaled_int32_t, shift - 1);
Does it make sense for this to go in FixedPointMultiply? This would give everybody using FixedPointMultiply the possibility to exploit this fix.
auto rounding_factor = 1 << (exp - 1);
auto rounded_t = Add(tensor, MakeConstantScalar(DataType::Int(32), rounding_factor));
out = RightShift(rounded_t, MakeConstantScalar(DataType::Int(32), exp));
Are you sure you don't need to convert to int64 upfront and then cast back to int32?
@giuseros Can you PTAL? I put all the logic in fixed_point_multiply. Regarding your comment on upcasting to int64, I don't think it's necessary, but please let me know if I am missing a corner case.
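For reference, a scalar sketch of the rounding scheme in the snippet above (`RoundingRightShiftScalar` is a hypothetical name, not a TVM function; it assumes `exp >= 1` and enough int32 headroom for the addition, which is exactly the corner case in question):

```cpp
#include <cstdint>
#include <cstdio>

// Round-to-nearest right shift by `exp`: add half the divisor, 2^(exp-1),
// before shifting. Assumes exp >= 1. If `value` could sit near INT32_MAX,
// the addition might overflow, which is where an int64 upcast would help.
int32_t RoundingRightShiftScalar(int32_t value, int exp) {
  int32_t rounding_factor = 1 << (exp - 1);
  return (value + rounding_factor) >> exp;
}

int main() {
  // 7 / 4 = 1.75 rounds up to 2; 5 / 4 = 1.25 rounds down to 1.
  std::printf("%d %d\n", RoundingRightShiftScalar(7, 2),
              RoundingRightShiftScalar(5, 2));
  return 0;
}
```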
Hi @anijain2305,
@ZihengJiang @u99127 Can you please take a look again? The PR changed after I incorporated @giuseros's comments.
LGTM! Thanks! @anijain2305
[QNN] Optimize requantize for power of 2 and fix dequantize for per-channel quantized input (apache#6675) * [QNN] Optimize requantize for power of 2 and bug in dequantize * Comments * Docs * Comments * Ethos
Optimize the lowering of the requantize op when the requantize scale is a power of 2. Additionally, fix dequantize for per-channel quantized input.
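For illustration, a hypothetical scalar model of the optimized lowering path (names are illustrative, not TVM's API; it assumes a downscale, i.e. 0 < scale < 1, so the total shift is at least 1):

```cpp
#include <cmath>
#include <cstdint>

// Scalar model of the optimized path: when the scale is a power of 2, the
// frexp significand is 0.5, the Q31 multiplier degenerates to 1 << 30, and
// the fixed-point multiply can be replaced by one rounding right shift.
int32_t RequantizeScalar(int32_t value, double scale) {
  int exp;
  int32_t multiplier =
      static_cast<int32_t>(std::round(std::frexp(scale, &exp) * (1LL << 31)));
  if (multiplier == (1 << 30)) {  // power-of-2 scale detected
    int total_shift = 1 - exp;    // fold the 0.5 significand into the shift
    int32_t rounding_factor = 1 << (total_shift - 1);
    return (value + rounding_factor) >> total_shift;
  }
  // A non power-of-2 scale would take the general fixed-point multiply
  // path (omitted in this sketch).
  return value;
}

int main() {
  // 7 * 0.25 = 1.75 rounds to 2.
  return RequantizeScalar(7, 0.25) == 2 ? 0 : 1;
}
```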