[RFC][Quantization] Designing and lowering of quantized ops #3512
shoubhik wants to merge 9 commits into apache:master
Conversation
This is the continuation of pull request apache#3367. In this PR I want to discuss the design and implementation of:
- Quantize op: fp32 -> i8/u8
- Dequantize op: i8/u8 -> fp32

I have added test cases to verify the correctness of the ops.
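For reference, a minimal scalar sketch of the affine quantize/dequantize arithmetic (an illustration only, not the PR's Relay lowering; the helper names QuantizeToInt8/DequantizeFromInt8 and the zero_point parameter are assumptions here):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Sketch: quantize fp32 -> i8. std::round gives round-away-from-zero,
// matching the rounding behavior discussed in the RFC.
int8_t QuantizeToInt8(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::round(x / scale)) + zero_point;
  // Clamp to the quantized type's representable range.
  q = std::max<int32_t>(q, std::numeric_limits<int8_t>::min());
  q = std::min<int32_t>(q, std::numeric_limits<int8_t>::max());
  return static_cast<int8_t>(q);
}

// Sketch: dequantize i8 -> fp32, the inverse affine mapping.
float DequantizeFromInt8(int8_t q, float scale, int32_t zero_point) {
  return scale * static_cast<float>(static_cast<int32_t>(q) - zero_point);
}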
1. Correcting docs.
2. Reordering Clip and Cast in the dequantize op for stability.
Is this ready for review? Have we converged on the design in the quantization RFC?
We have made good progress on the Quantization RFC, achieving clarity and convergence on many points.
const auto scale = MakeConstantScalar(Float(32), attrs->output_scale);
const int32_t min_val = get_qmin(out_dtype);
const int32_t max_val = get_qmax(out_dtype);
auto scale_data = Cast(Round(Divide(data, scale)), Int(32));
As discussed in the RFC, this should be round-away-from-zero, i.e. std::round in C++. What about this Round here? I don't see a definition of this Round.
This is the Relay Round operator, which translates to the LLVM round intrinsic:
https://llvm.org/docs/LangRef.html#llvm-round-intrinsic
The comment at that link says it has the same behavior as the libm round function, which is defined here:
https://sourceware.org/newlib/libm.html#lround
This is round-away-from-zero.
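A small standalone check of this (illustration only): std::round rounds halfway cases away from zero, while std::nearbyint under the default FE_TONEAREST mode rounds them to even.

#include <cmath>
#include <cstdio>

int main() {
  // round-away-from-zero: halfway cases move away from zero.
  std::printf("%.1f %.1f\n", std::round(2.5), std::round(-2.5));         // 3.0 -3.0
  // round-to-nearest-even for comparison (default rounding mode).
  std::printf("%.1f %.1f\n", std::nearbyint(2.5), std::nearbyint(-2.5)); // 2.0 -2.0
  return 0;
}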
@FrozenGene and @tqchen, any other major comments for the PR?
Mainly organizational issues; please make things consistent with what was discussed in #3531.
1. Correcting the file paths as suggested in the reviews.
2. Fixing lint issues.
@liangfu I made the changes you suggested.
DataType out_dtype;

TVM_DECLARE_ATTRS(QuantizeAttrs, "relay.attrs.QuantizeAttrs") {
  TVM_ATTR_FIELD(out_dtype)
I have seen that PR #3531 accepts quantized tensor data types of 8/16 bits; are we going to align?
namespace tvm {
namespace relay {

inline bool IsInt8(const DataType& dtype) {
hmmmmmm, do we really need such utils?
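For context, this is roughly what those predicates amount to (a sketch; only the Float(32) comparison is visible in the diff, the other bodies are my guesses from the signatures):

namespace tvm {
namespace relay {

// One-line dtype predicates under discussion.
inline bool IsInt8(const DataType& dtype) { return dtype == Int(8); }
inline bool IsUint8(const DataType& dtype) { return dtype == UInt(8); }
inline bool IsFloat32(const DataType& dtype) { return dtype == Float(32); }
inline bool IsQuantizedType(const DataType& dtype) {
  return IsInt8(dtype) || IsUint8(dtype);
}

}  // namespace relay
}  // namespace tvm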
  return dtype == Float(32);
}

inline bool IsQuantizedType(const DataType& dtype) {
Yes, I am waiting for #3531 to get merged; it seems to be in flux for now. Once it is merged, I will make the changes in this one.
case QuantizeOpType::Quantize:
  return IsFloat32(in_dtype);
case QuantizeOpType::Dequantize:
  return IsQuantizedType(in_dtype);
This type check goes against your API definition.
const DataType& in_dtype) {
  switch (op_type) {
    case QuantizeOpType::Quantize:
      return IsQuantizedType(in_dtype);
This type check goes against your API definition. Also @anijain2305, do we need to align?
I think if someone defines an OpType and conversions, they are responsible for including that code in one unified PR. Otherwise, don't include the OpType definition or conversions.
I agree, let's finish #3531 and then I will update this one.
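For what it's worth, a sketch of how the input/output checks could mirror the stated API (the helper names IsValidOpInputType/IsValidOpOutputType are hypothetical; IsFloat32/IsQuantizedType as in the utils above):

// Sketch: Quantize consumes fp32 and produces a quantized type;
// Dequantize consumes a quantized type and produces fp32.
inline bool IsValidOpInputType(QuantizeOpType op_type, const DataType& in_dtype) {
  switch (op_type) {
    case QuantizeOpType::Quantize:
      return IsFloat32(in_dtype);
    case QuantizeOpType::Dequantize:
      return IsQuantizedType(in_dtype);
    default:
      return false;
  }
}

inline bool IsValidOpOutputType(QuantizeOpType op_type, const DataType& out_dtype) {
  switch (op_type) {
    case QuantizeOpType::Quantize:
      return IsQuantizedType(out_dtype);
    case QuantizeOpType::Dequantize:
      return IsFloat32(out_dtype);
    default:
      return false;
  }
}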
  }
}

inline const int32_t GetQmin(const DataType& dtype) {
The implementation seems to go against PR #3531, whose code is simpler.
} else if (IsUint32(dtype)) {
  return std::numeric_limits<uint32_t>::min();
}
LOG(FATAL) << "Type not supported\n";
Maybe include this in an else branch.
return -1;
}

inline const int32_t GetQmax(const DataType& dtype) {
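Putting the two review suggestions together, these helpers could look like this (a sketch restricted to the 8-bit types so the int32_t return is always exact; not the PR's final code):

// Sketch: qmin/qmax via std::numeric_limits, with the unsupported-type
// fatal moved into the trailing else branch as suggested above.
inline int32_t GetQmin(const DataType& dtype) {
  if (IsInt8(dtype)) {
    return std::numeric_limits<int8_t>::min();
  } else if (IsUint8(dtype)) {
    return std::numeric_limits<uint8_t>::min();
  } else {
    LOG(FATAL) << "Type not supported";
    return -1;  // unreachable
  }
}

inline int32_t GetQmax(const DataType& dtype) {
  if (IsInt8(dtype)) {
    return std::numeric_limits<int8_t>::max();
  } else if (IsUint8(dtype)) {
    return std::numeric_limits<uint8_t>::max();
  } else {
    LOG(FATAL) << "Type not supported";
    return -1;  // unreachable
  }
}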
Regarding 16/32-bit quantization, I have opened a discussion in RFC #3591.