References:
- Model Quantization: Concepts, Methods, and Why It Matters
Quantization Algorithms
Quantization maps a floating-point number $x$ to a lower-precision value $x_q$, typically a $b$-bit integer.
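Affine/Asymmetric Quantization: Defined by a scale factor $s$ and a zero-point $z$:

$$x_q = \operatorname{clip}\left(\operatorname{round}\left(\frac{x}{s} + z\right),\ q_{\min},\ q_{\max}\right)$$

where:
- round converts the scaled value to the nearest quantized representation.
- clip ensures the scaled value stays within the range $[q_{\min}, q_{\max}]$ of the quantized representation (for a signed $b$-bit integer, $[-2^{b-1},\ 2^{b-1}-1]$).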
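To recover an approximate full-precision value $\hat{x}$ from a quantized value $x_q$, given the scale factor $s$ and zero-point $z$:

$$\hat{x} = s\,(x_q - z)$$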
Both round and clip discard information, so the recovered value generally differs from the original. These errors are inherent to the quantization process.
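The affine round trip can be sketched in plain Python to make the two error sources visible. This is a minimal illustration, not a reference implementation: the helper names (`affine_params`, `quantize`, `dequantize`), the choice of a signed 8-bit range, and the min/max calibration of the scale and zero-point are all assumptions made for the example.

```python
def affine_params(x_min, x_max, bits=8):
    """Assumed calibration: map [x_min, x_max] onto the signed `bits`-bit range."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    s = (x_max - x_min) / (q_max - q_min)   # scale factor
    z = round(q_min - x_min / s)            # zero-point, kept as an integer
    return s, z, q_min, q_max


def quantize(x, s, z, q_min, q_max):
    """x_q = clip(round(x / s + z), q_min, q_max)."""
    return max(q_min, min(q_max, round(x / s + z)))


def dequantize(x_q, s, z):
    """x_hat = s * (x_q - z)."""
    return s * (x_q - z)


# Hypothetical float range for the example.
s, z, q_min, q_max = affine_params(-1.0, 3.0)

inside = 0.5     # lies inside the calibrated range: only rounding error
outside = 10.0   # lies outside the range: clipped to q_max

inside_hat = dequantize(quantize(inside, s, z, q_min, q_max), s, z)
outside_q = quantize(outside, s, z, q_min, q_max)
outside_hat = dequantize(outside_q, s, z)

print("rounding error:", abs(inside - inside_hat))
print("clipping error:", abs(outside - outside_hat))
```

For a value that is not clipped, the rounding error is bounded by half the scale factor, while the clipping error grows with the distance from the calibrated range.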
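Symmetric Quantization: Fixing the zero-point at $z = 0$ simplifies quantization and dequantization to:

$$x_q = \operatorname{clip}\left(\operatorname{round}\left(\frac{x}{s}\right),\ q_{\min},\ q_{\max}\right), \qquad \hat{x} = s\,x_q$$

where $s$ is the scale factor and $[q_{\min}, q_{\max}]$ is the range of the quantized representation.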
NOTE: The most commonly used quantization algorithm is symmetric quantization, since affine quantization does not offer a significant boost in model performance. NVIDIA TensorRT and Model Optimizer use symmetric quantization.
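A minimal sketch of symmetric quantization follows. Deriving the scale from the largest absolute value and using the restricted range $[-127, 127]$ for 8 bits are assumptions for this example, as are the function names. The notes above do not specify how the scale is chosen.

```python
def symmetric_scale(values, bits=8):
    """Assumed calibration: the largest magnitude maps to the largest integer."""
    q_max = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in values)
    s = amax / q_max if amax > 0 else 1.0   # avoid a zero scale for all-zero input
    return s, q_max


def quantize_symmetric(x, s, q_max):
    """x_q = clip(round(x / s), -q_max, q_max); the zero-point is fixed at 0."""
    return max(-q_max, min(q_max, round(x / s)))


def dequantize_symmetric(x_q, s):
    """x_hat = s * x_q."""
    return s * x_q


# Hypothetical weights for the example.
weights = [-0.62, -0.1, 0.0, 0.25, 1.27]
s_sym, q_max_sym = symmetric_scale(weights)

quantized = [quantize_symmetric(w, s_sym, q_max_sym) for w in weights]
recovered = [dequantize_symmetric(q, s_sym) for q in quantized]

print(quantized)
print(recovered)
```

With the zero-point fixed at 0, a float 0.0 always maps to integer 0 and back to exactly 0.0, and dequantization reduces to a single multiplication.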