tensor.quantize_linear
Quantizes a Tensor using linear quantization.
The linear quantization operator. It consumes a high-precision tensor, a scale, and a zero point to compute the low-precision (quantized) tensor. The scale factor and zero point must have the same shape; each can be either a scalar, for per-tensor (per-layer) quantization, or a 1-D tensor, for per-axis quantization. The quantization formula is y = saturate((x / y_scale) + y_zero_point). Saturation clamps the result to [-128, 127]. The quotient (x / y_scale) is rounded to the nearest integer, with ties going to the nearest even value.
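The arithmetic described above can be sketched in plain Python. This is only a reference sketch of the formula for the per-tensor (scalar scale and zero point) case, not the library's implementation or API; the function name `quantize_linear_reference` and the `qmin`/`qmax` parameters are hypothetical names chosen for illustration.

```python
def quantize_linear_reference(x, y_scale, y_zero_point, qmin=-128, qmax=127):
    """Sketch of y = saturate(round(x / y_scale) + y_zero_point), per-tensor case.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    out = []
    for value in x:
        # Python's built-in round() sends ties to the nearest even integer,
        # which matches the rounding rule described in the text.
        rounded = round(value / y_scale)
        shifted = rounded + y_zero_point
        # Saturate: clamp into the representable range [qmin, qmax].
        out.append(max(qmin, min(qmax, shifted)))
    return out


# 0.5 rounds to 0 and 2.5 rounds to 2 (ties to even); -300.0 and 400.0 saturate.
print(quantize_linear_reference([0.0, 1.0, 3.0, 5.0, -300.0, 400.0], 2.0, 1))
# prints [1, 1, 3, 3, -128, 127]
```

Per-axis quantization would apply the same steps with a separate scale and zero point for each slice along the chosen axis; that case is omitted here to keep the sketch short.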
Args
self (@Tensor<T>) - The input tensor.
y_scale (@Tensor<T>) - Scale for doing quantization to get y.
y_zero_point (@Tensor<T>) - Zero point for doing quantization to get y.
Returns
A new Tensor<Q> with the same shape as the input tensor, containing the quantized values.
Type Constraints
u32 tensors are not supported.
Examples