tensor.qlinear_mul
Performs element-wise multiplication of two quantized tensors.
It consumes two quantized input tensors with their scales and zero points, along with the scale and zero point of the output, and computes the quantized output. The two inputs are first dequantized, multiplied element-wise, and the result is then requantized using the quantization formula `y = saturate((x / y_scale) + y_zero_point)`. Broadcasting is supported. Each scale and zero point must have the same shape and the same type: either a scalar (per-tensor quantization) or an N-D tensor (per-row quantization for `a`, per-column quantization for `b`).
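For intuition, the dequantize → multiply → requantize pipeline described above can be sketched in NumPy. This is a minimal illustration of the semantics only, not the Orion API; the function name `qlinear_mul` here and the use of float32 intermediates are assumptions for the sketch.

```python
import numpy as np

def qlinear_mul(a, a_scale, a_zero_point,
                b, b_scale, b_zero_point,
                y_scale, y_zero_point):
    # Dequantize both inputs: x_real = (x_q - zero_point) * scale
    a_real = (a.astype(np.float32) - a_zero_point) * a_scale
    b_real = (b.astype(np.float32) - b_zero_point) * b_scale
    # Multiply element-wise (NumPy broadcasting applies, as in the operator)
    y_real = a_real * b_real
    # Requantize: y = saturate((y_real / y_scale) + y_zero_point)
    y = np.round(y_real / y_scale) + y_zero_point
    return np.clip(y, -128, 127).astype(np.int8)

a = np.array([2, 4, 6], dtype=np.int8)
b = np.array([1, 2, 3], dtype=np.int8)
# Per-tensor (scalar) scales and zero points
out = qlinear_mul(a, 0.5, 0, b, 0.25, 0, 0.125, 0)
# a dequantizes to [1, 2, 3], b to [0.25, 0.5, 0.75];
# the product [0.25, 1.0, 2.25] requantizes to [2, 8, 18]
```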
Args

* `self`(`@Tensor<i8>`) - The first tensor to be multiplied (a).
* `a_scale`(`@Tensor<T>`) - Scale for input `a`.
* `a_zero_point`(`@Tensor<T>`) - Zero point for input `a`.
* `b`(`@Tensor<i8>`) - The second tensor to be multiplied (b).
* `b_scale`(`@Tensor<T>`) - Scale for input `b`.
* `b_zero_point`(`@Tensor<T>`) - Zero point for input `b`.
* `y_scale`(`@Tensor<T>`) - Scale for output.
* `y_zero_point`(`@Tensor<T>`) - Zero point for output.
Returns
A new `Tensor<i8>`, containing the quantized result of the element-wise multiplication of the dequantized inputs.
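Because the result is requantized into `i8`, products whose requantized value falls outside `[-128, 127]` are saturated to that range. A small NumPy sketch of this saturation behavior (an illustration of the semantics, not the Orion API):

```python
import numpy as np

# Identity scales/zero points so the raw products are requantized directly.
a = np.array([100, -100], dtype=np.int8)
b = np.array([100, 100], dtype=np.int8)
a_real = (a.astype(np.float32) - 0) * 1.0   # [100.0, -100.0]
b_real = (b.astype(np.float32) - 0) * 1.0   # [100.0, 100.0]
y = np.round(a_real * b_real / 1.0) + 0     # [10000.0, -10000.0]
# Both products overflow the i8 range and saturate to 127 and -128.
y_sat = np.clip(y, -128, 127).astype(np.int8)
```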
Type Constraints
* `u32` tensor, not supported.
* `fp8x23wide` tensor, not supported.
* `fp16x16wide` tensor, not supported.
Example
use core::array::{ArrayTrait, SpanTrait};

use orion::operators::tensor::{TensorTrait, Tensor, I8Tensor, FP16x16Tensor};
use orion::numbers::{FP16x16, FP16x16Impl, FixedTrait};