Currently, for a network to be quantized using the PTQ algorithm, all of its nodes must have a linear response (FC, Conv2D, ...).
The reason behind this requirement is purely mathematical: during PTQ, the input values of each node are calibrated using scalar multiplications by scaling factors.
As a result, the output of a node whose inputs are rescaled by a scaling factor must be equal to the scaling factor multiplied by the output that the node would produce if its inputs were left untouched. That is:
`Node(scaling_factor * x) = scaling_factor * Node(x)` (1)
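As a quick sanity check, here is a minimal NumPy sketch (not part of the MR) showing that a bias-free fully-connected node verifies (1); note that a bias term would break (1) unless the bias is rescaled as well:

```python
# Minimal sketch (NumPy): a bias-free FC node satisfies condition (1)
# exactly, since matrix multiplication is linear.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))      # hypothetical FC weights
x = rng.standard_normal(8)           # hypothetical input
scaling_factor = 0.37                # any rescaling

fc = lambda v: W @ v                 # Node(x) = W x (no bias term)

lhs = fc(scaling_factor * x)         # Node(scaling_factor * x)
rhs = scaling_factor * fc(x)         # scaling_factor * Node(x)
assert np.allclose(lhs, rhs)         # (1) holds for linear nodes
```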
The linearity condition is not verified by the ReLU, yet it is seamlessly integrated into the PTQ pipeline. The reason is, again, mathematical: the scaling factor is not an arbitrary real number, but a positive one. As a result, rescaling the input of a node by `scaling_factor` cannot change its sign, and the only node responses satisfying (1) are those that are linear both on R+ and on R- (separately), that is, ReLUs and LeakyReLUs (and that is really all).
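A quick numerical illustration of this claim (a sketch, not code from the MR):

```python
# Minimal sketch (NumPy): with a *positive* scaling factor, ReLU and
# LeakyReLU verify (1), while a Sigmoid does not.
import numpy as np

x = np.linspace(-3.0, 3.0, 7)
s = 2.5                                    # positive scaling factor

relu = lambda v: np.maximum(v, 0.0)
leaky_relu = lambda v: np.where(v >= 0.0, v, 0.1 * v)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

assert np.allclose(relu(s * x), s * relu(x))              # (1) holds
assert np.allclose(leaky_relu(s * x), s * leaky_relu(x))  # (1) holds
assert not np.allclose(sigmoid(s * x), s * sigmoid(x))    # (1) fails
```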
Now assume that a non-linear operator, say a Sigmoid, receives a rescaled input. As (1) is not verified, we need to rescale its input back to its original range, by cancelling the product of all the previous scaling factors, and revert this rescaling after the output computation.
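The following sketch illustrates this rescaling trick, assuming the accumulated scaling factor `s` of all previous nodes is known; `wrapped_sigmoid` is a hypothetical name, not the MR's actual implementation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def wrapped_sigmoid(x_rescaled, s):
    x = x_rescaled / s          # cancel the accumulated scaling factor
    y = sigmoid(x)              # evaluate in the original float range
    return s * y                # revert the rescaling on the output

# By construction, the wrapped node verifies (1):
#     wrapped_sigmoid(s * x, s) == s * sigmoid(x)
x = np.linspace(-3.0, 3.0, 7)
s = 2.5
assert np.allclose(wrapped_sigmoid(s * x, s), s * sigmoid(x))
```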
The goal of this MR is not only to integrate the Sigmoid operator, but also to provide a reference example for the integration of any non-linear, non-quantized operator into the PTQ pipeline.
The main changes are:

- `insertScalingNodes()`: add an extra scaling node before non-linear nodes (see the sketch after this list)
- `normalizeParameters()` and `normalizeActivations()`: rescale the inputs/outputs of non-linear nodes
- `quantizeNormalizedNetwork()`: keep the float dynamic of non-linear nodes untouched

Note: in addition to these changes, several modifications were made to improve the code quality.
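For illustration, here is a hypothetical sketch of what inserting scaling nodes could look like conceptually: walk the graph and place an explicit scaling node in front of every non-linear node. The graph representation and the set of non-linear types below are invented for this example and are not the actual library API:

```python
# Illustrative set of non-linear node types; not exhaustive.
NON_LINEAR_TYPES = {"Sigmoid", "Tanh", "Softmax"}

def insert_scaling_nodes(nodes):
    """nodes: ordered list of dicts like {"type": "Conv2D", ...}."""
    result = []
    for node in nodes:
        if node["type"] in NON_LINEAR_TYPES:
            # Extra scaling node, calibrated later by the PTQ passes.
            result.append({"type": "Scaling", "factor": 1.0})
        result.append(node)
    return result

network = [{"type": "Conv2D"}, {"type": "ReLU"}, {"type": "Sigmoid"}]
print(insert_scaling_nodes(network))
# [{'type': 'Conv2D'}, {'type': 'ReLU'},
#  {'type': 'Scaling', 'factor': 1.0}, {'type': 'Sigmoid'}]
```

Note that the ReLU needs no scaling node, since it already verifies (1).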