
Quantization Updates and Optimizations

Benjamin Halimi requested to merge bhalimi/aidge_quantization:DevPTQ into main


This Merge Request releases several major improvements and features that enrich the post-training quantization (PTQ) routine.


Implementation of the Clipping Optimization feature

By default, the activations are spread between their min and max values. A smarter clipping policy reduces the available range of values but increases the resolution of the most frequent ones. To choose the optimal clipping value, we first compute the histogram of the activations. Then, for each candidate clipping value, we compute the difference between this base histogram and its quantized version, and pick the clipping value that minimizes this difference. The metric used to measure this difference can be either the MSE or the KL-Divergence.
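The search described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code of QuantPTQ.cpp: the helper names (`fake_quantize`, `histogram`, `clipping_error`, `best_clipping`), the symmetric signed grid, the histogram of absolute values and the linear scan over candidates are all hypothetical choices, and only the MSE metric is shown.

```python
def fake_quantize(x, clip, n_bits):
    """Clip x to [-clip, clip], then round it onto a signed n_bits grid."""
    levels = 2 ** (n_bits - 1) - 1
    step = clip / levels
    q = max(-levels, min(levels, round(x / step)))
    return q * step


def histogram(values, n_bins):
    """Histogram of |values| over [0, max|values|]: (bin centers, counts)."""
    max_abs = max(abs(v) for v in values)
    width = max_abs / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int(abs(v) / width), n_bins - 1)] += 1
    centers = [(i + 0.5) * width for i in range(n_bins)]
    return centers, counts


def clipping_error(centers, counts, clip, n_bits):
    """Histogram-weighted MSE between the values and their quantized version."""
    total = sum(counts)
    return sum(n * (c - fake_quantize(c, clip, n_bits)) ** 2
               for c, n in zip(centers, counts)) / total


def best_clipping(values, n_bits=4, n_bins=200, n_candidates=100):
    """Scan candidate clipping values and keep the one with the lowest MSE."""
    centers, counts = histogram(values, n_bins)
    max_abs = max(abs(v) for v in values)
    candidates = [max_abs * i / n_candidates for i in range(1, n_candidates + 1)]
    return min(candidates, key=lambda c: clipping_error(centers, counts, c, n_bits))


# Bulk of the activations in [0, 1), plus a single outlier at 4.0.
activations = [i / 1000 for i in range(1000)] + [4.0]
clip = best_clipping(activations)
```

With this toy data, the selected `clip` ends up below the maximum value of 4.0: saturating the single outlier costs less than the resolution lost on the bulk of the distribution.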

Better handling of joining nodes

Joining nodes, that is, nodes that merge two branches into one (Add, Concat, ...), were previously handled differently for each node type. This approach was neither generic nor, in some cases, efficient. To address this issue, we now use a maximum arbitration on the propagated scaling factors. As a consequence, the output of a joining node can overflow, for example the sum in the case of Add nodes. That is why we now insert a scaling node after each joining node.
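As a rough illustration of the maximum arbitration on an Add node, the sketch below assumes the convention `real_value = normalized_value * scaling_factor`. The function names `join_add` and `rescale` are hypothetical, and the output range, which would normally come from calibration, is simply measured on the toy data.

```python
def join_add(norm_a, scale_a, norm_b, scale_b):
    """Add two normalized branches after aligning them on the larger scaling factor.

    Assumed convention: real_value = normalized_value * scaling_factor.
    """
    shared = max(scale_a, scale_b)  # maximum arbitration
    a = [x * scale_a / shared for x in norm_a]
    b = [x * scale_b / shared for x in norm_b]
    return [x + y for x, y in zip(a, b)], shared


def rescale(values, output_range):
    """Scaling node inserted after the join: brings the values back into [-1, 1]."""
    return [v / output_range for v in values]


summed, shared = join_add([1.0, -0.5], 2.0, [1.0, 0.25], 4.0)
output_range = max(abs(v) for v in summed)  # would come from calibration
normalized = rescale(summed, output_range)
output_scale = shared * output_range        # scaling factor propagated downstream
```

Here the aligned sum reaches 1.5, outside the normalized range [-1, 1], which is the overflow that the extra scaling node corrects.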

Implementation of the Cross Layer Equalization

Some very deep networks exhibit large differences in weight ranges across layers. To balance the dynamics of the values propagated through the network, it can be useful to rebalance those weight ranges, which leads to a better quantization. This is the role of the Cross Layer Equalization (CLE) routine. Note that the CLE is only implemented for feed-forward architectures.
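A minimal sketch of the idea on two consecutive dense layers separated by a ReLU is given below. It assumes a per-layer equalization with the factor `sqrt(r1 / r2)`, where `r1` and `r2` are the weight ranges of the two layers. The actual routine may proceed differently, and all names here are hypothetical.

```python
import math


def weight_range(weights):
    """Largest absolute weight of a layer (weights stored as a list of rows)."""
    return max(abs(w) for row in weights for w in row)


def equalize_pair(w1, b1, w2):
    """Rebalance two consecutive layers so that both end up with the same range.

    Dividing layer 1 (weights and biases) by s and multiplying layer 2 by s
    leaves the output unchanged, because ReLU(x / s) == ReLU(x) / s for s > 0.
    """
    s = math.sqrt(weight_range(w1) / weight_range(w2))
    w1_eq = [[w / s for w in row] for row in w1]
    b1_eq = [b / s for b in b1]
    w2_eq = [[w * s for w in row] for row in w2]
    return w1_eq, b1_eq, w2_eq


def forward(x, w1, b1, w2):
    """Dense -> ReLU -> Dense, used to check that the output is preserved."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


w1, b1 = [[8.0, -4.0], [2.0, 6.0]], [1.0, -2.0]
w2 = [[0.5, -0.25]]
w1_eq, b1_eq, w2_eq = equalize_pair(w1, b1, w2)
```

In this example the ranges 8.0 and 0.5 both become 2.0, while `forward` returns the same output before and after the equalization.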

PTQ without rounding

We add an option to perform the PTQ without rounding. It is mainly intended for testing and debugging purposes.

Sign Optimization

Network architectures often contain nodes with positive-valued outputs (e.g. ReLUs), and representing those values as signed numbers is inefficient, since half of the range is never used. Representing them as unsigned numbers doubles their resolution. To do so, we first analyze the graph locally around each node and determine whether its inputs and outputs can be represented as unsigned numbers. This is the goal of the ComputeSignMap routine. The resulting sign map is later used in the post-calibration quantization routine.
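To make the idea concrete, here is a toy sign analysis on a simple chain of nodes. It is not the ComputeSignMap routine itself: the function `sketch_sign_map`, the set of sign-preserving node types and the rule that the network input is treated as signed are all assumptions made for the illustration.

```python
# Node types assumed (for this sketch only) to keep the sign of their input.
SIGN_PRESERVING = {"MaxPool", "AvgPool", "Scaling"}


def sketch_sign_map(nodes, input_unsigned=False):
    """Map each node name to (input_unsigned, output_unsigned) along a chain.

    nodes is a list of (name, node_type) pairs in execution order.
    """
    sign_map = {}
    for name, node_type in nodes:
        if node_type == "ReLU":
            output_unsigned = True            # a ReLU output is never negative
        elif node_type in SIGN_PRESERVING:
            output_unsigned = input_unsigned  # sign is inherited from the input
        else:
            output_unsigned = False           # e.g. Conv: weights can flip the sign
        sign_map[name] = (input_unsigned, output_unsigned)
        input_unsigned = output_unsigned
    return sign_map


def max_level(n_bits, unsigned):
    """Largest integer level available for a signed or unsigned representation."""
    return 2 ** n_bits - 1 if unsigned else 2 ** (n_bits - 1) - 1


chain = [("conv0", "Conv"), ("relu0", "ReLU"), ("pool0", "MaxPool"), ("conv1", "Conv")]
sign_map = sketch_sign_map(chain)
```

On 8 bits, `max_level` gives 255 levels for an unsigned output against 127 for a signed one, which is the doubling of resolution mentioned above.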

Single Shift

After the quantization, all values except the scaling factors are stored as integers. We would like to avoid the floating-point operations caused by the scaling nodes. One possible solution is to approximate each rescaling by a single shift, that is, a multiplication by a power of two. To do so, we first determine the value of the shift, and then compensate for the approximation by multiplying the weights of the previous layer by the ratio between the exact scaling factor and its power-of-two approximation.
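The two steps can be sketched as follows. The function name `single_shift` is hypothetical, and rounding the exponent upwards is an assumption chosen here so that the compensation ratio stays in (0.5, 1] and the weights never grow. The actual implementation may select the shift differently.

```python
import math


def single_shift(scaling, prev_weights):
    """Replace a scaling factor by a power of two and compensate in the weights.

    Returns the exponent of the power of two and the compensated weights.
    """
    shift = math.ceil(math.log2(scaling))  # assumption: round the exponent up
    ratio = scaling / 2.0 ** shift         # exact factor / approximated factor
    return shift, [w * ratio for w in prev_weights]


weights = [10.0, -20.0]
scaling = 0.3
shift, new_weights = single_shift(scaling, weights)

# Compensated weights times the power of two match the original rescaling.
original = [w * scaling for w in weights]
shifted = [w * 2.0 ** shift for w in new_weights]
```

In this example `shift` is -1, so the scaling node becomes a multiplication by 0.5 (a right shift by one bit on integers), and the weights absorb the remaining factor of 0.6. In a real integer pipeline the compensated weights would be rounded again, which is not modeled here.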

Global adaptation of the code

Several changes were made only to ensure that the PTQ is compliant with the 'main' branches of the bundle. For now, only 'aidge_core' needs to be on 'dev', in order to benefit from a fix of the scheduler. The test script '' was also updated, and the pybind docstrings were completed.

Modified files

QuantPTQ.cpp, QuantPTQ.hpp, pybind_QuantPTQ.cpp and

Edited by Benjamin Halimi
