Quantization Updates and Optimizations
Intro
This merge request introduces several major improvements and new features to the PTQ (post-training quantization) routine.
Changes
Implementation of the Clipping Optimization feature
By default, activations are spread over the full range between their minimum and maximum values. A smarter clipping policy can reduce the representable range while increasing the resolution of the most frequent values. To choose the optimal clipping value, we first compute the histogram of the activations. Then, for each candidate clipping value, we compute the difference between this base histogram and its quantized version, and pick the clipping value that minimizes this difference. The metric used to measure this difference can be either the MSE or the KL divergence.
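A minimal Python sketch of the histogram-based clipping search. It assumes a histogram of absolute activation values over [0, max_val], unsigned uniform quantization, and an evenly spaced grid of candidate clipping values. The names `optimal_clipping`, `clipping_error` and `quantize_value` are hypothetical, and the exact definition of both metrics is an assumption: here MSE is the count-weighted squared error on bin centers, and KL compares the base histogram with the histogram of the quantized bin centers.

```python
import math


def quantize_value(v, clip, n_levels):
    """Uniformly quantize a non-negative value on [0, clip] using n_levels steps."""
    step = clip / n_levels
    return min(round(v / step), n_levels) * step  # values above clip saturate


def _requantized_hist(hist, centers, bin_width, clip, n_levels):
    """Move each bin's count to the bin that holds its quantized center."""
    q_hist = [0.0] * len(hist)
    for count, v in zip(hist, centers):
        j = min(int(quantize_value(v, clip, n_levels) / bin_width), len(hist) - 1)
        q_hist[j] += count
    return q_hist


def _kl(p_counts, q_counts, eps=1e-10):
    """KL(P || Q) between two count histograms, with empty Q bins smoothed by eps."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    kl = 0.0
    for p, q in zip(p_counts, q_counts):
        if p > 0:
            p_n = p / p_total
            kl += p_n * math.log(p_n / max(q / q_total, eps))
    return kl


def clipping_error(hist, max_val, clip, n_bits=8, metric="mse"):
    """Difference between the base histogram and its quantized version."""
    n_bins = len(hist)
    bin_width = max_val / n_bins
    centers = [(i + 0.5) * bin_width for i in range(n_bins)]
    n_levels = 2 ** n_bits - 1  # unsigned range assumed
    if metric == "mse":
        sq_err = sum(c * (v - quantize_value(v, clip, n_levels)) ** 2
                     for c, v in zip(hist, centers))
        return sq_err / sum(hist)
    if metric == "kl":
        return _kl(hist, _requantized_hist(hist, centers, bin_width, clip, n_levels))
    raise ValueError(f"unknown metric: {metric}")


def optimal_clipping(hist, max_val, n_bits=8, metric="mse", n_candidates=100):
    """Try evenly spaced clipping values and keep the one with the lowest error."""
    candidates = [max_val * k / n_candidates for k in range(1, n_candidates + 1)]
    return min(candidates,
               key=lambda c: clipping_error(hist, max_val, c, n_bits, metric))
```

Since `max_val` itself is always a candidate, the selected clipping can never do worse than the default min/max policy under the chosen metric.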
Better handling of joining nodes
Joining nodes, that is, nodes that merge several branches into one (Add, Concat, ...), were previously handled differently depending on their type. This approach was not generic and, in some cases, inefficient. To address this, the scaling factors propagated along the incoming branches are now arbitrated by taking their maximum. As a consequence, the output of a joining node can exceed the quantized range; for example, the sum computed by an Add node may overflow. For that reason, a scaling node is now inserted after each joining node.
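The following Python sketch illustrates max arbitration for an Add node on plain lists. It assumes that each branch carries values normalized into [-1, 1] together with a scaling factor s_i such that real = normalized * s_i; this convention and the helpers `arbitrate_scales` and `quantized_add` are hypothetical, not taken from QuantPTQ.cpp.

```python
def arbitrate_scales(branch_scales):
    """Max arbitration: every branch is brought to the largest scaling factor."""
    common = max(branch_scales)
    return common, [s / common for s in branch_scales]


def quantized_add(branches, branch_scales):
    """Add normalized branches (real = value * scale) and renormalize the sum.

    Returns the normalized output and its scaling factor.
    """
    common, ratios = arbitrate_scales(branch_scales)
    # Align every branch on the common scale: value * s_i / s_common.
    aligned = [[v * r for v in values] for values, r in zip(branches, ratios)]
    summed = [sum(column) for column in zip(*aligned)]
    # The sum may leave [-1, 1] (overflow). The scaling node inserted after the
    # join brings it back into range; here its factor is measured directly on
    # the data, whereas a real pipeline would rely on calibration statistics.
    out_max = max(abs(v) for v in summed) or 1.0
    return [v / out_max for v in summed], common * out_max
```

For two branches that are both saturated at 1.0 with scale 1.0, the raw sum is 2.0; the post-join scaling brings it back to 1.0 and doubles the output scaling factor.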
Implementation of the Cross Layer Equalization
Some very deep networks exhibit large differences in weight ranges across layers. Rebalancing those weight ranges evens out the dynamics of the values propagated through the network, which generally leads to better quantization. This is the role of the Cross Layer Equalization (CLE) routine. Note that CLE is only implemented for feed-forward architectures.
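As an illustration, here is a sketch of the classic per-channel equalization between two consecutive layers separated by a ReLU, in the formulation of Nagel et al. (2019), "Data-Free Quantization Through Weight Equalization and Bias Correction". Whether the routine in this MR equalizes per channel or per layer is not stated here, so treat `equalize_pair` as a hypothetical helper. The rescaling preserves the network output because ReLU(z / s) = ReLU(z) / s for any s > 0.

```python
import math


def equalize_pair(w1, b1, w2):
    """Per-channel cross-layer equalization of two layers separated by a ReLU.

    w1[i] is the weight row of output channel i of the first layer, b1[i] its
    bias, and w2[o][i] the weight linking channel i to output o of the second
    layer. Returns rescaled copies; the inputs are left untouched.
    """
    w1 = [list(row) for row in w1]
    b1 = list(b1)
    w2 = [list(row) for row in w2]
    for i in range(len(w1)):
        r1 = max(abs(v) for v in w1[i])        # range of channel i in layer 1
        r2 = max(abs(row[i]) for row in w2)    # range of channel i in layer 2
        if r1 == 0.0 or r2 == 0.0:
            continue                           # nothing to balance
        s = math.sqrt(r1 / r2)                 # both ranges become sqrt(r1 * r2)
        w1[i] = [v / s for v in w1[i]]
        b1[i] /= s
        for row in w2:
            row[i] *= s
    return w1, b1, w2
```

After the call, channel i spans the same range sqrt(r1 * r2) in both layers, so neither layer forces a coarse quantization grid on the other.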
PTQ without rounding
We added an option to perform the PTQ without rounding. It is mainly intended for testing and debugging purposes.
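To illustrate what disabling rounding isolates, here is a sketch of a symmetric fake-quantization step with a hypothetical `no_rounding` flag (not the actual option name in this MR). Without rounding, quantizing and dequantizing only scales and clamps, so any remaining error must come from another part of the pipeline.

```python
def fake_quantize(x, scale, n_bits=8, no_rounding=False):
    """Map x onto the signed n-bit grid and back to real values.

    scale is the calibrated maximum absolute value. With no_rounding=True the
    value is only scaled and clamped, so the result is exact inside the range.
    """
    q_max = 2 ** (n_bits - 1) - 1
    v = x / scale * q_max
    if not no_rounding:
        v = round(v)                    # Python rounds halves to even
    v = max(-q_max, min(q_max, v))      # symmetric clamp
    return v * scale / q_max
```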
Sign Optimization
Many network architectures contain nodes whose outputs are always non-negative (e.g. ReLU). Storing such values as signed numbers is inefficient, since the negative half of the signed range is never used; representing them as unsigned numbers doubles their resolution. To enable this, we first analyze the graph locally around each node to determine whether its inputs and outputs can be represented as unsigned numbers. This is the purpose of the ComputeSignMap routine. The resulting sign map is later used by the post-calibration quantization routine.
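A simplified sketch of sign-map propagation, assuming the graph is given as a topologically sorted list of (name, op_type) pairs plus a dictionary of parents. The op-type sets and `compute_sign_map` are hypothetical; unlike ComputeSignMap, this sketch only tracks output signs, not per-input signs. `max_code` shows where the gain in resolution comes from.

```python
# Hypothetical op classification, for illustration only.
UNSIGNED_OUTPUT_OPS = {"ReLU"}                           # output is always >= 0
SIGN_PRESERVING_OPS = {"MaxPooling", "Concat", "Add"}    # >= 0 in -> >= 0 out


def compute_sign_map(nodes, parents):
    """Return {node_name: True if its output can be stored as unsigned}.

    nodes must be in topological order so that parents are visited first.
    """
    unsigned = {}
    for name, op_type in nodes:
        inputs = parents.get(name, [])
        if op_type in UNSIGNED_OUTPUT_OPS:
            unsigned[name] = True
        elif op_type in SIGN_PRESERVING_OPS and inputs:
            unsigned[name] = all(unsigned[p] for p in inputs)
        else:
            unsigned[name] = False     # conservative default: keep it signed
    return unsigned


def max_code(n_bits, is_unsigned):
    """Largest positive integer code available for non-negative values."""
    return 2 ** n_bits - 1 if is_unsigned else 2 ** (n_bits - 1) - 1
```

With 8 bits, a non-negative tensor gets 255 positive codes instead of 127, roughly doubling its resolution.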
Single Shift
After quantization, all values except the scaling factors are stored as integers. We would like to avoid the floating-point operations introduced by the scaling nodes. One solution is to approximate each rescaling with a single shift, that is, a multiplication by a power of two. To do so, we first determine the shift value, and then compensate for the approximation by multiplying the weights of the preceding layer by the ratio between the original scaling factor and its power-of-two approximation.
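A minimal sketch of the single-shift approximation. It assumes the shift is obtained by rounding log2 of the scaling factor, and that the compensation is applied to the floating-point weights of the preceding layer before they are quantized; both are assumptions, and `single_shift` / `apply_shift` are hypothetical names. The bias is scaled by the same ratio, which is required for the identity scaling * (W x + b) == 2**shift * (W' x + b') to hold.

```python
import math


def single_shift(scaling, weights, bias):
    """Replace a float rescaling by 2**shift and fold the residual into the
    preceding layer, so that scaling * (W x + b) == 2**shift * (W' x + b')."""
    shift = round(math.log2(scaling))          # nearest power of two (log scale)
    ratio = scaling / 2.0 ** shift             # exact factor / approximation
    new_weights = [[w * ratio for w in row] for row in weights]
    new_bias = [b * ratio for b in bias]
    return shift, new_weights, new_bias


def apply_shift(acc, shift):
    """Multiply an integer accumulator by 2**shift using bit shifts only.

    Note that >> on negative Python ints rounds toward negative infinity.
    """
    return acc << shift if shift >= 0 else acc >> -shift
```

For example, a scaling factor of 0.3 becomes a right shift by 2 (a factor of 0.25), and the preceding layer's weights and bias are multiplied by 1.2 to compensate.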
Global adaptation of the code
Several changes were made solely to keep the PTQ compatible with the 'main' branches of the bundle. For now, only 'aidge_core' needs to be on its 'dev' branch, to benefit from a scheduler fix. The test script 'aidge_ptq.py' was also updated, and the pybind docstrings were completed.
Modified files
QuantPTQ.cpp, QuantPTQ.hpp, pybind_QuantPTQ.cpp and aidge_ptq.py