Add support for the MatMul operator
Description
The goal of this MR is to allow the quantization of models containing `MatMul` operators using the PTQ pipeline.

It is important to note that there are two very different cases where the `MatMul` operator is used (both are illustrated in the sketch after this list):

- The first one is to represent an `FC` node which has no bias. To handle this case without adding complexity to the PTQ pipeline, we can use the `fuseMatMultoFC()` recipe. But we must first ensure that the weight is connected to input #1 of the `MatMul` node (and not to input #0). That is why a `reorderMatMulInputs()` recipe is needed.
- The second one is the case where the two inputs of the `MatMul` are actual data (i.e. not weights). In this case, we need to modify the different steps of the PTQ pipeline to ensure that the scaling ratios flow correctly through the graph. The general idea is to multiply the two input scaling ratios coming from the branches that are merged by the `MatMul` operator.
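To make the two cases concrete, here is a minimal standalone sketch of the classification. It deliberately uses a mock node type rather than the real Aidge graph API, and this `isWeighted()` helper is only a hypothetical stand-in for the `isWeighted` tag used below.

```cpp
// Minimal standalone sketch (mock types, not the actual Aidge API) showing
// the classification used throughout this MR: a MatMul with one weight
// Producer behaves like a bias-less FC (case 1), while a MatMul fed by two
// data branches is a merging node (case 2).
#include <memory>
#include <string>
#include <vector>

struct MockNode {
    std::string type;                               // e.g. "MatMul", "Producer"
    std::vector<std::shared_ptr<MockNode>> inputs;  // parent node of each input
};

// Hypothetical stand-in for the isWeighted tag: true when exactly one
// input of the node comes from a weight Producer.
bool isWeighted(const std::shared_ptr<MockNode>& node) {
    int producerCount = 0;
    for (const auto& parent : node->inputs)
        if (parent && parent->type == "Producer")
            ++producerCount;
    return producerCount == 1;
}

// Case 1: to be caught by isAffine(), then handled via
// reorderMatMulInputs() and fuseMatMultoFC().
bool isWeightedMatMul(const std::shared_ptr<MockNode>& node) {
    return node->type == "MatMul" && isWeighted(node);
}

// Case 2: to be caught by isMerging(); the two accumulated scaling
// ratios of the incoming branches must be multiplied.
bool isMergingMatMul(const std::shared_ptr<MockNode>& node) {
    return node->type == "MatMul" && !isWeighted(node);
}
```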
TODO
To handle the first case:

- modify the `isAffine()` function to catch `MatMul` nodes connected to a weight Tensor (using the `isWeighted` tag)
- create the `reorderMatMulInputs()` recipe, which ensures that the weight `Producer` is connected to input #1 (see the sketch after this list)
- handle `MatMul`s that are connected to a weight without replacing them with `FC` nodes
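A possible shape for the reordering recipe, reusing the `MockNode` type from the sketch above (again an illustration of the idea, not the actual implementation):

```cpp
// Hypothetical sketch of the reorderMatMulInputs() logic: if the weight
// Producer feeds input #0, swap the two inputs so that it feeds input #1,
// which is the layout fuseMatMultoFC() expects.
#include <utility>

void reorderMatMulInputs(const std::shared_ptr<MockNode>& matMul) {
    if (matMul->type != "MatMul" || matMul->inputs.size() != 2)
        return;
    const auto& in0 = matMul->inputs[0];
    if (in0 && in0->type == "Producer")
        std::swap(matMul->inputs[0], matMul->inputs[1]);
    // Caveat: MatMul is not commutative (A x B != B x A), so the real recipe
    // cannot just rewire the edges; it must also adjust the operands (e.g.
    // via transpositions) to preserve the graph's semantics. This sketch
    // only illustrates locating the weight input.
}
```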
To handle the second case:

- modify the `isMerging()` function to catch `MatMul` nodes not connected to a weight Tensor (using the `isWeighted` tag)
- modify the `normalizeParameters()` and `normalizeActivations()` functions to multiply the two input accumulated ratios
- modify the `quantizeNormalizeNetwork()` function to rescale the `MatMul` scaling twice as much (see the sketch after this list)
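The arithmetic behind the last two points, as a hedged sketch (the function names and the signed quantization step are assumptions, not the actual `PTQ.cpp` code):

```cpp
// Hedged sketch of why a merging MatMul needs special treatment in the
// normalization and quantization passes.
#include <cmath>

// normalizeParameters() / normalizeActivations(): since
// (r0 * A) x (r1 * B) == (r0 * r1) * (A x B), the ratio leaving a merging
// MatMul is the product of the two incoming accumulated ratios.
double mergedRatio(double accumulatedRatio0, double accumulatedRatio1) {
    return accumulatedRatio0 * accumulatedRatio1;
}

// quantizeNormalizeNetwork(): each input carries one quantization step, so
// their product carries it twice; hence the MatMul output must be rescaled
// twice as much as a single-input operator.
double matMulRescaling(int nbBits) {
    const double step = std::pow(2.0, nbBits - 1) - 1.0;  // assumed signed step
    return 1.0 / (step * step);                           // applied twice
}
```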
Overall:

- test and validate the changes on several network topologies
Files changed: mostly `PTQ.cpp`, but also various PTQ-related files (`CLE.cpp`, headers, ...)

Note: several other changes were made to improve the code quality (e.g. `hasAttr()`, `addAttr()`, ...)