Add support for the MatMul operator
Description
The goal of this MR is to allow the quantization of models containing `MatMul` operators, using the PTQ pipeline.
It is important to note that there are in fact two very different cases where the `MatMul` operator is used:

- The first one is to represent an `FC` node which has no bias. To handle this case without adding complexity to the PTQ pipeline, we can use the `fuseMatMultoFC()` recipe. But we must first ensure that the weight is connected to the input 1 of the `MatMul` node (and not the input 0). That's why a `reorderMatMulInputs()` recipe is needed.
- The second one is the case where the two inputs of the `MatMul` are actual data (i.e. not weights). In this case we need to modify the different steps of the PTQ pipeline to ensure that the scaling ratios flow correctly through the graph. The general idea is to multiply the two input scaling ratios coming from the branches that are merged by the `MatMul` operator (see the sketch after this list).
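To make the general idea concrete, here is a minimal, self-contained sketch (plain C++ with made-up variable names, not actual Aidge code) of why the product of the two input ratios is the one that keeps flowing past the node:

```cpp
// Because MatMul is bilinear, scaling its inputs by s0 and s1 scales its
// output by s0 * s1: MatMul(s0 * A, s1 * B) == (s0 * s1) * MatMul(A, B).
#include <iostream>

int main() {
    // Hypothetical accumulated scaling ratios of the two merged branches:
    double ratio0 = 0.5;   // ratio flowing in through input 0
    double ratio1 = 0.25;  // ratio flowing in through input 1

    // The ratio that keeps flowing after the MatMul is their product:
    double outputRatio = ratio0 * ratio1;
    std::cout << "propagated ratio = " << outputRatio << '\n'; // 0.125
    return 0;
}
```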
TODO
To handle the first case:

- modify the `isAffine()` function to catch `MatMul` nodes connected to a weight Tensor (using the `isWeighted` tag)
- create the `reorderMatMulInputs()` recipe that ensures that the weight `Producer` is connected to input 1 (sketched below)
- handle `MatMul` nodes that are connected to a weight without replacing them with `FC` nodes
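As a reference point for the `reorderMatMulInputs()` item, here is a toy sketch of the logic the recipe could implement. The types and the `isWeightProducer` flag are stand-ins invented for this example; the real recipe works on Aidge's graph API, whose exact calls are not reproduced here:

```cpp
// Toy types invented for this sketch; not the actual Aidge API.
#include <array>
#include <memory>
#include <utility>

struct ToyNode {
    bool isWeightProducer = false;                  // stand-in for the isWeighted tag
    std::array<std::shared_ptr<ToyNode>, 2> inputs; // inputs 0 and 1 of a MatMul
};

// Ensure the weight Producer feeds input 1, so that fuseMatMultoFC() can
// later turn the (bias-free) MatMul into an FC node.
void reorderMatMulInputs(ToyNode& matMul) {
    auto& in0 = matMul.inputs[0];
    auto& in1 = matMul.inputs[1];
    // Swap only when the weight sits on input 0 and input 1 carries data.
    // NOTE: a real implementation must also handle operand transposition,
    // since A * B != B * A for matrices; that detail is omitted here.
    if (in0 && in0->isWeightProducer && in1 && !in1->isWeightProducer) {
        std::swap(in0, in1);
    }
}
```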
To handle the second case:

- modify the `isMerging()` function to catch `MatMul` nodes not connected to a weight Tensor (using the `isWeighted` tag)
- modify the `normalizeParameters()` and `normalizeActivations()` functions to multiply the two input accumulated ratios
- modify the `quantizeNormalizeNetwork()` function to rescale the `MatMul` scaling twice as much, since both of its inputs are quantized (see the sketch after this list)
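The following toy sketch (hypothetical values and names, not the actual Aidge implementation) illustrates the two rules above, under the assumption that "twice as much" refers to the signed-max factor 2^(nbBits-1) - 1 being accumulated once per quantized input:

```cpp
#include <cmath>
#include <iostream>

int main() {
    // Rule 1 -- normalizeParameters() / normalizeActivations():
    // a merging MatMul receives an accumulated ratio on each input branch
    // and forwards their product (MatMul is bilinear in its inputs).
    double branchRatio0 = 0.5;  // hypothetical accumulated ratio, input 0
    double branchRatio1 = 0.8;  // hypothetical accumulated ratio, input 1
    double mergedRatio  = branchRatio0 * branchRatio1;

    // Rule 2 -- quantizeNormalizeNetwork():
    // each quantized input is expanded by signedMax = 2^(nbBits-1) - 1,
    // so the MatMul output carries that factor twice; its scaling must
    // therefore be rescaled twice as much to compensate.
    int nbBits = 8;
    double signedMax = std::pow(2.0, nbBits - 1) - 1.0; // 127 for 8 bits
    double matMulCompensation = 1.0 / (signedMax * signedMax);

    std::cout << "merged ratio        = " << mergedRatio << '\n';
    std::cout << "MatMul compensation = " << matMulCompensation << '\n';
    return 0;
}
```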
Overall:

- test and validate the changes on several network topologies
Files changed: mostly `PTQ.cpp`, but also various PTQ-related files (`CLE.cpp`, headers, ...)
Note: several other changes were made to improve the code quality (e.g. `hasAttr()`, `addAttr()`, ...)
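For illustration, a guessed shape of these helpers (the signatures are hypothetical; only the intent of tagging nodes, e.g. with `isWeighted`, is shown):

```cpp
// Hypothetical helper shapes, invented for this sketch.
#include <memory>
#include <set>
#include <string>

struct ToyNode {
    std::set<std::string> attrs; // stand-in for a node's attribute storage
};

void addAttr(const std::shared_ptr<ToyNode>& node, const std::string& attr) {
    node->attrs.insert(attr);
}

bool hasAttr(const std::shared_ptr<ToyNode>& node, const std::string& attr) {
    return node->attrs.count(attr) != 0;
}

// Usage idea: tag the weight producers once, then query the tag everywhere
// (e.g. in isAffine() / isMerging()) instead of re-deriving the property.
```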