
Add support for the MatMul operator

Benjamin Halimi requested to merge add_matmul into dev

Description

The goal of this MR is to allow the quantization of models containing MatMul operators, using the PTQ pipeline.

Note that the MatMul operator is used in two very different cases :

  • The first one represents an FC node that has no bias. To handle this case without adding complexity to the PTQ pipeline, we can use the fuseMatMultoFC() recipe. But we must first ensure that the weight is connected to input 1 of the MatMul node (and not input 0), which is why a reorderMatMulInputs() recipe is needed.

  • The second one is the case where both inputs of the MatMul are actual data (i.e. not weights). In this case we need to modify the different steps of the PTQ pipeline to ensure the scaling ratios flow correctly through the graph. The general idea is to multiply the two input scaling ratios coming from the branches that are merged by the MatMul operator.
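
To illustrate why the two input ratios are multiplied in the second case, here is a minimal sketch in plain Python. It is not taken from PTQ.cpp: the helper names (`matmul`, `scale`) and the ratio values are hypothetical, chosen only to show that scaling each input of a matrix product by its own ratio scales the output by the product of the two ratios.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def scale(m, factor):
    """Multiply every element of a matrix by a scalar."""
    return [[factor * v for v in row] for row in m]


# Hypothetical scaling ratios accumulated along the two input branches.
ratio_a = 0.5
ratio_b = 0.25

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# MatMul of the original (unscaled) inputs.
reference = matmul(a, b)

# MatMul of the inputs after each branch has been rescaled.
scaled = matmul(scale(a, ratio_a), scale(b, ratio_b))

# The MatMul output carries the product of the two input ratios.
output_ratio = ratio_a * ratio_b
expected = scale(reference, output_ratio)
```

Here `scaled` and `expected` hold the same values, which is the property the pipeline has to account for when it propagates ratios through a MatMul that merges two data branches.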

TODO

To handle the first case :

  • modify the isAffine() function to catch MatMul nodes connected to a weight Tensor (using the isWeighted tag)
  • create the reorderMatMulInputs() recipe that ensures that the weight Producer is connected to input 1
  • handle MatMuls that are connected to a weight without replacing them with FC nodes

To handle the second case :

  • modify the isMerging() function to catch MatMul nodes not connected to a weight Tensor (using the isWeighted tag)
  • modify the normalizeParameters() and normalizeActivations() functions to multiply the two input accumulated ratios
  • modify the quantizeNormalizeNetwork() function to rescale the MatMul scaling twice as much
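
The way the two TODO lists split MatMul nodes between isAffine() and isMerging() can be sketched as follows. This is a toy model and not the project's code: the `Node` class, the `is_weighted` field (standing in for the isWeighted tag) and the sets of other operator types are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Illustrative operator sets; the real lists in the PTQ code may differ.
AFFINE_TYPES = {"FC", "Conv"}
MERGING_TYPES = {"Add", "Concat"}


@dataclass
class Node:
    """Toy graph node: an operator type plus a stand-in for the isWeighted tag."""
    op_type: str
    is_weighted: bool = False


def is_affine(node: Node) -> bool:
    """A MatMul connected to a weight tensor is treated like an affine node."""
    if node.op_type == "MatMul":
        return node.is_weighted
    return node.op_type in AFFINE_TYPES


def is_merging(node: Node) -> bool:
    """A MatMul whose two inputs are data is treated like a merging node."""
    if node.op_type == "MatMul":
        return not node.is_weighted
    return node.op_type in MERGING_TYPES


weighted_matmul = Node("MatMul", is_weighted=True)   # first case
data_matmul = Node("MatMul", is_weighted=False)      # second case
```

With this split, every MatMul node falls into exactly one of the two categories, so each step of the pipeline can dispatch on `is_affine` or `is_merging` without a dedicated MatMul branch.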

Overall :

  • test and validate the changes on several network topologies

Files changed : mostly PTQ.cpp, but also various PTQ related files (CLE.cpp, headers, ...)

Note : Several other changes were made to improve the code quality (e.g. hasAttr(), addAttr(), ...)

Edited by Benjamin Halimi
