Add MatMulTiling recipe
Add MatMulTiling recipe. The goal is to tile any matrix multiplication into several fixed-size matrix multiplications. For instance, for a MatMul of size 80x80 and a tiling of 16x16, this will split the MatMul operator into 25 (5 by 5) MatMul operators, each producing a 16x16 output tile, with Slice operators inserted at the inputs and Concat operators inserted at the outputs.
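To make the 80x80 example concrete, here is a minimal pure-Python sketch of the idea, not the recipe's actual implementation. It assumes only the two output dimensions are tiled and the inner (reduction) dimension is left whole, which is what the single-step graph in this description suggests. The helper names `matmul` and `tiled_matmul` are hypothetical and exist only for this illustration.

```python
def matmul(a, b):
    """Plain reference product of two matrices stored as lists of rows."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def tiled_matmul(a, b, tile_m, tile_n):
    """Compute a @ b one output tile at a time (hypothetical helper).

    Returns the full result and the number of small products issued.
    """
    n_rows, n_cols = len(a), len(b[0])
    result = []
    n_products = 0
    for i in range(0, n_rows, tile_m):
        a_band = a[i:i + tile_m]  # plays the role of a Slice on input 0 (rows)
        band_tiles = []
        for j in range(0, n_cols, tile_n):
            # Plays the role of a Slice on input 1 (columns).
            b_band = [row[j:j + tile_n] for row in b]
            band_tiles.append(matmul(a_band, b_band))  # one small MatMul
            n_products += 1
        # Plays the role of a Concat along the column axis, then the row axis.
        for r in range(len(a_band)):
            result.append([v for tile in band_tiles for v in tile[r]])
    return result, n_products


# Small integer test matrices, so the comparison below is exact.
a = [[(3 * i + j) % 7 for j in range(80)] for i in range(80)]
b = [[(i - 2 * j) % 5 for j in range(80)] for i in range(80)]
tiled, n_products = tiled_matmul(a, b, 16, 16)
```

With these inputs, `n_products` is 25 (5 bands of rows times 5 bands of columns) and `tiled` equals the untiled product `matmul(a, b)`.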
This is especially useful when a matrix multiplication must be mapped to hardware with a fixed maximum size, such as a TPU (Tensor Processing Unit) or an MMA (Matrix Multiplication Accelerator). This recipe can be combined with the ConvToMatMul recipe, to convert convolutions to matrix multiplications beforehand, and with the ConstantFolding recipe, to fold sliced constant tensors.
Detailed list of changes:
- Fix Concat forward bug with negative axis;
- Fix SliceImpl registration bug on backend CPU;
- Fix removeNode recipe to use SinglePassGraphMatching, as GraphRegex is bugged on complex cases;
- Add MatMulTiling recipe.
READY TO BE REVIEWED
Here is the graph generated by a single step of the MatMulTiling recipe (after the very first matrix multiplication split):
%%{init: {'flowchart': { 'curve': 'monotoneY'}, 'fontFamily': 'Verdana' } }%%
flowchart TB
Producer_7(<em>Producer#7</em>):::producerCls
MatMul_1(<em>MatMul#1</em>)
Concat_0(<em>Concat#0</em>)
Producer_1(<em>Producer#1</em>):::producerCls
Producer_2(<em>Producer#2</em>):::producerCls
Producer_3(<em>Producer#3</em>):::producerCls
Producer_4(<em>Producer#4</em>):::producerCls
Producer_5(<em>Producer#5</em>):::producerCls
Producer_6(<em>Producer#6</em>):::producerCls
Identity_0(<em>Identity#0</em>):::rootCls
Slice_0(<em>Slice#0</em>)
Producer_0(<em>Producer#0</em>):::producerCls
MatMul_0(<em>MatMul#0</em>)
Identity_1(<em>Identity#1</em>)
Slice_1(<em>Slice#1</em>)
Producer_7-->|"0 [2]→4"|Slice_1
MatMul_1-->|"0 [2, 3, 64, 80]→1"|Concat_0
Producer_1-->|"0 [2]→2"|Slice_0
Producer_2-->|"0 [2]→3"|Slice_0
Producer_3-->|"0 [2]→4"|Slice_0
Producer_4-->|"0 [2]→1"|Slice_1
Producer_5-->|"0 [2]→2"|Slice_1
Producer_6-->|"0 [2]→3"|Slice_1
Identity_0-->|"0 [2, 3, 80, 80]→0"|Slice_0
Identity_0-->|"0 [2, 3, 80, 80]→0"|Slice_1
Slice_0-->|"0 [2, 3, 16, 80]→0"|MatMul_0
Producer_0-->|"0 [2]→1"|Slice_0
MatMul_0-->|"0 [2, 3, 16, 80]→0"|Concat_0
Identity_1-->|"0 [2, 3, 80, 80]→1"|MatMul_1
Identity_1-->|"0 [2, 3, 80, 80]→1"|MatMul_0
Slice_1-->|"0 [2, 3, 64, 80]→0"|MatMul_1
input0((in#0)):::inputCls--->|"→0[2, 3, 80, 80]"|Identity_0
input1((in#1)):::inputCls--->|"→0[2, 3, 80, 80]"|Identity_1
Concat_0--->|"0 [2, 3, 80, 80]→"|output0((out#0)):::outputCls
classDef inputCls fill:#afa
classDef outputCls fill:#ffa
classDef externalCls fill:#ccc
classDef producerCls fill:#ccf
classDef genericCls fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls stroke-width:5px
classDef rootCls stroke:#f00
classDef producerCls_rootCls stroke:#f00,fill:#ccf
classDef genericCls_rootCls stroke:#f00,fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls_rootCls stroke:#f00,stroke-width:5px
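The single step shown in the graph above can be read as follows: the first input is sliced along its row axis into a band of 16 rows and a remainder of 64 rows, each band is multiplied by the whole second input, and the two products are concatenated back along the same axis. The sketch below reproduces that step on plain 2D lists. It is an illustration under assumptions, not the recipe's code: the leading batch dimensions [2, 3] are dropped for brevity, and `matmul` and `split_step` are hypothetical names.

```python
def matmul(a, b):
    """Plain reference product of two matrices stored as lists of rows."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def split_step(a, b, tile):
    """One split along the row axis of `a` (hypothetical helper)."""
    head = matmul(a[:tile], b)  # Slice#0 -> MatMul#0: `tile` rows
    tail = matmul(a[tile:], b)  # Slice#1 -> MatMul#1: the remaining rows
    return head, tail


a = [[(i + 2 * j) % 5 for j in range(80)] for i in range(80)]
b = [[(2 * i - j) % 3 for j in range(80)] for i in range(80)]

head, tail = split_step(a, b, 16)
merged = head + tail  # Concat#0 along the row axis
```

Here `head` has 16 rows and `tail` has 64, matching the [2, 3, 16, 80] and [2, 3, 64, 80] edges in the graph, and `merged` equals `matmul(a, b)`. Repeating the same step on the remainder would presumably peel off further 16-row bands.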
Initial graph:
%%{init: {'flowchart': { 'curve': 'monotoneY'}, 'fontFamily': 'Verdana' } }%%
flowchart TB
MatMul_0("matmul1<br/><sub><em>(MatMul#0)</em></sub>"):::rootCls
Producer_1("w1<br/><sub><em>(Producer#1)</em></sub>"):::producerCls
Producer_0("dataProvider<br/><sub><em>(Producer#0)</em></sub>"):::producerCls
MatMul_0--->|"0 [2, 3, 80, 80]→"|output0((out#0)):::outputCls
Producer_1-->|"0 [2, 3, 80, 80]→1"|MatMul_0
Producer_0-->|"0 [2, 3, 80, 80]→0"|MatMul_0
classDef inputCls fill:#afa
classDef outputCls fill:#ffa
classDef externalCls fill:#ccc
classDef producerCls fill:#ccf
classDef genericCls fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls stroke-width:5px
classDef rootCls stroke:#f00
classDef producerCls_rootCls stroke:#f00,fill:#ccf
classDef genericCls_rootCls stroke:#f00,fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls_rootCls stroke:#f00,stroke-width:5px