Improved scheduling
What's in this MR?
- Static schedule now includes the early and late logical start for each scheduled element;
- A new multi-threaded `ParallelScheduler`;
- A new simple thread-pooling class, `ThreadPool`;
- The log class allows default values to be specified through environment variables;
- Updated C-P model to work with both data and tokens (it is now possible to generate a static scheduling without `forwardDims()` using the default `OperatorImpl` C-P model).
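
To illustrate the kind of thread pooling mentioned above, here is a minimal sketch of a fixed-size pool with a shared job queue. It is not the `ThreadPool` class added by this MR: the names `SimpleThreadPool`, `queueJob` and `runCountingJobs` are hypothetical, and the real class may differ in interface and shutdown behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool, for illustration only.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t nbThreads) {
        assert(nbThreads > 0);
        for (std::size_t i = 0; i < nbThreads; ++i) {
            mWorkers.emplace_back([this] { workerLoop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mStop = true;
        }
        mCond.notify_all();
        for (std::thread& worker : mWorkers) {
            worker.join();
        }
    }

    // Adds a job to the queue and wakes up one waiting worker.
    void queueJob(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mJobs.push(std::move(job));
        }
        mCond.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mCond.wait(lock, [this] { return mStop || !mJobs.empty(); });
                if (mJobs.empty()) {
                    return;  // stop requested and nothing left to run
                }
                job = std::move(mJobs.front());
                mJobs.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::vector<std::thread> mWorkers;
    std::queue<std::function<void()>> mJobs;
    std::mutex mMutex;
    std::condition_variable mCond;
    bool mStop = false;
};

// Runs `nbJobs` increments on the pool and returns the final counter value.
int runCountingJobs(std::size_t nbThreads, int nbJobs) {
    std::atomic<int> counter{0};
    {
        SimpleThreadPool pool(nbThreads);
        for (int i = 0; i < nbJobs; ++i) {
            pool.queueJob([&counter] { ++counter; });
        }
    }  // the destructor waits for every queued job
    return counter.load();
}
```

A parallel scheduler could submit every element whose inputs are ready as a job on such a pool. In this sketch, waiting for completion is done simply by destroying the pool.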
Example:
This is the scheduling obtained for the graph that follows:
gantt
dateFormat x
axisFormat %Q
(Producer0) :0, 1
(Pop0) :0, 0
ltsm_hidden_state (Memorize0) :0, 1
ltsm_cell_state (Memorize1) :0, 4
(Producer1) :0, 1
ltsm_input (Identity0) :1, 1
ltsm_forgetGateH (FC4) :1, 2
ltsm_inputGateH (FC5) :1, 2
ltsm_cellCandidateH (FC6) :1, 2
ltsm_outputGateH (FC7) :1, 5
ltsm_forgetGateX (FC0) :2, 2
ltsm_inputGateX (FC1) :2, 2
ltsm_cellCandidateX (FC2) :2, 2
ltsm_outputGateX (FC3) :2, 5
ltsm_forgetGate (Add0) :3, 3
ltsm_inputGate (Add1) :3, 3
ltsm_cellCandidate (Add2) :3, 3
ltsm_outputGate (Add3) :3, 6
ltsm_forgetGateAct (Sigmoid0) :4, 4
ltsm_inputGateAct (Sigmoid1) :4, 4
ltsm_cellCandidateAct (Tanh0) :4, 4
ltsm_outputGateAct (Sigmoid2) :4, 7
ltsm_forgetGateMul (Mul0) :5, 5
ltsm_inputGateMul (Mul1) :5, 5
ltsm_add (Add4) :6, 6
ltsm_cell_state (Memorize1) :7, 12
ltsm_cellUpdatedAct (Tanh1) :7, 7
ltsm_outputGateMul (Mul2) :8, 8
ltsm_hidden_state (Memorize0) :9, 9
(Producer0) :3, 9
(Producer1) :2, 9
(Pop0) :2, 8
ltsm_forgetGateH (FC4) :10, 10
ltsm_inputGateH (FC5) :10, 10
ltsm_cellCandidateH (FC6) :10, 10
ltsm_outputGateH (FC7) :10, 13
ltsm_input (Identity0) :3, 9
ltsm_forgetGateX (FC0) :4, 10
ltsm_inputGateX (FC1) :4, 10
ltsm_cellCandidateX (FC2) :4, 10
ltsm_outputGateX (FC3) :4, 13
ltsm_forgetGate (Add0) :11, 11
ltsm_inputGate (Add1) :11, 11
ltsm_cellCandidate (Add2) :11, 11
ltsm_outputGate (Add3) :11, 14
ltsm_forgetGateAct (Sigmoid0) :12, 12
ltsm_inputGateAct (Sigmoid1) :12, 12
ltsm_cellCandidateAct (Tanh0) :12, 12
ltsm_outputGateAct (Sigmoid2) :12, 15
ltsm_forgetGateMul (Mul0) :13, 13
ltsm_inputGateMul (Mul1) :13, 13
ltsm_add (Add4) :14, 14
ltsm_cell_state (Memorize1) :15, 17
ltsm_cellUpdatedAct (Tanh1) :15, 15
ltsm_outputGateMul (Mul2) :16, 16
ltsm_hidden_state (Memorize0) :17, 17
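
Each row of the chart above gives the early and late logical start of a scheduled element. As a rough sketch of how such values can be derived, the code below runs a classic ASAP/ALAP pass with unit logical steps over a small dependency graph. Whether the scheduler in this MR computes them this way is an assumption, and `LogicalStart`, `computeLogicalStarts` and `forgetGateFragment` are hypothetical names. On the forget-gate fragment of the graph (Pop0, Memorize0, Identity0, FC4, FC0, Add0), it yields the same values as the first rows of the chart.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct LogicalStart {
    std::size_t early;  // earliest logical step (as soon as possible)
    std::size_t late;   // latest logical step that does not delay the last step
};

using Edge = std::pair<std::size_t, std::size_t>;  // (parent, child)

// Nodes must be numbered in topological order, i.e. parent < child for every edge.
std::vector<LogicalStart> computeLogicalStarts(std::size_t nbNodes,
                                               const std::vector<Edge>& edges) {
    std::vector<std::vector<std::size_t>> children(nbNodes);
    for (const Edge& e : edges) {
        assert(e.first < e.second && e.second < nbNodes);
        children[e.first].push_back(e.second);
    }

    std::vector<LogicalStart> starts(nbNodes, LogicalStart{0, 0});

    // Forward pass: a child starts one step after its latest parent.
    std::size_t lastStep = 0;
    for (std::size_t n = 0; n < nbNodes; ++n) {
        for (std::size_t c : children[n]) {
            starts[c].early = std::max(starts[c].early, starts[n].early + 1);
        }
        lastStep = std::max(lastStep, starts[n].early);
    }

    // Backward pass: a parent must start one step before its earliest-needed child.
    for (std::size_t n = nbNodes; n-- > 0;) {
        starts[n].late = lastStep;  // value kept for nodes without children
        for (std::size_t c : children[n]) {
            starts[n].late = std::min(starts[n].late, starts[c].late - 1);
        }
    }
    return starts;
}

// Forget-gate fragment of the graph shown in this MR.
// 0: Pop0, 1: Memorize0, 2: Identity0, 3: FC4, 4: FC0, 5: Add0
std::vector<LogicalStart> forgetGateFragment() {
    return computeLogicalStarts(6, {{0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 5}});
}
```

In this fragment, Memorize0 and FC4 have one step of slack (early 0 and 1, late 1 and 2), while the Pop0, Identity0, FC0, Add0 chain has none.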
Example of parallel execution:
gantt
dateFormat x
axisFormat %Q µs
(Pop0) :0, 133
ltsm_hidden_state (Memorize0) :220, 242
ltsm_input (Identity0) :316, 321
ltsm_cell_state (Memorize1) :386, 398
(Producer0) :183, 482
(Producer1) :306, 575
ltsm_forgetGateH (FC4) :594, 640
ltsm_inputGateH (FC5) :639, 657
ltsm_cellCandidateH (FC6) :670, 693
ltsm_forgetGateX (FC0) :691, 705
ltsm_inputGateX (FC1) :734, 743
ltsm_cellCandidateX (FC2) :760, 773
ltsm_outputGateX (FC3) :798, 810
ltsm_outputGateH (FC7) :782, 797
(Pop0) :823, 900
ltsm_inputGate (Add1) :905, 1015
ltsm_forgetGate (Add0) :896, 1016
ltsm_cellCandidate (Add2) :912, 1020
ltsm_outputGate (Add3) :1041, 1106
ltsm_input (Identity0) :1166, 1168
ltsm_forgetGateAct (Sigmoid0) :1188, 1204
ltsm_inputGateAct (Sigmoid1) :1213, 1223
ltsm_cellCandidateAct (Tanh0) :1219, 1235
(Producer1) :1011, 1244
ltsm_outputGateAct (Sigmoid2) :1250, 1266
ltsm_forgetGateMul (Mul0) :1270, 1306
(Producer0) :1138, 1313
ltsm_inputGateMul (Mul1) :1288, 1315
ltsm_add (Add4) :1336, 1361
ltsm_forgetGateX (FC0) :1352, 1365
ltsm_inputGateX (FC1) :1363, 1372
ltsm_cellCandidateX (FC2) :1540, 1555
ltsm_outputGateX (FC3) :1558, 1565
ltsm_cellUpdatedAct (Tanh1) :1573, 1584
ltsm_hidden_state (Memorize0) :1630, 1636
ltsm_cell_state (Memorize1) :1592, 1598
ltsm_outputGateMul (Mul2) :1605, 1618
ltsm_forgetGateH (FC4) :1658, 1669
ltsm_inputGateH (FC5) :1690, 1697
ltsm_cellCandidateH (FC6) :1704, 1716
ltsm_outputGateH (FC7) :1722, 1729
ltsm_forgetGate (Add0) :1732, 1776
ltsm_inputGate (Add1) :1752, 1799
ltsm_cellCandidate (Add2) :1778, 1799
ltsm_forgetGateAct (Sigmoid0) :1831, 1837
ltsm_inputGateAct (Sigmoid1) :1843, 1846
ltsm_outputGate (Add3) :1811, 1856
ltsm_cellCandidateAct (Tanh0) :1868, 1873
ltsm_outputGateAct (Sigmoid2) :1876, 1880
ltsm_forgetGateMul (Mul0) :1902, 1918
ltsm_inputGateMul (Mul1) :1914, 1928
ltsm_add (Add4) :1948, 1969
ltsm_cellUpdatedAct (Tanh1) :1987, 1991
ltsm_cell_state (Memorize1) :2010, 2012
ltsm_outputGateMul (Mul2) :2016, 2028
ltsm_hidden_state (Memorize0) :2048, 2052
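
Rows in the format used above can be generated from recorded start and end timestamps. The following is a minimal sketch, assuming a hypothetical `TimedElement` record and a hypothetical `toMermaidGantt` helper. It does not reflect how the scheduler in this MR actually exports its chart.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one executed element, with times in microseconds
// relative to the start of the run.
struct TimedElement {
    std::string label;
    std::int64_t startUs;
    std::int64_t endUs;
};

// Formats the records as a Mermaid gantt chart. `dateFormat x` makes Mermaid
// read the values as plain numeric timestamps, and `axisFormat %Q` prints them
// back as numbers, here labelled as microseconds.
std::string toMermaidGantt(const std::vector<TimedElement>& elements) {
    std::ostringstream out;
    out << "gantt\n"
        << "dateFormat x\n"
        << "axisFormat %Q µs\n";
    for (const TimedElement& e : elements) {
        out << e.label << " :" << e.startUs << ", " << e.endUs << "\n";
    }
    return out.str();
}
```

For example, `toMermaidGantt({{"(Pop0)", 0, 133}})` produces the three header lines followed by `(Pop0) :0, 133`.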
%%{init: {'flowchart': { 'curve': 'monotoneY'}, 'fontFamily': 'Verdana' } }%%
flowchart TB
Pop_0(<em>Pop#0</em>)
Identity_0("ltsm_input\n<sub><em>(Identity#0)</em></sub>"):::rootCls
Memorize_0("ltsm_hidden_state\n<sub><em>(Memorize#0)</em></sub>")
Memorize_1("ltsm_cell_state\n<sub><em>(Memorize#1)</em></sub>")
Add_4("ltsm_add\n<sub><em>(Add#4)</em></sub>")
FC_0("ltsm_forgetGateX\n<sub><em>(FC#0)</em></sub>")
FC_4("ltsm_forgetGateH\n<sub><em>(FC#4)</em></sub>")
Add_0("ltsm_forgetGate\n<sub><em>(Add#0)</em></sub>")
Sigmoid_0("ltsm_forgetGateAct\n<sub><em>(Sigmoid#0)</em></sub>")
Mul_0("ltsm_forgetGateMul\n<sub><em>(Mul#0)</em></sub>")
FC_1("ltsm_inputGateX\n<sub><em>(FC#1)</em></sub>")
FC_5("ltsm_inputGateH\n<sub><em>(FC#5)</em></sub>")
Add_1("ltsm_inputGate\n<sub><em>(Add#1)</em></sub>")
Sigmoid_1("ltsm_inputGateAct\n<sub><em>(Sigmoid#1)</em></sub>")
Mul_1("ltsm_inputGateMul\n<sub><em>(Mul#1)</em></sub>")
FC_2("ltsm_cellCandidateX\n<sub><em>(FC#2)</em></sub>")
FC_6("ltsm_cellCandidateH\n<sub><em>(FC#6)</em></sub>")
Add_2("ltsm_cellCandidate\n<sub><em>(Add#2)</em></sub>")
Tanh_0("ltsm_cellCandidateAct\n<sub><em>(Tanh#0)</em></sub>")
FC_3("ltsm_outputGateX\n<sub><em>(FC#3)</em></sub>")
FC_7("ltsm_outputGateH\n<sub><em>(FC#7)</em></sub>")
Add_3("ltsm_outputGate\n<sub><em>(Add#3)</em></sub>")
Sigmoid_2("ltsm_outputGateAct\n<sub><em>(Sigmoid#2)</em></sub>")
Mul_2("ltsm_outputGateMul\n<sub><em>(Mul#2)</em></sub>")
Tanh_1("ltsm_cellUpdatedAct\n<sub><em>(Tanh#1)</em></sub>")
Producer_0(<em>Producer#0</em>):::producerCls
Producer_1(<em>Producer#1</em>):::producerCls
Pop_0-->|"0→0"|Identity_0
Identity_0-->|"0→0"|FC_0
Identity_0-->|"0→0"|FC_1
Identity_0-->|"0→0"|FC_2
Identity_0-->|"0→0"|FC_3
Memorize_0-->|"1→0"|FC_4
Memorize_0-->|"1→0"|FC_5
Memorize_0-->|"1→0"|FC_6
Memorize_0-->|"1→0"|FC_7
Memorize_1-->|"1→1"|Mul_0
Add_4-->|"0→0"|Tanh_1
Add_4-->|"0→0"|Memorize_1
FC_0-->|"0→0"|Add_0
FC_4-->|"0→1"|Add_0
Add_0-->|"0→0"|Sigmoid_0
Sigmoid_0-->|"0→0"|Mul_0
Mul_0-->|"0→0"|Add_4
FC_1-->|"0→0"|Add_1
FC_5-->|"0→1"|Add_1
Add_1-->|"0→0"|Sigmoid_1
Sigmoid_1-->|"0→0"|Mul_1
Mul_1-->|"0→1"|Add_4
FC_2-->|"0→0"|Add_2
FC_6-->|"0→1"|Add_2
Add_2-->|"0→0"|Tanh_0
Tanh_0-->|"0→1"|Mul_1
FC_3-->|"0→0"|Add_3
FC_7-->|"0→1"|Add_3
Add_3-->|"0→0"|Sigmoid_2
Sigmoid_2-->|"0→0"|Mul_2
Mul_2-->|"0→0"|Memorize_0
Tanh_1-->|"0→1"|Mul_2
Producer_0-->|"0 [3, 2]→1"|FC_1
Producer_0-->|"0 [3, 2]→1"|FC_3
Producer_0-->|"0 [3, 2]→1"|FC_0
Producer_0-->|"0 [3, 2]→1"|FC_2
Producer_1-->|"0 [3, 3]→1"|FC_5
Producer_1-->|"0 [3, 3]→1"|FC_7
Producer_1-->|"0 [3, 3]→1"|FC_4
Producer_1-->|"0 [3, 3]→1"|FC_6
input0((in#0)):::inputCls--->|→0|Pop_0
input1((in#1)):::inputCls--->|→2|FC_0
input2((in#2)):::inputCls--->|→2|FC_1
input3((in#3)):::inputCls--->|→2|FC_2
input4((in#4)):::inputCls--->|→2|FC_3
input5((in#5)):::inputCls--->|→1|Memorize_0
input6((in#6)):::inputCls--->|→2|FC_4
input7((in#7)):::inputCls--->|→2|FC_5
input8((in#8)):::inputCls--->|→2|FC_6
input9((in#9)):::inputCls--->|→2|FC_7
input10((in#10)):::inputCls--->|→1|Memorize_1
Memorize_1--->|"0→"|output0((out#0)):::outputCls
Memorize_0--->|"0→"|output1((out#1)):::outputCls
classDef inputCls fill:#afa
classDef outputCls fill:#ffa
classDef externalCls fill:#ccc
classDef producerCls fill:#ccf
classDef genericCls fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls stroke-width:5px
classDef rootCls stroke:#f00
classDef producerCls_rootCls stroke:#f00,fill:#ccf
classDef genericCls_rootCls stroke:#f00,fill:#f9f9ff,stroke-width:1px,stroke-dasharray: 5 5
classDef metaCls_rootCls stroke:#f00,stroke-width:5px
Edited by Olivier BICHLER