Fix poor implementations + memory leaks
All cuDNN initialization and workspace memory allocation should be done only once! This MR should improve performance and fix (at least some) memory leaks: the workspace was allocated at each forward pass but never freed.
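A minimal sketch of the "allocate once, reuse, free on destruction" pattern described above. It assumes an operator implementation object that persists across forward passes. `ReduceOpImpl`, `deviceAlloc` and `deviceFree` are hypothetical names, not code from this MR. `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so the sketch runs without a GPU, and the one-time cuDNN descriptor setup is only indicated by a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters used only to observe allocation behaviour.
static int g_allocCount = 0;
static int g_liveBuffers = 0;

// Hypothetical stand-in for cudaMalloc (host memory here, for illustration).
void* deviceAlloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (!p) throw std::bad_alloc();
    ++g_allocCount;
    ++g_liveBuffers;
    return p;
}

// Hypothetical stand-in for cudaFree; freeing nullptr is a no-op.
void deviceFree(void* p) {
    if (p) {
        assert(g_liveBuffers > 0 && "double free");
        std::free(p);
        --g_liveBuffers;
    }
}

// Hypothetical operator implementation: one-time setup plus a cached workspace.
class ReduceOpImpl {
public:
    ReduceOpImpl() = default;
    ReduceOpImpl(const ReduceOpImpl&) = delete;             // owns a raw buffer
    ReduceOpImpl& operator=(const ReduceOpImpl&) = delete;
    ~ReduceOpImpl() { deviceFree(mWorkspace); }             // released exactly once

    void forward(std::size_t requiredBytes) {
        if (!mInitialized) {
            // cuDNN handle/descriptor creation would go here, run only once.
            mInitialized = true;
            ++mInitCount;
        }
        // Reallocate only when the cached workspace is too small,
        // instead of allocating a fresh buffer on every forward pass.
        if (requiredBytes > mWorkspaceSize) {
            deviceFree(mWorkspace);
            mWorkspace = nullptr;
            mWorkspaceSize = 0;
            mWorkspace = deviceAlloc(requiredBytes);
            mWorkspaceSize = requiredBytes;
        }
        // ... launch the reduction/elementwise kernel using mWorkspace ...
    }

    int initCount() const { return mInitCount; }
    std::size_t workspaceSize() const { return mWorkspaceSize; }

private:
    bool mInitialized = false;
    int mInitCount = 0;
    void* mWorkspace = nullptr;
    std::size_t mWorkspaceSize = 0;
};
```

Calling `forward(1024)` three times performs a single allocation; a later `forward(4096)` grows the buffer once, and destroying the object leaves no live buffers. Growing only when needed keeps the number of allocations bounded by the number of distinct larger input sizes rather than the number of forward passes.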
Fixed issues:
- ReduceMean
- ReduceSum
- Sub
- Add
- Mul
Edited by Olivier BICHLER