Improve backward kernel of Mul operator.
Problem description
The current implementation in MulImpl_kernels.hpp is confusing.
Improving the backward kernel of Mul will also benefit the backward kernels of the other element-wise operations (Add, Sub, Div), and thus SNNs.
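
To make the goal concrete, here is a minimal sketch of what a simpler backward kernel could look like. It is not the code in MulImpl_kernels.hpp: the names `broadcastStrides` and `mulBackward`, the use of `std::vector`, and the assumption that inputs are broadcast NumPy-style (right-aligned dimensions, size-1 dimensions repeated) are all hypothetical and chosen for illustration only. The idea is that for out = in0 * in1, each output element contributes gradOut * in1 to the gradient of in0 and gradOut * in0 to the gradient of in1, and broadcast dimensions are summed by accumulating into the same input element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): row-major strides of `dims` once
// right-aligned to `rank` dimensions. Missing or size-1 dimensions get
// stride 0, so the same input element is reused along them.
// Assumes dims.size() <= rank.
std::vector<std::size_t> broadcastStrides(const std::vector<std::size_t>& dims,
                                          std::size_t rank) {
    std::vector<std::size_t> strides(rank, 0);
    std::size_t stride = 1;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        const std::size_t d = dims.size() - 1 - i;  // walk from the last dimension
        strides[rank - 1 - i] = (dims[d] == 1) ? 0 : stride;
        stride *= dims[d];
    }
    return strides;
}

// Hypothetical backward kernel for out = in0 * in1 with broadcasting.
// gradIn0 / gradIn1 must have the sizes of in0 / in1; gradients are
// accumulated into them, which sums over the broadcast dimensions.
template <class T>
void mulBackward(const std::vector<std::size_t>& dims0,
                 const std::vector<std::size_t>& dims1,
                 const std::vector<std::size_t>& outDims,
                 const std::vector<T>& in0,
                 const std::vector<T>& in1,
                 const std::vector<T>& gradOut,
                 std::vector<T>& gradIn0,
                 std::vector<T>& gradIn1) {
    assert(gradIn0.size() == in0.size() && gradIn1.size() == in1.size());

    const std::size_t rank = outDims.size();
    const std::vector<std::size_t> s0 = broadcastStrides(dims0, rank);
    const std::vector<std::size_t> s1 = broadcastStrides(dims1, rank);

    std::vector<std::size_t> coord(rank, 0);  // multi-index into the output
    for (std::size_t o = 0; o < gradOut.size(); ++o) {
        // Map the output coordinate to flat indices in each input.
        std::size_t i0 = 0, i1 = 0;
        for (std::size_t d = 0; d < rank; ++d) {
            i0 += coord[d] * s0[d];
            i1 += coord[d] * s1[d];
        }

        gradIn0[i0] += gradOut[o] * in1[i1];  // d(out)/d(in0) = in1
        gradIn1[i1] += gradOut[o] * in0[i0];  // d(out)/d(in1) = in0

        // Advance the output coordinate in row-major order.
        for (std::size_t d = rank; d-- > 0;) {
            if (++coord[d] < outDims[d]) break;
            coord[d] = 0;
        }
    }
}
```

With this layout, only the two accumulation lines are specific to Mul. Under the same assumptions, Add, Sub and Div could reuse the traversal and change only the per-element factor (for example 1 and 1 for Add, 1 and -1 for Sub), which is why a clearer Mul kernel would carry over to them.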