Mistakes in some backward kernel implementations
Description
I'm implementing a new node (`Clip`), so I went through the sources of the backward kernels of several nodes to see how they are implemented, and it looks like some of them contain mistakes.
To be specific, in a few of them there seems to be some confusion between computing the derivative of the forward function and what the backward pass should actually do: multiply that local derivative by the incoming gradient (the chain rule).
For example, in the `Sqrt` backward kernel, this:
output[i] = 0.5 / std::sqrt(input[i]);
should be replaced by:
grad_input[i] = (0.5 / std::sqrt(input[i])) * grad_output[i];
And in the `LeakyReLU` backward kernel, this:
output[i] = (input[i] > 0) ? input[i] : negativeSlope*input[i];
should be replaced by:
grad_input[i] = (input[i] > 0) ? grad_output[i] : negativeSlope*grad_output[i];
There might be similar issues in other backward kernels, but most of them seem correct to me.
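For reference, here is a minimal sketch of how the same chain-rule pattern would apply to the `Clip` node I'm implementing. The function and parameter names (`clipBackward`, `grad_input`, `grad_output`, `minVal`, `maxVal`) are placeholders of mine, not the project's actual identifiers:

```cpp
#include <cstddef>

// Hypothetical Clip backward kernel sketch: the local derivative of
// clip(x, min, max) is 1 inside [min, max] and 0 outside, so the
// incoming gradient is passed through or zeroed -- the raw input
// value itself is never written to the output.
void clipBackward(const float* input, const float* grad_output,
                  float* grad_input, std::size_t size,
                  float minVal, float maxVal) {
    for (std::size_t i = 0; i < size; ++i) {
        grad_input[i] = (input[i] >= minVal && input[i] <= maxVal)
                            ? grad_output[i]
                            : 0.0f;
    }
}
```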