
Mistakes in some backward kernel implementations

Description

I'm implementing a new node (Clip), so I went through the sources of the backward kernels of several nodes to see how they are implemented, and it looks like some of them contain mistakes.

To be specific, a few of them seem to confuse computing the operator's derivative with what the backward pass should actually do: by the chain rule, the backward kernel has to multiply the local derivative by the incoming gradient, not just evaluate the derivative.

For example, in the Sqrt backward kernel, this:

output[i] = 0.5 / std::sqrt(input[i]);

Should be replaced by:

grad_input[i] = (0.5 / std::sqrt(input[i])) * grad_output[i];
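
For illustration, here is a minimal sketch of what a corrected Sqrt backward kernel could look like (the function name and signature are my own, not the actual kernel interface):

#include <cmath>
#include <cstddef>

// Hypothetical standalone version of the Sqrt backward kernel.
// Since d/dx sqrt(x) = 0.5 / sqrt(x), each element of grad_output
// is scaled by the local derivative evaluated at the matching input.
void sqrtBackward(const float* input, const float* grad_output,
                  float* grad_input, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        grad_input[i] = (0.5f / std::sqrt(input[i])) * grad_output[i];
    }
}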

And in the LeakyReLU backward kernel, this:

output[i] = (input[i] > 0) ? input[i] : negativeSlope*input[i];

Should be replaced by:

grad_input[i] = (input[i] > 0) ? grad_output[i] : negativeSlope*grad_output[i];
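
Likewise, a minimal sketch of a corrected LeakyReLU backward kernel (again with a hypothetical signature):

#include <cstddef>

// Hypothetical standalone version of the LeakyReLU backward kernel.
// The local derivative is 1 where input > 0 and negativeSlope elsewhere,
// so the incoming gradient is either passed through unchanged or scaled.
void leakyReLUBackward(const float* input, const float* grad_output,
                       float* grad_input, std::size_t size,
                       float negativeSlope) {
    for (std::size_t i = 0; i < size; ++i) {
        grad_input[i] = (input[i] > 0) ? grad_output[i]
                                       : negativeSlope * grad_output[i];
    }
}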

There might be some similar issues in other backward kernels, but most of them seem correct to me.
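
One way to catch this kind of mistake systematically would be to compare each backward kernel against a central finite difference of its forward function. A minimal sketch, reduced to scalars and using the corrected Sqrt kernel as an example (the helper names are mine, assuming an elementwise operator):

#include <cmath>
#include <cstdio>

// Scalar forward and (corrected) backward for Sqrt, for the check below.
float sqrtForward(float x) { return std::sqrt(x); }
float sqrtBackward(float x, float gradOut) { return (0.5f / std::sqrt(x)) * gradOut; }

int main() {
    const float eps = 1e-3f;
    const float xs[] = {0.25f, 1.0f, 4.0f};
    for (float x : xs) {
        // Central finite difference approximates d(sqrt)/dx at x.
        float numeric = (sqrtForward(x + eps) - sqrtForward(x - eps)) / (2.0f * eps);
        // With grad_output = 1, the backward kernel should return the same value.
        float analytic = sqrtBackward(x, 1.0f);
        std::printf("x=%.2f  numeric=%.6f  analytic=%.6f\n", x, numeric, analytic);
    }
    return 0;
}

A buggy kernel that only evaluates the derivative would still pass with grad_output = 1, so the check should also be run with a non-unit gradient (e.g. grad_output = 2, expecting twice the finite-difference value).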
