Since the Heaviside function is not differentiable at zero and its derivative is zero everywhere else, the classical approach is to use a surrogate gradient for the backward pass. The most commonly used surrogate is the shifted arctan function:
For the forward pass, th :
Thus for the backward :
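As a minimal sketch of this forward/backward split, assume the shifted arctan surrogate takes the commonly used form g(x) = (1/π)·arctan((π/2)·α·x) + 1/2, whose derivative is g'(x) = (α/2) / (1 + ((π/2)·α·x)²). The sharpness parameter `alpha`, the function names, and the convention that the Heaviside function returns 1 at x = 0 are illustrative assumptions, not definitions taken from this page.

```python
import math


def heaviside(x: float) -> float:
    """Forward pass: return 1.0 (spike) if x >= 0, else 0.0.

    x is assumed to be the membrane potential minus the threshold.
    The value at x == 0 is a convention chosen for this sketch.
    """
    return 1.0 if x >= 0.0 else 0.0


def arctan_surrogate(x: float, alpha: float = 2.0) -> float:
    """Shifted arctan: a smooth stand-in for the Heaviside step.

    Assumed form: g(x) = atan(pi/2 * alpha * x) / pi + 1/2,
    which ranges over (0, 1) and equals 1/2 at x == 0.
    """
    return math.atan(math.pi / 2.0 * alpha * x) / math.pi + 0.5


def arctan_surrogate_grad(x: float, alpha: float = 2.0) -> float:
    """Backward pass: derivative of arctan_surrogate with respect to x.

    g'(x) = (alpha / 2) / (1 + (pi/2 * alpha * x)^2)
    """
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)


if __name__ == "__main__":
    for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
        # The spike itself comes from the hard step; only the gradient is smoothed.
        print(x, heaviside(x), arctan_surrogate_grad(x))
```

The spike is still produced by the hard step, so the forward behaviour is unchanged; only the gradient that flows back during training is replaced by the smooth derivative, which peaks at the threshold and decays away from it.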