Scheduler Backward and Input Tensor Gradient
Context
I'm working on enabling CUDA-based training in AIDGE on any device index.
Important note: this problem only appears when attempting to train a model on the `cuda` backend with a device index different from zero.
Description
At the moment, the scheduler's `backward()` does not handle the case where the input tensor's gradient does not exist. In fact, if we follow the standard training pipeline, this input tensor gradient is never created! As a result, the following loop ends up crashing after the `backward()` call (at the end of the first iteration):
```python
for _ in range(10 ** 6):
    # get samples
    x, y_h = get_batch_pair()
    # forward
    scheduler.forward(True, [x])
    y = get_output_tensor()
    # loss
    l = loss(y, y_h)
    # backward
    scheduler.backward()
    # optimizer
    optimizer.update()  # <- crashes around here!
    optimizer.reset_grad(classifier)
```
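To make the failure mechanism concrete, here is a minimal, self-contained Python sketch. It is not AIDGE code: `Tensor` and `backward_add` are made-up names for illustration. The idea is that the backward kernels write into a gradient buffer directly, so if that buffer was never allocated (only an explicit `grad()` call allocates it lazily), the write fails.

```python
# Illustrative sketch only (not AIDGE code); all names here are hypothetical.

class Tensor:
    def __init__(self, data):
        self.data = data
        self._grad = None          # gradient storage is NOT allocated up front

    def grad(self):
        # Lazily allocate the gradient buffer on first access.
        if self._grad is None:
            self._grad = [0.0] * len(self.data)
        return self._grad

def backward_add(out, inp):
    # A backward step that writes into inp._grad directly:
    # it fails if the gradient buffer was never allocated.
    for i, g in enumerate(out.grad()):
        inp._grad[i] += g          # 'NoneType' error if inp._grad is still None

x = Tensor([1.0, 2.0, 3.0])
y = Tensor([0.0, 0.0, 0.0])
y.grad()                           # the output gradient exists after the loss step
# backward_add(y, x)               # would crash: x._grad was never created
x.grad()                           # the workaround: allocate it explicitly
backward_add(y, x)                 # now succeeds
```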
Workaround
A workaround for this issue is to explicitly create a gradient tensor for the input tensor before calling the `backward()` method. This can be done by simply calling the `grad()` method right after retrieving the input:
```python
x, y_h = get_batch_pair()
x.grad()  # <- this creates the grad tensor and fixes the issue!
...
```
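For graphs with more than one input, the same one-line workaround can be wrapped in a small helper called once per iteration, right before the forward/backward passes. This is only a sketch: `ensure_input_grads` is a hypothetical name, and it relies solely on the `grad()` call shown above.

```python
def ensure_input_grads(inputs):
    # Explicitly create the gradient tensor of every graph input,
    # so that scheduler.backward() finds them all allocated.
    for tensor in inputs:
        tensor.grad()

# usage inside the training loop
x, y_h = get_batch_pair()
ensure_input_grads([x])        # workaround applied to all inputs at once
scheduler.forward(True, [x])
```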
Nevertheless, this is not a very user-friendly solution, so a proper fix is needed!
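A possible direction for that fix, sketched below in Python-like pseudocode, would be for the scheduler's `backward()` to create any missing input gradient itself before traversing the graph. This is only an assumption about the shape of the fix: `graph_inputs()` and `_run_backward_pass()` are hypothetical names, not the actual AIDGE scheduler API.

```python
# Pseudocode sketch of a possible fix inside the scheduler (hypothetical API):
def backward(self):
    for input_tensor in self.graph_inputs():   # hypothetical accessor over graph inputs
        input_tensor.grad()                     # lazily create any missing gradient tensor
    self._run_backward_pass()                   # hypothetical: the existing backward logic
```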