[backend_cuda] Unable to perform inference for different batch sizes on backend cuda

Problem description

It is not possible to call scheduler.forward(data=[data_batch]) multiple times with data_batch of different sizes when using backend cuda.
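A minimal sketch of the failing pattern. The scheduler construction, input shapes, and batch sizes here are assumptions for illustration and are not taken from the attached notebook:

```python
import numpy as np

# Hypothetical setup: the exact scheduler construction depends on the
# framework's API and is not shown in this report.
# scheduler = build_scheduler(model, backend="cuda")

batch_a = np.random.rand(32, 3, 224, 224).astype(np.float32)  # full-size batch
batch_b = np.random.rand(17, 3, 224, 224).astype(np.float32)  # smaller last batch

out_a = scheduler.forward(data=[batch_a])  # first call succeeds
out_b = scheduler.forward(data=[batch_b])  # second call with a different batch size fails on backend cuda
```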

Reproducible example code

In the provided notebook.tar.gz, we compute the accuracy of a network using two different configurations:

  1. When the drop_last option of the data provider is set to True, the computation succeeds;
  2. When the drop_last option of the data provider is set to False, the computation fails.

The only difference between the two configurations is the size of the last batch fed into the network.
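For reference, the sketch below shows how a drop_last flag typically changes the size of the last batch when the dataset size is not a multiple of the batch size. The helper function and the example numbers are assumptions for illustration, not the data provider's actual implementation:

```python
def batch_sizes(num_samples, batch_size, drop_last):
    """Return the batch sizes a data provider would emit over one epoch."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # trailing partial batch kept only when drop_last=False
    return sizes

# Example: 1000 samples with batch size 64 (hypothetical numbers)
print(batch_sizes(1000, 64, drop_last=True))   # 15 batches of 64; remainder dropped
print(batch_sizes(1000, 64, drop_last=False))  # 15 batches of 64 plus one batch of 40
```

With drop_last=False, that final smaller batch is fed into scheduler.forward, which is where the failure occurs on backend cuda.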