Evaluate inference speed with CUDA
Measure the inference time of some operators with backend_cuda. One possible optimization is to reduce the work done in the forward operation, for example by moving some of it into the operator's constructor.
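To make the idea concrete, here is a minimal sketch in plain Python. It is not backend_cuda code: `SlowOp`, `FastOp`, and `mean_forward_time` are hypothetical names invented for illustration, and the "setup" is just a lookup table standing in for whatever per-call preparation an operator might do. It only shows the pattern of hoisting setup out of `forward` and timing the result with warm-up runs. On a real GPU, kernel launches are asynchronous, so the device would have to be synchronized (for instance with `cudaDeviceSynchronize` or CUDA events) before the timer is read.

```python
import time


class SlowOp:
    """Hypothetical operator that rebuilds its lookup table on every forward call."""

    def __init__(self, size):
        self.size = size

    def forward(self, x):
        # Setup work repeated on each call (the cost we want to remove).
        table = [i * 0.5 for i in range(self.size)]
        return [table[v % self.size] for v in x]


class FastOp:
    """Same hypothetical operator, with the setup done once in the constructor."""

    def __init__(self, size):
        self.size = size
        # Setup work done a single time, at construction.
        self.table = [i * 0.5 for i in range(size)]

    def forward(self, x):
        return [self.table[v % self.size] for v in x]


def mean_forward_time(op, x, warmup=3, runs=20):
    """Return the mean wall-clock time in seconds of op.forward(x) over `runs` calls."""
    for _ in range(warmup):
        op.forward(x)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        op.forward(x)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    data = list(range(64))
    for op in (SlowOp(10_000), FastOp(10_000)):
        seconds = mean_forward_time(op, data)
        print(f"{type(op).__name__}: {seconds * 1e6:.1f} us per forward")
```

Both classes return the same values, so any difference reported by `mean_forward_time` comes only from where the setup is done. The actual numbers depend on the machine and are not meant as a benchmark.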
Edited by Houssem ROUIS