CUDA support for the Quantization routines
Description
This MR introduces CUDA support for the PTQ and QAT routines.
For the Quantization module, this is a major improvement: real-size models (e.g. ResNet18) can now be quantized in a few minutes instead of several hours.
Regarding the QAT, minor modifications were also made to make it functional, but for now it only works on small models.
Changes
Here is an exhaustive list of the changes made to the source files:
- Add Leaky ReLU support to the PTQ
- Add CUDA backend support to the PTQ
- Fix the CUDA backend of the LSQ/FixedQ nodes
- Add CUDA backend support to the QAT routines
- Add a recipe submodule to the Quantization module
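
For readers unfamiliar with the LSQ nodes mentioned in the list above, here is a minimal sketch of the forward pass of a step-size quantizer in plain Python. It is not the code of this MR, and it covers neither the CUDA kernels nor the gradient of the step size. The name `lsq_fake_quantize` and its parameters are hypothetical and chosen for illustration only.

```python
def lsq_fake_quantize(values, step_size, num_bits=8, signed=True):
    """Quantize then dequantize each value with a single step size.

    Hypothetical helper: illustrates an LSQ-style forward pass only,
    not the implementation of the LSQ/FixedQ nodes in this MR.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")

    # Integer range representable with num_bits.
    if signed:
        q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** num_bits - 1

    result = []
    for v in values:
        # Scale onto the integer grid, clamp to the representable range,
        # then round (Python's round() sends exact ties to the even integer).
        q = round(min(max(v / step_size, q_min), q_max))
        # Scale back so the output stays in the floating-point domain.
        result.append(q * step_size)
    return result


# 4-bit signed example: 100.0 saturates at q_max = 7, i.e. 7 * 0.5 = 3.5.
print(lsq_fake_quantize([0.26, -1.0, 100.0], step_size=0.5, num_bits=4))
# -> [0.5, -1.0, 3.5]
```

In QAT the step size is a trainable parameter, which is why such a node needs both a forward and a backward kernel on each backend.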
TODO
This MR does not fully enable the QAT, which for now only works on small models/datasets.
Future work will also provide unit tests for the QAT.