
CUDA support for the Quantization routines

Benjamin Halimi requested to merge DevQAT into dev

Description

This MR introduces CUDA support for the PTQ and QAT routines.

For the Quantization module, this is a game changer: realistically sized models (e.g. ResNet18) can now be quantized in only a few minutes instead of several hours.

Regarding the QAT, minor modifications were also made to make it functional, but for now it only works on small models.

Changes

Here is an exhaustive list of the changes made to the source files:

  • Support of the Leaky ReLUs for the PTQ
  • Support of the CUDA backend for the PTQ
  • Fix of the CUDA backend for the LSQ/FixedQ nodes
  • Support of the CUDA backend for the QAT routines
  • Addition of a recipe submodule for the Quantization module
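
For readers unfamiliar with the LSQ nodes mentioned in the list above, here is a minimal sketch of the forward pass of a Learned Step Size Quantization quantizer, in plain Python. It is not taken from this MR and does not use the module's API or its CUDA kernels: the name `lsq_fake_quantize` and its parameters are hypothetical, and the step size, which is a trained parameter in real QAT, is passed here as a plain number.

```python
def lsq_fake_quantize(values, step_size, n_bits=8, signed=True):
    """Hypothetical helper: quantize then dequantize `values` with one step size.

    Illustrates the usual LSQ forward rule: round(clip(x / s, q_min, q_max)) * s.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")

    # Integer range of the target format (assumption: two's complement if signed).
    if signed:
        q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** n_bits - 1

    out = []
    for v in values:
        scaled = v / step_size                     # express the value in units of the step
        clipped = min(max(scaled, q_min), q_max)   # saturate to the integer range
        # Note: Python's round() rounds exact halves to the nearest even integer.
        out.append(round(clipped) * step_size)     # snap to the grid, then rescale
    return out


if __name__ == "__main__":
    # 4-bit signed grid with a step of 0.5: representable values span -4.0 to 3.5.
    print(lsq_fake_quantize([0.3, -1.0, 100.0], step_size=0.5, n_bits=4))
```

During training, the rounding is typically bypassed in the backward pass with a straight-through estimator so that gradients can reach both the input and the step size; that part is omitted from this sketch.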

TODO

This MR does not fully enable the QAT, which for now only works on small models/datasets.

Later work will also provide unit tests for the QAT.
