
[quantization] Inconsistent integer promotion in QAdd operator leads to int8 overflow during addition

Required prerequisites

  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker and discussions to verify that this hasn't already been reported. +1 or comment there if it has.

Which version of aidge are you using?

  • aidge_core: 0.7.0
  • aidge_quantization: 0.4.2
  • aidge_export_cpp: 0.4.0

Problem description

An inconsistency has been identified in the computation logic of the QAdd block (addition, multiplication, and bit shifting) between aidge_core and aidge_export_cpp when using an 8-bit quantized model.

Analysis suggests that aidge_core holds the intermediate values as 32-bit integers (int32_t), consistent with the C++ standard, which applies integral promotion to arithmetic operands of types narrower than int. The code generated by aidge_export_cpp (compiled with g++ using only the options specified in the default Makefile) does not appear to keep this wider intermediate type: the sum seems to be reduced to 8 bits before the multiplication.

As a result, an integer overflow can occur during the addition step, leading to incorrect output values in aidge_export_cpp.
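The promotion rule can be checked in isolation. The snippet below is a minimal sketch, not code from aidge_core or aidge_export_cpp; the helper names `add_wide` and `add_narrow` are hypothetical. It shows that `int8_t + int8_t` has type `int`, and that the overflow only appears once the sum is stored back into an 8-bit type. The wrap to -128 is what g++ produces on two's-complement targets; before C++20 the result of this narrowing conversion is implementation-defined.

```cpp
#include <cstdint>
#include <type_traits>

// Integral promotion: the sum of two int8_t operands is computed as int.
static_assert(std::is_same<decltype(std::int8_t{} + std::int8_t{}), int>::value,
              "int8_t + int8_t has type int");

// Hypothetical helper: keeps the promoted sum in a 32-bit type.
inline std::int32_t add_wide(std::int8_t a, std::int8_t b) {
    return a + b;  // computed as int, no narrowing
}

// Hypothetical helper: stores the promoted sum back into int8_t.
inline std::int8_t add_narrow(std::int8_t a, std::int8_t b) {
    // 13 + 115 = 128 does not fit in int8_t; g++ wraps it to -128.
    return static_cast<std::int8_t>(a + b);
}
```

With the inputs from this report, `add_wide(13, 115)` yields 128, whereas `add_narrow(13, 115)` yields -128 under g++.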

Example

For a QAdd operator with a scale factor of 108 and a right shift of 7:

  • aidge_core:
    QAdd[108, 7](13, 115) = ((13 + 115) * 108) >> 7 = (128 * 108) >> 7 = 13824 >> 7 = 108
  • aidge_export_cpp:
    QAdd[108, 7](13, 115) = ((13 + 115) * 108) >> 7 = (-128 * 108) >> 7 = -13824 >> 7 = -108 (the sum 128 overflows int8 and wraps to -128)

This discrepancy can lead to incorrect behavior when executing quantized models.
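Both results from the example can be reproduced with a small standalone sketch. The function `qadd` and its template parameter `Acc` are hypothetical and only model the add, multiply, shift sequence described in this report; they are not the actual aidge kernels. `Acc` is the type used to hold the intermediate sum. The right shift of a negative value is assumed to be arithmetic, which is what g++ does (and what C++20 guarantees).

```cpp
#include <cstdint>

// Hypothetical QAdd sketch: (a + b) * scale >> shift.
// Acc is the type in which the intermediate sum is stored.
template <typename Acc>
std::int32_t qadd(std::int8_t a, std::int8_t b, std::int32_t scale, int shift) {
    // a + b is evaluated as int, then converted to Acc.
    // With Acc = int8_t, a sum outside [-128, 127] wraps under g++.
    Acc sum = static_cast<Acc>(a + b);
    std::int32_t scaled = static_cast<std::int32_t>(sum) * scale;
    return scaled >> shift;  // arithmetic shift assumed for negative values
}
```

Under these assumptions, `qadd<std::int32_t>(13, 115, 108, 7)` gives 108, matching aidge_core, and `qadd<std::int8_t>(13, 115, 108, 7)` gives -108, matching the aidge_export_cpp output. Holding the sum in a 32-bit accumulator until after the shift is one way the two paths could be made to agree.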

Edited by Clément Fisher