[quantization] Inconsistent integer promotion in QAdd operator leads to int8 overflow during addition
Required prerequisites
- Make sure you've read the documentation. Your issue may be addressed there.
- Search the issue tracker and discussions to verify that this hasn't already been reported. +1 or comment there if it has.
What commit version of aidge do you use?
- aidge_core: 0.7.0
- aidge_quantization: 0.4.2
- aidge_export_cpp: 0.4.0
Problem description
An inconsistency has been identified in the computation logic of the QAdd block (addition, multiplication, and bit shifting) between aidge_core and aidge_export_cpp when using an 8-bit quantized model.
Analysis suggests that aidge_core keeps intermediate values in 32-bit integers (int32_t). This matches the C++ standard, which mandates integral promotion to int for arithmetic on operands narrower than int (int is 32 bits on the usual targets). However, aidge_export_cpp (compiled with g++ using only the options specified in the default Makefile) behaves as if this promotion did not occur. Since the promotion itself is mandatory, the likely cause is that the generated code stores the intermediate sum back into an 8-bit type before the multiplication.
As a result, the sum of the two int8 inputs wraps around whenever it leaves the int8 range [-128, 127], leading to incorrect output values in aidge_export_cpp.
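To make the promotion rule concrete, here is a minimal C++ sketch (not aidge code; the helper names sum_promoted and sum_narrowed are hypothetical). It checks at compile time that int8_t + int8_t is evaluated in int, and contrasts keeping the sum in int with narrowing it back to int8_t, which is the assumed behaviour of the export.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Integral promotion: both int8_t operands of '+' are promoted to int,
// so the expression itself has type int and cannot wrap at 8 bits.
static_assert(std::is_same<decltype(std::int8_t{} + std::int8_t{}), int>::value,
              "int8_t + int8_t is evaluated in int");

// Hypothetical helper: the sum is computed and kept in int,
// as the promotion rules give by default.
int sum_promoted(std::int8_t a, std::int8_t b) {
    return a + b;
}

// Hypothetical helper: the sum is narrowed back to int8_t, as would happen
// if generated code stored it in an 8-bit variable. A value such as 128 is
// out of range: the conversion is modular since C++20 and implementation-
// defined before that (g++ reduces it modulo 2^8, giving -128).
std::int8_t sum_narrowed(std::int8_t a, std::int8_t b) {
    return static_cast<std::int8_t>(a + b);
}
```

With the inputs from the example below, sum_promoted(13, 115) yields 128 while sum_narrowed(13, 115) yields -128: the wraparound comes from the narrowing store, not from the addition itself.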
Example
For a QAdd operator with a scale factor of 108 and a right shift of 7:
- aidge_core:
  QAdd[108, 7](13, 115) = (13 + 115) * 108 >> 7 = 128 * 108 >> 7 = 108
- aidge_export_cpp:
  QAdd[108, 7](13, 115) = (13 + 115) * 108 >> 7 = -128 * 108 >> 7 = -108 (the sum 128 overflows int8 and wraps to -128)
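Both results can be reproduced with a minimal sketch that models QAdd as ((a + b) * scale) >> shift, following the formula used in the example. The function names qadd_int32 and qadd_int8_sum are hypothetical and not aidge APIs; qadd_int8_sum assumes the export holds the sum in an int8_t before multiplying, which has not been confirmed in the generated code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of QAdd[scale, shift](a, b) = ((a + b) * scale) >> shift,
// taken from the formula in this report, not from the aidge sources.

// aidge_core-like behaviour: every intermediate value stays in 32 bits.
std::int32_t qadd_int32(std::int8_t a, std::int8_t b,
                        std::int32_t scale, int shift) {
    std::int32_t sum = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
    return (sum * scale) >> shift;
}

// Assumed aidge_export_cpp behaviour: the sum is stored in an int8_t first,
// so 13 + 115 = 128 wraps to -128 before the multiplication.
std::int32_t qadd_int8_sum(std::int8_t a, std::int8_t b,
                           std::int32_t scale, int shift) {
    std::int8_t sum = static_cast<std::int8_t>(a + b);
    // Right shift of a negative value: implementation-defined before C++20,
    // arithmetic (sign-extending) in g++ and by definition since C++20.
    return (static_cast<std::int32_t>(sum) * scale) >> shift;
}
```

With scale 108 and shift 7, qadd_int32(13, 115, 108, 7) gives 108 and qadd_int8_sum(13, 115, 108, 7) gives -108, matching the two lines above. When the sum stays within [-128, 127], both functions agree, which is why the discrepancy only shows up for some inputs.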
This discrepancy can lead to incorrect behavior when executing quantized models.