⚡ Calculate positive and negative FMA sums in parallel
Created by: michael-platzer
The FMA currently computes the sum of the product and the addend first and then, if certain conditions are met, negates that sum.
However, two's complement negation requires another full carry chain over the width of the sum, which means that the FMA currently has two sequential carry chains over the entire 3*PRECISION_BITS+4 bits of the sum (one for the addition and another one for the negation). This leads to long paths and timing issues.
A common way to reduce the length of signal paths in a FP adder is to calculate a positive and a negative sum in parallel and then select the correct result with a mux. The advantage is that the carry chains are no longer sequential but parallel instead. The Wally core uses this technique in its FMA: https://github.com/openhwgroup/cvw/blob/main/src/fpu/fma/fmaadd.sv
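The difference between the two schemes can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions, not the CVFPU or Wally RTL: the function names and the `width` parameter are hypothetical, only the effective-subtraction case is modeled, and details such as the sticky-bit carry-in are ignored.

```python
def abs_diff_sequential(p: int, a: int, width: int) -> int:
    """Add first, then negate if the result is negative (two chained carry chains)."""
    mask = (1 << width) - 1
    raw = p + (~a & mask) + 1        # first carry chain: p - a in two's complement
    carry = raw >> width             # carry out is 1 when p >= a
    s = raw & mask
    if carry == 0:                   # negative result
        s = (-s) & mask              # second carry chain, dependent on the first
    return s


def abs_diff_parallel(p: int, a: int, width: int) -> int:
    """Compute p - a and a - p independently, then select one (mux)."""
    mask = (1 << width) - 1
    pos = p + (~a & mask) + 1        # carry chain 1: p - a
    neg = a + (~p & mask) + 1        # carry chain 2: a - p, independent of chain 1
    carry = pos >> width             # select signal taken from the positive sum
    return (pos if carry else neg) & mask


print(abs_diff_parallel(3, 10, 8))   # prints 7
```

In hardware terms, `pos` and `neg` map to two adders that operate side by side, so the critical path covers one carry chain plus a mux instead of two carry chains in series, at the cost of a second adder.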
This PR modifies the FMA of CVFPU accordingly. We have verified that the modified FMA is equivalent to the current implementation using the sequential equivalence check of Synopsys VC Formal (the script to reproduce the check is posted in a comment below).
Initial results from preliminary PD runs in FC suggest that the total negative slack (TNS) is reduced and that timing improves.