Replace FMA's LZC with CVW's LZA

Replace leading zero counter with leading zero anticipator in FMA sum path

Summary

This PR optimizes the floating-point multiply-add (FMA) unit by replacing the leading zero counter (LZC) in the sum path with a leading zero anticipator (LZA) that runs in parallel with the adder. This takes the leading zero count off the critical path, so normalization can start as soon as the sum is available, shortening the FMA critical path.

Problem

The previous implementation computed the sum first, then counted leading zeros for normalization:

Multiply → Align → Add/Subtract → Count Leading Zeros → Normalize → Result

                               Critical path bottleneck

This sequential approach added unnecessary latency to the FMA operation: the leading zero count could not start until the complete sum was available, and normalization in turn had to wait for the count.
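As a rough illustration of that dependency chain, here is a minimal behavioural sketch in Python. It is not the RTL from this PR; the function names and the unsigned `width`-bit datapath are assumptions made for the example.

```python
def count_leading_zeros(value: int, width: int) -> int:
    """Scan from the MSB down until the first 1 bit is found."""
    count = 0
    for i in range(width - 1, -1, -1):
        if (value >> i) & 1:
            break
        count += 1
    return count


def normalize_after_add(a: int, b: int, width: int):
    """Hypothetical model of the old flow: add, then count, then shift."""
    mask = (1 << width) - 1
    total = (a + b) & mask                      # step 1: the full sum
    shift = count_leading_zeros(total, width)   # step 2: needs the sum from step 1
    normalized = (total << shift) & mask        # step 3: needs the count from step 2
    return normalized, shift
```

Each step consumes the output of the previous one, which is why the count sits on the critical path in this arrangement.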

Solution

This PR adds Schmookler's leading zero anticipation algorithm (IEEE Xplore), as implemented in the Wally core (CVW). It predicts the normalization shift count in parallel with the sum computation:

Multiply → Align → Add/Subtract ──────────→ Normalize → Result
           ↓                               ↗
           └── Leading Zero Anticipator ──┘
           (in parallel)

Technical Details

The LZA implementation:

  • Uses carry-lookahead logic (P/G/K signals) to predict leading zero patterns
  • Handles both addition and subtraction operations via the sub control signal
  • Includes logic to detect and correct mispredictions of one bit position
  • Feeds the predicted shift count directly to the normalization stage
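The points above can be sketched as a bit-level Python model. This is a simplified illustration under stated assumptions, not the RTL in this PR or in CVW: all function names are hypothetical, the operands are `n`-bit unsigned magnitudes placed in an `n + 2`-bit two's-complement frame, and the count is the number of leading bits equal to the sign bit of the sum (leading zeros for a positive sum, leading ones for a negative one).

```python
def leading_sign_bits(s: int, width: int) -> int:
    """Exact count of leading bits equal to the sign bit of s."""
    if (s >> (width - 1)) & 1:
        s = ~s & ((1 << width) - 1)   # turn leading ones into leading zeros
    return width - s.bit_length()


def lza_predict(a: int, b: int, sub: int, n: int) -> int:
    """Predict the leading-bit count of a + b (sub=0) or a - b (sub=1)
    from the operands alone, without using the sum."""
    width = n + 2                      # assumed frame: room for carry and sign
    mask = (1 << width) - 1
    y = (~b & mask) if sub else b      # subtraction adds ~b with carry-in 1
    cin = sub
    p = a ^ y                          # propagate
    g = a & y                          # generate
    k = ~(a | y) & mask                # kill
    p_hi = (p >> 1) | (sub << (width - 1))     # P of the next more significant bit
    g_lo = ((g << 1) | cin) & mask             # G of the next less significant bit
    k_lo = ((k << 1) | (cin ^ 1)) & mask       # K of the next less significant bit
    # Indicator: a 1 marks where the leading T*GZ* / T*ZG* pattern ends
    f = (p_hi & ((g & ~k_lo) | (k & ~g_lo))) | (~p_hi & ((k & ~k_lo) | (g & ~g_lo)))
    f &= mask
    return width - f.bit_length()      # leading zeros of the indicator word


def lza_demo(a: int, b: int, sub: int, n: int):
    """Return (sum, predicted count, corrected count)."""
    width = n + 2
    mask = (1 << width) - 1
    s = (a + ((~b & mask) if sub else b) + sub) & mask
    pred = lza_predict(a, b, sub, n)
    corrected = pred
    if pred < width:
        sign = (s >> (width - 1)) & 1
        next_bit = (s >> (width - 1 - pred)) & 1
        if next_bit == sign:           # prediction was one short
            corrected += 1
    return s, pred, corrected
```

In this model the prediction is either exact or one too small, so the correction only ever adds one. For example, `lza_predict(0b0100, 0b0011, 0, 4)` returns 2, while the sum `0b000111` has 3 leading zeros in the 6-bit frame; `lza_demo` reports the corrected count of 3.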

Testing

  • Verified with the sequential equivalence check in Synopsys VC Formal
  • The LZC-based and LZA-based designs were proven equivalent
  Summary Proofs:
   ----------------------------------------------------------------------------------------------------------------------
    VpId |           Name |      Type |         Parent |     #A |     #C |     #S |     #F |     #I |    Status |     %
   ----------------------------------------------------------------------------------------------------------------------
       0 |         seqdef |      root |            nil |     13 |      3 |     13 |      0 |      0 |   success |   100
       0 |      seqdef-rw |        or |         seqdef |      - |      - |      - |      - |      - |         - |     -
       0 |          rw1_1 |       int |      seqdef-rw |      5 |      0 |      5 |      0 |      0 |   success |   100
       0 |       rw1_1-ur |        or |          rw1_1 |      - |      - |      - |      - |      - |         - |     -
       0 |           ur_1 |      leaf |       rw1_1-ur |      4 |      0 |      4 |      0 |      0 |   success |   100
       0 |      rw1_1-dcp | decompose |          rw1_1 |      - |      - |      - |      - |      - |         - |     -
       0 |         idcp_1 |      leaf |      rw1_1-dcp |      4 |      0 |      4 |      0 |      0 |   success |   100
   ----------------------------------------------------------------------------------------------------------------------
