Replace FMA's LZC with CVW's LZA
Replace leading zero counter with leading zero anticipator in FMA sum path
Summary
This PR optimizes the floating-point fused multiply-add (FMA) unit by replacing the sequential leading zero counter (LZC) in the sum path with a parallel leading zero anticipator (LZA). This removes the leading-zero count from the critical path, so normalization can start as soon as the sum is available.
Problem
The previous implementation computed the sum first, then counted leading zeros for normalization:
Multiply → Align → Add/Subtract → Count Leading Zeros → Normalize → Result
                                      ↑
                               Critical path bottleneck
This sequential approach added unnecessary latency to the FMA operation, as normalization had to wait for the complete sum calculation.
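To make the dependency concrete, here is a minimal Python model of the old ordering. It is a behavioral sketch, not the RTL: the function names, the unsigned `width`-bit operands, and the restriction to an effective subtraction with a > b are all assumptions made for illustration.

```python
from typing import Tuple


def leading_zero_count(value: int, width: int) -> int:
    """Leading zeros of `value` viewed as a `width`-bit unsigned number."""
    return width - value.bit_length()


def normalize_sequential(a: int, b: int, width: int) -> Tuple[int, int]:
    """Baseline ordering: subtract, then count, then shift.

    Hypothetical helper; assumes 0 <= b < a < 2**width.
    Returns (normalized sum, shift amount).
    """
    mask = (1 << width) - 1
    total = a - b                              # step 1: full add/subtract
    shift = leading_zero_count(total, width)   # step 2: needs `total` first
    return (total << shift) & mask, shift      # step 3: normalize
```

For example, `normalize_sequential(8, 1, 4)` computes 7 (0111), counts one leading zero, and returns `(14, 1)`. Step 2 cannot begin until step 1 has finished, which is the serialization the diagram above marks as the bottleneck.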
Solution
Added Schmookler's leading zero anticipation algorithm (IEEE Xplore), as implemented in the Wally core (CVW), which predicts the normalization shift count in parallel with the sum computation:
Multiply → Align → Add/Subtract ──────────→ Normalize → Result
           ↓                               ↗
           └── Leading Zero Anticipator ──┘
           (in parallel)
Technical Details
The LZA implementation:
- Uses carry-lookahead logic (P/G/K signals) to predict leading zero patterns
- Handles both addition and subtraction operations via the sub control signal
- Detects and corrects mispredictions of one bit position
- Feeds the predicted shift count directly to the normalization stage
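The points above can be illustrated with a small Python model. This is a simplified sketch and not the CVW module: it covers only an effective subtraction a - b with 0 <= b < a < 2**width, and the names `lza_shift` and `normalize_with_lza` are hypothetical. It forms P/G/K on the operand pair (a, ~b), builds an indicator word from neighbouring bits only, and uses the position of the indicator's leading one as the predicted shift. The prediction is either exact or one position short, so a single MSB check after the shift corrects it.

```python
from typing import Tuple


def lza_shift(a: int, b: int, width: int) -> int:
    """Predict the leading-zero count of a - b without computing a - b.

    Illustrative only; assumes 0 <= b < a < 2**width.
    """
    mask = (1 << width) - 1
    nb = ~b & mask                  # a - b is evaluated as a + ~b + 1
    p = a ^ nb                      # propagate: a_i == b_i
    g = a & nb                      # generate:  a_i = 1, b_i = 0
    k = ~a & ~nb & mask             # kill:      a_i = 0, b_i = 1
    # P of the next more significant bit (above the MSB counts as P).
    p_above = (p >> 1) | (1 << (width - 1))
    # K of the next less significant bit (below the LSB counts as not-K).
    k_below = (k << 1) & mask
    # Indicator bit i marks the end of a P...P G K...K prefix at position i.
    f = ((p_above & g) | (~p_above & k)) & ~k_below & mask
    return width - f.bit_length()


def normalize_with_lza(a: int, b: int, width: int) -> Tuple[int, int]:
    """Shift by the predicted count, then fix a one-position miss."""
    mask = (1 << width) - 1
    shift = lza_shift(a, b, width)  # in hardware: runs alongside the subtract
    shifted = ((a - b) << shift) & mask
    if not (shifted >> (width - 1)) & 1:   # MSB still 0: prediction one short
        shifted = (shifted << 1) & mask
        shift += 1
    return shifted, shift
```

With `width = 4`, `lza_shift(8, 7, 4)` returns 3, which is exact (8 - 7 = 0001). `lza_shift(8, 1, 4)` returns 0 while 8 - 1 = 0111 has one leading zero, so `normalize_with_lza(8, 1, 4)` applies the one-position correction and returns `(14, 1)`. Because the indicator depends only on adjacent operand bits, it needs no carry from the adder, which is what lets it overlap with the sum.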
Testing
- Verified with Synopsys VC Formal's sequential equivalence checking
- Proven equivalent (proof summary below)
  Summary Proofs:
   ----------------------------------------------------------------------------------------------------------------------
    VpId |           Name |      Type |         Parent |     #A |     #C |     #S |     #F |     #I |    Status |     %
   ----------------------------------------------------------------------------------------------------------------------
       0 |         seqdef |      root |            nil |     13 |      3 |     13 |      0 |      0 |   success |   100
       0 |      seqdef-rw |        or |         seqdef |      - |      - |      - |      - |      - |         - |     -
       0 |          rw1_1 |       int |      seqdef-rw |      5 |      0 |      5 |      0 |      0 |   success |   100
       0 |       rw1_1-ur |        or |          rw1_1 |      - |      - |      - |      - |      - |         - |     -
       0 |           ur_1 |      leaf |       rw1_1-ur |      4 |      0 |      4 |      0 |      0 |   success |   100
       0 |      rw1_1-dcp | decompose |          rw1_1 |      - |      - |      - |      - |      - |         - |     -
       0 |         idcp_1 |      leaf |      rw1_1-dcp |      4 |      0 |      4 |      0 |      0 |   success |   100
   ----------------------------------------------------------------------------------------------------------------------