Add FPGA-Optimized Register File Version
Created by: ganoam
Add a register file optimized for synthesis on FPGAs that support distributed RAM. The register file consists of two RAM blocks, each with one synchronous write port and three asynchronous read ports. To achieve the behavior of a register file with two synchronous write ports and three asynchronous read ports, each read access selects between the two blocks depending on which block was written last. An additional array of 1-bit registers records this selection.
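The read arbitration described above can be sketched as a small behavioral model. This is only an illustration in Python, not the HDL added by this PR: the class name `DualBlockRegFile`, its method names, and the rule that write port B wins when both ports write the same address in one cycle are assumptions made for the sketch. Special handling of register 0 is left out.

```python
class DualBlockRegFile:
    """Behavioral sketch: two single-write RAM blocks plus one select bit per register."""

    def __init__(self, num_regs=32, width=32):
        self.mask = (1 << width) - 1
        self.block_a = [0] * num_regs      # written only by write port A
        self.block_b = [0] * num_regs      # written only by write port B
        self.sel_b = [False] * num_regs    # True if block B holds the newest value

    def clock(self, we_a=False, waddr_a=0, wdata_a=0,
              we_b=False, waddr_b=0, wdata_b=0):
        """Model one rising clock edge; each port writes only its own block."""
        if we_a:
            self.block_a[waddr_a] = wdata_a & self.mask
            self.sel_b[waddr_a] = False
        if we_b:
            self.block_b[waddr_b] = wdata_b & self.mask
            # Assumption: port B takes priority on a same-address collision.
            self.sel_b[waddr_b] = True

    def read(self, raddr):
        """Asynchronous read: the select bit picks the block written last."""
        return self.block_b[raddr] if self.sel_b[raddr] else self.block_a[raddr]


rf = DualBlockRegFile()
rf.clock(we_a=True, waddr_a=5, wdata_a=0x11)   # port A writes register 5
rf.clock(we_b=True, waddr_b=5, wdata_b=0x22)   # port B overwrites it later
print(hex(rf.read(5)))
```

Each of the three read ports would call `read` with its own address. Both blocks keep their stale data, and only the select bit decides which copy is visible, which is why no write-port multiplexer is needed in front of the storage.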
Savings for FPGA synthesis are achieved by:
- Replacing an array of FFs with distributed RAM. Example: 31 32-bit registers implemented as FFs occupy 992 FFs, or 446 LUTs on Xilinx Artix-7 FPGAs. The equivalent storage capacity in distributed RAM is implemented by 36 RAM32M primitives (inferred from generic HDL), i.e. 144 distributed-RAM-enabled LUTs, plus 31 FFs for block selection (16 LUTs).
- The distributed RAM primitives have the read address decoders already integrated. This saves three 32-bit 32-to-1 multiplexers at the read ports.
- Since both write ports unconditionally write to their respective RAM blocks, the multiplexing of the write ports is also saved. This removes 32 32-bit 2-to-1 multiplexers.
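The storage figures in the list above follow from simple arithmetic, sketched here in Python. All numbers are taken from the description. The constant names are made up for the sketch, and the value of 4 LUTs per RAM32M is the ratio implied by the 36 primitives / 144 LUTs figure.

```python
NUM_REGS = 31          # registers stored, as in the example above
WIDTH = 32             # bits per register

RAM32M_COUNT = 36      # primitives inferred, as reported above
LUTS_PER_RAM32M = 4    # implied by 36 primitives occupying 144 LUTs

ff_storage = NUM_REGS * WIDTH                  # FFs in the flip-flop version
dram_luts = RAM32M_COUNT * LUTS_PER_RAM32M     # LUTs used as distributed RAM
select_ffs = NUM_REGS                          # one block-select bit per register

print(f"FF-based storage:   {ff_storage} FFs")
print(f"Distributed RAM:    {dram_luts} LUTs + {select_ffs} select FFs")
print(f"FFs no longer used: {ff_storage - select_ffs}")
```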
Concrete Savings:
- without FPU reg file:
    baseline:   7347 LUTs, 2508 FFs
    optimized:  5722 LUTs, 1541 FFs
    -------------------------------
    difference: -1625 LUTs (-22.1%), -967 FFs (-38.6%)
- with FPU reg file:
    baseline:   13160 LUTs, 4027 FFs
    optimized:  10257 LUTs, 2062 FFs
    -------------------------------
    difference: -3353 LUTs (-24.6%), -1965 FFs (-48.8%)
Signed-off-by: ganoam <gnoam@live.com>