FPGA Optimization for Performance Counters
Created by: ganoam
This commit modifies the performance counter implementation so that synthesis tools can infer DSP slices to absorb the counter logic on FPGAs. Using this dedicated circuitry saves both LUT and flip-flop resources. The optimization only works for counter widths up to 48 bits, due to restrictions of the DSP slices. The largest benefit has been observed for 32-bit-wide counters. To avoid premature overflows, the mcycle and minstret counters are left untouched (64 bits).
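To illustrate why mcycle and minstret stay at 64 bits, the following minimal sketch estimates how long a counter takes to wrap. It assumes the worst case of one increment per clock cycle and a hypothetical 100 MHz clock; neither the clock frequency nor the helper name comes from this PR.

```python
def seconds_to_overflow(width_bits: int, clock_hz: float) -> float:
    """Time until a counter wraps, assuming it increments once per clock cycle."""
    return (2 ** width_bits) / clock_hz


# Hypothetical clock frequency, chosen only for illustration.
ASSUMED_CLOCK_HZ = 100e6

for width in (32, 48, 64):
    secs = seconds_to_overflow(width, ASSUMED_CLOCK_HZ)
    print(f"{width}-bit counter wraps after {secs:.3g} s ({secs / 86400:.3g} days)")
```

Under these assumptions a 32-bit counter wraps after about 43 seconds and a 48-bit counter after roughly 32 days, while a 64-bit counter lasts several thousand years. Event counters that do not increment every cycle wrap later than this bound.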
Summary of changes:
- Counter logic for mhpmcounters is moved to a new separate module.
- The preprocessor variable TARGET_XILINX (automatically set by Bender) is used to include the required synthesis pragma and synchronous reset.
- DSP inference is supported for Xilinx FPGA devices featuring DSP48E1 slices or similar, and for counter widths up to 48 bits. The benefit is largest for 32-bit-wide counters.
- A new top-level parameter MHPMCounterWidth is introduced to control the width of the performance counters (excluding mcycle and minstret).
Concrete Savings: (Xilinx Kintex-7, xc7k325tffg900-2)
14 counters, 32 bits wide
              LUT      FF   DSP48E1
-----------------------------------
baseline:   39874   22349        27
optimized:  38694   21922        41
-----------------------------------
Diff        -1180    -427       +14
            -3.0%   -1.9%

14 counters, 48 bits wide
              LUT      FF   DSP48E1
-----------------------------------
baseline:   39841   22539        27
optimized:  39533   21905        41
-----------------------------------
Diff         -308    -634       +14
            -0.8%   -2.8%
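The differences and percentages in the tables can be re-derived from the raw utilization numbers. The sketch below does so; the figures are copied from the tables above, while the dictionary layout and the function name are illustrative assumptions.

```python
# Utilization figures copied from the tables above (14 counters each).
RESULTS = {
    32: {
        "baseline": {"LUT": 39874, "FF": 22349, "DSP48E1": 27},
        "optimized": {"LUT": 38694, "FF": 21922, "DSP48E1": 41},
    },
    48: {
        "baseline": {"LUT": 39841, "FF": 22539, "DSP48E1": 27},
        "optimized": {"LUT": 39533, "FF": 21905, "DSP48E1": 41},
    },
}


def savings(width: int) -> dict:
    """Return {resource: (absolute diff, percent change vs. baseline)}."""
    base = RESULTS[width]["baseline"]
    opt = RESULTS[width]["optimized"]
    return {
        res: (opt[res] - base[res], 100.0 * (opt[res] - base[res]) / base[res])
        for res in base
    }


for width in RESULTS:
    for res, (diff, pct) in savings(width).items():
        print(f"{width}-bit {res}: {diff:+d} ({pct:+.1f}%)")
```

Each of the 14 counters maps to one additional DSP48E1 slice, which matches the +14 in both configurations.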
Modified write address decoder