FPGA Optimization for Performance Counters

Eclipse Webmaster requested to merge github/fork/ganoam/fpga-opt-perfcnt-pr into master

Created by: ganoam

This commit modifies the performance counter implementation so that synthesis can infer DSP slices to absorb the counter logic on FPGAs. Using this dedicated circuitry saves both logic (LUT) and flip-flop resources. The optimization only works for counter widths up to 48 bits, due to restrictions of the DSP slices. The largest benefit has been observed for 32-bit-wide counters. To avoid premature overflows, the mcycle and minstret counters are left untouched (64 bits).
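
To give a sense of what "premature overflow" means here, the following Python sketch estimates how long a free-running counter takes to wrap at different widths. The 100 MHz clock and the one-increment-per-cycle rate are assumptions chosen for illustration, not figures from this merge request; event counters that increment less often wrap correspondingly later.

```python
# Rough wrap-around times for counters of different widths.
# ASSUMPTIONS (not from the merge request): a 100 MHz clock and a counter
# that increments on every cycle, which is the worst case for overflow.

ASSUMED_CLOCK_HZ = 100_000_000  # hypothetical clock frequency


def seconds_until_overflow(width_bits: int, clock_hz: int = ASSUMED_CLOCK_HZ) -> float:
    """Time until a counter incrementing once per cycle wraps to zero."""
    return (2 ** width_bits) / clock_hz


if __name__ == "__main__":
    for width in (32, 48, 64):
        seconds = seconds_until_overflow(width)
        print(f"{width}-bit: {seconds:.3e} s ({seconds / 86_400:.3e} days)")
```

Under these assumptions a 32-bit counter wraps in under a minute and a 48-bit counter in roughly a month, while a 64-bit counter lasts for thousands of years, which fits the decision to keep mcycle and minstret at 64 bits.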

Summary of changes:

  • Counter logic for mhpmcounters is moved to a new separate module.
  • The preprocessor variable TARGET_XILINX (automatically set by Bender) is used to include the required synthesis pragma and synchronous reset.
  • DSP inference is supported for Xilinx FPGA devices featuring DSP48E1 slices or similar, and for counter widths up to 48 bits. The benefits are largest for 32-bit-wide counters.
  • A new top-level parameter MHPMCounterWidth is introduced to control the width of the performance counters (excluding mcycle and minstret).
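
The width parameter and the 48-bit DSP limit from the list above can be illustrated with a small behavioural model in Python. This is only a sketch: the actual change is RTL, the class and method names are hypothetical, and dropping written bits above the counter width is an assumption about the behaviour, not something stated in this merge request.

```python
# Behavioural model of a width-limited event counter (illustration only;
# the real change is RTL, and all names below are hypothetical).

DSP_MAX_WIDTH = 48  # widest counter the description says can map to a DSP slice


class HpmCounterModel:
    def __init__(self, width: int = 32) -> None:
        if not 1 <= width <= 64:
            raise ValueError("width must be between 1 and 64 bits")
        self.width = width
        self.mask = (1 << width) - 1
        self.value = 0

    @property
    def dsp_eligible(self) -> bool:
        """True if the width is within the 48-bit limit from the description."""
        return self.width <= DSP_MAX_WIDTH

    def write(self, value: int) -> None:
        """Software write; bits above the counter width are dropped (assumption)."""
        self.value = value & self.mask

    def tick(self, event: bool = True) -> None:
        """Advance by one when the counted event occurs, wrapping at 2**width."""
        if event:
            self.value = (self.value + 1) & self.mask


if __name__ == "__main__":
    counter = HpmCounterModel(width=32)
    counter.write(0xFFFF_FFFF)
    counter.tick()  # wraps around to zero
    print(counter.value, counter.dsp_eligible)
```

In this model a 64-bit counter reports `dsp_eligible` as false, mirroring the description's point that only counters up to 48 bits wide benefit from the optimization.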

Concrete savings (Xilinx Kintex-7, xc7k325tffg900-2):

14 counters, 32 bit wide

            LUT     FF      DSP48E1
-----------------------------------
baseline:   39874   22349   27
optimized:  38694   21922   41
-----------------------------------
diff:       -1180   -427    +14
            -3.0%   -1.9%

14 counters, 48 bit wide

            LUT     FF      DSP48E1
-----------------------------------
baseline:   39841   22539   27
optimized:  39533   21905   41
-----------------------------------
diff:       -308    -634    +14
            -0.8%   -2.8%
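
The diff rows can be re-derived from the baseline and optimized figures with a few lines of Python. The sketch below uses only the numbers reported in the two tables; the dictionary layout and function name are hypothetical.

```python
# Recompute the diff rows from the baseline/optimized figures reported above.

RESULTS = {
    "14 counters, 32 bit wide": {
        "baseline": {"LUT": 39874, "FF": 22349, "DSP48E1": 27},
        "optimized": {"LUT": 38694, "FF": 21922, "DSP48E1": 41},
    },
    "14 counters, 48 bit wide": {
        "baseline": {"LUT": 39841, "FF": 22539, "DSP48E1": 27},
        "optimized": {"LUT": 39533, "FF": 21905, "DSP48E1": 41},
    },
}


def diff_row(baseline: dict, optimized: dict) -> dict:
    """Return {resource: (absolute change, relative change in percent)}."""
    row = {}
    for resource, before in baseline.items():
        delta = optimized[resource] - before
        row[resource] = (delta, 100.0 * delta / before)
    return row


if __name__ == "__main__":
    for config, numbers in RESULTS.items():
        print(config)
        for resource, (delta, percent) in diff_row(**numbers).items():
            print(f"  {resource:8s} {delta:+6d} ({percent:+.1f}%)")
```

The DSP48E1 count rises by 14 in both configurations, which is consistent with one DSP slice being inferred per counter.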

Modified write address decoder
