Replace performance counters with Hardware Performance Monitor
Created by: silabs-PaulZ
The following change was approved by the OpenHW Core Task Group:
Replace existing performance counter implementation with the Hardware Performance Monitor specified in the privileged architecture.
Access to the existing performance counter registers will be removed:
- PCMRs
- PCERs
- PCCRs

External events will no longer be supported.

Specific feature details that were not fully discussed are specified in this issue:
- All counters shall be 64 bits wide.
- MCYCLE and MINSTRET shall always be implemented and available
- The number of MHPMCOUNTER registers (and corresponding MHPMEVENT registers) shall be parameterizable with a top-level parameter, NUM_MHPMCOUNTERS
  - default value of 1
  - range between 1 and 29
- Access to non-implemented MHPMCOUNTER and MHPMEVENT registers shall be allowed and will always return a read value of 0
- MCOUNTEREN and SCOUNTEREN shall not be implemented, as these WARL registers only control counter access for lower privilege modes. Reading them shall return 0 and will not cause an illegal instruction exception
- MCOUNTINHIBIT shall be implemented to disable/enable counting of each counter
- Each MHPMEVENT register will have 15 bits, allowing selection of the following counting options (these match the existing PCCR count types):
Index | Name | Description |
---|---|---|
0 | CYCLES | Counts the number of cycles the core was active (not sleeping) |
1 | INSTR | Counts the number of instructions executed |
2 | LD_STALL | Number of load data hazards |
3 | JR_STALL | Number of jump register data hazards |
4 | IMISS | Cycles waiting for instruction fetches, i.e. number of instructions wasted due to non-ideal caching |
5 | LD | Number of data memory loads executed. (Misaligned accesses are counted twice) |
6 | ST | Number of data memory stores executed. (Misaligned accesses are counted twice) |
7 | JUMP | Number of unconditional jumps (j, jal, jr, jalr) |
8 | BRANCH | Number of branches. (Counts taken and not taken branches) |
9 | BTAKEN | Number of taken branches. |
10 | RVC | Number of compressed instructions executed |
11 | FP_TYPE | Cycles wasted due to different latencies of subsequent FP-operations |
12 | FP_CONT | Cycles wasted due to contentions at the shared FPU (PULP only) |
13 | FP_DEP | Cycles wasted due to data hazards in subsequent FP instructions |
14 | FP_WB | Cycles wasted due to FP operations resulting in write-back contentions |
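
To make the intended register behavior concrete, here is a minimal behavioral sketch in Python. It is not the RTL and not an agreed design. It assumes that the 15 MHPMEVENT bits are one selector bit per table index, and that a counter increments by one in any cycle where at least one of its selected events occurs; neither point is spelled out above. The `HpmEvent` and `HpmModel` names and the `tick` method are hypothetical, introduced only for illustration. Counter indices follow the standard RISC-V numbering (0 = MCYCLE, 2 = MINSTRET, 3..31 = MHPMCOUNTER3..31), which is also the bit layout of MCOUNTINHIBIT.

```python
from enum import IntFlag


class HpmEvent(IntFlag):
    """Assumed encoding: bit position equals the Index column of the table."""
    CYCLES = 1 << 0
    INSTR = 1 << 1
    LD_STALL = 1 << 2
    JR_STALL = 1 << 3
    IMISS = 1 << 4
    LD = 1 << 5
    ST = 1 << 6
    JUMP = 1 << 7
    BRANCH = 1 << 8
    BTAKEN = 1 << 9
    RVC = 1 << 10
    FP_TYPE = 1 << 11
    FP_CONT = 1 << 12
    FP_DEP = 1 << 13
    FP_WB = 1 << 14


MASK64 = (1 << 64) - 1   # all counters are 64 bits wide
EVENT_MASK = 0x7FFF      # 15 implemented MHPMEVENT bits


class HpmModel:
    """Hypothetical behavioral model of the proposed counters."""

    def __init__(self, num_mhpmcounters=1):
        if not 1 <= num_mhpmcounters <= 29:
            raise ValueError("NUM_MHPMCOUNTERS must be between 1 and 29")
        implemented = range(3, 3 + num_mhpmcounters)
        # MCYCLE (0) and MINSTRET (2) are always implemented.
        self.counter = {0: 0, 2: 0, **{i: 0 for i in implemented}}
        self.event = {i: HpmEvent(0) for i in implemented}
        self.inhibit = 0  # MCOUNTINHIBIT: a set bit stops that counter

    def write_event(self, idx, value):
        # Writes to non-implemented MHPMEVENT registers are silently dropped.
        if idx in self.event:
            self.event[idx] = HpmEvent(int(value) & EVENT_MASK)

    def read_event(self, idx):
        # Non-implemented registers read as 0 instead of trapping.
        return int(self.event.get(idx, 0))

    def read_counter(self, idx):
        return self.counter.get(idx, 0)

    def tick(self, events):
        """Advance one clock cycle; `events` holds what happened in it."""
        for idx in self.counter:
            if (self.inhibit >> idx) & 1:
                continue
            if idx == 0:
                selected = HpmEvent.CYCLES
            elif idx == 2:
                selected = HpmEvent.INSTR
            else:
                selected = self.event[idx]
            if events & selected:
                self.counter[idx] = (self.counter[idx] + 1) & MASK64
```

For example, with the default `NUM_MHPMCOUNTERS` of 1, calling `write_event(3, HpmEvent.LD | HpmEvent.ST)` makes MHPMCOUNTER3 count loads and stores, `read_counter(4)` returns 0 because MHPMCOUNTER4 is not implemented, and setting `inhibit = 1 << 3` freezes MHPMCOUNTER3 while MCYCLE keeps running.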