Add hardware loop code generation
Created by: serkm
We use the `HardwareLoops` pass before instruction selection to insert hardware loop intrinsics into LLVM-IR. This pass calls `isHardwareLoopProfitable`, which checks loops against the following constraints:
- The `xcorevhwlp` extension is enabled
- The loop needs exactly one basic block that is both its latch and its exiting block
- The trip count needs to be loop invariant and representable in a register
- The loop contains no instruction that might be lowered to a branch instruction
- The control flow graph of the loop body needs to be linear (except for nested hardware loops)
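The constraint list above amounts to a single predicate over facts about the loop. The following is a minimal Python sketch of that predicate, not the actual `isHardwareLoopProfitable` implementation: `LoopSummary` and `is_hardware_loop_candidate` are hypothetical names, and the loop facts are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoopSummary:
    """Hypothetical summary of the loop facts the check needs.

    Field names are illustrative only; they do not mirror LLVM's API.
    """
    hwlp_enabled: bool          # xcorevhwlp extension is available
    latch_blocks: int           # number of latch blocks
    exiting_blocks: int         # number of exiting blocks
    latch_is_exiting: bool      # the single latch is also the single exiting block
    trip_count_invariant: bool  # trip count is loop invariant
    trip_count_fits_reg: bool   # trip count is representable in a register
    may_emit_branch: bool       # some instruction might be lowered to a branch
    linear_cfg: bool            # body CFG is linear (nested hardware loops excepted)


def is_hardware_loop_candidate(loop: LoopSummary) -> bool:
    """Return True only if every constraint from the list holds."""
    return (
        loop.hwlp_enabled
        and loop.latch_blocks == 1
        and loop.exiting_blocks == 1
        and loop.latch_is_exiting
        and loop.trip_count_invariant
        and loop.trip_count_fits_reg
        and not loop.may_emit_branch
        and loop.linear_cfg
    )


simple = LoopSummary(
    hwlp_enabled=True, latch_blocks=1, exiting_blocks=1, latch_is_exiting=True,
    trip_count_invariant=True, trip_count_fits_reg=True,
    may_emit_branch=False, linear_cfg=True,
)
# Same loop, but containing an instruction that would be lowered to a branch.
risky = replace(simple, may_emit_branch=True)
print(is_hardware_loop_candidate(simple), is_hardware_loop_candidate(risky))  # True False
```

A single failing constraint is enough to reject the loop, which is why the sketch is one conjunction rather than a score.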
The intrinsics `@llvm.set.loop.iterations.i32` and `@llvm.loop.decrement.i32` (together with its conditional branch) are lowered to the pseudo instructions `HwlpSetup`/`HwlpSetupImm` and `HwlpBranch`, respectively.
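The runtime behaviour these intrinsics describe can be modeled in a few lines. This is a hedged Python sketch assuming LLVM's documented semantics: `llvm.set.loop.iterations` sets the trip count, and `llvm.loop.decrement` decrements the counter and returns false when the loop should exit. `HardwareLoopCounter` and `run_loop` are hypothetical helpers, and the model assumes the loop body runs at least once.

```python
class HardwareLoopCounter:
    """Simplified model of the counter behind the two intrinsics."""

    def __init__(self) -> None:
        self.count = 0

    def set_loop_iterations(self, n: int) -> None:
        # Models @llvm.set.loop.iterations.i32 (lowered to HwlpSetup/HwlpSetupImm).
        self.count = n

    def loop_decrement(self, step: int = 1) -> bool:
        # Models @llvm.loop.decrement.i32: returns False when the loop should exit.
        self.count -= step
        return self.count != 0


def run_loop(trip_count: int) -> list[int]:
    """Run a do-while style loop body trip_count times and record each iteration."""
    if trip_count < 1:
        raise ValueError("model assumes the body executes at least once")
    counter = HardwareLoopCounter()
    counter.set_loop_iterations(trip_count)  # placed before the loop
    executed = []
    i = 0
    while True:
        executed.append(i)  # loop body
        i += 1
        # Decrement plus conditional branch: the pair lowered to HwlpBranch.
        if not counter.loop_decrement():
            break
    return executed


print(run_loop(3))  # [0, 1, 2]
```

The decrement and the branch it feeds are modeled as one step here, mirroring how the pair is lowered to a single `HwlpBranch`.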
The `RISCVCoreVHwlpBlocks` pass is inserted before register allocation. This pass reorders the blocks of hardware loops to eliminate branch instructions and adds the blocks to `RISCVMachineFunctionInfo`, to prevent their terminators from being analysed and to disable instruction compression (reusing this commit).
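The effect of reordering a linear loop body can be illustrated with a small layout model. This Python sketch reflects an assumption about the general idea, not the code of `RISCVCoreVHwlpBlocks`: blocks are plain strings, `succ` maps each non-latch block to its single in-loop successor, nested hardware loops are ignored, and `reorder_linear_loop` and `branches_needed` are hypothetical helpers.

```python
def reorder_linear_loop(header: str, succ: dict[str, str], latch: str) -> list[str]:
    """Lay out a linear loop body so each block falls through to its successor."""
    order = [header]
    while order[-1] != latch:
        order.append(succ[order[-1]])  # linear CFG: exactly one successor
    return order


def branches_needed(layout: list[str], succ: dict[str, str]) -> int:
    """Count blocks whose successor is not the next block in the layout."""
    count = 0
    for pos, block in enumerate(layout):
        target = succ.get(block)
        if target is None:
            # The latch has no entry in succ; in this model its back edge
            # is assumed to be handled by the hardware loop itself.
            continue
        nxt = layout[pos + 1] if pos + 1 < len(layout) else None
        if target != nxt:
            count += 1  # needs an explicit branch
    return count


succ = {"header": "b1", "b1": "b2", "b2": "latch"}
original = ["header", "b2", "latch", "b1"]
reordered = reorder_linear_loop("header", succ, "latch")
print(reordered, branches_needed(original, succ), branches_needed(reordered, succ))
# ['header', 'b1', 'b2', 'latch'] 2 0
```

The linear-CFG constraint is what makes this work: with a single successor per block, one fall-through chain from header to latch always exists.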
Before the machine code is emitted, the `RISCVExpandCoreVHwlpPseudo` pass replaces the pseudo instructions with the corresponding `xcorevhwlp` instructions. We have to do this as late as possible to make sure that no later pass moves the last instruction in the loop, whose position defines where the hardware loop ends.
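Why the expansion must run late can be seen in a toy execution model. The sketch below assumes, for illustration only, that the loop end is recorded as the position of the last loop instruction and that the core jumps back to the start after executing it while iterations remain; the exact `xcorevhwlp` semantics are not modeled, and `execute` is a hypothetical helper.

```python
def execute(program: list[str], start: int, end: int, count: int) -> list[str]:
    """Trace a straight-line program containing one simplified hardware loop.

    The loop covers program[start..end] inclusive; end is the index of the
    last instruction in the loop. No branch instruction is executed: after
    the instruction at end, control returns to start while iterations remain.
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])
        if pc == end and count > 1:
            count -= 1
            pc = start  # implicit jump back to the loop start
        else:
            pc += 1
    return trace


prog = ["setup", "a", "b", "after"]
print(execute(prog, 1, 2, 2))  # ['setup', 'a', 'b', 'a', 'b', 'after']
# End fixed at "a", as if "b" had been moved out of the loop afterwards.
print(execute(prog, 1, 1, 2))  # ['setup', 'a', 'a', 'b', 'after']
```

In the second call `b` runs only once instead of once per iteration, which is the kind of miscompilation that expanding the pseudo instructions right before emission avoids.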