For xpulp SIMD ALU shift instructions with SCI mode CV.SRL.SCI, CV.SRA.SCI, CV.SLL.SCI 6-bit Imm usage ambiguity between User Manual and RTL

Created by: dd-vaibhavjain

Issue Description

For xpulp SIMD ALU Operation instructions - Shift Instructions with Immediate Scalar Replication mode – CV.SRL.SCI, CV.SRA.SCI, CV.SLL.SCI , 6-bit Immidiate is used as per encoding but upper bits gets dropped in rtl as per the SIMD vector data size of 16 or 8 bit.

Section 17.6.1 SIMD ALU operations of User-manual-en-cv32e40p_v1.3.0
Description:
cv.srl[.sc,.sci]{.h,.b} rD, rs1, [rs2, Imm6] : rD[i] = rs1[i] >> op2[i] Note: Immediate is zero-extended, shift is logical.
Encoding:
31 : 27 26 25 24 : 20 19 : 15 14 : 12 11 : 7 6 : 0
0 1000 0 Imm6[0|5:1] src1 110 dest 111 1011 cv.srl.sci.h rD, rs1, Imm6

Here as per this instruction's encoding in the Spec, Immediate is a 6-bit value and Zero-extended.
Thus, as per spec here all 6-bits [5:0] are valid and immediate value encoded in instruction can be as such be 6-bit wide.

But current RTL implementation drops the upper bits as per vector data mode it is operating on, for example:
For 16-bit SIMD : CV.SRL.SCI.H :design only uses [3:0] bits for shift operation and drops the upper bits of the immediate completely.
For 8-bit SIMD : CV.SRL.SCI.B : design only uses [2:0] bits for shift operation and drops the upper bits of the immediate completely. This is not explained in user manual also, and as such in case upper bits [5:4] or [5:3] are written the expected result is not clear.

Related RTL file: cv32e40p/rtl/cv32e40p_alu.sv
For RTL reference, always_comb block at line 272:
VEC_MODE16: begin
shift_right_result[31:16] = $signed(
{shift_arithmetic & shift_op_a[31], shift_op_a[31:16]}
) >>> shift_amt_int[19:16];
shift_right_result[15:0] = $signed(
{shift_arithmetic & shift_op_a[15], shift_op_a[15:0]}
) >>> shift_amt_int[3:0];
end
….
…

Thus probably there is a need for calrification or potential RTL fix based on what is design decision:
(1) clarification from the Spec, by defining either valid number of immidiate bits writable OR how upper bits should be treated if 6-bit immidate is available.
(2) RTL fix to either prevent or cause illegal encoding for upper immediate bits OR if chose to use all 6-bits as the case currently, so for cases of shift being more than vector size, the rtl implementation matches User Manual/spec description by either dropping/or using all upper bits.

Component

Component:Doc
Component:RTL

Steps to Reproduce

cv32e40p Hash : f789eb83 ;
core-v-verif ; branch - cv32e40p/dev ; hash : f4a83bd3ffc9ac69584f17b9d13fa0d77b22c833
cd core-v-verif/cv32e40p/sim/uvmt
make clean gen_corev-dv TEST=corev_sanity_instr_test CFG=pulp_fpu SEED=894955177 SIMULATOR=vsim
make test TEST=corev_sanity_instr_test USE_ISS=yes WAVES=1 CFG=pulp_fpu SEED=506015487 SIMULATOR=vsim
Log Error Message:
Info (IDV) Instruction executed prior to mismatch '0x3f2(main+522): 42e165fb cv.srl.sci.h x11,x2,29'
Error (IDV) GPR register value mismatch (HartId:0, PC:0x000003f2 main+522):
Info (IDV) 0> GPR x11
Info (IDV) . dut:0x00000007
Info (IDV) . ref:0x00000000

Trace Log:
767.000 ns 250 000003ee 40c3dcfb cv.srl.sc.b x25, x7, x12 x25=60000000 x7:60000000 x12:00000000
770.000 ns 251 000003f2 42e165fb cv.srl.sci.h x11, x2, 0x29 x11=00000007 x2:0000ff44
773.000 ns 252 000003f6 40fef4fb cv.srl.sci.b x9, x29, 0x30 x9=00000000 x29:07060a1f