Square root R=4 K=2 bug fix
Created by: kevindkim723
Certain instances of the divider can calculate more digits than necessary and produce the wrong result, because the preprocessing stage for square root already computes the integer digits.
For example, for R=4, K=2, a single-precision (F) divider must generate (NF+2)+LOG(R)=28 bits. Because rK=4, the divider iterates for 7 cycles, with each cycle generating rK=4 bits. However, square root pre-generates LOG(R)=2 integer bits, so only 26 bits are left to be calculated. The divider still runs for 7 cycles and calculates 28 bits, although only 26 are necessary. The last 2 bits, which are generated by the second divider stage on the last cycle, corrupt the LSBs of the square root result.
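The bit counting in this example can be checked with a short sketch. It is not part of the divider source: the function name `sqrt_excess_bits` and its parameters are hypothetical, and the required width of 28 bits is taken directly from the description above rather than derived.

```python
from math import ceil

def sqrt_excess_bits(required_bits, log2_r, k, pregenerated_bits):
    """Hypothetical helper: count the bits a sqrt run generates beyond what it needs.

    required_bits     -- total result bits the divider must produce (28 in the example)
    log2_r            -- bits per digit, LOG(R) (2 for R=4)
    k                 -- divider stages per cycle (K=2)
    pregenerated_bits -- integer bits produced in sqrt preprocessing (LOG(R)=2)
    """
    bits_per_cycle = log2_r * k                       # rK bits per cycle
    cycles = ceil(required_bits / bits_per_cycle)     # cycle count sized for the full width
    generated = cycles * bits_per_cycle               # bits actually produced by iteration
    remaining = required_bits - pregenerated_bits     # bits sqrt still needs after preprocessing
    return cycles, generated, remaining, generated - remaining

# Values from the R=4, K=2 single-precision example.
cycles, generated, remaining, excess = sqrt_excess_bits(28, 2, 2, 2)
print(cycles, generated, remaining, excess)  # 7 28 26 2
```

The excess of 2 bits matches one digit from the second stage on the final cycle, which is where the example says the LSBs go wrong.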
By modifying the digit selection logic to always produce a 1 on the first step, we can fix this bug, which in turn removes the need for sqrt-specific muxing for X.
The modified divider passes the div/sqrt softfloat vectors under all R/K/FDQH permutations.