Changeset 244

Dec 23, 2008, 8:01:15 AM

Operand fetch unit logic

2 edited


  • docs/ASPLOS09/asplos094-cameron.tex

    r243 r244  
Only one of these values may be 1; both are 0 if
no modifier is specified.
The SIDU also supplies decoded field width signals $w_k$,
one for each field width $2^k$, to both the SOFU and the
SIMD Instruction Execute Unit (SIEU).  Only one of the
field width signals has the value 1.
In addition, the SIDU supplies decoded SIMD opcode information to the SIEU and
a decoded $A$-bit register address for the destination register to

$n=2^k$ for $0 \leq k \leq K$.  This involves
additional circuitry for field widths not supported
in existing processors.  In our evaluation model,
IDISA-A adds support for 2-bit, 4-bit and 128-bit
field widths in comparison with the RefA architecture,
while IDISA-B similarly extends RefB.
For inductive doubling algorithms in support of parallel bit
streams, the principal need is for additional circuitry to
support 2-bit and 4-bit field widths.  This circuitry
is generally less complicated than that for larger
fields.  Circuitry supporting these widths has other
applications as well.  For example, DNA sequences are
frequently represented using packed sequences of 2-bit codes
for the four possible nucleotides~\cite{}, while the need for
accurate financial calculation has seen a resurgence of the
4-bit packed BCD format for decimal floating point~\cite{}.

When execution of the SWAR instruction is

\subsection{Operand Fetch Unit Logic}
The SOFU is responsible for implementing the half-operand
modification logic for each of up to two input operands fetched
from the SRF.  For each operand, this logic uses
the decoded half-operand modifier signals $h$ and $l$,
the decoded field width signals $w_k$ and the 128-bit operand
value $r$ fetched from the SRF to produce a modified 128-bit operand
value $s$ following the requirements of equations (4), (5) and
(6) above.  Those equations must be applied for each possible
modifier and each field width to determine the possible values $s[i]$
for each bit position $i$.  For example, consider bit
position 41, whose 7-bit binary address is $0101001$.
Reading the address bits from left to right as corresponding to
field widths 128, 64, 32, 16, 8, 4 and 2, each 1 bit identifies a
field width for which this bit lies in the low $n/2$ bits of its field
(widths 2, 16 and 64), while each 0 bit identifies a field
width for which this bit lies in the high $n/2$ bits.
In response to the high half-operand modifier signal $h$,
this bit may receive the value of the corresponding bit in the high half
of its field of width 2, 16 or 64, namely $r[40]$,
$r[33]$ or $r[9]$, respectively.  The bit instead retains the value $r[41]$
when no half-operand modifier is specified, or when the low half-operand
modifier is specified together with a field width signal $w_1$, $w_4$ or $w_6$
(field widths 2, 16 or 64); in all remaining cases it is 0.
The overall logic for determining this bit value is thus given as follows.
\begin{eqnarray*}
s[41] & = & h \wedge (w_1 \wedge r[40] \vee w_4 \wedge r[33] \vee w_6 \wedge r[9]) \\
& & \vee \neg h \wedge (\neg l \vee w_1 \vee w_4 \vee w_6) \wedge r[41]
\end{eqnarray*}
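The bit-level logic can be checked against a field-level model. The Python sketch below is illustrative only and rests on assumptions inferred from the bit-41 example rather than from equations (4), (5) and (6): bit 0 is the most significant bit of the register, the high modifier $h$ moves the high half of each field into its low half and clears the high half, and the low modifier $l$ clears the high half. Width signals are indexed as $w_k$ for field width $2^k$, following the SIDU description. All function names are hypothetical.

```python
# Sketch: half-operand modification in the SOFU for one 128-bit operand.
# ASSUMPTIONS (inferred from the bit-41 example, not from equations (4)-(6)):
#   * bit 0 is the most significant bit, so bit i lies in the low half of
#     its n-bit field exactly when (i mod n) >= n/2;
#   * modifier h moves the high half of each field into its low half and
#     clears the high half; modifier l clears the high half.
# Width signals are indexed w_k for field width n = 2^k, k = 1..7.
# All function names here are hypothetical.
import random

REG = 128
KS = range(1, 8)  # exponents k of the field widths 2, 4, ..., 128


def in_low_half(i: int, n: int) -> bool:
    """True if bit i lies in the low n/2 bits of its n-bit field."""
    return i % n >= n // 2


def reference(r, n, h, l):
    """Field-level semantics of the (assumed) half-operand modifiers."""
    s = [0] * REG
    for i in range(REG):
        if h:
            s[i] = r[i - n // 2] if in_low_half(i, n) else 0
        elif l:
            s[i] = r[i] if in_low_half(i, n) else 0
        else:
            s[i] = r[i]
    return s


def s41(r, w, h, l):
    """Direct transcription of the s[41] equation, with w_1, w_4, w_6."""
    return int((h and (w[1] and r[40] or w[4] and r[33] or w[6] and r[9]))
               or ((not h) and ((not l) or w[1] or w[4] or w[6]) and r[41]))


def gate_form(r, w, h, l, i):
    """The same and/or structure generalized to any bit position i."""
    low = [k for k in KS if in_low_half(i, 2 ** k)]
    high_term = any(w[k] and r[i - 2 ** (k - 1)] for k in low)
    keep = (not l) or any(w[k] for k in low)
    return int((h and high_term) or ((not h) and keep and r[i]))


def check_all(r) -> bool:
    """Compare both bit-level forms with the reference for every width."""
    for k in KS:
        w = {j: int(j == k) for j in KS}  # only one width signal is 1
        for h, l in [(0, 0), (1, 0), (0, 1)]:
            ref = reference(r, 2 ** k, h, l)
            if s41(r, w, h, l) != ref[41]:
                return False
            if any(gate_form(r, w, h, l, i) != ref[i] for i in range(REG)):
                return False
    return True


if __name__ == "__main__":
    print(format(41, "07b"))  # 0101001
    print([2 ** k for k in KS if in_low_half(41, 2 ** k)])  # [2, 16, 64]
    rng = random.Random(0)
    print(check_all([rng.randint(0, 1) for _ in range(REG)]))  # True
```

Running the script checks, for every field width and modifier, that both the transcribed $s[41]$ expression and its generalization to all 128 bit positions agree with the field-level model under the stated assumptions.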
Similar logic is determined for each of the 128 bit positions.
For each of the 7 field widths, 64 of the 128 bit positions lie in the
low $n/2$ bits of their fields, resulting in $7 \times 64 = 448$ 2-input
and gates for the $w_k \wedge r[i-2^{k-1}]$ terms.
The number of such terms at bit position $i$ equals the number of 1 bits
in its address: position 0 has none, 7 positions have exactly one, and
the remaining 120 positions have two or more.  These 120 positions carry
$448 - 7 = 441$ terms, so combining them requires $441 - 120 = 321$
2-input or gates.  Another 127 2-input and gates combine these values
with the $h$ signal.
In the case of the low half-operand modifier, the or gates combining
$\neg l$ with the $w_k$ signals can share circuitry.  For each bit position
$i=2^m+j$ with $0 \leq j < 2^m$, one additional or gate is required beyond
that for position $j$.  Thus 127 2-input or gates are required.  Another
256 2-input and gates, two per bit position, are required for combination
with the $\neg h$ and $r[i]$ terms.  The terms for the high and low
half-operand modifiers are then combined with an additional 127 2-input
or gates.  Thus, the circuitry complexity for the combinational logic
implementation of half-operand modifiers within the SOFU is
$448 + 321 + 127 + 127 + 256 + 127 = 1406$ 2-input gates per operand,
or 2812 gates in total.
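The itemized gate counts can be re-derived mechanically. The Python sketch below is an illustration, not part of the SOFU design. It counts the terms at each bit position as the number of 1 bits in that position's 7-bit address, then applies the gate structure described above. Like the text, it does not count inverters for $\neg h$ and $\neg l$. The dictionary keys are hypothetical labels.

```python
# Tally of 2-input gates in the SOFU half-operand logic for one operand,
# following the gate structure described in the text. Inverters for the
# negated h and l signals are not counted, matching the text.

REG_BITS = 128


def low_widths(i: int) -> int:
    """Number of field widths 2..128 for which bit i lies in the low half:
    the number of 1 bits in the 7-bit address of i."""
    return bin(i).count("1")


def gate_counts() -> dict:
    terms = [low_widths(i) for i in range(REG_BITS)]
    nonempty = sum(1 for c in terms if c >= 1)  # positions 1..127
    return {
        # one and gate per (width, low-half bit) pair: w_k and r[i - n/2]
        "and_w_r": sum(terms),
        # c terms at one bit position need c - 1 two-input or gates
        "or_terms": sum(c - 1 for c in terms if c >= 1),
        # one and gate with h per position that has at least one term
        "and_h": nonempty,
        # shared chain for (not l) or w_k: position 2^m + j adds one or
        # gate to the chain of position j, for every position except 0
        "or_low": REG_BITS - 1,
        # two and gates per position: not h, the or chain, and r[i]
        "and_not_h": 2 * REG_BITS,
        # final or of the h and not-h terms (position 0 has no h term)
        "or_final": nonempty,
    }


if __name__ == "__main__":
    counts = gate_counts()
    for name, n in counts.items():
        print(f"{name:10s} {n}")
    per_operand = sum(counts.values())
    print("per operand:", per_operand)
    print("two operands:", 2 * per_operand)
```

Summing the six entries gives $448 + 321 + 127 + 127 + 256 + 127 = 1406$ two-input gates per operand. Changing the bookkeeping, for example by folding $h$ into the width signals before the and gates, would change these counts. The tally reflects only the structure described in the text.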