Changeset 244 for docs/ASPLOS09/asplos094cameron.tex
Timestamp: Dec 23, 2008, 8:01:15 AM
File: 1 edited

docs/ASPLOS09/asplos094cameron.tex
r243 r244
1424 1424   Only one of these values may be 1; both are 0 if
1425 1425   no modifier is specified.
1426      - In addition, the SIDU supplies decoded field width information
1427      - to both the SOFU and to the SIMD Instruction Execute Unit (SIEU).
     1426 + The SIDU also supplies decoded field width signals $w_k$
     1427 + for each field width $2^k$ to both the SOFU and to the
     1428 + SIMD Instruction Execute Unit (SIEU). Only one of the
     1429 + field width signals has the value 1.
1428 1430   The SIDU also supplies decoded SIMD opcode information to SIEU and
1429 1431   a decoded $A$-bit register address for the destination register to
   …    …
1448 1450   $n=2^k$ for $0 \leq k \leq K$. This involves
1449 1451   additional circuitry for field widths not supported
1450      - in existing processors. For inductive doubling
1451      - algorithms in support of parallel bit streams,
1452      - the principal need is for additional circuitry to
1453      - support 2-bit and 4-bit field widths. This circuity
1454      - is generally less complicated than that for larger
1455      - fields. Support for circuitry at these width
1456      - has other applications as well. For example,
1457      - DNA sequences are frequently represented using
1458      - packed sequences of 2-bit codes for the four possible
1459      - nucleotides\cite{}, while the need for accurate financial
1460      - calculation has seen a resurgence of the 4-bit
1461      - packed BCD format for decimal floating point \cite{}.
     1452 + in existing processors. In our evaluation model,
     1453 + IDISAA adds support for 2-bit, 4-bit and 128-bit
     1454 + field widths in comparison with the RefA architecture,
     1455 + while IDISAB similarly extends RefB.
1462 1456
1463 1457   When execution of the SWAR instruction is
   …    …
1469 1463   \subsection{Operand Fetch Unit Logic}
1470 1464
1471      - Discussion of gate-level implementation.
     1465 + The SOFU is responsible for implementing the half-operand
     1466 + modification logic for each of up to two input operands fetched
     1467 + from SRF. For each operand, this logic is implemented
     1468 + using the decoded half-operand modifiers signals $h$ and $l$,
     1469 + the decoded field width signals $w_k$ and the 128-bit operand
     1470 + value $r$ fetched from SRF to produce a modified 128-bit operand
     1471 + value $s$ following the requirements of equations (4), (5) and
     1472 + (6) above. Those equations must be applied for each possible
     1473 + modifier and each field width to determine the possible values $s[i]$
     1474 + for each bit position $i$. For example, consider bit
     1475 + position 41, whose binary 7-bit address is $0101001$.
     1476 + Considering the address bits left to right, each 1 bit
     1477 + corresponds to a field width for which this bit lies in the
     1478 + lower $n/2$ bits (widths 2, 16, 64), while each 0 bit corresponds to a field
     1479 + width for which this bit lies in the high $n/2$ bits.
     1480 + In response to the half-operand modifier signal $h$,
     1481 + this bit may receive a value from the corresponding field
     1482 + of width 2, 16 or 64 whose address bit is 0, namely $r[40]$,
     1483 + $r[33]$ or $r[9]$. Otherwise, this bit receives the value $r[41]$,
     1484 + in the case of no half-operand modifier, or a low half-operand modifier
     1485 + in conjunction with a field width signal $w_2$, $w_{16}$ or $w_{64}$.
     1486 + The overall logic for determining this bit value is thus given as follows.
     1487 + \begin{eqnarray*}
     1488 + s[41] & = & h \wedge (w_2 \wedge r[40] \vee w_{16} \wedge r[33] \vee w_{64} \wedge r[9]) \\
     1489 + & & \vee \neg h \wedge (\neg l \vee w_2 \vee w_{16} \vee w_{64}) \wedge r[41]
     1490 + \end{eqnarray*}
     1491 +
     1492 + Similar logic is determined for each of the 128 bit positions.
     1493 + For each of the 7 field widths, 64 bits are in the low $n/2$ bits,
     1494 + resulting in 448 2-input and gates for the $w_k \wedge r[i]$ terms.
     1495 + For 120 of the bit positions, or gates are needed to combine these
     1496 + terms; $441 - 120 = 321$ 2-input or gates are required. Another
     1497 + 127 2-input and gates combine these values with the $h$ signal.
     1498 + In the case of a low-half-operand modifier, the or-gates combining $w_k$
     1499 + signals can share circuitry. For each bit position $i=2^k+j$ one
     1500 + additional or gate is required beyond that for position $j$.
     1501 + Thus 127 2-input or gates are required. Another 256 2-input and gates
     1502 + are required for combination with the $\not h$ and $r[i]$ terms. The terms for
     1503 + the low and high half-operand modifiers are then combined with an
     1504 + additional 127 2-input or gates. Thus, the circuity complexity
     1505 + for the combinational logic implementation of half-operand
     1506 + modifiers within the SOFU is 1279 2-input gates per operand,
     1507 + or 2558 gates in total.
1472 1508
1473 1509
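The per-bit selection logic and the first gate counts in the added text can be checked with a short script. The following is a minimal Python sketch, not part of the changeset. It generalizes the single worked example for bit 41, since equations (4) to (6) are not shown in this diff: a 1 in the address bit of weight $2^j$ is assumed to pair with field width $2^{j+1}$ and source bit $i - 2^j$. The names `high_sources` and `s_bit`, and the dictionary `w` keyed by field width, are hypothetical and chosen only for illustration.

```python
WIDTH = 128      # operand width in bits
ADDR_BITS = 7    # bits in a bit-position address (2**7 == 128)

def high_sources(i):
    """(field width, source index) pairs that can feed s[i] under modifier h.

    Assumption generalized from the bit-41 example: each 1 in the address
    bit of weight 2**j pairs with field width 2**(j+1) and source i - 2**j.
    """
    return [(2 << j, i - (1 << j)) for j in range(ADDR_BITS) if (i >> j) & 1]

def s_bit(r, i, h, l, w):
    """Evaluate s[i] in the same shape as the s[41] equation.

    r: list of 128 bits; h, l: modifier signals; w: one-hot dict keyed by
    field width (hypothetical encoding of the w_k signals).
    """
    srcs = high_sources(i)
    high_term = any(w[n] and r[src] for n, src in srcs)
    low_enable = (not l) or any(w[n] for n, _ in srcs)
    return int((h and high_term) or ((not h) and low_enable and r[i]))

# Gate tallies for the w_k AND r[i] terms and the OR gates that merge them.
terms = sum(len(high_sources(i)) for i in range(WIDTH))
multi = [i for i in range(WIDTH) if len(high_sources(i)) >= 2]
or_gates = sum(len(high_sources(i)) - 1 for i in multi)

print(high_sources(41))              # [(2, 40), (16, 33), (64, 9)]
print(terms, len(multi), or_gates)   # 448 120 321
```

Under these assumptions the script reproduces the sources $r[40]$, $r[33]$ and $r[9]$ for bit 41, along with the counts of 448 and-terms, 120 positions needing or gates, and 321 or gates. It does not attempt to re-derive the remaining counts or the per-operand total.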