Dec 8, 2008, 12:50:08 PM (11 years ago)

IDISA implementation: libraries; model.

1 edited


  • docs/ASPLOS09/asplos094-cameron.tex

    r227 r228  
    1025 We have constructed libraries that provide
    1026 simulated implementation of the inductive doubling architecture
    1027 on each of the MMX, SSE, Altivec, and SPU platforms and have
    1028 used these libraries in the implementation of each of the
    1029 parallel bit stream algorithms discussed herein.
    1030 This implementation work has been successful in validating
    1031 the basic concepts underlying the inductive doubling instruction
    1032 set architecture.
    1034 Implementation of the architecture on chip is beyond the
    1035 scope of our present resources and capabilities.  However,
    1036 the principal requirements are implementation of the various
    1037 operations at all power-of-2 field widths and implementation
    1038 of half-operand modifiers.  Implementation of SIMD operations
    1039 at additional field widths involves design trade-offs
    1040 with respect to transistor counts, available opcode space,
    1041 and the potential value of the new operations to SIMD
    1042 programmers.  From the perspective of parallel bit
    1043 stream programming, the primary need is for SIMD integer,
    1044 shift, pack and merge operations at field widths of 2, 4
    1045 and 8, as well as the field width of 1, where it makes
    1046 sense (e.g. with merge operations).  In support of the
    1047 general concept of inductive doubling architecture,
    1048 SIMD operations at large field widths (64, 128) are also
    1049 called for, but these operations cannot be justified on
    1050 the basis of parallel bit stream programming.
    1052 Implementation of half-operand modifiers can logically
    1053 be carried out with additional circuitry attached to the
    1054 register fetch units of a pipelined processor.  This
    1055 circuitry would require control signals from the
    1056 instruction decode unit to identify the field widths
    1057 of operands and the particular half-operand modifier to be applied,
    1058 if any.  The additional logic required for instruction
    1059 decode and that required for operand modification
    1060 as part of the operand fetch process is expected to be
    1061 reasonably modest.
    1063 Full assessment of implementation issues is an important
    1064 area for future work.
     1025We have carried out implementation work for IDISA in three
     1026ways.  First, we have constructed libraries that
     1027implement the IDISA instructions by template and/or macro
     1028expansion for each of MMX, SSE, Altivec, and SPU platforms.
     1029Second, we have developed a model implementation
     1030involving a modified operand fetch component
     1031of a pipelined SIMD processor.  Third, we have written
     1032and evaluated a Verilog HDL description of this model
     1035\subsection{IDISA Libraries}
     1037Implementation of IDISA instructions using template
     1038and macro libraries has been useful in developing
     1039and assessing the correctness of many of the algorithms
     1040presented here.  Although these implementations do not
     1041deliver the performance benefits associated with
     1042direct hardware implementation of IDISA, they
     1043have been quite useful in providing a practical means
     1044for portable implementation of parallel bit stream
     1045algorithms on multiple SWAR architectures.  However,
     1046one additional facility has also proven necessary for
     1047portability of parallel bit stream algorithms across
     1048big-endian and little-endian architectures: the
     1049notion of shift-forward and shift-back operations.
     1050In essence, shift forward means shift to the left
     1051on little-endian systems and shift to the right on
     1052big-endian systems, while shift back has the reverse
     1053interpretation.  Although this concept is unrelated to
     1054inductive doubling, its inclusion with the IDISA
     1055libraries has provided a suitable basis for portable
     1056SIMD implementations of parallel bit stream algorithms.
     1057Beyond this, the IDISA libraries have the additional
     1058benefit of allowing the implementation
     1059of inductive doubling algorithms at a higher level
     1060of abstraction, without need for programmer coding of
     1061the underlying shift and mask operations.
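As a rough illustration of the library approach described above, the following C++ sketch simulates a field-wise addition at an arbitrary power-of-2 field width using only shift and mask operations, together with shift-forward and shift-back helpers. It is a minimal sketch under stated assumptions, not the actual IDISA library code: the names `high_bit_mask`, `simd_add`, `ByteOrder`, `shift_forward`, and `shift_back` are hypothetical, and a 64-bit word stands in for a full SIMD register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; a 64-bit word stands in for a SIMD register.
// Requires C++14 or later (loops inside constexpr functions).

// Mask with the most significant bit of every n-bit field set.
template <unsigned n>
constexpr std::uint64_t high_bit_mask() {
    static_assert(n >= 1 && n <= 64 && (n & (n - 1)) == 0,
                  "field width must be a power of 2 between 1 and 64");
    std::uint64_t m = std::uint64_t{1} << (n - 1);
    for (unsigned shift = n; shift < 64; shift *= 2) m |= m << shift;
    return m;
}

// Field-wise addition modulo 2^n: add the low n-1 bits of each field,
// then fold in the top bits with XOR so no carry crosses a field boundary.
template <unsigned n>
constexpr std::uint64_t simd_add(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t H = high_bit_mask<n>();
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

// Byte order is passed explicitly here; a real library would detect it.
enum class ByteOrder { Little, Big };

// Shift forward: left on little-endian, right on big-endian (count < 64).
template <ByteOrder order>
constexpr std::uint64_t shift_forward(std::uint64_t r, unsigned count) {
    return order == ByteOrder::Little ? (r << count) : (r >> count);
}

// Shift back: the reverse interpretation of shift forward (count < 64).
template <ByteOrder order>
constexpr std::uint64_t shift_back(std::uint64_t r, unsigned count) {
    return order == ByteOrder::Little ? (r >> count) : (r << count);
}

// Small runtime check of the sketch.
inline void idisa_library_sketch_check() {
    assert(simd_add<4>(0x0F, 0x01) == 0x00);  // 4-bit field wraps, no carry out
    assert(simd_add<8>(0x0F, 0x01) == 0x10);  // same bits, 8-bit fields
    assert(shift_forward<ByteOrder::Little>(1, 3) == 8);
    assert(shift_forward<ByteOrder::Big>(8, 3) == 1);
}
```

Algorithm code written against helpers of this kind names only the field width and the operation; the masks and shifts stay inside the library.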
     1063\subsection{IDISA Model}
     1064Figure \ref{pipeline-model} shows a model architecture
     1065for a pipelined SIMD processor implementing IDISA.
     1066The SIMD Register File (SRF) provides a file of $R = 2^A$
     1067registers, each of width $N = 2^K$ bits.
     1068IDISA instructions identified by the Instruction Fetch
     1069Unit (IFU) are forwarded for decoding to the SIMD
     1070Instruction Decode Unit (SIDU).  This unit decodes
     1071the instruction to produce
     1072signals identifying the source and destination
     1073operand registers, the half-operand modifiers, the
     1074field width specification and the SIMD operation
     1075to be applied.
     1077The SIDU supplies the source register information and the half-operand
     1078modifier information to the SIMD Operand Fetch Unit (SOFU).
     1079For each source operand, the SIDU provides an $A$-bit register
     1080address and two 1-bit signals $h$ and $l$ indicating the value
     1081of the decoded half-operand modifiers for this operand.
     1082At most one of these values may be 1; both are 0 if
     1083no modifier is specified.
     1084In addition, the SIDU supplies decoded field width information
     1085to both the SOFU and to the SIMD Instruction Execute Unit (SIEU).
     1086The SIDU also supplies decoded SIMD opcode information to the SIEU and
     1087a decoded $A$-bit register address for the destination register to
     1088the SIMD Result Write Back Unit (SRWBU).
     1090The SOFU is the key component of the IDISA model that
     1091differs from that found in a traditional SWAR
     1092processor.  For each of the two $A$-bit source
     1093register addresses, the SOFU is first responsible for
     1094fetching the raw operand values from the SRF.
     1095Then, before supplying operand values to the
     1096SIEU, the SOFU applies the half-operand modification
     1097logic as specified by the $h$, $l$, and field-width
     1098signals.  The possibly modified operand values are then
     1099provided to the SIEU for carrying out the SIMD operations.
     1100A detailed model of SOFU logic is described in the following section.
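The operand modification step performed by the SOFU can be sketched in software as follows. This is a behavioral sketch only, under an assumption not stated in this section: that the $h$ modifier selects the high $n/2$ bits of each $n$-bit field (shifted down to the low half) and the $l$ modifier selects the low $n/2$ bits. The function names `low_half_mask` and `sofu_modify` are hypothetical, and a 64-bit word stands in for an $N$-bit register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; a 64-bit word stands in for an N-bit SIMD register.
// Requires C++14 or later (loops inside constexpr functions).

// Mask selecting the low n/2 bits of every n-bit field (n = 2, 4, ..., 64).
constexpr std::uint64_t low_half_mask(unsigned n) {
    std::uint64_t m = (std::uint64_t{1} << (n / 2)) - 1;
    for (unsigned shift = n; shift < 64; shift *= 2) m |= m << shift;
    return m;
}

// Apply the half-operand modification selected by the h and l signals
// to a raw operand value fetched from the register file.
// ASSUMPTION: h = high half of each field moved to the low half,
//             l = low half of each field kept, high half cleared.
inline std::uint64_t sofu_modify(std::uint64_t raw, unsigned n, bool h, bool l) {
    assert(!(h && l));             // at most one modifier may be active
    if (!h && !l) return raw;      // no modifier: operand passes through
    assert(n >= 2 && n <= 64 && (n & (n - 1)) == 0);
    const std::uint64_t mask = low_half_mask(n);
    if (h) return (raw >> (n / 2)) & mask;
    return raw & mask;
}
```

For example, with 8-bit fields the raw byte `0xA7` becomes `0x0A` under $h$ and `0x07` under $l$; with 4-bit fields the same byte becomes `0x21` under $h$, since each 4-bit field contributes its own high two bits.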
     1103The SIEU differs from similar execution units in
     1104current commodity processors primarily by providing
     1105SIMD operations at each field width
     1106$n=2^k$ for $0 \leq k \leq K$.  This involves
     1107additional circuitry for field widths not supported
     1108in existing processors.  For inductive doubling
     1109algorithms in support of parallel bit streams,
     1110the principal need is for additional circuitry to
     1111support 2-bit and 4-bit field widths.  This circuitry
     1112is generally less complicated than that for larger
     1113fields.  Support for these field widths
     1114has other applications as well.  For example,
     1115DNA sequences are frequently represented using
     1116packed sequences of 2-bit codes for the four possible
     1117nucleotides\cite{}, while the need for accurate financial
     1118calculation has seen a resurgence of the 4-bit
     1119packed BCD format for decimal floating point \cite{}.
     1121When execution of the SWAR instruction is
     1122completed, the result value is then provided
     1123to the SRWBU to update the value stored in the
     1124SRF at the address specified by the $A$-bit
     1125destination operand.
     1127\subsection{Operand Fetch Unit Logic}