Changeset 231 for docs

Dec 10, 2008, 9:19:54 PM (11 years ago)

Modified section on horizontal operations

1 edited


  • docs/ASPLOS09/asplos094-cameron.tex

    r230 r231  
    936936left instruction.  Right justification by shifting an $n$ bit field
    937937$i$ positions to the right is equivalent to a left rotate of $n-i$
    938 positions.  These rotation amounts are computed by the operation \newline
     938positions.  These rotation amounts are computed by the operation
    939939\verb#rj=simd<8>::sub<x,l>(simd<8>::const(8), cts_4)# as shown in row 5,
    940940except that don't care fields (which won't be subsequently used)
    943943The left shift amounts are calculated by \verb#lj=simd<8>::srli<4>(cts_4)#
    944944as shown in row 6, and are combined with the right shift amounts
    945 by the selection operation \newline \verb#rot_8=simd_if(simd<16>::const(0xFF00), rj, lj)#
     945by the selection operation \verb#rot_8=simd_if(simd<16>::const(0xFF00), rj, lj)#
    946946as shown in row 7.  Using these computed values, the inductive step
    947 is completed by application of the operation \newline \verb#rslt_16=simd<8>::rotl(rslt_8, rot_8)#
     947is completed by application of the operation \verb#rslt_16=simd<8>::rotl(rslt_8, rot_8)#
    948948as shown in row 8.
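
The rotation identity used in this hunk (right-justifying by $i$ positions equals a left rotate of $n-i$ positions) can be checked with a small model. This is a sketch only: `rotl` is a hypothetical helper on plain Python integers, not an IDISA operation, and the identity is assumed to apply when the $i$ low-order bits being shifted out are zero, so that nothing wraps around into the high end of the field.

```python
def rotl(x, k, n):
    """Rotate the n-bit value x left by k positions (k taken modulo n)."""
    mask = (1 << n) - 1
    k %= n
    return ((x << k) | (x >> (n - k))) & mask


def right_justify_matches_rotate(n):
    """Check x >> i == rotl(x, n - i) for every n-bit x whose low i bits are 0."""
    for i in range(n):
        low_bits = (1 << i) - 1
        for x in range(1 << n):
            if x & low_bits == 0 and rotl(x, n - i, n) != x >> i:
                return False
    return True


all_match_8 = right_justify_matches_rotate(8)
```

Under that assumption a single rotate instruction can stand in for both the left and right shifts, which is why the text computes one per-field rotation amount `rot_8`.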
    970970{\em vertical} operations which combine corresponding
    971971fields of different registers.  Horizontal operations
    972 can be found that combine two (e.g., haddpd on SSE3),
    973 four (e.g, \verb:si_orx: on SPU), eight (e.g, psadbw on SSE)
    974 or sixteen values (e.g., vcmpequb on Altivec).  Some
     972can be found that combine two (e.g., \verb:haddpd: on SSE3),
     973four (e.g., \verb:si_orx: on SPU), eight (e.g., \verb:psadbw: on SSE)
     974or sixteen values (e.g., \verb:vcmpequb: on Altivec).  Some
    975975horizontal operations have a vertical component as well.
    976 For example, psadbw first forms the absolute value of
     976For example, \verb:psadbw: first forms the absolute value of
    977977the difference of eight corresponding byte fields before
    978978performing horizontal add of the eight values, while
    979 vsum4ubs on Altivec performs horizontal add of sets of
     979\verb:vsum4ubs: on Altivec performs horizontal add of sets of
    980980four unsigned 8-bit fields within one register
    981981and then combines the result horizontally with
    994994operations in general.
    996 By making use of \verb:<l,h>: half-operand modifier
    997 combinations, the inductive doubling architecture
    998 offers systematic support for horizontal operations
    999 on pairs of adjacent fields.
     996In contrast to this {\em ad hoc} support on commodity
     997processors, IDISA offers a completely systematic treatment
     998of horizontal operations without any special features beyond
     999the inductive doubling features already described.
     1000In the simplest case, any vertical operation
     1001\verb#simd<n>::F# on $n$-bit fields gives rise to
     1002an immediate horizontal operation
     1003\verb#simd<n>::F<l,h>(r, r)# for combining adjacent
     1004pairs of $n/2$ bit fields.
    10001005For example, \verb#simd<16>::add<l,h># adds values
    10011006in adjacent 8 bit fields to produce 16 bit results,
    10021007while \verb#simd<32>::min<l,h># can produce the
    1003 minimum value of adjacent 16-bit fields.  In general,
    1004 \newline \verb#simd<n>::F<l,h># denotes the horizontal
    1005 binary combination of adjacent fields for any
    1006 operator $F$ and field width $n$.
    1008 Horizontal combinations of larger numbers of fields
    1009 makes use of the inductive doubling property.
    1010 For example, consider the or-across operation \verb:si_orx:
     1008minimum value of adjacent 16-bit fields.
     1009Thus any binary horizontal operation can be implemented
     1010in a single IDISA instruction making use of the \verb:<h, l>:
     1011operand modifier combination.
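
As an illustration of the pairwise case, the following sketch models \verb#simd<n>::F<l,h>(r, r)# on a register held as a Python integer. The helper name `half_operand_pairs`, the `width` parameter, and the reading of the modifiers (`l` selects the low $n/2$ bits of each field, `h` selects the high $n/2$ bits shifted down) are assumptions made for this example, not definitions taken from the changeset.

```python
def half_operand_pairs(r, n, width, op):
    """Model simd<n>::F<l,h>(r, r) on a width-bit register stored as an int.

    Assumed semantics: within each n-bit field, op is applied to the low
    half and the high half, and the result is truncated to n bits.
    """
    half = n // 2
    half_mask = (1 << half) - 1
    field_mask = (1 << n) - 1
    result = 0
    for shift in range(0, width, n):
        field = (r >> shift) & field_mask
        low = field & half_mask      # <l>: low n/2 bits of the field
        high = field >> half         # <h>: high n/2 bits, shifted down
        result |= (op(low, high) & field_mask) << shift
    return result


# simd<16>::add<l,h>: adjacent 8-bit values summed into 16-bit results.
add_16 = half_operand_pairs(0x01020304, 16, 32, lambda a, b: a + b)

# simd<32>::min<l,h>: minimum of the two adjacent 16-bit values.
min_32 = half_operand_pairs(0x01020304, 32, 32, min)
```

Here `add_16` is `0x00030007` (1+2 and 3+4 in the two 16-bit fields) and `min_32` is `0x00000102`.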
     1013Horizontal combinations of four adjacent fields can also be
     1014realized in a general way through two steps of inductive
     1015doubling.  For example, consider the or-across operation \verb:si_orx:
    10111016of the SPU, that performs a logical or operation
    10121017on four 32-bit fields.  This four field combination
    1013 involves two steps in the inductive doubling approach.
     1018can easily be implemented with the following two operations.
    1020 This example is also interesting in showing a potential
    1021 value for supporting bitwise logical operations at
    1022 different field widths, i.e., specifically for use with
    1023 half-operand modifiers.
    1025 Similarly, to combine any eight fields simply requires
    1026 three inductive doubling steps using the desired
    1027 operator at successive power-of-two field widths, while
    1028 combining sixteen fields requires four such operations.
    1029 In this way, the inductive doubling architecture provides
    1030 systematic support for horizontal operations well beyond
    1031 the existing facilities of commodity architectures,
    1032 although lacking some of the special features found in
    1033 some cases.
     1026In general, systematic support for horizontal
     1027combinations of sets of $2^h$ adjacent fields may
     1028be realized through $h$ inductive doubling steps
     1029in a similar fashion.
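
The general $2^h$-field case can be sketched as a loop that doubles the field width at each step. All names here (`half_operand_pairs`, `horizontal_reduce`, the `width` parameter) are hypothetical, the modifier semantics are assumed as low half combined with high half, and the model does not attempt to reproduce where a hardware instruction such as \verb:si_orx: places its result.

```python
import operator


def half_operand_pairs(r, n, width, op):
    """Assumed model of simd<n>::F<l,h>(r, r) on a width-bit integer."""
    half = n // 2
    half_mask = (1 << half) - 1
    field_mask = (1 << n) - 1
    result = 0
    for shift in range(0, width, n):
        field = (r >> shift) & field_mask
        combined = op(field & half_mask, field >> half) & field_mask
        result |= combined << shift
    return result


def horizontal_reduce(r, field_bits, h, op, width=128):
    """Combine each group of 2**h adjacent fields using h doubling steps."""
    n = field_bits
    for _ in range(h):
        n *= 2                       # operate at the next power-of-two width
        r = half_operand_pairs(r, n, width, op)
    return r


# Or-across of four 32-bit fields: two steps, at widths 64 and 128.
fields = (0x4000 << 96) | (0x300 << 64) | (0x20 << 32) | 0x1
or_across = horizontal_reduce(fields, 32, 2, operator.or_)

# Sum of eight 8-bit fields: three steps, at widths 16, 32 and 64.
byte_sum = horizontal_reduce(0x0102030405060708, 8, 3, operator.add, width=64)
```

In this model `or_across` is `0x4321` and `byte_sum` is 36, each left in a single field as wide as the whole group.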
     1030Thus, IDISA essentially offers systematic support
     1031for horizontal operations entirely through the
     1032use of \verb:<h, l>: half-operand modifier
     1035Systematic support for general horizontal operations
     1036under IDISA also creates opportunity for a design tradeoff:
     1037offsetting the circuit complexity of half-operand
     1038modifiers with potential elimination of dedicated
     1039logic for some {\em ad hoc} horizontal SIMD operations.
     1040Even if legacy support for these operations is required,
     1041it may be possible to provide that support through
     1042software or firmware rather than a full hardware
     1043implementation.  Evaluation of these possibilities
     1044in the context of particular architectures is a potential
     1045area for further work.