Changes between Version 5 and Version 6 of IDISAproject


Timestamp:
May 18, 2010, 7:06:26 PM
Author:
cameron
Comment:

--

  • IDISAproject

    v5 v6  
    7272Note also that important basic operations such as population count
    7373and parity on ''n'' = 2^''k''^ bit fields can be expressed using
    74 $k$ step inductive doubling algorithms.    For example,
     74''k'' step inductive doubling algorithms.    For example,
    7575population count within 8-bit fields of a vector ''v'' is
    7676computed by simd_hl<8>::add(simd_hl<4>:add(simd_hl<2>:add(''v''))),
     
    100100any given platform.
    101101
    102 Note that there are lots of potential tricks.   
     102Note that there are lots of potential tricks.   Another case occurs with the simd<16>::pack
    103103
    1041041. For example,
     
    108108operation.
    109109
    110 simd_hl<2>::add(a) = simd<16>::add(simd<16>::srli(a, 1) & simd<2>:constant(1), a & simd<2>:constant(1))
     110 simd_hl<2>::add(a) = simd<16>::add(simd<16>::srli(a, 1) & simd<2>::constant(1), a & simd<2>:constant(1))
    111111
    112112But one of the masks can be eliminated by taking advantage
    113113of the properties of 2-bit subtraction.
    114114
    115 simd_hl<2>::add(a) = simd<16>::sub(a, simd<16>::srli(a, 1) & simd<2>:constant(1))
     115 simd_hl<2>::add(a) = simd<16>::sub(a, simd<16>::srli(a, 1) & simd<2>::constant(1))
    116116
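The equivalence of the masked-add form and the subtraction form can be checked with a small scalar model. This is only a sketch under stated assumptions: a 16-bit word stands in for the SIMD register, the constant 0x5555 plays the role of simd<2>::constant(1), and the function names are hypothetical, not part of IDISA.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 0x5555 models simd<2>::constant(1), i.e. the value 1 in
// every 2-bit field of a 16-bit word.
constexpr std::uint16_t kOneInEachPair = 0x5555;

// Masked form: add the high bit and the low bit of every 2-bit field.
inline std::uint16_t add_hl2_masked(std::uint16_t a) {
    return static_cast<std::uint16_t>(((a >> 1) & kOneInEachPair) +
                                      (a & kOneInEachPair));
}

// Subtraction form: a field holding 2h + l becomes (2h + l) - h = h + l.
// The result is never negative within a field, so no borrow crosses a
// field boundary and a wide subtraction is safe.
inline std::uint16_t add_hl2_sub(std::uint16_t a) {
    return static_cast<std::uint16_t>(a - ((a >> 1) & kOneInEachPair));
}

// Exhaustively compare the two forms over all 16-bit inputs.
inline void check_forms_agree() {
    for (std::uint32_t a = 0; a <= 0xFFFF; ++a) {
        const std::uint16_t w = static_cast<std::uint16_t>(a);
        assert(add_hl2_masked(w) == add_hl2_sub(w));
    }
}
```

Calling check_forms_agree() exercises every 16-bit input. The subtraction form saves one AND because the low bit no longer needs to be isolated before combining.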
    117117=== IDISA Test Generator ===
     
    131131such as simd<16>::add(a,b) on most platforms.
    132132
     133Another case is implementing the IDISA nonsaturating
     134pack using the saturating pack found with SSE, for example.
     135In this case, the default definition requires masking:
    133136
     137  template<>
     138  inline SIMD_type simd<16>::pack(SIMD_type r1, SIMD_type r2) {
     139      return _mm_packus_epi16(simd_andc(r2, simd<16>::himask()), simd_andc(r1, simd<16>::himask()));
     140  }
     141
     142But, if we know that the high byte of each of r1 and r2 are
     143zero, then the masks are not required.   This might be the case,
     144for example, if we have a run of UTF-16 code units in the
     145ASCII range.
    134146
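Why the masks become redundant can be seen in a per-lane scalar model of the pack. This is a hedged sketch, not the IDISA implementation: saturate_u8 imitates the unsigned saturation that _mm_packus_epi16 applies to each signed 16-bit lane, and the names pack_lane_masked and pack_lane_unmasked are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of one lane of a saturating pack: the 16-bit lane is read as a
// signed value and clamped to the range 0..255.
inline std::uint8_t saturate_u8(std::uint16_t lane) {
    const int v = (lane & 0x8000) ? static_cast<int>(lane) - 0x10000
                                  : static_cast<int>(lane);
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<std::uint8_t>(v);
}

// Nonsaturating pack of one lane: clear the high byte first, so the
// saturation step can never trigger and the low byte passes through.
inline std::uint8_t pack_lane_masked(std::uint16_t lane) {
    return saturate_u8(static_cast<std::uint16_t>(lane & 0x00FF));
}

// Same pack with the mask omitted; correct only if the high byte is zero.
inline std::uint8_t pack_lane_unmasked(std::uint16_t lane) {
    return saturate_u8(lane);
}

// For code units in the ASCII range the two versions always agree.
inline void check_masks_redundant_for_ascii() {
    for (std::uint16_t c = 0; c < 0x80; ++c) {
        assert(pack_lane_masked(c) == pack_lane_unmasked(c));
    }
}
```

For a lane such as 0x0141 the masked version yields 0x41 while the unmasked version saturates to 0xFF, which is why the mask can only be dropped when the high bytes are known to be zero.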
    135147=== IDISA Reverse Instruction Optimizer. ===