Changes between Version 7 and Version 8 of IDISAproject

May 6, 2011, 11:43:05 AM



  • IDISAproject

    v7 v8  
    56 56 A slight variant of this notation provides a fully general
    57 57 structure for horizontal SIMD operations combining pairs
    58 of adjacent fields.  Each ''n''-bit field ''x'' is viewed
    59 as a pair of adjacent ''n''/2 bit fields ''h''(''x'') for the high
    60 ''n''/2 bits of ''x'' and ''l''(''x'')  for the low ''n''/2 bits of ''x''.
    61 Given binary operation ''f'' on ''n'' bit fields and an ''N'' bit
    62 vector ''a'', ''v''=simd_hl<''n''>::''f''(''a'')  denotes the application of ''f''
    63 in the horizontal combination of all sets of adjacent fields of ''a'',
    64 such that ''v,,i,,''=''f''(''h''(''a,,i,,''), ''l''(''a,,i,,'')).  Thus, simd_hl<16>::add(''x'')
    65 denotes the 16-bit addition of all pairs of adjacent 8-bit fields
    66 of ''x'' to produce a vector of 16-bit results.
     58 of adjacent fields.
     59 Given binary operation ''f'' on ''n'' bit fields and two ''N'' bit
     60 vectors ''a'' and ''b'', let ''c'' be the ''2N'' bit concatenation of ''a'' and ''b''.
     61 Then ''v''=hsimd<''n''>::''f''(''a'', ''b'')  denotes the application of ''f''
     62 in the horizontal combination of all sets of adjacent fields of ''c'' such that
     63 ''v,,i,,''=''f''(''c,,2i,,'', ''c,,2i+1,,'').
    68 The horizontal combinations under IDISA are also designed
    69 to support inductive doubling: the repeated transitions
    70 from ''n''/2 to ''n'' bit field widths.  For example, the
    71 horizontal combination of sets of four adjacent 8-bit fields
    72 of a vector ''x'' into 32-bit sums can be expressed
    73 in two IDISA steps: simd_hl<32>::add(simd_hl<16>::add(''x'')).
    74 Similarly, horizontal combinations of eight fields require 3 steps.
    75 Note also that important basic operations such as population count
    76 and parity on ''n'' = 2^''k''^ bit fields can be expressed using
    77 ''k'' step inductive doubling algorithms.    For example,
    78 population count within 8-bit fields of a vector ''v'' is
    79 computed by simd_hl<8>::add(simd_hl<4>::add(simd_hl<2>::add(''v''))),
    80 while parity within 4-bit fields is computed by
    81 simd_hl<4>::xor(simd_hl<2>::xor(''v'')).
     65 See the list of [wiki:IDISA_Horizontal IDISA Horizontal] operations for the
     66 individual operations and their semantics.
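
The v8 definition of hsimd<''n''>::''f''(''a'', ''b'') can be illustrated with a small scalar model. This is only a sketch for ''n'' = 8, not the IDISA implementation: the names `hsimd8_model` and `add8` are hypothetical, and the model assumes that the concatenation ''c'' lists the fields of ''a'' first and those of ''b'' second, which the text above does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of hsimd<8>::f(a, b).
// Each vector is represented as a list of 8-bit fields.
// Assumption: c holds the fields of a followed by the fields of b.
template <typename F>
std::vector<uint8_t> hsimd8_model(const std::vector<uint8_t>& a,
                                  const std::vector<uint8_t>& b, F f) {
    std::vector<uint8_t> c(a);
    c.insert(c.end(), b.begin(), b.end());   // c = concatenation of a and b

    std::vector<uint8_t> v(c.size() / 2);    // one result per adjacent pair
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = f(c[2 * i], c[2 * i + 1]);    // v_i = f(c_2i, c_2i+1)
    }
    return v;
}

// 8-bit addition with wraparound, used as the binary operation f.
inline uint8_t add8(uint8_t x, uint8_t y) {
    return static_cast<uint8_t>(x + y);
}
```

With two inputs of four fields each, the model returns four fields, so the result has the same width as each input vector.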
    136 122 Another case is implementing the IDISA nonsaturating
    137 pack using the saturating pack found with SSE, for example.
     123 packl using the saturating pack found with SSE, for example.
    138 124 In this case, the default definition requires masking:
    140 126   template<>
    141   inline SIMD_type simd<16>::pack(SIMD_type r1, SIMD_type r2) {
     127   inline SIMD_type simd<16>::packl(SIMD_type r1, SIMD_type r2) {
    142 128       return _mm_packus_epi16(simd_andc(r2, simd<16>::himask()), simd_andc(r1, simd<16>::himask()));
    143 129   }
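
To see why the masking matters, the following standalone sketch reproduces the idea with plain SSE2 intrinsics. The function name `packl16_sketch` is hypothetical. The sketch also assumes that simd_andc(''x'', ''m'') computes ''x'' AND NOT ''m'', so that and-complementing with the high-byte mask is equivalent to ANDing with 0x00FF in each 16-bit field.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical stand-in for simd<16>::packl, written directly with SSE2.
// Clearing the high byte of every 16-bit field keeps each value in 0..255,
// so the unsigned-saturating pack has nothing to clamp and simply keeps
// the low byte of each field.
inline __m128i packl16_sketch(__m128i r1, __m128i r2) {
    const __m128i lomask = _mm_set1_epi16(0x00FF);
    // _mm_packus_epi16 puts its first argument in the low 64 bits of the
    // result, so r2 supplies the low half and r1 the high half.
    return _mm_packus_epi16(_mm_and_si128(r2, lomask),
                            _mm_and_si128(r1, lomask));
}
```

Without the mask, a field such as 0x1234 would saturate to 0xFF, and a field with its top bit set would be treated as negative and clamp to 0x00, rather than being truncated to its low byte.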