Changeset 226

Dec 5, 2008, 1:19:05 PM (11 years ago)

Intro - IDISA def; evaluation model

1 edited


  • docs/ASPLOS09/asplos094-cameron.tex

    r225 r226  
    116116The report addressing the broader problem of XML parsing is perhaps more
    117 compelling, demonstrating the utility of parallel bit stream techniques
     117interesting, demonstrating the utility of parallel bit stream techniques
    118118in delivering performance benefits through a significant portion of the
    119119web technology stack.  In an XML statistics gathering application,
    120120including the implementation of XML well-formedness checking, an
    121 overall 2.5X to 6X performance improvement is reported in using
     121overall 3X to 10X performance improvement is achieved in using
    122122the parallel bit stream methods in comparison with a similarly
    123123coded application using such well known parsers as Expat and Xerces.
    124 Although still a work in progress, the parser is claimed to have functional
     124Although still a work in progress, the parser has functional
    125125coverage of XML and related specifications comparable to, but somewhat
    126126beyond Expat.   The study also identifies promising further
    144144widths.   These transitions are quite frequent in several critical
    145145algorithms for parallel bit streams.   These transitions also
    146 occur in other applications as well.   For example, efficient
    147 automatic interleaving based on power-of-2 field widths has been
    148 reported as generally useful for a number of SIMD kernels \cite{Nuzman}.
     146occur in other applications as well.  In related work,
     147efficient automatic interleaving based on power-of-2 strides has been
     148reported quite useful for a number of SIMD kernels \cite{Nuzman}.
     149The specific features presented herein will be referred to
     150as IDISA: inductive doubling instruction set architecture.
     152To assess the benefits of IDISA features in comparison to
     153those found in existing SIMD support on commodity
     154processors, we will focus on an idealized three-register
     155model of SIMD instruction sets and evaluation of
     156parallel bit stream and other computational kernels with
     157respect to these features.  The three-register model is
     158based on the general approach to binary SIMD operations
     159as ones that operate on the contents of two source operand
     160registers to produce a value to be written back to a
     161single destination operand register.  This three-register
     162model is directly used by the Altivec SIMD instructions
     163of the Power PC, for example.  On the Intel SSE platform,
     164the three-register model is used as a programmer's
     165interface for the C-language intrinsics, while the
     166underlying instructions use a two-register model with
     167destructive updating.  The computational kernels we study consist
     168of 100\% branch-free code operating on registers, without
     169memory access   For such kernels, we use straight-line
     170instruction count as the performance metric of interest,
     171assuming that pipelined processors are well-designed
     172to handle latencies. 
    150175The remainder of this paper is organized as follows.
    152 The second section of this paper introduces the proposed
    153 inductive doubling instruction set architecture and the
     177The second section of this paper introduces IDISA and the
    154178SIMD notation used throughout this paper.  A brief comparison of
    155 this architecture with existing SIMD instruction
     179IDISA features with existing SIMD instruction
    156180sets of commodity processors such as the Intel SSE
    157181instruction set and the Power PC Altivec instruction set
    177201operations to transform 128 bytes of data using the three-register
    178202instruction form.  We then move on to consider how the task may
    179 be simplified using the inductive doubling architecture to
     203be simplified using IDISA to
    180204require a mere 24 operations.  As well as providing a 3X speed-up,
    181205it is also argued that the version using the inductive doubling
    186210streams.  Again, an algorithm to carry out this task requires
    18721172 straight-line SIMD operations in the Altivec three-register
    188 form, but is reduced to a simpler 24 operations using the
    189 inductive doubling architecture.
     212form, but is reduced to a simpler 24 operations using IDISA.
    191214The sixth section then goes on to consider the problem of
    204226The seventh section considers the issue of
    205227horizontal SIMD operations, that is, operations for combining
    207229corresponding fields within sets of registers.  While existing
    208230SIMD instruction set architectures tend to only support a few
    209 ad hoc horizontal combinations, the inductive doubling
    210 architecture is shown to provide a systematic means for
    211 efficient horizontal combinations of any kind.
    213 Implementation issues are briefly considered in section 8 of the paper,
    214 while the ninth section concludes the paper with a summary of results
     231ad hoc horizontal combinations, IDISA is shown to provide a systematic
     232means for efficient horizontal combinations of any kind.
     234An implementation model for IDISA is then considered
     235in section 8 of the paper, focusing on a pipelined
     236SIMD architecture featuring a modified operand fetch stage.
     237A gate-count analysis of one feasible implementation is
     238provided as well as a discussion of the implementation
     239of required extensions to handle 2-bit and 4-bit fields not
     240commonly supported on existing SIMD architectures.   Design
     241tradeoffs are also considered focusing the potential removal of
     242various {\em ad hoc} instructions on existing processors in favor of
     243more general alternatives provided through IDISA.
     245The ninth section concludes the paper with a summary of results
    215246and discussion of areas for future work.
Note: See TracChangeset for help on using the changeset viewer.