Changeset 1113

Apr 10, 2011, 10:47:28 PM

Significant edits.

2 edited


  • docs/PACT2011/05-corei3.tex

    r1112 r1113  
    82 82 \subsection{SIMD Instructions vs. Total Instructions}
    84 Parabix gains its performance by using parallel bitstreams, which are
    85 mostly generated and calculated by SIMD instructions.  The ratio of
    86 executed SIMD instructions over total instructions indicates the
    87 amount of parallel processing we were able to achieve. 
    88 Using Intel PIN, a dynamic binary instrumentation tool, we gathered the running instruction mix of each XML workload and classified the instructions as either vector (SIMD-based) instructions or non-vector (Non-SIMD-based) instructions.
    89 Figure \ref{corei3_INS_p1} and Figure \ref{corei3_INS_p2} shows the
    90 percentage of SIMD instructions of Parabix1 and Parabix2
     84 Parabix achieves performance via parallel bit stream technology. In Parabix XML processing, parallel bit streams are
     85 both computed and predominately operated upon using the SIMD instructions of commodity processors.  The ratio of
     86 retired SIMD instructions to total instructions provides insight into\ the relative degree to which Parabix achieves parallelism
     87 over the byte-at-a-time approach.
     89 Using the Intel Pin tool, we gather the dynamic instruction mix for each XML workload, and classify instructions as either vector (SIMD) or non-vector instructions.
     90 Figures \ref{corei3_INS_p1} and \ref{corei3_INS_p2} show the
     91 percentage of SIMD instructions for Parabix1 and Parabix2 respectively.
    91 92 %(Expat and Xerce do not use any SIMD instructions)
    92 .  For Parabix1, 18\% to 40\%
    93 of the executed instructions consists of SIMD instructions.  By using
    94 bistream addition for parallel scanning, Parabix2 uses 60\% to 80\%
     93 For Parabix1, 18\% to 40\% of the executed instructions are SIMD instructions.  Using
     94 bit stream addition to scan XML characters in parallel, the Parabix2 instruction mix is made up of 60\% to 80\%
    95 95 SIMD instructions.  Although the resulting ratios are (negatively) proportional to the markup density
    96 96 for both Parabix1 and Parabix2, the degradation rate of
    134 134 \subsection{CPU Cycles}
    136 Figure \ref{corei3_TOT} shows the result of the overall performance
    137 evaluated as CPU cycles per thousand input bytes.  Parabix1 is 1.5 to
     136 Figure \ref{corei3_TOT} shows overall parser performance
     137 evaluated in terms of CPU cycles per kilobyte.  Parabix1 is 1.5 to
    138 138 2.5 times faster on document-oriented input and 2 to 3 times faster on
    139 data-oriented input compared with Expat and Xerces.  Parabix2 is 2.5
     139 data-oriented input than the Expat and Xerces parsers respectively.  Parabix2 is 2.5
    140 140 to 4 times faster on document-oriented input and 4.5 to 7 times faster
    141 141 on data-oriented input.  Traditional parsers can be dramatically
    142 slowed down by higher markup density while Parabix with parallel
    143 processing is less affected.  The comparison is not entirely fair for
    144 Xerces that transcodes input into UTF-16, which typically takes
     142 slowed by dense markup, while Parabix2 is generally unaffected.  The results presented are not entirely fair to the
     143 Xerces parser since it first transcodes input from UTF-8 to UTF-16 before processing. In Xerces, this transcoding requires
    145 144 several cycles per byte.  However, transcoding using parallel
    146 bitstreams can be much faster and it takes less than a cycle per byte
    147 to transcode ASCI3I files such as road.gml, po.xml and soap.xml
     145 bit streams is significantly faster and requires less than a single cycle per byte.
    158 156 \subsection{Power and Energy}
    159 There is a growing concern of power consumption and energy efficiency.
    160 Chip producers not only work on improving the performance but also
    161 have worked hard to develop power efficient chips. We studied the
     157 In response to the growing industry concerns on power consumption and energy efficiency,
     158 chip producers work hard to not only improve performance but
     159 also achieve high energy efficiency in processors design. We study the
    162 160 power and energy consumption of Parabix in comparison with Expat and
    163 Xerces on \CITHREE{}. 
    165 Figure \ref{corei3_power} shows the average power consumed by the four
    166 different parsers.  The average power of \CITHREE\ 530 is about 21 watts.
    167 This model released by Intel last year has a good reputation for power
    168 efficiency.  Parabix2 dominated by SIMD instructions uses only about
    169 5\% higher power than the other parsers.
     161 Xerces on \CITHREE{}. The average power of \CITHREE\ 530 is about 21 watts.
     162 This Intel model has a good reputation for power efficiency. Figure \ref{corei3_power} shows the average power consumed by each parser.
     163 Parabix2, dominated by SIMD instructions, uses approximately 5\% additional power.
    179 The more interesting trend is energy, Figure \ref{corei3_energy} shows
    180 the energy consumption of the four different parsers.  Although
    181 Parabix2 requires slightly more power (per instruction), its processing time is significantly lower
    182 and therefore consumes substantially less energy than the other parsers. Parabix2 consumes 50 to 75
    183 nJ per byte while Expat and Xerces consumes 80nJ to 320nJ and 140nJ to
    184 370nJ per byte seperately.
     173 As shown in Figure \ref{corei3_energy}, a comparison of energy efficiency demonstrates a more interesting result. Although
     174 Parabix2 requires slightly more power (per instruction), the processing time of Parabix2 is significantly lower,
     175 and therefore Parabix2 consumes substantially less energy than the other parsers. Parabix2 consumes 50 to 75
     176 nJ per byte while Expat and Xerces consume 80nJ to 320nJ and 140nJ to 370nJ per byte respectively.