Changeset 1409 for docs/HPCA2012

Aug 31, 2011, 5:23:32 PM

edits to mobile processor section

1 edited


  • docs/HPCA2012/06-scalability.tex

    r1408 r1409  
    1       - \section{Evaluating Parabix on Hardware}
          1 + \section{Evaluation of Parabix across different Hardware}
   70    70   \subsection{Parabix on Mobile processors}
   72       - Our experience with the generation of Intel processors led us to
   73       - contemplate about mobile processors such as the ARM \CORTEXA8\ which
   74       - also includes SIMD units.  ARM \NEON{} makes available a 128-bit SIMD
         72 + Our experience with Intel processors led us to
         73 + question whether mobile processors with SIMD support, such as the ARM \CORTEXA8{},
         74 + could benefit from Parabix technology. ARM \NEON{} provides a 128-bit SIMD
   75    75   instruction set similar in functionality to the Intel SSE3 instruction
   76    76   set. In this section, we present our performance comparison of a
   82    82   Samsung S5PC110 ARM \CORTEXA8{} 1\,GHz single-core, dual-issue,
   83    83   superscalar microprocessor. It includes a 32kB L1 data cache and a
   84       - 512kB L2 shared cache.  Migration of Parabix to the Android platform
   85       - began with the retargeting of a subset of the Parabix SIMD library
   86       - for ARM \NEON{}.  The majority of the Parabix SIMD functionality ported
   87       - directly. However, for a small subset of the SIMD functions (e.g., bit
   88       - packing) of \NEON{} equivalents did not exist. In such cases we simply
   89       - emulated logical equivalent instructions using the available the
   90       - scalar instruction set. This library code was cross-compiled for
   91       - Android using the Android NDK.
         84 + 512kB L2 shared cache.  Migration of Parabix-XML to the Android platform
         85 + only required developing a Parabix runtime library for ARM \NEON{}.
         86 + The majority of the runtime functionality was ported
         87 + directly. However, a small subset of key SIMD instructions (e.g., bit
         88 + packing) did not exist on \NEON{}. In such cases, the
         89 + logical equivalent of those instructions was emulated using the available
         90 + ISA. The resulting application was cross-compiled for
         91 + Android using the Android NDK.
   93    93   A comparison of Figure \ref{arm_processing_time} and Figure
   94    94   \ref{corei3_TOT} demonstrates that the performance of both Parabix and
   95       - Expat degrades substantially on \CORTEXA8{} (5$\times$---17$\times$).
         95 + Expat degrades substantially on \CORTEXA8{} (5--17$\times$).
   96    96   This result was expected given the comparatively limited performance of the
   97    97   \CORTEXA8{}.  Surprisingly, on the \CORTEXA8{}, Expat outperforms Parabix
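The revised porting paragraph says that a few key SIMD operations (e.g., bit packing) had no direct NEON counterpart. It also says these were emulated with the available ISA. The changeset does not name the missing instructions. As an illustration only, the sketch below assumes one of them is a byte-wise sign-bit gather, in the style of SSE2's `_mm_movemask_epi8`. It gives two functions. The first is a scalar reference model. The second builds the same result only from lane-wise shifts and a horizontal add, which are the kinds of operations NEON does offer. Both function names are hypothetical, and neither is the Parabix runtime code.

```python
def movemask_u8x16(vec: bytes) -> int:
    """Scalar reference: bit i of the result is the top bit of byte i."""
    if len(vec) != 16:
        raise ValueError("expected a 128-bit (16-byte) vector")
    mask = 0
    for i, byte in enumerate(vec):
        mask |= ((byte >> 7) & 1) << i
    return mask


def movemask_via_lane_ops(vec: bytes) -> int:
    """Same result, expressed as lane-wise shifts plus a horizontal add."""
    if len(vec) != 16:
        raise ValueError("expected a 128-bit (16-byte) vector")
    # Lane-wise shift right by 7: each byte becomes its sign bit (0 or 1).
    msbs = [b >> 7 for b in vec]
    # Lane-wise variable shift left: lane i's bit moves to position i % 8.
    weighted = [m << (i % 8) for i, m in enumerate(msbs)]
    # Horizontal add within each 8-byte half. The bits are disjoint, so the
    # sum never carries and equals a bitwise OR of the lanes.
    low, high = sum(weighted[:8]), sum(weighted[8:])
    return low | (high << 8)


if __name__ == "__main__":
    v = bytearray(16)
    v[0], v[3], v[7], v[15] = 0x80, 0xFF, 0x7F, 0x90  # byte 7 has its top bit clear
    print(hex(movemask_u8x16(bytes(v))))         # 0x8009
    print(hex(movemask_via_lane_ops(bytes(v))))  # 0x8009
```

Each lane contributes a distinct bit position, so the horizontal add behaves like an OR. Comparing the emulated version against the scalar model on many inputs is a cheap way to validate such a port.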
  105       - \subfigure[ARM Neon Performance]{
        105 + \subfigure[ARM Neon Performance (cycles per kB)]{
  119       - \caption{Comparaing Parabix on ARM and Intel.}
        119 + \caption{Comparison of Parabix-XML on ARM vs. Intel.}
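The ARM panel is now labelled in cycles per kB. Below is a minimal sketch of that normalization. It assumes a fixed clock frequency (no frequency scaling) and 1 kB = 1024 bytes. The function name and the example timing are hypothetical, not measurements from the paper.

```python
def cycles_per_kb(elapsed_seconds: float, clock_hz: float, input_bytes: int) -> float:
    """Convert a wall-clock run time into processor cycles per kB of input.

    Assumes the core ran at a constant clock_hz for the whole measurement
    and that 1 kB = 1024 bytes.
    """
    if input_bytes <= 0:
        raise ValueError("input size must be positive")
    cycles = elapsed_seconds * clock_hz
    return cycles / (input_bytes / 1024)


if __name__ == "__main__":
    # Hypothetical: a 0.5 s run on a 1 GHz core over a 1 MiB input.
    print(cycles_per_kb(0.5, 1e9, 1024 * 1024))  # 488281.25
```

Normalizing by input size and clock rate lets files of different sizes, and cores with different clock speeds, be compared on one axis. Hardware cycle counters avoid the constant-clock assumption when they are available.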
  129   129   each parser varies in a linear fashion with respect to the markup
  130   130   density of the file. On both the \CORTEXA8{} and the \CITHREE{}, both
  131       - parsers demonstrate the same trend. For lower mark up density files
  132       - for which the fraction of SIMD operations and hence the potential for
  133       - parallelism is limited, the overheads of SIMD instructions affect
  134       - overall execution time. Figure~\ref{relative_performance_arm} provides
  135       - insight into the problem, Parabix's performance is hindered by SIMD
  136       - instruction latency for low markup density files; it appears that the
  137       - latency of SIMD operations is relatively higher on the \CORTEXA8{}
  138       - processor.  This is possibly because the \NEON{} SIMD extensions are
  139       - implemented as a coprocessor on \CORTEXA8{} which imposes higher
        131 + parsers demonstrate the same trend: files with a lower markup density
        132 + exhibit higher levels of parallelism; consequently, the overhead of SIMD
        133 + instructions has a greater impact on the overall execution time for
        134 + those files.
        135 + The contrast between Figures~\ref{relative_performance_arm} and~\ref{relative_performance_intel} provides
        136 + insight into the problem: Parabix-XML's performance is hindered by SIMD
        137 + instruction latency.  This is possibly because the \NEON{} SIMD extensions are
        138 + implemented as a coprocessor on the \CORTEXA8{}, which imposes a higher
  140   139   overhead for applications that frequently interoperate between scalar
  141   140   and SIMD registers. Future performance enhancements to ARM \NEON{} that
  142   141   implement the \NEON{} unit within the core microarchitecture could
  143       - substantially improve the efficiency of Parabix.
        142 + substantially improve the efficiency of Parabix-XML.
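The text states that each parser's relative performance varies linearly with markup density. One way to quantify that trend is an ordinary least-squares fit of a performance metric against markup density. The sketch below is a generic OLS helper with the hypothetical name `fit_line`. The sample points in the usage line are synthetic values on y = 2x + 1, not data from the figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x; zero means a vertical (undefined) fit.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic points on y = 2x + 1 (x = markup density, y = cost metric).
    print(fit_line([0.0, 0.25, 0.5, 1.0], [1.0, 1.5, 2.0, 3.0]))  # (2.0, 1.0)
```

Fitting each parser separately on each processor would let the slopes be compared directly, which is one way to make "the same trend" on both machines quantitative.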