# Changeset 1409 for docs/HPCA2012/06-scalability.tex

Timestamp:
Aug 31, 2011, 5:23:32 PM
Message:

edits to mobile processor section

File:
1 edited

--- docs/HPCA2012/06-scalability.tex (r1408)
+++ docs/HPCA2012/06-scalability.tex (r1409)

-\section{Evaluating Parabix on Hardware}
+\section{Evaluation of Parabix accross different Hardware}
 \label{section:scalability}
-\subsection{Performance}
+\subsection{Parabix on Mobile processors}
 \label{section:scalability:\NEON{}}
-Our experience with the generation of Intel processors led us to
-contemplate about mobile processors such as the ARM \CORTEXA8\ which
-also includes SIMD units.  ARM \NEON{} makes available a 128-bit SIMD
+Our experience with Intel processors led us to question whether mobile
+processors with SIMD support, such as the ARM \CORTEXA8{}, could benefit
+from Parabix technology. ARM \NEON{} provides a 128-bit SIMD
 instruction set similar in functionality to Intel SSE3 instruction set.
 In this section, we present our performance comparison of a Samsung
 S5PC110 ARM \CORTEXA8{} 1Ghz single-core, dual-issue, superscalar
 microprocessor. It includes a 32kB L1 data cache and a
-512kB L2 shared cache.  Migration of Parabix to the Android platform
-began with the retargeting of a subset of the Parabix SIMD library for
-ARM \NEON{}.  The majority of the Parabix SIMD functionality ported
-directly. However, for a small subset of the SIMD functions (e.g., bit
-packing) of \NEON{} equivalents did not exist. In such cases we simply
-emulated logical equivalent instructions using the available the scalar
-instruction set. This library code was cross-compiled for Android using
-the Android NDK.
+512kB L2 shared cache.  Migration of Parabix-XML to the Android platform
+only required developing a Parabix runtime library for ARM \NEON{}. The
+majority of the runtime functionality was ported directly. However, a
+small subset of key SIMD instructions (e.g., bit packing) did not exist
+on \NEON{}. In such cases, the logical equivalent of those instructions
+was emulated using the available ISA. The resulting application was
+cross-compiled for Android using the Android NDK.
 A comparison of Figure \ref{arm_processing_time} and Figure
 \ref{corei3_TOT} demonstrates that the performance of both Parabix and
-Expat degrades substantially on \CORTEXA8{} (5$\times$---17$\times$).
+Expat degrades substantially on \CORTEXA8{} (5--17$\times$).
 This result was expected given the comparably performance limited
 \CORTEXA8{}.  Surprisingly, on \CORTEXA8{}, Expat outperforms Parabix

 \begin{figure}[!h]
-\subfigure[ARM Neon Performance]{
+\subfigure[ARM Neon Performance (cycles per kB)]{
 \includegraphics[width=0.3\textwidth]{plots/arm_TOT.pdf}
 \label{arm_processing_time}

 \label{relative_performance_intel}
 }
-\caption{Comparaing Parabix on ARM and Intel.}
+\caption{Comparison of Parabix-XML on ARM vs. Intel.}
 \end{figure}

 each parser varies in a linear fashion with respect to the markup
 density of the file. On the both \CORTEXA8{} and \CITHREE{} both
-parsers demonstrate the same trend. For lower mark up density files for
-which the fraction of SIMD operations and hence the potential for
-parallelism is limited, the overheads of SIMD instructions affect
-overall execution time. Figure~\ref{relative_performance_arm} provides
-insight into the problem, Parabix's performance is hindered by SIMD
-instruction latency for low markup density files; it appears that the
-latency of SIMD operations is relatively higher on the \CORTEXA8{}
-processor.  This is possibly because the \NEON{} SIMD extensions are
-implemented as a coprocessor on \CORTEXA8{} which imposes higher
+parsers demonstrate the same trend: files with a lower markup density
+exhibit higher levels of parallelism; consequently, the overhead of SIMD
+instructions has a greater impact on the overall execution time for
+those files. The contrast between Figure~\ref{relative_performance_arm}
+and~\ref{relative_performance_intel} provides insight into the problem:
+Parabix-XML's performance is hindered by SIMD instruction latency.
+This is possibly because the \NEON{} SIMD extensions are implemented as
+a coprocessor on the \CORTEXA8{}, which imposes a higher
 overhead for applications that frequently inter-operate between scalar
 and SIMD registers. Future performance enhancement to ARM \NEON{} that
 implement the \NEON{} within the core microarchitecture could
-substantially improve the efficiency of Parabix.
+substantially improve the efficiency of Parabix-XML.
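
The revised text says that a few SIMD operations (bit packing is the example given) had no \NEON{} equivalent and were emulated using the available ISA. The changeset does not show that code, so the following is only a minimal sketch of what a scalar fallback for a pack-style operation might look like. The function name `pack_low_bytes_scalar`, its signature, and the choice of "keep the low byte of each 16-bit field" are all assumptions made for illustration, not the Parabix library's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar fallback for a "pack" operation (illustrative only).
 * Keeps the low byte of each 16-bit field of a and b, writing the n results
 * from a first, followed by the n results from b.
 * A 128-bit register modelled as eight 16-bit fields corresponds to n = 8,
 * in which case out must hold 16 bytes. */
void pack_low_bytes_scalar(const uint16_t *a, const uint16_t *b,
                           uint8_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i]     = (uint8_t)(a[i] & 0xFFu);  /* low byte of a's i-th field */
        out[n + i] = (uint8_t)(b[i] & 0xFFu);  /* low byte of b's i-th field */
    }
}
```

A loop like this performs one scalar operation per field where a native SIMD instruction would handle the whole register at once, which is in keeping with the text's concern about overhead when an application moves frequently between scalar and SIMD registers.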