Timestamp:
Aug 21, 2011, 4:20:30 PM
Author:
ashriram
Message:

Working on evaluation. Fixed Figure sizes

File:
1 edited

  • docs/HPCA2012/08-arm.tex

    r1302 r1335  
    11\def\CORTEXA8{Cortex-A8}
    22
    3 \section {Parabix2 on GT-P1000M}
     3\section {Parabix on Mobile Platforms}
    44
    5 The Samsung Galaxy Tab GT-P1000M device houses a Samsung S5PC110 ARM \CORTEXA8{} single-core, dual-issue, superscalar microprocessor. In addition to the standard feature set found in such low-power 32-bit microprocessors, the S5PC110 includes the ARM NEON general-purpose SIMD engine. ARM NEON makes available a 128-bit SIMD instruction set similar in functionality to Intel SSE3 instruction set. In this section, we present our performance comparison of a NEON-based port of Parabix2 versus the Expat parser, and executed on the Samsung Galaxy Tab GT-P1000M hardware. Parabix1 and Xerces are excluded from this portion of our study due to the complexity of the cross-platform build process in porting native C/C++ applications to the Android platform.
     5The Samsung Galaxy Tab GT-P1000M device houses a Samsung S5PC110 ARM
     6\CORTEXA8{} 1GHz single-core, dual-issue, superscalar
     7microprocessor. It includes a 32kB L1 data cache and a 512kB L2 shared
     8cache. In addition to the standard feature set found in such low-power
     932-bit microprocessors, the S5PC110 includes the ARM NEON
     10general-purpose SIMD engine. ARM NEON makes available a 128-bit SIMD
     11instruction set similar in functionality to the Intel SSE3 instruction
     12set. In this section, we present our performance comparison of a
     13NEON-based port of Parabix2 versus the Expat parser, both executed on
     14the Samsung Galaxy Tab GT-P1000M hardware.  Xerces is excluded from
     15this portion of our study due to the complexity of the cross-platform
     16build process in porting native C/C++ applications to the Android
     17platform.
    618
    7 \subsection{Platform Hardware}
    8 %\paragraph{GT-P1000M}
    9 Samsung Galaxy Tab GT-P1000M was produced by Samsung and incorporates the ARM
    10 \CORTEXA8{} microprocessor. Table \ref{arminfo} describes the Samsung Galaxy Tab GT-P1000M hardware.
    11 
    12 \begin{table}[h]
    13 \begin{center}
    14 \begin{tabular}{|l||l|}
    15 \hline
    16 Processor & ARM \CORTEXA8{} (1GHz) \\ \hline
    17 L1 Cache & 32kB I-Cache, 32kB D-Cache \\ \hline
    18 L2 Cache & 512kB \\ \hline
    19 Flash & 16GB \\ \hline
    20 \end{tabular}
    21 \end{center}
    22 \caption{GT-P1000M}
    23 \label{arminfo}
    24 \end{table}
    2519
    2620\subsection{Performance Results}
    2721
    2822\begin{figure}
    29 \begin{center}
     23\subfigure[ARM NEON Performance]{
    3024\includegraphics[width=0.5\textwidth]{plots/arm_TOT.pdf}
    31 \end{center}
    32 \caption{Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)}
    3325\label{arm_processing_time}
     26}
     27\hfill
     28\subfigure[Performance ARM NEON vs Core i3 SSE.]{
     29\includegraphics[width=0.5\textwidth]{plots/RelativePerformanceARMvsCoreI3.pdf}
     30\label{relative_performance_arm_vs_i3}
     31}
    3432\end{figure}
    3533
    36 Migration of Parabix2 to the Android platform began with the retargetting of a subset of the Parabix2 IDISA SIMD library for ARM NEON.
    37 This library code was cross-compiled for Android using the Android NDK. The Android NDK is a companion tool to the Android SDK
    38 that allows developers to build performance-critical portions of applications in native code. The majority of the Parabix2 SIMD functionality ported directly. However, for a small subset of
    39 the SIMD functions of Parabix2 NEON equivalents did not exist. In such cases we simply simulated logical equivalencies using the available the instruction set.
    40 
    41 A comparison of Figure \ref{arm_processing_time} and Figure \ref{corei3_TOT} demonstrates that the performance of
    42 both Parabix2 and Expat degrades substantially on \CORTEXA8{}.  This result was expected given the combarably performance limited \CORTEXA8{} hardware architecture.  Surprisingly on \CORTEXA8{}  Expat outperforms Parabix2 on each of the lower markup density workloads, dew.xml and jaw.xm. On the remaining higher-density workloads, Parabix2 performs only moderately better than Expat.
    43 The higher latency of the NEON instructions on \CORTEXA8{} is the likely contributor to this loss in performance. A more interesting aspect of this result is demonstrated in a comparison of Figure
    44 \ref{relative_performance_arm_vs_i3} and Figure \ref{relative_performance_arm_vs_i3}. These figure demonstrate that the relative performance of each parser degrades in a relatively constant manner.
    45 That is, compared to the \CITHREE{}, on the GT-P1000M, Parabix2 and Expat operate at approximately 17.2\% and
    46 55.7\% efficiency respectively. Figure \ref{relative_performance_arm_vs_i3} shows that the baseline cost of Parabix2 operations implemented using the NEON instruction set---
    47 and thereby the baseline cost of Parabix2---is substantially higher on the \CORTEXA8{} processor.
    48 Given that Parabix2 was not designed with the limitations of the \CORTEXA8{} in mind, in the future a
    49 careful analysis of the cost of each instruction provided in the ARMv7 ISA may allow us to better utilize
    50 the hardware resources provided. In particular, future performance enhancement to ARM NEON could result in substantial overall improvement in Parabix2 execution time.
    51 
    52 \begin{figure}
    53 \begin{center}
    54 \includegraphics[width=0.5\textwidth]{plots/RelativePerformanceARMvsCoreI3.pdf}
    55 \end{center}
    56 \caption{Relative Slow Down of Parbix2 and Expat on GT-P1000M vs. \CITHREE{} }
    57 \label{relative_performance_arm_vs_i3}
    58 \end{figure}
     34Migration of Parabix2 to the Android platform began with the
     35retargetting of a subset of the Parabix2 IDISA SIMD library for ARM
     36NEON.  This library code was cross-compiled for Android using the
     37Android NDK. The Android NDK is a companion tool to the Android SDK
     38that allows developers to build performance-critical portions of
     39applications in native code. The majority of the Parabix2 SIMD
     40functionality ported directly. However, for a small subset of the SIMD
     41functions of Parabix2, NEON equivalents did not exist. In such cases we
     42simply simulated logical equivalents using the available
     43instruction set.
    5944
    6045
     46
     47A comparison of Figure \ref{arm_processing_time} and Figure
     48\ref{corei3_TOT} demonstrates that the performance of both Parabix2
     49and Expat degrades substantially on \CORTEXA8{}.  This result was
     50expected given the comparatively performance-limited \CORTEXA8{} hardware
     51architecture.  Surprisingly, on \CORTEXA8{} Expat outperforms Parabix2
     52on each of the lower markup density workloads, dew.xml and jaw.xml. On
     53the remaining higher-density workloads, Parabix2 performs only
     54moderately better than Expat.  The higher latency of the NEON
     55instructions on \CORTEXA8{} is the likely contributor to this loss in
     56performance. A more interesting aspect of this result is demonstrated
     57in a comparison of Figure \ref{relative_performance_arm_vs_i3} and
     58Figure \ref{relative_performance_arm_vs_i3}. These figures demonstrate
     59that the relative performance of each parser degrades in a relatively
     60constant manner.  That is, compared to the \CITHREE{}, on the
     61GT-P1000M, Parabix2 and Expat operate at approximately 17.2\% and
     6255.7\% efficiency respectively. Figure
     63\ref{relative_performance_arm_vs_i3} shows that the baseline cost of
     64Parabix2 operations implemented using the NEON instruction set---and
     65thereby the baseline cost of Parabix2---is substantially higher on the
     66\CORTEXA8{} processor.  Given that Parabix2 was not designed with the
     67limitations of the \CORTEXA8{} in mind, in the future a careful
     68analysis of the cost of each instruction provided in the ARMv7 ISA may
     69allow us to better utilize the hardware resources provided. In
     70particular, future performance enhancements to ARM NEON could result in
     71substantial overall improvement in Parabix2 execution time.
     72
     73