% Source: docs/HPCA2011/08-arm.tex @ 1302
% Last changed in revision 1302, checked in by lindanl, 8 years ago.
\def\CORTEXA8{Cortex-A8}

\section {Parabix2 on GT-P1000M}

The Samsung Galaxy Tab GT-P1000M is built around the Samsung S5PC110, a single-core, dual-issue, superscalar ARM \CORTEXA8{} microprocessor. In addition to the standard feature set of such low-power 32-bit microprocessors, the S5PC110 includes the ARM NEON general-purpose SIMD engine, which provides a 128-bit SIMD instruction set similar in functionality to the Intel SSE3 instruction set. In this section, we compare the performance of a NEON-based port of Parabix2 with that of the Expat parser, both executed on the Samsung Galaxy Tab GT-P1000M. Parabix1 and Xerces are excluded from this part of our study because of the complexity of the cross-platform build process required to port native C/C++ applications to the Android platform.

\subsection{Platform Hardware}
%\paragraph{GT-P1000M}
The Samsung Galaxy Tab GT-P1000M incorporates the ARM \CORTEXA8{} microprocessor.
Table~\ref{arminfo} summarizes its hardware configuration.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l||l|}
\hline
Processor & ARM \CORTEXA8{} (1\,GHz) \\ \hline
L1 Cache & 32\,kB I-cache, 32\,kB D-cache \\ \hline
L2 Cache & 512\,kB \\ \hline
Flash & 16\,GB \\ \hline
\end{tabular}
\end{center}
\caption{Samsung Galaxy Tab GT-P1000M hardware configuration}
\label{arminfo}
\end{table}

\subsection{Performance Results}

\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/arm_TOT.pdf}
\end{center}
\caption{Parabix2 Performance on GT-P1000M (y-axis: CPU Cycles per kB)}
\label{arm_processing_time}
\end{figure}

Migration of Parabix2 to the Android platform began with retargeting a subset of the Parabix2 IDISA SIMD library to ARM NEON.
This library code was cross-compiled for Android using the Android NDK, a companion tool to the Android SDK
that allows developers to build performance-critical portions of applications in native code. Most of the Parabix2 SIMD functionality ported directly. However, NEON equivalents did not exist for a small subset of
the Parabix2 SIMD functions; in such cases, we emulated each missing operation with a logically equivalent sequence of the available NEON instructions.
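
The text above does not list which IDISA functions lacked NEON equivalents, so the following C sketch is only illustrative. It assumes, as an example, a full-width shift of a 128-bit bit block by one position, an operation of the kind bitstream code depends on. NEON, like SSE, has no single instruction that shifts an entire 128-bit register by an arbitrary number of bits, because its bit shifts operate within lanes of at most 64 bits; the shift can instead be emulated with per-lane shifts plus an explicit carry from lane~0 into lane~1. The names \texttt{bitblock\_model}, \texttt{advance\_by\_1\_model}, and \texttt{advance\_by\_1\_neon} are hypothetical and do not come from the Parabix2 sources. The portable version models the two lanes as a C struct, and the NEON version is compiled only when \texttt{\_\_ARM\_NEON} or \texttt{\_\_ARM\_NEON\_\_} is defined.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical portable model of one 128-bit bit block split into two
   64-bit lanes: lo holds stream positions 0..63, hi holds 64..127. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} bitblock_model;

/* Hypothetical helper: advance every bit of the block by one position
   using only per-lane 64-bit shifts, as on a SIMD unit without a
   full-width 128-bit bit shift. The bit leaving the top of lane 0 is
   carried into the bottom of lane 1; the top bit of lane 1 is dropped
   (carrying it into the next block is outside this sketch). */
static bitblock_model advance_by_1_model(bitblock_model b) {
    bitblock_model r;
    r.lo = b.lo << 1;                   /* shift within lane 0 */
    r.hi = (b.hi << 1) | (b.lo >> 63);  /* shift lane 1, add lane 0's carry */
    return r;
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* The same emulation expressed with NEON intrinsics (hypothetical name). */
static uint64x2_t advance_by_1_neon(uint64x2_t x) {
    uint64x2_t shifted = vshlq_n_u64(x, 1);  /* shift each 64-bit lane left by 1 */
    uint64x2_t carry = vshrq_n_u64(x, 63);   /* isolate each lane's top bit */
    /* Build {0, carry[0]}: lane 0's carry moves up to lane 1, lane 1's is dropped. */
    uint64x2_t carry_up = vextq_u64(vdupq_n_u64(0), carry, 1);
    return vorrq_u64(shifted, carry_up);
}
#endif
```

For instance, advancing a block whose lane~0 holds only its top bit and whose lane~1 holds the value 1 yields lane~0 $=0$ and lane~1 $=3$: the carry out of lane~0 lands in bit~0 of lane~1.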

A comparison of Figure~\ref{arm_processing_time} and Figure~\ref{corei3_TOT} shows that the performance of
both Parabix2 and Expat degrades substantially on the \CORTEXA8{}. This result was expected given the comparatively performance-limited \CORTEXA8{} hardware architecture. Surprisingly, on the \CORTEXA8{}, Expat outperforms Parabix2 on both of the lower markup-density workloads, dew.xml and jaw.xml. On the remaining, higher-density workloads, Parabix2 performs only moderately better than Expat.
The higher latency of NEON instructions on the \CORTEXA8{} is the likely cause of this loss in performance. A more interesting aspect of this result is shown in Figure~\ref{relative_performance_arm_vs_i3}, which demonstrates that the relative performance of each parser degrades by a roughly constant factor.
That is, compared to the \CITHREE{}, Parabix2 and Expat run on the GT-P1000M at approximately 17.2\% and
55.7\% efficiency, respectively. Figure~\ref{relative_performance_arm_vs_i3} thus indicates that the baseline cost of the Parabix2 operations implemented with the NEON instruction set, and thereby the baseline cost of Parabix2 itself, is substantially higher on the \CORTEXA8{} processor.
Because Parabix2 was not designed with the limitations of the \CORTEXA8{} in mind, a
careful analysis of the cost of each instruction in the ARMv7 ISA may allow us to make better use of
the available hardware resources in the future. In particular, future performance enhancements to ARM NEON could substantially improve overall Parabix2 execution time.
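
To make the efficiency figures above concrete, let $C_{\mathrm{i3}}$ and $C_{\mathrm{A8}}$ denote the CPU cycles per kB that a given parser needs on the \CITHREE{} and on the \CORTEXA8{}, respectively. Assuming, as a reading of the figures rather than a stated definition, that efficiency is the ratio $E = C_{\mathrm{i3}}/C_{\mathrm{A8}}$, the corresponding slowdown is its reciprocal:
\[
\frac{C_{\mathrm{A8}}}{C_{\mathrm{i3}}} = \frac{1}{E}, \qquad
\frac{1}{E_{\mathrm{Parabix2}}} \approx \frac{1}{0.172} \approx 5.8, \qquad
\frac{1}{E_{\mathrm{Expat}}} \approx \frac{1}{0.557} \approx 1.8 .
\]
Under this reading, Parabix2 spends roughly 5.8 times as many cycles per kB on the \CORTEXA8{} as on the \CITHREE{}, whereas Expat spends roughly 1.8 times as many, which is consistent with the narrowing gap between the two parsers on the \CORTEXA8{}.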

\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/RelativePerformanceARMvsCoreI3.pdf}
\end{center}
\caption{Relative Slowdown of Parabix2 and Expat on GT-P1000M vs.\ \CITHREE{}}
\label{relative_performance_arm_vs_i3}
\end{figure}