\def\CORTEXA8{Cortex-A8}

\section{Parabix2 on GT-P1000M}

The Samsung Galaxy Tab GT-P1000M houses a Samsung S5PC110, an ARM \CORTEXA8{}-based single-core, dual-issue superscalar microprocessor. Apart from the usual features found in typical 32-bit low-power microprocessors, the S5PC110 includes the NEON extension, which enables 128-bit SIMD operations similar in scope to Intel's SSE3 technology. In this section, we compare the performance of Parabix2 on the Samsung Galaxy Tab GT-P1000M with that of Expat. Parabix1 and Xerces were excluded from this study because of the difficulties involved in porting them to the Android platform.

\subsection{Platform Hardware}
%\paragraph{GT-P1000M}
The Galaxy Tab GT-P1000M is produced by Samsung and incorporates the
\CORTEXA8{} microprocessor developed by ARM. Table \ref{arminfo} gives the
hardware description of the tablet.

\begin{table}[h]
\begin{center}
\begin{tabular}{|l||l|}
\hline
Processor & ARM \CORTEXA8{} (1GHz) \\ \hline
L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
L2 Cache & 512KB \\ \hline
Flash & 16 GB \\ \hline
Max TDP & ??? W \\ \hline
\end{tabular}
\end{center}
\caption{GT-P1000M}
\label{arminfo}
\end{table}

\subsection{Performance Results}

\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/arm_TOT.pdf}
\end{center}
\caption{Processing Time on GT-P1000M}
\label{arm_processing_time}
\end{figure}

To implement Parabix2 with NEON technology, Parabix2's underlying SIMD library was rewritten
to match the NEON instruction set. It was cross-compiled using the Android 2.2 SDK, targeting the
Android-specific Application Binary Interface (ABI) for ARM-based CPU architectures, armeabi-v7a.
The majority of the SIMD functions were directly portable, but some were not directly supported
and had to be simulated or otherwise modified to take advantage of NEON-specific instructions.

Figure \ref{arm_processing_time} shows that the processing time (in cycles per kilobyte) of
both Parabix2 and Expat increases substantially compared to the processing time on \CITHREE{}
(Figure \ref{corei3_TOT}). This result is not unexpected given the architectural differences between
the \CORTEXA8{} and the \CITHREE{}. Surprisingly, Expat outperforms Parabix2 on both of the
lower-density workloads (dew.xml and jaw.xml), and is only moderately worse than Parabix2 on the
higher-density workloads. Although the higher latency of the NEON instructions is definitely
a factor in the performance loss of Parabix2, a more interesting view of this result can be seen in Figure
\ref{relative_performance_arm_vs_i3}: when the performance of the GT-P1000M is expressed relative to
that of the \CITHREE{}, the ratio for each parser is quite stable across workloads.
Relative to the \CITHREE{}, Parabix2 and Expat operate at approximately 17.2\% and
55.7\% efficiency, respectively, on the GT-P1000M. This indicates that the baseline cost of the NEON
instruction set, and thereby the baseline cost of Parabix2, is substantially higher.
Given that Parabix2 was not designed with the limitations of the \CORTEXA8{} processor in mind, a
careful analysis of the cost of each instruction provided in the ARMv7 ISA may allow us to better utilize
the resources provided in the GT-P1000M.
Additionally, if future NEON implementations deliver even marginally improved performance, large gains
could be realized in Parabix2.
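
To make the efficiency figures concrete, one possible reading, assumed here rather than stated in the measurements, is that the efficiency of a parser $p$ is the ratio of its cost on the \CITHREE{} to its cost on the GT-P1000M, both in cycles per kilobyte. The symbols $E_p$, $C_p^{\mathrm{i3}}$ and $C_p^{\mathrm{ARM}}$ are introduced only for this illustration:
\begin{equation}
E_p = \frac{C_p^{\mathrm{i3}}}{C_p^{\mathrm{ARM}}}, \qquad
\frac{C_p^{\mathrm{ARM}}}{C_p^{\mathrm{i3}}} = \frac{1}{E_p}.
\end{equation}
Under this assumption, $E_{\mathrm{Parabix2}} \approx 0.172$ corresponds to roughly $1/0.172 \approx 5.8$ times as many cycles per kilobyte on the GT-P1000M, whereas $E_{\mathrm{Expat}} \approx 0.557$ corresponds to roughly $1/0.557 \approx 1.8$ times as many.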

\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/RelativePerformanceARMvsCoreI3.pdf}
\end{center}
\caption{Relative Performance of GT-P1000M vs. \CITHREE{}}
\label{relative_performance_arm_vs_i3}
\end{figure}
