\section{Parabix on Mobile Platforms}

The Samsung Galaxy Tab GT-P1000M device houses a Samsung S5PC110 ARM
\CORTEXA8{} 1~GHz single-core, dual-issue, superscalar
microprocessor. It includes a 32~kB L1 data cache and a 512~kB L2 shared
cache. In addition to the standard feature set found in such low-power
32-bit microprocessors, the S5PC110 includes the ARM NEON
general-purpose SIMD engine. ARM NEON makes available a 128-bit SIMD
instruction set similar in functionality to the Intel SSE3 instruction
set. In this section, we compare the performance of a
NEON-based port of Parabix2 against the Expat parser, both executed on
the Samsung Galaxy Tab GT-P1000M hardware.  Xerces is excluded from
this portion of our study due to the complexity of the cross-platform
build process in porting native C/C++ applications to the Android
platform.

\subsection{Performance Results}
% Figure with two subfigures:
%   (a) ARM Neon Performance
%   (b) Performance ARM Neon vs Core i3 SSE.

Migration of Parabix2 to the Android platform began with the
retargeting of a subset of the Parabix2 IDISA SIMD library for ARM
NEON.  This library code was cross-compiled for Android using the
Android NDK. The Android NDK is a companion tool to the Android SDK
that allows developers to build performance-critical portions of
applications in native code. The majority of the Parabix2 SIMD
functionality ported directly. However, for a small subset of the
Parabix2 SIMD functions, no NEON equivalents exist. In such cases we
emulated logically equivalent operations using the available
instruction set.
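
As an illustration only (the text does not say which operations lacked NEON
equivalents), one commonly cited case is the SSE2 byte-mask extraction
\verb|_mm_movemask_epi8|, which has no single-instruction counterpart in
ARMv7 NEON. A usual emulation broadcasts each byte's sign bit to a full-byte
mask, ANDs the result with per-lane bit weights, and sums each 8-byte half.
The portable C sketch below models that sequence of steps on a plain byte
array; the helper name \verb|movemask_bytes| is hypothetical, and the code is
neither NEON intrinsics nor taken from the IDISA library.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Parabix2 or IDISA): a scalar model of a
 * 16-byte "movemask". Bit i of the result is the most significant bit of
 * byte v[i]. The three steps mirror a typical SIMD emulation. */
static uint16_t movemask_bytes(const uint8_t v[16])
{
    /* One bit weight per lane within an 8-byte half. */
    static const uint8_t weight[8] = {1, 2, 4, 8, 16, 32, 64, 128};
    uint16_t lo = 0, hi = 0;

    for (int i = 0; i < 8; i++) {
        /* Step 1: broadcast the sign bit to a full-byte mask (0x00 or 0xFF). */
        uint8_t m_lo = (uint8_t)-(v[i] >> 7);
        uint8_t m_hi = (uint8_t)-(v[i + 8] >> 7);

        /* Step 2: keep only this lane's weight.
         * Step 3: horizontal add within each half; the weights are distinct
         * powers of two, so the sum never exceeds 255. */
        lo = (uint16_t)(lo + (m_lo & weight[i]));
        hi = (uint16_t)(hi + (m_hi & weight[i]));
    }
    /* Combine the two 8-bit halves into one 16-bit mask. */
    return (uint16_t)(lo | (hi << 8));
}
```

Each step in the loop is one or more separate SIMD instructions in an actual
emulation, which suggests why such emulated operations tend to cost more than
the single native instruction they replace.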

A comparison of Figure \ref{arm_processing_time} and Figure
\ref{corei3_TOT} demonstrates that the performance of both Parabix2
and Expat degrades substantially on \CORTEXA8{}.  This result was
expected given the comparatively performance-limited \CORTEXA8{} hardware
architecture.  Surprisingly, on \CORTEXA8{}, Expat outperforms Parabix2
on each of the lower markup density workloads, dew.xml and jaw.xml. On
the remaining higher-density workloads, Parabix2 performs only
moderately better than Expat.  The higher latency of the NEON
instructions on \CORTEXA8{} is the likely contributor to this loss in
performance. A more interesting aspect of this result is shown in
Figure \ref{relative_performance_arm_vs_i3}, which demonstrates
that the performance of each parser degrades by a roughly
constant factor across workloads.  That is, compared to the \CITHREE{}, on the
GT-P1000M, Parabix2 and Expat operate at approximately 17.2\% and
55.7\% efficiency, respectively. Figure
\ref{relative_performance_arm_vs_i3} also shows that the baseline cost of
Parabix2 operations implemented using the NEON instruction set, and
thereby the baseline cost of Parabix2, is substantially higher on the
\CORTEXA8{} processor.  Parabix2 was not designed with the
limitations of the \CORTEXA8{} in mind; in future work, a careful
analysis of the cost of each instruction provided in the ARMv7 ISA may
allow us to better utilize the available hardware resources. In
particular, future performance enhancements to ARM NEON could result in
substantial overall improvement in Parabix2 execution time.
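
The efficiency percentages above can be restated as slowdown factors. This
reading assumes that ``efficiency'' denotes the ratio of per-byte processing
cost on the \CITHREE{} to that on the \CORTEXA8{}; the symbols $E$,
$c_{\mathrm{i3}}$, and $c_{\mathrm{A8}}$ are introduced here for illustration
only. Under that assumption,
\[
E = \frac{c_{\mathrm{i3}}}{c_{\mathrm{A8}}}, \qquad
\frac{c_{\mathrm{A8}}}{c_{\mathrm{i3}}} = \frac{1}{E}, \qquad
\frac{1}{0.172} \approx 5.8 \ \mbox{(Parabix2)}, \qquad
\frac{1}{0.557} \approx 1.8 \ \mbox{(Expat)},
\]
so the relative slowdown of Parabix2 would be about $0.557 / 0.172 \approx 3.2$
times that of Expat.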