Changeset 1129 for docs


Timestamp:
Apr 14, 2011, 8:27:16 PM
Author:
ksherdy
Message:

Minor Edits.

Location:
docs/PACT2011
Files:
4 edited

  • docs/PACT2011/075-arm.tex

    r1128 r1129  
    33\section {Parabix2 on GT-P1000M}
    44
    5 The Samsung Galaxy Tab GT-P1000M device houses a SAMSUNG S5PC110 ARM \CORTEXA8{}-based single-core dual-issue superscalar microprocessor. Apart from the usual features found in typical 32-bit low-power microprocessors, the S5PC110 includes the NEON extension. This feature enables the use of 128-bit SIMD operations, similar in scope to Intel's SSE3 technology. In this section, we discuss the performance of Parabix2 on the Samsung Galaxy Tab GT-P1000M compared to Expat. Parabix1 and Xerces were excluded from this study due to the difficulties involved in porting them to the Android platform.
     5The Samsung Galaxy Tab GT-P1000M device houses a Samsung S5PC110 ARM \CORTEXA8{} single-core, dual-issue, superscalar microprocessor. In addition to the standard feature set found in such low-power 32-bit microprocessors, the S5PC110 includes the ARM NEON general-purpose SIMD engine. ARM NEON makes available a 128-bit SIMD instruction set similar in functionality to the Intel SSE3 instruction set. In this section, we present our performance comparison of a NEON-based port of Parabix2 versus the Expat parser, executed on the Samsung Galaxy Tab GT-P1000M hardware. Parabix1 and Xerces are excluded from this portion of our study due to the complexity of the cross-platform build process in porting native C/C++ applications to the Android platform.
    66
    77\subsection{Platform Hardware}
    88%\paragraph{GT-P1000M}
    9 Samsung Galaxy Tab GT-P1000M was produced by Samsung and incorporates the
    10 \CORTEXA8{} microprocessor developed by ARM. Table \ref{arminfo} gives the
    11 hardware description of the Samsung Galaxy Tab GT-P1000M tablet.
     9The Samsung Galaxy Tab GT-P1000M was produced by Samsung and incorporates the ARM
     10\CORTEXA8{} microprocessor. Table \ref{arminfo} describes the Samsung Galaxy Tab GT-P1000M hardware.
    1211
    1312\begin{table}[h]
     
    1615\hline
    1716Processor & ARM \CORTEXA8{} (1GHz) \\ \hline
    18 L1 Cache & 32KB I-Cache, 32KB D-Cache \\ \hline
    19 L2 Cache & 512KB \\ \hline
    20 Flash & 16 GB \\ \hline
     17L1 Cache & 32kB I-Cache, 32kB D-Cache \\ \hline
     18L2 Cache & 512kB \\ \hline
     19Flash & 16GB \\ \hline
    2120\end{tabular}
    2221\end{center}
     
    3534\end{figure}
    3635
    37 In order to implement Parabix2 with NEON technology, Parabix2's underlying SIMD library was rewritten
    38 to match the NEON instruction set. It was cross-compiled through the use of the Android 2.2 SDK using the
    39 Android-specific Application Binary Interface (ABI) for ARM-based CPU architectures, armeabi-v7a.
    40 The majority of the SIMD functions were directly portable but some functions were not directly supported
    41 and had to be simulated or otherwise modified to take advantage of the NEON-specific instructions.
     36Migration of Parabix2 to the Android platform began with the retargeting of a subset of the Parabix2 IDISA SIMD library for ARM NEON.
     37This library code was cross-compiled for Android using the Android NDK. The Android NDK is a companion tool to the Android SDK
     38that allows developers to build performance-critical portions of applications in native code. The majority of the Parabix2 SIMD functionality ported directly. However, for a small subset of
     39the Parabix2 SIMD functions, NEON equivalents did not exist. In such cases we simulated logically equivalent operations using the available instruction set.
    4240
    43 Figure \ref{arm_processing_time} shows that the processing time (in terms of cycles per kilobyte) of
    44 both Parabix2 and Expat increases substantially compared to the processing time on \CITHREE{}
    45 (Figure \ref{corei3_TOT}). That result is not unexpected given the architectural differences between
    46 the \CORTEXA8{} and the \CITHREE{}. Surprisingly, Expat outperforms Parabix2 on both of the
    47 lower-density workloads (dew.xml and jaw.xml), and is only moderately worse than Parabix2 on the
    48 higher-density workloads. Although the higher latency of the NEON instruction set is definitely
    49 a factor in performance loss of Parabix2, a more interesting view of this result can be seen in Figure
    50 \ref{relative_performance_arm_vs_i3}; by comparing the relative performance of the GT-P1000M vs.
    51 the \CITHREE{}, the performance of each parser on each workload is actually quite stable.
    52 Compared to the \CITHREE{}, on the GT-P1000M, Parabix2 and Expat operate at approximately 17.2\% and
    53 55.7\% efficiency respectively. This indicates that the baseline cost of the NEON instruction set---
    54 and thereby the baseline cost of Parabix2---is substantially higher.
    55 Given that Parabix2 was not designed with the limitations of the \CORTEXA8{} processor in mind, a
     41A comparison of Figure \ref{arm_processing_time} and Figure \ref{corei3_TOT} demonstrates that the performance of
     42both Parabix2 and Expat degrades substantially on \CORTEXA8{}. This result was expected given the comparatively performance-limited \CORTEXA8{} hardware architecture. Surprisingly, on \CORTEXA8{}, Expat outperforms Parabix2 on each of the lower markup density workloads, dew.xml and jaw.xml. On the remaining higher-density workloads, Parabix2 performs only moderately better than Expat.
     43The higher latency of the NEON instructions on \CORTEXA8{} is a likely contributor to this loss in performance. A more interesting aspect of this result is shown in Figure
     44\ref{relative_performance_arm_vs_i3}, which demonstrates that the relative performance of each parser degrades by a roughly constant amount across workloads.
     45That is, compared to the \CITHREE{}, on the GT-P1000M, Parabix2 and Expat operate at approximately 17.2\% and
     4655.7\% efficiency respectively. Figure \ref{relative_performance_arm_vs_i3} shows that the baseline cost of Parabix2 operations implemented using the NEON instruction set---
     47and thereby the baseline cost of Parabix2---is substantially higher on the \CORTEXA8{} processor.
     48Given that Parabix2 was not designed with the limitations of the \CORTEXA8{} in mind, in the future a
    5649careful analysis of the cost of each instruction provided in the ARMv7 ISA may allow us to better utilize
    57 the resources provided in the GT-P1000M.
    58 Additionally, if future NEON implementations result in even marginally improved performance, large gains
    59 could be realized in Parabix2.
     50the hardware resources provided. In particular, future performance enhancements to ARM NEON could result in substantial overall improvement in Parabix2 execution time.
    6051
    6152\begin{figure}
     
    6354\includegraphics[width=0.5\textwidth]{plots/RelativePerformanceARMvsCoreI3.pdf}
    6455\end{center}
    65 \caption{Relative Performance of GT-P1000M vs. \CITHREE{}}
     56\caption{Relative Slowdown of Parabix2 and Expat on GT-P1000M vs. \CITHREE{}}
    6657\label{relative_performance_arm_vs_i3}
    6758\end{figure}
  • docs/PACT2011/main.tex

    r1125 r1129  
     1
    12\input{preamble-final-acm}
    23%\input{preamble-tr}