# Changeset 1407

- Timestamp: Aug 31, 2011, 3:13:36 PM
- Message: Minor bug fixes
- Location: docs/HPCA2012
- Files: 10 edited

## docs/HPCA2012/01-intro.tex

Changes since r1405:

```diff
-\begin{figure}
-\begin{center}
-\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
-\end{center}
-\caption{XML Parser Technology Energy vs. Performance}
-\label{perf-energy}
-\end{figure}
 Figure~\ref{perf-energy} showcases the overall efficiency of our framework.
 The Parabix-XML parser improves the
-performance %by ?$\times$
+performance %by ?$\times$
 and energy efficiency %by ?$\times$
 several-fold compared
 \begin{comment}
 Figure~\ref{perf-energy} is an energy-performance scatter plot showing the results obtained.
 With all this XML processing, a substantial literature has arisen addressing XML processing performance in general and the performance of XML parsers in particular.
 Nicola and John specifically identified XML parsing as a threat to database performance and outlined a number of potential directions for potential performance improvements \cite{NicolaJohn03}.
 The nature of XML APIs was found to have a significant affect on performance with event-based SAX (Simple API for XML) parsers avoiding the tree construction costs of the more flexible DOM (Document Object Model) parsers \cite{Perkins05}.
 The commercial importance of XML parsing spurred developments of hardware-based approaches including the development of a custom XML chip \cite{Leventhal2009} as well as FPGA-based implementations \cite{DaiNiZhu2010}.
 However promising these approaches may be for particular niche applications, it is likely that the bulk of the world's XML processing workload will be carried out on commodity processors using software-based solutions.
 To accelerate XML parsing performance in software, most recent work has focused on parallelization.
 The use of multicore parallelism for chip multiprocessors has attracted the attention of several groups \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009}, while SIMD (Single Instruction Multiple Data) parallelism has been of interest to Intel in designing new SIMD instructions\cite{XMLSSE42} , as well as to the developers of parallel bit stream technology \cite{CameronHerdyLin2008,Cameron2009,Cameron2010}.
 Each of these approaches has shown considerable performance benefits over traditional sequential parsing techniques that follow the byte-at-a-time model.
 \end{comment}
+\begin{figure}
+\begin{center}
+\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
+\end{center}
+\caption{XML Parser Technology Energy vs. Performance}
+\label{perf-energy}
+\end{figure}
 The remainder of this paper is organized as follows.
 Section~\ref{section:background} presents background material on XML
 Section~\ref{section:scalability} compares the performance and energy efficiency of 128 bit SIMD extensions across three generations of
-intel processors and includes a comparison with the ARM Cortex-A8
+Intel processors and includes a comparison with the ARM Cortex-A8 processor.
 Section~\ref{section:avx} examines the Intel's new 256-bit AVX technology and comments on the benefits and challenges compared to Parabix XML parser which seeks to exploit the SIMD units scattered across multiple cores.
```
## docs/HPCA2012/03-research.tex

Changes since r1398:

```diff
 point to determine other bit streams.  In particular, Parabix uses the basis bit streams to construct \emph{character-class bit streams} in
-which each $\tt 1$ bit indicates the presense of a significant
+which each $\tt 1$ bit indicates the presence of a significant
 character (or class of characters) in the parsing process.
 Character-class bit streams may then be used to compute \emph{lexical
 Unlike the single-cursor approach of traditional text parsers, these allow Parabix to process multiple cursors in parallel.
 Error bit streams are often the byproduct or derivative of computing lexical bit streams and can be used to identify any well-formedness
-issues found during the parsing process. The presense of a $\tt 1$ in an error stream indicates that the lexical stream cannot be
+issues found during the parsing process. The presence of a $\tt 1$ in an error stream indicates that the lexical stream cannot be
 trusted to be completely accurate and it may be necessary to perform some sequential parsing on that section to determine the cause and severity of the error.
 %How errors are handled depends on the logical implications of the error and go beyond the scope of this paper.
 sixteen 8-bit fields.
-These operations were originally developed for 128-bit Altivec operations on Power PC as well as 64-bit MMX and 128-bit SSE operations on Intel but have recently extended to support the new 256-bit AVX operations on Intel as well as the 128-bit \NEON{} operations on the ARM architecture.
+We have ported parabix to a wide variety of processor architectures demonstrating its applicability to commodity SIMD hardware.
+We currently take advantage of the 128-bit Altivec operations on the Power PC, 64-bit MMX and 128-bit SSE operations on previous generation Intel platforms, the latest 256-bit AVX extensions on the Sandybridge processor, and finally the 128-bit \NEON{} operations on ARM.
```
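The text in this diff describes deriving character-class bit streams from basis bit streams and using error bit streams to flag well-formedness problems. The following is a minimal Python sketch of that idea. It is not the Parabix implementation, which operates on SIMD registers. It rests on a few assumptions. Bit streams are modeled as arbitrary-precision Python ints, with bit i standing for byte i. The names `basis_bits`, `char_class`, and `advance` are hypothetical. The error rule ("`<` followed by a space") is a simplified example chosen only for illustration.

```python
from __future__ import annotations


def basis_bits(data: bytes) -> list[int]:
    """Transpose the input into 8 basis bit streams.

    basis[k] holds bit k of every byte (k = 0 is the least significant bit);
    bit i of each stream corresponds to byte i of the input.
    """
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                basis[k] |= 1 << i
    return basis


def char_class(basis: list[int], ch: int, n: int) -> int:
    """Stream marking every occurrence of byte value `ch`, built from basis streams only."""
    mask = (1 << n) - 1  # streams are n bits wide
    result = mask
    for k in range(8):
        if (ch >> k) & 1:
            result &= basis[k]  # bit k of the byte must be 1
        else:
            result &= ~basis[k] & mask  # bit k of the byte must be 0
    return result


def advance(stream: int, n: int) -> int:
    """Move every marker forward by one byte position."""
    return (stream << 1) & ((1 << n) - 1)


data = b"<a> < b>"
n = len(data)
basis = basis_bits(data)
lt = char_class(basis, ord("<"), n)     # bits 0 and 4
space = char_class(basis, ord(" "), n)  # bits 3 and 5
# Hypothetical well-formedness rule for illustration: '<' must not be followed by a space.
error = advance(lt, n) & space
print(bin(lt), bin(error))  # prints: 0b10001 0b100000
```

Every position is handled by the same few bitwise operations, so no per-byte branching is needed once the basis streams exist. A `1` in `error` marks the offending byte, and sequential code can then inspect that byte.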
## docs/HPCA2012/03b-research.tex

Changes since r1396:

```diff
 (2) references, and (3) start tags, end tags, and empty tags as well as any related attributes.
-Afterwards, the information is gathered by the {\tt Name\_Validation} and
+Afterward, the information is gathered by the {\tt Name\_Validation} and
 {\tt Err\_Check} functions, producing name check streams and error streams.
 Name check streams are weak error streams that verify each character used in a
```
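This diff mentions name check streams, produced alongside error streams and described as weak error streams. The sketch below shows one way such a stream could be derived with bit-stream operations. It uses an addition-based scan to move every tag-name cursor at once. It is only an illustrative assumption, not the `Name_Validation` function itself. `NAME_CHARS`, `NAME_FOLLOW`, `class_stream`, `scan_thru`, and `name_check` are hypothetical names. Real XML name rules are broader than the ASCII set used here.

```python
from __future__ import annotations

# Hypothetical, simplified character sets (real XML names allow far more).
NAME_CHARS = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-.:"
NAME_FOLLOW = b" \t\r\n>/"  # bytes accepted right after a tag name


def class_stream(data: bytes, chars: bytes) -> int:
    """1 at each position whose byte is in `chars` (computed per byte for brevity)."""
    stream = 0
    for i, b in enumerate(data):
        if b in chars:
            stream |= 1 << i
    return stream


def scan_thru(cursors: int, span: int, n: int) -> int:
    """Move every cursor past its run of `span` bits in one addition: (C + M) & ~M."""
    return ((cursors + span) & ~span) & ((1 << n) - 1)


def name_check(data: bytes) -> int:
    """Weak error stream: marks tag names that end on an unexpected byte."""
    n = len(data)
    mask = (1 << n) - 1
    lt = class_stream(data, b"<")
    starts = (lt << 1) & mask  # one cursor just after each '<'
    ends = scan_thru(starts, class_stream(data, NAME_CHARS), n)
    return ends & ~class_stream(data, NAME_FOLLOW) & mask


print(name_check(b"<ab><c#d/>"))  # prints: 64
```

For example, in `<ab><c#d/>` the cursor for the second tag stops at `#` (position 6), which is not a valid name terminator, so bit 6 is set. The stream is weak in the sense that it only flags a suspicious position. Deciding what the error means is left to later checks.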
## docs/HPCA2012/04-methodology.tex

Changes since r1399:

```diff
 entirely of single byte  encoded ASCII characters.
-\begin{table*}
+\begin{table*}[!h]
 \begin{center}
 {
```
## docs/HPCA2012/05-corei3.tex

Changes since r1400:

```diff
-\begin{figure}
+\begin{figure}[!h]
 \subfigure[L1 Misses]{
 \includegraphics[width=0.32\textwidth]{plots/corei3_L1DM.pdf}
```
## docs/HPCA2012/06-scalability.tex

Changes since r1393:

```diff
 of Neon SIMD operations.
-\begin{figure}[!h]
-\subfigure[ARM Neon Performance]{
-\includegraphics[width=0.3\textwidth]{plots/arm_TOT.pdf}
-\label{arm_processing_time}
-}
-\hfill
-\subfigure[ARM Neon]{
-\includegraphics[width=0.32\textwidth]{plots/Markup_density_Arm.pdf}
-\label{relative_performance_arm}
-}
-\hfill
-\subfigure[Core i3]{
-\includegraphics[width=0.32\textwidth]{plots/Markup_density_Intel.pdf}
-\label{relative_performance_intel}
-}
-\caption{Comparaing Parabix on ARM and Intel.}
-\end{figure}
+\begin{figure}
+\subfigure[ARM Neon Performance]{
+\includegraphics[width=0.3\textwidth]{plots/arm_TOT.pdf}
+\label{arm_processing_time}
+}
+\hfill
+\subfigure[ARM Neon]{
+\includegraphics[width=0.32\textwidth]{plots/Markup_density_Arm.pdf}
+\label{relative_performance_arm}
+}
+\hfill
+\subfigure[Core i3]{
+\includegraphics[width=0.32\textwidth]{plots/Markup_density_Intel.pdf}
+\label{relative_performance_intel}
+}
+\caption{Comparaing Parabix on ARM and Intel.}
+\end{figure}
```
## docs/HPCA2012/07-avx.tex

Changes since r1389:

```diff
 application didn't need to be modified.
-\begin{figure*}
-\begin{center}
-\includegraphics[height=0.25\textheight]{plots/InsMix.pdf}
-\end{center}
-\caption{Parabix Instruction Counts (y-axis: Instructions per kB)}
-\label{insmix}
-\end{figure*}
-\begin{figure}
-\begin{center}
-\includegraphics[width=0.5\textwidth]{plots/avx.pdf}
-\end{center}
-\caption{Parabix Performance (y-axis: ns per kB)}
-\label{avx}
-\end{figure}
 \paragraph{3-Operand Form}
 AVX.
+\begin{figure*}[!h]
+\begin{center}
+\includegraphics[height=0.25\textheight]{plots/InsMix.pdf}
+\end{center}
+\caption{Parabix Instruction Counts (y-axis: Instructions per kB)}
+\label{insmix}
+\end{figure*}
+\begin{figure}[!h]
+\begin{center}
+\includegraphics[width=0.5\textwidth]{plots/avx.pdf}
+\end{center}
+\caption{Parabix Performance (y-axis: ns per kB)}
+\label{avx}
+\end{figure}
 Note that, in each workload, the number of non-SIMD instructions remains relatively constant with each workload.
 As may be expected
```