# Changeset 954

Timestamp:
Mar 18, 2011, 1:30:24 PM (8 years ago)
Message:

Add more charts, modified abstract and some other minor changes

Location:
docs/PACT2011
Files:
1 deleted
5 edited

• ## docs/PACT2011/00-abstract.tex

 r953 (removed): XML is a data format designed for documents as well as the representation of data structures. The simplicity and generality of the rules make it widely used in web services and database systems. Traditional byte-at-a-time XML parsers have reached a bottleneck and cannot improve further to satisfy the growing demand for high-performance and energy-efficient XML parsing. We propose a new XML parser, Parabix, based on parallel bit stream technology, which enables parallel processing using SIMD registers. We evaluate and analyze the characteristics of our first- and second-version parsers, referred to below as Parabix1 and Parabix2, as well as two other popular XML parsers, Expat and Xerces, on three generations of x86 machines: Dual Core, Core i3 and Sandy Bridge. The results show that Parabix2 runs 2X to 8X faster than Expat and Xerces and performs much better in terms of data cache misses and branch mispredictions. Moreover, Parabix2 scales better on the three different architectures and achieves more performance improvement on newer ones. At the same level of power consumption as the other parsers we studied, Parabix2 consumes much less energy.

 r954 (added): XML is a data format designed for documents as well as the representation of data structures. The simplicity and generality of the rules make it widely used in web services and database systems. Traditional XML parsers have been built around the byte-at-a-time model, in which they process every character token in the file in a sequential fashion. Unfortunately, the byte-at-a-time sequential model is a fundamental hindrance to performance and in some cases can add up to 100\% overhead to the database queries themselves. In this paper, we propose a new XML parser, Parabix, based on parallel bit stream technology, which converts the character strings into bitstreams and then exploits SIMD operations prevalent on modern CPUs.
The first-generation parser that we developed, Parabix1, uses the bitscan and bit-level sequencing SIMD operations to emulate many of the parser's functions. Unfortunately, operations like bitscan are inherently sequential in nature, and Parabix1's speedup is limited. We present a second-generation parser, Parabix2, that fully parallelizes the parsing operations using parallel bit-level logic provided in modern SIMD extensions like SSE2. We evaluate Parabix1 and Parabix2 against two widely used XML parsers, Expat and Apache Xerces, on three generations of x86 machines, including the new Intel Sandy Bridge. We show that Parabix2's speedup is 2$\times$--8$\times$ over Expat and Xerces. Across the different Intel machine generations, Parabix rides the scalability curve of SIMD operations, whose performance inherently scales better than traditional sequential thread performance. Comparing Intel's new Sandy Bridge core with the Core i3, we observed performance improvements of 20--60\% for our Parabix parsers, while sequential parsers like Xerces improve by $<$20\%. We measure real CPU power to demonstrate that Parabix also delivers significant energy efficiency. On the Core i3, Parabix consumes $\simeq$4nJ per byte parsed while Xerces consumes $\simeq$20nJ per byte parsed. Finally, we perform a case study of Intel's new 256-bit wide AVX instructions, and demonstrate that they provide X speedup over the 128-bit SSE2 instruction set.
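
The idea of converting character strings into bitstreams, as described in the abstract, can be illustrated with a minimal sketch. This is not the Parabix implementation: the function names `transpose_to_bit_streams` and `match_byte` are hypothetical, Python integers stand in for SIMD registers, and the transposition here is done with plain loops purely to show the data layout. What it shows is that, once the input is held as eight parallel bit streams, every occurrence of a character such as `<` can be located with a handful of bitwise operations over the whole input rather than a per-byte comparison loop.

```python
def transpose_to_bit_streams(data):
    """Return 8 integers; bit i of streams[k] is bit k of data[i].

    k = 0 is the most significant bit of each byte. Illustrative only:
    a real implementation would do this step with SIMD instructions.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams, value, length):
    """Return an integer whose bit i is 1 iff data[i] == value.

    Uses only AND/NOT on whole bit streams, so all positions are
    tested at once (the Python int plays the role of a wide register).
    """
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        wanted = (value >> (7 - k)) & 1
        result &= streams[k] if wanted else (~streams[k] & mask)
    return result


data = b"<a><b/></a>"
streams = transpose_to_bit_streams(data)
lt = match_byte(streams, ord("<"), len(data))
positions = [i for i in range(len(data)) if (lt >> i) & 1]
print(positions)  # [0, 3, 7]
```

In this sketch the eight AND/NOT steps in `match_byte` cover every input position simultaneously; with 128-bit SSE2 registers the same logic would handle 128 positions per operation.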
• ## docs/PACT2011/03-research.tex

 r949 \section{Parabix1} \section{Parabix} \label{section:reserach} Describe key technology behind Parabix
• ## docs/PACT2011/04-methodology.tex

 r949
Describe parameters; what each parameter means.
\subsection{Platform Hardware}
\subsubsection{Intel Core 2}
\begin{table}[h]
\begin{center}
\begin{tabular}{|c||c|}
\hline Processor & Core(TM)2 6400  (2.13GHz) \\
\hline L1 Cache & 32KB I-Cache, 32KB D-Cache \\
\hline L2 Cache & 2MB \\
\hline Front Side Bus & 1333 MHz \\
\hline Memory  & 2GB \\
\hline
\subsubsection{Server - Intel Core i3}
\end{tabular}
\end{center}
\caption{Core 2}
\label{core2}
\end{table}
\subsubsection{Intel Core i3}
The Intel Core i3 is a Nehalem-based processor produced by Intel. It is intended to serve as a low-end server processor. Table \ref{i3} describes the hardware of the selected Intel Core i3-based machine.
\end{table}
\subsubsection{Server - Sandy Bridge}
\subsubsection{Sandy Bridge}
\subsection{PMC Hardware Events}\label{events}
• ## docs/PACT2011/05-performance.tex

 r953
\begin{figure}
\begin{center}
\includegraphics[width=85mm]{plots/corei3_INS.pdf}
\includegraphics[width=85mm]{plots/corei3_INS_p1.pdf}
\end{center}
\caption{Vector instruction vs. non-vector instruction on Core i3}
\label{corei3_INS}
\caption{Vector instruction vs. non-vector instruction for Parabix1 on Core i3}
\label{corei3_INS_p1}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=85mm]{plots/corei3_INS_p2.pdf}
\end{center}
\caption{Vector instruction vs. non-vector instruction for Parabix2 on Core i3}
\label{corei3_INS_p2}
\end{figure}