Changeset 1407 for docs


Timestamp:
Aug 31, 2011, 3:13:36 PM (8 years ago)
Author:
ashriram
Message:

Minor bug fixes

Location:
docs/HPCA2012
Files:
10 edited

  • docs/HPCA2012/01-intro.tex

    r1405 r1407  
    6868
    6969
     70
     71
     72\begin{figure}
     73\begin{center}
     74\includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
     75\end{center}
     76\caption{XML Parser Technology Energy vs. Performance}
     77\label{perf-energy}
     78\end{figure}
     79
     80
     81
    7082Figure~\ref{perf-energy} showcases the overall efficiency of our
    7183framework. The Parabix-XML parser improves the
    72 performance %by ?$\times$ 
     84performance %by ?$\times$
    7385and energy efficiency %by ?$\times$
    7486several-fold compared
     
    105117
    106118
    107 \begin{comment}
    108 Figure~\ref{perf-energy} is an energy-performance scatter plot showing
    109 the results obtained.
    110 
    111 
    112 With all this XML processing, a substantial literature has arisen
    113 addressing XML processing performance in general and the performance
    114 of XML parsers in particular.  Nicola and John specifically identified
    115 XML parsing as a threat to database performance and outlined a number
    116 of potential directions for potential performance improvements
    117 \cite{NicolaJohn03}.  The nature of XML APIs was found to have a
    118 significant affect on performance with event-based SAX (Simple API for
    119 XML) parsers avoiding the tree construction costs of the more flexible
    120 DOM (Document Object Model) parsers \cite{Perkins05}.  The commercial
    121 importance of XML parsing spurred developments of hardware-based
    122 approaches including the development of a custom XML chip
    123 \cite{Leventhal2009} as well as FPGA-based implementations
    124 \cite{DaiNiZhu2010}.  However promising these approaches may be for
    125 particular niche applications, it is likely that the bulk of the
    126 world's XML processing workload will be carried out on commodity
    127 processors using software-based solutions.
    128 
    129 To accelerate XML parsing performance in software, most recent
    130 work has focused on parallelization.  The use of multicore
    131 parallelism for chip multiprocessors has attracted
    132 the attention of several groups \cite{ZhangPanChiu09, ParaDOM2009, LiWangLiuLi2009},
    133 while SIMD (Single Instruction Multiple Data) parallelism
    134 has been of interest to Intel in designing new SIMD instructions\cite{XMLSSE42}
    135 , as well as to the developers of parallel bit stream technology
    136 \cite{CameronHerdyLin2008,Cameron2009,Cameron2010}.
    137 Each of these approaches has shown considerable performance
    138 benefits over traditional sequential parsing techniques that follow the
    139 byte-at-a-time model.
    140 \end{comment}
    141 
    142 
    143 
    144 \begin{figure}
    145 \begin{center}
    146 \includegraphics[width=85mm]{plots/performance_energy_chart.pdf}
    147 \end{center}
    148 \caption{XML Parser Technology Energy vs. Performance}
    149 \label{perf-energy}
    150 \end{figure}
    151 
    152119The remainder of this paper is organized as follows.
    153120Section~\ref{section:background} presents background material on XML
     
    163130Section~\ref{section:scalability} compares the performance and energy
    164131efficiency of 128-bit SIMD extensions across three generations of
    165 intel processors and includes a comparison with the ARM Cortex-A8
     132Intel processors and includes a comparison with the ARM Cortex-A8
    166133processor.  Section~\ref{section:avx} examines Intel's new 256-bit
    167134AVX technology and comments on the benefits and challenges compared to
     
    170137Parabix XML parser which seeks to exploit the SIMD units scattered
    171138across multiple cores.
     139
     140
    172141
    173142
  • docs/HPCA2012/03-research.tex

    r1398 r1407  
    6868point to determine other bit streams.  In particular, Parabix uses the
    6969basis bit streams to construct \emph{character-class bit streams} in
    70 which each $\tt 1$ bit indicates the presense of a significant
     70which each $\tt 1$ bit indicates the presence of a significant
    7171character (or class of characters) in the parsing process.
    7272Character-class bit streams may then be used to compute \emph{lexical
     
    138138Unlike the single-cursor approach of traditional text parsers, these allow Parabix to process multiple cursors in parallel.
    139139Error bit streams are often the byproduct or derivative of computing lexical bit streams and can be used to identify any well-formedness
    140 issues found during the parsing process. The presense of a $\tt 1$ in an error stream indicates that the lexical stream cannot be
     140issues found during the parsing process. The presence of a $\tt 1$ in an error stream indicates that the lexical stream cannot be
    141141trusted to be completely accurate and it may be necessary to perform some sequential parsing on that section to determine the cause and severity
    142142of the error. %How errors are handled depends on the logical implications of the error and go beyond the scope of this paper.
     
    347347sixteen 8-bit fields.
    348348
    349 These operations were originally developed for 128-bit Altivec operations on Power PC
    350 as well as 64-bit MMX and 128-bit SSE operations on Intel
    351 but have recently extended to support
    352 the new 256-bit AVX operations on Intel as well as the 128-bit
    353 \NEON{} operations on the ARM architecture.
    354 
     349We have ported Parabix to a wide variety of processor architectures,
     350demonstrating its applicability to commodity SIMD hardware. We
     351currently take advantage of the 128-bit Altivec operations on the
     352Power PC, 64-bit MMX and 128-bit SSE operations on previous-generation
     353Intel platforms, the latest 256-bit AVX extensions on the Sandy Bridge
     354processor, and finally the 128-bit \NEON{} operations on ARM.
     355
  • docs/HPCA2012/03b-research.tex

    r1396 r1407  
    2828(2) references, and
    2929(3) start tags, end tags, and empty tags as well as any related attributes.
    30 Afterwards, the information is gathered by the {\tt Name\_Validation} and
     30Afterward, the information is gathered by the {\tt Name\_Validation} and
    3131{\tt Err\_Check} functions, producing name check streams and error streams.
    3232Name check streams are weak error streams that verify each character used in a
  • docs/HPCA2012/04-methodology.tex

    r1399 r1407  
    3333entirely of single-byte encoded ASCII characters.
    3434
    35 \begin{table*}
     35\begin{table*}[!h]
    3636\begin{center}
    3737{
  • docs/HPCA2012/05-corei3.tex

    r1400 r1407  
    3131
    3232
    33 \begin{figure}
     33\begin{figure}[!h]
    3434\subfigure[L1 Misses]{
    3535\includegraphics[width=0.32\textwidth]{plots/corei3_L1DM.pdf}
  • docs/HPCA2012/06-scalability.tex

    r1393 r1407  
    100100of Neon SIMD operations.
    101101
     102\begin{figure}[!h]
     103\subfigure[ARM Neon Performance]{
     104\includegraphics[width=0.3\textwidth]{plots/arm_TOT.pdf}
     105\label{arm_processing_time}
     106}
     107\hfill
     108\subfigure[ARM Neon]{
     109\includegraphics[width=0.32\textwidth]{plots/Markup_density_Arm.pdf}
     110\label{relative_performance_arm}
     111}
     112\hfill
     113\subfigure[Core i3]{
     114\includegraphics[width=0.32\textwidth]{plots/Markup_density_Intel.pdf}
     115\label{relative_performance_intel}
     116}
     117\caption{Comparing Parabix on ARM and Intel.}
     118\end{figure}
     119
     120
    102121
    103122
     
    123142
    124143
    125 \begin{figure}
    126 \subfigure[ARM Neon Performance]{
    127 \includegraphics[width=0.3\textwidth]{plots/arm_TOT.pdf}
    128 \label{arm_processing_time}
    129 }
    130 \hfill
    131 \subfigure[ARM Neon]{
    132 \includegraphics[width=0.32\textwidth]{plots/Markup_density_Arm.pdf}
    133 \label{relative_performance_arm}
    134 }
    135 \hfill
    136 \subfigure[Core i3]{
    137 \includegraphics[width=0.32\textwidth]{plots/Markup_density_Intel.pdf}
    138 \label{relative_performance_intel}
    139 }
    140 \caption{Comparing Parabix on ARM and Intel.}
    141 \end{figure}
    142144
    143145
    144 
  • docs/HPCA2012/07-avx.tex

    r1389 r1407  
    1010application didn't need to be modified.
    1111
    12 \begin{figure*}
    13 \begin{center}
    14 \includegraphics[height=0.25\textheight]{plots/InsMix.pdf}
    15 \end{center}
    16 \caption{Parabix Instruction Counts (y-axis: Instructions per kB)}
    17 \label{insmix}
    18 \end{figure*}
    19 
    20 \begin{figure}
    21 \begin{center}
    22 \includegraphics[width=0.5\textwidth]{plots/avx.pdf}
    23 \end{center}
    24 \caption{Parabix Performance (y-axis: ns per kB)}
    25 \label{avx}
    26 \end{figure}
    2712
    2813\paragraph{3-Operand Form}
     
    7762AVX.
    7863
     64
     65\begin{figure*}[!h]
     66\begin{center}
     67\includegraphics[height=0.25\textheight]{plots/InsMix.pdf}
     68\end{center}
     69\caption{Parabix Instruction Counts (y-axis: Instructions per kB)}
     70\label{insmix}
     71\end{figure*}
     72
     73\begin{figure}[!h]
     74\begin{center}
     75\includegraphics[width=0.5\textwidth]{plots/avx.pdf}
     76\end{center}
     77\caption{Parabix Performance (y-axis: ns per kB)}
     78\label{avx}
     79\end{figure}
     80
    7981Note that, in each workload, the number of non-SIMD instructions
    8082remains relatively constant.  As may be expected
  • docs/HPCA2012/09-pipeline.tex

    r1390 r1407  
    2626but this requires introducing significant complexity in the overall
    2727logic of the program.
     28
     29\begin{table*}[!h]
     30{
     31\centering
     32\footnotesize
     33\begin{center}
     34\begin{tabular}{|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|}
     35\hline
     36        &      & & \multicolumn{10}{|c|}{Data Structures}\\ \hline
     37        &      &                & data\_buffer& basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & err\_streams\\ \hline
     38        & latency(C/B)     & size (B)       & 128         & 128         & 496  & 448   & 80    & 176    & 112    & 176    & 16         & 112           \\ \hline
     39Stage1  & 1.97 &read\_data      & write       &             &      &       &       &        &        &        &            &               \\
     40        &      &transposition   & read        & write       &      &       &       &        &        &        &            &               \\
     41        &      &classification  &             & read        &      & write &       &        &        &        &            &               \\ \hline
     42Stage2  & 1.22 &validate\_u8    &             & read        & write&       &       &        &        &        &            &               \\
     43        &      &gen\_scope      &             &             &      & read  & write &        &        &        &            &               \\
     44        &      &parse\_CtCDPI   &             &             &      & read  & read  & write  &        &        &            & write         \\
     45        &      &parse\_ref      &             &             &      & read  & read  & read   & write  &        &            &               \\ \hline
     46Stage3  & 2.03 &parse\_tag      &             &             &      & read  & read  & read   &        & write  &            &               \\
     47        &      &validate\_name  &             &             & read & read  &       & read   & read   & read   & write      & write         \\
     48        &      &gen\_check      &             &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
     49Stage4  & 1.32 &postprocessing  & read        &             &      & read  &       & read   & read   &        &            & read          \\ \hline
     50\end{tabular}
     51\end{center}
     52\caption{Relationship between Each Pass and Data Structures}
     53\label{pass_structure}
     54}
     55\end{table*}
    2856
    2957
     
    5684cause the stage to stall implicitly.
    5785
    58 
    59 \begin{table*}[t]
    60 {
    61 \centering
    62 \footnotesize
    63 \begin{center}
    64 \begin{tabular}{|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|@{~}c@{~}|}
    65 \hline
    66         &      & & \multicolumn{10}{|c|}{Data Structures}\\ \hline
    67         &      &                & data\_buffer& basis\_bits & u8   & lex   & scope & ctCDPI & ref    & tag    & xml\_names & err\_streams\\ \hline
    68         & latency(C/B)     & size (B)       & 128         & 128         & 496  & 448   & 80    & 176    & 112    & 176    & 16         & 112           \\ \hline
    69 Stage1  & 1.97 &read\_data      & write       &             &      &       &       &        &        &        &            &               \\
    70         &      &transposition   & read        & write       &      &       &       &        &        &        &            &               \\
    71         &      &classification  &             & read        &      & write &       &        &        &        &            &               \\ \hline
    72 Stage2  & 1.22 &validate\_u8    &             & read        & write&       &       &        &        &        &            &               \\
    73         &      &gen\_scope      &             &             &      & read  & write &        &        &        &            &               \\
    74         &      &parse\_CtCDPI   &             &             &      & read  & read  & write  &        &        &            & write         \\
    75         &      &parse\_ref      &             &             &      & read  & read  & read   & write  &        &            &               \\ \hline
    76 Stage3  & 2.03 &parse\_tag      &             &             &      & read  & read  & read   &        & write  &            &               \\
    77         &      &validate\_name  &             &             & read & read  &       & read   & read   & read   & write      & write         \\
    78         &      &gen\_check      &             &             & read & read  & read  & read   &        & read   & read       & write         \\ \hline
    79 Stage4  & 1.32 &postprocessing  & read        &             &      & read  &       & read   & read   &        &            & read          \\ \hline
    80 \end{tabular}
    81 \end{center}
    82 \caption{Relationship between Each Pass and Data Structures}
    83 \label{pass_structure}
    84 }
    85 \end{table*}
    8686
    8787
  • docs/HPCA2012/10-related.tex

    r1394 r1407  
    3030enable bit streams to exploit SIMD extensions found on commodity
    3131processors.  We are also the first to perform a detailed analysis of
    32 SIMD instruction extensions across three generations of intel
     32SIMD instruction extensions across three generations of Intel
    3333processors including the new 256-bit AVX extensions. Finally, we have
    3434shown the benefits of using multithreading in conjunction with data