Changeset 2429 for docs/Working/icXML


Ignore:
Timestamp:
Oct 10, 2012, 3:59:17 PM (7 years ago)
Author:
nmedfort
Message:

some progress

Location:
docs/Working/icXML
Files:
6 added
4 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icXML/background-parabix.tex

    r2301 r2429  
     1\subsection{The Parabix Framework}
    12\begin{figure*}[tbhp]
    23\begin{center}
     
    1516\end{figure*}
    1617
     18The Parabix (parallel bit stream) framework is a transformative approach to XML parsing
     19(and other forms of text processing.) The key idea is to exploit the availability of wide
     20(e.g., 128-bit) SIMD registers in commodity processors to represent data from long blocks
     21of input data by using one register bit per single input byte.
     22To facilitate this, the input data is first transposed into a set of basis bit streams
     23and then boolean-logic operations\footnote{$\land$, $\lor$ and $\lnot$ denote the boolean
     24AND, OR and NOT operators.} are used to classify the input bits into a set of character-class
     25(and eventually lexical) bit streams.
     26For example, in Figure~\ref{fig:BitStreamsExample}, we show how the ASCII string ``{\ttfamily
     27  b7\verb`<`A}'' is represented as 8 basis bit streams, $\tt b_{0
     28  \ldots 7}$. The bits used to construct $\tt b_7$ have been
     29highlighted in this example.
    1730
    18 \subsection{Parabix}
    19 The Parabix (parallel bit stream) framework
    20 is a transformative approach to XML and other
    21 forms of text processing based on a key idea to
    22 exploit the availability of wide (e.g., 128 bit)
    23 SIMD registers in commodity processors: 
    24 to represent data from long blocks of input data
    25 (e.g., 128 bytes at a time) using one register bit per input
    26 byte.   Consider, for example, the XML source data  stream shown in the
    27 first line of Figure \ref{fig:parabix1}.  The remaining lines
    28 of this figure show several parallel bit streams that are
    29 computed in Parabix-style parsing, with each bit of each stream
    30 in one-to-one correspondence to the source character code units
    31 of the input stream.   In each stream, 1 bits are shown as themselves
    32 and 0 bits are represented as underscores, for clarity.
     31\begin{figure}[h]
     32\begin{center}
     33\begin{tabular}{r c c c c }
     34STRING & \ttfamily{b} & \ttfamily{7} & \ttfamily{\verb`<`} & \ttfamily{A} \\
     35ASCII & \ttfamily{\footnotesize 0110001{\bfseries 0}} & \ttfamily{\footnotesize 0011011{\bfseries 1}} & \ttfamily{\footnotesize 0011110{\bfseries 0}} & \ttfamily{\footnotesize 0100000{\bfseries 1}} \\
     36\hline
     37\end{tabular}
     38\end{center}
     39\begin{center}
     40\begin{tabular}{r |c |c |c |c |c |c |c |c |}
     41 & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{0}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{1}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{2}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{3}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{4}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{5}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{6}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{7}$}$ \\
     42 & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \bfseries\ttfamily{0} \\
     43 & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \bfseries\ttfamily{1} \\
     44 & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \bfseries\ttfamily{0} \\
     45 & \ttfamily{0} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \bfseries\ttfamily{1} \\
     46\end{tabular}
     47\end{center}
     48
     49\caption{8-bit ASCII Basis Bit Streams}
     50\label{fig:BitStreamsExample}
     51\end{figure}
     52
     53Character-class bit streams allow us to perform 128
     54comparisons in parallel with a single operation by using a series
     55of boolean-logic operations to merge multiple
     56basis bit streams into a single character-class stream that marks the
     57positions of key characters with a $1$. 
     58For example, one of the fundemental markers in XML is the one that
     59identifies all left-angle brackets ``\verb`<`''. A character is
     60an ``\verb`<`'' if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land
     61b_3 \land b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1$.
     62Classes of characters can be found with similar formulas.  For example, a
     63character is a number {\tt [0-9]} if and only if $\lnot(b_0 \lor b_1)
     64\land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$.  An
     65important observation here is that a range of characters can sometimes
     66take fewer operations and require fewer basis bit streams to compute
     67than individual characters. Finding optimal solutions to all
     68character-classes is non-trivial and goes beyond the scope of this
     69paper.
     70
     71
     72Using a mixture of boolean-logic and arithmetic operations, character-class
     73bit streams can be transformed into lexical bit streams, where the presense of
     74a 1 bit identifies a key position in the input data. As an artifact of this
     75process, intra-element well-formedness validation is performed on each block
     76of text.
     77
     78Consider, for example, the XML source data stream shown in the first line of Figure \ref{fig:parabix1}.
     79The remaining lines of this figure show several parallel bit streams that are computed in Parabix-style
     80parsing, with each bit of each stream in one-to-one correspondence to the source character code units
     81of the input stream.
     82For clarity, 1 bits are denoted with 1 in each stream and 0 bits are represented as underscores.
    3383The first bit stream shown is that for the opening
    3484angle brackets that represent tag openers in XML.
     
    4191attribute names and attribute values of tags. 
    4292
     93{\it Do we need to explain how those can be computed from the input text or do we simply refer them to prior papers?}
     94
    4395Two intuitions may help explain how the Parabix approach can lead
    44 to improved XML parsing performance.   The first is that
     96to improved XML parsing performance. The first is that
    4597the use of the full register width offers a considerable
    4698information advantage over sequential byte-at-a-time
     
    50102The second is that byte-at-a-time loop scanning loops are actually
    51103often just computing a single bit of information per iteration:
    52 is the scan complete at this position yet or not?  Rather than
     104is the scan complete at this position yet?  Rather than
    53105computing these bits one at a time, an approach that computes
    54106many of them in parallel (e.g., 128 with SSE registers) should
    55 provide substantial benefit.
    56 
    57 Previous studies have shown the performance benefits of the
    58 Parabix approach inmany aspects of XML processing, including transcoding\cite{Cameron2008},
     107provide substantial {\bf benefit}.
     108Previous studies have shown the performance {\bf benefits} of the
     109Parabix approach in many aspects of XML processing, including transcoding\cite{Cameron2008},
    59110character classification and validation, tag parsing and well-formedness
    60 checking.  The first Parabix  parser used processor bit scan instructions
     111checking.  The first Parabix parser used processor bit scan instructions
    61112to considerably accelerate sequential scanning loops for individual
    62 characters \cite{CameronHerdyLin2008}.   Recent work has incorporated a method of parallel
     113characters \cite{CameronHerdyLin2008}.
     114Recent work has incorporated a method of parallel
    63115scanning using bitstream addition \cite{cameron-EuroPar2011}, as
    64116well as combining SIMD methods with 4-stage pipeline parallelism to further improve
     
    66118
    67119Although these research prototypes handle the full syntax of
    68 DTDless XML documents including well-formedness checking, they fall
    69 short of the functionality required in full XML parser for several reasons.
    70 Commercial XML processors include a number of additional facilities such
     120DTD-less XML documents, including well-formedness checking, they fall
     121short of the functionality required in full XML parser for several reasons. Namely,
     122commercial XML processors, such as Xerces, include a number of additional facilities such
    71123as support for transcoding of multiple character sets,
    72 the ability to parse and validate against DTDs (document type definitions),
    73 both internal and external,
     124the ability to parse and validate against DTDs, both internal and external,
    74125facilities for handling different XML vocabularies through namespace
    75126processing, as well validation against XML schema.  In addition,
    76127commercial parsers can be expected to provide a number of API
    77128facilities beyond those found in research prototypes, including
    78 full implementations of the widely used SAX and DOM interfaces.
     129full implementations of the widely used SAX, SAX2 and DOM interfaces.
    79130
  • docs/Working/icXML/background-xerces.tex

    r2300 r2429  
    44XML parser produced as open-source software
    55by the Apache Software Foundation.  It features
    6 comprehensive support for XML character encodings
    7 both commonplace and rarely used, support for
     6comprehensive support for a variety of character encodings
     7both commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for
    88multiple XML vocabularies through the XML namespace
    99mechanism, as well as complete implementations
    10 of structure and data validation through grammars
     10of structure and data validation through multiple grammars
    1111declared using either legacy DTDs (document type
    1212definitions) or modern XML schema facilities.
    1313Xerces also supports several APIs for accessing
    1414parser services, including event-based parsing
    15 using either pull parsing or SAX-style push
    16 parsing as well as tree-based parsing with
    17 a DOM-based interface.
     15using either pull parsing or SAX push-style
     16parsing as well as a DOM tree-based parsing interface.
    1817
    19 As a complex software system, there is no single Xerces
    20 feature that dominates in overall parsing performance.
     18% What is the story behind the xerces-profile picture? should it contain one single file or several from our test suite?
     19% Our test suite does not have any grammars in it; ergo, processing those files will give a poor indication of the cost of using grammars
     20% Should we show a val-grind summary of a few files in a linechart form?
     21
     22Xerces, like all traditional parsers, process XML documents sequentially a byte-at-a-time from the
     23first to the last byte of input data. Each byte passes through several processing layers and are
     24classified and eventually validated within the context of the document state.
     25This introduces implicit dependencies between the various tasks within the application that make it
     26difficult to optimize for performance.
     27As a complex software system, no one feature dominates the overall parsing performance.
    2128Figure \ref{fig:xerces-profile} shows the
    2229execution time profile of the top ten functions
    23 in a typical run.  Even if it were possible,
    24 tackling any single one of these
     30in a typical run.
     31Even if it were possible, tackling any single one of these
    2532functions for parallelization in isolation would
    2633only produce a small improvement in perfomance
    2734in accord with Amdahl's Law.  In order to obtain
    28 systematic acceleration of the Xerces parser, then,
     35systematic acceleration of the Xerces parser,
    2936it should be expected that a comprehensive restructuring
    3037is required, involving all aspects of the parser.
    3138
    32 Figure \ref{fig:xerces-arch} shows the
    33 overall architecture of the Xerces C++ parser.
    34 In analyzing the structure of Xerces, it was found that
    35 there were a number of individual byte-at-a-time
    36 processing tasks.
    37 \begin{enumerate}
    38 \item Transcoding to UTF-16
    39 \item Character validation.
    40 \item Line break normalization.
    41 \item Character classification.
    42 \item Line-column calculation.
    43 \item Escape insertion and replacement.
    44 \item Surrogate handling.
    45 \item Name processing.
    46 \item Markup parsing.
    47 \item Attribute checking.
    48 \item xmlns attribute processing.
    49 \item namespacing processing.
    50 \item Grammars, content model and data type validation.
    51 \end{enumerate}
     39
     40% Figure \ref{fig:xerces-arch} shows the
     41% overall architecture of the Xerces C++ parser.
     42% In analyzing the structure of Xerces, it was found that
     43% there were a number of individual byte-at-a-time
     44% processing tasks.
     45%
     46% \begin{enumerate}
     47% \item Transcoding of source data to UTF-16
     48% \item Character validation.
     49% \item Line break normalization.
     50% \item Character classification.
     51% \item Line-column calculation.
     52% \item Escape insertion and replacement.
     53% \item Surrogate handling.
     54% \item Name processing.
     55% \item Markup parsing.
     56% \item Attribute validation.
     57% %\item Attribute checking.
     58% %\item xmlns attribute processing.
     59% \item Namespace processing.
     60% \item Grammars, content model and data type validation.
     61% \end{enumerate}
    5262
    5363
    54 
  • docs/Working/icXML/icxml-main.tex

    r2302 r2429  
    5656\end{abstract}
    5757
     58\def \icXML {icXML}
     59
    5860\category{CR-number}{subcategory}{third-level}
    5961
     
    7173studied problem that has seen the development of a number
    7274of interesting research prototypes.
    73 One possibility to data parallelizing the parsing process is by adding a pre-parsing step to get the skeleton that symbolized the tree structure of the XML document \cite{GRID2006}.
     75One possibility to data parallelizing the parsing process is by adding a
     76pre-parsing step to get the skeleton that symbolized the tree structure of the XML document \cite{GRID2006}.
    7477The pre-parsing stage can also be parallelized using state machines \cite{E-SCIENCE2007, IPDPS2008}.
    75 Methods without pre-parsing require speculation \cite{HPCC2011} or post-processing that combines the partial results \cite{ParaDOM2009}.
    76 A hybrid method that combines data parallelism and pipeline parallelism is proposed to hide the latency of the ``job'' that has to be done sequentially \cite{ICWS2008}.
    77 Intel introduced new string processing instructions in the SSE 4.2 instruction set extension and showed how it can be used to improve the performance of XML parsing \cite{XMLSSE42}.
    78 Parabix XML parser exploit the SIMD extensions to process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
    79 Parabix can also be combined with thread-level parallelism to achieve further improvement on multicore systems \cite{HPCA2012}.
     78Methods without pre-parsing require speculation \cite{HPCC2011} or post-processing that
     79combines the partial results \cite{ParaDOM2009}.
     80A hybrid method that combines data parallelism and pipeline parallelism is proposed to
     81hide the latency of the ``job'' that has to be done sequentially \cite{ICWS2008}.
     82Intel introduced new string processing instructions in the SSE 4.2 instruction set extension
     83and showed how it can be used to improve the performance of XML parsing \cite{XMLSSE42}.
     84Parabix XML parser exploit the SIMD extensions to process hundreds of XML input characters
     85simultaneously \cite{Cameron2009, cameron-EuroPar2011}.
     86Parabix can also be combined with thread-level parallelism to achieve further improvement
     87on multicore systems \cite{HPCA2012}.
    8088
    8189Paragraph 2:
     
    109117\section{Background}
    110118
     119\input{background-xerces}
    111120\input{background-parabix}
    112 \input{background-xerces}
     121\input{background-fundemental-differences.tex}
    113122
    114123\section{Architecture}
    115124
     125\input{arch-overview.tex}
     126
    116127\input{parfilter.tex}
    117128
     129\input{arch-namespace.tex}
    118130
    119 
     131\input{arch-errorhandling.tex}
    120132
    121133\begin{figure}
    122134\begin{center}
    123135\includegraphics[width=0.15\textwidth]{plots/xerces.pdf}
    124 \label{xerces_structure}
     136\label{fig:xerces-arch}
    125137\caption{}
    126138\end{center}
     
    129141\begin{figure}
    130142\includegraphics[width=0.50\textwidth]{plots/icxml.pdf}
    131 \label{icxml_structure}
     143\label{fig:icxml-arch}
    132144\caption{}
    133145\end{figure}
    134   - Philosophy:  Maximizing Bit Stream Processing
    135 
    136   - Character Set Adapters vs. Transcoding
    137   - Bitstreams 1: Charset Validation and Transcoding equations
    138   - Bitstreams 2: Parabix style parsing and validation
    139 
    140   - Bitstreams 3: Parallel filtering and normalization
    141           - LB normalization
    142           - reference compression -> single code unit speculation
    143           - parallel string termination
    144 
    145   - Bitstreams 4: Symbol processing
    146 
    147   - From bit streams to doublebyte streams: the content buffer
    148      
    149   - Namespace Processing: A Bitset approach.
    150146
    151147\section{Performance}
    152148
    153 ICXML vs. Original Xerces
     149\icXML{} vs. Original Xerces
    154150
    155151 -- SAXCount
  • docs/Working/icXML/parfilter.tex

    r2289 r2429  
    44Several stages of standard XML processing include compressions
    55that filter out data from certain input byte positions
    6 in various ways.   These include the compression of multibyte
     6in various ways. These include the compression of multibyte
    77sequences in UTF-8 to UTF-16 transcoding, the compression
    88of CR-LF sequences to single LF characters in line break
     
    5050code units by 4, although each code unit is  now a 16-bit
    5151quantity.  This compression is carried out by the UTF-8 to UTF-16
    52 transcoder in Xerces.   Line 3 then shows the reduction of the
    53 two CR-LF sequences to simple LF values.  This is the line break
    54 normalization that takes place within the Reader???
     52transcoder in Xerces. Line 3 then shows the reduction of the
     53two CR-LF sequences to simple LF values. 
     54% This is the line break normalization that takes place within the Reader???
    5555Line 4 then shows the further compression by replacement of
    5656the character and general entity references by their corresponding
     
    6363deletion using parallel block-at-a-time operations, following
    6464for example, the parallel compress algorithm\cite{HackersDelight}.
    65 Figure \ref{Fig:compression2} shows three deletion masks corresponding
     65Figure \ref{fig:compression2} shows three deletion masks corresponding
    6666to the three compressions of Figure \ref{fig:compression1}.
    6767In the fourth line of this figure, the bitwise-or of these
Note: See TracChangeset for help on using the changeset viewer.