# Changeset 2429 for docs/Working/icXML

Ignore:
Timestamp:
Oct 10, 2012, 3:59:17 PM (7 years ago)
Message:

some progress

Location:
docs/Working/icXML
Files:
4 edited

Unmodified
Removed
• ## docs/Working/icXML/background-parabix.tex

 r2301 \subsection{The Parabix Framework} \begin{figure*}[tbhp] \begin{center} \end{figure*} The Parabix (parallel bit stream) framework is a transformative approach to XML parsing (and other forms of text processing.) The key idea is to exploit the availability of wide (e.g., 128-bit) SIMD registers in commodity processors to represent data from long blocks of input data by using one register bit per single input byte. To facilitate this, the input data is first transposed into a set of basis bit streams and then boolean-logic operations\footnote{$\land$, $\lor$ and $\lnot$ denote the boolean AND, OR and NOT operators.} are used to classify the input bits into a set of character-class (and eventually lexical) bit streams. For example, in Figure~\ref{fig:BitStreamsExample}, we show how the ASCII string {\ttfamily b7\verb<A}'' is represented as 8 basis bit streams, $\tt b_{0 \ldots 7}$. The bits used to construct $\tt b_7$ have been highlighted in this example. \subsection{Parabix} The Parabix (parallel bit stream) framework is a transformative approach to XML and other forms of text processing based on a key idea to exploit the availability of wide (e.g., 128 bit) SIMD registers in commodity processors: to represent data from long blocks of input data (e.g., 128 bytes at a time) using one register bit per input byte.   Consider, for example, the XML source data  stream shown in the first line of Figure \ref{fig:parabix1}.  The remaining lines of this figure show several parallel bit streams that are computed in Parabix-style parsing, with each bit of each stream in one-to-one correspondence to the source character code units of the input stream.   In each stream, 1 bits are shown as themselves and 0 bits are represented as underscores, for clarity. \begin{figure}[h] \begin{center} \begin{tabular}{r c c c c } STRING & \ttfamily{b} & \ttfamily{7} & \ttfamily{\verb<} & \ttfamily{A} \\ ASCII & \ttfamily{\footnotesize 0110001{\bfseries 0}} & \ttfamily{\footnotesize 0011011{\bfseries 1}} & \ttfamily{\footnotesize 0011110{\bfseries 0}} & \ttfamily{\footnotesize 0100000{\bfseries 1}} \\ \hline \end{tabular} \end{center} \begin{center} \begin{tabular}{r |c |c |c |c |c |c |c |c |} & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{0}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{1}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{2}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{3}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{4}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{5}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{6}$}$ & $\mbox{\fontsize{11}{11}\selectfont$\tt b_{7}$}$ \\ & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \bfseries\ttfamily{0} \\ & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \bfseries\ttfamily{1} \\ & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \bfseries\ttfamily{0} \\ & \ttfamily{0} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \bfseries\ttfamily{1} \\ \end{tabular} \end{center} \caption{8-bit ASCII Basis Bit Streams} \label{fig:BitStreamsExample} \end{figure} Character-class bit streams allow us to perform 128 comparisons in parallel with a single operation by using a series of boolean-logic operations to merge multiple basis bit streams into a single character-class stream that marks the positions of key characters with a $1$. For example, one of the fundemental markers in XML is the one that identifies all left-angle brackets \verb<''. A character is an \verb<'' if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3 \land b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1$. Classes of characters can be found with similar formulas.  For example, a character is a number {\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$.  An important observation here is that a range of characters can sometimes take fewer operations and require fewer basis bit streams to compute than individual characters. Finding optimal solutions to all character-classes is non-trivial and goes beyond the scope of this paper. Using a mixture of boolean-logic and arithmetic operations, character-class bit streams can be transformed into lexical bit streams, where the presense of a 1 bit identifies a key position in the input data. As an artifact of this process, intra-element well-formedness validation is performed on each block of text. Consider, for example, the XML source data stream shown in the first line of Figure \ref{fig:parabix1}. The remaining lines of this figure show several parallel bit streams that are computed in Parabix-style parsing, with each bit of each stream in one-to-one correspondence to the source character code units of the input stream. For clarity, 1 bits are denoted with 1 in each stream and 0 bits are represented as underscores. The first bit stream shown is that for the opening angle brackets that represent tag openers in XML. attribute names and attribute values of tags. {\it Do we need to explain how those can be computed from the input text or do we simply refer them to prior papers?} Two intuitions may help explain how the Parabix approach can lead to improved XML parsing performance.   The first is that to improved XML parsing performance. The first is that the use of the full register width offers a considerable information advantage over sequential byte-at-a-time The second is that byte-at-a-time loop scanning loops are actually often just computing a single bit of information per iteration: is the scan complete at this position yet or not?  Rather than is the scan complete at this position yet?  Rather than computing these bits one at a time, an approach that computes many of them in parallel (e.g., 128 with SSE registers) should provide substantial benefit. Previous studies have shown the performance benefits of the Parabix approach inmany aspects of XML processing, including transcoding\cite{Cameron2008}, provide substantial {\bf benefit}. Previous studies have shown the performance {\bf benefits} of the Parabix approach in many aspects of XML processing, including transcoding\cite{Cameron2008}, character classification and validation, tag parsing and well-formedness checking.  The first Parabix  parser used processor bit scan instructions checking.  The first Parabix parser used processor bit scan instructions to considerably accelerate sequential scanning loops for individual characters \cite{CameronHerdyLin2008}.   Recent work has incorporated a method of parallel characters \cite{CameronHerdyLin2008}. Recent work has incorporated a method of parallel scanning using bitstream addition \cite{cameron-EuroPar2011}, as well as combining SIMD methods with 4-stage pipeline parallelism to further improve Although these research prototypes handle the full syntax of DTDless XML documents including well-formedness checking, they fall short of the functionality required in full XML parser for several reasons. Commercial XML processors include a number of additional facilities such DTD-less XML documents, including well-formedness checking, they fall short of the functionality required in full XML parser for several reasons. Namely, commercial XML processors, such as Xerces, include a number of additional facilities such as support for transcoding of multiple character sets, the ability to parse and validate against DTDs (document type definitions), both internal and external, the ability to parse and validate against DTDs, both internal and external, facilities for handling different XML vocabularies through namespace processing, as well validation against XML schema.  In addition, commercial parsers can be expected to provide a number of API facilities beyond those found in research prototypes, including full implementations of the widely used SAX and DOM interfaces. full implementations of the widely used SAX, SAX2 and DOM interfaces.
• ## docs/Working/icXML/background-xerces.tex

 r2300 XML parser produced as open-source software by the Apache Software Foundation.  It features comprehensive support for XML character encodings both commonplace and rarely used, support for comprehensive support for a variety of character encodings both commonplace (e.g., UTF-8, UTF-16) and rarely used (e.g., EBCDIC), support for multiple XML vocabularies through the XML namespace mechanism, as well as complete implementations of structure and data validation through grammars of structure and data validation through multiple grammars declared using either legacy DTDs (document type definitions) or modern XML schema facilities. Xerces also supports several APIs for accessing parser services, including event-based parsing using either pull parsing or SAX-style push parsing as well as tree-based parsing with a DOM-based interface. using either pull parsing or SAX push-style parsing as well as a DOM tree-based parsing interface. As a complex software system, there is no single Xerces feature that dominates in overall parsing performance. % What is the story behind the xerces-profile picture? should it contain one single file or several from our test suite? % Our test suite does not have any grammars in it; ergo, processing those files will give a poor indication of the cost of using grammars % Should we show a val-grind summary of a few files in a linechart form? Xerces, like all traditional parsers, process XML documents sequentially a byte-at-a-time from the first to the last byte of input data. Each byte passes through several processing layers and are classified and eventually validated within the context of the document state. This introduces implicit dependencies between the various tasks within the application that make it difficult to optimize for performance. As a complex software system, no one feature dominates the overall parsing performance. Figure \ref{fig:xerces-profile} shows the execution time profile of the top ten functions in a typical run.  Even if it were possible, tackling any single one of these in a typical run. Even if it were possible, tackling any single one of these functions for parallelization in isolation would only produce a small improvement in perfomance in accord with Amdahl's Law.  In order to obtain systematic acceleration of the Xerces parser, then, systematic acceleration of the Xerces parser, it should be expected that a comprehensive restructuring is required, involving all aspects of the parser. Figure \ref{fig:xerces-arch} shows the overall architecture of the Xerces C++ parser. In analyzing the structure of Xerces, it was found that there were a number of individual byte-at-a-time processing tasks. \begin{enumerate} \item Transcoding to UTF-16 \item Character validation. \item Line break normalization. \item Character classification. \item Line-column calculation. \item Escape insertion and replacement. \item Surrogate handling. \item Name processing. \item Markup parsing. \item Attribute checking. \item xmlns attribute processing. \item namespacing processing. \item Grammars, content model and data type validation. \end{enumerate} % Figure \ref{fig:xerces-arch} shows the % overall architecture of the Xerces C++ parser. % In analyzing the structure of Xerces, it was found that % there were a number of individual byte-at-a-time % processing tasks. % % \begin{enumerate} % \item Transcoding of source data to UTF-16 % \item Character validation. % \item Line break normalization. % \item Character classification. % \item Line-column calculation. % \item Escape insertion and replacement. % \item Surrogate handling. % \item Name processing. % \item Markup parsing. % \item Attribute validation. % %\item Attribute checking. % %\item xmlns attribute processing. % \item Namespace processing. % \item Grammars, content model and data type validation. % \end{enumerate}
• ## docs/Working/icXML/icxml-main.tex

 r2302 \end{abstract} \def \icXML {icXML} \category{CR-number}{subcategory}{third-level} studied problem that has seen the development of a number of interesting research prototypes. One possibility to data parallelizing the parsing process is by adding a pre-parsing step to get the skeleton that symbolized the tree structure of the XML document \cite{GRID2006}. One possibility to data parallelizing the parsing process is by adding a pre-parsing step to get the skeleton that symbolized the tree structure of the XML document \cite{GRID2006}. The pre-parsing stage can also be parallelized using state machines \cite{E-SCIENCE2007, IPDPS2008}. Methods without pre-parsing require speculation \cite{HPCC2011} or post-processing that combines the partial results \cite{ParaDOM2009}. A hybrid method that combines data parallelism and pipeline parallelism is proposed to hide the latency of the job'' that has to be done sequentially \cite{ICWS2008}. Intel introduced new string processing instructions in the SSE 4.2 instruction set extension and showed how it can be used to improve the performance of XML parsing \cite{XMLSSE42}. Parabix XML parser exploit the SIMD extensions to process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}. Parabix can also be combined with thread-level parallelism to achieve further improvement on multicore systems \cite{HPCA2012}. Methods without pre-parsing require speculation \cite{HPCC2011} or post-processing that combines the partial results \cite{ParaDOM2009}. A hybrid method that combines data parallelism and pipeline parallelism is proposed to hide the latency of the job'' that has to be done sequentially \cite{ICWS2008}. Intel introduced new string processing instructions in the SSE 4.2 instruction set extension and showed how it can be used to improve the performance of XML parsing \cite{XMLSSE42}. Parabix XML parser exploit the SIMD extensions to process hundreds of XML input characters simultaneously \cite{Cameron2009, cameron-EuroPar2011}. Parabix can also be combined with thread-level parallelism to achieve further improvement on multicore systems \cite{HPCA2012}. Paragraph 2: \section{Background} \input{background-xerces} \input{background-parabix} \input{background-xerces} \input{background-fundemental-differences.tex} \section{Architecture} \input{arch-overview.tex} \input{parfilter.tex} \input{arch-namespace.tex} \input{arch-errorhandling.tex} \begin{figure} \begin{center} \includegraphics[width=0.15\textwidth]{plots/xerces.pdf} \label{xerces_structure} \label{fig:xerces-arch} \caption{} \end{center} \begin{figure} \includegraphics[width=0.50\textwidth]{plots/icxml.pdf} \label{icxml_structure} \label{fig:icxml-arch} \caption{} \end{figure} - Philosophy:  Maximizing Bit Stream Processing - Character Set Adapters vs. Transcoding - Bitstreams 1: Charset Validation and Transcoding equations - Bitstreams 2: Parabix style parsing and validation - Bitstreams 3: Parallel filtering and normalization - LB normalization - reference compression -> single code unit speculation - parallel string termination - Bitstreams 4: Symbol processing - From bit streams to doublebyte streams: the content buffer - Namespace Processing: A Bitset approach. \section{Performance} ICXML vs. Original Xerces \icXML{} vs. Original Xerces -- SAXCount
• ## docs/Working/icXML/parfilter.tex

 r2289 Several stages of standard XML processing include compressions that filter out data from certain input byte positions in various ways.   These include the compression of multibyte in various ways. These include the compression of multibyte sequences in UTF-8 to UTF-16 transcoding, the compression of CR-LF sequences to single LF characters in line break code units by 4, although each code unit is  now a 16-bit quantity.  This compression is carried out by the UTF-8 to UTF-16 transcoder in Xerces.   Line 3 then shows the reduction of the two CR-LF sequences to simple LF values.  This is the line break normalization that takes place within the Reader??? transcoder in Xerces. Line 3 then shows the reduction of the two CR-LF sequences to simple LF values. % This is the line break normalization that takes place within the Reader??? Line 4 then shows the further compression by replacement of the character and general entity references by their corresponding deletion using parallel block-at-a-time operations, following for example, the parallel compress algorithm\cite{HackersDelight}. Figure \ref{Fig:compression2} shows three deletion masks corresponding Figure \ref{fig:compression2} shows three deletion masks corresponding to the three compressions of Figure \ref{fig:compression1}. In the fourth line of this figure, the bitwise-or of these
Note: See TracChangeset for help on using the changeset viewer.