Changeset 2522 for docs/Working


Ignore:
Timestamp:
Oct 20, 2012, 12:30:32 PM (7 years ago)
Author:
nmedfort
Message:

edits

Location:
docs/Working/icXML
Files:
10 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icXML/arch-charactersetadapters.tex

    r2520 r2522  
    1111
    1212In \icXML{}, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs.
    13 Given a specified input encoding, a Character Set Adapter is responsible for checking that
    14 input code units represent valid characters, mapping the characters of the incoding into
     13Given a specified input encoding, a CSA is responsible for checking that
     14input code units represent valid characters, mapping the characters of the encoding into
    1515the appropriate bit streams for XML parsing actions (i.e., producing the lexical item
    1616streams), as well as supporting ultimate transcoding requirements.   All of this work
    1717is performed using the parallel bit stream representation of the source input.
    1818
    19 An important observation is that many character sets are some form of
     19An important observation is that many character sets are an
    2020extension to the legacy 7-bit ASCII character set.  This includes the
    21 various ISO Latin character sets, UTF-8 and UTF-16 as well as many others.
     21various ISO Latin character sets, UTF-8, UTF-16 and many others.
    2222Furthermore, all significant characters for parsing XML are confined to the
    2323ASCII repertoire.   Thus, a single common set of lexical item calculations
    2424serves to compute lexical item streams for all such ASCII-based character sets.
    2525
    26 A second observation is that, regardless of which character set is used, it is
    27 often the case that all of the characters in a particular block of input
    28 happen to be within the ASCII repertoire.   This is a very simple test to
    29 perform using the bit stream representation, simply confirming that the
     26A second observation is that---regardless of which character set is used---quite
     27often all of the characters in a particular block of input will be within the ASCII range.
     28This is a very simple test to perform using the bit stream representation, simply confirming that the
    3029bit 0 stream is zero for the entire block.   For blocks satisfying this test,
    3130all logic dealing with non-ASCII characters can simply be skipped.
    32 Transcoding to UTF-16 becomes trivial: the high eight bit streams of the
     31Transcoding to UTF-16 becomes trivial as the high eight bit streams of the
    3332UTF-16 form are each set to zero in this case.
    3433
  • docs/Working/icXML/arch-namespace.tex

    r2505 r2522  
    27276. & \verb|</book>| \\
    2828\end{tabular}
     29\caption{XML Namespace Example}
    2930\label {fig:namespace1}
    30 \caption{XML Namespace Example}
    3131\end{figure}
    3232
     
    6464\end{tabular}
    6565\caption{Namespace Binding Table Example}
     66\end{center}
    6667\label{tbl:namespace1}
    67 \end{center}
    6868\end{table}
    6969
  • docs/Working/icXML/arch-overview.tex

    r2516 r2522  
    1919the unlikely event that the input contains an error), performs all line-break normalization
    2020and validates context-specific character set issues, such as tokenization of qualified-names and
    21 ensures each character is legal w.r.t. the XML specification.
     21ensures each character is legal \wrt{} the XML specification.
    2222The {\it Scanner} pulls data through the reader and constructs the intermediate representation (IR)
    2323of the document; it deals with all issues related to entity expansion, validates
     
    2525be completely handled by the reader or transcoder (e.g., surrogate characters, validation
    2626and normalization of character references, etc.)
    27 The {\it Namespace Binder}, which is a core piece of their element stack, is primarily tasked
     27The {\it Namespace Binder}, which is a core piece of the element stack, is primarily tasked
    2828with handling namespace scoping issues between different XML vocabularies and faciliates
    2929the scanner with the construction and utilization of Schema grammar structures.
     
    5353Like Xerces, \icXML{} must provide the Line and Column position of each error.
    5454The {\it Line-Column Tracker} uses the lexical information to keep track of the cursor position(s) through the use of an
    55 optimized population count algorithm; this is described in Section \ref{section:arch:errorhandling}.
     55optimized population count algorithm, described in Section \ref{section:arch:errorhandling}.
    5656From here, two data-independent branches exist: the Symbol Pesolver and Content Preperation Unit.
    5757
     
    6969filtered streams into the tagged UTF-16 {\it content stream}.
    7070This is discussed in Section \ref{sec:parfilter}.
    71 Combined, the symbol stream and content stream form \icXML{}'s compressed IR of the XML document.
    7271
    73 % This component has a multitude of
    74 % responsibilities, which will be discussed in Section \ref{sec:parfilter}, but its primary function is to produce
    75 % near-final UTF-16 content.
    76 
    77 The {\it \MP{}} parses the IR to validate and produces the sequential output for the end user.
    78 The {\it WF checker} performs inter-element wellformedness validation that would be too costly
     72Combined, the symbol and content stream form \icXML{}'s compressed IR of the XML document.
     73The {\it \MP{}}~parses the IR to validate and produce the sequential output for the end user.
     74The {\it Final WF checker} performs inter-element wellformedness validation that would be too costly
    7975to perform in bitspace, such as ensuring every start tag has a matching end tag.
    8076Xerces's namespace binding functionality is replaced by the {\it Namespace Processor}. Unlike Xerces,
    81 it's performed as a discrete phase, which produces a set of URI identifiers (URI IDs) that is
    82 associated with symbol occurrence.
     77it is a discrete phase that produces a series of URI identifiers (URI IDs), the {\it URI stream}, which are
     78associated with each symbol occurrence.
    8379This is discussed in Section \ref{section:arch:namespacehandling}.
    84 The final {\it Validation} process is responsible for the same tasks as Xerces's validator; however
     80Finally, the {\it Validation} layer mimics the Xerces's validator; however
    8581the majority of the grammar look-ups are performed beforehand and stored within the symbol themselves.
    8682
  • docs/Working/icXML/background-fundemental-differences.tex

    r2490 r2522  
    2525In adapting to the requirements of the Xerces sequential parsing API,
    2626however, the resultant parallel
    27 bit streams, taken individually, may out-of-order with respect to the source
    28 document.  They hence must be amalgamated and iterated through to produce
     27bit streams, taken individually, may out-of-order \wrt{} the source
     28document.  Hence they must be amalgamated and iterated through to produce
    2929sequential output.
    3030% The end user should not be expected to work with out-of-order data ...
  • docs/Working/icXML/background-parabix.tex

    r2516 r2522  
    44The Parabix (parallel bit stream) framework is a transformative approach to XML parsing
    55(and other forms of text processing.) The key idea is to exploit the availability of wide
    6 (e.g., 128-bit) SIMD registers in commodity processors to represent data from long blocks
     6SIMD registers (e.g., 128-bit) in commodity processors to represent data from long blocks
    77of input data by using one register bit per single input byte.
    88To facilitate this, the input data is first transposed into a set of basis bit streams.
     
    1818Similarly, a character is numeric
    1919{\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$.
    20 % An important observation here is that a range of characters can sometimes
    21 % take fewer operations and require fewer basis bit streams to compute
    22 % than individual characters. Finding optimal solutions to all
    23 % character-classes is non-trivial and goes beyond the scope of this
    24 % paper.
     20An important observation here is that ranges of characters may
     21require fewer operations than individual characters and multiple
     22classes can sometimes share the classification cost.
    2523
    2624\begin{figure}[tbh]
     
    113111well as combining SIMD methods with 4-stage pipeline parallelism to further improve
    114112throughput \cite{HPCA2012}.
    115 Although these research prototypes handle the full syntax of
    116 DTD-less XML documents, they lacked the functionality required by full XML parsers.
     113Although these research prototypes handled the full syntax of
     114{\bf grammarless} XML documents, they lacked the functionality required by full XML parsers.
    117115Namely, commercial XML processors, such as Xerces,
    118116as support for transcoding of multiple character sets,
    119 the ability to parse and validate against DTDs, both internal and external,
     117the ability to parse and validate against DTD grammars, both internal and external,
    120118facilities for handling different XML vocabularies through namespace
    121119processing, as well validation against XML Schema grammars. 
  • docs/Working/icXML/background-xerces.tex

    r2490 r2522  
    2929difficult to optimize for performance.
    3030As a complex software system, no one feature dominates the overall parsing performance.
    31 Figure \ref{fig:xerces-profile} shows the
    32 execution time profile of the top ten functions
    33 in a typical run.
    34 Even if it were possible, tackling any single one of these
    35 functions for parallelization in isolation would
    36 only produce a small improvement in perfomance
    37 in accord with Amdahl's Law.  In order to obtain
    38 systematic acceleration of the Xerces parser,
     31Figure \ref{fig:xerces-profile} shows the execution time profile of the top ten functions in a typical run.
     32Even if it were possible, Amdahl's Law dictates that tackling any one of these functions for
     33parallelization in isolation would only produce a minute improvement in perfomance.
     34Unfortunetly, early investigation into these functions found they were already performing well in their given tasks
     35and only trivial enhancements were possible.
     36In order to obtain a systematic acceleration of Xerces,
    3937it should be expected that a comprehensive restructuring
    4038is required, involving all aspects of the parser.
     39
     40% In order to obtain systematic acceleration of the Xerces parser,
     41% it should be expected that a comprehensive restructuring
     42% is required, involving all aspects of the parser.
     43
     44\begin{figure}[h]
     45\begin{tabular}{r|l}
     46Time (\%) & Function Name \\
     47\hline
     4813.29   &       XMLUTF8Transcoder::transcodeFrom \\
     497.45    &       IGXMLScanner::scanCharData \\
     506.83    &       memcpy \\
     515.83    &       XMLReader::getNCName \\
     524.67    &       IGXMLScanner::buildAttList \\
     534.54    &       RefHashTableOf\verb|<>|::findBucketElem \\
     544.20    &       IGXMLScanner::scanStartTagNS \\
     553.75    &       ElemStack::mapPrefixToURI \\
     563.58    &       ReaderMgr::getNextChar \\
     573.20    &       IGXMLScanner::basicAttrValueScan \\
     58\end{tabular}
     59\caption{Execution Time of Top 10 Xerces Functions}
     60\label {fig:xerces-profile}
     61\end{figure}
     62
    4163
    4264
  • docs/Working/icXML/conclusion.tex

    r2512 r2522  
    1 \section{Conclusion and Future Work}
    2 
    31This paper presented the \icXML{} parser and discussed the key architectural differences between it and Xerces.
    42\icXML{} was architected to utilize SIMD parallelism, facilitated by the Parabix framework. \icXMLp{} extended on
     
    119Although only a two-thread version was explored, more is possible---but the value of using more is dependent on
    1210the application utilizing the \icXML{} library.
    13 As the application becomes more complex there are diminishing returns w.r.t. additional thread-level parallelism.
     11As the application becomes more complex there are diminishing returns \wrt{} additional thread-level parallelism.
    1412A more interesting use of additional threads could be in the inclusion of an XPath and XQuery modules that could
    1513eliminate unneeded data prior to the \MP{} stage.
  • docs/Working/icXML/icxml-main.tex

    r2516 r2522  
    4646\def \PS {Parabix Subsystem}
    4747\def \MP {Markup Processor}
     48\def \wrt {with respect to}
    4849
    4950\title{\icXML{}:  Accelerating a Commercial XML Parser Using SIMD and Multicore Technologies}
     
    8586Methods without pre-parsing have used speculation \cite{HPCC2011} or post-processing that
    8687combines the partial results \cite{ParaDOM2009}.
    87 A hybrid method that combines data parallelism and pipeline parallelism is proposed to
    88 hide the latency of the ``job'' that has to be done sequentially \cite{ICWS2008}.
     88A hybrid technique that combines data and pipeline parallelism was proposed to
     89hide the latency of a ``job'' that has to be done sequentially \cite{ICWS2008}.
    8990
    9091Fewer efforts have investigated SIMD parallelism, although this approach
    9192has the potential advantage of improving single core performance as well
    92 as offering savings in energy consumption.
     93as offering savings in energy consumption \cite{HPCA2012}.
    9394Intel introduced specialized SIMD string processing instructions in the SSE 4.2 instruction set extension
    9495and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}.
     
    106107preserve the existing APIs as well as offering worthwhile
    107108end-to-end acceleration of XML processing.   
    108 To achieve the best results possible, we have undertaken
     109To achieve the best results possible, we undertook
    109110a comprehensive restructuring of the Xerces-C++ parser,
    110111seeking to expose as many critical aspects of XML parsing
     
    116117multiple cores.   
    117118
    118 The remainder of this paper is organized as follows.   Section 2 discusses
    119 the structure of the Xerces and Parabix XML parsers and the fundamental
    120 differences between the two parsing models.   Section 3 then presents
    121 the \icXML{} design based on a restructured Xerces architecture to
     119The remainder of this paper is organized as follows.   
     120Section \ref{background} discusses the structure of the Xerces and Parabix XML parsers and the fundamental
     121differences between the two parsing models.   
     122Section \ref{architecture} then presents the \icXML{} design based on a restructured Xerces architecture to
    122123incorporate SIMD parallelism using Parabix methods.   
    123 Section 4 moves on to consider the multithreading of the \icXML{} architecture
     124Section \ref{multithread} moves on to consider the multithreading of the \icXML{} architecture
    124125using the pipeline parallelism model. 
    125 Section 5 analyzes the performance of both the single-threaded and
     126Section \ref{performance} analyzes the performance of both the single-threaded and
    126127multi-threaded versions of \icXML{} in comparison to original Xerces,
    127128demonstrating substantial end-to-end acceleration of
    128129a GML-to-SVG translation application written against the Xerces API.
    129 Section 6 concludes the
    130 paper with a discussion of future work and the potential for
     130Section \ref{conclusion} concludes the paper with a discussion of future work and the potential for
    131131applying the techniques discussed herein in other application domains.
    132132
     
    139139
    140140\section{Architecture}
     141\label{architecture}
    141142
    142143\input{arch-overview.tex}
     
    150151\input{arch-errorhandling.tex}
    151152
     153\section{Leveraging Pipeline Parallelism}
     154\label{multithread}
     155
    152156\input{multithread.tex}
    153157
     158\section{Performance}
     159\label{performance}
    154160\input{performance.tex}
    155161
     162\section{Conclusion and Future Work}
     163\label{conclusion}
    156164\input{conclusion.tex}
    157165
  • docs/Working/icXML/multithread.tex

    r2506 r2522  
    11%\section{Leveraging SIMD Parallelism for Multicore: Pipeline Parallelism}
    2 \section{Leveraging Pipeline Parallelism}
     2
    33% As discussed in section \ref{background:xerces}, Xerces can be considered a complex finite-state machine
    44% Finite-state machine belongs to the hardest application class to parallelize and process efficiently
  • docs/Working/icXML/performance.tex

    r2513 r2522  
    1 \section{Performance}
    2 
    31We evaluate the Xerces C++ 3.1.1, ICXML Xerces C++ XML parser and pipelined
    42ICXML Xerces C++ against two benchmark applications. Firstly against the Xerces C++ SAXCount
Note: See TracChangeset for help on using the changeset viewer.