Changeset 2522 for docs

Ignore:
Timestamp:
Oct 20, 2012, 12:30:32 PM (6 years ago)
Message:

edits

Location:
docs/Working/icXML
Files:
10 edited

Legend:

Unmodified
Removed

 r2520 In \icXML{}, however,  the concept of Character Set Adapters (CSAs) is used to minimize transcoding costs. Given a specified input encoding, a Character Set Adapter is responsible for checking that input code units represent valid characters, mapping the characters of the incoding into Given a specified input encoding, a CSA is responsible for checking that input code units represent valid characters, mapping the characters of the encoding into the appropriate bit streams for XML parsing actions (i.e., producing the lexical item streams), as well as supporting ultimate transcoding requirements.   All of this work is performed using the parallel bit stream representation of the source input. An important observation is that many character sets are some form of An important observation is that many character sets are an extension to the legacy 7-bit ASCII character set.  This includes the various ISO Latin character sets, UTF-8 and UTF-16 as well as many others. various ISO Latin character sets, UTF-8, UTF-16 and many others. Furthermore, all significant characters for parsing XML are confined to the ASCII repertoire.   Thus, a single common set of lexical item calculations serves to compute lexical item streams for all such ASCII-based character sets. A second observation is that, regardless of which character set is used, it is often the case that all of the characters in a particular block of input happen to be within the ASCII repertoire.   This is a very simple test to perform using the bit stream representation, simply confirming that the A second observation is that---regardless of which character set is used---quite often all of the characters in a particular block of input will be within the ASCII range. This is a very simple test to perform using the bit stream representation, simply confirming that the bit 0 stream is zero for the entire block.   For blocks satisfying this test, all logic dealing with non-ASCII characters can simply be skipped. Transcoding to UTF-16 becomes trivial: the high eight bit streams of the Transcoding to UTF-16 becomes trivial as the high eight bit streams of the UTF-16 form are each set to zero in this case.
• docs/Working/icXML/arch-namespace.tex

 r2505 6. & \verb|| \\ \end{tabular} \caption{XML Namespace Example} \label {fig:namespace1} \caption{XML Namespace Example} \end{figure} \end{tabular} \caption{Namespace Binding Table Example} \end{center} \label{tbl:namespace1} \end{center} \end{table}
• docs/Working/icXML/arch-overview.tex

 r2516 the unlikely event that the input contains an error), performs all line-break normalization and validates context-specific character set issues, such as tokenization of qualified-names and ensures each character is legal w.r.t. the XML specification. ensures each character is legal \wrt{} the XML specification. The {\it Scanner} pulls data through the reader and constructs the intermediate representation (IR) of the document; it deals with all issues related to entity expansion, validates be completely handled by the reader or transcoder (e.g., surrogate characters, validation and normalization of character references, etc.) The {\it Namespace Binder}, which is a core piece of their element stack, is primarily tasked The {\it Namespace Binder}, which is a core piece of the element stack, is primarily tasked with handling namespace scoping issues between different XML vocabularies and faciliates the scanner with the construction and utilization of Schema grammar structures. Like Xerces, \icXML{} must provide the Line and Column position of each error. The {\it Line-Column Tracker} uses the lexical information to keep track of the cursor position(s) through the use of an optimized population count algorithm; this is described in Section \ref{section:arch:errorhandling}. optimized population count algorithm, described in Section \ref{section:arch:errorhandling}. From here, two data-independent branches exist: the Symbol Pesolver and Content Preperation Unit. filtered streams into the tagged UTF-16 {\it content stream}. This is discussed in Section \ref{sec:parfilter}. Combined, the symbol stream and content stream form \icXML{}'s compressed IR of the XML document. % This component has a multitude of % responsibilities, which will be discussed in Section \ref{sec:parfilter}, but its primary function is to produce % near-final UTF-16 content. The {\it \MP{}} parses the IR to validate and produces the sequential output for the end user. The {\it WF checker} performs inter-element wellformedness validation that would be too costly Combined, the symbol and content stream form \icXML{}'s compressed IR of the XML document. The {\it \MP{}}~parses the IR to validate and produce the sequential output for the end user. The {\it Final WF checker} performs inter-element wellformedness validation that would be too costly to perform in bitspace, such as ensuring every start tag has a matching end tag. Xerces's namespace binding functionality is replaced by the {\it Namespace Processor}. Unlike Xerces, it's performed as a discrete phase, which produces a set of URI identifiers (URI IDs) that is associated with symbol occurrence. it is a discrete phase that produces a series of URI identifiers (URI IDs), the {\it URI stream}, which are associated with each symbol occurrence. This is discussed in Section \ref{section:arch:namespacehandling}. The final {\it Validation} process is responsible for the same tasks as Xerces's validator; however Finally, the {\it Validation} layer mimics the Xerces's validator; however the majority of the grammar look-ups are performed beforehand and stored within the symbol themselves.
• docs/Working/icXML/background-fundemental-differences.tex

 r2490 In adapting to the requirements of the Xerces sequential parsing API, however, the resultant parallel bit streams, taken individually, may out-of-order with respect to the source document.  They hence must be amalgamated and iterated through to produce bit streams, taken individually, may out-of-order \wrt{} the source document.  Hence they must be amalgamated and iterated through to produce sequential output. % The end user should not be expected to work with out-of-order data ...
• docs/Working/icXML/background-parabix.tex

 r2516 The Parabix (parallel bit stream) framework is a transformative approach to XML parsing (and other forms of text processing.) The key idea is to exploit the availability of wide (e.g., 128-bit) SIMD registers in commodity processors to represent data from long blocks SIMD registers (e.g., 128-bit) in commodity processors to represent data from long blocks of input data by using one register bit per single input byte. To facilitate this, the input data is first transposed into a set of basis bit streams. Similarly, a character is numeric {\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$. % An important observation here is that a range of characters can sometimes % take fewer operations and require fewer basis bit streams to compute % than individual characters. Finding optimal solutions to all % character-classes is non-trivial and goes beyond the scope of this % paper. An important observation here is that ranges of characters may require fewer operations than individual characters and multiple classes can sometimes share the classification cost. \begin{figure}[tbh] well as combining SIMD methods with 4-stage pipeline parallelism to further improve throughput \cite{HPCA2012}. Although these research prototypes handle the full syntax of DTD-less XML documents, they lacked the functionality required by full XML parsers. Although these research prototypes handled the full syntax of {\bf grammarless} XML documents, they lacked the functionality required by full XML parsers. Namely, commercial XML processors, such as Xerces, as support for transcoding of multiple character sets, the ability to parse and validate against DTDs, both internal and external, the ability to parse and validate against DTD grammars, both internal and external, facilities for handling different XML vocabularies through namespace processing, as well validation against XML Schema grammars.
• docs/Working/icXML/background-xerces.tex

 r2490 difficult to optimize for performance. As a complex software system, no one feature dominates the overall parsing performance. Figure \ref{fig:xerces-profile} shows the execution time profile of the top ten functions in a typical run. Even if it were possible, tackling any single one of these functions for parallelization in isolation would only produce a small improvement in perfomance in accord with Amdahl's Law.  In order to obtain systematic acceleration of the Xerces parser, Figure \ref{fig:xerces-profile} shows the execution time profile of the top ten functions in a typical run. Even if it were possible, Amdahl's Law dictates that tackling any one of these functions for parallelization in isolation would only produce a minute improvement in perfomance. Unfortunetly, early investigation into these functions found they were already performing well in their given tasks and only trivial enhancements were possible. In order to obtain a systematic acceleration of Xerces, it should be expected that a comprehensive restructuring is required, involving all aspects of the parser. % In order to obtain systematic acceleration of the Xerces parser, % it should be expected that a comprehensive restructuring % is required, involving all aspects of the parser. \begin{figure}[h] \begin{tabular}{r|l} Time (\%) & Function Name \\ \hline 13.29   &       XMLUTF8Transcoder::transcodeFrom \\ 7.45    &       IGXMLScanner::scanCharData \\ 6.83    &       memcpy \\ 5.83    &       XMLReader::getNCName \\ 4.67    &       IGXMLScanner::buildAttList \\ 4.54    &       RefHashTableOf\verb|<>|::findBucketElem \\ 4.20    &       IGXMLScanner::scanStartTagNS \\ 3.75    &       ElemStack::mapPrefixToURI \\ 3.58    &       ReaderMgr::getNextChar \\ 3.20    &       IGXMLScanner::basicAttrValueScan \\ \end{tabular} \caption{Execution Time of Top 10 Xerces Functions} \label {fig:xerces-profile} \end{figure}
• docs/Working/icXML/conclusion.tex

 r2512 \section{Conclusion and Future Work} This paper presented the \icXML{} parser and discussed the key architectural differences between it and Xerces. \icXML{} was architected to utilize SIMD parallelism, facilitated by the Parabix framework. \icXMLp{} extended on Although only a two-thread version was explored, more is possible---but the value of using more is dependent on the application utilizing the \icXML{} library. As the application becomes more complex there are diminishing returns w.r.t. additional thread-level parallelism. As the application becomes more complex there are diminishing returns \wrt{} additional thread-level parallelism. A more interesting use of additional threads could be in the inclusion of an XPath and XQuery modules that could eliminate unneeded data prior to the \MP{} stage.
• docs/Working/icXML/icxml-main.tex

 r2516 \def \PS {Parabix Subsystem} \def \MP {Markup Processor} \def \wrt {with respect to} \title{\icXML{}:  Accelerating a Commercial XML Parser Using SIMD and Multicore Technologies} Methods without pre-parsing have used speculation \cite{HPCC2011} or post-processing that combines the partial results \cite{ParaDOM2009}. A hybrid method that combines data parallelism and pipeline parallelism is proposed to hide the latency of the job'' that has to be done sequentially \cite{ICWS2008}. A hybrid technique that combines data and pipeline parallelism was proposed to hide the latency of a job'' that has to be done sequentially \cite{ICWS2008}. Fewer efforts have investigated SIMD parallelism, although this approach has the potential advantage of improving single core performance as well as offering savings in energy consumption. as offering savings in energy consumption \cite{HPCA2012}. Intel introduced specialized SIMD string processing instructions in the SSE 4.2 instruction set extension and showed how they can be used to improve the performance of XML parsing \cite{XMLSSE42}. preserve the existing APIs as well as offering worthwhile end-to-end acceleration of XML processing. To achieve the best results possible, we have undertaken To achieve the best results possible, we undertook a comprehensive restructuring of the Xerces-C++ parser, seeking to expose as many critical aspects of XML parsing multiple cores. The remainder of this paper is organized as follows.   Section 2 discusses the structure of the Xerces and Parabix XML parsers and the fundamental differences between the two parsing models.   Section 3 then presents the \icXML{} design based on a restructured Xerces architecture to The remainder of this paper is organized as follows. Section \ref{background} discusses the structure of the Xerces and Parabix XML parsers and the fundamental differences between the two parsing models. Section \ref{architecture} then presents the \icXML{} design based on a restructured Xerces architecture to incorporate SIMD parallelism using Parabix methods. Section 4 moves on to consider the multithreading of the \icXML{} architecture Section \ref{multithread} moves on to consider the multithreading of the \icXML{} architecture using the pipeline parallelism model. Section 5 analyzes the performance of both the single-threaded and Section \ref{performance} analyzes the performance of both the single-threaded and multi-threaded versions of \icXML{} in comparison to original Xerces, demonstrating substantial end-to-end acceleration of a GML-to-SVG translation application written against the Xerces API. Section 6 concludes the paper with a discussion of future work and the potential for Section \ref{conclusion} concludes the paper with a discussion of future work and the potential for applying the techniques discussed herein in other application domains. \section{Architecture} \label{architecture} \input{arch-overview.tex} \input{arch-errorhandling.tex} \section{Leveraging Pipeline Parallelism} \label{multithread} \input{multithread.tex} \section{Performance} \label{performance} \input{performance.tex} \section{Conclusion and Future Work} \label{conclusion} \input{conclusion.tex}