# Changeset 2505 for docs/Working

Timestamp:
Oct 19, 2012, 6:56:35 PM
Message:

more edits

Location:
docs/Working/icXML
Files:
6 edited


 r2496 \label{arch:character-set-adapter} The first major difference between Xerces and \icXML{} is the use of Character Set Adapters (CSAs). In Xerces, all input is transcoded into UTF-16 to simplify the parsing costs of Xerces itself and to provide the end-consumer with a single encoding format. \icXML{} instead uses CSAs to transform data from its source encoding into a set of basis and lexical bit streams.
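The transformation into basis and lexical bit streams can be sketched as below. This is an illustrative scalar model, not icXML's actual (SIMD-transposition) implementation; the names `basisStream` and `markerStream` are hypothetical. Bit $k$ of input byte $i$ becomes bit $i$ of basis stream $k$, and a lexical stream marks every position of one character class.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch: basis bit stream k holds bit k of every input byte, one bit per
// code unit, packed into 64-bit words. (icXML computes this with SIMD
// transposition; this byte-at-a-time loop is for exposition only.)
std::vector<uint64_t> basisStream(const std::string & input, unsigned bit) {
    std::vector<uint64_t> stream((input.size() + 63) / 64, 0);
    for (size_t i = 0; i < input.size(); ++i) {
        if ((static_cast<uint8_t>(input[i]) >> bit) & 1) {
            stream[i / 64] |= (uint64_t(1) << (i % 64));
        }
    }
    return stream;
}

// Sketch: a lexical stream has a 1-bit at each occurrence of one character
// class, e.g. every '<' in the document.
std::vector<uint64_t> markerStream(const std::string & input, char ch) {
    std::vector<uint64_t> stream((input.size() + 63) / 64, 0);
    for (size_t i = 0; i < input.size(); ++i) {
        if (input[i] == ch) stream[i / 64] |= (uint64_t(1) << (i % 64));
    }
    return stream;
}
```

For example, `markerStream("<a><b>", '<')` sets bits 0 and 3 of the first word, since the open angle brackets sit at positions 0 and 3.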
• ## docs/Working/icXML/arch-errorhandling.tex

 r2496 column number. % \begin{figure}[ht] % {\bf TODO: An example of a skip mask, error mask, and the raw data and transcoded data for it. % Should a multi-byte character be used and/or some CRLFs to show the difficulties?} % \label{fig:error_mask} % \caption{} % \end{figure} The \MP{} is a state-driven machine. As such, error detection within it is very similar to Xerces. The \MP{} parses the content stream, which is a series of tagged UTF-16 strings. Each string is normalized in accordance with the XML specification. All symbol data and unnecessary whitespace are eliminated from the stream; thus it is impossible to derive the current location using only the content stream. To calculate the location, the \MP{} borrows three additional pieces of information from the \PS{}: the line-feed stream, the skip mask, and a {\it deletion mask stream}, which is a bit stream denoting the (code-unit) position of every
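The line/column calculation from a line-feed bit stream can be sketched as follows. This is a naive model for exposition only (icXML's tracker is far more optimized, and the real calculation must also account for the skip and deletion masks, which are omitted here): the line number is one plus the population count of line-feed bits before the position, and the column is the distance from the most recent preceding line feed.

```cpp
#include <cstdint>
#include <vector>

// Naive sketch of population-count-based location lookup. `lf` is a bit
// stream with a 1 at every line-feed position; `pos` is an absolute
// code-unit position in the same space.
struct LineColumn { uint64_t line, column; };

LineColumn locate(const std::vector<uint64_t> & lf, uint64_t pos) {
    uint64_t line = 1;
    uint64_t lastLF = 0; // one past the most recent line feed before pos
    for (uint64_t w = 0; w <= pos / 64; ++w) {
        uint64_t word = lf[w];
        if (w == pos / 64) {
            // keep only line feeds strictly before pos within this word
            uint64_t keep =
                (pos % 64 == 0) ? 0 : (~uint64_t(0) >> (64 - pos % 64));
            word &= keep;
        }
        line += __builtin_popcountll(word);   // lines completed so far
        if (word != 0) {
            // highest set bit = position of the latest preceding line feed
            lastLF = w * 64 + (63 - __builtin_clzll(word)) + 1;
        }
    }
    return { line, pos - lastLF + 1 };
}
```

For the input `"ab\ncd"` the line-feed stream is the single word `0b100`; position 3 (the `c`) then resolves to line 2, column 1.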
• ## docs/Working/icXML/arch-namespace.tex

 r2494 In both Xerces and \icXML{}, every URI has a one-to-one mapping to a URI ID. These persist for the lifetime of the application through the use of a global URI pool. Xerces maintains a stack of namespace scopes that is pushed (popped) every time a start tag (end tag) occurs (2) those that repeatedly modify the namespaces in predictable patterns. For that reason, \icXML{} contains an independent namespace stack and utilizes bit vectors to cheaply perform % speculation and scope resolution options with a single XOR operation---even if many alterations are performed.
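The single-XOR scope restoration can be sketched as below. The structures are illustrative, not icXML's real ones: each bit of `scope` stands for one currently-visible (prefix, URI) binding, each start tag records the delta it XORs in, and the matching end tag undoes it with one more XOR, since $x \oplus d \oplus d = x$ regardless of how many bindings the delta alters.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of a bit-vector namespace stack. A start tag pushes
// the delta of bindings it adds/removes; the matching end tag restores the
// previous scope with a single XOR, even if many bindings changed at once.
struct NamespaceStack {
    uint64_t scope = 0;             // visible bindings, one bit each
    std::vector<uint64_t> deltas;   // per-scope modification records

    void push(uint64_t delta) { deltas.push_back(delta); scope ^= delta; }
    void pop()               { scope ^= deltas.back(); deltas.pop_back(); }
};
```

Pushing deltas `0b0011` then `0b0110` yields scope `0b0101`; popping once restores `0b0011` with one XOR, matching the constant-cost restoration described above.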
• ## docs/Working/icXML/arch-overview.tex

 r2496 and validates context-specific character set issues, such as tokenization of qualified-names, and ensures each character is legal w.r.t. the XML specification. The {\it Scanner} pulls data through the reader and constructs the intermediate representation (IR) of the document; it deals with all issues related to entity expansion, validates the XML well-formedness constraints and any character set encoding issues that cannot be completely handled by the reader or transcoder (e.g., surrogate characters, validation with handling namespace scoping issues between different XML vocabularies and facilitates the scanner with the construction and utilization of Schema grammar structures. The {\it Validator} takes the IR produced by the Scanner (and potentially annotated by the Namespace Binder) and assesses whether the final output matches the user-defined DTD and Schema grammar(s) before passing it to the end-user. \begin{figure} mirrors Xerces's Transcoder duties; however, instead of producing UTF-16 it produces a set of lexical bit streams, similar to those shown in Figure \ref{fig:parabix1}. These lexical bit streams are later transformed into UTF-16 in the Content Stream Generator, after additional processing is performed. The first precursor to producing UTF-16 is the {\it Parallel Markup Parser} phase.
It takes the lexical streams and produces a set of marker bit streams in which a 1-bit identifies The {\it Line-Column Tracker} uses the lexical information to keep track of the cursor position(s) through the use of an optimized population count algorithm; this is described in Section \ref{section:arch:errorhandling}. From here, two data-independent branches exist: the Symbol Resolver and Content Preparation Unit.
\icXML{} represents elements and attributes as distinct data structures, called symbols, each with its own global identifier (GID). Using the {\bf symbol marker streams} produced by the Parallel Markup Parser, the {\it Symbol Resolver} scans through the raw data to produce a stream (series) of GIDs, called the {\it symbol stream}. A typical XML file will contain relatively few unique element and attribute names---but each of them will occur frequently. % Grammar information can be associated with each symbol and can help reduce the look-up cost of the later Validation process. The final components of the \PS{} are the {\it Content Preparation Unit} and {\it Content Stream Generator}. The former takes the (transposed) basis bit streams and selectively filters them, according to the information provided by the Parallel Markup Parser, and the latter transforms the filtered streams into the tagged UTF-16 {\it content stream}. This is discussed in Section \ref{sec:parfilter}. Combined, the symbol stream and content stream form \icXML{}'s compressed IR of the XML document. The {\it \MP{}} parses the IR to validate and produce the sequential output for the end user. The {\it WF checker} performs inter-element well-formedness validation that would be too costly to perform in bitspace, such as ensuring every start tag has a matching end tag. The {\it Namespace Processor} replaces Xerces's namespace binding functionality. Unlike Xerces, this is performed as a discrete phase and simply produces a set of URI identifiers (URI IDs) to be associated with each occurrence of a symbol.
This is discussed in Section \ref{section:arch:namespacehandling}. The final {\it Validation} process is responsible for the same tasks as Xerces's validator; however, the majority of the grammar look-ups are performed beforehand and stored within the symbols themselves. \begin{figure}
• ## docs/Working/icXML/background-parabix.tex

 r2494 \end{tabular} \end{center} \caption{8-bit ASCII Basis Bit Streams} \label{fig:BitStreamsExample}
 r2502 %\section{Leveraging SIMD Parallelism for Multicore: Pipeline Parallelism} \section{Leveraging Pipeline Parallelism} As discussed in section \ref{background:xerces}, Xerces can be considered a complex finite-state machine, the hardest type of application to parallelize and process efficiently \cite{Asanovic:EECS-2006-183}. However, \icXML{} provides logical layers between modules, which naturally enables pipeline parallel processing. In the pipeline model, each thread is in charge of a group of modules. A straightforward division is to take advantage of the layer between \PS{} and \MP{}. In this case, the first thread $T_1$ reads 16k of XML input $I$ at a time and produces the content, symbol and URI streams, then stores them in a pre-allocated shared data structure $S$.
The second thread $T_2$ consumes $S$, performs well-formedness and grammar-based validation, then generates the output $O$. The shared data structure is implemented using a ring buffer, where every entry contains an independent set of data streams. In the example of Figures \ref{threads_timeline1} and \ref{threads_timeline2}, the ring buffer has four entries. A lock-free mechanism is applied to ensure that each entry can only be read or written by one thread at a time. In Figure \ref{threads_timeline1} the processing time of $T_1$ is longer than that of $T_2$; thus $T_2$ always waits for $T_1$ to write to the shared memory. Figure \ref{threads_timeline2} illustrates the scenario in which $T_1$ is faster and must wait for $T_2$ to finish reading the shared data before it can reuse the memory space.
To understand the performance improvement that can be achieved by this pipeline model,
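The two-thread handoff described above can be sketched as a single-producer, single-consumer ring buffer with atomic indices. This is a minimal model, not icXML's implementation: the real entries hold the content, symbol, and URI streams, while here each entry is just a string and $T_1$'s transformation work is elided.

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Sketch of the lock-free pipeline handoff: T1 (Parabix Subsystem side)
// fills ring entries, T2 (Markup Processor side) drains them. With one
// producer and one consumer, two atomic counters suffice; no entry is ever
// read and written concurrently.
struct RingBuffer {
    static constexpr size_t N = 4;        // four entries, as in the figures
    std::string entries[N];
    std::atomic<size_t> head{0};          // advanced by the consumer
    std::atomic<size_t> tail{0};          // advanced by the producer

    void produce(const std::string & s) {
        while (tail.load() - head.load() == N) { }  // full: T1 waits on T2
        entries[tail.load() % N] = s;
        tail.fetch_add(1);                // publish the entry
    }
    std::string consume() {
        while (tail.load() == head.load()) { }      // empty: T2 waits on T1
        std::string s = entries[head.load() % N];
        head.fetch_add(1);                // release the entry for reuse
        return s;
    }
};

std::string runPipeline(const std::vector<std::string> & chunks) {
    RingBuffer ring;
    std::string output;
    std::thread t2([&] {                  // T2: consume and emit output O
        for (size_t i = 0; i < chunks.size(); ++i) output += ring.consume();
    });
    for (const std::string & c : chunks)  // T1: (transform and) publish
        ring.produce(c);
    t2.join();
    return output;
}
```

The busy-wait loops correspond to the stalls in the two timeline figures: whichever thread is faster spins until the other catches up, while the four-entry buffer absorbs short-term rate differences.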