% Source: docs/Working/icXML/background-fundemental-differences.tex @ 2866 (last change checked in by nmedfort)
\subsection{Sequential vs. Parallel Paradigm}

% Sequential: bytes through layers
Xerces, like all traditional XML parsers, processes XML documents sequentially.
Each character is examined to distinguish between
XML-specific markup, such as a left angle bracket ``\verb`<`'', and the
content held within the document.
As the parser progresses through the document, it alternates between markup scanning,
validation, and content processing modes.

In other words, Xerces belongs to a class of applications termed FSM applications.\footnote{
  Herein, FSM (finite-state machine) applications are software systems whose behavior is defined by their inputs,
  current state, and the events associated with state transitions.}
Each state transition determines the processing context of subsequent characters.
Unfortunately, textual data tends to be unpredictable, and any character could induce a state transition,
so each character must be examined in order.
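
As a rough illustration of this byte-at-a-time style, the following C++ sketch implements a two-state scanner.
It is not taken from Xerces: the names \texttt{State}, \texttt{Counts}, and \texttt{scanSequentially} are hypothetical,
and a real parser tracks many more states (comments, CDATA sections, quoted attribute values, and so on).

```cpp
#include <cstddef>
#include <string>

// Hypothetical two-state scanner; an illustration only, not Xerces code.
enum class State { Content, Markup };

struct Counts {
    std::size_t markupBytes = 0;   // bytes inside <...>, delimiters included
    std::size_t contentBytes = 0;  // bytes outside any tag
};

Counts scanSequentially(const std::string& doc) {
    Counts counts;
    State state = State::Content;
    for (char c : doc) {  // every character is inspected, strictly in order
        switch (state) {
        case State::Content:
            if (c == '<') {  // any character may trigger a transition
                state = State::Markup;
                ++counts.markupBytes;
            } else {
                ++counts.contentBytes;
            }
            break;
        case State::Markup:
            ++counts.markupBytes;
            if (c == '>') {
                state = State::Content;
            }
            break;
        }
    }
    return counts;
}
```

For the input \verb`<a>hi</a>`, this sketch reports seven markup bytes and two content bytes.
The loop cannot skip ahead, because the meaning of each byte depends on the state left behind by the previous one.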
% Unfortunately, textual data tends to consist of variable-length strings sequenced in
% unpredictable patterns.
% Each character must be examined in sequence because any character could be a state transition until deemed otherwise.

% Parallel: blocks/segments/buffers through layers
Parabix-style XML parsers use a layered processing model.
A block of source text is transformed into a set of lexical bit streams,
which undergo a series of operations that can be grouped into logical layers,
e.g., transposition, character classification, and lexical analysis.
Each layer is pipeline parallel and requires neither speculation nor pre-parsing stages~\cite{HPCA2012}.
% In adapting to the requirements of the Xerces sequential parsing API,
% however, the resultant parallel bit streams may be out-of-order \wrt{} the source document.
% Hence they must be amalgamated and iterated through to produce sequential output.
Because the Xerces API delivers its output in document order,
the results of the Parabix processing layers must be interleaved to produce equivalent behavior.
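
The layering and the final interleaving step can be sketched in miniature as follows.
This is a simplified illustration under stated assumptions, not the Parabix or icXML implementation:
a block is limited to 64 bytes so that each bit stream fits in one \texttt{std::uint64\_t}
(a real Parabix-style parser operates on SIMD registers), lexical analysis is omitted,
and the function names \texttt{transpose}, \texttt{classify}, \texttt{interleave}, and \texttt{markupEvents} are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Event = std::pair<std::size_t, char>;  // (byte position, delimiter)

// Layer 1 (transposition): basis[k] collects bit k of every byte in the
// block; bit i of each stream corresponds to byte i (at most 64 bytes).
std::array<std::uint64_t, 8> transpose(const std::string& block) {
    std::array<std::uint64_t, 8> basis{};
    for (std::size_t i = 0; i < block.size() && i < 64; ++i) {
        const auto byte = static_cast<unsigned char>(block[i]);
        for (int k = 0; k < 8; ++k) {
            basis[k] |= static_cast<std::uint64_t>((byte >> k) & 1u) << i;
        }
    }
    return basis;
}

// Layer 2 (character classification): a stream with a 1 bit wherever the
// source byte equals `target`, computed with whole-stream bitwise logic.
std::uint64_t classify(const std::array<std::uint64_t, 8>& basis,
                       unsigned char target, std::size_t length) {
    std::uint64_t match = length >= 64 ? ~0ULL : ((1ULL << length) - 1);
    for (int k = 0; k < 8; ++k) {
        match &= ((target >> k) & 1u) ? basis[k] : ~basis[k];
    }
    return match;
}

// Interleaving: merge two independently computed streams back into a
// single document-ordered sequence of events.
std::vector<Event> interleave(std::uint64_t opens, std::uint64_t closes) {
    std::vector<Event> events;
    for (std::size_t pos = 0; pos < 64; ++pos) {
        const std::uint64_t bit = 1ULL << pos;
        if (opens & bit) events.emplace_back(pos, '<');
        if (closes & bit) events.emplace_back(pos, '>');
    }
    return events;
}

std::vector<Event> markupEvents(const std::string& block) {
    const auto basis = transpose(block);
    const std::size_t n = block.size() < 64 ? block.size() : 64;
    return interleave(classify(basis, '<', n), classify(basis, '>', n));
}
```

For the block \verb`<a>hi</a>`, \texttt{markupEvents} yields the positions 0, 2, 5, and 8 in document order.
Unlike a sequential scanner, the classification layer examines all positions of the block with a fixed number of bitwise operations,
and only the final merge walks the positions in order.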