\subsection{Sequential vs. Parallel Paradigm}

% Sequential: bytes through layers
Xerces---like all traditional XML parsers---processes XML documents sequentially.
Each character is examined to distinguish between the
XML-specific markup, such as a left angle bracket ``\verb`<`'', and the
content held within the document.
As the parser moves its cursor through the document, it alternates
between markup scanning, validation, and content processing
operations. In other words, Xerces is a complex
finite-state machine that uses byte comparisons to transition between
data and metadata states. Each state transition establishes the context
for subsequent characters. Unfortunately, textual data tends to
consist of variable-length items sequenced in generally unpredictable
patterns; thus any character could be a state transition until deemed
otherwise.

% Parallel: blocks/segments/buffers through layers
Parabix-style XML parsers are organized into layers:
as a block of source text is transformed into a set of lexical bit streams,
it undergoes a series of operations that can be grouped into logical
layers, such as transposition, character classification, and lexical analysis.
The layers are pipeline parallel, as they require neither speculation nor
pre-parsing stages~\cite{HPCA2012}.
The disadvantage of this approach is that, taken individually, the resultant parallel
bit streams may be out of order with respect to the source document and must be
amalgamated and iterated through to produce sequential output.
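
To make the notion of a lexical bit stream concrete, consider a small
hand-constructed illustration (not taken from the icXML sources).
The stream below, given the hypothetical name \verb`LAngle`, marks every
left angle bracket in a short input; each position corresponds to one byte
of the source text, with \verb`1` marking a match and \verb`.` denoting \verb`0`:
\begin{verbatim}
source text : <a><b>x</b></a>
LAngle      : 1..1...1...1...
\end{verbatim}
Assuming that transposition yields eight basis bit streams $b_0, \ldots, b_7$,
with $b_0$ holding the most significant bit of each byte (a naming convention
adopted here for illustration only), this stream follows from the ASCII code
\verb`0x3C` of ``\verb`<`'' by bitwise logic alone:
\[
  \mathit{LAngle} = \lnot b_0 \land \lnot b_1 \land b_2 \land b_3
                    \land b_4 \land b_5 \land \lnot b_6 \land \lnot b_7 .
\]
Every position in the block is evaluated by the same operations, with no
per-character branching.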
% The end user should not be expected to work with out-of-order data ...

% a block of input
% text is transformed into a set of lexical bit streams. Operations are then
% performed on these streams to identify key positions in the input data and
% perform intra-element well-formedness validation (as an artifact of the
% identification process.)