% source: docs/Working/icXML/background-fundemental-differences.tex @ 2429
% Last change on this file since 2429 was 2429, checked in by nmedfort, 7 years ago
\subsection{Sequential vs. Parallel Paradigm}

% Sequential: bytes through layers
Xerces, like all traditional XML parsers, processes XML documents sequentially.
Each character is examined to distinguish between the
XML-specific markup, such as a left angle bracket ``\verb`<`'', and the
content held within the document.
As the parser moves its cursor through the document, it alternates
between markup scanning, validation, and content processing
operations.  In other words, Xerces is a complex
finite-state machine that uses byte comparisons to transition between
data and metadata states. Each state transition indicates the context
for subsequent characters. Unfortunately, textual data tends to
consist of variable-length items sequenced in generally unpredictable
patterns; thus any character could trigger a state transition until
shown otherwise.
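
As a small hand-worked illustration (the input fragment and the two state
labels are chosen here for exposition and are not taken from the Xerces
sources), consider a byte-at-a-time scan of \verb`<a>b</a>`, where
\texttt{M} marks a byte consumed in a markup state and \texttt{C} a byte
consumed in a content state:
\begin{verbatim}
byte    <  a  >  b  <  /  a  >
state   M  M  M  C  M  M  M  M
\end{verbatim}
Each byte has to be compared against the markup delimiters before the
state that governs the following byte is known, so the eight bytes are
necessarily handled one after another.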

% Parallel: blocks/segments/buffers through layers
Parabix-style XML parsers utilize a concept of layers:
as a block of source text is transformed into a set of lexical bit streams,
it undergoes a series of operations that can be grouped together as logical
layers, such as transposition, character classification, and lexical
analysis. Each layer is pipeline parallel, as it requires neither speculation nor
pre-parsing stages~\cite{HPCA2012}.
The disadvantage of this approach is that, taken individually, the resultant lexical
bit streams may be out of order with respect to the source document and must be
amalgamated and iterated through to produce sequential output.
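
To make the notion of a lexical bit stream concrete, the following
hand-worked illustration shows three such streams for the eight-byte
fragment \verb`<a>b</a>`. Each stream holds one bit per byte position,
with \texttt{1} where the byte belongs to the named character class and
\texttt{.} standing for \texttt{0}. The stream labels and the block width
of eight are assumptions made for exposition only.
\begin{verbatim}
source   <a>b</a>
[<]      1...1...
[>]      ..1....1
[/]      .....1..
\end{verbatim}
Every stream can be computed for the whole block at once, but no single
stream records how its marked positions interleave with those of the
others; recovering document order requires combining the streams and
iterating over their set bits.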

% a block of input
% text is transformed into a set of lexical bit streams. Operations are then
% performed on these streams to identify key positions in the input data and
% perform intra-element well-formedness validation (as an artifact of the
% identification process.)