% source: docs/Working/icXML/background-fundemental-differences.tex @ 2529
% Last change on this file since 2529 was 2522, checked in by nmedfort, 7 years ago


\subsection{Sequential vs. Parallel Paradigm}

% Sequential: bytes through layers
Xerces---like all traditional XML parsers---processes XML documents sequentially.
Each character is examined to distinguish between the
XML-specific markup, such as a left angle bracket ``\verb`<`'', and the
content held within the document.
As the parser moves its cursor through the document, it alternates
between markup scanning, validation, and content processing
operations.  In other words, Xerces is a complex
finite-state machine that uses byte comparisons to transition between
data and metadata states. Each state transition indicates the context
for subsequent characters. Unfortunately, textual data tends to
consist of variable-length items sequenced in generally unpredictable
patterns; thus any character could be a state transition until deemed
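
The byte-at-a-time behaviour described above can be illustrated with a minimal sketch. It is not Xerces code: the two-state machine, the \texttt{ScanResult} structure, and the function \texttt{scanSequential} are hypothetical names introduced only for illustration, and a real parser tracks many more states (attributes, comments, entity references, and so on). The point is only that every byte must be compared before the scanner knows whether the state changes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical two-state scanner (illustration only, not the Xerces implementation).
enum class State { Content, Markup };

struct ScanResult {
    std::size_t markupBytes = 0;   // bytes inside <...>, angle brackets included
    std::size_t contentBytes = 0;  // bytes outside any tag
    std::size_t transitions = 0;   // number of state changes observed
};

// Walk the document one byte at a time, switching state on '<' and '>'.
ScanResult scanSequential(const std::string& doc) {
    ScanResult r;
    State state = State::Content;
    for (char c : doc) {
        // Each byte is compared against the character that can end the current state.
        if (state == State::Content && c == '<') {
            state = State::Markup;   // data -> metadata
            ++r.transitions;
            ++r.markupBytes;
        } else if (state == State::Markup && c == '>') {
            ++r.markupBytes;
            state = State::Content;  // metadata -> data
            ++r.transitions;
        } else if (state == State::Markup) {
            ++r.markupBytes;
        } else {
            ++r.contentBytes;
        }
    }
    return r;
}
```

Because the length of each tag and each run of content is unknown in advance, the loop cannot skip ahead; the branch on the current byte decides how the next byte is interpreted.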

% Parallel: blocks/segments/buffers through layers
Parabix-style XML parsers utilize a concept of layers:
as each block of source text is transformed into a set of lexical bit streams,
it undergoes a series of operations that can be grouped together in logical
layers, such as transposition, character classification, and the lexical analysis
phases. Each layer is pipeline parallel, requiring no speculation nor
In adapting to the requirements of the Xerces sequential parsing API,
however, the resultant parallel
bit streams, taken individually, may be out of order \wrt{} the source
document.  Hence they must be amalgamated and iterated through to produce
sequential output.
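
As a rough sketch of the amalgamation step, assume each lexical bit stream holds one bit per byte of a block of at most 64 bytes. The type \texttt{BitBlock} and the functions \texttt{classify}, \texttt{lowestSetBit}, and \texttt{mergeInDocumentOrder} are hypothetical names, and \texttt{classify} is a scalar stand-in for the transposition and bitwise classification that a Parabix-style parser would perform. Each stream on its own records only one kind of position; combining the streams and walking the set bits from lowest to highest recovers document order.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One bit per byte of a block of up to 64 bytes; bit i describes byte i.
using BitBlock = std::uint64_t;

// Scalar stand-in for character classification (assumption, not Parabix code):
// set bit i when byte i of the block equals `target`.
BitBlock classify(const std::string& block, char target) {
    BitBlock bits = 0;
    for (std::size_t i = 0; i < block.size() && i < 64; ++i) {
        if (block[i] == target) bits |= (BitBlock{1} << i);
    }
    return bits;
}

// Index of the lowest set bit; the caller guarantees bits != 0.
unsigned lowestSetBit(BitBlock bits) {
    unsigned pos = 0;
    while ((bits & 1u) == 0) { bits >>= 1; ++pos; }
    return pos;
}

// Merge two streams and emit (position, character) events in ascending position.
std::vector<std::pair<unsigned, char>> mergeInDocumentOrder(BitBlock opens,
                                                            BitBlock closes) {
    std::vector<std::pair<unsigned, char>> events;
    BitBlock all = opens | closes;  // amalgamate the individual streams
    while (all != 0) {
        unsigned pos = lowestSetBit(all);
        events.emplace_back(pos, ((opens >> pos) & 1u) ? '<' : '>');
        all &= all - 1;             // clear the lowest set bit and continue
    }
    return events;
}
```

For the block \verb`<a>hi</a>`, the stream for \verb`<` marks positions 0 and 5 and the stream for \verb`>` marks positions 2 and 8; only after merging does the consumer see the events in the order 0, 2, 5, 8.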
% The end user should not be expected to work with out-of-order data ...

% a block of input
% text is transformed into a set of lexical bit streams. Operations are then
% performed on these streams to identify key positions in the input data and
% perform intra-element well-formedness validation (as an artifact of the
% identification process.)