\section{The Parabix XML Parser}
\label{section:parser}

\begin{figure}[h]
\begin{center}
\includegraphics[width=1\textwidth]{plots/parabix_arch.pdf}
\end{center}
\caption{Parabix XML Parser Structure}
\label{parabix_arch}
\end{figure}


Figure~\ref{parabix_arch} shows the overall structure of the Parabix XML parser configured for
well-formedness checking.
The input file is processed by 11 functions organized into 7 modules.
In the first module, the Read\_Data function loads data blocks from the input file into data\_buffer.
The data is then transposed into eight parallel basis bitstreams (basis\_bits) in the Transposition module.
These eight bitstreams are used by the Classification function to generate all the XML lexical item streams (lex)
and by the U8\_Validation module to validate UTF-8 characters.
The lexical item streams, together with the scope streams (scope) generated by the Gen\_Scope function,
are supplied to the parsing module, which consists of three functions: Parse\_CtCDPI, Parse\_Ref and Parse\_tag.
These functions parse comments, CDATA sections, processing instructions, references and tags.
After this, the Name\_Validation and Err\_Check functions gather information, producing
name check streams and error streams, which are then passed to the final Postprocessing module.
Errors that cannot be conveniently detected using bitstreams are checked in this last module.
The final output reports any well-formedness error detected, together with its position within the input file.
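
As a rough illustration of the Transposition and Classification steps, the following Python sketch
models each bitstream as an arbitrary-precision integer in which bit $i$ corresponds to byte $i$ of the input.
It is a minimal sketch under stated assumptions rather than the Parabix implementation:
the function names transpose, classify\_langle and positions are hypothetical,
the bit-by-bit loop merely stands in for the Transposition code of the run-time library,
and only a single character class (the left angle bracket, 0x3C) is computed.

\begin{verbatim}
# Illustrative model only: Python integers stand in for bitstreams,
# with bit i of a stream corresponding to byte i of the input.

def transpose(data):
    """Return eight basis bitstreams; basis[k] collects bit k
    (k = 0 is the most significant bit) of every input byte."""
    basis = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << pos
    return basis

def classify_langle(basis, length):
    """Mark every position holding '<' (0x3C = 0011 1100)."""
    mask = (1 << length) - 1
    b = basis
    hi = ~b[0] & ~b[1] & b[2] & b[3]   # high nibble 0011
    lo = b[4] & b[5] & ~b[6] & ~b[7]   # low nibble 1100
    return hi & lo & mask

def positions(stream):
    """List the indices of the 1 bits in a stream."""
    return [i for i in range(stream.bit_length())
            if (stream >> i) & 1]

data = b"<a><b/></a>"
basis_bits = transpose(data)
lex_langle = classify_langle(basis_bits, len(data))
print(positions(lex_langle))   # prints [0, 3, 7]
\end{verbatim}

In this model each character class reduces to a fixed Boolean expression over the eight basis bitstreams,
so a single evaluation of the expression classifies every input position covered by the streams.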

Within this structure, all functions in the four shaded modules consist entirely of parallel bitstream
operations.  Of these, the Classification function consists of XML character class definitions
generated using ccc, and much of U8\_Validation similarly consists of UTF-8 byte class
definitions also generated by ccc.  The remaining functions in these modules are programmed in
our unbounded bitstream language, following the logical requirements of XML parsing.  All the functions
in the four shaded modules are then compiled to low-level C/C++ code by our Pablo compiler.  This
code is linked with the general Transposition code available in the Parabix run-time library
and with the hand-written Postprocessing code that completes the well-formedness checking.
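
To illustrate what moving from an unbounded bitstream operation to block-oriented code may involve,
the following Python sketch compares a one-position advance written over a whole stream
with an equivalent block-at-a-time version that passes a carry bit between blocks.
This is an illustration only and not output of the Pablo compiler:
the 8-bit BLOCK\_BITS width and all function names are hypothetical, chosen to keep the example small.

\begin{verbatim}
# Illustrative model only; not the output of the Pablo compiler.
BLOCK_BITS = 8  # hypothetical block width, kept small for clarity

def advance(stream):
    """Unbounded form: move every marker one position forward."""
    return stream << 1

def advance_blockwise(blocks):
    """Block-at-a-time form: shift each block and pass the bit
    shifted out of one block into the next block as a carry."""
    mask = (1 << BLOCK_BITS) - 1
    carry = 0
    out = []
    for block in blocks:
        shifted = (block << 1) | carry
        out.append(shifted & mask)
        carry = shifted >> BLOCK_BITS
    return out, carry

def split_blocks(stream, n_blocks):
    """Cut a stream into n_blocks blocks, lowest positions first."""
    mask = (1 << BLOCK_BITS) - 1
    return [(stream >> (i * BLOCK_BITS)) & mask
            for i in range(n_blocks)]

def join_blocks(blocks):
    """Reassemble blocks into a single unbounded stream."""
    return sum(b << (i * BLOCK_BITS) for i, b in enumerate(blocks))

stream = 0b1000_0000_1000_0001   # markers at positions 0, 7 and 15
blocks = split_blocks(stream, 2)
out, carry = advance_blockwise(blocks)
result = join_blocks(out) | (carry << (2 * BLOCK_BITS))
assert result == advance(stream)
\end{verbatim}

The final assertion checks that both forms produce the same stream, including the marker that
crosses the block boundary at position 7 and the one carried out of the last block.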