\section{The Parabix XML Parser}
\label{section:parser}

\begin{figure}[h]
\begin{center}
\includegraphics[width=1\textwidth]{plots/parabix_arch.pdf}
\end{center}
\caption{Parabix XML Parser Structure}
\label{parabix_arch}
\end{figure}

This section describes the implementation of the Parabix XML parser
using the framework explained in Section \ref{section:parabix}.
Figure \ref{parabix_arch} shows its overall structure, configured for
well-formedness checking.  The input file is
processed by 11 functions organized into 7 modules.  In the first
module, the Read\_Data function loads data blocks from the input file
into data\_buffer.  The data is then transposed into eight parallel basis
bitstreams (basis\_bits) in the Transposition module.  The eight basis
bitstreams are used by the Classification function to generate all the
XML lexical item streams (lex), and by the U8\_Validation module
to validate UTF-8 characters.  The lexical item streams, together with
the scope streams (scope) generated by the Gen\_Scope function, are supplied
to the parsing module, which consists of three functions: Parse\_CtCDPI,
Parse\_Ref and Parse\_tag.  These functions parse
comments, CDATA sections, processing instructions, references and
tags.  The Name\_Validation and Err\_Check functions then gather the
results, producing name check streams and error streams.
These are passed to the final module for Postprocessing, which checks
all the possible errors that cannot be conveniently detected using
bitstreams.  The final output reports any
well-formedness error detected and its position within the input file.

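The transposition step described above can be illustrated with a minimal
Python sketch.  This is not the Parabix run-time library code, which is
written for SIMD hardware; it is a scalar model that uses Python's
arbitrary-precision integers to stand in for bitstreams.  The function
names \texttt{transpose} and \texttt{untranspose} are hypothetical, and the
bit-ordering convention (stream 0 holds the most significant bit of each
byte, and bit $i$ of a stream corresponds to byte $i$ of the input) is an
assumption made for this illustration.

```python
def transpose(data):
    """Split a byte sequence into eight basis bitstreams.

    basis[k] collects bit k of every byte, where k = 0 is the most
    significant bit (an assumed convention).  Bit i of each returned
    integer corresponds to byte i of the input.
    """
    basis = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << pos
    return basis


def untranspose(basis, length):
    """Rebuild the original bytes from eight basis bitstreams."""
    out = bytearray(length)
    for pos in range(length):
        byte = 0
        for k in range(8):
            # Stream 0 is shifted in first, so it ends up as the top bit.
            byte = (byte << 1) | ((basis[k] >> pos) & 1)
        out[pos] = byte
    return bytes(out)


if __name__ == "__main__":
    text = b"<a>"
    streams = transpose(text)
    for k, stream in enumerate(streams):
        print("basis[%d] = %s" % (k, format(stream, "03b")))
    # The transposition loses no information, so the input can be rebuilt.
    print(untranspose(streams, len(text)))
```

Because the transposition is lossless, every later stage can work on the
eight basis bitstreams alone without consulting the byte buffer.
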
Within this structure, all functions in the four shaded modules
consist entirely of parallel bitstream operations.  Of these, the
Classification function consists of XML character class definitions
that are generated using ccc, while much of U8\_Validation
similarly consists of UTF-8 byte class definitions that are also
generated by ccc.  The remainder of these functions are programmed
in our unbounded bitstream language, following the logical
requirements of XML parsing.  All the functions in the four shaded
modules are then compiled to low-level C/C++ code using our Pablo
compiler.  This code is then linked with the general Transposition
code available in the Parabix run-time library, as well as the
hand-written Postprocessing code that completes the well-formedness
checking.
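
To give a sense of what a character class definition built purely from
bitstream operations looks like, the following Python sketch computes a
lexical item stream for a single byte value by combining the eight basis
bitstreams with bitwise AND and NOT.  It is an illustration only and does
not reproduce the output of ccc or the Pablo compiler.  The helper names
\texttt{transpose} and \texttt{match\_byte} are hypothetical, and Python
integers are used as a stand-in for unbounded bitstreams, with bit $i$ of
a stream corresponding to byte $i$ of the input (an assumed convention).

```python
def transpose(data):
    """Scalar model of transposition: basis[k] holds bit k (0 = most
    significant) of every input byte; bit i corresponds to byte i."""
    basis = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << pos
    return basis


def match_byte(basis, byte, length):
    """Return a bitstream with bit i set where input byte i equals `byte`.

    Only AND and NOT on whole bitstreams are used: each basis stream is
    taken as-is where `byte` has a 1 bit and complemented where it has
    a 0 bit, and all eight terms are ANDed together.
    """
    mask = (1 << length) - 1   # limit the result to the input length
    result = mask
    for k in range(8):
        if (byte >> (7 - k)) & 1:
            result &= basis[k]
        else:
            result &= ~basis[k]
    return result & mask


if __name__ == "__main__":
    text = b"<a><b/>"
    basis = transpose(text)
    lt = match_byte(basis, ord("<"), len(text))
    gt = match_byte(basis, ord(">"), len(text))
    # Print the streams with position 0 on the left to line up with text.
    print(text.decode("ascii"))
    print(format(lt, "07b")[::-1])
    print(format(gt, "07b")[::-1])
```

Each loop iteration operates on a whole stream at once, so in this model
the cost of classifying one character value does not depend on a
per-byte branch; a real class definition would typically share common
subexpressions across many such classes.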