% source: docs/HPCA2012/03b-research.tex @ 1382

\section{The Parabix XML Parser}
\label{section:parser}

\begin{figure}[h]
\begin{center}
\includegraphics[width=1\textwidth]{plots/parabix_arch.pdf}
\end{center}
\caption{Parabix XML Parser Structure}
\label{parabix_arch}
\end{figure}

Figure \ref{parabix_arch} shows the overall structure of the Parabix
XML parser configured for well-formedness checking.  The input file is
processed by 11 functions organized into 7 modules.  In the first
module, the Read\_Data function loads blocks of data from the input file
into data\_buffer.  The Transposition module then transposes the data
into eight parallel basis bitstreams (basis\_bits).  These eight
bitstreams are used by the Classification function to generate all the
XML lexical item streams (lex), and by the U8\_Validation module
to validate UTF-8 characters.  The lexical item streams, together with
the scope streams (scope) generated by the Gen\_Scope function, are
supplied to the parsing module, which consists of three functions:
Parse\_CtCDPI, Parse\_Ref and Parse\_tag.  These functions handle the
parsing of comments, CDATA sections, processing instructions, references
and tags.  The Name\_Validation and Err\_Check functions then gather
the resulting information, producing name check streams and error
streams, which are passed to the final module, Postprocessing.  Any
errors that cannot conveniently be detected with bitstream operations
are checked in this last module.  The final output reports any
well-formedness error detected and its position within the input file.
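
As a rough illustration of the first steps of this pipeline, the
following Python sketch models each unbounded bitstream as an
arbitrary-precision integer whose bit $i$ corresponds to byte $i$ of
the input.  It transposes bytes into eight basis bitstreams one byte at
a time (rather than with the SIMD transposition of the Parabix run-time
library), derives lexical item streams for the two angle-bracket bytes
(0x3C and 0x3E) as boolean formulas over the basis bits, in the style
of generated character class definitions, and advances a stream by one
position.  The function names, the stream names LAngle and RAngle, and
the convention that basis bit 0 is the most significant bit of each
byte are assumptions made for this sketch only.

```python
# Sketch only: names and bit conventions are assumptions, not Parabix code.
# An unbounded bitstream is a Python int; bit i corresponds to input byte i.


def transpose(data):
    """Split each byte into eight basis bitstreams; basis[0] holds the most
    significant bit of every byte and basis[7] the least significant."""
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << i
    return basis


def classify(basis, n):
    """Lexical item streams for '<' (0x3C) and '>' (0x3E) written as boolean
    formulas over the basis bits."""
    b = basis
    mask = (1 << n) - 1  # complements must stay within the n input positions
    # 0x3C = 00111100 and 0x3E = 00111110 differ only in basis bit 6.
    common = ~b[0] & ~b[1] & b[2] & b[3] & b[4] & b[5] & ~b[7] & mask
    return {"LAngle": common & ~b[6], "RAngle": common & b[6]}


def advance(stream):
    """Move every marker forward one position (toward later input bytes)."""
    return stream << 1


def positions(stream):
    """List the input positions whose bit is set in the stream."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


doc = b"<a><bc>"
lex = classify(transpose(doc), len(doc))
print(positions(lex["LAngle"]))           # [0, 3]
print(positions(lex["RAngle"]))           # [2, 6]
print(positions(advance(lex["LAngle"])))  # [1, 4]
```

Because a Python integer can hold any number of bits, each stream here
covers the whole input at once; the mask in classify is needed only
because complementing a Python integer sets infinitely many high bits.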
Within this structure, all functions in the four shaded modules
consist entirely of parallel bit stream operations.  Of these, the
Classification function consists of XML character class definitions
generated using ccc, and much of U8\_Validation similarly consists of
UTF-8 byte class definitions that are also generated by ccc.  The rest
of the code in these functions is written in our unbounded bitstream
language, following the logical requirements of XML parsing.  All the
functions in the four shaded modules are then compiled to low-level
C/C++ code by our Pablo compiler.  This code is then linked with the
general Transposition code available in the Parabix run-time library,
as well as the hand-written Postprocessing code that completes
well-formedness checking.
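
Compiling unbounded bitstream code to low-level code means processing
the input one fixed-width block at a time, and operations that move
markers forward can cross block boundaries.  The Python sketch below
illustrates the carry bookkeeping this requires, using an artificially
small 8-bit block.  It is our own illustration, not the code Pablo
generates: the function names are hypothetical, the class stream of
letters is computed directly from bytes for brevity, and the
ScanThru-style operation is computed as $(M + C)$ masked with the
complement of $C$, where $M$ marks starting positions and $C$ is a
character class stream.

```python
# Sketch only: hypothetical names; an 8-bit block is used so that carries
# between blocks are easy to see.
BLOCK = 8
MASK = (1 << BLOCK) - 1


def scan_thru(markers, cls):
    """Unbounded form: move each marker to the first position past the run
    of class positions that starts at the marker."""
    return (markers + cls) & ~cls


def split_blocks(stream, nblocks):
    """Cut an unbounded stream into fixed-width blocks, lowest positions first."""
    return [(stream >> (j * BLOCK)) & MASK for j in range(nblocks)]


def join_blocks(blocks):
    """Reassemble fixed-width blocks into one unbounded stream."""
    return sum(b << (j * BLOCK) for j, b in enumerate(blocks))


def advance_blocks(blocks):
    """Block-at-a-time Advance: the bit shifted out of the top of one block
    is carried into the bottom of the next."""
    carry, out = 0, []
    for b in blocks:
        out.append(((b << 1) | carry) & MASK)
        carry = b >> (BLOCK - 1)
    return out


def scan_thru_blocks(marker_blocks, cls_blocks):
    """Block-at-a-time ScanThru: the carry out of each fixed-width addition
    is added into the next block."""
    carry, out = 0, []
    for m, c in zip(marker_blocks, cls_blocks):
        total = m + c + carry
        carry = total >> BLOCK  # 0 or 1
        out.append(total & ~c & MASK)
    return out


doc = b"<element attr='x'/>"
n = (len(doc) + BLOCK - 1) // BLOCK  # 3 blocks of 8 positions
lt = sum(1 << i for i, ch in enumerate(doc) if ch == ord("<"))
letters = sum(1 << i for i, ch in enumerate(doc) if chr(ch).isalpha())

# Tag name: start just after '<', then scan through the letters of the name.
starts = advance_blocks(split_blocks(lt, n))
ends = join_blocks(scan_thru_blocks(starts, split_blocks(letters, n)))
reference = scan_thru(lt << 1, letters) & ((1 << (n * BLOCK)) - 1)
print(ends == reference, ends.bit_length() - 1)  # True 8
```

Here the name runs from position 1 to position 7, so the addition in
the first block overflows; without the carry variable, the marker at
position 8 (the space after the name) would be lost.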