\section{The Parabix XML Parser}
\label{section:parser}
\subsection{Parser Structure}

\begin{figure}[h]
\begin{center}
\includegraphics[width=1\textwidth]{plots/parabix_arch.pdf}
\end{center}
\caption{Parabix2 Structure}
\label{parabix_arch}
\end{figure}

Figure~\ref{parabix_arch} shows the overall structure of the Parabix XML parser set up for
well-formedness checking.
The input file is processed using 11 functions organized into 7 modules.
In the first module, the Read\_Data function loads data blocks from an input file into data\_buffer.
The data is then transposed into eight parallel basis bitstreams (basis\_bits) in the Transposition module.
The eight bitstreams are used in the Classification function to generate all the XML lexical item streams (lex)
as well as in the U8\_Validation module to validate UTF-8 characters.
The lexical item streams, together with the scope streams (scope) generated by the Gen\_Scope function,
are supplied to the parsing module, which consists of three functions: Parse\_CtCDPI, Parse\_Ref and Parse\_tag.
These functions parse
comments, CDATA sections, processing instructions, references and tags.  After this,
the Name\_Validation and Err\_Check functions gather information, producing
name check streams and error streams.  These are then passed to the final module, Postprocessing,
which checks all errors that cannot be conveniently detected using bitstreams.
The final output reports any well-formedness error detected and its position within the input file.

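The transposition step described above can be illustrated with a minimal sketch.
It is a bit-at-a-time model written in Python, not the SIMD code used by the Parabix run-time library.
The function names \texttt{transpose} and \texttt{inverse\_transpose} are hypothetical, and the sketch
assumes that basis bitstream 0 holds the most significant bit of each byte and that bit position $i$
of each stream corresponds to input byte $i$.

```python
from typing import List


def transpose(data: bytes) -> List[int]:
    """Model of transposition: split a byte stream into eight bitstreams.

    Assumption (not from the paper): stream k holds bit k of every byte,
    with k = 0 the most significant bit, and bit position i of each
    Python integer corresponds to input byte i.
    """
    basis_bits = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis_bits[k] |= 1 << pos
    return basis_bits


def inverse_transpose(basis_bits: List[int], length: int) -> bytes:
    """Rebuild the original bytes from the eight bitstreams."""
    out = bytearray(length)
    for pos in range(length):
        for k in range(8):
            if (basis_bits[k] >> pos) & 1:
                out[pos] |= 1 << (7 - k)
    return bytes(out)


if __name__ == "__main__":
    streams = transpose(b"<doc/>")
    # The transposition loses no information, so the round trip is exact.
    print(inverse_transpose(streams, 6))
```

Python integers are unbounded, so each stream can be held in a single integer regardless of input length.
A production implementation would instead process fixed-size blocks in SIMD registers.
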
Within this structure, all functions in the four shaded modules consist entirely of parallel bitstream
operations.  Of these, the Classification function consists of XML character class definitions that
are generated using ccc, while much of U8\_Validation similarly consists of UTF-8 byte class
definitions that are also generated by ccc.  The remainder of these functions are programmed in
our unbounded bitstream language following the logical requirements of XML parsing.  All the functions
in the four shaded modules are then compiled to low-level C/C++ code using our Pablo compiler.  This
code is then linked with the general Transposition code available in the Parabix run-time library,
as well as the hand-written Postprocessing code that completes the well-formedness checking.
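
To give a flavour of what a character class definition over the basis bitstreams looks like,
the following sketch writes two such definitions by hand in Python, using unbounded integers as bitstreams.
It is not output of ccc and does not use Pablo syntax; the stream names \texttt{LAngle} and \texttt{Digit},
the helper \texttt{transpose}, and the convention that basis bitstream 0 is the most significant bit
are all assumptions made for illustration.

```python
from typing import Dict, List


def transpose(data: bytes) -> List[int]:
    """Hypothetical helper: eight bitstreams, stream 0 = most significant bit."""
    basis_bits = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis_bits[k] |= 1 << pos
    return basis_bits


def classify(basis_bits: List[int], length: int) -> Dict[str, int]:
    """Compute two illustrative character class streams with bitwise logic only."""
    b0, b1, b2, b3, b4, b5, b6, b7 = basis_bits
    mask = (1 << length) - 1  # restrict negation to the input length

    def inv(stream: int) -> int:
        return ~stream & mask

    # High nibble 0x3 (bits 0011), shared by '<' and the ASCII digits.
    hi_3 = inv(b0) & inv(b1) & b2 & b3

    # '<' is 0x3C: high nibble 0011, low nibble 1100.
    langle = hi_3 & b4 & b5 & inv(b6) & inv(b7)

    # Digits are 0x30..0x39: low nibble below 10, i.e. not (b4 and (b5 or b6)).
    digit = hi_3 & inv(b4 & (b5 | b6))

    return {"LAngle": langle, "Digit": digit}


if __name__ == "__main__":
    text = b"<a1>9<"
    lex = classify(transpose(text), len(text))
    # Print each stream with position 0 on the left.
    for name, stream in lex.items():
        print(name, format(stream, "0{}b".format(len(text)))[::-1])
```

Each class is a Boolean equation evaluated once over whole streams, so every input position is classified
by the same handful of bitwise operations with no per-character branching.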