\section{Parabix}

\subsection{Parabix Architecture}
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/parabix_arch.pdf}
\end{center}
\caption{Parabix2 Architecture}
\label{parabix_arch}
\end{figure}

Figure~\ref{parabix_arch} shows the overall architecture of Parabix for well-formedness checking.
The input file is processed by 7 modules comprising 11 stages, and the error position, if any, is reported at the end.
The first stage, Read\_Data, loads a chunk of data from the input file into data\_buffer.
The data is then transposed into eight parallel basis bitstreams (basis\_bits) in the Transposition stage.
The eight bitstreams are used in the Classification stage to generate all the XML lexical item streams (lex),
as well as in the U8\_Validation stage to validate UTF-8 characters.
The lexical item streams, together with the scope streams (scope) generated in the Gen\_Scope stage,
are supplied to the parsing module, which consists of three stages: Parse\_CtCDPI, Parse\_Ref and Parse\_tag.
After the comments, CDATA sections, processing instructions, references and tags have been parsed,
the information is gathered by the Name\_Validation and Err\_Check stages,
where name streams and error streams are calculated and passed to the final stage, Postprocessing.
Errors that cannot be detected by bitstreams are checked in this last stage, and
the error type is reported together with its line and column number.
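
To make the Transposition and Classification stages more concrete, the following minimal Python sketch transposes a byte string into eight basis bit streams and derives one character-class stream from them.
The function names \texttt{transpose} and \texttt{match\_byte}, the sample input, and the convention that bit $i$ of a stream corresponds to byte position $i$ are assumptions made for illustration; this is not the Parabix implementation.

```python
def transpose(data: bytes) -> list[int]:
    """Return eight basis bit streams as Python integers.

    Stream k collects bit k (least significant first, an assumed convention)
    of every input byte; bit i of a stream corresponds to byte position i.
    """
    basis = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                basis[k] |= 1 << pos
    return basis


def match_byte(basis: list[int], value: int, length: int) -> int:
    """Character-class stream: bit i is 1 where input byte i equals `value`."""
    mask = (1 << length) - 1
    stream = mask
    for k in range(8):
        if (value >> k) & 1:
            stream &= basis[k]          # this bit of the byte must be 1
        else:
            stream &= ~basis[k] & mask  # this bit of the byte must be 0
    return stream


data = b"<a><b/></a>"
basis = transpose(data)
lt = match_byte(basis, ord("<"), len(data))
positions = [i for i in range(len(data)) if (lt >> i) & 1]
```

Here \texttt{lt} marks positions 0, 3 and 7, the three occurrences of \texttt{<} in the sample input, and it is computed with logical operations on whole streams rather than a per-character comparison.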

\subsection{Parallel Bit Stream Compilation}

While the description of parallel bit stream parsing in the previous section works conceptually on
unbounded bit streams, in practice a corresponding C implementation must process input streams in blocks
whose size equals the SIMD register width of the target processor. In our work, we leverage the unbounded
integer type of the Python programming language. Using a restricted subset of Python, we prototype and validate the
functionality of applications such as XML validation and UTF-8 to UTF-16 transcoding. We then compile this Python code
into equivalent block-at-a-time C code. The key question becomes how to transfer information from one block to the next whenever
token scans cross block boundaries.
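
As an illustration of prototyping on unbounded integers, the sketch below expresses a parallel scan with one addition and a few logical operations.
The definitions of \texttt{scanthru} and \texttt{scanto}, the helper \texttt{stream}, and the least-significant-bit-first position convention are assumptions chosen for this example and may differ from the actual Parabix prototype.

```python
def stream(text: str, ch: str) -> int:
    """Bit stream with bit i set where text[i] == ch (bit 0 = first character)."""
    return sum(1 << i for i, c in enumerate(text) if c == ch)


def scanthru(markers: int, through: int) -> int:
    """Move each marker forward past the run of 1 bits in `through` it sits on."""
    return (markers + through) & ~through


def scanto(markers: int, target: int, length: int) -> int:
    """Move each marker forward to the next position set in `target`.

    Markers that find no target within `length` positions are dropped.
    """
    mask = (1 << length) - 1
    return scanthru(markers, ~target & mask) & mask


text = "<ab> <cde>"
starts = stream(text, "<")                       # markers at positions 0 and 5
ends = scanto(starts, stream(text, ">"), len(text))
end_positions = [i for i in range(len(text)) if (ends >> i) & 1]
```

Both markers advance in a single step: \texttt{end\_positions} is \texttt{[3, 9]}, the positions of the two closing \texttt{>} characters.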

The answer lies in carry bit propagation. Since the parallel $scanto$ operation relies solely on bit-wise addition and logical operations,
block-to-block information transfer is captured in its entirety by the carry bit associated with each underlying addition operation. Logical operations
do not require information flow across block boundaries. Properly determining, initializing and inserting carry bits into a block-by-block
implementation is tedious and error-prone. We have therefore developed compiler technology that automatically transforms parallel bit stream
Python code into block-at-a-time C implementations. Details are beyond the scope of this paper, but are described in the online
source code repository at parabix.costar.sfu.ca.
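
The role of the carry bit can be sketched in a few lines of Python that emulate block-at-a-time processing.
This is a hand-written illustration, not output of the compiler described above: the block width \texttt{BLOCK\_BITS}, the function names, and the scan definition \texttt{(markers + through) \& \textasciitilde through} are assumptions, with an 8-bit block standing in for a SIMD register.

```python
BLOCK_BITS = 8  # hypothetical block width; a real target would use its SIMD register width


def scanthru_unbounded(markers: int, through: int) -> int:
    """Reference version operating on unbounded integers."""
    return (markers + through) & ~through


def scanthru_blocks(markers: int, through: int, length: int) -> int:
    """Same scan, one block at a time, passing the carry bit between blocks."""
    block_mask = (1 << BLOCK_BITS) - 1
    carry = 0   # carry-in for the first block
    result = 0
    for base in range(0, length, BLOCK_BITS):
        m = (markers >> base) & block_mask
        t = (through >> base) & block_mask
        total = m + t + carry
        carry = total >> BLOCK_BITS  # carry-out of this block's addition (0 or 1)
        # The logical operations need nothing from neighbouring blocks.
        result |= ((total & block_mask) & ~t) << base
    return result


length = 14                 # two 8-bit blocks, the second partially used
markers = 1 << 0            # one marker at position 0
through = (1 << 13) - 1     # scan through positions 0..12
whole = scanthru_unbounded(markers, through)
blocked = scanthru_blocks(markers, through, length)
```

Both versions place the marker at position 13. The scan crosses the block boundary at position 8, and the only information handed from the first block to the second is the single carry bit; if it were dropped, the marker would be lost.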