# Changeset 1331 for docs/HPCA2012/03b-research.tex

Timestamp:
Aug 20, 2011, 7:51:52 PM
Message:

section 4

File:
1 edited

\section{Parabix2}
Talk about where Parabix1 may be improved.

\section{Parabix}
Talk about compiler effort.
Talk about usage of new SIMD instructions.
Describe Parabix 2.
Describe differences between Parabix1 and Parabix2. Why is Parabix2 better?

\subsection{Parabix Architecture}
\begin{figure}
\begin{center}
\includegraphics[width=0.5\textwidth]{plots/parabix_arch.pdf}
\end{center}
\caption{Parabix2 Architecture}
\label{parabix_arch}
\end{figure}

Figure~\ref{parabix_arch} shows the overall architecture of Parabix for well-formedness checking.
The input file is processed by seven modules (11 stages in total), and the position of any error is reported at the end.
The first stage, Read\_Data, loads a chunk of data from the input file into data\_buffer.
The data is then transposed into eight parallel basis bitstreams (basis\_bits) in the Transposition stage.
The eight bitstreams are used in the Classification stage to generate all the XML lexical item streams (lex), and in the U8\_Validation stage to validate UTF-8 characters.
The lexical item streams, together with the scope streams (scope) generated in the Gen\_Scope stage, are supplied to the parsing module, which consists of three stages: Parse\_CtCDPI, Parse\_Ref and Parse\_tag.
After the comments, CDATA sections, processing instructions, references and tags have been parsed, the Name\_Validation and Err\_Check stages gather the results, computing name streams and error streams that are passed to the final stage, Postprocessing.
This last stage checks for all errors that cannot be detected with bitstreams and reports the error type together with its line and column number.

\subsection{Parallel Bit Stream Compilation}

While the description of parallel bit stream parsing in the previous section works conceptually on unbounded bit streams, in practice a corresponding C implementation must process input streams in blocks whose size equals the SIMD register width of the target processor.
In our work, we leverage the unbounded integer type of the Python programming language.
Using a restricted subset of Python, we prototype and validate the functionality of applications such as XML validation and UTF-8 to UTF-16 transcoding.
We then compile this Python code into equivalent block-at-a-time C code.
The key question is how to transfer information from one block to the next whenever token scans cross block boundaries.

The answer lies in carry bit propagation.
Since the parallel $scanto$ operation relies solely on bit-wise addition and logical operations, block-to-block information transfer is captured entirely by the carry bit associated with each underlying addition operation.
Logical operations do not require information flow across block boundaries.
Properly determining, initializing and inserting carry bits into a block-by-block implementation is tedious and error-prone.
We have therefore developed compiler technology to automatically transform parallel bit stream Python code into block-at-a-time C implementations.
Details are beyond the scope of this paper, but are described in the on-line source code repository at parabix.costar.sfu.ca.
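
The role of the carry bit can be illustrated with a minimal Python sketch. It is not the output of the compiler described above, and every name in it (\texttt{char\_stream}, \texttt{scan\_to}, \texttt{scan\_to\_blocks}, \texttt{positions}) is hypothetical. The sketch assumes that bit $i$ of an integer represents stream position $i$, that a scan can be written as an addition followed by a mask, and that an 8-bit block stands in for the SIMD register width. The unbounded version serves as the reference; the block-at-a-time version should produce the same result provided the carry out of each block's addition is fed into the next block.

```python
def char_stream(text, chars):
    """Bitstream with bit i set when text[i] is one of `chars` (bit 0 = first char)."""
    return sum(1 << i for i, ch in enumerate(text) if ch in chars)


def positions(bits):
    """List the positions of the set bits, lowest first."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def scan_to(markers, targets, nbits):
    """Unbounded reference: advance each marker to the next target position."""
    mask = (1 << nbits) - 1
    non_targets = ~targets & mask
    # Adding a marker to a run of non-target ones ripples a carry up to the target.
    return (markers + non_targets) & targets & mask


def scan_to_blocks(markers, targets, nbits, block_size=8):
    """Block-at-a-time version: only the addition carry crosses block boundaries."""
    block_mask = (1 << block_size) - 1
    carry = 0  # carry-in for the first block
    result = 0
    for offset in range(0, nbits, block_size):
        m = (markers >> offset) & block_mask
        c = (targets >> offset) & block_mask
        total = m + (~c & block_mask) + carry
        carry = total >> block_size  # 0 or 1, passed to the next block
        result |= (total & c) << offset  # the logical AND needs no carry
    return result


text = "<element attr='x'>text</el>"
markers = char_stream(text, "<")   # scan start positions
targets = char_stream(text, ">")   # scan stop positions

reference = scan_to(markers, targets, len(text))
blockwise = scan_to_blocks(markers, targets, len(text))
print(positions(reference), positions(blockwise))  # [17, 26] [17, 26]
```

In this example the scan that starts at position 0 ends at position 17, so it crosses two 8-bit block boundaries; resetting \texttt{carry} to zero at the top of each iteration would lose that marker, which is the kind of error the automatic transformation is meant to rule out.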