Changeset 1354 for docs/HPCA2012


Ignore:
Timestamp:
Aug 23, 2011, 5:25:41 PM (8 years ago)
Author:
ksherdy
Message:

first revision of section 3

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/HPCA2012/03-research.tex

    r1331 r1354  
    11\section{Parabix}
    22\label{section:parabix}
     3
    34%Describe key technology behind Parabix
    45%Introduce SIMD;
     
    2324% This section focuses on the
    2425
    25 
    26 % With this method, byte-oriented character data is first transposed to eight parallel bit streams, one for each bit position within the character code units (bytes). These bit streams are then loaded into SIMD registers of width $W$ (e.g., 64-bit, 128-bit, 256-bit, etc). This allows $W$ consecutive code units to be represented and processed at once. Bitwise logic and shift operations, bit scans, population counts and other bit-based operations are then used to carry out the work in parallel \cite{CameronLin2009}.
     26% With this method, byte-oriented character data is first transposed to eight parallel bit streams, one for each bit position
     27% within the character code units (bytes). These bit streams are then loaded into SIMD registers of width $W$ (e.g., 64-bit,
     28% 128-bit, 256-bit, etc). This allows $W$ consecutive code units to be represented and processed at once. Bitwise logic and
     29% shift operations, bit scans, population counts and other bit-based operations are then used to carry out the work in parallel
     30% \cite{CameronLin2009}.
    2731
    2832% The results of \cite{CameronHerdyLin2008} showed that Parabix, the predecessor of Parabix2, was dramatically faster than both Expat 2.0.1 and Xerces-C++ 2.8.0.
     
    3236% Hence, one of the foci in our study is the manner in which straight line SIMD code influences energy usage.
    3337
    34 This section provides an overview of the SIMD-based parallel bit stream XML parsers, Parabix1 and Parabix2. A comprehensive study of Parabix2 can be found in the technical report ``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}.
    35 
    36 \subsection{Parabix1}
    37 
    38 % Our first generation parallel bitstream XML parser---Parabix1---uses employs a less conventional approach of SIMD technology to represent text in parallel bitstreams. Bits of each stream are in one-to-one-correspondence with the bytes of a character stream. A transposition step first transforms sequential byte stream data into eight basis bitstreams for the bits of each byte.
    39 
    40 Parabix1 processes source XML in a functionally equivalent manner as a traditional recursive descent XML parser. That is, Parabix1 moves sequentially through the source document, maintains a single parser cursor position, and parsers recursively and depth-first. Where Parabix1 differs from the traditional parser is that it scans for key markup characters using a series of bit streams. A bit stream is simply a sequence of $0$s and $1$s. A $1$-bit marks the postion of each key character in the corresponding source data stream. A single stream is generated for each of the key markup characters.
    41 
    42 In Parabix1, basis bit streams are used to generate character-class streams for key markup characters. Basis bit streams are defined as the set of bit streams that represent the transposed data format of the source XML byte data. In other words, $M$-bit source characters are represented in transposed representation using $M$ basis bit streams. Figure \ref{fig:BitstreamsExample} presents an example of the basis bit stream representation of 8-bit ASCII characters. $B_0 \ldots B_7$ are the individual bit streams. The $0$ bits in the bit streams are represented by periods as to emphasize the $1$ bits.
     38This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix}
     39In Section {\bf 4}, we discuss one of its implementations, \emph{Parabix-XML}.
     40A more comprehensive study of Parabix and Parabix-XML, can be found in the technical report
     41``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}.
     42
     43The fundemental difference between the Parabix framework and traditional text processing models is in how
     44Parabix represents the source data.
     45Rather than viewing it as a stream of bytes or characters and performing per-byte comparisons to
     46determine when to transition between parsing states, Parabix views data as a set of \emph{bit streams} and uses a mixture of
     47boolean-logic and integer-based math on those streams to effectively parse many bytes in parallel.
     48In this paper, we use the notation $\land$, $\lor$ and $\lnot$ to denote the AND, OR and NOT boolean operations.
     49Bit streams are the foundation of the Parabix technology. In order to understand how it is possible to
     50perform complex state-transition-based text processing operations in parallel, understanding what bit streams are,
     51how they are created, and --- more importantly --- how they are used by Parabix, is critical.
     52
     53% Our first generation parallel bitstream XML parser---Parabix1---uses employs a less conventional approach of
     54% SIMD technology to represent text in parallel bitstreams. Bits of each stream are in one-to-one-correspondence
     55% with the bytes of a character stream. A transposition step first transforms sequential byte stream data into
     56% eight basis bitstreams for the bits of each byte.
     57
     58% Parabix1 is functionally equivalent to a traditional XML parser.
     59% That is, Parabix1 moves a single cursor sequentially through the source document and
     60% parses it to distinguish between (and properly validate) the markup and content.
     61% Where Parabix1 differs from traditional parsers is that instead of processing a document a byte-at-a-time, it
     62% transforms the document into a set of \emph{bit streams} (explained shortly) and then performs boolean-logic operations
     63% on those bit streams to process many bytes in parallel.
     64
     65
     66%Each $1$-bit marks the postion of each key character in the
     67%corresponding source data stream. A single stream is generated for each of the key markup characters.
     68
     69\subsection{What are Bit Streams?}
     70
     71A bit stream is simply a sequence of $\tt 1$s and $\tt 0$s.
     72The significance of each bit value is dependent on the type of bit stream.
     73We view bit streams in one of two ways: $n$ 1-bit values for boolean-logic operations and 1 $n$-bit value for integer-based math.
     74For simplicity, assume that $n$ is infinitely long w.r.t. the size of the source data. In reality, each bit stream is divided into
     75a series of $w$-bit blocks, where $w$ is equal to the width of the SIMD registers within the system
     76(e.g., 64-bits for MMX, 128-bits for SSE/NEON, and 256-bits for AVX).
     77We discuss how these $\frac{n}{w}$ bit stream segments can be used as infinitely-long bit stream in Section \ref{parallel bit stream compilation}.
     78
     79The first type of bit streams used in Parabix are refered to as \emph{basis bit streams}, which contain the input source data.
     80Parabix uses the basis bit streams to construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense
     81of a significant character (or class of characters) in the parsing process.
     82Character-class bit streams are used to compute \emph{lexical bit streams} and \emph{error bit streams}, which
     83Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream.
     84
     85{\bf Basis Bit Streams:}
     86To construct the basis bit streams, the source data is first loaded in
     87sequential order and then transposed --- through a series of SIMD pack, shift, and bitwise operations ---
     88so that Parabix can efficiently produce the character-class bit streams.
     89Essentially, when the source data is in basis bit stream form, the $k$-th bit of $i$-th character in the source
     90text is in the $i$-th (bit) position of the $k$-th basis bit stream, $b_{k}$.
     91Using the SIMD capabilities of current commodity processors, the transposition process incurs an amortized cost
     92of approximately 1 cycle per byte \cite{CameronHerdyLin2008}.
     93The size of $k$ is dependent on the code unit size of the text encoding format of the source document. A code unit is simply
     94a fixed number of bits (or bytes) used to represent a specific character (code point). Some encoding formats, such
     95as UNICODE, may use a multiple code units to express all of its possible code points.
     96How multi-code-unit characters can be parsed efficiently goes beyond the scope of this paper.
     97The most dominant format in data-oriented XML documents is ASCII, which uses a 8-bit code unit to represent all of the
     98128 code points within it.
     99In Figure \ref{fig:BitstreamsExample}, we show how the ASCII string ``{\ttfamily b7\verb`<`A}'' is represented as 8 basis bit
     100streams, $\tt b_{0 \ldots 7}$. The bits used to construct $\tt b_7$ have been highlighted in this example.
    43101
    44102\begin{figure}[h]
     103
    45104\begin{center}
    46 \begin{tabular}{lr}\\
    47 source data & \verb`<t1>abc</t1><tag2/>`\\
    48 $B_0$ & \verb`..1.1.1.1.1...11.1.`\\
    49 $B_1$ & \verb`...1.11.1..1...1111`\\
    50 $B_2$ & \verb`11.1...111.111.1.11`\\
    51 $B_3$ & \verb`1..1...11..11....11`\\
    52 $B_4$ & \verb`1111...1.11111..1.1`\\
    53 $B_5$ & \verb`1111111111111111111`\\
    54 $B_6$ & \verb`.1..111..1...111...`\\
    55 $B_7$ & \verb`...................`\\
     105\begin{tabular}{r c c c c }
     106STRING & \ttfamily{b} & \ttfamily{7} & \ttfamily{\verb`<`} & \ttfamily{A} \\
     107ASCII & \ttfamily{\footnotesize 0110001{\bfseries 0}} & \ttfamily{\footnotesize 0011011{\bfseries 1}} & \ttfamily{\footnotesize 0011110{\bfseries 0}} & \ttfamily{\footnotesize 0100000{\bfseries 1}} \\
     108\hline
    56109\end{tabular}
    57110\end{center}
    58 \caption{Example 8-bit ASCII Character Basis Bit Streams}
     111\begin{center}
     112\begin{tabular}{r |c |c |c |c |c |c |c |c |}
     113 & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{0}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{1}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{2}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{3}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{4}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{5}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{6}$}$ & $\mbox{\fontsize{11}{11}\selectfont $\tt b_{7}$}$ \\
     114 & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \bfseries\ttfamily{0} \\
     115 & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \bfseries\ttfamily{1} \\
     116 & \ttfamily{0} & \ttfamily{0} & \ttfamily{1} & \ttfamily{1} & \ttfamily{1} & \ttfamily{1} & \ttfamily{0} & \bfseries\ttfamily{0} \\
     117 & \ttfamily{0} & \ttfamily{1} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \ttfamily{0} & \bfseries\ttfamily{1} \\
     118\end{tabular}
     119\end{center}
     120
     121
     122\caption{Example 7-bit ASCII Basis Bit Streams}
    59123\label{fig:BitstreamsExample}
    60124\end{figure}
    61125
    62 To transform byte-oriented character data to parallel bit stream representation, source data is first loaded into SIMD registered in sequential order. It is then converted to the transposed basis bit stream representation through a series of parallel SIMD pack, shift, and logical bitwise operations. Using the SIMD capabilities of current commodity processors, the transposition of source data to basis bit stream format incurs an amortized cost of approximately 1 cycle per byte \cite{CameronHerdyLin2008}.
    63 
    64 Throughout the XML parsing process we must identify key XML characters. For example, the opening angle bracket character `<'. For this purpose, we combine the basis bit streams using bitwise logic and compute character-class bit streams. For example, the $j$-th character is an open angle bracket `<' if and only if the $j$-th bit of $B_2, B_3, B_4, B_5 =$ 1 and the $j$-th bit of $B_0, B_1, B_6, B_7 =$ 0. Character-class streams mark the positions of source characters as a single $1$-bit. Each bit position in the computed bit stream is in one-to-one correspondence with its source byte position.  Once generated, single cycle built-in {\em bitscan} operations are used to locate the positions of key XML characters throughout the parsing process. Utilizing $M$ SIMD registers of width $W$, it is possible to scan through $W$ characters in parallel. The register width $W$ is processor dependent and ranges from 64-bit for MMX, to 128-bit for SSE, and 256-bit for AVX.
    65 
    66 A common operation in XML parsing is XML start tag validation. Starts tags begin with `<' and end with either ``/>'' or ``>'' (depending on whether the element tag is an empty element tag or not, respectively). Figure \ref{fig:Parabix1StarttagExample} conceptually demonstrates start tag validation as performed in Parabix1 using character-class streams together with the processor built-in $bitscan$ operation. We proceeed as follows. The first bit stream $M_0$ is created and the parser begins scanning the source data for an open angle bracket `<', starting at position 1. Since the source data begins with `<', $M_0$ is assigned a cursor position of 1. The $advance$ operation then shifts $M_0$'s cursor position by 1, resulting in the creation of a new bit stream, $M_1$, with the cursor position at 2. The following $bitscan$ operation takes the cursor position from $M_1$ and sequentially scans every position until it locates either an `>'. It finds a `>' at position 4 and returns that as the new cursor position for $M_2$. Calculating $M_3$ advances the cursor again, and the $bitscan$ used to create $M_4$ locates the new opening angle bracket. This process continues in sequence until until all start tags are validated. Unlike traditional parsers, these sequential operations are accelerated significantly since the {\em bitscan} operation can skip up to $w$ positions, where $w$ is the processor word width in bits. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, Herdy2008, Cameron2009}.
     126
     127% In Parabix1, basis bit streams are used to generate \ref{lexical bit streams} for key markup characters.
     128% Basis bit streams
     129% are defined as the set of bit streams that represent the transposed data format of the source XML byte data. In other
     130% words, $M$-bit source characters are represented in transposed representation using $M$ basis bit streams.
     131% Figure \ref{fig:BitstreamsExample} presents an example of the basis bit stream representation of 8-bit
     132% ASCII characters. $B_0 \ldots B_7$ are the individual bit streams. The $0$ bits in the bit streams are represented
     133% by periods as to emphasize the $1$ bits.
     134
     135{\bf Character-class Bit Streams:}
     136Typically, as text parsers process input data, they locate specific characters to determine if and when to transition between
     137data and metadata parsing.
     138% For example, in a CSV file, any `,' or `\textbackslash n' can indicate the start of a new column or row respectively.
     139For example, in XML, any opening angle bracket character, `\verb`<`', may indicate that we are starting a new markup tag.
     140Traditional byte-at-a-time parsers find these characters by comparing the value of each code point with a set
     141of known code points and branching appropriately when one is found, typically using an if or switch statement.
     142Using this method to perform multiple transitions in parallel is non-trivial and may require fairly sophisticated algorithms
     143to do so correctly.
     144% However, a `<' is legal within an XML comment so not every `<' necessarily means that we are opening a new tag.
     145
     146Character-class bit streams allow us to perform up to 128 ``comparisons'' in parallel in a single operation (using Intel SSE/ARM NEON)
     147by using a series of boolean-logic operations to merge multiple basis bit streams into
     148a single character-class stream that marks the positions of key characters with a $1$.
     149For example, a character is an `\verb`<`' if and only if $\lnot(b_ 0 \lor b_1) \land (b_2 \land b_3 \land b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1$.
     150Classes of characters can be found with similar formulas.
     151For example, a character is a number {\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$.
     152An important observation here is that a range of characters can sometimes take fewer operations and require fewer basis bit streams to compute
     153than individual characters.
     154Finding optimal solutions to all character-classes is non-trivial and goes beyond the scope of this paper.
     155
     156% The advantage of character-class bit streams over traditional ``byte streams'' is that
     157
     158% Rather than testing each byte individually,
     159% Parabix1 combines the basis bit streams using boolean-logic and computes a lexical bit stream.
     160
     161
     162{\bf Lexical and Error Bit Streams:}
     163To perform lexical analysis on the input data, Parabix computes lexical and error bit streams from the character-class bit streams using
     164a mixture of both boolean logic and integer math. Lexical bit streams typically mark multiple current parsing positions.
     165Unlike the single-cursor approach of traditional text parsers, these allow Parabix to process multiple cursors in parallel.
     166Error bit streams are often the byproduct or derivative of computing lexical bit streams and can be used to identify any well-formedness
     167issues found during the parsing process. The presense of a $\tt 1$ bit in an error stream tends to mean that the lexical stream cannot be
     168trusted to be completely accurate and Parabix may need to perform some sequential parsing on that section to determine the cause and severity
     169of the error.
     170
     171To form lexical bit streams, we have to introduce a few new operations: Advance and ScanThru.
     172The $\texttt{Advance}$ operator accepts one input parameter, $c$, which is typically viewed as a bit stream containing multiple cursor bits,
     173and returns $c + c$ --- effectively moves each cursor one position to the ``right''.
     174$\texttt{ScanThru}$ accepts two input parameters, $c$ and $m$; any bit that is in both $c$ and $m$ is moved to first subsequent
     175$\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$.
     176For example, in Figure \ref{fig:ParabixParsingExample} suppose we have the regular expression \verb`<[a-zA-Z]+>` and wish to find all
     177instances of it in the source text.
     178%We know from this statement that only strings starting with \verb`<` that contain at least one letter and end with a \verb`>` are matches.
     179We begin by constructing the character classes $C_{0}$, which consists of all letters, $C_{1}$, which contains all `\verb`>`'s, and
     180$C_{2}$, which marks all `\verb`<`'s. In $L_0$ the position of every `\verb`<`' is advanced by one to locate the first character of each
     181token. By computing $E_{0}$, the parser notes that ``\verb`<>`'' does not match the expected pattern. To find the end positions of each token,
     182the parser calculates $L_{1}$ by moving the cursors in $L_0$ through the letter bits in $C_0$. $L_1$ is then validated to ensure that each
     183token ends with a `\verb`>`' and discovers that ``\verb`<error]`'' too fails to match the expected pattern.
     184With additional post bit-stream processing, the erroneous cursor positions in $L_{0}$ and $L_{1}$ can be removed or ignored; the details
     185of which go beyond the scope of this paper.
    67186
    68187\begin{figure}[h]
     188
    69189\begin{center}
    70 \begin{tabular}{lr}\\
    71 source data                     & \verb`<t1>abc</t1><tag2/>`\\
    72 $M_0 = 1$                       & \verb`1..................`\\
    73 $M_1 = advance(M_0)$            & \verb`.1.................`\\
    74 $M_2 = bitscan('>')$            & \verb`...1...............`\\
    75 $M_3 = advance(M_2)$            & \verb`....1..............`\\
    76 $M_4 = bitscan('<')$            & \verb`.......1...........`\\
    77 $M_5 = advance(M_4)$            & \verb`........1..........`\\
    78 $M_6 = advance(M_5)$            & \verb`.........1.........`\\
    79 $M_7 = bitscan('<')$            & \verb`............1......`\\
    80 $M_{8} = advance(M_7)$  & \verb`.............1.....`\\
    81 $M_{9} = bitscan('/')$  & \verb`.................1.`\\
    82 $M_{10} = advance(M_{9})$       & \verb`..................1`\\
     190\begin{tabular}{lclr}
     191\multicolumn{3}{l}{source text} & \verb`<a><valid> <string>  <>ignored><error]` \\
     192$C_{0}$ & $=$ & $\texttt{[a-zA-Z]}$ & \verb`.1..11111...111111.....1111111..11111.` \\
     193$C_{1}$ & $=$ & $\texttt{[>]}$ & \verb`..1......1........1...1.......1.......` \\
     194$C_{2}$ & $=$ & $\texttt{[<]}$ & \verb`1..1.......1.........1.........1......` \\
     195$L_{0}$ & $=$ & $\texttt{Advance}(C_{2})$ & \verb`.1..1.......1.........1.........1.....` \\
     196$E_{0}$ & $=$ & $L_{0} \land \lnot C_{0}$ & \verb`......................1...............` \\
     197$L_{1}$ & $=$ & $\texttt{ScanThru}(L_{0}, C_{0})$ & \verb`..1......1........1...1..............1` \\
     198$E_{1}$ & $=$ & $L_{1} \land \lnot C_{1}$ & \verb`.....................................1` \\
    83199\end{tabular}
    84200\end{center}
    85 \caption{Parabix1 Start Tag Validation}
    86 \label{fig:Parabix1StarttagExample}
     201
     202
     203\caption{Lexical Parsing in Parabix}
     204\label{fig:ParabixParsingExample}
    87205\end{figure}
    88206
    89 \subsection{Parabix2}
    90 
    91 In Parabix2, the sequential single-cursor parsing approach using {\em bitscan} instructions is replaced by a parallel parsing approach, that uses multiple cursors when possible, and bit stream addition operations to advance multiple cursor positions in parallel.
    92 Unlike the single-cursor approach of Parabix1 (and conceptually of all sequential XML parsers),
    93 Parabix2 processes multiple cursors in parallel. For example, using the source data from
    94 Figure \ref{fig:Parabix1StarttagExample}, Figure \ref{fig:Parabix2StarttagExample} conceptually demonstrates the manner in which Parabix2 identifies and advances each of the start tag bit streams. Unlike Parabix1, Parabix2 begins scanning by creating two character-class bit streams, $N$, denoting the position of every alpha numeric character within the basis stream, and $M_0$, marking the position of every potential start tag in the bit stream. $M_0$ is advanced to create $M_1$, which is fed into the first $scanto$ operation along with $N$.  To handle variable length tag names, the $scanto$ operation effectively locates the cursor positions of the end tags in parallel by adding $M_1$ to $N$, and uses the bitwise AND operation of the negation of $N$ to find only the true end tags of $M_1$. Because an end tag may end on an `/' or '>', $scanto$ is called again to advance any cursor from `/' to `>'. For additional details, refer to the technical report \cite{Cameron2010}.
    95 
     207% For example, using the source data from Figure \ref{fig:Parabix1StarttagExample},
     208%
     209% Figure \ref{fig:Parabix2StarttagExample} conceptually demonstrates
     210% the manner in which Parabix2 identifies and advances each of the start tag bit streams. Unlike Parabix1, Parabix2
     211% begins scanning by creating two character-class bit streams, $N$, denoting the position of every alpha numeric
     212% character within the basis stream, and $M_0$, marking the position of every potential start tag in the bit stream.
     213% $M_0$ is advanced to create $M_1$, which is fed into the first $scanto$ operation along with $N$.  To handle variable
     214% length tag names, the $scanto$ operation effectively locates the cursor positions of the end tags in parallel by
     215% adding $M_1$ to $N$, and uses the $\land$ operation of $\lnot N$ to find only the true end tags of $M_1$.
     216% Because an end tag may end on an `/' or '>', $scanto$ is called again to advance any cursor from `/' to `>'. For additional
     217% details, refer to the technical report \cite{Cameron2010}.
     218
     219Using this parallel bit stream approach, conditional branch statements used to identify key positions and/or syntax errors at each
     220each parsing position are mostly eliminated, which, as Section XXX shows, minimizes branch misprediction penalties.
     221Accurate parsing and parallel lexical analysis is done through processor-friendly equations that requires no speculation nor multithreading.
     222
     223\subsection{Parallel Bit Stream Compilation}
     224\label {parallel bit stream compilation}
     225
     226While the description of parallel bit stream parsing in the previous section works conceptually on
     227infinitely-long bit streams, in practice, the implementation of a Parabix parser must process input streams as bit stream blocks
     228whose size is equal to the SIMD register width of the target processor.
     229In our work, we leverage the unbounded integer type of the Python programming language to verify our equations
     230against unbounded bit streams.
     231Using a restricted subset of Python, we prototype and validate the functionality of our applications,
     232such as CSV parsing, XML validation, and UTF-8 to UTF-16 transcoding, etc.
     233To transform the Python code into the equivalent block-at-a-time C/C++ code,
     234the key question becomes how to efficiently transfer information from one bit stream block to the next whenever the result of an operation
     235crosses a block boundary.
     236
     237The answer lies in carry bit propagation. Since the parallel ScanThru operation relies solely on bit-wise addition and logical operations,
     238block-to-block information transfer is captured in entirety by the carry bit associated with each underlying addition operation.
     239Boolean-logical operations do not pass information over block boundaries since each bit value is independent of the state of its surrounding bits.
     240Properly determining, initializing and inserting carry bits into a block-by-block implementation is tedious and error prone.
     241Thus we have developed compiler technology to automatically transform parallel bit stream
     242Python code to block-at-a-time C/C++ implementations. Details are beyond the scope of this paper, but are described in the on-line
     243source code repository at {\tt http://parabix.costar.sfu.ca}.
     244
     245\subsection{Parabix Tool Chain}
     246
     247To support the Parabix framework, we are developing tool technology to automate various aspects
     248of parallel bit stream programming. At present, our tool chain consists of two compilers: a character class
     249compiler (ccc) and a unbounded bit stream to C/C++ block-at-a-time processing compiler (Pablo).
     250
     251The character class compiler is used to automatically produce bit stream logic for all the individual characters (e.g.,
     252delimiters) and character classes (e.g., digits, letters) used in a particular application.
     253Input is specified using a character class syntax adapted from the standard regular expression notations.
     254Output is a minimized set of three-address bitwise operations, such as $a = b~\&~c$,
     255to compute each of the character classes from the basis bit streams.
     256
     257For example, Figure \ref{fig:CCC} shows the input and output produced by the
     258character class compiler for the example of \verb`[0-9]`
     259discussed in the previous section.  The output operations
     260may be viewed as operations on a single block of input
     261at a time, or may be viewed as operations on unbounded bitstreams
     262as supported by the Pablo compiler.
    96263
    97264\begin{figure}[h]
     265
    98266\begin{center}
    99 \begin{tabular}{lr}\\
    100 source data                     & \verb`<t1>abc</t1><tag2/>`\\
    101 $N = $ Tag Names                & \verb`.11......11..1111..`\\
    102 $M_0 = \texttt{[<]}$            & \verb`1...........1......`\\
    103 $M_1 = advance(M_0)$            & \verb`.1...........1.....`\\
    104 $M_2 = scanthru(M_1, A)$                & \verb`...1.............1.`\\
     267
     268\begin{tabular}{r l}
     269\ttfamily{\bfseries INPUT:} & \verb`digit = [<]` \\
     270\ttfamily{\bfseries OUTPUT:} & \verb`temp1 = (basis_bits.bit_0 | basis_bits.bit_1)` \\
     271& \verb`temp2 = (basis_bits.bit_2 & basis_bits.bit_3)` \\
     272& \verb`temp3 = (temp2 &~ temp1)` \\
     273& \verb`temp4 = (basis_bits.bit_5 | basis_bits.bit_6)` \\
     274& \verb`temp5 = (basis_bits.bit_4 & temp4)` \\
     275& \verb`digit = (temp3 &~ temp5)`
    105276\end{tabular}
     277
    106278\end{center}
    107 \caption{Parabix2 Start Tag Validation}
    108 \label{fig:Parabix2StarttagExample}
     279
     280\caption{Character Class Compiler Input/Output}
     281\label{fig:CCC}
    109282\end{figure}
    110283
    111 In general, the set of bit positions in a bit stream may be considered to be the current parsing
    112 positions of multiple parses taking place in parallel throughout the source data stream. Although it is not explicitly shown in these prior examples, error bit streams can be used to identify any well-formedness errors found during the parsing process. Error positions are gathered and
    113 processed in as a final post processing step. A further aspect of the parallel cursor method with bit stream addition is that the conditional branch statements used to identify syntax error at each each parsing position are eliminated. Hence, Parabix2 offers additional parallelism over Parabix1 in the form of multiple cursor parsing and further reduces branch misprediction penalties.
    114 
    115 
    116 
     284Pablo is a compiler that abstracts away the details of
     285programming parallel bit stream code in terms of finite
     286SIMD register widths and application buffer sizes.  Input
     287to Pablo is a language for expressing bitstream operations
     288on unbounded bitstreams.  The operations include bitwise
     289logic, the {\tt Advance} and {\tt ScanThru} operations described in he
     290previous subsection as well as if and while control structures.
     291Pablo translates these operations to block-at-a-time
     292code in C/C++, where the block size is the register
     293width for SIMD operations on the selected target architecture.
     294
     295
     296\subsection{Parabix Run-Time Libraries}
     297
     298The Parabix architecture also includes run-time libraries
     299that support a machine-independent view of basic SIMD
     300operations, as well as a set of core function libraries.
     301For machine-independence, we program all operations using
     302an abstract SIMD machine based on the Inductive Doubling
     303Instruction Set Architecture (IDISA) \cite{CameronLin2009}.
     304Originally developed for 128-bit Altivec operations on Power PC
     305as well as 64-bit MMX and 128-bit SSE operations on Intel,
     306we have recently extended our library support to include
     307the new 256-bit AVX operations on Intel as well as the 128-bit
     308NEON operations on the ARM architecture. Further details
     309are provided in later sections.
    117310 
Note: See TracChangeset for help on using the changeset viewer.