# Changeset 1393 for docs/HPCA2012/03-research.tex

Ignore:
Timestamp:
Aug 30, 2011, 10:47:59 AM (8 years ago)
Message:

Minor bug fixes up to 04

File:
1 edited

### Legend:

Unmodified
 r1384 \label{section:parabix} This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix}. The framework has three components: a unifying architectural view of text processing in terms of parallel bit streams, a tool chain for automating the generation of parallel bit stream code from higher-level specifications, and a run-time environment providing a portable SIMD programming abstraction, independent of the specific facilities available on particular target architectures. This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix}.  The framework has three components: a unifying architectural view of text processing in terms of parallel bit streams, a tool chain for automating the generation of parallel bit stream code from higher-level specifications, and a run-time environment providing a portable SIMD programming abstraction, independent of the specific facilities available on particular target architectures. \subsection{Parallel Bit Streams} The fundamental difference between the Parabix framework and traditional text processing models is in how Parabix represents the source data.   Given a traditional byte-oriented text stream, Parabix first transposes the text data to a transform domain consisting of 8 parallel bit streams, known as {\em basis bit streams}.  In essence, each basis bit stream $b_{k}$ represents the stream of $k$-th bit of each byte in the source text.  That is, the $k$-th bit of $i$-th byte in the source text is in the $i$-th (bit) position of the $k$-th basis bit stream, $b_{k}$. For example, in Figure \ref{fig:BitstreamsExample}, we show how the ASCII string {\ttfamily b7\verb<A}'' is represented as 8 basis bit streams, $\tt b_{0 \ldots 7}$. The bits used to construct $\tt b_7$ have been highlighted in this example. The fundamental difference between the Parabix framework and traditional text processing models is in how Parabix represents the source data.  Given a traditional byte-oriented text stream, Parabix first transposes the text data to a transform domain consisting of 8 parallel bit streams, known as {\em basis bit streams}.  In essence, each basis bit stream $b_{k}$ represents the stream of $k$-th bit of each byte in the source text.  That is, the $k$-th bit of $i$-th byte in the source text is in the $i$-th (bit) position of the $k$-th basis bit stream, $b_{k}$.  For example, in Figure\ref{fig:BitstreamsExample}, we show how the ASCII string {\ttfamily b7\verb<A}'' is represented as 8 basis bit streams, $\tt b_{0 \ldots 7}$. The bits used to construct $\tt b_7$ have been highlighted in this example. \begin{figure}[h] The advantage of the parallel bit stream representation is that we can use the 128-bit SIMD registers commonly found on commodity processors (e.g. SSE on Intel) to process 128 byte positions at a time using bitwise logic, shifting and other operations. use the 128-bit SIMD registers commonly found on commodity processors (e.g. SSE on Intel) to process 128 byte positions at a time using bitwise logic, shifting and other operations. Just as forward and inverse Fourier transforms are used to transform between the time and frequency domains in signal processing, bit stream transposition and inverse transposition provides byte space'' and bit space'' views of text.  The goal of the Parabix framework is to support efficient text processing using these two equivalent representations in the same way that efficient signal processing benefits from the use of the frequency domain in some cases and the time domain in others. between the time and frequency domains in signal processing, bit stream transposition and inverse transposition provides byte space'' and bit space'' views of text.  The goal of the Parabix framework is to support efficient text processing using these two equivalent representations in the same way that efficient signal processing benefits from the use of the frequency domain in some cases and the time domain in others. In the Parabix framework, basis bit streams are used as the starting point to determine other bit streams.   In particular, Parabix uses the basis bit streams to construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense of a significant character (or class of characters) in the parsing process. Character-class bit streams may then be used to compute \emph{lexical bit streams} and \emph{error bit streams}, which Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream. point to determine other bit streams.  In particular, Parabix uses the basis bit streams to construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense of a significant character (or class of characters) in the parsing process. Character-class bit streams may then be used to compute \emph{lexical bit streams} and \emph{error bit streams}, which Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream. {\bf Basis Bit Streams:} so that Parabix can efficiently produce the character-class bit streams. Using the SIMD capabilities of current commodity processors, the transposition process incurs an amortized cost of approximately 1 cycle per byte \cite{CameronHerdyLin2008}. of approximately 1 cycle per byte. %The size of $k$ is dependent on the code unit size of the text encoding format of the source document. A code unit is simply %a fixed number of bits (or bytes) used to represent a specific character (code point). Some encoding formats, such {\bf Character-class Bit Streams:} Typically, as text parsers process input data, they locate specific characters to determine if and when to transition between data and metadata parsing. Typically, as text parsers process input data, they locate specific characters to determine if and when to transition between data and metadata parsing. % For example, in a CSV file, any ,' or \textbackslash n' can indicate the start of a new column or row respectively. For example, in XML, any opening angle bracket character, \verb<', may indicate that we are starting a new markup tag. Traditional byte-at-a-time parsers find these characters by comparing the value of each code point with a set of known code points and branching appropriately when one is found, typically using an if or switch statement. Using this method to perform multiple transitions in parallel is non-trivial and may require fairly sophisticated algorithms to do so correctly. For example, in XML, any opening angle bracket character, \verb<', may indicate that we are starting a new markup tag.  Traditional byte-at-a-time parsers find these characters by comparing the value of each code point with a set of known code points and branching appropriately when one is found, typically using an if or switch statement.  Using this method to perform multiple transitions in parallel is non-trivial and may require fairly sophisticated algorithms to do so correctly. % However, a <' is legal within an XML comment so not every <' necessarily means that we are opening a new tag. Character-class bit streams allow us to perform up to 128 comparisons'' in parallel with a single operation by using a series of boolean-logic operations \footnote{$\land$, $\lor$ and $\lnot$ denote the boolean AND, OR and NOT operations.} to merge multiple basis bit streams into a single character-class stream that marks the positions of key characters with a $1$. For example, a character is an \verb<' if and only if $\lnot(b_ 0 \lor b_1) \land (b_2 \land b_3 \land b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1$. Classes of characters can be found with similar formulas. For example, a character is a number {\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$. An important observation here is that a range of characters can sometimes take fewer operations and require fewer basis bit streams to compute than individual characters. Finding optimal solutions to all character-classes is non-trivial and goes beyond the scope of this paper. Character-class bit streams allow us to perform up to 128 comparisons'' in parallel with a single operation by using a series of boolean-logic operations \footnote{$\land$, $\lor$ and $\lnot$ denote the boolean AND, OR and NOT operations.}  to merge multiple basis bit streams into a single character-class stream that marks the positions of key characters with a $1$.  For example, a character is an \verb<' if and only if $\lnot(b_ 0 \lor b_1) \land (b_2 \land b_3 \land b_4 \land b_5) \land \lnot (b_6 \lor b_7) = 1$.  Classes of characters can be found with similar formulas.  For example, a character is a number {\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$.  An important observation here is that a range of characters can sometimes take fewer operations and require fewer basis bit streams to compute than individual characters.  Finding optimal solutions to all character-classes is non-trivial and goes beyond the scope of this paper. % The advantage of character-class bit streams over traditional byte streams'' is that {\bf Lexical and Error Bit Streams:} To perform lexical analysis on the input data, Parabix computes lexical and error bit streams from the character-class bit streams using a mixture of both boolean logic and integer math. Lexical bit streams typically mark multiple current parsing positions. Unlike the single-cursor approach of traditional text parsers, these allow Parabix to process multiple cursors in parallel. Error bit streams are often the byproduct or derivative of computing lexical bit streams and can be used to identify any well-formedness issues found during the parsing process. The presense of a $\tt 1$ bit in an error stream tends to mean that the lexical stream cannot be trusted to be completely accurate and Parabix may need to perform some sequential parsing on that section to determine the cause and severity of the error. To form lexical bit streams, we have to introduce a few new operations: Advance and ScanThru. The $\texttt{Advance}$ operator accepts one input parameter, $c$, which is typically viewed as a bit stream containing multiple cursor bits, and advances each cursor one position forward.  On little-endian architectures, shifting forward means shifting to the right. $\texttt{ScanThru}$ accepts two input parameters, $c$ and $m$; any bit that is in both $c$ and $m$ is moved to first subsequent $\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$. For example, in Figure \ref{fig:ParabixParsingExample} suppose we have the regular expression \verb<[a-zA-Z]+> and wish to find all instances of it in the source text. We begin by constructing the character classes $C_{0}$, which consists of all letters, $C_{1}$, which contains all \verb>'s, and $C_{2}$, which marks all \verb<'s. In $L_0$ the position of every \verb<' is advanced by one to locate the first character of each token. By computing $E_{0}$, the parser notes that \verb<>'' does not match the expected pattern. To find the end positions of each token, the parser calculates $L_{1}$ by moving the cursors in $L_0$ through the letter bits in $C_0$. $L_1$ is then validated to ensure that each token ends with a \verb>' and discovers that \verb and wish to find all instances of it in the source text.  We begin by constructing the character classes $C_{0}$, which consists of all letters, $C_{1}$, which contains all \verb>'s, and $C_{2}$, which marks all \verb<'s. In $L_0$ the position of every \verb<' is advanced by one to locate the first character of each token. By computing $E_{0}$, the parser notes that \verb<>'' does not match the expected pattern. To find the end positions of each token, the parser calculates $L_{1}$ by moving the cursors in $L_0$ through the letter bits in $C_0$. $L_1$ is then validated to ensure that each token ends with a \verb>' and discovers that \verb::f(a,b) denotes the general pattern for a vertical SIMD operation yielding an output SIMD vector $v$, given two input SIMD vectors $a$ and $b$. For each field $v_i$ of $v$, the value computed is $f(a_i, b_i)$. For example, given 128-bit SIMD vectors, \verbsimd<8>::add(a,b) represents the simultaneous addition of sixteen 8-bit fields. These operations were originally developed for 128-bit Altivec operations on Power PC as well as 64-bit MMX and 128-bit SSE operations on Intel but have recently extended to support the new 256-bit AVX operations on Intel as well as the 128-bit NEON operations on the ARM architecture. The Parabix architecture also includes run-time libraries that support a machine-independent view of basic SIMD operations, as well as a set of core function libraries.  For machine-independence, we program all operations using an abstract SIMD machine.  The abstract machine supports all power-of-2 field widths up to the full SIMD register width on a target machine.  Let $w = 2k$ be the field width in bits. Let $f$ be a basic binary operation defined on $w$-bit quantities producing an $w$-bit result. Let $W$ be the SIMD vector size in bits where $W = 2K$.  Then the C++ template notation \verbv=simd::f(a,b) denotes the general pattern for a vertical SIMD operation yielding an output SIMD vector $v$, given two input SIMD vectors $a$ and $b$. For each field $v_i$ of $v$, the value computed is $f(a_i, b_i)$.  For example, given 128-bit SIMD vectors, \verbsimd<8>::add(a,b) represents the simultaneous addition of sixteen 8-bit fields. These operations were originally developed for 128-bit Altivec operations on Power PC as well as 64-bit MMX and 128-bit SSE operations on Intel but have recently extended to support the new 256-bit AVX operations on Intel as well as the 128-bit NEON operations on the ARM architecture.