# Changeset 1374 for docs/HPCA2012/03-research.tex

Ignore:
Timestamp:
Aug 24, 2011, 8:36:57 PM (8 years ago)
Message:

Section 3 and other fixes

File:
1 edited

### Legend:

Unmodified
 r1368 \section{Parabix} \section{The Parabix Framework} \label{section:parabix} % Hence, one of the foci in our study is the manner in which straight line SIMD code influences energy usage. This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix} In Section {\bf 4}, we discuss one of its implementations, \emph{Parabix-XML}. A more comprehensive study of Parabix and Parabix-XML, can be found in the technical report Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}. The fundemental difference between the Parabix framework and traditional text processing models is in how Parabix represents the source data. Rather than viewing it as a stream of bytes or characters and performing per-byte comparisons to determine when to transition between parsing states, Parabix views data as a set of \emph{bit streams} and uses a mixture of boolean-logic\footnote{In this paper, we use the notation $\land$, $\lor$ and $\lnot$ to denote the AND, OR and NOT boolean operations.} and integer-based math on those streams to effectively parse many bytes in parallel. Bit streams are the foundation of the Parabix technology. In order to understand how it is possible to perform complex state-transition-based text processing operations in parallel, understanding what bit streams are, how they are created, and --- more importantly --- how they are used by Parabix, is critical. % Our first generation parallel bitstream XML parser---Parabix1---uses employs a less conventional approach of % SIMD technology to represent text in parallel bitstreams. Bits of each stream are in one-to-one-correspondence % with the bytes of a character stream. A transposition step first transforms sequential byte stream data into % eight basis bitstreams for the bits of each byte. % Parabix1 is functionally equivalent to a traditional XML parser. % That is, Parabix1 moves a single cursor sequentially through the source document and % parses it to distinguish between (and properly validate) the markup and content. % Where Parabix1 differs from traditional parsers is that instead of processing a document a byte-at-a-time, it % transforms the document into a set of \emph{bit streams} (explained shortly) and then performs boolean-logic operations % on those bit streams to process many bytes in parallel. %Each $1$-bit marks the postion of each key character in the %corresponding source data stream. A single stream is generated for each of the key markup characters. \subsection{What are Bit Streams?} A bit stream is simply a sequence of $\tt 1$s and $\tt 0$s. The significance of each bit value is dependent on the type of bit stream. We view bit streams in one of two ways: $n$ 1-bit values for boolean-logic operations and 1 $n$-bit value for integer-based math. For simplicity, assume that $n$ is infinitely long (i.e., unbounded) w.r.t. the size of the source data. In reality, each bit stream is divided into a series of $w$-bit blocks, where $w$ is equal to the SIMD register width of the system (e.g., 64, 128, or 256). We discuss how these $\lceil \frac{n}{w} \rceil$ bit blocks can be viewed as an unbounded bit stream in Section \ref{parabix tool chain}. The first type of bit streams used in Parabix are refered to as \emph{basis bit streams}, which contain the input source data. Parabix uses the basis bit streams to construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense of a significant character (or class of characters) in the parsing process. Character-class bit streams are used to compute \emph{lexical bit streams} and \emph{error bit streams}, which Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream. {\bf Basis Bit Streams:} To construct the basis bit streams, the source data is first loaded in sequential order and then transposed --- through a series of SIMD pack, shift, and bitwise operations --- so that Parabix can efficiently produce the character-class bit streams. Essentially, when the source data is in basis bit stream form, the $k$-th bit of $i$-th character in the source text is in the $i$-th (bit) position of the $k$-th basis bit stream, $b_{k}$. Using the SIMD capabilities of current commodity processors, the transposition process incurs an amortized cost of approximately 1 cycle per byte \cite{CameronHerdyLin2008}. The size of $k$ is dependent on the code unit size of the text encoding format of the source document. A code unit is simply a fixed number of bits (or bytes) used to represent a specific character (code point). Some encoding formats, such as UNICODE, may use a multiple code units to express all of its possible code points. How multi-code-unit characters can be parsed efficiently goes beyond the scope of this paper. The most dominant format in data-oriented XML documents is ASCII, which uses a 8-bit code unit to represent all of the 128 code points within it. In Figure \ref{fig:BitstreamsExample}, we show how the ASCII string {\ttfamily b7\verb<A}'' is represented as 8 basis bit This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix}. The framework has three components: a unifying architectural view of text processing in terms of parallel bit streams, a tool chain for automating the generation of parallel bit stream code from higher-level specifications, and a run-time environment providing a portable SIMD programming abstraction, independent of the specific facilities available on particular target architectures. \subsection{Parallel Bit Streams} The fundamental difference between the Parabix framework and traditional text processing models is in how Parabix represents the source data.   Given a traditional byte-oriented text stream, Parabix first transposes the text data to a transform domain consisting of 8 parallel bit streams, known as {\em basis bit streams}.  In essence, each basis bit stream $b_{k}$ represents the stream of $k$th bit of each byte in the source text.  That is, the $k$th bit of $i$th byte in the source text is in the $i$th (bit) position of the $k$th basis bit stream, $b_{k}$. For example, in Figure \ref{fig:BitstreamsExample}, we show how the ASCII string {\ttfamily b7\verb<A}'' is represented as 8 basis bit streams, $\tt b_{0 \ldots 7}$. The bits used to construct $\tt b_7$ have been highlighted in this example. \end{figure} The advantage of the parallel bit stream representation is that we can use the 128-bit SIMD registers commonly found on commodity processors (e.g. SSE on Intel) to process 128 byte positions at a time using bitwise logic, shifting and other operations. Just as forward and inverse Fourier transforms are used to transform between the time and frequency domains in signal processing, bit stream transposition and inverse transposition provides byte space'' and bit space'' views of text.  The goal of the Parabix framework is to support efficient text processing using these two equivalent representations in the same way that efficient signal processing benefits from the use of the frequency domain in some cases and the time domain in others. In the Parabix framework, basis bit streams are used as the starting point to determine other bit streams.   In particular, Parabix uses the basis bit streams to construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense of a significant character (or class of characters) in the parsing process. Character-class bit streams may then be used to compute \emph{lexical bit streams} and \emph{error bit streams}, which Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream. {\bf Basis Bit Streams:} To construct the basis bit streams, the source data is first loaded in sequential order and then transposed --- through a series of SIMD pack, shift, and bitwise operations --- so that Parabix can efficiently produce the character-class bit streams. Using the SIMD capabilities of current commodity processors, the transposition process incurs an amortized cost of approximately 1 cycle per byte \cite{CameronHerdyLin2008}. %The size of $k$ is dependent on the code unit size of the text encoding format of the source document. A code unit is simply %a fixed number of bits (or bytes) used to represent a specific character (code point). Some encoding formats, such %as UNICODE, may use a multiple code units to express all of its possible code points. %How multi-code-unit characters can be parsed efficiently goes beyond the scope of this paper. %The most dominant format in data-oriented XML documents is ASCII, which uses a 8-bit code unit to represent all of the %128 code points within it. % In Parabix1, basis bit streams are used to generate \ref{lexical bit streams} for key markup characters. To form lexical bit streams, we have to introduce a few new operations: Advance and ScanThru. The $\texttt{Advance}$ operator accepts one input parameter, $c$, which is typically viewed as a bit stream containing multiple cursor bits, and returns $c + c$ --- effectively moves each cursor one position to the right''. and advances each cursor one position forward.  On little-endian architectures, shifting forward means shifting to the right. $\texttt{ScanThru}$ accepts two input parameters, $c$ and $m$; any bit that is in both $c$ and $m$ is moved to first subsequent $\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$. $\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$. For example, in Figure \ref{fig:ParabixParsingExample} suppose we have the regular expression \verb<[a-zA-Z]+> and wish to find all instances of it in the source text. {\tt Advance} and {\tt ScanThru}. For example, we can translate the simple parsing example of \ref{fig:ParabixParsingExample} above into Pablo code to produce the output as shown in Figure \ref{fig:Pablo}. In this example, Pablo has the primary responsibility of inserting carry variable declarations that allow the results of Advance and ScanThru operations to be carried over from block to block.  Explaining the full details of the translation is beyond the scope of this paper, however. \begin{figure}[h] \begin{center} \begin{tabular}{r l} \ttfamily{\bfseries INPUT:} & \verbdef parse_tags(classes, errors): \\ & \verb        classes.C0 = Alpha \\ & \verb        classes.C1 = Rangle \\ & \verb        classes.C2 = Langle \\ & \verb        L0 = bitutil.Advance(C2) \\ & \verb        errors.E0 = L0 &~ C0 \\ & \verb        L1 = bitutil.ScanThru(L0, C0) \\ & \verb        errors.E1 = L1 &~ C1 \\ \\ \ttfamily{\bfseries OUTPUT:} & \verbstruct Parse_tags { \\ & \verb  Parse_tags() {  \\ & \verb  CarryInit(carryQ, 2); } \\ & \verb  void do_block(Classes & classes, Errors & errors) { \\ & \verb                BitBlock L0, L1; \\ & \verb        classes.C0 = Alpha; \\ & \verb        classes.C1 = Rangle; \\ & \verb        classes.C2 = Langle; \\ & \verb        L0 = BitBlock_advance_ci_co(C2, carryQ, 0); \\ & \verb        errors.E0 = simd_andc(L0, C0); \\ & \verb        L1 = BitBlock_scanthru_ci_co(L0, C0, carryQ, 1); \\ & \verb        errors.E1 = simd_andc(L1, C1); \\ & \verb        CarryQ_Adjust(carryQ, 2); \\ & \verb  } \\ & \verb  CarryDeclare(carryQ, 2); \\ & \verb  }; \\ \end{tabular} \end{center} \caption{Parallel Block Compiler (Pablo) Input/Output} \label{fig:Pablo} \end{figure} but have recently extended to support the new 256-bit AVX operations on Intel as well as the 128-bit NEON operations on the ARM architecture. Further details are provided in later sections. NEON operations on the ARM architecture.