# Changeset 1099 for docs/PACT2011/03-research.tex

Timestamp:
Apr 8, 2011, 8:18:00 PM (8 years ago)
Message:

Re-write many sections.

File:
1 edited

 r1087 % Hence, one of the foci in our study is the manner in which straight line SIMD code influences energy usage. This section provides an overview of the SIMD-based parallel bitstream XML parsers, Parabix1 and Parabix2. A comprehensive study of Parabix2 can be found in the technical report ``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}. This section provides an overview of the SIMD-based parallel bit stream XML parsers, Parabix1 and Parabix2. A comprehensive study of Parabix2 can be found in the technical report ``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}. \subsection{Parabix1} % Our first generation parallel bitstream XML parser---Parabix1---employs a less conventional approach of SIMD technology to represent text in parallel bitstreams. Bits of each stream are in one-to-one correspondence with the bytes of a character stream. A transposition step first transforms sequential byte stream data into eight basis bitstreams for the bits of each byte. At a high level, Parabix1 processes source XML in a functionally equivalent manner as a traditional processor. That is, Parabix1 moves sequentially through the source document, maintaining a single cursor position throughout the parsing process. Where Parabix1 differs from the traditional parser is that it scans for key markup characters using a series of basis bitstreams. A bitstream is simply a sequence of $0$s and $1$s, where there is one such bit in the bitstream for each character in a source data stream. A basis bitstream is a bitstream that consists of only transposed textual XML data. In other words, a source character consisting of $M$ bits can be represented with $M$ bitstreams and Parabix1 processes source XML in a functionally equivalent manner as that of many traditional recursive descent XML parsers.
That is, Parabix1 moves sequentially through the source document, maintaining a single cursor position throughout the parsing process, and parses recursively, depth first. Where Parabix1 differs from the traditional parser is that it scans for key markup characters using a series of basis bit streams. A bit stream is simply a sequence of $0$s and $1$s, where there is one such bit in the bit stream for each character in a source data stream. A basis bit stream is a bit stream that consists of only transposed textual XML data. In other words, a source character consisting of $M$ bits can be represented with $M$ bit streams and by utilizing $M$ SIMD registers of width $W$, it is possible to scan through $W$ characters in parallel. The register width $W$ varies between 64-bit for MMX, 128-bit for SSE, and 256-bit for AVX. Figure \ref{fig:BitstreamsExample} presents an example of how we represent 8-bit ASCII characters using eight bitstreams. $B_0 \ldots B_7$ are the individual bitstreams. The $0$ bits in the bitstreams are represented by periods, so that the $1$ bits stand out. Figure \ref{fig:BitstreamsExample} presents an example of the basis bit stream representation of 8-bit ASCII characters. $B_0 \ldots B_7$ are the individual bit streams. The $0$ bits in the bit streams are represented by periods to emphasize the $1$ bits. \begin{figure}[h] \end{tabular} \end{center} \caption{Parallel Bitstream Example} \caption{Example 8-bit ASCII Character Basis Bit Streams} \label{fig:BitstreamsExample} \end{figure} In order to represent the byte-oriented character data as parallel bitstreams, the source data is first loaded in sequential order and converted into its transposed representation through a series of packs, shifts, and bitwise operations. Using the SIMD capabilities of current commodity processors, this transposition of source data to bitstreams incurs an amortized overhead of about 1 CPU cycle per byte for transposition \cite{CameronHerdyLin2008}.
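The content of the basis bit streams can be illustrated with a minimal Python sketch. This is not the SIMD pack-and-shift transposition described above: it handles one byte at a time and only shows what the eight streams contain. The names `transpose` and `show`, the use of Python integers as bit streams (bit $j$ of the integer stands for character position $j$, counted from 0), and the numbering of $B_0$ as the most significant bit of each byte are assumptions made for illustration.

```python
def transpose(data: bytes) -> list[int]:
    """Return eight basis bit streams B_0..B_7 as Python integers.

    Bit j of basis[k] holds bit k of byte j. Bit 0 is assumed to be
    the most significant bit of the byte (an illustrative convention).
    """
    basis = [0] * 8
    for j, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << j
    return basis


def show(stream: int, length: int) -> str:
    """Render a stream left to right, with periods standing for 0 bits."""
    return "".join("1" if (stream >> j) & 1 else "." for j in range(length))
```

For the three-byte input `<a>`, `show(transpose(b"<a>")[1], 3)` gives `.1.`, since only `a` (0x61) has its second-highest bit set.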
When parsing, we need to consider multiple properties of characters at different stages during the process. Using the basis bitstreams, it is possible to combine them using bitwise logic in order to compute character-class bitstreams; that is, streams that identify the positions at which characters belonging to a specific character class occur. For example, the $j$-th character is an open angle bracket `<' if and only if the $j$-th bit of $B_2, B_3, B_4, B_5 =$ 1 and the $j$-th bit of $B_0, B_1, B_6, B_7 =$ 0. Once these character-class bitstreams are created, a {\em bit scan} operation, which is a 1-cycle intrinsic function for commodity processors, can be used for sequential markup scanning and data validation operations. A common operation in all XML parsers is start tag validation. Start tags begin with `<' and end with either ``/>'' or ``>'' (depending on whether the element tag is an empty element tag or not, respectively). To transform byte-oriented character data to parallel bit stream representation, source data is first loaded into SIMD registers in sequential order. It is then converted to the transposed basis bit stream representation through a series of packs, shifts, and logical bitwise operations. Using the SIMD capabilities of current commodity processors, the transposition of source data to basis bit stream format incurs an amortized cost of approximately 1 cycle per byte \cite{CameronHerdyLin2008}. Throughout the XML parsing process we must identify significant XML characters, for example, the next opening angle bracket character `<'. For this purpose, we combine the basis bit streams using bitwise logic and compute character-class bit streams. Character-class streams mark the positions of characters as a single $1$-bit if present, $0$ otherwise. Each bit position in the computed bit stream is in one-to-one correspondence to the source data byte positions.
For example, the $j$-th character is an open angle bracket `<' if and only if the $j$-th bit of $B_2, B_3, B_4, B_5 =$ 1 and the $j$-th bit of $B_0, B_1, B_6, B_7 =$ 0. Once generated, single cycle built-in {\em bit scan} operations are used to locate the positions of significant XML characters throughout parsing. A common operation in XML parsing is XML start tag validation. Start tags begin with `<' and end with either ``/>'' or ``>'' (depending on whether the element tag is an empty element tag or not, respectively). Figure \ref{fig:Parabix1StarttagExample} demonstrates the concept of start tag validation as performed in Parabix1 using character-class streams together with the processor built-in bit scan operation. The first bit stream $M_0$ is created and the parser begins scanning the source data for an open angle bracket `<', starting at position 1. Since the source data begins with `<', $M_0$ is assigned a cursor position of 1. The $advance$ operation then shifts $M_0$'s cursor position by 1, resulting in the creation of a new bit stream, $M_1$, with the cursor position at 2. The following $bitscan$ operation takes the cursor position from $M_1$ and sequentially scans every position until it locates a `>'. It finds a `>' at position 4 and returns that as the new cursor position for $M_2$. Calculating $M_3$ advances the cursor again, and the $bitscan$ used to create $M_4$ locates the new opening angle bracket. This process continues in sequence until all start tags are validated. Unlike traditional parsers, these sequential operations are accelerated significantly since the bit scan operation can skip up to $w$ positions, where $w$ is the processor word width in bits. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, Herdy2008, Cameron2009}.
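The character-class rule for `<' and the single-cursor bit scan can be sketched in Python as follows. This is an illustration under stated assumptions, not the Parabix1 implementation: bit streams are modelled as Python integers with bit $j$ standing for character position $j$ (counted from 0, whereas the text counts from 1), $B_0$ is taken to be the most significant bit of each byte, and the helper names `basis_streams`, `open_angle_stream`, and `bitscan` are hypothetical. A hardware bit scan inspects one machine word at a time; the integer arithmetic here only mimics its result.

```python
def basis_streams(data: bytes) -> list[int]:
    """Basis bit streams B_0..B_7; bit j of basis[k] is bit k of byte j."""
    basis = [0] * 8
    for j, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:  # assumed: B_0 is the high bit
                basis[k] |= 1 << j
    return basis


def open_angle_stream(data: bytes) -> int:
    """Mark positions where B_2..B_5 are 1 and B_0, B_1, B_6, B_7 are 0."""
    b = basis_streams(data)
    all_positions = (1 << len(data)) - 1
    must_be_one = b[2] & b[3] & b[4] & b[5]
    must_be_zero = b[0] | b[1] | b[6] | b[7]
    return must_be_one & ~must_be_zero & all_positions


def bitscan(stream: int, start: int) -> int:
    """Return the first marked position at or after start, or -1 if none."""
    remaining = stream >> start
    if remaining == 0:
        return -1
    lowest = remaining & -remaining  # isolate the lowest set bit
    return start + lowest.bit_length() - 1
```

With the input `<a><bc/>`, the computed stream marks positions 0 and 3, and `bitscan(stream, 1)` moves the cursor from position 1 directly to position 3 without a per-character loop in the caller.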
\begin{figure}[h] \end{figure} Figure \ref{fig:Parabix1StarttagExample} demonstrates the concept of start tag validation as performed in Parabix1. The first marker stream $M_0$ is created and the parser begins scanning the source data for an open angle bracket `<', starting at position 1. Since the source data begins with `<', $M_0$ is assigned a cursor position of 1. The $advance$ operation then shifts $M_0$'s cursor position by 1, resulting in the creation of a new marker stream, $M_1$, with the cursor position at 2. The following $bitscan$ operation takes the cursor position from $M_1$ and sequentially scans every position until it locates a `>'. It finds a `>' at position 4 and returns that as the new cursor position for $M_2$. Calculating $M_3$ advances the cursor again, and the $bitscan$ used to create $M_4$ locates the new opening angle bracket. This process continues in this manner until all start tags are validated. Unlike traditional parsers, these sequential operations are accelerated significantly since the bit-scan operation can skip up to $w$ positions, where $w$ is the processor word width in bits. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, Herdy2008, Cameron2009}. \subsection{Parabix2} In Parabix2, we replaced the sequential single-cursor parsing using bit scan instructions with a parallel parsing method using bitstream addition. In Parabix2, the sequential single-cursor parsing approach using bit scan instructions is replaced by a parallel parsing approach, using multiple cursors when possible, and bit stream addition operations to advance cursor positions. Unlike the single-cursor approach of Parabix1 (and conceptually of all sequential XML parsers), Parabix2 processes multiple cursors in parallel.
For example, using the source data from Figure \ref{fig:Parabix1StarttagExample}, Figure \ref{fig:Parabix2StarttagExample} shows how Parabix2 identifies and moves each of the start tag markers forwards to the corresponding end tag. Unlike Parabix1, Parabix2 begins scanning by creating two character-class marker streams, $N$, denoting the position of every alphanumeric character within the basis stream, and $M_0$, marking the position of every potential start tag in the bitstream. $M_0$ is then advanced to create $M_1$, which is fed into the first $scanto$ operation along with $N$.  To handle variable length tag names, the $scanto$ operation effectively locates the cursor positions of the end tags in parallel by adding $M_1$ to $N$, and using the bitwise AND operation of the negation of $N$ to find only the true end tags of $M_1$. Because an end tag may end on a `/' or `>', $scanto$ is called again to advance any cursor from `/' to `>'. For additional details, see the technical report \cite{Cameron2010}. Figure \ref{fig:Parabix1StarttagExample}, Figure \ref{fig:Parabix2StarttagExample} shows how Parabix2 identifies and advances each of the start tag bit streams. Unlike Parabix1, Parabix2 begins scanning by creating two character-class bit streams, $N$, denoting the position of every alphanumeric character within the basis stream, and $M_0$, marking the position of every potential start tag in the bit stream. $M_0$ is advanced to create $M_1$, which is fed into the first $scanto$ operation along with $N$.  To handle variable length tag names, the $scanto$ operation effectively locates the cursor positions of the end tags in parallel by adding $M_1$ to $N$, and uses the bitwise AND operation of the negation of $N$ to find only the true end tags of $M_1$. Because an end tag may end on a `/' or `>', $scanto$ is called again to advance any cursor from `/' to `>'. For additional details, refer to the technical report \cite{Cameron2010}.
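The addition-based $scanto$ step can be sketched in Python. The formula `(M + N) & ~N` is read directly from the description above; everything else is an assumption for illustration: bit streams are Python integers with bit $j$ standing for character position $j$ (counted from 0), so that carries propagate forward through the text, and the helper names `char_class`, `advance`, and `scanto` as written here are hypothetical rather than the Parabix2 source.

```python
ALNUM = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def char_class(data: bytes, members: bytes) -> int:
    """Mark every position whose byte belongs to members."""
    stream = 0
    for j, byte in enumerate(data):
        if byte in members:
            stream |= 1 << j
    return stream


def advance(markers: int) -> int:
    """Move every cursor forward by one position."""
    return markers << 1


def scanto(markers: int, through: int) -> int:
    """Move every cursor past the run of 'through' positions it sits on.

    Adding the markers to the class stream makes each carry ripple
    through its run; masking with ~through keeps only the cursors that
    landed just past a run (or that were not on a run to begin with).
    """
    return (markers + through) & ~through


data = b"<a><bc/>"
N = char_class(data, ALNUM)   # positions 1, 4, 5
M0 = char_class(data, b"<")   # positions 0, 3
M1 = advance(M0)              # positions 1, 4
M2 = scanto(M1, N)            # positions 2 ('>') and 6 ('/')
M3 = scanto(M2, char_class(data, b"/"))  # positions 2 and 7, both '>'
```

Both tag names, of lengths one and two, are skipped by a single addition, which is the sense in which the cursors advance in parallel.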
\end{figure} In general, the set of bit positions in a marker bitstream may be considered to be the current parsing positions of multiple parses taking place in parallel throughout the source data stream. A further aspect of the parallel method is that conditional branch statements used to identify syntax errors at each parsing position are eliminated. Although we do not show it in the prior examples, error bitstreams can be used to identify any well-formedness errors found during the parsing process. Error positions are gathered and processed as a final post-processing step. Hence, Parabix2 offers additional parallelism over Parabix1 in the form of multiple cursor parsing as well as further reducing branch misprediction penalties. In general, the set of bit positions in a bit stream may be considered to be the current parsing positions of multiple parses taking place in parallel throughout the source data stream. Although it is not explicitly shown in these prior examples, error bit streams can be used to identify any well-formedness errors found during the parsing process. Error positions are gathered and processed as a final post-processing step. A further aspect of the parallel cursor method with bit stream addition is that the conditional branch statements used to identify syntax errors at each parsing position are eliminated. Hence, Parabix2 offers additional parallelism over Parabix1 in the form of multiple cursor parsing as well as further reducing branch misprediction penalties.