Changeset 1043 for docs


Ignore:
Timestamp:
Mar 25, 2011, 8:45:18 PM (8 years ago)
Author:
ksherdy
Message:

Edits.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/PACT2011/03-research.tex

    r1031 r1043  
    3232% Hence, one of the foci in our study is the manner in which straight line SIMD code influences energy usage.
    3333
     34This section provides an overview of the SIMD-based parallel bitstream XML parsers, Parabix1 and Parabix2. A comprehensive study of Parabix2 can be found in the technical report ``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}.
    3435
    3536\subsection{Parabix1}
     
    4243by utilizing $M$ SIMD registers of width $W$, it is possible to scan through $W$ characters in parallel.
    4344The register width $W$ varies between 64-bit for MMX, 128-bit for SSE, and 256-bit for AVX.
    44 Figure \ref{fig:inputstreams} presents an example of how we represent 8-bit ASCII characters using eight bitstreams. $B_0 \ldots B_7$ are the individual bitstreams. The $0$ bits in the bitstreams are represented by periods, so that the $1$ bits stand out.
     45Figure \ref{fig:BitstreamsExample} presents an example of how we represent 8-bit ASCII characters using eight bitstreams. $B_0 \ldots B_7$ are the individual bitstreams. The $0$ bits in the bitstreams are represented by periods, so that the $1$ bits stand out.
    4546
    4647\begin{figure}[h]
    4748\begin{center}
    4849\begin{tabular}{cr}\\
    49 source data  & \verb`<t1>abc</t1><t2/>`\\
    50 $B_0$ & \verb`..1.1.1.1.1....1.`\\
    51 $B_1$ & \verb`...1.11.1..1..111`\\
    52 $B_2$ & \verb`11.1...111.111.11`\\
    53 $B_3$ & \verb`1..1...11..11..11`\\
    54 $B_4$ & \verb`1111...1.111111.1`\\
    55 $B_5$ & \verb`11111111111111111`\\
    56 $B_6$ & \verb`.1..111..1...1...`\\
    57 $B_7$ & \verb`.................`\\
     50source data & \verb`<t1>abc</t1><tag2/>`\\
     51$B_0$ & \verb`..1.1.1.1.1...11.1.`\\
     52$B_1$ & \verb`...1.11.1..1...1111`\\
     53$B_2$ & \verb`11.1...111.111.1.11`\\
     54$B_3$ & \verb`1..1...11..11....11`\\
     55$B_4$ & \verb`1111...1.11111..1.1`\\
     56$B_5$ & \verb`1111111111111111111`\\
     57$B_6$ & \verb`.1..111..1...111...`\\
     58$B_7$ & \verb`...................`\\
    5859\end{tabular}
    5960\end{center}
    6061\caption{Parallel Bitstream Example}
    61 \label{fig:inputstreams}
     62\label{fig:BitstreamsExample}
    6263\end{figure}
    6364
    6465In order represent the byte-oriented character data as parallel bitstreams, the source data is first loaded in sequential order and converted into its transposed representation through a series of packs, shifts, and bitwise operations.
    65 Using the SIMD capabilities of current commodity processors, this transposition of source data to bitstreams incurs an amortized overhead of about 1 CPU cycle per byte for transposition \cite{CameronHerdyLin2008}. When parsing, we need to consider multiple properties of characters at different stages during the process. Using the basis bitstreams, it is possible to combine them using bitwise logic in order to compute character-class bitstreams;t hat is, streams that identify the positions at which characters belonging to a specific character class occur. For example, a ASCII character is an open angle bracket `<' if and only if $B_2 \land \ldots \land B_5 =$ 1 and the other basis bitstreams are 0 at the same position within the basis bitstreams. Once these character-class bitstreams are created, bit-scan operations, common to commodity processors, can be used for sequential markup scanning and data validation operations. A common operation in all XML parsers is identifying the start tags (`<') and their accompanying end tags (either ``/>'' or ``>'' depending whether the element tag is an empty element tag or not, respectively).
     66Using the SIMD capabilities of current commodity processors, this transposition of source data to bitstreams incurs an amortized overhead of about 1 CPU cycle per byte for transposition \cite{CameronHerdyLin2008}. When parsing, we need to consider multiple properties of characters at different stages during the process. Using the basis bitstreams, it is possible to combine them using bitwise logic in order to compute character-class bitstreams; that is, streams that identify the positions at which characters belonging to a specific character class occur. For example, the $j$-th character is an open angle bracket `<' if and only if the $j$-th bit of $B_2, B_3, B_4, B_5 =$ 1 and the $j$-th bit of $B_0, B_1, B_6, B_7 =$ 0. Once these character-class bitstreams are created, a $bitscan$ operation, which is an 1-cycle intrinsic function for commodity processors, can be used for sequential markup scanning and data validation operations. A common operation in all XML parsers is identifying the start tags (`<') and their accompanying end tags (either ``/>'' or ``>'' depending whether the element tag is an empty element tag or not, respectively).
    6667
    6768\begin{figure}[h]
    6869\begin{center}
    6970\begin{tabular}{lr}\\
    70 source data  & \verb`<t1>abc<t1/><t2/>`\\
    71 % $N = $ name chars & \verb`.11.111.11...11..`\\
    72 $M_0 = 1$ & \verb`1................`\\
    73 $M_1 = advance(M_0)$ & \verb`.1...............`\\
    74 $M_2 = bitscan('>')$ & \verb`...1.............`\\
    75 $M_3 = advance(M_2)$ & \verb`....1............`\\
    76 $M_4 = bitscan('<')$ & \verb`.......1.........`\\
    77 $M_5 = bitscan('/')$ & \verb`..........1......`\\
    78 $M_6 = advance(M_5)$ & \verb`...........1.....`\\
    79 $M_7 = bitscan('<')$ & \verb`.............1...`\\
    80 $M_8 = bitscan('/')$ & \verb`...............1.`\\
    81 $M_9 = advance(M_8)$ & \verb`................1`\\
    82 % $M_2 \lor M_6 \lor M_9$    & \verb`...1.......1....1`\\
     71source data                                     & \verb`<t1>abc<t1/><tag2/>`\\
     72% $N = $ name chars                             & \verb`.11.111.11...11..`\\
     73$M_0 = bitscan(1, \texttt{<})$                  & \verb`1..................`\\
     74$M_1 = advance(M_0)$                            & \verb`.1.................`\\
     75$M_2 = bitscan(M_1, \texttt{/}, \texttt{>})$    & \verb`...1...............`\\
     76$M_3 = advance(M_2)$                            & \verb`....1..............`\\
     77$M_4 = bitscan(M_3, \texttt{<})$                & \verb`.......1...........`\\
     78$M_5 = bitscan(M_4, \texttt{/}, \texttt{>})$    & \verb`..........1........`\\
     79$M_6 = advance(M_5)$                            & \verb`...........1.......`\\
     80$M_7 = bitscan(M_6, \texttt{<})$                & \verb`.............1.....`\\
     81$M_8 = bitscan(M_7, \texttt{/}, \texttt{>})$    & \verb`.................1.`\\
     82$M_9 = advance(M_8)$                            & \verb`..................1`\\
    8383\end{tabular}
    8484\end{center}
     
    8787\end{figure}
    8888
     89In Figure \ref{fig:Parabix1StarttagExample}, the first marker stream $M_0$ is created and the parser begins scanning the source data for an open angle bracket `<', starting at position 1. Since the source data begins with `<', $M_0$ is assigned a cursor position of 1. The $advance$ operation then then shifts the $M_0$'s cursor position by 1, resulting in the creation of a new marker stream, $M_1$, with the cursor position at 2. The following $bitscan$ operation takes the cursor position from $M_1$ and sequentially scans every position until it locates either an `/' or `>'. It finds a `>' at position 4 and returns that as the new cursor position for $M_2$. Calculating $M_3$ advances the cursor again, and the $bitscan$ used to create $M_4$ locates the new opening angle bracket. This process continues until every open and closing angle bracket is located within the basis stream, resulting in a process that requires 3 logical operations to find each individual start and end tag pair.
    8990
    90 Unlike traditional parsers, these sequential operations are accelerated significantly since bit scan operations can perform up to $W$ finite state transitions per clock cycle. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, CameronLin2009, Cameron2010}.
    91 
    92 
    93 
    94 
    95 % In section 3, we should try to explain a bit more detail of the
    96 % operation.   Under Parabix 1, a little bit on transposition
    97 % and calculation of the [<] bitstream would be good, perhaps
    98 % using the examples from the 2010 Technical Report or EuroPar submission.
    99 
     91Unlike traditional parsers, these sequential operations are accelerated significantly since the bitscan operation can perform up to $W$ finite state transitions per clock cycle. This approach has recently been applied to Unicode transcoding and XML parsing to good effect, with research prototypes showing substantial speed-ups over even the best of byte-at-a-time alternatives \cite{CameronHerdyLin2008, CameronLin2009, Cameron2010}.
    10092
    10193\subsection{Parabix2}
    10294
    103 % Under Parabix 2 a little discussion of bitwise addition for
    104 % scanning, perhaps again excerpted from the TR/EuroPar submission
    105 % would be good.
     95In Parabix2, we replaced the sequential single-cursor parsing using bit scan instructions with a parallel parsing method using bitstream addition. Unlike the single-cursor approach of Parabix1 (and conceptually of all sequential XML parsers), Parabix2 processes multiple cursors in parallel. For example, using the source data from Figure \ref{fig:Parabix1StarttagExample}, Figure \ref{fig:Parabix2StarttagExample} shows how Parabix2 identifies and moves each of the start tag markers forwards to the corresponding end tag. Unlike Parabix1, Parabix2 begins scanning by creating two character-class marker streams, $N$, denoting the position of every alpha numeric character within the basis stream, and $M_0$, marking the position of every potential start tag in the bitstream. $M_0$ is then advanced to create $M_1$, which is fed into the first $scanto$ operation along with $N$.  To handle variable length tag names, the $scanto$ operation effectively locates the cursor positions of the end tags in parallel by adding $M_1$ to $N$, and using the bitwise AND operation of the negation of $N$ to find only the true end tags of $M_1$. Because and end tag may end on an `/' or '>', $scanto$ is called again to advance any cursor from `/' to `>'. For additional details, please refer to technical report ``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}.
    10696
    107 %In Parabix2, we replace the sequential single-cursor parsing using bit scan instructions with a parallel parsing method using bitstream addition. Unlike the single cursor approach of Parabix1 and conceptually of traditional sequential approach, in Parabix2 multiple cursors positions are processed in parallel.
    108 
    109 In Parabix2, we replace the sequential single-cursor parsing using bit scan
    110 instructions with a parallel parsing method using bitstream addition.
    111 Unlike the single-cursor approach of Parabix1 (and conceptually of all sequential XML parsers),
    112 Parabix2 processes multiple cursors in parallel. For example, using the source data from
    113 Figure \ref{fig:Parabix1StarttagExample}, Figure \ref{fig:Parabix2StarttagExample} shows how
    114 Parabix2 identifies and moves each of the start tag markers forwards to the corresponding end tag.
    115 %Like Parabix1, we assume that $N$ (the name chars) has been computed using the basis bitstreams and that
    11697
    11798\begin{figure}[h]
    11899\begin{center}
    119100\begin{tabular}{lr}\\
    120 source data  & \verb`<t1>abc<t1/><t2/>`\\
    121 $N = $ name chars & \verb`.11.111.11...11..`\\
    122 $M_0 = [<]$ & \verb`1......1....1....`\\
    123 $M_1 = \texttt{advance}(M_0)$ & \verb`.1......1....1...`\\
    124 $M_2 = \texttt{scanto}('/','>')$ & \verb`...1......1....1.`\\
    125 $M_3 = \texttt{scanto}(>)$ & \verb`...1.......1....1`
     101source data                     & \verb`<t1>abc<t1/><tag2/>`\\
     102$N = $ alphanumerics            & \verb`.11.111.11...1111..`\\
     103$M_0 = \texttt{[<]}$            & \verb`1......1....1......`\\
     104$M_1 = advance(M_0)$            & \verb`.1......1....1.....`\\
     105$M_2 = scanto(M_1, N)$          & \verb`...1......1......1.`\\
     106$M_3 = scanto(M_2, N)$          & \verb`...1.......1......1`
    126107\end{tabular}
    127108\end{center}
Note: See TracChangeset for help on using the changeset viewer.