Changeset 1397 for docs


Ignore:
Timestamp:
Aug 30, 2011, 6:45:58 PM (8 years ago)
Author:
ksherdy
Message:

edits

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/HPCA2012/03-research.tex

    r1393 r1397  
    1313
    1414\subsection{Parallel Bit Streams}
     15
    1516
    1617The fundamental difference between the Parabix framework and
     
    4849
    4950\caption{Example 7-bit ASCII Basis Bit Streams}
    50 \label{fig:BitstreamsExample}
     51\label{fig:bit streamsExample}
    5152\end{figure}
    5253
     
    7475process and validate the source document. The remainder of this
    7576section will discuss each type of bit stream.
     77
    7678
    7779{\bf Basis Bit Streams:}
     
    9294% are defined as the set of bit streams that represent the transposed data format of the source XML byte data. In other
    9395% words, $M$-bit source characters are represented in transposed representation using $M$ basis bit streams.
    94 % Figure \ref{fig:BitstreamsExample} presents an example of the basis bit stream representation of 8-bit
     96% Figure \ref{fig:bit streamsExample} presents an example of the basis bit stream representation of 8-bit
    9597% ASCII characters. $B_0 \ldots B_7$ are the individual bit streams. The $0$ bits in the bit streams are represented
    9698% by periods as to emphasize the $1$ bits.
     
    101103metadata parsing.
    102104% For example, in a CSV file, any `,' or `\textbackslash n' can indicate the start of a new column or row respectively.
    103 For example, in XML, any opening angle bracket character, `\verb`<`',
    104 may indicate that we are starting a new markup tag.  Traditional
    105 byte-at-a-time parsers find these characters by comparing the value of
    106 each code point with a set of known code points and branching
    107 appropriately when one is found, typically using an if or switch
    108 statement.  Using this method to perform multiple transitions in
    109 parallel is non-trivial and may require fairly sophisticated
    110 algorithms to do so correctly.
     105For example, in XML, any opening angle bracket character, `\verb`<`', may indicate that we are starting a new markup tag.
     106Traditional byte-at-a-time parsers find these characters by comparing the value of each byte with a set
     107of known significant characters and branching appropriately when one is found, typically using an if or switch statement.
     108Using this method to perform multiple transitions in parallel is non-trivial and may require fairly sophisticated algorithms
     109to do so correctly.
     110
    111111% However, a `<' is legal within an XML comment so not every `<' necessarily means that we are opening a new tag.
    112112
     
    134134
    135135
    136 {\bf Lexical and Error Bit Streams:} To perform lexical analysis on
    137 the input data, Parabix computes lexical and error bit streams from
    138 the character-class bit streams using a mixture of both boolean logic
    139 and integer math. Lexical bit streams typically mark multiple current
    140 parsing positions.  Unlike the single-cursor approach of traditional
    141 text parsers, these allow Parabix to process multiple cursors in
    142 parallel.  Error bit streams are derived from the lexical bit streams
    143 and can be used to identify any well-formedness issues found during
    144 the parsing process. The presense of a $\tt 1$ bit in an error stream
    145 tends to mean that the lexical stream cannot be trusted to be
    146 completely accurate and Parabix may need to perform some sequential
    147 parsing on that section to determine the cause and severity of the
    148 error.
    149 
    150 To form lexical bit streams, we have to introduce a few new
    151 operations: Advance and ScanThru.  The $\texttt{Advance}$ operator
    152 accepts one input parameter, $c$, which is typically viewed as a bit
    153 stream containing multiple cursor bits, and advances each cursor one
    154 position forward.  On little-endian architectures, shifting forward
    155 means shifting to the right.  $\texttt{ScanThru}$ accepts two input
    156 parameters, $c$ and $m$; any bit that is in both $c$ and $m$ is moved
    157 to first subsequent $\tt 0$-bit in $m$ by calculating $(c + m) \land
    158 \lnot m$.  For example, in Figure \ref{fig:ParabixParsingExample}
    159 suppose we have the regular expression \verb`<[a-zA-Z]+>` and wish to
    160 find all instances of it in the source text.  We begin by constructing
    161 the character classes $C_{0}$, which consists of all letters, $C_{1}$,
    162 which contains all `\verb`>`'s, and $C_{2}$, which marks all
    163 `\verb`<`'s. In $L_0$ the position of every `\verb`<`' is advanced by
    164 one to locate the first character of each token. By computing $E_{0}$,
    165 the parser notes that ``\verb`<>`'' does not match the expected
    166 pattern. To find the end positions of each token, the parser
    167 calculates $L_{1}$ by moving the cursors in $L_0$ through the letter
    168 bits in $C_0$. $L_1$ is then validated to ensure that each token ends
    169 with a `\verb`>`' and discovers that ``\verb`<error]`'' too fails to
    170 match the expected pattern.  With additional post bit stream
    171 processing, the erroneous cursor positions in $L_{0}$ and $L_{1}$ can
    172 be removed or ignored; the details of which go beyond the scope of
    173 this paper.
     136{\bf Lexical and Error Bit Streams:}
     137To perform lexical analysis on the input data, Parabix computes lexical and error bit streams from the character-class bit streams using
     138a mixture of both boolean logic and integer math. Lexical bit streams typically mark multiple current parsing positions.
     139Unlike the single-cursor approach of traditional text parsers, these allow Parabix to process multiple cursors in parallel.
     140Error bit streams are often the byproduct or derivative of computing lexical bit streams and can be used to identify any well-formedness
     141issues found during the parsing process. The presense of a $\tt 1$ in an error stream indicates that the lexical stream cannot be
     142trusted to be completely accurate and it may be necessary to perform some sequential parsing on that section to determine the cause and severity
     143of the error. %How errors are handled depends on the logical implications of the error and go beyond the scope of this paper.
     144
     145To form lexical bit streams, we have to introduce a few new operations: {\tt Advance} and {\tt ScanThru}.
     146The {\tt Advance} operator accepts one input parameter, $c$, which is typically viewed as a bit stream containing multiple cursor bits,
     147and advances each cursor one position forward.  On little-endian architectures, shifting forward means shifting to the right.
     148{\tt ScanThru} accepts two input parameters, $c$ and $m$; any bit that is in both $c$ and $m$ is moved to first subsequent
     149$\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$. 
     150For example, in Figure \ref{fig:ParabixParsingExample} suppose we have the regular expression \verb`<[a-zA-Z]+>` and wish to find all
     151instances of it in the source text.
     152We begin by constructing the character classes $C_{0}$, which consists of all letters, $C_{1}$, which contains all `\verb`>`'s, and
     153$C_{2}$, which marks all `\verb`<`'s. In $L_0$ the position of every `\verb`<`' is advanced by one to locate the first character of each
     154token. By computing $E_{0}$, the parser notes that ``\verb`<>`'' does not match the expected pattern. To find the end positions of each token,
     155the parser calculates $L_{1}$ by moving the cursors in $L_0$ through the letter bits in $C_0$. $L_1$ is then validated to ensure that each
     156token ends with a `\verb`>`' and discovers that ``\verb`<error]`'' too fails to match the expected pattern.
     157With additional post bit-stream processing, the erroneous cursors in $L_{0}$ and $L_{1}$ can be removed; the details
     158of which go beyond the scope of this paper.
    174159
    175160\begin{figure}[h]
     
    228213application.  Input is specified using a character class syntax
    229214adapted from the standard regular expression notations.  Output is a
    230 minimized set of three-address bitwise operations, such as $a =
    231 b~\&~c$, to compute each of the character classes from the basis bit
    232 streams.
     215minimized set of three-address bitwise operations to compute each of
     216the character classes from the basis bit streams.
     217
    233218
    234219For example, Figure \ref{fig:CCC} shows the input and output produced
     
    299284\end{figure}
    300285
    301 The Pablo compiler abstracts away the details of programming parallel
    302 bit stream code in terms of finite SIMD register widths and
    303 application buffer sizes.  Input to Pablo is a language for expressing
    304 bitstream operations on unbounded bitstreams.  The operations include
    305 bitwise logic, the {\tt Advance} and {\tt ScanThru} operations
    306 described in the previous subsection as well as ``if-then'' and
    307 ``while {}'' control structures.  Pablo translates these operations to
    308 block-at-a-time code in C/C++.
     286
     287The Pablo compiler abstracts away the details of
     288programming parallel bit stream code in terms of finite
     289SIMD register widths and application buffer sizes.
     290Input to Pablo is a language for expressing bit stream operations
     291on unbounded bit streams.  The operations include bitwise
     292logic, the {\tt Advance} and {\tt ScanThru} operations described in he
     293previous subsection as well as if and while control structures.
     294Pablo translates these operations to block-at-a-time
     295code in C/C++.
     296
    309297%, where the block size is the register width for SIMD operations on the selected target architecture.
    310298The key functionality of Pablo is to arrange for block-to-block
    311 carry bit propagation to implement the long bitstream shift
     299carry bit propagation to implement the long bit stream shift
    312300and addition operations required by
    313301{\tt Advance} and {\tt ScanThru}.
    314302
    315 For example, we can translate the simple parsing example of
    316 \ref{fig:ParabixParsingExample} above into Pablo code to produce the
    317 output as shown in Figure \ref{fig:Pablo}.  In this example, Pablo has
    318 the primary responsibility of inserting carry variable declarations
    319 that allow the results of Advance and ScanThru operations to be
    320 carried over from block to block.  A separate carry variable is
    321 required for every {\tt Advance} or {\tt ScanThru} operation.  A
    322 function containing such operations is translated into a public C++
    323 class (struct), which includes a carry queue to hold all the carry
    324 variables from iteration to iteration, together with the a method {\tt
    325   do\_block} to implement the processing for a single block (based on
    326 the SIMD register width).  Macros {\tt CarryDeclare} and {\tt
    327   CarryInit} declare and initialize the carry queue structure
    328 depending on the specific architecture and carry queue representation.
    329 The unbounded bitstream {\tt Advance} and {\tt ScanThru} operations
    330 are translated into block-by-block equivalents with explicit carry-in
    331 and carry-out processing.  At the end of each block, the {\tt
    332   CarryQ\_Adjust} operation implements any necessary adjustment of the
    333 carry queue to prepare for the next iteration.  The Pablo language and
    334 compiler also support conditional and iterative bitstream logic on
    335 unbounded streams (if and while constructs) which involves additional
    336 carry-test insertion in control branches.  Explaining the full details
    337 of the translation is beyond the scope of this paper, however.
     303For example, we can translate the simple parsing example
     304of \ref{fig:ParabixParsingExample} above into Pablo code
     305to produce the output as shown in Figure \ref{fig:Pablo}.
     306In this example, Pablo has the primary responsibility of inserting
     307carry variable declarations that allow the results of
     308{\tt Advance} and {\tt ScanThru} operations to be carried over from
     309block to block.  A separate carry variable is required for every
     310{\tt Advance} or {\tt ScanThru} operation.  A function containing
     311such operations is translated into a public C++ class (struct),
     312which includes a Carry Queue to hold all the carry
     313variables from iteration to iteration, together with the
     314a method {\tt do\_block} to implement the processing
     315for a single block (based on the SIMD register width).
     316Macros {\tt CarryDeclare} and {\tt CarryInit} declare and
     317initialize the Carry Queue structure depending on the
     318specific architecture and Carry Queue representation.
     319The unbounded bit stream {\tt Advance} and {\tt ScanThru}
     320operations are translated into block-by-block equivalents
     321with explicit carry-in and carry-out processing. 
     322At the end of each block, the {\tt CarryQ\_Adjust}
     323operation implements any necessary adjustment of the
     324Carry Queue to prepare for the next iteration.
     325The Pablo language and compiler also support conditional
     326and iterative bit stream logic on unbounded streams
     327(if and while constructs) which involves additional
     328carry-test insertion in control branches.
     329Explaining the full details of the translation
     330is beyond the scope of this paper.
    338331
    339332\subsection{Parabix Run-Time Libraries}
     
    355348sixteen 8-bit fields.
    356349
    357 These operations were originally developed for 128-bit Altivec
    358 operations on Power PC as well as 64-bit MMX and 128-bit SSE
    359 operations on Intel but have recently extended to support the new
    360 256-bit AVX operations on Intel as well as the 128-bit NEON operations
    361 on the ARM architecture.
    362  
     350These operations were originally developed for 128-bit Altivec operations on Power PC
     351as well as 64-bit MMX and 128-bit SSE operations on Intel
     352but have recently extended to support
     353the new 256-bit AVX operations on Intel as well as the 128-bit
     354\NEON{} operations on the ARM architecture.
     355
Note: See TracChangeset for help on using the changeset viewer.