Changeset 1374 for docs/HPCA2012


Ignore:
Timestamp:
Aug 24, 2011, 8:36:57 PM (8 years ago)
Author:
cameron
Message:

Section 3 and other fixes

Location:
docs/HPCA2012
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • docs/HPCA2012/02-background.tex

    r1368 r1374  
    8282between data and metadata states. Each state transition indicates the context in which to interpret the
    8383subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in
    84 generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until
     84generally unpredictable patterns; thus any character could be a state transition until
    8585deemed otherwise.
    8686
     
    9090Apache XML project \cite{xerces},
    9191uses a series of nested switch statements and state-dependent flag tests to control the parsing logic of the program.
    92 Our analysis, which we detail in {\bf SECTION XXX}, found that Xerces-C requires between 6 - 13 branches per byte
     92Our analysis, which we detail in Section \ref{section:XML-branches}, found that Xerces-C requires between 6 - 13 branches per byte
    9393of XML to support this form of control structure (depending on the ratio of markup vs. content).
    9494Cache utilization is also significantly reduced due to the manner in which markup and content must be scanned and buffered
  • docs/HPCA2012/03-research.tex

    r1368 r1374  
    1 \section{Parabix}
     1\section{The Parabix Framework}
    22\label{section:parabix}
    33
     
    3636% Hence, one of the foci in our study is the manner in which straight line SIMD code influences energy usage.
    3737
    38 This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix}
    39 In Section {\bf 4}, we discuss one of its implementations, \emph{Parabix-XML}.
    40 A more comprehensive study of Parabix and Parabix-XML, can be found in the technical report
    41 ``Parallel Parsing with Bitstream Addition: An XML Case Study'' \cite{Cameron2010}.
    42 
    43 The fundemental difference between the Parabix framework and traditional text processing models is in how
    44 Parabix represents the source data.
    45 Rather than viewing it as a stream of bytes or characters and performing per-byte comparisons to
    46 determine when to transition between parsing states, Parabix views data as a set of \emph{bit streams} and uses a mixture of
    47 boolean-logic\footnote{In this paper, we use the notation $\land$, $\lor$ and $\lnot$ to denote the AND, OR and NOT boolean operations.} and integer-based math on those streams to effectively parse many bytes in parallel.
    48 Bit streams are the foundation of the Parabix technology. In order to understand how it is possible to
    49 perform complex state-transition-based text processing operations in parallel, understanding what bit streams are,
    50 how they are created, and --- more importantly --- how they are used by Parabix, is critical.
    51 
    52 % Our first generation parallel bitstream XML parser---Parabix1---uses employs a less conventional approach of
    53 % SIMD technology to represent text in parallel bitstreams. Bits of each stream are in one-to-one-correspondence
    54 % with the bytes of a character stream. A transposition step first transforms sequential byte stream data into
    55 % eight basis bitstreams for the bits of each byte.
    56 
    57 % Parabix1 is functionally equivalent to a traditional XML parser.
    58 % That is, Parabix1 moves a single cursor sequentially through the source document and
    59 % parses it to distinguish between (and properly validate) the markup and content.
    60 % Where Parabix1 differs from traditional parsers is that instead of processing a document a byte-at-a-time, it
    61 % transforms the document into a set of \emph{bit streams} (explained shortly) and then performs boolean-logic operations
    62 % on those bit streams to process many bytes in parallel.
    63 
    64 
    65 %Each $1$-bit marks the postion of each key character in the
    66 %corresponding source data stream. A single stream is generated for each of the key markup characters.
    67 
    68 \subsection{What are Bit Streams?}
    69 
    70 A bit stream is simply a sequence of $\tt 1$s and $\tt 0$s.
    71 The significance of each bit value is dependent on the type of bit stream.
    72 We view bit streams in one of two ways: $n$ 1-bit values for boolean-logic operations and 1 $n$-bit value for integer-based math.
    73 For simplicity, assume that $n$ is infinitely long (i.e., unbounded) w.r.t. the size of the source data. In reality, each bit stream is divided into
    74 a series of $w$-bit blocks, where $w$ is equal to the SIMD register width of the system (e.g., 64, 128, or 256).
    75 We discuss how these $\lceil \frac{n}{w} \rceil$ bit blocks can be viewed as an unbounded bit stream in Section \ref{parabix tool chain}.
    76 
    77 The first type of bit streams used in Parabix are refered to as \emph{basis bit streams}, which contain the input source data.
    78 Parabix uses the basis bit streams to construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense
    79 of a significant character (or class of characters) in the parsing process.
    80 Character-class bit streams are used to compute \emph{lexical bit streams} and \emph{error bit streams}, which
    81 Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream.
    82 
    83 {\bf Basis Bit Streams:}
    84 To construct the basis bit streams, the source data is first loaded in
    85 sequential order and then transposed --- through a series of SIMD pack, shift, and bitwise operations ---
    86 so that Parabix can efficiently produce the character-class bit streams.
    87 Essentially, when the source data is in basis bit stream form, the $k$-th bit of $i$-th character in the source
    88 text is in the $i$-th (bit) position of the $k$-th basis bit stream, $b_{k}$.
    89 Using the SIMD capabilities of current commodity processors, the transposition process incurs an amortized cost
    90 of approximately 1 cycle per byte \cite{CameronHerdyLin2008}.
    91 The size of $k$ is dependent on the code unit size of the text encoding format of the source document. A code unit is simply
    92 a fixed number of bits (or bytes) used to represent a specific character (code point). Some encoding formats, such
    93 as UNICODE, may use a multiple code units to express all of its possible code points.
    94 How multi-code-unit characters can be parsed efficiently goes beyond the scope of this paper.
    95 The most dominant format in data-oriented XML documents is ASCII, which uses a 8-bit code unit to represent all of the
    96 128 code points within it.
    97 In Figure \ref{fig:BitstreamsExample}, we show how the ASCII string ``{\ttfamily b7\verb`<`A}'' is represented as 8 basis bit
     38This section presents an overview of the SIMD-based parallel bit stream text processing framework, \emph{Parabix}.
     39The framework has three components: a unifying architectural view of text processing in terms
     40of parallel bit streams, a tool chain for automating the generation of parallel bit stream
     41code from higher-level specifications, and a run-time environment providing a portable
     42SIMD programming abstraction, independent of the specific facilities available
     43on particular target architectures.
     44
     45
     46\subsection{Parallel Bit Streams}
     47
     48The fundamental difference between the Parabix framework and traditional text processing models is in how
     49Parabix represents the source data.   Given a traditional byte-oriented text stream, Parabix
     50first transposes the text data to a transform domain consisting of 8 parallel bit streams, known
     51as {\em basis bit streams}.  In essence, each basis bit stream $b_{k}$ represents the stream
     52of $k$th bit of each byte in the source text.  That is, the $k$th bit of $i$th byte in the source
     53text is in the $i$th (bit) position of the $k$th basis bit stream, $b_{k}$.
     54For example, in Figure \ref{fig:BitstreamsExample}, we show how the ASCII string ``{\ttfamily b7\verb`<`A}'' is represented as 8 basis bit
    9855streams, $\tt b_{0 \ldots 7}$. The bits used to construct $\tt b_7$ have been highlighted in this example.
    9956
     
    12279\end{figure}
    12380
     81The advantage of the parallel bit stream representation is that we can
     82use the 128-bit SIMD registers commonly found on commodity processors (e.g. SSE on
     83Intel) to process 128 byte positions at a time using bitwise logic, shifting and
     84other operations.
     85
     86Just as forward and inverse Fourier transforms are used to transform
     87between the time and frequency domains in signal processing, bit stream
     88transposition and inverse transposition provides ``byte space'' and ``bit space''
     89views of text.  The goal of the Parabix framework is to support efficient
     90text processing using these two equivalent representations in the same
     91way that efficient signal processing benefits from the use of the frequency
     92domain in some cases and the time domain in others.
     93
     94In the Parabix framework, basis bit streams are used as the starting
     95point to determine other bit streams.   In particular, Parabix uses the basis bit streams to
     96construct \emph{character-class bit streams} in which each $\tt 1$ bit indicates the presense
     97of a significant character (or class of characters) in the parsing process.
     98Character-class bit streams may then be used to compute \emph{lexical bit streams} and \emph{error bit streams}, which
     99Parabix uses to process and validate the source document. The remainder of this section will discuss each type of bit stream.
     100
     101{\bf Basis Bit Streams:}
     102To construct the basis bit streams, the source data is first loaded in
     103sequential order and then transposed --- through a series of SIMD pack, shift, and bitwise operations ---
     104so that Parabix can efficiently produce the character-class bit streams.
     105Using the SIMD capabilities of current commodity processors, the transposition process incurs an amortized cost
     106of approximately 1 cycle per byte \cite{CameronHerdyLin2008}.
     107%The size of $k$ is dependent on the code unit size of the text encoding format of the source document. A code unit is simply
     108%a fixed number of bits (or bytes) used to represent a specific character (code point). Some encoding formats, such
     109%as UNICODE, may use a multiple code units to express all of its possible code points.
     110%How multi-code-unit characters can be parsed efficiently goes beyond the scope of this paper.
     111%The most dominant format in data-oriented XML documents is ASCII, which uses a 8-bit code unit to represent all of the
     112%128 code points within it.
    124113
    125114% In Parabix1, basis bit streams are used to generate \ref{lexical bit streams} for key markup characters.
     
    169158To form lexical bit streams, we have to introduce a few new operations: Advance and ScanThru.
    170159The $\texttt{Advance}$ operator accepts one input parameter, $c$, which is typically viewed as a bit stream containing multiple cursor bits,
    171 and returns $c + c$ --- effectively moves each cursor one position to the ``right''.
     160and advances each cursor one position forward.  On little-endian architectures, shifting forward means shifting to the right.
    172161$\texttt{ScanThru}$ accepts two input parameters, $c$ and $m$; any bit that is in both $c$ and $m$ is moved to first subsequent
    173 $\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$.
     162$\tt 0$-bit in $m$ by calculating $(c + m) \land \lnot m$.  
    174163For example, in Figure \ref{fig:ParabixParsingExample} suppose we have the regular expression \verb`<[a-zA-Z]+>` and wish to find all
    175164instances of it in the source text.
     
    275264{\tt Advance} and {\tt ScanThru}.
    276265
     266For example, we can translate the simple parsing example
     267of \ref{fig:ParabixParsingExample} above into Pablo code
     268to produce the output as shown in Figure \ref{fig:Pablo}.
     269In this example, Pablo has the primary responsibility of inserting
     270carry variable declarations that allow the results of
     271Advance and ScanThru operations to be carried over from
     272block to block.  Explaining the full details of the translation
     273is beyond the scope of this paper, however.
     274
     275\begin{figure}[h]
     276
     277\begin{center}
     278
     279\begin{tabular}{r l}
     280\ttfamily{\bfseries INPUT:} & \verb`def parse_tags(classes, errors):` \\
     281& \verb`        classes.C0 = Alpha` \\
     282& \verb`        classes.C1 = Rangle` \\
     283& \verb`        classes.C2 = Langle` \\
     284& \verb`        L0 = bitutil.Advance(C2)` \\
     285& \verb`        errors.E0 = L0 &~ C0` \\
     286& \verb`        L1 = bitutil.ScanThru(L0, C0)` \\
     287& \verb`        errors.E1 = L1 &~ C1` \\ \\
     288
     289\ttfamily{\bfseries OUTPUT:} & \verb`struct Parse_tags {` \\
     290& \verb`  Parse_tags() { ` \\
     291& \verb`  CarryInit(carryQ, 2); }` \\
     292& \verb`  void do_block(Classes & classes, Errors & errors) {` \\
     293& \verb`                BitBlock L0, L1;` \\
     294& \verb`        classes.C0 = Alpha;` \\
     295& \verb`        classes.C1 = Rangle;` \\
     296& \verb`        classes.C2 = Langle;` \\
     297& \verb`        L0 = BitBlock_advance_ci_co(C2, carryQ, 0);` \\
     298& \verb`        errors.E0 = simd_andc(L0, C0);` \\
     299& \verb`        L1 = BitBlock_scanthru_ci_co(L0, C0, carryQ, 1);` \\
     300& \verb`        errors.E1 = simd_andc(L1, C1);` \\
     301& \verb`        CarryQ_Adjust(carryQ, 2);` \\
     302& \verb`  }` \\
     303& \verb`  CarryDeclare(carryQ, 2);` \\
     304& \verb`  };` \\
     305\end{tabular}
     306
     307\end{center}
     308
     309\caption{Parallel Block Compiler (Pablo) Input/Output}
     310\label{fig:Pablo}
     311\end{figure}
    277312
    278313
     
    289324but have recently extended to support
    290325the new 256-bit AVX operations on Intel as well as the 128-bit
    291 NEON operations on the ARM architecture. Further details
    292 are provided in later sections.
     326NEON operations on the ARM architecture.
    293327 
  • docs/HPCA2012/05-corei3.tex

    r1372 r1374  
    5050
    5151\subsection{Branch Mispredictions}
     52\label{section:XML-branches}
    5253In general, reducing the branch misprediction rate is difficult in
    5354text-based XML parsing applications. This is due to (1) variable
Note: See TracChangeset for help on using the changeset viewer.