# Changeset 883

Ignore:
Timestamp:
Feb 3, 2011, 6:53:53 PM (8 years ago)
Message:

Reduce Section 4 to 1.5 pages. Minor edits.

File:
1 edited

### Legend:

Unmodified
 r879 % to $W$ transitions with each $W$-bit binary addition operation. On processors supporting $W$-bit addition operations, the method can perform up to $W$ finite state transitions per clock cycle. the method can perform up to $W$ finite state transitions per instruction. The method is based on the concept of parallel bitstream technology, in which parallel streams of bits are formed such that each stream Our research has been exploring a promising alternative approach, however, based on the concept of {\em parallel bit streams} \cite{PPoPP08,CameronHerdyLin2008,Cameron2009}. the concept of {\em parallel bit streams} \cite{Cameron2009,PPoPP08,CameronHerdyLin2008}. In this approach, byte streams are first sliced into eight basis bit streams, one for each identify current scanning positions for multiple instances of a particular syntactic context within a byte stream. These multiple marker positions may be each be independently These multiple marker positions may each be independently advanced in parallel using addition and masking. The net result is a new scanning primitive that Section \ref{sec:compile} then considers the translation of high-level operations on unbounded bitstreams into equivalent low-level code using SIMD intrinsics in the the C programming equivalent low-level code using SIMD intrinsics in the C programming language. Performance results are presented in section 7, comparing \subsection{Fundamentals} A bitstream is simply a sequence of $0$s and $1$s, where there is one such bit in the bitstream for each character in a source data stream of characters. For parsing, and other text processing tasks, we need to consider multiple properties of characters at different stages during the parsing process. A bitstream can be associated with each of these properties, and hence there will be multiple (parallel) bitstreams associated with a source data stream of characters \cite{PPoPP08,Cameron2009}. A bitstream is simply a sequence of $0$s and $1$s, where there is one such bit in the bitstream for each character in a source data stream. For parsing, and other text processing tasks, we need to consider multiple properties of characters at different stages during the parsing process. A bitstream can be associated with each of these properties, and hence there will be multiple (parallel) bitstreams associated with a source data stream of characters \cite{Cameron2009,PPoPP08}. The starting point for bitstream methods are \emph{basis} bitstreams classes needed for XML parsing \cite{CameronHerdyLin2008}. Improved instruction sets using parallel extract operations or inductive doubling techniques may further reduce this overhead significantly \cite{HilewitzLee2006,CameronLin2009}. inductive doubling techniques may further reduce this overhead significantly \cite{CameronLin2009,HilewitzLee2006}. Beyond the bitwise logic needed for character class determination, scanning or parsing of a source data stream. The appearance of a 1 at a position in a marker bitstream could, for example, denote the starting position an XML tag in the data stream.   In general, the set of the starting position of an XML tag in the data stream.   In general, the set of bit positions in a marker bitstream may be considered to be the current parsing positions of multiple parses taking place in parallel throughout The figure omits determination of error bitstreams, processing of single-quoted attribute values and handling of empty and empty element tags, for simplicity.  In this figure, the first of empty element tags, for simplicity.  In this figure, the first four rows show the source data and three character class bitstreams: $N$ for characters permitted in XML names, $M_f$ & \verb____________1____________1_________________________1_\\ $m = n(M_f) - m_i$ & \verb_111111111111__11111111111______11111111111111111111_\\ $M_1 = n(\verb:[<]:) \wedge \neg m$ & \verb____________________________1_________________________ $M_1 = n(\verb:[<]:) \wedge \neg m$ & \verb____________________________1________________________ \end{tabular} \end{center} Taking advantage of Python's built-in support for arbitrary-size integers to represent unbounded bitstreams, unbounded integers to represent arbitrary-size bitstreams, we have implemented a complete parsing prototype for XML using only bitstream addition, subtraction and bitwise logic. implementation of the scanning primitive $s$ as shown previously.   The operation $n$ is simply implemented using a shift left operation, while subtraction and bitwise logic an upshift operation, while subtraction and bitwise logic are directly supported. In this section, we consider the full application of the parsing techniques of the previous section to the problem of XML well-formedness checking in accord with the requirements of the XML specification \cite{TR:XML}. of the previous section to the problem of XML well-formedness checking \cite{TR:XML}. This application is useful as a well-defined and commercially significant example to assess the overall applicability of the techniques. To what extent can well-formedness requirements of XML be completely discharged using parallel bitstream techniques. example to assess the overall applicability of parallel bit stream techniques. To what extent can the well-formedness requirements of XML be completely discharged using parallel bitstream techniques? Are those techniques worthwhile in every instance, or are there better alternatives for certain requirements? do better alternatives exist for certain requirements? For those requirements that cannot be fully implemented using parallel bitstream technology alone, what preprocessing supports can be offered by parallel bitstream preprocessing support can be offered by parallel bit stream technology to the discharge of these requirements in other ways? We address each of these questions in this section, Most of the requirements of XML well-formedness checking can be implemented using two particular types of computed bitstream: {\em error bitstreams}, which were introduced in the previous section, along with and {\em error-check bitstreams}. Recall that an error bitstream stream is a stream marking the location of definite errors according to a particular requirement.  For example, the bitstream: {\em error bitstreams}, introduced in the previous section, and {\em error-check bitstreams}. Recall that an error bitstream stream is a stream marking the location of definite errors in accordance with a particular requirement.  For example, the $E_0$, $E_1$, and $E_2$ bitstreams as computed during parsing of decimal character references in Figure \ref{fig:decref} are error bitstreams.   An error check bitstream is one are error bitstreams.  One bits mark definite errors and zero bits mark the absence of error according to the requirement. Thus the complete absence of errors according to the requirements listed may be determined by forming the bitwise logical or'' of these bitstreams and confirming that the resulting value is zero. An error check bitstream is one that marks potential errors to be further checked in some fashion during post-bitstream processing. \verb:: cannot occur in content. \\ \hline ref\_syntax & Any syntax error in references, except for name errors. \\ \hline comment & \verb:--:'' in a comment without following \verb:>:'' \\ \hline PI\_err & No space after a PI target. \\ \hline tag\_syntax & Any error in start, empty or end tags, except for name errors. \\ \hline \end{tabular} \end{center} \caption{XML Error Bitstreams} \label{tab:errstreams} \end{table} Table \ref{tab:checkstreams} lists the XML error-check bitstreams that are computed together with the subsequent post-bitstream processing required.  In typical documents, most of these error-check streams will be quite sparse delimiter  \verb: