# Changeset 1081 for docs

Ignore:
Timestamp:
Apr 8, 2011, 3:07:13 PM (8 years ago)
Message:

Minor edit.

Location:
docs/PACT2011
Files:
2 edited

### Legend:

Unmodified
 r1046 In order to represent the byte-oriented character data as parallel bitstreams, the source data is first loaded in sequential order and converted into its transposed representation through a series of packs, shifts, and bitwise operations. Using the SIMD capabilities of current commodity processors, this transposition of source data to bitstreams incurs an amortized overhead of about 1 CPU cycle per byte for transposition \cite{CameronHerdyLin2008}. When parsing, we need to consider multiple properties of characters at different stages during the process. Using the basis bitstreams, it is possible to combine them using bitwise logic in order to compute character-class bitstreams; that is, streams that identify the positions at which characters belonging to a specific character class occur. For example, the $j$-th character is an open angle bracket <' if and only if the $j$-th bit of $B_2, B_3, B_4, B_5 =$ 1 and the $j\th$ bit of $B_0, B_1, B_6, B_7 =$ 0. Once these character-class bitstreams are created, a {\em bit-scan} operation, which is an 1-cycle intrinsic function for commodity processors, can be used for sequential markup scanning and data validation operations. A common operation in all XML parsers is start tag validation. Starts tags begin with <' and end with either />'' or >'' (depending whether the element tag is an empty element tag or not, respectively). Using the SIMD capabilities of current commodity processors, this transposition of source data to bitstreams incurs an amortized overhead of about 1 CPU cycle per byte for transposition \cite{CameronHerdyLin2008}. When parsing, we need to consider multiple properties of characters at different stages during the process. Using the basis bitstreams, it is possible to combine them using bitwise logic in order to compute character-class bitstreams; that is, streams that identify the positions at which characters belonging to a specific character class occur. For example, the $j$-th character is an open angle bracket <' if and only if the $j$-th bit of $B_2, B_3, B_4, B_5 =$ 1 and the $j$-th bit of $B_0, B_1, B_6, B_7 =$ 0. Once these character-class bitstreams are created, a {\em bit-scan} operation, which is an 1-cycle intrinsic function for commodity processors, can be used for sequential markup scanning and data validation operations. A common operation in all XML parsers is start tag validation. Starts tags begin with <' and end with either />'' or >'' (depending whether the element tag is an empty element tag or not, respectively). \begin{figure}[h]