Changeset 3490 for docs/Working/re


Ignore:
Timestamp:
Sep 15, 2013, 9:16:37 AM (6 years ago)
Author:
cameron
Message:

Scripts and updates

Location:
docs/Working/re
Files:
1 deleted
5 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/ppopp-re.tex

    r3486 r3490  
    2121
    2222
    23 \documentclass{sigplanconf}
     23\documentclass[preprint]{sigplanconf}
    2424
    2525% The following \documentclass options may be useful:
     
    3939\setlength{\pdfpagewidth}{\paperwidth}
    4040
    41 \conferenceinfo{PPoPP 2014}{February 15-19, 2013, Orland, Florida, United States}
     41\conferenceinfo{PPoPP 2014}{February 15-19, 2014, Orlando, Florida, United States}
    4242\copyrightyear{2013}
    4343\copyrightdata{978-1-nnnn-nnnn-n/yy/mm}
     
    5454                                  % short abstracts)
    5555
    56 \titlebanner{banner above paper title}        % These are ignored unless
    57 \preprintfooter{short description of paper}   % 'preprint' option specified.
     56\titlebanner{}        % These are ignored unless
     57\preprintfooter{Bitwise Data Parallel Grep}   % 'preprint' option specified.
    5858
    5959\title{Bitwise Data Parallelism in Regular Expression Matching}
    60 \subtitle{Subtitle Text, if any}
    61 
    62 \authorinfo{Robert D. Cameron \and Kenneth S. Herdy \and Dan Lin \and Meng Lin \and Ben Hull \and Thomas S. Shermer \and Arrvindh Shriraman}
    63            {Simon Fraser University}
    64            {\{cameron,ksherdy,lindanl,linmengl,bhull,shermer,ashriram\}@cs.sfu.ca}
     60%\subtitle{Subtitle Text, if any}
     61
     62\authorinfo{Anonymous Authors}{Institutions}{emails}
     63%\authorinfo{Robert D. Cameron \and Kenneth S. Herdy \and Dan Lin \and Meng Lin \and Ben Hull \and Thomas S. Shermer \and Arrvindh Shriraman}
     64%          {Simon Fraser University}
     65%           {\{cameron,ksherdy,lindanl,linmengl,bhull,shermer,ashriram\}@cs.sfu.ca}
    6566
    6667\maketitle
     
    9091Thompson \cite{thompson1968} is credited with the first construction to convert regular expressions
    9192to nondeterministic finite automata (NFA).
    92 Following Thompson's approach, a regular expression of length m is first converted
    93 to an NFA with O(m) nodes. It is then possible to search a text of length n using the
    94 NFA in worst case O(mn) time. Often, a more efficient choice
     93Following Thompson's approach, a regular expression of length $m$ is first converted
     94to an NFA with $O(m)$ nodes. It is then possible to search a text of length $n$ using the
     95NFA in worst case $O(mn)$ time. Often, a more efficient choice
    9596is to convert the NFA into a DFA. A DFA has only a single active state at any time
    9697in the matching process and
    97 hence it is possible to search a text at of length n in worst-case O(n) optimal.
     98hence it is possible to search a text at of length $n$ in $O(n)$ time.
    9899However, it is well known that the conversion of an NFA to an equivalent DFA may result
    99100in state explosion. That is, the number of resultant DFA states may increase exponentially.
    100101In \cite{Baeza-yates_anew} a new approach to text searching was proposed based on bit-parallelism \cite{baeza1992new}.
    101102This technique takes advantage of the intrinsic parallelism of bitwise operations
    102 within a computer word. Given a w-bit word, the Shift-Or algorithm \cite{Baeza-yates_anew} algorithm uses the
     103within a computer word. Given a $w$-bit word, the Shift-Or algorithm \cite{Baeza-yates_anew} algorithm uses the
    103104bit-parallel approach to
    104 simulate an NFA in O($nm/w$) worst-case time.
     105simulate an NFA in $O(nm/w)$ worst-case time.
    105106
    106107A disadvantage of the bit-parallel Shift-Or pattern matching approach
     
    114115and hence the name nrgrep \cite{navarro2000}.
    115116
    116 {\em a brief review} 
    117117There has been considerable interest in using parallelization techniques
    118118to improve the performance of regular expression matching on parallel hardware
    119119such as multi-core processors (CPUs), graphics processing units (GPUs),
    120 field-programmable gate arrays (FPGAs), and even more exotic architectures such as
     120field-programmable gate arrays (FPGAs), and specialized architectures such as
    121121the Cell Broadband Engine (Cell BE). % FPGA results (synthesis of patterns into logic circuits) vs. memory based approaches (STTs in memory)
    122122%CPU
     
    281281processing of data stream elements.  Indeed, our most abstract model
    282282is that of unbounded data parallelism: processing all elements of
    283 the input data stream simultaneously.   In essence, we view
    284 data streams as (very large) integers.   The fundamental operations
    285 we apply are based on bitwise logic and long-stream addition.
     283the input data stream simultaneously.   In essence, data streams are viewed
     284as (very large) integers.   The fundamental operations are bitwise
     285logic, stream shifting and long-stream addition.
    286286
    287287Depending on the available parallel processing resources, an actual
     
    294294case, we rely on the Parabix tool chain \cite{lin2012parabix}
    295295to handle the details of compilation to block-by-block processing.
    296 As we show later, however, we have also adapted Parabix technology to processing
    297 blocks of 4K bytes at time in our GPGPU implementation,
    298 relying on the application of our long-stream addition technique
    299 to perform 4096-bit additions using 64 threads working in lock-step
    300 SIMT fashion each on 64-bit processors.
     296For our GPGPU implementation, we have developed a long-stream
     297addition technique that allows us to perform 4096-bit additions
     298using 64 threads working in lock-step SIMT fashion.  Using scripts
     299to modify the output of the Parabix tools, we effectively divide
     300the input into blocks of 4K bytes processed in a fully data-parallel
     301manner.
    301302
    302303A key concept in this streaming approach is the derivation of bit streams
     
    316317or not.  Figure \ref{fig:streams} shows an example of an
    317318input byte stream in ASCII, the eight basis bit streams of the
    318 transposed representation, and several character class bit streams
     319transposed representation, and the character class bit streams
     320\verb:[a]:,
     321\verb:[z]:, and
     322\verb:[0-9]:
    319323that may be computed from the basis bit streams using bitwise logic.
    320324Transposition and character class construction are straightforward
    321325using the Parabix tool chain \cite{lin2012parabix}.
     326
     327\begin{figure}[tbh]
     328\begin{center}
     329\begin{tabular}{cr}\\
     330input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
     331$B_7$ & \verb`.................................`\\
     332$B_6$ & \verb`1....1..1..1...11...1.1..111..1..`\\
     333$B_5$ & \verb`111111111111111111111111111111111`\\
     334$B_4$ & \verb`.11111...111....1....11.....111..`\\
     335$B_3$ & \verb`......11..11111.1111...11.....111`\\
     336$B_2$ & \verb`.11.1.11....111..111.1.11......11`\\
     337$B_1$ & \verb`...1....11.1....1........11.111..`\\
     338$B_0$ & \verb`1.11.111..1.1111.1111.111.11...11`\\
     339\verb:[a]: & \verb`1..............1....1......1.....`\\
     340\verb:[z]: & \verb`...........1....1.............1..`\\
     341\verb:[0-9]: & \verb`.1111....11..........1......11...`
     342\end{tabular}
     343
     344\end{center}
     345\caption{Basis and Character Class Streams}
     346\label{fig:streams}
     347\end{figure}
     348
    322349
    323350\paragraph*{Marker Streams.}  Now consider how bit-parallel data
     
    417444\item  \verb#hsimd<64>::mask(X)# : gathering the high bit of each 64-bit
    418445field into a single compressed $f$-bit mask value, and
    419 \item normal bitwise logic operations on $f$-bit masks.
     446\item normal bitwise logic operations on $f$-bit masks, and
    420447\item  \verb#simd<64>::spread(x)# : distributing the bits of
    421 an $f$ bit mask, one bit each to the $f$ 64-bit fields of a vector, and
     448an $f$ bit mask, one bit each to the $f$ 64-bit fields of a vector.
    422449\end{itemize}
    423450
  • docs/Working/re/re-Unicode.tex

    r3489 r3490  
    40400xE0-0xEF, and 0xF0-F4, respectively) or as a UTF-8 suffix
    4141byte in the range 0x80-0xBF.   Parallel bit stream
    42 technology achieves this validation easily and efficiently \cite{PPoPP08}.
     42technology achieves this validation easily and efficiently \cite{cameron2008case}.
    4343
    4444The UTF-8 byte classification streams produced as a byproduct
  • docs/Working/re/reference.bib

    r3471 r3490  
    120120}
    121121
     122@inproceedings{cameron2008case,
     123  title={A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding},
     124  author={Cameron, Robert D},
     125  booktitle={Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming},
     126  pages={91--98},
     127  year={2008},
     128  organization={ACM}
     129}
    122130@inproceedings{cameron2008high,
    123131  title={High performance {XML} parsing using parallel bit stream technology},
  • docs/Working/re/scripts/streams.py

    r3488 r3490  
    8585        pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits)
    8686        do_lex(basis_bits, lex)
    87         return pablo.latex_streams([('input data ', chardata[::-1]),
     87        return pablo.latex_streams([('input data ', chardata),
    8888                              ('$B_7$', pablo.bitstream2string(basis_bits.bit_0, lgth, zero_ch)),
    8989                              ('$B_6$', pablo.bitstream2string(basis_bits.bit_1, lgth, zero_ch)),
Note: See TracChangeset for help on using the changeset viewer.