Changeset 4448 for docs/Working/icGrep


Ignore:
Timestamp:
Feb 2, 2015, 1:49:07 PM (4 years ago)
Author:
cameron
Message:

References

Location:
docs/Working/icGrep
Files:
1 added
3 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icGrep/icgrep.tex

    r4446 r4448  
    4141\input{conclusion.tex}
    4242
     43\bibliographystyle{plain}
     44\bibliography{bitgrep}
    4345\end{document}
  • docs/Working/icGrep/introduction.tex

    r4446 r4448  
    1515single input stream is more difficult.  Indeed, the finite state machines (FSMs)
    1616underlying are considered the hardest to parallelize (embarassingly sequential)
    17 of the 13 dwarves identified in the Berkeley landscape of parallel computing research \cite{}.
     17of the 13 dwarves identified in the Berkeley landscape of parallel computing research \cite{asanovic2006landscape}.
    1818However, some success has been reported recently along two independent lines of
    1919research.   Mytkowicz...   Cameron/Parabix....
     
    2626a high-performance, full-featured open-source grep implementation
    2727with systematic support for Unicode regular expressions addressing the
    28 requirements of Unicode Technical Standard \#18 \cite{}.  As an alternative
     28requirements of Unicode Technical Standard \#18 \cite{davis2012unicode}.  As an alternative
    2929to classical grep implementations, ICgrep offers dramatic performance
    3030acceleration in ASCII-based and Unicode matching performance alike.
  • docs/Working/icGrep/paradigm.tex

    r4447 r4448  
    1919One important aspect of the bitwise data parallel approach
    2020is transposition of input data.   In previous work, the Parabix transform
    21 has been reported as imposing an amoritized cost of 1.1 CPU cycles/input byte,
    22 when working with SSE2 \cite{}.  This is consistent with {\tt icGrep}
     21has been reported as imposing an amoritized cost of 1 CPU cycle/input byte,
     22when working with SSE2 \cite{lin2012parabix}.  This is consistent with {\tt icGrep}
    2323results.  However, the cost of the cost of this transposition can
    2424be hidden through multithreading and pipeline parallelism, having one
    2525core perform work ahead performing transposition, while another
    2626comes behind to perform matching.  We discuss this further in
    27 Section \ref{architecture}.
     27Section \ref{sec:architecture}.
    2828
    2929The treatment of character and character class recognition is
     
    4141character classes using a single long-stream addition operation.
    4242Interestingly, the MatchStar operation also has application to
    43 parallelized long-stream addition\cite{}, as well as use in Myers
    44 bit-parallel edit distance algorithm\cite{}.  In the next section,
     43parallelized long-stream addition\cite{cameron2014bitwise}, as well as use in Myers
     44bit-parallel edit distance algorithm\cite{myers1999fast}.  In the next section,
    4545we show how MatchStar can be extended for UTF-8 sequences.
    4646
     
    5151Let $C_k$ be the bit stream identifying positions at which the $k$
    5252prior input bytes are all in $C$.   Then the observation that
    53 $C_2k = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower
     53$C_{2k} = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower
    5454bound to be determined.   An upper bound $k$ similarly involves excluding
    55 those positions not within $k$ positions of the pending markers
     55those positions not within $k$ of the pending markers
    5656(from the previous match step).
    5757
Note: See TracChangeset for help on using the changeset viewer.