# Changeset 4448

Ignore:
Timestamp:
Feb 2, 2015, 1:49:07 PM (4 years ago)
Message:

References

Location:
docs/Working/icGrep
Files:
3 edited

Unmodified
Removed
• ## docs/Working/icGrep/icgrep.tex

 r4446 \input{conclusion.tex} \bibliographystyle{plain} \bibliography{bitgrep} \end{document}
• ## docs/Working/icGrep/introduction.tex

 r4446 single input stream is more difficult.  Indeed, the finite state machines (FSMs) underlying are considered the hardest to parallelize (embarassingly sequential) of the 13 dwarves identified in the Berkeley landscape of parallel computing research \cite{}. of the 13 dwarves identified in the Berkeley landscape of parallel computing research \cite{asanovic2006landscape}. However, some success has been reported recently along two independent lines of research.   Mytkowicz...   Cameron/Parabix.... a high-performance, full-featured open-source grep implementation with systematic support for Unicode regular expressions addressing the requirements of Unicode Technical Standard \#18 \cite{}.  As an alternative requirements of Unicode Technical Standard \#18 \cite{davis2012unicode}.  As an alternative to classical grep implementations, ICgrep offers dramatic performance acceleration in ASCII-based and Unicode matching performance alike.
 r4447 One important aspect of the bitwise data parallel approach is transposition of input data.   In previous work, the Parabix transform has been reported as imposing an amoritized cost of 1.1 CPU cycles/input byte, when working with SSE2 \cite{}.  This is consistent with {\tt icGrep} has been reported as imposing an amoritized cost of 1 CPU cycle/input byte, when working with SSE2 \cite{lin2012parabix}.  This is consistent with {\tt icGrep} results.  However, the cost of the cost of this transposition can be hidden through multithreading and pipeline parallelism, having one core perform work ahead performing transposition, while another comes behind to perform matching.  We discuss this further in Section \ref{architecture}. Section \ref{sec:architecture}. The treatment of character and character class recognition is character classes using a single long-stream addition operation. Interestingly, the MatchStar operation also has application to parallelized long-stream addition\cite{}, as well as use in Myers bit-parallel edit distance algorithm\cite{}.  In the next section, parallelized long-stream addition\cite{cameron2014bitwise}, as well as use in Myers bit-parallel edit distance algorithm\cite{myers1999fast}.  In the next section, we show how MatchStar can be extended for UTF-8 sequences. Let $C_k$ be the bit stream identifying positions at which the $k$ prior input bytes are all in $C$.   Then the observation that $C_2k = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower $C_{2k} = C_k \wedge (C_k \mbox{\tt <<} k)$ enables positions meeting the lower bound to be determined.   An upper bound $k$ similarly involves excluding those positions not within $k$ positions of the pending markers those positions not within $k$ of the pending markers (from the previous match step).