# Changeset 4558

Timestamp:
May 15, 2015, 3:08:14 AM (4 years ago)
Message:

streamlining

Location:
docs/Working/icGrep
Files:
3 edited

r4557

      \bottomrule
      \end{tabular}
    - \caption{Performance of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf}
    + \caption{Speedups of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf}
      \vspace{-2em}
      \end{table}
r4556

      \texttt{ni3} and $\text{CC}_{\texttt{hao}}$ is the bitstream that marks character \texttt{hao}.
    - To match a two UTF-8 character sequence \texttt{ni3hao}, we first create an Initial stream that marks the first byte of all the valid characters. We also produce a NonFinal stream that marks every byte of all multibyte characters \emph{except for} the last byte. Using Initial to ScanThru NonFinal, we construct bitstream $M_1$, which
    + To match a two UTF-8 character sequence \texttt{ni3hao}, we first construct bitstream $M_1$, which
      marks the positions of the last byte of every character. An overlap between $M_1$ and $\text{CC}_{\texttt{ni3}}$ gives the start

      operation may terminate prematurely.
    - In order to remedy this problem, \icGrep{} again uses the two helper bitstreams \emph{Initial} and \emph{NonFinal}.   Any full match to a multibyte sequence must reach the initial position of the next character. The {\em NonFinal} bitstream consists of all positions except those that are final positions of UTF-8 sequences. It is used to ``fill in the gaps'' in the CC bitstream so that the
    + In order to remedy this problem, \icGrep{} again uses the NonFinal stream  to ``fill in the gaps'' in the CC bitstream so that the
      MatchStar addition can move through a contiguous sequence of one bits.  In this way, matching of an arbitrary Unicode character class
    - $C$ (with a 1 bit set at final positions of any members of the class), can be implemented using ${\mathit{MatchStar}(M, C |\mathit{NonFinal})}$.
    + $U$ can be implemented using ${\mbox{MatchStar}(m, U |\mbox{NonFinal})}$.
      \paragraph{Predefined Unicode Classes.}
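
The MatchStar step discussed in the changed text can be illustrated with a small sketch. It is not taken from the icGrep sources. It assumes the commonly cited definition MatchStar(M, C) = (((M & C) + C) ^ C) | M, models each bitstream as a Python integer whose bit i corresponds to byte position i, and assumes a marker convention in which a cursor sits on the first byte of the next character to be matched. The helper names (`helper_streams`, `class_stream`, `positions`) and the final filtering with Initial are hypothetical choices made for this illustration.

```python
def match_star(m, c):
    # Assumed definition: MatchStar(M, C) = (((M & C) + C) ^ C) | M.
    # The addition carries each marker through a contiguous run of one bits in c.
    return (((m & c) + c) ^ c) | m


def is_continuation(byte):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80


def helper_streams(data):
    # Initial: first byte of every character.
    # NonFinal: every byte of a multibyte character except its last byte.
    # Assumes `data` is valid UTF-8.
    initial = non_final = 0
    for i, byte in enumerate(data):
        if not is_continuation(byte):
            initial |= 1 << i
        if i + 1 < len(data) and is_continuation(data[i + 1]):
            non_final |= 1 << i
    return initial, non_final


def class_stream(text, in_class):
    # Character-class stream with a 1 bit at the final byte of each member.
    stream, pos = 0, 0
    for ch in text:
        pos += len(ch.encode("utf-8"))
        if in_class(ch):
            stream |= 1 << (pos - 1)
    return stream


def positions(stream):
    # Byte positions of the 1 bits, lowest first.
    return [i for i in range(stream.bit_length()) if stream >> i & 1]


text = "a\u4f60\u597db"          # 'a', two 3-byte CJK characters, 'b'
data = text.encode("utf-8")      # 8 bytes
initial, non_final = helper_streams(data)
c = class_stream(text, lambda ch: "\u4e00" <= ch <= "\u9fff")

m = 1 << 1                       # cursor on the first byte after 'a'

# Without NonFinal the class stream has gaps, so the star cannot advance.
stalled = match_star(m, c)

# With NonFinal filling the gaps, the carry runs through both characters;
# keeping only Initial positions leaves the cursors after 0, 1 and 2 matches.
cursors = match_star(m, c | non_final) & initial

print(positions(stalled))        # [1]
print(positions(cursors))        # [1, 4, 7]
```

In this example the class stream alone has bits only at positions 3 and 6, so the cursor at position 1 never meets a one bit and the star stops immediately. OR-ing in NonFinal turns positions 1 through 6 into one contiguous run, which is the gap-filling behaviour the text describes.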