Changeset 4558 for docs


Timestamp: May 15, 2015, 3:08:14 AM
Author: cameron
Message: streamlining
Location: docs/Working/icGrep
Files: 3 edited

  • docs/Working/icGrep/evaluation.tex

    r4557 r4558  
    @@ -301,5 +301,5 @@
     \bottomrule
     \end{tabular}
    -\caption{Performance of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf}
    +\caption{Speedups of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf}
     \vspace{-2em}
     \end{table}
  • docs/Working/icGrep/unicode-re.tex

    r4556 r4558  
    @@ -77,9 +77,5 @@
     \texttt{ni3} and $\text{CC}_{\texttt{hao}}$ is the bitstream that
     marks character \texttt{hao}.
    -To match a two UTF-8 character sequence \texttt{ni3hao}, we first
    -create an Initial stream that marks the first byte of all the valid characters.
    -We also produce a NonFinal stream that marks every byte of all
    -multibyte characters \emph{except for} the last byte.
    -Using Initial to ScanThru NonFinal, we construct bitstream $M_1$, which
    +To match a two UTF-8 character sequence \texttt{ni3hao}, we first construct bitstream $M_1$, which
     marks the positions of the last byte of every character.
     An overlap between $M_1$ and $\text{CC}_{\texttt{ni3}}$ gives the start
     
    @@ -124,14 +120,10 @@
     operation may terminate prematurely.
     
    -In order to remedy this problem, \icGrep{} again uses the two helper bitstreams
    -\emph{Initial} and \emph{NonFinal}.   Any full match to a multibyte sequence must
    -reach the initial position of the next character. 
    -The {\em NonFinal} bitstream consists of all positions except those
    -that are final positions of UTF-8 sequences.
    -It is used to ``fill in the gaps'' in the CC bitstream so that the
    +In order to remedy this problem, \icGrep{} again uses the NonFinal
    +stream  to ``fill in the gaps'' in the CC bitstream so that the
     MatchStar addition can move through a contiguous sequence of one
     bits.  In this way, matching of an arbitrary Unicode character class
    -$C$ (with a 1 bit set at final positions of any members of the class),
    -can be implemented using ${\mathit{MatchStar}(M, C |\mathit{NonFinal})}$.
    +$U$
    +can be implemented using ${\mbox{MatchStar}(m, U |\mbox{NonFinal})}$.
     
     \paragraph{Predefined Unicode Classes.}
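
The ScanThru and MatchStar steps described in the unicode-re.tex paragraphs above can be illustrated with a minimal Python sketch. It assumes the usual Parabix definitions, ScanThru(M, C) = (M + C) & ~C and MatchStar(M, C) = (((M & C) + C) ^ C) | M, and uses Python integers as bitstreams, with bit i standing for byte position i. The helper names, the sample text, the reading of \texttt{ni3} and \texttt{hao} as U+4F60 and U+597D, and the final filter through Initial are illustrative assumptions, not taken from the icGrep sources. Input is assumed to be valid UTF-8.

```python
def scan_thru(m: int, c: int) -> int:
    """Move each marker in m past the run of 1 bits in c that starts at it."""
    return (m + c) & ~c


def match_star(m: int, c: int) -> int:
    """Mark every position reachable from m by zero or more steps through c."""
    return (((m & c) + c) ^ c) | m


def utf8_streams(data: bytes) -> tuple:
    """Return (initial, non_final) bitstreams for valid UTF-8 input.

    initial:   first byte of every character.
    non_final: every byte that is followed by a continuation byte,
               i.e. every byte of a multibyte character except its last.
    """
    initial = 0
    non_final = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # not a continuation byte
            initial |= 1 << i
        if i + 1 < len(data) and (data[i + 1] & 0xC0) == 0x80:
            non_final |= 1 << i
    return initial, non_final


def class_stream(text: str, members: set) -> int:
    """Character-class bitstream: 1 at the final byte of each member."""
    cc = 0
    pos = 0
    for ch in text:
        pos += len(ch.encode("utf-8"))
        if ch in members:
            cc |= 1 << (pos - 1)
    return cc


def positions(stream: int) -> list:
    """List the byte positions whose bit is set."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


NI3 = "\u4f60"  # assumed character behind \texttt{ni3}
HAO = "\u597d"  # assumed character behind \texttt{hao}

text = "a" + NI3 + HAO + HAO + "b"  # byte offsets: a=0, ni3=1..3, hao=4..6, hao=7..9, b=10
data = text.encode("utf-8")
initial, non_final = utf8_streams(data)

# M_1: last byte of every character (Initial scanned through NonFinal).
m1 = scan_thru(initial, non_final)

cc_hao = class_stream(text, {HAO})
m = 1 << 4  # marker at the first byte of the first hao

# Without gap filling, the addition cannot cross the non-final bytes.
naive = match_star(m, cc_hao)

# With NonFinal filling the gaps; keeping only Initial positions is an
# assumption of this sketch, used to drop marks inside multibyte characters.
filled = match_star(m, cc_hao | non_final) & initial

print(positions(m1))      # [0, 3, 6, 9, 10]
print(positions(naive))   # [4]
print(positions(filled))  # [4, 7, 10]
```

In the sample, the unfilled MatchStar stops at its starting marker, while the filled version reaches the positions after zero, one, and two occurrences of the class member.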