# Changeset 4604 for docs/Working

Timestamp:
Jun 14, 2015, 11:56:58 AM (4 years ago)
Message:

Small typos and clarifications

Location:
docs/Working/icGrep
Files:
5 edited

• ## docs/Working/icGrep/background.tex

 r4562
  \end{eqnarray*}
  Here, Advance is an operation that advances all markers by a single position.
  $\mbox{\rm Advance}(m) = m+m$
- MatchStar finds all matches of character class repetitions using in a
+ MatchStar finds all matches of character class repetitions in a
  surprisingly simple manner \cite{cameron2014bitwise}.
  $\text{MatchStar}(M, C) = (((M \wedge C) + C) \oplus C) \vee M$
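
The Advance and MatchStar definitions in the background.tex hunk can be tried out on plain Python integers. The following is a minimal sketch, not the icGrep implementation: it assumes that bit i of an integer stands for text position i (so that advancing a marker is a doubling), and the helpers `class_stream` and `marker_positions` are hypothetical names introduced only for this illustration.

```python
def advance(m: int) -> int:
    """Advance(m) = m + m: move every marker one position forward."""
    return m + m  # same as m << 1 under the bit-i-is-position-i assumption


def match_star(m: int, c: int) -> int:
    """MatchStar(M, C) = (((M AND C) + C) XOR C) OR M."""
    return (((m & c) + c) ^ c) | m


def class_stream(text: str, members: str) -> int:
    """Hypothetical helper: set bit i when text[i] is in the character class."""
    return sum(1 << i for i, ch in enumerate(text) if ch in members)


def marker_positions(stream: int) -> list:
    """Hypothetical helper: list the positions of the set bits."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


text = "a1234b56"
digits = class_stream(text, "0123456789")    # character class stream C
after_a = advance(class_stream(text, "a"))   # marker stream M, just past the 'a'
result = match_star(after_a, digits)         # all match positions of a[0-9]*
print(marker_positions(result))
```

For this input the markers end up at positions 1 through 5, one for each way of matching zero to four digits after the `a`. The addition does the work: the carry ripples through the run of digits that starts at the marker, and the XOR with C then leaves bits set exactly where the run was traversed, plus the position just past it.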
• ## docs/Working/icGrep/evaluation.tex

 r4565
  and {\tt icgrep} provide systematic support for all property expressions at Unicode Level 1 as well as set union, intersection and difference.
- Unfortunetly, {\tt pcre2grep} does not support the set intersection and difference operators directly.
+ Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
  However, these operators can be expressed using a regular expression feature known as a lookbehind assertion.   Set intersection involves a
• ## docs/Working/icGrep/introduction.tex

 r4565
  Using parallel methods to accelerate matching of a single pattern on a
- single input stream is more difficult.  Indeed, of the 13 dwarves identified
- in the Berkeley overview of parallel computing research,
+ single input stream is more difficult.  Indeed, of the 13 dwarfs identified
+ in the Berkeley overview of parallel computing research,
  finite state machines (FSMs) are considered the hardest to parallelize (embarrassingly sequential) \cite{asanovic2006landscape}.
  However, some success has been reported recently along two independent lines of

  into regular expression matching using bitwise methods and is indeed being used to investigate Unicode level 2 requirements in a project funded by Google.
- Thirdly, it it fosters further research
+ Thirdly, it fosters further research
  into bitwise data parallel algorithms generally and how they may take advantage of evolving architectural features.   Finally, \icGrep{} has also been designed as a teaching tool with many command line options to control
- algorithm features and print out internal representations of regular expressions and
+ algorithm features and display internal representations of regular expressions and
  Parabix code.

  Section \ref{sec:Unicode} addresses the issues and performance challenges associated with meeting Unicode
- regular expression requirements and presents the extensions to the
+ regular expression requirements and presents extensions to the
  Parabix techniques that we have developed to address them. Section \ref{sec:architecture} describes the overall architecture of
• ## docs/Working/icGrep/unicode-re.tex

 r4563
  sequences of UTF-8 bytes or \emph{code units}.   The {\tt toUTF8} transformation performs this as a regular expression transformation, transforming
- input expressions such as \verb:\u{244}[\u{2030}-\u{2137}]:
+ input expressions such as `\verb:\u{244}[\u{2030}-\u{2137}]:'
  to the corresponding UTF-8 regular expression consisting of the series of sequences and alternations shown below:
  \newline

  \paragraph{UTF-8 Byte Classification and Validation.}
- In UTF-8, bytes are classified as individual ASCII bytes, or as
- prefixes of two-, three-, or four-byte sequences, or as suffix bytes.
+ In UTF-8, bytes are classified as (1) individual ASCII bytes, (2)
+ prefixes of two-, three-, or four-byte sequences, or (3) suffix bytes.
  In addition, we say that the {\em scope} bytes of a prefix are the immediately following byte positions at which a suffix byte is

  We start with the marker stream $m_0$ initialized to Initial, indicating all positions are in play. Using ScanThru, we move to the final position of each character $t_1$.
- Applying bitwise and with $\text{CC}_{\texttt{ni3}}$ and advancing gives the
+ Applying bitwise \verb|AND| with $\text{CC}_{\texttt{ni3}}$ and advancing gives the
  two matches $m_1$ for ni3.  Applying ScanThru once more advances to the final position of the character after \texttt{ni3}.

  Each property potentially contains many code points, so we further embed the calculations within an if hierarchy.   Each if-statement
- within the hierarchy determines whether the current block contains
+ within the hierarchy determines whether the current input block contains
  any codepoints at all in a given Unicode range.   At the outer level, the ranges are quite coarse, becoming successively refined
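
The three-way byte classification described in the unicode-re.tex hunk (ASCII bytes, prefixes of multi-byte sequences, suffix bytes) can be illustrated byte by byte. The sketch below is an assumption-laden illustration, not the Parabix bitstream code: the byte ranges are those of well-formed UTF-8 as defined by the Unicode Standard rather than anything stated in the changeset, and the names `classify_byte` and `SCOPE_LENGTH` are hypothetical.

```python
def classify_byte(b: int) -> str:
    """Classify one byte of a UTF-8 stream (ranges per well-formed UTF-8)."""
    if b <= 0x7F:
        return "ascii"      # (1) individual ASCII byte
    if b <= 0xBF:
        return "suffix"     # (3) suffix (continuation) byte, 0x80-0xBF
    if 0xC2 <= b <= 0xDF:
        return "prefix2"    # (2) prefix of a two-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "prefix3"    # (2) prefix of a three-byte sequence
    if 0xF0 <= b <= 0xF4:
        return "prefix4"    # (2) prefix of a four-byte sequence
    return "invalid"        # 0xC0, 0xC1 and 0xF5-0xFF never occur in valid UTF-8


# Number of scope positions after each prefix at which a suffix byte is expected.
SCOPE_LENGTH = {"prefix2": 1, "prefix3": 2, "prefix4": 3}

# One character of each encoded length: U+0061, U+00A2, U+20AC, U+1F600.
sample = "a\u00a2\u20ac\U0001F600".encode("utf-8")
classes = [classify_byte(b) for b in sample]
print(classes)
```

Validation in this simplified model amounts to checking that every prefix is followed by exactly `SCOPE_LENGTH[prefix]` suffix bytes and that no suffix byte appears outside such a scope. The real encoding rules add further constraints on the first suffix byte after 0xE0, 0xED, 0xF0 and 0xF4, which this sketch does not check.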