Changeset 4604 for docs


Timestamp:
Jun 14, 2015, 11:56:58 AM (4 years ago)
Author:
nmedfort
Message:

Small typos and clarifications

Location:
docs/Working/icGrep
Files:
5 edited

  • docs/Working/icGrep/background.tex

    r4562 r4604  
    79 79  \end{eqnarray*}
    80 80  Here, Advance is an operation that advances all markers by a single position. \[\mbox{\rm Advance}(m) = m+m\]
    81     MatchStar finds all matches of character class repetitions using in a
       81  MatchStar finds all matches of character class repetitions in a
    82 82  surprisingly simple manner \cite{cameron2014bitwise}.
    83 83  \[\text{MatchStar}(M, C) = (((M \wedge C) + C)  \oplus C) \vee M\]
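
Reviewer note on the hunk above: the two formulas can be checked on a small input. The sketch below is a minimal illustration and is not taken from the icGrep sources. It assumes that bit streams are modelled as Python integers with bit i standing for text position i, so that addition carries toward later positions. The helper `to_stream` and the sample text are hypothetical.

```python
def advance(m: int) -> int:
    """Advance(m) = m + m: move every marker forward by one position."""
    return m + m


def match_star(m: int, c: int) -> int:
    """MatchStar(M, C) = (((M & C) + C) ^ C) | M."""
    return (((m & c) + c) ^ c) | m


def to_stream(text: str, pred) -> int:
    """Hypothetical helper: set bit i when pred(text[i]) holds."""
    bits = 0
    for i, ch in enumerate(text):
        if pred(ch):
            bits |= 1 << i
    return bits


text = "a123b45"
digits = to_stream(text, str.isdigit)   # character class stream C
marker = 1 << 1                         # one marker at position 1, just after 'a'
result = match_star(marker, digits)     # markers after zero or more digits
positions = [i for i in range(len(text) + 1) if (result >> i) & 1]
```

Here `positions` lists every position reachable from the marker by matching zero or more digits, which is the set of match positions for the repetition `[0-9]*` starting at position 1.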
  • docs/Working/icGrep/evaluation.tex

    r4565 r4604  
    82 82  and {\tt icgrep} provide systematic support for all property expressions
    83 83  at Unicode Level 1 as well as set union, intersection and difference.
    84     Unfortunetly, {\tt pcre2grep} does not support the set intersection and difference operators directly.
       84  Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
    85 85  However, these operators can be expressed using a regular expression
    86 86  feature known as a lookbehind assertion.   Set intersection involves a
  • docs/Working/icGrep/introduction.tex

    r4565 r4604  
    25 25
    26 26  Using parallel methods to accelerate matching of a single pattern on a
    27     single input stream is more difficult.  Indeed, of the 13 dwarves identified in the Berkeley overview of parallel computing research,
       27  single input stream is more difficult.  Indeed, of the 13 dwarfs identified
       28  in the Berkeley overview of parallel computing research,
    28 29  finite state machines (FSMs) are considered the hardest to parallelize (embarrassingly sequential) \cite{asanovic2006landscape}.
    29 30  However, some success has been reported recently along two independent lines of
     
    75 76  into regular expression matching using bitwise methods and is indeed being used
    76 77  to investigate Unicode level 2 requirements in a project funded by Google.
    77     Thirdly, it it fosters further research
       78  Thirdly, it fosters further research
    78 79  into bitwise data parallel algorithms generally and how they may take advantage
    79 80  of evolving architectural features.   Finally, \icGrep{} has also been
    80 81  designed as a teaching tool with many command line options to control
    81     algorithm features and print out internal representations of regular expressions and
       82  algorithm features and display internal representations of regular expressions and
    82 83  Parabix code.
    83 84
     
    92 93  Section \ref{sec:Unicode} addresses
    93 94  the issues and performance challenges associated with meeting Unicode
    94     regular expression requirements and presents the extensions to the
       95  regular expression requirements and presents extensions to the
    95 96  Parabix techniques that we have developed to address them.
    96 97  Section \ref{sec:architecture} describes the overall architecture of
  • docs/Working/icGrep/unicode-re.tex

    r4563 r4604  
    13 13  sequences of UTF-8 bytes or \emph{code units}.   The {\tt toUTF8} transformation
    14 14  performs this as a regular expression transformation, transforming
    15     input expressions such as `\verb:\u{244}[\u{2030}-\u{2137}]:`
       15  input expressions such as `\verb:\u{244}[\u{2030}-\u{2137}]:'
    16 16  to the corresponding UTF-8 regular expression consisting of the series of sequences and alternations shown below:
    17 17  \newline
     
    23 23
    24 24  \paragraph{UTF-8 Byte Classification and Validation.}
    25     In UTF-8, bytes are classified as individual ASCII bytes, or as
    26     prefixes of two-, three-, or four-byte sequences, or as suffix bytes.
       25  In UTF-8, bytes are classified as (1) individual ASCII bytes, (2)
       26  prefixes of two-, three-, or four-byte sequences, or (3) suffix bytes.
    27 27  In addition, we say that the {\em scope} bytes of a prefix are the
    28 28  immediately following byte positions at which a suffix byte is
     
    79 79  We start with the marker stream $m_0$ initialized to Initial, indicating all positions are in play.
    80 80  Using ScanThru, we move to the final position of each character $t_1$.
    81     Applying bitwise and with $\text{CC}_{\texttt{ni3}}$ and advancing gives the
       81  Applying bitwise \verb|AND| with $\text{CC}_{\texttt{ni3}}$ and advancing gives the
    82 82  two matches $m_1$ for ni3.  Applying ScanThru once more advances to the
    83 83  final position of the character after \texttt{ni3}.
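
Reviewer note on the hunk above: the first ScanThru step can be sketched on a short byte string. This excerpt does not define ScanThru, so the sketch assumes the definition ScanThru(M, C) = (M + C) AND NOT C as it is usually given in the Parabix literature. The stream name `non_final`, the helper `utf8_streams`, and the sample bytes are hypothetical, and the input is assumed to be valid UTF-8. Bit i of each integer stands for byte position i.

```python
def scan_thru(m: int, c: int) -> int:
    """Assumed definition: ScanThru(M, C) = (M + C) & ~C."""
    return (m + c) & ~c


def utf8_streams(data: bytes):
    """Hypothetical helper: derive two marker streams from valid UTF-8.

    initial:   positions holding the first byte of a character
    non_final: positions that are not the last byte of their character
    """
    initial = 0
    non_final = 0
    for i, b in enumerate(data):
        is_suffix = (b & 0xC0) == 0x80   # suffix bytes look like 10xxxxxx
        if not is_suffix:
            initial |= 1 << i
        elif i > 0:
            non_final |= 1 << (i - 1)    # the byte before a suffix is non-final
    return initial, non_final


# 'a', then the three-byte sequence E4 BD A0, then 'b'
data = b"a\xe4\xbd\xa0b"
m0, non_final = utf8_streams(data)
t1 = scan_thru(m0, non_final)            # markers on the final byte of each character
final_positions = [i for i in range(len(data)) if (t1 >> i) & 1]
```

The markers that start on single-byte characters stay where they are, while the marker on the prefix byte at position 1 is carried through the non-final bytes to position 3, the last byte of the three-byte character.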
     
    131 131  Each property potentially contains many code points, so we further
    132 132  embed the calculations within an if hierarchy.   Each if-statement
    133      within the hierarchy determines whether the current block contains
        133  within the hierarchy determines whether the current input block contains
    134 134  any codepoints at all in a given Unicode range.   At the outer
    135 135  level, the ranges are quite coarse, becoming successively refined