Changeset 4469


Timestamp:
Feb 6, 2015, 3:05:09 PM
Author:
cameron
Message:

Evaluation stub section

Location:
docs/Working/icGrep
Files:
1 added
2 edited

  • docs/Working/icGrep/evaluation.tex

    r4466 r4469  

\subsection{ICgrep vs. Contemporary Competitors}

A key feature of Unicode Level 1 support in regular expression engines
is the support they provide for property expressions and for combinations of
property expressions using set union, intersection and difference operators.
Both {\tt ugrep} and {\tt icgrep} provide systematic support for all property
expressions at Unicode Level 1 as well as set union, intersection and difference.
On the other hand, {\tt pcre2grep} does not support the set intersection and
difference operators directly.
However, these operators can instead be expressed using a regular expression
feature known as a lookbehind assertion.  Set intersection involves a
regular expression formed with one of the property expressions and a
positive lookbehind assertion on the other, while set difference uses
a negative lookbehind assertion.  As all three programs support lookbehind
assertions, we systematically generated the set intersection and
difference expressions in this form.
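As a minimal illustration of this lookbehind encoding, consider the
pair \verb|\p{Lu}| (a {\tt gc} value) and \verb|\p{Greek}| (an {\tt sc} value).
These two properties are chosen here only as a hypothetical example and are not
necessarily among the generated test expressions.  Assuming the usual
Perl-style syntax for lookbehind assertions, the two combinations could be
written roughly as follows.

\begin{verbatim}
intersection:  \p{Lu}(?<=\p{Greek})
difference:    \p{Lu}(?<!\p{Greek})
\end{verbatim}

In both forms the engine first matches a single character having the first
property.  The lookbehind assertion then inspects that same, just-consumed
character and requires (or, for difference, forbids) that it also has the
second property.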

We generated a set of regular expressions involving all values of
the Unicode general category property ({\tt gc}) and all values of the
Unicode script property ({\tt sc}).  We then generated
expressions involving random pairs of {\tt gc} and {\tt sc}
values combined with a random set operator chosen from union, intersection and difference.
All property values are represented at least once.  A small number of
expressions were removed because they involved properties not supported by {\tt pcre2grep}.
In the end, 246 test expressions were constructed by this process.

We selected a set of Wikimedia XML files in several major languages representing
most of the world's major language families as a test corpus.  For each program
under test, we performed searches for each regular expression against each XML document.
Searches were repeated $n$ times.  Table~\ref{tbl:property_test} shows the results.

\begin{table}
\input{table-prop.tex}
\caption{Performance of Matching Property and Property Combinations}\label{tbl:property_test}
\end{table}

\subsection{Optimizations of Bitwise Methods}