Changeset 4465 for docs

Feb 6, 2015, 7:30:09 AM

Some work on evaluation

1 edited


  • docs/Working/icGrep/evaluation.tex

    r4446 r4465  
    3 \subsection{Case Studies in Bitwise Methods}
In this section, we report on the evaluation of ICgrep performance, looking
at three aspects.   First, we consider performance studies on a series
of Unicode regular expression search problems in comparison to
contemporary competitors, including pcre2grep released in January 2015
and ugrep of the ICU 54.1 software distribution.  Then we move on to
investigate some performance aspects of ICgrep internal methods, looking
at the impact of optimizations and multithreading.

\subsection{ICgrep vs. Contemporary Competitors}

\subsection{Optimizations of Bitwise Methods}

In order to support evaluation of bitwise methods, as well as to support
the teaching of those methods and ongoing research, icGrep has an array
of command-line options.   This makes it relatively straightforward
to report on certain performance aspects of ICgrep, while others require
special builds.

For example, the command-line switch {\tt -disable-matchstar} can be used
to eliminate the use of the MatchStar operation for handling
Kleene-* repetition of character classes.   In this case, icGrep substitutes
a while loop that iteratively extends match results.
Surprisingly, this does not change performance much in many practical cases.
In each block, the maximum iteration count is the length of the longest run
encountered; the overall performance is based on the average of these
maximums throughout the file.   But when searching for XML tags using the
regular expression \verb:<[^!?][^>]*>:, a slowdown of more than 2X may be
found in files with many long tags.
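
The contrast between the two strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not icGrep's code: bitstreams are modelled as unbounded Python integers with bit 0 standing for the first character, \verb:match_star: uses the carry-based formulation usually given for MatchStar in the bitwise data parallel matching literature, and \verb:match_star_loop: is a guess at what an iterative substitute could look like. The helper \verb:char_class: and all names are hypothetical.

```python
def char_class(text, pred):
    """Bitstream with bit i set when pred(text[i]) holds (bit 0 = first char)."""
    bits = 0
    for i, ch in enumerate(text):
        if pred(ch):
            bits |= 1 << i
    return bits


def match_star(m, c):
    """Carry-based form (assumed): (((M & C) + C) ^ C) | M.

    A single addition propagates every marker in m through the run of
    class characters (bits of c) that it sits on.
    """
    return (((m & c) + c) ^ c) | m


def match_star_loop(m, c):
    """Hypothetical loop-based substitute for match_star.

    Each pass moves the newly reached markers one position forward through
    class characters, so the pass count grows with the longest run crossed.
    Returns (result, number of productive passes).
    """
    result = m
    frontier = m
    iterations = 0
    while True:
        # Step markers sitting on a class character forward by one position,
        # keeping only positions that have not been reached before.
        frontier = ((frontier & c) << 1) & ~result
        if frontier == 0:
            break
        result |= frontier
        iterations += 1
    return result, iterations


if __name__ == "__main__":
    text = "<a> <bcdefgh>"
    after_lt = char_class(text, lambda ch: ch == "<") << 1  # markers after each '<'
    not_gt = char_class(text, lambda ch: ch != ">")         # the class [^>]
    looped, iterations = match_star_loop(after_lt, not_gt)
    print(bin(match_star(after_lt, not_gt)), bin(looped), iterations)
```

In this sketch both functions produce the same marker stream, but the loop needs one pass per character of the longest run it has to cross, which is consistent with the slowdown reported for files with many long tags.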

In order to better understand the search process, icGrep allows
various internal representations to be printed out.   For example, the option
{\tt -print-REs} shows the parsed regular expression as it goes
through various transformations.   The internal Pablo code generated
may be displayed with {\tt -print-pablo}.  This can be quite useful in
helping to understand the match process.   It is also possible to print out the
generated LLVM IR code ({\tt -dump-generated-IR}), but this includes many
details of low-level carry-handling that obscure the core logic.
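
To give a rough sense of the kind of core logic that remains once carry handling is set aside, the following Python sketch writes out bitstream equations for the XML tag expression \verb:<[^!?][^>]*>:. It is not output of {\tt -print-pablo} and does not reproduce icGrep's actual Pablo code; the marker convention (a marker sits on the position after the last matched character), the carry-based \verb:match_star:, and all function names are assumptions made for illustration. Line breaks and Unicode decoding are ignored.

```python
def char_class(text, pred):
    """Bitstream with bit i set when pred(text[i]) holds (bit 0 = first char)."""
    bits = 0
    for i, ch in enumerate(text):
        if pred(ch):
            bits |= 1 << i
    return bits


def advance(m):
    """Move every marker one position forward in the text."""
    return m << 1


def match_star(m, c):
    """Carry-based form (assumed): (((M & C) + C) ^ C) | M."""
    return (((m & c) + c) ^ c) | m


def xml_tag_ends(text):
    """Return the end offsets (exclusive) of matches of <[^!?][^>]*> in text."""
    lt = char_class(text, lambda ch: ch == "<")
    gt = char_class(text, lambda ch: ch == ">")
    not_bang_q = char_class(text, lambda ch: ch not in "!?")
    not_gt = char_class(text, lambda ch: ch != ">")

    m1 = advance(lt)               # just after '<'
    m2 = advance(m1 & not_bang_q)  # just after one character of [^!?]
    m3 = match_star(m2, not_gt)    # after zero or more characters of [^>]
    m4 = advance(m3 & gt)          # just after the closing '>'

    return [i for i in range(len(text) + 1) if (m4 >> i) & 1]


if __name__ == "__main__":
    print(xml_tag_ends("<a> <!x> <tag attr>"))
```

Each line of \verb:xml_tag_ends: handles one component of the regular expression for all text positions at once, which is the property that makes a printed intermediate form easier to follow than the generated low-level code.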

\subsection{Single vs. Multithreaded Performance}
    5 \subsection{ICgrep vs. Contemporary Competitors}