% Source: docs/Working/icGrep/evaluation.tex @ 4465
% Last change: revision 4465, checked in by cameron, 5 years ago (Some work on evaluation)

\section{Evaluation}\label{sec:evaluation}

In this section, we report on the evaluation of icGrep performance, looking
at three aspects.   First, we present a performance study on a series
of Unicode regular expression search problems, comparing icGrep against
contemporary competitors, including pcre2grep released in January 2015
and ugrep of the ICU 54.1 software distribution.  We then
investigate some performance aspects of icGrep's internal methods, looking
at the impact of optimizations and multithreading.

\subsection{icGrep vs. Contemporary Competitors}

\subsection{Optimizations of Bitwise Methods}

To support evaluation of bitwise methods, as well as the teaching
of those methods and ongoing research, icGrep provides an array
of command-line options.   These make it relatively straightforward
to report on certain performance aspects of icGrep, while others require
special builds.

For example, the command-line switch {\tt -disable-matchstar} can be used
to eliminate the use of the MatchStar operation for handling
Kleene-* repetition of character classes.   In this case, icGrep substitutes
a while loop that iteratively extends match results.
Surprisingly, this
does not change performance much in many practical cases.
In each block,
the maximum iteration count is the length of the longest run encountered; the
overall performance is based on the average of these maximums throughout the
file.   However, when searching for XML tags using the regular expression
\verb:<[^!?][^>]*>:, a slowdown of more than $2\times$ may be found in files
with many long tags.
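
The contrast between the two strategies can be made concrete with a small
sketch.   The MatchStar formula below is the one commonly given in the
literature on bitwise data parallel matching; the loop formulation and the
names $R$, $W$ and $\mathrm{Advance}$ are illustrative assumptions about what
a while-loop substitute could look like, not a transcription of the icGrep
source.   Given a marker stream $M$ and a character class stream $C$, with
bit positions numbered in text order so that carries propagate toward later
positions,
\[
  \mathrm{MatchStar}(M, C) = (((M \wedge C) + C) \oplus C) \vee M .
\]
A loop-based equivalent starts from $R = W = M$ and repeats
\[
  W \leftarrow \mathrm{Advance}(W \wedge C) \wedge \neg R, \qquad
  R \leftarrow R \vee W
\]
until $W$ is empty, where $\mathrm{Advance}$ moves each marker one position
forward.   For the text \verb:a<bcd>e:, with the marker placed just after
\verb:<b: and $C$ the class \verb:[^>]:, the streams are as follows
(position 0 is leftmost, a dot denotes a zero bit, and the row labels use
C-style operators):

\begin{verbatim}
text                 a<bcd>e
M                    ...1...
C                    11111.1
M & C                ...1...
(M & C) + C          111..11
((M & C) + C) ^ C    ...111.
MatchStar(M, C)      ...111.
\end{verbatim}

In this formulation the loop arrives at the same stream $R$ only after three
passes: two that extend the match through \verb:c: and \verb:d:, and one that
finds nothing new.   A single addition replaces all of these passes, so the
gap between the two strategies widens as runs get longer.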

In order to better understand the search process, icGrep allows
various internal representations to be printed out.   For example, the option
{\tt -print-REs} shows the parsed regular expression as it goes
through various transformations.   The internal Pablo code generated
may be displayed with {\tt -print-pablo}.  This can be quite useful in
helping to understand the match process.   It is also possible to print out the
generated LLVM IR code ({\tt -dump-generated-IR}), but this includes many
details of low-level carry handling that obscure the core logic.

\subsection{Single vs. Multithreaded Performance}