Changeset 4554 for docs


Timestamp:
May 14, 2015, 4:49:50 PM
Author:
nmedfort
Message:

Removed ICgrep

Location:
docs/Working/icGrep
Files:
4 edited

  • docs/Working/icGrep/evaluation.tex

    r4506 r4554  
    11\section{Evaluation}\label{sec:evaluation}
    22
    3 In this section, we report on the evaluation of ICgrep performance, looking
     3In this section, we report on the evaluation of \icGrep{} performance, looking
    44at three aspects.   First, we examine some performance aspects of
    5 ICgrep internal methods, looking at the impact of optimizations discussed previously.
     5\icGrep{} internal methods, looking at the impact of optimizations discussed previously.
    66Then we move on to a systematic performance study of \icGrep{}
    77with named Unicode property searches in comparison to two
     
    1616the teaching of those methods and ongoing research, \icGrep{} has an array
    1717of command-line options.   This makes it straightforward
    18 to report on certain performance aspects of ICgrep, while others require
     18to report on certain performance aspects of \icGrep{}, while others require
    1919special builds. 
    2020
     
    3939
    4040To control the insertion of if-statements into dynamically
    41 generated code, the
    42 number of
    43 %non-nullable
    44 pattern elements between if-tests
    45 can be selected with the {\tt -if-insertion-gap=} option.   The
    46 default value in \icGrep{} is 3, setting the gap to 100 effectively
    47 turns off if-insertion.   Eliminating if-insertion sometimes improves
    48 performance by avoiding the extra if tests and branch mispredictions.
    49 For patterns with long strings, however, there can be a substantial
    50 slowdown; searching for a pattern of length 40 slows down by more
    51 than 50\% without the if-statement short-circuiting.
    52 
    53 \ICgrep{} also provides options that allow
    54 various internal representations to be printed out.   These
    55 can aid in understanding and/or debugging performance issues.
     41generated code, the number of pattern elements between each if-test %non-nullable
     42can be selected with the {\tt -if-insertion-gap=} option.   
     43%
     44The default value in \icGrep{} is 3; setting the gap to 100 effectively
     45turns off if-insertion. 
     46%
     47Eliminating if-insertion sometimes improves performance by avoiding the extra if tests and branch mispredictions.
     48%
     49For patterns with long strings, however, there can be a substantial slowdown.
     50
     51%; searching for a pattern of length 40 slows down by more
     52%than 50\% without the if-statement short-circuiting. %%% I think we'd need to show this always true to make this claim.
     53
     54Additionally, \icGrep{} provides options that allow
     55various internal representations to be printed out.   
     56%
     57These can aid in understanding and/or debugging performance issues.
    5658For example, the option
    5759{\tt -print-REs} shows the parsed regular expression as it goes
    5860through various transformations.   The internal \Pablo{} code generated
    59 may be displayed with {\tt -print-\Pablo{}}.  This can be quite useful in
     61may be displayed with {\tt -print-pablo}.  This can be quite useful in
    6062helping understand the match process.   It also possible to print out the
    6163generated LLVM IR code ({\tt -dump-generated-IR}), but this may be
     
    129131We selected a set of Wikimedia XML files in several major languages representing
    130132most of the world's major language families as a test corpus.
    131 For each program under test, we performed searches for each regular
    132 expression against each XML document.
    133 Results are presented in Figure~\ref{fig:property_test}.  Performance is reported
    134 in CPU cycles per byte on an Intel Core i7 machine.   The results were grouped
    135 by the percentage of matching lines found in the XML document, grouped in
    136 5\% increments.  ICgrep shows dramatically better performance, particularly
    137 when searching for rare items.
    138 As shown in the figure, pcre2grep and ugrep both show
    139 increased performance (reduced CPU cycles per byte) with increasing percentage
    140 of matches found.  In essence, each match found allows these programs
    141 to skip the full processing of the rest of the line.   On the other
    142 hand, icGrep shows a slight drop-off in performance with the number
    143 of matches found.   This is primarily due to property classes that
    144 include large numbers of codepoints.   These classes require more
    145 bitstream equations for calculation and also have a greater probability
    146 of matching.   Nevertheless, the performance of icGrep in matching
    147 the defined property expressions is stable and well ahead of the competitors
    148 in all cases.
     133%
     134For each program under test, we performed searches for each regular expression against each XML document.
     135%
     136Performance is reported in CPU cycles per byte on an Intel Core i7 machine.   
     137%
     138The results are presented in Figure~\ref{fig:property_test}.
     139%
     140They were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. 
     141%
     142When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items.
     143%
     144The performance of both pcre2grep and ugrep improves (has a reduction in CPU cycles per byte) as the percentage of matching lines increases.
     145%
     146This occurs because each match allows them to bypass processing the rest of the line.
     147%
     148On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found.   
     149%
     150This is primarily due to property classes that include large numbers of codepoints.   
     151%
     152These classes require more bitstream equations for calculation and also have a greater probability of matching.   
     153%
     154Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases.
    149155
    150156
  • docs/Working/icGrep/fig-executor.tex

    r4504 r4554  
    66\pgfsetlayers{threads,main}
    77
    8 \tikzstyle{block} = [rectangle, draw, text width=15em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em]
     8\tikzstyle{block} = [rectangle, draw, fill=white, text width=15em, text centered, minimum height=1.75em, thick, font=\ttfamily\bfseries, node distance=3.5em]
    99\tikzstyle{line} = [draw, ->, line width=1.4pt]
    1010\tikzstyle{separator} = [draw, line width=0.125em, dashed]
     
    2929    \path [line] (JITFunction) -- (MatchScanner);
    3030    \path [line] (MatchScanner) -- (OutputResult);
    31 
     31   
    3232    \begin{pgfonlayer}{threads}
    3333        \path (S2P.north west)+(-.1,.5) node (a) {};
  • docs/Working/icGrep/icgrep.tex

    r4550 r4554  
    1515\newcommand{\comment}[1]{}
    1616\newcommand{\icGrep}[1]{icGrep}
    17 \newcommand{\ICgrep}[1]{ICgrep}
    1817
    1918\def\RegularExpression{RegEx}