Changeset 4506 for docs


Timestamp:
Feb 11, 2015, 8:11:36 PM
Author:
nmedfort
Message:

Small fixes to eval

Location:
docs/Working/icGrep
Files:
4 edited

  • docs/Working/icGrep/background.tex

    r4504 r4506  
    107 107  regular expression search was shown to deliver substantial performance
    108 108  acceleration for traditional ASCII regular expression matching tasks,
    109      often 5X or better \cite{cameron2014bitwise}.
        109  often 5$\times$ or better \cite{cameron2014bitwise}.
    110 110
    111 111
  • docs/Working/icGrep/evaluation.tex

    r4505 r4506  
      4   4  at three aspects.   First, we examine some performance aspects of
      5   5  ICgrep internal methods, looking at the impact of optimizations discussed previously.
      6      Then we move on to a systematic performance study of \icGrep{} search
      7      performance with named Unicode property searches in comparison to two
          6  Then we move on to a systematic performance study of \icGrep{}
          7  with named Unicode property searches in comparison to two
      8   8  contemporary competitors, namely, pcre2grep released in January 2015
      9      and ugrep of the ICU 54.1 software distribution.  Finally, we look
     10      at some more complex expressions and also look at the impact
          9  and ugrep of the ICU 54.1 software distribution.  Finally, we examine
         10  both more complex expressions and also the impact
     11  11  of multithreading \icGrep{}.
     12  12
     
     15  15  In order to support evaluation of bitwise methods, as well as to support
     16  16  the teaching of those methods and ongoing research, \icGrep{} has an array
     17      of command-line options.   This makes it relatively straightforward
         17  of command-line options.   This makes it straightforward
     18  18  to report on certain performance aspects of ICgrep, while others require
     19  19  special builds.
     20  20
     21      For example, the command-line switch {\tt -disable-matchstar} can be used
         21  For example, the command-line switch \texttt{-disable-matchstar} can be used
     22  22  to eliminate the use of the MatchStar operation for handling
     23  23  Kleene-* repetition of character classes.   In this case, \icGrep{} substitutes
     
     27  27  In each block,
     28  28  the maximum iteration count is the maximum length run encountered; the
     29      overall performance is based on the average of these maximums throughout the
         29  overall performance is based on the average of these maxima throughout the
     30  30  file.   But when search for XML tags using the regular expression
     31      \verb:<[^!?][^>]*>:, a slowdown of more than 2X may be found in files
         31  \verb:<[^!?][^>]*>:, a slowdown of more than 2$\times$ may be found in files
     32  32  with many long tags.
     33  33
     
     40  40  To control the insertion of if-statements into dynamically
     41  41  generated code, the
     42      number of non-nullable pattern elements between the if-tests
     43      can be set with the {\tt -if-insertion-gap=} option.   The
         42  number of
         43  %non-nullable
         44  pattern elements between if-tests
         45  can be selected with the {\tt -if-insertion-gap=} option.   The
     44  46  default value in \icGrep{} is 3, setting the gap to 100 effectively
     45      turns of if-insertion.   Eliminating if-insertion sometimes improves
     46      performance by avoiding the extra if tests and branch mispredications.
         47  turns off if-insertion.   Eliminating if-insertion sometimes improves
         48  performance by avoiding the extra if tests and branch mispredictions.
     47  49  For patterns with long strings, however, there can be a substantial
     48  50  slowdown; searching for a pattern of length 40 slows down by more
     49  51  than 50\% without the if-statement short-circuiting.
     50  52
     51      ICgrep also provides options that allow
         53  \ICgrep{} also provides options that allow
     52  54  various internal representations to be printed out.   These
     53  55  can aid in understanding and/or debugging performance issues.
     54  56  For example, the option
     55      {\tt -print-REs} show the parsed regular expression as it goes
         57  {\tt -print-REs} shows the parsed regular expression as it goes
     56  58  through various transformations.   The internal \Pablo{} code generated
     57  59  may be displayed with {\tt -print-\Pablo{}}.  This can be quite useful in
     
     67  69  bitwise logic equations are applied for all members of the class independent
     68  70  of the Unicode blocks represented in the input document.   For the classes
     69      covering the largest numbers of codepoints, we observed slowdowns of up to 5X.
         71  covering the largest numbers of codepoints, we observed slowdowns of up to 5$\times$.
     70  72
     71  73  \subsection{Simple Property Expressions}
     72  74
     73  75  A key feature of Unicode level 1 support in regular expression engines
     74      is how the support that they provide for property expressions and combinations of property expressions
         76  the support that they provide for property expressions and combinations of property expressions
     75  77  using set union, intersection and difference operators.   Both {\tt ugrep}
     76  78  and {\tt icgrep} provide systematic support for all property expressions
     
    126 128
    127 129  We selected a set of Wikimedia XML files in several major languages representing
    128      most of the world's major language families as a test corpus.   For each program
    129      under test, we perform searches for each regular expression against each XML document.
    130      Results are presented in Figure \ref{fig:property_test}.  Performance is reported
        130  most of the world's major language families as a test corpus.
        131  For each program under test, we performed searches for each regular
        132  expression against each XML document.
        133  Results are presented in Figure~\ref{fig:property_test}.  Performance is reported
    131 134  in CPU cycles per byte on an Intel Core i7 machine.   The results were grouped
    132 135  by the percentage of matching lines found in the XML document, grouped in
     
    167 170  \end{tabular}
    168 171  \caption{Regular Expressions}\label{table:regularexpr}
        172  \vspace{-1em}
    169 173  \end{table}
    170 174
    171 175
    172      We also comparative performance of the matching engines on a series
    173      of more complex expressions as shown in Table \ref{table:regularexpr}.
    174      The first two are alphanumeric expressions, differing only in the first
    175      one is anchored to match the entire line.  The third
    176      searches for lines consisting of text in Arabic script.
        176  We also examine the comparative performance of the matching engines on a
        177  series of more complex expressions as shown in Table \ref{table:regularexpr}.
        178  The first two are alphanumeric expressions, differing only in that the first
        179  one is anchored to match the entire line.
        180  The third searches for lines consisting of text in Arabic script.
    177 181  The fourth expression is a published currency expression taken from
    178      Stewart and Uckelman \cite{stewart2013unicode}.
    179      An expression matching runs of 6 or more Cyrillic script characters enclosed
        182  Stewart and Uckelman~\cite{stewart2013unicode}.
        183  An expression matching runs of six or more Cyrillic script characters enclosed
    180 184  in initial/opening and final/ending punctuation is fifth in the list.
    181 185  The final expression is an email expression that allows internationalized
     
    222 226  show dramatic slowdowns with ambiguities in regular expressions.
    223 227  This is most clearly illustrated in the different performance figures
    224      for the two Alphanumeric test expressions, but is also evident in the
    225      Arabic, Currency and Email expressions.   By way of contrast, icGrep{}
    226      maintains consistent fast performance in all test scenarios.
        228  for the two Alphanumeric test expressions but is also evident in the
        229  Arabic, Currency and Email expressions.   By way of contrast, \icGrep{}
        230  maintains consistently fast performance in all test scenarios.
    227 231
    228 232  The multithreaded \icGrep{} shows speedup in every case, but balancing
    229 233  of the workload across multiple cores is clearly an area for further work.
    230      Nevertheless, our three thread system shows a speedup of over
        234  Nevertheless, our three thread system shows a speedup over
    231 235  the single threaded version by up to 40\%.
    232 236
  • docs/Working/icGrep/icgrep.tex

    r4502 r4506  
     44  44  of dynamic compilation and bitwise data parallelism.
     45  45  In performance comparisons with several contemporary alternatives,
     46      10X or better speedups are often observed.
         46  10$\times$ or better speedups are often observed.
     47  47  \end{abstract}
     48  48