Timestamp:
Sep 22, 2015, 12:29:29 PM
Author:
cameron
Message:

More formatting for LNCS requirements

File:
1 edited

  • docs/Working/icGrep/evaluation.tex

    r4782 r4786  
    5858and {\tt icgrep} provide systematic support for all property expressions
    5959at Unicode Level 1 as well as set union, intersection and difference.
    60 Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
    61 However, these operators can be expressed using a regular expression
    62 feature known as a lookbehind assertion.   Set intersection involves a
    63 regular expression formed with a one of the property expressions and a
    64 positive lookbehind assertion on the other, while set difference uses
    65 a negative lookbehind assertion. 
     60However, in order to implement these operators with {\tt pcre2grep}, we
     61translated them into an equivalent form using lookbehind assertions.
     62%Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly.
     63%However, these operators can be expressed using a regular expression
     64%feature known as a lookbehind assertion.   Set intersection involves a
     65%regular expression formed with a one of the property expressions and a
     66%positive lookbehind assertion on the other, while set difference uses
     67%a negative lookbehind assertion. 
    6668
    6769We generated a set of regular expressions involving all Unicode values of
     
    7981most of the world's major language families as a test corpus.
    8082For each program under test, we performed searches for each regular expression against each XML document.
     83Test cases were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. 
    8184Performance is reported in CPU cycles per byte on an Intel i7-2600 machine.   
    8285The results are presented in Fig.~\ref{fig:property_test}.
    83 They were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. 
    84 When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items.
    85 The performance of both pcre2grep and ugrep improves (has a reduction in CPU cycles per byte) as the percentage of matching lines increases.
    86 This occurs because each match allows them to bypass processing the rest of the line.
    87 On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found.   
    88 This is primarily due to property classes that include large numbers of codepoints.   
    89 These classes require more bitstream equations for calculation and also have a greater probability of matching.   
    90 Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases.
    91 
    9286\begin{figure}
    9387\vspace{-0.5em}
     
    122116\end{figure}
    123117
     118When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items.
     119The performance of both pcre2grep and ugrep improves (CPU cycles per byte decreases) as the percentage of matching lines increases.
     120This occurs because each match allows them to bypass processing the rest of the line.
     121On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found.   
     122This is primarily due to property classes that include large numbers of codepoints.   
     123These classes require more bitstream equations for calculation and also have a greater probability of matching.   
     124Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases.
     125
     126
    124127\subsection{Complex Expressions}
    125128
    126 This study evaluates the comparative performance of the matching engines on a
    127 series of more complex expressions, shown in Table \ref{table:regularexpr}.
     129This study evaluates the comparative performance of the matching engines on a set of
     130more complex expressions, shown in Table \ref{table:regularexpr}.
    128131The first two are alphanumeric (\AN{}) expressions, differing only in that the first
    129132one is anchored to match the entire line.
    130133The third searches for lines consisting of text in Arabic script.
    131134The fourth expression is a published currency expression taken from
    132 Stewart and Uckelman~\cite{stewart2013unicode}.
     135Stewart and Uckelman\cite{stewart2013unicode}.
    133136An expression matching runs of six or more Cyrillic script characters enclosed
    134137in initial/opening and final/ending punctuation is fifth in the list.
    135 The final expression is an email expression that allows internationalized names.
     138The last expression matches internationalized email names.
    136139
    137140\begin{table}\centering % requires booktabs
    138 \small\vspace{-2em}
     141\caption{Regular expressions}\label{table:regularexpr}
     142\small%\vspace{-2em}
    139143\begin{tabular}{@{}p{2cm}p{9.8cm}@{}}
    140144\textbf{Name}&\textbf{Regular Expression}\\
     
    155159\bottomrule
    156160\end{tabular}
    157 \caption{Regular expressions}\label{table:regularexpr}
    158 \vspace{-2em}
    159161\end{table}
    160162
    161 In Table \ref{table:complexexpr}, we show the performance results obtained
    162 from an Intel i7-2600 using generic 64-bit binaries for each engine.
    163 We limit the SIMD ISA within \icGrep{} to SSE2 which is available
    164 on all Intel/AMD 64-bit machines.
    165 In each case, we report seconds taken per GB of input averaged over 10
     163Table \ref{table:complexexpr} shows the performance results
     164on our Intel i7-2600 test machine, reporting seconds taken per GB of input averaged over 10
    166165runs each on our Wikimedia document collection.
    167166
    168167\begin{table}[ht]\centering % requires booktabs
     168\caption{Matching times for complex expressions (s/GB)}\label{table:complexexpr}
    169169\newcolumntype{T}{c}
    170 \small\vspace{-2em}
     170\small%\vspace{-2em}
    171171\begin{tabular}{@{}p{2cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}}
    172172&\multicolumn{6}{c}{\textbf{\icGrep{}}}\\
     
    184184\bottomrule
    185185\end{tabular}
    186 \caption{Matching times for complex expressions (s/GB)}\label{table:complexexpr}
    187 \vspace{-2em}
    188186\end{table}
    189187
     
    211209
    212210\begin{table}[h]\centering % requires booktabs,siunitx
    213 \small\vspace{-2em}
     211\caption{Speedup of complex expressions on i7-4700MQ $(\sigma)$}\label{table:relperf}
     212\small%\vspace{-2em}
    214213\begin{tabular}{@{}p{2cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}}
    215214&\multicolumn{2}{c}{\textbf{Base}}&\multicolumn{4}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
     
    229228\bottomrule
    230229\end{tabular}
    231 \caption{Speedup of complex expressions on i7-4700MQ $(\sigma)$}\label{table:relperf}
    232 \vspace{-2em}
    233230\end{table}
    234231
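The changed text in evaluation.tex says that set intersection and difference were translated for {\tt pcre2grep} into an equivalent form using lookbehind assertions. The following is a minimal sketch of that idea, not the authors' actual translation. It assumes Python's standard re module, which has no \p{...} property syntax, so two plain ASCII classes stand in for Unicode property expressions. The helper names intersect, subtract and demo are hypothetical.

```python
import re

# Hypothetical stand-ins for two Unicode property expressions.
# Python's `re` has no \p{...} syntax, so plain ASCII classes are used here.
VOWELS = "[aeiou]"
FIRST_HALF = "[a-m]"


def intersect(a: str, b: str) -> str:
    """Match one character in class `a`, then use a positive lookbehind
    to require that the same character is also in class `b`."""
    return f"{a}(?<={b})"


def subtract(a: str, b: str) -> str:
    """Match one character in class `a`, then use a negative lookbehind
    to reject it if the same character is in class `b`."""
    return f"{a}(?<!{b})"


def demo(text: str = "education"):
    """Return (characters in both classes, characters in `a` only)."""
    both = re.findall(intersect(VOWELS, FIRST_HALF), text)
    only = re.findall(subtract(VOWELS, FIRST_HALF), text)
    return both, only


if __name__ == "__main__":
    print(demo())
```

Because each class consumes exactly one character, the lookbehind placed directly after it inspects that same character, which is what makes the two forms behave like intersection and difference.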