Changeset 4557 for docs/Working


Ignore:
Timestamp:
May 15, 2015, 2:36:34 AM (4 years ago)
Author:
nmedfort
Message:

Included second table; slightly too long now. May need some discussion about Currency but I was unsure what to say without discussing nullable.

Location:
docs/Working/icGrep
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icGrep/evaluation.tex

    r4554 r4557  
    11\section{Evaluation}\label{sec:evaluation}
    22
    3 In this section, we report on the evaluation of \icGrep{} performance, looking
    4 at three aspects.   First, we examine some performance aspects of
    5 \icGrep{} internal methods, looking at the impact of optimizations discussed previously.
    6 Then we move on to a systematic performance study of \icGrep{}
    7 with named Unicode property searches in comparison to two
    8 contemporary competitors, namely, pcre2grep released in January 2015
    9 and ugrep of the ICU 54.1 software distribution.  Finally, we examine
    10 both more complex expressions and also the impact
    11 of multithreading \icGrep{}.
     3In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects.   
     4%
     5First, we discuss some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously.
     6%
     7Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors,
     8namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.
     9%
     10Finally, we examine complex expressions and the impact of multithreading \icGrep{} on an
     11Intel i7-2600 (3.4GHz) and i7-4700MQ (2.4GHz) processor.
    1212
    1313\subsection{Optimizations of Bitwise Methods}
     
    8080and {\tt icgrep} provide systematic support for all property expressions
    8181at Unicode Level 1 as well as set union, intersection and difference.
    82 On the other hand, {\tt pcre2grep} does not support the set intersection and difference operators directly.
    83 However, these operators can instead be expressed using a regular expression
     82Unfortunetly, {\tt pcre2grep} does not support the set intersection and difference operators directly.
     83However, these operators can be expressed using a regular expression
    8484feature known as a lookbehind assertion.   Set intersection involves a
    8585regular expression formed with a one of the property expressions and a
     
    134134For each program under test, we performed searches for each regular expression against each XML document.
    135135%
    136 Performance is reported in CPU cycles per byte on an Intel Core i7 machine.   
     136Performance is reported in CPU cycles per byte on an Intel i7-2600 machine.   
    137137%
    138138The results are presented in Figure~\ref{fig:property_test}.
     
    158158
    159159\begin{table}\centering % requires booktabs
    160 \small
     160\small\vspace{-2em}
    161161\begin{tabular}{@{}p{2.7cm}p{10.8cm}@{}}
    162162\textbf{Name}&\textbf{Regular Expression}\\
     
    176176\end{tabular}
    177177\caption{Regular Expressions}\label{table:regularexpr}
    178 \vspace{-1em}
     178\vspace{-2em}
    179179\end{table}
    180180
    181 
    182 We also examine the comparative performance of the matching engines on a
    183 series of more complex expressions as shown in Table \ref{table:regularexpr}.
     181This study evaluates the comparative performance of the matching engines on a
     182series of more complex expressions, shown in Table \ref{table:regularexpr}.
     183%
    184184The first two are alphanumeric expressions, differing only in that the first
    185185one is anchored to match the entire line.
     186%
    186187The third searches for lines consisting of text in Arabic script.
     188%
    187189The fourth expression is a published currency expression taken from
    188190Stewart and Uckelman~\cite{stewart2013unicode}.
     191%
    189192An expression matching runs of six or more Cyrillic script characters enclosed
    190193in initial/opening and final/ending punctuation is fifth in the list.
    191 The final expression is an email expression that allows internationalized
    192 names.
    193 
     194%
     195The final expression is an email expression that allows internationalized names.
     196%
     197In Table \ref{table:complexexpr}, we show the performance results obtained
     198from an Intel i7-2600 using precompiled binaries for each engine
     199that are suitable for any 64-bit architecture.
     200%
     201In each case, we report seconds taken per GB of input averaged over 10
     202runs each on our Wikimedia document collection.
     203%
     204Table \ref{table:relperf} further studies \icGrep{} on a newer Intel
     205i7-4700MQ architecture and evaluates the improvement gained by the
     206newer processor and improved SIMD instruction set architecture (ISA).
     207%
     208Both SSE2 and AVX1 use 128-bit registers.
     209%
     210The main advantage of AVX1 over SSE2 is its support for 3-operand form,
     211which helps reduce register pressure.
     212%
     213AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers.
     214%
     215However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32).
    194216
    195217% \begin{table}
     
    204226% \end{table}
    205227
    206 
    207 \begin{table}\centering % requires booktabs
     228% \begin{table*}[htbp]
     229% \begin{center}
     230% \footnotesize
     231% \begin{tabular}{|l||l|l|}
     232% \hline
     233% Processor & i7-2600 (3.4GHz) & i7-4700MQ (2.4GHz) \\ \hline
     234% L1 Cache & 256KB & 256KB  \\ \hline       
     235% L2 Cache & 1MB & 1MB  \\ \hline
     236% L3 Cache & 8MB & 8MB \\ \hline
     237% Bus & 1333Mhz & 1600Mhz \\ \hline
     238% Memory & 8GB & 8GB \\ \hline
     239% \end{tabular}
     240% \caption{Platform Hardware Specs}
     241% \label{hwinfo}
     242% \end{center}
     243% \vspace{-20pt}
     244% \end{table*}
     245
     246\begin{table}[ht]\centering % requires booktabs
    208247\newcolumntype{T}{c}
    209 \small
     248\small\vspace{-2em}
    210249\begin{tabular}{@{}p{3cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}}
    211 &\multicolumn{6}{c}{\textbf{\icGrep{}}}\\
     250&\multicolumn{6}{c}{\textbf{\icGrep{} (SSE2)}}\\
    212251\cmidrule[1pt](lr){2-7}
    213252\cmidrule[1pt](lr){8-10}
     
    224263\end{tabular}
    225264\caption{Matching Times for Complex Expressions (Seconds Per GB)}\label{table:complexexpr}
     265\vspace{-2em}
    226266\end{table}
    227267
    228 The performance results are shown in Table \ref{table:complexexpr}.
    229 In each case, we report seconds taken per GB of input averaged over 10
    230 runs each on our Wikimedia document collection.
    231 The most striking aspect of the results is that both ugrep and pcregrep
     268
     269The most striking aspect of Table \ref{table:complexexpr} is that both ugrep and pcregrep
    232270show dramatic slowdowns with ambiguities in regular expressions.
     271%
    233272This is most clearly illustrated in the different performance figures
    234273for the two Alphanumeric test expressions but is also evident in the
    235 Arabic, Currency and Email expressions.   By way of contrast, \icGrep{}
    236 maintains consistently fast performance in all test scenarios. 
    237 
    238 The multithreaded \icGrep{} shows speedup in every case, but balancing
    239 of the workload across multiple cores is clearly an area for further work. 
    240 Nevertheless, our three thread system shows a speedup over
    241 the single threaded version by up to 40\%.
    242 
    243 
    244 
    245 
     274Arabic, Currency and Email expressions.   
     275%
     276Contrastingly, \icGrep{} maintains consistently fast performance in all test scenarios. 
     277%
     278The multithreaded \icGrep{} shows speedup in every case but balancing
     279of the workload across multiple cores is clearly an area for further work.
     280%
     281Nevertheless, our three thread system shows up to a 40\% speedup. %  over the single threaded version
     282
     283
     284\begin{table}[h]\centering % requires booktabs,siunitx
     285\small
     286\vspace{-2em}
     287\begin{tabular}{@{}p{3cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}}
     288&\multicolumn{6}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\
     289\cmidrule[1pt](lr){2-7}
     290\cmidrule[1pt](lr){8-13}
     291\textbf{Expression}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\
     292\toprule
     293Alphanumeric \#1&1.28&(.06)&1.35&(.05)&1.64&(.16)&1.41&(.06)&1.44&(.06)&1.96&(.18)\\
     294Alphanumeric \#2&1.27&(.06)&1.32&(.05)&1.77&(.19)&1.39&(.07)&1.39&(.04)&2.18&(.22)\\
     295Arabic&1.21&(.07)&1.28&(.08)&1.43&(.16)&1.30&(.05)&1.30&(.05)&1.63&(.13)\\
     296Currency&1.01&(.05)&1.03&(.06)&1.06&(.12)&1.05&(.05)&1.06&(.05)&1.21&(.08)\\
     297Cyrillic&1.18&(.06)&1.25&(.05)&1.13&(.10)&1.26&(.04)&1.33&(.04)&1.22&(.10)\\
     298Email&1.32&(.04)&1.38&(.05)&1.86&(.21)&1.42&(.04)&1.46&(.05)&2.17&(.26)\\
     299\midrule
     300\textit{Geomean}&1.21&&1.26&&1.45&&1.30&&1.32&&1.68&\\
     301\bottomrule
     302\end{tabular}
     303\caption{Performance of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf}
     304\vspace{-2em}
     305\end{table}
     306
     307
     308Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware
     309improves performance by $\sim21$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}.
     310%
     311By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
     312higher variation between datasets.
     313%
     314This appears to be a consequence of complex Kleene-* repetitions (i.e., those that cannot utilize the MatchStar operation)
     315both resulting in increased register pressure and worse branch misprediction because of the characteristics in the datasets
     316themselves.
     317%
    246318
    247319
Note: See TracChangeset for help on using the changeset viewer.