# Changeset 4557

Timestamp:
May 15, 2015, 2:36:34 AM
Message:

Included second table; slightly too long now. May need some discussion about Currency but I was unsure what to say without discussing nullable.

Location:
docs/Working/icGrep
Files:
2 edited

 r4554 \section{Evaluation}\label{sec:evaluation} In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects.   First, we examine some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously. Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors, namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.  Finally, we examine both more complex expressions and also the impact of multithreading \icGrep{}. In this section, we report on the evaluation of \icGrep{} performance, looking at three aspects. % First, we discuss some performance aspects of \icGrep{} internal methods, looking at the impact of optimizations discussed previously. % Then we move on to a systematic performance study of \icGrep{} with named Unicode property searches in comparison to two contemporary competitors, namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution. % Finally, we examine complex expressions and the impact of multithreading \icGrep{} on an Intel i7-2600 (3.4GHz) and i7-4700MQ (2.4GHz) processor. \subsection{Optimizations of Bitwise Methods} and {\tt icgrep} provide systematic support for all property expressions at Unicode Level 1 as well as set union, intersection and difference. On the other hand, {\tt pcre2grep} does not support the set intersection and difference operators directly. However, these operators can instead be expressed using a regular expression Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly. However, these operators can be expressed using a regular expression feature known as a lookbehind assertion.   
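As a rough illustration of the lookbehind technique mentioned above, the following sketch emulates set intersection and difference with Python's `re` module. The ASCII ranges `[a-m]` and `[h-z]` are stand-ins chosen only for this example (Python's `re` has no `\p{...}` property syntax), and the exact expressions used in the study are not shown in this changeset.

```python
import re

# Hypothetical stand-in sets: A = [a-m], B = [h-z].
# Intersection: match a character from A, then look back one position
# and require that the character just matched is also in B.
INTERSECTION = re.compile(r"[a-m](?<=[h-z])")

# Difference: match a character from A, then require (negative
# lookbehind) that the character just matched is NOT in B.
DIFFERENCE = re.compile(r"[a-m](?<![h-z])")


def matches(pattern, text):
    """Return every single-character match of pattern in text."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "hello world"
    print(matches(INTERSECTION, sample))  # characters in both A and B
    print(matches(DIFFERENCE, sample))    # characters in A but not in B
```

The lookbehind is fixed-width (one character), so it is accepted by engines that restrict lookbehind to fixed lengths.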
Set intersection involves a regular expression formed with one of the property expressions and a For each program under test, we performed searches for each regular expression against each XML document. % Performance is reported in CPU cycles per byte on an Intel Core i7 machine. Performance is reported in CPU cycles per byte on an Intel i7-2600 machine. % The results are presented in Figure~\ref{fig:property_test}. \begin{table}\centering % requires booktabs \small \small\vspace{-2em} \begin{tabular}{@{}p{2.7cm}p{10.8cm}@{}} \textbf{Name}&\textbf{Regular Expression}\\ \end{tabular} \caption{Regular Expressions}\label{table:regularexpr} \vspace{-1em} \vspace{-2em} \end{table} We also examine the comparative performance of the matching engines on a series of more complex expressions as shown in Table \ref{table:regularexpr}. This study evaluates the comparative performance of the matching engines on a series of more complex expressions, shown in Table \ref{table:regularexpr}. % The first two are alphanumeric expressions, differing only in that the first one is anchored to match the entire line. % The third searches for lines consisting of text in Arabic script. % The fourth expression is a published currency expression taken from Stewart and Uckelman~\cite{stewart2013unicode}. % An expression matching runs of six or more Cyrillic script characters enclosed in initial/opening and final/ending punctuation is fifth in the list. The final expression is an email expression that allows internationalized names. % The final expression is an email expression that allows internationalized names. % In Table \ref{table:complexexpr}, we show the performance results obtained from an Intel i7-2600 using precompiled binaries for each engine that are suitable for any 64-bit architecture. % In each case, we report seconds taken per GB of input averaged over 10 runs each on our Wikimedia document collection. 
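The text reports results in two units: CPU cycles per byte for the property tests and seconds per GB for the complex expressions. The sketch below shows one way to convert between them. It assumes that 1 GB means 10^9 bytes and that the processor runs at its nominal clock (3.4 GHz for the i7-2600, as stated in the text); neither assumption is confirmed by the source, and frequency scaling would change the result.

```python
# Assumed constants (not stated in the source): decimal gigabyte and
# a core running at its nominal frequency throughout the measurement.
BYTES_PER_GB = 1e9
I7_2600_HZ = 3.4e9  # nominal clock of the i7-2600 named in the text


def seconds_per_gb_to_cycles_per_byte(seconds_per_gb, clock_hz=I7_2600_HZ):
    """Convert a seconds-per-GB timing into CPU cycles per byte."""
    cycles_per_gb = seconds_per_gb * clock_hz
    return cycles_per_gb / BYTES_PER_GB


def cycles_per_byte_to_seconds_per_gb(cycles_per_byte, clock_hz=I7_2600_HZ):
    """Inverse conversion: cycles per byte back to seconds per GB."""
    return cycles_per_byte * BYTES_PER_GB / clock_hz


if __name__ == "__main__":
    # Under these assumptions, 1 second per GB is 3.4 cycles per byte.
    print(seconds_per_gb_to_cycles_per_byte(1.0))
```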
% Table \ref{table:relperf} further studies \icGrep{} on a newer Intel i7-4700MQ architecture and evaluates the improvement gained by the newer processor and improved SIMD instruction set architecture (ISA). % Both SSE2 and AVX1 use 128-bit registers. % The main advantage of AVX1 over SSE2 is its support for 3-operand form, which helps reduce register pressure. % AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers. % However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32). % \begin{table} % \end{table} \begin{table}\centering % requires booktabs % \begin{table*}[htbp] % \begin{center} % \footnotesize % \begin{tabular}{|l||l|l|} % \hline % Processor & i7-2600 (3.4GHz) & i7-4700MQ (2.4GHz) \\ \hline % L1 Cache & 256KB & 256KB  \\ \hline % L2 Cache & 1MB & 1MB  \\ \hline % L3 Cache & 8MB & 8MB \\ \hline % Bus & 1333Mhz & 1600Mhz \\ \hline % Memory & 8GB & 8GB \\ \hline % \end{tabular} % \caption{Platform Hardware Specs} % \label{hwinfo} % \end{center} % \vspace{-20pt} % \end{table*} \begin{table}[ht]\centering % requires booktabs \newcolumntype{T}{c} \small \small\vspace{-2em} \begin{tabular}{@{}p{3cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}} &\multicolumn{6}{c}{\textbf{\icGrep{}}}\\ &\multicolumn{6}{c}{\textbf{\icGrep{} (SSE2)}}\\ \cmidrule[1pt](lr){2-7} \cmidrule[1pt](lr){8-10} \end{tabular} \caption{Matching Times for Complex Expressions (Seconds Per GB)}\label{table:complexexpr} \vspace{-2em} \end{table} The performance results are shown in Table \ref{table:complexexpr}. In each case, we report seconds taken per GB of input averaged over 10 runs each on our Wikimedia document collection. The most striking aspect of the results is that both ugrep and pcregrep The most striking aspect of Table \ref{table:complexexpr} is that both ugrep and pcregrep show dramatic slowdowns with ambiguities in regular expressions. 
% This is most clearly illustrated in the different performance figures for the two Alphanumeric test expressions but is also evident in the Arabic, Currency and Email expressions.   By way of contrast, \icGrep{} maintains consistently fast performance in all test scenarios. The multithreaded \icGrep{} shows speedup in every case, but balancing of the workload across multiple cores is clearly an area for further work. Nevertheless, our three thread system shows a speedup over the single threaded version by up to 40\%. Arabic, Currency and Email expressions. % Contrastingly, \icGrep{} maintains consistently fast performance in all test scenarios. % The multithreaded \icGrep{} shows speedup in every case but balancing of the workload across multiple cores is clearly an area for further work. % Nevertheless, our three thread system shows up to a 40\% speedup. %  over the single threaded version \begin{table}[h]\centering % requires booktabs,siunitx \small \vspace{-2em} \begin{tabular}{@{}p{3cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}} &\multicolumn{6}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\ \cmidrule[1pt](lr){2-7} \cmidrule[1pt](lr){8-13} \textbf{Expression}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}&\multicolumn{2}{c}{\textbf{SSE2}}&\multicolumn{2}{c}{\textbf{AVX1}}&\multicolumn{2}{c}{\textbf{AVX2}}\\ \toprule Alphanumeric \#1&1.28&(.06)&1.35&(.05)&1.64&(.16)&1.41&(.06)&1.44&(.06)&1.96&(.18)\\ Alphanumeric \#2&1.27&(.06)&1.32&(.05)&1.77&(.19)&1.39&(.07)&1.39&(.04)&2.18&(.22)\\ Arabic&1.21&(.07)&1.28&(.08)&1.43&(.16)&1.30&(.05)&1.30&(.05)&1.63&(.13)\\ Currency&1.01&(.05)&1.03&(.06)&1.06&(.12)&1.05&(.05)&1.06&(.05)&1.21&(.08)\\ Cyrillic&1.18&(.06)&1.25&(.05)&1.13&(.10)&1.26&(.04)&1.33&(.04)&1.22&(.10)\\ Email&1.32&(.04)&1.38&(.05)&1.86&(.21)&1.42&(.04)&1.46&(.05)&2.17&(.26)\\ \midrule \textit{Geomean}&1.21&&1.26&&1.45&&1.30&&1.32&&1.68&\\ \bottomrule \end{tabular} 
\caption{Performance of Complex Expressions for i7-2600 / i7-4700MQ $(\sigma)$}\label{table:relperf} \vspace{-2em} \end{table} Interestingly, the SSE2 column of Table \ref{table:relperf} shows that simply using newer hardware improves performance by $\sim21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}, respectively. % By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements, but AVX2 exhibits higher variation between datasets. % This appears to be a consequence of complex Kleene-* repetitions (i.e., those that cannot utilize the MatchStar operation) resulting in both increased register pressure and worse branch misprediction because of the characteristics of the datasets themselves. %
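The \textit{Geomean} row of Table \ref{table:relperf} can be checked with a short calculation. The sketch below recomputes the geometric mean of the two SSE2 columns from the per-expression ratios listed in the table; the function and variable names are illustrative only and are not part of the \icGrep{} tooling.

```python
import math


def geomean(values):
    """Geometric mean: exponential of the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Per-expression ratios copied from the SSE2 columns of the table
# (Alphanumeric #1, Alphanumeric #2, Arabic, Currency, Cyrillic, Email).
sse2_sequential = [1.28, 1.27, 1.21, 1.01, 1.18, 1.32]
sse2_multithreaded = [1.41, 1.39, 1.30, 1.05, 1.26, 1.42]

if __name__ == "__main__":
    print(round(geomean(sse2_sequential), 2))     # table reports 1.21
    print(round(geomean(sse2_multithreaded), 2))  # table reports 1.30
```

A geometric mean is the usual choice for averaging ratios, since it treats a 2x speedup and a 2x slowdown symmetrically.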