# Changeset 4503 for docs/Working/icGrep

Ignore:
Timestamp:
Feb 11, 2015, 6:40:31 PM (4 years ago)
Message:

Evaluation

Location:
docs/Working/icGrep
Files:
3 edited

Unmodified
Added
Removed
• ## docs/Working/icGrep/bitgrep.bib

 r4490 title={Parabix: Boosting the efficiency of text processing on commodity processors}, author={Lin, Dan and Medforth, Nigel and Herdy, Kenneth S and Shriraman, Arrvindh and Cameron, Rob}, booktitle={High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on}, booktitle={HPCA}, pages={1--12}, year={2012}, author = {Cameron, Robert D. and Shermer, Thomas C. and Shriraman, Arrvindh and Herdy, Kenneth S. and Lin, Dan and Hull, Benjamin R. and Lin, Meng}, title = {Bitwise Data Parallelism in Regular Expression Matching}, booktitle = {Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (PACT)}, booktitle = {PACT}, series = {PACT '14}, year = {2014}, title={Architectural support for {SWAR} text processing with parallel bit streams: the inductive doubling principle}, author={Cameron, Robert D and Lin, Dan}, booktitle={ACM Sigplan Notices}, booktitle={ASPLOS}, volume={44}, pages={337--348}, author={Asanovic, Krste and Bodik, Ras and Catanzaro, Bryan Christopher and Gebis, Joseph James and Husbands, Parry and Keutzer, Kurt and Patterson, David A and Plishker, William Lester and Shalf, John and Williams, Samuel Webb and others}, year={2006}, institution={Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley} number ={UCB/EECS-2006-183}, institution={EECS Department, University of California, Berkeley} } title={Data-parallel finite-state machines}, author={Mytkowicz, Todd and Musuvathi, Madanlal and Schulte, Wolfram}, booktitle={Proceedings of the 19th international conference on Architectural support for programming languages and operating systems}, booktitle={ASPLOS}, pages={529--542}, year={2014}, title={Accelerating business analytics applications}, author={Salapura, Valentina and Karkhanis, Tejas and Nagpurkar, Priya and Moreira, Jose}, booktitle={High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on}, booktitle={HPCA}, pages={1--10}, year={2012}, title={GPU-based NFA implementation for memory efficient high speed regular expression matching}, author={Zu, Yuan and Yang, Ming and Xu, Zhonghu and Wang, Lin and Tian, Xin and Peng, Kunyang and Dong, Qunfeng}, booktitle={PPoPP '12 - Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming}, booktitle={PPoPP}, pages={129--140}, year={2012},
• ## docs/Working/icGrep/evaluation.tex

 r4502 In this section, we report on the evaluation of ICgrep performance, looking at three aspects.   First we consider a performance studies in a series of Unicode regular expression search problems in comparison to the contemporary competitors, including pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.  Then we move on to investigate some performance aspects of ICgrep internal methods, looking at the impact of optimizations and multithreading. at three aspects.   First we examine some performance aspects of ICgrep internal methods, looking at the impact of optimizations discussed previously. Then we move on to a systematic performance study of icGrep{} search performance with named Unicode property searches in comparison to two contemporary competitors, namely, pcre2grep released in January 2015 and ugrep of the ICU 54.1 software distribution.  Finally, we look at some more complex expressions and also look at the impact of multithreading icGrep{}. \subsection{Optimizations of Bitwise Methods} In order to support evaluation of bitwise methods, as well as to support the teaching of those methods and ongoing research, \icGrep{} has an array of command-line options.   This makes it relatively straightforward to report on certain performance aspects of ICgrep, while others require special builds. For example, the command-line switch {\tt -disable-matchstar} can be used to eliminate the use of the MatchStar operation for handling Kleene-* repetition of character classes.   In this case, \icGrep{} substitutes a while loop that iteratively extends match results. Surprisingly, this does not change performance much in many practical cases. In each block, the maximum iteration count is the maximum length run encountered; the overall performance is based on the average of these maximums throughout the file.   But when search for XML tags using the regular expression \verb:<[^!?][^>]*>:, a slowdown of more than 2X may be found in files with many long tags. %The {\tt -disable-log2-bounded-repetition} flag allows these %effectiveness of the special techniques for bounded repetition of %byte classes to be assessed.   A slowdown of 30\% was observed %with the searches using the regular expression %\verb:(^|[ ])[a-zA-Z]{11,33}([.!? ]|):, for example. To control the insertion of if-statements into dynamically generated code, the number of non-nullable pattern elements between the if-tests can be set with the {\tt -if-insertion-gap=} option. The default value in \icGrep{} is 3, setting the gap to 100 effectively turns of if-insertion. Eliminating if-insertion sometimes improves performance by avoiding the extra if tests and branch mispredications. For patterns with long strings, however, there can be a substantial slowdown; searching for a pattern of length 40 slows down by more than 50\% without the if-statement short-circuiting. ICgrep also provides options that allow various internal representations to be printed out. These can aid in understanding and/or debugging performance issues. For example, the option {\tt -print-REs} show the parsed regular expression as it goes through various transformations. The internal \Pablo{} code generated may be displayed with {\tt -print-\Pablo{}}. This can be quite useful in helping understand the match process. It also possible to print out the generated LLVM IR code ({\tt -dump-generated-IR}), but this may be less useful as it includes many details of low-level carry-handling that obscures the core logic. The precompiled calculations of the various Unicode properties are each placed in if-hierarchies as described previously. To assess the impact of this strategy, we built a version of icGrep without such if-hierarchies. In this case, when a Unicode property class is defined, bitwise logic equations are applied for all members of the class independent of the Unicode blocks represented in the input document. For the classes covering the largest numbers of codepoints, we observed slowdowns of up to 5X. \subsection{Simple Property Expressions} expressions were removed because they involved properties not supported by pcre2grep. In the end 246 test expressions were constructed in this process. \begin{figure} \begin{center} \pgfplotstableread[col sep = comma]{data/icgrep-scatter.csv}\icgrep \pgfplotstableread[col sep = comma]{data/ugrep541-scatter.csv}\ugrep \pgfplotstableread[col sep = comma]{data/pcre2grep-scatter.csv}\pcre \begin{tikzpicture} \begin{axis}[ grid=both, x tick label style={ /pgf/number format/1000 sep=}, ylabel={CPU Cycles Per Byte}, xlabel={Percentage of Matching Lines}, minor y tick num={1}, xmax=100, height=0.5\textwidth, legend style={at={(1.05,0.5)}, anchor=west,legend columns=1, align=left,draw=none,column sep=2ex} ] \addplot+[no markers,line width=2pt,color=blue!60,solid] table {\icgrep}; \addplot+[no markers,line width=2pt,color=red!60,solid] table {\ugrep}; \addplot+[no markers,line width=2pt,color=brown!60,solid] table {\pcre}; \legend{icGrep,ugrep541,pcre2grep} \end{axis} \end{tikzpicture} \end{center} \caption{Matching Performance for Simple Property Expressions}\label{fig:property_test} \end{figure} We selected a set of Wikimedia XML files in several major languages representing in all cases. \begin{figure} \begin{center} \pgfplotstableread[col sep = comma]{data/icgrep-scatter.csv}\icgrep \pgfplotstableread[col sep = comma]{data/ugrep541-scatter.csv}\ugrep \pgfplotstableread[col sep = comma]{data/pcre2grep-scatter.csv}\pcre \begin{tikzpicture} \begin{axis}[ grid=both, x tick label style={ /pgf/number format/1000 sep=}, ylabel={CPU Cycles Per Byte}, xlabel={Percentage of Matching Lines}, minor y tick num={1}, xmax=100, height=0.5\textwidth, legend style={at={(1.05,0.5)}, anchor=west,legend columns=1, align=left,draw=none,column sep=2ex} ] \addplot+[no markers,line width=2pt,color=blue!60,solid] table {\icgrep}; \addplot+[no markers,line width=2pt,color=red!60,solid] table {\ugrep}; \addplot+[no markers,line width=2pt,color=brown!60,solid] table {\pcre}; \legend{icGrep,ugrep541,pcre2grep} \end{axis} \end{tikzpicture} \end{center} \caption{Matching Performance for Simple Property Expressions}\label{fig:property_test} \end{figure} \subsection{Complex Expressions} \begin{table}\centering % requires booktabs \small \begin{tabular}{@{}p{2.7cm}p{10.8cm}@{}} \textbf{Name}&\textbf{Regular Expression}\\ \toprule Alphanumeric \#1&\verb^[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*\\ \midrule Alphanumeric \#2&\verb[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*\\ \midrule Arabic&\verb^[\p{Arabic}\p{Common}]*\p{Arabic}[\p{Arabic}\p{Common}]*$\\ \midrule Currency&\verb(\p{Sc}\s*(\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?)|\newline\verb((\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?\s*\p{Sc})\\ \midrule Cyrillic&\verb[\p{Pi}\p{Po}]\p{Cyrillic}{6,}[\p{Pf}\p{Pe}]\\ \midrule Email &\verb([^\p{Z}<]+@[\p{L}\p{M}\p{N}.-]+\.(\p{L}\p{M}*){2,6})(>|\p{Z}|$)\\ \bottomrule \end{tabular} \caption{Regular Expressions}\label{table:regularexpr} \end{table} We also comparative performance of the matching engines on a series of more complex expressions as shown in Table \ref{table:complexexpr}. of more complex expressions as shown in Table \ref{table:regularexpr}. The first two are alphanumeric expressions, differing only in the first one is anchored to match the entire line.  The third searches for lines consisting of text in Arabic script. The fourth expression is a published currency expression taken from Stewart and Uckelman \cite{stewart2013unicode}. An expression matching runs of 6 or more Cyrillic script characters enclosed in initial/opening and final/ending punctuation is fifth in the list. The final expression is an email expression that allows internationalized names. % \begin{table} % \end{table} \begin{samepage} \begin{table}\centering % requires booktabs \small \begin{tabular}{@{}p{2.7cm}p{10.8cm}@{}} \textbf{Name}&\textbf{Regular Expression}\\ \toprule alphanumeric \#1&\verb^[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*$\\ \midrule alphanumeric \#2&\verb[\p{L}\p{N}]*((\p{L}\p{N})|(\p{N}\p{L}))[\p{L}\p{N}]*\\ \midrule arabic&\verb^[\p{Arabic}\p{Common}]*\p{Arabic}[\p{Arabic}\p{Common}]*$\\ \midrule currency&\verb(\p{Sc}\s*(\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?)|\newline\verb((\d*|(\d{1,3}([,.]\d{3})*))([,.]\d{2}?)?\s*\p{Sc})\\ \midrule email &\verb([^\p{Z}<]+@[\p{L}\p{M}\p{N}.-]+\.(\p{L}\p{M}*){2,6})(>|\p{Z}|$)\\ \bottomrule \end{tabular} \caption{Regular Expressions}\label{table:regularexpr} \end{table} \begin{table}\centering % requires booktabs \end{table} \end{samepage} \subsection{Optimizations of Bitwise Methods} In order to support evaluation of bitwise methods, as well as to support the teaching of those methods and ongoing research, \icGrep{} has an array of command-line options. This makes it relatively straightforward to report on certain performance aspects of ICgrep, while others require special builds. For example, the command-line switch {\tt -disable-matchstar} can be used to eliminate the use of the MatchStar operation for handling Kleene-* repetition of character classes. In this case, \icGrep{} substitutes a while loop that iteratively extends match results. Surprisingly, this does not change performance much in many practical cases. In each block, the maximum iteration count is the maximum length run encountered; the overall performance is based on the average of these maximums throughout the file. But when search for XML tags using the regular expression \verb:<[^!?][^>]*>:, a slowdown of more than 2X may be found in files with many long tags. The {\tt -disable-log2-bounded-repetition} flag allows these effectiveness of the special techniques for bounded repetition of byte classes to be assessed. A slowdown of 30\% was observed with the searches using the regular expression \verb:(^|[ ])[a-zA-Z]{11,33}([.!? ]|$):, for example. To control the insertion of if-statements into dynamically generated code, the number of non-nullable pattern elements between the if-tests can be set with the {\tt -if-insertion-gap=} option.   The default value in \icGrep{} is 3, setting the gap to 100 effectively turns of if-insertion.   Eliminating if-insertion sometimes improves performance by avoiding the extra if tests and branch mispredications. For patterns with long strings, however, there can be a substantial slowdown; searching for a pattern of length 40 slows down by more than 50\% without the if-statement short-circuiting. ICgrep also provides options that allow various internal representations to be printed out.   These can aid in understanding and/or debugging performance issues. For example, the option {\tt -print-REs} show the parsed regular expression as it goes through various transformations.   The internal \Pablo{} code generated may be displayed with {\tt -print-\Pablo{}}.  This can be quite useful in helping understand the match process.   It also possible to print out the generated LLVM IR code ({\tt -dump-generated-IR}), but this may be less useful as it includes many details of low-level carry-handling that obscures the core logic. The precompiled calculations of the various Unicode properties are each placed in if-hierarchies as described previously.   To assess the impact of this strategy, we built a version of icGrep without such if-hierarchies.  In this case, when a Unicode property class is defined, bitwise logic equations are applied for all members of the class independent of the Unicode blocks represented in the input document.   For the classes covering the largest numbers of codepoints, we observed slowdowns of up to 5X. The performance results are shown in Table \ref{table:complexexpr}. In each case, we report seconds taken per GB of input averaged over 10 runs each on our Wikimedia document collection. The most striking aspect of the results is that both ugrep and pcregrep show dramatic slowdowns with ambiguities in regular expressions. This is most clearly illustrated in the different performance figures for the two Alphanumeric test expressions, but is also evident in the Arabic, Currency and Email expressions.   By way of contrast, icGrep{} maintains consistent fast performance in all test scenarios. The multithreaded \icGrep{} shows speedup in every case, but balancing of the workload across multiple cores is clearly an area for further work. Nevertheless, our three thread system shows a speedup of over the single threaded version by up to 40\%.
Note: See TracChangeset for help on using the changeset viewer.