# Changeset 4786 for docs

Ignore:
Timestamp:
Sep 22, 2015, 12:29:29 PM (3 years ago)
Message:

More formatting for LNCS requirements

Location:
docs/Working/icGrep
Files:
4 edited

Unmodified
Removed
• ## docs/Working/icGrep/Paper88.tex

 r4782 \def\PabloCompiler{\Pablo{} Compiler} \pagestyle{empty} %\pagestyle{empty} \begin{document} \title{Bitwise Data Parallelism with LLVM: The icGrep Case Study} \author{Robert D. Cameron\inst{1 (}\Envelope\inst{)} \and Nigel Medforth\inst{1} \and Dan Lin\inst{1} \and Dale Denis\inst{1} \and William N. Sumner\inst{1} \author{Robert D. Cameron\Envelope \and Nigel Medforth \and Dan Lin \and Dale Denis \and William N. Sumner } \institute{School of Computing Science, Simon Fraser University} \authorrunning{Cameron, Medforth, Lin, Denis and Sumner} \maketitle
• ## docs/Working/icGrep/bitgrep.bib

 r4782 and {\tt icgrep} provide systematic support for all property expressions at Unicode Level 1 as well as set union, intersection and difference. Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly. However, these operators can be expressed using a regular expression feature known as a lookbehind assertion.   Set intersection involves a regular expression formed with a one of the property expressions and a positive lookbehind assertion on the other, while set difference uses a negative lookbehind assertion. However, in order to implement these operators with {\tt pcre2grep}, we translated them into an equivalent form using lookbehind assertions. %Unfortunately, {\tt pcre2grep} does not support the set intersection and difference operators directly. %However, these operators can be expressed using a regular expression %feature known as a lookbehind assertion.   Set intersection involves a %regular expression formed with a one of the property expressions and a %positive lookbehind assertion on the other, while set difference uses %a negative lookbehind assertion. We generated a set of regular expressions involving all Unicode values of most of the world's major language families as a test corpus. For each program under test, we performed searches for each regular expression against each XML document. Test cases were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. Performance is reported in CPU cycles per byte on an Intel i7-2600 machine. The results are presented in Fig.~\ref{fig:property_test}. They were ranked by the percentage of matching lines found in the XML document and grouped in 5\% increments. When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items. The performance of both pcre2grep and ugrep improves (has a reduction in CPU cycles per byte) as the percentage of matching lines increases. This occurs because each match allows them to bypass processing the rest of the line. On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found. This is primarily due to property classes that include large numbers of codepoints. These classes require more bitstream equations for calculation and also have a greater probability of matching. Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases. \begin{figure} \vspace{-0.5em} \end{figure} When comparing the three programs, \icGrep{} exhibits dramatically better performance, particularly when searching for rare items. The performance of both pcre2grep and ugrep improves (CPU cycles per byte decreases) as the percentage of matching lines increases. This occurs because each match allows them to bypass processing the rest of the line. On the other hand, \icGrep{} shows a slight drop-off in performance with the number of matches found. This is primarily due to property classes that include large numbers of codepoints. These classes require more bitstream equations for calculation and also have a greater probability of matching. Nevertheless, the performance of \icGrep{} in matching the defined property expressions is stable and well ahead of the competitors in all cases. \subsection{Complex Expressions} This study evaluates the comparative performance of the matching engines on a series of more complex expressions, shown in Table \ref{table:regularexpr}. This study evaluates the comparative performance of the matching engines on a set of more complex expressions, shown in Table \ref{table:regularexpr}. The first two are alphanumeric (\AN{}) expressions, differing only in that the first one is anchored to match the entire line. The third searches for lines consisting of text in Arabic script. The fourth expression is a published currency expression taken from Stewart and Uckelman~\cite{stewart2013unicode}. Stewart and Uckelman\cite{stewart2013unicode}. An expression matching runs of six or more Cyrillic script characters enclosed in initial/opening and final/ending punctuation is fifth in the list. The final expression is an email expression that allows internationalized names. The last expression matches internationalized email names. \begin{table}\centering % requires booktabs \small\vspace{-2em} \caption{Regular expressions}\label{table:regularexpr} \small%\vspace{-2em} \begin{tabular}{@{}p{2cm}p{9.8cm}@{}} \textbf{Name}&\textbf{Regular Expression}\\ \bottomrule \end{tabular} \caption{Regular expressions}\label{table:regularexpr} \vspace{-2em} \end{table} In Table \ref{table:complexexpr}, we show the performance results obtained from an Intel i7-2600 using generic 64-bit binaries for each engine. We limit the SIMD ISA within \icGrep{} to SSE2 which is available on all Intel/AMD 64-bit machines. In each case, we report seconds taken per GB of input averaged over 10 Table \ref{table:complexexpr} shows the performance results on our Intel i7-2600 test machine, reporting seconds taken per GB of input averaged over 10 runs each on our Wikimedia document collection. \begin{table}[ht]\centering % requires booktabs \caption{Matching times for complex expressions (s/GB)}\label{table:complexexpr} \newcolumntype{T}{c} \small\vspace{-2em} \small%\vspace{-2em} \begin{tabular}{@{}p{2cm}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}r@{~--~}rp{4pt}@{}} &\multicolumn{6}{c}{\textbf{\icGrep{}}}\\ \bottomrule \end{tabular} \caption{Matching times for complex expressions (s/GB)}\label{table:complexexpr} \vspace{-2em} \end{table} \begin{table}[h]\centering % requires booktabs,siunitx \small\vspace{-2em} \caption{Speedup of complex expressions on i7-4700MQ $(\sigma)$}\label{table:relperf} \small%\vspace{-2em} \begin{tabular}{@{}p{2cm}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}l@{~}r@{~~}@{}} &\multicolumn{2}{c}{\textbf{Base}}&\multicolumn{4}{c}{\textbf{SEQ}}&\multicolumn{6}{c}{\textbf{MT}}\\ \bottomrule \end{tabular} \caption{Speedup of complex expressions on i7-4700MQ $(\sigma)$}\label{table:relperf} \vspace{-2em} \end{table}