- Timestamp: Feb 22, 2014, 5:09:56 PM
- Location: docs/Working/re
- Files: 3 edited
docs/Working/re/avx2.tex
r3625 r3637
@@ -58,5 +58,5 @@
 xtick=data,
 ylabel=AVX2 Instruction Reduction,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
@@ -109,5 +109,5 @@
 xtick=data,
 ylabel=AVX2 Speedup,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
@@ -134,5 +134,5 @@
 
 As shown in Figure \ref{fig:AVXSpeedup} the reduction in
-instruction count was reflected in a considerable speed-up
+instruction count was reflected in a significant speed-up
 in the bitstreams implementation. However, the speed-up was
 considerably less than expected. As shown in \ref{fig:AVXIPC}
@@ -148,5 +148,5 @@
 xtick=data,
 ylabel=Change in Instructions per Cycle,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
docs/Working/re/re-main.tex
r3636 r3637
@@ -586,9 +586,13 @@
 the 64 work groups. Each work group carries out the regular
 expression matching operations 4096 bytes at a time using SIMT
-processing. Figure \ref{fig:GPUadd} shows our implementation
-of long-stream addition on the GPU. Each thread maintains
-its own carry and bubble values in shared memroy and performs
+processing. We were able to adapt our long-stream addition
+model to the GPU as shown in Figure \ref{fig:GPUadd}. The GPU
+does not directly support the mask and spread operations,
+but we are able to simulate them using shared memory.
+Each thread maintains
+its own carry and bubble values in shared memory and performs
 synchronized updates with the other threads using a six-step
-parallel-prefix style process.
+parallel-prefix style process. Others have implemented
+long-stream addition on the GPU using similar techniques.
 
 
@@ -599,6 +603,6 @@
 and also accessed by GPU for further processing. Therefore,
 the expensive data transferring time that needed by traditional
-discrete GPUs is hinden and we compare only the kernel execution
-time with our SSE2 and AVX mplementations as shown in Figure
+discrete GPUs is hidden and we compare only the kernel execution
+time with our SSE2 and AVX implementations as shown in Figure
 \ref{fig:SSE-AVX-GPU}. The GPU version gives 30\% to 55\% performance
 improvement over SSE version and 10\% to 40\% performance
@@ -616,6 +620,6 @@
 further processing rather than jump to the next block with a
 simple IF test. Therefore, the performance of different
-regular expresions is depended on the number of function calls
-to the long-stream addition and the total number of matches
+regular expresions is dependent on the number of
+long-stream addition operations and the total number of matches
 of a given input.
 
@@ -665,5 +669,5 @@
 xtick=data,
 ylabel=Running Time (ms per megabyte),
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
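The revised paragraph in re-main.tex describes long-stream addition in terms of per-field carry and bubble values together with mask and spread operations. As a reading aid, the following is a minimal CPU-side Python sketch of that idea, under stated assumptions: 64-bit fields, a carry mask marking fields whose sum overflowed, a bubble mask marking fields whose sum is all ones, and the bit trick `(((m & b) + b) ^ b) | m` to push carries through runs of bubble fields. All function names are hypothetical, and nothing here is taken from the changeset's GPU kernel, which the text says uses a six-step parallel-prefix process across threads rather than a single integer addition on the mask.

```python
FIELD_BITS = 64                      # assumed field width
FIELD_MASK = (1 << FIELD_BITS) - 1


def split_fields(value, n):
    """Split an integer into n little-endian 64-bit fields."""
    return [(value >> (FIELD_BITS * i)) & FIELD_MASK for i in range(n)]


def join_fields(fields):
    """Inverse of split_fields."""
    return sum(f << (FIELD_BITS * i) for i, f in enumerate(fields))


def propagate(markers, bubble):
    """Extend each marker bit through the run of bubble bits that starts at it."""
    return (((markers & bubble) + bubble) ^ bubble) | markers


def long_stream_add(x, y, n, carry_in=0):
    """Add two n-field values using only per-field sums plus small bit masks.

    Returns (sum modulo 2**(64*n), carry_out).
    """
    xs, ys = split_fields(x, n), split_fields(y, n)

    # Step 1: independent per-field sums (what a SIMD add would produce).
    rs = [(a + b) & FIELD_MASK for a, b in zip(xs, ys)]

    # Step 2: "mask" step. One bit per field for carry-out and for bubble.
    carry = 0
    bubble = 0
    for i, (a, b, r) in enumerate(zip(xs, ys, rs)):
        if a + b > FIELD_MASK:
            carry |= 1 << i          # field i overflowed
        if r == FIELD_MASK:
            bubble |= 1 << i         # field i would pass an incoming carry on

    # Step 3: decide which fields receive an incoming carry.
    increments = propagate((carry << 1) | carry_in, bubble)

    # Step 4: "spread" step. Add each increment bit back into its field.
    zs = [(r + ((increments >> i) & 1)) & FIELD_MASK for i, r in enumerate(rs)]
    return join_fields(zs), (increments >> n) & 1
```

A quick check against Python's arbitrary-precision integers: `long_stream_add(x, y, 4)` should equal `((x + y) % 2**256, (x + y) >> 256)` for any `x` and `y` below `2**256`.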
docs/Working/re/sse2.tex
r3617 r3637
@@ -41,5 +41,5 @@
 Email & \verb`([^ @]+)@([^ @]+)` \\ \hline
 URIOrEmail & \verb`([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)` \\ \hline
-Xquote & \verb`["']|&quot;|&apos;|&#0*3[49];|&#x0*2[27];` \\ \hline
+HexBytes & \verb`(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])` \\ \hline
 \end{tabular}
 }
@@ -51,6 +51,13 @@
 
 \paragraph{Test expressions.}
-Each grep implementation is tested with the five regular expressions in Table \ref{RegularExpressions}. Xquote matches any of the representations of a
-single or double quote character occuring in XML content. It is run on roads-2.gml, a 11,861,751 byte gml data file. The other four expressions are taken from Benchmark of Regex Libraries [http://lh3lh3.users.sourceforge.net/reb.shtml] and are all run on a concatenated version of the linux howto which is 39,422,105 bytes in length. @ simply matches the "@" character. It demonstrates the overhead involved in matching the simplest regular expression. Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
+Each grep implementation is tested with the five regular expressions
+in Table \ref{RegularExpressions}. @ simply matches the "@" character. It demonstrates the overhead involved in matching the simplest regular expression. Date, Email, and URIOrEmail provide examples of common uses for regular expression matching.
+HexBytes matches delimited. They are taken from Benchmark of Regex
+Libraries [http://lh3lh3.users.sourceforge.net/reb.shtml].
+HexBytes matches delimited byte strings in hexadecimal notation,
+enforcing the constraint that the number of hex digits is even. This
+expression shows performance of a repetition operator implemented with
+a while loop.
+All tests are run on a concatenated version of the linux howto which is 39,422,105 bytes in length.
 
 
@@ -62,5 +69,5 @@
 xtick=data,
 ylabel=Cycles per Byte,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
@@ -106,5 +113,5 @@
 xtick=data,
 ylabel=Instructions per Byte,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
@@ -131,5 +138,12 @@
 \end{figure}
 
-Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte. The relative values mirror cycles per byte. The bitstreams implementation continues to show consistent performance. This is especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle}, which shows instructions per cycle. The bitstreams implementation has almost no variation in the instructions per cycle. Both grep and nrGrep have considerably more variation based on the input regular expression.
+Figure \ref{fig:SSEInstructionsPerByte} shows instructions per byte.
+The relative values mirror cycles per byte. The bitstreams
+implementation continues to show consistent performance. This is
+especially noticeable in Figure \ref{fig:SSEInstructionsPerCycle},
+which shows instructions per cycle. The bitstreams implementation has
+almost no variation in the instructions per cycle. Both grep and
+nrGrep have considerably more variation based on the input regular
+expression.
 
 
@@ -140,5 +154,5 @@
 xtick=data,
 ylabel=Instructions per Cycle,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
@@ -170,5 +184,5 @@
 xtick=data,
 ylabel=Branch Misses per Instruction,
-xticklabels={@,Date,Email,URIorEmail,xquote},
+xticklabels={@,Date,Email,URIorEmail,HexBytes},
 tick label style={font=\tiny},
 enlarge x limits=0.15,
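The new HexBytes entry in sse2.tex is described as matching delimited byte strings in hexadecimal notation with an even number of hex digits. A small Python sketch can make that constraint concrete. It assumes Python's `re` module interprets the pattern the same way as the grep implementations under test, which holds for the constructs used here but is not something the changeset states. The helper name `has_hex_bytes` is hypothetical.

```python
import re

# Pattern copied from the HexBytes row of the table in sse2.tex.
HEX_BYTES = re.compile(r"(^|[ ])0x([a-fA-F0-9][a-fA-F0-9])+[.:,?!]?($|[ ])")


def has_hex_bytes(line):
    """Return True if the line contains a delimited, even-length hex byte string."""
    return HEX_BYTES.search(line) is not None


if __name__ == "__main__":
    samples = [
        "value 0x1F2a here",   # two bytes, space-delimited
        "addr 0xFF.",          # one byte, trailing punctuation, end of line
        "0xABC ",              # odd number of hex digits
        "x0x1234",             # no delimiter before 0x
    ]
    for s in samples:
        print(repr(s), has_hex_bytes(s))
```

The digit pairs are consumed by the repeated group `([a-fA-F0-9][a-fA-F0-9])+`, and the trailing `($|[ ])` forces the match to end at a delimiter, so a leftover odd digit makes the match fail.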