# Changeset 4559

Timestamp:
May 15, 2015, 6:32:30 AM
Message:

Tidy up UTF-8 and evaluation sections

Location:
docs/Working/icGrep
Files:
3 edited

 r4558 %than 50\% without the if-statement short-circuiting. %%% I think we'd need to show this always true to make this claim. \comment{ Additionally, \icGrep{} provides options that allow various internal representations to be printed out. less useful as it includes many details of low-level carry-handling that obscures the core logic. } The precompiled calculations of the various Unicode properties % In Table \ref{table:complexexpr}, we show the performance results obtained from an Intel i7-2600 using precompiled binaries for each engine that are suitable for any 64-bit architecture. from an Intel i7-2600 using generic 64-bit binaries for each engine. We limit the SIMD ISA within \icGrep{} to SSE2 which is available on all Intel/AMD 64-bit machines. % In each case, we report seconds taken per GB of input averaged over 10 runs each on our Wikimedia document collection. % Table \ref{table:relperf} further studies \icGrep{} on a newer Intel i7-4700MQ architecture and evaluates the improvement gained by the newer processor and improved SIMD instruction set architecture (ISA). % Both SSE2 and AVX1 use 128-bit registers. % The main advantage of AVX1 over SSE2 is its support for 3-operand form, which helps reduce register pressure. % AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers. % However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32). % % \begin{table} of the workload across multiple cores is clearly an area for further work. % Nevertheless, our three thread system shows up to a 40\% speedup. %  over the single threaded version Nevertheless, our three-thread system shows up to a 40\% speedup. %  over the single threaded version % Table \ref{table:relperf} shows the speedups obtained with \icGrep{} icGrep{} on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives and both single-threaded and multi-threaded versions. 
All speedups are relative to the single-threaded performance on the i7-2600 machine = 1.0. The SSE2 results are again using the generic binaries compiled for compatibility with all 64-bit processors.   The AVX1 results are for Intel AVX instructions in 128-bit mode.  The main advantage of AVX1 over SSE2 is its support for 3-operand form, which helps reduce register pressure.   The AVX2 results are for \icGrep{} compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time. Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware improves performance by $\sim21$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}. Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware and compiler improves performance by $21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}. % By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
 r4558
expected. Mismatches between scope expectations and occurrences of suffix
bytes indicate errors.
bytes indicate errors  (we omit other error equations for brevity).
Two helper streams are also useful.
The Initial stream marks ASCII bytes and prefixes of multibyte sequences,
the UnicodeClass stream for a given class involves logic for up to four positions.
By convention, we define UnicodeClass($U$) for a given Unicode character class
$U$ to be the stream marking the {\em final} position of Unicode character classes.
$U$ to be the stream marking the {\em final} position of any characters in the class.
Using these definitions, it is then possible to extend the matching
\texttt{ni3} and $\text{CC}_{\texttt{hao}}$ is the bitstream that marks character \texttt{hao}.
To match a two UTF-8 character sequence \texttt{ni3hao}, we first construct bitstream $M_1$, which marks the positions of the last byte of every character.
An overlap between $M_1$ and $\text{CC}_{\texttt{ni3}}$ gives the start position for matching the next character.
As illustrated by $M_2$, we find two matches for \texttt{ni3}, and from these two positions we can start the matching process for the next character \texttt{hao}.
The final result stream $M_4$ shows one match for the multibyte sequence
We start with the marker stream $m_0$ initialized to Initial, indicating all positions are in play.
Using ScanThru, we move to the final position of each character $t_1$.
Applying bitwise and with $\text{CC}_{\texttt{ni3}}$ and advancing gives the two matches $m_1$ for ni3.
Applying ScanThru once more advances to the final position of the character after \texttt{ni3}.
The final result stream $m_2$ shows the lone match for the multibyte sequence
\texttt{ni3hao}.
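The revised marker-stream walkthrough ($m_0$, $t_1$, $m_1$, $t_2$, $m_2$) can be checked with a small sketch. It is not icGrep code: it assumes Python integers as stand-ins for bitstreams (bit $i$ is byte position $i$), and the helper names `parse`, `show`, `advance` and `scan_thru` are hypothetical. ScanThru is taken to be the usual add-and-mask form $(m + c) \land \lnot c$, and $t_2$ is computed from $m_1$, which reproduces the bit pattern listed for $t_2$ in the table.

```python
# Illustrative sketch only: Python ints stand in for icGrep's SIMD bitstreams.
# Bit i of an int is byte position i, so integer addition carries toward
# later positions (rightward in the table rows).

def parse(bits: str) -> int:
    """Convert a table row such as '..1.' (position 0 first) to an int."""
    return sum(1 << i for i, ch in enumerate(bits) if ch == "1")

def show(stream: int, width: int) -> str:
    """Render an int back into the table's dot/one notation."""
    return "".join("1" if (stream >> i) & 1 else "." for i in range(width))

def advance(m: int, width: int) -> int:
    """Move every marker forward by one position."""
    return (m << 1) & ((1 << width) - 1)

def scan_thru(m: int, c: int, width: int) -> int:
    """Carry each marker in m through the run of c bits it sits on."""
    return ((m + c) & ~c) & ((1 << width) - 1)

WIDTH = 26
cc_ni3    = parse("..1.............1.........")
cc_hao    = parse(".....1....................")
initial   = parse("1..1..111111111..1..111111")
non_final = parse("11.11.........11.11.......")

m0 = initial                          # all character starts are in play
t1 = scan_thru(m0, non_final, WIDTH)  # final byte of every character
m1 = advance(t1 & cc_ni3, WIDTH)      # positions just after each ni3
t2 = scan_thru(m1, non_final, WIDTH)  # final byte of the following character
m2 = advance(t2 & cc_hao, WIDTH)      # position just after ni3hao

for name, stream in [("t1", t1), ("m1", m1), ("t2", t2), ("m2", m2)]:
    print(name, show(stream, WIDTH))
```

Because the carry of a single addition moves every marker at once, the two candidate matches for \texttt{ni3} are advanced in parallel, and only the one followed by \texttt{hao} survives in $m_2$.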
$\text{CC}_{\text{ni3}}$                                           & \verb`..1.............1.........`\\
$\text{CC}_{\text{hao}}$                                           & \verb`.....1....................`\\
Initial                                                            & \verb`1..1..111111111..1..111111`\\
$m_0 = \mbox{\rm Initial}$                                         & \verb`1..1..111111111..1..111111`\\
NonFinal                                                           & \verb`11.11.........11.11.......`\\
$M_1 = \text{ScanThru}(\text{Initial}, \text{NonFinal})$           & \verb`..1..111111111..1..1111111`\\
$M_2 = \text{Advance}(M_1 \land \text{CC}_{\text{ni3}})$           & \verb`...1.............1........`\\
$M_3 = \text{ScanThru}(\text{Initial} \land M_2, \text{NonFinal})$ & \verb`.....1.............1......`\\
$M_4 = M_3 \land CC_{\text{hao}}$                                  & \verb`.....1....................`
$t_1 = \text{ScanThru}(m_0, \text{NonFinal})$                      & \verb`..1..111111111..1..1111111`\\
$m_1 = \text{Advance}(t_1 \land \text{CC}_{\text{ni3}})$           & \verb`...1.............1........`\\
$t_2 = \text{ScanThru}(m_1, \text{NonFinal})$                      & \verb`.....1.............1......`\\
$m_2 = \text{Advance}(t_2 \land \text{CC}_{\text{hao}})$           & \verb`......1...................`
\end{tabular}
\end{center}
In order to remedy this problem, \icGrep{} again uses the NonFinal
stream  to ``fill in the gaps'' in the CC bitstream so that the
stream  to ``fill in the gaps'' in the UnicodeClass($U$) bitstream so that the
MatchStar addition can move through a contiguous sequence of one bits.  In this way, matching of an arbitrary Unicode character class $U$
can be implemented using ${\mbox{MatchStar}(m, U |\mbox{NonFinal})}$.
can be implemented using ${\mbox{MatchStar}(m, \mbox{UnicodeClass}(U) \vee \mbox{NonFinal})}$.
\paragraph{Predefined Unicode Classes.}