Changeset 4559


Ignore:
Timestamp:
May 15, 2015, 6:32:30 AM (4 years ago)
Author:
cameron
Message:

Tidy up UTF-8 and evaluation sections

Location:
docs/Working/icGrep
Files:
3 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/icGrep/evaluation.tex

    r4558 r4559  
    5252%than 50\% without the if-statement short-circuiting. %%% I think we'd need to show this always true to make this claim.
    5353
     54\comment{
    5455Additionally, \icGrep{} provides options that allow
    5556various internal representations to be printed out.   
     
    6465less useful as it includes many
    6566details of low-level carry-handling that obscures the core logic.
     67}
    6668
    6769The precompiled calculations of the various Unicode properties
     
    196198%
    197199In Table \ref{table:complexexpr}, we show the performance results obtained
    198 from an Intel i7-2600 using precompiled binaries for each engine
    199 that are suitable for any 64-bit architecture.
     200from an Intel i7-2600 using generic 64-bit binaries for each engine.
     201We limit the SIMD ISA within \icGrep{} to SSE2 which is available
     202on all Intel/AMD 64-bit machines.
    200203%
    201204In each case, we report seconds taken per GB of input averaged over 10
    202205runs each on our Wikimedia document collection.
    203 %
    204 Table \ref{table:relperf} further studies \icGrep{} on a newer Intel
    205 i7-4700MQ architecture and evaluates the improvement gained by the
    206 newer processor and improved SIMD instruction set architecture (ISA).
    207 %
    208 Both SSE2 and AVX1 use 128-bit registers.
    209 %
    210 The main advantage of AVX1 over SSE2 is its support for 3-operand form,
    211 which helps reduce register pressure.
    212 %
    213 AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers.
    214 %
    215 However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32).
     206
     207%
    216208
    217209% \begin{table}
     
    279271of the workload across multiple cores is clearly an area for further work.
    280272%
    281 Nevertheless, our three thread system shows up to a 40\% speedup. %  over the single threaded version
     273Nevertheless, our three-thread system shows up to a 40\% speedup. %  over the single threaded version
     274
     275
     276
     277%
     278Table \ref{table:relperf} shows the speedups obtained with \icGrep{}
     279icGrep{} on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives
     280and both single-threaded and multi-threaded versions.
     281All speedups are relative to the
     282single-threaded performance on the i7-2600 machine = 1.0.
     283The SSE2 results are again using the generic binaries compiled for compatibility
     284with all 64-bit processors.   The AVX1 results are for Intel AVX instructions
     285in 128-bit mode.  The main advantage of AVX1 over SSE2 is its support for 3-operand form,
     286which helps reduce register pressure.   The AVX2 results are for \icGrep{}
     287compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time.
     288
    282289
    283290
     
    306313
    307314
    308 Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware
    309 improves performance by $\sim21$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}.
     315Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware and compiler
     316improves performance by $21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}.
    310317%
    311318By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
  • docs/Working/icGrep/unicode-re.tex

    r4558 r4559  
    2929expected.
    3030Mismatches between scope expectations and occurrences of suffix
    31 bytes indicate errors.
     31bytes indicate errors  (we omit other error equations for brevity).
    3232Two helper streams are also useful.
    3333The Initial stream marks ASCII bytes and prefixes of multibyte sequences,
     
    5353the UnicodeClass stream for a given class involves logic for up to four positions.
    5454By convention, we define UnicodeClass($U$) for a given Unicode character class
    55 $U$ to be the stream marking the {\em final} position of
    56 Unicode character classes.
     55$U$ to be the stream marking the {\em final} position of any characters in
     56the class.
    5757
    5858Using these definitions, it is then possible to extend the matching
     
    7777\texttt{ni3} and $\text{CC}_{\texttt{hao}}$ is the bitstream that
    7878marks character \texttt{hao}.
    79 To match a two UTF-8 character sequence \texttt{ni3hao}, we first construct bitstream $M_1$, which
    80 marks the positions of the last byte of every character.
    81 An overlap between $M_1$ and $\text{CC}_{\texttt{ni3}}$ gives the start
    82 position for matching the next character.
    83 As illustrated by $M_2$, we find two matches for \texttt{ni3},
    84 and from these two positions we can start the matching process for
    85 the next character \texttt{hao}.
    86 The final result stream $M_4$ shows one match for the multibyte sequence
     79We start with the marker stream $m_0$ initialized to Initial, indicating all positions are in play.
     80Using ScanThru, we move to the final position of each character $t_1$.
     81Applying bitwise and with $\text{CC}_{\texttt{ni3}}$ and advancing gives the
     82two matches $m_1$ for ni3.  Applying ScanThru once more advances to the
     83final position of the character after \texttt{ni3}. 
     84The final result stream $m_2$ shows the lone match for the multibyte sequence
    8785\texttt{ni3hao}.
    8886
     
    9593$\text{CC}_{\text{ni3}}$                                           & \verb`..1.............1.........`\\
    9694$\text{CC}_{\text{hao}}$                                           & \verb`.....1....................`\\
    97 Initial                                                            & \verb`1..1..111111111..1..111111`\\
     95$m_0 = \mbox{\rm Initial}$                                         & \verb`1..1..111111111..1..111111`\\
    9896NonFinal                                                           & \verb`11.11.........11.11.......`\\
    99 $M_1 = \text{ScanThru}(\text{Initial}, \text{NonFinal})$           & \verb`..1..111111111..1..1111111`\\
    100 $M_2 = \text{Advance}(M_1 \land \text{CC}_{\text{ni3}})$           & \verb`...1.............1........`\\
    101 $M_3 = \text{ScanThru}(\text{Initial} \land M_2, \text{NonFinal})$ & \verb`.....1.............1......`\\
    102 $M_4 = M_3 \land CC_{\text{hao}}$                                  & \verb`.....1....................`
     97$t_1 = \text{ScanThru}(m_0, \text{NonFinal})$                      & \verb`..1..111111111..1..1111111`\\
     98$m_1 = \text{Advance}(t_1 \land \text{CC}_{\text{ni3}})$           & \verb`...1.............1........`\\
     99$t_2 = \text{ScanThru}(t_2, \text{NonFinal})$                      & \verb`.....1.............1......`\\
     100$m_2 = \text{Advance}(t_2 \land CC_{\text{hao}})$                  & \verb`......1...................`
    103101\end{tabular}
    104102\end{center}
     
    121119
    122120In order to remedy this problem, \icGrep{} again uses the NonFinal
    123 stream  to ``fill in the gaps'' in the CC bitstream so that the
     121stream  to ``fill in the gaps'' in the UnicodeClass($U$) bitstream so that the
    124122 MatchStar addition can move through a contiguous sequence of one
    125123 bits.  In this way, matching of an arbitrary Unicode character class
    126124 $U$
    127 can be implemented using ${\mbox{MatchStar}(m, U |\mbox{NonFinal})}$.
     125can be implemented using ${\mbox{MatchStar}(m, \mbox{UnicodeClass}(U) \vee \mbox{NonFinal})}$.
    128126
    129127\paragraph{Predefined Unicode Classes.}
Note: See TracChangeset for help on using the changeset viewer.