Timestamp:
May 15, 2015, 6:32:30 AM
Author:
cameron
Message:

Tidy up UTF-8 and evaluation sections

File:
1 edited

  • docs/Working/icGrep/evaluation.tex

    r4558 r4559  
    5252%than 50\% without the if-statement short-circuiting. %%% I think we'd need to show this always true to make this claim.
    5353
     54\comment{
    5455Additionally, \icGrep{} provides options that allow
    5556various internal representations to be printed out.   
     
    6465less useful as it includes many
    6566details of low-level carry-handling that obscures the core logic.
     67}
    6668
    6769The precompiled calculations of the various Unicode properties
     
    196198%
    197199In Table \ref{table:complexexpr}, we show the performance results obtained
    198 from an Intel i7-2600 using precompiled binaries for each engine
    199 that are suitable for any 64-bit architecture.
     200from an Intel i7-2600 using generic 64-bit binaries for each engine.
     201We limit the SIMD ISA within \icGrep{} to SSE2 which is available
     202on all Intel/AMD 64-bit machines.
    200203%
    201204In each case, we report seconds taken per GB of input averaged over 10
    202205runs each on our Wikimedia document collection.
    203 %
    204 Table \ref{table:relperf} further studies \icGrep{} on a newer Intel
    205 i7-4700MQ architecture and evaluates the improvement gained by the
    206 newer processor and improved SIMD instruction set architecture (ISA).
    207 %
    208 Both SSE2 and AVX1 use 128-bit registers.
    209 %
    210 The main advantage of AVX1 over SSE2 is its support for 3-operand form,
    211 which helps reduce register pressure.
    212 %
    213 AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers.
    214 %
    215 However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32).
     206
     207%
    216208
    217209% \begin{table}
     
    279271of the workload across multiple cores is clearly an area for further work.
    280272%
    281 Nevertheless, our three thread system shows up to a 40\% speedup. %  over the single threaded version
     273Nevertheless, our three-thread system shows up to a 40\% speedup. %  over the single threaded version
     274
     275
     276
     277%
     278Table \ref{table:relperf} shows the speedups obtained with \icGrep{}
     279on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives
     280and both single-threaded and multi-threaded versions.
     281All speedups are relative to the
     282single-threaded performance on the i7-2600 machine, which is normalized to 1.0.
     283The SSE2 results are again using the generic binaries compiled for compatibility
     284with all 64-bit processors.   The AVX1 results are for Intel AVX instructions
     285in 128-bit mode.  The main advantage of AVX1 over SSE2 is its support for 3-operand form,
     286which helps reduce register pressure.   The AVX2 results are for \icGrep{}
     287compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time.
     288
    282289
    283290
     
    306313
    307314
    308 Interestingly, the SSE2 column of Table \ref{table:relperf} shows that by simply using a newer hardware
    309 improves performance by $\sim21$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}.
     315Interestingly, the SSE2 column of Table \ref{table:relperf} shows that simply using newer hardware and a newer compiler
     316improves performance by $21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}, respectively.
    310317%
    311318By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
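The evaluation text in this changeset reports seconds per GB of input averaged over 10 runs, and speedups expressed relative to a baseline normalized to 1.0. The following is a minimal sketch of that arithmetic, not the authors' measurement harness. The function names, the decimal definition of a gigabyte, and all timing values are assumptions made for illustration only.

```python
from statistics import mean

GB = 10**9  # assumption: decimal gigabyte; the source does not say which definition it uses


def seconds_per_gb(run_times_s, input_bytes):
    """Average wall-clock time over repeated runs, scaled to seconds per GB of input."""
    if not run_times_s or input_bytes <= 0:
        raise ValueError("need at least one run and a positive input size")
    return mean(run_times_s) / (input_bytes / GB)


def relative_speedup(baseline_s_per_gb, candidate_s_per_gb):
    """Speedup of a candidate configuration, with the baseline defined as 1.0."""
    return baseline_s_per_gb / candidate_s_per_gb


# Hypothetical timings (not taken from the paper): 10 runs each over a 2 GB input.
baseline = seconds_per_gb([2.0] * 10, 2 * GB)   # stands in for the reference machine
candidate = seconds_per_gb([1.6] * 10, 2 * GB)  # stands in for a newer machine or ISA

speedup = relative_speedup(baseline, candidate)
print(f"baseline: {baseline:.2f} s/GB, candidate: {candidate:.2f} s/GB, speedup: {speedup:.2f}x")
```

Under this convention, a table entry above 1.0 means the configuration processes the same input in less time than the baseline, and the baseline itself always appears as 1.0.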