 r4558 %than 50\% without the if-statement short-circuiting. %%% I think we'd need to show this always true to make this claim. \comment{ Additionally, \icGrep{} provides options that allow various internal representations to be printed out. less useful as it includes many details of low-level carry-handling that obscures the core logic. } The precompiled calculations of the various Unicode properties % In Table \ref{table:complexexpr}, we show the performance results obtained from an Intel i7-2600 using precompiled binaries for each engine that are suitable for any 64-bit architecture. from an Intel i7-2600 using generic 64-bit binaries for each engine. We limit the SIMD ISA within \icGrep{} to SSE2 which is available on all Intel/AMD 64-bit machines. % In each case, we report seconds taken per GB of input averaged over 10 runs each on our Wikimedia document collection. % Table \ref{table:relperf} further studies \icGrep{} on a newer Intel i7-4700MQ architecture and evaluates the improvement gained by the newer processor and improved SIMD instruction set architecture (ISA). % Both SSE2 and AVX1 use 128-bit registers. % The main advantage of AVX1 over SSE2 is its support for 3-operand form, which helps reduce register pressure. % AVX2 utilizes the improved ISA of AVX1 but uses 256-bit registers. % However, AVX2 has half the number of 256-bit registers (16) than 128-bit registers (32). % % \begin{table} of the workload across multiple cores is clearly an area for further work. % Nevertheless, our three-thread system shows up to a 40\% speedup. %  over the single threaded version % Table \ref{table:relperf} shows the speedups obtained with \icGrep{} on a newer Intel i7-4700MQ machine, considering three SIMD ISA alternatives and both single-threaded and multi-threaded versions. 
All speedups are relative to the single-threaded performance on the i7-2600 machine, which is normalized to 1.0. The SSE2 results again use the generic binaries compiled for compatibility with all 64-bit processors. The AVX1 results are for Intel AVX instructions in 128-bit mode. The main advantage of AVX1 over SSE2 is its support for the 3-operand instruction form, which helps reduce register pressure. The AVX2 results are for \icGrep{} compiled to use the 256-bit AVX2 instructions, processing blocks of 256 bytes at a time. Interestingly, the SSE2 column of Table \ref{table:relperf} shows that simply using newer hardware and a newer compiler improves performance by $21\%$ and $30\%$ for the sequential and multithreaded versions of \icGrep{}, respectively. % By taking advantage of the improved AVX1 and AVX2 ISA there are further improvements but AVX2 exhibits
