Version 5 (modified by cameron, 4 years ago)


Using LLVM Tools with Parabix

Let's see what we can do with the LLVM tools. We will use Clang, the LLVM C++ compiler, as well as several back-end tools such as llc, opt, and llvm-dis.

We demonstrate the tools on some prototype Parabix code for regular expression matching.

Check It Out

First get yourself a copy of what you need, installed on a 64-bit Ubuntu machine.

mkdir proto
cd proto
svn co
svn co
svn co

A Regular Expression Demo

Now let's compile code for matching the regular expression ([^ @]+)@([^ @]+). It should be in the file RE/data/test/; modify as needed. Then run the regular compilation chain with gcc.

cd RE
cd output
cd src

Hmm. A lot of work, but now we have the executable in re. Let's look at how it performs.

perf stat -e cycles:u,instructions:u ./re ../../performance/data/howto -c
Matching Lines:15057

 Performance counter stats for './re ../../performance/data/howto -c':

        62,431,920 cycles:u                  #    0.000 GHz                    
       145,810,959 instructions:u            #    2.34  insns per cycle        

       0.030353954 seconds time elapsed

Your results will vary, depending on your machine. For interest, let's compare with egrep.

perf stat -e cycles:u,instructions:u egrep '([^ @]+)@([^ @]+)' ../../performance/data/howto -c

 Performance counter stats for 'egrep ([^ @]+)@([^ @]+) ../../performance/data/howto -c':

       595,201,507 cycles:u                  #    0.000 GHz                    
     1,500,703,136 instructions:u            #    2.52  insns per cycle        

       0.168963462 seconds time elapsed

Well, egrep found the same number of matches, but it was a lot slower: roughly 9.5 times as many cycles!
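To see what both programs are counting, here is a small sketch that runs the same pattern over a handful of lines. The four sample lines are made up for illustration and are not taken from the howto data file; grep -E is used as the modern spelling of egrep.

```shell
# Build a tiny sample file; these four lines are invented for illustration.
sample=$(mktemp)
printf '%s\n' \
  'mail alice@example.org today' \
  'no at-sign on this line' \
  'a lone @ between spaces' \
  'bob@host' > "$sample"

# -E selects extended regular expressions (what egrep uses);
# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -E -c '([^ @]+)@([^ @]+)' "$sample")
echo "Matching Lines:$count"

rm -f "$sample"
```

Only the first and last lines count: the pattern needs at least one character that is neither a space nor @ on each side of the @, so a bare @ surrounded by spaces does not match.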

Using LLVM Tools

Now let's look at using the LLVM tools.

Clang Is An Alternative to GCC

cd proto/RE/output/demo1src
clang++ -msse2  -O3 -std=gnu++0x -o clang-re re.cpp  -I../util -I../lib/ -I../lib/cc-lib/ 

This just uses clang as a compiler instead of gcc.

perf stat -e cycles:u,instructions:u ./clang-re ../../performance/data/howto -c
Matching Lines:15057

 Performance counter stats for './clang-re ../../performance/data/howto -c':

        58,330,370 cycles:u                  #    0.000 GHz                    
       136,538,756 instructions:u            #    2.34  insns per cycle        

       0.029585823 seconds time elapsed

Even better than gcc: about 6.6% fewer cycles!

Getting to LLVM IR

Now let's try breaking it down in steps, using LLVM tools. First, we use clang to make LLVM bitcode.

clang++ -msse2  -O3 -std=gnu++0x -emit-llvm -c -o llvm-re.bc re.cpp  -I../util -I../lib/ -I../lib/cc-lib/

Bitcode is a binary format and is not human-readable, but we can use llvm-dis to disassemble it and produce a corresponding llvm-re.ll file containing textual IR.

llvm-dis-3.4 llvm-re.bc


If we look through the generated IR file, we see all sorts of interesting code sequences, like this one.

  %324 = and <2 x i64> %263, <i64 71777214294589695, i64 71777214294589695>
  %325 = and <2 x i64> %287, <i64 71777214294589695, i64 71777214294589695>
  %326 = bitcast <2 x i64> %325 to <8 x i16>
  %327 = bitcast <2 x i64> %324 to <8 x i16>
  %328 = call <16 x i8> @llvm.x86.sse2.packuswb.128(<8 x i16> %326, <8 x i16> %327) #2
  %329 = bitcast <16 x i8> %328 to <2 x i64>
  %330 = bitcast <16 x i8> %328 to <8 x i16>
  %331 = call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %330, i32 4) #2

What we should recognize right away is that LLVM has SIMD vector types built in: <2 x i64> and <8 x i16> are used extensively. Also note that we can directly use SSE2 intrinsics from LLVM.
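The large decimal constant in the two and instructions is easier to read in hexadecimal. A quick way to decode it from the shell (the variable name is just for illustration):

```shell
# Print the decimal mask constant from the IR as a 16-digit hexadecimal value.
mask_hex=$(printf '%016x' 71777214294589695)
echo "$mask_hex"   # prints 00ff00ff00ff00ff
```

So each and keeps only the low byte of every 16-bit field. Since every field is then at most 255, the unsigned saturation in packuswb never kicks in, and the instruction simply packs those low bytes from two registers into one <16 x i8> value. The psrli.w that follows shifts each 16-bit field right by 4 bits.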

LLC Static Compiler

We can now use llc to generate object code for our current machine.

llc-3.4  -filetype=obj  llvm-re.bc  -o  opt-re.o
clang++ opt-re.o -o opt-re

Now check out the performance.

perf stat -e cycles:u,instructions:u ./opt-re ../../performance/data/howto -c
Matching Lines:15057

 Performance counter stats for './opt-re ../../performance/data/howto -c':

        49,439,793 cycles:u                  #    0.000 GHz                    
       113,650,693 instructions:u            #    2.30  insns per cycle        

       0.021609171 seconds time elapsed

This is quite a big change! What's going on?

We shouldn't get too excited. The key is that llc compiled for the current machine's architecture, including its AVX2 extensions, even though the original clang command line specified -msse2.
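To put the four measurements side by side, here is a small sketch that expresses each cycle count relative to the gcc build. The numbers are copied from the perf stat outputs on this page; yours will differ.

```shell
# Cycle counts copied from the perf stat outputs on this page.
summary=$(awk 'BEGIN {
  gcc = 62431920; egrep = 595201507; clang = 58330370; llc = 49439793
  printf "egrep/gcc  %.2f\n", egrep / gcc
  printf "clang/gcc  %.2f\n", clang / gcc
  printf "llc/gcc    %.2f\n", llc / gcc
}')
echo "$summary"
```

The ratios come out to about 9.53, 0.93 and 0.79, so the llc build needs roughly a fifth fewer cycles than the gcc build on this machine.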

Generating Assembly

We can confirm this by using llc to generate assembly instead of object code.

llc-3.4  -filetype=asm  llvm-re.bc  -o  opt-re.s

If we inspect the output, we find that, for example, pxor instructions are replaced by vpxor throughout.

If we want to force llc to use an older architecture, we can name it.

llc-3.4 -mcpu=core2 -filetype=asm  llvm-re.bc  -o  opt-re-core2.s

Now only the SSE2 pxor appears.
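Rather than scrolling through the assembly by eye, we can count the two mnemonics. This is a minimal sketch: count_mnemonic is a hypothetical helper name, and the file names are the ones produced by the llc commands above.

```shell
# count_mnemonic FILE NAME: number of lines in FILE containing NAME as a whole word.
# -w keeps a search for pxor from also matching vpxor.
count_mnemonic() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps that from being an error.
  grep -c -w -- "$2" "$1" || true
}

# Report both mnemonics for each assembly file that exists in the current directory.
for f in opt-re.s opt-re-core2.s; do
  if [ -f "$f" ]; then
    echo "$f: vpxor=$(count_mnemonic "$f" vpxor) pxor=$(count_mnemonic "$f" pxor)"
  fi
done
```

If the two builds behave as described, the vpxor count should be nonzero for opt-re.s and zero for opt-re-core2.s.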

Generating C++

Another interesting option is generating C++ code.

llc-3.4 -march=cpp llvm-re.bc  -o  opt-re.cpp

The output is a C++ program that regenerates the given IR through the LLVM API.