source: trunk/lib_ir/beginner_task.md @ 4315

Last change on this file since 4315 was 4315, checked in by linmengl, 4 years ago

initial checkin of the beginner's task.

Beginner's task to the IR library
=================================

Welcome to the beginner's task document. In this document you will learn how to make simple modifications to Parabix-LLVM.

## Setup

1. Check out the [Parabix-LLVM](http://parabix.costar.sfu.ca/svn/parabix-LLVM) repository and follow the instructions in README.md.

2. Build `lib_ir`:

    cd build
    make check

## Find the optimized IR library

The LLVM IR source files (`s2p.ll`, `p2s.ll` and `s2p_ideal.ll`) are linked together into `ir_impl.bc`. You can find this file in the build directory.

To look at its contents, disassemble it with:

    llvm-dis ir_impl.bc -o ir_impl.ll

Now open `ir_impl.ll` in any text editor and search for `p2s_bytemerge_ir`. This is the main function used for the inverse transposition. Note all the `call` instructions there.

Our Makefile also optimizes `ir_impl.bc`. The result is in `ir_impl_opt.bc`. Have a look at its contents:

    llvm-dis ir_impl_opt.bc -o ir_impl_opt.ll

Now open `ir_impl_opt.ll`, find `p2s_bytemerge_ir`, and you will see that all the function calls are now inlined.

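To compare the two listings without counting by eye, a small script can tally the `call` instructions inside one function. The following is a minimal Python sketch and not part of the Parabix build. The helper names `count_calls` and `count_calls_in_file` are hypothetical, and the line-based matching assumes the usual `llvm-dis` layout, in which a function body ends with a lone `}`.

```python
import re

# Matches "call", optionally preceded by a tail-call marker, either at the
# start of the instruction or right after "%x = ".
CALL_RE = re.compile(r"(?:^|=\s*)(?:(?:tail|musttail|notail)\s+)?call\s")


def count_calls(ll_text, func_name):
    """Count `call` instructions in the body of `func_name` in textual LLVM IR."""
    in_body = False
    calls = 0
    for line in ll_text.splitlines():
        stripped = line.strip()
        if not in_body:
            # A definition looks like: define <ret type> @name(<args>) ... {
            if stripped.startswith("define ") and ("@" + func_name + "(") in stripped:
                in_body = True
        elif stripped == "}":
            break  # end of the function body
        elif CALL_RE.search(stripped):
            calls += 1
    return calls


def count_calls_in_file(path, func_name):
    """Convenience wrapper that reads a .ll file produced by llvm-dis."""
    with open(path) as f:
        return count_calls(f.read(), func_name)
```

Run from the build directory, `count_calls_in_file("ir_impl.ll", "p2s_bytemerge_ir")` and the same call on `ir_impl_opt.ll` should show the count dropping once the helper functions are inlined.
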
## Find the assembly code for `mergeh_8` and `mergel_8`

For the inverse transposition, you need to generate the right machine code for `mergeh_8` and `mergel_8` to get good performance.

It is important to look at the assembly code LLVM generates. Let's do this by typing:

    llc-svn -O3 -mattr=+sse2 ir_impl_opt.bc

By the way, if you are curious about the Haswell assembly, you can type:

    llc-svn -O3 -mattr=+avx2,+sse2,+bmi2 ir_impl_opt.bc

Here we explicitly tell `llc` to generate code with AVX2, SSE2 and BMI2 enabled.

Now open `ir_impl_opt.s` (as generated by the SSE2 command above) and search for `mergeh_8`. You will find the following piece of code:

    mergeh_8:                               # @mergeh_8
    # BB#0:                                 # %entry
        punpckhbw   %xmm0, %xmm1
        movdqa  %xmm1, %xmm0
        retl
    .Ltmp14:
        .size   mergeh_8, .Ltmp14-mergeh_8

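To make it concrete what this single `punpckhbw` computes, here is a small scalar model in Python. It is only an illustrative sketch: registers are represented as lists of 16 byte values with index 0 as the least significant byte, and the mapping of the function arguments to `%xmm0` and `%xmm1` in the hypothetical `mergeh_8` wrapper is an assumption read off the listing, not something taken from the Parabix sources.

```python
def punpckhbw(dst, src):
    """Scalar model of SSE2 `punpckhbw src, dst` (AT&T operand order).

    Both arguments are lists of 16 byte values, index 0 being the least
    significant byte. Returns the new contents of the destination register.
    """
    assert len(dst) == 16 and len(src) == 16
    out = []
    for i in range(8, 16):
        out.append(dst[i])  # even result bytes: high half of the destination
        out.append(src[i])  # odd result bytes: high half of the source
    return out


def mergeh_8(a, b):
    # Assumption: `a` arrives in %xmm0 and `b` in %xmm1, so the listing's
    # `punpckhbw %xmm0, %xmm1` has dst = b and src = a.
    return punpckhbw(dst=b, src=a)
```

With `a = list(range(16))` and `b = list(range(100, 116))`, `mergeh_8(a, b)` starts with `108, 8, 109, 9`: the high eight bytes of the two inputs are interleaved in one instruction, with no per-byte extraction.
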
The code for `mergeh_8` looks good enough. How about `p2s_step_ir`? Oh, `pextrw` spotted!

    pextrw  $7, %xmm1, %edx
    pextrw  $7, %xmm3, %eax
    movl    %eax, 28(%esp)          # 4-byte Spill
    pextrw  $3, %xmm1, %eax
    movl    %eax, 24(%esp)          # 4-byte Spill
    pextrw  $3, %xmm3, %eax
    movl    %eax, 44(%esp)          # 4-byte Spill
    ...

`pextrw` extracts a 16-bit field from a SIMD register, and it is a telltale sign of scalarization. It comes from this line of code in `ir_impl_opt.ll`:

    %r0.i = lshr <8 x i16> %aa.i, %shift_mask

For `p2s_step_ir`, `shift_mask` is a variable whose value is known only at run time, so LLVM cannot tell whether the `lshr` is an arbitrary shift (a possibly different amount for each field) or an immediate shift (the same amount for every field). LLVM therefore scalarizes this `<8 x i16>` vector operation to handle the arbitrary case.

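The difference between the two kinds of shift can be sketched with a scalar model in Python. This is an illustration under stated assumptions, not LLVM's actual lowering logic: the function names are hypothetical, vectors are lists of eight 16-bit values, and shift amounts are restricted to 0..15 because larger amounts are not defined for `lshr` on `i16`.

```python
LANES = 8
MASK16 = 0xFFFF


def lshr_v8i16(values, shifts):
    """Per-lane logical shift right, like `lshr <8 x i16> %v, %s`."""
    assert len(values) == LANES and len(shifts) == LANES
    assert all(0 <= s < 16 for s in shifts)
    # Every lane may use its own amount, so in general this is eight
    # independent scalar shifts.
    return [(v & MASK16) >> s for v, s in zip(values, shifts)]


def psrlw(values, count):
    """One shift count shared by all lanes, like SSE2 `psrlw`."""
    assert len(values) == LANES and 0 <= count < 16
    return [(v & MASK16) >> count for v in values]


def is_splat(shifts):
    """True when every lane shifts by the same amount."""
    return all(s == shifts[0] for s in shifts)
```

When `is_splat(shifts)` holds, `lshr_v8i16(values, shifts)` equals `psrlw(values, shifts[0])`, which is the case a single SSE2 shift instruction covers. SSE2 has no shift that takes a different amount per 16-bit field, so without knowing that the amounts are equal, the code generator has to fall back to extracting and shifting each field.
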