Timestamp:
Dec 8, 2014, 3:04:28 PM (4 years ago)
Author:
linmengl
Message:

cont. beginner's task

File:
1 edited

  • trunk/lib_ir/beginner_task.md

    r4315 r4326  
    22=================================
    33
    4 Welcome to the beginner's task document. Through this document, you will learn how to make simple modifications to the Parabix LLVM.
     4Welcome to the beginner's task. Through this task, you will get familiar with the LLVM tools. The goal of this task is to find the implementation of the inverse transposition.
    55
    66## Setup
     
    1515## Find the optimized IR library
    1616
    17 LLVM IR source files (`s2p.ll`, `p2s.ll` and `s2p_ideal.ll`) are linked together into `ir_impl.bc`. You can find this file in the build directory.
     17LLVM IR source files (`s2p.ll`, `p2s.ll` and `s2p_ideal.ll`) are linked together into `ir_impl.bc` by cmake. You can find this file in the `build` directory. LLVM IR can reside in two types of files: `.ll` and `.bc`. `.ll` is the human-readable text format and `.bc` is the bitcode format for compact storage.
    1818
    19 To have a look at its content, dis-assemble it with:
     19To have a look at `ir_impl.bc`, disassemble it with:
    2020
    2121    llvm-dis ir_impl.bc -o ir_impl.ll
     
    4141    llc-svn -O3 -mattr=+avx2,+sse2,+bmi2 ir_impl_opt.bc
    4242
    43 You can see how we explicitly tell llc to build with AVX2, SSE2 and BMI2.
     43By using `llc`, we compile the LLVM IR file `ir_impl_opt.bc` into native machine assembly. You can see how we explicitly tell `llc` to build with AVX2, SSE2 and BMI2.
    4444
    4545Now open `ir_impl_opt.s` and search for `mergeh_8`. You will find the following piece of code:
     
    6868    %r0.i = lshr <8 x i16> %aa.i, %shift_mask
    6969
    70 For `p2s_step_ir` the `shift_mask` is a variable that is only clear in the run-time. So LLVM have no idea of whether the `lshr` is an arbitrary shifting or an immediate shifting (shifting with the same amount for each of the fields). LLVM decides to scalarize this `<8 x i16>` vector to fit the case of arbitrary shifting.
     70But, is this the real performance bottleneck?
    7171
     72For `p2s_step_ir`, the `shift_mask` is a variable that is only available at run time, so LLVM cannot assume anything about the shift amount. It scalarizes this `<8 x i16>` vector to handle the hardest case: an arbitrary shift amount for each field.
     73
     74However, every time we call `p2s_step_ir`, we call it with a constant `shift_mask`. These constants are propagated into the inlined `p2s_step_ir` functions. The evidence of this propagation is in `p2s_bytemerge_ir`. Find this function in `ir_impl_opt.ll` and you will see the following code:
     75
     76    %aa.i.i181 = bitcast <4 x i32> %p4 to <8 x i16>
     77    %r0.i.i182 = lshr <8 x i16> %aa.i.i181, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4>
     78
     79The `shift_mask` is replaced with a constant vector. LLVM then recognizes this `lshr` as an immediate shift (a shift by the same amount for each of the fields). An immediate shift can be compiled into better assembly code such as `psrlw`. This explains why we can't find `pextrw` in the assembly of `p2s_bytemerge_ir`.
     80
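The difference between an arbitrary shift and an immediate shift can be illustrated with a small Python model. This is only a sketch of the semantics, not LLVM's actual code: the helper names `lshr_v8i16` and `is_uniform_shift` are hypothetical, and the lane values are made up for illustration. It assumes every shift amount is smaller than 16.

```python
MASK16 = 0xFFFF  # each lane is treated as an unsigned 16-bit field


def lshr_v8i16(lanes, shifts):
    """Model `lshr <8 x i16>`: shift each lane right by its own amount.

    Hypothetical helper; assumes every shift amount is in 0..15.
    """
    assert len(lanes) == 8 and len(shifts) == 8
    return [(lane & MASK16) >> s for lane, s in zip(lanes, shifts)]


def is_uniform_shift(shifts):
    """Return True when every lane uses the same shift amount."""
    return all(s == shifts[0] for s in shifts)


# Made-up lane values, for illustration only.
lanes = [0x1234, 0xABCD, 0xFFFF, 0x0001, 0x8000, 0x00F0, 0x0F00, 0xF000]

splat = [4] * 8                   # like <i16 4, i16 4, ..., i16 4>
mixed = [0, 1, 2, 3, 4, 5, 6, 7]  # a different amount for each field

print(is_uniform_shift(splat))
print(is_uniform_shift(mixed))
print([hex(x) for x in lshr_v8i16(lanes, splat)])
```

When the shift vector is a known constant with equal elements, as in the `splat` case, one instruction can shift all fields at once. When the amounts are unknown until run time, the compiler has to be prepared for the `mixed` case.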