
Version 4 (modified by cameron, 4 years ago)


SSE2 Hoisting

The goal of this project is to turn code written against the SSE2 SIMD instruction set into high-performance, platform-independent code expressed in LLVM IR.

The project will be designed as a transformation pass that may be used to create a standalone hoisting tool or as a component in a larger tool.

Subgoals

This project has two main subgoals.

  1. Hoisting.

Replace SSE2 intrinsics with single LLVM IR operations or short sequences of LLVM IR operations wherever possible. Bitcasts may be introduced freely as required, without penalty. In general, constant vectors may also be introduced, with the expectation that each such vector will either be eliminated completely during code generation or incur only a small penalty.
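As an illustration only, the hoisted forms of two simple intrinsics can be sketched in portable C using the GCC/Clang vector extensions, whose lane-wise operators correspond closely to single LLVM IR vector operations. The function names `hoisted_add_epi32` and `hoisted_andnot_si128` are hypothetical and not part of the project; the second example assumes an all-ones constant vector is an acceptable way to express the NOT.

```c
#include <stdint.h>

/* Four 32-bit lanes in 128 bits, comparable to the LLVM type <4 x i32>. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Hypothetical hoisted form of _mm_add_epi32: a single lane-wise add.
   Unsigned lanes are used so that overflow wraps, as the intrinsic does. */
static v4u32 hoisted_add_epi32(v4u32 a, v4u32 b) {
    return a + b;
}

/* Hypothetical hoisted form of _mm_andnot_si128, which computes (~a) & b.
   The NOT is written as an XOR with an all-ones constant vector, so the
   result is a short two-operation sequence plus one constant. */
static v4u32 hoisted_andnot_si128(v4u32 a, v4u32 b) {
    const v4u32 all_ones = {0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
    return (a ^ all_ones) & b;
}
```

Because the andnot operation is bitwise, the same sequence applies to any lane width once the operands are bitcast, which is why bitcasts carry no penalty here.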

The final project report should include a table listing *all* SSE2 intrinsics and the hoisting transformation used for each. If the hoisting requires a nontrivial implementation involving more than a few LLVM instructions, then an LLVM function should be defined. Note: the ideal is to use a single LLVM operation. If this is not possible, a sequence of instructions implementing the intrinsic as a parallel operation should be used. Only as a last resort should a fully sequential implementation be provided. Marks will be deducted for clearly suboptimal implementations.
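The difference between a parallel sequence and a sequential fallback can be sketched with `_mm_avg_epu8` (rounded average of unsigned bytes), which is not a single basic vector operation. This is a minimal sketch in C with GCC/Clang vector extensions, not the project's chosen transformation; the function names are hypothetical, and the parallel form relies on the identity avg(a, b) = (a | b) - ((a ^ b) >> 1), which avoids widening the lanes.

```c
#include <stdint.h>

/* Sixteen 8-bit lanes in 128 bits, comparable to the LLVM type <16 x i8>. */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Parallel form: every step is a lane-wise vector operation.
   One constant vector (the shift amount) is introduced. */
static v16u8 hoisted_avg_epu8(v16u8 a, v16u8 b) {
    const v16u8 one = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    return (a | b) - ((a ^ b) >> one);
}

/* Sequential form: one lane at a time. Correct, but the last resort. */
static v16u8 sequential_avg_epu8(v16u8 a, v16u8 b) {
    v16u8 r = {0};
    for (int i = 0; i < 16; i++) {
        /* Widen to unsigned so the +1 rounding cannot overflow 8 bits. */
        r[i] = (uint8_t)(((unsigned)a[i] + (unsigned)b[i] + 1u) >> 1);
    }
    return r;
}
```

Both functions return the same lanes for all inputs; the sequential version serves as a reference against which a parallel candidate can be checked.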

  1. Hoist-Aware Code Generation.

In this phase, we identify code selection and generation strategies to reverse the effect of hoisting for SSE2 targets and to perform SSE2-aware optimizations for other targets.

  1. In each case that an SSE2 intrinsic requires a sequence of LLVM operations (excluding bitcasts and constants), ensure that SSE2 code generation recognizes the transformed sequence to allow the single intrinsic to be produced during code generation.
  1. Modify the code generator for at least one other target to recognize sequences produced by SSE2 hoisting and generate efficient code based on that recognition.

Project Evaluation

Project evaluation will require assessing the SSE2 Hoisting system against at least one, but preferably two, open-source code bases that make nontrivial use of SSE2 code.

Ideally each open-source code base will have its own test suite. Using the test suite, the first assessment for each project is to demonstrate that the result of hoisting is functionally equivalent to the original program, i.e., passes all the tests that the original does. This correctness testing should be applied for each target architecture.

The second assessment will be to assess performance. This will involve measuring performance of the original implementation against selected test data and comparing against the following alternative versions.

  1. The result of the hoisted code compiled with SSE2/x86-64 as the target SIMD architecture.
  1. The result of the hoisted code compiled with a later SIMD ISA on the x86-64 architecture.
  1. The result of the hoisted code compiled for a non-x86 architecture, such as ARM with NEON SIMD instructions.

I2Result Issue

We seem to have an issue with the i2 results in hoisting movemask_pd.
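The note above gives no further detail. For reference, `_mm_movemask_pd` collects the sign bits of the two double lanes into a 2-bit integer, with lane 0 in bit 0. The link to `i2` is an assumption: a natural hoisting would compare the lanes to obtain a `<2 x i1>` value and bitcast it to `i2` before zero-extending. A scalar reference in C that candidate hoistings could be checked against (the function name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Two 64-bit floating-point lanes, comparable to <2 x double>. */
typedef double v2f64 __attribute__((vector_size(16)));

/* Reference semantics of _mm_movemask_pd: bit i of the result is the
   sign bit of lane i; all higher bits of the result are zero. */
static int reference_movemask_pd(v2f64 v) {
    int mask = 0;
    for (int i = 0; i < 2; i++) {
        double lane = v[i];
        uint64_t bits;
        memcpy(&bits, &lane, sizeof bits);  /* reinterpret double as i64 */
        mask |= (int)(bits >> 63) << i;     /* sign bit goes to bit i */
    }
    return mask;
}
```

Note that the sign bit is taken directly from the bit pattern, so `-0.0` sets its mask bit even though it compares equal to zero.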