While Loop 64-Bit Compilation Strategy

The idea is to compile all operations inside a bitstream while loop as 64-bit operations, including the bitwise logic operations that could otherwise be performed 128 or 256 bits at a time using SIMD registers.

The expected benefits of this strategy are the following.

  • There is no data movement between 64-bit and SIMD registers within the loop.
  • A single CarryQueue can be created for all loop variable carries.
  • The compiled while loop condition can be made into a single logical-or operation using the loop CarryQueue and the controlling cursor stream.
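The points above can be sketched in Python, modelling a 64-bit general register as an integer masked to 64 bits. This is only an illustration under assumptions, not the compiler's actual output: the names MASK64, advance, scan_thru and loop_condition, and the layout of the carry queue as one bit per carry slot, are hypothetical. How carries are combined across repeated iterations over the same block is not modelled.

```python
MASK64 = (1 << 64) - 1  # models one 64-bit general register


def advance(stream, carry_queue, slot):
    """Shift a 64-bit stream word forward by one position.

    The carry-in is read from bit `slot` of the carry queue and the
    carry-out is written back to the same bit (assumed layout).
    """
    carry_in = (carry_queue >> slot) & 1
    carry_out = (stream >> 63) & 1
    result = ((stream << 1) | carry_in) & MASK64
    carry_queue = (carry_queue & ~(1 << slot)) | (carry_out << slot)
    return result, carry_queue


def scan_thru(cursor, thru, carry_queue, slot):
    """Move each cursor bit past a run of `thru` bits using 64-bit addition."""
    carry_in = (carry_queue >> slot) & 1
    total = cursor + thru + carry_in
    carry_out = total >> 64  # 0 or 1 for 64-bit inputs
    result = total & MASK64 & ~thru
    carry_queue = (carry_queue & ~(1 << slot)) | (carry_out << slot)
    return result, carry_queue


def loop_condition(carry_queue, cursor):
    """Single logical-or test: run the loop if any carry or cursor bit is set."""
    return (carry_queue | cursor) != 0
```

Because every carry lives in the same integer word as the rest of the loop state, nothing has to be moved out of a SIMD register to test it: the loop test is one OR followed by a compare against zero.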

For example, consider the comment-CDATA-PI loop in source:proto/parabix2/parabix2.py@322#L168. There are 14 logical operations within the loop body, so performing them 64 bits at a time rather than 128 costs an additional 14 operations per 128-bit block. With superscalar execution of 3 logical operations per cycle, however, the actual cost should be only about 5 cycles. In return, the entire while loop can be processed uniformly using 64-bit operations, avoiding any 128-bit to 64-bit movements. There are 7 variables involved in advance/scan operations. With 128-bit logic, each would cost 4 data movement operations per 128-bit block, or 28 data movements in all. If the data movements are indirect via memory, this might be a total of 56 cycles, which corresponds to roughly 2 cycles per movement.
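The estimate can be reproduced with a few lines of Python. The constant names are hypothetical, and the figure of 2 cycles per memory-indirect movement is an assumption inferred from the 56-cycle total rather than a measured value.

```python
import math

LOGIC_OPS = 14        # logical operations in the loop body
ISSUE_WIDTH = 3       # logical operations issued per cycle
CARRY_VARS = 7        # variables involved in advance/scan operations
MOVES_PER_VAR = 4     # data movements per variable per 128-bit block
CYCLES_PER_MOVE = 2   # assumed cost of one memory-indirect movement

# Cost of doing the logic 64 bits at a time instead of 128.
extra_logic_cycles = math.ceil(LOGIC_OPS / ISSUE_WIDTH)

# Cost avoided by never moving data between SIMD and 64-bit registers.
data_movements = CARRY_VARS * MOVES_PER_VAR
movement_cycles = data_movements * CYCLES_PER_MOVE

print(extra_logic_cycles, data_movements, movement_cycles)  # prints: 5 28 56
```

Under these assumptions the 64-bit strategy trades about 5 extra cycles of logic for up to 56 cycles of data movement per 128-bit block.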

Last modified on Dec 20, 2009, 9:02:10 AM