wiki:CarryQueue

Version 2 (modified by eamiri, 9 years ago) (diff)

--

Carry Queue

A Carry Queue is queue of N carry variables held within a single integer value of at least N bits.

The main goal of the Carry Queue concept is to minimize the cost of recording carries for a set of variables when compiling unbounded bit stream processing code into block-by-block low-level implementation code. Rather than one separate variable that is saved/restored per carry variable, the carry queue only requires the save and restore of a single integer.

The second advantage of the carry queue concept is to optimize loop control for while loops over unbounded bit streams. If all the carry variables required within the loop are held in the same carry queue, then the loop condition requiring a logical-or combination of these variables can be easily implemented as a mask selection from the carry queue (or just using the carry queue value if no other variables than loop variables are stored within this carry queue).

The third advantage of the carry queue is that a single ADC can be used to both store one carry value and load the next value, replacing a LAHF/SAHF pair of operations.

The technique is this: Let accum be the carry queue variable that we want to hold all the carries. It is initialized to 0. When a carry value is needed from a previous iteration, we arrange that the carry bit is at the high end of the carry accumulator. Then the operation accum = adc(accum, accum) loads the value of this bit into the carry flag and also shifts the remaining values in the carry queue to the left one position, making the next carry variable ready for loading into the carry flag.

However, the adc operation above also stores the current value of the carry flag into the low end of the carry queue at the same time. In essence, we have a combine enqueue/dequeue operation.

This FIFO processing of carry variables requires that each iteration of block-by-block bitstream processing performs the same number of enqueue/dequeue operations in the same order.

If the number of carry variables N is less than the register width R, then a shift operation accum <<= R-N is required between each loop iteration to position the carry variables from the just completed iteration for the next iteration.


A note on implementation

The implementation of carry queue has two steps. First we need to modify the current C macro for addition. We need to pick a register the contains the carry queue and use adc on that register rather than lahf, sahf operations. This will affect the template program not the compiler.

Second step is in the compiler, where we need to take care of two things. First, when there are more than 64 carries, we will have more than one carry queue. The compiler is responsible to make sure that the right carry queue is loaded at the right time. The second thing that the compiler should take care of is a shift in carry queue that might be needed to be done at the end of processing of each block.

The code for handling this issue in the compiler might have dependency on the target language. For example in C we may need to call a macro or function that has inline assembly while in assembly we may do something slightly different. Also it has minor interaction with Factor Out pass of the compiler. If a code for loading/storing carry queues is needed to be put inside an if-then-else statement, this code must be put inside both branches of the statement. So one may prefer to put this pass after the Factor Out pass so that workload of Factor Out does not become larger, but this is not necessary.