source: icGREP/icgrep-devel/icgrep/kernels/pdep_kernel.h @ 5611

Last change on this file since 5611 was 5588, checked in by cameron, 23 months ago

PDEP kernel - initial check-in from Adam

File size: 5.3 KB
2 *  Copyright (c) 2016 International Characters.
3 *  This software is licensed to the public under the Open Software License 3.0.
4 */
5#ifndef PDEP_KERNEL_H
6#define PDEP_KERNEL_H
8#include "kernel.h"
9#include <llvm/IR/Value.h>
10namespace IDISA { class IDISA_Builder; }
12What this kernel does:
14Given a swizzled input stream set and a PDEP marker stream, apply a PDEP operation to each of the input streams in
15the input stream set. The PDEPed result streams are returned in a swizzled output stream set.
17How it works:
19You should know how the PDEP operation works before continuing (Wikipedia has a pretty good explanation.)
21The swizzled configuration of the input streams mean that the first blockWidth/mSwizzleFactor bits of each input
22stream are contained in the first BitBlock of the first input StreamSetBlock. The second BitBlock contains the next
23blockWidth/mSwizzleFactor bits for each input stream, and so on. The key observation underpinning the action of the PDEP kernel is that we apply the PDEP operation
24using blockWidth/mSwizzleFactor bits of an input stream as the source bits. Since the first swizzle contains blockWidth/mSwizzleFactor
25bits from each of the input streams, we can begin processing the input streams in the input stream set by applying the first blockWidth/mSwizzleFactor
26bits of the PDEP marker stream to each of the swizzle fields in the first BitBlock.
28We can continue using the first blockWidth/mSwizzleFactor bits of each input stream until we have completely consumed it. This occurs
29when the combined popcount of the PDEP masks we've used up to this point > blockWidth/mSwizzleFactor. Once we've exhausted the first
30BitBlock (i.e. swizzle), we move on to the next one. This pattern continues until we've consumed
31the entire PDEP marker stream. Note that it's possible for the kernel to consume the entire PDEP marker
32stream without consuming the entirety of the first BitBlock in the first BitStreamBlock, if the PDEP marker stream has a low popcount
33(i.e. > blockWidth/mSwizzleFactor).
35There is actually a slight complication that was glossed over in the description above. Consider the following scenario: we've consumed
36half of a blockWidth/mSwizzleFactor segment, and we're now starting the PDEP loop again. However, this time the PDEP marker stream segment is
370xffffffff. That is, the popcount is 64. That means we'll consume 64 bits from the source bit stream, but the current segment only contains 64/2 =
3832 bits. To get around this issue, we "look ahead" to the next segment, whether that next segment is the next BitBlock in the current StreamSetBlock
39or the first BitBlock in the next StreamSetBlock. Regardless of where we find the segment, we combine the current segment and the next segement in
40such a way that we're guarenteed to have 64 source bits to pass to the PDEP operation. The logic responsible for creating this "combined" value
41can be found immediately after the opening brace of the outer for loop in the definition of the generateDoBlockMethod function:
43Value * current_blk_idx = kb->CreateSub(kb->CreateUDiv(updatedProcessedBits, blockWidth), base_block_idx);  // blk index == stream set block index
45// kb->CreateUDiv(updatedProcessedBits, blockWidth) gives us the absolute block idx of the current block.
46// However, getAdjustedStreamBlockPtr (used later) calculates an offseted block for us based on the processed item count
47// of sourceStreamSet. We want to get the index of the block we're currently processing relative to the
48// "base block" calculated by getAdjustedInputStreamPtr. That's why we subtract base_block_idx from
49// kb->CreateUDiv(updatedProcessedBits, blockWidth)
51Value * current_swizzle_idx = kb->CreateUDiv(kb->CreateURem(updatedProcessedBits, blockWidth), pdepWidth);
53// updatedProcessedBits % blockWidth is how many bits of the current block we've processed.
54// Divide that by pdepWidth to get which BitBlock/swizzle we're currently processing.
56Value * next_block_idx = kb->CreateSub(kb->CreateUDiv(kb->CreateAdd(pdepWidth, updatedProcessedBits), blockWidth), base_block_idx);
57Value * next_swizzle_idx = kb->CreateUDiv(kb->CreateURem(kb->CreateAdd(pdepWidth, updatedProcessedBits), blockWidth), pdepWidth);
58// Q: Why add pdepWidth (i.e. 64) and not 256?
59// A: Although it is true that each BitBlock/swizzle contains 256 bits, each swizzle only contains 64 bits from each of the streams
60// it is composed of. Each 64 bit field is consumed in parallel, at the same rate. Therefore, once we've consumed "64 bits"
61// we've actually consumed 64*4 bits, and it's time to move to the next one.
64namespace kernel {
65class PDEPkernel final : public BlockOrientedKernel {
67    PDEPkernel(const std::unique_ptr<kernel::KernelBuilder> & kb, unsigned streamCount, unsigned PDEP_width = 64);
68    bool isCachable() const override { return true; }
69    bool hasSignature() const override { return false; }
71    const unsigned mSwizzleFactor;
72    const unsigned mPDEPWidth;
73    void generateDoBlockMethod(const std::unique_ptr<KernelBuilder> & kb) override;
74    std::vector<llvm::Value *> get_PDEP_masks(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * PDEP_ms_blk,
75                                              const unsigned mask_width);
76    std::vector<llvm::Value *> get_block_popcounts(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * blk,
77                                                  const unsigned field_width);
Note: See TracBrowser for help on using the repository browser.