source: icGREP/icgrep-devel/icgrep/kernels/pdep_kernel.h @ 5865

Last change on this file since 5865 was 5857, checked in by xwa163, 21 months ago
  1. Fix crash of pdep_kernel
  2. add initial version for character_deposit
File size: 5.7 KB
/*
 *  Copyright (c) 2017 International Characters.
 *  This software is licensed to the public under the Open Software License 3.0.
 */
#ifndef PDEP_KERNEL_H
#define PDEP_KERNEL_H

#include "kernel.h"
#include <llvm/IR/Value.h>
#include <memory>
#include <string>
#include <vector>
namespace IDISA { class IDISA_Builder; }

/*
What this kernel does:

Given a swizzled input stream set and a PDEP marker stream, apply a PDEP operation to each of the input streams in
the input stream set. The PDEPed result streams are returned in a swizzled output stream set.

The length of the input stream set (in bits) must be greater than or equal to the total popcount of the PDEP marker
stream, otherwise the PDEP operation will run out of source bits before the entire PDEP marker stream has been processed.

How it works:

You should know how the PDEP operation works before continuing (Wikipedia has a pretty good explanation). In brief, PDEP
takes the low-order bits of a source value and deposits them, in order, at the positions of the 1 bits in a mask; every
other bit of the result is 0.
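
For readers who want the PDEP semantics in executable form, here is a minimal software sketch of a 64-bit PDEP. The
name pdep64 is hypothetical and is not part of this kernel, which emits the operation through the kernel builder
instead. The sketch assumes a C++14 compiler, since it uses a loop inside a constexpr function.

```cpp
#include <cstdint>

// Hypothetical reference helper (not part of the kernel): deposit the low-order
// bits of `source` at the positions of the 1 bits in `mask`, lowest position first.
constexpr uint64_t pdep64(uint64_t source, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t src_bit = 1; mask != 0; src_bit <<= 1) {
        const uint64_t lowest = mask & (~mask + 1); // isolate the lowest set bit of mask
        if (source & src_bit) {
            result |= lowest;                       // deposit a 1 at that position
        }
        mask &= mask - 1;                           // clear the mask bit just handled
    }
    return result;
}

// mask 0x9A has 1 bits at positions 1, 3, 4 and 7.
// source 0x0D supplies the bits 1, 0, 1, 1 (lowest first).
static_assert(pdep64(0x0DULL, 0x9AULL) == 0x92ULL, "bits land at positions 1, 4 and 7");

// An all-ones mask consumes all 64 source bits and reproduces the source.
static_assert(pdep64(0x0123456789ABCDEFULL, ~0ULL) == 0x0123456789ABCDEFULL, "identity");
```

Note that the number of source bits consumed by one call equals the popcount of the mask, which is why the kernel
tracks mask popcounts.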

The swizzled configuration of the input streams means that the first blockWidth/mSwizzleFactor bits of each (unswizzled)
input stream are contained in the first BitBlock of the first input StreamSetBlock. The second BitBlock contains the next
blockWidth/mSwizzleFactor bits of each input stream, and so on.

The key observation underpinning the action of the PDEP kernel is that we apply the PDEP operation using
blockWidth/mSwizzleFactor bits of an input stream as the source bits. Since the first BitBlock (i.e. swizzle) contains
blockWidth/mSwizzleFactor bits from each of the input streams, we can begin processing the input streams in the input
stream set by applying the first blockWidth/mSwizzleFactor bits of the PDEP marker stream to each of the swizzle fields
in the first BitBlock.
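
The layout described above can be pictured as a transpose of 64-bit fields. The sketch below assumes blockWidth = 256
and mSwizzleFactor = 4, and it assumes that field s of a swizzle holds the bits of stream s; the type aliases and the
function swizzle are hypothetical names used only for illustration, not the kernel's own types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed configuration: blockWidth = 256, mSwizzleFactor = 4, so each field is 64 bits.
constexpr std::size_t kSwizzleFactor = 4;

// Hypothetical stand-ins for the kernel's types: a BitBlock is four 64-bit fields,
// and a StreamSetBlock is four BitBlocks.
using BitBlock = std::array<uint64_t, kSwizzleFactor>;
using StreamSetBlock = std::array<BitBlock, kSwizzleFactor>;

// unswizzled[s][k] is the k-th 64-bit chunk of stream s.
// swizzled[k][s]   is the same chunk: BitBlock k holds chunk k of every stream.
inline StreamSetBlock swizzle(const StreamSetBlock & unswizzled) {
    StreamSetBlock swizzled{};
    for (std::size_t k = 0; k < kSwizzleFactor; ++k) {
        for (std::size_t s = 0; s < kSwizzleFactor; ++s) {
            swizzled[k][s] = unswizzled[s][k];
        }
    }
    return swizzled;
}
```

With this layout, one 64-bit PDEP mask can be applied to all four fields of swizzled[0] to process the first 64 bits
of every stream at once.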

We continue using the first blockWidth/mSwizzleFactor bits of each input stream until we have completely consumed them.
This occurs when the combined popcount of the PDEP masks we've used up to this point is at least
blockWidth/mSwizzleFactor. Once we've exhausted the first BitBlock (i.e. swizzle), we move on to the next one. This
pattern continues until we've consumed the entire PDEP marker stream. Note that it's possible for the kernel to consume
the entire PDEP marker stream without consuming the entirety of the first BitBlock in the first StreamSetBlock, if the
PDEP marker stream has a low popcount (i.e. < blockWidth/mSwizzleFactor).

There is actually a slight complication that was glossed over in the description above. Consider the following scenario:
we've consumed half of a blockWidth/mSwizzleFactor segment, and we're now starting the PDEP loop again. However, this time
the PDEP marker stream segment is 0xffffffffffffffff. That is, the popcount is 64. That means we'll consume 64 bits from
the source bit stream, but the current segment only contains 64/2 = 32 unconsumed bits. To get around this issue, we
"look ahead" to the next segment, whether that next segment is the next BitBlock in the current StreamSetBlock or the
first BitBlock in the next StreamSetBlock. Regardless of where we find the segment, we combine the current segment and
the next segment in such a way that we're guaranteed to have 64 source bits to pass to the PDEP operation. The logic
responsible for creating this "combined" value can be found immediately after the opening brace of the outer for loop in
the definition of the generateDoBlockMethod function:
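
As a scalar illustration of the look-ahead idea, the sketch below shows one plausible way to build the combined value
for a single 64-bit field. It assumes source bits are consumed from the low-order end, and the name combine_segments
is hypothetical; the kernel's generated IR may construct the value differently.

```cpp
#include <cstdint>

// Hypothetical helper: `consumed` low-order bits of `current` are already used up
// (0 <= consumed < 64). Return 64 fresh source bits: the unused high bits of
// `current`, followed by as many low bits of `next` as are needed to fill the word.
constexpr uint64_t combine_segments(uint64_t current, uint64_t next, unsigned consumed) {
    // Shifting a 64-bit value by 64 is undefined, so the consumed == 0 case is handled separately.
    return consumed == 0 ? current
                         : (current >> consumed) | (next << (64 - consumed));
}

// Half of the current segment is consumed: the result is the upper 32 bits of
// `current` in its low half and the lower 32 bits of `next` in its high half.
static_assert(combine_segments(0xAAAAAAAA00000000ULL, 0x00000000BBBBBBBBULL, 32)
              == 0xBBBBBBBBAAAAAAAAULL, "32 bits from each segment");

// Nothing consumed yet: the current segment is used as is.
static_assert(combine_segments(0x1234ULL, 0xFFFFULL, 0) == 0x1234ULL, "no look-ahead needed");
```

Even a mask with popcount 64 can then be served from the combined value, because it always holds 64 unconsumed bits.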

Value * current_blk_idx = kb->CreateSub(kb->CreateUDiv(updatedProcessedBits, blockWidth), base_block_idx);  // blk index == stream set block index

// kb->CreateUDiv(updatedProcessedBits, blockWidth) gives us the absolute block idx of the current block.
// However, getAdjustedStreamBlockPtr (used later) calculates an offset block for us based on the processed item count
// of sourceStreamSet. We want to get the index of the block we're currently processing relative to the
// "base block" calculated by getAdjustedInputStreamPtr. That's why we subtract base_block_idx from
// kb->CreateUDiv(updatedProcessedBits, blockWidth)

Value * current_swizzle_idx = kb->CreateUDiv(kb->CreateURem(updatedProcessedBits, blockWidth), pdepWidth);

// updatedProcessedBits % blockWidth is how many bits of the current block we've processed.
// Divide that by pdepWidth to get which BitBlock/swizzle we're currently processing.

Value * next_block_idx = kb->CreateSub(kb->CreateUDiv(kb->CreateAdd(pdepWidth, updatedProcessedBits), blockWidth), base_block_idx);
Value * next_swizzle_idx = kb->CreateUDiv(kb->CreateURem(kb->CreateAdd(pdepWidth, updatedProcessedBits), blockWidth), pdepWidth);
// Q: Why add pdepWidth (i.e. 64) and not 256?
// A: Although it is true that each BitBlock/swizzle contains 256 bits, each swizzle only contains 64 bits from each of the streams
// it is composed of. Each 64 bit field is consumed in parallel, at the same rate. Therefore, once we've consumed "64 bits"
// we've actually consumed 64*4 bits, and it's time to move to the next one.
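
The index arithmetic above can be checked with plain integers. The sketch below mirrors the four expressions, assuming
blockWidth = 256 and pdepWidth = 64; SegmentIndex, locate and locate_next are hypothetical names introduced only for
this illustration.

```cpp
#include <cstdint>

// Assumed configuration, matching the 256 and 64 used in the comments above.
constexpr uint64_t kBlockWidth = 256;
constexpr uint64_t kPdepWidth = 64;

// Hypothetical result type: which StreamSetBlock (relative to the base block)
// and which BitBlock/swizzle inside it.
struct SegmentIndex {
    uint64_t block;
    uint64_t swizzle;
};

// Mirrors current_blk_idx and current_swizzle_idx.
constexpr SegmentIndex locate(uint64_t processedBits, uint64_t baseBlockIdx) {
    return { processedBits / kBlockWidth - baseBlockIdx,
             (processedBits % kBlockWidth) / kPdepWidth };
}

// Mirrors next_block_idx and next_swizzle_idx: the look-ahead segment is the one
// that contains the bit pdepWidth positions further on.
constexpr SegmentIndex locate_next(uint64_t processedBits, uint64_t baseBlockIdx) {
    return locate(processedBits + kPdepWidth, baseBlockIdx);
}

// 200 bits processed: we are in the last swizzle (index 3) of block 0, and the
// look-ahead segment is the first swizzle of block 1.
static_assert(locate(200, 0).block == 0 && locate(200, 0).swizzle == 3, "current segment");
static_assert(locate_next(200, 0).block == 1 && locate_next(200, 0).swizzle == 0, "next segment");
```

The second pair of assertions shows the case mentioned earlier where the next segment is the first BitBlock of the
next StreamSetBlock.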
*/

namespace kernel {
class PDEPkernel : public MultiBlockKernel {
public:
    PDEPkernel(const std::unique_ptr<kernel::KernelBuilder> & kb, unsigned streamCount, unsigned swizzleFactor, unsigned PDEP_width = 64, std::string name = "PDEPdel");
    bool isCachable() const override { return true; }
    bool hasSignature() const override { return false; }
private:
    const unsigned mSwizzleFactor;
    const unsigned mPDEPWidth;
    void generateMultiBlockLogic(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * const numOfStrides) override;
    std::vector<llvm::Value *> get_PDEP_masks(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * PDEP_ms_blk,
                                              const unsigned mask_width);
    std::vector<llvm::Value *> get_block_popcounts(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * blk,
                                                   const unsigned field_width);
};
}

#endif