Timestamp: Sep 2, 2017, 11:59:14 PM (20 months ago)
Author: cameron
Message: PDEP kernels from Adam with pdep_width_less_1 fix
File: 1 edited

  • icGREP/icgrep-devel/icgrep/kernels/pdep_kernel.h

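--- icGREP/icgrep-devel/icgrep/kernels/pdep_kernel.h  (r5588)
+++ icGREP/icgrep-devel/icgrep/kernels/pdep_kernel.h  (r5627)
@@ -1,4 +1,4 @@
 /*
- *  Copyright (c) 2016 International Characters.
+ *  Copyright (c) 2017 International Characters.
  *  This software is licensed to the public under the Open Software License 3.0.
  */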
     
@@ -13,5 +13,8 @@
 
 Given a swizzled input stream set and a PDEP marker stream, apply a PDEP operation to each of the input streams in
-the input stream set. The PDEPed result streams are returned in a swizzled output stream set.
+the input stream set. The PDEPed result streams are returned in a swizzled output stream set.
+
+The length of the input stream set (in bits) must be greater than or equal to the total popcount of the PDEP marker
+stream, otherwise the PDEP operation will run out of source bits before the entire PDEP stream has been processed.
 
 How it works:
     
@@ -19,12 +22,12 @@
 You should know how the PDEP operation works before continuing (Wikipedia has a pretty good explanation.)
 
-The swizzled configuration of the input streams mean that the first blockWidth/mSwizzleFactor bits of each input
+The swizzled configuration of the input streams mean that the first blockWidth/mSwizzleFactor bits of each (unswizzled) input
 stream are contained in the first BitBlock of the first input StreamSetBlock. The second BitBlock contains the next
 blockWidth/mSwizzleFactor bits for each input stream, and so on. The key observation underpinning the action of the PDEP kernel is that we apply the PDEP operation
-using blockWidth/mSwizzleFactor bits of an input stream as the source bits. Since the first swizzle contains blockWidth/mSwizzleFactor
+using blockWidth/mSwizzleFactor bits of an input stream as the source bits. Since the first BitBlock (i.e. swizzle) contains blockWidth/mSwizzleFactor
 bits from each of the input streams, we can begin processing the input streams in the input stream set by applying the first blockWidth/mSwizzleFactor
 bits of the PDEP marker stream to each of the swizzle fields in the first BitBlock.
 
-We can continue using the first blockWidth/mSwizzleFactor bits of each input stream until we have completely consumed it. This occurs
+We continue using the first blockWidth/mSwizzleFactor bits of each input stream until we have completely consumed them. This occurs
 when the combined popcount of the PDEP masks we've used up to this point > blockWidth/mSwizzleFactor. Once we've exhausted the first
 BitBlock (i.e. swizzle), we move on to the next one. This pattern continues until we've consumed
     
@@ -63,7 +66,7 @@
 
 namespace kernel {
-class PDEPkernel final : public BlockOrientedKernel {
+class PDEPkernel : public MultiBlockKernel {
 public:
-    PDEPkernel(const std::unique_ptr<kernel::KernelBuilder> & kb, unsigned streamCount, unsigned PDEP_width = 64);
+    PDEPkernel(const std::unique_ptr<kernel::KernelBuilder> & kb, unsigned streamCount, unsigned swizzleFactor, unsigned PDEP_width = 64);
     bool isCachable() const override { return true; }
     bool hasSignature() const override { return false; }
     
@@ -71,13 +74,11 @@
     const unsigned mSwizzleFactor;
     const unsigned mPDEPWidth;
-    void generateDoBlockMethod(const std::unique_ptr<KernelBuilder> & kb) override;
+    void generateMultiBlockLogic(const std::unique_ptr<KernelBuilder> & kb) override;
     std::vector<llvm::Value *> get_PDEP_masks(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * PDEP_ms_blk,
                                               const unsigned mask_width);
     std::vector<llvm::Value *> get_block_popcounts(const std::unique_ptr<KernelBuilder> & kb, llvm::Value * blk,
-                                                   const unsigned field_width);
-
+                                                   const unsigned field_width);
 };
 }
 
 #endif
-
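
For readers unfamiliar with the operation the header comment describes, the following is a minimal scalar sketch in plain C++. It is not the kernel's code: the real kernel generates LLVM IR and works on swizzled BitBlocks, whereas this sketch assumes a single unswizzled stream stored as 64-bit words. The names `popcount64`, `pdep64` and `pdep_stream` are hypothetical and do not appear in the changeset. The sketch shows the two points the comment makes: source bits are consumed in order, with the popcount of each marker word deciding how many are used, and processing fails if the markers ask for more bits than the source holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the set bits in x (portable loop; hypothetical helper).
unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    for (; x != 0; x &= x - 1) ++n;   // clear the lowest set bit each step
    return n;
}

// Software PDEP: deposit the low-order bits of `source`, in order, at the
// positions of the set bits of `mask` (least significant first).
std::uint64_t pdep64(std::uint64_t source, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (source & bit) result |= lowest;
        mask &= mask - 1;                           // move on to the next mask bit
    }
    return result;
}

// Apply PDEP over a whole stream, one 64-bit marker word at a time.
// Assumption: a single unswizzled source stream, 64-bit fields.
// Returns false if the markers need more source bits than `source` holds.
bool pdep_stream(const std::vector<std::uint64_t>& source,
                 const std::vector<std::uint64_t>& markers,
                 std::vector<std::uint64_t>& out) {
    out.clear();
    std::size_t consumed = 0;                       // source bits used so far
    for (std::uint64_t mask : markers) {
        unsigned need = popcount64(mask);
        if (consumed + need > source.size() * 64) return false;
        std::size_t word = consumed / 64;
        unsigned shift = static_cast<unsigned>(consumed % 64);
        assert(word < source.size() || need == 0);
        // Gather the next (up to) 64 unconsumed source bits, possibly
        // straddling two words.
        std::uint64_t bits = 0;
        if (word < source.size()) {
            bits = source[word] >> shift;
            if (shift != 0 && word + 1 < source.size())
                bits |= source[word + 1] << (64 - shift);
        }
        out.push_back(pdep64(bits, mask));
        consumed += need;                           // advance by the popcount
    }
    return true;
}
```

In the swizzled layout described in the comment, the same bookkeeping would presumably be carried out once per marker field while the deposit is applied to every swizzle field of the current BitBlock; that part is not modelled here.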