Changeset 3471 for docs/Working


Ignore:
Timestamp:
Sep 13, 2013, 3:53:08 PM (6 years ago)
Author:
ksherdy
Message:

Added section on hardware. References.

Location:
docs/Working/re
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/ppopp-re.tex

    r3469 r3471  
    114114and hence the name nrgrep \cite{navarro2000}.
    115115
     116{\em a brief review} 
    116117There has been considerable interest in using parallelization techniques
    117 to improve the performance of regular expression matching.
    118 {\em a brief review} 
    119 
    120 Scarpazza and Braudaway demonstrate that irregular,
    121 control-flow dominated text processing algorithms can be efficiently
    122 executed on multicore hardware \cite{scarpazza2008fast}.
    123 In related work, Pasetto et al. present a flexible tool that
     118to improve the performance of regular expression matching on parallel hardware
     119such as multi-core processors (CPUs), graphics processing units (GPUs),
     120field-programmable gate arrays (FPGAs), and even more exotic architectures such as
     121the Cell Broadband Engine (Cell BE). % FPGA results (synthesis of patterns into logic circuits) vs. memory based approaches (STTs in memory)
     122%CPU
     123Scarpazza and Braudaway \cite{scarpazza2008fast} demonstrated that
     124text processing algorithms that exhibit irregular memory access patterns
     125can be efficiently executed on multicore hardware.
     126In related work, Pasetto et al. presented a flexible tool that
    124127performs small-ruleset regular expression matching at a rate of
    1251282.88 Gbps per chip on Intel Xeon E5472 hardware \cite{pasetto2010}.
    126 Naghmouchi et al. demonstrate that the Aho-Corasick
    127 string matching algorithm is well suited for implementation on commodity multicore, the
    128 Cell Broadband Engine and GPGPU hardware \cite{scarpazza2011top, naghmouchi2010}. In each case, both thread-
    129 level (additional cores) and data-level (SIMD units) hardware features are leveraged for performance.
    130 On the Cell Broadband Engine, Scarpazza’s implementation delivers a throughput of 40
    131 Gbps for a small dictionary size of approximately 100 patterns, and a throughput of 3.3-3.4
    132 Gbps for a larger dictionary of tens of thousands of patterns.
    133 Scarpazza and Russell present a SIMD tokenizer that delivers 1.00–1.78 Gbps on a single
    134 Cell Broadband Engine chip \cite{scarpazza2008} and extend this approach for emulation on the Intel Larrabee
    135 instruction set \cite{scarpazza2009larrabee}. Iorio and van Lunteren describe a string matching implementation for
    136 automata that achieves 4 Gbps on the Cell Broadband Engine \cite{iorio2008}.
    137 In more recent work,
    138 
    139 TODO
    140 
    141 SIMD\cite{salapura2012accelerating}
    142 
    143 and
    144 % GPUs\cite{lin2013accelerating}
    145 TODO
     129Naghmouchi et al. demonstrated that the Aho-Corasick (AC)
     130string matching algorithm \cite{aho1975} is well suited for parallel
     131implementation on multi-core CPUs, GPUs and the Cell BE \cite{scarpazza2011top, naghmouchi2010}.
     132On each hardware, both thread-level parallelism (additional cores) and data-level parallelism
     133(wide SIMD units) are leveraged for performance.
     134Salapura et. al., advocated the use of vector-style processing for regular expressions
     135in business analytics applications and leveraged the SIMD hardware available
     136on multi-core processors to acheive a speedup of better than 1.8 over a
     137range of data sizes of interest \cite{salapura2012accelerating}.
     138%Cell
     139In \cite{scarpazza2008}, Scarpazza and Russell presented a SIMD tokenizer
     140that delivered 1.00–1.78 Gbps on a single
     141Cell BE chip and extended this approach for emulation on the Intel Larrabee
     142instruction set \cite{scarpazza2009larrabee}.
     143On the Cell BE, Scarpazza \cite{scarpazza2009cell} described a pattern matching
     144implementation that delivered a throughput of 40
     145Gbps for a small dictionary of approximately 100 patterns, and a throughput of 3.3-3.4
     146Gbps for a larger dictionary of thousands of patterns. Iorio and van Lunteren \cite{iorio2008}
     147presented a string matching implementation for automata that achieves
     1484 Gbps on the Cell BE.
     149% GPU
     150In more recent work, Tumeo et al. \cite{tumeo2010efficient} presented a chunk-based
     151implementation of the AC algorithm for
     152accelerating string matching on GPUs. Lin et al., proposed
     153the Parallel Failureless Aho-Corasick (PFAC)
     154algorithm to accelerate pattern matching on GPU hardware and
     155achieved 143 Gbps throughput, 14.74 times faster
     156than the AC algorithm performed on a four core
     157multi-core processor using OpenMP \cite{lin2013accelerating}.
    146158
    147159Whereas the existing approaches to parallelization have been
  • docs/Working/re/reference.bib

    r3468 r3471  
    4949  publisher={Springer}
    5050}
    51 
    5251
    5352@book{aho2007,
     
    268267        pages={20}
    269268}
     269
     270@inproceedings{scarpazza2009cell,
     271        author={D. P. Scarpazza and G. F. Russell},
     272        year={2009},
     273        title={High-performance regular expression scanning on the Cell/BE processor},
     274        booktitle={Proceedings of the 23rd international conference on Supercomputing},
     275        publisher={ACM},
     276        pages={14-25}
     277}
     278
     279@article{aho1975,
     280        author={Alfred V. Aho and Margaret J. Corasick},
     281        year={1975},
     282        month={June},
     283        title={Efficient string matching: an aid to bibliographic search},
     284        journal={Commun.ACM},
     285        volume={18},
     286        number={6},
     287        pages={333-340},
     288        keywords={bibliographic search; computational complexity; finite state machines; information retrieval; keywords and phrases; string pattern matching; text-editing},
     289        isbn={0001-0782},
     290        url={http://doi.acm.org/10.1145/360825.360855}
     291}
     292
     293@inproceedings{tumeo2010efficient,
     294  title={Efficient pattern matching on GPUs for intrusion detection systems},
     295  author={Tumeo, Antonino and Villa, Oreste and Sciuto, Donatella},
     296  booktitle={Proceedings of the 7th ACM international conference on Computing frontiers},
     297  pages={87--88},
     298  year={2010},
     299  organization={ACM}
     300}
     301
     302@inproceedings{tumeo2011,
     303 author = {Tumeo, Antonino and Secchi, Simone and Villa, Oreste},
     304 title = {Experiences with string matching on the fermi architecture},
     305 booktitle = {Proceedings of the 24th international conference on Architecture of computing systems},
     306 series = {ARCS'11},
     307 year = {2011},
     308 isbn = {978-3-642-19136-7},
     309 location = {Como, Italy},
     310 pages = {26--37},
     311 numpages = {12},
     312 url = {http://dl.acm.org/citation.cfm?id=1966221.1966225},
     313 acmid = {1966225},
     314 publisher = {Springer-Verlag},
     315 address = {Berlin, Heidelberg},
     316}
Note: See TracChangeset for help on using the changeset viewer.