Changeset 3124 for docs/Working/re


Ignore:
Timestamp:
May 10, 2013, 1:10:27 PM (6 years ago)
Author:
ksherdy
Message:

Rough notes.

Location:
docs/Working/re
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/re-main.tex

    r3123 r3124  
    3838Performance results show a dramatic speed-up over traditional and state-of-the-art alternatives.
    3939
    40 
    4140\end{abstract}
    4241
     
    4443\label{Introduction}
    4544%\input{introduction.tex}
     45
     46Regular expresssion matching is an extensively studied problem with a multitude
     47of algorithms and software tools developed to the demands of particular problem contexts.
    4648
    4749Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be
     
    5254The pattern P can be just a simple string,
    5355but it can also be, for example, a regular expression.
    54 In this paper our focus is regular expression matching.
    5556
    5657A regular expression, or pattern, is an expression that specifies a set of strings.
    57 A regular expression is composed of (i) basic strings and (ii)
     58A regular expression is composed of (i) simple strings (ii) the empty or (ii)
    5859union, concatenation and Kleene closure of other regular expressions.
    5960To avoid parentheses it is assumed that the Kleene star has the highest priority,
    6061next concatenation and then alternation, however, most formalisms provides grouping
    6162operators to allow the definition of scope and operator precedence.
    62 
    6363Readers unfamiliar with the concept of regular expression matching are referred
    6464classical texts such as \cite{aho2007}.
    6565
    66 \subsection{Grep}
    67 
    6866Regular expression matching is commonly performed using a variety of
    69 publically available software tools. Namely, the UNIX grep family,
    70 the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular
     67publically available software tools. The most prominent, UNIX grep,
     68Gnu grep, agrep, cgrep, nrgrep, and Perl regular
    7169expressions \cite{Abou-assaleh04surveyof}.
    7270
    73 % Grep
    74 % Unix grep
    75 % Gnu grep
    76 % agrep
    77 % nrgrep
     71Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as
     72the fastest regular expression matching tools in practice \cite{}.
    7873
    79 Of particular interest are well-studied and performance oriented
    80 Gnu grep, agrep, and nrgrep.
     74Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep.
    8175
    82 As such, we compare the performance of our parallel \bitstream{} techniques against
     76% motivation / previous work
     77Although the finite state machine methods used in the scanning and parsing of
     78text streams is considered to be the hardest of the “13 dwarves” to parallelize
     79[1], parallel bitstream technology shows considerable promise for these types of
     80applications [3, 4]. In this approach, character streams are processed W positions
     81at a time using the W-bit SIMD registers commonly found on commodity processors
     82(e.g., 128-bit XMM registers on Intel/AMD chips). This is achieved by
     83first slicing the byte streams into eight separate basis bitstreams, one for each bit
     84position within the byte. These basis bitstreams are then combined with bitwise
     85logic and shifting operations to compute further parallel bit streams of interest.
     86
     87We further increase the parallelism in our methods by introducing a new parallel
     88scanning primitive which we have coined 'Match Star' that returns all matches in a single
     89operation and eliminates the need for back tracking ... (ELABORATE)
     90
     91--- STATE the content of the next sections. ---
     92
     93
     94We compare the performance of our parallel \bitstream{} techniques against
    8395various grep concentrate on the simpler case of
    8496reporting initial or final occurrence positions.
     97
     98
     99\section{Background}
     100\label{Background}
     101%\input{background.tex}
    85102
    86103% Background
    87104
    88105% History
    89 
    90 Regular expresssion matching is an extensively studied problem with a multitude
    91 of algorithms and software tools developed to the demands of particular problem contexts.
    92106
    93107Historically, the origins of regular expression matching date back to automata theory
     
    114128may have O(2^m) states.
    115129
    116 Overall, the general process is first to build a
    117 NFA from the regular expression, convert the NFA into a
    118 DFA, minimize the number of states in the DFA,
    119 and finally use the DFA to scan the text.
     130In general, the general process is first to build a
     131NFA from the regular expression and simulate the NFA on text input,
     132or alternatively to convert the NFA into a
     133DFA, optionally minimize the number of states in the DFA,
     134and finally simulate the DFA on text input.
    120135
    121 \section{Background}
    122 \label{Background}
    123 %\input{background.tex}
    124136
    125137\section{Methodology}
  • docs/Working/re/re-main.tex.backup

    r3123 r3124  
    3838Performance results show a dramatic speed-up over traditional and state-of-the-art alternatives.
    3939
    40 
    4140\end{abstract}
    4241
     
    4443\label{Introduction}
    4544%\input{introduction.tex}
     45
     46% parallel bitstream technology, parallelization, regular expressions
     47
     48% motivation / previous work
     49Although the finite state machine methods used in the scanning and parsing of
     50text streams is considered to be the hardest of the “13 dwarves” to parallelize
     51[1], parallel bitstream technology shows considerable promise for these types of
     52applications [3, 4]. In this approach, character streams are processed W positions
     53at a time using the W-bit SIMD registers commonly found on commodity processors
     54(e.g., 128-bit XMM registers on Intel/AMD chips). This is achieved by
     55first slicing the byte streams into eight separate basis bitstreams, one for each bit
     56position within the byte. These basis bitstreams are then combined with bitwise
     57logic and shifting operations to compute further parallel bit streams of interest.
     58
     59We further increase the parallelism in our methods by introducing a new parallel
     60scanning primitive which we have coined 'Match Star' that returns all matches in a single
     61operation and eliminates the need for back tracking ... (ELABORATE)
     62
    4663
    4764Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be
     
    5269The pattern P can be just a simple string,
    5370but it can also be, for example, a regular expression.
    54 In this paper our focus is regular expression matching.
    5571
    5672A regular expression, or pattern, is an expression that specifies a set of strings.
    57 A regular expression is composed of (i) basic strings and (ii)
     73A regular expression is composed of (i) simple strings (ii) the empty or (ii)
    5874union, concatenation and Kleene closure of other regular expressions.
    5975To avoid parentheses it is assumed that the Kleene star has the highest priority,
    6076next concatenation and then alternation, however, most formalisms provides grouping
    6177operators to allow the definition of scope and operator precedence.
    62 
    6378Readers unfamiliar with the concept of regular expression matching are referred
    6479classical texts such as \cite{aho2007}.
    6580
    66 \subsection{Grep}
    67 
    6881Regular expression matching is commonly performed using a variety of
    69 publically available software tools. Namely, the UNIX grep family,
    70 the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular
     82publically available software tools. The most prominent, UNIX grep,
     83Gnu grep, agrep, cgrep, nrgrep, and Perl regular
    7184expressions \cite{Abou-assaleh04surveyof}.
    7285
    73 % Grep
     86Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as
     87the fastest regular expression matching tools in practice \cite{}.
     88
     89Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep.
     90
    7491% Unix grep
    7592% Gnu grep
     
    7794% nrgrep
    7895
    79 Of particular interest are well-studied and performance oriented
    80 Gnu grep, agrep, and nrgrep.
     96Regular expresssion matching is an extensively studied problem with a multitude
     97of algorithms and software tools developed to the demands of particular problem contexts.
    8198
    8299As such, we compare the performance of our parallel \bitstream{} techniques against
     
    84101reporting initial or final occurrence positions.
    85102
     103
     104\section{Background}
     105\label{Background}
     106%\input{background.tex}
     107
    86108% Background
    87109
    88110% History
    89111
    90 Regular expresssion matching is an extensively studied problem with a multitude
    91 of algorithms and software tools developed to the demands of particular problem contexts.
    92 
    93112Historically, the origins of regular expression matching date back to automata theory
    94113and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}.
    95114
    96 The traditional technique [16] to search a regular expression of length m in
    97 a text of length n is to first convert the expression into a non-deterministic
    98 automaton (NFA) with O(m) nodes.
     115In 1959, Dana and Scott demonstrated that
     116NFAs can be simulated using Deterministic Finite Automata (DFA) in which each DFA
     117state corresponds to a set of NFA states.
    99118
    100119Thompson, in 1968, is credited with the first construction to convert regular expressions
    101120to nondeterministic finite automata (NFA) for regular expression matching.
     121
    102122Thompson’s publication \cite{thompson1968} marked the beginning of a long line of
    103123regular expression implementations that construct automata for pattern matching.
    104124
    105 The general process is first to build a NFA.
    106 Automaton (NFA) from the regular expression, then convert the NFA into a
    107 Deterministic Finite Automaton (DFA), and finally use the DFA to scan the text.
    108 
    109 It is possible to search the text using the NFA directly in O(mn) worst case time.
    110 
    111 The cost comes from the fact that
     125The traditional technique [16] to search a regular expression of length m in
     126a text of length n is to first convert the expression into a non-deterministic
     127automaton (NFA) with O(m) nodes. It is possible to search the text using the
     128NFA directly in O(mn) worst case time. The cost comes from the fact that
    112129more than one state of the NFA may be active at each step, and therefore all
    113130may need to be updated. A more effcient choice is to convert the NFA into a
    114 deterministic finite automaton (DFA). A DFA has only a single active state
    115 and allows to search the text at O(n) worst-case optimal. The problem with this
    116 approach is that the DFA may have O(2^m) states.
     131DFA. A DFA has only a single active state and allows to search the text at
     132O(n) worst-case optimal. The problem with this approach is that the DFA
     133may have O(2^m) states.
    117134
     135In general, the general process is first to build a
     136NFA from the regular expression and simulate the NFA on text input,
     137or alternatively to convert the NFA into a
     138DFA, optionally minimize the number of states in the DFA,
     139and finally simulate the DFA on text input.
    118140
    119 
    120 
    121 
    122 
    123 
    124 \section{Background}
    125 \label{Background}
    126 %\input{background.tex}
    127141
    128142\section{Methodology}
Note: See TracChangeset for help on using the changeset viewer.