Ignore:
Timestamp:
May 10, 2013, 1:10:27 PM (6 years ago)
Author:
ksherdy
Message:

Rough notes.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/re-main.tex.backup

    r3123 r3124  
    3838Performance results show a dramatic speed-up over traditional and state-of-the-art alternatives.
    3939
    40 
    4140\end{abstract}
    4241
     
    4443\label{Introduction}
    4544%\input{introduction.tex}
     45
     46% parallel bitstream technology, parallelization, regular expressions
     47
     48% motivation / previous work
     49Although the finite state machine methods used in the scanning and parsing of
     50text streams is considered to be the hardest of the “13 dwarves” to parallelize
     51[1], parallel bitstream technology shows considerable promise for these types of
     52applications [3, 4]. In this approach, character streams are processed W positions
     53at a time using the W-bit SIMD registers commonly found on commodity processors
     54(e.g., 128-bit XMM registers on Intel/AMD chips). This is achieved by
     55first slicing the byte streams into eight separate basis bitstreams, one for each bit
     56position within the byte. These basis bitstreams are then combined with bitwise
     57logic and shifting operations to compute further parallel bit streams of interest.
     58
     59We further increase the parallelism in our methods by introducing a new parallel
     60scanning primitive which we have coined 'Match Star' that returns all matches in a single
     61operation and eliminates the need for back tracking ... (ELABORATE)
     62
    4663
    4764Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be
     
    5269The pattern P can be just a simple string,
    5370but it can also be, for example, a regular expression.
    54 In this paper our focus is regular expression matching.
    5571
    5672A regular expression, or pattern, is an expression that specifies a set of strings.
    57 A regular expression is composed of (i) basic strings and (ii)
     73A regular expression is composed of (i) simple strings (ii) the empty or (ii)
    5874union, concatenation and Kleene closure of other regular expressions.
    5975To avoid parentheses it is assumed that the Kleene star has the highest priority,
    6076next concatenation and then alternation, however, most formalisms provides grouping
    6177operators to allow the definition of scope and operator precedence.
    62 
    6378Readers unfamiliar with the concept of regular expression matching are referred
    6479classical texts such as \cite{aho2007}.
    6580
    66 \subsection{Grep}
    67 
    6881Regular expression matching is commonly performed using a variety of
    69 publically available software tools. Namely, the UNIX grep family,
    70 the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular
     82publically available software tools. The most prominent, UNIX grep,
     83Gnu grep, agrep, cgrep, nrgrep, and Perl regular
    7184expressions \cite{Abou-assaleh04surveyof}.
    7285
    73 % Grep
     86Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as
     87the fastest regular expression matching tools in practice \cite{}.
     88
     89Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep.
     90
    7491% Unix grep
    7592% Gnu grep
     
    7794% nrgrep
    7895
    79 Of particular interest are well-studied and performance oriented
    80 Gnu grep, agrep, and nrgrep.
     96Regular expresssion matching is an extensively studied problem with a multitude
     97of algorithms and software tools developed to the demands of particular problem contexts.
    8198
    8299As such, we compare the performance of our parallel \bitstream{} techniques against
     
    84101reporting initial or final occurrence positions.
    85102
     103
     104\section{Background}
     105\label{Background}
     106%\input{background.tex}
     107
    86108% Background
    87109
    88110% History
    89111
    90 Regular expresssion matching is an extensively studied problem with a multitude
    91 of algorithms and software tools developed to the demands of particular problem contexts.
    92 
    93112Historically, the origins of regular expression matching date back to automata theory
    94113and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}.
    95114
    96 The traditional technique [16] to search a regular expression of length m in
    97 a text of length n is to first convert the expression into a non-deterministic
    98 automaton (NFA) with O(m) nodes.
     115In 1959, Dana and Scott demonstrated that
     116NFAs can be simulated using Deterministic Finite Automata (DFA) in which each DFA
     117state corresponds to a set of NFA states.
    99118
    100119Thompson, in 1968, is credited with the first construction to convert regular expressions
    101120to nondeterministic finite automata (NFA) for regular expression matching.
     121
    102122Thompson’s publication \cite{thompson1968} marked the beginning of a long line of
    103123regular expression implementations that construct automata for pattern matching.
    104124
    105 The general process is first to build a NFA.
    106 Automaton (NFA) from the regular expression, then convert the NFA into a
    107 Deterministic Finite Automaton (DFA), and finally use the DFA to scan the text.
    108 
    109 It is possible to search the text using the NFA directly in O(mn) worst case time.
    110 
    111 The cost comes from the fact that
     125The traditional technique [16] to search a regular expression of length m in
     126a text of length n is to first convert the expression into a non-deterministic
     127automaton (NFA) with O(m) nodes. It is possible to search the text using the
     128NFA directly in O(mn) worst case time. The cost comes from the fact that
    112129more than one state of the NFA may be active at each step, and therefore all
    113130may need to be updated. A more effcient choice is to convert the NFA into a
    114 deterministic finite automaton (DFA). A DFA has only a single active state
    115 and allows to search the text at O(n) worst-case optimal. The problem with this
    116 approach is that the DFA may have O(2^m) states.
     131DFA. A DFA has only a single active state and allows to search the text at
     132O(n) worst-case optimal. The problem with this approach is that the DFA
     133may have O(2^m) states.
    117134
     135In general, the general process is first to build a
     136NFA from the regular expression and simulate the NFA on text input,
     137or alternatively to convert the NFA into a
     138DFA, optionally minimize the number of states in the DFA,
     139and finally simulate the DFA on text input.
    118140
    119 
    120 
    121 
    122 
    123 
    124 \section{Background}
    125 \label{Background}
    126 %\input{background.tex}
    127141
    128142\section{Methodology}
Note: See TracChangeset for help on using the changeset viewer.