Ignore:
Timestamp:
May 10, 2013, 4:16:34 PM (6 years ago)
Author:
ksherdy
Message:

Rough notes.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/re-main.tex.backup

    r3124 r3126  
    2525\begin{abstract}
    2626
    27 A data parallel regular expression matching method using the concept of bitstream technology
    28 is introduced and studied in application to the problem of fast regular expression matching.
     27A parallel regular expression matching method is introduced and studied in
     28application to the problem of online pattern matching.
    2929
    3030The method is based on the concept of parallel
     
    4444%\input{introduction.tex}
    4545
    46 % parallel bitstream technology, parallelization, regular expressions
     46Regular expresssion matching is an extensively studied problem with application to
     47numerous application domains. A multitude
     48of algorithms and software tools have been developed to the address the particular
     49demands of the various application domains.
     50
     51The pattern matching problem can be
     52stated as follows. Given a text T$_{1..n}$ of n characters and a pattern P,
     53find all the text positions of T that start an occurrence of P. Alternatively,
     54one may want all the final positions of occurrences. Some applications require
     55slightly different output such as the line that matches the pattern.
     56
     57A pattern P can be a simple string, but it can also be, a regular expression.
     58A regular expression, is an expression that specifies a set of strings.
     59A regular expression is composed of (i) simple strings and (ii) the
     60union, concatenation and Kleene closure of other regular expressions.
     61To avoid parentheses it is assumed that the Kleene star has the highest priority,
     62next concatenation and then alternation, however, most formalisms provides grouping
     63operators to allow the definition of scope and operator precedence.
     64Readers unfamiliar with the concept of regular expression matching are referred
     65classical texts such as \cite{aho2007}.
     66
     67Regular expression matching is commonly performed using a wide variety of
     68publically available software tools for on-line pattern matching. For instance,
     69UNIX grep, Gnu grep, agrep, cgrep, nrgrep, and Perl regular
     70expressions \cite{abou-assaleh2004}. Amongst these tools Gnu grep (egrep),
     71agrep, and nrgrep are widely known and considered as
     72the fastest regular expression matching tools in practice \cite{navarro2000}.
     73and are of particular interest to this study.
     74
     75% simple patterns, extended patterns, regular expressions
    4776
    4877% motivation / previous work
    49 Although the finite state machine methods used in the scanning and parsing of
     78Although tradi finite state machine methods used in the scanning and parsing of
    5079text streams is considered to be the hardest of the “13 dwarves” to parallelize
    5180[1], parallel bitstream technology shows considerable promise for these types of
     
    5887
    5988We further increase the parallelism in our methods by introducing a new parallel
    60 scanning primitive which we have coined 'Match Star' that returns all matches in a single
    61 operation and eliminates the need for back tracking ... (ELABORATE)
     89scanning primitive which we have coined Match Star. Match Star returns all matches
     90in a single operation and eliminates backtracking
     91when a partially successful search path fails.
    6292
     93       
     94The remainder of this paper is organized as follows.
    6395
    64 Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be
    65 stated as follows. Find all the text positions of T that start an occurrence of P. 
    66 Alternatively, one may want all the final positions of occurrences. Some
    67 applications require slightly different output such as the line that matches the pattern.
     96Section~\ref{Background} presents background material on classic
     97regular expression pattern matching techniques and provides insight into the
     98efficiency of traditional regular expression software tools.
    6899
    69 The pattern P can be just a simple string,
    70 but it can also be, for example, a regular expression.
     100Section~\ref{Bitwise Parallel Data Streams} describes out data parallel
     101regular expression matching techniques.
    71102
    72 A regular expression, or pattern, is an expression that specifies a set of strings.
    73 A regular expression is composed of (i) simple strings (ii) the empty or (ii)
    74 union, concatenation and Kleene closure of other regular expressions.
    75 To avoid parentheses it is assumed that the Kleene star has the highest priority,
    76 next concatenation and then alternation, however, most formalisms provides grouping
    77 operators to allow the definition of scope and operator precedence.
    78 Readers unfamiliar with the concept of regular expression matching are referred
    79 classical texts such as \cite{aho2007}.
     103Section~\ref{Compiler Technology}
    80104
    81 Regular expression matching is commonly performed using a variety of
    82 publically available software tools. The most prominent, UNIX grep, 
    83 Gnu grep, agrep, cgrep, nrgrep, and Perl regular
    84 expressions \cite{Abou-assaleh04surveyof}.
     105Section~\ref{Methodology} describes the evaluation framework and Section~\ref{Experimental Results}
     106presents a detailed performance analysis of our data parallel \bitstream{} techniques against
     107Gnu grep, agrep, and nr-grep.
    85108
    86 Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as
    87 the fastest regular expression matching tools in practice \cite{}.
    88 
    89 Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep.
    90 
    91 % Unix grep
    92 % Gnu grep
    93 % agrep
    94 % nrgrep
    95 
    96 Regular expresssion matching is an extensively studied problem with a multitude
    97 of algorithms and software tools developed to the demands of particular problem contexts.
    98 
    99 As such, we compare the performance of our parallel \bitstream{} techniques against
    100 various grep concentrate on the simpler case of
    101 reporting initial or final occurrence positions.
    102 
     109Section~\ref{conclusion} concludes the paper.
    103110
    104111\section{Background}
     
    111118
    112119Historically, the origins of regular expression matching date back to automata theory
    113 and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}.
     120and formal language theory developed by Kleene in the 1950s \cite{kleene1951}.
    114121
    115122In 1959, Dana and Scott demonstrated that
     
    140147
    141148
     149
     150\section{Parallel Bitwise Data Streams}
     151\label{Parallel Bitwise Data Streams}
     152
     153
     154\section{Compiler Technology}
     155\label{Compiler Technology}
     156
    142157\section{Methodology}
    143158\label{Methodology}
    144159%\input{methodology.tex}
     160
     161We compare the performance of our parallel \bitstream{} techniques against
     162Gnu grep, agrep, and nr-grep.
     163
     164Given a regular expression R and a test T the regular expression matching
     165problem finds all ending position of substrings in Q that matches a string in
     166the language denoted by R.
     167
     168The behaviour of Gnu grep, agrep, and nr-grep are differ in that
     169
     170Gnu grep
     171
     172agrep
     173
     174nr-grep
     175
     176
    145177
    146178\section{Experimental Results}
Note: See TracChangeset for help on using the changeset viewer.