source: docs/Working/re/re-main.tex @ 3123

Last change on this file since 3123 was 3123, checked in by ksherdy, 6 years ago

Check-in rough notes only.

File size: 4.8 KB
Line 
1\documentclass[a4paper,10pt]{article}
2\usepackage[utf8]{inputenc}
3
4\def \Bitstream{Bit Stream}
5\def \bitstream{bit stream}
6
7%opening
8\title{Fast Regular Expression Matching using Parallel \Bitstream{}s}
9\author{
10{Robert D. Cameron} \\
11\and 
12{Kenneth S. Herdy} \\
13\and 
14{Ben Hull} \\
15\and 
16{Thomas C. Shermer} \\
17\\School of Computing Science
18\\Simon Fraser University
19}
20\begin{document}
21
22\date{}
23\maketitle
24
25\begin{abstract}
26
27A data parallel regular expression matching method using the concept of bitstream technology
28is introduced and studied in application to the problem of fast regular expression matching.
29
30The method is based on the concept of parallel
31\bitstream{} technology, in which parallel streams of bits are formed such
32that each stream comprises bits in one-to-one correspondence with the
33character code units of a source data stream.
34
35On processors supporting W-bit addition operations, the method processes W source characters 
36in parallel and performs up to W finite state transitions per clock cycle.
37
38Performance results show a dramatic speed-up over traditional and state-of-the-art alternatives.
39
40
41\end{abstract}
42
43\section{Introduction}
44\label{Introduction}
45%\input{introduction.tex}
46
47Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be
48stated as follows. Find all the text positions of T that start an occurrence of P. 
49Alternatively, one may want all the final positions of occurrences. Some
50applications require slightly different output such as the line that matches the pattern.
51
52The pattern P can be just a simple string,
53but it can also be, for example, a regular expression.
54In this paper our focus is regular expression matching.
55
56A regular expression, or pattern, is an expression that specifies a set of strings.
57A regular expression is composed of (i) basic strings and (ii)
58union, concatenation and Kleene closure of other regular expressions.
59To avoid parentheses it is assumed that the Kleene star has the highest priority,
60next concatenation and then alternation, however, most formalisms provides grouping
61operators to allow the definition of scope and operator precedence.
62
63Readers unfamiliar with the concept of regular expression matching are referred
64classical texts such as \cite{aho2007}.
65
66\subsection{Grep}
67
68Regular expression matching is commonly performed using a variety of
69publically available software tools. Namely, the UNIX grep family,
70the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular
71expressions \cite{Abou-assaleh04surveyof}.
72
73% Grep
74% Unix grep
75% Gnu grep
76% agrep
77% nrgrep
78
79Of particular interest are well-studied and performance oriented
80Gnu grep, agrep, and nrgrep.
81
82As such, we compare the performance of our parallel \bitstream{} techniques against
83various grep concentrate on the simpler case of
84reporting initial or final occurrence positions.
85
86% Background
87
88% History
89
90Regular expresssion matching is an extensively studied problem with a multitude
91of algorithms and software tools developed to the demands of particular problem contexts.
92
93Historically, the origins of regular expression matching date back to automata theory
94and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}.
95
96In 1959, Dana and Scott demonstrated that
97NFAs can be simulated using Deterministic Finite Automata (DFA) in which each DFA
98state corresponds to a set of NFA states.
99
100Thompson, in 1968, is credited with the first construction to convert regular expressions
101to nondeterministic finite automata (NFA) for regular expression matching.
102
103Thompson’s publication \cite{thompson1968} marked the beginning of a long line of
104regular expression implementations that construct automata for pattern matching.
105
106The traditional technique [16] to search a regular expression of length m in
107a text of length n is to first convert the expression into a non-deterministic
108automaton (NFA) with O(m) nodes. It is possible to search the text using the
109NFA directly in O(mn) worst case time. The cost comes from the fact that
110more than one state of the NFA may be active at each step, and therefore all
111may need to be updated. A more effcient choice is to convert the NFA into a
112DFA. A DFA has only a single active state and allows to search the text at
113O(n) worst-case optimal. The problem with this approach is that the DFA
114may have O(2^m) states.
115
116Overall, the general process is first to build a
117NFA from the regular expression, convert the NFA into a
118DFA, minimize the number of states in the DFA,
119and finally use the DFA to scan the text.
120
121\section{Background}
122\label{Background}
123%\input{background.tex}
124
125\section{Methodology}
126\label{Methodology}
127%\input{methodology.tex}
128
129\section{Experimental Results}
130\label{results}
131%\input{results.tex}
132
133\section{Conclusion and Future Work}
134\label{conclusion}
135%\input{conclusion.tex}
136
137{
138  \bibliographystyle{acm}
139  \bibliography{reference}
140}
141
142\end{document}
Note: See TracBrowser for help on using the repository browser.