 Timestamp:
 May 10, 2013, 1:10:27 PM (6 years ago)
 Location:
 docs/Working/re
 Files:

 2 edited
Legend:
 Unmodified
 Added
 Removed

docs/Working/re/remain.tex
r3123 r3124 38 38 Performance results show a dramatic speedup over traditional and stateoftheart alternatives. 39 39 40 41 40 \end{abstract} 42 41 … … 44 43 \label{Introduction} 45 44 %\input{introduction.tex} 45 46 Regular expresssion matching is an extensively studied problem with a multitude 47 of algorithms and software tools developed to the demands of particular problem contexts. 46 48 47 49 Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be … … 52 54 The pattern P can be just a simple string, 53 55 but it can also be, for example, a regular expression. 54 In this paper our focus is regular expression matching.55 56 56 57 A regular expression, or pattern, is an expression that specifies a set of strings. 57 A regular expression is composed of (i) basic strings and(ii)58 A regular expression is composed of (i) simple strings (ii) the empty or (ii) 58 59 union, concatenation and Kleene closure of other regular expressions. 59 60 To avoid parentheses it is assumed that the Kleene star has the highest priority, 60 61 next concatenation and then alternation, however, most formalisms provides grouping 61 62 operators to allow the definition of scope and operator precedence. 62 63 63 Readers unfamiliar with the concept of regular expression matching are referred 64 64 classical texts such as \cite{aho2007}. 65 65 66 \subsection{Grep}67 68 66 Regular expression matching is commonly performed using a variety of 69 publically available software tools. Namely, the UNIX grep family,70 the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular67 publically available software tools. The most prominent, UNIX grep, 68 Gnu grep, agrep, cgrep, nrgrep, and Perl regular 71 69 expressions \cite{Abouassaleh04surveyof}. 72 70 73 % Grep 74 % Unix grep 75 % Gnu grep 76 % agrep 77 % nrgrep 71 Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as 72 the fastest regular expression matching tools in practice \cite{}. 78 73 79 Of particular interest are wellstudied and performance oriented 80 Gnu grep, agrep, and nrgrep. 74 Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep. 81 75 82 As such, we compare the performance of our parallel \bitstream{} techniques against 76 % motivation / previous work 77 Although the finite state machine methods used in the scanning and parsing of 78 text streams is considered to be the hardest of the â13 dwarvesâ to parallelize 79 [1], parallel bitstream technology shows considerable promise for these types of 80 applications [3, 4]. In this approach, character streams are processed W positions 81 at a time using the Wbit SIMD registers commonly found on commodity processors 82 (e.g., 128bit XMM registers on Intel/AMD chips). This is achieved by 83 first slicing the byte streams into eight separate basis bitstreams, one for each bit 84 position within the byte. These basis bitstreams are then combined with bitwise 85 logic and shifting operations to compute further parallel bit streams of interest. 86 87 We further increase the parallelism in our methods by introducing a new parallel 88 scanning primitive which we have coined 'Match Star' that returns all matches in a single 89 operation and eliminates the need for back tracking ... (ELABORATE) 90 91  STATE the content of the next sections.  92 93 94 We compare the performance of our parallel \bitstream{} techniques against 83 95 various grep concentrate on the simpler case of 84 96 reporting initial or final occurrence positions. 97 98 99 \section{Background} 100 \label{Background} 101 %\input{background.tex} 85 102 86 103 % Background 87 104 88 105 % History 89 90 Regular expresssion matching is an extensively studied problem with a multitude91 of algorithms and software tools developed to the demands of particular problem contexts.92 106 93 107 Historically, the origins of regular expression matching date back to automata theory … … 114 128 may have O(2^m) states. 115 129 116 Overall, the general process is first to build a 117 NFA from the regular expression, convert the NFA into a 118 DFA, minimize the number of states in the DFA, 119 and finally use the DFA to scan the text. 130 In general, the general process is first to build a 131 NFA from the regular expression and simulate the NFA on text input, 132 or alternatively to convert the NFA into a 133 DFA, optionally minimize the number of states in the DFA, 134 and finally simulate the DFA on text input. 120 135 121 \section{Background}122 \label{Background}123 %\input{background.tex}124 136 125 137 \section{Methodology} 
docs/Working/re/remain.tex.backup
r3123 r3124 38 38 Performance results show a dramatic speedup over traditional and stateoftheart alternatives. 39 39 40 41 40 \end{abstract} 42 41 … … 44 43 \label{Introduction} 45 44 %\input{introduction.tex} 45 46 % parallel bitstream technology, parallelization, regular expressions 47 48 % motivation / previous work 49 Although the finite state machine methods used in the scanning and parsing of 50 text streams is considered to be the hardest of the â13 dwarvesâ to parallelize 51 [1], parallel bitstream technology shows considerable promise for these types of 52 applications [3, 4]. In this approach, character streams are processed W positions 53 at a time using the Wbit SIMD registers commonly found on commodity processors 54 (e.g., 128bit XMM registers on Intel/AMD chips). This is achieved by 55 first slicing the byte streams into eight separate basis bitstreams, one for each bit 56 position within the byte. These basis bitstreams are then combined with bitwise 57 logic and shifting operations to compute further parallel bit streams of interest. 58 59 We further increase the parallelism in our methods by introducing a new parallel 60 scanning primitive which we have coined 'Match Star' that returns all matches in a single 61 operation and eliminates the need for back tracking ... (ELABORATE) 62 46 63 47 64 Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be … … 52 69 The pattern P can be just a simple string, 53 70 but it can also be, for example, a regular expression. 54 In this paper our focus is regular expression matching.55 71 56 72 A regular expression, or pattern, is an expression that specifies a set of strings. 57 A regular expression is composed of (i) basic strings and(ii)73 A regular expression is composed of (i) simple strings (ii) the empty or (ii) 58 74 union, concatenation and Kleene closure of other regular expressions. 59 75 To avoid parentheses it is assumed that the Kleene star has the highest priority, 60 76 next concatenation and then alternation, however, most formalisms provides grouping 61 77 operators to allow the definition of scope and operator precedence. 62 63 78 Readers unfamiliar with the concept of regular expression matching are referred 64 79 classical texts such as \cite{aho2007}. 65 80 66 \subsection{Grep}67 68 81 Regular expression matching is commonly performed using a variety of 69 publically available software tools. Namely, the UNIX grep family,70 the GNU grep family, agrep, cgrep, sgrep, nrgrep, and Perl regular82 publically available software tools. The most prominent, UNIX grep, 83 Gnu grep, agrep, cgrep, nrgrep, and Perl regular 71 84 expressions \cite{Abouassaleh04surveyof}. 72 85 73 % Grep 86 Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as 87 the fastest regular expression matching tools in practice \cite{}. 88 89 Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep. 90 74 91 % Unix grep 75 92 % Gnu grep … … 77 94 % nrgrep 78 95 79 Of particular interest are wellstudied and performance oriented 80 Gnu grep, agrep, and nrgrep. 96 Regular expresssion matching is an extensively studied problem with a multitude 97 of algorithms and software tools developed to the demands of particular problem contexts. 81 98 82 99 As such, we compare the performance of our parallel \bitstream{} techniques against … … 84 101 reporting initial or final occurrence positions. 85 102 103 104 \section{Background} 105 \label{Background} 106 %\input{background.tex} 107 86 108 % Background 87 109 88 110 % History 89 111 90 Regular expresssion matching is an extensively studied problem with a multitude91 of algorithms and software tools developed to the demands of particular problem contexts.92 93 112 Historically, the origins of regular expression matching date back to automata theory 94 113 and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}. 95 114 96 The traditional technique [16] to search a regular expression of length m in 97 a text of length n is to first convert the expression into a nondeterministic 98 automaton (NFA) with O(m) nodes. 115 In 1959, Dana and Scott demonstrated that 116 NFAs can be simulated using Deterministic Finite Automata (DFA) in which each DFA 117 state corresponds to a set of NFA states. 99 118 100 119 Thompson, in 1968, is credited with the first construction to convert regular expressions 101 120 to nondeterministic finite automata (NFA) for regular expression matching. 121 102 122 Thompsonâs publication \cite{thompson1968} marked the beginning of a long line of 103 123 regular expression implementations that construct automata for pattern matching. 104 124 105 The general process is first to build a NFA. 106 Automaton (NFA) from the regular expression, then convert the NFA into a 107 Deterministic Finite Automaton (DFA), and finally use the DFA to scan the text. 108 109 It is possible to search the text using the NFA directly in O(mn) worst case time. 110 111 The cost comes from the fact that 125 The traditional technique [16] to search a regular expression of length m in 126 a text of length n is to first convert the expression into a nondeterministic 127 automaton (NFA) with O(m) nodes. It is possible to search the text using the 128 NFA directly in O(mn) worst case time. The cost comes from the fact that 112 129 more than one state of the NFA may be active at each step, and therefore all 113 130 may need to be updated. A more effcient choice is to convert the NFA into a 114 deterministic finite automaton (DFA). A DFA has only a single active state 115 and allows to search the text at O(n) worstcase optimal. The problem with this116 approach is that the DFAmay have O(2^m) states.131 DFA. A DFA has only a single active state and allows to search the text at 132 O(n) worstcase optimal. The problem with this approach is that the DFA 133 may have O(2^m) states. 117 134 135 In general, the general process is first to build a 136 NFA from the regular expression and simulate the NFA on text input, 137 or alternatively to convert the NFA into a 138 DFA, optionally minimize the number of states in the DFA, 139 and finally simulate the DFA on text input. 118 140 119 120 121 122 123 124 \section{Background}125 \label{Background}126 %\input{background.tex}127 141 128 142 \section{Methodology}
Note: See TracChangeset
for help on using the changeset viewer.