# Changeset 3126

Ignore:
Timestamp:
May 10, 2013, 4:16:34 PM (6 years ago)
Message:

Rough notes.

Location:
docs/Working/re
Files:
8 edited

Unmodified
Removed
• ## docs/Working/re/re-main.aux

 r3123 \relax \citation{aho2007} \citation{abou-assaleh2004} \citation{navarro2000} \@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}} \newlabel{Introduction}{{1}{1}} \citation{kleene1951representation} \citation{kleene1951} \citation{thompson1968} \@writefile{toc}{\contentsline {section}{\numberline {2}Background}{2}} \newlabel{Background}{{2}{2}} \bibstyle{acm} \bibdata{reference} \bibcite{aho2007}{1} \bibcite{kleene1951representation}{2} \@writefile{toc}{\contentsline {section}{\numberline {2}Background}{2}} \newlabel{Background}{{2}{2}} \@writefile{toc}{\contentsline {section}{\numberline {3}Methodology}{2}} \newlabel{Methodology}{{3}{2}} \@writefile{toc}{\contentsline {section}{\numberline {4}Experimental Results}{2}} \newlabel{results}{{4}{2}} \@writefile{toc}{\contentsline {section}{\numberline {5}Conclusion and Future Work}{2}} \newlabel{conclusion}{{5}{2}} \bibcite{abou-assaleh2004}{1} \bibcite{aho2007}{2} \bibcite{kleene1951}{3} \bibcite{navarro2000}{4} \bibcite{thompson1968}{5} \@writefile{toc}{\contentsline {section}{\numberline {3}Parallel Bitwise Data Streams}{3}} \newlabel{Parallel Bitwise Data Streams}{{3}{3}} \@writefile{toc}{\contentsline {section}{\numberline {4}Compiler Technology}{3}} \newlabel{Compiler Technology}{{4}{3}} \@writefile{toc}{\contentsline {section}{\numberline {5}Methodology}{3}} \newlabel{Methodology}{{5}{3}} \@writefile{toc}{\contentsline {section}{\numberline {6}Experimental Results}{3}} \newlabel{Experimental Results}{{6}{3}} \@writefile{toc}{\contentsline {section}{\numberline {7}Conclusion}{3}} \newlabel{Conclusion}{{7}{3}}
• ## docs/Working/re/re-main.bbl

 r3123 \begin{thebibliography}{1} \bibitem{abou-assaleh2004} {\sc Abou-assaleh, T., and Ai, W.} \newblock Survey of global regular expression print (grep) tools. \newblock Tech. rep., 2004. \bibitem{aho2007} \newblock Addison Wesley, 2007. \bibitem{kleene1951representation} \bibitem{kleene1951} {\sc Kleene, S.~C.} \newblock Representation of events in nerve nets and finite automata. \bibitem{navarro2000} {\sc Navarro, G.} \newblock Nr-grep: A fast and flexible pattern matching tool. \newblock {\em Software Practice and Experience (SPE 31\/} (2000), 2001. \bibitem{thompson1968} {\sc Thompson, K.} \newblock Programming techniques: Regular expression search algorithm. \newblock {\em Communications of the ACM 11}, 6 (1968), 419--422. \end{thebibliography}
• ## docs/Working/re/re-main.blg

 r3123 The style file: acm.bst Database file #1: reference.bib Warning--empty journal in kleene1951representation You've used 2 entries, Warning--empty institution in abou-assaleh2004 Warning--empty journal in kleene1951 You've used 5 entries, 2253 wiz_defined-function locations, 537 strings with 4112 characters, and the built_in function-call counts, 483 in all, are: = -- 41 > -- 24 556 strings with 4444 characters, and the built_in function-call counts, 1249 in all, are: = -- 116 > -- 50 < -- 0 + -- 10 - -- 8 * -- 28 := -- 93 add.period$-- 5 call.type$ -- 2 change.case$-- 9 + -- 21 - -- 16 * -- 86 := -- 225 add.period$ -- 14 call.type$-- 5 change.case$ -- 23 chr.to.int$-- 0 cite$ -- 3 duplicate$-- 21 empty$ -- 37 format.name$-- 8 if$ -- 93 cite$-- 7 duplicate$ -- 48 empty$-- 104 format.name$ -- 16 if$-- 250 int.to.chr$ -- 0 int.to.str$-- 2 missing$ -- 3 newline$-- 12 num.names$ -- 4 pop$-- 14 int.to.str$ -- 5 missing$-- 5 newline$ -- 27 num.names$-- 10 pop$ -- 23 preamble$-- 1 purify$ -- 8 purify$-- 18 quote$ -- 0 skip$-- 11 skip$ -- 26 stack$-- 0 substring$ -- 10 swap$-- 3 substring$ -- 58 swap$-- 8 text.length$ -- 0 text.prefix$-- 0 top$ -- 0 type$-- 6 warning$ -- 1 while$-- 4 width$ -- 3 write$-- 19 (There was 1 warning) type$ -- 18 warning$-- 2 while$ -- 12 width$-- 6 write$ -- 49 (There were 2 warnings)
• ## docs/Working/re/re-main.log

 r3123 This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2012.10.7)  9 MAY 2013 16:27 This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian) (format=pdflatex 2012.10.7)  10 MAY 2013 15:24 entering extended mode %&-line parsing enabled. (Font)              <6> on input line 23. LaTeX Font Info:    External font cmex10' loaded for size (Font)              <7> on input line 47. LaTeX Font Info:    External font cmex10' loaded for size (Font)              <5> on input line 47. (Font)              <7> on input line 52. LaTeX Font Info:    External font cmex10' loaded for size (Font)              <5> on input line 52. [1 {/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}] (./re-main.bbl) [2] (./re-main.aux) ) {/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}] LaTeX Warning: Reference Bitwise Parallel Data Streams' on page 2 undefined on input line 100. [2] (./re-main.bbl) [3] (./re-main.aux) LaTeX Warning: There were undefined references. ) Here is how much of TeX's memory you used: 455 strings out of 493848 4467 string characters out of 1152823 54050 words of memory out of 3000000 3784 multiletter control sequences out of 15000+50000 8534 words of font info for 30 fonts, out of 3000000 for 9000 463 strings out of 493848 4615 string characters out of 1152823 55050 words of memory out of 3000000 3791 multiletter control sequences out of 15000+50000 8842 words of font info for 31 fonts, out of 3000000 for 9000 714 hyphenation exceptions out of 8191 23i,6n,19p,165b,199s stack positions out of 5000i,500n,10000p,200000b,50000s Output written on re-main.pdf (2 pages, 132063 bytes). Output written on re-main.pdf (3 pages, 152136 bytes). PDF statistics: 49 PDF objects out of 1000 (max. 8388607) 56 PDF objects out of 1000 (max. 8388607) 0 named destinations out of 1000 (max. 500000) 1 words of extra memory for PDF output out of 10000 (max. 10000000)
• ## docs/Working/re/re-main.tex

 r3124 \usepackage[utf8]{inputenc} \def \Bitstream{Bit Stream} \def \bitstream{bit stream} %opening \title{Fast Regular Expression Matching using Parallel \Bitstream{}s} \title{Fast Regular Expression Matching using Parallel Bitstreams} \author{ {Robert D. Cameron} \\ \begin{abstract} A data parallel regular expression matching method using the concept of bitstream technology is introduced and studied in application to the problem of fast regular expression matching. A parallel regular expression matching method is introduced and studied in application to the problem of online pattern matching. The method is based on the concept of parallel \bitstream{} technology, in which parallel streams of bits are formed such bitstream technology, in which parallel streams of bits are formed such that each stream comprises bits in one-to-one correspondence with the character code units of a source data stream. %\input{introduction.tex} Regular expresssion matching is an extensively studied problem with a multitude of algorithms and software tools developed to the demands of particular problem contexts. Regular expresssion matching is an extensively studied problem with application to numerous application domains. A multitude of algorithms and software tools have been developed to the address the particular demands of the various application domains. Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be stated as follows. Find all the text positions of T that start an occurrence of P. Alternatively, one may want all the final positions of occurrences. Some applications require slightly different output such as the line that matches the pattern. The pattern matching problem can be stated as follows. Given a text T$_{1..n}$ of n characters and a pattern P, find all the text positions of T that start an occurrence of P. Alternatively, one may want all the final positions of occurrences. Some applications require slightly different output such as the line that matches the pattern. The pattern P can be just a simple string, but it can also be, for example, a regular expression. A regular expression, or pattern, is an expression that specifies a set of strings. A regular expression is composed of (i) simple strings (ii) the empty or (ii) A pattern P can be a simple string, but it can also be, a regular expression. A regular expression, is an expression that specifies a set of strings. A regular expression is composed of (i) simple strings and (ii) the union, concatenation and Kleene closure of other regular expressions. To avoid parentheses it is assumed that the Kleene star has the highest priority, classical texts such as \cite{aho2007}. Regular expression matching is commonly performed using a variety of publically available software tools. The most prominent, UNIX grep, Gnu grep, agrep, cgrep, nrgrep, and Perl regular expressions \cite{Abou-assaleh04surveyof}. Regular expression matching is commonly performed using a wide variety of publically available software tools for on-line pattern matching. For instance, UNIX grep, Gnu grep, agrep, cgrep, nrgrep, and Perl regular expressions \cite{abou-assaleh2004}. Amongst these tools Gnu grep (egrep), agrep, and nrgrep are widely known and considered as the fastest regular expression matching tools in practice \cite{navarro2000}. and are of particular interest to this study. Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as the fastest regular expression matching tools in practice \cite{}. Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep. % simple patterns, extended patterns, regular expressions % motivation / previous work Although the finite state machine methods used in the scanning and parsing of Although tradi finite state machine methods used in the scanning and parsing of text streams is considered to be the hardest of the â13 dwarvesâ to parallelize [1], parallel bitstream technology shows considerable promise for these types of We further increase the parallelism in our methods by introducing a new parallel scanning primitive which we have coined 'Match Star' that returns all matches in a single operation and eliminates the need for back tracking ... (ELABORATE) scanning primitive which we have coined Match Star. Match Star returns all matches in a single operation and eliminates backtracking when a partially successful search path fails. --- STATE the content of the next sections. --- The remainder of this paper is organized as follows. Section~\ref{Background} presents background material on classic regular expression pattern matching techniques and provides insight into the efficiency of traditional regular expression software tools. We compare the performance of our parallel \bitstream{} techniques against various grep concentrate on the simpler case of reporting initial or final occurrence positions. Section~\ref{Parallel Bitwise Data Streams} describes out data parallel regular expression matching techniques. Section~\ref{Compiler Technology} Section~\ref{Methodology} describes the evaluation framework and Section~\ref{Experimental Results} presents a detailed performance analysis of our data parallel bitstream techniques against Gnu grep, agrep, and nr-grep. Section~\ref{Conclusion} concludes the paper. \section{Background} Historically, the origins of regular expression matching date back to automata theory and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}. and formal language theory developed by Kleene in the 1950s \cite{kleene1951}. In 1959, Dana and Scott demonstrated that DFA. A DFA has only a single active state and allows to search the text at O(n) worst-case optimal. The problem with this approach is that the DFA may have O(2^m) states. may have O($2^m$) states. In general, the general process is first to build a and finally simulate the DFA on text input. \section{Parallel Bitwise Data Streams} \label{Parallel Bitwise Data Streams} \section{Compiler Technology} \label{Compiler Technology} \section{Methodology} %\input{methodology.tex} We compare the performance of our parallel \bitstream{} techniques against Gnu grep, agrep, and nr-grep. Given a regular expression R and a test T the regular expression matching problem finds all ending position of substrings in Q that matches a string in the language denoted by R. The behaviour of Gnu grep, agrep, and nr-grep are differ in that Gnu grep agrep nr-grep \section{Experimental Results} \label{results} \label{Experimental Results} %\input{results.tex} \section{Conclusion and Future Work} \label{conclusion} \section{Conclusion} \label{Conclusion} %\input{conclusion.tex}
• ## docs/Working/re/re-main.tex.backup

 r3124 \begin{abstract} A data parallel regular expression matching method using the concept of bitstream technology is introduced and studied in application to the problem of fast regular expression matching. A parallel regular expression matching method is introduced and studied in application to the problem of online pattern matching. The method is based on the concept of parallel %\input{introduction.tex} % parallel bitstream technology, parallelization, regular expressions Regular expresssion matching is an extensively studied problem with application to numerous application domains. A multitude of algorithms and software tools have been developed to the address the particular demands of the various application domains. The pattern matching problem can be stated as follows. Given a text T$_{1..n}$ of n characters and a pattern P, find all the text positions of T that start an occurrence of P. Alternatively, one may want all the final positions of occurrences. Some applications require slightly different output such as the line that matches the pattern. A pattern P can be a simple string, but it can also be, a regular expression. A regular expression, is an expression that specifies a set of strings. A regular expression is composed of (i) simple strings and (ii) the union, concatenation and Kleene closure of other regular expressions. To avoid parentheses it is assumed that the Kleene star has the highest priority, next concatenation and then alternation, however, most formalisms provides grouping operators to allow the definition of scope and operator precedence. Readers unfamiliar with the concept of regular expression matching are referred classical texts such as \cite{aho2007}. Regular expression matching is commonly performed using a wide variety of publically available software tools for on-line pattern matching. For instance, UNIX grep, Gnu grep, agrep, cgrep, nrgrep, and Perl regular expressions \cite{abou-assaleh2004}. Amongst these tools Gnu grep (egrep), agrep, and nrgrep are widely known and considered as the fastest regular expression matching tools in practice \cite{navarro2000}. and are of particular interest to this study. % simple patterns, extended patterns, regular expressions % motivation / previous work Although the finite state machine methods used in the scanning and parsing of Although tradi finite state machine methods used in the scanning and parsing of text streams is considered to be the hardest of the â13 dwarvesâ to parallelize [1], parallel bitstream technology shows considerable promise for these types of We further increase the parallelism in our methods by introducing a new parallel scanning primitive which we have coined 'Match Star' that returns all matches in a single operation and eliminates the need for back tracking ... (ELABORATE) scanning primitive which we have coined Match Star. Match Star returns all matches in a single operation and eliminates backtracking when a partially successful search path fails. The remainder of this paper is organized as follows. Given a text T$_{1..n}$ of n characters and a pattern P, the pattern matching problem can be stated as follows. Find all the text positions of T that start an occurrence of P. Alternatively, one may want all the final positions of occurrences. Some applications require slightly different output such as the line that matches the pattern. Section~\ref{Background} presents background material on classic regular expression pattern matching techniques and provides insight into the efficiency of traditional regular expression software tools. The pattern P can be just a simple string, but it can also be, for example, a regular expression. Section~\ref{Bitwise Parallel Data Streams} describes out data parallel regular expression matching techniques. A regular expression, or pattern, is an expression that specifies a set of strings. A regular expression is composed of (i) simple strings (ii) the empty or (ii) union, concatenation and Kleene closure of other regular expressions. To avoid parentheses it is assumed that the Kleene star has the highest priority, next concatenation and then alternation, however, most formalisms provides grouping operators to allow the definition of scope and operator precedence. Readers unfamiliar with the concept of regular expression matching are referred classical texts such as \cite{aho2007}. Section~\ref{Compiler Technology} Regular expression matching is commonly performed using a variety of publically available software tools. The most prominent, UNIX grep, Gnu grep, agrep, cgrep, nrgrep, and Perl regular expressions \cite{Abou-assaleh04surveyof}. Section~\ref{Methodology} describes the evaluation framework and Section~\ref{Experimental Results} presents a detailed performance analysis of our data parallel \bitstream{} techniques against Gnu grep, agrep, and nr-grep. Amongst these Gnu grep, agrep, and nrgrep are widely known and considered as the fastest regular expression matching tools in practice \cite{}. Of particular interest to this study are the performance oriented Gnu grep, agrep, and nrgrep. % Unix grep % Gnu grep % agrep % nrgrep Regular expresssion matching is an extensively studied problem with a multitude of algorithms and software tools developed to the demands of particular problem contexts. As such, we compare the performance of our parallel \bitstream{} techniques against various grep concentrate on the simpler case of reporting initial or final occurrence positions. Section~\ref{conclusion} concludes the paper. \section{Background} Historically, the origins of regular expression matching date back to automata theory and formal language theory developed by Kleene in the 1950s \cite{kleene1951representation}. and formal language theory developed by Kleene in the 1950s \cite{kleene1951}. In 1959, Dana and Scott demonstrated that \section{Parallel Bitwise Data Streams} \label{Parallel Bitwise Data Streams} \section{Compiler Technology} \label{Compiler Technology} \section{Methodology} \label{Methodology} %\input{methodology.tex} We compare the performance of our parallel \bitstream{} techniques against Gnu grep, agrep, and nr-grep. Given a regular expression R and a test T the regular expression matching problem finds all ending position of substrings in Q that matches a string in the language denoted by R. The behaviour of Gnu grep, agrep, and nr-grep are differ in that Gnu grep agrep nr-grep \section{Experimental Results}
• ## docs/Working/re/reference.bib

 r3123 } @article{kleene1951representation, @article{kleene1951, title={Representation of events in nerve nets and finite automata}, author={Kleene, Stephen Cole}, } @TECHREPORT{Abou-assaleh04surveyof, @TECHREPORT{abou-assaleh2004, author = {Tony Abou-assaleh and Wei Ai}, title = {Survey of global regular expression print (GREP) tools}, pages={419-422} } @ARTICLE{navarro2000, author = {Gonzalo Navarro}, title = {NR-grep: A Fast and Flexible Pattern Matching Tool}, journal = {Software Practice and Experience (SPE}, year = {2000}, volume = {31}, pages = {2001} }
Note: See TracChangeset for help on using the changeset viewer.