# Changeset 3491

Ignore:
Timestamp:
Sep 15, 2013, 11:09:20 AM (6 years ago)
Message:

Update MatchStar? description and diagram

Location:
docs/Working/re
Files:
4 edited

### Legend:

Unmodified
 r3490 manner. \begin{figure}[tbh] \begin{center} \begin{tabular}{cr}\\ input data  & \verba4534q--b29z---az---a4q--bca22z--\\ $B_7$ & \verb.................................\\ $B_6$ & \verb1....1..1..1...11...1.1..111..1..\\ $B_5$ & \verb111111111111111111111111111111111\\ $B_4$ & \verb.11111...111....1....11.....111..\\ $B_3$ & \verb......11..11111.1111...11.....111\\ $B_2$ & \verb.11.1.11....111..111.1.11......11\\ $B_1$ & \verb...1....11.1....1........11.111..\\ $B_0$ & \verb1.11.111..1.1111.1111.111.11...11\\ \verb:[a]: & \verb1..............1....1......1.....\\ \verb:[z]: & \verb...........1....1.............1..\\ \verb:[0-9]: & \verb.1111....11..........1......11... \end{tabular} \end{center} \caption{Basis and Character Class Streams} \label{fig:streams} \end{figure} A key concept in this streaming approach is the derivation of bit streams that are parallel to the input data stream, i.e., in one-to-one \verb:[0-9]: that may be computed from the basis bit streams using bitwise logic. Zero bits are marked with periods ({\tt .}) so that the one bits stand out. Transposition and character class construction are straightforward using the Parabix tool chain \cite{lin2012parabix}. \begin{tabular}{cr}\\ input data  & \verba4534q--b29z---az---a4q--bca22z--\\ $B_7$ & \verb.................................\\ $B_6$ & \verb1....1..1..1...11...1.1..111..1..\\ $B_5$ & \verb111111111111111111111111111111111\\ $B_4$ & \verb.11111...111....1....11.....111..\\ $B_3$ & \verb......11..11111.1111...11.....111\\ $B_2$ & \verb.11.1.11....111..111.1.11......11\\ $B_1$ & \verb...1....11.1....1........11.111..\\ $B_0$ & \verb1.11.111..1.1111.1111.111.11...11\\ \verb:[a]: & \verb1..............1....1......1.....\\ \verb:[z]: & \verb...........1....1.............1..\\ \verb:[0-9]: & \verb.1111....11..........1......11... $M_1$ & \verb.1..............1....1......1....\\ $M_2$ & \verb.11111..........1....11.....111..\\ $M_3$ & \verb.................1.............1. \end{tabular} \end{center} \caption{Basis and Character Class Streams} \label{fig:streams} \end{figure} \caption{Marker Streams in Matching {\tt a[0-9]*z}} \label{fig:streams2} \end{figure} \paragraph*{Marker Streams.}  Now consider how bit-parallel data marking positions after a final \verb:z: has been found. Without describing the details of how these streams are computed for the time being, Figure \ref{fig:streams} shows what each for the time being, Figure \ref{fig:streams2} shows what each of these streams should be for our example matching problem. Note our convention that a marker stream contains a 1 bit MatchStar takes a marker bitstream and a character class bitstream as input.  It returns all positions that can be reached by advancing the marker bitstream zero or more times through the character class bitstream. Figure \ref{fig:matchstar} illustrates the MatchStar method.  The second and third rows are the input bitstreams: the initial marker position bitstream and the character class bitstream derived from the source data. In the first operation ($T_0$), marker positions that cannot be advanced are temporarily removed from consideration by masking off marker positions that aren't character class positions using bitwise logic.  Next, the temporary marker bitstream is added to the character class bitstream.  $T_1$ has 1s in three types of positions.  There will be a 1 immediately following a block of character class positions that spanned one or more marker positions, at any character class positions that weren't affected by the addition (and are not part of the desired output), and at any marker position that wasn't the first in its block of character class positions.  Any character class positions that have a 0 in $T_1$ were affected by the addition and are part of the desired output.  These positions are obtained and the undesired 1 bits are removed by XORing with the character class stream. $T_2$ is now only missing marker positions that were removed in the first step as well as marker positions that were 1s in $T_1$.  The \begin{figure}[tbh] \begin{center} \begin{tabular}{cr}\\ input data  & \verba4534q--b29z---az---a4q--bca22z--\\ $M_1$ & \verb.1..............1....1......1....\\ $D = \text{\tt [0-9]}$ & \verb.1111....11..........1......11...\\ $T_0 = M_1 \wedge D$ & \verb.1...................1......1....\\ $T_1 = T_0 + D$ & \verb.....1...11...........1.......1..\\ $T_2 = T_1 \oplus D$ & \verb.11111...............11.....111..\\ $M_2 = T_2 \, | \, M_1$ & \verb.11111..........1....11.....111.. \end{tabular} \end{center} \caption{$M_2 = \text{MatchStar}(M_1, D)$} \label{fig:matchstar} \end{figure} Figure \ref{fig:matchstar} illustrates the MatchStar method.  The second and third rows are the input bitstreams: the initial marker position bitstream and the character class bitstream derived from the source data. It is important to note that our bitstreams are shown in natural left-to-right order reflecting the conventional presentation of our character data input.   However, this reverses the normal order of presentation when considering bitstreams as numeric values.  The key point here is that when we perform bitstream addition, we will show bit movement from left-to-right. That $\verb:111.: + \verb:1...: = \verb:...1:$ In the first operation ($T_0$), marker positions that cannot be advanced are temporarily removed from consideration by masking off marker positions that aren't character class positions using bitwise logic.  Next, the temporary marker bitstream is added to the character class bitstream. The addition produces 1s in three types of positions.  There will be a 1 immediately following a block of character class positions that spanned one or more marker positions, at any character class positions that weren't affected by the addition (and are not part of the desired output), and at any marker position that wasn't the first in its block of character class positions.  Any character class positions that have a 0 in $T_1$ were affected by the addition and are part of the desired output.  These positions are obtained and the undesired 1 bits are removed by XORing with the character class stream. $T_2$ is now only missing marker positions that were removed in the first step as well as marker positions that were 1s in $T_1$.  The output marker stream is obtained by ORing $T_2$ with the initial marker stream. \begin{figure*}[tbh] \begin{center} \begin{tabular}{cr}\\ source data & \verb--142578---125-231-----127--5394---94761205-\\ $M_0$ & \verb.......1......1..1..1...1.............1..1..\\ $D =$\verb:[0-9]: & \verb..111111...111.111.....111..1111...11111111.\\ $T_0 = M_0 \wedge D$ & \verb.......1.........1......1.............1..1..\\ $T_1 = T_0 + D$ & \verb.1.........1111.......1..1..1111..1...1...1.\\ $T_2 = T_1 \oplus D$ & \verb.1111111......1111....111.........1111.111..\\ $M_1 = T_2 \, | \, M_0$ & \verb.1111111......1111..1.111.........11111111.. \end{tabular} \end{center} \caption{Match Star} \label{fig:matchstar1} \end{figure*} In general, given a marker stream $M$ and a character class stream $C$,
 r3490 def latex_streams2(chardata): lgth = len(chardata) basis_bits = Basis_bits() lex = Lex() pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits) do_lex(basis_bits, lex) M1 = pablo.Advance(lex.a) M2 = pablo.MatchStar(M1, lex.digit) M3 = pablo.Advance(M2 & lex.z) return pablo.latex_streams([('input data ', chardata), ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)), ('$M_2$', pablo.bitstream2string(M2, lgth, zero_ch)), ('$M_3$', pablo.bitstream2string(M3, lgth, zero_ch))]) def latex_streams3(chardata): lgth = len(chardata) basis_bits = Basis_bits() lex = Lex() pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits) do_lex(basis_bits, lex) M1 = pablo.Advance(lex.a) T0 = M1 & lex.digit T1 = T0 + lex.digit T2 = T1 ^ lex.digit #M2 = pablo.MatchStar(m1, lex.digit) M2 = T2 | M1 return pablo.latex_streams([('input data ', chardata), ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)), ('$D = \\text{\\tt [0-9]}$', pablo.bitstream2string(lex.digit, lgth, zero_ch)), ('$T_0 = M_1 \wedge D$', pablo.bitstream2string(T0, lgth, zero_ch)), ('$T_1 = T_0 + D$', pablo.bitstream2string(T1, lgth, zero_ch)), ('$T_2 = T_1 \oplus D$', pablo.bitstream2string(T2, lgth, zero_ch)), ('$M_2 = T_2 \, | \, M_1$', pablo.bitstream2string(M2, lgth, zero_ch))]) if __name__ == "__main__": print latex_streams1(r"""a4534q--b29z---az---a4q--bca22z--""") print latex_streams2(r"""a4534q--b29z---az---a4q--bca22z--""") print latex_streams3(r"""a4534q--b29z---az---a4q--bca22z--""")