Changeset 3491 for docs/Working


Ignore:
Timestamp:
Sep 15, 2013, 11:09:20 AM (6 years ago)
Author:
cameron
Message:

Update MatchStar? description and diagram

Location:
docs/Working/re
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/ppopp-re.tex

    r3490 r3491  
    301301manner.
    302302
     303\begin{figure}[tbh]
     304\begin{center}
     305\begin{tabular}{cr}\\
     306input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
     307$B_7$ & \verb`.................................`\\
     308$B_6$ & \verb`1....1..1..1...11...1.1..111..1..`\\
     309$B_5$ & \verb`111111111111111111111111111111111`\\
     310$B_4$ & \verb`.11111...111....1....11.....111..`\\
     311$B_3$ & \verb`......11..11111.1111...11.....111`\\
     312$B_2$ & \verb`.11.1.11....111..111.1.11......11`\\
     313$B_1$ & \verb`...1....11.1....1........11.111..`\\
     314$B_0$ & \verb`1.11.111..1.1111.1111.111.11...11`\\
     315\verb:[a]: & \verb`1..............1....1......1.....`\\
     316\verb:[z]: & \verb`...........1....1.............1..`\\
     317\verb:[0-9]: & \verb`.1111....11..........1......11...`
     318\end{tabular}
     319
     320\end{center}
     321\caption{Basis and Character Class Streams}
     322\label{fig:streams}
     323\end{figure}
     324
    303325A key concept in this streaming approach is the derivation of bit streams
    304326that are parallel to the input data stream, i.e., in one-to-one
     
    322344\verb:[0-9]:
    323345that may be computed from the basis bit streams using bitwise logic.
     346Zero bits are marked with periods ({\tt .}) so that the one bits stand out.
    324347Transposition and character class construction are straightforward
    325348using the Parabix tool chain \cite{lin2012parabix}.
     
    329352\begin{tabular}{cr}\\
    330353input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
    331 $B_7$ & \verb`.................................`\\
    332 $B_6$ & \verb`1....1..1..1...11...1.1..111..1..`\\
    333 $B_5$ & \verb`111111111111111111111111111111111`\\
    334 $B_4$ & \verb`.11111...111....1....11.....111..`\\
    335 $B_3$ & \verb`......11..11111.1111...11.....111`\\
    336 $B_2$ & \verb`.11.1.11....111..111.1.11......11`\\
    337 $B_1$ & \verb`...1....11.1....1........11.111..`\\
    338 $B_0$ & \verb`1.11.111..1.1111.1111.111.11...11`\\
    339 \verb:[a]: & \verb`1..............1....1......1.....`\\
    340 \verb:[z]: & \verb`...........1....1.............1..`\\
    341 \verb:[0-9]: & \verb`.1111....11..........1......11...`
     354$M_1$ & \verb`.1..............1....1......1....`\\
     355$M_2$ & \verb`.11111..........1....11.....111..`\\
     356$M_3$ & \verb`.................1.............1.`
    342357\end{tabular}
    343358
    344359\end{center}
    345 \caption{Basis and Character Class Streams}
    346 \label{fig:streams}
    347 \end{figure}
    348 
     360\caption{Marker Streams in Matching {\tt a[0-9]*z}}
     361\label{fig:streams2}
     362\end{figure}
    349363
    350364\paragraph*{Marker Streams.}  Now consider how bit-parallel data
     
    363377marking positions after a final \verb:z: has been found.
    364378Without describing the details of how these streams are computed
    365 for the time being, Figure \ref{fig:streams} shows what each
     379for the time being, Figure \ref{fig:streams2} shows what each
    366380of these streams should be for our example matching problem.
    367381Note our convention that a marker stream contains a 1 bit
     
    373387MatchStar takes a marker bitstream and a character class bitstream as input.  It returns all positions that can be reached by advancing the marker bitstream zero or more times through the character class bitstream.
    374388
    375 Figure \ref{fig:matchstar} illustrates the MatchStar method.  The second and third rows are the input bitstreams: the initial marker position bitstream and the character class bitstream derived from the source data.
    376 
    377 In the first operation ($T_0$), marker positions that cannot be advanced are temporarily removed from consideration by masking off marker positions that aren't character class positions using bitwise logic.  Next, the temporary marker bitstream is added to the character class bitstream.  $T_1$ has 1s in three types of positions.  There will be a 1 immediately following a block of character class positions that spanned one or more marker positions, at any character class positions that weren't affected by the addition (and are not part of the desired output), and at any marker position that wasn't the first in its block of character class positions.  Any character class positions that have a 0 in $T_1$ were affected by the addition and are part of the desired output.  These positions are obtained and the undesired 1 bits are removed by XORing with the character class stream. $T_2$ is now only missing marker positions that were removed in the first step as well as marker positions that were 1s in $T_1$.  The
     389\begin{figure}[tbh]
     390\begin{center}
     391\begin{tabular}{cr}\\
     392input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
     393$M_1$ & \verb`.1..............1....1......1....`\\
     394$D = \text{\tt [0-9]}$ & \verb`.1111....11..........1......11...`\\
     395$T_0 = M_1 \wedge D$ & \verb`.1...................1......1....`\\
     396$T_1 = T_0 + D$ & \verb`.....1...11...........1.......1..`\\
     397$T_2 = T_1 \oplus D$ & \verb`.11111...............11.....111..`\\
     398$M_2 = T_2 \, | \, M_1$ & \verb`.11111..........1....11.....111..`
     399\end{tabular}
     400
     401\end{center}
     402\caption{$M_2 = \text{MatchStar}(M_1, D)$}
     403\label{fig:matchstar}
     404\end{figure}
     405
     406
     407Figure \ref{fig:matchstar} illustrates the MatchStar method.  The second and third rows are the input bitstreams: the initial marker position bitstream and the character class bitstream derived from the source data. 
     408
     409It is important to note that our bitstreams are shown in natural left-to-right order reflecting the
     410conventional presentation of our character data input.   However, this reverses the normal
     411order of presentation when considering bitstreams as numeric values.  The key point here is
     412that when we perform bitstream addition, we will show bit movement from left-to-right.
     413That $\verb:111.: + \verb:1...: = \verb:...1:$
     414
     415In the first operation ($T_0$), marker positions that cannot be advanced are temporarily removed from consideration by masking off marker positions that aren't character class positions using bitwise logic.  Next, the temporary marker bitstream is added to the character class bitstream. 
     416The addition produces 1s in three types of positions.  There will be a 1 immediately following a block of character class positions that spanned one or more marker positions, at any character class positions that weren't affected by the addition (and are not part of the desired output), and at any marker position that wasn't the first in its block of character class positions.  Any character class positions that have a 0 in $T_1$ were affected by the addition and are part of the desired output.  These positions are obtained and the undesired 1 bits are removed by XORing with the character class stream. $T_2$ is now only missing marker positions that were removed in the first step as well as marker positions that were 1s in $T_1$.  The
    378417output marker stream is obtained by ORing $T_2$ with the initial marker stream.
    379 
    380 \begin{figure*}[tbh]
    381 \begin{center}
    382 \begin{tabular}{cr}\\
    383 source data & \verb`--142578---125-231-----127--5394---94761205-`\\
    384 $M_0$ & \verb`.......1......1..1..1...1.............1..1..`\\
    385 $D = $\verb:[0-9]: & \verb`..111111...111.111.....111..1111...11111111.`\\
    386 $T_0 = M_0 \wedge D$ & \verb`.......1.........1......1.............1..1..`\\
    387 $T_1 = T_0 + D$ & \verb`.1.........1111.......1..1..1111..1...1...1.`\\
    388 $T_2 = T_1 \oplus D$ & \verb`.1111111......1111....111.........1111.111..`\\
    389 $M_1 = T_2 \, | \, M_0$ & \verb`.1111111......1111..1.111.........11111111..`
    390 \end{tabular}
    391 
    392 \end{center}
    393 \caption{Match Star}
    394 \label{fig:matchstar1}
    395 \end{figure*}
    396 
    397418
    398419In general, given a marker stream $M$ and a character class stream $C$,
  • docs/Working/re/scripts/pablo.py

    r3488 r3491  
    124124
    125125
     126def MatchStar(m, c):
     127        return (((m & c) + c) ^ c) | m
    126128#
    127129# Advance-and-Scan
  • docs/Working/re/scripts/streams.py

    r3490 r3491  
    9999
    100100
     101def latex_streams2(chardata):
     102        lgth = len(chardata)
     103        basis_bits = Basis_bits()
     104        lex = Lex()
     105        pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits)
     106        do_lex(basis_bits, lex)
     107        M1 = pablo.Advance(lex.a)
     108        M2 = pablo.MatchStar(M1, lex.digit)
     109        M3 = pablo.Advance(M2 & lex.z)
     110        return pablo.latex_streams([('input data ', chardata),
     111                              ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)),
     112                              ('$M_2$', pablo.bitstream2string(M2, lgth, zero_ch)),
     113                              ('$M_3$', pablo.bitstream2string(M3, lgth, zero_ch))])
     114
     115def latex_streams3(chardata):
     116        lgth = len(chardata)
     117        basis_bits = Basis_bits()
     118        lex = Lex()
     119        pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits)
     120        do_lex(basis_bits, lex)
     121        M1 = pablo.Advance(lex.a)
     122        T0 = M1 & lex.digit
     123        T1 = T0 + lex.digit
     124        T2 = T1 ^ lex.digit
     125        #M2 = pablo.MatchStar(m1, lex.digit)
     126        M2 = T2 | M1
     127        return pablo.latex_streams([('input data ', chardata),
     128                              ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)),
     129                              ('$D = \\text{\\tt [0-9]}$', pablo.bitstream2string(lex.digit, lgth, zero_ch)),
     130                              ('$T_0 = M_1 \wedge D$', pablo.bitstream2string(T0, lgth, zero_ch)),
     131                              ('$T_1 = T_0 + D$', pablo.bitstream2string(T1, lgth, zero_ch)),
     132                              ('$T_2 = T_1 \oplus D$', pablo.bitstream2string(T2, lgth, zero_ch)),
     133                              ('$M_2 = T_2 \, | \, M_1$', pablo.bitstream2string(M2, lgth, zero_ch))])
     134
     135
     136
    101137
    102138if __name__ == "__main__":
    103139        print latex_streams1(r"""a4534q--b29z---az---a4q--bca22z--""")
     140        print latex_streams2(r"""a4534q--b29z---az---a4q--bca22z--""")
     141        print latex_streams3(r"""a4534q--b29z---az---a4q--bca22z--""")
    104142
    105143
Note: See TracChangeset for help on using the changeset viewer.