Changeset 3636 for docs


Ignore:
Timestamp:
Feb 22, 2014, 3:37:03 PM (5 years ago)
Author:
cameron
Message:

Introduce ambiguity into MatchStar? examples

Location:
docs/Working/re
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • docs/Working/re/re-main.tex

    r3634 r3636  
    280280\begin{center}
    281281\begin{tabular}{cr}\\
    282 input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
     282input data  & \verb`a453z--b3z--az--a12949z--ca22z7--`\\
    283283$B_7$ & \verb`.................................`\\
    284 $B_6$ & \verb`1....1..1..1...11...1.1..111..1..`\\
     284$B_6$ & \verb`1...1..1.1..11..1.....1..11..1...`\\
    285285$B_5$ & \verb`111111111111111111111111111111111`\\
    286 $B_4$ & \verb`.11111...111....1....11.....111..`\\
    287 $B_3$ & \verb`......11..11111.1111...11.....111`\\
    288 $B_2$ & \verb`.11.1.11....111..111.1.11......11`\\
    289 $B_1$ & \verb`...1....11.1....1........11.111..`\\
    290 $B_0$ & \verb`1.11.111..1.1111.1111.111.11...11`\\
    291 \verb:[a]: & \verb`1..............1....1......1.....`\\
    292 \verb:[z]: & \verb`...........1....1.............1..`\\
    293 \verb:[0-9]: & \verb`.1111....11..........1......11...`
     286$B_4$ & \verb`.1111...11...1...111111....1111..`\\
     287$B_3$ & \verb`....111..111.111...1.1111....1.11`\\
     288$B_2$ & \verb`.11..11...11..11....1..11.....111`\\
     289$B_1$ & \verb`...11..111...1....1...1..1.1111..`\\
     290$B_0$ & \verb`1.11.11.1.111.1111.1.1.1111...111`\\
     291\verb:[a]: & \verb`1...........1...1.........1......`\\
     292\verb:[z9]: & \verb`....1....1...1.....1.11......1...`\\
     293\verb:[0-9]: & \verb`.111....1........11111.....11.1..`
    294294\end{tabular}
    295295
     
    317317transposed representation, and the character class bit streams
    318318\verb:[a]:,
    319 \verb:[z]:, and
     319\verb:[z9]:, and
    320320\verb:[0-9]:
    321321that may be computed from the basis bit streams using bitwise logic.
     
    327327\begin{center}
    328328\begin{tabular}{cr}\\
    329 input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
    330 $M_1$ & \verb`.1..............1....1......1....`\\
    331 $M_2$ & \verb`.11111..........1....11.....111..`\\
    332 $M_3$ & \verb`.................1.............1.`
     329input data  & \verb`a453z--b3z--az--a12949z--ca22z7--`\\
     330$M_1$ & \verb`.1...........1...1.........1.....`\\
     331$M_2$ & \verb`.1111........1...111111....111...`\\
     332$M_3$ & \verb`.....1........1.....1.11......1..`
    333333\end{tabular}
    334334
    335335\end{center}
    336 \caption{Marker Streams in Matching {\tt a[0-9]*z}}
     336\caption{Marker Streams in Matching {\tt a[0-9]*[z9]}}
    337337\label{fig:streams2}
    338338\end{figure}
     
    342342the problem of searching the input stream of Figure \ref{fig:streams}
    343343to finding occurrence of strings matching
    344 the regular expression \verb:a[0-9]*z:.
     344the regular expression \verb:a[0-9]*[z9]:. 
     345Note that this is an ambiguous regular expression, which could match
     346texts such as \verb:a12949z: in multiple ways.
    345347The matching process involves the concept of {\em marker streams}, that
    346348is streams that mark the positions of current matches during the
     
    351353reachable from positions marked by $M_1$ by further matching zero or
    352354more digits (\verb:[0-9]*:) and finally $M_3$ the stream
    353 marking positions after a final \verb:z: has been found.
     355marking positions after a final \verb:z: or \verb:9: has been found.
    354356Without describing the details of how these streams are computed
    355357for the time being, Figure \ref{fig:streams2} shows what each
    356358of these streams should be for our example matching problem.
    357 Note our convention that a marker stream contains a 1 bit
     359Our convention that a marker stream contains a 1 bit
    358360at the next character position to be matched, that is,
    359361immediately past the last position that was matched.
     362Note that all three matches from the third occurrence of \verb:a:
     363are correctly marked in $M_3$.
    360364
    361365
     
    366370\begin{center}
    367371\begin{tabular}{cr}\\
    368 input data  & \verb`a4534q--b29z---az---a4q--bca22z--`\\
    369 $M_1$ & \verb`.1..............1....1......1....`\\
    370 $D = \text{\tt [0-9]}$ & \verb`.1111....11..........1......11...`\\
    371 $T_0 = M_1 \wedge D$ & \verb`.1...................1......1....`\\
    372 $T_1 = T_0 + D$ & \verb`.....1...11...........1.......1..`\\
    373 $T_2 = T_1 \oplus D$ & \verb`.11111...............11.....111..`\\
    374 $M_2 = T_2 \, | \, M_1$ & \verb`.11111..........1....11.....111..`
     372input data  & \verb`a453z--b3z--az--a12949z--ca22z7--`\\
     373$M_1$ & \verb`.1...........1...1.........1.....`\\
     374$D = \text{\tt [0-9]}$ & \verb`.111....1........11111.....11.1..`\\
     375$T_0 = M_1 \wedge D$ & \verb`.1...............1.........1.....`\\
     376$T_1 = T_0 + D$ & \verb`....1...1.............1......11..`\\
     377$T_2 = T_1 \oplus D$ & \verb`.1111............111111....111...`\\
     378$M_2 = T_2 \, | \, M_1$ & \verb`.1111........1...111111....111...`
    375379\end{tabular}
    376380
     
    402406all possible positions that can be reached by 0 or more occurrences
    403407of characters in class $C$ from each position in $M$. 
     408
     409MatchStar differs from ScanThru of the Parabix tool chain in that it
     410finds all matches, not just the longest match.  This is necessary
     411for general matching involving possibly ambiguous regular
     412expressions.   
    404413
    405414\input{compilation}
  • docs/Working/re/scripts/ccinput.txt

    r3488 r3636  
    44
    55lex.a = [a]
    6 lex.z = [z]
     6lex.z9 = [z9]
  • docs/Working/re/scripts/streams.py

    r3491 r3636  
    7676        temp24 = (basis_bits.bit_6 &~ basis_bits.bit_7)
    7777        temp25 = (temp23 & temp24)
    78         lex.z = (temp22 & temp25)
     78        temp26 = (temp22 & temp25)
     79        temp27 = (temp23 & temp20)
     80        temp28 = (temp9 & temp27)
     81        lex.z9 = (temp26 | temp28)
    7982
    8083
     
    9598                              ('$B_0$', pablo.bitstream2string(basis_bits.bit_7, lgth, zero_ch)),
    9699                              ('\\verb:[a]:', pablo.bitstream2string(lex.a, lgth, zero_ch)),
    97                               ('\\verb:[z]:', pablo.bitstream2string(lex.z, lgth, zero_ch)),
     100                              ('\\verb:[z9]:', pablo.bitstream2string(lex.z9, lgth, zero_ch)),
    98101                              ('\\verb:[0-9]:', pablo.bitstream2string(lex.digit, lgth, zero_ch))])
    99102
     
    107110        M1 = pablo.Advance(lex.a)
    108111        M2 = pablo.MatchStar(M1, lex.digit)
    109         M3 = pablo.Advance(M2 & lex.z)
     112        M3 = pablo.Advance(M2 & lex.z9)
    110113        return pablo.latex_streams([('input data ', chardata),
    111114                              ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)),
     
    137140
    138141if __name__ == "__main__":
    139         print latex_streams1(r"""a4534q--b29z---az---a4q--bca22z--""")
    140         print latex_streams2(r"""a4534q--b29z---az---a4q--bca22z--""")
    141         print latex_streams3(r"""a4534q--b29z---az---a4q--bca22z--""")
     142        #data = r"""a4534q--b29z---az---a4q--bca22z--"""
     143        data = r"""a453z--b3z--az--a12949z--ca22z7--"""
     144        print latex_streams1(data)
     145        print latex_streams2(data)
     146        print latex_streams3(data)
    142147
    143148
Note: See TracChangeset for help on using the changeset viewer.