- Timestamp:
- Sep 15, 2013, 11:09:20 AM (5 years ago)
- Location:
- docs/Working/re
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
docs/Working/re/ppopp-re.tex
r3490 r3491 301 301 manner. 302 302 303 \begin{figure}[tbh] 304 \begin{center} 305 \begin{tabular}{cr}\\ 306 input data & \verb`a4534q--b29z---az---a4q--bca22z--`\\ 307 $B_7$ & \verb`.................................`\\ 308 $B_6$ & \verb`1....1..1..1...11...1.1..111..1..`\\ 309 $B_5$ & \verb`111111111111111111111111111111111`\\ 310 $B_4$ & \verb`.11111...111....1....11.....111..`\\ 311 $B_3$ & \verb`......11..11111.1111...11.....111`\\ 312 $B_2$ & \verb`.11.1.11....111..111.1.11......11`\\ 313 $B_1$ & \verb`...1....11.1....1........11.111..`\\ 314 $B_0$ & \verb`1.11.111..1.1111.1111.111.11...11`\\ 315 \verb:[a]: & \verb`1..............1....1......1.....`\\ 316 \verb:[z]: & \verb`...........1....1.............1..`\\ 317 \verb:[0-9]: & \verb`.1111....11..........1......11...` 318 \end{tabular} 319 320 \end{center} 321 \caption{Basis and Character Class Streams} 322 \label{fig:streams} 323 \end{figure} 324 303 325 A key concept in this streaming approach is the derivation of bit streams 304 326 that are parallel to the input data stream, i.e., in one-to-one … … 322 344 \verb:[0-9]: 323 345 that may be computed from the basis bit streams using bitwise logic. 346 Zero bits are marked with periods ({\tt .}) so that the one bits stand out. 324 347 Transposition and character class construction are straightforward 325 348 using the Parabix tool chain \cite{lin2012parabix}. … … 329 352 \begin{tabular}{cr}\\ 330 353 input data & \verb`a4534q--b29z---az---a4q--bca22z--`\\ 331 $B_7$ & \verb`.................................`\\ 332 $B_6$ & \verb`1....1..1..1...11...1.1..111..1..`\\ 333 $B_5$ & \verb`111111111111111111111111111111111`\\ 334 $B_4$ & \verb`.11111...111....1....11.....111..`\\ 335 $B_3$ & \verb`......11..11111.1111...11.....111`\\ 336 $B_2$ & \verb`.11.1.11....111..111.1.11......11`\\ 337 $B_1$ & \verb`...1....11.1....1........11.111..`\\ 338 $B_0$ & \verb`1.11.111..1.1111.1111.111.11...11`\\ 339 \verb:[a]: & \verb`1..............1....1......1.....`\\ 340 \verb:[z]: & \verb`...........1....1.............1..`\\ 341 \verb:[0-9]: & \verb`.1111....11..........1......11...` 354 $M_1$ & \verb`.1..............1....1......1....`\\ 355 $M_2$ & \verb`.11111..........1....11.....111..`\\ 356 $M_3$ & \verb`.................1.............1.` 342 357 \end{tabular} 343 358 344 359 \end{center} 345 \caption{Basis and Character Class Streams} 346 \label{fig:streams} 347 \end{figure} 348 360 \caption{Marker Streams in Matching {\tt a[0-9]*z}} 361 \label{fig:streams2} 362 \end{figure} 349 363 350 364 \paragraph*{Marker Streams.} Now consider how bit-parallel data … … 363 377 marking positions after a final \verb:z: has been found. 364 378 Without describing the details of how these streams are computed 365 for the time being, Figure \ref{fig:streams } shows what each379 for the time being, Figure \ref{fig:streams2} shows what each 366 380 of these streams should be for our example matching problem. 367 381 Note our convention that a marker stream contains a 1 bit … … 373 387 MatchStar takes a marker bitstream and a character class bitstream as input. It returns all positions that can be reached by advancing the marker bitstream zero or more times through the character class bitstream. 374 388 375 Figure \ref{fig:matchstar} illustrates the MatchStar method. The second and third rows are the input bitstreams: the initial marker position bitstream and the character class bitstream derived from the source data. 376 377 In the first operation ($T_0$), marker positions that cannot be advanced are temporarily removed from consideration by masking off marker positions that aren't character class positions using bitwise logic. Next, the temporary marker bitstream is added to the character class bitstream. $T_1$ has 1s in three types of positions. There will be a 1 immediately following a block of character class positions that spanned one or more marker positions, at any character class positions that weren't affected by the addition (and are not part of the desired output), and at any marker position that wasn't the first in its block of character class positions. Any character class positions that have a 0 in $T_1$ were affected by the addition and are part of the desired output. These positions are obtained and the undesired 1 bits are removed by XORing with the character class stream. $T_2$ is now only missing marker positions that were removed in the first step as well as marker positions that were 1s in $T_1$. The 389 \begin{figure}[tbh] 390 \begin{center} 391 \begin{tabular}{cr}\\ 392 input data & \verb`a4534q--b29z---az---a4q--bca22z--`\\ 393 $M_1$ & \verb`.1..............1....1......1....`\\ 394 $D = \text{\tt [0-9]}$ & \verb`.1111....11..........1......11...`\\ 395 $T_0 = M_1 \wedge D$ & \verb`.1...................1......1....`\\ 396 $T_1 = T_0 + D$ & \verb`.....1...11...........1.......1..`\\ 397 $T_2 = T_1 \oplus D$ & \verb`.11111...............11.....111..`\\ 398 $M_2 = T_2 \, | \, M_1$ & \verb`.11111..........1....11.....111..` 399 \end{tabular} 400 401 \end{center} 402 \caption{$M_2 = \text{MatchStar}(M_1, D)$} 403 \label{fig:matchstar} 404 \end{figure} 405 406 407 Figure \ref{fig:matchstar} illustrates the MatchStar method. The second and third rows are the input bitstreams: the initial marker position bitstream and the character class bitstream derived from the source data. 408 409 It is important to note that our bitstreams are shown in natural left-to-right order reflecting the 410 conventional presentation of our character data input. However, this reverses the normal 411 order of presentation when considering bitstreams as numeric values. The key point here is 412 that when we perform bitstream addition, we will show bit movement from left-to-right. 413 That $\verb:111.: + \verb:1...: = \verb:...1:$ 414 415 In the first operation ($T_0$), marker positions that cannot be advanced are temporarily removed from consideration by masking off marker positions that aren't character class positions using bitwise logic. Next, the temporary marker bitstream is added to the character class bitstream. 416 The addition produces 1s in three types of positions. There will be a 1 immediately following a block of character class positions that spanned one or more marker positions, at any character class positions that weren't affected by the addition (and are not part of the desired output), and at any marker position that wasn't the first in its block of character class positions. Any character class positions that have a 0 in $T_1$ were affected by the addition and are part of the desired output. These positions are obtained and the undesired 1 bits are removed by XORing with the character class stream. $T_2$ is now only missing marker positions that were removed in the first step as well as marker positions that were 1s in $T_1$. The 378 417 output marker stream is obtained by ORing $T_2$ with the initial marker stream. 379 380 \begin{figure*}[tbh]381 \begin{center}382 \begin{tabular}{cr}\\383 source data & \verb`--142578---125-231-----127--5394---94761205-`\\384 $M_0$ & \verb`.......1......1..1..1...1.............1..1..`\\385 $D = $\verb:[0-9]: & \verb`..111111...111.111.....111..1111...11111111.`\\386 $T_0 = M_0 \wedge D$ & \verb`.......1.........1......1.............1..1..`\\387 $T_1 = T_0 + D$ & \verb`.1.........1111.......1..1..1111..1...1...1.`\\388 $T_2 = T_1 \oplus D$ & \verb`.1111111......1111....111.........1111.111..`\\389 $M_1 = T_2 \, | \, M_0$ & \verb`.1111111......1111..1.111.........11111111..`390 \end{tabular}391 392 \end{center}393 \caption{Match Star}394 \label{fig:matchstar1}395 \end{figure*}396 397 418 398 419 In general, given a marker stream $M$ and a character class stream $C$, -
docs/Working/re/scripts/pablo.py
r3488 r3491 124 124 125 125 126 def MatchStar(m, c): 127 return (((m & c) + c) ^ c) | m 126 128 # 127 129 # Advance-and-Scan -
docs/Working/re/scripts/streams.py
r3490 r3491 99 99 100 100 101 def latex_streams2(chardata): 102 lgth = len(chardata) 103 basis_bits = Basis_bits() 104 lex = Lex() 105 pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits) 106 do_lex(basis_bits, lex) 107 M1 = pablo.Advance(lex.a) 108 M2 = pablo.MatchStar(M1, lex.digit) 109 M3 = pablo.Advance(M2 & lex.z) 110 return pablo.latex_streams([('input data ', chardata), 111 ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)), 112 ('$M_2$', pablo.bitstream2string(M2, lgth, zero_ch)), 113 ('$M_3$', pablo.bitstream2string(M3, lgth, zero_ch))]) 114 115 def latex_streams3(chardata): 116 lgth = len(chardata) 117 basis_bits = Basis_bits() 118 lex = Lex() 119 pablo.EOF_mask = pablo.transpose_streams(chardata, basis_bits) 120 do_lex(basis_bits, lex) 121 M1 = pablo.Advance(lex.a) 122 T0 = M1 & lex.digit 123 T1 = T0 + lex.digit 124 T2 = T1 ^ lex.digit 125 #M2 = pablo.MatchStar(m1, lex.digit) 126 M2 = T2 | M1 127 return pablo.latex_streams([('input data ', chardata), 128 ('$M_1$', pablo.bitstream2string(M1, lgth, zero_ch)), 129 ('$D = \\text{\\tt [0-9]}$', pablo.bitstream2string(lex.digit, lgth, zero_ch)), 130 ('$T_0 = M_1 \wedge D$', pablo.bitstream2string(T0, lgth, zero_ch)), 131 ('$T_1 = T_0 + D$', pablo.bitstream2string(T1, lgth, zero_ch)), 132 ('$T_2 = T_1 \oplus D$', pablo.bitstream2string(T2, lgth, zero_ch)), 133 ('$M_2 = T_2 \, | \, M_1$', pablo.bitstream2string(M2, lgth, zero_ch))]) 134 135 136 101 137 102 138 if __name__ == "__main__": 103 139 print latex_streams1(r"""a4534q--b29z---az---a4q--bca22z--""") 140 print latex_streams2(r"""a4534q--b29z---az---a4q--bca22z--""") 141 print latex_streams3(r"""a4534q--b29z---az---a4q--bca22z--""") 104 142 105 143
Note: See TracChangeset
for help on using the changeset viewer.