Changeset 2530 for docs/Working


Timestamp:
Oct 20, 2012, 4:14:07 PM
Author:
nmedfort
Message:

edits and content stream subsection

Location:
docs/Working/icXML
Files:
6 edited

  • docs/Working/icXML/FigGen/genfig.py

    r2514 r2530  
    4747        return rslt
    4848
    49 def do_filter_strm(source, filter_strm):
     49def do_filter_strm(source, filter_strm, StringEnds, StringEndReplacement):
    5050        rslt = ''
    5151        for i in range(len(source)):
    5252                if filter_strm & 1: rslt += '_'
     53                elif StringEnds & 1: rslt += StringEndReplacement
    5354                else: rslt += source[i]
    5455                filter_strm >>= 1
     56                StringEnds >>= 1
    5557        return rslt
    5658
     
    187189                              ('Empty Tag Marks', bitutil.bitstream2string(callouts.EmptyTag_marks, lgth, zero_ch)),
    188190                              ('Element Names', bitutil.bitstream2string(callouts.ElemName_ends - callouts.ElemName_starts, lgth, zero_ch)),
    189                               ('Att Names', bitutil.bitstream2string(callouts.AttName_ends - callouts.AttName_starts, lgth, zero_ch)),
    190                               ('Att Values', bitutil.bitstream2string(callouts.AttValSpan, lgth, zero_ch)),
    191                               ('String ends', bitutil.bitstream2string(StringEnds, lgth, zero_ch)),
    192                               ('String marks', set_marks(u8data,StringEnds, '0')),
    193                               ('Transition marks', bitutil.bitstream2string(transition_marks, lgth, zero_ch)),
    194                               ('Transition chars', do_select(u8data, transition_marks)),
    195                               ('Deletion mask', bitutil.bitstream2string(callouts.delmask, lgth, zero_ch)),
    196                               ('Undeleted', do_filter_strm(u8data, callouts.delmask|StringEnds))])
     191                              ('Attribute Names', bitutil.bitstream2string(callouts.AttName_ends - callouts.AttName_starts, lgth, zero_ch)),
     192                              ('Attribute Values', bitutil.bitstream2string(callouts.AttValSpan, lgth, zero_ch)),
     193                              ('String Ends', bitutil.bitstream2string(StringEnds, lgth, zero_ch)),
     194                              # ('String marks', set_marks(u8data,StringEnds, '`$\\varnothing$\\verb`')),
     195                              ('Markup Identifiers', bitutil.bitstream2string(transition_marks, lgth, zero_ch)),
     196                              # ('Transition chars', do_select(u8data, transition_marks)),
     197                              ('Deletion Mask', bitutil.bitstream2string(callouts.delmask, lgth, zero_ch)),
     198                              ('Undeleted Data', do_filter_strm(u8data, callouts.delmask, StringEnds, '`{\\tt\\it 0}\\verb`'))])
    197199
    198200
    199201
    200202if __name__ == "__main__":
    201         demo_stags("<root><t1>text</t1><t2 a1='foo' a2 = 'fie'>more</t2><tag3 att3='b'/></root>")
    202 
    203 
    204 
    205 
    206 
    207 
    208 
    209 
    210 
    211 
    212 
    213 
    214 
    215 
    216 
    217 
    218 
     203        demo_stags("<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>")
     204
     205
     206
     207
     208
     209
     210
     211
     212
     213
     214
     215
     216
     217
     218
     219
     220
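
The revised do_filter_strm in the genfig.py hunk above can be exercised on its own. The sketch below restates the four-argument version from the added lines, with snake_case parameter names. The helper positions_to_stream and the sample input are illustrative assumptions and are not part of the changeset. Bit streams are modelled as Python integers whose bit i corresponds to character i of the source, which matches the `& 1` and `>>= 1` pattern in the diff.

```python
def positions_to_stream(positions):
    """Hypothetical helper: build an integer bit stream with the given bit positions set."""
    strm = 0
    for p in positions:
        strm |= 1 << p
    return strm


def do_filter_strm(source, filter_strm, string_ends, string_end_replacement):
    """Render source with deleted positions as '_' and string ends as a marker.

    A position that is both deleted and a string end is shown as '_',
    because the deletion test comes first (as in the changeset).
    """
    rslt = ''
    for ch in source:
        if filter_strm & 1:
            rslt += '_'
        elif string_ends & 1:
            rslt += string_end_replacement
        else:
            rslt += ch
        # Advance both streams to the next character position.
        filter_strm >>= 1
        string_ends >>= 1
    return rslt


if __name__ == "__main__":
    src = "<a>hi</a>"
    delmask = positions_to_stream([1, 7])      # the two element-name characters
    string_ends = positions_to_stream([0, 5])  # the two '<' characters
    print(do_filter_strm(src, delmask, string_ends, '0'))
```

Passing the string-end stream separately, rather than OR-ing it into the deletion mask as r2514 did, lets the figure show string ends with a distinct marker instead of a plain underscore.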
  • docs/Working/icXML/arch-overview.tex

    r2522 r2530  
    6767The former takes the (transposed) basis bit streams and selectively filters them, according to the
    6868information provided by the Parallel Markup Parser, and the latter transforms the
    69 filtered streams into the tagged UTF-16 {\it content stream}.
    70 This is discussed in Section \ref{sec:parfilter}.
     69filtered streams into the tagged UTF-16 {\it content stream}, discussed in Section \ref{section:arch:contentstream}.
    7170
    7271Combined, the symbol and content stream form \icXML{}'s compressed IR of the XML document.
  • docs/Working/icXML/background-parabix.tex

    r2522 r2530  
    11\subsection{The Parabix Framework}
    22\label{background:parabix}
     3
     4\begin{figure*}[tbh]
     5\begin{center}
     6\begin{tabular}{cr}\\
     7Source Data & \verb`<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>`\\
     8Tag Openers & \verb`1____________1____________________________1____________1__________`\\
     9Start Tag Marks & \verb`_1____________1___________________________________________________`\\
     10End Tag Marks & \verb`___________________________________________1____________1_________`\\
     11Empty Tag Marks & \verb`__________________________________________________________________`\\
     12Element Names & \verb`_11111111_____1111111_____________________________________________`\\
     13Attribute Names & \verb`______________________11_______11_________________________________`\\
     14Attribute Values & \verb`__________________________111________111__________________________`\\
     15% String Ends & \verb`1____________1_______________1__________1_1____________1__________`\\
     16% Markup Identifiers & \verb`_________1______________1_________1______1_1____________1_________`\\
     17% Deletion Mask & \verb`_11111111_____1111111111_1____1111_11_______11111111_____111111111`\\
     18% Undeleted Data & \verb``{\tt\it 0}\verb`________>fee`{\tt\it 0}\verb`__________=_fie`{\tt\it 0}\verb`____=__foe`{\tt\it 0}\verb`>`{\tt\it 0}\verb`/________fum`{\tt\it 0}\verb`/_________`
     19\end{tabular}
     20\end{center}
     21\caption{XML Source Data and Derived Parallel Bit Streams}
     22\label{fig:parabix1}
     23\end{figure*}
    324
    425The Parabix (parallel bit stream) framework is a transformative approach to XML parsing
     
    1940{\tt [0-9]} if and only if $\lnot(b_0 \lor b_1) \land (b_2 \land b_3) \land \lnot(b_4 \land (b_5 \lor b_6))$.
    2041An important observation here is that ranges of characters may
    21 require fewer operations than individual characters and multiple
    22 classes can sometimes share the classification cost.
     42require fewer operations than individual characters and
     43% the classification cost could be amortized over many character classes.
     44multiple classes can share the classification cost.
    2345
    24 \begin{figure}[tbh]
     46\begin{figure}[h]
    2547\begin{center}
    2648\begin{tabular}{r c c c c }
    27 STRING & \ttfamily{b} & \ttfamily{7} & \ttfamily{\verb`<`} & \ttfamily{A} \\
     49String & \ttfamily{b} & \ttfamily{7} & \ttfamily{\verb`<`} & \ttfamily{A} \\
    2850ASCII & \ttfamily{\footnotesize 0110001{\bfseries 0}} & \ttfamily{\footnotesize 0011011{\bfseries 1}} & \ttfamily{\footnotesize 0011110{\bfseries 0}} & \ttfamily{\footnotesize 0100000{\bfseries 1}} \\
    2951\hline
     
    4870% process, intra-element well-formedness validation is performed on each block
    4971% of text.
    50 
    51 \begin{figure*}[tbh]
    52 \begin{center}
    53 \begin{tabular}{cr}\\
    54 Source Data & \verb`<root><t1>text</t1><t2 a1='foo' a2 = 'fie'>more</t2><tag3 att3='b'/></root>`\\
    55 Tag Openers & \verb`1_____1_______1____1___________________________1____1_______________1______`\\
    56 Start Tag Marks & \verb`_1_____1____________1________________________________1_____________________`\\
    57 End Tag Marks & \verb`_______________1________________________________1____________________1_____`\\
    58 Empty Tag Marks & \verb`___________________________________________________________________1_______`\\
    59 Element Names & \verb`_1111__11___________11_______________________________1111__________________`\\
    60 Att Names & \verb`_______________________11_______11________________________1111_____________`\\
    61 Att Values & \verb`___________________________111________111_______________________1__________`\\
    62 String ends & \verb`1_____1_______1____1__________1__________1_____1____1____________1__1______`\\
    63 String marks & \verb`0_____0_______0____0__________0__________0_____0____0____________0__0______`\\
    64 Transition marks & \verb`_____1___1_____1_________1_________1______1_____1_____________1____1_1_____`\\
    65 Transition chars & \verb`_____>___>_____/_________=_________=______>_____/_____________=____>_/_____`\\
    66 Deletion mask & \verb`_1111__11_______111_11111_1____1111_11___________111_111111111_1__1___11111`\\
    67 Undeleted & \verb`_____>___>text_/_________=_foo_____=__fie_>more_/_____________=_b__>_/_____`
    68 \end{tabular}
    69 \end{center}
    70 \caption{XML Source Data and Derived Parallel Bit Streams}
    71 \label{fig:parabix1}
    72 \end{figure*}
    7372
    7473Consider, for example, the XML source data stream shown in the first line of Figure \ref{fig:parabix1}.
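
The character-class formula for {\tt [0-9]} quoted in the hunk above can be checked mechanically. The following is a minimal sketch, not Parabix code: it assumes $b_0$ is the most significant bit of each byte, models each basis bit stream as a Python integer with bit i standing for byte i, and uses hypothetical helper names (transpose, digit_stream, stream_to_string).

```python
def transpose(data):
    """Return 8 basis bit streams; basis[k] has bit i set iff bit k
    (counting from the most significant bit) of data[i] is 1."""
    basis = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                basis[k] |= 1 << i
    return basis


def digit_stream(data):
    """Bit stream marking every position of data holding an ASCII digit.

    Evaluates not(b0 or b1) and (b2 and b3) and not(b4 and (b5 or b6))
    on whole streams at once, using bitwise operators.
    """
    b = transpose(data)
    mask = (1 << len(data)) - 1  # confine the complements to the data length
    return ~(b[0] | b[1]) & (b[2] & b[3]) & ~(b[4] & (b[5] | b[6])) & mask


def stream_to_string(strm, length, zero_ch='_'):
    """Render a stream in the style of the figures: '1' for set bits."""
    return ''.join('1' if (strm >> i) & 1 else zero_ch for i in range(length))


if __name__ == "__main__":
    text = b"a1='42'"
    print(text.decode())
    print(stream_to_string(digit_stream(text), len(text)))
```

The same handful of bitwise operations classifies every byte of the input at once, which is the point of the remark that ranges of characters can cost fewer operations than individual characters.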
  • docs/Working/icXML/background-xerces.tex

    r2522 r2530  
    3232Even if it were possible, Amdahl's Law dictates that tackling any one of these functions for
    3333parallelization in isolation would only produce a minute improvement in performance.
    34 Unfortunately, early investigation into these functions found they were already performing well in their given tasks
    35 and only trivial enhancements were possible.
     34Unfortunately, early investigation into these functions found
     35that incorporating speculation-free thread-level parallelization was impossible
     36and they were already performing well in their given tasks;
     37thus only trivial enhancements were attainable.
    3638In order to obtain a systematic acceleration of Xerces,
    3739it should be expected that a comprehensive restructuring
  • docs/Working/icXML/icxml-main.tex

    r2528 r2530  
    3333\usepackage{CJKutf8}
    3434\usepackage{morefloats}
     35\usepackage{amssymb}
    3536\begin{document}
    3637
     
    148149\input{parfilter.tex}
    149150
     151\input{arch-contentstream.tex}
     152
    150153\input{arch-namespace.tex}
    151154
  • docs/Working/icXML/parfilter.tex

    r2523 r2530  
    2121completed by applying parallel deletion and inverse transposition of the
    2222UTF-16 bit streams\cite{Cameron2008}.
     23
     24\begin{figure*}[tbh]
     25\begin{center}
     26\begin{tabular}{cr}\\
     27Source Data & \verb`<document>fee<element a1='fie' a2 = 'foe'></element>fum</document>`\\
     28% Tag Openers & \verb`1____________1____________________________1____________1__________`\\
     29% Start Tag Marks & \verb`_1____________1___________________________________________________`\\
     30% End Tag Marks & \verb`___________________________________________1____________1_________`\\
     31% Empty Tag Marks & \verb`__________________________________________________________________`\\
     32% Element Names & \verb`_11111111_____1111111_____________________________________________`\\
     33% Attribute Names & \verb`______________________11_______11_________________________________`\\
     34% Attribute Values & \verb`__________________________111________111__________________________`\\
     35String Ends & \verb`1____________1_______________1__________1_1____________1__________`\\
     36Markup Identifiers & \verb`_________1______________1_________1______1_1____________1_________`\\
     37Deletion Mask & \verb`_11111111_____1111111111_1____1111_11_______11111111_____111111111`\\
     38Undeleted Data & \verb``{\tt\it 0}\verb`________>fee`{\tt\it 0}\verb`__________=_fie`{\tt\it 0}\verb`____=__foe`{\tt\it 0}\verb`>`{\tt\it 0}\verb`/________fum`{\tt\it 0}\verb`/_________`
     39\end{tabular}
     40\end{center}
     41\caption{XML Source Data and Derived Parallel Bit Streams}
     42\label{fig:parabix2}
     43\end{figure*}
    2344
    2445Rather than immediately paying the
     
    82103the process of reducing markup data to tag bytes
    83104preceding each significant XML transition as described
    84 in section \ref{sec:contentbuffer}.  Overall, \icXML{}
     105in section~\ref{section:arch:contentstream}.  Overall, \icXML{}
    85106avoids separate buffer copying operations for each of
    86107these filtering steps, paying the cost of parallel
    87108deletion and inverse transposition only once. 
    88109Currently, \icXML{} employs the parallel-prefix compress algorithm
    89 of Steele\cite{HackersDelight}.  Performance
    90 is independent of the number of positions deleted.  As a
    91 further note, future versions of \icXML{} are expected to
    92 take advantage of the parallel extract operation\cite{HilewitzLee2006}
     110of Steele~\cite{HackersDelight}.  Performance
     111is independent of the number of positions deleted.
     112Future versions of \icXML{} are expected to
     113take advantage of the parallel extract operation~\cite{HilewitzLee2006}
    93114that Intel is now providing in its Haswell architecture.
    94 
    95 
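
The parfilter.tex text above says that deletion uses the parallel-prefix compress algorithm and that its cost does not depend on how many positions are deleted. The sketch below is a Python transliteration of the 32-bit compress routine as presented in Hacker's Delight, written from memory; it is not \icXML{}'s implementation, and the names compress32, compress_naive and delete_marked are assumptions made for illustration. The routine takes a mask of bits to keep, so a deletion mask is complemented first.

```python
M32 = 0xFFFFFFFF  # work on 32-bit words


def compress32(x, keep):
    """Move the bits of x selected by keep to the low end, preserving order."""
    x &= keep                     # clear bits that will be dropped
    mk = (~keep << 1) & M32       # marks the zeros to the right of each bit
    for i in range(5):            # always log2(32) = 5 rounds
        # Parallel prefix (running XOR) over mk.
        mp = mk ^ ((mk << 1) & M32)
        mp ^= (mp << 2) & M32
        mp ^= (mp << 4) & M32
        mp ^= (mp << 8) & M32
        mp ^= (mp << 16) & M32
        mv = mp & keep            # bits that move right by 2**i this round
        keep = (keep ^ mv) | (mv >> (1 << i))
        t = x & mv
        x = (x ^ t) | (t >> (1 << i))
        mk &= ~mp
    return x


def compress_naive(x, keep):
    """Bit-at-a-time reference used only to cross-check compress32."""
    out, j = 0, 0
    for i in range(32):
        if (keep >> i) & 1:
            out |= ((x >> i) & 1) << j
            j += 1
    return out


def delete_marked(x, delmask):
    """Delete the positions flagged in delmask by keeping the complement."""
    return compress32(x, ~delmask & M32)


if __name__ == "__main__":
    print(bin(delete_marked(0b1011, 0b0100)))
```

The loop runs the same five rounds whatever the mask contains, which is why the cost is independent of the number of deleted positions.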