Changeset 1292 for docs


Timestamp:
Aug 8, 2011, 2:23:58 PM
Author:
ksherdy
Message:

Minor edits.

Location:
docs/PACT2011
Files:
3 edited

  • docs/PACT2011/02-background.tex

    r1291 r1292  
    33
    44\subsection{XML}
    5 In 1998, the W3C officially adopted XML as a standard. The defining characteristics of XML are that it can represent virtually any type of information through the use of self-describing markup tags and can easily store semi-structured data in a descriptive fashion. XML markup encodes a description of an XML document's storage layout and logical structure. Because XML was intended to be human-readable, XML markup tags are often verbose by design \cite{TR:XML}.
    6 An example of XML document is as follows.
     5In 1998, the W3C officially adopted XML as a standard. The defining characteristics of XML are that it can represent virtually any type of information through the use of self-describing markup tags and can easily store semi-structured data in a descriptive fashion. XML markup encodes a description of an XML document's storage layout and logical structure. Because XML was intended to be human-readable, XML markup tags are often verbose by design \cite{TR:XML}.
     6
     7% <ProductName Language="Chinese">郚件</ProductName> % can't represent in verbose and not really sure if the google auto-translater is correct
     8
     9XML files can be classified as ``document-oriented'' or ``data-oriented'' \cite{DuCharme04}. Documented-oriented XML is designed for human readability, such as shown in Figure \ref{fig:sample_xml}; data-oriented XML files are intended to be parsed by machines and omit ``human-friendly'' formatting techniques, such as the use of whitespace and descriptive ``natural language'' naming schemes.  Although the XML specification itself does not distinguish between ``XML for documents'' and ``XML for data'' \cite{TR:XML}, the latter often requires the use of an XML parser to extract the information within. The role of an XML parser is to transform the text-based XML data into application ready data.
    710
    811\begin{figure}[h]
     
    2427\end{figure}
    2528
    26 % <ProductName Language="Chinese">郚件</ProductName> % can't represent in verbose and not really sure if the google auto-translater is correct
    2729
    28 
    29 XML files can be classified as ``document-oriented'' or ``data-oriented'' \cite{DuCharme04}. Documented-oriented XML is designed for human readability, such as shown in Figure \ref{fig:sample_xml}; data-oriented XML files are intended to be parsed by machines and omit ``human-friendly'' formatting techniques, such as the use of whitespace and descriptive ``natural language'' naming schemes.  Although the XML specification itself does not distinguish between ``XML for documents'' and ``XML for data'' \cite{TR:XML}, the latter often requires the use of an XML parser to extract the information within. The role of an XML parser is to transform the text-based XML data into application ready data.
    3030%For example, an XML parser for a web browser may take a XML file, apply a style sheet to it, and display it to the end user in an attractive yet informative way; an XML database parser may take a XML file and construct indexes and/or compress the tree into a proprietary format to provide the end user with efficient relational, hierarchical, and/or object-based query access to it.
    3131
     
    3434\subsection{Traditional XML Parsers}
    3535% However, textual data tends to consist of variable-length items in generally unpredictable patterns \cite{Cameron2010}.
    36 Traditional XML parsers process XML sequentially a single byte-at-a-time. Following this approach, an XML parser processes a source document serially, from the first to the last byte of the source file. Each character of the source text is examined in turn to distinguish between the XML-specific markup, such as an opening angle bracket `<', and the content held within the document. The current character that the parser is processing is commonly referred to using the concept of a current cursor position. As the parser moves the cursor through the source document, the parser alternates between markup scanning, and data validation and processing operations. At each processing step, the parser scans the source document and either locates the expected markup, or reports an error condition and terminates. In other words, traditional XML parsers are complex finite-state machines that use byte comparisons to transition between data and metadata states. Each state transition indicates the context in which to interpret the subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until deemed otherwise.
     36Traditional XML parsers process XML sequentially a single byte-at-a-time. Following this approach, an XML parser processes a source document serially, from the first to the last byte of the source file. Each character of the source text is examined in turn to distinguish between the XML-specific markup, such as an opening angle bracket `<', and the content held within the document. The current character that the parser is processing is commonly referred to using the concept of a current cursor position. As the parser moves the cursor through the source document, the parser alternates between markup scanning, and data validation and processing operations. At each processing step, the parser scans the source document and either locates the expected markup, or reports an error condition and terminates. In other words, traditional XML parsers operate as complex finite-state machines that use byte comparisons to transition between data and metadata states. Each state transition indicates the context in which to interpret the subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until deemed otherwise.
    3737
    3838Expat and Xerces-C are popular byte-at-a-time sequential parsers. Both are C/C++ based and open-source. Expat was originally released in 1998; it is currently used in Mozilla Firefox and provides the core functionality of many additional XML processing tools \cite{expat}. Xerces-C was released in 1999 and is the foundation of the Apache XML project \cite{xerces}.
     
    4646
    4747\subsection {Parallel XML Parsing}
    48 In general, parallel XML acceleration methods comes in one of two forms: multithreaded approaches and SIMD-based techniques.
     48In general, parallel XML acceleration methods come in one of two forms: multithreaded approaches and SIMD-based techniques.
    4949Multithreaded XML parsers take advantage of multiple cores via a number of strategies.
    5050Common strategies include preparsing the XML file to locate key partitioning points \cite{ZhangPanChiu09} and speculative p-DFAs \cite{ZhangPanChiu09}.
    5151SIMD XML parsers leverage the SIMD registers to overcome the performance limitations of the sequential byte-at-a-time processing model and its
    52 inherently data dependent branch misprediction rates.  Further, SIMD instructions allow the processor to perform the same
     52inherently data dependent branch misprediction rates.  Further, data parallel SIMD instructions allow the processor to perform the same
    5353operation on multiple pieces of data simultaneously.  The Parabix1 and Parabix2 parsers studied in this paper
    54 fall into the SIMD classification. The Parabix parser are described in further detail in Section \ref{section:parabix}.
     54fall under the SIMD classification. The Parabix parser versions studied are described in further detail in Section \ref{section:parabix}.
    5555
    5656%\subsection {SIMD Operations}
  • docs/PACT2011/main.bbl

    r1125 r1292  
    7777\newblock In {\em {XML 2004}}, {Washington D.C.}, 2004.
    7878
     79\bibitem{Perkins05}
     80{E. Perkins and M. Kostoulas and A. Heifets and M. Matsa and N. Mendelsohn}.
     81\newblock {Performance Analysis of {XML} APIs}.
     82\newblock In {\em XML 2005}, Atlanta, Georgia, Nov. 2005.
     83
    7984\bibitem{Parabix1}
    8085R.~D.~C. et. al.
     
    125130  Information and Knowledge Management}, New Orleans, Louisiana, 2003.
    126131
    127 \bibitem{Perkins05}
    128 {Perkins, E. and Kostoulas, M. and Heifets, A. and Matsa, M. and Mendelsohn,
    129   N.}
    130 \newblock {Performance Analysis of {XML} APIs}.
    131 \newblock In {\em XML 2005}, Atlanta, Georgia, Nov. 2005.
    132 
    133132\bibitem{ParaDOM2009}
    134133B.~Shah, P.~Rao, B.~Moon, and M.~Rajagopalan.