Changeset 1097 for docs/PACT2011


Timestamp:
Apr 8, 2011, 7:28:14 PM
Author:
ksherdy
Message:

Minor updates.

Location:
docs/PACT2011
Files:
2 edited

  • docs/PACT2011/02-background.tex

    r1096 r1097  
    36 36 Traditional XML parsers process XML sequentially, a single byte at a time. Following this approach, an XML parser processes a source document serially, from the first to the last byte of the source file. Each character of the source text is examined in turn to distinguish between the XML-specific markup, such as an opening angle bracket `<', and the content held within the document. The character that the parser is currently processing is commonly referred to as the current cursor position. As the parser moves the cursor through the source document, it alternates between markup scanning and data validation and processing operations. At each processing step, the parser scans the source document and either locates the expected markup or reports an error condition and terminates. In other words, traditional XML parsers are complex finite-state machines that use byte comparisons to transition between data and metadata states. Each state transition indicates the context in which to interpret the subsequent characters. Unfortunately, textual data tends to consist of variable-length items sequenced in generally unpredictable patterns \cite{Cameron2010}; thus any character could be a state transition until deemed otherwise.
    37 37
    38    Expat and Xerces-C are popular byte-at-a-time sequential parsers. Both are C/C++ based and open-source. Expat was originally released in 1998; it is currently used in Mozilla Firefox and forms the core of many additional XML processing tools \cite{expat}. Xerces-C was released in 1999 and is the foundation of the Apache XML project \cite{xerces}.
       38 Expat and Xerces-C are popular byte-at-a-time sequential parsers. Both are C/C++ based and open-source. Expat was originally released in 1998; it is currently used in Mozilla Firefox and provides the core functionality of many additional XML processing tools \cite{expat}. Xerces-C was released in 1999 and is the foundation of the Apache XML project \cite{xerces}.
    39 39
    40 40 % For example, the main loop of Xerces-C well-formedness scanner contains:
     
    50 50 Common strategies include preparsing the XML file to locate key partitioning points \cite{ZhangPanChiu09} and speculative p-DFAs \cite{ZhangPanChiu09}.
    51 51 SIMD XML parsers leverage the SIMD registers to overcome the performance limitations of the sequential byte-at-a-time processing model and its
    52    inherent data dependent branch misprediction rates.  Further, SIMD instructions allow the processor to perform the same
       52 inherently data dependent branch misprediction rates.  Further, SIMD instructions allow the processor to perform the same
    53 53 operation on multiple pieces of data simultaneously.  The Parabix1 and Parabix2 parsers studied in this paper
    54 54 fall into the SIMD classification and are described in more detail in Section \ref{section:parabix}.
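
For readers of this changeset, the byte-at-a-time model described in the background text can be illustrated with a minimal C sketch. It is not taken from Expat or Xerces-C; the names `scan_state`, `scan_result`, and `scan_bytes` are hypothetical, and the two-state machine ignores attributes, quoting, comments, and CDATA. It only shows a cursor advancing one byte per step and a byte comparison driving each transition between content and markup.

```c
#include <stddef.h>

/* Hypothetical two-state scanner, for illustration only. */
enum scan_state { IN_CONTENT, IN_MARKUP };

struct scan_result {
    size_t markup_bytes;   /* bytes from '<' through '>' inclusive */
    size_t content_bytes;  /* bytes outside any tag */
    int    well_formed;    /* 0 on a nested '<' or an unterminated tag */
};

struct scan_result scan_bytes(const char *src, size_t len)
{
    struct scan_result r = {0, 0, 1};
    enum scan_state state = IN_CONTENT;

    /* The cursor visits every byte in order, first to last. */
    for (size_t cursor = 0; cursor < len; cursor++) {
        char c = src[cursor];

        if (state == IN_CONTENT) {
            if (c == '<') {            /* content -> markup transition */
                state = IN_MARKUP;
                r.markup_bytes++;
            } else {
                r.content_bytes++;
            }
        } else {                       /* IN_MARKUP */
            if (c == '<') {            /* unexpected byte: report and stop */
                r.well_formed = 0;
                return r;
            }
            r.markup_bytes++;
            if (c == '>')              /* markup -> content transition */
                state = IN_CONTENT;
        }
    }

    /* Input ended inside a tag. */
    if (state == IN_MARKUP)
        r.well_formed = 0;
    return r;
}
```

Every iteration branches on the value of the byte just read, which is the data-dependent branching that the SIMD approach in the text aims to avoid.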