Changeset 1108 for docs/PACT2011

Apr 10, 2011, 9:36:34 PM (8 years ago)

Significant rewrite.

2 edited


  • docs/PACT2011/05-corei3.tex

    r1107 r1108  
    4141Despite improvements in branch prediction, branch misprediction penalties contribute
    4242significantly to XML parsing performance. On modern commodity processors the cost of a single branch
    43 misprediction is generally cited as over 10 CPU cycles.  As shown in
    44 Figure \ref{corei3_BM}, the cost of branch mispredictions per XML byte for Expat
    45 can be over 7 cycles---this cost alone is equal to the total cost for Parabix2 to process each byte of XML when given the same input.
     43misprediction is commonly cited as over 10 CPU cycles.  As shown in
     44Figure \ref{corei3_BM}, the cost of branch mispredictions for the Expat parser
     45can be over 7 cycles per XML byte---this cost alone is equal to the average total cost for Parabix2 to process each byte of XML.
    47 But reducing the branch misprediction rate is difficult for text-based
    48 applications due to the variable-length nature of syntactic elements.
    49 Therefore, the goal is to reduce the total number of branches.  However, traditional byte-at-a-time XML
    50 parsing requires a large number of inevitable branches.  As
    51 shown in Figure \ref{corei3_BR}, Xerces can have an average of 13
    52 branches for each byte it processed on the high markup density file.
    53 Parabix1 minimizes the branches by using parallel bit streams for each 128-bit block but still requires a few
    54 branches for sequential scanning. Utilizing the new parallel scanning technique, Parabix2 is relatively branch-free, as shown in Figure \ref{corei3_BR}. As a result, Parabix2 has minimal
    55 dependency on the markup density of the workloads.
     47In general, reducing the branch misprediction rate is difficult in text-based XML parsing
     48applications. This is due in part to the variable length nature of the syntactic elements contained within XML documents, a data dependent characterstic,
     49as well as the extensive set of syntax constraints imposed by the XML 1.0 specification. As such, traditional byte-at-a-time XML parsers generate a performance limiting
     50number of branch mispredictions.  As shown in Figure \ref{corei3_BR}, Xerces averages up to 13
     51branches per XML byte processed on high density markup.
     53The performance improvement of Parabix1 in terms of branch mispredictions results from the veritable elimination of conditional branch instructions in scanning. Leveraging the processor built-in {\em bit scan}
     54operation together with parallel bit stream technology Parabix1 can scan up to 64 bytes of source XML with a single {\em bit scan} instruction. In comparison, a byte-at-a-time parser must
     55process a conditional branch instruction per XML byte scanned.
     57As shown in Figure \ref{corei3_BR} Parabix2 processing is almost branch free. Utilizing a new parallel scanning technique based on bit stream addition, Parabix2 exhibits minimal
     58conditional branch dependence on source XML markup density. \ref{corei3_BR} shows displays this lack of data dependence via the constant number of branch
     59mispredictions produced for each of the source XML files.
    5660% Parabix1 minimize the branches by using parallel bit
    5761% streams.  Parabix1 still have a few branches for each block of 128
Note: See TracChangeset for help on using the changeset viewer.