Timestamp:
Apr 10, 2011, 8:46:19 PM (8 years ago)
Author:
ksherdy
Message:

Minor edits.

File:
1 edited

  • docs/PACT2011/05-corei3.tex

    r1079 r1107  
    33%some of the numbers are roughly calculated, needs to be recalculated for final version
    44\subsection{Cache behavior}
    5 \CITHREE\ has a three level cache hierarchy.  The miss penalty for each
    6 level is approximately 4, 11, and 36 cycles respectively.  Figure
     5\CITHREE\ has a three-level cache hierarchy.  The approximate miss penalties for the three cache
     6levels are 4, 11, and 36 cycles respectively.  Figure
    77\ref{corei3_L1DM}, Figure \ref{corei3_L2DM} and Figure
    8 \ref{corei3_L3TM} show the L1, L2 and L3 data cache misses of all the
    9 four parsers.  Although XML parsing is not a memory intensive
    10 application, the cost of cache miss for Expat and Xerces can be about
    11 half a cycle per byte while the performance of Parabix is essentially
    12 unaffected by cache misses.  Cache misses are not just a problem for
    13 performance but also energy consumption.  L1 cache miss cost about
    14 8.3nJ; L2 cache miss cost about 19nJ; L3 cache miss cost about 40nJ.
    15 With a 1GB input file, Expat and Xerces would consume over 0.6J and 0.9J due to cache misses alone respectively.
    16 %With a 1GB input file, Expat would consume more than 0.6J and Xerces
     8\ref{corei3_L3TM} show the L1, L2 and L3 data cache misses for each of the parsers.  Although XML parsing is not a memory-intensive
     9application, cache misses for the Expat and Xerces parsers cost approximately 0.5 cycles per XML byte, whereas the performance of the Parabix parsers remains essentially
     10unaffected by data cache misses.  Cache misses not only consume additional CPU cycles but also increase application energy consumption.  L1, L2, and L3 cache misses consume
     11approximately 8.3nJ, 19nJ, and 40nJ each, respectively.  Given a 1GB XML file as input, Expat and Xerces would consume over 0.6J and 0.9J respectively due to cache misses alone.
     12%With a 1GB input file, Expat would consume more than 0.6J and Xerces
    1713%would consume 0.9J on cache misses alone.
    1814
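As a rough sketch of the arithmetic behind the cache figures quoted above, let $m_1$, $m_2$ and $m_3$ denote the L1, L2 and L3 misses per input byte.  These symbols are introduced here purely for illustration and are not measured values.  Using the approximate per-level penalties and energies from the text, the per-byte miss costs would be
\[
C_{\mathrm{miss}} \approx 4\,m_1 + 11\,m_2 + 36\,m_3 \quad \mbox{cycles per byte},
\]
\[
E_{\mathrm{miss}} \approx 8.3\,m_1 + 19\,m_2 + 40\,m_3 \quad \mbox{nJ per byte}.
\]
Assuming 1GB is taken as $10^9$ bytes, a total of 0.6J corresponds to $0.6\,\mathrm{J} / 10^9 = 0.6$nJ per byte, and 0.9J to 0.9nJ per byte.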
     
    2218\includegraphics[width=0.5\textwidth]{plots/corei3_L1DM.pdf}
    2319\end{center}
    24 \caption{L1 Data Cache Misses on \CITHREE\ (y-axis: Cache Misses per KByte)}
     20\caption{L1 Data Cache Misses on \CITHREE\ (y-axis: Cache Misses per kB)}
    2521\label{corei3_L1DM}
    2622\end{figure}
     
    3026\includegraphics[width=0.5\textwidth]{plots/corei3_L2DM.pdf}
    3127\end{center}
    32 \caption{L2 Data Cache Misses on \CITHREE\ (y-axis: Cache Misses per KByte)}
     28\caption{L2 Data Cache Misses on \CITHREE\ (y-axis: Cache Misses per kB)}
    3329\label{corei3_L2DM}
    3430\end{figure}
     
    3834\includegraphics[width=0.5\textwidth]{plots/corei3_L3CM.pdf}
    3935\end{center}
    40 \caption{L3 Cache Misses on \CITHREE\ (y-axis: Cache Misses per KByte)}
     36\caption{L3 Cache Misses on \CITHREE\ (y-axis: Cache Misses per kB)}
    4137\label{corei3_L3TM}
    4238\end{figure}
    4339
    4440\subsection{Branch Mispredictions}
    45 Despite years of improvement, branch misprediction is still a
    46 significant bottleneck when it comes to performance.  The cost of a branch
    47 misprediction is generally over 10 CPU cycles.  As shown in
    48 Figure \ref{corei3_BM}, the cost of branch mispredictions per byte of XML for Expat
    49 can be over 7 cycles---which is approximately the number of cycles
    50 required by Parabix2 to process a byte of XML data using the same workload.
     41Despite improvements in branch prediction, branch misprediction penalties still degrade
     42XML parsing performance significantly. On modern commodity processors the cost of a single branch
     43misprediction is generally cited as over 10 CPU cycles.  As shown in
     44Figure \ref{corei3_BM}, the cost of branch mispredictions per XML byte for Expat
     45can be over 7 cycles; this cost alone is approximately equal to the total cost for Parabix2 to process each byte of XML when given the same input.
    5146
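The per-byte figure above can be related to a misprediction rate with a simple estimate.  Let $p$ be the penalty in cycles per misprediction and $b$ the number of mispredictions per byte; both symbols are hypothetical and used only for this sketch.  The misprediction cost per byte is then
\[
C_{\mathrm{bm}} \approx p \, b .
\]
Assuming the lower bound $p = 10$ cycles, a cost of 7 cycles per byte would correspond to $b = 7/10 = 0.7$ mispredictions per byte, or about 700 per kB if 1kB is taken as 1000 bytes.  A larger penalty would imply proportionally fewer mispredictions for the same cost.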
    5247But reducing the branch misprediction rate is difficult for text-based
     
    7065\includegraphics[width=0.5\textwidth]{plots/corei3_BR.pdf}
    7166\end{center}
    72 \caption{Branches on \CITHREE\ (y-axis: Branches per KByte)}
     67\caption{Branches on \CITHREE\ (y-axis: Branches per kB)}
    7368\label{corei3_BR}
    7469\end{figure}
     
    7873\includegraphics[width=0.5\textwidth]{plots/corei3_BM.pdf}
    7974\end{center}
    80 \caption{Branch Mispredictions on \CITHREE\ (y-axis: Branch Mispredictions per KByte)}
     75\caption{Branch Mispredictions on \CITHREE\ (y-axis: Branch Mispredictions per kB)}
    8176\label{corei3_BM}
    8277\end{figure}
     
    122117\includegraphics[width=0.5\textwidth]{plots/corei3_INS_p1.pdf}
    123118\end{center}
    124 \caption{Parabix1 SIMD Instruction Ratio (y-axis: percent)}
     119\caption{Parabix1 SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)}
    125120\label{corei3_INS_p1}
    126121\end{figure}
     
    130125\includegraphics[width=0.5\textwidth]{plots/corei3_INS_p2.pdf}
    131126\end{center}
    132 \caption{Parabix2 SIMD Instruction Ratio (y-axis: percent)}
     127\caption{Parabix2 SIMD vs. Non-SIMD Instructions (y-axis: Percent SIMD Instructions)}
    133128\label{corei3_INS_p2}
    134129\end{figure}
     
    154149\includegraphics[width=0.5\textwidth]{plots/corei3_TOT.pdf}
    155150\end{center}
    156 \caption{Processing Time on \CITHREE\ (y-axis: Total CPU Cycles per KByte)}
     151\caption{Processing Time on \CITHREE\ (y-axis: Total CPU Cycles per kB)}
    157152\label{corei3_TOT}
    158153\end{figure}
     
    190185\includegraphics[width=0.5\textwidth]{plots/corei3_energy.pdf}
    191186\end{center}
    192 \caption{Energy Consumption on \CITHREE\ ($\mu$J per KByte)}
     187\caption{Energy Consumption on \CITHREE\ ($\mu$J per kB)}
    193188\label{corei3_energy}
    194189\end{figure}