# Changeset 2473

Timestamp:
Oct 17, 2012, 5:16:56 PM (7 years ago)
Message:

\section{Leveraging SIMD Parallelism for Multicore: Pipeline Parallelism}

\subsection{Pipeline Strategy for ICXML}

As discussed in section \ref{}, Xerces can be considered a complex finite-state machine. Among all the application classes presented in the Berkeley study reports \cite{Asanovic:EECS-2006-183}, finite-state machines are the hardest to parallelize and process efficiently. However, ICXML reconstructs Xerces and provides logical layers between modules, which naturally enables pipeline-parallel processing.

In our pipeline model, each thread is in charge of one module or one group of modules. A straightforward division is to take advantage of the layer between the Parabix Subsystem and the Markup Processor. In this case, the first thread $T_1$ reads 16k of XML input $I$ at a time and runs all the modules in the Parabix Subsystem to generate the content buffer, symbol array, URI array, and context ID array, which it stores in a pre-allocated shared data structure $S$. The second thread $T_2$ reads the shared data provided by the first thread, runs all the modules in the Markup Processor, and writes the output $O$.

The shared data structure is implemented as a ring buffer, where each entry consists of all the arrays shared between the two threads and has a size of 160k. In the example of Figures \ref{threads_timeline1} and \ref{threads_timeline2}, the ring buffer has four entries. A lock-free mechanism ensures that each entry can be read or written by only one thread at a time. In Figure \ref{threads_timeline1}, the processing time of the first thread is longer, so the second thread always waits for the first thread to finish processing one chunk of input and write it to the shared memory. Figure \ref{threads_timeline2} illustrates the opposite situation, where the second thread is slower and the first thread has to wait for the second thread to finish reading the shared data before it can reuse the memory space.
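The two-thread handoff described above can be illustrated with a minimal single-producer/single-consumer ring buffer. This is only a sketch under stated assumptions, not the ICXML implementation: the names `Entry`, `SpscRing`, and `run_pipeline` are hypothetical, the entry holds a single small array instead of the content buffer, symbol array, URI array, and context ID array, and the use of two atomic counters is one common way to obtain a lock-free handoff, which the text does not specify.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for one ring-buffer entry. The entry described in the
// text holds several shared arrays; one array is enough to show the handoff.
struct Entry {
    std::vector<std::uint32_t> symbols;
};

// Single-producer/single-consumer ring with N entries (assumed design).
// head_ counts entries published by the writer, tail_ counts entries released
// by the reader, so an entry is never owned by both threads at once.
template <std::size_t N>
class SpscRing {
public:
    // Writer: next free entry, or nullptr while the ring is full.
    Entry* begin_write() {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N) return nullptr;
        return &slots_[h % N];
    }
    // Writer: publish the entry obtained from begin_write().
    void end_write() {
        head_.store(head_.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
    }
    // Reader: oldest published entry, or nullptr while the ring is empty.
    Entry* begin_read() {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return nullptr;
        return &slots_[t % N];
    }
    // Reader: hand the entry back to the writer for reuse.
    void end_read() {
        tail_.store(tail_.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
    }

private:
    std::array<Entry, N> slots_;
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Runs a toy two-stage pipeline over `chunks` chunks and returns the sum of
// all values seen by the second stage.
std::uint64_t run_pipeline(std::uint32_t chunks) {
    SpscRing<4> ring;  // four entries, as in the figures' example
    std::uint64_t total = 0;

    // T1: stands in for the Parabix Subsystem stage (fills one entry per chunk).
    std::thread t1([&] {
        for (std::uint32_t i = 0; i < chunks; ++i) {
            Entry* e;
            while ((e = ring.begin_write()) == nullptr) std::this_thread::yield();
            e->symbols.assign(16, i);
            ring.end_write();
        }
    });

    // T2: stands in for the Markup Processor stage (consumes one entry per chunk).
    std::thread t2([&] {
        for (std::uint32_t i = 0; i < chunks; ++i) {
            Entry* e;
            while ((e = ring.begin_read()) == nullptr) std::this_thread::yield();
            for (std::uint32_t s : e->symbols) total += s;
            ring.end_read();
        }
    });

    t1.join();
    t2.join();
    return total;
}
```

Whichever stage is faster spins in its `begin_*` call until the other catches up, which corresponds to the two waiting patterns described for the timeline figures. A production version would likely back off or block rather than spin with `yield`.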
\begin{figure}
\includegraphics[width=0.50\textwidth]{plots/threads_timeline1.pdf}
\caption{}
\label{threads_timeline1}
\end{figure}

\begin{figure}
\includegraphics[width=0.50\textwidth]{plots/threads_timeline2.pdf}
\caption{}
\label{threads_timeline2}
\end{figure}

\subsection{Performance Comparison}

\begin{figure}
\begin{center}
\includegraphics[width=0.50\textwidth]{plots/single-multi-thread.pdf}
\caption{Performance comparison of single-thread vs. multithread without namespace}
\label{single-multi-thread}
\end{center}
\end{figure}

\begin{figure}
\includegraphics[width=0.50\textwidth]{plots/single-multi-thread_ns.pdf}
\caption{Performance comparison of single-thread vs. multithread with namespace}
\label{single-multi-thread_ns}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=0.50\textwidth]{plots/threads_comp.pdf}
\caption{Performance comparison of the two threads without namespace}
\label{threads_comp}
\end{center}
\end{figure}

\begin{figure}
\includegraphics[width=0.50\textwidth]{plots/threads_comp_ns.pdf}
\caption{Performance comparison of the two threads with namespace}
\label{threads_comp_ns}
\end{figure}