# Changeset 2871

Timestamp:
Jan 30, 2013, 5:29:40 PM
Message:

more edits

Location:
docs/Working/icXML
Files:
4 edited

• ## docs/Working/icXML/arch-overview.tex

 r2866 \subsection{Overview} \def \CSG{Stream Generator} \def \CSG{Content Stream Generator} \icXML{} is more than an optimized version of Xerces. Many components were grouped, restructured and
• ## docs/Working/icXML/icxml-main.tex

 r2866 resolution, bit parallel methods in namespace processing, as well as staged processing using pipeline parallelism to take advantage of multiple cores. multiple cores. \begin{figure*}[th] \begin{center} \begin{tabular}{rr}\\ Source Data & \verbfeefum\\ Tag Openers & \verb1____________1____________________________1____________1__________\\ Start Tag Marks & \verb_1____________1___________________________________________________\\ End Tag Marks & \verb___________________________________________1____________1_________\\ Empty Tag Marks & \verb__________________________________________________________________\\ Element Names & \verb_11111111_____1111111_____________________________________________\\ Attribute Names & \verb______________________11_______11_________________________________\\ Attribute Values & \verb__________________________111________111__________________________\\ % String Ends & \verb1____________1_______________1__________1_1____________1__________\\ % Markup Identifiers & \verb_________1______________1_________1______1_1____________1_________\\ % Deletion Mask & \verb_11111111_____1111111111_1____1111_11_______11111111_____111111111\\ % Undeleted Data & \verb{\tt\it 0}\verb________>fee{\tt\it 0}\verb__________=_fie{\tt\it 0}\verb____=__foe{\tt\it 0}\verb>{\tt\it 0}\verb/________fum{\tt\it 0}\verb/_________ \end{tabular} \end{center} \caption{XML Source Data and Derived Parallel Bit Streams} \label{fig:parabix1} \end{figure*} The remainder of this paper is organized as follows. 
\input{background-xerces} \begin{figure*}[th] \begin{center} \begin{tabular}{rr}\\ Source Data & \verbfeefum\\ Tag Openers & \verb1____________1____________________________1____________1__________\\ Start Tag Marks & \verb_1____________1___________________________________________________\\ End Tag Marks & \verb___________________________________________1____________1_________\\ Empty Tag Marks & \verb__________________________________________________________________\\ Element Names & \verb_11111111_____1111111_____________________________________________\\ Attribute Names & \verb______________________11_______11_________________________________\\ Attribute Values & \verb__________________________111________111__________________________\\ % String Ends & \verb1____________1_______________1__________1_1____________1__________\\ % Markup Identifiers & \verb_________1______________1_________1______1_1____________1_________\\ % Deletion Mask & \verb_11111111_____1111111111_1____1111_11_______11111111_____111111111\\ % Undeleted Data & \verb{\tt\it 0}\verb________>fee{\tt\it 0}\verb__________=_fie{\tt\it 0}\verb____=__foe{\tt\it 0}\verb>{\tt\it 0}\verb/________fum{\tt\it 0}\verb/_________ \end{tabular} \end{center} \caption{XML Source Data and Derived Parallel Bit Streams} \label{fig:parabix1} \end{figure*} \input{background-parabix}
 r2525 % which naturally enables pipeline parallel processing. As discussed in section \ref{background:xerces}, Xerces can be considered a complex finite-state machine. As an application class, finite-state machines are considered very difficult to parallelize and have been termed ``embarrassingly sequential.'' \cite{Asanovic:EECS-2006-183}. However, \icXML{} is designed to organize processing into logical layers that are separable.   In particular, layers within the \PS{} are designed to operate As discussed in section \ref{background:xerces}, Xerces can be considered an FSM application. These are ``embarrassingly sequential''~\cite{Asanovic:EECS-2006-183} and notoriously difficult to parallelize. However, \icXML{} is designed to organize processing into logical layers. In particular, layers within the \PS{} are designed to operate over significant segments of input data before passing their outputs on for subsequent processing.  This fits well into the general model of pipeline The most straightforward division of work in \icXML{} is to separate the \PS{} and the \MP{} into distinct logical layers in a two-stage pipeline. the \PS{} and the \MP{} into distinct logical layers in two separate stages. The resultant application, {\it\icXMLp{}}, is a coarse-grained software-pipeline application. In this case, the \PS{} thread $T_1$ reads 16k of XML input $I$ at a time and produces the content, symbol and URI streams, then stores them in a pre-allocated shared data structure $S$. \subfigure[]{ \includegraphics[width=0.48\textwidth]{plots/threads_timeline2.pdf} \label{threads_timeline2} } \caption{Thread Balance in Two-Stage Pipelines} \label{threads_timeline2} \end{figure} % and the first thread has to wait for the second thread to finish reading the shared data before it can reuse the memory space. 
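The two-stage arrangement described in this hunk can be illustrated with a minimal producer/consumer sketch. It is not the \icXMLp{} implementation: the function names, the slot count `SLOTS`, and the use of a bounded queue as a stand-in for the pre-allocated shared structure $S$ are all assumptions made for illustration, and the "processing" in each stage is a placeholder.

```python
import queue
import threading

CHUNK_SIZE = 16 * 1024  # bytes handled per step; mirrors the "16k" in the text
SLOTS = 4               # hypothetical number of pre-allocated slots in S


def parabix_stage(data: bytes, shared: queue.Queue) -> None:
    """Stage 1 (thread T1): cut the input into chunks and publish them to S."""
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        # Placeholder for producing the content, symbol and URI streams.
        shared.put(chunk)  # blocks while every slot is still in use
    shared.put(None)  # sentinel: no more input


def markup_stage(shared: queue.Queue, results: list) -> None:
    """Stage 2 (thread T2): consume each published chunk in order."""
    while True:
        item = shared.get()
        if item is None:
            break
        results.append(len(item))  # placeholder for markup processing


def run_pipeline(data: bytes) -> list:
    """Run both stages concurrently and return the per-chunk results."""
    shared = queue.Queue(maxsize=SLOTS)
    results = []
    t1 = threading.Thread(target=parabix_stage, args=(data, shared))
    t2 = threading.Thread(target=markup_stage, args=(shared, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

The bounded queue captures the constraint noted in the trailing comment: when all slots are occupied, the first stage has to wait for the second to drain one before it can continue.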
Overall, our design assumption is that an accelerated Xerces parser will be most significant for applications that themselves perform substantial processing on the parsed XML data delivered.  Our design is intended for a range of applications spanning two design points.   The first design point is one in which XML parsing cost handled by the \PS{} dominates at 67\% of the overall cost, with the cost of application processing (including the driver logic within the \MP{}) still being quite significant at 33\%.   The second is almost the reverse scenario: the cost of application processing dominates at 60\% of the overall cost, while the overall cost of parsing represents an overhead of 40\%. Overall, our design is intended to benefit a range of applications. Conceptually, we consider two design points. In the first, the parsing performed by the \PS{} dominates at 67\% of the overall cost, with the cost of application processing (including the driver logic within the \MP{}) at 33\%. The second is almost the opposite scenario: the cost of application processing dominates at 60\%, while the cost of XML parsing represents an overhead of 40\%. Our design is also predicated on a goal of using the Parabix framework to achieve a 50\% to 100\% improvement in the parsing engine itself.   Our best-case scenario is Our design is predicated on a goal of using the Parabix framework to achieve a 50\% to 100\% improvement in the parsing engine itself. In a best-case scenario, a 100\% improvement of the \PS{} for the design point in which XML parsing dominates at 67\% of the total application cost. In this case, single-threaded \icXML{} should achieve a 50\% speedup In this case, the single-threaded \icXML{} should achieve a 1.5x speedup over Xerces so that the total application cost reduces to 67\% of the original. However, with our two-stage pipeline model, our ideal scenario gives us two well-balanced threads each performing about 33\% of the original work.   
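The figures quoted for the first design point can be checked with simple cost accounting. In the sketch below, $p$ (the fraction of original run time spent in the \PS{}) and $s$ (the speedup of the \PS{} alone) are symbols introduced here for illustration; they do not appear in the original text. The pipeline estimate assumes the two stages overlap perfectly with no synchronization cost, and the block assumes the `amsmath` package is loaded.

```latex
% p = 0.67: fraction of original time spent parsing (first design point)
% s = 2:    a 100% improvement of the parser alone
\begin{align*}
T_{\text{single}} &= \frac{p}{s} + (1 - p)
                   = \frac{0.67}{2} + 0.33 \approx 0.67,
  & \frac{1}{T_{\text{single}}} &\approx 1.5 \\
T_{\text{pipeline}} &\approx \max\!\left(\frac{p}{s},\; 1 - p\right)
                   = \max(0.335,\; 0.33) \approx 0.33,
  & \frac{1}{T_{\text{pipeline}}} &\approx 3
\end{align*}
```

Both times are expressed as fractions of the original Xerces run time, so the reciprocals give the 1.5x single-threaded speedup and the upper bound of roughly 3x for two well-balanced threads.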
In this case, Amdahl's law predicts that we could expect up to a 3X speedup, at best. However, in \icXMLp{}, our ideal scenario gives us two well-balanced threads each performing about 33\% of the original work. In this case, Amdahl's law predicts that we could expect up to a 3x speedup at best. At the other extreme of our design range, we consider an application in which core parsing cost is 40\%.   Assuming the 2X speedup of in which core parsing cost is 40\%.   Assuming the 2x speedup of the \PS{} over the corresponding Xerces core, single-threaded \icXML{} delivers a 25\% speedup.   However, the most significant