Timestamp:
Aug 23, 2011, 1:02:30 AM (8 years ago)
Author:
ashriram
Message:

new abstract for new intro

File:
1 edited

  • docs/HPCA2012/09-pipeline.tex

    r1347 r1348  
Moreover, using multiple cores, we can further improve the performance of Parabix while keeping energy consumption at the same level.
A typical approach to parallelizing software, data parallelism, requires nearly independent data.
However, the structure of XML files makes them hard to partition cleanly for data parallelism.
Several approaches have been used to address this problem.
A preparsing phase has been proposed to help partition the XML document \cite{dataparallel}.
The goal of this preparsing is to determine the tree structure of the XML document
so that it can guide the full parsing in the next phase.
Another data-parallel algorithm is ParDOM \cite{Shah:2009}.
It first builds a partial DOM node tree for each data segment and then links the trees together,
using preorder numbers assigned to each start element to determine the ordering among siblings
and a stack to manage the parent-child relationships between elements.
Data parallelism approaches introduce substantial overhead to resolve the data dependencies between segments.
Therefore, instead of partitioning the data into segments and assigning different segments to different cores,
we propose a pipeline parallelism strategy that partitions the process into several stages and lets each core work on a single stage.
The interface between stages is implemented using a circular array,
where each entry consists of all ten data structures for one segment, as listed in Table \ref{pass_structure}.
Each thread keeps an index into the array ($I_N$),
which is compared with the index ($I_{N-1}$) kept by the previous thread before processing a segment.
If $I_N$ is smaller than $I_{N-1}$, thread $N$ can start processing segment $I_N$;
otherwise, the thread keeps reading $I_{N-1}$ until $I_{N-1}$ becomes larger than $I_N$.
The time spent repeatedly loading the value of $I_{N-1}$ and
comparing it with $I_N$ is later referred to as stall time.
When a thread finishes processing a segment, it increments its index by one.
\begin{table*}[t]