Changeset 1302 for docs/Working
 Timestamp:
 Aug 8, 2011, 4:43:21 PM (8 years ago)
 Location:
 docs/Working
 Files:

 2 edited
Legend:
 Unmodified
 Added
 Removed

docs/Working/parlensort.tex
r1187 r1302 89 89 be implemented using our existing Pablo language, avoiding lowlevel C programming. 90 90 91 \section{DIV2 grouping} 92 Given streams $S$ and $E$ that mark the starts and ends of Names. 93 We can partition $E$ into lengthsorted groups using div2length strategy: $G_i = n(E_{i*21})E_{i*2}$ 94 95 First we calculate $T$, which marks the two positions after each name. 96 Then we can get $G_1$ as follows. 97 \begin{eqnarray*} 98 T & = & E  n(E)\\ 99 S_2 & = & n(n(S))\\ 100 G_1 & = & S_2 \wedge T \\ 101 \end{eqnarray*} 102 103 $G_i (i>1)$ can be similarly determined as follows. 104 105 \begin{eqnarray*} 106 S_{i*2} & = & n(n(S_{(i1)*2} \wedge \neg T)) \\ 107 G_i & = & S_{i*2} \wedge T \\ 108 \end{eqnarray*} 109 110 Once the length sorting above has completed, then we can process 111 names using bitscan techniques. 112 113 For each $G_i (i>0)$: 114 \begin{enumerate} 115 \item Sequentially scan $G_i$ to find positions $P_i = \{p_{i,0},p_{i,1},...,p_{i,n},\}$. 116 \item For each $k$ in $\{1,2...,n\}$, calculate hash value for name $N[p_{i,ki*2}]...N[p_{i,k1}]$ 117 \item Look up hash table. If the name exist, process next name. 118 Otherwise, if $N[p_{i,k1}]$ is a delimiter (name length is odd), lookup name $N[p_{i,ki*2}]...N[p_{i,k2}] (i>1)$ or name $N[p_{1,k2}]$. 119 If it exists, get its GID $g$ and insert $N[p_{i,ki*2}]...N[p_{i,k1}]$ with GID $g$. 120 If it dosen't exist, insert this name with a new GID and insert $N[p_{i,ki*2}]...N[p_{i,k1}]$ with the same GID. 121 If $N[p_{i,k1}]$ is not a delimiter (name length is even), insert $N[p_{i,ki*2}]...N[p_{i,k1}]$ with a new GID. 122 \end{enumerate} 123 124 % \section{Grouping Strategies} 125 % \subsection{Length Grouping} 126 % \subsubsection{ID grouping} 127 % 128 % \subsubsection{LOG grouping} 129 % 130 % \subsubsection{DIV grouping} 131 % 132 % \subsection{Length + hash Grouping} 133 % 134 % \section{Sorting Algorithms} 135 % \subsection{Sequential Sorting} 136 % 137 % \subsection{Parallel Sorting} 138 % 139 % \section{Group Processing Strategies} 140 % 141 % \subsection{Linear SIMD Search} 142 % 143 % \subsection{Trie} 144 % 145 % \subsection{Hash} 146 91 147 \end{document} 92 148
Note: See TracChangeset
for help on using the changeset viewer.