Publication: Generalized Shuffle Permutations on Boolean Cubes
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
In a generalized shuffle permutation an address (a[q-1]a[1-2]...a[0]) receives its content from an address obtained through a cyclic shift on a subset of the q dimensions used for the encoding of the addresses. Big-complementation may be combined with the shift. We give an algorithm that requires (K/2)+2 exchanges for K elements per processor, when storage dimensions are part of the permutation, and concurrent communication on all ports of every processor possible. The number of element exchanges in sequence is independent of the number of processor dimensions sigma(r) in the permutation. With no storage dimensions in the permutation our best algorithm requires (sigma[r]+1)(K/2sigma[r]) element exchanges. We also give an algorithm for sigma(r)=2, or the real shuffle consists of a number of cycles of length two, that requires (K/2)+1 element exchanges in sequence when there is no bit complement. The lower bound is (K/2) for both real and mixed shuffles with no bit complementation. The minimum number of communication start-ups for sigma(r) for both cases, which is also the lower bound. The data transfer time for communication restricted to one port per processor is sigma(r)(K/2), and the minimum number of start-ups is sigma(r). The analysis is verified by experimental results on the Intel iPSC/1, and for one case also on the Connection Machine.