Publication: Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage
Open/View Files
Date
1991
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
The Harvard community has made this article openly available. Please share how this access benefits you.
Citation
Ho, Ching-Tien, S. Lennart Johnsson, and Alan Edelman. 1991. Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage. Harvard Computer Science Group Technical Report TR-19-91.
Research Data
Abstract
For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place a processor must receive about P^2/√ N elements of each input operand, with operands of size PxP distributed evenly over N processors. With concurrent communication on all ports, the number of element transfers in sequence can be reduced to P^2/√N logN for each input operand. We present a two-level partitioning of the matrices and an algorithm for the matrix multiplication with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity 2P^3, and parallel arithmetic complexity 2P^3/N. The algorithm has been implemented on the Connection Machine model CM-2. For the performance on the 8K CM-2, we measured about 1.6 Gflops, which would scale up to about 13 Gflops for a 64K full machine.
Description
Other Available Sources
Keywords
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service