Publication:
Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage

Date

1991

Citation

Ho, Ching-Tien, S. Lennart Johnsson, and Alan Edelman. 1991. Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage. Harvard Computer Science Group Technical Report TR-19-91.

Abstract

For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place, a processor must receive about P^2/√N elements of each input operand, where the operands are P × P matrices distributed evenly over N processors. With concurrent communication on all ports, the number of element transfers in sequence can be reduced to P^2/(√N log N) for each input operand. We present a two-level partitioning of the matrices and a matrix multiplication algorithm with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity 2P^3 and parallel arithmetic complexity 2P^3/N. It has been implemented on the Connection Machine model CM-2. On an 8K CM-2 we measured about 1.6 Gflops, which would scale to about 13 Gflops on a full 64K machine.
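
The cost expressions quoted in the abstract can be evaluated numerically to get a feel for their size. The following is a minimal sketch, not code from the report: the function names are hypothetical, the logarithm is assumed to be base 2 (an N-node hypercube has log2 N ports per node), and the Gflops projection assumes performance grows linearly with the number of processors.

```python
import math


def cost_model(P, N):
    """Evaluate the abstract's cost expressions for P x P operands on N processors.

    Assumption: log N is taken base 2, the number of ports per hypercube node.
    """
    if P <= 0 or N <= 1:
        raise ValueError("need P > 0 and N > 1")
    received = P**2 / math.sqrt(N)           # elements received per operand, about P^2/sqrt(N)
    ports = math.log2(N)                      # hypercube dimension = ports per node
    return {
        "elements_received": received,
        "transfers_all_ports": received / ports,  # about P^2/(sqrt(N) log N)
        "sequential_flops": 2 * P**3,             # 2P^3
        "parallel_flops": 2 * P**3 / N,           # 2P^3/N
    }


def scale_gflops(measured, procs_measured, procs_target):
    """Project a measured rate to a larger machine, assuming linear scaling."""
    return measured * procs_target / procs_measured


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the report.
    for name, value in cost_model(P=1024, N=1024).items():
        print(f"{name}: {value:g}")
    # 8K -> 64K processors is a factor of 8, consistent with "about 13 Gflops".
    print(scale_gflops(1.6, 8 * 1024, 64 * 1024))
```

Under the linear-scaling assumption the projection is 1.6 × 8 = 12.8 Gflops, which matches the "about 13 Gflops" figure quoted for a full 64K machine.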

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
