Publication:

Block-Cyclic Dense Linear Algebra

Loading...
Thumbnail Image

Date

1992

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Lichtenstein, Woody and S. Lennart Johnsson. 1992. Block-Cyclic Dense Linear Algebra. Harvard Computer Science Group Technical Report TR-04-92.

Abstract

Block-cyclic order elimination algorithms for LU and QR factorization and solve routines are described for distributed memory architectures with processing nodes configured as two-dimensional arrays of arbitrary shape. The cyclic order elimination together with a consecutive data allocation yields good load-balance for both the factorization and solution phases for the solution of dense systems of equations by LU and QR decomposition. Blocking may offer a substantial performance enhancement on architectures for which the level-2 or level-3 BLAS are ideal for operations local to a node. High rank updates local to a node may have a performance that is a factor of four or more higher than a rank-1 update. We show that in many parallel implementations, the O(N2) work in the factorization may be of the same significance as the O(N3) work, even for large matrices. The O(N2) work is poorly load-balanced in two-dimensional nodal arrays, which we show are optimal with respect to communication for consecutive data allocation, block-cyclic order elimination, and a simple, but fairly general, communications model. In our Connection Machine system CM{200 implementation, the peak performance for LU factorization is about 9.4 Gflops/s in 64-bit precision and 16 Gflops/s in 32-bit precision. Blocking offers an overall performance enhancement of about a factor of two. The broadcast and reduce operations fully utilize the bandwidth available in the Boolean cube network interconnecting the nodes along each axis of the two{dimensional nodal array embedded in the cube network.

Description

Other Available Sources

Research Data

Keywords

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories