Publication: Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/5E
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed memory architectures. The BLAS of the CMSSL have been implemented as a two{level structure to exploit optimizations local to nodes and across nodes. This paper presents the implementation considerations and performance of the Local BLAS, or BLAS local to each node of the system. A wide variety of loop structures and unrollings have been implemented in order to achieve a uniform and high performance, irrespective of the data layout in node memory. The CMSSL is the only existing high{performance library capable of supporting both the data parallel and message passing modes of programming a distributed memory computer. The implications of implementing BLAS on distributed memory computers are considered in this light.