Publication: Implementing O(N) N-body Algorithms Efficiently in Data Parallel Languages (High Performance Fortran)
Abstract
O(N) algorithms for N-body simulations enable the simulation of particle systems with up to 100 million particles on current Massively Parallel Processors (MPPs). Our optimization techniques focus mainly on minimizing data movement through careful management of the data distribution and the data references, both between the memories of different nodes and within the memory hierarchy of each node. We show how the techniques can be expressed in languages with an array syntax, such as Connection Machine Fortran (CMF). All CMF constructs used, with one exception, are included in High Performance Fortran. The effectiveness of our techniques is demonstrated on an implementation of Anderson's hierarchical O(N) N-body method for the Connection Machine system CM-5/5E. Communication accounts for about 10-20% of the total execution time, with the average efficiency for arithmetic operations being about 40% and the total efficiency (including communication) being about 35%. For the CM-5E, a performance in excess of 60 Mflop/s per node (peak 160 Mflop/s per node) has been measured.
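
The quoted figures can be related by a rough consistency check. This is only a sketch: it assumes, which the abstract does not state, that the 40% arithmetic efficiency is measured over the non-communication time and that total efficiency is that value scaled by the fraction of time spent computing. The symbols $f_{\text{comm}}$, $\eta_{\text{arith}}$ and $\eta_{\text{total}}$ are introduced here for illustration.

```latex
% Assumed model: total efficiency = arithmetic efficiency x fraction of time not spent communicating
\[
  \eta_{\text{total}} \approx \eta_{\text{arith}}\,(1 - f_{\text{comm}}),
  \qquad \eta_{\text{arith}} \approx 0.40,\quad f_{\text{comm}} \in [0.10,\ 0.20]
\]
\[
  0.40 \times (1 - 0.20) = 0.32
  \;\le\; \eta_{\text{total}} \;\le\;
  0.40 \times (1 - 0.10) = 0.36
\]
% Measured CM-5E rate relative to peak, per node
\[
  \frac{60\ \text{Mflop/s}}{160\ \text{Mflop/s}} = 0.375
\]
```

Under that assumption the range of 32-36% agrees with the stated total efficiency of about 35%, and the CM-5E measurement of more than 60 Mflop/s per node corresponds to somewhat more than 37.5% of peak.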