OPTIMIZING THE MATRIX MULTIPLICATION PARALLEL ALGORITHMS ON A DISTRIBUTED-MEMORY MIMD MULTIPROCESSOR
Title:
OPTIMIZING THE MATRIX MULTIPLICATION PARALLEL ALGORITHMS ON A DISTRIBUTED-MEMORY MIMD MULTIPROCESSOR
Authors:
Garg, Sharad; Sholl, Howard A.; Ammar, Reda A.
Published in:
International journal of parallel, emergent and distributed systems
Pagination:
Volume 2 (1994), no. 4, pages 291-303
Year:
1994
Abstract:
In the past few years, there have been significant developments in the area of distributed and parallel processing. New and more powerful hardware architectures, such as distributed-memory MIMD computers, are being produced at a rapid rate and have provided enormous computing power to software engineers. These multiprocessors may provide a significant speed-up over the serial execution of an algorithm. However, this requires careful partitioning and allocation of data and control to the processor set. Matrix multiplication is a fundamental parallel algorithm which can be executed effectively on a distributed-memory multiprocessor and can show a significant speed-up over the serial execution. Ideally, we should be able to achieve a linear speed-up as the number of processors increases, but in practice the speed-up is much less, and in fact increasing the number of processors beyond a certain number may increase the completion time. This degradation is caused by increased communication between modules. Therefore, the optimum speed-up is a function of the number of processors and the communication cost. To find the optimum performance, a user needs to experiment with all the available processors on a multiprocessor. In this paper, we study the detailed performance of the parallel matrix multiplication algorithm. The study defines the factors that control the performance of this class of algorithms and shows how to use these factors to optimize the algorithm's execution time. Also, an analytic approach is described which can replace the trial-and-error method of determining the size of the processor set.
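
The trade-off described in the abstract can be illustrated with a toy cost model. This sketch is not the model from the paper: it assumes, purely for illustration, that computation time shrinks as work / p while communication time grows linearly as comm_cost * p. The names completion_time, best_processor_count, work, and comm_cost are hypothetical.

```python
import math


def completion_time(p, work, comm_cost):
    """Toy model (an assumption, not the paper's): total time on p processors.

    work      -- serial compute time, e.g. proportional to n**3 for n x n matrices
    comm_cost -- communication overhead added per processor
    """
    return work / p + comm_cost * p


def best_processor_count(work, comm_cost, p_max):
    """Return the integer p in [1, p_max] that minimises the toy completion time."""
    # Setting d/dp (work/p + comm_cost*p) = 0 gives the continuous minimiser.
    p_star = math.sqrt(work / comm_cost)
    # The model is convex in p, so only the integers around p_star
    # and the two ends of the allowed range need to be checked.
    candidates = {1, p_max}
    for p in (math.floor(p_star), math.ceil(p_star)):
        if 1 <= p <= p_max:
            candidates.add(p)
    return min(candidates, key=lambda p: completion_time(p, work, comm_cost))


if __name__ == "__main__":
    p = best_processor_count(work=100.0, comm_cost=1.0, p_max=64)
    print(p, completion_time(p, 100.0, 1.0), completion_time(64, 100.0, 1.0))
```

Under these assumed costs, using all 64 processors is slower than using the analytically chosen count, which is the kind of degradation the abstract attributes to communication overhead. A real machine would need measured cost terms in place of the linear communication assumption.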