大型矩阵相乘并行计算的特性分析
Performance Analysis of Large Scale Parallel Matrix Multiplication

作者: 尚 智 :新加坡科技研究局高性能计算研究院,新加坡; 陈 硕 :同济大学航空航天与力学学院,上海;

关键词: 大规模计算海量数据并行计算分布式运算矩阵相乘 Large Scale Computing Massive Data Parallel Computing Distributed Computing Matrix Multiplication

摘要:

随着科学研究和工程计算的发展,大规模计算和模拟已经无法避免。这些大规模计算往往涉及海量数据的运算和处理,因此并行计算被用来一方面解决大规模的快速计算,另一方面解决海量数据的处理。基于MPI的并行计算可以方便地进行分布式运算,把海量数据分散在集群超级计算机上,使得每单个处理器(CPU)处理一小部分数据,从而实现快速运算和大规模计算。本文基于MPI的并行编程,实现了大规模矩阵的相乘运算,并且测试了点对点通信下的不通信机制(阻塞通信、非阻塞通信及其混合通信)的标准通信的并行性能。针对大型矩阵相乘计算,组建了完整的快速标准通信方法,并且防止死锁的发生。为今后的进一步实际应用奠定基础,提供有用的参考。

Abstract: The large scale computing is difficult to be avoided following the requirement of modern scientific re- searches and practical engineering applications. The computing and processing of massive data have to be involved in these large scale computing. The parallel computing therefore is employed to solve these issues of large scale comput- ing not only on fast computing but also on data processing. MPI-based parallel computing can easily realize distributed computing and massive data scattered in the cluster supercomputer, making each a single processor to handle a small portion of data in order to achieve fast computing and large scale computing. Based on MPI parallel programming a large-scale matrix multiplication operation was developed. Through the testes on the parallel performance of the point to point communications with the blocking communication, non-blocking communication and mixed communication, a complete set of rapid communication to prevent the occurrence of deadlock was established. The results were significant for the future practical applications.

文章引用: 尚 智 , 陈 硕 (2013) 大型矩阵相乘并行计算的特性分析。 软件工程与应用, 2, 15-19. doi: 10.12677/SEA.2013.21003

参考文献

[1] 王飞, 王建业, 张安堂. 实对称矩阵特征值分解高速并行算法的FPGA实现[J]. 空军工程大学(自然科学版), 2008, 9(6): 67-70.

[2] 位寅生, 谭久彬, 郭荣. MUSIC空间谱估计并行运算算法[J]. 系统工程与电子技术, 2012, 34(1): 12-16.

[3] 程豪, 张云泉, 张先轶, 李玉成. CPU-GPU并行矩阵乘法的实现与性能分析[J]. 软件技术与数据库, 2010, 36(13): 24-26.

[4] 伍湘君, 黄丽萍. 超级计算机上矩阵乘的并行计算与实现[J]. 应用气象学报, 2005, 16(1): 122-127.

[5] 唐俊奇. 多处理机中矩阵乘法的算法研究[J]. 中国西部科技, 2007, 2: 4-8.

[6] L. E. Cannon. A cellular computer to implement the Kalman filter algorithm. Montana State University, 1969.

[7] L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington and R. C. Whaley. An updated set of basic linear algebra subprograms (BLAS). ACM Transactions on Mathe- matical Software, 2002, 28(2): 135-151.

[8] S. Robinson. Toward an optimal algorithm for matrix multipli- cation. SIAM News, 2005, http://www.siam.org/pdf/news/174.pdf

[9] H. Cohn, R. Kleinberg, B. Szegedy and C. Umans. Group-theo- retic algorithms for matrix multiplication. Proceedings of the 46th Annual Symposium on Foundations of Computer Science, 23-25 October 2005.

[10] 迟学斌. 高性能并行计算[M]. 北京: 中国科学院计算机网络信息中心, 2005: 31-36.

[11] Y. Fournier, J. Bonelle, C. Moulinec, Z. Shang, A. G. Sunderland and J. C. Uribe. Optimizing Code_Saturne computations on Peta- scale systems. Computers & Fluids, 2011, 45: 103-108.

分享
Top