具有同时多种子检测与并行扩展能力的BLAST算法加速器研究
The Research on BLAST Accelerator with Parallel Multi-Seeds Detection and Extension

作者: 夏 飞 , 金国庆 , 张春雷 :海军工程大学电子工程学院,湖北 武汉;

关键词: 多种子并行检测硬件加速生物序列搜索FPGAMulti-Seeds Detection Hardware Acceleration Bio-Sequence Searching FPGA

摘要:
本文针对目前流行的BLAST算法存在的种子检测效率不高的问题,提出了一种不同于常规查询策略的并行多种子检测算法,并基于脉动阵列结构设计和实现了一种具备同时多种子检测与并行扩展能力的算法加速器,实现了对NCBI BLAST程序前两个阶段的加速。加速器阵列采用流水工作模式,能够动态计算所有单词的匹配点位置,避免了索引结构的存储开销;采用了阵列分组和并行种子收集策略减少了有效种子数量;采用组内种子合并策略避免了冗余扩展操作;采用多种子并行扩展策略实现了无阻塞的数据库搜索,显著提高了数据库搜索效率。实验结果表明,本设计在阵列规模和时钟频率上都优于相关研究工作,相对于目前主流的通用微处理器可获得20倍以上的加速效果。

Abstract: BLAST adopts index-based approach to detect the matches between two substrings by looking up a large table and processing one match per query. In this paper, we propose a systolic array approach to detect string matches without using looking up tables. The pipelining systolic array is implemented as a multi-seeds detection and parallel extension engine to accelerate the first two stages of NCBI BLAST algorithm. Different from the index-based approach, our implementation con- sumes little memory resources and eliminates redundant string extensions. Our FPGA implemen-tation achieves superior performance results in both of processing element number and clock frequency over related works in the area of FPGA BLAST accelerators. The experimental results show a speedup factor of more than 20× over the NCBI BLAST software running on a current main- stream PC platform.

文章引用: 夏 飞 , 金国庆 , 张春雷 (2015) 具有同时多种子检测与并行扩展能力的BLAST算法加速器研究。 计算生物学, 5, 1-10. doi: 10.12677/HJCB.2015.51001

参考文献

[1] Altschul, S.F., Gish, W., Miller, M., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410.

[2] Kim, T.K., et al. (2005) HGBS: A hardware-oriented grid blast system. Proceedings of 5th International Symposium on Cluster Computing and the Grid, 1, 520-526.

[3] Yutao, Q. and Feng, L. (2003) Cyberparablast: The parallelized blast web server. Proceedings of 2nd International Conference on Cyber-worlds (CW’03), 3-5 December 2003, 474-477.

[4] Farmerie, W.G., Hammer, J., Liu, L. and Schneider, M. (2003) Having a blast: Analyzing gene sequence data with blastquest. Proceedings of 14th International Workshop on Database and Expert Systems Applications (DEXA’03), Prague, 1-5 September 2003, 37-41.

[5] Bjornson, R., Sherman, A., Weston, S., Willard, N. and Wing, J. (2002) Turboblast: A parallel implementation of blast built on the turbo-hub. Proceedings of 16th International Parallel and Distributed Processing Symposium (IPDPS’02), Ft. Lauderdale, 15-19 April 2002, 183-190.

[6] Lin, H., Ma, X., Chandramohan, P., Geist, A. and Samatova, N. (2005) Efficient data access for parallel blast. Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS’05), 1, 8 p.

[7] Oehmen, C. and Nieplocha, J. (2006) Scalablast: A Scalable Implementation of BLAST for High-Performance Data- Intensive Bioinformatics Analysis. IEEE Transaction on Parallel and Distributed Systems, 17, 740-749.

[8] National Center for Biotechnology Information (2014) Notes on GenBankstatistics. http://www.ncbi.nlm.nih.gov/genbank/statistics

[9] Buhler, J.D., Lancaster, J.M., Jacob, A.C. and Chamberlain, R.D. (2007) Mercury Blastn: Faster DNA sequence comparison using a streaming hardware architecture. Proceedings of 3rd Annual Reconfigurable Systems Summer Institute (RSSI’07), Urbana, 17-20 July 2007, 10 p.

[10] Krishnanurthy, P., Buhler, J., Chamberlain, R., et al. (2007) Biosequence similarity search on the mercury system. Proceedings of 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP’04), 365-375. Journal of VLSI Signal Process System, 49, 101-121.

[11] Lancaster, J.M., Buhler, J.D. and Chamberlain, R.D. (2009) Acceleration of ungapped extension in Mercury BLAST. Journal of Microprocessors & Microsystems, 33, 281-289.

[12] Jacob, A., Lancaster, J., Buhler, J. and Chamberlain, R.D. (2007) FPGA-accelerated seed generation in Mercury BLASTP. Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, 23-25 April 2007, 95-106.

[13] Muriki, K. and Underwood, K.D. and Sass, R. (2005) RC-BLAST: Towards a portable, cost-effective open source hardware implementation. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, Denver, 4-8 April 2005, 196-203.

[14] Herbordt, M.C., Model, J., Gu, Y.F., Sukhwani, B. and Van Court, T. (2006) Single pass, blast-like, approximate string matching on FPGAs. Proceedings of 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, 24-26 April 2006, 217-226.

[15] Lavenier, D., Xinchun, L. and Georges, G. (2006) Seed-based genomic sequence comparison using a FPGA/FLASH accelerator. Proceedings of IEEE International Conference on Field Programmable Technology, Bangkok, 13-15 December 2006, 41-48.

[16] Sotiriades, E. and Dollas, A. (2007) Design space exploration for the BLAST algorithm implementation. Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, 23-25 April 2007, 323-326.

[17] Sotiriades, E., Kozanitis, C. and Dollas, A. (2006) FPGA based architecture for DNA sequence comparison and database search. Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, 25-29 April 2006, 186-193.

[18] Chang, C. (2012) BLAST implementation on BEE2. Technical Report, Electrical Engineering and Computer Science University of California at Berkeley. http://bee2.eecs.berkeley.edu

[19] CLC desktop Hardware-acceleration. White Paper on CLC Bioinformatics Cube, 2006. http://www.clccube.com

[20] Mitrion. Inc. (2009) Ncbi blast accelerator. http://www.mitrionics.com

[21] Timelogic Biocomputing Solisions (2010) Timelogic decypher BLAST engine intro-duction. http://www.timelogic.com/decypher_blast.html

[22] NCBI BLAST software download, 2010. ftp://ftp.ncbi.nih.gov/blast/

[23] European Bioinformatics Institute, EBI (2008) EBI database web. http://www.ebi.ac.uk/uniprot/database/download.html

[24] National Center for Biotechnology Information (2009) BLAST database web. http://blast.ncbi.nlm.nih.gov/Blast.cgi

[25] General purpose micro-processor power dissipation statistic. List of CPU power dissipation figures, 2014. http://en.wikipedia.org/wiki/ListofCPUpowerdissipation

分享
Top