Computer Science and Application (计算机科学与应用)

Vol.5 No.8 (August 2015)

An Efficient Algorithm for Mining DNA Sequences Based on the Association Matrix

 

Authors:

Guojun Mao (毛国君), Jingxin Yang (杨静欣): School of Information, Central University of Finance and Economics, Beijing

 

Keywords:

DNA Sequence; Data Mining; Association Matrix; Key Sequence Mining

 

Abstract:

DNA analysis is a fundamental and central task in bioinformatics research, and data mining, as an important supporting technology, has been widely applied to the analysis of DNA sequences. Compared with transaction sequences in traditional business domains, DNA sequences use few distinct item symbols but are very long, so classic sequence mining algorithms are poorly suited to DNA sequence pattern mining. Based on an analysis of the requirements of DNA sequence mining, we propose a data structure called the Association Matrix. The Association Matrix compresses sequence data into a matrix form that can be analyzed directly, and its space compactness allows very long DNA sequences to be processed in limited memory. Based on the Association Matrix, we design an efficient algorithm for mining key sequences from DNA sequences. Experiments show that the proposed algorithm is efficient for DNA sequence analysis.
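The abstract does not define the Association Matrix, so the following is only an illustrative sketch of the general idea it describes: summarizing a long sequence over a small alphabet in a fixed-size matrix. It assumes, purely for illustration, that each cell counts how often one nucleotide is immediately followed by another. The names `build_pair_matrix` and `frequent_pairs` are hypothetical and are not taken from the paper.

```python
ALPHABET = "ACGT"  # assumed alphabet; other symbols (e.g. "N") are skipped


def build_pair_matrix(seq, alphabet=ALPHABET):
    """Count adjacent symbol pairs in one pass.

    Memory use depends only on the alphabet size (4 x 4 here),
    not on the length of the sequence.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    n = len(alphabet)
    matrix = [[0] * n for _ in range(n)]
    prev = None
    for base in seq:
        cur = index.get(base)  # None for symbols outside the alphabet
        if prev is not None and cur is not None:
            matrix[prev][cur] += 1
        prev = cur  # an unknown symbol breaks the adjacency chain
    return matrix


def frequent_pairs(matrix, min_count, alphabet=ALPHABET):
    """Return (pair, count) for every cell whose count reaches min_count."""
    return [
        (alphabet[i] + alphabet[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if count >= min_count
    ]


if __name__ == "__main__":
    m = build_pair_matrix("ACGTACGA")
    for base, row in zip(ALPHABET, m):
        print(base, row)
    print(frequent_pairs(m, min_count=2))
```

Because the matrix has a fixed size, the sequence can be streamed from disk one symbol at a time. Whether the paper's structure records adjacency counts, positions, or something else is not stated in the abstract.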


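Cite this article:

Mao, G. and Yang, J. (2015) An Efficient Algorithm for Mining DNA Sequences Based on the Association Matrix. Computer Science and Application, 5, 271-277. doi: 10.12677/CSA.2015.58035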

 
