Research on Statistical Analysis of Gene Splicing Sites
Abstract: Eukaryotic genes are composed of several exons and introns. After transcription, the exon sequences are retained, while the intron sequences are removed. Numerous molecular biology experiments confirm that the splicing sites between exons and introns follow the GT-AG rule. However, only a small fraction of GT or AG sequences are true splicing sites, and prediction accuracy still needs to be improved. In this study, the HS3D splicing site training dataset was downloaded, and a statistical analysis of the sequences near the splicing site of the promoter was carried out. The sequences showed high specificity when the flanking sequences on both the left and right sides of the true and false splicing sites were longer than seven nucleotides. This result helps in training on sequence features so as to accurately distinguish true splicing sites from false ones.
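The kind of comparison described in the abstract can be illustrated with a minimal sketch. It assumes that true and false sites are given as equal-length windows centered on the candidate dinucleotide, and it scores each position by the summed absolute difference in base frequencies. The function names, the scoring measure, and the toy windows are hypothetical choices made for illustration. They are not the authors' method and are not taken from HS3D.

```python
from collections import Counter

BASES = "ACGT"


def candidate_sites(seq, dinucleotide):
    """Return every 0-based index where the dinucleotide (e.g. "GT" or "AG") starts."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def position_frequencies(windows):
    """Compute per-position base frequencies for a list of equal-length windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(w[pos].upper() for w in windows)
        freqs.append({b: counts[b] / len(windows) for b in BASES})
    return freqs


def frequency_gap(true_windows, false_windows):
    """Per position, sum |f_true - f_false| over the four bases.

    A larger value means the position separates true from false sites better.
    This is an illustrative measure, not the statistic used in the paper.
    """
    f_true = position_frequencies(true_windows)
    f_false = position_frequencies(false_windows)
    return [sum(abs(t[b] - f[b]) for b in BASES) for t, f in zip(f_true, f_false)]


# Toy windows made up for illustration (not HS3D data); GT sits at indices 3-4.
true_donor = ["AAGGTAAGT", "CAGGTGAGT", "AAGGTAAGA"]
false_donor = ["TCCGTCTAC", "GATGTTCCA", "ACTGTGCAT"]
gaps = frequency_gap(true_donor, false_donor)
```

In this sketch the GT positions themselves score zero, because every window contains GT by construction, so any discriminating signal has to come from the flanking positions. Lengthening the window on each side adds more such positions to the comparison.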
Citation: Li, H. and He, G. (2016) Research on Statistical Analysis of Gene Splicing Sites. Hans Journal of Computational Biology, 6, 41-49. doi: 10.12677/HJCB.2016.63006
Sun, J. (1993) Predicting the Splicing Sites of mRNA by Neural Network. Acta Biophysica Sinica, 9, 127-131.
Xia, H., Zhou, Q. and Li, Y. (2002) Application of Hidden Markov Model in the Recognition of Splicing Sites. Journal of Tsinghua University, 42, 1214-1217.
Snyder, E.E. and Stormo, G.D. (1993) Identification of Coding Regions in Genomic DNA Sequences: An Application of Dynamic Programming and Neural Networks. Nucleic Acids Research, 21, 607-613.
Zhang, L.R. and Luo, L.F. (2003) Splice Site Prediction with Quadratic Discriminant Analysis Using Diversity Measure. Nucleic Acids Research, 31, 6214-6220.
Cai, D., Delcher, A., Kao, B. and Kasif, S. (2000) Modeling Splice Sites with Bayes Networks. Bioinformatics, 16, 152-158.
Yin, C. and Yau, S.T. (2007) Prediction of Protein Coding Regions by the 3-Base Periodicity Analysis of a DNA Sequence. Journal of Theoretical Biology, 247, 687-694.
Pollastro, P. and Rampone, S. (2002) HS3D, a Dataset of Homo Sapiens Splice Regions, and Its Extraction Procedure from a Major Public Database. International Journal of Modern Physics C, 13, 1105-1117.