An Approach to Normalization of Dai Text for Speech Synthesis
Abstract: As part of developing a Dai speech synthesis system, this paper studies the normalization of numbers and special characters in Dai text. Both numbers and special characters are non-standard words in Dai text, and the main purpose of text normalization is to represent the pronunciation of non-standard words using standard words. The normalization process consists of non-standard word recognition, ambiguity judgment, disambiguation, and translation of non-standard words into standard words. First, the non-standard words are recognized and their ambiguous types are determined using a method that combines rules with context keywords. Then, the type of ambiguity is judged using regular expressions. Finally, the correct pronunciation of each non-standard word is determined according to the transformation rules. Experimental results show that the accuracy of this normalization is more than 94.6%. The proposed method meets the requirements of front-end text analysis in a Dai text-to-speech conversion system and has practical value for natural language processing applications.
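The four stages named in the abstract (recognition, ambiguity judgment, disambiguation, translation) can be illustrated with a minimal sketch. It is not the paper's implementation: the token patterns, the keyword set `SCORE_KEYWORDS`, the context window size, and all function names are hypothetical, and English words stand in for Dai pronunciations.

```python
import re

# Placeholder readings; a real system would map to Dai standard words.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Stage 1 (recognition): hypothetical patterns for non-standard words.
NSW_RE = re.compile(r"\d+:\d+|\d+%|\d+")

# Hypothetical context keywords that mark "a:b" as a score rather than a time.
SCORE_KEYWORDS = {"score", "match", "won", "lost"}


def read_digits(s):
    """Read a digit string one digit at a time (simplified placeholder)."""
    return " ".join(DIGITS[int(ch)] for ch in s)


def classify(token, context_words):
    """Stages 2 and 3: judge the ambiguity type and resolve it from context."""
    if token.endswith("%"):
        return "percent"
    if ":" in token:
        # "a:b" is ambiguous; nearby keywords decide, with time as the default.
        return "score" if SCORE_KEYWORDS & context_words else "time"
    return "number"


def expand(token, kind):
    """Stage 4: apply a transformation rule for the resolved type."""
    if kind == "percent":
        return read_digits(token[:-1]) + " percent"
    if kind in ("time", "score"):
        left, right = token.split(":")
        if kind == "time":
            return f"{read_digits(left)} hours {read_digits(right)} minutes"
        return f"{read_digits(left)} to {read_digits(right)}"
    return read_digits(token)


def normalize(text, window=3):
    """Normalize whitespace-separated tokens, using up to `window` neighbours on each side."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        if not NSW_RE.fullmatch(word):
            out.append(word)  # already a standard word
            continue
        neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        context = {w.lower() for w in neighbours}
        out.append(expand(word, classify(word, context)))
    return " ".join(out)


print(normalize("the match score was 3:1"))
print(normalize("meet at 3:15"))
```

The sketch splits on whitespace and only handles tokens that match a pattern exactly, so a token with attached punctuation passes through unchanged. A practical front end would need script-aware tokenization and full number-reading rules for Dai.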
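Cite this article: Wu, Z.M., Yang, J. and Wang, Z. (2016) An Approach to Normalization of Dai Text for Speech Synthesis. Computer Science and Application, 6, 415-422. doi: 10.12677/CSA.2016.67051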