多值决策表的最小决策树生成
Minimal Decision Tree Generation for Multi-Label Decision Tables

作者: 乔莹 , 许美玲 , 钟发荣 , 曾静 , 莫毓昌 :浙江师范大学,浙江 金华;

关键词: 多值决策表决策树动态规划算法Multi-Label Decision Tables Decision Trees Dynamic Programming Algorithm

摘要: 决策树技术在数据挖掘的分类领域应用极其广泛,可以从普通决策表(每行记录包含一个决策值)中挖掘有价值的信息,但是要从多值决策表(每行记录包含多个决策值)中挖掘潜在的信息则比较困难。多值决策表中每行记录包含多个决策值,多个决策属性用一个集合表示。针对已有的启发式算法,如贪心算法,由于性能不稳定的特点,该算法获得的决策树规模变化较大,本文基于动态规划的思想,提出了使决策树规模最小化的算法。该算法将多值决策表分解为多个子表,通过多值决策表的子表进行构造最小决策树,进而对多值决策表进行数据挖掘。

Abstract: Decision tree is a widely used classification in data mining. It can discover the essential knowledge from the common decision tables (each row has a decision). However, it is difficult to do data mining from the multi-label decision tables (each row has a set of decisions). In a multi-label decision tables, each row contains several decisions, and several decision attributes are represented using a set. By testing the existing heuristic algorithms, such as greedy algorithms, their performance is not stable, i.e., the size of the decision tree might become very large. In this paper, we propose a dynamic programming algorithm to minimize the size of the decision trees for a multi- label decision table. In our algorithm, the multi-label decision table is divided into several subtables, and the decision tree is constructed by using all subtables of the multi-label decision table, then useful information can be discovered from the multi-label decision tables.

文章引用: 乔莹 , 许美玲 , 钟发荣 , 曾静 , 莫毓昌 (2016) 多值决策表的最小决策树生成。 计算机科学与应用, 6, 617-628. doi: 10.12677/CSA.2016.610076

参考文献

[1] Boutell, M.R., Luo, J., Shen, X. and Brown, C.M. (2004) Learning Multi-Label Scene Classification. Pattern Recognition, 37, 1757-1771.
http://dx.doi.org/10.1016/j.patcog.2004.03.009

[2] Wieczorkowska, A., Synak, P., Lewis, R.A. and Ras, Z.W. (2005) Extracting Emotions from Music Data. ISMIS, Volume 3488 of the series Lecture Notes in Computer Science, 456-465.

[3] Blockeel, H., Schietgat, L., Struyf, J., Dzeroski, S. and Clare, A. (2006) Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics. In: Fürnkranz, J., Scheffer, T. and Spiliopoulou, M., Eds., Proceedings PKDD, ser. LNCS, Springer, Berlin, Vol. 4213, 18-29.

[4] Zhou, Z.-H., Jiang, K. and Li, M. (2005) Multi-Instance Learning Basedweb Mining. Applied Intelligence, 22, 135-147.
http://dx.doi.org/10.1007/s10489-005-5602-z

[5] Moshkov, M. and Zielosko, B. (2011) Combinatorial Machine Learning -A Rough Set Approach. ser. Studies in Computational Intelligence, Springer, Vol. 360.
http://dx.doi.org/10.1007/978-3-642-20995-6

[6] Comité, F.D., Gilleron, R. and Tommasi, M. (2003) Learning Multi-Label Alternating Decision Trees from Texts and Data. Proceedings of 3rd International Conference, MLDM 2003, Leipzig, 5-7 July 2003, 35-49.
http://dx.doi.org/10.1007/3-540-45065-3

[7] Loza Mencía, E. and Fürnkranz, J. (2008) Pairwise Learning of Multilabel Clas-sifications with Perceptrons. IEEE International Joint Conference on Neural Networks, 1-8 June 2008, 2899-2906.
http://dx.doi.org/10.1109/IJCNN.2008.4634206

[8] Tsoumakas, G., Katakis, I. and Vlahavas, I.P. (2010) Mining Multi-Label Data. In: Maimon, O. and Rokach, L., Eds., Data Mining and KnowledgeDiscovery Handbook, Tel Aviv University, 667-685.

[9] Zhou, Z.-H., Zhang, M.-L., Huang, S.-J. and Li, Y.-F. (2012) Multi-Instance Multi-Label Learning. Artificial Intelli-gence, 176, 2291-2320.
http://dx.doi.org/10.1016/j.artint.2011.10.002

[10] Azad, M., Chikalov, I., Moshkov, M. and Zielosko, B. (2012) Greedy Algorithm for Construction of Decision Trees for Tables with Many-Valued Decisions. Proceedings of the 21st International Workshop on Concurrency, Specification and Programming, Berlin, 26-28 September 2012, ser. CEUR Workshop Proceedings, L. Popova-Zeugmann, Ed.CEUR-WS.org, 2012, Vol. 928.

[11] Azad, M., Chikalov, I. and Moshkov, M. (2013) Three Approaches to Deal within Consistent Decision Tables— Comparison of Decision Tree Complexity. RSFDGrC, Halifax, 11-14 Oc-tober 2013, 46-54.
http://dx.doi.org/10.1007/978-3-642-41218-9

[12] Tsoumakas, G. and Katakis, I. (2007) Multi-Label Classification: An Over-view. IJDWM, 3, 1-13.
http://dx.doi.org/10.4018/jdwm.2007070101

[13] Zhu, X. and Goldberg, A.B. (2009) Introduction to Semi-Supervised Learning, ser. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, San Rafael, Califor-nia.

[14] Cour, T., Sapp, B., Jordan, C. and Taskar, B. (2009) Learning from Ambiguously Labeled Images. CVPR, Miami, Florida, 20-25 July 2009, 919-926.

[15] Hüllermeier, E. and Beringer, J. (2006) Learning from Ambiguously Labeled Examples. Intelligent Data Analysis, 10, 419-439.

[16] Jin, R. and Ghahramani, Z. (2002) Learning with Multiple Labels. NIPS, Vancouver, British Co-lumbia, 9-14 December 2002, 897-904.

[17] Moshkov, M. and Chikalov, I. (2000) On Algorithm for Constructing of Decision Trees with Minimal Depth. Fundamenta Informaticae, 41, 295-299.

分享
Top