The Study of Machine Learning in Big Data Analysis

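Authors: 洪歧 (Hong Qi)*, 杨刚 (Yang Gang), 惠立山 (Hui Lishan); School of Mathematics and Computer Science, Shaanxi University of Technology (陕西理工大学), Hanzhong, Shaanxi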

Keywords: Big Data; Machine Learning; Semi-Supervised Learning; Big Data Machine Learning Systems; Probabilistic Graphical Model; R Language

Abstract: Machine learning plays an increasingly important role in big data analysis. This paper summarizes the main machine learning methods and techniques used in the big data setting. It first briefly introduces the basic models and categories of machine learning, then describes several key technologies for machine learning in big data environments. Next, it presents four currently popular big data machine learning systems and analyzes their characteristics. Finally, it points out the main research directions and the challenges facing big data machine learning.

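Cite this article: Hong, Q., Yang, G. and Hui, L.S. (2017) The Study of Machine Learning in Big Data Analysis. Artificial Intelligence and Robotics Research (人工智能与机器人研究), 6, 16-21. doi: 10.12677/AIRR.2017.61003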

