The Effects of Different Response Values in the Linear Regression Model on Binary Classification

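Authors: Wang Xiaoying (王小英), Yang Yanli (杨岩丽), Chen Changlong (陈常龙): School of Mathematics and Physics, North China Electric Power University, Beijing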

Keywords: Binary Classification; Response Values; Discriminant Analysis; Linear Regression Model; Least Squares

Abstract: We use the multiple linear regression model to handle the classification of two populations. We first assign values to the response variable according to a fixed rule, then construct a discriminant function and a discriminant criterion by the least squares method. On this basis, we examine how the choice of response values affects binary classification for balanced and unbalanced data. We also compare this discriminant method with classical discriminant analysis methods, namely the classical Mahalanobis distance discriminant and the Bayes discriminant, and identify the connections between them as well as their respective advantages and disadvantages.
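The procedure outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's construction: the function names `fit_ls_discriminant` and `predict`, the simulated Gaussian data, the two response codings, and in particular the midpoint threshold used as the discriminant criterion.

```python
import numpy as np

def fit_ls_discriminant(X, labels, codes=(0.0, 1.0)):
    """Least squares fit with class 0 coded as codes[0] and class 1 as codes[1].

    Returns the coefficient vector (intercept first) and a threshold.
    The midpoint threshold is an illustrative assumption.
    """
    a, b = codes
    y = np.where(labels == 1, b, a)             # assign response values
    A = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, (a + b) / 2.0

def predict(X, coef, threshold):
    """Assign class 1 when the fitted value exceeds the threshold."""
    A = np.column_stack([np.ones(len(X)), X])
    return (A @ coef > threshold).astype(int)

# Simulated unbalanced data (hypothetical): 200 points in class 0, 50 in class 1.
rng = np.random.default_rng(0)
n0, n1 = 200, 50
X = np.vstack([rng.normal(0.0, 1.0, (n0, 2)), rng.normal(3.0, 1.0, (n1, 2))])
labels = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])

# Two different response codings, each paired with its own midpoint threshold.
pred_01 = predict(X, *fit_ls_discriminant(X, labels, codes=(0.0, 1.0)))
pred_pm = predict(X, *fit_ls_discriminant(X, labels, codes=(-1.0, 1.0)))
accuracy = float(np.mean(pred_01 == labels))
```

Because least squares with an intercept is equivariant under affine recoding of the response, the two codings above give the same labels under the midpoint rule. Any effect of the response values in this sketch would therefore come from pairing a coding with a different threshold, such as a fixed cut at zero.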

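Citation: Wang, X., Yang, Y. and Chen, C. (2015) The Effects of Different Response Values in Linear Regression Model on Binary Classification. Statistics and Application (统计学与应用), 4, 47-55. doi: 10.12677/SA.2015.42007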

