
# The Effects of Different Response Values in the Linear Regression Model on Binary Classification

Abstract: We use the multiple linear regression model to classify observations from two populations. First, we assign values to the response variable according to specified rules, and then construct a discriminant function and a classification criterion by the least squares method. On this basis, we discuss how different response values affect classification for balanced and unbalanced data in the linear model. In addition, we compare this discriminant method with classical discriminant methods, including Mahalanobis distance discriminant analysis and Bayes discriminant analysis. Finally, we identify the intrinsic relationship between these methods, as well as their advantages and disadvantages.
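The procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not state the coding rules or the classification criterion, so everything here is an assumption made for illustration: the two candidate codings `(0, 1)` and `(-1, 1)`, the "nearer code" decision rule, the toy unbalanced data set, and the helper names `fit_least_squares`, `classify`, and `fit_and_predict`.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bp] minimizing sum((y - b0 - x.b)^2)."""
    A = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def classify(beta, x, code1, code2):
    """Assumed rule: assign x to the population whose code is nearer the fitted value."""
    fitted = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1 if abs(fitted - code1) <= abs(fitted - code2) else 2


def fit_and_predict(X, labels, code1, code2):
    """Code the response, fit by least squares, and classify the training points."""
    y = [code1 if g == 1 else code2 for g in labels]
    beta = fit_least_squares(X, y)
    return beta, [classify(beta, x, code1, code2) for x in X]


# Hypothetical unbalanced sample: 5 points from population 1, 7 from population 2.
X = [
    (1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 1.0), (0.8, 1.6),
    (4.0, 4.5), (4.5, 5.0), (5.0, 4.2), (3.8, 5.5), (4.7, 4.9),
    (5.2, 5.1), (4.1, 3.9),
]
labels = [1] * 5 + [2] * 7

beta_a, pred_a = fit_and_predict(X, labels, 0.0, 1.0)   # coding (0, 1)
beta_b, pred_b = fit_and_predict(X, labels, -1.0, 1.0)  # coding (-1, 1)

print("coefficients under (0, 1): ", beta_a)
print("coefficients under (-1, 1):", beta_b)
print("same predictions:", pred_a == pred_b)
```

Under this assumed rule the two codings should classify every point identically, because least squares fitted values transform affinely with the response. The choice of codes can only change the result when the cutoff is fixed independently of them (for example, at zero), which is one way class imbalance may enter. Whether this matches the criterion studied in the paper is not something the abstract settles.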

