
# Application of Cross-Validation in Model Selection: Taking OLS and RR as Examples

Abstract: This paper reviews the origin and development of cross-validation and summarizes previous research results. On this basis, leave-one-out cross-validation is applied to a model selection problem. Ordinary least squares (OLS) and ridge regression (RR) are used to fit candidate models to acetylene reaction data, and the optimal model is selected. The reasonableness and practicality of the model selection are also discussed.

1. Overview of Cross-Validation

2. Data Analysis

Table 1. Data of acetylene reaction

Let $A=\left(\begin{array}{ccc}x_{1}& x_{2}& x_{3}\end{array}\right)$. The correlation matrix of $x_1$, $x_2$, and $x_3$, computed with MATLAB, is:

$\operatorname{Corr}\left(A\right)=\left(\begin{array}{ccc}1& 0.2236& -0.9582\\ 0.2236& 1& -0.2402\\ -0.9582& -0.2402& 1\end{array}\right)$
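The entry $-0.9582$ between $x_1$ and $x_3$ points to strong collinearity. The following is a minimal sketch in Python with NumPy (the paper itself used MATLAB) that converts the reported matrix into variance inflation factors and a condition number. These diagnostics are an illustrative addition, not part of the original analysis, and the variable names are assumptions.

```python
import numpy as np

# Correlation matrix of x1, x2, x3 as reported in the text.
R = np.array([
    [ 1.0,     0.2236, -0.9582],
    [ 0.2236,  1.0,    -0.2402],
    [-0.9582, -0.2402,  1.0   ],
])

# Variance inflation factors are the diagonal of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(R))

# Ratio of the largest to the smallest eigenvalue; large values signal near-collinearity.
eigenvalues = np.linalg.eigvalsh(R)  # ascending order for a symmetric matrix
condition_number = eigenvalues[-1] / eigenvalues[0]

for name, v in zip(["x1", "x2", "x3"], vif):
    print(f"VIF({name}) = {v:.2f}")
print(f"condition number = {condition_number:.1f}")
```

A variance inflation factor above about 10 is a common rule of thumb for problematic collinearity. Here the factors for $x_1$ and $x_3$ exceed that threshold while the factor for $x_2$ stays close to 1, which is the kind of situation ridge regression was proposed for [8].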

3. Model Analysis

$y_{i}=\beta_{0}+\sum_{s\in S}x_{i,s}\beta_{s}+\epsilon_{i},\quad i=1,\cdots ,16$ (1)
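In model (1), each candidate model corresponds to an index set $S$. Assuming $S$ ranges over the nonempty subsets of the three predictors, there are $2^3-1=7$ candidates, which agrees with the seven models of Table 2. A short Python sketch of that enumeration follows; the names and the numbering order are illustrative and need not match the table.

```python
from itertools import combinations

predictors = ["x1", "x2", "x3"]

# Every nonempty subset S of the predictors defines one candidate model.
candidate_models = [
    subset
    for size in range(1, len(predictors) + 1)
    for subset in combinations(predictors, size)
]

for number, subset in enumerate(candidate_models, start=1):
    print(f"model {number}: y ~ 1 + " + " + ".join(subset))
```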

 (2)

(I) Model Estimation by OLS

(II) Model Estimation by RR

Table 2. Seven models for model estimation with OLS and RR

(III) Model Selection by Leave-One-Out Cross-Validation [9]
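The leave-one-out procedure can be sketched as follows in Python with NumPy. This is a sketch under stated assumptions rather than the paper's own computation: the data are synthetic stand-ins for the 16 acetylene observations, the prediction error is taken to be the mean squared leave-one-out error, the ridge parameter `k = 0.1` is hypothetical, and the intercept is left unpenalized by centering within each training fold.

```python
import numpy as np

def fit_ridge(X, y, k=0.0):
    """Return (intercept, coefficients); k = 0 reduces to OLS."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering keeps the intercept unpenalized
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + k * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def loocv_mse(X, y, k=0.0):
    """Mean squared leave-one-out prediction error."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # training fold: all observations except i
        b0, beta = fit_ridge(X[keep], y[keep], k)
        errors[i] = y[i] - (b0 + X[i] @ beta)
    return float(np.mean(errors ** 2))

# Synthetic data (assumption): 16 observations, x3 strongly correlated with x1.
rng = np.random.default_rng(0)
n = 16
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = -x1 + 0.3 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Compare a few candidate subsets under both estimators.
for cols in [(0,), (0, 1), (0, 1, 2)]:
    Xs = X[:, list(cols)]
    print(cols, "OLS:", round(loocv_mse(Xs, y), 4),
          "RR:", round(loocv_mse(Xs, y, k=0.1), 4))
```

The model and estimator with the smallest leave-one-out error would then be selected. For OLS the loop can be replaced by the closed form $e_i/(1-h_{ii})$ based on the hat matrix, but the explicit loop keeps the same code path for both estimators.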

Table 3. Prediction error of models with OLS and RR

4. Discussion of Reasonableness

5. Conclusion

[1] Larson, S.C. (1931) The Shrinkage of the Coefficient of Multiple Correlation. Journal of Educational Psychology, 22, 45-55.
https://doi.org/10.1037/h0072400

[2] Stone, M. (1974) Cross-Validatory Choice and Assessment of Statistical Prediction. Journal of the Royal Statistical Society: Series B (Methodological), 36, 111-147.
https://doi.org/10.1111/j.2517-6161.1974.tb00994.x

[3] Geisser, S. (1974) A Predictive Approach to the Random Effect Model. Biometrika, 61, 101-107.
https://doi.org/10.1093/biomet/61.1.101

[4] Geisser, S. (1975) The Predictive Sample Reuse Method with Applications. Journal of the American Statistical Association, 70, 320-328.
https://doi.org/10.1080/01621459.1975.10479865

[5] Devroye, L.P. and Wagner, T.J. (1979) Distribution-Free Performance Bounds for Potential Function Rules. IEEE Transactions on Information Theory, 25, 601-604.
https://doi.org/10.1109/TIT.1979.1056087

[6] Shao, J. (1993) Linear Model Selection by Cross-Validation. Journal of the American Statistical Association, 88, 486-494.
https://doi.org/10.1080/01621459.1993.10476299

[7] Dietterich, T. (1998) Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10, 1895-1924.
https://doi.org/10.1162/089976698300017197

[8] Hoerl, A.E. and Kennard, R.W. (1970) Ridge Regression: Applications to Nonorthogonal Problems. Technometrics, 12, 69-82.
https://doi.org/10.1080/00401706.1970.10488635

[9] Celisse, A. (2008) Model Selection in Density Estimation via Cross-Validation. Density Estimation, 14, 1-39.
