# Application of Nonparametric Regression in House Price Forecast

Abstract: Housing is closely tied to everyday life: it makes up a large share of household wealth and also affects residents' sense of well-being. Qualitative and quantitative research on housing prices therefore receives considerable attention both domestically and internationally. Using the Boston house price data of Harrison and Rubinfeld, this paper compares OLS regression with nonparametric regression for house price prediction in R. The results show that the fitted OLS model violates the statistical assumptions of OLS regression, so its use here lacks a sound theoretical basis. Given its characteristics, nonparametric regression (Lasso regression and Ridge regression) is better suited to predicting house prices, and the Bootstrap method and an iterative loop procedure are used to select the model. When multiple linear regression is used to analyze data, the assumptions on which it rests cannot be ignored. Real data often fail to meet those assumptions, so nonparametric regression is more widely applicable.

1. Introduction

2. Literature Review

3. Research Methods

$y_t = \beta_0 + \beta_1 x_{t1} + \cdots + \beta_a x_{ta} + \epsilon_t$

$y_t = \hat{\beta}_0 + \hat{\beta}_1 x_{t1} + \cdots + \hat{\beta}_a x_{ta} + \hat{\epsilon}_t$

$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_{t1} + \cdots + \hat{\beta}_a x_{ta}$
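The three equations above (model, fitted model with residuals, and prediction) can be illustrated with a small least-squares fit. The paper reports using R; the following is only a minimal Python sketch, not the authors' code. The helper names (`solve_linear_system`, `fit_ols`, `predict`) and the toy data are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [beta_0, beta_1, ..., beta_a] from the normal equations X'X beta = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yt for d, yt in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Fitted value: beta_0 + beta_1 * x_1 + ... + beta_a * x_a."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical toy data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]

beta_hat = fit_ols(rows, y)
residuals = [yt - predict(beta_hat, r) for r, yt in zip(rows, y)]
print(beta_hat)
```

Because the toy data contain no noise, the estimated coefficients recover the generating values and the residuals are numerically zero. With real data the residuals are what the diagnostic checks in Section 5.1.2 examine.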

3.1. OLS Regression

3.2. Nonparametric Regression

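Parameter estimation for Lasso regression: $\min \sum_{t=1}^{n} \left\{ y_t - \beta_0 - \beta_1 x_{t1} - \cdots - \beta_a x_{ta} \right\}^2 + \alpha \sum_{j=1}^{a} |\beta_j|$. Lasso regression shrinks the coefficients of non-significant variables to exactly 0. The larger the penalty $\alpha$, the more variables are dropped, so Lasso regression also serves as a form of dimensionality reduction.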
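To make the "coefficients compressed to 0" behaviour concrete, here is a minimal Python sketch of coordinate descent for the penalized objective, with the squared residual taken as $y_t - \beta_0 - \sum_j \beta_j x_{tj}$ and an unpenalized intercept. It is an illustration under those assumptions, not the authors' R code; the function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward 0 by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(rows, y, alpha, n_iter=100):
    """Coordinate descent for sum of squared errors + alpha * sum(|beta_j|)."""
    n, a = len(rows), len(rows[0])
    beta0, beta = 0.0, [0.0] * a
    for _ in range(n_iter):
        # The intercept is not penalized: it is the mean of the partial residuals.
        beta0 = sum(yt - sum(b * x for b, x in zip(beta, r))
                    for r, yt in zip(rows, y)) / n
        for j in range(a):
            rho = 0.0  # inner product of x_j with the residual that excludes x_j
            z = 0.0    # sum of squares of x_j
            for r, yt in zip(rows, y):
                partial = yt - beta0 - sum(beta[k] * r[k] for k in range(a) if k != j)
                rho += r[j] * partial
                z += r[j] ** 2
            # Setting the subgradient to zero gives a threshold of alpha / 2.
            beta[j] = soft_threshold(rho, alpha / 2.0) / z
    return beta0, beta


# Hypothetical toy data: x1 has a strong effect on y, x2 a very weak one.
x1 = [-2, -1, 0, 1, 2]
x2 = [2, -1, -2, -1, 2]
rows = list(zip(x1, x2))
y = [5 + 3 * a1 + 0.1 * a2 for a1, a2 in rows]

_, beta_no_penalty = fit_lasso(rows, y, alpha=0.0)
_, beta_penalized = fit_lasso(rows, y, alpha=4.0)
print(beta_no_penalty)
print(beta_penalized)
```

With no penalty both coefficients are recovered. With the penalty switched on, the weak predictor's coefficient is set exactly to zero while the strong one is only shrunk, which is the variable-selection effect described above.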

Parameter estimation for Ridge regression: $\min \sum_{t=1}^{n} \left\{ y_t - \beta_0 - \beta_1 x_{t1} - \cdots - \beta_a x_{ta} \right\}^2 + \alpha \sum_{j=1}^{a} \beta_j^2$. As the penalty $\alpha$ increases, Ridge regression shrinks the coefficients of the predictors toward 0, but they never become exactly 0 [9].
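The contrast with Lasso can be seen in a short Python sketch of the Ridge objective, again with an unpenalized intercept and solved here by coordinate descent for simplicity. This is an illustrative assumption rather than the procedure used in the paper; `fit_ridge` and the toy data are hypothetical.

```python
def fit_ridge(rows, y, alpha, n_iter=100):
    """Coordinate descent for sum of squared errors + alpha * sum(beta_j ** 2)."""
    n, a = len(rows), len(rows[0])
    beta0, beta = 0.0, [0.0] * a
    for _ in range(n_iter):
        # The intercept is not penalized: it is the mean of the partial residuals.
        beta0 = sum(yt - sum(b * x for b, x in zip(beta, r))
                    for r, yt in zip(rows, y)) / n
        for j in range(a):
            rho = 0.0  # inner product of x_j with the residual that excludes x_j
            z = 0.0    # sum of squares of x_j
            for r, yt in zip(rows, y):
                partial = yt - beta0 - sum(beta[k] * r[k] for k in range(a) if k != j)
                rho += r[j] * partial
                z += r[j] ** 2
            # The squared penalty only enlarges the denominator; nothing is thresholded.
            beta[j] = rho / (z + alpha)
    return beta0, beta


# Hypothetical toy data: x1 has a strong effect on y, x2 a very weak one.
x1 = [-2, -1, 0, 1, 2]
x2 = [2, -1, -2, -1, 2]
rows = list(zip(x1, x2))
y = [5 + 3 * a1 + 0.1 * a2 for a1, a2 in rows]

# Coefficient vectors for an increasing sequence of penalties.
path = {alpha: fit_ridge(rows, y, alpha)[1] for alpha in (0.0, 4.0, 400.0)}
for alpha, beta in path.items():
    print(alpha, beta)
```

Even at the largest penalty in this sketch, both coefficients stay non-zero. They only move closer to 0 as $\alpha$ grows, so Ridge regression does not perform variable selection.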

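3.3. Handling Randomness

4. Data Source and Preprocessing

4.1. Data Source

4.2. Data Preprocessing

5. Multiple Linear Regression Model

5.1. OLS Regression

5.1.1. Fitting the OLS Regression Model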

Table 1. The coefficients and significance of OLS regression (1) and (2) models

$\begin{aligned}\widehat{\mathrm{MEDV}} = {} & 50.04133 - 0.10626 \times \mathrm{CRIM} + 2.77260 \times \mathrm{CHAS} - 20.87625 \times \mathrm{NOX} \\ & + 4.34167 \times \mathrm{RM} - 1.26070 \times \mathrm{DIS} + 0.27666 \times \mathrm{RAD} - 3.73767 \times \mathrm{TAX} \\ & - 1.10388 \times \mathrm{PTRATIO} + 1.36656 \times \mathrm{B} - 0.48774 \times \mathrm{LSTAT}\end{aligned}$

$\text{Adjusted } R^2 = 0.7223$
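For readers who want to see how an adjusted R-squared is obtained from fitted values, the standard definition is $1 - (1 - R^2)\,(n - 1)/(n - p - 1)$ for $n$ observations and $p$ predictors. The Python sketch below applies it to made-up numbers. It does not reproduce the value reported above, and the function name is an assumption.

```python
def adjusted_r_squared(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)


# Hypothetical observed and fitted values from a model with one predictor.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
adj_r2 = adjusted_r_squared(observed, fitted, n_predictors=1)
print(round(adj_r2, 4))
```

The adjustment penalizes additional predictors, so a model with more variables only scores higher if the extra variables reduce the residual sum of squares enough to offset the lost degrees of freedom.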

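5.1.2. Testing the Statistical Assumptions of OLS Regression

(1) The Normal Q-Q panel of Figure 1 shows that most standardized residuals do not fall on the 45° reference line and that both tails deviate sharply from it, which violates the OLS assumption that the residuals are normally distributed.

(2) The Residuals vs Fitted panel of Figure 1 shows a clear curved relationship between the residuals and the fitted values. The residuals therefore still contain systematic structure that the linear terms have not captured, which violates the assumption that the dependent variable is linearly related to the independent variables.

(3) The Scale-Location panel of Figure 1 shows that the residual variance changes with the level of the fitted values and that the standardized residuals are not randomly scattered, so this multiple linear regression violates homoscedasticity.

(4) A Durbin-Watson test was used to check the residuals for serial correlation. The D-W statistic is 1.742645 with a p-value of 0.024. At the 5% significance level, the null hypothesis that the residuals are mutually independent is rejected: the residuals are correlated, which violates the independence assumption of multiple linear regression.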
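The Durbin-Watson statistic used in item (4) is the sum of squared successive residual differences divided by the sum of squared residuals. A minimal Python sketch on made-up residuals follows. It computes only the statistic, not the p-value, which requires a reference distribution and is normally taken from a statistics package. The function name and example residuals are assumptions.

```python
def durbin_watson(residuals):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences, chosen to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step: negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]       # no change between steps: positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.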

Figure 1. OLS regression diagnostic plots

5.2. Nonparametric Regression

5.2.1. Lasso Regression

Figure 2. Model error of different lambda values based on Lasso regression

Figure 3. Coefficients of predictive variables in Lasso regression

5.2.2. Ridge Regression

Figure 4. Model error of different lambda values based on Ridge regression

Figure 5. Coefficients of predictive variables in Ridge regression

5.3. Model Selection

Figure 6. The mean of the MSE difference (VS) between Ridge and Lasso regression obtained by the bootstrap method
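The paper's exact resampling procedure is not shown in this section, so the following Python sketch is only one plausible reading of a bootstrap comparison of two models' MSE: resample held-out observations with replacement and record the difference in mean squared error on each resample. The function name, the number of resamples, and the squared-error values are all hypothetical.

```python
import random


def bootstrap_mse_difference(sq_err_a, sq_err_b, n_boot=1000, seed=0):
    """Return n_boot bootstrap replicates of MSE(model A) - MSE(model B)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sq_err_a)
    diffs = []
    for _ in range(n_boot):
        # Draw n observation indices with replacement; use the same draw for both models.
        idx = [rng.randrange(n) for _ in range(n)]
        mse_a = sum(sq_err_a[i] for i in idx) / n
        mse_b = sum(sq_err_b[i] for i in idx) / n
        diffs.append(mse_a - mse_b)
    return diffs


# Made-up per-observation squared prediction errors on the same held-out rows.
ridge_sq_err = [4.1, 0.9, 2.5, 6.2, 1.4, 3.3]
lasso_sq_err = [3.8, 0.7, 2.4, 5.9, 1.1, 3.0]

diffs = bootstrap_mse_difference(ridge_sq_err, lasso_sq_err)
mean_diff = sum(diffs) / len(diffs)
print(mean_diff)
```

Using the same resampled indices for both models keeps the comparison paired, so the spread of the differences reflects how the models differ on identical observations rather than on different samples.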

5.4. Model Fit

$\begin{aligned}\widehat{\mathrm{MEDV}} = {} & 16.92617 - 0.00816 \times \mathrm{CRIM} + 1.66474 \times \mathrm{CHAS} + 4.26095 \times \mathrm{RM} \\ & - 0.15784 \times \mathrm{DIS} - 0.79626 \times \mathrm{TAX} - 0.71296 \times \mathrm{PTRATIO} \\ & + 0.66186 \times \mathrm{B} - 0.52029 \times \mathrm{LSTAT}\end{aligned}$
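The fitted Lasso equation can be evaluated directly once predictor values are available. The Python sketch below simply encodes the coefficients reported above. The function and constant names are assumptions, and the inputs must be on whatever scale the paper's preprocessing produced, which is not shown in this section.

```python
# Intercept and coefficients copied from the fitted Lasso equation above.
LASSO_INTERCEPT = 16.92617
LASSO_COEFFICIENTS = {
    "CRIM": -0.00816,
    "CHAS": 1.66474,
    "RM": 4.26095,
    "DIS": -0.15784,
    "TAX": -0.79626,
    "PTRATIO": -0.71296,
    "B": 0.66186,
    "LSTAT": -0.52029,
}


def predict_medv(features):
    """Evaluate the fitted equation for one observation given as a name -> value dict."""
    missing = set(LASSO_COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing predictors: {sorted(missing)}")
    return LASSO_INTERCEPT + sum(
        coef * features[name] for name, coef in LASSO_COEFFICIENTS.items()
    )


# With every predictor at 0 the prediction equals the intercept.
zeros = {name: 0.0 for name in LASSO_COEFFICIENTS}
baseline = predict_medv(zeros)

# Raising RM by one unit, all else fixed, changes the prediction by the RM coefficient.
one_more_room = predict_medv({**zeros, "RM": 1.0})
print(baseline, one_more_room - baseline)
```

Compared with the OLS equation in Section 5.1.1, NOX and RAD no longer appear: Lasso has set their coefficients to zero.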

Figure 7. The fitting effect of Lasso regression

6. Conclusion

References

[1] 易成栋, 任建宇, 高璇. 房价、住房不平等与居民幸福感——基于中国综合社会调查2005、2015年数据的实证研究[J]. 中央财经大学学报, 2020(6): 105-117.

[2] 周佳琪, 金百锁. 基于空间网络自回归变点模型的合肥市房地产价格影响因素分析[J]. 中国科学院大学学报, 2020, 37(3): 398-404.

[3] 薛建谱, 王卫华. 基于均衡模型的我国商品房价格影响因素分析[J]. 统计与决策, 2013(22): 118-121.

[4] 范允奇, 王艺明. 中国房价影响因素的区域差异与时序变化研究[J]. 贵州财经大学学报, 2014(1): 62-67.

[5] Malpezzi, S. (1999) A Simple Error Correction Model of House Prices. Journal of Housing Economics, 8, 27-62.
https://doi.org/10.1006/jhec.1999.0240

[6] 邬嘉怡, 王思玉, 史宏炜, 李虎森, 楼凯达, 崔丽鸿. 基于多小波的北京市房屋市场价格的分析预测[J]. 北京化工大学学报(自然科学版), 2019, 46(5): 101-106.

[7] 唐晓彬, 张瑞, 刘立新. 基于蝙蝠算法SVR模型的北京市二手房价预测研究[J]. 统计研究, 2018, 35(11): 71-81.

[8] 黄文, 王正林. 数据挖掘-R语言实战[M]. 北京: 电子工业出版社, 2014: 160-169.

[9] 张守一, 葛新权, 王斌. 非参数回归及其应用[J]. 数量经济技术经济研究, 1997(10): 60-65+87.
