Analysis of Beijing Second-Hand House Price Based on Random Forest

Authors: Li Xiaotong (李晓童), Guo Xuan (郭萱), Wang Chengjie (王成杰): College of Science, China University of Petroleum (Beijing), Beijing

Keywords: Second-Hand House, Housing Price Forecast, Bootstrap Sampling, Decision Trees, Random Forest Model

Abstract:
With economic growth and a shrinking supply of developable land, second-hand house prices have risen continuously. By the end of May 2016, the average second-hand house price in Beijing's six urban districts had exceeded ¥60,000/m². Evaluating and forecasting second-hand house prices matters for residents' daily lives and can give the government a reference for macroeconomic regulation. Existing mathematical models of house prices mostly use linear regression, neural networks (NN), or support vector machines (SVM). Linear regression assumes a linear relationship between price and the predictors, which can introduce large errors, while NN and SVM are hard to interpret. Using 16,795 second-hand houses on sale in Beijing, this paper builds a random forest model on multi-category variables to study the factors that influence price and to forecast price. The change in explained variance identifies lat (latitude of the residential community), long (longitude of the residential community), and cate (district in which the community is located) as the three most significant predictors, and the random forest variable-importance output likewise ranks cate, lat, and long as the most influential. The OOB (out-of-bag) samples give the random forest a price-forecast precision of 0.69, where the precision figure behaves as an error measure (lower is better). Feeding the same price data into the NN and SVM models gives forecast precisions of 5.15 and 1.10 respectively. The results show that the random forest forecasts best; SVM comes second, but its predictions are not stable enough; NN has a large prediction error and is not suitable for the second-hand house data in this paper.
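The pipeline named in the abstract (bootstrap sampling, decision trees, an out-of-bag error estimate, and variable importance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the third feature is a made-up noise column, the tree depth and forest size are arbitrary, and mean squared error is assumed as the OOB measure because the abstract does not say which metric it reports. Only the variable names lat and long come from the abstract. The importance function is a simplified permutation scheme, not necessarily the one the paper used.

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = avg(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, ys, depth, n_try, rng):
    """Grow a regression tree; each split looks at n_try randomly chosen features."""
    if depth == 0 or len(ys) < 4 or len(set(ys)) == 1:
        return avg(ys)  # leaf: predict the mean price
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        # Candidate thresholds: every distinct value except the largest,
        # so both children are guaranteed to be non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all sampled features were constant
        return avg(ys)
    _, f, t = best
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [ys[i] for i in lo], depth - 1, n_try, rng),
            build_tree([rows[i] for i in hi], [ys[i] for i in hi], depth - 1, n_try, rng))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(rows, ys, n_trees=25, depth=4, n_try=2, seed=0):
    """Fit each tree on a bootstrap sample and remember which rows it saw."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        tree = build_tree([rows[i] for i in idx], [ys[i] for i in idx], depth, n_try, rng)
        forest.append((tree, set(idx)))
    return forest

def oob_mse(forest, rows, ys):
    """Mean squared error, predicting each row only with trees that never saw it."""
    errs = []
    for i, (row, y) in enumerate(zip(rows, ys)):
        preds = [predict(tree, row) for tree, seen in forest if i not in seen]
        if preds:
            errs.append((avg(preds) - y) ** 2)
    return avg(errs)

def permutation_importance(forest, rows, ys, seed=1):
    """Increase in OOB error after shuffling one feature column at a time."""
    rng = random.Random(seed)
    base = oob_mse(forest, rows, ys)
    gains = []
    for f in range(len(rows[0])):
        col = [r[f] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, col)]
        gains.append(oob_mse(forest, shuffled, ys) - base)
    return gains

def make_data(n=120, seed=42):
    """Synthetic (lat, long, noise) rows; the price formula is invented for the demo."""
    rng = random.Random(seed)
    rows = [(rng.uniform(39.8, 40.1), rng.uniform(116.2, 116.6), rng.random())
            for _ in range(n)]
    ys = [6.0 + 10.0 * (lat - 39.8) + 5.0 * (lon - 116.2) + rng.gauss(0, 0.2)
          for lat, lon, _ in rows]
    return rows, ys

if __name__ == "__main__":
    rows, ys = make_data()
    forest = fit_forest(rows, ys)
    print("OOB MSE:", oob_mse(forest, rows, ys))
    print("importance (lat, long, noise):", permutation_importance(forest, rows, ys))
```

Because each tree is scored only on rows left out of its bootstrap sample, the OOB error acts as a built-in validation estimate and no separate test split is needed. In this toy setup the two location features should receive clearly higher importance than the noise column.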

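Cite this article: Li Xiaotong, Guo Xuan, Wang Chengjie (2017) Analysis of Beijing Second-Hand House Price Based on Random Forest. 数据挖掘 (Hans Journal of Data Mining), 7, 37-45. doi: 10.12677/HJDM.2017.72004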
