
# Person Service Items Fit Based on the Novel KNN Algorithm in Household Service Industry

Abstract: In the Internet era, the domestic service industry commonly suffers from low customer satisfaction and low professionalism among housekeepers, which hinders the industry's development. The most fundamental cause is the mismatch between domestic housekeepers and service items. Against this background, this paper uses KNN, a distance-based algorithm in the big data environment, to match housekeeping staff with service items, and improves the traditional KNN algorithm by weighting samples according to their distance, yielding an improved person service items fit model. Experiments show that the improved model achieves an accuracy of 69.36%; compared with the person service items fit model based on the traditional KNN algorithm, its classification results are better and its error rate is lower. It matches housekeepers with service items well and supports the professional training of housekeeping staff, thereby enhancing customer satisfaction and promoting the long-term development of the domestic service industry.

1. Introduction

1.1. Research Background

1.2. Literature Review

1.3. KNN Classification Algorithm

KNN is a classic classification algorithm in the big data environment; its theory is relatively mature and easy to understand.

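Nguyen et al. used the KNN algorithm as the basis for spatial prediction of rainfall-induced shallow landslides in the tropical hilly area of Quy Hop (奎霍普), Vietnam. They took 12 influencing factors, including slope, slope length, aspect, and soil type, as feature attributes to predict whether a landslide would occur, compared the results with those of support vector machines and other algorithms, and found that the k-nearest-neighbor model outperformed them [7]. Wang Bo and Cheng Fuyun built a KNN-based stock prediction model on top of a time series model; it is simpler and more accurate than an ordinary time series model [8]. Srividya et al. proposed applying KNN to identify the mental health state of different groups, such as high school students, college students, and working professionals, in order to monitor individuals with abnormal behavior [9].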

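2. KNN Algorithm

2.1. Classification Procedure of the KNN Algorithm

KNN is a distance-based classification algorithm for big data: it classifies a test sample according to its distances to the training samples. The main steps are as follows: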

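(1) Collect the data set and preprocess it: clean and standardize the data so that all sample features are on the same scale.

(2) Split the data set into a training set and a test set.

(3) Set the parameter k and choose a distance formula.

(4) Compute the distance from the test sample to each training sample, and take the k training samples with the smallest distances as the test sample's k nearest neighbors. The class label that occurs most often among these k neighbors is assigned to the test sample. Repeating this for every test sample yields all predicted class labels.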
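The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: it assumes Euclidean distance and k = 3, and the toy feature vectors and the labels `"cleaning"` and `"childcare"` are hypothetical values that do not come from the paper's data set.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Distance from the query to every training sample, smallest first.
    dists = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Keep the k closest samples and count how often each label occurs.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy data, assumed to be already standardized (hypothetical values).
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
train_y = ["cleaning", "cleaning", "cleaning",
           "childcare", "childcare", "childcare"]
test_X = [[0.12, 0.18], [0.88, 0.9]]

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
print(predictions)
```

In this sketch every neighbor contributes one equal vote regardless of how far it is from the test sample, which is the property the distance-weighted variant in Section 2.2 changes.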

2.2. Improvement of the KNN Algorithm

$\mu_{ic}=\left\{\begin{array}{ll}0.51+0.49\left(\frac{n_i}{k}\right), & \text{if the training sample's own class label is } l_c\\ 0.49\left(\frac{n_i}{k}\right), & \text{otherwise}\end{array}\right.$ (1)

$\mu_c\left(y_j\right)=\frac{\sum_{i=1}^{k}\mu_{ic}\,\frac{1}{\text{dist}\left(y_j,x_i\right)^{\frac{2}{b-1}}}}{\sum_{i=1}^{k}\frac{1}{\text{dist}\left(y_j,x_i\right)^{\frac{2}{b-1}}}}$ (2)
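To make Equations (1) and (2) concrete, the following Python sketch computes the training memberships and the distance-weighted membership of a test sample. It rests on several assumptions that the text here does not state: $n_i$ is read as the number of a training sample's k nearest training neighbors that carry label $l_c$ (the reading used by Keller et al. [10]), a sample is not counted as its own neighbor, the distance is Euclidean, $b = 2$, and a small `eps` is added to avoid division by zero. The function names and toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(train_X, point, k, skip=None):
    """Indices of the k training samples closest to `point`, ignoring index `skip`."""
    order = sorted(
        (i for i in range(len(train_X)) if i != skip),
        key=lambda i: euclidean(train_X[i], point),
    )
    return order[:k]


def init_memberships(train_X, train_y, classes, k):
    """Eq. (1): membership mu[i][c] of training sample i in class c."""
    mu = []
    for i, own in enumerate(train_y):
        neigh = nearest(train_X, train_X[i], k, skip=i)
        row = {}
        for c in classes:
            # Assumed meaning of n_i: neighbors of sample i labelled c.
            n_c = sum(1 for j in neigh if train_y[j] == c)
            row[c] = 0.49 * n_c / k + (0.51 if own == c else 0.0)
        mu.append(row)
    return mu


def fuzzy_knn_predict(train_X, mu, classes, query, k, b=2.0, eps=1e-12):
    """Eq. (2): distance-weighted memberships of `query`; returns (label, memberships)."""
    neigh = nearest(train_X, query, k)
    # Weight 1 / dist^(2/(b-1)); eps (an assumption) guards against dist == 0.
    w = {
        i: 1.0 / (euclidean(train_X[i], query) ** (2.0 / (b - 1.0)) + eps)
        for i in neigh
    }
    total = sum(w.values())
    member = {c: sum(mu[i][c] * w[i] for i in neigh) / total for c in classes}
    return max(member, key=member.get), member


# Toy, already-standardized data (hypothetical values, not from the paper).
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
train_y = ["cleaning", "cleaning", "cleaning",
           "childcare", "childcare", "childcare"]
classes = ["cleaning", "childcare"]

mu = init_memberships(train_X, train_y, classes, k=3)
label, member = fuzzy_knn_predict(train_X, mu, classes, [0.12, 0.18], k=3)
print(label, member)
```

Because the memberships from Equation (1) sum to 1 over the classes for each training sample, the values returned by Equation (2) also sum to 1, so they can be read as graded class memberships rather than a hard vote.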

3. Research on Person Service Items Fit in Household Services Based on the Improved KNN Algorithm

3.1. Data Preprocessing

Table 1. Basic information of data set

$X_1=\left\{\begin{array}{ll}0, & \text{rural household registration}\\ 1, & \text{urban household registration}\end{array}\right.$, $X_5=\left\{\begin{array}{ll}0, & \text{female}\\ 1, & \text{male}\end{array}\right.$. Numerical data, in contrast, are normalized to the range [0, 1] using $x=\frac{x_j-x_{\min}}{x_{\max}-x_{\min}}$.
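As an illustration of this preprocessing, the sketch below encodes the two binary attributes and applies min-max normalization to one numerical column. The record layout, the field names `hukou`, `gender`, and `age`, and all values are hypothetical; the text here does not specify which numerical attributes the data set contains.

```python
def encode_binary(value, positive):
    """Return 1 if `value` is the category coded as 1, otherwise 0."""
    return 1 if value == positive else 0


def min_max(values):
    """Scale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical staff records; field names and values are illustrative only.
staff = [
    {"hukou": "rural", "gender": "female", "age": 45},
    {"hukou": "urban", "gender": "male", "age": 30},
    {"hukou": "rural", "gender": "female", "age": 60},
]

x1 = [encode_binary(s["hukou"], "urban") for s in staff]  # X1: 0 rural, 1 urban
x5 = [encode_binary(s["gender"], "male") for s in staff]  # X5: 0 female, 1 male
age = min_max([s["age"] for s in staff])                  # scaled to [0, 1]

print(x1, x5, age)
```

Scaling every numerical feature to the same range keeps a feature with large raw values from dominating the distance computation in KNN.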

Table 2. Standardization of household service personnel data

3.2. Calculation of Model Classification Accuracy

Table 3. Comparison of accuracy

4. Conclusion

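[1] Lu Ziping, Nie Hongfei. Empirical Analysis of the Customer Satisfaction Evaluation System for Third-Party Logistics Enterprises in China [J]. Logistics Technology, 2014, 33(1): 166-168+177. (In Chinese)

[2] Yang Haoxiong, Wang Wen. Research on the Customer Satisfaction Measurement System for Third-Party Logistics Enterprises [J]. Management Review, 2015, 27(1): 181-193. (In Chinese)

[3] Xie Fang. An Empirical Study of the Relationship between Express Delivery Service Quality and Customers' Repurchase Intention: Based on the Mediating Role of Customer Trust [J]. The Theory and Practice of Finance and Economics, 2016, 37(3): 123-127. (In Chinese)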

[4] Weller, I., Hymer, C.B., Nyberg, A.J. and Ebert, J. (2019) How Matching Creates Value: Cogs and Wheels for Human Capital Resources Research. Academy of Management Annals, 13, 188-214.
https://doi.org/10.5465/annals.2016.0117

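[5] Yang Xuchang, Chen Youling, Lan Guihua, et al. Product Development Task Allocation Method Based on Cluster Analysis and Two-Sided Matching [J]. Computer Integrated Manufacturing Systems, 2017, 23(4): 717-725. (In Chinese)

[6] Zhu Lina. Two-Sided Person-Post Matching Decision Based on Entropy Weight Theory [J]. Journal of Jilin Financial Research, 2018(4): 61-67. (In Chinese)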

[7] Bui, D.T., Nguyen, Q.P., Hoang, N.D., et al. (2016) A Novel Fuzzy K-Nearest Neighbor Inference Model with Differential Evolution for Spatial Prediction of Rainfall-Induced Shallow Landslides in a Tropical Hilly Area Using GIS. Landslides, 14, 1-17.
https://doi.org/10.1007/s10346-016-0708-4

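[8] Wang Bo, Cheng Fuyun. Application of the KNN Algorithm in Stock Prediction [J]. Pioneering with Science & Technology Monthly, 2015, 28(16): 25-26. (In Chinese)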

[9] Srividya, M., Mohanavalli, S. and Bhalaji, N. (2018) Behavioral Modeling for Mental Health Using Machine Learning Algorithms. Journal of Medical Systems, 42, 88.
https://doi.org/10.1007/s10916-018-0934-5

[10] Keller, J.M., Gray, M.R. and Givens, J.A. (1985) A Fuzzy K-Nearest Neighbor Algorithm. IEEE Transactions on Systems, Man, and Cybernetics, SMC-15, 580-585.
https://doi.org/10.1109/TSMC.1985.6313426
