
# Attribute Reduction for Multi-Label Classification Based on Chain Decomposition

Abstract: This paper proposes a new multi-label attribute reduction algorithm based on chain decomposition. To account for the correlation between labels, the labels are first sorted; following this order, the multi-label problem is decomposed into a chain of single-label sub-problems. For each sub-problem, the lower approximation, positive region and dependency are redefined with the rough set method, and the attributes are reduced accordingly. Experimental results show that the algorithm removes most of the redundant attributes without reducing classification accuracy.

1. Introduction

2. Multi-Label Chain Decomposition

Let $X=\{x_1,x_2,x_3,\cdots,x_n\}$ be the sample set, $Y=\{y'_1,y'_2,y'_3,\cdots,y'_m\}$ the label set, and $A=\{a_1,a_2,a_3,\cdots,a_d\}$ the attribute set, so that a multi-label dataset is written as $(X,A,Y)$. For each attribute $a\in A$, the value of sample $x_i$ on $a$ is denoted $a(x_i)$. For each label $y'_j\in Y$, the label value of sample $x_i$ is $y'_j(x_i)$: $y'_j(x_i)=1$ if $x_i$ carries the label $y'_j$, and $y'_j(x_i)=0$ otherwise. Two label-sorting methods are given below.

$[x_i]_A^{\epsilon}=\{x\in X: |a(x_i)-a(x)|\le\epsilon,\ a\in A\}$

$T_j(x_i)=\sum_{x\in [x_i]_A^{\epsilon}} y'_j(x),\quad j=1,2,\cdots,m,\ i=1,2,\cdots,n.$

$\mathrm{rank}(y'_j)=\frac{\sum_{i=1}^{n} T_j(x_i)}{n}$
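The neighborhood and label-ranking definitions above can be sketched in NumPy (a minimal sketch, assuming real-valued attributes and a binary label matrix; the function names `neighborhood` and `label_rank` are ours, not from the paper):

```python
import numpy as np

def neighborhood(X_attr, i, eps):
    """Indices of the eps-neighborhood [x_i]_A^eps: samples whose value
    on every attribute differs from x_i by at most eps."""
    diff = np.abs(X_attr - X_attr[i])            # |a(x_i) - a(x)| per attribute
    return np.where(np.all(diff <= eps, axis=1))[0]

def label_rank(X_attr, Y, eps):
    """rank(y'_j) = (1/n) * sum_i T_j(x_i), where T_j(x_i) counts the
    samples in the eps-neighborhood of x_i that carry label j."""
    n, m = Y.shape
    T = np.zeros((n, m))
    for i in range(n):
        T[i] = Y[neighborhood(X_attr, i, eps)].sum(axis=0)   # T_j(x_i)
    return T.mean(axis=0)                        # average over all samples
```

Labels can then be sorted by this rank to fix the order of the chain.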

$X_j=\{x_i \mid y'_j(x_i)=1,\ 1\le i\le n\},\quad j=1,2,\cdots,m.$

$A_1=A\cup\{y_1\}$

$A_j=A_{j-1}\cup\{y_j\},\quad j=2,3,\cdots,m.$
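The chain construction above can be sketched as follows (a minimal illustration; the function name and the use of Python sets are ours):

```python
def build_chain_attribute_sets(attributes, labels_in_rank_order):
    """A_1 = A ∪ {y_1}, A_j = A_{j-1} ∪ {y_j}: each sub-problem in the
    chain sees the original attributes plus all labels ranked up to it."""
    sets = []
    current = set(attributes)
    for y in labels_in_rank_order:
        current = current | {y}        # A_j = A_{j-1} ∪ {y_j}
        sets.append(set(current))
    return sets
```

For example, with attributes `{'a1', 'a2'}` and labels ordered `['y1', 'y2']`, this yields `[{'a1', 'a2', 'y1'}, {'a1', 'a2', 'y1', 'y2'}]`.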

3. Attribute Reduction

$E_j=\{x\in X: y_j(x)=1\},\quad j=1,2,\cdots,m.$

$E=\{E_j \mid j=1,2,\cdots,m\}.$

$B_1=B\cup\{y_1\}$

$B_j=B_{j-1}\cup\{y_j\},\quad j=2,3,\cdots,m.$

$[x_i]_{B_j}^{\epsilon}=\{x\in X: |a(x_i)-a(x)|\le\epsilon,\ |y_j(x_i)-y_j(x)|\le\epsilon,\ a,y_j\in B_j\}.$

$\underline{R}_{B_j}^{\epsilon}(E_j)=\{x_i\in X: [x_i]_{B_j}^{\epsilon}\subseteq E_j\}.$

$POS_B^{\epsilon}(E)=\bigcup_{E_j\in E}\underline{R}_{B_j}^{\epsilon}(E_j).$
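A minimal NumPy sketch of the lower approximation and positive region. We read $B_j$ as the conditional attributes plus the labels earlier in the chain used as extra features; that reading, and the function names, are our assumptions:

```python
import numpy as np

def lower_approximation(F, E_mask, eps):
    """Samples whose whole eps-neighborhood over the feature matrix F
    lies inside E = {x : E_mask[x]} (the lower approximation of E)."""
    lower = []
    for i in range(F.shape[0]):
        nbrs = np.all(np.abs(F - F[i]) <= eps, axis=1)   # [x_i]^eps as a mask
        if np.all(E_mask[nbrs]):                         # [x_i]^eps ⊆ E_j
            lower.append(i)
    return lower

def positive_region(X_attr, Y, eps):
    """POS^eps(E): union over labels j of the lower approximation of
    E_j = {x : y_j(x) = 1}; earlier labels in the chain join the features."""
    pos = set()
    for j in range(Y.shape[1]):
        F = np.hstack([X_attr, Y[:, :j]])   # assumption: labels before y_j as features
        pos |= set(lower_approximation(F, Y[:, j] == 1, eps))
    return pos
```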

$POS_{B'}^{\epsilon}(E)\subseteq POS_B^{\epsilon}(E)$

1) $POS_B^{\epsilon}(E)=POS_A^{\epsilon}(E)$

2) $POS_{B'}^{\epsilon}(E)\ne POS_A^{\epsilon}(E)$ where $B'\subset B$

$\gamma_B^{\epsilon}(E)=\frac{1}{m}\sum_{j=1}^{m}\frac{|\underline{R}_{B_j}^{\epsilon}(E_j)|}{|X|}$

$\gamma_{B'}^{\epsilon}(E)\le\gamma_B^{\epsilon}(E)$

1) $\gamma_B^{\epsilon}(E)=\gamma_A^{\epsilon}(E)$

2) For any $B'\subset B$, $\gamma_{B'}^{\epsilon}(E)\ne\gamma_A^{\epsilon}(E)$

$Sig_{in}(a_i,B)=\gamma_B^{\epsilon}(E)-\gamma_{B-\{a_i\}}^{\epsilon}(E).$
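The dependency and inner significance can be sketched as follows (our own minimal implementation, where $B$ is a list of attribute column indices; for brevity we use the plain attribute neighborhood without the chain augmentation):

```python
import numpy as np

def _lower_size(F, mask, eps):
    """|lower approximation| of E = {x : mask[x]} over feature matrix F."""
    count = 0
    for i in range(F.shape[0]):
        nbrs = np.all(np.abs(F - F[i]) <= eps, axis=1)
        count += np.all(mask[nbrs])            # 1 if [x_i]^eps ⊆ E_j
    return count

def dependency(F, Y, eps):
    """gamma^eps(E) = (1/m) * sum_j |R(E_j)| / |X|."""
    n, m = F.shape[0], Y.shape[1]
    return sum(_lower_size(F, Y[:, j] == 1, eps) for j in range(m)) / (m * n)

def significance(X_attr, Y, eps, B, a):
    """Sig_in(a, B) = gamma_B(E) - gamma_{B - {a}}(E)."""
    rest = [c for c in B if c != a]
    return dependency(X_attr[:, B], Y, eps) - dependency(X_attr[:, rest], Y, eps)
```

An attribute with significance close to zero contributes nothing to the positive region and is a candidate for removal.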

4. Experimental Results

Table 1. Multi-label data sets

Hamming Loss is the proportion of misclassified labels among all labels of the test samples; smaller values indicate stronger classification ability. Clearly, this metric is closely tied to the number of original labels in a dataset: as the number of labels grows, its value may grow as well, so it is not comparable across different datasets. The second column of Table 2 gives the Hamming Loss of the original data, and columns 3 to 6 give the values on the data reduced by the five algorithms. The table shows that the proposed algorithm clearly outperforms the other algorithms on the CAL500, Yeast and Genbase datasets, while on the Medical dataset it differs from the best algorithm by only 0.004.
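Hamming Loss as described can be computed as follows (a standard implementation sketch; the function name is ours):

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of label positions where prediction and ground truth
    disagree, over all samples and labels; smaller is better."""
    n, m = Y_true.shape
    return np.sum(Y_true != Y_pred) / (n * m)
```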

Table 2. Hamming loss

Table 3. Coverage

Coverage measures the search depth needed, in the ranked sequence of a sample's class labels, to cover all of its relevant labels; smaller values mean better performance. Table 3 shows that good Coverage results are obtained on the five given datasets: on Yeast in particular, the CDDR algorithm shows a clear advantage, while on the other datasets CDDR's performance is very close to that of the best algorithm, with little difference.
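Coverage as described can be sketched as follows, using the common 0-based definition (the largest rank of any relevant label, averaged over samples). We assume each sample has at least one relevant label and that `scores` holds the classifier's per-label confidence scores; both names and assumptions are ours:

```python
import numpy as np

def coverage(Y_true, scores):
    """Average, over samples, of the largest rank (0-based, best label
    first) of any relevant label, i.e. how deep the ranked label list
    must be searched to cover all relevant labels."""
    n = Y_true.shape[0]
    total = 0
    for i in range(n):
        order = np.argsort(-scores[i])        # labels from best to worst score
        ranks = np.empty_like(order)
        ranks[order] = np.arange(len(order))  # rank of each label
        total += ranks[Y_true[i] == 1].max()  # depth covering all relevant labels
    return total / n
```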

5. Conclusion

[1] Read, J., Pfahringer, B., Holmes, G. and Frank, E. (2009) Classifier Chains for Multi-Label Classification. In: Buntine, W., Grobelnik, M. and Shawe-Taylor, J., Eds., Lecture Notes in Artificial Intelligence 5782, Springer, Berlin, 254-269.
https://doi.org/10.1007/978-3-642-04174-7_17

[2] Read, J., Pfahringer, B., Holmes, G. and Frank, E. (2011) Classifier Chains for Multi-Label Classification. Machine Learning, 85, 333-359.
https://doi.org/10.1007/s10994-011-5256-5

[3] Pawlak, Z. (1982) Rough Sets. International Journal of Computer and Information Sciences, 11, 341-356.
https://doi.org/10.1007/BF01001956

[4] Hu, Q.H., Yu, D.R., Liu, J.F. and Wu, C. (2008) Neighborhood Rough Set Based Heterogeneous Feature Subset Selection. Information Sciences, 178, 3577-3594.
https://doi.org/10.1016/j.ins.2008.05.024

[5] Li, H., Li, D., Zhai, Y., Wang, S. and Zhang, J. (2016) A Novel Attribute Reduction Approach for Multi-Label Data Based on Rough Set Theory. Information Sciences, 367-368, 827-847.
https://doi.org/10.1016/j.ins.2016.07.008

[6] Lin, Y., Li, Y., Wang, C. and Chen, J. (2018) Attribute Reduction for Multi-Label Learning with Fuzzy Rough Set. Knowledge-Based Systems, 152, 51-61.
https://doi.org/10.1016/j.knosys.2018.04.004

[7] Liu, J., Lin, Y., Li, Y., Weng, W. and Wu, S. (2018) Online Multi-Label Streaming Feature Selection Based on Neighborhood Rough Set. Pattern Recognition, 84, 273-287.
https://doi.org/10.1016/j.patcog.2018.07.021

[8] Lin, Y., Hua, Q., Liu, J., Chen, J. and Duan, J. (2016) Multi-Label Feature Selection Based on Neighborhood Mutual Information. Applied Soft Computing, 38, 244-256.
https://doi.org/10.1016/j.asoc.2015.10.009

[9] Fan, X., Chen, Q., Qiao, Z., Wang, C. and Ten, M. (2020) Attribute Reduction for Multi-Label Classification Based on Labels of Positive Region. Soft Computing, 24, 14039-14049.
https://doi.org/10.1007/s00500-020-04780-4
