# Research on Clustering Method Based on Relationship Structure of Panel Data

Abstract: This paper studies clustering methods for panel data and proposes a clustering method based on the structural relationship between the influencing variables and the response variable of the panel. Linear relationships, nonlinear relationships, and multi-indicator clustering based on trajectory features and shape features are discussed in turn. Units with the same structural relationship are assigned to the same class, and units with different structural relationships are assigned to different classes, so that units within a class share the same or similar structural relationships and trajectory characteristics, while these characteristics differ markedly between classes.

1. Problem Statement

$y_{it}=\sum_{j=1}^{p}\beta_{i,j}y_{i,t-j}+\alpha_{i}+\epsilon_{it},\quad |\beta_{i}|<1$ (1)

2. Panel Data Clustering Based on Parametric Relationships

2.1. Similarity Measure for Panel Data Based on Linear Relationships

${y}_{it}^{\left(1\right)}=f\left({y}_{it}^{\left(2\right)},{y}_{it}^{\left(3\right)},\cdots ,{y}_{it}^{\left(m\right)}\right)+{\epsilon }_{it}$, (2)

${r}_{il}^{\left(1\right)}=\frac{\mathrm{cov}\left({y}_{it}^{\left(1\right)},{y}_{it}^{\left(l\right)}\right)}{\sqrt{\mathrm{var}\left({y}_{it}^{\left(1\right)}\right)}\sqrt{\mathrm{var}\left({y}_{it}^{\left(l\right)}\right)}},l=2,\cdots ,m$

$\mathrm{cov}\left(y_{it}^{(1)},y_{it}^{(l)}\right)=\frac{1}{T}\sum_{t=1}^{T}\left(y_{it}^{(1)}-\bar{y}_{i}^{(1)}\right)\left(y_{it}^{(l)}-\bar{y}_{i}^{(l)}\right)$,

$\bar{y}_{i}^{(1)}=\frac{1}{T}\sum_{t=1}^{T}y_{it}^{(1)},\quad \bar{y}_{i}^{(l)}=\frac{1}{T}\sum_{t=1}^{T}y_{it}^{(l)}$.

$G_{i}=\left(\left(y_{i}^{1}\right)^{\prime},\left(y_{i}^{2}\right)^{\prime},\cdots,\left(y_{i}^{m}\right)^{\prime}\right),\quad i=1,2,\cdots,N$,

$y_{i}^{k}=\left(y_{i1}^{k},\cdots,y_{iT}^{k}\right),\quad t=1,2,\cdots,T$.

${d}_{ij}=\sqrt{\left({r}_{i}^{\left(1\right)}-{r}_{j}^{\left(1\right)}\right){\left({r}_{i}^{\left(1\right)}-{r}_{j}^{\left(1\right)}\right)}^{\prime }}$, (3)
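As an illustration of distance (3), the following minimal Python sketch computes, for each unit, the vector of correlations between the response $y^{(1)}$ and the remaining indicators, and then the Euclidean distance between two such vectors. The function names and the nested-list layout (series 0 holds the response) are assumptions made for this example and are not taken from the original method description. A constant series has zero variance and is not handled.

```python
import math


def correlation_vector(panel_i):
    """Return [r_i2, ..., r_im] for one unit.

    panel_i is a list of m series of equal length T; panel_i[0] is the
    response y^(1) (layout assumed for this sketch).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    def cov(xs, ys):
        mx, my = mean(xs), mean(ys)
        # 1/T normalisation, as in the covariance formula of Section 2.1
        return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

    response = panel_i[0]
    var_response = cov(response, response)
    return [
        cov(response, series) / math.sqrt(var_response * cov(series, series))
        for series in panel_i[1:]
    ]


def correlation_distance(panel_i, panel_j):
    """Euclidean distance between the correlation vectors of two units."""
    r_i = correlation_vector(panel_i)
    r_j = correlation_vector(panel_j)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_i, r_j)))


# Hypothetical data: in unit_a the third indicator moves against the response,
# in unit_b every indicator moves with it.
unit_a = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
unit_b = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(correlation_distance(unit_a, unit_b))
```

Two units whose indicators relate to the response in the same direction and strength give a distance near zero, regardless of the level or scale of the raw series.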

2.2. Similarity Measure for Nonlinear Structural Relationships

${y}_{it}^{\left(1\right)}=f\left({y}_{it}^{\left(2\right)},{y}_{it}^{\left(3\right)},\cdots ,{y}_{it}^{\left(m\right)}\right)+{\epsilon }_{it}$, (4)

$\frac{\text{d}{y}_{it}^{\left(1\right)}}{\text{d}t}=\frac{\partial {y}_{it}^{\left(1\right)}}{\partial {y}_{it-l}^{\left(2\right)}}\frac{\text{d}{y}_{it}^{\left(2\right)}}{\text{d}t}+\frac{\partial {y}_{it}^{\left(1\right)}}{\partial {y}_{it-l}^{\left(3\right)}}\frac{\text{d}{y}_{it}^{\left(3\right)}}{\text{d}t}+\cdots +\frac{\partial {y}_{it}^{\left(1\right)}}{\partial {y}_{it-l}^{\left(m\right)}}\frac{\text{d}{y}_{it}^{\left(m\right)}}{\text{d}t}$,

$y_{it}^{(1)}=f\left(y_{it}^{(2)},y_{it}^{(3)},\cdots,y_{it}^{(m)}\right)+\epsilon_{it},\quad i=1,2,\cdots,N$,

$y_{jt}^{(1)}=f\left(y_{jt}^{(2)},y_{jt}^{(3)},\cdots,y_{jt}^{(m)}\right)+\epsilon_{jt},\quad j=1,2,\cdots,N$,

$\frac{\partial y_{it}^{(1)}}{\partial y_{it-l}^{(k)}}\approx\frac{\partial y_{jt}^{(1)}}{\partial y_{jt-l}^{(k)}},\quad k=2,3,\cdots,m$. (5)

$\frac{\Delta {y}_{it}^{\left(1\right)}}{\Delta {y}_{it-l}^{\left(2\right)}}\approx \frac{\Delta {y}_{jt}^{\left(1\right)}}{\Delta {y}_{jt-l}^{\left(2\right)}}$. (6)

$\frac{\Delta {y}_{it}^{\left(1\right)}}{\Delta {y}_{it-l}^{\left(2\right)}}\approx \frac{\Delta {y}_{jt}^{\left(1\right)}}{\Delta {y}_{jt-l}^{\left(2\right)}},\frac{\Delta {y}_{it}^{\left(1\right)}}{\Delta {y}_{it-l}^{\left(3\right)}}\approx \frac{\Delta {y}_{jt}^{\left(1\right)}}{\Delta {y}_{jt-l}^{\left(3\right)}},\cdots ,\frac{\Delta {y}_{it}^{\left(1\right)}}{\Delta {y}_{it-l}^{\left(m\right)}}\approx \frac{\Delta {y}_{jt}^{\left(1\right)}}{\Delta {y}_{jt-l}^{\left(m\right)}}$. (7)

$d_{ij}=\frac{1}{T}\sum_{t=1}^{T}\sqrt{\left(\frac{\Delta y_{it}^{(1)}}{\Delta y_{it-l}^{(2)}}-\frac{\Delta y_{jt}^{(1)}}{\Delta y_{jt-l}^{(2)}}\right)^{2}+\left(\frac{\Delta y_{it}^{(1)}}{\Delta y_{it-l}^{(3)}}-\frac{\Delta y_{jt}^{(1)}}{\Delta y_{jt-l}^{(3)}}\right)^{2}+\cdots+\left(\frac{\Delta y_{it}^{(1)}}{\Delta y_{it-l}^{(m)}}-\frac{\Delta y_{jt}^{(1)}}{\Delta y_{jt-l}^{(m)}}\right)^{2}}$, (8)
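Distance (8) can be sketched in Python as follows. The sketch assumes that $\Delta$ denotes a first difference in time, averages over the time points for which every lagged difference exists (rather than dividing by $T$), and assumes that no increment of an explanatory indicator is exactly zero. The function names and the data layout are hypothetical.

```python
import math


def slope_ratios(panel, lag=0):
    """Difference quotients of the response with respect to each indicator.

    panel is a list of m series of equal length; panel[0] is the response
    (layout assumed for this sketch). Returns one row per usable time point,
    each row holding the ratios for indicators 2..m.
    """
    response = panel[0]
    rows = []
    for t in range(lag + 1, len(response)):
        dy = response[t] - response[t - 1]
        # A zero increment in an indicator raises ZeroDivisionError here.
        rows.append([dy / (x[t - lag] - x[t - lag - 1]) for x in panel[1:]])
    return rows


def slope_distance(panel_i, panel_j, lag=0):
    """Average over time of the Euclidean gap between the ratio vectors."""
    rows_i = slope_ratios(panel_i, lag)
    rows_j = slope_ratios(panel_j, lag)
    total = sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
        for u, v in zip(rows_i, rows_j)
    )
    return total / len(rows_i)


# Hypothetical units: the response of unit_a reacts twice as strongly to the
# first explanatory indicator as the response of unit_b does.
unit_a = [[0, 2, 4, 6], [0, 1, 2, 3], [0, 2, 4, 6]]
unit_b = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
print(slope_distance(unit_a, unit_b))
```

Because only ratios of increments enter the distance, two units following the same response curve at different levels are treated as close.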

2.3. Clustering Data by Structural Relationship

1) Determining the initial cluster points

$D_{(0)}=\left[\begin{array}{ccccc}0& d_{12}& d_{13}& \cdots & d_{1N}\\ 0& 0& d_{23}& \cdots & d_{2N}\\ \vdots& \vdots& \vdots& \ddots & \vdots\\ 0& 0& 0& \cdots & d_{(N-1)N}\\ 0& 0& 0& \cdots & 0\end{array}\right]$,

2) Aggregation rule

$d_{rl}^{2}=\min\left\{d_{i,j}^{2},\ i\in G_{i},\ i=1,2,\cdots,K,\ j\notin G_{i},\ j=1,2,\cdots,N-K\right\}$. (9)
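One possible reading of rule (9) is nearest-neighbour (single-linkage) aggregation: at each step the two classes with the smallest between-class distance are merged. The Python sketch below assumes this reading. It takes the upper-triangular matrix $D_{(0)}$ as a nested list and stops when a requested number of classes remains; the function name and the stopping rule are assumptions for illustration only.

```python
def single_linkage(dist, n_clusters):
    """Merge the two nearest classes until n_clusters classes remain.

    dist is an N x N nested list whose upper triangle holds d_ij (i < j),
    mirroring the layout of D_(0); entries on and below the diagonal are
    ignored.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Smallest distance between a member of a and a member of b.
        return min(dist[min(i, j)][max(i, j)] for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Hypothetical distances for four units: 0 and 1 are close, 2 and 3 are close.
D0 = [
    [0, 1, 5, 6],
    [0, 0, 4, 7],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(single_linkage(D0, 2))
```

The distance matrix can come from either (3) or (8); the aggregation step uses only the pairwise distances.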

3. Relationship Clustering of Panel Data Based on Trajectory Features

3.1. Smoothing of Discrete Data

$\|y_{it}-f_{i}(t)\|=0$,

$0<\|y_{it}-f_{i}(t)\|<\epsilon_{it}$,

3.2. Choosing the Basis Functions

$y_{it}=\sum_{k=1}^{K}\alpha_{k}\phi_{k}(t),\quad (k=1,2,\cdots,K)$

3.3. Estimating the Basis Coefficient Vector

$SSE\left(y/\alpha \right)={\left(y-\Phi \alpha \right)}^{\prime }\left(y-\Phi \alpha \right)$, (10)

$\alpha ={\left({\Phi }^{\prime }\Phi \right)}^{-1}{\Phi }^{\prime }y$.
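The least-squares estimate above can be illustrated with a short Python sketch that builds $\Phi$ from a list of basis functions and solves the normal equations $\Phi'\Phi\alpha=\Phi'y$ by Gaussian elimination. The function name is hypothetical, and the monomial basis in the usage lines is an assumption chosen only to keep the example small; the source does not fix a particular basis at this point.

```python
def fit_basis_coefficients(times, values, basis):
    """Least-squares coefficients alpha minimising (y - Phi a)'(y - Phi a).

    basis is a list of K callables phi_k(t); Phi[t][k] = phi_k(times[t]).
    """
    phi = [[b(t) for b in basis] for t in times]
    k = len(basis)
    # Normal equations: (Phi' Phi) alpha = Phi' y, stored as an augmented matrix.
    m = [
        [sum(row[a] * row[b] for row in phi) for b in range(k)]
        + [sum(row[a] * y for row, y in zip(phi, values))]
        for a in range(k)
    ]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    alpha = [0.0] * k
    for a in reversed(range(k)):
        tail = sum(m[a][b] * alpha[b] for b in range(a + 1, k))
        alpha[a] = (m[a][k] - tail) / m[a][a]
    return alpha


# Assumed monomial basis 1, t, t^2 and noise-free data from 1 + 2t + 3t^2.
basis = [lambda t: 1.0, lambda t: t, lambda t: t * t]
times = [0, 1, 2, 3, 4]
values = [1 + 2 * t + 3 * t * t for t in times]
print(fit_basis_coefficients(times, values, basis))
```

For larger or ill-conditioned bases, a QR-based solver is numerically safer than forming $\Phi'\Phi$ explicitly; the direct form is used here only because it mirrors the closed-form estimate.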

3.4. Similarity Measure Based on Symbolic Representation

${y}_{it}={y}_{i}\left(t\right)+{u}_{it}=\underset{k=1}{\overset{p}{\sum }}{\varphi }_{ik}{y}_{i}\left(t-k\right)+\underset{l=1}{\overset{q}{\sum }}{\theta }_{il}{\epsilon }_{i}\left(t-l\right)$. (11)

$M\left(f\left(t\right)\right)=\left({m}_{1},{m}_{2},\cdots ,{m}_{V}\right)$,

$m_{v}=\left\{\begin{array}{ll}1, & \text{if the } v\text{-th vertex is a local maximum of } f(t) \text{ within its neighborhood}\\ 0, & \text{if the } v\text{-th vertex is a local minimum of } f(t) \text{ within its neighborhood}\end{array}\right.,\quad v=1,2,\cdots,V$

The number of vertices of $M\left(f(t)\right)$ within a given neighborhood is denoted $\left|M\left(f(t)\right)\right|$.

$|M\left({f}_{i}\left(t\right)\cap {f}_{j}\left(t\right)\right)|$,

$D\left(f_{i}(t),f_{j}(t)\right)=\left|M\left(f_{i}(t)\right)\right|+\left|M\left(f_{j}(t)\right)\right|-\left|M\left(f_{i}(t)\cap f_{j}(t)\right)\right|$. (12)

Salvatore Ingrassia (2003) proved that this distance satisfies symmetry, positive definiteness, and the triangle inequality, that is:

$D\left({f}_{i}\left(t\right),{f}_{j}\left(t\right)\right)\ge 0$,

$D\left({f}_{i}\left(t\right),{f}_{j}\left(t\right)\right)=D\left({f}_{j}\left(t\right),{f}_{i}\left(t\right)\right)$,

$D\left({f}_{i}\left(t\right),{f}_{j}\left(t\right)\right)\le D\left({f}_{i}\left(t\right),{f}_{k}\left(t\right)\right)+D\left({f}_{k}\left(t\right),{f}_{j}\left(t\right)\right)$.
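A minimal Python sketch of distance (12), taken at face value, for curves sampled on a common grid. It assumes that a vertex is an interior grid point lying strictly above or strictly below both of its neighbours, and that two curves share a vertex only when its position and type coincide exactly (a zero-width neighborhood). Both conventions and the function names are assumptions of this example.

```python
def vertex_set(values):
    """Return the set of (position, m_v) pairs for a sampled curve.

    m_v is 1 for a local maximum and 0 for a local minimum, following the
    symbolic coding M(f(t)).
    """
    vertices = set()
    for t in range(1, len(values) - 1):
        left, mid, right = values[t - 1], values[t], values[t + 1]
        if mid > left and mid > right:
            vertices.add((t, 1))
        elif mid < left and mid < right:
            vertices.add((t, 0))
    return vertices


def extrema_distance(values_i, values_j):
    """|M(f_i)| + |M(f_j)| - |M(f_i intersect f_j)|, as printed in (12)."""
    m_i = vertex_set(values_i)
    m_j = vertex_set(values_j)
    return len(m_i) + len(m_j) - len(m_i & m_j)


# Hypothetical curves: both peak at t = 1, then their shapes diverge.
f_i = [0, 2, 1, 3, 0]
f_j = [0, 2, 1, 0, 1]
print(vertex_set(f_i))
print(extrema_distance(f_i, f_j))
```

Under this literal reading the value counts the vertices found in either curve, so it shrinks as more vertices are shared. A wider neighborhood could be modelled by matching positions within a tolerance instead of exactly.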

3.5. Clustering Functional Data Based on Shape Features

$\left\{{y}_{i}^{k}\left(t\right),k=1,2,\cdots ,K\right\}$, (13)

${D}^{r}\left({y}_{i}\left(t\right)\right)=\mathrm{min}\left\{D\left({y}_{i}\left(t\right),{y}_{i}^{k}\left(t\right)\right),k=1,2,\cdots ,K\right\}$,

$D\left(y_{i}(t),y_{i}^{k}(t)\right)=\left|M\left(y_{i}(t)\right)\right|+\left|M\left(y_{i}^{k}(t)\right)\right|-\left|M\left(y_{i}(t)\cap y_{i}^{k}(t)\right)\right|$.

4. Summary

