# Recognition of Table-Tennis Action Based on Improved Spatio-Temporal Graph Convolutional Network

Abstract: This paper studies computer-vision-assisted table tennis training. An action recognition algorithm based on skeleton key points learns only the spatio-temporal information of the human skeleton, which removes interference factors such as background and lighting. Video of five action classes (forehand drive, forehand loop, backhand drive, backhand loop, and non-striking action) was collected with a camera, and 18 human skeleton key points were extracted with OpenPose to construct a skeleton dataset of table tennis players. The convolution kernel of the ST-GCN network was adjusted according to the core strength area involved in a table tennis stroke, and the accuracy of the final trained model on striking actions reached 98%. A generalization test on table tennis striking videos outside the proposed dataset showed that the improved spatio-temporal graph convolutional network outperforms the original ST-GCN network and therefore has higher practical value.

1. Introduction

2. Algorithm Flow Design

Figure 1. Flowchart of the recognition of table tennis striking actions

3. Dataset Collection

3.1. Hardware Equipment for Dataset Collection and Selection of Table Tennis Players

3.2. Dataset Acquisition

(a1) Forehand drive (b1) Forehand loop (c1) Backhand drive (d1) Backhand loop (a2) Forehand drive (b2) Forehand loop (c2) Backhand drive (d2) Backhand loop

Figure 2. Similar actions in table tennis strokes: (a1) and (b1) are frames with similar poses during the forehand drive and the forehand loop; (a2) and (b2) are frames that show the difference between the forehand drive and the forehand loop; (c1) and (d1) are frames with similar poses during the backhand drive and the backhand loop

4. Dataset Processing

4.1. Manual Data Processing

Table 1. Resulting data of the standard experiment

4.2. Skeleton Key Point Extraction with OpenPose

OpenPose [7] is a model proposed by researchers at Carnegie Mellon University as part of a human pose recognition project [8]. The model can track and identify 15, 18, or 25 body key points, 21 key points per hand, and 70 facial key points in real time [7]. It detects the body, face, and hand key points of multiple people in an image in real time through a bottom-up approach, and it is the pioneering work in real-time multi-person pose estimation based on deep learning [8]. A shakehand-grip table tennis stroke combines waist and arm movements, so OpenPose is used here to extract 18 key points of the human body; for each skeleton key point, the index, pixel coordinates (x, y), and confidence are obtained, as shown in Figure 3.
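As a minimal sketch of how these per-point tuples can be read, the snippet below parses one frame of OpenPose's standard JSON output, in which each detected person carries a flat `pose_keypoints_2d` list of `[x, y, confidence]` triples. The dummy frame at the end is illustrative only; real output comes from the OpenPose binary.

```python
import json

def parse_pose_keypoints(json_str, num_points=18):
    """Parse one OpenPose output frame into (index, x, y, confidence) tuples.

    Assumes the standard OpenPose JSON layout: each person holds a flat
    [x0, y0, c0, x1, y1, c1, ...] list named "pose_keypoints_2d".
    """
    data = json.loads(json_str)
    skeletons = []
    for person in data.get("people", []):
        flat = person["pose_keypoints_2d"]
        points = [(i, flat[3 * i], flat[3 * i + 1], flat[3 * i + 2])
                  for i in range(num_points)]
        skeletons.append(points)
    return skeletons

# Minimal single-person example with dummy coordinates.
frame = json.dumps({"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9] * 18}]})
people = parse_pose_keypoints(frame)
```

Each returned tuple matches the (index, x, y, confidence) fields described above, so the parsed skeletons can be written directly into the JSON dataset format of Table 2.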

$\left\{\begin{array}{l}f=1\\ m=\left\lfloor \frac{C_{frame}}{16}\right\rfloor \\ f \mathrel{+}= m\end{array}\right.$ (1)
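Equation (1) can be read as an even subsampling rule: start at frame $f = 1$ and repeatedly advance by the stride $m = \lfloor C_{frame}/16 \rfloor$ until 16 frames are selected. A small sketch of this sampling, assuming 1-based frame numbering:

```python
def sample_frame_indices(total_frames, target=16):
    """Evenly subsample frame indices following Equation (1):
    start at f = 1 and advance by the stride m = floor(total_frames / target)."""
    m = total_frames // target
    if m == 0:                     # clip shorter than the target: keep every frame
        return list(range(1, total_frames + 1))
    f, indices = 1, []
    while f <= total_frames and len(indices) < target:
        indices.append(f)
        f += m
    return indices
```

For a 64-frame clip, for example, the stride is 4 and frames 1, 5, 9, ..., 61 are kept.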

Table 2. Storage of continuous skeleton key points in JSON format

Figure 3. The 18 skeleton key points of the human body extracted by OpenPose

5. Action Recognition Algorithm Based on Graph Convolution

5.1. The ST-GCN Network

${X}'={D}^{-1}AX$ (2)
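Equation (2) is the mean-aggregation form of graph convolution: multiplying by $A$ sums each node's neighbour features, and the inverse degree matrix $D^{-1}$ turns that sum into an average. A toy NumPy sketch on a 3-node path graph (with self-loops, so each node keeps part of its own feature):

```python
import numpy as np

# Toy 3-node path graph; self-loops included so a node keeps its own feature.
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
X = np.array([[1.0], [2.0], [3.0]])   # one scalar feature per node

D_inv = np.diag(1.0 / A.sum(axis=1))  # inverse degree matrix
X_out = D_inv @ A @ X                 # Equation (2): average over neighbours
```

Node 1 ends up with (1 + 2 + 3) / 3 = 2.0, the mean of itself and its two neighbours, which is exactly the smoothing behaviour a single graph-convolution layer applies to the skeleton features.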

5.2. Skeleton Point Connection Method

(a) Connection method used by ST-GCN (b) Connection method proposed in this paper

Figure 4. Skeleton point connections: (a) the connection method used in the ST-GCN project; (b) the improved skeleton point connection method proposed in this paper, based on the core strength area of the table tennis stroke
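A modified connection scheme of this kind amounts to changing the edge list from which the skeleton adjacency matrix is built. The sketch below uses the 17 limb links of the standard 18-joint OpenPose layout from the ST-GCN project; the extra waist-to-shoulder links are hypothetical stand-ins for the improved edges of Figure 4(b), whose exact indices are not listed in the text.

```python
import numpy as np

# The 17 limb links of the 18-joint OpenPose layout used by ST-GCN.
STGCN_EDGES = [(4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11),
               (10, 9), (9, 8), (11, 5), (8, 2), (5, 1), (2, 1),
               (0, 1), (15, 0), (14, 0), (17, 15), (16, 14)]

# Hypothetical extra links tying the hip joints (8, 11) and shoulder
# joints (2, 5) together, illustrating how core-area connections could
# be added; the actual improved edges are those shown in Figure 4(b).
EXTRA_EDGES = [(8, 11), (2, 5)]

def build_adjacency(edges, num_joints=18):
    """Symmetric adjacency matrix with self-loops for the skeleton graph."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

A_base = build_adjacency(STGCN_EDGES)
A_improved = build_adjacency(STGCN_EDGES + EXTRA_EDGES)
```

The resulting matrix plugs directly into the graph-convolution update of Equation (2), so enlarging the edge set is all that is needed to let the convolution kernel aggregate across the core strength area.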

6. Experimental Results and Analysis

6.1. Analysis of Model Training Results

Table 3. Comparison of performance among network models

6.2. Model Generalization Test

(a) Generalization of the ST-GCN network model (b) Generalization test of the network model improved in this paper

Figure 5. Comparison of generalization performance between the ST-GCN network and the network improved in this paper

7. Conclusion

[1] Kipf, T.N. and Welling, M. (2016) Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]

[2] Yan, S., Xiong, Y. and Lin, D. (2018) Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv:1801.07455 [cs.CV]

[3] Simonyan, K. and Zisserman, A. (2014) Two-Stream Convolutional Networks for Action Recognition in Videos. arXiv:1406.2199 [cs.CV]

[4] Tran, D., Bourdev, L., Fergus, R., et al. (2015) Learning Spatiotemporal Features with 3D Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, 7-13 December 2015, 4489-4497.
https://doi.org/10.1109/ICCV.2015.510

[5] Donahue, J., Anne Hendricks, L., Guadarrama, S., et al. (2017) Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 677-691.
https://doi.org/10.1109/TPAMI.2016.2599174

[6] Shao, D., Zhao, Y., Dai, B., et al. (2020) FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, 13-19 June 2020, 2616-2625.
https://doi.org/10.1109/CVPR42600.2020.00269

[7] Cao, Z., Simon, T., Wei, S.E., et al. (2017) Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 21-26 July 2017, 1302-1310.

[8] 人工智能小技巧 (2018) Chinese Documentation for the GitHub Open-Source Human Pose Recognition Project OpenPose [Z/OL].
https://www.jianshu.com/p/3aa810b35a5d, 2018-11-11.

[9] Kipf, T.N. and Welling, M. (2016) Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs.LG]

[10] 日知. How to Understand Graph Convolutional Network (GCN) [Z/OL].