Multi-Robot Reinforcement Learning Based on LCS and LS-SVM

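Authors: Jie Shao, Haixia Lin: Department of Information Engineering, Zhengzhou Chenggong University of Finance and Economics, Zhengzhou; Lijuan Du: School of Information and Electronics, Shangqiu Institute of Technology, Shangqiu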

Keywords: Learning Classifier System; Cooperative Least Squares Support Vector Machine (LS-SVM); Reinforcement Learning; Multi-Robot

Abstract:
This paper proposes a multi-robot reinforcement learning method that combines a learning classifier system (LCS) with a least squares support vector machine (LS-SVM). The optimal learning policy obtained by the LS-SVM serves as the initial rule set of the LCS. By interacting with the environment, the LCS can more quickly discover the rules that guide multi-robot reinforcement learning, and it provides real-time, dynamic feedback for action selection in the reinforcement learning system, so that the robots autonomously learn an optimal cooperative policy. Analysis of the algorithm and simulation results indicate that three common problems in multi-robot learning (a large learning space, slow convergence, and uncertain learning outcomes) are substantially alleviated.
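The seeding step described in the abstract, where a policy learned by an LS-SVM becomes the starting rule set of an LCS, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy five-cell corridor task, the RBF kernel, the `gamma` and `sigma` values, the function names, and the rule format (a dictionary with `condition`, `action`, and `prediction`). The LS-SVM part uses the standard regression formulation, solved as a single linear system.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel between every row of A and every row of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=1e4, sigma=1.0):
    # Standard LS-SVM regression: solve
    # [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def seed_rules(states, actions, X_train, b, alpha, sigma=1.0):
    # Hypothetical rule format: one rule per state, holding the greedy
    # action under the LS-SVM value estimate and its predicted payoff.
    rules = []
    for s in states:
        inputs = np.array([[*s, a] for a in actions], dtype=float)
        q = lssvm_predict(X_train, b, alpha, inputs, sigma)
        best = int(np.argmax(q))
        rules.append({"condition": tuple(s),
                      "action": actions[best],
                      "prediction": float(q[best])})
    return rules

# Toy task (assumed): a corridor with cells 0..4 and the goal at cell 4.
# The value of (state, action) is minus the distance to the goal after moving.
states = [(s,) for s in range(5)]
actions = [-1, 1]
goal = 4
X_train = np.array([[s, a] for (s,) in states for a in actions], dtype=float)
y_train = np.array([-abs(goal - min(max(s + a, 0), goal))
                    for (s,) in states for a in actions], dtype=float)

b, alpha = lssvm_fit(X_train, y_train)
rules = seed_rules(states, actions, X_train, b, alpha)
for r in rules:
    print(r)
```

In a full LCS these seeded rules would then be refined through interaction with the environment (payoff updates and rule discovery); that stage is omitted here because the abstract does not give its details.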

Cite this article: Shao, J., Du, L. and Lin, H. (2013) Multi-Robot Reinforcement Learning Based on LCS and LS-SVM. Artificial Intelligence and Robotics Research, 2, 24-28. doi: 10.12677/AIRR.2013.21004
