Improved Reinforcement Learning Algorithm and Its Application in RoboCup

Journal of Guangxi Normal University (Natural Science Edition), 2010, Vol. 28, Issue (3): 99-103.

CHENG Xian-yi^1,2, ZHU Qian^2

1. College of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019, China;
2. College of Computer Science and Telecommunications Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China

Received: 2010-05-13 Online: 2010-09-20 Published: 2023-02-06
Corresponding author: CHENG Xian-yi (born 1956), male, from Harbin, Heilongjiang; professor and doctoral supervisor at Nantong University. E-mail: xycheng@ntu.edu.cn
Funding: National Natural Science Foundation of China (60702056); Jiangsu Province Graduate Innovation Project (ZX09B2042)

Abstract

Building on the CMAC (cerebellar model articulation controller), this paper proposes a dynamic reinforcement learning method, DCMAC-AL (dynamic cerebellar model articulation controller-advantage learning). The method uses advantage(λ) learning to compute the state-action function, which widens the gap between the values of different actions and thereby avoids action oscillation. On top of CMAC function approximation, it then uses the Bellman error to add features dynamically, which makes the CMAC approximation more adaptive. In addition, the multi-agent defensive task (takeaway) is modeled on the RoboCup simulation platform, and learning experiments are run with the new algorithm. The experimental results show that DCMAC-AL learns better than advantage(λ) learning with a plain CMAC.

Key words: reinforcement learning, agent, RoboCup, CMAC
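
To make the two building blocks named in the abstract concrete, the following is a minimal sketch of a CMAC (tile coding) approximator combined with a one-step advantage learning update. It is not the authors' DCMAC-AL: the eligibility traces of advantage(λ) and the Bellman-error-driven feature addition are omitted, the state is assumed to be one-dimensional, and the names `TileCodingCMAC`, `advantage_update`, and the parameters `alpha`, `gamma`, `k` are hypothetical. The update uses a commonly cited form of advantage learning, with target V(x) + (r + γV(x') − V(x)) / k, where V(x) = max_a A(x, a).

```python
import numpy as np


class TileCodingCMAC:
    """Minimal 1-D CMAC: offset tilings over [low, high), one weight per tile and action."""

    def __init__(self, low, high, n_tilings=4, tiles_per_tiling=8, n_actions=2):
        self.low = low
        self.n_tilings = n_tilings
        self.tile_width = (high - low) / tiles_per_tiling
        # One extra tile per tiling absorbs the shift introduced by the offsets.
        self.tiles = tiles_per_tiling + 1
        self.weights = np.zeros((n_actions, n_tilings, self.tiles))

    def active_tiles(self, x):
        """Return the index of the single active tile in each tiling."""
        offsets = np.arange(self.n_tilings) * self.tile_width / self.n_tilings
        idx = np.floor((x - self.low + offsets) / self.tile_width).astype(int)
        return np.clip(idx, 0, self.tiles - 1)

    def values(self, x):
        """Approximate A(x, a) for every action as the sum of its active weights."""
        idx = self.active_tiles(x)
        return self.weights[:, np.arange(self.n_tilings), idx].sum(axis=1)


def advantage_update(cmac, x, a, r, x_next, done, alpha=0.1, gamma=0.95, k=0.5):
    """One-step advantage learning update (assumed form, no eligibility traces)."""
    v = cmac.values(x).max()
    v_next = 0.0 if done else cmac.values(x_next).max()
    # Dividing the temporal-difference term by k < 1 widens the gap between actions.
    target = v + (r + gamma * v_next - v) / k
    error = target - cmac.values(x)[a]
    idx = cmac.active_tiles(x)
    # Spread the step over the active tiles so A(x, a) moves by alpha * error.
    cmac.weights[a, np.arange(cmac.n_tilings), idx] += alpha / cmac.n_tilings * error
    return error


cmac = TileCodingCMAC(low=0.0, high=1.0)
first_error = advantage_update(cmac, x=0.3, a=1, r=1.0, x_next=0.3, done=True)
```

With `k = 1` the target reduces to the ordinary Q-learning target, which is why `k` can be read as the knob that controls how strongly differences between action values are emphasized.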

CLC number: TP181

CHENG Xian-yi, ZHU Qian. Improved Reinforcement Learning Algorithm and Its Application in RoboCup[J]. Journal of Guangxi Normal University (Natural Science Edition), 2010, 28(3): 99-103.

