Journal of Guangxi Normal University (Natural Science Edition), 2025, Vol. 43, Issue 6: 107-119. DOI: 10.16088/j.issn.1001-6600.2024122302
• Intelligence Information Processing •
LU Mengxiao1, ZHANG Yangchun1*, ZHANG Xiaofeng2