Optimizing the Path of a Mobile Agent in the Environment with Static Obstacles using Reinforcement Learning
A. Sawicki 1, M. Tomera 1
1 Gdynia Maritime University, Gdynia, Poland
ABSTRACT: Path planning for a mobile agent concerns the problem of searching for a collision-free and optimal path between the initial and target positions. The space in which the agent moves contains a number of obstacles modeled by ordered grids, each representing the location of an obstacle in the space of the agent's movement. The final boundary of an obstacle is formed by its actual boundary plus a minimum safe distance that takes the size of the agent into account, which makes it possible to treat the obstacles as points in the environment. In the article, reinforcement learning algorithms were used to determine the path, covering numerous design methods: dynamic programming (policy iteration algorithm and value iteration algorithm), the Monte Carlo method (Monte Carlo control algorithm), temporal-difference (TD) learning (Q-learning algorithm and Sarsa algorithm), eligibility traces (Q(λ) algorithm and Sarsa(λ) algorithm), planning and learning (Dyna-Q algorithm), and gradient methods (Q-learning algorithm with the Adam optimizer and Sarsa algorithm with the Adam optimizer). The reinforcement learning algorithms operate on the principle of determining the agent's policy, which seeks the minimum distance between the initial and target positions of the mobile agent while avoiding obstacles. These learning procedures differ in that dynamic programming requires good knowledge of the environment model to determine the agent's policy, whereas the other methods do not require this knowledge. The aim of the reported work was to examine the above-named algorithms in terms of their effectiveness and speed of finding the optimal solution. Based on the results of the simulation studies, the most effective method turned out to be the one using gradient-based optimization, i.e. Q-learning with the Adam optimizer.
KEYWORDS: Path Planning, Reinforcement Learning (RL), Static Obstacles Environment, Q-learning, Sarsa Algorithm, Dyna-Q, Adam Optimizer, Policy Optimization
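For illustration, the sketch below shows how the simplest of the tabular methods named in the abstract, Q-learning with an epsilon-greedy behaviour policy, can be applied to a grid environment with static obstacles. It is a minimal example written under assumed settings (grid size, obstacle cells, reward values, and hyperparameters); it is not the authors' implementation or parameterisation, and the Adam-based gradient variants reported as most effective are not shown.

```python
# Minimal tabular Q-learning sketch for path planning on a grid with static
# obstacles. The grid layout, reward values, and hyperparameters below are
# illustrative assumptions, not the settings used in the paper.
import random

ROWS, COLS = 6, 6
OBSTACLES = {(1, 2), (2, 2), (3, 2), (4, 4)}    # assumed obstacle cells (already inflated by the safe distance)
START, GOAL = (0, 0), (5, 5)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1          # assumed learning rate, discount factor, exploration rate
EPISODES, MAX_STEPS = 3000, 200

# Action-value table Q[state][action_index], initialised to zero.
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}

def step(state, action):
    """Apply an action; moves off the grid or into an obstacle leave the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return state, -5.0, False     # collision penalty (assumed value)
    if (r, c) == GOAL:
        return (r, c), 10.0, True     # reward for reaching the target (assumed value)
    return (r, c), -1.0, False        # per-step cost favours the shortest path

def greedy(state):
    """Index of the action with the highest Q-value in the given state."""
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])

for _ in range(EPISODES):
    state = START
    for _ in range(MAX_STEPS):
        # Epsilon-greedy behaviour policy.
        a = random.randrange(len(ACTIONS)) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, ACTIONS[a])
        # Off-policy Q-learning update; Sarsa would instead bootstrap on the
        # action actually selected in the next state.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][a])
        state = nxt
        if done:
            break

# Read out the greedy path from the learned Q-table.
path, state = [START], START
while state != GOAL and len(path) < ROWS * COLS:
    state, _, _ = step(state, ACTIONS[greedy(state)])
    path.append(state)
print(path)
```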
Citation note:
Sawicki A., Tomera M.: Optimizing the Path of a Mobile Agent in the Environment with Static Obstacles using Reinforcement Learning. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, Vol. 19, No. 3, doi:10.12716/1001.19.03.20, pp. 863-873, 2025