%0 Journal Article
%A Sawicki, Adrian
%A Tomera, Miroslaw
%T Optimizing the Path of a Mobile Agent in the Environment with Static Obstacles using Reinforcement Learning
%J TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation
%V 19
%N 3
%P 863-873
%D 2025
%U ./Article_Optimizing_the_Path_of_a_Mobile_Sawicki,75,1570.html
%X Path planning for a mobile agent concerns the problem of searching for a collision-free and optimal path between the initial and target positions. The space in which the agent moves contains a number of obstacles modeled by ordered grids, each representing an obstacle location in the space of the agent's movement. The final boundary of an obstacle is formed by its actual boundary extended by a minimum safe distance that accounts for the size of the agent, which makes it possible to treat the obstacles as points in the environment. In the article, reinforcement learning algorithms were used to determine the path, covering a number of design methods: dynamic programming (policy iteration and value iteration algorithms), the Monte Carlo method (Monte Carlo control algorithm), temporal-difference (TD) learning (Q-learning and Sarsa algorithms), eligibility traces (Q(λ) and Sarsa(λ) algorithms), planning and learning (Dyna-Q algorithm), and gradient methods (Q-learning and Sarsa algorithms with the Adam optimizer). The reinforcement learning algorithms operate on the principle of determining the agent's policy, which seeks the minimum-distance path between the initial and target positions of the mobile agent while avoiding obstacles. These learning procedures differ in that dynamic programming requires full knowledge of the environment model to determine the agent's policy, whereas the other methods do not require this knowledge. The aim of the reported work was to examine the above-named algorithms in terms of their effectiveness and the speed with which they find the optimal solution. Based on the results of the simulation studies, the most effective method turned out to be the one using gradient-based optimization, i.e. Q-learning with the Adam optimizer.
%@ 2083-6473
%R 10.12716/1001.19.03.20