http://www.transnav.eu
the International Journal on Marine Navigation and Safety of Sea Transportation
Volume 20, Number 2, June 2026
DOI: 10.12716/1001.20.02.14

A Bidirectional Imitation-Learning Framework for
Real-Time Trajectory Planning in Automatic Berthing
T. Higaki & H. Hashimoto
Osaka Metropolitan University, Sakai, Osaka, Japan

ABSTRACT: This paper presents a bidirectional imitation-learning framework for AIS-based real-time trajectory
planning in automatic berthing. The framework connects a ship’s current state with a prescribed terminal
berthing condition by using two imitation-learning policies: forward-time imitation learning from the current
ship state and backward-time imitation learning from the desired terminal state. The sampled trajectories are
converted into time-indexed probability distributions by time-weighted adaptive kernel density estimation, and
the integrated trajectory is obtained from the joint distribution of the two directions. This formulation is intended
to preserve the feasibility of the initial approach while improving consistency with the final berthing position,
heading, and speed. The method was evaluated using actual AIS trajectories of PCC and car ferries berthing at
Shinmoji Port, Japan. The results show that the integrated planner reduced the terminal deviations observed in
the forward-time planner alone and avoided the initial-state inconsistency observed in the backward-time
planner alone. The proposed approach is positioned as a practical data-driven planning framework for real-time
berthing support under similar vessel and port conditions.

1 INTRODUCTION
Berthing is a critical phase of marine navigation in
confined port waters, where a ship must approach a
quay with small positional, heading, and speed errors
while maintaining sufficient margins against
surrounding structures. Previous studies from the
viewpoint of harbor pilots have shown that berthing is
operationally demanding because it is affected by port
geometry, ship size, human decision making, and
limited maneuvering space [1]. Such practical
characteristics make berthing trajectory planning
different from ordinary waypoint-following problems:
the planner must update a feasible approach from the
current ship state while still satisfying the prescribed
terminal position, heading, and speed at the berth.
Automatic berthing has been studied using
model-based optimization [2–5], neural-network
control [6–9], reinforcement learning [10–11], and
imitation learning (IL) [12]. Model-based approaches
can explicitly handle constraints and ship dynamics,
but they often require careful model identification and
repeated optimization for each operating condition.
Learning-based approaches can provide fast online
inference after training, but a forward-time policy does
not necessarily satisfy the terminal berthing condition
because the terminal pose and speed are not imposed
as hard constraints. This issue was addressed by
introducing backward-time imitation learning (BTIL),
which generated backward-time trajectory
distributions from AIS data [13]. However, a
backward-time plan alone is not always consistent
with the ship’s current state.
Therefore, we propose a bidirectional imitation-
learning framework that connects an IL trajectory from
the ship’s current state with a BTIL trajectory from the
terminal berthing state using kernel density estimation
(KDE). This integration enables the construction of
trajectories that are consistent with both the current
ship state and the terminal berthing condition. The
concept of exploiting information from both the start
and goal sides aligns with the motion-planning
literature [14–15], where bidirectional
sampling/exploration has been shown to improve
planning efficiency and robustness in constrained
environments. In addition, the offline pre-training of IL
and BTIL allows the proposed method to generate
trajectories in real time during online operation.
Furthermore, our IL/BTIL-based framework avoids
explicit reward design, which is advantageous because
even experts may fail to define a reward function that
leads to the intended task completion in autonomous
navigation [16].
In this study, using AIS data from vessels berthing
at Shinmoji Port in Fukuoka Prefecture, Japan, we
trained both IL and BTIL models and examined the
limitations of each model when used
independently. Then, we demonstrated that the
proposed method resolved these issues and achieved
real-time trajectory generation that was consistent with
the initial state and terminal berthing condition under
the static quay geometry. Finally, we investigated the
generalization performance of the proposed method to
different initial states, destinations, and berthing styles.
2 METHODS
2.1 Imitation Learning (IL)
The forward-time trajectory planner was trained by task-
relevant adversarial imitation learning (TRAIL; [17]) with
proximal policy optimization (PPO; [18]) as the policy
optimizer. The network structure and principal learning
settings followed the previous study [13]. In this paper, IL is
used only to generate candidate approach trajectories from
the current ship state; the terminal-side correction is
provided later by BTIL.
2.2 Backward-Time Imitation Learning (BTIL)
BTIL is adopted to generate backward-time
trajectories. The detailed algorithm has been described
in [13]; briefly, each expert trajectory is reversed in
time, and the policy is trained from the terminal
berthing condition toward earlier states. This training
direction makes the sampled trajectories naturally
concentrated around the desired terminal condition
when they are mapped back to forward time. In this
study, BTIL is not used as a stand-alone planner, but as
one of the two distributions to be integrated with the
forward-time IL result.
2.3 Integration of IL and BTIL
As shown in Figure 1, the proposed framework trains
the IL and BTIL policies independently and then
integrates their sampled trajectories at the distribution
level. The forward-time policy is first rolled out from
the initial condition of the ship, whereas the backward-
time policy is rolled out from the prescribed berthing
condition and then mapped back to the ordinary time
direction. These two rollouts provide complementary
information: the former reflects reachability from the
current ship state, and the latter reflects consistency
with the terminal berthing condition.
Figure 1. Overview of the proposed bidirectional imitation-
learning framework. The forward-time policy generates an
approach from the initial ship state, whereas the backward-
time policy generates an approach from the terminal berthing
state. Each rollout is converted into a time-indexed trajectory
distribution by TWA-KDE, and the integrated distribution is
obtained by multiplying the two directional distributions.
Directly averaging the two trajectories is not
suitable because the reliability of each trajectory
depends on time. The forward-time trajectory is most
reliable near the initial state and becomes less
constrained toward the end of the horizon. Conversely,
the backward-time trajectory is most reliable near the
terminal state and becomes less constrained when
traced back to earlier states. Therefore, each sampled
trajectory is first represented as a time-state density
field, and the two density fields are integrated
afterward.
In this study, time-weighted adaptive kernel
density estimation (TWA-KDE) is introduced for this
conversion. TWA-KDE constructs a two-dimensional
density over the time index and a selected state
variable. The state variables considered here are the
ship position, heading angle, surge and sway
velocities, and yaw rate, namely X, Y, Ψ, u, v, and r. The
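density for X is defined as follows:
$$
f(t, X) = \frac{1}{T+1} \sum_{\tau=0}^{T} \frac{w(\tau/T)}{h(\tau)\, h_X(\tau)}\, K\!\left(\frac{t-\tau}{h(\tau)}\right) K\!\left(\frac{X-X_\tau}{h_X(\tau)}\right) \qquad (1)
$$
Here, the sum runs over the time indices τ of the sampled
trajectory, and Xτ is the sampled position at index τ; h and hX
are adaptive bandwidths for time t and latitudinal position X,
respectively; |X|max is a representative value for normalization,
which sets the scale of hX; w(x) is an adaptive weight; K(x) is
the normal kernel function.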
The adaptive bandwidths linearly increase over time so
as to keep the constraints at the initial condition of the
forward-time trajectory (i.e., the ship’s current state)
and the backward-time trajectory (i.e., the terminal
berthing state). This process decreases the per-sample
normalization over time, which can hinder smooth
integration of forward-time and backward-time
distributions. To address the issue, we define the
adaptive weight such that $\int_0^1 w(x)\,dx = 1$ is satisfied on
the interval [0,1].
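The construction in Eq. (1) can be illustrated with a minimal sketch for a single state variable. It is not the authors' implementation: the linear bandwidth form `h0 + h_slope * tau`, the default values of `h0` and `h_slope`, the uniform weight w(x) = 1 (which integrates to one on [0,1]), and the use of the sample indices themselves as the time grid are all assumptions made for illustration.

```python
import math


def gaussian(z):
    """Standard normal kernel K(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def twa_kde(samples, x_grid, x_scale, h0=1.0, h_slope=0.05,
            weight=lambda s: 1.0):
    """Return density[j][k] over time index j and state value x_grid[k].

    samples[tau] is the sampled state at time index tau = 0..T.
    h(tau) = h0 + h_slope * tau is an ASSUMED linear time bandwidth.
    h_x(tau) = x_scale / T * h(tau) is the state bandwidth, where
    x_scale plays the role of the representative value |X|max.
    weight(s) is the adaptive weight w on [0, 1] (assumed uniform).
    """
    T = len(samples) - 1
    density = [[0.0] * len(x_grid) for _ in range(T + 1)]
    for tau, x_tau in enumerate(samples):
        h_t = h0 + h_slope * tau          # bandwidth grows with time index
        h_x = x_scale / T * h_t
        coeff = weight(tau / T) / (h_t * h_x) / (T + 1)
        k_time = [gaussian((t - tau) / h_t) for t in range(T + 1)]
        k_state = [gaussian((x - x_tau) / h_x) for x in x_grid]
        for j, kt in enumerate(k_time):
            row = density[j]
            scaled = coeff * kt
            for k, kx in enumerate(k_state):
                row[k] += scaled * kx
    return density
```

With these assumed settings, the density is narrow near index 0 of the rollout and spreads toward the end of the horizon, which is the behavior the text attributes to the linearly increasing bandwidths. For a backward-time rollout, index 0 corresponds to the terminal berthing state.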
The same construction is applied to the other five
variables. Although the variables have different
physical units and ranges, the use of variable-wise
representative scales allows the same TWA-KDE form
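to be used for all of them:
$$
f(t, Y) = \frac{1}{T+1} \sum_{\tau=0}^{T} \frac{w(\tau/T)}{h(\tau)\, h_Y(\tau)}\, K\!\left(\frac{t-\tau}{h(\tau)}\right) K\!\left(\frac{Y-Y_\tau}{h_Y(\tau)}\right) \qquad (2)
$$
$$
f(t, \Psi) = \frac{1}{T+1} \sum_{\tau=0}^{T} \frac{w(\tau/T)}{h(\tau)\, h_\Psi(\tau)}\, K\!\left(\frac{t-\tau}{h(\tau)}\right) K\!\left(\frac{\Psi-\Psi_\tau}{h_\Psi(\tau)}\right) \qquad (3)
$$
$$
f(t, u) = \frac{1}{T+1} \sum_{\tau=0}^{T} \frac{w(\tau/T)}{h(\tau)\, h_u(\tau)}\, K\!\left(\frac{t-\tau}{h(\tau)}\right) K\!\left(\frac{u-u_\tau}{h_u(\tau)}\right) \qquad (4)
$$
$$
f(t, v) = \frac{1}{T+1} \sum_{\tau=0}^{T} \frac{w(\tau/T)}{h(\tau)\, h_v(\tau)}\, K\!\left(\frac{t-\tau}{h(\tau)}\right) K\!\left(\frac{v-v_\tau}{h_v(\tau)}\right) \qquad (5)
$$
$$
f(t, r) = \frac{1}{T+1} \sum_{\tau=0}^{T} \frac{w(\tau/T)}{h(\tau)\, h_r(\tau)}\, K\!\left(\frac{t-\tau}{h(\tau)}\right) K\!\left(\frac{r-r_\tau}{h_r(\tau)}\right) \qquad (6)
$$
where τ is the summation index over the sampled time steps, and
$$
h_Y(t) = \frac{|Y|_{\max}}{T}\, h(t), \quad
h_\Psi(t) = \frac{|\Psi|_{\max}}{T}\, h(t), \quad
h_u(t) = \frac{|u|_{\max}}{T}\, h(t), \quad
h_v(t) = \frac{|v|_{\max}}{T}\, h(t), \quad
h_r(t) = \frac{|r|_{\max}}{T}\, h(t)
$$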
The symbols hY, hΨ, hu, hv, and hr denote adaptive
bandwidths for Y, Ψ, u, v, and r, respectively. The
corresponding representative values |Y|max, |Ψ|max,
|u|max, |v|max, and |r|max were selected from the range
of the actual AIS-based trajectories. In the numerical
implementation, the time axis and each state-variable
axis were divided into 400 grid points. The proposed
integration was conducted separately for each state
variable; thus, the method avoids constructing a high-
dimensional KDE over all six variables at once.
Applying TWA-KDE to the trajectories sampled
with IL and BTIL, forward-time and backward-time
distributions are obtained as $f_{fwd}(t,\cdot) = f(t,\cdot)$ and
$f_{bwd}(t,\cdot) = f(T-\tilde{t},\cdot)$, respectively. Note that $\tilde{t}$, the time step in
backward-time simulations, runs in the direction opposite to time in the
real world. Then, these distributions are integrated by
computing their joint probability as
$$
f_{int}(t,\cdot) = f_{fwd}(t,\cdot)\, f_{bwd}(t,\cdot) \qquad (7)
$$
Finally, assuming $f_{int}(\cdot \mid t) \propto f_{int}(t,\cdot)$, an integrated
trajectory can be derived by taking, at each time step, the state value
that maximizes the integrated distribution:
$$
X_t = \arg\max_{X} f_{int}(X \mid t) \qquad (8)
$$
$$
Y_t = \arg\max_{Y} f_{int}(Y \mid t) \qquad (9)
$$
$$
\Psi_t = \arg\max_{\Psi} f_{int}(\Psi \mid t) \qquad (10)
$$
$$
u_t = \arg\max_{u} f_{int}(u \mid t) \qquad (11)
$$
$$
v_t = \arg\max_{v} f_{int}(v \mid t) \qquad (12)
$$
$$
r_t = \arg\max_{r} f_{int}(r \mid t) \qquad (13)
$$
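The integration and extraction steps of Eqs. (7)–(13) can be sketched as follows for one state variable. This is an illustrative outline under stated assumptions, not the authors' code: the densities are assumed to be stored as nested lists on a shared grid, and the function names `to_forward_time` and `integrate_and_extract` are hypothetical.

```python
def to_forward_time(f_backward):
    """Reindex density rows from backward time to forward time.

    Row j of the input belongs to backward step j; in forward time this
    is step T - j, so the row order is simply reversed.
    """
    return f_backward[::-1]


def integrate_and_extract(f_fwd, f_bwd, x_grid):
    """Multiply two densities cell by cell and take the per-time argmax.

    f_fwd[j][k] and f_bwd[j][k] must share the same forward-time index j
    and the same state grid x_grid[k].
    """
    trajectory = []
    for row_f, row_b in zip(f_fwd, f_bwd):
        joint = [a * b for a, b in zip(row_f, row_b)]
        best = max(range(len(joint)), key=joint.__getitem__)
        trajectory.append(x_grid[best])
    return trajectory
```

The sketch does not handle the case in which the two distributions have no overlap at some time step; the product is then zero everywhere and the returned grid value is not meaningful.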
The entire process, including forward sampling,
backward sampling, and integration, is completed
within a few seconds on a common PC, and thus our
approach can generate an integrated trajectory in real
time. Note that “real time” here specifically denotes
online planning with pre-trained IL/BTIL models,
excluding any offline model updates during the
process.
2.4 Architecture and Settings
The forward-time and backward-time planners were
implemented with the same state representation,
action definition, and network size so that their outputs
could be integrated on a common state space. The
principal architecture followed the previous BTIL
framework [13], while the present work used two
directional planners and combined their outputs
through the distribution-level procedure described in
Section 2.3.
At each time step (t or $\tilde{t}$), the input state is
composed of two parts:
$$
s_t = \left[ S_{base}(t),\ S_{detect}(t) \right] \qquad (14)
$$
where Sbase corresponds to a target ship and destination,
and Sdetect corresponds to the detection of port
geometry.
Figure 2 illustrates the definition of the base state
variables used to describe the relative geometry
between the ship and the target berth. Specifically, Sbase
is given by
$$
S_{base} = \left[ d'_x,\ d'_y,\ d'_\xi,\ d'_\eta,\ \psi,\ u,\ v,\ r \right] \qquad (15)
$$
where the prime indicates normalization.
Figure 2. Definition of the base state variables used to
describe the relative geometry between the ship and the
target berth. The state includes the destination position
expressed in both the ship-fixed frame (dx and dy) and the
berth-aligned frame (dξ and dη), the heading error relative
to the berth (ψ), and the ship-motion variables (u, v, and r).
In this study, the terminal condition was defined as
the ideal docking state at t=T, where (i) the position
deviation from the destination is zero, (ii) the heading
deviation from the terminal berth is zero, and (iii) the
speed over ground is zero. These three conditions can
be simplified in the following form,
$$
d = 0, \quad \psi = 0, \quad U = 0 \quad \text{at } t = T \qquad (16)
$$
where $d = \sqrt{d_x^2 + d_y^2} = \sqrt{d_\xi^2 + d_\eta^2}$ and $U = \sqrt{u^2 + v^2}$.
Sdetect gives a simple geometric description of the
surrounding quay. As shown in Figure 3, multiple
detection lines are extended from the ship, and each
element stores the normalized distance to the first
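intersection with a quay line:
$$
S_{detect} = \left[ d'_1,\ \ldots,\ d'_N \right] \qquad (17)
$$
where
$$
d'_i =
\begin{cases}
d_i / d_{lim} & \text{if a quay line exists} \\
1 & \text{if a quay line does not exist}
\end{cases}
$$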
Figure 3. Detection-line representation of the static quay
geometry. Each detection line stores the distance from the
ship to the first intersection with a quay line.
If no quay line is detected within the detection
range dlim, the corresponding value is set to one. Note
that Sdetect is used only for detecting static quay
geometry in the virtual port, i.e., not for dynamic
obstacle avoidance.
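The detection-line state of Eq. (17) can be sketched with a standard ray-segment intersection test. This is a hypothetical outline rather than the authors' implementation: quay lines are assumed to be straight segments given as pairs of points, angles are measured counter-clockwise from the x-axis of a planar frame, and the function names are illustrative. The defaults follow the settings reported later in this section (N=16 lines over a full circle and dlim=1 km).

```python
import math


def ray_segment_distance(ox, oy, dx, dy, ax, ay, bx, by):
    """Distance along the unit ray (dx, dy) from (ox, oy) to segment A-B.

    Returns None if the ray is parallel to the segment or misses it.
    """
    ex, ey = bx - ax, by - ay
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:
        return None                      # parallel: no unique intersection
    wx, wy = ax - ox, ay - oy
    s = (wx * ey - wy * ex) / denom      # distance along the ray
    q = (wx * dy - wy * dx) / denom      # position along the segment, 0..1
    if s >= 0.0 and 0.0 <= q <= 1.0:
        return s
    return None


def detect_state(x, y, heading_rad, quay_segments, n_lines=16, d_lim=1000.0):
    """Normalized distance to the first quay intersection on each line.

    A line with no intersection within d_lim stores 1.0.
    """
    state = []
    for i in range(n_lines):
        angle = heading_rad + 2.0 * math.pi * i / n_lines
        dx, dy = math.cos(angle), math.sin(angle)
        hits = []
        for (ax, ay), (bx, by) in quay_segments:
            s = ray_segment_distance(x, y, dx, dy, ax, ay, bx, by)
            if s is not None and s <= d_lim:
                hits.append(s)
        state.append(min(hits) / d_lim if hits else 1.0)
    return state
```

For example, a ship at the origin heading along the x-axis with a quay segment from (500, -100) to (500, 100) yields 0.5 on the first detection line and 1.0 on all others.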
The actor outputs a stochastic action from a beta
policy [19]. The action was defined in the acceleration
space as
$$
a_t = A(t), \quad A = \left[ \dot{u},\ \dot{v},\ \dot{r} \right] \qquad (18)
$$
Note that a ship maneuvering model was not
incorporated, as this study aims to directly imitate
expert berthing trajectories extracted from AIS data.
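Because the action is an acceleration and no maneuvering model is used, a rollout step reduces to kinematic integration. The sketch below shows one possible reading under explicit assumptions: the affine mapping of a Beta sample from [0,1] to symmetric bounds, the explicit Euler update, and the frame convention (X and Y earth-fixed, heading Ψ measured from the X-axis toward Y) are not stated in the source. The bounds are the values reported for model set A in Section 3.1.

```python
import math

# Action bounds for model set A: |du/dt| and |dv/dt| in m/s^2, |dr/dt| in deg/s^2.
ACTION_LIMITS = (1.6e-2, 1.4e-2, 0.18)


def scale_action(beta_sample, limits=ACTION_LIMITS):
    """Map a Beta-policy sample in [0, 1]^3 to [-limit, +limit] (assumed affine)."""
    return tuple((2.0 * b - 1.0) * lim for b, lim in zip(beta_sample, limits))


def kinematic_step(state, accel, dt=5.0):
    """Advance (X, Y, psi_deg, u, v, r_deg_per_s) by one explicit Euler step.

    Positions use the velocities at the start of the step; no ship
    dynamics model is involved (assumed integration scheme).
    """
    X, Y, psi, u, v, r = state
    du, dv, dr = accel
    c = math.cos(math.radians(psi))
    s = math.sin(math.radians(psi))
    X_new = X + (u * c - v * s) * dt
    Y_new = Y + (u * s + v * c) * dt
    return (X_new, Y_new, psi + r * dt, u + du * dt, v + dv * dt, r + dr * dt)
```

A sample of 0.5 in every component maps to zero acceleration, so the ship keeps its current velocities over the step.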
The learning settings were kept close to those in the
previous study [13] to isolate the effect of the proposed
forward-backward integration. The actor and critic
each consisted of three hidden layers with 64 units and
tanh activation. The discriminator consisted of two
hidden layers with 64 units and ELU activation. Adam
[20] was used for both the generator and discriminator,
with learning rates of αg=1×10⁻⁵ and αd=5×10⁻⁵. For the
generator (PPO), the clipping value and entropy
coefficient were set as ε=0.2 and c=1×10⁻⁴; for the
discriminator (TRAIL), the interval and coefficient of
invariant set were determined as TI=20 and λ=1.0,
respectively. The simulation time step, horizon length,
and discount factor were Δt=5 s, T=300, and γ=0.99,
respectively. The episode ended when the ship collided
with the quay. The number, length, and interval of the
detection lines were set as N=16, dlim=1 km, and
Δφ=22.5°, respectively.
3 RESULTS
3.1 Preprocessing and Training
The proposed planner was trained and tested using
AIS records of actual berthing operations at Shinmoji
Port, Fukuoka Prefecture, Japan. A virtual port used in
the learning environment was reconstructed from
geographic coordinates of the quay lines. The origin of
the local coordinate system was set at 130.9844°E and
33.8688°N.
The AIS data used in this study were obtained in
December 2021, and January and August 2022. Each
record contained the time stamp, longitude, latitude,
speed over ground, course over ground, and heading
angle. Because AIS messages were not sampled at a
constant interval, the raw histories were first
resampled to a uniform 1-s time grid by cubic-spline
interpolation. The surge and sway velocity
components were then obtained by combining the
speed over ground with the drift angle calculated from
the course and heading. The yaw rate and yaw
acceleration were computed from the heading-angle
history by finite differences. Since the acceleration
signals derived from AIS data were sensitive to
measurement noise and interpolation errors, the
angular acceleration was smoothed by a moving
average with k=2 before being used for learning.
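The velocity decomposition, finite differencing, and smoothing steps described above can be sketched as follows. The sketch assumes the series has already been resampled to a uniform 1-s grid (the cubic-spline step is omitted). The sign convention of the drift angle (course minus heading), the forward-difference scheme, and the reading of k as the half-width of a centered window are assumptions, since the source does not define them.

```python
import math


def body_velocities(sog, cog_deg, heading_deg):
    """Split speed over ground into surge u and sway v.

    Uses drift angle beta = COG - heading (assumed sign convention).
    """
    beta = math.radians(cog_deg - heading_deg)
    return sog * math.cos(beta), sog * math.sin(beta)


def wrap_deg(angle):
    """Wrap an angle difference to [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def finite_difference(values, dt=1.0, angular=False):
    """Forward differences; the last value is repeated to keep the length."""
    diffs = []
    for a, b in zip(values, values[1:]):
        step = b - a
        if angular:
            step = wrap_deg(step)        # avoid jumps at the 0/360 boundary
        diffs.append(step / dt)
    if diffs:
        diffs.append(diffs[-1])
    return diffs


def moving_average(values, k=2):
    """Centered moving average with half-width k, truncated at both ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

Applying `finite_difference` to the heading history with `angular=True` gives the yaw rate; applying it again to the yaw rate gives the yaw acceleration, which can then be passed to `moving_average`.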
Figure 4. Berthing trajectory of Housho-Maru (indicated in
gray) with the assistance of the tugboat Kazashi-Maru
(indicated in red) on December 27, 2021.
A pure car carrier (PCC), Housho-Maru, was selected as the source of
expert data. The ship has an overall length of 165.0 m and
an overall breadth of 27.6 m and is equipped with bow
and stern thrusters. By plotting the positions of nearby
vessels during the berthing phase of Housho-Maru, as
shown in Figure 4, we identified instances where a
tugboat remained tightly alongside the target ship and
supported its berthing. Such tugboat-assisted cases
were visually distinguished and manually excluded
from the dataset. Consequently, the five berthing
trajectories shown in Figure 5 were used as training
data.
Figure 5. Berthing trajectories of Housho-Maru on December
5, 9, and 13, 2021, and January 19 and 27, 2022.
The normalization constants and action limits were
determined from these AIS-based berthing histories.
The representative state scales were set to
|X|max=|Y|max=2 km, |Ψ|max=360°, |u|max=3.0 m/s,
|v|max=0.5 m/s, and |r|max=1.2 °/s. The action bounds
were set to |u̇|max=1.6×10⁻² m/s², |v̇|max=1.4×10⁻² m/s²,
and |ṙ|max=0.18 °/s².
3.2 Verification of IL and BTIL
After sufficient training of the IL-based and BTIL-
based route planners, we tested them with eight
trajectories of Toyofuji-Maru (obtained on December 7,
15, and 23, 2021; January 7, 15, and 24, 2022; and
August 9 and 23, 2022), which is one of the sister ships
of Housho-Maru and has the same hull form, actuators,
and destination but different initial states.
Figure 6 presents the forward-time trajectories and
histories of the ship’s positions and velocities
generated by IL. The tested trajectories followed the
overall trend of the expert trajectories obtained from
Toyofuji-Maru. However, deviations emerged as time
progressed, and the terminal states did not align with
those of the expert. In some cases, the final ship
position remained separated from the target berth and
was not aligned parallel to it, indicating that the
berthing task was not successfully completed. This is
because TRAIL optimizes the agent’s policy so that the
encountered states remain close to those of the expert
on average, while no explicit constraint is imposed at
terminal states.
Figure 6. IL results for the Toyofuji-Maru test cases: planned
tracks and corresponding time histories of position and
velocity components.
BTIL was introduced to address this issue [13]. Figure 7
shows the backward-time trajectories and histories of
the ship’s positions and velocities generated by BTIL.
Note that the time step is reversed, and the initial state
in the simulations (t =0) represents the terminal state in
the real world (t=T). It is also important to note that the
signs of the velocity components u, v, and r are
reversed. By construction, the initial states of the
tested trajectories perfectly matched the terminal states
of the expert. However, as the trajectories were traced back in time, the
discrepancy with the expert increased; as a result, in
some cases, the BTIL-based planner failed to draw
feasible routes from the initial position of the ship.
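The mapping from a backward-time rollout to the ordinary time direction can be sketched in a few lines. The state layout (X, Y, Ψ, u, v, r) and the function name are illustrative assumptions. The sign reversal of u, v, and r follows the note above, since a derivative with respect to backward time is the negative of the derivative with respect to real time.

```python
def backward_to_forward(backward_states):
    """Convert a backward-time rollout into forward-time order.

    backward_states[j] = (X, Y, psi, u, v, r) at backward step j, where
    j = 0 is the terminal berthing state. The order is reversed and the
    velocity components change sign; positions and heading are unchanged.
    """
    forward = []
    for X, Y, psi, u, v, r in reversed(backward_states):
        forward.append((X, Y, psi, -u, -v, -r))
    return forward
```

After this conversion, the last element of the returned list is the terminal berthing state at t=T.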
Figure 7. BTIL results for the Toyofuji-Maru test cases:
planned tracks and corresponding time histories of position
and velocity components.
3.3 Validation of the Proposed Method
The proposed framework was then evaluated. Figure 8
shows the forward-time, backward-time, and
integrated distributions regarding the ship’s positions
and velocities. As intended, the forward distribution
exhibits a sharp and narrow peak at t=0, corresponding
to the ship’s initial state, and gradually broadens as
time proceeds. In contrast, the backward distribution
shows a sharp and narrow peak at $\tilde{t}$=0,
corresponding to the docking stage, and gradually
spreads when traced back in time. By computing the
joint probability of these distributions, a smooth
integrated distribution was obtained that bridges the
forward- and backward-time behaviors.
Figure 8. Forward-time, backward-time and integrated
distributions regarding the ship’s positions and velocities.
The distributions are compared with the expert data (the AIS
data of Toyofuji-Maru obtained on December 23, 2021).
By selecting, for each state variable, the value that maximizes the
integrated distribution at each time step as in (8)–(13), an
integrated trajectory can be readily obtained. Figure 9
shows the integrated trajectories and time histories of
the ship’s positions and velocities. The results indicate
that the integrated trajectories agree with the real-ship
trajectories over the entire berthing process. Note that
the generation of forward-time and backward-time
trajectories, as well as their integration, each requires
less than one second, enabling real-time trajectory
planning within a few seconds.
Figure 10 compares the terminal position deviations
d, heading deviations ψ, and speeds over ground U for
the expert, IL, and the proposed method. Relative to
IL alone, the proposed method greatly improved
each metric and was able to satisfy the terminal
constraints to a degree comparable to the expert. Even
for the expert, d and ψ were not exactly zero because
these values are calculated as deviations from the mean
terminal state in the expert data. Although the terminal
speed of the proposed method did not become exactly
zero, the maximum residual was about 1 cm/s, and its
impact is therefore considered sufficiently small.
Figure 9. Trajectories extracted from the integrated forward-
backward distributions and their associated state histories.
Figure 10. Box plots of the terminal position deviation,
heading deviation, and speed over ground across the eight
test cases: December 7, 15, and 23, 2021; January 7, 15, and 24,
2022; and August 9 and 23, 2022.
4 DISCUSSION
In this section, we investigated the generalization
performance of the proposed method to different
initial states, destinations, and berthing styles. Note
that the proposed method outputs accelerations within
the IL framework and generates trajectories feasible for
the target ship; thus, it is not expected to generalize
across substantially different ship types or sizes, where
feasible trajectories can change markedly. We therefore
excluded such cross-vessel generalization from the
scope of this study.
We collected AIS data from four car ferries, named
Settsu, Yamato, Hibiki, and Izumi, which are similar in
size to Housho-Maru/Toyofuji-Maru and are equipped
with bow and stern thrusters. These four ferries are
sister ships with nearly identical hull forms (overall
length 195.0 m and breadth 29.6 m). Figure 11 shows an
example of their berthing trajectories; Hibiki and
Izumi berth at one common destination, and Settsu and
Yamato berth at another.
Figure 11. Berthing trajectories of Housho-Maru, Toyofuji-
Maru, Hibiki, Izumi, Settsu, and Yamato.
Table 1 summarizes the training and test datasets.
Using data from Housho-Maru, Hibiki, and Settsu, we
trained three IL/BTIL model sets, A, B, and C,
respectively. We validated them using the test data
from Toyofuji-Maru, Izumi, and Yamato. Note that the
model set A is identical to that used in the previous
section, whose generalization performance to different
initial states (i.e., from Housho-Maru to Toyofuji-
Maru) has already been presented.
Table 1. Summary of the training and test datasets.
| Ship name | Data count | Data acquisition date | Usage |
| --- | --- | --- | --- |
| Housho-Maru | 5 | December 5, 9, and 13, 2021; January 19 and 27, 2022 | Training for model set A |
| Hibiki | 8 | January 2, 4, 6, 8, 10, 12, 18, and 20, 2022 | Training for model set B |
| Settsu | 13 | January 1, 3, 5, 7, 11, 13, 15, 17, 21, 23, 25, 27, and 31, 2022 | Training for model set C |
| Toyofuji-Maru | 8 | December 7, 15, and 23, 2021; January 7, 15, and 24, 2022; August 9 and 23, 2022 | Test (bow-out berthing) |
| Izumi | 14 | December 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 26, 28, and 30, 2021 | Test (bow-in berthing) |
| Yamato | 11 | December 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and 21, 2021 | Test (bow-in berthing) |
For the training of the model sets B and C, the
representative values for normalization were
configured as |X|max=|Y|max=2 km, |Ψ|max=360°,
|u|max=4.0 m/s, |v|max=0.5 m/s, and |r|max=0.8°/s. The
upper and lower bounds of the action space were
determined as |u̇|max=2.2×10⁻² m/s², |v̇|max=1.4×10⁻² m/s²,
and |ṙ|max=0.1°/s².
Figure 12 shows the trajectories using model sets A,
B, and C in the test cases of Toyofuji-Maru, Izumi, and
Yamato. The model set A, trained to imitate Housho-
Maru, failed to reproduce the berthing trajectories of
Izumi and Yamato; conversely, the model sets B and C,
trained on Hibiki and Settsu, failed to reproduce
Toyofuji-Maru. The current IL-based framework
struggled to generalize across berthing styles because
the possible actions, especially angular acceleration,
differ significantly between bow-in and bow-out
berthing. On the other hand, within the same berthing
style (bow-in to bow-in), the proposed method was
able to generate trajectories comparable to the expert
even when the initial state and destination differed.
Figure 12. Integrated trajectories generated using the model
sets A, B, and C. The tested results are compared with the
expert data (the AIS data of Toyofuji-Maru obtained on
August 9, 2022; Izumi on December 16, 2021; and Yamato on
December 17, 2021).
Figure 13 presents the degree of satisfaction of the
terminal conditions when using the model sets B and
C. In all bow-in cases, the proposed method kept
terminal position/heading deviations and speeds over
ground at levels comparable to the expert, indicating
that the terminal constraints were effectively enforced.
When applied to the same destinations as in training,
i.e., from Hibiki to Izumi and from Settsu to Yamato, it
can be confirmed that even IL-only satisfied the
terminal constraints well. These results suggest that,
for bow-in berthing in which the state does not change
drastically during berthing, the benefit of the proposed
method lies mainly in generalizing to destinations
different from those used in training, since IL alone
already suffices for the same destinations.
Figure 13. Box plots of the terminal position deviation,
heading deviation, and speed over ground in the test cases of
Izumi and Yamato.
5 CONCLUSION
In this study, we proposed a real-time bidirectional
trajectory planning framework that integrates forward-
time imitation learning (IL) and backward-time
imitation learning (BTIL). The proposed framework
transforms forward-time and backward-time
trajectories into probability distributions using the
time-weighted adaptive kernel density estimation
(TWA-KDE) and generates routes by taking the joint
distributions of forward-time and backward-time
trajectory distributions. Since the framework is based
on IL, it can generate expert-like berthing trajectories
from AIS-based demonstrations without manual
design or tuning of objective functions. Also, the
required amount of expert data is limited to a small
number of AIS trajectories, which reduces the burden
of data collection.
One promising direction for future work is the
incorporation of disturbance effects. While
disturbances were not considered in the current
formulation, integrating meteorological and
oceanographic data corresponding to the AIS records
would enable route planning that accounts for
environmental disturbances in a natural manner.
Furthermore, there should exist a threshold beyond
which the forward-time and backward-time trajectory
distributions can no longer be smoothly integrated,
and identifying this threshold warrants further
investigation. If such a threshold can be quantified, it
could serve as a criterion for determining whether safe
berthing is feasible from the current state and could
support supervisory decision-making, including
transitions from autonomous operation to human
intervention when necessary. Future work will
therefore focus on validating the effectiveness of the
proposed method in real-world environments while
addressing these issues.
REFERENCES
[1] U. Gruenefeld, T. C. Stratmann, Y. Brueck, A. Hahn, S. Boll, and W. Heuten, “Investigations on container ship berthing from the pilot’s perspective: Accident analysis, ethnographic study, and online survey,” TransNav, International Journal on Marine Navigation and Safety of Sea Transportation, vol. 12(3), pp. 493–498, 2018.
[2] K. Shouji, K. Ohtsu, and S. Mizoguchi, “An automatic berthing study by optimal control techniques,” IFAC Proceedings Volumes, vol. 25(3), pp. 185–194, 1992.
[3] N. Mizuno, Y. Uchida, and T. Okazaki, “Quasi real-time optimal control scheme for automatic berthing,” IFAC-PapersOnLine, vol. 48(16), pp. 305–312, 2015.
[4] A. Maki, N. Sakamoto, Y. Akimoto, H. Nishikawa, and N. Umeda, “Application of optimal control theory based on the evolution strategy (CMA-ES) to automatic berthing,” Journal of Marine Science and Technology, vol. 25(1), pp. 221–233, 2020.
[5] R. Suyama, Y. Miyauchi, and A. Maki, “Ship trajectory planning method for reproducing human operation at ports,” Ocean Engineering, vol. 266, p. 112763, 2022.
[6] H. Yamato, H. Uetsuki, and T. Koyama, “Automatic berthing by the neural controller,” Proceedings of the Ninth Ship Control Systems Symposium, vol. 3, pp. 183–201, 1990.
[7] N. K. Im, S. K. Lee, and D. B. Hyung, “An application of ANN to automatic ship berthing using selective controller,” TransNav, International Journal on Marine Navigation and Safety of Sea Transportation, vol. 1(1), pp. 101–105, 2007.
[8] Y. A. Ahmed, and K. Hasegawa, “Consistently trained artificial neural network for automatic ship berthing control,” TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, vol. 9(3), pp. 417–426, 2015.
[9] N. K. Im, and V. S. Nguyen, “Artificial neural network controller for automatic ship berthing using head-up coordinate system,” International Journal of Naval Architecture and Ocean Engineering, vol. 10(3), pp. 235–249, 2018.
[10] S. Shimizu, K. Nishihara, Y. Miyauchi, K. Wakita, R. Suyama, A. Maki, and S. Shirakawa, “Automatic berthing using supervised learning and reinforcement learning,” Ocean Engineering, vol. 265, p. 112553, 2022.
[11] Y. Higo, M. Sakano, H. Nobe, and H. Hashimoto, “Development of trajectory-tracking maneuvering system for automatic berthing/unberthing based on double deep Q-network and experimental validation with an actual large ferry,” Ocean Engineering, vol. 287, p. 115750, 2023.
[12] T. Higaki, H. Nobe, and H. Hashimoto, “Human-like automatic berthing system based on imitative trajectory plan and tracking control,” Proc. OCEANS 2024-Singapore, pp. 1–5, 2024.
[13] T. Higaki, and H. Hashimoto, “Docking assistance method for autonomous berthing by backward-time imitation learning and kernel density estimation based on AIS data,” Ocean Engineering, vol. 318, p. 120122, 2025.
[14] H. Zhuang, Q. Shen, Y. Qian, W. Yuan, C. Wang, and M. Yang, “Fast bidirectional motion planning for self-driving general N-trailers vehicle maneuvering in narrow space,” IEEE Open Journal of Intelligent Transportation Systems, vol. 4, pp. 989–999, 2023.
[15] Z. Sheng, T. Song, J. Song, Y. Liu, and P. Ren, “Bidirectional rapidly exploring random tree path planning algorithm based on adaptive strategies and artificial potential fields,” Engineering Applications of Artificial Intelligence, vol. 148, p. 110393, 2025.
[16] I. Yanchin, and O. Petrov, “Towards autonomous shipping: Benefits and challenges in the field of information technology and telecommunication,” TransNav, International Journal on Marine Navigation and Safety of Sea Transportation, vol. 14(3), pp. 611–619, 2020.
[17] K. Zolna, S. Reed, A. Novikov, S. G. Colmenarejo, D. Budden, S. Cabi, M. Denil, N. Freitas, and Z. Wang, “Task-relevant adversarial imitation learning,” Proceedings of the 2020 Conference on Robot Learning, PMLR 155, pp. 247–263, 2021.
[18] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv:1707.06347, 2017.
[19] P. W. Chou, D. Maturana, and S. Scherer, “Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution,” Proceedings of the 34th International Conference on Machine Learning, PMLR 70, pp. 834–843, 2017.
[20] D. P. Kingma, and J. Ba, “Adam: A method for stochastic optimization,” Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, 2015.