A Bidirectional Imitation-Learning Framework for Real-Time Trajectory Planning in Automatic Berthing

T. Higaki & H. Hashimoto

Osaka Metropolitan University, Sakai, Osaka, Japan

http://www.transnav.eu

the International Journal on Marine Navigation and Safety of Sea Transportation

Volume 20, Number 2, June 2026

DOI: 10.12716/1001.20.02.14

ABSTRACT: This paper presents a bidirectional imitation-learning framework for AIS-based real-time trajectory planning in automatic berthing. The framework connects a ship’s current state with a prescribed terminal berthing condition by using two imitation-learning policies: forward-time imitation learning from the current ship state and backward-time imitation learning from the desired terminal state. The sampled trajectories are converted into time-indexed probability distributions by time-weighted adaptive kernel density estimation, and the integrated trajectory is obtained from the joint distribution of the two directions. This formulation is intended to preserve the feasibility of the initial approach while improving consistency with the final berthing position, heading, and speed. The method was evaluated using actual AIS trajectories of PCC and car ferries berthing at Shinmoji Port, Japan. The results show that the integrated planner reduced the terminal deviations observed in the forward-time planner alone and avoided the initial-state inconsistency observed in the backward-time planner alone. The proposed approach is positioned as a practical data-driven planning framework for real-time berthing support under similar vessel and port conditions.

1 INTRODUCTION

Berthing is a critical phase of marine navigation in confined port waters, where a ship must approach a quay with small positional, heading, and speed errors while maintaining sufficient margins against surrounding structures. Previous studies from the viewpoint of harbor pilots have shown that berthing is operationally demanding because it is affected by port geometry, ship size, human decision making, and limited maneuvering space [1]. Such practical characteristics make berthing trajectory planning different from ordinary waypoint-following problems: the planner must update a feasible approach from the current ship state while still satisfying the prescribed terminal position, heading, and speed at the berth.

Automatic berthing has been studied using model-based optimization [2–5], neural-network control [6–9], reinforcement learning [10–11], and imitation learning (IL) [12]. Model-based approaches can explicitly handle constraints and ship dynamics, but they often require careful model identification and repeated optimization for each operating condition. Learning-based approaches can provide fast online inference after training, but a forward-time policy does not necessarily satisfy the terminal berthing condition because the terminal pose and speed are not imposed as hard constraints. This issue was addressed by introducing backward-time imitation learning (BTIL), which generated backward-time trajectory distributions from AIS data [13]. However, a backward-time plan alone is not always consistent with the ship’s current state.

Therefore, we propose a bidirectional imitation-

learning framework that connects an IL trajectory from

the ship’s current state with a BTIL trajectory from the

terminal berthing state using kernel density estimation

(KDE). This integration enables the construction of

trajectories that are consistent with both the current

ship state and the terminal berthing condition. The

concept of exploiting information from both the start

and goal sides aligns with the motion-planning

literature [14–15], where bidirectional

sampling/exploration has been shown to improve

planning efficiency and robustness in constrained

environments. In addition, the offline pre-training of IL

and BTIL allows the proposed method to generate

trajectories in real time during online operation.

Furthermore, our IL/BTIL-based framework avoids

explicit reward design, which is advantageous because

even experts may fail to define a reward function that

leads to the intended task completion in autonomous

navigation [16].

In this study, using AIS data from vessels berthing

at Shinmoji Port in Fukuoka Prefecture, Japan, we

trained both IL and BTIL models and examined the

limitations of each model when used

independently. Then, we demonstrated that the

proposed method resolved these issues and achieved

real-time trajectory generation that was consistent with

the initial state and terminal berthing condition under

the static quay geometry. Finally, we investigated the

generalization performance of the proposed method to

different initial states, destinations, and berthing styles.

2 METHODS

2.1 Imitation Learning (IL)

The forward-time trajectory planner was trained by task-

relevant adversarial imitation learning (TRAIL; [17]) with

proximal policy optimization (PPO; [18]) as the policy

optimizer. The network structure and principal learning

settings followed the previous study [13]. In this paper, IL is

used only to generate candidate approach trajectories from

the current ship state; the terminal-side correction is

provided later by BTIL.

2.2 Backward-Time Imitation Learning (BTIL)

BTIL is adopted to generate backward-time

trajectories. The detailed algorithm has been described

in [13]; briefly, each expert trajectory is reversed in

time, and the policy is trained from the terminal

berthing condition toward earlier states. This training

direction makes the sampled trajectories naturally

concentrated around the desired terminal condition

when they are mapped back to forward time. In this

study, BTIL is not used as a stand-alone planner, but as

one of the two distributions to be integrated with the

forward-time IL result.
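The time reversal of an expert trajectory described above can be sketched as follows. This is a minimal illustration, not the training code of [13]: the `State` container and the function name `reverse_in_time` are hypothetical, and the sign flip of u, v, and r follows the remark in Section 3.2 that the velocity signs are reversed in backward time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    x: float    # position X [m]
    y: float    # position Y [m]
    psi: float  # heading angle [deg]
    u: float    # surge velocity [m/s]
    v: float    # sway velocity [m/s]
    r: float    # yaw rate [deg/s]


def reverse_in_time(traj: List[State]) -> List[State]:
    """Replay a trajectory backward: last state first, velocity signs flipped."""
    return [State(s.x, s.y, s.psi, -s.u, -s.v, -s.r) for s in reversed(traj)]


# Toy expert trajectory ending at rest at the berth (values are made up).
expert = [
    State(-900.0, 300.0, 40.0, 2.5, 0.0, 0.2),
    State(-400.0, 120.0, 60.0, 1.2, 0.1, 0.3),
    State(0.0, 0.0, 90.0, 0.0, 0.0, 0.0),
]
backward = reverse_in_time(expert)
```

In this toy case `backward[0]` is the terminal berthing state, so a policy trained on such data starts every rollout from the berth.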

2.3 Integration of IL and BTIL

As shown in Figure 1, the proposed framework trains

the IL and BTIL policies independently and then

integrates their sampled trajectories at the distribution

level. The forward-time policy is first rolled out from

the initial condition of the ship, whereas the backward-

time policy is rolled out from the prescribed berthing

condition and then mapped back to the ordinary time

direction. These two rollouts provide complementary

information: the former reflects reachability from the

current ship state, and the latter reflects consistency

with the terminal berthing condition.

Figure 1. Overview of the proposed bidirectional imitation-

learning framework. The forward-time policy generates an

approach from the initial ship state, whereas the backward-

time policy generates an approach from the terminal berthing

state. Each rollout is converted into a time-indexed trajectory

distribution by TWA-KDE, and the integrated distribution is

obtained by multiplying the two directional distributions.

Directly averaging the two trajectories is not

suitable because the reliability of each trajectory

depends on time. The forward-time trajectory is most

reliable near the initial state and becomes less

constrained toward the end of the horizon. Conversely,

the backward-time trajectory is most reliable near the

terminal state and becomes less constrained when

traced back to earlier states. Therefore, each sampled

trajectory is first represented as a time-state density

field, and the two density fields are integrated

afterward.

In this study, time-weighted adaptive kernel

density estimation (TWA-KDE) is introduced for this

conversion. TWA-KDE constructs a two-dimensional

density over the time index and a selected state

variable. The state variables considered here are the

ship position, heading angle, surge and sway

velocities, and yaw rate, namely X, Y, Ψ, u, v, and r. The

density for X is defined as follows:

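$$
f(t, X) = \sum_{i} \frac{w(t_i/T)}{T\,h(t_i)\,h_X(t_i)}\; K\!\left(\frac{t - t_i}{h(t_i)}\right) K\!\left(\frac{X - X_i}{h_X(t_i)}\right) \tag{1}
$$

where $(t_i, X_i)$ denote the sampled trajectory points, the bandwidths $h(t)$ and $h_X(t)$ increase linearly with $t$ with $|X|_{max}$ entering as the normalization scale for $X$, $w(x)$ is an exponential weight normalized by $\sinh 1$, and $K(x) = e^{-x^2/2}/\sqrt{2\pi}$.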

Here, $h$ and $h_X$ are adaptive bandwidths for time t and latitudinal position X, respectively; |X|max is a representative value for normalization; w(x) is an adaptive weight; and K(x) is the normal kernel function. The adaptive bandwidths increase linearly over time so that each density stays tightly constrained at the initial condition of its rollout, i.e., the ship’s current state for the forward-time trajectory and the terminal berthing state for the backward-time trajectory. Because each kernel is divided by its bandwidths, this growth decreases the per-sample normalization over time, which can hinder smooth integration of the forward-time and backward-time distributions. To address this issue, we define the adaptive weight such that $\int_0^1 w(x)\,dx = 1$ is satisfied on the interval [0,1].
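As a rough illustration of the construction described above, the following Python sketch evaluates a two-dimensional density on a (time, state) grid using a Gaussian kernel, bandwidths that grow linearly with the sample time, and a weight that integrates to one on [0, 1]. The function names, the bandwidth constants (`h0`, `growth`), the base state bandwidth `x_unit`, and the specific weight exp(2x − 1)/sinh(1) are assumptions made for illustration; they are not confirmed values from the paper.

```python
import math


def normal_kernel(x):
    """Standard normal kernel K(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def weight(x):
    # Hypothetical weight: exp(2x - 1) / sinh(1) integrates to 1 on [0, 1].
    return math.exp(2.0 * x - 1.0) / math.sinh(1.0)


def bandwidth(t, T, h0=1.0, growth=20.0):
    # Hypothetical linear growth from h0 at t = 0 to h0 + growth at t = T.
    return h0 + growth * t / T


def twa_kde(samples, T, x_unit, t_grid, x_grid):
    """Sum one weighted product kernel per sample (t_i, x_i) on the grid."""
    density = [[0.0] * len(x_grid) for _ in t_grid]
    for t_i, x_i in samples:
        h_t = bandwidth(t_i, T)           # time bandwidth [steps]
        h_x = x_unit * bandwidth(t_i, T)  # state bandwidth [state units]
        scale = weight(t_i / T) / (T * h_t * h_x)
        for a, t in enumerate(t_grid):
            k_t = normal_kernel((t - t_i) / h_t)
            for b, x in enumerate(x_grid):
                density[a][b] += scale * k_t * normal_kernel((x - x_i) / h_x)
    return density


T = 50
t_grid = list(range(T + 1))
x_grid = [5.0 * j for j in range(41)]  # 0, 5, ..., 200
single = twa_kde([(10, 50.0)], T, 5.0, t_grid, x_grid)
```

With a single sample, the density peaks at the grid node of that sample; with a full rollout, early samples contribute narrow kernels and late samples contribute wide ones, which is the behavior the text attributes to the adaptive bandwidths.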

The same construction is applied to the other five

variables. Although the variables have different

physical units and ranges, the use of variable-wise

representative scales allows the same TWA-KDE form

to be used for all of them:

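$$
f(t, Y) = \sum_{i} \frac{w(t_i/T)}{T\,h(t_i)\,h_Y(t_i)}\; K\!\left(\frac{t - t_i}{h(t_i)}\right) K\!\left(\frac{Y - Y_i}{h_Y(t_i)}\right) \tag{2}
$$

$$
f(t, \Psi) = \sum_{i} \frac{w(t_i/T)}{T\,h(t_i)\,h_\Psi(t_i)}\; K\!\left(\frac{t - t_i}{h(t_i)}\right) K\!\left(\frac{\Psi - \Psi_i}{h_\Psi(t_i)}\right) \tag{3}
$$

$$
f(t, u) = \sum_{i} \frac{w(t_i/T)}{T\,h(t_i)\,h_u(t_i)}\; K\!\left(\frac{t - t_i}{h(t_i)}\right) K\!\left(\frac{u - u_i}{h_u(t_i)}\right) \tag{4}
$$

$$
f(t, v) = \sum_{i} \frac{w(t_i/T)}{T\,h(t_i)\,h_v(t_i)}\; K\!\left(\frac{t - t_i}{h(t_i)}\right) K\!\left(\frac{v - v_i}{h_v(t_i)}\right) \tag{5}
$$

$$
f(t, r) = \sum_{i} \frac{w(t_i/T)}{T\,h(t_i)\,h_r(t_i)}\; K\!\left(\frac{t - t_i}{h(t_i)}\right) K\!\left(\frac{r - r_i}{h_r(t_i)}\right) \tag{6}
$$

where $(t_i, Y_i)$, $(t_i, \Psi_i)$, $(t_i, u_i)$, $(t_i, v_i)$, and $(t_i, r_i)$ denote the sampled trajectory points for each variable.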

The symbols hY, hΨ, hu, hv, and hr denote adaptive

bandwidths for Y, Ψ, u, v, and r, respectively. The

corresponding representative values |Y|max, |Ψ|max,

|u|max, |v|max, and |r|max were selected from the range

of the actual AIS-based trajectories. In the numerical

implementation, the time axis and each state-variable

axis were divided into 400 grid points. The proposed

integration was conducted separately for each state

variable; thus, the method avoids constructing a high-

dimensional KDE over all six variables at once.

Applying TWA-KDE to the trajectories sampled with IL and BTIL, the forward-time and backward-time distributions are obtained as $f_{fwd}(t, \cdot) = f(t, \cdot)$ and $f_{bwd}(t, \cdot) = f(T - \tau, \cdot)$, respectively, where $\cdot$ stands for any one of the six state variables. Note that $\tau$, the time step in backward-time simulations, runs in the opposite direction to time in the real world. These distributions are then integrated by computing their joint probability as

$$
f_{int}(t, \cdot) = f_{fwd}(t, \cdot)\, f_{bwd}(t, \cdot) \tag{7}
$$

Finally, assuming $f_{int}(\cdot \mid t) \propto f_{int}(t, \cdot)$, an integrated trajectory can be derived by taking the maximizer of the integrated distribution at each time step:

$$
X_t = \arg\max_{X} f_{int}(X \mid t) \tag{8}
$$

$$
Y_t = \arg\max_{Y} f_{int}(Y \mid t) \tag{9}
$$

$$
\Psi_t = \arg\max_{\Psi} f_{int}(\Psi \mid t) \tag{10}
$$

$$
u_t = \arg\max_{u} f_{int}(u \mid t) \tag{11}
$$

$$
v_t = \arg\max_{v} f_{int}(v \mid t) \tag{12}
$$

$$
r_t = \arg\max_{r} f_{int}(r \mid t) \tag{13}
$$

The entire process, including forward sampling,

backward sampling, and integration, is completed

within a few seconds on a common PC, and thus our

approach can generate an integrated trajectory in real

time. Note that “real time” here specifically denotes

online planning with pre-trained IL/BTIL models,

excluding any offline model updates during the

process.
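The integration step can be illustrated with a small grid example. The sketch below is only a toy under stated assumptions: the function names are hypothetical, the densities are made-up numbers on a five-point state grid, and the backward-time density is assumed to be stored row by row in backward time, so it is reversed before the pointwise product is taken.

```python
def to_forward_time(f_bwd_tau):
    """Map rows indexed by backward time (tau = 0 at the berth) to forward time."""
    return f_bwd_tau[::-1]


def integrate(f_fwd, f_bwd):
    """Pointwise product of two densities defined on the same (time, state) grid."""
    return [[a * b for a, b in zip(row_f, row_b)]
            for row_f, row_b in zip(f_fwd, f_bwd)]


def extract_trajectory(f_int, x_grid):
    """At each time step, pick the grid value that maximizes the density."""
    return [x_grid[max(range(len(x_grid)), key=row.__getitem__)] for row in f_int]


x_grid = [0, 1, 2, 3, 4]
# Forward density: sharp at the initial state (x = 0), broad at the end.
f_fwd = [
    [0.90, 0.10, 0.00, 0.00, 0.00],
    [0.30, 0.40, 0.20, 0.10, 0.00],
    [0.15, 0.25, 0.25, 0.20, 0.15],
]
# Backward density in backward time: sharp at the terminal state (x = 4).
f_bwd_tau = [
    [0.00, 0.00, 0.00, 0.10, 0.90],
    [0.00, 0.10, 0.30, 0.40, 0.20],
    [0.20, 0.20, 0.20, 0.20, 0.20],
]
f_int = integrate(f_fwd, to_forward_time(f_bwd_tau))
integrated = extract_trajectory(f_int, x_grid)
forward_only = extract_trajectory(f_fwd, x_grid)
```

Because only the maximizer at each time step is used, the proportionality constant between the joint and conditional densities does not need to be computed. In this toy case the integrated trajectory starts at the initial state and ends at the terminal state, whereas the forward-only maximizer does not reach the terminal state.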

2.4 Architecture and Settings

The forward-time and backward-time planners were

implemented with the same state representation,

action definition, and network size so that their outputs

could be integrated on a common state space. The

principal architecture followed the previous BTIL

framework [13], while the present work used two

directional planners and combined their outputs

through the distribution-level procedure described in

Section 2.3.

At each time step (forward time $t$ or backward time $\tau$), the input state is composed of two parts:

$$
s_t = [\,S_{base}(t),\; S_{detect}(t)\,] \tag{14}
$$

where Sbase corresponds to a target ship and destination,

and Sdetect corresponds to the detection of port

geometry.

Figure 2 illustrates the definition of the base state

variables used to describe the relative geometry

between the ship and the target berth. Specifically, Sbase

is given by

$$
S_{base} = [\,d_x',\; d_y',\; d_\parallel',\; d_\perp',\; \psi',\; u',\; v',\; r'\,] \tag{15}
$$

where the prime indicates normalization.

Figure 2. Definition of the base state variables used to describe the relative geometry between the ship and the target berth. The state includes the destination position expressed in both the ship-fixed frame ($d_x$ and $d_y$) and the berth-aligned frame ($d_\parallel$ and $d_\perp$), the heading error relative to the berth ($\psi$), and the ship-motion variables (u, v, and r).

In this study, the terminal condition was defined as

the ideal docking state at t=T, where (i) the position

deviation from the destination is zero, (ii) the heading

deviation from the terminal berth is zero, and (iii) the

speed over ground is zero. These three conditions can

be simplified in the following form,

$$
d = 0, \quad \psi = 0, \quad U = 0 \quad \text{at } t = T \tag{16}
$$

where $d = \sqrt{d_x^2 + d_y^2} = \sqrt{d_\parallel^2 + d_\perp^2}$ and $U = \sqrt{u^2 + v^2}$.
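The three terminal quantities can be computed directly from the base state. The sketch below is illustrative only: the function names and the rotation angle used to pass from the ship-fixed frame to the berth-aligned frame are assumptions, and it simply shows that the position deviation d is the same in both frames.

```python
import math


def terminal_errors(d_x, d_y, psi, u, v):
    """Return (d, |psi|, U): position deviation, heading deviation, speed over ground."""
    return math.hypot(d_x, d_y), abs(psi), math.hypot(u, v)


def to_berth_frame(d_x, d_y, theta):
    """Rotate ship-fixed components by theta [rad] into (parallel, perpendicular)."""
    d_par = d_x * math.cos(theta) + d_y * math.sin(theta)
    d_perp = -d_x * math.sin(theta) + d_y * math.cos(theta)
    return d_par, d_perp


d, psi_err, speed = terminal_errors(3.0, 4.0, -2.0, 0.006, 0.008)
d_par, d_perp = to_berth_frame(3.0, 4.0, math.radians(30.0))
```

Since a rotation preserves length, `math.hypot(d_par, d_perp)` equals `d` up to rounding, which is why the condition d = 0 can be stated in either frame.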

Sdetect gives a simple geometric description of the

surrounding quay. As shown in Figure 3, multiple

detection lines are extended from the ship, and each

element stores the normalized distance to the first

intersection with a quay line:

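$$
S_{detect} = [\,d_1',\; \ldots,\; d_N'\,] \tag{17}
$$

where

$$
d_i' = \begin{cases} d_i / d_{lim} & \text{if a quay line exists} \\ 1 & \text{if a quay line does not exist} \end{cases}
$$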

Figure 3. Detection-line representation of the static quay

geometry. Each detection line stores the distance from the

ship to the first intersection with a quay line.

If no quay line is detected within the detection

range dlim, the corresponding value is set to one. Note

that Sdetect is used only for detecting static quay

geometry in the virtual port, i.e., not for dynamic

obstacle avoidance.
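A detection-line state of this kind can be computed by casting rays against quay segments. The following is a minimal sketch under stated assumptions: the function names are hypothetical, angles are in radians measured counterclockwise from the x axis, the first detection line is aligned with the heading, and the quay is given as a list of straight segments. The paper's own angle convention and quay representation are not specified in this section.

```python
import math


def ray_hit_distance(origin, angle, segment):
    """Distance along a ray to a segment, or None if the ray misses it."""
    ox, oy = origin
    dx, dy = math.cos(angle), math.sin(angle)
    (x1, y1), (x2, y2) = segment
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:  # ray parallel to the segment
        return None
    s = ((x1 - ox) * ey - (y1 - oy) * ex) / denom  # distance along the ray
    q = ((x1 - ox) * dy - (y1 - oy) * dx) / denom  # position along the segment
    return s if s >= 0.0 and 0.0 <= q <= 1.0 else None


def detect_state(position, heading, quay_segments, n_lines=16, d_lim=1000.0):
    """Normalized distance to the first quay intersection on each detection line."""
    values = []
    for k in range(n_lines):
        angle = heading + 2.0 * math.pi * k / n_lines
        hits = [ray_hit_distance(position, angle, seg) for seg in quay_segments]
        hits = [h for h in hits if h is not None and h <= d_lim]
        values.append(min(hits) / d_lim if hits else 1.0)
    return values


# Toy quay: one straight wall 500 m ahead of the ship.
quay = [((500.0, -1000.0), (500.0, 1000.0))]
s_detect = detect_state((0.0, 0.0), 0.0, quay)
```

With N = 16 lines the angular spacing is 22.5°, matching the settings quoted later in this section. Lines that hit nothing within the detection range return one.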

The actor outputs a stochastic action from a beta

policy [19]. The action was defined in the acceleration

space as

$$
a = A(t), \quad A = [\,\dot{u},\; \dot{v},\; \dot{r}\,] \tag{18}
$$

Note that a ship maneuvering model was not

incorporated, as this study aims to directly imitate

expert berthing trajectories extracted from AIS data.
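One common way to use a beta policy for bounded actions is to sample each component on [0, 1] and rescale it to the action limits. The sketch below assumes that mapping, which is not stated in the text, and assumes a plain Euler update of the velocities with the 5 s time step. The function names are hypothetical and the bounds are the acceleration limits as read from Section 3.1.

```python
import random

# Assumed bounds on (u_dot [m/s^2], v_dot [m/s^2], r_dot [deg/s^2]).
A_MAX = (1.6e-2, 1.4e-2, 0.18)
DT = 5.0  # simulation time step [s]


def sample_action(beta_params, rng):
    """Draw one action; beta_params holds an (alpha, beta) pair per component."""
    return tuple(
        (2.0 * rng.betavariate(a, b) - 1.0) * a_max
        for (a, b), a_max in zip(beta_params, A_MAX)
    )


def step_velocities(velocities, action, dt=DT):
    """Euler update of (u, v, r) with the commanded accelerations."""
    return tuple(vel + acc * dt for vel, acc in zip(velocities, action))


rng = random.Random(0)
params = [(2.0, 2.0), (2.0, 5.0), (5.0, 2.0)]  # made-up actor outputs
action = sample_action(params, rng)
next_velocities = step_velocities((1.0, 0.0, 0.0), action)
```

Because a beta sample always lies in [0, 1], the rescaled action can never exceed the bounds, which is the usual motivation for this policy class in bounded continuous control [19].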

The learning settings were kept close to those in the

previous study [13] to isolate the effect of the proposed

forward-backward integration. The actor and critic

each consisted of three hidden layers with 64 units and

tanh activation. The discriminator consisted of two

hidden layers with 64 units and ELU activation. Adam [20] was used for both the generator and discriminator, with learning rates of αg=1×10⁻⁵ and αd=5×10⁻⁵. For the generator (PPO), the clipping value and entropy coefficient were set as ε=0.2 and c=1×10⁻⁴; for the

discriminator (TRAIL), the interval and coefficient of

invariant set were determined as TI=20 and λ=1.0,

respectively. The simulation time step, horizon length,

and discount factor were Δt=5 s, T=300, and γ=0.99,

respectively. The episode ended when the ship collided

with the quay. The number, length, and interval of the

detection lines were set as N=16, dlim=1 km, and

Δφ=22.5°, respectively.

3 RESULTS

3.1 Preprocessing and Training

The proposed planner was trained and tested using

AIS records of actual berthing operations at Shinmoji

Port, Fukuoka Prefecture, Japan. A virtual port used in

the learning environment was reconstructed from

geographic coordinates of the quay lines. The origin of

the local coordinate system was set at 130.9844°E and

33.8688°N.

The AIS data used in this study were obtained in

December 2021, and January and August 2022. Each

record contained the time stamp, longitude, latitude,

speed over ground, course over ground, and heading

angle. Because AIS messages were not sampled at a

constant interval, the raw histories were first

resampled to a uniform 1-s time grid by cubic-spline

interpolation. The surge and sway velocity

components were then obtained by combining the

speed over ground with the drift angle calculated from

the course and heading. The yaw rate and yaw

acceleration were computed from the heading-angle

history by finite differences. Since the acceleration

signals derived from AIS data were sensitive to

measurement noise and interpolation errors, the

angular acceleration was smoothed by a moving

average with k=2 before being used for learning.
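The velocity and yaw-rate derivation described above can be sketched as follows, assuming the records have already been resampled to a uniform 1-s grid (the cubic-spline step is omitted here). The function names are hypothetical. The sign convention for the drift angle (course minus heading, so that positive sway points to starboard for compass angles) and the reading of k as the half-width of a centered window are assumptions, since the text does not define them.

```python
import math


def wrap_deg(angle):
    """Wrap an angle difference to the interval [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def body_velocities(sog, cog_deg, heading_deg):
    """Surge and sway velocities from speed and course over ground."""
    beta = math.radians(wrap_deg(cog_deg - heading_deg))  # drift angle
    return sog * math.cos(beta), sog * math.sin(beta)


def yaw_rate(heading_deg, dt=1.0):
    """Forward finite difference of heading, robust to the 0/360 wrap."""
    return [wrap_deg(b - a) / dt for a, b in zip(heading_deg, heading_deg[1:])]


def moving_average(values, k=2):
    """Centered moving average with half-width k, truncated at both ends."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - k): i + k + 1]
        out.append(sum(window) / len(window))
    return out


u, v = body_velocities(2.0, 90.0, 90.0)
rates = yaw_rate([359.0, 1.0, 2.5])
smoothed = moving_average([0.0, 0.0, 6.0, 0.0, 0.0], k=1)
```

The wrap in `yaw_rate` matters for headings near north: without it, a turn from 359° to 1° would be read as a 358° jump rather than a 2° change.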

Figure 4. Berthing trajectory of Housho-Maru (indicated in gray) with the assistance of the tugboat Kazashi-Maru (indicated in red) on December 27, 2021.

A PCC named Housho-Maru was selected as the

expert data, which has an overall length of 165.0 m and

an overall breadth of 27.6 m and is equipped with bow

and stern thrusters. By plotting the positions of nearby

vessels during the berthing phase of Housho-Maru, as

shown in Figure 4, we identified instances where a

tugboat remained tightly alongside the target ship and

supported its berthing. Such tugboat-assisted cases

were visually distinguished and manually excluded

from the dataset. Consequently, the five berthing

trajectories shown in Figure 5 were used as training

data.

Figure 5. Berthing trajectories of Housho-Maru on December

5, 9, and 13, 2021, and January 19 and 27, 2022.

The normalization constants and action limits were

determined from these AIS-based berthing histories.

The representative state scales were set to

|X|max=|Y|max=2 km, |Ψ|max=360°, |u|max=3.0 m/s,

|v|max=0.5 m/s, and |r|max=1.2 °/s. The action bounds were set to $|\dot{u}|_{max}$=1.6×10⁻² m/s², $|\dot{v}|_{max}$=1.4×10⁻² m/s², and $|\dot{r}|_{max}$=0.18 °/s².

3.2 Verification of IL and BTIL

After sufficient training of the IL-based and BTIL-

based route planners, we tested them with eight

trajectories of Toyofuji-Maru (obtained on December 7,

15, and 23, 2021; January 7, 15, and 24, 2022; and

August 9 and 23, 2022), which is one of the sister ships

of Housho-Maru and has the same hull form, actuators,

and destination but different initial states.

Figure 6 presents the forward-time trajectories and

histories of the ship’s positions and velocities

generated by IL. The tested trajectories followed the

overall trend of the expert trajectories obtained from

Toyofuji-Maru. However, deviations emerged as time

progressed, and the terminal states did not align with

those of the expert. In some cases, the final ship

position remained separated from the target berth and

was not aligned parallel to it, indicating that the

berthing task was not successfully completed. This is

because TRAIL optimizes the agent’s policy so that the

encountered states remain close to those of the expert

on average, while no explicit constraint is imposed at

terminal states.

Figure 6. IL results for the Toyofuji-Maru test cases: planned

tracks and corresponding time histories of position and

velocity components.

BTIL was introduced to address this issue [13]. Figure 7 shows the backward-time trajectories and histories of the ship’s positions and velocities generated by BTIL. Note that the time step is reversed, and the initial state in the simulations (t=0) represents the terminal state in the real world (t=T). It is also important to note that the signs of the velocity components u, v, and r are reversed. By construction, the initial states of the tested trajectories perfectly matched the terminal states of the expert. However, the discrepancy with the expert grew as the trajectories were traced back in time; as a result, in some cases, the BTIL-based planner failed to draw feasible routes from the initial position of the ship.

Figure 7. BTIL results for the Toyofuji-Maru test cases: planned tracks and corresponding time histories of position and velocity components.

3.3 Validation of the Proposed Method

The proposed framework was then evaluated. Figure 8

shows the forward-time, backward-time, and

integrated distributions regarding the ship’s positions

and velocities. As intended, the forward distribution

exhibits a sharp and narrow peak at t=0, corresponding

to the ship’s initial state, and gradually broadens as

time proceeds. In contrast, the backward distribution

shows a sharp and narrow peak at backward time τ=0,

corresponding to the docking stage, and gradually

spreads when traced back in time. By computing the

joint probability of these distributions, a smooth

integrated distribution was obtained that bridges the

forward- and backward-time behaviors.

Figure 8. Forward-time, backward-time and integrated

distributions regarding the ship’s positions and velocities.

The distributions are compared with the expert data (the AIS

data of Toyofuji-Maru obtained on December 23, 2021).

By selecting the state value that maximizes the integrated distribution at each time step, as in (8)–(13), an integrated trajectory can be readily obtained. Figure 9

shows the integrated trajectories and time histories of

the ship’s positions and velocities. The results indicate

that the integrated trajectories agree with the real-ship

trajectories over the entire berthing process. Note that

the generation of forward-time and backward-time

trajectories, as well as their integration, each requires

less than one second, enabling real-time trajectory

planning within a few seconds.

Figure 10 compares the terminal position deviations

d, heading deviations ψ, and speeds over ground U for

the expert, IL, and the proposed method. Relative to

IL alone, the proposed method greatly improved

each metric and was able to satisfy the terminal

constraints to a degree comparable to the expert. Even

for the expert, d and ψ were not exactly zero because

these values are calculated as deviations from the mean

terminal state in the expert data. Although the terminal

speed of the proposed method did not become exactly

zero, the maximum residual was about 1 cm/s, and its

impact is therefore considered sufficiently small.

Figure 9. Trajectories extracted from the integrated forward-

backward distributions and their associated state histories.

Figure 10. Box plots of the terminal position deviation,

heading deviation, and speed over ground across the eight

test cases: December 7, 15, and 23, 2021; January 7, 15, and 24,

2022; and August 9 and 23, 2022.

4 DISCUSSION

In this section, we investigated the generalization

performance of the proposed method to different

initial states, destinations, and berthing styles. Note

that the proposed method outputs accelerations within

the IL framework and generates trajectories feasible for

the target ship; thus, it is not expected to generalize

across substantially different ship types or sizes, where

feasible trajectories can change markedly. We therefore

excluded such cross-vessel generalization from the

scope of this study.

We collected AIS data from four car ferries, named

Settsu, Yamato, Hibiki, and Izumi, which are similar in

size to Housho-Maru/Toyofuji-Maru and are equipped

with bow and stern thrusters. These four ferries are

sister ships with nearly identical hull forms (overall

length 195.0 m and breadth 29.6 m). Figure 11 shows an example of their berthing trajectories; Hibiki and Izumi berth at the same destination as each other, as do Settsu and Yamato.

Figure 11. Berthing trajectories of Housho-Maru, Toyofuji-

Maru, Hibiki, Izumi, Settsu, and Yamato.

Table 1 summarizes the training and test datasets.

Using data from Housho-Maru, Hibiki, and Settsu, we

trained three IL/BTIL model sets, A, B, and C,

respectively. We validated them using the test data

from Toyofuji-Maru, Izumi, and Yamato. Note that the

model set A is identical to that used in the previous

section, whose generalization performance to different

initial states (i.e., from Housho-Maru to Toyofuji-

Maru) has already been presented.

Table 1. Summary of the training and test datasets.

| Ship name | Data count | Data acquisition date | Usage |
|---|---|---|---|
| Housho-Maru | 5 | December 5, 9, and 13, 2021; January 19 and 27, 2022 | Training for model set A |
| Hibiki | 8 | January 2, 4, 6, 8, 10, 12, 18, and 20, 2022 | Training for model set B |
| Settsu | 13 | January 1, 3, 5, 7, 11, 13, 15, 17, 21, 23, 25, 27, and 31, 2022 | Training for model set C |
| Toyofuji-Maru | 8 | December 7, 15, and 23, 2021; January 7, 15, and 24, 2022; August 9 and 23, 2022 | Test (bow-out berthing) |
| Izumi | 14 | December 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 26, 28, and 30, 2021 | Test (bow-in berthing) |
| Yamato | 11 | December 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and 21, 2021 | Test (bow-in berthing) |

For the training of the model sets B and C, the

representative values for normalization were

configured as |X|max=|Y|max=2 km, |Ψ|max=360°,

|u|max=4.0 m/s, |v|max=0.5 m/s, and |r|max=0.8 °/s. The upper and lower bounds of the action space were determined as $|\dot{u}|_{max}$=2.2×10⁻² m/s², $|\dot{v}|_{max}$=1.4×10⁻² m/s², and $|\dot{r}|_{max}$=0.1 °/s².

Figure 12 shows the trajectories using model sets A,

B, and C in the test cases of Toyofuji-Maru, Izumi, and

Yamato. The model set A, trained to imitate Housho-

Maru, failed to reproduce the berthing trajectories of

Izumi and Yamato; conversely, the model sets B and C,

trained on Hibiki and Settsu, failed to reproduce

Toyofuji-Maru. The current IL-based framework

struggled to generalize across berthing styles because

the possible actions, especially angular acceleration,

differ significantly between bow-in and bow-out

berthing. On the other hand, within the same berthing

style (bow-in to bow-in), the proposed method was

able to generate trajectories comparable to the expert

even when the initial state and destination differed.

Figure 12. Integrated trajectories generated using the model

sets A, B, and C. The tested results are compared with the

expert data (the AIS data of Toyofuji-Maru obtained on

August 9, 2022; Izumi on December 16, 2021; and Yamato on

December 17, 2021).

Figure 13 presents the degree of satisfaction of the

terminal conditions when using the model sets B and

C. In all bow-in cases, the proposed method kept

terminal position/heading deviations and speeds over

ground at levels comparable to the expert, indicating

that the terminal constraints were effectively enforced.

When applied to the same destinations as in training, i.e., from Hibiki to Izumi and from Settsu to Yamato, even IL alone satisfied the terminal constraints well. These results suggest that, for bow-in berthing in which the state does not change drastically during berthing, the proposed method is most beneficial when generalizing to a destination different from the one used in training, whereas IL alone already suffices for the same destination.

Figure 13. Box plots of the terminal position deviation, heading deviation, and speed over ground in the test cases of Izumi and Yamato.

5 CONCLUSION

In this study, we proposed a real-time bidirectional

trajectory planning framework that integrates forward-

time imitation learning (IL) and backward-time

imitation learning (BTIL). The proposed framework transforms forward-time and backward-time trajectories into probability distributions using time-weighted adaptive kernel density estimation (TWA-KDE) and generates routes from the joint distribution of the two directional distributions. Since the framework is based

on IL, it can generate expert-like berthing trajectories

from AIS-based demonstrations without manual

design or tuning of objective functions. Also, the

required amount of expert data is limited to a small

number of AIS trajectories, which reduces the burden

of data collection.

One promising direction for future work is the

incorporation of disturbance effects. While

disturbances were not considered in the current

formulation, integrating meteorological and

oceanographic data corresponding to the AIS records

would enable route planning that accounts for

environmental disturbances in a natural manner.

Furthermore, there should exist a threshold beyond

which the forward-time and backward-time trajectory

distributions can no longer be smoothly integrated,

and identifying this threshold warrants further

investigation. If such a threshold can be quantified, it

could serve as a criterion for determining whether safe

berthing is feasible from the current state and could

support supervisory decision-making, including

transitions from autonomous operation to human

intervention when necessary. Future work will

therefore focus on validating the effectiveness of the

proposed method in real-world environments while

addressing these issues.

REFERENCES

[1] U. Gruenefeld, T. C. Stratmann, Y. Brueck, A. Hahn, S. Boll, and W. Heuten, “Investigations on container ship berthing from the pilot’s perspective: Accident analysis, ethnographic study, and online survey,” TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, vol. 12(3), pp. 493–498, 2018.

[2] K. Shouji, K. Ohtsu, and S. Mizoguchi, “An automatic berthing study by optimal control techniques,” IFAC Proceedings Volumes, vol. 25(3), pp. 185–194, 1992.

[3] N. Mizuno, Y. Uchida, and T. Okazaki, “Quasi real-time optimal control scheme for automatic berthing,” IFAC-PapersOnLine, vol. 48(16), pp. 305–312, 2015.

[4] A. Maki, N. Sakamoto, Y. Akimoto, H. Nishikawa, and N. Umeda, “Application of optimal control theory based on the evolution strategy (CMA-ES) to automatic berthing,” Journal of Marine Science and Technology, vol. 25(1), pp. 221–233, 2020.

[5] R. Suyama, Y. Miyauchi, and A. Maki, “Ship trajectory planning method for reproducing human operation at ports,” Ocean Engineering, vol. 266, p. 112763, 2022.

[6] H. Yamato, H. Uetsuki, and T. Koyama, “Automatic berthing by the neural controller,” Proceedings of the Ninth Ship Control Systems Symposium, vol. 3, pp. 183–201, 1990.

[7] N. K. Im, S. K. Lee, and D. B. Hyung, “An application of ANN to automatic ship berthing using selective controller,” TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, vol. 1(1), pp. 101–105, 2007.

[8] Y. A. Ahmed and K. Hasegawa, “Consistently trained artificial neural network for automatic ship berthing control,” TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, vol. 9(3), pp. 417–426, 2015.

[9] N. K. Im and V. S. Nguyen, “Artificial neural network controller for automatic ship berthing using head-up coordinate system,” International Journal of Naval Architecture and Ocean Engineering, vol. 10(3), pp. 235–249, 2018.

[10] S. Shimizu, K. Nishihara, Y. Miyauchi, K. Wakita, R. Suyama, A. Maki, and S. Shirakawa, “Automatic berthing using supervised learning and reinforcement learning,” Ocean Engineering, vol. 265, p. 112553, 2022.

[11] Y. Higo, M. Sakano, H. Nobe, and H. Hashimoto, “Development of trajectory-tracking maneuvering system for automatic berthing/unberthing based on double deep Q-network and experimental validation with an actual large ferry,” Ocean Engineering, vol. 287, p. 115750, 2023.

[12] T. Higaki, H. Nobe, and H. Hashimoto, “Human-like automatic berthing system based on imitative trajectory plan and tracking control,” Proc. OCEANS 2024-Singapore, pp. 1–5, 2024.

[13] T. Higaki and H. Hashimoto, “Docking assistance method for autonomous berthing by backward-time imitation learning and kernel density estimation based on AIS data,” Ocean Engineering, vol. 318, p. 120122, 2025.

[14] H. Zhuang, Q. Shen, Y. Qian, W. Yuan, C. Wang, and M. Yang, “Fast bidirectional motion planning for self-driving general N-trailers vehicle maneuvering in narrow space,” IEEE Open Journal of Intelligent Transportation Systems, vol. 4, pp. 989–999, 2023.

[15] Z. Sheng, T. Song, J. Song, Y. Liu, and P. Ren, “Bidirectional rapidly exploring random tree path planning algorithm based on adaptive strategies and artificial potential fields,” Engineering Applications of Artificial Intelligence, vol. 148, p. 110393, 2025.

[16] I. Yanchin and O. Petrov, “Towards autonomous shipping: Benefits and challenges in the field of information technology and telecommunication,” TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, vol. 14(3), pp. 611–619, 2020.

[17] K. Zolna, S. Reed, A. Novikov, S. G. Colmenarejo, D. Budden, S. Cabi, M. Denil, N. Freitas, and Z. Wang, “Task-relevant adversarial imitation learning,” Proceedings of the 2020 Conference on Robot Learning, PMLR 155, pp. 247–263, 2021.

[18] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv:1707.06347, 2017.

[19] P. W. Chou, D. Maturana, and S. Scherer, “Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution,” Proceedings of the 34th International Conference on Machine Learning, PMLR 70, pp. 834–843, 2017.

[20] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, 2015.