International Journal

on Marine Navigation

and Safety of Sea Transportation

Volume 5

Number 2

June 2011

255

1 INTRODUCTION

One of the main tasks in Artificial Intelligence is to

create the advanced systems that can effectively find

satisfactory solution of given problem and improve

it over time. Intelligent autonomous agents used in

these systems can quickly adapt to current situation,

i.e. change their behavior based on interactions with

the environment (Fig. 1), become more efficient

over time, and adapt to new situations as they occur.

Figure 1. Interaction of helmsman with an environment.

Such abilities are very important for simulating

helmsman behavior in ship maneuvering on restrict-

ed waters.

For simpler layouts learning process can be per-

formed using classic approach, i.e. Temporal Differ-

ence Reinforcement Learning (Tesauro 1995,

Kaelbling, Littman & Moore 1996) or Artificial

Neural Networks with fixed structures (Braun &

Weisbrod 1993). Dealing with high-dimensional

spaces is a known challenge in Reinforcement

Learning approach (Łącki 2007) which predicts the

long-term reward for taking actions in different

states (Sutton & Barto 1998).

Evolving neural networks with genetic algorithms

has been highly effective in advanced tasks, particu-

larly those with continuous hidden states (Kenneth

& Miikkulainen 2005). Neuroevolution gives an ad-

vantage from evolving neural network topologies

along with weights which can effectively store ac-

tion values in machine learning tasks. The main idea

of using evolutionary neural networks (ENN) in ship

handling is based on evolving population of helms-

men.

The neural network is the helmsman's brain mak-

ing him capable of choosing action regarding actual

navigational situation of the vessel which is repre-

sented by input signals. These input signals are cal-

culated and encoded from current situation of the

environment.

In every time step the network calculates its out-

put from signals received on the input layer. Output

signal is then transformed to one of available actions

influencing helmsman’s environment. In this case

the vessel on route within the restricted waters is

part of the helmsman’s environment. Main goal of

the agents is to maximize their fitness values. These

values are calculated from helmsmen behavior dur-

Multirole Population of Automated Helmsmen

in Neuroevolutionary Ship Handling

M. Lacki

Gdynia Maritime University, Gdynia, Poland

ABSTRACT: This paper presents the proposal of advanced intelligent system able to simulate and demon-

strate learning behavior of helmsmen in ship maneuvering. Simulated helmsmen are treated as individuals in

population, which through environmental sensing learn themselves to safely navigate on restricted waters. In-

dividuals are being organized in groups specialized for particular task in ship maneuvering process. Neuroev-

olutionary algorithms, which develop artificial neural networks through evolutionary operations, are used in

this system.

256

ing simulation. The best-fitted individuals become

parents for next generation.

2 NEUROEVOLUTION OF AUGMENTING

TOPOLOGIES

Neuroevolution of Augmenting Topologies (NEAT)

method is one of the Topology and Weight Evolving

Artificial Neural Networks (TWEANN’s) method

(Kenneth & Miikkulainen 2002). In this method the

whole population begins evolution with minimal

networks structures and adds nodes and connections

to them over generations, allowing complex prob-

lems to be solved gradually starting from simple

ones.

The NEAT method consists of solutions to three

main challenges in evolving neural network topolo-

gy:

1 Begin with a minimal structure and add neurons

and connections between them incrementally to

discover most efficient solutions throughout evo-

lution.

2 Cross over disparate topologies in a meaningful

way by matching up genes with the same histori-

cal markings.

3 Separate each innovative individual into a differ-

ent species to protect it disappearing from the

population prematurely.

2.1 Genetic Encoding

Evolving structure requires a flexible genetic encod-

ing. In order to allow structures to increase their

complexity, their representations must be dynamic

and expandable (Braun & Weisbrod 1993). Each ge-

nome in NEAT includes a number of inputs, neurons

and outputs, as well as a list of connection genes,

each of which refers to two nodes being connected

(Fig. 2).

Figure 2. Genotype and phenotype of evolutionary neural net-

work.

In this approach each connection gene specifies

the output node, the input node, the weight of the

connection, and an innovation number, which allows

finding corresponding genes during crossover. Con-

nection loopbacks are also allowed, as shown in fig-

ure 2.

2.2 Mutation

Mutation in evolutionary neural networks can

change both connection weights and network struc-

tures (Fig. 3). Connection weights mutate as in any

neuroevolutionary system, with each connection ei-

ther perturbed or not.

Structural mutations, which form the basis of

complexity, occur in two ways. Through mutation

the genome can be expanded by adding genes or

shrunk by removing them. In the add connection

mutation, a single new connection gene is added

connecting two previously unconnected nodes. In

the add node mutation, the new node is placed, thus

allowing to create new connections in future possi-

ble mutations.

Figure 3. An example of weights and connection mutation in

connections genome.

2.3 Crossover

Through innovation numbers, the system knows ex-

actly what genes match up with each other. Un-

matched genes are either disjoint or excess, depend-

ing on whether they occur within or outside the

range of the other parent's innovation numbers.

In crossing over operation (Fig. 4), the genes with

the same innovation numbers are lined up. The off-

spring is then formed in one of two ways: in uniform

crossover, matching genes are randomly chosen for

the offspring genome. In blended crossover, the

connection weights of matching genes are averaged.

These two types of crossover were found to be most

effective in ENN in extensive testing compared to

other crossover methods (Kenneth & Miikkulainen

2002).

257

Figure 4. An example of crossover operation of equally fitted

parents.

The disjoint and excess genes are inherited from

the more fitted parent or from both parents, if they

are equally fitted. Disabled genes (with zeroed

weight) have a chance of being re-enabled during

mutation, allowing networks to reuse older genes

once again.

Evolutionary neural network can store history

tracks of every gene in the population, allowing

matching genes to be found and linked-up together

even in different genome structures. Old behaviors

encoded in the pre-existing network structure are not

destroyed and remain qualitatively the same, while

the new structure provides an opportunity to elabo-

rate on these original behaviors.

In agent learning process, the genomes in NEAT

will gradually get larger through mutation. Genomes

of different sizes will sometimes result with different

connections at the same positions. Any crossover

operator must be able to recombine networks with

differing topologies, to pass agents skills to next

generations in meaningful way. Historical markings

represented by innovation numbers allow NEAT to

perform crossover without analyzing topologies.

Genomes of different organizations and sizes stay

compatible throughout evolution, and the variable-

length genome problem is essentially avoided. This

methodology allows NEAT to increase the complex-

ity of the structure while different networks still re-

main compatible.

2.4 Speciation

Speciation of population can be seen as a result from

the same process as adaptation (Beyer & Schwefel

2002), natural selection exerted by interaction

among organisms, and between organisms and their

environment (Spears 1995). Divergent adaptation of

different populations would lead to speciation. Spe-

ciation of the population assures that individuals

compete primarily within their own niches instead of

competition within the whole population. In this way

topological innovations of neural network are pro-

tected and have time to optimize their structure be-

fore they have to compete with other experienced

agents in the population.

Generally, during species assigning process, as

described in (Łącki 2009a), when a new agent ap-

pears in population, its genome must be assigned to

one of the existing species. If this new agent is struc-

turally too innovative comparing to any other indi-

viduals in whole population, the new species is cre-

ated.

Compatibility of agent’s genome g with particular

species s is estimated accordingly to value of dis-

tance between two individuals. This distance is cal-

culated with formula 1:

++=

(1)

where: c1; c2; c3 - weight (importance) coeffi-

cients; E - number of excesses; D - number of dis-

joints;

- average weight differences of matching

genes; N – the number of genes in the larger ge-

nome.

There must be estimated a compatibility threshold

at the beginning of the simulation and if

δ ≤ δ

then genome g is placed into this species. One can

avoid the problem of choosing the best value of

making

dynamic. The algorithm can raise

if there are

too many species in population, and lower

if there are

too few

2.5 Fitness sharing

Fitness sharing occurs when organisms in the same

species must compete with each other for life-

sustaining resources of their niche. Thus, a species

cannot afford to become too big even if many of its

individuals perform well.

Therefore, any one species is unlikely to take

over the entire population, which is crucial for spe-

ciated evolution to maintain topological diversity.

The adjusted fitness f

’

for individual i is calculated

according to its distance

from every other individ-

ual j in the population:

∑

jish

)),((

(2)

where: f

– fitness value of individual i; sh - shar-

ing function; n - number of individuals in whole

population; - average weight differences of matching

genes;

(i,j) – distance between individuals i and j.

The sharing function sh is set to 0 when distance

(i,j) is above the threshold

t; otherwise, sh(

(i,j)) is

set to 1 (Spears 1995). Thus, sum of sh calculates

the number of organisms in the same species as in-

258

dividual i. This reduction is natural since species are

already clustered by compatibility using the thresh-

old

t. A potentially different number of offspring is

assigned to every species. This number is propor-

tional to the calculated sum of adjusted fitness val-

ues f

’

of its members.

In the first step of species reproduction process

the system eliminates the lowest performing mem-

bers from the population. In the next step the entire

population is replaced by the offspring of the re-

maining organisms in each species (Fig. 5).

Figure 5. Evolution within one species in speciated population.

The other selection methods in speciated popula-

tion are also considered in future research, i.e. island

selection or elite selection of best fitted individuals

of every species with particular task.

The final effect of speciating the population is

that structural innovations are protected.

3 MULTIROLE SHIP HANDLING WITH

EVOLUTIONARY NEURAL NETWORK

The main goal of authors work is to make a system

able to simulate a set of navigational situations of

ship maneuvering through a restricted coastal area.

This goal may be achieved with Evolutionary Neural

Networks (ENNs).

Figure 6. Sample data signals of ship handling with ENN.

Navigational situation of a moving vessel can be

described in many ways. Most important is to define

proper state vector from abundant range of data sig-

nals (Fig. 6) and arbitrary determine fitness function

values received by the agent. Fitness calculation is

of primary meaning when determining the quality of

each agent. Subsequently it defines helmsman's abil-

ity to avoid obstacles while sailing toward designat-

ed goal.

Figure 7. Model of simulated simplified coastal environment.

In the simplified simulation model there are no

moving vessels in the area (Fig. 7) (Łącki 2009b).

Helmsman observes current situation which is

mapped into input signals for his neural network. In

general considered input signals indicating (i.e.):

− The ship is on the collision course with an obsta-

cle,

− Ships course over ground,

− Ships angular velocity,

− Distance to collision,

− The ship is approaching destination,

− Ships angle to destination,

− The ship is heading out of the area,

− Danger has increased,

− Danger has decreased,

− Ship is heading on goal.

All the input signals are encoded binary (Łącki

2010a). Neural network output value is the rudder

angle. It is crucial for effectiveness of simulation to

determine the number of neural network outputs.

More outputs mean more calculations but on the

other hand better accuracy and fidelity of designed

environment. Additionally too many outputs in-

crease complexity of the learning process, thus mak-

ing an agent unable to quickly adapt to new situa-

tions. This accuracy vs. performance dilemmas were

examined extensively in previous works (Łącki

2008-2010).

259

The fitness value of an individual is calculated

from arbitrary set action values, i.e.:

− -1 for increase of the distance to goal in every

time step,

− -10 when ship is on the collision course (with an

obstacle or shallow waters),

− +10 when she's heading to goal without any ob-

stacles on course,

− -100 when she hits an obstacle or run aground,

− +100 when ship reaches a goal,

− -100 when she departs from the area in any other

way, etc.

In the simplified simulation model speed of the

ship was constant. In the advanced model, in which

one considers a set of possible task, there must be

possibility for the agent to adjust speed of the vessel.

Situation evaluation can be treated as multi-criteria

problem which calculates a danger of getting strand-

ed on shallow water, encountering a vessel with

dangerous cargo, getting to close to shore, etc. It can

be estimated with available optimization algorithms

(Filipowicz, Łącki & Szłapczyńska 2005) with func-

tion of ship's position, course and angular velocity,

information gained from other vessels (if considered

in the model) and coastal operators.

3.1 Multirole ship handling system

Situation Determining Unit (SDU) is an intelligent

module responsible for grouped helmsmen manage-

ment (Fig. 8).

Figure 8. Complementary multirole neuroevolutionary ship

handling system.

This unit constantly receives information from

state of the environment, processes it, determines ac-

tual navigational situation and regarding it chooses

the best fitted group of helmsmen for this task. SDU

is designed with neuroevolutionary system. The best

performing neural network is the one making deci-

sions. Its performance is determined regarding fit-

ness values reinforcing the neural network by Fit-

ness Value Estimator. With elite selection method

this neural network participates in creation of new

generation of SDU’s.

State vector is dynamic in that its signals depend

on current navigational situation chosen by SDU.

Set of available actions is determined in the same

way. An example of input and output signals for

general navigation task is presented on figure 9.

Figure 9. Input and output signals in general navigation task.

List of considered possible tasks:

− General navigation – during this task SDU ob-

serves possible collision warnings. All the vessels

are treated as points. Action: coarse rudder angle.

− Collision avoidance – The vessels involved in

collision situation are treated as 2D objects. Ac-

tion: precise rudder angle, propeller thrust con-

trol,

− Turning – Action: rudder deflection, propeller

thrust and bow/stern thrusters’ control,

− Mooring – Action: propeller thrust and bow/stern

thrusters’ control, rudder deflection.

Two different multirole models of restricted wa-

ters environment can be taken into consideration in

the advanced system:

− Single vessel multirole population simulation,

− Multi-task multi-agent simulation.

In the first model the main goal is to train popula-

tion of helmsmen to safely handle a particular model

of a ship in specified single multirole task (i.e. safe

passage trough congested water channel, approach to

dock and mooring). In this case agents compete sim-

ultaneously with each other in one population (may

be grouped into species) to handle a ship as best as

possible. Any other vessels in the area are treated as

stationary or moving objects on reasonably predicta-

ble routes (Fig 10).

260

Figure 10. Example of an environment with simultaneous mul-

tirole population of helmsman.

The second model consist a small number of dif-

ferent vessels in the same restricted waters with

helmsmen as the best trained agents for particular

ships (Fig. 11). In this case every helmsman has dif-

ferent goal and different environmental situation,

depending on actions taken by other helmsmen on

the other vessels.

Figure 11. Example of multi-task multi-agent environment.

There are four agents allocated on four vessels. Starting points

of the vessels are indicated with black outlines while goals for

them are marked with white ones.

4 REMARKS

Neuroevolution approach to intelligent agents train-

ing tasks can effectively improve learning process of

simulated helmsman behavior in ship handling

(Łącki 2008). Neural networks based on NEAT in-

crease complexity of considered model of ship ma-

neuvering in restricted waters.

Implementation of multirole division of helms-

men population in neuroevolutionary system allows

simulating complex agents’ behavior in the envi-

ronments with much larger state space than it was

possible in a classic state machine learning algo-

rithms (Łącki 2007). In this system it is also very

important to change parameters of genetic opera-

tions dynamically as well as the input signals vector

and set of available actions.

REFERENCES

Beyer, H.-G. & Schwefel, P. H. 2002. Evolution strategies – A

comprehensive introduction. Natural Computing, 1(1):3–

52.

Braun, H. & Weisbrod, J. 1993. Evolving feedforward neural

networks. Proceedings of ANNGA93, International Confer-

ence on Artificial Neural Networks and Genetic Algo-

rithms. Berlin: Springer.

Chu T. C., Lin Y. C. 2003. A Fuzzy TOPSIS Method for Robot

Selection, the International Journal of Advanced Manufac-

turing Technology: 284-290,

Filipowicz, W., Łącki, M. & Szłapczyńska, J. 2005, Multicrite-

ria decision support for vessels routing, Proceedings of

ESREL’05 Conference.

Kaelbling, L. P., Littman & Moore. 1996. Reinforcement

Learning: A Survey.

Kenneth, O.S., & Miikkulainen, R. 2002. Efficient Evolution of

Neural Network Topologies, Proceedings of the 2002 Con-

gress on Evolutionary Computation, Piscataway.

Kenneth, O.S. & Miikkulainen R. 2005. Real-Time Neuroevo-

lution in the NERO Video Game, Proceedings of the IEEE

2005 Symposium on Computational Intelligence and

Games, Piscataway.

Łącki, M. 2007 Machine Learning Algorithms in Decision

Making Support in Ship Handling, Proceedings of TST

Conference, Katowice-Ustroń, WKŁ.

Łącki, M. 2008, Neuroevolutionary approach towards ship

handling, Proceedings of TST Conference, Katowice-

Ustroń, WKŁ.

Łącki, M. 2009a, Speciation of population in neuroevolution-

ary ship handling, Marine Navigation and Safety of Sea

Transportation, CRC Press/Balkema, Taylor & Francis

Group, Boca Raton – London - New York - Leiden, p. 541-

545.

Łącki, M. 2009b, Ewolucyjne sieci NEAT w sterowaniu stat-

kiem, Inżynieria Wiedzy i Systemy Ekspertowe, Akade-

micka Oficyna Wydawnicza EXIT, Warszawa, p. 535-544.

Łącki, M. 2010a, Wyznaczanie punktów trasy w neuroewolu-

cyjnym sterowaniu statkiem. Logistyka, No 6.

Łącki, M. 2010b, Model środowiska wieloagentowego w neu-

roewolucyjnym sterowaniu statkiem. Zeszyty Naukowe

Akademii Morskiej w Gdyni, No 67, p. 31-37.

Spears, W. 1995. Speciation using tag bits. Handbook of Evo-

lutionary Computation. IOP Publishing Ltd. and Oxford

University Press.

Stanley, K. O. & Miikkulainen, R. 2002. Efficient reinforce-

ment learning through evolving neural network topologies.

Proceedings of the Genetic and Evolutionary Computation.

Conference (GECCO-2002). San Francisco, CA: Morgan

Kaufmann.

Stanley, K. O. & Miikkulainen, R. 2005. Real-Time Neuroevo-

lution in the NERO Video Game, Proceedings of the IEEE

2005 Symposium on Computational Intelligence and

Games, Piscataway.

Sutton, R. 1996. Generalization in Reinforcement Learning:

Successful Examples Using Sparse Coarse Coding.

Touretzky, D., Mozer, M., & Hasselmo, M. (Eds.), Neural

Information Processing Systems 8.

Sutton, R. & Barto, A. 1998. Reinforcement Learning: An In-

troduction.

Tesauro, G. 1995. Temporal Difference Learning and TD-

Gammon, Communications of the Association for Compu-

ting Machinery, vol. 38, No. 3.