International Journal
on Marine Navigation
and Safety of Sea Transportation
Volume 4
Number 2
June 2010
211
1 ARTIFICIAL INTELIGENCE IN DECISSION
MAKING SUPPORT
1.1 Introduction
In Artificial Intelligence (AI) one of the main tasks
is to create intelligent agents that adapt to current
situation, i.e. change their behavior based on interac-
tions with the environment (Fig 1.), becoming more
efficient over time, and adapting to new situations as
they occur.
Such ability is important for simulating helms-
man behavior in ship maneuvering on restricted wa-
ters.
Figure 1. General model of agent-based systems.
Learning process for simpler layouts can be per-
formed using classic approach, i.e. Temporal Differ-
ence Reinforcement Learning (Tesauro 1995) or Ar-
tificial Neural Networks with fixed structures (Braun
& Weisbrod 1993). Dealing with high-dimensional
spaces is a known challenge in Reinforcement
Learning approach which predicts the long-term re-
ward for taking actions in different states (Sutton &
Barto 1998).
1.2 Reinforcement Learning approach
Reinforcement Learning algorithms were taken into
consideration in previous research studies by the au-
thor (Łącki 2007). In this approach the agent re-
ceives description of current situation from the envi-
ronment and chooses one of available actions.
Environmental situation, which should fundamental-
ly affect agents’ behavior, is described by actual
state and signal called reward. The agents’ goal is to
maximize total amount of reward collected over
time. In simpler case total accumulated reward is a
sum of immediate rewards received in every time
step. Unfortunately the results of extensive simula-
tions were insufficient in high-dimensional envi-
ronment, such as helmsman behavior in ship maneu-
vering on restricted waters. Since simulated model
of environment consist only one active agent at a
time, the overall learning speed was rather slow. It
has occurred that the state space was too huge to al-
low the agent to learn effectively. Coarse coding of
states (Sutton 1996) and simplification of state vec-
tor has speeded up learning process. At the same
time inaccuracy in model of restricted waters envi-
ronment increases. In the long run the agent was
able to take the proper action to actual task but had
to learn correct behavior for slightly different task
Speciation of Population in Neuroevolutionary
Ship Handling
M. Lacki
Gdynia Maritime University, Gdynia, Poland
ABSTRACT: This paper presents the idea of using machine learning techniques to simulate and demonstrate
learning behavior in ship maneuvering. Simulated helmsman is treated as an individual in population, which
through environmental sensing learns itself to navigate through restricted waters selecting an optimum trajec-
tory. Learning phase of the task is to observe current situation and choose one of the available actions. The
individual improves his fitness function with reaching destination and decreases its value for hitting an obsta-
cle. Neuroevolutionary approach is used to solve this task. Speciation of population is proposed as a method
to secure innovative solutions.
212
by searching whole state space again. Due to limited
computer resources the state space boundaries must
be defined at the beginning of simulation. Further-
more to improve state-action pair value backups in
episodic learning process the eligibility traces where
used, which also requires additional memory re-
sources.
In advanced tasks, particularly those with contin-
uous hidden states and high-dimensional spaces,
evolutionary approach to artificial neural networks,
has proven to be more efficient.
1.3 Neuroevolutionary approach
Neuroevolution is evolving neural networks, both
connection weights and structure, with genetic algo-
rithms. The main idea of using evolutionary neural
networks (ENNs) in ship handling is based on train-
ing population of helmsmen (Łącki 2008).
Figure 2. Model of restricted waters environment.
The neural network is the helmsman’s mind al-
lowing him to make decisions based on actual navi-
gational situation which is represented by input sig-
nals received from environment. In each step the
network calculates its output from signals received.
These input signals are calculated from current situa-
tion of the environment, in this case: vessel in a con-
fined area.
Neural network output value is the rudder angle.
In actual evaluation there is only single output. Its
value, which is calculated through evolution process
of individuals in population, is normalized to rudder
angle range from -35 degrees (port) to 35 degrees
(starboard). There are plans to introduce several
neural network outputs with normalization in order
to bring the approach close to neural network deci-
sion support systems.
Classic artificial neural networks are not adequate
in dynamic environment. Ship handling in restricted
waters requires efficient network topology of
helmsman’s mind. To create such structure appeared
to be a difficult task. The main cause of this difficul-
ty comes from unknown hidden states and abundant
variety of input signals. Furthermore evolutionary
approach to neural networks is multi-agent system.
It means that there are autonomous units searching
for optimal solution simultaneously (Fig. 3).
Figure 3. Multi-agent simulation system. Helmsmen compete
with each other simultaneously to find the optimum route to
goal.
In agent-based systems most important is to de-
fine proper state vector from available data signals
derived from environment. It is also crucial to de-
termine fitness function values received by the agent
(Łącki 2008). Fitness calculation is of primary
meaning when determining the quality of each indi-
vidual. Subsequently it defines helmsman’s ability to
avoid obstacles while sailing toward designated
goal.
The fitness value of an individual is adjusted in
two ways: from arbitrary set action values and from
calculated values, i.e.: distance to goal, relative
heading to goal, distance to closest obstacle, etc.
Subjectively assigned action values are as fol-
lows: -1 if action leads to increase of the distance to
goal in every time step, -10 when the ship is on the
collision course (with an obstacle or shallow wa-
ters), +10 when she’s heading to goal without any
obstacles on course, -100 when she hits an obstacle
or run aground, +100 when ship reaches a goal and
100 when she depart from the area in any other way,
etc;
213
To simplify calculations ship’s dynamic was re-
duced. For example speed of the ship remains con-
stant despite significant radar deflection.
1.4 Multi-criteria input signals
Evaluation of quality of a state is to be treated as
multi-criteria problem. Its aim is to estimate a risk
factor of getting stranded, getting too close to the
shore, encountering a vessel with dangerous cargo,
etc. It can be estimated by function of ship’s posi-
tion, course and angular velocity and information
gained from other vessels (if considered in the mod-
el) and coastal operators. One of the efficient meth-
ods to estimate value of risk factor is Fuzzy TOP-
SIS (Filipowicz, Łącki & Szłapczyńska 2005).
TOPSIS stands for Technique for Order Prefer-
ence by Similarity to an Ideal Solution. Was origi-
nated by Hwang and Yoon as a new multi-attribute
decision making (MADM) method in 1981. Initially
the approach was intended for crisp values then ex-
tended for fuzzy parameters (Chu & Lin 2003).
The main concept of this method is based on dis-
tance calculation. The best alternative among the
available set is the closest to the best possible solu-
tion and the farthest from the worst possible solution
simultaneously. The best possible solution, referred
to as an ideal one, is defined as a set of the best at-
tribute values, whereas the worst possible one, re-
ferred to as a negative-ideal solution, is a set of the
worst at-tribute values. In this method every criteria
is of benefit or cost type. In the discussed problem
distance to closest obstacle is benefit criteria (should
be kept as high as possible), while probability of en-
countering a vessel with dangerous cargo is a cost
one (therefore is to be as low as justified).
The final TOPSIS ranking is created by sorting
the coefficient values assigned to each of the alterna-
tives in descending order. The alternative with the
highest ranking value claims to be the best one.
When vessels hits an obstacle or depart from the
area in forbidden way then its position is reset to ini-
tial values and the helmsman receives negative
points to his fitness value. The ones that reach the
goal reset their positions to initial ones and increases
helmsmen fitness values respectively. Therefore, af-
ter several dozen of episodes there will be some of
the individuals distinguished by their high fitness
values.
The main goal of the individuals in population is
to maximize their fitness values. This value is calcu-
lated from helmsman behavior during simulation as
described above. The best-fitted individuals become
parents for next generation.
Offspring genome is calculated from parents’ ge-
nomes using evolutionary operations.
2 EVOLUTIONARY OPERATIONS IN NEAT
NETWORKS
Neuroevolutionary systems are based on Topology
and Weight Evolving Artificial Neural Networks
(TWEANNs). These neural networks have the dis-
advantage that the correct although simplified topol-
ogy need not be known at the beginning it will
evolve through evolutionary operations.
Among TWEANNs there is Neuro Evolution of
Augmenting Topologies (NEAT). It is unique in that
it begins evolution with a population of minimal
networks and adds nodes and connections to them
over generations, allowing complex problems to be
solved gradually based on simple ones (Stanley &
Miikkulainen 2002). This way, NEAT searches
through a minimal number of weight dimensions and
finds the appropriate complexity level of network
topology adjusted to the problem. This process of
complexification has important implications on
search patterns. It may not be practical to find a so-
lution in a high-dimensional space by searching in
that space directly. But it may be possible to find so-
lution by searching in lower dimensional spaces and
further transfer of the best solutions into the high-
dimensional space.
The NEAT network delivers solutions to three
fundamental problems in evolving artificial neural
network topologies:
Innovation numbers line up genes with the same
origin to allow disparate topologies to cross over
in a meaningful way (innovation number is a
unique value assigned to a new gene).
Separation of each innovation into a different
species protects its disappearing from the popula-
tion prematurely.
Start from a minimal structure, add nodes and
connections, incrementally discovers most effi-
cient network topologies throughout evolution.
2.1 Selection
There are many ways to select individuals to become
potential parents for next generation. Replacing the
entire population on each generation may cause fast
convergence to local extremes since there is strong
selection method causing that everyone’s genome
would likely be inherited from best fitted individual.
In addition, behaviors would remain static during the
large gaps of time between generations.
The alternative is to replace a single individual
every few time intervals as it is done in evolutionary
strategy algorithms (Beyer & Schwefel 2002).
214
Figure 4. Evolution in population without species.
The worst individual, the one with lowest fitness
value, is removed and replaced with an offspring of
parents chosen from among the best. This cycle of
removal and replacement happens continually
throughout the simulation (Fig. 4).
2.2 Crossover
Every time a new connection gene appears in ge-
nome, what can only happen with mutation, a unique
value is assigned to this gene called innovation
number. Through innovation numbers, the system
knows exactly which genes match up with another.
The numbers are inherited and during crossover re-
main? unchanged, and allow algorithm to perform
evolutionary operations without the need for expen-
sive topological analysis. Genes that do not match
are either disjoint or excess, depending on whether
they occur within or outside the range of the other
parent’s innovation numbers.
During crossover the genes with the same innova-
tion numbers are lined up. The connection weights
of matching genes are averaged.
The disjoint and excess genes are inherited from
the more fit parent or, if they are equally fit, from
both parents. Disabled genes have a chance of being
re-enabled during crossover, allowing networks to
make use of older genes once again.
2.3 Mutation
Mutation is the main evolving mechanism in evolu-
tionary neural networks. It can change both network
topology and connection weights.
Connection weights mutate as in any neuroevolu-
tionary systems, with each existing connection be-
tween nodes either affected or not.
Structural mutations, which form the basis of
network complexity, occur in three ways. Each mu-
tation expands the size of the genome by adding
genes. In the add connection mutation, a single new
connection gene is added connecting two previously
unconnected nodes. In the add node mutation, an ex-
isting connection is split and the new node placed
where the old connection used to be. The old con-
nection is disabled and two new connections are
added to the genome. In the add layer mutation, a
new layer is created, if the maximum layer number
has not been reached yet. After that there is possibil-
ity to evolve new nodes in that new layer by add
node mutation.
There are also mutations removing connections,
nodes and layers and special mutation disabling par-
ticular node. This node can be re-enabled in future
mutations. Probability of each type of mutation is
obviously different but its value is of primary mean-
ing in efficient evolution.
3 SPECIATION
Speciation can be seen as a result from the same
process as adaptation: natural selection exerted by
interaction among organisms, and between organ-
isms and their environment. Divergent adaptation of
different populations would lead to speciation.
In the course of the modern synthesis in the 20th
century a somewhat different view emerged that
considered speciation and divergent adaptation, the
two separate processes required for the origin of
species diversity, mainly as resulting from different
and unrelated mechanisms. Speciation of the popula-
tion assures that individuals compete primarily with-
in their own niches instead of competition within the
whole population (Stanley & Miikkulainen 2005). In
this way topological innovations are protected and
have time to optimize their structure before they
have to compete with other niches in the population.
3.1 Algorithm
When new individual appears in population, it must
be assigned to one of the existing species or, if it is
too innovative comparing to any other individuals,
new species is created. The whole species assigning
algorithm is presented below.
Begin of the Genome Loop:
Take the next genome g from population P;
Begin of the Species Loop:
If all species in S have been checked:
create new species s
new
and place g in it;
Else
Get the next species s from S;
If g is compatible with s, add g to s;
If g has not been placed:
215
continue the Species Loop;
Else exit the Species Loop;
If not all genomes in G have been placed:
continue the Genome Loop;
Else exit the Genome Loop;
Compatibility of genome g with species s is esti-
mated accordingly to value of distance
δ
between
two individuals which is calculated with formula 1:
Wc
N
Dc
N
Ec
3
21
++=
δ
(1)
where: c
1
, c
2
, c
3
weight (importance) coeffi-
cients; E number of excesses; D number of dis-
joints; W average weight differences of matching
genes; N the number of genes in the larger ge-
nome.
If
δ δ
t
, a compatibility threshold, then genome
g is placed into this species.
One can avoid the problem of choosing the best
value of
δ
by making
δ
t
dynamic. The algorithm can
raise
δ
t
if there are too many species in population, and
lower
δ
t
if there are too few.
3.2 Fitness sharing
Fitness sharing means that organisms in the same
species must share the fitness of their niche. Thus, a
species cannot afford to become too big even if
many of its individuals perform well.
Therefore, any one species is unlikely to take
over the entire population, which is crucial for spe-
ciated evolution to maintain topological diversity.
The adjusted fitness f
i
for individual i is calculated
according to its distance
δ
from every other individ-
ual j in the population:
=
=
n
j
i
i
jish
f
f
1
'
)),((
δ
(2)
The sharing function sh is set to 0 when distance
δ
(i,j) is above the threshold
δ
t; otherwise, sh(
δ
(i,j)) is
set to 1 (Spears 1995). Thus, sum of sh calculates
the number of organisms in the same species as in-
dividual i. This reduction is natural since species are
already clustered by compatibility using the thresh-
old
δ
t. A potentially different number of offspring is
assigned to every species. This number is propor-
tional to the calculated sum of adjusted fitness val-
ues f
i
of its members.
Species reproduce by first eliminating the lowest
performing members from the population. In the
next step the entire population is replaced by the off-
spring of the remaining organisms in each species
(Fig. 5). The other selection methods in speciated
population are also considered in future research, i.e.
island selection or permanent isolation of best fitted
individuals of every species with particular task.
Figure 5. Evolution within one species in speciated population.
The final effect of speciating the population is
that structural innovations are protected.
4 REMARKS
Speciation of population in neuroevolutionary ma-
chine learning can effectively improve learning pro-
cess and decision making support in ship handling.
Artificial neural networks with evolving topology
and weights based on modified NEAT networks can
increase learning speed of helmsmen. Complexity of
considered model of ship maneuvering in restricted
waters environment does not affect learning process
very much. It is possible to use simulation models
with much larger state space than it was possible in
classic state machine learning algorithms without
neural network function approximations (Kaelbling,
Littman & Moore 1996). Issues like different selec-
tion methods of best fitted individuals, input signals
encoding, splitting one output to several neural net-
work outputs with normalization of signal values are
also worth to be revised in future research in area of
artificial intelligence support towards ship handling.
REFERENCES
Beyer, H.-G. & Paul Schwefel, H. 2002. Evolution strategies
A comprehensive introduction. Natural Computing, 1(1):3
52.
Braun, H. & Weisbrod, J. 1993. Evolving feedforward neural
networks. Proceedings of ANNGA93, International Confer-
ence on Artificial Neural Networks and Genetic Algo-
rithms. Berlin: Springer.
216
Chu T. C., Lin Y. C. 2003. A Fuzzy TOPSIS Method for Robot
Selection, the International Journal of Advanced Manufac-
turing Technology: 284-290,
Filipowicz, W., Łącki, M. & Szłapczyńska, J. 2005, Multicrite-
ria decision support for vessels routing, Proceedings of
ESREL’05 Conference.
Kaelbling, L. P., Littman & Moore. 1996. Reinforcement
Learning: A Survey.
Łącki M., 2007 Machine Learning Algorithms in Decision
Making Support in Ship Handling, Proceedings of TST
Conference, Katowice-Ustroń, WKŁ.
Łącki, M. 2008, Neuroevolutionary approach towards ship
handling, Proceedings of TST Conference, Katowice-
Ustroń, WKŁ.
Spears, W. 1995. Speciation using tag bits. Handbook of Evo-
lutionary Computation. IOP Publishing Ltd. and Oxford
University Press.
Stanley, K. O. & Miikkulainen, R. 2002. Efficient reinforce-
ment learning through evolving neural network topologies.
Proceedings of the Genetic and Evolutionary Computation.
Conference (GECCO-2002). San Francisco, CA: Morgan
Kaufmann.
Stanley, K. O. & Miikkulainen, R. 2005. Real-Time Neuroevo-
lution in the NERO Video Game, Proceedings of the IEEE
2005 Symposium on Computational Intelligence and
Games, Piscataway
Sutton, R. 1996. Generalization in Reinforcement Learning:
Successful Examples Using Sparse Coarse Coding.
Touretzky, D., Mozer, M., & Hasselmo, M. (Eds.), Neural
Information Processing Systems 8.
Sutton, R. & Barto, A. 1998. Reinforcement Learning: An In-
troduction.
Tesauro, G. 1995. Temporal Difference Learning and TD-
Gammon, Communications of the Association for Compu-
ting Machinery, vol. 38, No. 3.