1 INTRODUCTION
Unmanned ocean transportation is expected to transform maritime
navigation in the future. In October 2016, the Norwegian Maritime
Administration and the Norwegian Coastal Authority established the
world's first autonomous ship test zone in the Trondheim fjord as well
as the Norwegian Forum for Autonomous Ships. It marked the promotion of
unmanned ship research to the national level in Norway. In early
December 2018, the "Suomenlinna II" polar passenger ferry successfully
crossed the test area near the port of Helsinki in the unmanned state
and passed the remote sea trial.
The intelligent decision-making module is the "brain" of unmanned
ships. It involves various technologies such as route optimization,
risk warning, smart decision-making, and energy efficiency management.
It can make most decisions based on external navigation environment
information, ship internal information, and shore-based support
information. For example, it can make sound navigation decisions and
send control commands to the execution unit to carry them out (Finn et
al., 2010).
Taking ship collision avoidance as an example, the intelligent
decision-making module obtains the actual navigation situation around
the ship from the targets acquired by the radar, AIS, shipborne
infrared camera, visible light camera, and other sensors, together with
their fused information, and conducts a risk analysis for the
surrounding targets (Trucco, 2008). If there is a dangerous target, the
collision avoidance decision is made through the intelligent collision
avoidance technology combined with the ship's current position,
heading, and speed. The instructions formed by the decision, such as
changing the course and changing
Developing Generative Adversarial Nets to Extend
Training Sets and Optimize Discrete Actions
R.L. Zhang & M. Furusho
Kobe University, Kobe, Japan
ABSTRACT: This study proposes the use of generative adversarial
networks (GANs) to solve two crucial problems in unmanned ship
navigation: insufficient training data for neural networks and
convergence of optimal actions under discrete conditions. To achieve
smart collision avoidance of unmanned ships in various sea
environments, this study first proposes a collision avoidance decision
model based on a deep reinforcement learning method. Then, it utilizes
GANs to generate enough realistic image training sets to train the
decision model. Through generative network learning, the conditional
probability distribution of ship maneuvers (action units) is learnt.
Subsequently, the decision system can select a reasonable action to
avoid obstacles based on the discrete responses of the generative model
to different actions, thereby achieving intelligent collision
avoidance. The experimental results showed that the generated target
ship image set can be used as the training set of decision neural
networks. Further, a theoretical reference for optimizing the
convergence of discrete actions is provided.
http://www.transnav.eu
the International Journal
on Marine Navigation
and Safety of Sea Transportation
Volume 13
Number 4
December 2019
DOI:10.12716/1001.13.04.22
the speed, are sent to the rudder control system. In the process of
collision avoidance, the information transmitted by multiple sensors
and equipment is continuously integrated, ensuring that the collision
avoidance scheme is adjusted in time (Wang, 2007). The core of the
current research is how the decision-making module can achieve optimal
navigational operations in all types of extreme offshore environments.
Therefore, in the risk assessment and early warning research of
unmanned ship navigation, it is necessary to focus on the unmanned ship
in complex navigation conditions (such as ports, straits, canals, and
other congested waters), ship collision avoidance, hydrometeorology,
the geographical environment, the traffic situation, and other issues.
This research is based on ship sensor data acquisition and training
optimization of decision neural networks (Mazurowski, 2008). An
intelligent risk warning model and method suitable for unmanned ships
under complex navigation conditions is formed to approach real-time
warning of ships (Scheffer, 2012).
In the intelligent decision-making research, an intelligent fusion
correlation analysis is carried out on static and dynamic targets and
navigation conditions around unmanned ships. Intelligent theories, such
as deep learning, knowledge base, and situation calculation, are
applied. Research on intelligent decision theory for ship navigation,
based on ship navigation system information and shore-based support
information, aims to break through the key technologies of autonomous
meteorological navigation. Technologies such as ship collision
avoidance, reef avoidance, anti-shelf integration, and smart processing
of navigation information support autonomous decision-making in ship
navigation (Capraro, 2006).
To achieve an intelligent collision avoidance function for unmanned
ships in various environments, a collision avoidance decision module
based on deep reinforcement learning is proposed to make autonomous
decisions under various conditions (Mnih, 2015). In computer game
environments, the DeepMind team can collect enough data for training;
however, in the real navigation environment, it is difficult to obtain
data that cover a rich and varied nautical environment. In particular,
various types of encounter ships have different points of observation
in different situations, and it is difficult to predict their future
path of navigation (Sarukkai, 2000). In the process of calculating the
global optimal solution, the decision model is difficult to
differentiate because the actions are discrete; as a result, the global
optimal solution cannot converge. Therefore, this study proposes a
generative adversarial network (GAN) model to solve the problem of
neural network training data, and the combination of GAN and deep
reinforcement learning to solve the convergence problem of optimal
action under discrete action unit conditions.
2 RELATED WORK
2.1 The principle of GAN
GAN is a method proposed by Goodfellow (2014) to train generative
models. The method involves two "adversarial" models: a generative
model and a discriminative model. The generative model (G) is used to
capture the data distribution, and the discriminative model (D) is used
to estimate the probability that a sample is derived from real data
rather than the generated data. Both the generator and the
discriminator are commonly convolutional networks or fully connected
networks. The generator generates a sample from a stochastic vector,
and the discriminator discriminates between the generated sample and
the real training set sample.
This optimization process can be formulated as a two-player minimax
game. Both objectives can be optimized through backpropagation. A
well-trained generation network can transform any noise vector into a
sample similar to the training set. This noise can be seen as the
encoding of the sample in a low-dimensional space. The generator
generates meaningful data based on the stochastic vectors. In contrast,
the discriminator learns to distinguish real from generated data and
then passes what it has learned back to the generator, thereby enabling
the generator to generate more plausible data based on the stochastic
vectors. Such a trained generator can have many uses, one of them being
environment generation in automatic navigation.
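For reference, the two-player minimax game mentioned above is usually written in the form given by Goodfellow et al. (2014). The symbols $p_{\text{data}}$ (distribution of real samples), $p_z$ (distribution of the stochastic vector $z$), and $V(D, G)$ (value function) follow that paper and are not otherwise defined in this text:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to increase $V$ by assigning high probability to real samples and low probability to generated ones, while G is trained to decrease it.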
The specific process to obtain various target ships is shown in Figure
1. First, a few stochastic vectors are fed as input to the generator
network, and fake data are subsequently generated by the generator. The
fake data can correspond to ship state pictures or navigation data,
such as AIS data of a nearby encounter ship or the path planning data
after the ship route is updated. We input the fake data to the
discriminator, and the discriminator determines whether the input data
are real data or fake data generated by the generator. As the
similarity between the generated data and the real data gradually
increases, the discriminating ability required of the discriminator
also increases accordingly. The generator and the discriminator are
thus in a mutually competitive, adversarial relationship. When the
accuracy of the discriminator is approximately 50%, the generated data
are considered to sufficiently mirror real data; that is, the fake data
produced by the generator appear sufficiently realistic. This
corresponds to the target ship image data that are required in a
critical sea environment.
Figure 1. Applying GAN to generate various target ships with different
backgrounds
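The 50% figure can be illustrated with a small sketch. It assumes the optimal-discriminator result from Goodfellow et al. (2014), D*(x) = p_data(x) / (p_data(x) + p_g(x)), and uses two one-dimensional Gaussians as stand-ins for the real and generated distributions. The function names and the Gaussian setting are illustrative assumptions, not part of the study.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def optimal_discriminator(x, real_mean, fake_mean, std=1.0):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for two 1-D Gaussians (toy setting)."""
    p_real = gaussian_pdf(x, real_mean, std)
    p_fake = gaussian_pdf(x, fake_mean, std)
    return p_real / (p_real + p_fake)


# Generator still far from the real data: D* is confident near the real mode.
early = optimal_discriminator(x=0.0, real_mean=0.0, fake_mean=3.0)
# Generator matches the real data: D* can do no better than a coin flip.
converged = optimal_discriminator(x=0.0, real_mean=0.0, fake_mean=0.0)
print(round(early, 3), converged)
```

When the two distributions coincide, the best possible discriminator outputs 0.5 for every input, which is consistent with the roughly 50% accuracy described above.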
2.2 GAN application examples
In a previous study, the authors proposed using the GAN to build an
executable method for maritime navigation route replanning. The
unmanned ship can independently generate new routes based on the
environmental information around the ship before any possible danger
occurs and can even interrupt the remote support. With a total
generation time of less than one second, the trained model helps the
ship avoid obstacles or any latent disaster. In addition, GAN is easy
to embed into the framework of reinforcement learning. For example,
when using deep reinforcement learning to solve collision avoidance
problems, GAN can be used to learn the conditional probability
distribution of an action. The agent (own ship) can select a reasonable
route based on the response of the generative model to different
actions.
2.3 Global optimality of discrete actions
The mathematical equivalent of deep reinforcement learning can be
considered a Markov decision process in discrete time defined by five
factors $(S, a, P, r, \gamma)$, with a neural network used in place of
the Q-value (Zhang, 2017). Here, $S$ is the finite state space (state
set) in which the unmanned ship is located; $a$ is the behavior
decision space of the unmanned ship, i.e., the set of all actions or
reactions available in any state, for example, left rudder, right
rudder, acceleration, deceleration, heeling, and stopping.
$P_a(S) = P(S' \mid S, a)$, where $P$ is the conditional probability
that the unmanned ship reaches the next state $S'$ from state $S$ under
action $a$. $r_a(S \mid S')$ is a reward function, which represents the
reward obtained by the unmanned ship in moving from the state $S$ to
the next state $S'$ under action $a$. $\gamma \in (0, 1)$ is the reward
attenuation factor; the reward at the next time instant $t$ is
attenuated by this factor.
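As a small worked example of how the attenuation factor γ acts on future rewards, the sketch below accumulates a reward sequence in which the reward k steps ahead is weighted by γ^k. The function name and the reward values are hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards in which the reward k steps ahead is weighted by gamma**k."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in the open interval (0, 1)")
    total = 0.0
    # Work backwards so each earlier step adds its reward to the
    # already-attenuated value of everything that follows it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three consecutive steps, each with reward 1, and gamma = 0.5:
# 1 + 0.5 * 1 + 0.25 * 1 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

A smaller γ makes the agent weigh immediate rewards more heavily, while a value close to 1 makes later outcomes of a maneuver count almost as much as the first one.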
In actual navigation practice, completing a collision avoidance process
may require different operations to be coordinated. These operations
are not continuous; they are discrete. Further, different people may
respond to the same event in different ways. Generally, both the reward
r and the attenuation factor γ differ from case to case. Thus, a method
that converges to a globally optimal action group is required. This
study discusses the possibility of combining GAN and deep reinforcement
learning to solve this problem.
3 GAN METHOD FOR GENERATING TRAINING DATA SETS
3.1 Conventional acquisition method of target ship data sets
The acquisition of target ship data is usually carried out from the
ship's bridge. Here, we can observe the navigation state of the target
ship and then photograph the target ship that will be encountered. The
author carried out a seven-day summer research voyage on the university
training ship "FUKAE MARU". A data set of 4,000 valid target ship
images was obtained. However, this is not enough for training neural
recognition networks. Target recognition and classification require a
large amount of data for both training and target identification if the
position and posture of the target ship are to be perceived. Therefore,
a new approach to obtaining training data is needed.
3.2 Example of GAN-generated lifeboat image data
This section mainly demonstrates the use of the demo provided by
BigGAN. Figure 2 shows the process of generating a lifeboat using GAN,
with images obtained at different times in the generation process. In
Figure 2.a, the sea surface appears in the generated image, and there
is a blurry cluster in the middle in which no object can be identified.
In Figure 2.b, the orange upper body and black rubber that are common
in lifeboats appear. The hull, shown in Figure 2.c, is an almost
completed generated image, and both the human eye and a trained model
can roughly identify this as a lifeboat. We can input different
stochastic vectors and combine the real data input to the discriminator
to get a large number of high-quality image data sets. As shown in
Figure 3, we got three different types of lifeboats. More importantly,
the background of the lifeboat could also be changed to provide a large
amount of training data for our unsupervised decision model. The
generated data were observed to be much richer than the target ship
data collected from the real seas.
Figure 2. Target ship image data generating process
Figure 3. Example of generated lifeboats with different backgrounds
The most important part of this research is to obtain a data set of the
target ship with sufficiently high quality and quantity. Figure 4 shows
a portion of the entire large-scale lifeboat target image data set.
These images were not taken by a camera; they were generated entirely
by the GAN model. Using different truncation and noise_seed values, our
model could generate various encounter situations at sea, as shown in
Figures 5 and 6, as well as the various forms possible for the target
ship at the time of the encounter, including various types of accidents
such as collision, stranding, fire, and loss of goods. We obtained the
training data set for lifeboats, the ocean liner data sets shown in
Figures 7 and 8, as well as data sets for various other types of marine
moving targets.
Figure 4. Large-scale data generated with different ship status and
backgrounds. (Lifeboat) Truncation: 0.14; Noise_seed: 0
Figure 5. Large-scale data generated with different ship status and
backgrounds. (Lifeboat) Truncation: 0.28; Noise_seed: 0
Figure 6. Large-scale data generated with different ship status and
backgrounds. (Lifeboat) Truncation: 0.56; Noise_seed: 0
Figure 7. Large-scale data generated with different ship status and
backgrounds. (Ocean liner) Truncation: 0.10; Noise_seed: 4
Figure 8. Large-scale data generated with different ship status and
backgrounds. (Ocean liner) Truncation: 0.10; Noise_seed: 8
As shown in Figures 4–6, the larger the truncation, the greater the
diversity of the generated samples. In fact, truncation controls the
truncation distance of the hidden variable distribution (generally
Gaussian), which is the sampling range. Therefore, its role in
diversity is easy to understand. As shown in Figures 7–8, the value of
noise_seed sets the initial condition of each sample generation, so
different values lead to different final results, which can be used to
improve the generation diversity. When training image recognition
models such as convolutional neural networks and decision models such
as deep reinforcement learning, the quality of the input data
considerably affects the training results. The target image data set
generated by GAN has the same image size and image density, which can
easily solve the problem of inconsistent input data during the training
process. In addition, the GAN model provides many of the scene data
that are difficult to obtain in a real navigation environment, making
it possible to use large-scale data input for deep reinforcement
learning.
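The roles of truncation and noise_seed described above can be sketched with the standard library alone. This is not the BigGAN demo code. It is a minimal illustration that assumes the hidden variable is drawn from a standard normal, that out-of-range draws are redrawn, and that the result is scaled by the truncation value. The function name, the `bound` parameter, and the vector length of 128 are hypothetical.

```python
import random


def truncated_noise(dim, truncation, noise_seed, bound=2.0):
    """Draw a latent vector from a truncated standard normal, scaled by `truncation`.

    Draws that fall outside [-bound, bound] are rejected and redrawn, so every
    entry of the returned vector lies within [-bound * truncation, bound * truncation].
    """
    rng = random.Random(noise_seed)  # the seed fixes the initial condition
    values = []
    while len(values) < dim:
        z = rng.gauss(0.0, 1.0)
        if abs(z) <= bound:  # rejection step: keep only in-range draws
            values.append(truncation * z)
    return values


narrow = truncated_noise(dim=128, truncation=0.14, noise_seed=0)
wide = truncated_noise(dim=128, truncation=0.56, noise_seed=0)
other = truncated_noise(dim=128, truncation=0.14, noise_seed=4)
print(max(abs(v) for v in narrow), max(abs(v) for v in wide))
```

With the same seed, a larger truncation spreads the latent vector over a wider range, which is one way to read the increasing diversity across Figures 4–6. Changing only the seed yields a different vector within the same range, as in Figures 7–8.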
4 GAN WITH POLICY GRADIENT FOR OPTIMIZING DISCRETE ACTIONS
Although the number of GAN variants and their versatility are
increasing, their adversarial idea has not changed. In other words, a
discriminator that can tell the real data from the generated data is
added to the generation process, so that the generator G and the
discriminator D can compete with each other. The role of D is to try to
distinguish the real data from the generated data, while G tries to
improve the generated data so that they can confuse D. When D can no
longer separate the true and false data, it is considered that G has
reached a stable state.
The advantages of GAN are summarized as follows:
- It can generate better samples;
- There is no need to make inferences about hidden variables during training;
- The model only uses backpropagation, without the need for a Markov chain;
- G's parameter update does not come directly from the data sample but from backpropagation through D;
- In theory, as long as differentiable functions can be used to construct D and G, they can be combined with deep neural networks to make a deep generative model.
For part of the agent decision module, completing a collision avoidance
evaluation may require different actions to be coordinated. These
discrete actions are difficult to differentiate mathematically; thus,
it is necessary to find a way to converge to a globally optimal action
group. The last of the above advantages is precisely GAN's limitation.
Discrete data, unlike image data, are not continuous and cannot be
differentiated. Therefore, GAN cannot be applied directly to discrete
data.
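The difficulty with discrete outputs can be seen in a toy example. The sketch below picks an action by taking the index of the highest score and then estimates the derivative of that choice with respect to each score by finite differences. The scores and function names are hypothetical.

```python
def select_action(scores):
    """Discrete choice: index of the highest-scoring action."""
    return max(range(len(scores)), key=lambda i: scores[i])


def finite_difference(fn, scores, index, eps=1e-3):
    """Numerical derivative of fn with respect to scores[index]."""
    bumped = list(scores)
    bumped[index] += eps
    return (fn(bumped) - fn(scores)) / eps


scores = [0.2, 1.5, -0.3]  # hypothetical scores for three ship actions
grads = [finite_difference(select_action, scores, i) for i in range(len(scores))]
print(select_action(scores), grads)
```

A small change in any score leaves the selected action unchanged, so the estimated gradient is zero in every direction. A discriminator that only sees the selected action therefore has no useful gradient to pass back to the generator.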
As shown in Figure 9, when using the deep reinforcement learning model
to solve the optimal decision problem, GAN and deep reinforcement
learning are combined to select a reasonable globally optimal action
combination. Figure 9.a portrays the adversarial idea, where real data
from the sea environment plus data generated by G are required to train
D. However, as described in the related work section, the output of G
is discrete, which makes it difficult for D to return a gradient to
update G; therefore, a few changes need to be made.
As shown in Figure 9.b, the generator G plays the role of the policy
network. The existing dots are called the current state, and the
operation of generating the next dot is called the action. Because D
needs to score a complete sequence, Monte Carlo tree search (MCTS) is
used to complete the various possibilities of each action. D rewards
these complete sequences and passes the information back to G, and G is
updated by reinforcement learning. In this way, the reinforcement
learning method is used to train a generation network that can generate
the global action set.
Figure 9. Policy gradient convergence of discrete data
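The idea of updating G from D's score of complete sequences can be sketched as follows. This is a simplified illustration under several assumptions and is not the authors' implementation: the generator is a table of softmax logits per time step, the discriminator is a fixed stand-in that scores how closely a sequence matches a hypothetical reference maneuver, plain Monte Carlo rollouts replace a full tree search, and a constant baseline is subtracted to reduce variance. All names and numerical settings are illustrative.

```python
import math
import random

ACTIONS = ("left_rudder", "right_rudder", "decelerate")  # hypothetical action units
SEQ_LEN = 3


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one discrete action index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def discriminator_score(sequence):
    """Stand-in for D: fraction of steps matching a hypothetical 'real' maneuver."""
    reference = (1, 1, 2)  # right rudder, right rudder, decelerate
    return sum(a == b for a, b in zip(sequence, reference)) / len(reference)


def rollout_reward(prefix, logits, rng, n_rollouts=8):
    """Monte Carlo estimate of D's score for a partial sequence."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        for t in range(len(prefix), SEQ_LEN):
            seq.append(sample(softmax(logits[t]), rng))  # complete the sequence
        total += discriminator_score(seq)
    return total / n_rollouts


def train_generator(steps=1000, lr=0.2, baseline=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(ACTIONS) for _ in range(SEQ_LEN)]  # parameters of G
    for _ in range(steps):
        sequence = []
        for t in range(SEQ_LEN):
            probs = softmax(logits[t])
            action = sample(probs, rng)
            sequence.append(action)
            advantage = rollout_reward(sequence, logits, rng) - baseline
            # Policy gradient: d log pi(action) / d logit_k = 1[k == action] - probs[k]
            for k in range(len(ACTIONS)):
                indicator = 1.0 if k == action else 0.0
                logits[t][k] += lr * advantage * (indicator - probs[k])
    return logits


logits = train_generator()
best = [max(range(len(ACTIONS)), key=lambda k: row[k]) for row in logits]
print([ACTIONS[k] for k in best])
```

No gradient flows through the sampled actions. G is updated only through the score that D assigns to completed sequences, which is what allows the discrete output to be handled.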
5 CONCLUSION
This study uses the GAN method to generate a large number of training
data sets for the decision model. In the generated data set, different
target ship image data can be obtained according to the settings of the
two parameters truncation and noise_seed. Apart from the different
positions and state data of the target ship, encounter situations with
different backgrounds and scenes and image data of various target ships
under dangerous conditions are obtained. The target ship image data set
generated by the generative adversarial model is useful for training
the ship target recognition neural network under different
environmental backgrounds; however, the ship's motion situation
prediction, collision avoidance decisions, etc., involve discontinuous
data, and GAN alone did not satisfy this demand. Therefore, this study
combines GAN with the idea of policy gradient in deep reinforcement
learning and proposes a method for solving the convergence problem of
the discrete global action set.
REFERENCES
Capraro, G. T., Farina, A., Griffiths, H. & Wicks, M. C. (2006). Knowledge-based radar signal and data processing: a tutorial review. IEEE Signal Processing Magazine, 23(1), 18–29.
Finn, A. & Scheding, S. (2010). Developments and challenges for autonomous unmanned vehicles. Intelligent Systems Reference Library, 3, 128–154.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S. ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).
Mazurowski, M. A., Habas, P. A., Zurada, J. M., Lo, J. Y., Baker, J. A. & Tourassi, G. D. (2008). Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Networks, 21(2–3), 427–436.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G. ... & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.
Sarukkai, R. R. (2000). Link prediction and path analysis using Markov chains. Computer Networks, 33(1–6), 377–386.
Scheffer, M., Carpenter, S. R., Lenton, T. M., Bascompte, J., Brock, W., Dakos, V., ... & Pascual, M. (2012). Anticipating critical transitions. Science, 338(6105), 344–348.
Trucco, P., Cagno, E., Ruggeri, F. & Grande, O. (2008). A Bayesian Belief Network modelling of organisational factors in risk analysis: A case study in maritime transportation. Reliability Engineering & System Safety, 93(6), 845–856.
Wang, X., Yadav, V. & Balakrishnan, S. N. (2007). Cooperative UAV formation flying with obstacle/collision avoidance.
Zhang, R. L. & Furusho, M. (2017). Conversion timing of seafarer's decision-making for unmanned ship navigation. TransNav: International Journal on Marine Navigation and Safety of Sea Transportation, 11.