Use of a Multiple Regression Model to Determine the Parameters of Vessel Traffic Flow in Port Areas
A. Nowy & L. Gucma
Maritime University of Szczecin, Szczecin, Poland

the International Journal on Marine Navigation and Safety of Sea Transportation
Volume 14, Number 2, June 2020
DOI: 10.12716/1001.14.02.23
http://www.transnav.eu

ABSTRACT: The paper presents a method of determining ship traffic stream parameters by means of regression. The aim of the studies was to determine the correlation between the ship's parameters and the parameters of the fairway. Extending the presented model with information on the position of the vessel's antenna and on the accuracy of position determination will allow a model for predicting the parameters of waterways to be created.

1 INTRODUCTION
One of the basic problems of marine traffic engineering is to determine the optimal parameters of newly built and modernized elements of waterways. Depending on the type of waterway, these parameters may be, for example, the width of the waterway or the diameter of the turning circle. They are usually determined by one of two methods: the analytical method, or the more expensive but more accurate simulation method. Statistical data from computer simulation models have also been used to determine waterway parameters [Gucma L. 2005].
With the availability of AIS (Automatic Identification System) data, input data for the model that represent the actual behavior of navigators have become available. They help to better understand ship movement in the waterway. The characteristics of ship traffic obtained from the AIS data analysis will be used to generate input parameters.
Nowadays AIS data are used in research on the actual behavior of vessels, and a number of traffic studies have been conducted in recent years. Classical traffic flow theory was used in an initially developed mathematical model (Yip, 2013). A BP neural network was used to forecast vessel traffic flow (Zhang et al., 2018). Automatic recognition of traffic flow based on kernel density estimation was proposed by Li et al. (2018). Most studies focus on the determination of traffic parameters and their distribution. This work, however, focuses on the use of AIS data to determine the relationship between traffic flow parameters, vessel dimensions and the width of the fairway.
The paper presents studies on traffic flow in Baltic Sea ports as part of research on a general mathematical model of vessel traffic streams. The calculations were performed partially with the mathematical software tool IWRAP MK2, recommended by IALA. The statistical analysis was carried out using Statistica 10 software.
2 METHOD
2.1 Analyzed area
In the study, straight waterway sections were taken into account. To create the model, fairways of different widths were chosen. The authors analyzed the movement of ships in the following areas (Tab. 1).
Table 1. Location of the analyzed waterways.
______________________________________________________________________
No.  Location                        D: width of the        D10m: width between
                                     dredged fairway [m]    10 m isobaths [m]
______________________________________________________________________
1    Swinoujscie                     170                    245
2    Police Raduń                    90                     132
3    Gdańsk Portowy Kanal            135                    165
4    Gdańsk-Martwa Wisła             45                     105
5    Gdynia Portowy Kanal            190                    194
6    Kołobrzeg                       50                     50
7    Kaliningrad Approach 338 berth  50                     150
______________________________________________________________________
Movements in the port are governed by port regulations, which can limit the maximum speed of a ship or prohibit certain activities in bad weather conditions. For that reason, only arrivals of ships with wind speed ≤ 10 m/s were taken into account in the studies; this also limits the applicability of the model. In the following data analysis, only data samples with incoming ships are studied. A vessel with an incoming direction is one that enters the waterway from the open sea.
2.2 Data
The research was conducted on the basis of AIS data obtained from the Polish Maritime Administration. Vessel traffic was analyzed using data from 2015 to 2017. General cargo (GC) vessels of length L ≥ 50 m were considered.
Raw AIS data were processed using the IWRAP MK2 application, in which statistical distribution functions can be derived from historical AIS data. The traffic patterns are illustrated in a density plot, which helps to identify the location of navigational routes (legs). By making a cross-section of a leg and creating a histogram for each direction, a mathematical representation using a number of probability functions is prepared.
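The cross-section step above can be illustrated with a minimal Python sketch. It is not the IWRAP MK2 algorithm: it assumes that AIS positions have already been projected to a local metric x/y frame, that the report line is given by a point on the fairway center line and the direction of the fairway axis, and that the ship moves in a straight line between consecutive reports. The function names are hypothetical.

```python
import math
from collections import Counter

def lateral_offset(p0, p1, center, axis):
    """Signed lateral distance [m] at which the segment p0 -> p1 crosses the report line.

    p0, p1 -- consecutive AIS positions (x, y) in a local metric frame (assumption)
    center -- point (x, y) on the fairway center line where the report line is placed
    axis   -- direction (dx, dy) of the fairway axis; the report line is perpendicular to it
    Positive offsets lie to the left of the axis direction. Returns None if the segment
    does not cross the report line.
    """
    norm = math.hypot(axis[0], axis[1])
    ux, uy = axis[0] / norm, axis[1] / norm        # unit vector along the fairway
    nx, ny = -uy, ux                               # unit normal pointing to the left
    s0 = (p0[0] - center[0]) * ux + (p0[1] - center[1]) * uy   # along-track coordinates
    s1 = (p1[0] - center[0]) * ux + (p1[1] - center[1]) * uy
    if s0 == s1 or s0 * s1 > 0:                    # segment does not reach the line
        return None
    t = s0 / (s0 - s1)                             # fraction of the segment before the line
    cx = p0[0] + t * (p1[0] - p0[0])
    cy = p0[1] + t * (p1[1] - p0[1])
    return (cx - center[0]) * nx + (cy - center[1]) * ny

def histogram(offsets, bin_width):
    """Number of crossings per bin, keyed by the lower edge of the bin [m]."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in offsets)
    return dict(sorted(counts.items()))
```

For example, a segment from (10, -5) to (10, 5) crossing a report line at the origin, with the fairway running along +y, yields -10.0, i.e. 10 m to the right of the center line.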
AIS data were filtered so that only ships going ahead were considered. For that reason, the next position of each vessel, 1 km ahead, was checked; if such a position was recorded, the ship was included in the database. This allowed only those vessels which actually moved in a given direction to be selected. Moored and circulating vessels were excluded.
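A possible implementation of this filter, combined with the wind and direction criteria of Section 2.1, is sketched below. The `Track` structure, its fields and the rule "at least one recorded position 1 km past the report line" are assumptions made for illustration; the axis is assumed to point in the direction of travel of incoming ships.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    """One passage of a ship (hypothetical structure)."""
    positions: list      # (x, y) in a local metric frame [m], in time order
    wind_speed: float    # wind speed during the passage [m/s]
    incoming: bool       # True if the ship enters the waterway from the open sea

def along_track(points, center, axis):
    """Distance [m] of each point along the fairway axis, measured from the report line."""
    norm = math.hypot(axis[0], axis[1])
    ux, uy = axis[0] / norm, axis[1] / norm
    return [(x - center[0]) * ux + (y - center[1]) * uy for x, y in points]

def keep_track(track, center, axis, look_ahead=1000.0, max_wind=10.0):
    """True if the passage satisfies the selection criteria: incoming ship,
    wind <= max_wind, a crossing of the report line in the axis direction and at
    least one recorded position look_ahead metres past the line (ship going ahead)."""
    if not track.incoming or track.wind_speed > max_wind:
        return False
    s = along_track(track.positions, center, axis)
    crossed = any(a < 0.0 <= b for a, b in zip(s, s[1:]))
    return crossed and max(s) >= look_ahead
```

A ship that stops (moors) or turns before reaching the look-ahead distance fails the last condition and is excluded.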
It should be added that all the considerations presented were carried out for one-way traffic. Ships were divided into groups with the same dimensions (length L and width B) and similar maneuvering characteristics; usually these were sister ships.
2.3 Method
This paper presents a method of determining the parameters of traffic flow on a straight waterway using a classic multiple regression model supported by an analysis of residuals. Hydrometeorological conditions and the maneuverability features of ships were not introduced into the model. Such assumptions obviously simplify the model considerably.
For each waterway, the center of the traffic lane was established. A crossing line (report line) perpendicular to the channel was selected to derive data on the behavior of ship traffic. For all sections, lateral distributions were determined by analyzing the ship crossings of the report lines. In the next step, the mean and standard deviation of the lateral distribution were determined for each section and each group of ships.
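The grouping and the lateral statistics can be sketched as follows, assuming the crossings of one report line are available as (L, B, lateral offset) tuples. Whether the study used the sample or the population standard deviation is not stated; the sketch uses the sample version (`statistics.stdev`), and the `min_count` threshold is a hypothetical parameter.

```python
import statistics
from collections import defaultdict

def lateral_stats(crossings, min_count=2):
    """crossings: iterable of (length_m, beam_m, lateral_offset_m) for one report line.
    Returns {(L, B): (mean, standard deviation, number of crossings)} for every ship
    group with at least min_count crossings."""
    groups = defaultdict(list)
    for length, beam, offset in crossings:
        groups[(length, beam)].append(offset)     # ships with identical dimensions
    result = {}
    for key, offsets in groups.items():
        if len(offsets) >= min_count:
            result[key] = (statistics.mean(offsets), statistics.stdev(offsets), len(offsets))
    return result
```

Each entry of the result then provides one observation (m, σ) together with the predictors L and B for the regression in Section 2.4.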
The aim of the study is to find a relation between traffic stream parameters and the width of the waterway. The multiple regression method was used to build models of the mean and standard deviation of the ship's distance from the center of the fairway. Once the position of the AIS antenna on board is taken into account, such models can be used to determine the probability of collision of the ship with hydrotechnical structures in the analyzed areas.
2.4 Multiple regression model
The model based on multiple regression describes the relationship between the dependent variable y and n independent variables x_1, ..., x_n as follows:

$$y = b_0 + b_1 x_1 + \dots + b_n x_n \tag{1}$$

where $b_0, \dots, b_n$ are the model coefficients.
The following parameters of vessel traffic flow were selected as dependent variables: the mean m and the standard deviation σ of the vessels' position in relation to the center of the track. It is assumed that the center of the track is located symmetrically in relation to the mean width of the dredged waterway D. The following independent variables were considered in the regression model:
width of the ship B [m],
length of the ship L [m],
width of the dredged fairway D [m],
width between 10 m isobaths D10m [m].
The basic problem in building multiple regression models is internal correlation between the independent variables. In the proposed model it is obvious and occurs between the length and width of ships. Although the correlation is very strong, the authors decided not to remove the ship's length as an independent variable, because it has a theoretical effect on the width of the traffic lane.
A very important independent variable in the model is the width of the fairway. The more difficult (the narrower) the navigation area, the more accurately the vessel is steered. The tolerance for errors is smaller and the probability of a collision increases. The freedom of maneuver choice is also reduced, and only some maneuvering methods are effective and safe. Having analyzed the fairway area and the draught of the ships, the authors decided to add the variable D10m, the width between the 10 m isobaths.
3 RESULTS
3.1 Parameter estimation
Models of the two dependent variables, the mean m and the standard deviation σ of the vessels' position in relation to the center of the track, at a certain level of significance can be defined as:

$$m = b_0 + b_B B + b_D D + b_{D_{10m}} D_{10m} + b_L L \tag{2}$$

$$\sigma = b_0 + b_B B + b_D D + b_{D_{10m}} D_{10m} + b_L L \tag{3}$$
Table 2 presents the multiple regression coefficients of the models obtained by the least squares method. In addition, the coefficient of determination R² is presented, which gives the percentage of the variation of the dependent variable explained by the model, together with the standard error of estimate s, interpreted as the average deviation of the dependent variable in the sample from the theoretical value. The significance of the regression models was tested by means of the F statistic.
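The quantities listed in Table 2 can be reproduced with ordinary least squares. The stdlib-only sketch below solves the normal equations X'X b = X'y for the model form of equations (2) and (3) and returns the coefficients, R², the standard error of estimate s and the F statistic. The data generator produces hypothetical values, not the AIS sample of the study; in practice a statistics package (the authors used Statistica 10) would be used instead.

```python
import random

def solve(a, rhs):
    """Solve the linear system a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]       # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk (model form of equations (2)-(3)).
    rows: list of predictor tuples (x1, ..., xk).
    Returns (coefficients [b0, b1, ..., bk], R^2, standard error of estimate s, F)."""
    X = [[1.0, *row] for row in rows]                     # intercept column first
    n, p = len(X), len(X[0])                              # p = k + 1 parameters
    xtx = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    coef = solve(xtx, xty)                                # normal equations X'X b = X'y
    fitted = [sum(b * v for b, v in zip(coef, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)                  # total sum of squares
    k = p - 1
    r2 = 1.0 - sse / sst
    s = (sse / (n - p)) ** 0.5                               # standard error of estimate
    f = ((sst - sse) / k) / (sse / (n - p)) if sse > 0 else float("inf")
    return coef, r2, s, f

def synthetic_sample(n=300, seed=1):
    """Hypothetical data shaped loosely like Section 2; NOT the AIS data of the study."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        b = rng.uniform(10, 25)                # beam B [m]
        l = 6.5 * b + rng.gauss(0, 10)         # length L [m], strongly correlated with B
        d = rng.uniform(40, 200)               # dredged width D [m]
        d10 = d + rng.uniform(0, 100)          # width between 10 m isobaths D10m [m]
        rows.append((b, l, d, d10))
        y.append(-20 - 0.8 * b - 0.04 * l - 0.12 * d + 0.47 * d10 + rng.gauss(0, 5))
    return rows, y

rows, y = synthetic_sample()
coef, r2, s, f = ols(rows, y)
print("b0, bB, bL, bD, bD10m =", [round(c, 3) for c in coef])
print(f"R^2 = {r2:.3f}, s = {s:.2f}, F = {f:.1f}")
```

Solving the normal equations directly is adequate for five parameters; for badly conditioned data a QR-based solver is the more robust choice.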
3.2 Model of variable m
Tab. 3 shows the parameters of the first model, in which the dependent variable is the mean of the vessels' position in relation to the center of the track. Estimating the parameters, we obtain the regression function:

$$m = -24.268 - 0.809\,B - 0.1165\,D + 0.474\,D_{10m} - 0.042\,L \tag{4}$$
The standard errors of the parameter estimates are small for the independent variables (≈ 0.03 for D and D10m; ≈ 0.38 for B; ≈ 0.06 for L) and acceptable for the intercept (≈ 4.00). The significance of the whole model is p < 0.0001.
To verify the statistical significance of individual variables, Student's t-test was performed. The test is designed to determine whether an explanatory variable has a significant effect on the dependent variable. In the model of the mean, only the variable L is not statistically significant (p = 0.45). However, it was kept in the model to capture changes in the mean position of vessels relative to the center of the track due to vessel length. The standard error of estimate equals s = 14.067 m. This means that the predicted values of the dependent variable differ from the empirical values on average by about 14.067 m. Equation (4) can therefore be written as:

$$m = -24.268 - 0.809\,B - 0.1165\,D + 0.474\,D_{10m} - 0.042\,L \pm 14.067 \tag{5}$$
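The t statistics in Table 3 follow from t = b / SE(b). The sketch recomputes them from the tabulated values. The p-values use a normal approximation to the t distribution with 307 degrees of freedom (an assumption), so they differ slightly from Statistica's exact values but lead to the same decisions at α = 0.05.

```python
from statistics import NormalDist

# (b, standard error of b) copied from Table 3 (model of the mean m)
TABLE3 = {
    "Intercept": (-24.2679, 4.005164),
    "L": (-0.0420, 0.055484),
    "B": (-0.8088, 0.382290),
    "D10m": (0.4741, 0.032456),
    "D": (-0.1165, 0.032013),
}

def t_test(b, se, alpha=0.05):
    """t = b / SE(b), an approximate two-sided p-value and the decision p < alpha.
    Assumption: with 307 degrees of freedom the t distribution is close to the
    standard normal, so NormalDist replaces the exact t distribution."""
    t = b / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p, p < alpha

for name, (b, se) in TABLE3.items():
    t, p, significant = t_test(b, se)
    print(f"{name:9s} t = {t:8.3f}  p ~ {p:.4f}  significant: {significant}")
```

Only L comes out as not significant, in agreement with the text.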
Table 2. Coefficients of the multiple regression models.
__________________________________________________________________________________________________
Dependent  b0        bB       bL       bD       bD10m    R²      s [m]    Significance of
variable                                                                  regression, p
__________________________________________________________________________________________________
m          -24.2679  -0.8088  -0.0420  -0.1165   0.4741  0.6433  14.067   F = 138.46, p = 0.000
σ           12.4513  -0.2706  -0.0112   0.1264  -0.0435  0.5346   4.1523  F = 88.156, p = 0.000
__________________________________________________________________________________________________
Table 3. Regression summary for dependent variable: Mean m
__________________________________________________________________________________________________
R = 0.80209949, R² = 0.64336358, Adjusted R² = 0.63871686, F(4,307) = 138.46, p < 0.0000, Std. error of estimate: 14.067
             b*          Std.Err. of b*   b          Std.Err. of b   t(307)     p-value
__________________________________________________________________________________________________
Intercept                                 -24.2679   4.005164        -6.05915   0.000000
L [m]        -0.074105   0.097887         -0.0420    0.055484        -0.75705   0.449600
B [m]        -0.208607   0.098607         -0.8088    0.382290        -2.11555   0.035188
D10m [m]      0.943881   0.064613          0.4741    0.032456        14.60825   0.000000
D [m]        -0.242209   0.066555         -0.1165    0.032013        -3.63925   0.000321
__________________________________________________________________________________________________
Table 4. Regression summary for dependent variable: Std. Dev. σ
__________________________________________________________________________________________________
R = 0.73115299, R² = 0.53458470, Adjusted R² = 0.52852065, F(4,307) = 88.156, p < 0.0000, Std. error of estimate: 4.1523
             b*          Std.Err. of b*   b          Std.Err. of b   t(307)     p-value
__________________________________________________________________________________________________
Intercept                                  12.45138  1.182242        10.53200   0.000000
L [m]        -0.076701   0.111823         -0.01123   0.016378        -0.68591   0.493285
B [m]        -0.270152   0.112645         -0.27063   0.112844        -2.39825   0.017070
D [m]         1.016896   0.076030          0.12639   0.009450        13.37489   0.000000
D10m [m]     -0.335100   0.073812         -0.04349   0.009580        -4.53992   0.000008
__________________________________________________________________________________________________
3.3 Model of variable σ
For the second model, in which the dependent variable is the standard deviation of the vessels' distance from the center, the regression function obtained is as follows:

$$\sigma = 12.451 - 0.271\,B + 0.126\,D - 0.043\,D_{10m} - 0.011\,L \tag{6}$$
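Equations (4) and (6) can be evaluated directly for a given ship and waterway. The sketch uses their coefficients, with signs as in Tables 3 and 4, and the standard errors of estimate from those tables. The vessel dimensions are hypothetical, D and D10m are the Swinoujscie values from Table 1, and the result is only meaningful within the range of data used to fit the models.

```python
def mean_offset(B, L, D, D10m):
    """Equation (4): mean m of the ship's distance from the track center [m]."""
    return -24.268 - 0.809 * B - 0.1165 * D + 0.474 * D10m - 0.042 * L

def std_offset(B, L, D, D10m):
    """Equation (6): standard deviation sigma of the ship's distance from the track center [m]."""
    return 12.451 - 0.271 * B + 0.126 * D - 0.043 * D10m - 0.011 * L

S_MEAN, S_STD = 14.067, 4.1523   # standard errors of estimate (Tables 3 and 4)

# Hypothetical general cargo vessel (B = 20 m, L = 150 m) in the Swinoujscie fairway,
# with D = 170 m and D10m = 245 m taken from Table 1
m = mean_offset(B=20.0, L=150.0, D=170.0, D10m=245.0)
sigma = std_offset(B=20.0, L=150.0, D=170.0, D10m=245.0)
print(f"m = {m:.1f} m (+/- {S_MEAN} m), sigma = {sigma:.1f} m (+/- {S_STD} m)")
```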
Tab. 4 shows the results for the model of the standard deviation. As in the first model, the standard errors of the parameter estimates are small for the independent variables (≈ 0.01 for D and D10m; ≈ 0.11 for B; ≈ 0.02 for L) and acceptable for the intercept (≈ 1.18). The significance of the whole model is p < 0.0001. Again, the variable L has no significant effect on the model (p = 0.49). The standard error of estimate equals s = 4.1523 m. This means that the predicted values of the dependent variable differ from the empirical values on average by about 4.15 m. Equation (6) can therefore be written as:

$$\sigma = 12.451 - 0.271\,B + 0.126\,D - 0.043\,D_{10m} - 0.011\,L \pm 4.15 \tag{7}$$
3.4 Verification of the model
The statistical validity of the model was tested using several indicators. The first is the coefficient of determination R². For the first model R² = 0.6434 (adjusted R² = 0.6387), which means that the model explains about 64% of the variability of the response data around its mean. The second model explains about 53% (R² = 0.5346, adjusted R² = 0.5285). The coefficient of determination is satisfactory according to the accepted interpretation, which encourages further research on the topic. It may be relatively low because the model actually predicts navigator behavior, and humans are harder to predict than, for example, physical processes. It should be remembered that a very high value of R² is not always achievable. The aim of evaluating the existing model is not to obtain the highest possible R², but to determine the relationship between the considered variables and to obtain reliable parameter estimates.
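The adjusted R² values quoted above and the F statistics of Tables 3 and 4 can be recomputed from R². The sample size n = 312 is inferred from F(4, 307), i.e. k = 4 predictors and n - k - 1 = 307 residual degrees of freedom; it is not stated explicitly in the paper.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F statistic of a regression with an intercept, from R^2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

N, K = 312, 4   # inferred from F(4, 307)
for name, r2 in (("mean m", 0.64336358), ("std sigma", 0.53458470)):
    print(name, round(adjusted_r2(r2, N, K), 4), round(f_statistic(r2, N, K), 2))
```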
To obtain a reasonably correct regression model, the residual values must always be analyzed after the model has been estimated and verified. According to [1], the analysis of residuals should begin with the most important matter, i.e. checking the assumptions of the classical least squares method. This is because a correctly constructed model is characterized by certain desirable properties of the residuals (such as normality, constant variance and lack of autocorrelation).
3.5 Normality of residual values
To check the normality of the regression residuals, normal probability plots of the residuals were created (Fig. 1, Fig. 2). They enable a visual examination of whether the residuals comply with the normal distribution: if the points lie along the straight line, the normality of the residual distribution is confirmed. Some objection may concern the first and last observations, which lie a little off the line, but this deviation does not significantly affect the normality of the residual values. The histograms of residuals give the same information (Fig. 3, Fig. 4). The situation is good, because the normal curve (red line on the graph) crosses the centers of the upper edges of the columns (especially for the second model).
[Figure: Normal Probability Plot of Residuals; dependent variable: Mean; x-axis: Residuals; y-axis: Expected Normal Value]
Figure 1. Normality graph of residual values for “Mean”
[Figure: Normal Probability Plot of Residuals; dependent variable: Std.Dev.; x-axis: Residuals; y-axis: Expected Normal Value]
Figure 2. Normality graph of residual values for “Standard deviation”
[Figure: Distribution of raw residuals with expected normal curve; dependent variable: Mean; x-axis: residuals; y-axis: No. of obs.]
Figure 3. Histogram of residuals for “Mean”
[Figure: Distribution of raw residuals with expected normal curve; dependent variable: Std.dev.; x-axis: residuals; y-axis: No. of obs.]
Figure 4. Histogram of residuals for “Standard deviation”
3.6 Autocorrelation of the residual values
The assumption of no autocorrelation was not verified because the observations are not ordered.
3.7 The randomness test
The randomness test is designed to examine the correctness of the analytical form of the model. This can be done by means of both a visual assessment of the distribution of residuals and statistical tests; in this paper the authors decided to use the first method. If the residuals of the model fulfill the assumption of randomness, then in a graph of the residuals against observed values (for both the explanatory variables and the explained variable) they should be arranged at random and should not show any regularity (e.g. long series of positive and negative residuals).
Fig. 5 and Fig. 6 show the residuals of the model plotted against the observed (empirical) values of the explained variable. The residuals are distributed irregularly, so we can assume that the assumption of randomness is fulfilled. It can be noticed that for the mean m there are no observations in the range 30 to 50 m. This result needs to be studied in further research. The same effect can be seen in Fig. 7.
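The paper relies on visual assessment; one statistical alternative is the Wald-Wolfowitz runs test on the signs of the residuals. The sketch below is not part of the authors' method. It assumes the residuals have already been ordered by the variable on the horizontal axis (e.g. the observed values in Fig. 5) and uses the normal approximation for the number of runs, which is reasonable when both sign counts are not small.

```python
from statistics import NormalDist

def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of residuals taken in the given order.

    Zero residuals are dropped. Returns (number of runs, expected number of runs,
    z statistic, two-sided p-value from the normal approximation). Too few runs
    suggests long series of residuals with the same sign, i.e. a systematic pattern.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)                  # positive residuals
    n2 = len(signs) - n1             # negative residuals
    if n1 == 0 or n2 == 0 or n1 + n2 < 3:
        raise ValueError("both positive and negative residuals are required")
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mu) / var ** 0.5
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return runs, mu, z, p
```

Both perfectly alternating signs (too many runs) and two long blocks of signs (too few runs) are rejected as non-random.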
[Figure: Observed Values vs. Residuals with 0.95 confidence interval; dependent variable: Mean; x-axis: Observed Values; y-axis: Residuals]
Figure 5. Residuals distribution in relation to observed values for variable “Mean”
[Figure: Observed Values vs. Residuals with 0.95 confidence interval; dependent variable: Std. Dev.; x-axis: Observed Values; y-axis: Residuals]
Figure 6. Residuals distribution in relation to observed values for variable “Std.Dev.”
3.8 Stability of the variance of residual values
The next desirable property of the residual values is the homoscedasticity of the random component. For this purpose, a visual evaluation of the distribution of residuals in relation to the predicted (theoretical) values was applied. The distribution of points on the scatterplots of residuals against predicted values (Fig. 7, Fig. 8) does not categorically confirm the homoscedasticity of the variance of the random component, so further tests should be carried out (e.g. the Goldfeld-Quandt test). The existence of heteroscedasticity does not always mean a bad choice of model or poor quality of statistical data. For that reason, the model was not modified at this stage of the research.
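A minimal sketch of the Goldfeld-Quandt test is given below. For brevity it uses a single explanatory variable (for example the predicted values); for the four-variable model each group would be fitted with the full regression and the degrees of freedom would be n_g - 5. The 20% central drop is a common but arbitrary choice, and the critical F value has to be taken from tables or statistical software.

```python
def _rss_simple(xs, ys):
    """Residual sum of squares of the least-squares line y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def goldfeld_quandt(xs, ys, drop_fraction=0.2):
    """Goldfeld-Quandt statistic for one explanatory variable.

    Observations are ordered by x, the central drop_fraction is omitted, a separate
    line is fitted to the low and high groups, and
    F = (RSS_high / df) / (RSS_low / df) is returned with its degrees of freedom.
    A large F suggests that the residual variance grows with x.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    g = (n - int(n * drop_fraction)) // 2          # size of each outer group
    low, high = pairs[:g], pairs[n - g:]
    rss_low = _rss_simple([p[0] for p in low], [p[1] for p in low])
    rss_high = _rss_simple([p[0] for p in high], [p[1] for p in high])
    df = g - 2                                     # two parameters (a, b) per group
    return (rss_high / df) / (rss_low / df), df, df
```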
[Figure: Predicted vs. Residual Scores with 0.95 confidence interval; dependent variable: Mean; x-axis: Predicted Values; y-axis: Residuals]
Figure 7. Residuals distribution in relation to predicted values for variable “Mean”.
[Figure: Predicted vs. Residual Scores with 0.95 confidence interval; dependent variable: Std.Dev.; x-axis: Predicted Values; y-axis: Residuals]
Figure 8. Residuals distribution in relation to predicted values for variable “Std.Dev.”
3.9 Atypical observations in regression analysis
After fitting the regression equation to the observation results, it is always necessary to analyze the predicted values and residuals. In regression analysis, it is important that the model is not determined excessively by individual observations with values significantly different from those typical for a given sample. Such deviating values can significantly disturb the calculation results and lead to incorrect conclusions, and sometimes such an observation has to be deleted to prevent this. However, observations that do not match the model may also indicate deficiencies in the model or a bad algebraic form of the model.
To detect such outliers, graphs of residuals in relation to deleted residuals were generated (Fig. 9 and Fig. 10). No clearly outlying observation can be seen. Some observations could be removed after statistical analysis and identification of the source of this effect; in the presented models, however, no observations have been removed. In addition, it was noticed that removing a sample of outliers did not have a significant impact on the quality of the examined models.
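Deleted residuals do not require refitting the model n times: the deleted residual of observation i equals its ordinary residual divided by 1 - h_ii, where h_ii is the leverage of that observation. The sketch shows this for a straight line, where the leverage has a closed form; it illustrates the identity and is not the Statistica implementation.

```python
def deleted_residuals(xs, ys):
    """Deleted residuals e_i / (1 - h_ii) for the least-squares line y = a + b x.

    Each value equals y_i minus the prediction of a line fitted without observation i.
    For one explanatory variable the leverage is h_ii = 1/n + (x_i - mean)^2 / Sxx;
    in the four-variable model h_ii is the i-th diagonal element of X (X'X)^-1 X'.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    out = []
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - mx) ** 2 / sxx          # leverage of this observation
        out.append((y - a - b * x) / (1.0 - h))
    return out
```

Observations with high leverage (here the point far from the others on the x axis) have deleted residuals noticeably larger than their ordinary residuals, which is what Fig. 9 and Fig. 10 help to reveal.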
[Figure: Residuals vs. Deleted Residuals with 0.95 confidence interval; dependent variable: Mean; x-axis: Residuals; y-axis: Deleted residuals]
Figure 9. Residuals distribution in relation to deleted residuals for variable “Mean”.
[Figure: Residuals vs. Deleted Residuals with 0.95 confidence interval; dependent variable: Std.Dev.; x-axis: Residuals; y-axis: Deleted Residuals]
Figure 10. Residuals distribution in relation to deleted residuals for variable “Std.dev.”.
3.10 Prediction based on the regression model
When the regression model is built, the possibility of predicting variable values is taken into account, i.e. what values the dependent variable will take for different values of the independent variables. The final stage of regression analysis is to use the verified regression model to predict the dependent variable, which can be represented graphically with a scatterplot. Figure 11 shows the observed and predicted values of the mean of the vessel's distance from the center, with a 95% prediction interval; the limits of the prediction interval are shown with a dashed line. Fig. 12 shows the observed and predicted values of the standard deviation of the ship's distance from the center of the track on straight sections. The decrease of variance with increasing mean, as well as the increase of variance with increasing standard deviation, should be noticed.
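A prediction interval is wider than the residual band ± z·s, because it also accounts for the uncertainty of the fitted coefficients, and it widens away from the center of the data. The sketch shows this for one explanatory variable; it replaces the t quantile by the normal quantile, an approximation that is acceptable for large samples such as the 307 residual degrees of freedom here.

```python
from statistics import NormalDist

def prediction_interval(xs, ys, x0, level=0.95):
    """Approximate prediction interval (lower, prediction, upper) for a new observation
    at x0 on the least-squares line y = a + b x. Half-width:
    z * s * sqrt(1 + 1/n + (x0 - mean)^2 / Sxx), with the normal quantile z used in
    place of the t quantile (large-sample assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    s = (sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * s * (1.0 + 1.0 / n + (x0 - mx) ** 2 / sxx) ** 0.5
    y0 = a + b * x0
    return y0 - half, y0, y0 + half
```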
[Figure: Predicted vs. Observed Values with 0.95 prediction interval; dependent variable: Mean; x-axis: Predicted Values; y-axis: Observed Values]
Figure 11. Comparison of the predicted values of the mean of the ship's distance from the center of the track obtained with the multiple regression model and the observed values in straight sections.
[Figure: Predicted vs. Observed Values with 0.95 prediction interval; dependent variable: Std.Dev.; x-axis: Predicted Values; y-axis: Observed Values]
Figure 12. Comparison of the predicted values of the standard deviation of the ship's distance from the center of the track obtained with the multiple regression model and the observed values in straight sections.
4 CONCLUSIONS
Multiple regression is used in prediction, i.e. in determining future values of a dependent variable on the basis of the regression equation. The independent variables B, D and D10m had a significant impact on the models (L was kept although it was not statistically significant), which confirmed the assumptions that with an increase in vessel width B and length L, the mean and standard deviation of the vessel's position in relation to the center of the track decrease, whereas the larger the available width of the water area, the larger these values become. The aim of the studies was to build a model which describes the above-mentioned dependencies in detail.
Although the AIS information used to build the model covered the whole range of the variables L and B, the variables m and σ showed gaps in data continuity. The histogram shows that values of the mean and standard deviation of the position of vessels in relation to the center of the track in the range from 30 to 50 m practically do not occur. It is probably necessary to take a larger sample, covering other waterway widths D and D10m. The lack of data can also be seen in the residual plots. Therefore, further analyses should be carried out taking other fairways into account.
The presented models are based on AIS data, so the position of the vessel in relation to the center of the track refers to the position of the antenna. By taking into account the positions of the ship's starboard and port extremities and the drift angle, it will be possible to build a model that determines the mean width of the safe maneuvering area of the ship. However, the regression models built already make it possible to forecast the parameters of vessel traffic flow in port areas. Further research should also take into account weather conditions and analyze the accuracy of the position obtained from the AIS system.
It is planned to build a model taking into account all relevant factors (including hydrometeorological conditions and the maneuverability of the ship). Further work will also focus on the construction of regression models for different types of waterways, such as port entrances and bends.
BIBLIOGRAPHY
[1] Gucma L. (2005), Modelowanie czynników ryzyka zderzenia jednostek pływających z konstrukcjami portowymi i pełnomorskimi [Modelling the risk factors of collisions of vessels with port and offshore structures]. Wydawnictwo Naukowe Akademii Morskiej w Szczecinie.
[2] Li W.-F., Mei B., Shi G.-Y. (2018), Automatic recognition of marine traffic flow regions based on kernel density estimation. Journal of Marine Science and Technology 26, pp. 84–91.
[3] Stanisz A. (2007), Przystępny kurs statystyki [An accessible course in statistics]. StatSoft Polska, Kraków.
[4] Yip T.L. (2013), A marine traffic flow model. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation 7(1), pp. 109–113.
[5] Zhang Z., Yin J., Wang N., Hui Z. (2018), Vessel traffic flow analysis and prediction by an improved PSO-BP mechanism based on AIS data. Evolving Systems. https://doi.org/10.1007/s12530-018-9243-y