International Journal
on Marine Navigation
and Safety of Sea Transportation
Volume 6
Number 2
June 2012
181
1 INTRODUCTION
In recent years, data mining has caused great con-
cern to the industry. The so-called mining technolo-
gy is to present large amounts of data by digging in-
to useful information and knowledge, which applies
to business management, production control, market
analysis, engineering design and scientific explora-
tion and other fields. This is the data mining tech-
nology on information processing in navigation, the
use of data mining association rules FP-TREE algo-
rithm, mining the large number of ships AIS infor-
mation, access to the ship's route similarity analysis.
1.1 FP-TREE algorithm [1]
Data mining technology is the result of the devel-
opment of information technology, since the 60
years since the last century, databases, and infor-
mation technology has systematically evolved from
the original document processing to complex and
powerful database system, and now the technology
is quite mature. Data mining is the definitions of raw
data from a large number of extracted or “dig out"
the useful information into knowledge.
In the knowledge model of data mining, associa-
tion rules model is the more important one. The con-
cept of association rules by the Agrawal, Imielinski,
Swami suggested that the data in a simple but very
useful rule. Association rules models are descriptive
models, association rules discovery algorithm is un-
supervised learning.
FP-TREE [1][2][3] is one of association rule min-
ing algorithm, which uses divide and conquer strate-
gy, after the first pass scan, the frequency of the da-
tabase into a set of compressed frequent pattern tree
(FP-TREE), while retaining the associated infor-
mation, then FP-TREE library to differentiate into a
number of conditions, the length of each library, and
a frequency of 1 set of related libraries of these con-
ditions were re-excavation.
Applied Research of Route Similarity Analysis
Based on Association Rules
Z. Xiang, R. Liu, Q. Hu & C. Shi
Merchant Marine College, Shanghai Maritime University, China
ABSTRACT: In recent years, with the development of information technology, businesses have accumulated
a lot of useful historical data, as the shipping industry does. These data can be found deposited a large number
of "knowledge", for example, Shipping records for historical information, Ship-Port relations information,
Ship-ship relations information, Port & shipping route relations, Shipping route information. It can provide
intellectual support to shipping informatization development.
Association rules in data mining technology is one of important technologies. The technology, based on sta-
tistical methods, can mine the associated and implied "knowledge" from data warehouse ,which has a large
number of accumulated data. Apart from this, the technology can also play an important role in the prediction.
In this paper, based on FP-growth algorithm, we improve it forming Relevent ships routes.
From the prevalent perspective of data mining, deal with the corresponding vessels' dynamic information, ac-
quired from the AIS, such as data collection, data statistics. On this basis, get the ship-port relation and ship-
ship relation after a certain level of data analysis, processing, handing. Furthermore, this paper use the numer-
ous historical ship-port relation and ship-ship relation to build a mathematical model on the ship-port and
ship-ship relation. And use the improved association algorithm, FP-growth algorithm, to acquire the strong
association rules between ship-port and ship-ship, and eventually mine the similarity of the ship route.
Main points of this paper as follows:
Collect ,count and check the data, which is from ship dynamic information;
Establish the mathematical model between ship-port and ship-ship relation;
Improve the algorithm;
Analyse the similarity of ship route more accurately using the improved algorithm.
182
1.2 AIS information [4]
Automatic Identification System (AIS) is the current
advanced ship-aided navigation equipment; the In-
ternational Maritime Organization has adopted the
mandatory installation of AIS requirements. AIS can
automatically send the ship a static continuous, dy-
namic and voyage information, security, short mes-
sage, but can also automatically receive the infor-
mation sent around the ship, and exchange
information with the coast station.
AIS information includes static information such
as name, call sign, MMSI, ship type, ship size and
other information, and dynamic information such as
vessel position, ground speed, navigational status,
draft, destination, estimated time of arrival and other
information, but also with the voyage information
and security-related text messages.
1.3 AIS information on data mining [5][6]
In this paper, using association rules such as FP-
TREE algorithms, we deal with the corresponding
vessels' dynamic information, such as estimated time
of arrival, port of destination and departure infor-
mation, acquired from the AIS, such as data collec-
tion, data statistics. On this basis, get the ship-port
relation and ship-ship relation after a certain level of
data analysis, processing, handing to build a mathe-
matical model on the ship-port and ship-ship relation
to acquire the strong association rules between ship-
port and ship-ship, and eventually mine the similari-
ty of the ship route.
2 THE PROPOSAL OF THIS PAPER
At first, we used to provide users dynamic-related
information services with AIS ship data, such as re-
al-time latitude and longitude of the ship, speed, ar-
rival time, etc. Also including the port border query-
ing. But we are lack of a more intelligent data
mining reports service, for example, previous leaves
harbor of the ships, the ships’ similar route. These
are the users concerned, and it is very useful.
Therefore, in order to provide such services, we
need to mine existing data to find out the regular
pattern between ships and their anchored port and
also we need to find out ships’ similar routes(the
similar route among ships).The conclusion we mine
from AIS data can provide information on the effect
of enhanced information services . We collected
ships infomation through the www.manyships.com
dynamic data, and similarly, the Chinese port border
has also been part of we collect. It is easy for us to
find the ship's ports and similar routes and provide
us a lot of supports for data ming.
The most important task of this paper is: data min-
ing, to identify routes of ships similar to the (first
find was anchored in this port).
3 PREPARATION FOR DATA MING
3.1 Data collection
This article uses real-time AIS data from
www.manyships.com. AIS database established a
table for each ships, with the AIS data updating con-
tinuously the database update this table as table 1.
The number of table is Real-time. So, the first
step we need to do is to scheduler an interval job to
collect data using database snapshots[4]. With this
methods,we can collect one week or one month even
more AIS data[5] . To say that, for accurcy re-
sults,job’s interval can not be too long.
3.2 Data processing and Summary
After collecting the data,we must immediately calcu-
late whether the ship is in the ports and also calcu-
late its last port .
Finally we proceed the AIS data for each ships. The-
se data included the ship's number(mmsi), name,
port of arrival, arrival times, the last port, the last
departure time, callsign, type, so as follow. These
data be prepared for data mining as table 2.
Each port has its own id, so ‘fid’ in the picture
means the port’s id , ’prevfid’ means the last port’s
id. With these data we can be ming AIS data.
Table 1. Original datasets snapshot
183
Table 2. Datasets snapshot after procceed
4 MINING THE SIMILAR PATH AND THE
RELEVENT PORT
In order to retrieve the relative path between the
ship, we use association rule discovery methods to
determine the correlation between the ports and the
ships.
The central idea of this method is: Calculate the
ship reaches port summary statistics. Use summary
information as a transaction set; The port each ship
reached (or leave) is the association rules item. Each
transaction is a single arrangement of the ship reach-
es port. Aim of the algorithm is digging out the rele-
vant port, and getting the ships' similar paths.
4.1 Algorithm is defined as follows[1][3]:
(1).
}......,{
21 m
IIII =
is item set.
(2). Let
D
transaction sets, each transaction
T
is
.
(3). Each transaction has a unique identifier TID. In
addition, add a field to indicate the transaction from
the ship (the number of ship).
For an example of Transaction sets is as table 3:
Table 3 example of Transaction sets
___________________________________________________
TID PORTID Mmsi
___________________________________________________
1 200 , 47 , 48, 203 , 56 , 207 , 88 , 1 412402820
2 47 , 21, 48, 200, 66 , 71 412403000
3 21, 200 , 42 , 59 , 88 , 1 412403660
4 21, 48 , 49 , 88 , 71, 6 412410010
5 47 ,200 , 48 , 62 , 66 , 51 , 71, 1 412429760
___________________________________________________
Each port has an unique identity number ,so as
ships. In this paper, Example uses the portid and
mmsi to instead of the specific ports and ships.
4.2 Algorithm Description:
We intend to make a FP-tree for the transaction sets,
the tree node is a transaction set, namely the port
(except root node). Edge of the FP-tree is ships’ col-
lection, on behalf of the ship go across the two ports.
We use ‘Mmsi’ instead of ship’s name.(mmsi is the
unique identity of the ships in database).
The structure of Node data as table 4 :
Table 4 The structure of Node
___________________________________________________
Parent Portid Support Child
___________________________________________________
The structure of Node data contains the number
of ports, its support count, its child node, its parent
node.
The structure of Edge data is as table 5:
Table 5 The structure of Edge
___________________________________________________
PortID1 Mmsi[] mmsi Support[] PortID2
___________________________________________________
Edge data structure as shown above, Mmsi [] ex-
pressed as mmsi array, a collection of both ships
Mmsi number. Similarly, the corresponding number
of ships mmsi also have the corresponding support
count. PortID1 and PortID2 mean this edge belongs
to which two port nodes.
4.3 Steps of the algorithm
According to the minimum support, we use the
transaction sets generated a FP-tree by FP-growth
algorithm.
1 [1][2] scan transaction set D, in order to acquire
all the frequent items contained in D ,we named
the collection of these items F,and also caculate
their respective support. Frequent items in de-