International Journal
on Marine Navigation
and Safety of Sea Transportation
Volume 6
Number 2
June 2012
181
1 INTRODUCTION
In recent years, data mining has caused great con-
cern to the industry. The so-called mining technolo-
gy is to present large amounts of data by digging in-
to useful information and knowledge, which applies
to business management, production control, market
analysis, engineering design and scientific explora-
tion and other fields. This is the data mining tech-
nology on information processing in navigation, the
use of data mining association rules FP-TREE algo-
rithm, mining the large number of ships AIS infor-
mation, access to the ship's route similarity analysis.
1.1 FP-TREE algorithm [1]
Data mining technology is the result of the devel-
opment of information technology, since the 60
years since the last century, databases, and infor-
mation technology has systematically evolved from
the original document processing to complex and
powerful database system, and now the technology
is quite mature. Data mining is the definitions of raw
data from a large number of extracted or “dig out"
the useful information into knowledge.
In the knowledge model of data mining, associa-
tion rules model is the more important one. The con-
cept of association rules by the Agrawal, Imielinski,
Swami suggested that the data in a simple but very
useful rule. Association rules models are descriptive
models, association rules discovery algorithm is un-
supervised learning.
FP-TREE [1][2][3] is one of association rule min-
ing algorithm, which uses divide and conquer strate-
gy, after the first pass scan, the frequency of the da-
tabase into a set of compressed frequent pattern tree
(FP-TREE), while retaining the associated infor-
mation, then FP-TREE library to differentiate into a
number of conditions, the length of each library, and
a frequency of 1 set of related libraries of these con-
ditions were re-excavation.
Applied Research of Route Similarity Analysis
Based on Association Rules
Z. Xiang, R. Liu, Q. Hu & C. Shi
Merchant Marine College, Shanghai Maritime University, China
ABSTRACT: In recent years, with the development of information technology, businesses have accumulated
a lot of useful historical data, as the shipping industry does. These data can be found deposited a large number
of "knowledge", for example, Shipping records for historical information, Ship-Port relations information,
Ship-ship relations information, Port & shipping route relations, Shipping route information. It can provide
intellectual support to shipping informatization development.
Association rules in data mining technology is one of important technologies. The technology, based on sta-
tistical methods, can mine the associated and implied "knowledge" from data warehouse ,which has a large
number of accumulated data. Apart from this, the technology can also play an important role in the prediction.
In this paper, based on FP-growth algorithm, we improve it forming Relevent ships routes.
From the prevalent perspective of data mining, deal with the corresponding vessels' dynamic information, ac-
quired from the AIS, such as data collection, data statistics. On this basis, get the ship-port relation and ship-
ship relation after a certain level of data analysis, processing, handing. Furthermore, this paper use the numer-
ous historical ship-port relation and ship-ship relation to build a mathematical model on the ship-port and
ship-ship relation. And use the improved association algorithm, FP-growth algorithm, to acquire the strong
association rules between ship-port and ship-ship, and eventually mine the similarity of the ship route.
Main points of this paper as follows:
Collect ,count and check the data, which is from ship dynamic information;
Establish the mathematical model between ship-port and ship-ship relation;
Improve the algorithm;
Analyse the similarity of ship route more accurately using the improved algorithm.
182
1.2 AIS information [4]
Automatic Identification System (AIS) is the current
advanced ship-aided navigation equipment; the In-
ternational Maritime Organization has adopted the
mandatory installation of AIS requirements. AIS can
automatically send the ship a static continuous, dy-
namic and voyage information, security, short mes-
sage, but can also automatically receive the infor-
mation sent around the ship, and exchange
information with the coast station.
AIS information includes static information such
as name, call sign, MMSI, ship type, ship size and
other information, and dynamic information such as
vessel position, ground speed, navigational status,
draft, destination, estimated time of arrival and other
information, but also with the voyage information
and security-related text messages.
1.3 AIS information on data mining [5][6]
In this paper, using association rules such as FP-
TREE algorithms, we deal with the corresponding
vessels' dynamic information, such as estimated time
of arrival, port of destination and departure infor-
mation, acquired from the AIS, such as data collec-
tion, data statistics. On this basis, get the ship-port
relation and ship-ship relation after a certain level of
data analysis, processing, handing to build a mathe-
matical model on the ship-port and ship-ship relation
to acquire the strong association rules between ship-
port and ship-ship, and eventually mine the similari-
ty of the ship route.
2 THE PROPOSAL OF THIS PAPER
At first, we used to provide users dynamic-related
information services with AIS ship data, such as re-
al-time latitude and longitude of the ship, speed, ar-
rival time, etc. Also including the port border query-
ing. But we are lack of a more intelligent data
mining reports service, for example, previous leaves
harbor of the ships, the ships’ similar route. These
are the users concerned, and it is very useful.
Therefore, in order to provide such services, we
need to mine existing data to find out the regular
pattern between ships and their anchored port and
also we need to find out ships’ similar routes(the
similar route among ships).The conclusion we mine
from AIS data can provide information on the effect
of enhanced information services . We collected
ships infomation through the www.manyships.com
dynamic data, and similarly, the Chinese port border
has also been part of we collect. It is easy for us to
find the ship's ports and similar routes and provide
us a lot of supports for data ming.
The most important task of this paper is: data min-
ing, to identify routes of ships similar to the (first
find was anchored in this port).
3 PREPARATION FOR DATA MING
3.1 Data collection
This article uses real-time AIS data from
www.manyships.com. AIS database established a
table for each ships, with the AIS data updating con-
tinuously the database update this table as table 1.
The number of table is Real-time. So, the first
step we need to do is to scheduler an interval job to
collect data using database snapshots[4]. With this
methods,we can collect one week or one month even
more AIS data[5] . To say that, for accurcy re-
sults,job’s interval can not be too long.
3.2 Data processing and Summary
After collecting the data,we must immediately calcu-
late whether the ship is in the ports and also calcu-
late its last port .
Finally we proceed the AIS data for each ships. The-
se data included the ship's number(mmsi), name,
port of arrival, arrival times, the last port, the last
departure time, callsign, type, so as follow. These
data be prepared for data mining as table 2.
Each port has its own id, so ‘fid’ in the picture
means the port’s id , ’prevfid’ means the last port’s
id. With these data we can be ming AIS data.
Table 1. Original datasets snapshot
183
Table 2. Datasets snapshot after procceed
4 MINING THE SIMILAR PATH AND THE
RELEVENT PORT
In order to retrieve the relative path between the
ship, we use association rule discovery methods to
determine the correlation between the ports and the
ships.
The central idea of this method is: Calculate the
ship reaches port summary statistics. Use summary
information as a transaction set; The port each ship
reached (or leave) is the association rules item. Each
transaction is a single arrangement of the ship reach-
es port. Aim of the algorithm is digging out the rele-
vant port, and getting the ships' similar paths.
4.1 Algorithm is defined as follows[1][3]:
(1).
}......,{
21 m
IIII =
is item set.
(2). Let
D
transaction sets, each transaction
T
is
.
(3). Each transaction has a unique identifier TID. In
addition, add a field to indicate the transaction from
the ship (the number of ship).
For an example of Transaction sets is as table 3:
Table 3 example of Transaction sets
___________________________________________________
TID PORTID Mmsi
___________________________________________________
1 200 , 47 , 48, 203 , 56 , 207 , 88 , 1 412402820
2 47 , 21, 48, 200, 66 , 71 412403000
3 21, 200 , 42 , 59 , 88 , 1 412403660
4 21, 48 , 49 , 88 , 71, 6 412410010
5 47 ,200 , 48 , 62 , 66 , 51 , 71, 1 412429760
___________________________________________________
Each port has an unique identity number ,so as
ships. In this paper, Example uses the portid and
mmsi to instead of the specific ports and ships.
4.2 Algorithm Description:
We intend to make a FP-tree for the transaction sets,
the tree node is a transaction set, namely the port
(except root node). Edge of the FP-tree is ships’ col-
lection, on behalf of the ship go across the two ports.
We use ‘Mmsi’ instead of ship’s name.(mmsi is the
unique identity of the ships in database).
The structure of Node data as table 4 :
Table 4 The structure of Node
___________________________________________________
Parent Portid Support Child
___________________________________________________
The structure of Node data contains the number
of ports, its support count, its child node, its parent
node.
The structure of Edge data is as table 5:
Table 5 The structure of Edge
___________________________________________________
PortID1 Mmsi[] mmsi Support[] PortID2
___________________________________________________
Edge data structure as shown above, Mmsi [] ex-
pressed as mmsi array, a collection of both ships
Mmsi number. Similarly, the corresponding number
of ships mmsi also have the corresponding support
count. PortID1 and PortID2 mean this edge belongs
to which two port nodes.
4.3 Steps of the algorithm
According to the minimum support, we use the
transaction sets generated a FP-tree by FP-growth
algorithm.
1 [1][2] scan transaction set D, in order to acquire
all the frequent items contained in D ,we named
the collection of these items F,and also caculate
their respective support. Frequent items in de-
184
scending order according to their support, the re-
sults recorded as L(table 6):
2 [1][2] Create the root of FP-tree T, marks "null";
Then, for each transaction TID following:
According to the order of L to select and sort the
frequent items TID. Let the sorted frequent item
list as [x|P], x is the first frequent item, and P is
the remaining frequent items; Then call IN-
SERT_TREE([x|P],T).The INSERT_TREE([x|P],
T) follows the process of implementation: If T
have their children named N and meet the N. Por-
tid = x. Portid, increase the N's Support 1; else
create a new node N, its count is set to 1, link to
its parent node T and through the node chain
structure link to the node with the same Portid. If
P is not empty, recursively call the IN-
SERT_TREE (P, N).
In addition, whether the addition of new nodes,
each edge must be coupled with corresponding
Mmsi TID number. if repeated, Mmsi Support
count increased by 1.
Re-scanned in the transaction set, a complete with
side information on FP-tree built.
Using the example 3.1 , let the minimum support
MIN_SUPPORT = 3, the 1-frequent itemsets is as
follows:
1 - frequent item sets as table 6 :
Table 6 1 - frequent
_________________
item count
_________________
200 4
48 4
47 3
21 3
88 3
71 3
1 3
_________________
And we also acquire the FP-tree as figure 1.
The information of edge as table 7:
Table 7 information of edge after calculated
___________________________________________________
Edge ID PortID1 Mmsi[] mmsiSupport[] PortID2
___________________________________________________
1 200 412402820, 4,4,2 48
412403000,
412429760
2 48 412402820, 3,3,1 47
412403000,
412429760
3 47 412403000 3 21
4 21 412403000 1 71
5 47 412402820 1 88
6 88 412402820 1 1
7 47 412403000 1 71
8 71 412402820 1 1
9 200 412403660 1 21
10 21 412403660 1 88
11 88 412403660 1 1
12 48 412410010 1 21
13 21 412410010 1 88
14 88 412410010 1 71
___________________________________________________
Figure 1 generated tree
185
4.4 Mining Rules
We has generated a FP - tree, according to frequent
1 - itemsets, we can mine the frequent pattern as ta-
ble 8:
Table 8 frequent pattern of Ports
___________________________
id frequent pattern of Ports
___________________________
1 200-1
2 200-47
3 48-47
4 200-48-47
5 200-48
6 48-71
___________________________
Meanwhile, according to the minimum support
(min_support) branch cut, we also can get the ship-
related items as table 9:
Table 9 frequent pattern of ships
________________________________
Id frequent pattern of ships
________________________________
1 412402820-412403000
________________________________
This result shows that a total of 6 groups of ports
are associated. What’s more,the MMSI number
which is 412402820 and 412403000 of the two ships
has some similar shipping routes.
Thus, by calculating the ship - port summary in-
formation,we not only acquire the Correlation of the
ports but also the the Correlation of the ships’
routes.
5 CONCLUSIONS AND SUMMARY
In this paper,based on the AIS data analysis and pro-
cessing, we use improved FP-growth algorithm,the
algorithm of data mining association rules to analyze
the ship arrival information, and build FP-tree, by
scanning the transaction sets, creating FP-tree root
sub-set of structural conditions of frequent library,
and mining the sub-set to get frenquent pat-
tern.Through all these methods,we aggregate and
mine the information of ship and port, then can find
out the relevant port and ships with the similar
routes.
Use of the collected AIS data which is more than
one week to run above algorithm method ,we get 42
groups of related ports and 105 ships.(if you want to
get more accurate results, you need to capture more
of the AIS data) This shows the feasibility of the al-
gorithm. In addition, compared to other association
rule algorithm, this algorithm only need to scan the
database twice.
6 DEFICIENCY AND OUTLOOK
In this paper,we use improved FP-tree Algorithm to
dig out the similar route and relevant port,but defi-
ciency as follow:
1. FP-tree need lots of Server’s memory.
2. how to set minimum support and confidence is
not in-depth reserch.
Thus,our further research will focus on improving
the efficiency and application. In addition, data min-
ing application in the navigation area is also the di-
rection of future research
REFERENCE
[1] MA Xu-hui, Zhang A-hong. Association Rules Generated
By the FP Tree Depth-First Algorithm. Computer
Knowledge and Technology, 2010, 13: 058
[2] HUI Liang, QIAN Xue-zhong. Non-check mining algo-
rithm of maximum frequent patterns in association rules
based on FP-tree. Journal of Computer Applications, 2010,
07: 064
[3] YAN Wei, BAI Wen-yang, ZHANG Yan. Hiding Associa-
tion Rules with FP-Tree Based Transaction Dataset Recon-
strucion. Department of Computer Science and Technolo-
gy, 2008
[4] JI Xian-biao, SHAO Zhe-ping, PAN Jia-cai, et al. Devel-
opment of distributed data collection system of AIS infor-
mation and key techniques[J].Journal of Shanghai Maritime
University, 2007, 28(3): 28-31
[5] Lee C Y. An algorithm for path connections and its applica-
tions. IRE Trans Electronic Computers, 1961, EC-10: 346-
365
[6] ZHANG Wen, TANG Xi-jin, YOSHIDA Taketoshi. AIS:
An approach to Web information processing based on Web
text mining. Systems Engineering-Theory & Practice, 2010,
01:015