International Journal

on Marine Navigation

and Safety of Sea Transportation

Volume 6

Number 2

June 2012

181

1 INTRODUCTION

In recent years, data mining has caused great con-

cern to the industry. The so-called mining technolo-

gy is to present large amounts of data by digging in-

to useful information and knowledge, which applies

to business management, production control, market

analysis, engineering design and scientific explora-

tion and other fields. This is the data mining tech-

nology on information processing in navigation, the

use of data mining association rules FP-TREE algo-

rithm, mining the large number of ships AIS infor-

mation, access to the ship's route similarity analysis.

1.1 FP-TREE algorithm [1]

Data mining technology is the result of the devel-

opment of information technology, since the 60

years since the last century, databases, and infor-

mation technology has systematically evolved from

the original document processing to complex and

powerful database system, and now the technology

is quite mature. Data mining is the definitions of raw

data from a large number of extracted or “dig out"

the useful information into knowledge.

In the knowledge model of data mining, associa-

tion rules model is the more important one. The con-

cept of association rules by the Agrawal, Imielinski,

Swami suggested that the data in a simple but very

useful rule. Association rules models are descriptive

models, association rules discovery algorithm is un-

supervised learning.

FP-TREE [1][2][3] is one of association rule min-

ing algorithm, which uses divide and conquer strate-

gy, after the first pass scan, the frequency of the da-

tabase into a set of compressed frequent pattern tree

(FP-TREE), while retaining the associated infor-

mation, then FP-TREE library to differentiate into a

number of conditions, the length of each library, and

a frequency of 1 set of related libraries of these con-

ditions were re-excavation.

Applied Research of Route Similarity Analysis

Based on Association Rules

Z. Xiang, R. Liu, Q. Hu & C. Shi

Merchant Marine College, Shanghai Maritime University, China

ABSTRACT: In recent years, with the development of information technology, businesses have accumulated

a lot of useful historical data, as the shipping industry does. These data can be found deposited a large number

of "knowledge", for example, Shipping records for historical information, Ship-Port relations information,

Ship-ship relations information, Port & shipping route relations, Shipping route information. It can provide

intellectual support to shipping informatization development.

Association rules in data mining technology is one of important technologies. The technology, based on sta-

tistical methods, can mine the associated and implied "knowledge" from data warehouse ,which has a large

number of accumulated data. Apart from this, the technology can also play an important role in the prediction.

In this paper, based on FP-growth algorithm, we improve it forming Relevent ships routes.

From the prevalent perspective of data mining, deal with the corresponding vessels' dynamic information, ac-

quired from the AIS, such as data collection, data statistics. On this basis, get the ship-port relation and ship-

ship relation after a certain level of data analysis, processing, handing. Furthermore, this paper use the numer-

ous historical ship-port relation and ship-ship relation to build a mathematical model on the ship-port and

ship-ship relation. And use the improved association algorithm, FP-growth algorithm, to acquire the strong

association rules between ship-port and ship-ship, and eventually mine the similarity of the ship route.

Main points of this paper as follows:

Collect ,count and check the data, which is from ship dynamic information;

Establish the mathematical model between ship-port and ship-ship relation;

Improve the algorithm;

Analyse the similarity of ship route more accurately using the improved algorithm.

182

1.2 AIS information [4]

Automatic Identification System (AIS) is the current

advanced ship-aided navigation equipment; the In-

ternational Maritime Organization has adopted the

mandatory installation of AIS requirements. AIS can

automatically send the ship a static continuous, dy-

namic and voyage information, security, short mes-

sage, but can also automatically receive the infor-

mation sent around the ship, and exchange

information with the coast station.

AIS information includes static information such

as name, call sign, MMSI, ship type, ship size and

other information, and dynamic information such as

vessel position, ground speed, navigational status,

draft, destination, estimated time of arrival and other

information, but also with the voyage information

and security-related text messages.

1.3 AIS information on data mining [5][6]

In this paper, using association rules such as FP-

TREE algorithms, we deal with the corresponding

vessels' dynamic information, such as estimated time

of arrival, port of destination and departure infor-

mation, acquired from the AIS, such as data collec-

tion, data statistics. On this basis, get the ship-port

relation and ship-ship relation after a certain level of

data analysis, processing, handing to build a mathe-

matical model on the ship-port and ship-ship relation

to acquire the strong association rules between ship-

port and ship-ship, and eventually mine the similari-

ty of the ship route.

2 THE PROPOSAL OF THIS PAPER

At first, we used to provide users dynamic-related

information services with AIS ship data, such as re-

al-time latitude and longitude of the ship, speed, ar-

rival time, etc. Also including the port border query-

ing. But we are lack of a more intelligent data

mining reports service, for example, previous leaves

harbor of the ships, the ships’ similar route. These

are the users concerned, and it is very useful.

Therefore, in order to provide such services, we

need to mine existing data to find out the regular

pattern between ships and their anchored port and

also we need to find out ships’ similar routes(the

similar route among ships).The conclusion we mine

from AIS data can provide information on the effect

of enhanced information services . We collected

ships infomation through the www.manyships.com

dynamic data, and similarly, the Chinese port border

has also been part of we collect. It is easy for us to

find the ship's ports and similar routes and provide

us a lot of supports for data ming.

The most important task of this paper is: data min-

ing, to identify routes of ships similar to the (first

find was anchored in this port).

3 PREPARATION FOR DATA MING

3.1 Data collection

This article uses real-time AIS data from

www.manyships.com. AIS database established a

table for each ships, with the AIS data updating con-

tinuously the database update this table as table 1.

The number of table is Real-time. So, the first

step we need to do is to scheduler an interval job to

collect data using database snapshots[4]. With this

methods,we can collect one week or one month even

more AIS data[5] . To say that, for accurcy re-

sults,job’s interval can not be too long.

3.2 Data processing and Summary

After collecting the data,we must immediately calcu-

late whether the ship is in the ports and also calcu-

late its last port .

Finally we proceed the AIS data for each ships. The-

se data included the ship's number(mmsi), name,

port of arrival, arrival times, the last port, the last

departure time, callsign, type, so as follow. These

data be prepared for data mining as table 2.

Each port has its own id, so ‘fid’ in the picture

means the port’s id , ’prevfid’ means the last port’s

id. With these data we can be ming AIS data.

Table 1. Original datasets snapshot

183

Table 2. Datasets snapshot after procceed

4 MINING THE SIMILAR PATH AND THE

RELEVENT PORT

In order to retrieve the relative path between the

ship, we use association rule discovery methods to

determine the correlation between the ports and the

ships.

The central idea of this method is: Calculate the

ship reaches port summary statistics. Use summary

information as a transaction set; The port each ship

reached (or leave) is the association rules item. Each

transaction is a single arrangement of the ship reach-

es port. Aim of the algorithm is digging out the rele-

vant port, and getting the ships' similar paths.

4.1 Algorithm is defined as follows[1][3]:

(1).

}......,{

21 m

IIII =

is item set.

(2). Let

transaction sets, each transaction

IT ⊆

(3). Each transaction has a unique identifier TID. In

addition, add a field to indicate the transaction from

the ship (the number of ship).

For an example of Transaction sets is as table 3:

Table 3 example of Transaction sets

___________________________________________________

TID PORTID Mmsi

___________________________________________________

1 200 , 47 , 48, 203 , 56 , 207 , 88 , 1 412402820

2 47 , 21, 48, 200, 66 , 71 412403000

3 21, 200 , 42 , 59 , 88 , 1 412403660

4 21, 48 , 49 , 88 , 71, 6 412410010

5 47 ,200 , 48 , 62 , 66 , 51 , 71, 1 412429760

___________________________________________________

Each port has an unique identity number ,so as

ships. In this paper, Example uses the portid and

mmsi to instead of the specific ports and ships.

4.2 Algorithm Description:

We intend to make a FP-tree for the transaction sets,

the tree node is a transaction set, namely the port

(except root node). Edge of the FP-tree is ships’ col-

lection, on behalf of the ship go across the two ports.

We use ‘Mmsi’ instead of ship’s name.(mmsi is the

unique identity of the ships in database).

The structure of Node data as table 4 :

Table 4 The structure of Node

___________________________________________________

Parent Portid Support Child

___________________________________________________

The structure of Node data contains the number

of ports, its support count, its child node, its parent

node.

The structure of Edge data is as table 5:

Table 5 The structure of Edge

___________________________________________________

PortID1 Mmsi[] mmsi Support[] PortID2

___________________________________________________

Edge data structure as shown above, Mmsi [] ex-

pressed as mmsi array, a collection of both ships

Mmsi number. Similarly, the corresponding number

of ships mmsi also have the corresponding support

count. PortID1 and PortID2 mean this edge belongs

to which two port nodes.

4.3 Steps of the algorithm

According to the minimum support, we use the

transaction sets generated a FP-tree by FP-growth

algorithm.

1 [1][2] scan transaction set D, in order to acquire

all the frequent items contained in D ,we named

the collection of these items F,and also caculate

their respective support. Frequent items in de-

184

scending order according to their support, the re-

sults recorded as L(table 6):

2 [1][2] Create the root of FP-tree T, marks "null";

Then, for each transaction TID following:

According to the order of L to select and sort the

frequent items TID. Let the sorted frequent item

list as [x|P], x is the first frequent item, and P is

the remaining frequent items; Then call IN-

SERT_TREE([x|P],T).The INSERT_TREE([x|P],

T) follows the process of implementation: If T

have their children named N and meet the N. Por-

tid = x. Portid, increase the N's Support 1; else

create a new node N, its count is set to 1, link to

its parent node T and through the node chain

structure link to the node with the same Portid. If

P is not empty, recursively call the IN-

SERT_TREE (P, N).

In addition, whether the addition of new nodes,

each edge must be coupled with corresponding

Mmsi TID number. if repeated, Mmsi Support

count increased by 1.

Re-scanned in the transaction set, a complete with

side information on FP-tree built.

Using the example 3.1 , let the minimum support

MIN_SUPPORT = 3, the 1-frequent itemsets is as

follows:

1 - frequent item sets as table 6 :

Table 6 1 - frequent

_________________

item count

_________________

200 4

48 4

47 3

21 3

88 3

71 3

1 3

_________________

And we also acquire the FP-tree as figure 1.

The information of edge as table 7:

Table 7 information of edge after calculated

___________________________________________________

Edge ID PortID1 Mmsi[] mmsiSupport[] PortID2

___________________________________________________

1 200 412402820, 4,4,2 48

412403000,

412429760

2 48 412402820, 3,3,1 47

412403000,

412429760

3 47 412403000 3 21

4 21 412403000 1 71

5 47 412402820 1 88

6 88 412402820 1 1

7 47 412403000 1 71

8 71 412402820 1 1

9 200 412403660 1 21

10 21 412403660 1 88

11 88 412403660 1 1

12 48 412410010 1 21

13 21 412410010 1 88

14 88 412410010 1 71

___________________________________________________

Figure 1 generated tree

185

4.4 Mining Rules

We has generated a FP - tree, according to frequent

1 - itemsets, we can mine the frequent pattern as ta-

ble 8:

Table 8 frequent pattern of Ports

___________________________

id frequent pattern of Ports

___________________________

1 200-1

2 200-47

3 48-47

4 200-48-47

5 200-48

6 48-71

___________________________

Meanwhile, according to the minimum support

(min_support) branch cut, we also can get the ship-