Named Entity Disambiguation for Maritime-related Data Retrieved from Heterogenous Sources
Poznań University of Economics and Business, Poznań, Poland
ABSTRACT: The article concerns the integration and disambiguation of data related to the maritime domain. It describes a system that collects data about several kinds of maritime-related entities (vessels, vessel types, ports, companies, etc.) from different internet sources, merges the data, and feeds it into a single database. This process is, however, not trivial, and several challenges must be addressed to conduct it successfully. Firstly, different sources may refer to the same entity in different ways, for example by using different text strings. Additionally, some of these references may be ambiguous, i.e. a reference may potentially point to more than one entity. To enable efficient analysis of data coming from different sources, such ambiguities must be resolved automatically as a preprocessing step, before the data is uploaded to the database and used in further computations. The aim of the disambiguation process is to assign an artificial, unique identifier to each entity and then, where possible, automatically assign that identifier to each data item related to the entity. The article discusses the methods developed for resolving such ambiguities and presents their evaluation.
Małyszko J., Abramowicz W., Stróżyna M.: Named Entity Disambiguation for Maritime-related Data Retrieved from Heterogenous Sources. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, Vol. 10, No. 3, doi:10.12716/1001.10.03.12, pp. 465-477, 2016
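The disambiguation aim stated in the abstract (one artificial identifier per entity, attached automatically to a data item only when its reference is unambiguous) can be illustrated with a minimal sketch. It is not the system described in the article: the `normalize` function, the `EntityRegistry` class, the alias lists, and the sample records are all hypothetical and serve only to show the idea of matching differing text strings and leaving ambiguous references unresolved.

```python
import itertools
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Hypothetical normalization: drop accents, case, punctuation, extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.casefold())
    return " ".join(cleaned.split())


class EntityRegistry:
    """Assigns artificial identifiers to entities and resolves text references."""

    def __init__(self):
        self._next_id = itertools.count(1)
        # normalized alias -> set of entity identifiers known under that alias
        self._index = defaultdict(set)

    def register(self, *aliases: str) -> int:
        """Create a new entity with a fresh identifier and index its aliases."""
        entity_id = next(self._next_id)
        for alias in aliases:
            self._index[normalize(alias)].add(entity_id)
        return entity_id

    def resolve(self, reference: str):
        """Return the identifier if exactly one entity matches, else None."""
        candidates = self._index.get(normalize(reference), set())
        return next(iter(candidates)) if len(candidates) == 1 else None


# Illustrative data only (assumed, not taken from the article).
registry = EntityRegistry()
gdansk = registry.register("Gdańsk", "Port of Gdansk")
star_a = registry.register("Star", "MV Star")
star_b = registry.register("Star")  # a second entity sharing the name "Star"

records = [{"ref": "GDANSK"}, {"ref": "port of gdańsk"}, {"ref": "Star"}]
resolved = [registry.resolve(r["ref"]) for r in records]
print(resolved)  # [1, 1, None]: "Star" is ambiguous and stays unresolved
```

In this sketch an ambiguous reference is deliberately left without an identifier rather than guessed, which mirrors the "if possible" qualification in the abstract; a real system would need additional context (for example other attributes of the record) to choose among candidates.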
