ISSN 2083-6473
ISSN 2083-6481 (electronic version)
 

 

 

Editor-in-Chief

Associate Editor
Prof. Tomasz Neumann
 

Published by
TransNav, Faculty of Navigation
Gdynia Maritime University
3, John Paul II Avenue
81-345 Gdynia, POLAND
www http://www.transnav.eu
e-mail transnav@umg.edu.pl
Optimizing AIS Data Format Based on HELCOM Datasets
Tallinn University of Technology, Tallinn, Estonia
ABSTRACT: Automatic Identification System (AIS) data plays a vital role in a wide range of maritime research areas, including logistics optimization, navigational safety analysis, economic activity monitoring, and environmental impact assessment. The Helsinki Commission (HELCOM) collects and maintains extensive AIS data for the Baltic Sea region, offering researchers valuable insights into vessel movements and marine traffic patterns. However, raw AIS data, typically provided as plain-text CSV, is large and inefficient to store because of (a) plain-text redundancy and (b) high levels of duplicated and repetitive information. For storage and transmission, AIS data is usually compressed as is with common compression tools (e.g., a zip archive). In this study, we investigate techniques for optimizing the storage of HELCOM AIS data by changing the data format and structure. After the steps taken, the size of the uncompressed dataset decreased by approx. 60% and the size of the compressed dataset by approx. 90% compared to the original, indicating the potential for substantial storage savings. To further improve data handling, we experimented with several structural optimizations of the CSV format: arranging records by core attributes, optimizing column order, and normalizing the dataset by separating its mutable and immutable parts. For example, vessel-specific attributes such as ship name, MMSI (Maritime Mobile Service Identity) code, IMO (International Maritime Organization) number, origin, and dimensions, which stay the same across all records for a vessel, can be moved into a separate file during normalization, which significantly reduces the dataset size. The article compares several strategies for persisting AIS data to identify the most memory-efficient approaches. Furthermore, we introduce a data generation tool that produces synthetic AIS datasets in customizable formats and patterns. This tool enables reproducibility of the study and supports further experimentation with AIS data optimization approaches.
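The normalization idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool or pipeline: the column names, the synthetic generator, and the choice of zlib's DEFLATE as a stand-in for a zip-style archive are all assumptions made for illustration, and the columns do not follow the actual HELCOM schema. The sketch builds a flat CSV in which static vessel fields repeat on every record, splits it into a per-vessel table and a position table sorted by MMSI and timestamp, and reports the resulting sizes.

```python
import csv
import io
import random
import zlib

# Hypothetical column names, chosen for illustration only.
STATIC_COLS = ["mmsi", "name", "imo", "length", "width"]
DYNAMIC_COLS = ["mmsi", "timestamp", "lat", "lon", "sog"]
FLAT_COLS = STATIC_COLS + DYNAMIC_COLS[1:]


def make_synthetic_rows(n_vessels=20, n_points=200, seed=0):
    """Generate flat AIS-like rows; static vessel fields repeat on every row."""
    rng = random.Random(seed)
    vessels = [
        {
            "mmsi": str(200000000 + i),
            "name": f"VESSEL {i:03d}",
            "imo": str(9000000 + i),
            "length": str(rng.randint(50, 300)),
            "width": str(rng.randint(10, 50)),
        }
        for i in range(n_vessels)
    ]
    rows = []
    for t in range(n_points):
        for v in vessels:
            rows.append({
                **v,
                "timestamp": str(1700000000 + 60 * t),
                "lat": f"{rng.uniform(54.0, 60.0):.5f}",
                "lon": f"{rng.uniform(10.0, 30.0):.5f}",
                "sog": f"{rng.uniform(0.0, 20.0):.1f}",
            })
    rng.shuffle(rows)  # mimic records arriving in no particular order
    return rows


def to_csv_bytes(rows, columns):
    """Serialize the selected columns of the rows as UTF-8 CSV with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def normalize(rows):
    """Split flat rows into one static row per vessel and sorted dynamic rows.

    Assumes static fields never change for a given MMSI; the first value wins.
    """
    static = {}
    for r in rows:
        static.setdefault(r["mmsi"], {c: r[c] for c in STATIC_COLS})
    dynamic = [{c: r[c] for c in DYNAMIC_COLS} for r in rows]
    dynamic.sort(key=lambda r: (r["mmsi"], int(r["timestamp"])))
    return list(static.values()), dynamic


def denormalize(static_rows, dynamic_rows):
    """Rebuild flat rows by joining on MMSI, showing the split is lossless."""
    by_mmsi = {s["mmsi"]: s for s in static_rows}
    return [{**by_mmsi[d["mmsi"]], **d} for d in dynamic_rows]


if __name__ == "__main__":
    rows = make_synthetic_rows()
    static_rows, dynamic_rows = normalize(rows)
    flat = to_csv_bytes(rows, FLAT_COLS)
    split = (to_csv_bytes(static_rows, STATIC_COLS)
             + to_csv_bytes(dynamic_rows, DYNAMIC_COLS))
    for label, data in (("flat", flat), ("normalized", split)):
        print(label, len(data), "bytes raw,",
              len(zlib.compress(data, 9)), "bytes compressed")
```

The sizes printed here depend entirely on the synthetic data and say nothing about the reductions reported for the HELCOM datasets. The join on MMSI in `denormalize` is what keeps the split lossless as long as the static attributes really are constant per vessel.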
Citation note:
Šustov K., Zaitseva-Pärnaste I.: Optimizing AIS Data Format Based on HELCOM Datasets. TransNav, the International Journal on Marine Navigation and Safety of Sea Transportation, Vol. 19, No. 4, doi:10.12716/1001.19.04.16, pp. 1189-1194, 2025
