Hybrid Method for Cloud Detection Using Sentinel-2 Imagery and Spectral Indices

O. Specht
Gdynia Maritime University, Gdynia, Poland

the International Journal on Marine Navigation and Safety of Sea Transportation
Volume 20, Number 2, June 2026
http://www.transnav.eu
DOI: 10.12716/1001.20.02.11

ABSTRACT: Satellite imagery constitutes an extremely valuable source of information about the natural environment and the processes occurring on the Earth's surface. However, the availability of useful optical imagery is often significantly limited by atmospheric factors, most notably the presence of clouds. Cloud cover can completely hinder the interpretation of satellite images or lead to erroneous analysis results if not properly identified and filtered out. Moreover, not all clouds are easily detectable: thin cirrus clouds located above water bodies often remain unnoticed in basic RGB visualizations. This article aims to develop and validate an effective method for cloud identification using selected spectral bands and remote sensing indices, which was further tested on the coastal waters of Poland.

1 INTRODUCTION

Satellite remote sensing has become a vital tool for monitoring and analysing environmental processes across diverse spatial and temporal scales. Among the numerous satellite missions, the European Space Agency's Sentinel-2 program stands out for its high-resolution optical imagery and frequent revisit times, providing critical data for applications such as land cover mapping [1] and coastal zone management. However, the effectiveness of satellite imagery is often diminished by atmospheric interference, particularly cloud cover, which can obscure surface details, reduce image quality, and potentially distort analytical results. This challenge is especially pressing in consistently cloudy areas, like coastal regions, where accurate environmental analysis is crucial [2].

In maritime environments, optical satellite imagery is increasingly used to support coastal monitoring, hydrographic surveys, shoreline mapping, vessel detection, environmental hazard assessment, and navigation-related decision-making. However, cloud contamination significantly limits the availability and reliability of remotely sensed information. Therefore, accurate cloud detection constitutes an essential preprocessing step for maritime monitoring systems and navigation-support applications.

Accurately detecting clouds is a fundamental preprocessing step in remote sensing workflows. While thick clouds are generally identifiable in standard true-colour composites, thin cirrus clouds present a significant challenge due to their low optical thickness and partial transparency. These clouds are difficult to detect over water bodies, where the spectral contrast between clouds and the underlying surface is minimal.

Various cloud detection algorithms have been developed to address this challenge, including rule-based methods, thresholding techniques, and machine learning approaches. Among these, the Random Forest (RF) algorithm has gained popularity. Studies have demonstrated the effectiveness of RF-based methods in cloud detection tasks, particularly when combined with spectral indices that enhance the contrast between clouds and other surface features [3].

Spectral indices, such as the Normalized Difference

Vegetation Index (NDVI), Normalized Difference

Snow Index (NDSI), and Cloud Displacement Index

(CDI), are widely used to improve cloud detection

accuracy. These indices take advantage of the unique

spectral characteristics of clouds in contrast to

vegetation, snow, and other land cover types, enabling

more precise classification. The primary objective of

this study is to develop and evaluate a hybrid cloud

detection method that integrates selected spectral

indices (NDVI, NDSI, CDI) with the Random Forest

algorithm to enhance the accuracy of cloud

identification in Sentinel-2 imagery over coastal areas.

The proposed approach aims to address the limitations

of existing methods in detecting thin cirrus clouds,

particularly over water bodies, and to provide a

reliable tool for preprocessing optical satellite data in

environmental monitoring applications.

This article is structured into four main sections.

The first section, Introduction, presents the motivation

for addressing this research topic. The second section,

Materials and Methods, describes the study area, data

sources, preprocessing steps, training sample

preparation, and the proposed hybrid method. The

third section, Results, discusses the outcomes of

applying the method to a cloud-affected Sentinel-2

scene over northern Poland. The article concludes with

a general summary of the findings.

2 MATERIALS AND METHODS

2.1 Study Area

The satellite image shows the northern region of

Poland, highlighting the coastal area of the Pomeranian

Voivodeship. It includes part of the southern Baltic Sea,

specifically Gdańsk Bay (Zatoka Gdańska), the Hel

Peninsula (Półwysep Helski), and the shoreline

extending westward toward the open Baltic Sea

(Fig. 1). This location was selected for its frequent use

as a measurement site.

Figure 1. Location of the study area in northern Poland.

The Gulf of Gdańsk is an important maritime area

characterized by intensive shipping traffic, port

operations, hydrographic surveys, and coastal

management activities. Reliable satellite-based

observations are therefore important for supporting

marine monitoring and navigation-related

applications. The region features a variety of surface

types, including forests, urban areas, the coastal waters

of the Gulf of Gdańsk, and parts of the Vistula Lagoon.

This land–water diversity, combined with dynamic

environmental conditions, makes it a valuable yet

challenging area for geospatial analysis. One of the

major challenges in the remote sensing of this region is

the frequent cloud cover, especially during the autumn

and winter months. In Gdynia, December is the

cloudiest month, with the sky being overcast or mostly

cloudy about 69% of the time [4]. These conditions

significantly limit the availability of cloud-free optical

satellite imagery, underscoring the need for a cloud

extraction method.

2.2 Data used

To establish and validate the proposed method, we

utilized multispectral satellite imagery obtained from

the Sentinel-2 mission. Developed by the European

Space Agency (ESA) as part of the Copernicus

program, Sentinel-2 is a multispectral Earth

observation mission [5]. It offers high-resolution

optical imagery tailored for applications including

land-use mapping, vegetation monitoring, water

quality assessment, and coastal zone studies.

The Sentinel-2 satellites are equipped with the

MultiSpectral Instrument (MSI), which collects

imagery across 13 spectral bands ranging from the

visible and near-infrared (VNIR) to the short-wave

infrared (SWIR) region. These bands vary in spatial

resolution—10 m, 20 m, and 60 m—depending on their

wavelength and intended application [6]. The spectral characteristics of the Sentinel-2 bands are summarized below (Tab. 1):

Table 1. Sentinel-2 Spectral Band Characteristics [6]

Band   Description           Spatial Resolution [m]   Central Wavelength [nm]
B1     Ultra Blue            60                       443
B2     Blue                  10                       490
B3     Green                 10                       560
B4     Red                   10                       665
B5     Red Edge (VNIR)       20                       705
B6     Red Edge (VNIR)       20                       740
B7     Red Edge (VNIR)       20                       783
B8     Near Infrared (NIR)   10                       842
B8a    Narrow NIR (VNIR)     20                       865
B9     Water Vapour (SWIR)   60                       940
B10    Cirrus (SWIR)         60                       1375
B11    SWIR                  20                       1610
B12    SWIR                  20                       2190

In this study, Sentinel-2 Level-2A satellite imagery

was used to develop and test the proposed

methodology. Specifically, an image acquired on

September 6, 2024, covering the northern part of

Poland and its surrounding water bodies, was selected

as the primary dataset. The methods primarily utilized

the spectral bands: Band 2 (Blue, 490 nm), Band 3


(Green, 560 nm), Band 4 (Red, 665 nm), Band 8 (Near-

Infrared, 842 nm), and Band 11 (Short-Wave Infrared,

1610 nm) [6].

2.3 Preprocessing

As an initial step, a review and analysis of commonly

used spectral indices was conducted to identify those

that are most suitable for cloud detection in

multispectral satellite imagery. These indices are

frequently utilized in remote sensing to enhance the

spectral contrast between various surface types, such

as vegetation, soil, water, snow, and clouds. Several

indices were assessed based on literature sources and

preliminary testing with Sentinel-2 imagery, including:

− NDVI (Normalized Difference Vegetation Index) – used to identify vegetated areas and to monitor marine algae [7-8],

− NDWI (Normalized Difference Water Index) – useful for distinguishing water bodies from land [9], delineating open water features [10], and identifying cloud-covered areas,

− NDSI (Normalized Difference Snow Index) – often

used to discriminate between snow and clouds as

well as bright land features [11],

− BI (Brightness Index) – highlights bright surfaces,

such as clouds and bare soil [12],

− CDI (Cloud Displacement Index) – developed to

improve cloud detection by analysing spectral

dissimilarities over time or across bands [13].

Based on the literature review and empirical

analysis of Sentinel-2 data in the study area, the

following three spectral indices were selected for the

proposed cloud detection method: NDVI, NDSI, and

CDI.

The NDVI is calculated as follows [7-8]:

\[
\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \tag{1}
\]

where:

NIR – reflectance in the near-infrared range (the so-called NIR band),

RED – reflectance in the red-light range (the so-called red band).

NDVI values close to 1 indicate the presence of

healthy vegetation, while negative values suggest the

presence of water, clouds, or snow. In the case of water

bodies, NDVI values are often similar to those of

clouds, which may lead to classification challenges in

such areas.

The Normalized Difference Snow Index is

calculated using the following formula [11]:

\[
\mathrm{NDSI} = \frac{\mathrm{GREEN} - \mathrm{SWIR}}{\mathrm{GREEN} + \mathrm{SWIR}} \tag{2}
\]

where:

GREEN – reflectance in the green-light range (so-called green band),

SWIR – reflectance in the short-wave infrared range (so-called SWIR band).

High NDSI values typically indicate snow or

clouds, as both reflect strongly in the visible spectrum

(especially green) but absorb more in the SWIR range.

This makes NDSI particularly useful for distinguishing

bright surfaces such as snow and clouds from other

land covers.

A simplified formulation of CDI can be expressed

as [13]:

\[
\mathrm{CDI} = \frac{\mathrm{BLUE} - \mathrm{SWIR}}{\mathrm{BLUE} + \mathrm{SWIR}} \tag{3}
\]

where:

BLUE – reflectance in the blue-light range (so-called blue band).

This index is particularly effective for separating

clouds from bright non-cloud surfaces.
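As an illustration of Eqs. (1)-(3), the following minimal Python sketch computes the three indices with NumPy. The reflectance values, the array names, and the helper `normalized_difference` are hypothetical and are not taken from the study. The CDI line assumes the simplified normalized-difference form, and mapping zero denominators to NaN is a design choice of this sketch rather than a documented step of the method.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b); pixels with a zero denominator become NaN."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Hypothetical 2x2 reflectance patches (illustrative values only).
blue = np.array([[0.30, 0.05], [0.04, 0.00]])
green = np.array([[0.32, 0.06], [0.08, 0.00]])
red = np.array([[0.31, 0.03], [0.05, 0.00]])
nir = np.array([[0.33, 0.01], [0.45, 0.00]])
swir = np.array([[0.20, 0.01], [0.20, 0.00]])

ndvi = normalized_difference(nir, red)     # Eq. (1)
ndsi = normalized_difference(green, swir)  # Eq. (2)
cdi = normalized_difference(blue, swir)    # assumed simplified form of Eq. (3)
```

In this toy patch the lower-left pixel (high NIR, low red) yields a high NDVI, as expected for vegetation, while the all-zero pixel is flagged as NaN instead of raising a division warning.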

2.4 Training samples preparation

For supervised classification, a set of training samples

was manually prepared using a point-based approach,

which is one of the simplest and most effective

methods for collecting labelled data in remote sensing

analysis. Each point represents a pixel with a known

class and is used to guide the classification algorithm

in distinguishing between different surface and

atmospheric conditions.

Three classes were defined to represent different

cloud conditions:

− Class 0 – No Cloud: points were placed over cloud-

free areas such as open water, vegetation, and urban

surfaces.

− Class 1 – Thin Clouds: samples were collected from

light, semi-transparent clouds, especially cirrus

formations.

− Class 2 – Thick Clouds: points were placed on

bright, dense cloud formations with strong

reflectance.

To ensure balanced representation and reduce bias,

points were distributed evenly across the entire image,

avoiding excessive clustering in specific regions. For each class, at least 30 to 50 points should be collected, which is considered good practice for achieving reliable classification accuracy. Once the

training points were digitized, pixel values were

extracted from selected input layers, including both

spectral bands and calculated indices. The input

features included:

− Spectral bands: B2 (Blue), B3 (Green), B4 (Red), B8

(NIR), and B11 (SWIR)

− Spectral indices: NDVI, NDSI, and CDI

These raster layers were stacked, and their pixel

values at each training point's location were extracted

to build the training dataset.
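The stacking and point-sampling step described above can be sketched as follows. This is a minimal illustration and not the author's implementation: the scene size, the random layer values, and the training-point coordinates are hypothetical, and all layers are assumed to be already resampled to a common pixel grid.

```python
import numpy as np

# Hypothetical 4x5 scene with eight co-registered layers on one pixel grid;
# random values stand in for real reflectances and index values.
rng = np.random.default_rng(0)
layer_names = ["B2", "B3", "B4", "B8", "B11", "NDVI", "NDSI", "CDI"]
layers = [rng.random((4, 5)) for _ in layer_names]

# Stack the layers into a (rows, cols, features) cube.
cube = np.stack(layers, axis=-1)

# Hypothetical training points: (row, col, class label 0/1/2).
points = np.array([[0, 1, 0], [2, 3, 1], [3, 4, 2]])
rows, cols, labels = points[:, 0], points[:, 1], points[:, 2]

# Integer-array indexing pulls one feature vector per training point.
X_train = cube[rows, cols, :]  # shape: (n_points, n_features)
y_train = labels
```

Each row of `X_train` then holds the eight feature values of one labelled pixel, which is the tabular form expected by most supervised classifiers.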

2.5 Hybrid cloud detection approach

In this study, a hybrid approach was implemented for

cloud detection by combining spectral indices and

Sentinel-2 spectral bands with a supervised

classification algorithm. The method integrates

physically based indicators with data-driven machine

learning to improve classification accuracy in

spectrally complex environments, such as coastal

zones (Fig. 2).


Figure 2. Hybrid Cloud Detection Workflow.

The Random Forest (RF) algorithm was chosen as

the classification method because of its well-

documented robustness, high performance, and

resistance to overfitting in remote sensing applications

[14-15]. Random Forest is an ensemble learning

approach that constructs multiple decision trees

during training and outputs the class that represents

the mode of the predictions made by the individual

trees. It is especially effective in managing high-

dimensional input data and heterogeneous feature

sets.

In this study, the RF model was trained using a

feature set that included the previously mentioned

spectral indices—NDVI, NDSI, and CDI—along with

Sentinel-2 spectral bands: Band 2 (Blue), Band 3

(Green), Band 4 (Red), Band 8 (NIR), and Band 11

(SWIR). This feature combination was chosen based on

its proven relevance for distinguishing between cloud

types and cloud-free surfaces, particularly in coastal

and mixed land-water regions.
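A minimal sketch of this training and classification step, assuming the scikit-learn implementation of Random Forest, is given below. The study does not report its software or hyperparameters, so the library choice, `n_estimators=100`, the fixed random seeds, and the synthetic feature values are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_features = 8  # B2, B3, B4, B8, B11, NDVI, NDSI, CDI

# Synthetic training set: 40 points per class, each class drawn around
# a different mean so that the three classes are separable.
class_means = {0: 0.1, 1: 0.4, 2: 0.8}
X_train = np.vstack(
    [rng.normal(m, 0.05, size=(40, n_features)) for m in class_means.values()]
)
y_train = np.repeat(list(class_means.keys()), 40)

# Assumed hyperparameters; each tree votes and the majority class is returned.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Classify a hypothetical 6x7 scene: flatten pixels to rows, predict,
# then reshape the labels back to the image grid.
scene = rng.random((6, 7, n_features))
flat = scene.reshape(-1, n_features)
cloud_map = model.predict(flat).reshape(scene.shape[:2])
```

The resulting `cloud_map` holds one class label (0, 1, or 2) per pixel, which corresponds to the kind of cloud cover map produced when a trained model is applied to a full scene.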

To evaluate the performance of the classification model, a confusion matrix was constructed, and precision and recall were calculated for each class from it.

Precision (P) is defined as the proportion of correctly

predicted instances of a given class to the total number

of cases predicted as that class:

\[
P = \frac{TP}{TP + FP} \tag{4}
\]

where:

TP (True Positives) – the number of points that were

correctly classified as belonging to the target class,

FP (False Positives) – the number of points that were

incorrectly classified as belonging to the target class,

when in fact they belong to a different one.

Recall (R) is defined as the proportion of correctly

predicted instances of a given class to the total number

of actual cases of that class in the reference dataset:

\[
R = \frac{TP}{TP + FN} \tag{5}
\]

where:

FN (False Negatives) – the number of points that truly

belong to the target class but were incorrectly classified

as belonging to another class.

Together, precision and recall provide a comprehensive view of classification performance, capturing the trade-off between false positives and false negatives.

3 RESULTS

The first step in evaluating the effectiveness of the

proposed cloud detection method involved a visual

and analytical assessment of the spectral index

outputs. The three selected indices—NDVI, NDSI, and

CDI—were used with the Sentinel-2 imagery to

highlight various surface characteristics relevant to

cloud detection.

The first index analysed is the NDVI, which is

commonly used to identify vegetated areas by utilizing

the spectral contrast between the red and near-infrared

bands (Fig. 3).

Figure 3. The map presents the NDVI values over the coastal

and inland areas of northern Poland.

The NDVI effectively highlights vegetated areas

(green to blue), but it cannot reliably differentiate

between clouds and water surfaces, nor can it detect

thin clouds with high confidence. These findings

support the decision to use the NDVI only as part of a

hybrid approach, complemented by indices such as


NDSI and CDI, which are more sensitive to cloud

characteristics.

The next index assessed in this study is the NDSI

(Fig. 4).

Figure 4. NDSI distribution across the study area based on

Sentinel-2 imagery (06.09.2024).

This result confirms that NDSI is more sensitive to

the presence of optically thick clouds, particularly in

regions where NDVI struggled to differentiate clouds

from water surfaces. However, the index may still have

difficulty with thin cirrus clouds or high-altitude haze,

which do not create a strong contrast in green versus

SWIR reflectance. Therefore, further enhancement is

accomplished with the Cloud Displacement Index

(Fig. 5).

Figure 5. CDI distribution over the study area based on

Sentinel-2 imagery (06.09.2024).

The CDI map confirms its role as a vital component

in hybrid cloud detection, especially for distinguishing

clouds from spectrally similar surfaces like snow, sand,

or water. Its inclusion in the classification model

greatly improves the identification of both thin and

thick clouds, which NDVI or NDSI alone may not fully

capture.

Based on the generated spectral index maps (NDVI,

NDSI, CDI) and the reference true colour image (Fig.

6), the process of identifying training data (Fig. 7) was

initiated.

Figure 6. Analysed area in true colour.

Figure 7. Distribution of training points for cloud

classification.

After constructing the classification model, it was

then applied to the entire Sentinel-2 scene to generate a

complete cloud cover map (Fig. 8).

Figure 8. Cloud classification applied to the entire Sentinel-2

scene acquired over the Gdańsk Bay.

In the next stage of the study, the cloud

classification map underwent validation. A confusion

matrix (Fig. 9) was used to assess the model’s

performance.


Figure 9. Confusion matrix showing the classification results

of the cloud detection model. Class 0 represents cloud-free

areas, class 1 corresponds to thin clouds, and class 2 refers to

thick clouds.

The confusion matrix shows that the classifier performed very well on cloud-free areas (class 0): all 23 samples of this class were correctly identified. Thin clouds (class 1) were classified correctly in 16 instances, while 1 instance was misclassified as class 0 and 6 as class 2, indicating some spectral overlap with both neighbouring categories. For thick clouds (class 2), the model achieved 30 correct classifications, with 1 sample misclassified as class 0 and 6 as class 1. Overall, 69 of the 83 samples (about 83%) were classified correctly, though further refinement could enhance the discrimination between thin and thick clouds.
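The per-class metrics can be recomputed directly from the counts reported above. The short Python sketch below assumes that rows of the matrix correspond to reference classes and columns to predicted classes; this orientation is inferred from the description and is not stated explicitly in the source, and the function names are illustrative.

```python
# Confusion matrix built from the reported counts.
# Rows: reference class (0, 1, 2); columns: predicted class (0, 1, 2).
confusion = [
    [23, 0, 0],   # class 0 - no clouds
    [1, 16, 6],   # class 1 - thin clouds
    [1, 6, 30],   # class 2 - thick clouds
]

def precision(cm, k):
    """TP / (TP + FP): diagonal entry divided by the column total."""
    predicted_as_k = sum(row[k] for row in cm)
    return cm[k][k] / predicted_as_k

def recall(cm, k):
    """TP / (TP + FN): diagonal entry divided by the row total."""
    return cm[k][k] / sum(cm[k])

def overall_accuracy(cm):
    """Correctly classified samples divided by all samples."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

for k in range(3):
    print(k, round(precision(confusion, k), 2), round(recall(confusion, k), 2))
print("overall accuracy:", round(overall_accuracy(confusion), 2))
```

Under this orientation the rounded values are 0.92 and 1.00 for class 0, 0.73 and 0.70 for class 1, and 0.83 and 0.81 for class 2, with an overall accuracy of about 0.83.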

To quantitatively evaluate the classification model,

precision and recall were calculated for each class

(Tab. 2).

Table 2. Precision and recall values calculated for each class based on the confusion matrix.

Class              Precision   Recall   Sample
0 – no clouds      0.92        1.00     23
1 – thin clouds    0.73        0.70     23
2 – thick clouds   0.83        0.81     37

The evaluation metrics indicate that the model

excelled in identifying cloud-free areas (class 0),

achieving both high precision (0.92) and perfect recall

(1.00), which means that no cloud-free samples were

misclassified. For thick clouds (class 2), the model also

demonstrated strong performance, with balanced

precision (0.83) and recall (0.81), suggesting

dependable detection of dense cloud structures.

The lowest precision and recall were observed for

thin clouds (class 1), with values of 0.73 and 0.70,

respectively. This result suggests that thin clouds are

more challenging to classify accurately, likely due to

their lower spectral contrast and partial transparency,

which can create confusion with both clear sky and

thick cloud classes. Incorporating additional spectral

indices or training samples may improve the

classification of this category in future work.

4 CONCLUSIONS

This study demonstrated the effectiveness of a hybrid cloud detection method that combines selected spectral indices (NDVI, NDSI, CDI) with Sentinel-2 spectral bands and a supervised Random Forest classifier. The

proposed approach enabled accurate identification of

various cloud types over a spectrally complex coastal

region in northern Poland.

The results indicated that the model excelled in

detecting cloud-free areas, achieving a precision of 0.92

and a perfect recall of 1.00. Thick clouds were also

classified reliably (precision = 0.83; recall = 0.81), while

thin clouds posed the greatest challenge due to their

spectral similarity to both clear skies and thick clouds

(precision = 0.73; recall = 0.70). These findings

emphasize the importance of combining spectral

indicators with effective classification techniques to

reduce misclassification in complex land–water

environments.

The method’s simplicity, flexibility, and relatively

low computational cost make it suitable for operational

maritime applications, including coastal monitoring,

hydrographic data preprocessing, shoreline mapping,

vessel detection support, and navigation-related

environmental monitoring. By improving the

reliability of cloud-free Sentinel-2 products, the

proposed approach may support safer and more

efficient decision-making in coastal and marine

environments. Future research may explore

integrating additional spectral features, time series

analysis, or deep learning-based models to enhance the

detection of optically thin clouds and haze layers.

REFERENCES

[1] D. Phiri, M. Simwanda, S. Salekin, V. R. Nyirenda, Y.

Murayama, and M. Ranagalage, “Sentinel-2 data for land

cover/use mapping: A review,” Remote Sensing, vol. 12,

no. 14, p. 2291, Jul. 2020, doi: 10.3390/rs12142291.

[2] F. Rodríguez-Puerta, R. L. Perroy, C. Barrera, J. P. Price,

and B. García-Pascual, “Five-year evaluation of Sentinel-

2 cloud-free mosaic generation under varied cloud cover

conditions in Hawai’i,” Remote Sensing, vol. 16, no. 24, p.

4791, 2024, doi: 10.3390/rs16244791.

[3] B. Zhou, S. Gao, Y. Yin, Y. Zhang, Y. Yu, Q. Qian, and M.

Zhu, “Enhancing active fire detection in Sentinel-2

imagery using GLCM texture features in random forest

models,” Scientific Reports, vol. 14, p. 31076, 2024, doi:

10.1038/s41598-024-81976-w.

[4] WeatherSpark, “Average weather in Gdynia, Poland, year round.” [Online]. Available: https://weatherspark.com/y/84137/Average-Weather-in-Gdynia-Poland-Year-Round

[5] European Space Agency, “Sentinel-2 User Handbook,” ESA Standard Document, 2015. [Online]. Available: https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook

[6] European Space Agency, “Sentinel-2 MSI Technical Guide,” 2023. [Online]. Available: https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi

[7] S. Huang, L. Tang, J. P. Hupy, G. A. Wang, and J. Shao, “A

commentary review on the use of normalized difference

vegetation index (NDVI) in the era of popular remote

sensing,” Journal of Forestry Research, vol. 32, pp. 1–6,

2021, doi: 10.1007/s11676-020-01155-1.


[8] O. Lewicka, “Application of NDVI for marine algae

monitoring: a Polish case study,” in Proc. IEEE Int.

Workshop on Metrology for the Sea (MetroSea), La

Valletta, Malta, 2023, pp. 57–61, doi:

10.1109/MetroSea58055.2023.10317358.

[9] Q. Guo, R. Pu, J. Li, and J. Cheng, “A weighted normalized

difference water index for water extraction using Landsat

imagery,” International Journal of Remote Sensing, vol.

38, no. 19, pp. 5430–5445, 2017, doi:

10.1080/01431161.2017.1341667.

[10] S. K. McFeeters, “The use of the normalized difference

water index (NDWI) in the delineation of open water

features,” International Journal of Remote Sensing, vol.

17, no. 7, pp. 1425–1432, 1996, doi:

10.1080/01431169608948714.

[11] S. Raghubanshi, R. Agrawal, and B. P. Rathore,

“Enhanced snow cover mapping using object-based

classification and normalized difference snow index

(NDSI),” Earth Science Informatics, vol. 16, pp. 2813–

2824, 2023, doi: 10.1007/s12145-023-01077-6.

[12] A. S. Vieira, R. F. do Valle Junior, V. S. Rodrigues, T. L.

da S. Quinaia, R. G. Mendes, C. A. Valera, L. F. S.

Fernandes, and F. A. L. Pacheco, “Estimating water

erosion from the brightness index of orbital images: A

framework for the prognosis of degraded pastures,”

Science of The Total Environment, vol. 776, p. 146019,

2021.

[13] D. Frantz, E. Haß, A. Uhl, J. Stoffels, and J. Hill,

“Improvement of the Fmask algorithm for Sentinel-2

images: Separating clouds from bright surfaces based on

parallax effects,” Remote Sensing of Environment, vol.

215, pp. 471–481, 2018.

[14] L. Breiman, “Random forests,” Machine Learning, vol.

45, no. 1, pp. 5–32, 2001.

[15] M. Belgiu and L. Drăguţ, “Random forest in remote

sensing: A review of applications and future directions,”

ISPRS Journal of Photogrammetry and Remote Sensing,

vol. 114, pp. 24–31, 2016.