Hybrid Method for Cloud Detection Using Sentinel-2
Imagery and Spectral Indices
O. Specht
Gdynia Maritime University, Gdynia, Poland
http://www.transnav.eu
the International Journal on Marine Navigation
and Safety of Sea Transportation
Volume 20, Number 2, June 2026
DOI: 10.12716/1001.20.02.11
ABSTRACT: Satellite imagery constitutes an extremely valuable source of information about the natural
environment and the processes occurring on the Earth's surface. However, the availability of useful optical
imagery is often significantly limited by atmospheric factors, most notably the presence of clouds. Cloud cover
can completely hinder the interpretation of satellite images or lead to erroneous analysis results if not properly
identified and filtered out. Moreover, not all clouds are easily detectable: thin cirrus clouds located above water
bodies often remain unnoticed in basic RGB visualizations. This article aims to develop and validate an effective
method for cloud identification using selected spectral bands and remote sensing indices, which was further
tested on the coastal waters of Poland.
1 INTRODUCTION
Satellite remote sensing has become a vital tool for
monitoring and analysing environmental processes
across diverse spatial and temporal scales. Among the
numerous satellite missions, the European Space
Agency's Sentinel-2 program stands out for its
high-resolution optical imagery and frequent revisit times,
providing critical data for applications such as land
cover mapping [1] and coastal zone management.
However, the effectiveness of satellite imagery is often
diminished by atmospheric interference, particularly
cloud cover, which can obscure surface details, reduce
image quality, and potentially distort analytical results.
This challenge is especially pressing in consistently
cloudy areas, like coastal regions, where accurate
environmental analysis is crucial [2].
In maritime environments, optical satellite imagery
is increasingly used to support coastal monitoring,
hydrographic surveys, shoreline mapping, vessel
detection, environmental hazard assessment, and
navigation-related decision-making. However, cloud
contamination significantly limits the availability and
reliability of remotely sensed information. Therefore,
accurate cloud detection constitutes an essential
preprocessing step for maritime monitoring systems
and navigation-support applications.
Accurately detecting clouds is a fundamental
preprocessing step in remote sensing workflows.
While thick clouds are generally identifiable in
standard true-colour composites, thin cirrus clouds
present a significant challenge due to their low optical
thickness and partial transparency. These clouds are
difficult to detect over water bodies, where the spectral
contrast between clouds and the underlying surface is
minimal.
Various cloud detection algorithms have been
developed to address this challenge, including
rule-based methods, thresholding techniques, and machine
learning approaches. Among these, the Random Forest
(RF) algorithm has gained popularity. Studies have
demonstrated the effectiveness of RF-based methods in
cloud detection tasks, particularly when combined
with spectral indices that enhance the contrast between
clouds and other surface features [3].
Spectral indices, such as the Normalized Difference
Vegetation Index (NDVI), Normalized Difference
Snow Index (NDSI), and Cloud Displacement Index
(CDI), are widely used to improve cloud detection
accuracy. These indices take advantage of the unique
spectral characteristics of clouds in contrast to
vegetation, snow, and other land cover types, enabling
more precise classification. The primary objective of
this study is to develop and evaluate a hybrid cloud
detection method that integrates selected spectral
indices (NDVI, NDSI, CDI) with the Random Forest
algorithm to enhance the accuracy of cloud
identification in Sentinel-2 imagery over coastal areas.
The proposed approach aims to address the limitations
of existing methods in detecting thin cirrus clouds,
particularly over water bodies, and to provide a
reliable tool for preprocessing optical satellite data in
environmental monitoring applications.
This article is structured into four main sections.
The first section, Introduction, presents the motivation
for addressing this research topic. The second section,
Materials and Methods, describes the study area, data
sources, preprocessing steps, training sample
preparation, and the proposed hybrid method. The
third section, Results, discusses the outcomes of
applying the method to a cloud-affected Sentinel-2
scene over northern Poland. The article concludes with
a general summary of the findings.
2 MATERIALS AND METHODS
2.1 Study Area
The satellite image shows the northern region of
Poland, highlighting the coastal area of the Pomeranian
Voivodeship. It includes part of the southern Baltic Sea,
specifically Gdańsk Bay (Zatoka Gdańska), the Hel
Peninsula (Półwysep Helski), and the shoreline
extending westward toward the open Baltic Sea
(Fig. 1). This location was selected for its frequent use
as a measurement site.
Figure 1. Location of the study area in northern Poland.
The Gulf of Gdańsk is an important maritime area
characterized by intensive shipping traffic, port
operations, hydrographic surveys, and coastal
management activities. Reliable satellite-based
observations are therefore important for supporting
marine monitoring and navigation-related
applications. The region features a variety of surface
types, including forests, urban areas, the coastal waters
of the Gulf of Gdańsk, and parts of the Vistula Lagoon.
This land-water diversity, combined with dynamic
environmental conditions, makes it a valuable yet
challenging area for geospatial analysis. One of the
major challenges in the remote sensing of this region is
the frequent cloud cover, especially during the autumn
and winter months. In Gdynia, December is the
cloudiest month, with the sky being overcast or mostly
cloudy about 69% of the time [4]. These conditions
significantly limit the availability of cloud-free optical
satellite imagery, underscoring the need for a cloud
extraction method.
2.2 Data used
To establish and validate the proposed method, we
utilized multispectral satellite imagery obtained from
the Sentinel-2 mission. Developed by the European
Space Agency (ESA) as part of the Copernicus
program, Sentinel-2 is a multispectral Earth
observation mission [5]. It offers high-resolution
optical imagery tailored for applications including
land-use mapping, vegetation monitoring, water
quality assessment, and coastal zone studies.
The Sentinel-2 satellites are equipped with the
MultiSpectral Instrument (MSI), which collects
imagery across 13 spectral bands ranging from the
visible and near-infrared (VNIR) to the short-wave
infrared (SWIR) region. These bands vary in spatial
resolution (10 m, 20 m, and 60 m) depending on their
wavelength and intended application [6]. The spectral
characteristics of the Sentinel-2 bands used in this
study are summarized below (Tab. 1):
Table 1. Sentinel-2 Spectral Band Characteristics [6]
Band   Spatial Resolution [m]   Central Wavelength [nm]
B1     60                       443
B2     10                       490
B3     10                       560
B4     10                       665
B5     20                       705
B6     20                       740
B7     20                       783
B8     10                       842
B8a    20                       865
B9     60                       940
B10    60                       1375
B11    20                       1610
B12    20                       2190
In this study, Sentinel-2 Level-2A satellite imagery
was used to develop and test the proposed
methodology. Specifically, an image acquired on
September 6, 2024, covering the northern part of
Poland and its surrounding water bodies, was selected
as the primary dataset. The method primarily used
the following spectral bands: Band 2 (Blue, 490 nm), Band 3
(Green, 560 nm), Band 4 (Red, 665 nm), Band 8
(Near-Infrared, 842 nm), and Band 11 (Short-Wave Infrared,
1610 nm) [6].
2.3 Preprocessing
As an initial step, a review and analysis of commonly
used spectral indices was conducted to identify those
that are most suitable for cloud detection in
multispectral satellite imagery. These indices are
frequently utilized in remote sensing to enhance the
spectral contrast between various surface types, such
as vegetation, soil, water, snow, and clouds. Several
indices were assessed based on literature sources and
preliminary testing with Sentinel-2 imagery, including:
- NDVI (Normalized Difference Vegetation Index): used to
  identify vegetated areas and to monitor marine algae [7-8],
- NDWI (Normalized Difference Water Index): useful for
  distinguishing water bodies from land [9] and from
  cloud-covered areas, and for delineating open water
  features [10],
- NDSI (Normalized Difference Snow Index): often used to
  discriminate between snow and clouds as well as bright
  land features [11],
- BI (Brightness Index): highlights bright surfaces, such
  as clouds and bare soil [12],
- CDI (Cloud Displacement Index): developed to improve
  cloud detection by analysing spectral dissimilarities
  over time or across bands [13].
Based on the literature review and empirical
analysis of Sentinel-2 data in the study area, the
following three spectral indices were selected for the
proposed cloud detection method: NDVI, NDSI, and
CDI.
The NDVI is calculated as follows [7-8]:
$$\mathrm{NDVI} = \frac{NIR - RED}{NIR + RED} \tag{1}$$
where:
NIR: reflectance in the near-infrared range (the so-called
NIR band),
RED: reflectance in the red-light range (the so-called red
band).
NDVI values close to 1 indicate the presence of
healthy vegetation, while negative values suggest the
presence of water, clouds, or snow. In the case of water
bodies, NDVI values are often similar to those of
clouds, which may lead to classification challenges in
such areas.
The Normalized Difference Snow Index is
calculated using the following formula [11]:
$$\mathrm{NDSI} = \frac{GREEN - SWIR}{GREEN + SWIR} \tag{2}$$
where:
GREEN: reflectance in the green-light range (the so-called
green band),
SWIR: reflectance in the short-wave infrared range
(the so-called SWIR band).
High NDSI values typically indicate snow or
clouds, as both reflect strongly in the visible spectrum
(especially green) but absorb more in the SWIR range.
This makes NDSI particularly useful for distinguishing
bright surfaces such as snow and clouds from other
land covers.
A simplified formulation of CDI can be expressed
as [13]:
$$\mathrm{CDI} = \frac{BLUE - SWIR}{BLUE + SWIR} \tag{3}$$
where:
BLUE: reflectance in the blue-light range (the so-called
blue band).
This index is particularly effective for separating
clouds from bright non-cloud surfaces.
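The three indices in Eqs. (1)-(3) share the same normalized-difference form, so they can be computed with one helper. The following is a minimal sketch, assuming the bands are available as NumPy arrays of surface reflectance on a common grid; the function names and the sample reflectance values are illustrative and do not come from the study.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def spectral_indices(blue, green, red, nir, swir):
    """Compute NDVI (Eq. 1), NDSI (Eq. 2) and the simplified CDI (Eq. 3)."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDSI": normalized_difference(green, swir),
        "CDI": normalized_difference(blue, swir),
    }

# Two hypothetical pixels (reflectance values invented for illustration).
indices = spectral_indices(
    blue=[0.30, 0.04],
    green=[0.40, 0.06],
    red=[0.10, 0.03],
    nir=[0.30, 0.02],
    swir=[0.10, 0.01],
)
for name, values in indices.items():
    print(name, np.round(values, 3))
```

Returning NaN for a zero denominator is a design choice of this sketch; it keeps no-data pixels from being silently classified later.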
2.4 Training samples preparation
For supervised classification, a set of training samples
was manually prepared using a point-based approach,
which is one of the simplest and most effective
methods for collecting labelled data in remote sensing
analysis. Each point represents a pixel with a known
class and is used to guide the classification algorithm
in distinguishing between different surface and
atmospheric conditions.
Three classes were defined to represent different
cloud conditions:
- Class 0 (No Cloud): points were placed over cloud-free
  areas such as open water, vegetation, and urban
  surfaces.
- Class 1 (Thin Clouds): samples were collected from
  light, semi-transparent clouds, especially cirrus
  formations.
- Class 2 (Thick Clouds): points were placed on
  bright, dense cloud formations with strong
  reflectance.
To ensure balanced representation and reduce bias,
points were distributed evenly across the entire image,
avoiding excessive clustering in specific regions. For
each class, a minimum of 30 to 50 points should be
collected, which is considered a good practice for
achieving reliable classification accuracy. Once the
training points were digitized, pixel values were
extracted from selected input layers, including both
spectral bands and calculated indices. The input
features included:
- Spectral bands: B2 (Blue), B3 (Green), B4 (Red), B8
  (NIR), and B11 (SWIR)
- Spectral indices: NDVI, NDSI, and CDI
These raster layers were stacked, and their pixel
values at each training point's location were extracted
to build the training dataset.
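The stacking and extraction step can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: all eight layers are assumed to be already resampled to one common grid (B11 is natively 20 m, the other bands 10 m), and the training points are assumed to be already converted from map coordinates to row/column indices. The names `FEATURE_NAMES`, `stack_layers`, and `sample_points` are hypothetical, and the raster values are random placeholders.

```python
import numpy as np

# Feature order used throughout this sketch (bands first, then indices).
FEATURE_NAMES = ["B2", "B3", "B4", "B8", "B11", "NDVI", "NDSI", "CDI"]

def stack_layers(layers):
    """Stack 2-D rasters of identical shape into (n_features, rows, cols)."""
    arrays = [np.asarray(layers[name], dtype=np.float64) for name in FEATURE_NAMES]
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        raise ValueError(f"layers must share one grid, got shapes {shapes}")
    return np.stack(arrays, axis=0)

def sample_points(stack, rows, cols):
    """Return an (n_points, n_features) table of pixel values at (row, col)."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # Advanced indexing gives (n_features, n_points); transpose to samples-first.
    return stack[:, rows, cols].T

# Placeholder rasters on a 4 x 5 grid (random values, for illustration only).
rng = np.random.default_rng(0)
layers = {name: rng.random((4, 5)) for name in FEATURE_NAMES}
stack = stack_layers(layers)

# Two hypothetical training points with class labels 0 (no cloud) and 2 (thick cloud).
X = sample_points(stack, rows=[0, 3], cols=[1, 4])
y = np.array([0, 2])
print(X.shape, y.shape)
```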
2.5 Hybrid cloud detection approach
In this study, a hybrid approach was implemented for
cloud detection by combining spectral indices and
Sentinel-2 spectral bands with a supervised
classification algorithm. The method integrates
physically based indicators with data-driven machine
learning to improve classification accuracy in
spectrally complex environments, such as coastal
zones (Fig. 2).
Figure 2. Hybrid Cloud Detection Workflow.
The Random Forest (RF) algorithm was chosen as
the classification method because of its well-documented
robustness, high performance, and
resistance to overfitting in remote sensing applications
[14-15]. Random Forest is an ensemble learning
approach that constructs multiple decision trees
during training and outputs the class that represents
the mode of the predictions made by the individual
trees. It is especially effective in managing high-
dimensional input data and heterogeneous feature
sets.
In this study, the RF model was trained using a
feature set that included the previously mentioned
spectral indices (NDVI, NDSI, and CDI) along with
Sentinel-2 spectral bands: Band 2 (Blue), Band 3
(Green), Band 4 (Red), Band 8 (NIR), and Band 11
(SWIR). This feature combination was chosen based on
its proven relevance for distinguishing between cloud
types and cloud-free surfaces, particularly in coastal
and mixed land-water regions.
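A training run of this kind could look like the sketch below, which uses scikit-learn's `RandomForestClassifier`. The article does not report the software, the number of trees, or how samples were split for validation, so `n_estimators=200`, the 70/30 stratified split, and the fixed random seed are assumptions chosen only for illustration. The feature table is synthetic: three well-separated Gaussian clusters stand in for the three classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

FEATURE_NAMES = ["B2", "B3", "B4", "B8", "B11", "NDVI", "NDSI", "CDI"]

# Synthetic stand-in for the training table: 50 samples per class,
# each class centred on a different mean (not real Sentinel-2 data).
rng = np.random.default_rng(42)
n_per_class = 50
X = np.vstack([
    rng.normal(loc=centre, scale=0.3, size=(n_per_class, len(FEATURE_NAMES)))
    for centre in (0.0, 1.0, 2.0)
])
y = np.repeat([0, 1, 2], n_per_class)  # 0 = no cloud, 1 = thin, 2 = thick

# Hold out part of the samples for validation (assumed 70/30 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Each tree votes for a class; the forest returns the majority vote.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = float(np.mean(y_pred == y_test))
importances = dict(zip(FEATURE_NAMES, model.feature_importances_))
print(f"hold-out accuracy on synthetic data: {accuracy:.2f}")
```

The `feature_importances_` attribute gives a rough indication of how much each band or index contributes to the splits, which can help judge whether an index adds information beyond the raw bands.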
To evaluate the performance of the classification
model, a confusion matrix was constructed, and
precision and recall were calculated from it for each class.
Precision (P) is defined as the proportion of correctly
predicted instances of a given class to the total number
of cases predicted as that class:
$$P = \frac{TP}{TP + FP} \tag{4}$$
where:
TP (True Positives): the number of points that were
correctly classified as belonging to the target class,
FP (False Positives): the number of points that were
incorrectly classified as belonging to the target class,
when in fact they belong to a different one.
Recall (R) is defined as the proportion of correctly
predicted instances of a given class to the total number
of actual cases of that class in the reference dataset:
$$R = \frac{TP}{TP + FN} \tag{5}$$
where:
FN (False Negatives): the number of points that truly
belong to the target class but were incorrectly classified
as belonging to another class.
Together, precision and recall provide
a comprehensive view of classification performance,
capturing the trade-off between false
positives and false negatives.
3 RESULTS
The first step in evaluating the effectiveness of the
proposed cloud detection method involved a visual
and analytical assessment of the spectral index
outputs. The three selected indices (NDVI, NDSI, and
CDI) were used with the Sentinel-2 imagery to
highlight various surface characteristics relevant to
cloud detection.
The first index analysed is the NDVI, which is
commonly used to identify vegetated areas by utilizing
the spectral contrast between the red and near-infrared
bands (Fig. 3).
Figure 3. The map presents the NDVI values over the coastal
and inland areas of northern Poland.
The NDVI effectively highlights vegetated areas
(green to blue), but it cannot reliably differentiate
between clouds and water surfaces, nor can it detect
thin clouds with high confidence. These findings
support the decision to use the NDVI only as part of a
hybrid approach, complemented by indices such as
NDSI and CDI, which are more sensitive to cloud
characteristics.
The next index assessed in this study is the NDSI
(Fig. 4).
Figure 4. NDSI distribution across the study area based on
Sentinel-2 imagery (06.09.2024).
This result confirms that NDSI is more sensitive to
the presence of optically thick clouds, particularly in
regions where NDVI struggled to differentiate clouds
from water surfaces. However, the index may still have
difficulty with thin cirrus clouds or high-altitude haze,
which do not create a strong contrast in green versus
SWIR reflectance. Therefore, further enhancement is
accomplished with the Cloud Displacement Index
(Fig. 5).
Figure 5. CDI distribution over the study area based on
Sentinel-2 imagery (06.09.2024).
The CDI map confirms its role as a vital component
in hybrid cloud detection, especially for distinguishing
clouds from spectrally similar surfaces like snow, sand,
or water. Its inclusion in the classification model
greatly improves the identification of both thin and
thick clouds, which NDVI or NDSI alone may not fully
capture.
Based on the generated spectral index maps (NDVI,
NDSI, CDI) and the reference true colour image (Fig.
6), the process of identifying training data (Fig. 7) was
initiated.
Figure 6. Analysed area in true colour.
Figure 7. Distribution of training points for cloud
classification.
Once constructed, the classification model was
applied to the entire Sentinel-2 scene to generate a
complete cloud cover map (Fig. 8).
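Applying a per-pixel classifier to a whole scene amounts to flattening the feature stack into a table of pixels, predicting a class for each row, and reshaping the result back into a map. The sketch below shows that bookkeeping with NumPy only. The function `classify_scene`, the no-data code `-1`, and the threshold-based stand-in predictor are assumptions for illustration; with a fitted scikit-learn model, its `predict` method could be passed in instead.

```python
import numpy as np

def classify_scene(stack, predict, nodata=-1):
    """Classify every pixel of a (n_features, rows, cols) stack.

    `predict` is any callable mapping an (n_pixels, n_features) array
    to an array of integer class labels.
    """
    n_features, rows, cols = stack.shape
    # One row per pixel, one column per feature.
    pixels = stack.reshape(n_features, rows * cols).T
    # Skip pixels with missing or infinite values in any feature.
    valid = np.all(np.isfinite(pixels), axis=1)
    labels = np.full(rows * cols, nodata, dtype=np.int16)
    if valid.any():
        labels[valid] = predict(pixels[valid])
    return labels.reshape(rows, cols)

# Stand-in predictor (NOT the trained model): bins the first feature
# into three classes using two arbitrary thresholds.
def toy_predict(pixels):
    return np.digitize(pixels[:, 0], bins=[0.2, 0.6])

# Tiny 2 x 2 scene with two features; one pixel contains NaN.
scene = np.array([
    [[0.1, 0.4], [0.9, np.nan]],
    [[0.0, 0.0], [0.0, 0.0]],
])
cloud_map = classify_scene(scene, toy_predict)
print(cloud_map)
```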
Figure 8. Cloud classification applied to the entire Sentinel-2
scene acquired over the Gdańsk Bay.
In the next stage of the study, the cloud
classification map underwent validation. A confusion
matrix (Fig. 9) was used to assess the model’s
performance.
Figure 9. Confusion matrix showing the classification results
of the cloud detection model. Class 0 represents cloud-free
areas, class 1 corresponds to thin clouds, and class 2 refers to
thick clouds.
The confusion matrix shows that the classifier
performed best on cloud-free areas (class 0): all 23
reference samples of this class were correctly
identified, although two cloud samples (one thin, one
thick) were assigned to class 0. Thin clouds (class 1)
were classified correctly in 16 instances, while 1
instance was misclassified as class 0 and 6 as class 2,
indicating some spectral overlap with both neighbouring
categories. For thick clouds (class 2), the model
achieved 30 correct classifications, with 1 sample
misclassified as class 0 and 6 as class 1. Overall, 69 of
the 83 samples were classified correctly, and 12 of the
14 errors were confusions between thin and thick
clouds, which is where further refinement would help
most.
To quantitatively evaluate the classification model,
precision and recall were calculated for each class
(Tab. 2).
Table 2. Precision and recall values calculated for each class
based on the confusion matrix.
Class              Precision   Recall   Samples
0 (no clouds)      0.92        1.00     23
1 (thin clouds)    0.73        0.70     23
2 (thick clouds)   0.83        0.81     37
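The values in Table 2 can be checked against the counts reported for the confusion matrix by applying Eqs. (4) and (5) class by class. The sketch below does this in plain Python. The matrix is transcribed from the counts given in the text, with rows taken as the reference class and columns as the predicted class; that orientation is an assumption about how Fig. 9 is laid out, and the function name is illustrative.

```python
# Rows = reference class, columns = predicted class (assumed orientation).
# Counts transcribed from the description of the confusion matrix.
confusion = [
    [23, 0, 0],   # class 0: no clouds
    [1, 16, 6],   # class 1: thin clouds
    [1, 6, 30],   # class 2: thick clouds
]

def per_class_metrics(matrix):
    """Return a list of (precision, recall, n_reference) tuples, one per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(matrix[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        results.append((precision, recall, tp + fn))
    return results

metrics = per_class_metrics(confusion)
total = sum(sum(row) for row in confusion)
overall_accuracy = sum(confusion[k][k] for k in range(len(confusion))) / total

for k, (p, r, n) in enumerate(metrics):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} samples={n}")
print(f"overall accuracy: {overall_accuracy:.2f}")
```

Rounded to two decimals, the per-class results match Table 2 (for example, thin clouds give 16/22 for precision and 16/23 for recall), and the overall accuracy is 69/83.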
The evaluation metrics indicate that the model
excelled in identifying cloud-free areas (class 0),
achieving both high precision (0.92) and perfect recall
(1.00), which means that no cloud-free samples were
misclassified. For thick clouds (class 2), the model also
demonstrated strong performance, with balanced
precision (0.83) and recall (0.81), suggesting
dependable detection of dense cloud structures.
The lowest precision and recall were observed for
thin clouds (class 1), with values of 0.73 and 0.70,
respectively. This result suggests that thin clouds are
more challenging to classify accurately, likely due to
their lower spectral contrast and partial transparency,
which can create confusion with both clear sky and
thick cloud classes. Incorporating additional spectral
indices or training samples may improve the
classification of this category in future work.
4 CONCLUSIONS
This study demonstrated the effectiveness of a hybrid
cloud detection method that combines selected spectral
indices (NDVI, NDSI, CDI) with Sentinel-2 spectral
bands and a supervised learning approach using a
machine learning algorithm (Random Forest). The
proposed approach enabled accurate identification of
various cloud types over a spectrally complex coastal
region in northern Poland.
The results indicated that the model excelled in
detecting cloud-free areas, achieving a precision of 0.92
and a perfect recall of 1.00. Thick clouds were also
classified reliably (precision = 0.83; recall = 0.81), while
thin clouds posed the greatest challenge due to their
spectral similarity to both clear skies and thick clouds
(precision = 0.73; recall = 0.70). These findings
emphasize the importance of combining spectral
indicators with effective classification techniques to
reduce misclassification in complex land-water
environments.
The method’s simplicity, flexibility, and relatively
low computational cost make it suitable for operational
maritime applications, including coastal monitoring,
hydrographic data preprocessing, shoreline mapping,
vessel detection support, and navigation-related
environmental monitoring. By improving the
reliability of cloud-free Sentinel-2 products, the
proposed approach may support safer and more
efficient decision-making in coastal and marine
environments. Future research may explore
integrating additional spectral features, time series
analysis, or deep learning-based models to enhance the
detection of optically thin clouds and haze layers.
REFERENCES
[1] D. Phiri, M. Simwanda, S. Salekin, V. R. Nyirenda, Y.
Murayama, and M. Ranagalage, “Sentinel-2 data for land
cover/use mapping: A review,” Remote Sensing, vol. 12,
no. 14, p. 2291, Jul. 2020, doi: 10.3390/rs12142291.
[2] F. Rodríguez-Puerta, R. L. Perroy, C. Barrera, J. P. Price,
and B. García-Pascual, “Five-year evaluation of Sentinel-
2 cloud-free mosaic generation under varied cloud cover
conditions in Hawai’i,” Remote Sensing, vol. 16, no. 24, p.
4791, 2024, doi: 10.3390/rs16244791.
[3] B. Zhou, S. Gao, Y. Yin, Y. Zhang, Y. Yu, Q. Qian, and M.
Zhu, “Enhancing active fire detection in Sentinel-2
imagery using GLCM texture features in random forest
models,” Scientific Reports, vol. 14, p. 31076, 2024, doi:
10.1038/s41598-024-81976-w.
[4] WeatherSpark, “Average weather in Gdynia, Poland, year
round.” [Online]. Available:
https://weatherspark.com/y/84137/Average-Weather-in-Gdynia-Poland-Year-Round
[5] European Space Agency, “Sentinel-2 User Handbook,”
ESA Standard Document, 2015. [Online]. Available:
https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook
[6] European Space Agency, “Sentinel-2 MSI Technical
Guide,” 2023. [Online]. Available:
https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-2-msi
[7] S. Huang, L. Tang, J. P. Hupy, G. A. Wang, and J. Shao, “A
commentary review on the use of normalized difference
vegetation index (NDVI) in the era of popular remote
sensing,” Journal of Forestry Research, vol. 32, pp. 1–6,
2021, doi: 10.1007/s11676-020-01155-1.
[8] O. Lewicka, “Application of NDVI for marine algae
monitoring: a Polish case study,” in Proc. IEEE Int.
Workshop on Metrology for the Sea (MetroSea), La
Valletta, Malta, 2023, pp. 57–61, doi:
10.1109/MetroSea58055.2023.10317358.
[9] Q. Guo, R. Pu, J. Li, and J. Cheng, “A weighted normalized
difference water index for water extraction using Landsat
imagery,” International Journal of Remote Sensing, vol.
38, no. 19, pp. 5430–5445, 2017, doi:
10.1080/01431161.2017.1341667.
[10] S. K. McFeeters, “The use of the normalized difference
water index (NDWI) in the delineation of open water
features,” International Journal of Remote Sensing, vol.
17, no. 7, pp. 1425–1432, 1996, doi:
10.1080/01431169608948714.
[11] S. Raghubanshi, R. Agrawal, and B. P. Rathore,
“Enhanced snow cover mapping using object-based
classification and normalized difference snow index
(NDSI),” Earth Science Informatics, vol. 16, pp. 2813–2824,
2023, doi: 10.1007/s12145-023-01077-6.
[12] A. S. Vieira, R. F. do Valle Junior, V. S. Rodrigues, T. L.
da S. Quinaia, R. G. Mendes, C. A. Valera, L. F. S.
Fernandes, and F. A. L. Pacheco, “Estimating water
erosion from the brightness index of orbital images: A
framework for the prognosis of degraded pastures,”
Science of The Total Environment, vol. 776, p. 146019,
2021.
[13] D. Frantz, E. Haß, A. Uhl, J. Stoffels, and J. Hill,
“Improvement of the Fmask algorithm for Sentinel-2
images: Separating clouds from bright surfaces based on
parallax effects,” Remote Sensing of Environment, vol.
215, pp. 471–481, 2018.
[14] L. Breiman, “Random forests,” Machine Learning, vol.
45, no. 1, pp. 5–32, 2001.
[15] M. Belgiu and L. Drăguţ, “Random forest in remote
sensing: A review of applications and future directions,”
ISPRS Journal of Photogrammetry and Remote Sensing,
vol. 114, pp. 24–31, 2016.