(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org. Licensed under Creative Commons Attribution (CC BY) license. url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------

MARIDA: A benchmark for Marine Debris detection from Sentinel-2 remote sensing data

Authors: Katerina Kikaki, Ioannis Kakogeorgiou, Paraskevi Mikeli
Affiliations: Remote Sensing Laboratory, National Technical University of Athens, Zografou, Athens; Institute of Oceanography, Hellenic Centre for Marine Research, Anavyssos

Date: 2022-01

Currently, a significant amount of research is focused on detecting Marine Debris and assessing its spectral behaviour via remote sensing, ultimately aiming at new operational monitoring solutions. Here, we introduce a Marine Debris Archive (MARIDA), as a benchmark dataset for developing and evaluating Machine Learning (ML) algorithms capable of detecting Marine Debris. MARIDA is the first dataset based on the multispectral Sentinel-2 (S2) satellite data, which distinguishes Marine Debris from various marine features that co-exist, including Sargassum macroalgae, Ships, Natural Organic Material, Waves, Wakes, Foam, dissimilar water types (i.e., Clear, Turbid Water, Sediment-Laden Water, Shallow Water), and Clouds. We provide annotations (georeferenced polygons/pixels) from verified plastic debris events in several geographical regions globally, during different seasons, years and sea state conditions. A detailed spectral and statistical analysis of the MARIDA dataset is presented along with well-established ML baselines for weakly supervised semantic segmentation and multi-label classification tasks.
MARIDA is an open-access dataset which enables the research community to explore the spectral behaviour of certain floating materials, sea state features and water types, to develop and evaluate Marine Debris detection solutions based on artificial intelligence and deep learning architectures, as well as satellite pre-processing pipelines.

Funding: Part of this research has been supported by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship, and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: T1EDK-02966). This work has also been supported by NEANIAS, funded by the European Union’s Horizon 2020 research and innovation programme, under grant agreement No 863448. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Kikaki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

To this end, this study aims to fill this gap with a new, open-access benchmark dataset, named MARIDA (MARIne Debris Archive), based on S2 multispectral satellite data. MARIDA offers real cases with Marine Debris events, providing globally distributed annotations, ready for ML tasks. The produced dataset takes an innovative step forward by containing sea features that co-exist in remote sensing images, ultimately forming 15 thematic classes in total. Along with MARIDA, ML baselines for the weakly supervised semantic segmentation task [39] are presented, including shallow ML and deep neural network architectures. To enlarge the benchmark application area, the multi-label classification task is also considered.
However, despite the challenging and continuously growing issue of Marine Debris, the currently available datasets are relatively limited in number and do not usually employ open-access high-resolution satellite data over geographically extended areas. These facts hinder the exploitation of satellite data by ML frameworks and operational solutions. In addition, most of the currently available marine remote sensing datasets focus on detecting specific objects such as vessels [32–35]. Datasets for cloud detection over the ocean [36] and Sargassum macroalgae extraction [37, 38] have also been developed with a limited number of classes.

Furthermore, to better understand the spectral behaviour of Marine Debris, hyperspectral measurements have been conducted, exploring sensors’ capabilities in distinguishing plastics from other features such as vegetation, natural material, and water types [24–28]. Investigating Marine Debris characteristics (including its spectral behaviour) has also been attempted via multispectral satellite observations [10, 12, 13, 29], highlighting that spectral discrimination of Marine Debris from other sea surface features (e.g., ships, foam) is not straightforward. Indeed, differentiating floating plastic debris from bright features, such as waves, sunglint and clouds, is currently considered very challenging [5, 6]. This is because plastics have complex properties, varying in color, chemical composition, size and level of water submersion [30, 31]. A high-quality dataset can address the challenges mentioned above, also supporting the development and improvement of Marine Debris detection methods and the assessment of the operational aspects of any given solution (e.g., scalability).
In particular, earth observation data from public and commercial satellite programs [10–14] have been employed for detecting and monitoring Marine Debris, as well as remote sensing data from manned aircraft [15], unmanned aerial vehicles (UAVs) [16–20], bridge-mounted cameras [21] and underwater cameras [22]. Spectral indices have also been proposed to enhance the detection of Marine Debris on multispectral satellite data, such as the Floating Debris Index (FDI) [13] and the Plastic Index (PI) [23], which were developed based on artificial plastic targets.

Marine Debris, such as plastics, is a major global issue with important environmental, economic, human health and aesthetic aspects. Plastics remain in the ocean for a long time, and have been found in various areas worldwide [1–3], affecting marine life at different trophic levels [4]. To tackle the Marine Debris issue, several solutions for its detection [5, 6], cleanup [7] and prevention [8] have been developed and validated. Among those, detecting and monitoring floating litter has recently attracted most research and development efforts [9].

Materials and methods

Dataset specifications

MARIDA is an open-source dataset consisting of annotated georeferenced polygons/pixels on S2 satellite imagery. MARIDA was designed to be temporally and geographically well-distributed; thus, we used open-access data from the S2 satellite sensor, whose coverage includes global coastal waters. S2 is capable of detecting and continuously monitoring large floating debris, as it provides multispectral data at a spatial resolution of 10 m and 20 m with a frequent revisit time of 2–5 days. Regarding Marine Debris ground-truth data, reported events were collected from citizen scientists and social media over coastal areas and river mouths.
After identifying these cases in S2 satellite data, the events were verified with very high-resolution satellite data (whenever availability allowed), and the corresponding Marine Debris pixels were annotated. Additionally, sea surface features that co-occurred on the satellite images were annotated: Ships, Sargassum macroalgae, Foam, Waves and Natural Organic Material (i.e., vegetation and woody material), water types (i.e., Clear, Turbid Water and Sediment-Laden Water), Shallow Coastal Waters including benthic habitats, Clouds and Cloud Shadows. Regarding the annotation procedure, three image-interpretation experts annotated the satellite images by assessing the spectral and spatial patterns of all features, considering the limitations of the S2 sensor (i.e., different band resolutions and limited signal-to-noise ratio) [40]. Finally, an inter-annotator agreement protocol was established to merge the annotated data and aggregate the confidence levels derived from the three experts (see the Annotation process and protocol section).

The current benchmark dataset aims to address real-world scientific problems, so that it can eventually not only facilitate research efforts on Marine Debris but also support operational monitoring solutions. Thus, MARIDA consists of realistic, non-iconic and non-ideal satellite observations (by ideal, we mean, e.g., cloud-free data acquired during calm sea state conditions). MARIDA’s annotations are also sparse, to reduce potentially noisy labels arising from the complexity of sea surface features. The annotated polygons with real cases on S2 images (10 m resolution) do not correspond to thematic class endmembers or pure/clear pixels (in some cases, we annotated sparse Marine Debris pixels or floating material pixels under very thin clouds).

Machine learning frameworks

Baselines.
To encourage further research on Marine Debris detection methods and solutions, we provide software baselines for weakly supervised pixel-level semantic segmentation, employing a Random Forest (RF) model [56] and a U-Net architecture [57]. In particular, RF is a well-established supervised model that has been widely used in the remote sensing and computer vision communities. An RF classifier consists of many decision trees and uses averaging to improve predictive performance and control over-fitting. For our RF model, we extracted features similar to those of the first-place team of Track 2 of the 2020 IEEE GRSS Data Fusion Contest [58]. We trained three different RF models: i) one based on the spectral signature of each pixel (RF_SS), ii) one based on spectral signatures and calculated spectral indices (RF_SS+SI), and iii) one based on spectral signatures, spectral indices, and extracted Gray-Level Co-occurrence Matrix (GLCM) [59] textural features (RF_SS+SI+GLCM), in order to incorporate spatial information. The extracted spectral indices were NDVI, NDWI, FAI, FDI, Shadow Index (SI), Normalized Difference Moisture Index (NDMI), Bare Soil Index (BSI) and NRD [40, 60], which are broadly used in remote sensing studies. To compute the GLCM features, Rayleigh-corrected RGB composites were converted to grayscale images, which were subsequently quantized to 16 levels. The selected GLCM features were Contrast (CON), Dissimilarity (DIS), Homogeneity (HOMO), Energy (ENER), Correlation (COR) and Angular Second Moment [59]. To extract these features, a window of size 13 x 13 was used.

The U-Net is a well-established deep learning model for semantic segmentation. Its architecture consists of two parts, the down-sampling part and the up-sampling part. The first part encodes the input image, yielding a low-dimensional representation, using successive blocks of 3 x 3 convolutions for feature extraction and max-pooling layers for down-sampling.
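As a rough illustration of how an encoder of this kind changes tensor shapes, the following Python sketch traces the channel count and spatial size through the down-sampling path. It is a sketch under stated assumptions, not the baseline's actual code: it takes the configuration reported for this baseline (4 down-sampling blocks, 16 channels produced by the first block), and further assumes 'same'-padded convolutions, 2 x 2 max-pooling, channel doubling from block to block, and a hypothetical 256 x 256 input patch. The function name `encoder_shapes` is illustrative.

```python
def encoder_shapes(height, width, base_channels=16, num_blocks=4):
    """Trace (channels, height, width) after each down-sampling block.

    Assumptions (not taken from the MARIDA code): 'same'-padded 3 x 3
    convolutions leave height and width unchanged, each block ends with a
    2 x 2 max-pooling, and the channel count doubles from block to block.
    """
    shapes = []
    channels = base_channels
    for _ in range(num_blocks):
        # The convolutions set the channel count; pooling halves the size.
        height, width = height // 2, width // 2
        shapes.append((channels, height, width))
        channels *= 2
    return shapes


# Hypothetical 256 x 256 patch; the patch size is an assumption.
for block, (c, h, w) in enumerate(encoder_shapes(256, 256), start=1):
    print(f"block {block}: {c} channels, {h} x {w}")
```

The trace makes the trade-off of the encoder visible: every block exchanges spatial detail for a richer per-location feature vector, which the up-sampling path then has to recover.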
The number of feature maps (produced channels) is doubled in each block, while the spatial dimensions are halved. The second part decodes the internal representation using successive up-convolution layers to create the final segmentation output. For our task, the first input layer of the U-Net was modified to accept the 11 Rayleigh reflectance S2 bands, and the final classification layer was changed to output the MARIDA classes. We also used 4 down-sampling and up-sampling blocks, as well as 16 hidden channels produced by the initial down-sampling block.

To assess pixel-level semantic segmentation performance, we relied on three metrics. Our main evaluation metric was the Jaccard Index or Intersection-over-Union (IoU) [61]. In addition, the F1 score averaged over classes (Macro-F1, mF1) and the Pixel Accuracy (PA) for the per-class assessment were employed (S2 Appendix).

Through MARIDA, we also provide patch-level multi-labels, which formulate a weakly supervised multi-label classification task with positive labels and absent labels that are not necessarily negative [62, 63]. For the baseline of the multi-label classification task, we adopted the Residual neural network (ResNet) [64]. The evaluation metrics for the multi-label classification task are described in the S2 Appendix and the proposed baseline in the S4 Appendix.

MARIDA dataset and analysis

MARIDA contains 1381 patches, consisting of 837,357 annotated pixels, based on 63 S2 scenes acquired from 2015 to 2021. MARIDA provides patches with corresponding masks of pixel-wise annotated classes and confidence levels in GeoTIFF format. For each patch, the assigned multi-labels are given in a JSON file. In addition, MARIDA includes shapefile data in WGS 84 / UTM projection, with a file naming convention following the scheme s2_dd-mm-yy_ttt, where s2 denotes the S2 sensor, dd the day, mm the month, yy the year and ttt the S2 tile.
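The naming scheme above can be parsed mechanically, for instance to group annotations by acquisition date or tile. The following is a minimal Python sketch under stated assumptions: the helper `parse_name`, the example name `s2_18-09-20_16PCC`, the tolerance for one-digit days and months, and the mapping of the two-digit year to 20yy (consistent with scenes acquired from 2015 to 2021) are all illustrative choices, not part of the dataset documentation.

```python
import re
from datetime import date

# s2_dd-mm-yy_ttt: sensor prefix, day, month, two-digit year, S2 tile.
# Accepting one- or two-digit day/month is an assumption for robustness.
NAME_PATTERN = re.compile(
    r"^s2_(\d{1,2})-(\d{1,2})-(\d{2})_([0-9A-Za-z]+)$", re.IGNORECASE
)


def parse_name(name):
    """Return (acquisition date, tile) for a name following s2_dd-mm-yy_ttt."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow s2_dd-mm-yy_ttt: {name!r}")
    day, month, year, tile = match.groups()
    # Two-digit years are assumed to mean 20yy (scenes span 2015 to 2021).
    return date(2000 + int(year), int(month), int(day)), tile.upper()


# Hypothetical name, used only to illustrate the scheme.
acquired, tile = parse_name("s2_18-09-20_16PCC")
print(acquired.isoformat(), tile)
```

Raising `ValueError` on a mismatch, rather than returning `None`, keeps malformed names from silently entering a date-based split.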
The shapefile data include the class of each annotation, along with the confidence score and the report description. The produced dataset is composed of geodata, covering different sites around the globe (Fig 2). The selected study sites are distributed over eleven countries (i.e., Honduras, Guatemala, Haiti, Santo Domingo, Vietnam, South Africa, Scotland, Indonesia, Philippines, South Korea and China).

Fig 2. The sites (red dots in the map) where Marine Debris events were reported, and corresponding Sentinel-2 satellite images were acquired and processed. Marine Debris and other features that co-existed were annotated in the considered satellite data. The corresponding map is acquired from Natural Earth (http://www.naturalearthdata.com/). https://doi.org/10.1371/journal.pone.0262247.g002

Spectral signatures

To study the spectral behaviour of the Marine Debris annotated data, we extracted the mean spectral signatures for each scene, leading to a detailed analysis presented thoroughly in the online material. The mean spectral reflectance of annotated pixels with high confidence in MARIDA is depicted in Fig 3. The mean spectral signatures are presented along with the 25–75 percentiles as error bars to demonstrate the variation along with the skewness of their distribution. The atmospheric correction process, diverse proportions of floating Marine Debris within pixels, differences resulting from colours and immersion, and mixed conditions in the natural environment led to high variability of the recorded Marine Debris spectral signatures. However, the recorded Marine Debris mean spectral reflectance is very similar to the corresponding simulated signature proposed recently by Hu [40]. Slightly higher values in our data indicate different debris proportions within pixels.
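The per-band summary described above (a mean with 25–75 percentile error bars) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `band_summary` is hypothetical, the reflectance values are made up for demonstration and are not MARIDA data, and the percentile method (linear interpolation, via `method="inclusive"`) is an assumption.

```python
from statistics import mean, quantiles


def band_summary(pixels):
    """Summarise reflectance per band.

    `pixels` is a list of per-pixel reflectance lists, one value per band.
    Returns one dict per band with the mean and the 25th/75th percentiles.
    """
    summaries = []
    # zip(*pixels) regroups the values band by band.
    for band_values in zip(*pixels):
        # n=4 yields the three quartile cut points; keep the outer two.
        p25, _, p75 = quantiles(band_values, n=4, method="inclusive")
        summaries.append({"mean": mean(band_values), "p25": p25, "p75": p75})
    return summaries


# Made-up reflectance values for five pixels and three bands (illustrative only).
toy_pixels = [
    [0.05, 0.07, 0.12],
    [0.06, 0.08, 0.15],
    [0.04, 0.09, 0.11],
    [0.07, 0.06, 0.20],
    [0.05, 0.08, 0.13],
]
for index, summary in enumerate(band_summary(toy_pixels)):
    print(index, summary)
```

Reporting the interquartile range rather than a standard deviation keeps the error bars asymmetric when a band's distribution is skewed, which is the property the text highlights.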
In comparison with previous studies [10, 12, 13], which exploited S2 imagery, higher reflectance in the Green and Red bands was observed, possibly due to the denser patches that we recorded. Additionally, the mean spectral signature of high-confidence NatM was considered for comparison, as in some cases with low subpixel proportions, their spectral discrimination was not straightforward. Regarding the comparison of Marine Debris and NatM, it was found that their discrimination might be possible at the 865 nm and SWIR bands.

Statistical analysis

By applying the t-SNE algorithm along with the spectral signature analysis described above (Figs 3 and 4, online material), important insights were gained about the spectral behaviour of floating Marine Debris and the potential for spectral discrimination from other features with similar patterns such as SpS, Ship, Waves and NatM. Fig 4 presents the t-SNE results for the considered features, indicating the confidence level of each annotation with a different symbol. Based on the recorded data, a well-shaped Marine Debris cluster was formed, which is distinct from the other clusters. Very sparse recorded Marine Debris (e.g., 20 April 2018 in Scotland) led to a smaller separate cluster between Waves and Marine Debris. A well-shaped Ship cluster was also mapped, yet some annotated Ship pixels fell within the Marine Debris cluster due to the similar polymer types. Likewise, some dense Marine Debris pixels were mapped in the Ship cluster. Some Ship pixels were also located close to Waves pixels; this is evident in cases with moving vessels, where discriminating boundary Ship pixels from water-related classes (i.e., Wakes) was challenging for a human expert. Occasionally, NatM cannot be spectrally separated from Marine Debris (e.g., 18 September 2020 at the Motagua river mouth).
Mixed conditions at the river mouth, low coverage at pixel level and potentially colored marine litter (e.g., green or brown) led to uncertainties, represented by low-confidence Marine Debris and NatM annotations. However, dense natural woody debris has a distinct spectral signature (e.g., 7 October 2018 at the Nakdong river mouth). This fact was also confirmed by a smaller (but well-shaped) NatM cluster depicted in brown color (Fig 4). A distinct SpS cluster was also formed, including NatM (i.e., vegetation). In some cases, SpS-annotated pixels were mapped in the Marine Debris and Waves clusters; the majority of these cases, however, corresponded to sparse floating materials that were detected at a lower subpixel level. This fact confirms that sparse floating vegetation pixels in some cases cannot be spectrally discriminated from sparse marine litter pixels (e.g., 4 March 2018 in Bali) [40].

MARIDA benchmark and ML baselines

MARIDA is designed to be beneficial for several remote sensing applications and tasks, which are described in detail in the following section (Discussion). However, it primarily aims to benchmark weakly supervised pixel-level semantic segmentation learning methods. In particular, the produced dataset falls into incomplete supervision due to sparsely annotated data, inexact supervision due to sensor limitations (i.e., 10 m resolution, different band resolutions), and inaccurate supervision due to potentially slightly noisy annotations (i.e., sensor noise, human error) [40].

[1] Url: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0262247 (C) PLOS One. Licensed under Creative Commons Attribution (CC BY 4.0) URL: https://creativecommons.org/licenses/by/4.0/