Ensemble mapping and change analysis of the seafloor sedi- ment distribution in the Sylt Outer Reef, German North Sea from 2016-2018

Recent studies on seafloor mapping have presented different modelling methods for the automatic classification of seafloor sediments. However, most of these studies have applied these models to seafloor data with appropriate number of ground-truth samples, which raises the question whether these methods are applicable to studies with smaller numbers of ground-truth data. In this study, we aim to address this issue by conducting sediment class-specific predictions using ensemble modelling to map areas with limited or without ground-truth data and combined with hydroacoustic datasets. The resulting class-specific maps were then assembled into one map, where the most probable class was assigned to the appropriate location. Our approach was able to predict sediment classes without bias to the class with more ground-truth data and produced reliable seafloor sediment distributions maps that can be used for seafloor monitoring. Sediment shifts of a heterogenous seafloor in the Sylt Outer Reef, German North Sea were also assessed to understand the sediment dynamics in the area. The analyses of sediment shifts showed that the western area of the Sylt Outer Reef is highly active, and the results of the analyses assisted in providing recommendations on future seafloor monitoring activities.


Introduction
Advances in automated seafloor classification have been made in recent years. Seafloor habitat mappers have utilized machine learning classification methods to improve the identification of seafloor characteristics using hydroacoustic data, oceanographic variables, and ground-truth samples [1][2][3][4][5][6]. Some of the most common modelling techniques are classification tree analysis (CTA), generalized boosted models (GBM), artificial neural networks (ANN), and most prominently, random forest (RF) [2,[7][8][9][10][11]. Comparisons of different classification modelling techniques have been conducted, but there is no consensus in the literature on which model performs best [7,10,12,13]. Some studies attempted to address this issue by combining multiple modelling algorithms (ensemble modelling) to derive accurate spatial predictions of seafloor sediment [12]. The general idea behind ensemble modelling is to simulate more than one set of initial conditions using different modelling techniques and to derive a general prediction from all (or a part) of them. [14][15][16]. Ensemble modelling avoids the selection of one single 'best' mod-el, and thus, eliminates or reduces model selection bias [16]. In fact, the ensemble modelling approach has already been applied in the marine environment to map seabed sediments [12,13], submarine geomorphology [17], and benthic habitats [18][19][20]. However, for automated seafloor sediment classification, it has been found that ensemble modelling does not yield significantly different results as compared to using a single model [12,13]. Although, in these studies, ensemble modelling was not applied in a class-specific approach (i.e., different sediment classes were modelled in one row).
In addition to ensemble modelling, ensemble mapping has been suggested as another sediment mapping approach to alleviate the limitations of predicting sediment classes [21]. In ensemble mapping, predictions for each sediment class were generated using single or multiple classification techniques, and then combined the results into a single map by aggregating the modal classes. This method has been utilized to develop seafloor sediment distribution maps as an alternative to the typical thematic mapping (i.e., predicting multiple classes in one row) [2,21]. However, in these studies each sediment class was predicted using only a single model and not by ensemble modelling.
Most of the seafloor mapping studies that used classification models applied the algorithms to data with appropriate number of ground-truth samples [2,6,8,21,22], which raises the question of their applicability to studies with a smaller number of data ( e.g., <50). Especially for wide-scale hydroacoustic seafloor mapping, time and budget for comprehensive ground-truth sampling is scarce. The need for accurate seafloor sediment maps is especially important to monitor areas with heterogenous and dynamic seafloor, where changes in sediment distribution can alter the behavior and distribution of benthic species [23][24][25][26][27][28][29][30][31].
In this study, we propose an approach for addressing the limitation of small ground-truth data for automated seafloor sediment classification using hydroacoustic data, through ensemble modelling and ensemble mapping. Our main objective is to generate seafloor sediment distribution maps of selected sites in the Sylt Outer Reef (German North Sea), and to examine spatiotemporal lateral shifts in sediment distribution. The selected sites are embedded within a large continuous hydroacoustic dataset, but only a limited amount of ground-truth data exist locally. We assessed the applicability of our approach to different spatial scales, study areas, and datasets. For this purpose, we (1) identify the important variables to predict different sediment class, (2) predict each sediment class using ensemble modelling, (3) collate all class-specific predictions into one map through ensemble mapping, and (4) locate and evaluate the changes in the predicted seafloor sediment distribution maps.

Study Site
We selected two relatively well-investigated areas within the Special Area of Conservation Sylt Outer Reef (German North Sea). These areas, referred here as H3 and H5, are subsets representing the typical seafloor structure of the western Sylt Outer Reef and will be used to test the performance of our modelling approach (Figure 1). The areas have been the subject of the national seafloor mapping program SedAWZ, which is coordinated by the Federal Maritime Hydrographic Agency (BSH) [32,33]. Semi-and fully-automated procedures for the detection of stones have been tested in area H3 [34] and sediment dynamics have been studied in both areas [35,36].
The North Sea is a relatively shallow water body, with maximum depth of about 60 meters. The German Bight, where the study areas are located, represents the south-eastern part of the North Sea. Typical depth-averaged currents in the shallow part of the German Bight (depth<20m) are directed along the coast, in a counter-clockwise direction, driven by tidal residual circulation enhanced by westerly and southwesterly winds (e.g., [37,38]).
Tidal dynamics, wave actions, wind-driven currents, and mixing determine seabed sediment dynamics. The geomorphology and surface sediments of the Sylt Outer Reef is shaped by several glacial advances and retreats during the Pleistocene. Surface sediments consist of heterogeneously distributed coarse-grained lag deposits, which are mostly composed of siliciclastic material (reworked moraine deposits). The matrix grain-size vary from coarse sand to gravel, which can also be mixed with pebble-to boulder-sized particles. The coarse sediments are partly covered by Holocene marine fine-to medium-grained sands [39]. The subsurface structure in the Sylt Outer Reef, based on parametric sediment echosounder, revealed that finer sediments covered the coarser sediments and moraine deposits underneath [40].
Study area H3 is approximately 4.7 km 2 characterized by one large elongated bedform feature oriented towards northwest-southeast direction. The bedform is visible as high backscatter area (dark pixels) and surrounded by low backscatter areas (Figure 1, lower left box). Water depth ranges from 28 to 36 m.
H5 is a small area with a size of 1.8 km 2 with two parallel bedform features with a north-south orientation (Figure1, upper left box). Backscatter intensity is high in the southwest, but gradually decreases towards the northeast. The depth in H5 is slightly deeper than H3, with water depth ranging between 36 and 42 m. High backscatter areas were observed in deeper areas, while low backscatter regions dominate at shallow water depths [36].

Data Acquision and Processing
All data presented in this study was obtained during surveys performed between 2016 and 2018 in the two focus areas (Table 1). Focus area H3 was surveyed in 2016 and 2018 (17 months apart), while H5 was surveyed in 2017 and 2018 (4 months apart). Surveys were conducted with the German research vessel "Heincke" (Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research, Germany).
Seafloor backscatter data was collected with an Edgetech 4200 MP side-scan sonar (SSS) at a frequency of 300 kHz and with a range of 75 m (H3) and 150 m (H5). The SSS was towed at a speed of 5 kn behind the vessel and was kept at 5-10m above the seafloor. Surveys were designed to achieve a 10% overlap and 0.25 m along-track resolution of the SSS mosaics. Multibeam echosounder (MBES) data were simultaneously collected with a hull-mounted Kongsberg EM710 system. The very shallow mode with frequency range of 65-106 kHz and pulse length of 0.2 msec, which is ideal for <100 m depth range, was used in our surveys. The maximum reliable swath width was 90°. However, the survey track distances were too wide to achieve a swath overlap of the MBES data. Side-scan data were processed using QPS Fledermaus Geocoder Toolbox v.7.8.8 software (see [36] for details on the procedure). The process applied backscatter, beam pattern, and angle-varying gain corrections; and improved the spatial accuracy of the SSS mosaics (spatial accuracy: ±0.25 m). The SSS mosaics were gridded to 0.25 m resolution with dB values cropped to ±3σ and logarithmically mapped to 8-bit scale. Post-processing of MBES data was conducted in QPS Qimera v2.0.1 software to correct the raw MBES data from tidal effects and reject invalid soundings.
Ground-truth information was collected from both underwater video and grain-size sampling. Underwater videos were obtained using a Kongsberg OE14-366 Color Zoom Camera (Kongsberg Maritime AS, Kongsberg, Norway; horizontal Resolution 460/470 TV lines) and a GOPRO 3+ Black Edition (GoPro, Inc., San Mateo, California; resolution: 1920 x 1440, 47.95 frames per second). The cameras were mounted on a robust metal frame with a laser scale (spacing: 10 cm). The GPS system of the research vessel was connected to the on-board control unit of the camera for geographic referencing. The cameras were deployed underwater as close as possible to the surface for at least five minutes, while the ship was drifting at a speed of less than 1 kn. Videos were initially screened for image quality to omit blurred footage. The remaining videos were then converted into individual images at two-second intervals using the scene video filter of VLC media player (VideoLan project, version 3.2.1.0). Subsequently, photos with a clear image of the seafloor were selected manually and the coordinates were recorded. Sediment samples were collected with a Van Veen grab sampler (HELCOM standard). Sites for sampling were selected based on the SSS mosaic of the study area, which was processed on-board upon acquisition. In the home laboratory, carbonate and organic matter were removed from the sediment using chemical treatment according to the procedures described in [41] and analyzed using a CILASS 1180L laser particle sizer (LPS, range: 0.04-2,500 µm). Particles larger than 2,000 µm were removed by sieving before measurement. Grainsize statistics were calculated in GRADISTAT v8.0© [42].
Overall, a total of 106 ground-truth samples (both sediment and video stills) were obtained at H3, while 76 samples were collected at H5.All samples including the grain size data were categorized according to Folk (1954) and BSH [32] sediment classification as: sand, coarse sediment (gravely sand, sandy gravel, gravel), and lag sediment (sediments of all size classes with gravel and stones). We used a probabilistic approach to predict each sediment class separately and then compared their probability to see which class is more likely to be assigned to the location. The final predictions were combined into an ensemble map, where the maximum prediction score (% of probability) was used to determine the sediment class location.
Here, we utilized the 'BIOMOD2' package within the statistics software R (CRAN) v.4.0.3 [15,43]. BIOMOD2 is the updated object-oriented version of BIOMOD package and has been developed for ecologists to predict species distribution, but it can also be used to model any binomial data (i.e., binary presence-absence object) in function of any explanatory variables [15]. BIOMOD2 has been used to predict macroalgal habitats [44], to map the distribution of medicinal plant species [45], and for ecological niche modelling of basking sharks [46], but it has not been applied to predict seafloor sediments.
Four machine-learning approaches that are commonly used in seafloor mapping were selected from the BIOMOD2 package: classification tree analysis (CTA), artificial neural networks (ANN), random forest (RF), and generalized boosted models (GBM). In CTA, a decision tree is grown by repeatedly splitting the data, then the complex tree is pruned back to the desired size using specific rules to reduce overfitting [47]. GBM uses a forward stage-wise procedure that iteratively fits simple trees to the training data, while gradually increasing focus on poorly modelled observations [16]. RF grows each tree with a randomized subset of predictors and several trees are grown as the predictors aggregated by averaging [14]. Lastly, in ANN, models were run several times and the mean prediction was used or the best fitting model was selected [14]. It uses sets of adaptive weights to link the response to the predictors [16].

Sediment Data
The sediment and video sample data were converted into points and binary format for the model. For example, locations where sand was observed were assigned 1, while areas where there is no sand were assigned 0. In addition, pseudo-absences data was built for each sediment class because most of the models require both presence and absence data and the quantity of true absences from our data was not sufficient. Pseudo-absences are artificial absence data, i.e., places where the species is supposed (but not confirmed) to be absent [48,49]. We used random sampling and repeated it three times with a selection of 200-500 pseudo-absences to prevent sampling bias [16].

Predictor Variables
Geophysical and textural features were extracted from processed MBES and SSS data, and from oceanographic models that were developed for the German Bight. These features were then used to predict the probability of occurrence of each sediment class. A total of 348 predictor variables were generated for this study.
Bathymetry, slope, northing, and easting were derived from our MBES data using the Benthic Terrain Modeler v3.0 Toolbox of ArcGIS 10.7.1 [50]. Spatial data on near-bottom (averaged over 1 m layer above the seabed) tidal residual currents and tide-induced maximum friction velocities were derived from the barotropic multi-layer setup for the south-eastern North Sea. FESOM-C coastal ocean model was used as a numerical tool. It was validated through a series of experiments with a particular focus on the North Sea area and its tidal dynamics in particular [51][52][53].
Textural features of the SSS mosaics were extracted using the grey-level co-occurrence matrix package in R (GLCM v.1.6.5.) to identify the spatial characteristics of the mosaics. GLCM evaluates the co-occurrence of pixel grey level values at given offsets to enhance image classification [54,55]. We applied grey levels of 32, window size of 9, and inter-pixel distance of 5 and 10. Feature calculation was conducted on different orientations: 0º, 45º, 90º, 135º, and the mean of all direction. A total of 80 statistical features were extracted for each side-scan mosaic. The list of the calculated GLCM statistics and geophysical features used in this study can be found in Supplementary Table 1.

Feature Selection
The combination of few presence data and many predictor variables can easily cause model overfitting [56]. In addition, correlation between two or more predictor variables in a statistical model can induce multi-collinearity [57]. Therefore, since we are working with a small number of occurrences, the selection of predictor variables is an important step in our approach. A general rule of thumb is the 1:10 ratio of presence data and predictors, which means to include only two predictors for twenty presence data [56,58].
Predictor variables were selected in an iterative process. Initially, the variance inflation factor (VIF) was used to detect collinearity between the predictors and to remove redundant variables. The VIF is based on the square of the multiple correlation coefficient (R 2 ) resulting from regressing the predictor variable against all other predictor variables [57]. A VIF greater than 10 indicates a collinearity problem [59]. Here, VIF analysis was performed using the 'vifstep' function in the R package 'usdm' [60]. All predictor variables were analysed in a stepwise procedure, whereas variables with VIF of >5 were removed. Further feature selection was conducted during model calibration. Variables with a low mean variable importance value (≤0.1) were excluded from the modelling.
We have set a maximum of five predictors to model each sediment class to avoid model overfitting and multicollinearity.

Model Calibration and Validation
The parameters and complexity of each model were modified depending on the sediment class, number of predictor variables, and presence data. Models were calibrated with 70% of the data and validated with the remaining 30%. This procedure was repeated 20 times. During calibration, model performance was assessed by the threshold-independent receiver operator characteristics (ROC), threshold-dependent true skill statistics (TSS), and Cohen's Kappa.
TSS ranges from -1 to +1 where +1 indicates perfect agreement and values of zero or less indicate a performance no better than random. This is different from Kappa because TSS is not affected by the size of validation set and prevalence. TSS score of 0.7 or higher indicates good or exceptionally good performance of the model [47]. ROC assess the relationship between the false positive fraction (specificity) and the sensitivity for a range of thresholds. Kappa indicates the best possible agreement [47].
Single models were repeatedly calibrated until the optimal TSS value (≥0.7) was achieved. As a result of multiple model parameters, a total of 240 models runs were conducted for each sediment class (4 algorithms x 20 cross-validation runs x 3 pseudo-absences sampling). A total of 960 single models were created for this study (Table 2). Subsequently, only models with TSS value of ≥0.7 were averaged to develop the ensemble model of each sediment class by committee averaging (Table 2). Committee average gives both prediction and measure of uncertainty, where each model decide for the sediment class being either present or absent, and then the sum was divided by the number of models. For example, when the prediction is around 0.5 it means that half of the models predict 1 and the other half predict 0 [15,16]. The measure of uncertainty of the predictions (coefficient of variation), and mean probability of the predictions were also calculated. A total of eight (8) ensemble models were generated for the two study areas. The R script used to perform ensemble modelling can be found in Supplementary Material 1.

Ensemble Mapping and Map Accuracy Assessment
The committee averaged predictions for each sediment class was aggregated to create an ensemble map. The procedure was conducted using the raster analysis tools of ESRI ArcGIS 10.7.1 and is explained in Appendix A. In summary, we used the maximum cell values of each sediment class as the parameter to combine them into one map. The output is an ensemble map of the predictions where the most probable class was assigned to the location.
Accuracy of the ensemble maps was calculated using the 'confusionMatrix' function of the 'caret' package in R [61]. A separate testing data was used to extract the predicted values in the ensemble maps in the location of the testing data, then a confusion table was constructed. Statistics such as overall accuracy and Kappa coefficient were calculated. The overall accuracy indicates the percentage of areas that were correctly predicted. Kappa coefficient measures the agreement between the observed and predicted data [62]. Whereas values of >0.75 is considered as good agreement between the data, while values of <0.4 represent poor agreement beyond chance [63].

Detecting Changes in Seafloor Sediment Maps
To determine if there are changes in the seafloor sediment maps of different years, we applied the change detection method for habitat classification maps of Rattray et al. [64]. The method uses transition matrix which is a conventional method assessment of land cover changes [65,66]. In recent years, it has been adapted to detect changes in benthic habitat maps and seafloor sediments [10,64,67]. The 'from-to' transition of the sediment classes, persistence, and the amount of gain/loss were calculated for H3 and H5. Gain refers to the increase in area coverage, while loss refers to the decrease. Persistence indicates no change in the sediment class [64,65].

Sediment Classes Based on Field Survey
The backscatter properties of the sand class in the SSS mosaics of H3 and H5 are different. Sand was reflected as medium-high backscatter in H5 instead of low backscatter like in H3 (Figure 3 and 5). Hence, we differentiate the two sand classes based on their backscatter properties-sand low-backscatter (SLBS) and sand high-backscatter (SHBS).
According to grab samples and underwater videos, lag sediments (LagSed) and sand-1 (SLBS) were the sediment classes in H3 (Figure 2 and 3). Lag sediments were observed in high-backscatter areas (dark pixels) and as clusters and patches of gravel, cobbles, and boulders with attached biotic species (Figure 2). SLBS class areas were observed in low backscatter zones (lighter pixels) and were seen as small oscillation ripples (~ 10 cm wavelength) in the underwater videos ( Figure 2).
We have identified two sediment classes from our survey data in H5, namely coarse sediment (CSed) and sand-2 (SHBS) (Figure 2 and 4). CSed was observed in high-backscatter areas in the SSS mosaic ( Figure 5). In the underwater images, CSed class are characterized by bedforms with coarse sediments and shell fragments on the lee slope. On the other hand, the SHBS class are reflected as medium-high backscatter in the SSS mosaics (Figure 4). When viewed at 25-cm resolution of SSS data, the SHBS area shows presence of ripples with approximately >20 cm of wavelength. This was subsequently verified in the underwater images as bedforms with shell fragments and coarser sediments on the troughs (Figure 2).

Ensemble Model Performance
Of the 240 individual models that were created, only models with TSS value of >0.70 were included in the final ensemble model, which was used to predict the sediment classes ( Table 2). The predictive power and accuracy of the ensemble models are excellent with high statistical reliability (TSS = >0.8/ ROC= >0.9) ( Table 3). The agreement between the response and explanatory variables was also good (Kappa= 0.4-0.9).
Of the four algorithms, GBM and RF performed the best in predicting coarse sediments (LagSed and CSed). On the other hand, ANN and GBM predicted sand very well. CTA had the poorest performance in predicting sediment classes with small sample size and few predictor variables. We observed that using only 2-3 model, instead of four, decreased the predictive accuracy of the ensemble model.
The importance of the predictor variables in the predicting performance of the algorithms are listed in Supplementary Table 2. Briefly, GLCM variables such as correlation, second moment, homogeneity and contrast highly influence the predictive performance of the model. Side-scan mosaic, slope, and easting are also important predictor variables. Notably, we found that SSS mosaic and slope can predict sand areas very well, while GLCM features can discriminate LagSed and CSed areas.
According to the 2018 dataset, the area of LagSed had slightly increased in 2018 from 41% to 49% of the total area ( Figure 5 and Table 5). Sand dominated 51% of the area around the bedform and some small patches within it ( Figure 5). The accuracy is reliable except inside the sorted bedform area, where the predictions seem to be artefacts from the side-scan mosaics that were used as input data in the models. Overlaying the class-specific predictions into one map based on the percentage of their probability of occurrence, have resulted in statistically reliable seafloor sediment map with overall accuracy (OA) of 100% (Table 4, Figure 6). The agreement between the observed and predicted classes are also excellent (Kappa = 1.00). Both maps were able to classify the high backscatter bedform as LagSed and its surrounding area as sand ( Figure  6).

Changes in Seafloor Sediment Distribution Maps of H3
The transition analysis of the seafloor sediment maps in H3 showed that most of the changes within the 17 months have happened around the boundary of lag sediment and sand-1 (SLBS) class area ( Figure 6). Lag sediment class was more affected by the transition, than the surrounding sand areas that were mostly unchanged (persistence = 2.03 km 2 of 4.71km 2 ) ( Table 5).
Along the boundary of the two classes, we noticed that most sand class shifted into LagSed class, particularly in the northeast and southwest portion (Figure 7). Moreover, most of the sand-to-LagSed transitions occurred within the bedform area. This transition has caused 16.3% increase in the area coverage of LagSed in 2018 and resulted to 8% loss of the sand class area in the map (Table 5). However, this loss for sand class is lower than its 43% area coverage which remained as unchanged for two years.
Overall, 2.26 km 2 (48%) of the map have changed in 2018 where LagSed is the most affected class.

Predicted Sediment Distribution in 2017 and 2018
The two parallel bedform features in H5 were predicted as CSed class, while the surrounding areas were classified as sand-2 (SHBS). Some areas outside the features were also predicted as CSed especially in the 2018 map, but the accuracy of this prediction is low (Figure 7).
In 2017, the two features have been predicted as CSed with good accuracy (TSS = 0.82, Table 3). However, some areas in the northeast of the bedforms were not classified (Figure 7). Around 63% of the total area of H5 (1.81 km 2 ) was predicted as SHBS and only 37% was predicted to be CSed. The prediction of SHBS in 2017 is particularly good (TSS=0.90) (Table 3, Figure 7).
In 2018, some areas in the northeastern portion of H5 were predicted as CSed (TSS = 0.86, Table 3) but with higher uncertainty (Figure 7). The prediction has also more visible noise or artefacts compared to the 2017 modelled data. The prediction of SHBS in 2018 has lower probability than in 2017 (TSS=0.83) (Figure 7). In both maps, CSed are well-defined in the southwest but seem to fade towards the northeast.

Seafloor Sediment Distribution Map of H5
The ensemble maps of H5 have both received a comparable and good statistical score ( Figure 8 and Table 4). Despite the artefacts in the original data (Figure 4), the 2017 map still obtained 94% overall accuracy and Kappa of 0.86 (Table 4, Figure 8). The 2018 ensemble map has lower but still good accuracy of 86% and Kappa score of 0.75, which indicate that the observed data (ground-truth) were classified correctly (Table 4, Figure  8).
Although, interpretation of the map must be done with care because of the artefacts in the raw data. The class-specific mean probability and uncertainty maps can be used to guide the interpretation if map accuracy is the main concern. Nevertheless, the maps of H5 can still provide a good overview of the probable location of CSed and SHBS sediments in the area (Figure 8).

Changes in Seafloor Sediment Distribution Maps of H5
By 2018, 35% (0.63 km 2 ) of the 2017 sediment distribution map have changed within 4 months. These changes were observed along the boundary of the classes and in the north-northwest portion of H5 (Figure 8). However, both sediment classes have gained and loss almost the same amount (Table 5). For example, CSed gained 8.72% of area coverage in 2018 from SHBS but also loss 8.68% of its area to SHBS in the same year.
We observed that the CSed-to-Sand class transition mainly occurred in the northnortheast facing side of the bedforms, and CSed class have gained more area in the northwest (Figure 9).

Predicting Seafloor Sediments with Limited Ground-Truth Samples
The accuracy of the predicted seafloor sediments in a heterogenous area, like the Sylt Outer Reef, can be influenced by several factors that may negatively influence results [21]. These factors include (1) an inadequacy of the selected classification system, (2) a low discriminatory power of the predictors, or (3) a mismatch between the response (i.e., grab sample) and predictor variables (e.g., backscatter mosaic).In addition, an unequal number of samples between sediment classes may result in under-or over-predictions in the modelling results [46] .Discrepancies between different techniques can be very large and some models may be more sensitive to sampling bias, which might reduce model transferability and selection [15,56,68]. These issues can be alleviated by creating an ensemble map that aggregates individual predictions into one map and by adopting a quantitative approach that models the spatial distribution of grain-size classes [2,21]. Moreover, ensemble modelling can compensate for unwanted inter-model variability and model selection bias, by aggregating the results of multiple models into one general prediction [15,16].
In this study, we tested the capacity of ensemble modelling as a reliable and reproducible approach for seafloor sediment mapping and monitoring. Unlike the usual thematic mapping, we conducted class-specific predictions using BIOMOD2 to classify areas with limited or lacking ground-truth data. By modelling each sediment class separately, each class was predicted without bias to the class with more ground-truth data.
The probability of occurrence of different sediment classes was modelled for two different locations and different temporal scale. In this regard, we first assumed that we would produce highly variable results, but we achieved comparable outputs. For example, GBM and RF models were able to predict coarse sediments (i.e., LagSed and CSed) in both H3 and H5. Moreover, there have been similarities in the important variables that predicts specific sediment class (Supplementary Table 1). In this regard, we have tested and proven that our method can be applied to different study areas, different spatial scale (larger or smaller scale), and for repeated surveys.
However, the most important factors that influenced our results are the quality of input data. Environmental predictor variables influence the probability of occurrence [16]. As we have seen, the nadir artefacts from the SSS mosaics were reflected in the probability of occurrence maps ( Figure 5 and 7). This imply that the quality of the data is important when performing our methodological approach.
In addition, we observed that the spatial distribution of the ground-truth samples highly influenced the prediction. This issue was addressed by generating three sets of randomly selected pseudo-absences, which substantially improved the model predictions. In species distribution modelling, pseudo-absences are meant to be compared with the presence data and help differentiate the conditions under which species can occur or not. Therefore, selecting the appropriate number of pseudo-absences may optimize model performance [49].
In this regard, survey design is important before collecting field data to ensure that all samples for each sediment class is well-distributed (spatially). The outputs of this study can be utilized for this purpose. For example, the probability of occurrence and uncertainty maps can guide scientists or seafloor mappers to decide where to collect a specific sediment class.

Seafloor Sediment Distribution in the Sylt Outer Reef from 2016 to 2018
Sediment distribution is an important parameter for the understanding of benthic habitats, for the management of maritime economic activities, and the monitoring of impacts of human activities on the seafloor [31,69,70]. We predicted and mapped the possible seafloor sediment types for two areas in the Sylt Outer Reef Special Area of Conservation.
In H3, the bedform feature was predicted to be composed of lag sediments and surrounded by sand. Among the two sediment classes, the LagSed class have been more affected by sediment shifts that occurred within the bedform area. We observed that more LagSed class have appeared especially nearby the boundary of the bedform, while more sand class were seen inside the bedform after two years. Boundaries of the bedforms were observed to be the most vulnerable to sediment shifts [35,36,[71][72][73]. On the other hand, the surrounding sandy areas seem to be stable over the period of observation.
The sediment class in H5 was more difficult to predict than in H3, because of the mismatch of the ground-truth data with the predictor variables (acoustic data). For example, areas that were interpreted to be sand based on grainsize analysis appeared as areas with medium-high backscatter strength (dark pixels), instead of showing low backscatter strength (light pixels) like the sand class in H3 (Figure 3). The stronger backscatter response of the sand class in H5 may be attributed to the high amount of shell fragments on the surface and by surface roughness caused by the ripples, as observed in the underwater videos.
Like H3, shifts in sediment class occurred along the boundaries of the two bedforms in H5. Although, the quantity of transition between the two classes are almost the same, it does not imply that changes did not occur, but rather signify that the intensity of changes are low. Shifts from CSed to SHBS class occurred at the northeast facing side of the bedform features, while Sand-to-CSed transitions were observed in the north-northwest area of H5.
In summary, sediment shifts were observed along the boundaries of the bedform features. These findings are in accordance with our previous study [36]. In our previous study we monitored the boundary lines to detect sediment shifts, but here we looked at the changes in the modelled sediment distribution maps. The results of both studies are comparable i.e., the sediment shifts were mainly observed in the northeast and southwest direction of the bedforms. The spatial sediment transitions that we detected in this study may be attributed to the fluctuations of the sandy materials along the boundary. The deposition or erosion of mobile sand fractions covers or uncovers the coarser sediments underneath , which is largely driven by tidal currents and storm events [35,36,73]. The mobilization of sandy materials along the boundary caused the oscillation of the boundaries, instead of moving the boundaries in one direction [36].

Sediment transitions and their implications
Monitoring changes in sediment distribution maps is especially important in areas with heterogenous seafloor cover, where tidal currents, wave actions, and wind-driven flows determine the seabed dynamics and may induce drastic changes in the sediment distribution pattern [21,67,74]. Moreover, sediment transition can be used to predict species responses to habitat change [23,25,26,75]. Changes in sediment composition along sediment gradients/boundaries can alter the behavior and distribution of benthic species. For example, the loss of coarse sediments forced benthic invertebrate communities to leave their habitat and move to fine sediments, which consequently changed the community compositions (taxa presence and absence) [27]. In addition, changes in detrital resources (i.e., coarse sediments), which serves as refuge in a soft sediment system, causes decline in macroinvertebrate species [28]. Therefore, monitoring of changes in seafloor sediments is vital for the conservation of benthic biodiversity and detrital resources, especially for important marine protected areas such as the Sylt Outer Reef.
Accurate prediction of sediment class is necessary to be able to detect the actual seabed change in a highly complex area [21,67,74]. In this regard, sediment distribution maps need to be updated to develop and implement appropriate strategies to manage maritime activities and marine conservation areas. However, the question is how frequent we must update these maps?
In this study, the sediment transitions imply that sediment dynamics in the western part of the Sylt Outer Reef are highly active and can cause conceivable changes in the sediment distribution in a short period of time. For example, approximately 48% of the sediment distribution map of H3 have changed after two years, while 35% of the maps in H5 experienced changes in just four months.
Therefore, in areas of the Sylt Outer Reef with seafloor features like in H3 and H5, seafloor monitoring can be conducted at approximately no more than 5 years, because by then the sediment distribution may have changed substantially at the boundaries of the features. This approximation is based on our findings for the two sites in the Sylt Outer Reef, where we observed that this survey interval is necessary to provide reliable recommendations for monitoring purposes. Moreover, to find out whether the observed changes have happened constantly between the studied time periods or because of an extreme event (e.g., severe storms), additional surveys ideally before and after a storm are necessary. The surveys can verify the actual cause of these changes and can evaluate the impact of storms to the sediment distribution pattern.
Seafloor dynamics are likely to be as variable as tidal currents or ocean climate patterns, and thus a regular interval (i.e., 5 years) may miss important dynamics. But monitoring a large area can be time consuming and costly. In this regard, repeated monitoring of subsets of areas, like this study, can be an alternative to evaluate seafloor changes, until it becomes evident that a new "full" survey is necessary. Moreover, since coarse sediments (i.e., LagSed and CSed) in the German Bight are important habitats for epibenthic assemblages, and sediment transition can have adverse effect in their ecosystem, mapping these areas is important for habitat monitoring and conservation efforts [34,75].

Outlook
Information on sediment distribution was found to be a very good predictor of benthic species densities and distribution [30,44,76,77]. Hence, our modelled prediction of sediment distribution can be used for marine conservation studies as input to species distribution modelling [23,44,77] and for monitoring of the impacts of human activities [24,31,69,78].
Moreover, the seafloor sediment maps that were generated in this study can provide information to future seafloor mapping efforts. The maps can be used by seafloor mappers in planning their survey and to design a systematic ground-truth sampling approach, which may improve the accuracy of the seafloor sediment maps in the future.
Furthermore, the methodological approach that we presented are being used to model species distribution and habitat suitability. Hence, the methods in this study can be adapted not only by geologists but also by biologists, ecologists, and environmental scientists.

Conclusions
Here, we presented that mapping sediment class can be done automatically using few ground-truth samples combined with acoustic data. This was accomplished by conducting class-specific predictions using the combinations of four classification model algorithms. The ensemble of these models has improved the predictive capacity of the models and produced reliable seafloor sediment maps that can be used for seafloor monitoring. We demonstrated that by aggregating bits of information, we can generate reliable information on seafloor integrity. Moreover, the methodological approach and results that we presented can be used as a tool for seafloor monitoring and benthic habitat mapping, and provides information on the seafloor sediment dynamics.   Steps of Ensemble Mapping The procedure was conducted using the raster analysis tools of ESRI ArcGIS 10.7.
Steps to ensemble each class-specific prediction into a single map are as follows: 1. The raster for each sediment class was converted into integer format to allow raster analysis. 2. Majority filter using the closest eight cells as a filter was run to join the small cells with the majority cells to reduce the noise in the raster. 3. Using the cell statistics function of ArcGIS, the maximum value (highest probability %) of the input rasters (e.g., raster for all sediment class in H3 in 2016) was computed. The output is the overlaid maximum scores of the sediment classes in one raster map (OverallMax). 4. After generating the OverallMax, each original raster (i.e., majority filtered) was subtracted from the OverallMax raster where 0 would be the cells with the max value in each. Two new rasters were created and called here as ClassMax1 and ClassMax2. 5. For each of the ClassMax rasters, set the 0 values to 1 for ClassMax1, and 2 for ClassMax2 using the Con function in raster calculator (e.g., Con (Class-Max1==0,1,0)). The result would be two new raster files with reclassified cell values. ClassCon 1 with the cells of maximum scores assigned as 1, and ClassCon2 with maximum scores assigned as 2. For example, the max scores of LagSed were assigned 1 and max scores of sand was assigned 2. 6.
Finally, the two ClassCon rasters were mosaicked to a new raster, where the cell value of the overlapping areas are the maximum value of the overlapping cells. The output is the ensemble map of the predictions of the two sediment classes, where the most probable class was assigned to the location. .