ARTICLE | doi:10.20944/preprints201810.0107.v1
Subject: Earth Sciences, Geophysics
Keywords: multibeam echosounder; backscatter; multi-frequency; machine-learning
Online: 5 October 2018 (16:09:53 CEST)
We propose a probabilistic graphical model for discriminative substrate characterization, to support geological and biological habitat mapping in aquatic environments. The model, called a fully connected conditional random field (CRF), is demonstrated using multispectral and monospectral acoustic backscatter from heterogeneous seafloors in Patricia Bay, British Columbia, and Bedford Basin, Nova Scotia. Unlike previously proposed discriminative machine learning algorithms, the CRF model considers both the relative backscatter magnitudes of different substrates and their relative proximities. The model therefore combines the statistical flexibility of a machine learning algorithm with an inherently spatial treatment of the substrate. The CRF model predicts substrates such that nearby locations with similar backscattering characteristics are likely to be in the same substrate class. The degree of proximity and allowable backscatter similarity are controlled by parameters that are learned from the data. CRF model results were evaluated against those of a popular generative model, the Gaussian mixture model (GMM), which does not include spatial dependencies, only the covariance of substrate backscattering response across frequencies. Both models are used in conjunction with sparse bed observations/samples in a supervised classification. A detailed accuracy assessment, including a leave-one-out cross-validation analysis, was performed using both models. Using multispectral backscatter, the GMM trained on 50% of the bed observations resulted in average accuracies of 75% and 89% in Patricia Bay and Bedford Basin, respectively. The same metrics for the CRF model were 78% and 95%. Further, the CRF model resulted in a 91% mean cross-validation accuracy across four substrate classes at Patricia Bay, and a 99.5% mean accuracy across three substrate classes at Bedford Basin, which suggests that the CRF model generalizes extremely well to new data.
This analysis also showed that the CRF model was much less sensitive to the specific number and locations of bed observations than the generative model, owing to its ability to incorporate spatial autocorrelation in substrates. The CRF approach may therefore prove to be a powerful 'spatially aware' alternative to other discriminative classifiers.
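
The idea that nearby locations with similar backscatter should share a substrate class can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a Gaussian pairwise affinity over position and backscatter, a Potts-style label compatibility, and a synchronous mean-field update, all of which are common choices for fully connected CRFs. The names `theta_pos`, `theta_bs`, and `weight` are hypothetical, and their values are set by hand here rather than learned from data. The six survey locations and their backscatter values are invented for illustration only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pairwise_kernel(xy, backscatter, theta_pos, theta_bs):
    """Affinity between every pair of locations (assumed Gaussian form).

    The affinity is large only when two locations are both close together
    (scale theta_pos) and similar in backscatter (scale theta_bs).
    """
    d_pos = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
    d_bs = ((backscatter[:, None, :] - backscatter[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-d_pos / (2 * theta_pos**2) - d_bs / (2 * theta_bs**2))
    np.fill_diagonal(k, 0.0)  # a location sends no message to itself
    return k

def mean_field(unary, kernel, weight=1.0, n_iter=10):
    """Approximate CRF inference with a Potts label compatibility.

    unary: (n_locations, n_classes) negative log-probabilities.
    Returns the approximate per-location class probabilities.
    """
    q = softmax(-unary)
    for _ in range(n_iter):
        support = kernel @ q  # class support gathered from all other locations
        q = softmax(-unary + weight * support)
    return q

# Invented example: two groups of three locations along a transect.
xy = np.array([[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]], dtype=float)
backscatter = np.array([[-30.0], [-31.0], [-30.0], [-15.0], [-16.0], [-15.0]])  # dB

# Only locations 0 and 3 have (hypothetical) bed observations; the rest are uninformed.
probs = np.full((6, 2), 0.5)
probs[0] = [0.9, 0.1]
probs[3] = [0.1, 0.9]
unary = -np.log(probs)

kernel = pairwise_kernel(xy, backscatter, theta_pos=2.0, theta_bs=3.0)
q = mean_field(unary, kernel)
labels = q.argmax(axis=1)
```

In this toy setup the two sparse observations propagate to the unlabelled locations in their own group, because the affinity between the groups is effectively zero. A model without the pairwise term would leave the unlabelled locations at their uninformative unary probabilities.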