Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

On the Generalization Ability of Data-Driven Models in the Problem of Total Cloud Cover Retrieval

Version 1 : Received: 3 November 2020 / Approved: 4 November 2020 / Online: 4 November 2020 (10:55:10 CET)

How to cite: Krinitskiy, M.; Aleksandrova, M.; Verezemskaya, P.; Gulev, S.; Sinitsyn, A.; Kovaleva, N.; Gavrikov, A. On the Generalization Ability of Data-Driven Models in the Problem of Total Cloud Cover Retrieval. Preprints 2020, 2020110192 (doi: 10.20944/preprints202011.0192.v1).

Abstract

Total cloud cover (TCC) retrieval from ground-based optical imagery is a problem that a few generations of researchers have tackled. The number of human-designed algorithms for TCC estimation grows every year. However, there has been little progress in quality, mostly because of the lack of a systematic approach to the design of the algorithms, to the assessment of their generalization ability, and to the assessment of TCC retrieval quality. In this study, we discuss the optimization nature of data-driven schemes for TCC retrieval. To compare the algorithms, we propose a framework for assessing their characteristics. We present several new algorithms based on deep learning techniques: a model for outlier filtering and a few models for TCC retrieval from all-sky imagery. For training and assessing the data-driven algorithms of this study, we present the Dataset of All-Sky Imagery over the Ocean (DASIO), which contains over one million all-sky optical images of the visible sky dome taken in various regions of the World Ocean. The research campaigns that contributed to the DASIO collection took place in the Atlantic Ocean, the Indian Ocean, the Red and Mediterranean Seas, and the Arctic Ocean. The optical imagery collected during these missions is accompanied by standard meteorological observations of cloudiness characteristics made by experienced observers. We assess the generalization ability of the presented models in several scenarios that differ in the regions selected for the training and validation subsets. As a result, we demonstrate that our models based on convolutional neural networks deliver superior quality compared to all previously published schemes. As a key result, we demonstrate a considerable drop in the ability to generalize from the training data under a strong covariate shift between the training and validation subsets of imagery, which may occur with region-aware subsampling.
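
The difference between a region-agnostic and a region-aware choice of training and validation subsets can be illustrated with a minimal Python sketch. It is not the authors' procedure: the record layout (`image`, `region`, `tcc`), the file names, the placeholder labels, and both function names are hypothetical and serve only to show the idea of holding out a whole region so that validation imagery comes from conditions absent in training.

```python
import random


def random_split(records, val_fraction=0.2, seed=0):
    """Region-agnostic split: shuffle all records and hold out a fraction."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def region_aware_split(records, val_regions):
    """Region-aware split: all records from held-out regions go to validation."""
    held_out = set(val_regions)
    train = [r for r in records if r["region"] not in held_out]
    val = [r for r in records if r["region"] in held_out]
    return train, val


# Hypothetical toy records; the real dataset layout is not described in the abstract.
regions = ["Atlantic", "Indian", "Arctic", "Mediterranean"]
records = [
    {"image": f"img_{i:04d}.jpg", "region": regions[i % 4], "tcc": i % 9}
    for i in range(100)
]

train_rand, val_rand = random_split(records)
train_reg, val_reg = region_aware_split(records, ["Arctic"])

print(len(train_rand), len(val_rand))
print(len(train_reg), len(val_reg))
```

In the random split, validation images are drawn from the same regions as the training images; in the region-aware split, the validation subset contains only the held-out region, which is the kind of setting in which a covariate shift between the subsets may appear.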

Subject Areas

Total cloud cover; all-sky camera; algorithms assessment; neural networks; machine learning; data-driven approach
