Version 1
Received: 29 August 2023 / Approved: 30 August 2023 / Online: 31 August 2023 (02:55:17 CEST)
How to cite:
Li, Z.; Zhang, T. Neither Zero-Catch Data nor Model Structure. Noisy Labels Are the Key Hindrance of Improving Fisheries Forecasting Performance. Preprints 2023, 2023082081. https://doi.org/10.20944/preprints202308.2081.v1
APA Style
Li, Z., & Zhang, T. (2023). Neither Zero-Catch Data nor Model Structure. Noisy Labels Are the Key Hindrance of Improving Fisheries Forecasting Performance. Preprints. https://doi.org/10.20944/preprints202308.2081.v1
Chicago/Turabian Style
Li, Z. and Tianjiao Zhang. 2023. "Neither Zero-Catch Data nor Model Structure. Noisy Labels Are the Key Hindrance of Improving Fisheries Forecasting Performance." Preprints. https://doi.org/10.20944/preprints202308.2081.v1
Abstract
The zero-catch problem is a key issue in CPUE (Catch Per Unit Effort) standardization. Previous studies have treated all zero-catch data uniformly, which discards some correctly labeled samples. Meanwhile, for main catch species with few zero-catch samples, fisheries forecasting performance remains low even though forecasting model structures are constantly updated, because we cannot know whether the samples were recorded correctly. In this paper, we propose a method based on confident learning theory to detect anomalous samples in the datasets, and we unify the treatment of zero-catch and non-zero samples as label noise within an overarching framework of learning with noisy labels. This reveals the heterogeneity among zero-catch samples (as well as among non-zero samples) and the homogeneity between zero-catch and non-zero samples. Using data on three tuna species in the tropical Atlantic Ocean from 2016 to 2019, with a spatial resolution of 0.5° × 0.5° and a daily temporal resolution, the performance of all three classical machine learning models (Random Forest, Support Vector Machine, and XGBoost) improves significantly over the respective baselines. This confirms that the proposed method is a self-adaptive and effective way to detect and repair anomalous samples in fishery datasets.
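The abstract does not spell out the detection procedure, so the following is only a minimal sketch of the general confident-learning idea, not the authors' pipeline. It assumes each sample has a recorded class label (here, hypothetically, 0 = zero catch and 1 = non-zero catch) and a vector of predicted class probabilities, ideally obtained out-of-fold from any classifier. A sample is flagged as suspect when a class other than its recorded label clears that class's confidence threshold. The function name `find_suspect_labels` and the toy data are illustrative assumptions.

```python
def find_suspect_labels(labels, probs):
    """Flag samples whose recorded label disagrees with a confidently predicted class.

    labels: list of int class ids, possibly noisy (assumed: 0 = zero catch, 1 = non-zero catch)
    probs:  list of per-sample class-probability lists (assumed to be out-of-fold predictions)
    """
    n_classes = len(probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the samples whose recorded label is k.
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)]

    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose predicted probability reaches that class's threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # no class is confidently predicted; leave the sample alone
        best = max(confident, key=lambda k: p[k])
        if best != y:
            suspects.append(i)  # recorded label conflicts with the confident class
    return thresholds, suspects


# Toy data, made up for illustration only.
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],  # recorded as zero catch, but predicted non-zero
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],  # recorded as non-zero catch, but predicted zero
]
thresholds, suspects = find_suspect_labels(labels, probs)
print(suspects)
```

Because the thresholds are computed per class from the data, the rule adapts to how confident the model is on each class, which is one plausible reading of "self-adaptive" in the abstract. How flagged samples are then repaired (relabeled or removed) is not specified here and is left out of the sketch.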
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.