Du, Y.; Cai, Y.; Jin, X.; Wang, H.; Li, Y.; Lu, M. ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples. Mathematics 2023, 11, 3891.
Abstract
Most existing data synthesis methods are designed to tackle problems such as dataset imbalance, data anonymization, and insufficient sample size. There is a lack of effective synthesis methods for expanding small datasets that contain a large number of features and unknown noise. We propose a data synthesis method named Adaptive Subspace Interpolation for Sample Optimization (ASISO). The idea is to divide the original feature space into several subspaces that each contain an equal number of samples, and then to interpolate between samples in adjacent subspaces. This method can adaptively adjust the size of a dataset that contains unknown noise, and the expanded data typically have minimal error relative to the actual data. Moreover, it adjusts the structure of the samples, which can significantly reduce the proportion of samples with large errors. In addition, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Experimental results on artificial data and benchmark datasets demonstrate that ASISO is a robust and stable method for optimizing samples.
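The abstract states the two core steps only at a high level: partition the feature space into equal-count subspaces, then interpolate between samples in adjacent subspaces. The following is a minimal illustrative sketch of those two steps, not the authors' algorithm. The choices in it are our own assumptions: subspaces are formed by ordering samples along their projection onto the first principal component, and each new sample is a random convex combination of one sample drawn from each of two adjacent subspaces, with the target interpolated using the same weight. The function name `subspace_interpolate` and its parameters `n_subspaces` and `n_new_per_pair` are hypothetical.

```python
import numpy as np


def subspace_interpolate(X, y, n_subspaces, n_new_per_pair, seed=0):
    """Hypothetical sketch of equal-count subspaces + adjacent interpolation.

    Assumption (not from the source): samples are ordered by their projection
    onto the first principal component, and that ordering is split into
    n_subspaces groups of (nearly) equal size.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if n_subspaces < 2:
        raise ValueError("need at least two subspaces to interpolate between")
    if n_subspaces > len(X):
        raise ValueError("more subspaces than samples")
    rng = np.random.default_rng(seed)

    # Order samples along the first principal axis (assumed ordering rule).
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    order = np.argsort(Xc @ vt[0])

    # Split the ordering into groups with an (almost) equal number of samples.
    groups = np.array_split(order, n_subspaces)

    new_X, new_y = [], []
    for left, right in zip(groups[:-1], groups[1:]):  # adjacent subspace pairs
        for _ in range(n_new_per_pair):
            i = rng.choice(left)
            j = rng.choice(right)
            t = rng.uniform()  # interpolation weight in [0, 1)
            new_X.append((1 - t) * X[i] + t * X[j])
            new_y.append((1 - t) * y[i] + t * y[j])

    # Original samples first, synthetic samples appended after them.
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 5))
    y = X.sum(axis=1) + rng.normal(scale=0.1, size=40)
    X_aug, y_aug = subspace_interpolate(X, y, n_subspaces=4, n_new_per_pair=10)
    print(X_aug.shape, y_aug.shape)  # (70, 5) (70,)
```

With 4 subspaces there are 3 adjacent pairs, so 3 × 10 = 30 synthetic samples are appended to the 40 originals. Because every synthetic point is a convex combination of two real points, it stays inside the convex hull of the data. This is one simple way to keep interpolated samples close to the observed distribution; the paper's own partitioning and weighting rules may differ.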
Keywords
data synthesis; unknown noise; interpolation; sample optimization; robust
Subject
Computer Science and Mathematics, Probability and Statistics
Copyright
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.