Submitted: 12 September 2023
Posted: 14 September 2023
Abstract
Keywords:
1. Introduction
2. Application
2.1. Step one: increase the number of minority class sample points
2.2. Step two: balanced data subset
1. Randomly select a sample from the minority class.
2. Calculate the distance between this sample and its K nearest neighbors.
3. Multiply the distance by a random number between 0 and 1 to generate a new sample, and add the generated sample to the minority class.
4. Repeat until the sample set is balanced.
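The steps above read like SMOTE-style interpolation, and a minimal sketch under that reading might look as follows. It assumes that the "distance" in the third step is the element-wise difference between the selected sample and one of its K nearest neighbors, that neighbors are found with Euclidean distance, and that "balanced" means the minority class reaches a given target size. The names `euclidean`, `oversample_minority`, `target_size`, and the default `k=5` are hypothetical and do not come from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oversample_minority(minority, target_size, k=5, rng=None):
    """Grow the minority class to `target_size` samples by interpolation.

    minority: list of equal-length numeric sequences (at least two).
    target_size: desired minority class size (assumed meaning of "balanced").
    k: number of nearest neighbors to consider (hypothetical default).
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    originals = [list(s) for s in minority]
    samples = list(originals)
    # Step 4: repeat until the minority class reaches the target size.
    while len(samples) < target_size:
        # Step 1: pick a random minority sample.
        base = rng.choice(originals)
        # Step 2: rank the other original samples by distance, keep the k nearest.
        neighbors = sorted(
            (s for s in originals if s is not base),
            key=lambda s: euclidean(base, s),
        )[:k]
        neighbor = rng.choice(neighbors)
        # Step 3: scale the difference by a random factor in [0, 1) and
        # add the interpolated point to the minority class.
        gap = rng.random()
        synthetic = [b + gap * (n - b) for b, n in zip(base, neighbor)]
        samples.append(synthetic)
    return samples
```

In this sketch, neighbors are drawn only from the original minority samples, so synthetic points never seed further synthetic points; whether the paper makes the same choice is not stated in the text above.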
2.3. Step three: clear boundary noise
1. Construct a matrix in which each element is the distance between the two time series at the corresponding positions.
2. Start at the top-left corner of the matrix and traverse it along the diagonal toward the bottom-right corner.
3. At each position, select the move for which the distance d is minimized.
4. Repeat steps 2–3 until the bottom-right corner of the matrix is reached.
5. The value accumulated at the bottom-right corner of the matrix is the DTW distance between the two time series.
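The DTW steps above can be illustrated with the standard dynamic-programming formulation, which fills an accumulated-cost matrix rather than walking a single greedy path. This is a sketch, not the authors' implementation: it assumes univariate series and the absolute difference as the pointwise distance, and the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two numeric sequences via dynamic programming."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # acc[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pointwise distance between the two series at this position
            # (absolute difference is an assumption).
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = cost + min(
                acc[i - 1][j],      # advance in a only
                acc[i][j - 1],      # advance in b only
                acc[i - 1][j - 1],  # advance in both (diagonal move)
            )
    # The bottom-right cell is the DTW distance.
    return acc[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated value can be aligned to the same point, whereas a point-by-point Euclidean comparison is not even defined for series of different lengths.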

3. Experiments
3.1. Data Description
3.2. Metrics
3.3. Other Undersampling Algorithms, Classifiers, and Parameters
4. Experiment Results
5. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Datasets | Imbalance rate |
|---|---|
| ChlorineConcentration | 2.88 |
| DistalPhalanxOutlineAgeGroup | 8.57 |
| DistalPhalanxTW | 11.83 |
| ECG5000 | 146.00 |
| FiftyWords | 52.00 |
| MedicalImages | 33.83 |
| MiddlePhalanxOutlineAgeGroup | 4.31 |
| MiddlePhalanxTW | 6.40 |
| ProximalPhalanxTW | 11.25 |

| Datasets | Precision | Recall | F1-score |
|---|---|---|---|
| ChlorineConcentration | 0.6577 | 0.6633 | 0.6602 |
| DistalPhalanxOutlineAgeGroup | 0.6576 | 0.7178 | 0.6739 |
| DistalPhalanxTW | 0.4803 | 0.4622 | 0.4680 |
| MedicalImages | 0.7507 | 0.7397 | 0.7311 |
| MiddlePhalanxOutlineAgeGroup | 0.4351 | 0.4064 | 0.4070 |
| MiddlePhalanxTW | 0.4004 | 0.4052 | 0.4010 |
| ProximalPhalanxTW | 0.5583 | 0.5075 | 0.5170 |
| ECG5000 | 0.6998 | 0.5904 | 0.6171 |
| FiftyWords | 0.7156 | 0.7261 | 0.6901 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).