Preprint Article, Version 1 (not peer-reviewed); preserved in Portico.

Training Data Augmentation with Data Distilled by the Principal Component Analysis

Version 1 : Received: 11 December 2023 / Approved: 12 December 2023 / Online: 12 December 2023 (07:24:28 CET)

A peer-reviewed article of this Preprint also exists.

Sirakov, N.M.; Shahnewaz, T.; Nakhmani, A. Training Data Augmentation with Data Distilled by Principal Component Analysis. Electronics 2024, 13, 282.

Abstract

This work develops a new method for vector data augmentation. The proposed method applies Principal Component Analysis (PCA) to a set of training vectors for a Machine Learning (ML) method, determines their eigenvectors, and uses them to generate distilled vectors. The training vectors and the PCA-distilled vectors have the same dimension. The user chooses the number of vectors to be distilled and appended to the set of training vectors, and a statistical approach determines the smallest number of distilled vectors such that, when they augment the original vectors, the extended set trains an ML classifier to a required accuracy. Hence, the novelty of this study is the distillation of vectors with the PCA method and their use to augment the original set of vectors; the resulting advantage is that the distilled vectors improve the classification statistics of ML classifiers. To validate this advantage, we conducted experiments with four public databases and applied four classifiers: a Neural Network, Logistic Regression, and a Support Vector Machine with linear and polynomial kernels. For the purpose of augmentation, we conducted several distillations, including nested (double) distillation, in which new vectors are distilled from already-distilled vectors. We trained the classifiers with three sets of vectors: the original vectors; the original vectors augmented with vectors distilled by PCA; and the original vectors augmented with both PCA-distilled and double-distilled vectors. The experimental results presented in the paper confirm that PCA-distilled vectors improve the classification statistics of ML methods when they augment the original training vectors.
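The abstract describes the pipeline only at a high level, so the following is a minimal sketch of one plausible reading: fit PCA on the training vectors, then generate each distilled vector as the mean plus a random combination of the eigenvectors weighted by the square roots of their eigenvalues, so the distilled vectors share the originals' dimension. The function name `pca_distill` and the random-weighting scheme are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np

def pca_distill(X, n_new, seed=0):
    """Generate `n_new` distilled vectors with the same dimension as X's rows.

    Hypothetical sketch: each distilled vector is the sample mean plus a
    random combination of the PCA eigenvectors of X, weighted by the
    square roots of the corresponding eigenvalues.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigen-decomposition (PCA)
    eigvals = np.clip(eigvals, 0.0, None)           # guard tiny negative values
    z = rng.standard_normal((n_new, len(eigvals)))  # random component weights
    return mean + (z * np.sqrt(eigvals)) @ eigvecs.T

# Original training vectors (toy data) and two rounds of distillation:
X = np.random.default_rng(1).normal(size=(50, 4))
D1 = pca_distill(X, 20)                    # distilled from the originals
D2 = pca_distill(np.vstack([X, D1]), 10)   # nested ("double") distillation
X_aug = np.vstack([X, D1, D2])             # augmented training set
```

The nested call mirrors the paper's double distillation: the second round runs PCA on the originals together with the first round's distilled vectors, and a classifier can then be trained on `X`, on `X` plus `D1`, or on the full `X_aug` to compare the three settings.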

Keywords

data; distillation; augmentation; classification; machine learning

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning

