Preprint | Article | Version 1 | Preserved in Portico | This version is not peer-reviewed
Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data
Version 1: Received: 9 August 2020 / Approved: 11 August 2020 / Online: 11 August 2020 (06:26:43 CEST)
How to cite:
Mallak, A.; Fathi, M. Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data. Preprints 2020, 2020080254. https://doi.org/10.20944/preprints202008.0254.v1
APA Style
Mallak, A., & Fathi, M. (2020). Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data. Preprints. https://doi.org/10.20944/preprints202008.0254.v1
Chicago/Turabian Style
Mallak, A., and Madjid Fathi. 2020. "Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data." Preprints. https://doi.org/10.20944/preprints202008.0254.v1
Abstract
Feature selection is a crucial step in overcoming the curse of dimensionality in data mining. This work proposes Recursive k-means Silhouette Elimination (RkSE), a new unsupervised feature selection algorithm for reducing dimensionality in univariate and multivariate time-series datasets. In RkSE, k-means clustering is applied recursively to select cluster-representative features, followed by a novel application of the silhouette measure to each cluster, which, together with a user-defined threshold, serves as the criterion for selecting or eliminating features. The proposed method is evaluated on multi-sensor readings from a hydraulic test rig in two scenarios: (1) reducing the dimensionality of a multivariate classification problem, using several classifiers of different types; and (2) classifying univariate data in a sliding-window setting, where RkSE serves as a window compression method that reduces window dimensionality by selecting the best time points in each window. The results are validated using 10-fold cross-validation and compared with those obtained when classification is performed directly, without feature selection. Additionally, a new taxonomy of k-means-based feature selection methods is proposed. The experimental results and observations from the two experiments demonstrate the capabilities and accuracy of the proposed method.
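The abstract does not spell out the exact procedure, so the following is only a minimal sketch of one plausible reading of the idea, not the authors' implementation. It assumes that features (columns) are clustered with k-means, that a cluster whose mean silhouette meets the threshold is reduced to the feature nearest its centroid, and that a less cohesive cluster is split again recursively. The function name `select_features`, its parameters (`k`, `threshold`, `max_depth`, `seed`), and the nearest-to-centroid rule are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples


def select_features(X, k=2, threshold=0.5, max_depth=3, seed=0):
    """Illustrative sketch (assumed design): return sorted column indices kept."""
    # Standardise each feature, then treat every column as one point to cluster.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    F = Z.T  # shape: (n_features, n_samples)
    selected = set()

    def recurse(idx, depth):
        # Too few features to cluster meaningfully, or depth limit hit: keep all.
        if len(idx) <= k or depth >= max_depth:
            selected.update(idx.tolist())
            return
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(F[idx])
        labels = km.labels_
        # Degenerate case (e.g. identical features): keep a single one.
        if np.unique(labels).size < 2:
            selected.add(int(idx[0]))
            return
        sil = silhouette_samples(F[idx], labels)
        for c in np.unique(labels):
            mask = labels == c
            members = idx[mask]
            if sil[mask].mean() >= threshold:
                # Cohesive cluster: keep only the feature nearest the centroid
                # and eliminate the rest (assumed representative rule).
                dist = np.linalg.norm(F[members] - km.cluster_centers_[c], axis=1)
                selected.add(int(members[np.argmin(dist)]))
            else:
                # Loose cluster: split it again at the next recursion level.
                recurse(members, depth + 1)

    recurse(np.arange(F.shape[0]), 0)
    return sorted(selected)
```

Under these assumptions, a larger `threshold` eliminates fewer features, because fewer clusters count as cohesive enough to be collapsed to a single representative. In the sliding-window scenario, the same routine could be applied with the time points of a window as the columns of `X`.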
Keywords
feature selection; k-means; silhouette measure; clustering; big data; fault classification; sensor data; time-series data
Subject
Computer Science and Mathematics, Information Systems
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.