Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data

Ahlam Mallak; Madjid Fathi

doi:10.20944/preprints202008.0254.v1

Submitted:

09 August 2020

Posted:

11 August 2020

You are already at the latest version

Abstract

Feature selection is a crucial step to overcome the curse of dimensionality problem in data mining. This work proposes Recursive k-means Silhouette Elimination (RkSE) as a new unsupervised feature selection algorithm to reduce dimensionality in univariate and multivariate time-series datasets. Where k-means clustering is applied recursively to select the cluster representative features, following a unique application of silhouette measure for each cluster and a user-defined threshold as the feature selection or elimination criteria. The proposed method is evaluated on a hydraulic test rig, multi sensor readings in two different fashions: (1) Reduce the dimensionality in a multivariate classification problem using various classifiers of different functionalities. (2) Classification of univariate data in a sliding window scenario, where RkSE is used as a window compression method, to reduce the window dimensionality by selecting the best time points in a sliding window. Moreover, the results are validated using 10-fold cross validation technique. As well as, compared to the results when the classification is pulled directly with no feature selection applied. Additionally, a new taxonomy for k-means based feature selection methods is proposed. The experimental results and observations in the two comprehensive experiments demonstrated in this work reveal the capabilities and accuracy of the proposed method.

Keywords:

feature selection

;

k-means

;

silhouette measure

;

clustering

;

big data

;

fault classification

;

sensor data

;

time-series data

Subject:

Computer Science and Mathematics - Information Systems

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data

Abstract

Keywords:

Subject:

MDPI Initiatives

Important Links

Subscribe