Data-Driven Techniques for Mechanical Fault Classification

Marc-André Huneault; Markus Timusk; Chris Mechefske

doi:10.20944/preprints202606.1595.v1

Submitted: 22 June 2026

Posted: 23 June 2026

Abstract

Data-driven techniques for machine condition monitoring are typically initially tested on relatively simple experimental datasets where they often achieve near-perfect classification accuracies. These results do not confirm applicability in real-world situations nor whether the computational demands required are justified. This paper introduces an experimental dataset designed to better approximate selected real-world operational conditions. This dataset is then used to evaluate both traditional and novel techniques for machinery fault classification and determine their effectiveness for use in computing resource-constrained environments, such as where edge computing might be employed on mobile or remote equipment. Four distinct data-driven methods were tested and analyzed using a variety of metrics beyond only classification accuracy. The traditional machine learning techniques struggled to identify fault conditions from unseen speed profiles, whereas deep learning models succeeded. Computational metrics indicated that some methods required significantly more memory usage and/or computing time than other models. One of the deep learning models had memory requirements and computational demand that would allow it to be theoretically feasible for real-time application on platforms such as the Jetson Nano or Raspberry Pi 5. This highlights the potential of ongoing deep learning model development for machine condition monitoring, as well as the limitations of traditional approaches.

Keywords: artificial intelligence; bi-directional long short-term memory; convolutional neural network; deep learning; machine learning; multi-scale convolutional neural network; particle swarm optimization; recurrent neural network; support vector machine

Subject: Engineering - Mechanical Engineering

1. Introduction

Downtime and maintenance costs significantly impact industrial, mining, and manufacturing operations. Failures, especially when they occur unexpectedly, can lead to bottlenecks and supply chain disruptions, increase operational and capital expenses, and expose workers to unsafe working conditions. To improve maintenance strategies, real-time monitoring and analysis techniques have been and continue to be developed to detect anomalous behaviour of components during the operation of equipment. Simple applications include using measured vibration amplitude thresholds to deduce the presence of a faulty component. More complex implementations include the use of data-driven machine learning (ML) models to estimate the health status of more complex machinery from those same vibration signals.

Early data-driven fault detection models were typically developed using a few manually selected features (commonly statistical) computed from a labeled dataset representing different faults embedded into a machine. These features were then used to train a classifier, such as a shallow multi-layered perceptron (MLP) or support vector machine (SVM), with various algorithms and data pre-processing/feature extraction methods to classify unseen or new signals [1]. A shortfall commonly found with these early fault detection models is their inability to handle data complexity due to the dynamic nature of certain machines and the effects of noise in the data. Ongoing development, coupled with enhanced data processing and storage capabilities, has led to the emergence of novel methods to address these limitations. These methods, commonly referred to as deep learning (DL) models, feature more extensive neural network structures with multiple interconnected layers, enabling them to better characterize faulty behaviours. Their increased depth enables them to conduct their own feature extraction, reducing the need for prior knowledge regarding the monitored machine's behaviour [2].

Newly developed models are consistently achieving high prediction accuracies on common fault classification datasets like the Case Western Reserve University (CWRU) Bearing Dataset [3,4,5]. Such datasets are typically limited in size and are collected from simple experimental setups. They usually involve time-invariant sequences and feature a limited number of fault categories across the entire dataset. It is thus difficult to ascertain how models developed on these datasets would perform on data collected from a real application, which is likely to include more fault cases, noise from various sources, and variance in signal characteristics. Additionally, since many models are reporting excellent results on the same datasets, it can become difficult to select an appropriate model for a specific application. Identifying the limits of a model can showcase where it is likely to be successful, where it can be improved, and which applications it best suits.

To strengthen the argument for improving model selection criteria, particularly given the numerous reports of near-perfect performance from models trained and tested on commonly available experimental datasets, it is essential to emphasize the computational cost of training the proposed methodologies and of running inference with them. This information can assist in selecting an appropriate model for a specific task from both hardware cost and computational constraint standpoints.

The goal of this research is to provide an improved analysis of existing and novel fault detection techniques using an experimental dataset collected with the intention of better representing a wider range of electromechanical systems and their operating conditions. The analysis will consider both performance and computational metrics to highlight the suitability of a given technique for real-world applications by better defining their abilities and limitations and whether the computation cost is worthwhile.

2. Materials and Methods

Intelligent machinery monitoring systems have been a topic of research interest for many decades [5]. Data-driven techniques have been developed to detect, classify, and predict the behaviour of machines and their components. Commonly referred to as ML (machine learning) models, these classifiers typically learn, from manually selected features, to identify the state of a monitored system. These models have had many successful implementations in the field of machine condition monitoring, especially in cases where little data and/or limited computational resources are available for training. However, the susceptibility of manually selected signal features to noise and other interference limits their ability to learn from data collected in unfavourable conditions. Moreover, their restricted capacity to generalize non-linear relationships imposes an upper limit on the system complexity that can be effectively monitored. In brief, promising results from traditional ML methods are typically only possible when data are collected from simple experimental apparatuses (featuring few components) subjected to minimal noise and consisting of subsets of short-term, steady-state recordings across discrete changes in operating conditions or fault classes.

Data-driven condition monitoring techniques have evolved due to increasing availability of data sources, computational models, and computational resources. Combined with the availability of abundant training data and computational power, DL (deep learning) classifiers have become larger and more efficient. Moreover, the advent of consumer application programming interfaces has streamlined the utilization of graphical processing units (GPUs) for parallel processing during model training, resulting in significantly accelerated training rates. In many cases in the context of fault detection and classification, these models are capable of learning from signals with little or no data preprocessing and make predictions on system behaviour with high classification accuracy.

However, demonstrating high classification accuracies with a proposed methodology does not always guarantee optimal results. Many of the datasets that researchers succeed with may not demand the characteristics of novel or large DL models. Furthermore, achieving near-perfect results on a particular dataset may not necessarily reflect the technique's suitability in a real-world context, where subtle variations between machines and operational cycles could impact model predictions and performance.

2.1. Background

2.1.1. Benchmark Machine Condition Monitoring Datasets

Many datasets have been developed and published for the evaluation of machine condition monitoring methodologies. Three widely used datasets are the Case Western Reserve University (CWRU) Bearing Dataset [6], the Paderborn University Bearing Data Set [7], and the Mechanical Failure Prevention Technology (MFPT) dataset [8]. These datasets serve as benchmarks for evaluating the performance of fault detection models and have been extensively used in academic research.

The CWRU dataset has been widely adopted for developing and testing machine learning and deep learning algorithms. The dataset was collected using an experimental setup consisting of an electric motor, a torque transducer/encoder, and a dynamometer. Bearings with different fault types and severities were used to simulate various fault conditions. Vibration data was collected using accelerometers placed at different locations on the bearing housing while the drive motor was operated for short times at a constant speed and load. The experiments representing normal (fault free) conditions are around 40 seconds in duration while the experiments involving faulty bearings are just over 10 seconds long each.

2.1.2. Machine Learning Methods for Machine Fault Diagnostics

Two popular and effective ML techniques are Support Vector Machines (SVM) and Artificial Neural Networks (ANN). The SVM, a well-established binary classification model for the classification of non-separable datasets, is still commonly used because of its ability to effectively generalize non-linear relationships while maintaining relative computational simplicity when compared to DL models. SVMs have frequently been implemented for the task of fault detection and diagnosis for conditions where limited data is available for training [9,10].

Artificial neural networks (ANNs) have been implemented to learn a system’s behaviour by using manually selected, or engineered, features derived from physical signals. These features are used to classify specific conditions and detect anomalies, facilitating the identification of potential faults or failures in the system. This approach leverages the ability of ANNs to model complex, non-linear relationships within the data, for the classification of various operational states and fault conditions [11]. ANNs can be scaled in terms of layers and nodes to increase their resolution and to learn their own features from raw signals. However, since every connection between a node and each node of the following layer carries a trainable weight, and every node has a trainable bias term, this can become computationally expensive.
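To give a sense of how quickly the parameter count of a fully connected network grows, the following minimal sketch counts weights and biases under the usual convention of one weight per connection and one bias per receiving node. The layer sizes in the example are hypothetical and are not taken from any model evaluated in this paper.

```python
def dense_parameter_count(layer_sizes):
    """Count trainable weights and biases in a fully connected network.

    layer_sizes lists the number of nodes in each layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between layers
        total += n_out         # one bias per node in the receiving layer
    return total


# Hypothetical network: 4000-point raw input, two hidden layers, 7 classes.
print(dense_parameter_count([4000, 256, 64, 7]))  # prints 1041159
```

Almost all of the parameters in this example sit in the first layer, which is why feeding long raw signals directly into a dense network becomes costly.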

2.1.3. Deep Learning Methods for Machine Fault Diagnostics

Increased computational power, coupled with access to increasingly large and diverse datasets across various fields such as natural language processing (NLP) and image processing, has led to the development of more sophisticated machine learning models. These models have in some cases diminished the need for extensive data preprocessing or expert knowledge of the target application. This evolution in deep learning (DL) has also significantly impacted the field of machine condition monitoring where autoencoders, recurrent neural networks, convolutional neural networks, and multi-scale convolutional neural networks have been utilized.

Autoencoders are a form of unsupervised learning model trained to reconstruct signals and features from their compressed form. Typically used for anomaly detection tasks in machine condition monitoring applications, autoencoders grade the likelihood of normal or anomalous states from their ability to reconstruct the observed signals or features [12,13]. Should a signal or group of features differ significantly from that which was used to build the model, it can be considered likely that it represents anomalous behaviour. However, as the complexity of the operation of a machine increases, it becomes more difficult for these models to accurately decide if the observed signals are normal or not. As a result, variations of the autoencoder have been developed to overcome these limitations, such as the convolutional-autoencoder, which have increased model robustness and are therefore less sensitive to small variations in system behaviour, providing better classification performance.

Recurrent Neural Networks (RNNs) are a type of data-driven model particularly well suited for time-series or sequential data applications. These models process input as a sequence, which is highly valuable in machine condition monitoring, where RNNs and Long Short-Term Memory (LSTM) models are employed to manage time-series data and capture temporal dependencies. This ability to understand the order and timing of events is especially useful for detecting and predicting machinery faults, given the inherently sequential nature of the signals collected and the development of fault conditions over time.

Convolutional neural networks (CNNs) use trainable filters (kernels) that are passed over input data and used to extract localized features that can then be used for classification tasks. Unlike traditional neural networks, CNNs do not require full interconnection between all neurons across layers, which reduces computational complexity. Additionally, CNNs employ pooling operations to reduce the dimensionality of the data, leading to faster convergence, while dropout layers help prevent overfitting by limiting the model's reliance on specific features [14]. The combination of these factors reduces processing and memory requirements [15], while providing effective feature extraction capabilities. CNNs have emerged in the field of machine condition monitoring as a successful tool for denoising and dimensionality reduction in classification tasks [3,4,16].

Multi-scale convolutional neural networks (MSCNNs) have been developed to enhance feature extraction by combining convolutional layers with different kernel sizes [17]. This approach allows features at various scales to be captured, improving model performance in vibration signal analysis. A significant benefit of MSCNNs is their ability to process raw data, such as vibration signals, without the need for extensive feature engineering. However, a limitation is that capturing the required amount of training data in an industrial environment can be challenging, potentially hindering the practicality of their application.

As an input is passed through a convolutional neural network, the “observed” feature, or “perspective” on the original input, is scaled in size after every layer. This same concept can be applied to vibration signals in machine operations where, under varying working conditions, vibration patterns can be masked by the changing global operating frequencies of the machine [18,19].

There have been many successful applications of ML and DL implementations in machinery condition monitoring. Rarely reported in the results of these studies are the computational requirements needed to train and infer them, and whether that is practical for the task they are being evaluated with or how that translates to practical use in real-world situations. Issues such as dataset simplicity, limited variability, and potential overfitting challenge the generalizability of many models. More extensive testing of both ML and DL applications on complex, real-world-like datasets, incorporating an analysis of the computational resources required for inference and training, should take place to better highlight their suitability for practical applications.

2.2. Laurentian University Dataset

A new dataset was collected and used for training and testing different condition monitoring techniques applicable to various types of rotating equipment. The Laurentian University (LU) dataset addresses some of the shortcomings identified above. The dataset includes multiple sets of time-variant operating conditions, a large variety of fault types, a suitable sampling frequency for the different sensors, and an improved overall quantity of data.

All experiments featured in this dataset were conducted with a SpectraQuest Mechanical Fault Simulator apparatus and componentry kit [20]. The simulator features a 1 HP electric motor powering a belt-driven gearbox connected to a permanent magnet brake. The apparatus was fitted with multiple accelerometers, current sensors on the motor windings, and a tachometer to measure motor speed. Each sensor was sampled at 10 kHz, with healthy experiments (experiments with no faulty components) lasting 30 minutes and faulty experiments (experiments with at least one faulty component) lasting 20 minutes. Each experiment was conducted twice: once with a square wave speed profile (Figure 1) and once with a sawtooth speed profile (Figure 2).

Experiments were conducted with single and multiple faults on components, simulating real-world industrial situations. Experiments were also conducted using only healthy components, once before and once after all the faulty experiments. Table 1 shows a full list of the experiments. A more complete description of the experimental apparatus, instrumentation, data collection procedures and access to all the data is available from the authors.

3. Results

3.1. Fault Classification

This section evaluates and compares the performance of a random forest classifier (RFC), an SVM, a CNN, and a multi-scale CNN (MSCNN) when used to distinguish fault types as would be required in a typical real-world machine condition monitoring application. Each model was initially trained and tested on the CWRU dataset for benchmark metrics. Using the Laurentian University (LU) dataset, each model was trained on a square wave speed profile and tested on a sawtooth wave speed profile. Multiple instances of each model were evaluated with varying dataset characteristics and preprocessing methods. The SVM had a replicated hyperparameter selection scheme based on [9], the CNN was replicated from [16], and the multi-scale CNN was influenced by [21].

The LU dataset was reconfigured in multiple ways to evaluate model performance. The following datasets were generated for each control signal from two minutes of each fault class and healthy class (four minutes total of healthy data) for the feature datasets of the SVM implementation:

Sample size: 4000 standardized acceleration data points, Fault Classes: Grouped;
Sample size: 4000 raw acceleration data points, Fault Classes: Grouped.

The following were generated for each control signal from ten minutes of each fault class and healthy class (twenty minutes total of healthy data) to train and test the DL implementations:

Sample size: 4000 standardized acceleration data points, Fault Classes: Grouped;
Sample size: 4000 raw acceleration data points, Fault Classes: Grouped;
Sample size: 1000 standardized acceleration data points, Fault Classes: Grouped;
Sample size: 1000 raw acceleration data points, Fault Classes: Grouped;
Sample size: 4000 standardized acceleration data points, Fault Classes: Fully Separated;
Sample size: 4000 raw acceleration data points, Fault Classes: Fully Separated;
Sample size: 1000 standardized acceleration data points, Fault Classes: Fully Separated;
Sample size: 1000 raw acceleration data points, Fault Classes: Fully Separated.
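The number of samples implied by each configuration can be estimated from the stated 10 kHz sampling rate. The sketch below assumes non-overlapping segments taken from a single acceleration channel; the segmentation scheme is not detailed in the text, so the counts are indicative only.

```python
SAMPLING_RATE_HZ = 10_000  # sampling rate stated for the LU dataset


def segments_per_recording(minutes, sample_size, rate_hz=SAMPLING_RATE_HZ):
    """Number of non-overlapping samples obtainable from one recording."""
    total_points = int(minutes * 60 * rate_hz)
    return total_points // sample_size


# Two minutes per class (SVM feature datasets) and ten minutes (DL datasets).
for minutes in (2, 10):
    for sample_size in (1000, 4000):
        count = segments_per_recording(minutes, sample_size)
        print(f"{minutes} min, {sample_size} points/sample: {count} samples")
```

Under these assumptions, two minutes of data yield 300 samples of 4000 points per class, and ten minutes yield 1500 samples of 4000 points or 6000 samples of 1000 points.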

The Grouped class configuration considers each fault for a given component as the same (all bearing faults for a given location have the same label). The Fully Separated class configuration separates each fault type and fault location as their own class. The class configurations are shown in Table 2. When generating standardized validation or test sets, the mean and standard deviation parameters used in the formulation were taken from the original training set. The class configuration for the CWRU dataset is shown in Table 3.
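The standardization step described above, in which validation and test data reuse the training-set statistics, can be sketched as follows. The function names are illustrative, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def fit_standardizer(training_points):
    """Return the mean and standard deviation of the training data only."""
    mean = statistics.fmean(training_points)
    std = statistics.pstdev(training_points)  # assumption: population form
    return mean, std


def standardize(points, mean, std):
    """Apply z-score scaling with previously fitted parameters."""
    return [(x - mean) / std for x in points]


# Toy acceleration values standing in for real training and test signals.
train = [0.5, 1.5, 2.5, 3.5]
test = [2.0, 4.0]

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)  # reuses the training statistics
```

Reusing the training parameters keeps information about the unseen speed profile out of the preprocessing step. A constant signal would give a zero standard deviation and would need separate handling.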

Every experiment featured in the CWRU dataset was compiled into binary files with sample sizes of 1000 and 4000 acceleration data points. As was done with the LU dataset, the list of file paths to each binary file was shuffled and split for training and validating. The entire dataset was reprocessed for both the SVM (only sample sizes of 4000 used for SVM) and DL implementation as follows:

Sample size: 4000 standardized acceleration data points;
Sample size: 4000 raw acceleration data points;
Sample size: 1000 standardized acceleration data points;
Sample size: 1000 raw acceleration data points.

3.1.1. Feature Dataset for the Support Vector Machine Implementation

Multiple datasets consisting of multiple time-series statistical features extracted from each sample of acceleration data were developed for the SVM implementation. These feature datasets were built from the segmented LU datasets. The computed statistical features were Mean, Variance, Root Mean Square, Gaussian Entropy, Standard Deviation, Fisher-Pearson Skewness, Sample Excess Kurtosis, Shape Factor, Crest Factor, Impulse Factor, Peak-to-Peak magnitude, Clearance Factor, Histogram Delta, Upper Bound of Histogram, Lower Bound of Histogram [10], and Moments of Degree (5, 6, and 7) [9]. The SVM’s hyperparameter selection scheme followed the methods presented in [10]. A graphical representation of this process is illustrated in Figure 3.
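As an illustration, several of the listed time-domain features can be computed from one sample as sketched below. The definitions used here are the commonly cited ones and are assumptions; the exact formulations in [9,10] may differ, for example in bias correction of the kurtosis. Gaussian entropy, the histogram features, and the higher-order moments are omitted because their definitions are not given in the text.

```python
import math


def time_domain_features(x):
    """Compute common statistical features of one acceleration sample."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    m2 = sum(c ** 2 for c in centered) / n  # biased central moments
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    abs_x = [abs(v) for v in x]
    peak = max(abs_x)
    mean_abs = sum(abs_x) / n
    mean_sqrt_abs = sum(math.sqrt(a) for a in abs_x) / n
    return {
        "mean": mean,
        "variance": m2,
        "std": math.sqrt(m2),
        "rms": rms,
        "skewness": m3 / m2 ** 1.5,             # Fisher-Pearson coefficient
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # biased estimator (assumed)
        "peak_to_peak": max(x) - min(x),
        "shape_factor": rms / mean_abs,
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
        "clearance_factor": peak / mean_sqrt_abs ** 2,
    }


# A toy alternating signal; real samples hold 4000 acceleration points.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features["rms"], features["excess_kurtosis"])  # prints 1.0 -2.0
```

A constant or all-zero sample would cause a division by zero in several of these ratios, so a practical implementation would need to guard against that case.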

3.1.2. Convolutional Neural Network Implementation

The CNN used in this study was replicated from [16], which has reportedly reached accuracies nearing 100 percent on the CWRU dataset. The model was trained in batches of 64 samples with the Adam optimization algorithm and a learning rate of 0.001. The loss criterion was cross entropy loss. The only differences between the model used in [16] and the one used in this research were the input size, the fully connected classifier, and the output size. The input size was either 1000 or 4000 acceleration data points. The fully connected classifier input size was dependent on the input to the model and thus varied between (batch size, 216) and (batch size, 968). The output size of the classifier was dependent on the class configuration of the dataset, and thus varied between (batch size, 7) and (batch size, 27).

3.1.3. Multi-Scale CNN Implementation

The multi-scale convolutional neural-network implementation was based on the architecture used in [17]. Instead of using the time-frequency representation of the acceleration data, a one-dimensional vector of the raw or standardized acceleration data was used to capitalize on the feature extraction capabilities of the model. Additionally, the attention-based mechanism was replaced by a Convolutional Block Attention Module (CBAM) [21]. A detailed representation of this architecture can be found in Figure 4.

The following modifications were made to the model architecture:

The number of output channels from each convolutional layer in the Multi-Scale Convolutional Block was reduced to 24, resulting in a concatenated output of 96 channels;
A feature fusion block was added before the attention blocks to reduce the dimensionality of the input features and the size requirement of the attention module and classifier layer;
Hyperbolic tangent activation functions and batch normalization [23] layers were added after each convolutional block to stabilize the input distribution to the next layers and to limit the spread of gradients across the model;
The number of channels of the convolution layer within the Initial Convolutional Block was increased to 64, its kernel size was reduced to five, and its stride increased to two. The reduction in kernel size was to reduce feature scaling due to the added convolutional layers in the Feature Fusion Block. A stride of two was implemented to reduce the dimensionality of the acceleration data while retaining localized information within each feature passed to the multi-scale convolutional layer;
The Classifier Block was modified such that the number of output features of the first fully connected sub-layer was three times the number of classes of the dataset it is trained on. The dropout probability was reduced to 0.2 to compensate for the reduced number of intermediary features and the addition of batch normalization layers.
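The effect of the stride-two initial convolution on signal length can be checked with the standard output-length formula for a one-dimensional convolution without dilation. The kernel size and stride come from the modifications listed above; the padding values are assumptions, since the padding used in the model is not stated.

```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution with no dilation."""
    return (length + 2 * padding - kernel_size) // stride + 1


KERNEL_SIZE = 5  # from the modified Initial Convolutional Block
STRIDE = 2       # from the modified Initial Convolutional Block

for input_length in (1000, 4000):
    for padding in (0, 2):  # assumed values, not given in the text
        out = conv1d_output_length(input_length, KERNEL_SIZE, STRIDE, padding)
        print(f"input {input_length}, padding {padding}: output {out}")

# The stated 96 concatenated channels at 24 channels per layer imply
# four parallel convolutional layers in the multi-scale block.
print(96 // 24)  # prints 4
```

With a padding of two, the stride-two layer halves the input length (4000 to 2000, 1000 to 500), which is the dimensionality reduction described above.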

Despite the inclusion of batch normalization layers, early training sessions with a learning rate of 0.001 demonstrated a strong bias towards certain classes in the LU dataset with the full class separation configuration. The learning rate was reduced to 0.0001 to allow the model to better tune itself to the small differences in characteristics between classes.

The acceleration data samples describing the LU and CWRU datasets were shuffled and then split, with 80 percent of samples used for training and 20 percent for validation. Where a trained model was to be tested on the sawtooth control signal features in the LU dataset, it was tested on the entire ten minutes of data for each class, or two minutes for the SVM configuration. As for the CWRU dataset, the entire dataset was used for all applications including the SVM. All final metrics were taken from the scores on the validation set.
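A minimal sketch of the shuffle-and-split step is given below. The file names, the fixed seed, and the function name are hypothetical and serve only to make the example reproducible.

```python
import random


def shuffle_and_split(items, train_fraction=0.8, seed=0):
    """Shuffle a list of sample paths and split it into two subsets."""
    shuffled = list(items)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical paths standing in for the binary sample files.
paths = [f"sample_{i:04d}.bin" for i in range(1500)]
train_paths, val_paths = shuffle_and_split(paths)
print(len(train_paths), len(val_paths))  # prints 1200 300
```

Splitting the list of paths rather than the loaded signals keeps memory use low, since each file is only read when it is needed.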

The following metrics were collected from the DL model implementations:

The number of floating-point operations (FLOP) required for model inference;
The number of parameters in the model;
The average time duration of a training epoch;
The time duration for parsing through the LU or CWRU test datasets;
The accuracies per epoch on the validation datasets;
Confusion matrices of the best trained model on the test data;
A confusion matrix of the best trained model on the validation data;
Precision, recall and f1-score of the predictions of each class in the validation and test datasets;
Accuracy on the validation and test datasets.

The following metrics were collected from the SVM model implementations:

The time duration for selecting features;
The time duration for fitting the SVC;
The time duration for parsing through the LU or CWRU test datasets;
Number of iterations run by the optimization routine to fit the model;
The total number of support vectors in the SVC;
The total number of parameters (coefficients and intercepts) in the SVC;
Confusion matrix(ces) of the best trained model on the test data;
A confusion matrix of the best trained model on the validation data;
Precision, recall and f1-score of the predictions of each class in the validation and test datasets;
Accuracy on the validation and test datasets.
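The per-class precision, recall and f1-score, along with the overall accuracy, can all be derived from a confusion matrix. The sketch below uses the standard definitions and a made-up three-class matrix; it is not the evaluation code used in this study.

```python
def per_class_metrics(confusion):
    """confusion[i][j] counts samples of true class i predicted as class j."""
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed class-k samples
        fp = sum(confusion[i][k] for i in range(n)) - tp  # wrongly labeled as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up counts for three classes, ten samples each.
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = per_class_metrics(example)
overall = accuracy(example)
```

In this example the overall accuracy is 0.9, while the first class has a recall of 0.8 and a precision of 8/9, which shows how per-class scores can expose weaknesses that a single accuracy figure hides.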

The same metrics collected from the SVC implementation were also collected for the Random Forest Classifier and the default SVC configuration. From these metrics, a comparison can be made not only of performance, but also of the computational requirements needed to implement each model and of whether that cost is worth the added model complexity.

4. Discussion

This section focuses on whether the selected ML and DL methodologies can successfully predict faults from a dataset representing a machine with two different transient operating states and a greater quantity of fault categories than is commonly found in public datasets. Assessment of the classification results and computational requirements was used to distinguish model suitability for embedded applications (implemented on relatively low-cost, low-power edge-computing devices).

4.1. Fault Classification Performance

4.1.1. CWRU Dataset

As illustrated in Figure 5, the deep learning (DL) models were trained using 1,000 standardized acceleration data points per sample, while the highest-performing traditional machine learning (ML) models utilized features extracted from samples of 4,000 raw acceleration data points. It is important to note that the CWRU validation datasets contained a total of 2,880 or 722 samples, depending on whether they were divided into segments of 1,000 or 4,000 data points per sample, respectively. Among the DL models, the MSCNN achieved the highest overall classification accuracy on the validation set, correctly classifying nearly 100 percent of the samples. In contrast, the RFC achieved the highest accuracy among the ML models, with a maximum accuracy of approximately 92 percent.

The confusion matrices (not included in this paper) indicate that the CNN implementation had difficulty discerning the inner race bearing faults at the drive end location, mistaking them for outer race faults. Also, some of the samples from experiments featuring faults embedded in the ball (rolling element) of the bearing placed at the drive end location were misclassified as outer race faults. The discrepancy between these accuracies and those obtained in [16], from which the CNN was replicated, may be attributed to differences in the organization of fault types and the balance of class sizes.

Similarly, the confusion matrices indicate that the SVM models struggled to accurately predict faults in the outer race and rolling elements of the bearings. This difficulty may be due to the subtle nature of these faults and their influence on the selected features used for training the models. However, the RFC implementation did not exhibit similar challenges and only encountered difficulty in predicting rolling element faults located in the fan end bearing.

4.1.2. LU Dataset – Fully Separated Class Configuration

Only the DL implementations were evaluated with the fully separated class configuration of the LU dataset due to the computational constraints dictated by the optimization of an SVM or RFC. As shown in Figure 6 and Figure 7, the MSCNN implementation outperforms the standard CNN implementation by roughly 20 percent in accuracy on both the validation and test sets when distinguishing the fault types across various components and their locations. The MSCNN exhibits a reduction of 9.8 percent in classification accuracy from the validation set (square wave speed profile used for training) to the test set (unseen sawtooth speed profile), whereas the CNN implementation experiences a reduction of 10.1 percent. The confusion matrices demonstrated that the multiscale CNN implementation had difficulty distinguishing the outer race faults in the bearing located at the motor end housing from those in the pulley-end bearing housing. However, this difficulty did not occur in the reverse scenario. Additionally, the model encountered challenges distinguishing between the two types of bent shaft faults and frequently mislabeled the second shaft imbalance experiment as either the first imbalance or the chipped-tooth gear fault. Furthermore, the missing tooth experiment was predominantly misclassified as healthy data, with a smaller portion being mislabeled as part of the chipped tooth experiment.

The standard CNN model failed to correctly classify any gear faults or parallel misalignment faults in the fully separated class configuration. It also struggled with identifying cracked-shaft faults and differentiating the types of bearing faults in the motor-end housing. These results suggest that the CNN faced difficulties in scaling signal sources to identify the exact location of the faults, particularly with the level of resolution needed to detect small faults, such as a minor wedge section missing from the shaft. The increased number of classes may have introduced excessive complexity, hindering the CNN’s ability to detect the subtle characteristics of gear faults, which it managed to identify more effectively in the grouped class configuration.

4.1.3. LU Dataset – Grouped Class Configuration

When all fault types for a given component at each location (where applicable) were consolidated under the same class label (i.e., all bearing fault types at the motor-end bearing housing share one class label), the DL implementations achieved similar accuracy on the validation set and the test set. Among the ML implementations, the RFC substantially outperformed the SVC implementations on the validation set but achieved similar classification accuracy on the test set. This could indicate that the RFC overfit to the training data (square wave speed profile) and that the variation in features between the training set and the test set, which derive from different speed profiles, affected its ability to predict the fault conditions accurately.
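The consolidation described above can be sketched in a few lines of Python. This is a minimal illustration based on the class lists in Table 2, not the authors' code: the tuple representation (component, fault, location), the function name, and the label strings are all assumptions made for the example.

```python
LOCATIONS = ("Motor End", "Pulley End")


def group_label(component, fault=None, location=None):
    """Collapse a fully separated class into its grouped class (after Table 2)."""
    if component in ("Healthy", "Gear"):
        return component                    # all gear faults share one label
    if component == "Bearing":
        return f"Bearing, {location}"       # the bearing fault type is dropped
    # Remaining classes are shaft faults; numbered variants share a family.
    if fault.endswith("Bent"):
        family = "Bent"
    elif fault.startswith(("Imbalance", "Misalignment")):
        family = fault.split()[0]
    else:
        family = fault                      # "Cracked" or "Hub Repair"
    if location is None:
        return f"Shaft, {family}"
    return f"Shaft, {family} with Bearing, {location}"


# Hypothetical encoding of the fully separated classes listed in Table 2.
FULL_CLASSES = [("Healthy", None, None)]
FULL_CLASSES += [("Bearing", f, loc)
                 for f in ("Inner Race", "Outer Race", "Ball", "Combination")
                 for loc in LOCATIONS]
FULL_CLASSES += [("Shaft", f, None)
                 for f in ("Cracked", "Centrally Bent", "Coupling End Bent", "Hub Repair")]
for f in ("Imbalance 1", "Imbalance 2", "Misalignment 1", "Misalignment 2"):
    FULL_CLASSES.append(("Shaft", f, None))
    FULL_CLASSES += [("Shaft", f, loc) for loc in LOCATIONS]
FULL_CLASSES += [("Gear", f, None) for f in ("Chipped Tooth", "Missing Tooth")]

GROUPED_CLASSES = sorted({group_label(*c) for c in FULL_CLASSES})
print(len(FULL_CLASSES), len(GROUPED_CLASSES))
```

Under this encoding the fully separated list collapses to the same number of grouped labels as the right-hand column of Table 2.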

Figure 8. Best Model Accuracies on the Validation Set (Square Wave Speed Profile) from the LU Dataset, Grouped Class Configuration.

Figure 9. Best Model Accuracies on the Test Set (Sawtooth Wave Speed Profile) from the LU Dataset, Grouped Class Configuration.

The confusion matrices indicated that the MSCNN implementation had difficulty distinguishing the gear faults from the healthy condition data. This was expected, given that the gearbox is separated from the accelerometer by the belt and pulley drive. Some of the motor-end bearing faults were classified as pulley-end bearing faults, and some of the misalignment configurations were mistaken for gear faults or imbalance faults. The same confusions were observed for the CNN implementation.

4.2. Model Computational Requirements

One of the motivations of this work was to consider the computational resources needed to train each model in the analysis and to run inference with it. This assessment provides further grounds for comparing the effectiveness of each method against the others. Assuming all trainable parameters are represented as 32-bit floating-point values, the number of parameters in each model was multiplied by 4 bytes and divided by 1024 to estimate the memory required to store the model parameters in kilobytes. While these values are only approximations of the actual memory allocation required to host each model and run inference, they provide a reference for understanding the size differences between models.
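As a quick check of this estimate, the short Python sketch below applies the 4-bytes-per-parameter assumption to parameter counts reported in Table 4 and Table 5. The function name is illustrative and is not taken from the authors' code.

```python
BYTES_PER_FLOAT32 = 4  # assumption stated in the text: 32-bit parameters


def param_memory_kb(n_params: int) -> float:
    """Approximate parameter storage in kB, taking 1 kB = 1024 bytes."""
    return n_params * BYTES_PER_FLOAT32 / 1024


# Parameter counts copied from Table 4 and Table 5.
print(round(param_memory_kb(16626), 1))   # CWRU default SVC   -> 64.9
print(round(param_memory_kb(213582), 1))  # CWRU random forest -> 834.3
print(round(param_memory_kb(80103), 1))   # CWRU CNN           -> 312.9
```

The printed values match the Mem Req entries for the corresponding rows of the tables.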

To approximate the number of parameters in the SVC models, the number of support vectors (data points in the training set with non-zero Lagrange multipliers) was multiplied by the number of selected features (number of weights), and the result was added to the number of intercepts (taken as the number of support vectors). For the RFC, the number of parameters was approximated as the total number of nodes across all trees. For the DL models, the number of parameters was computed by summing the number of elements across the model's parameters. Since all variations of the DL models contained a negligible number of buffers, the approximate memory requirement was computed from the number of parameters only, as described above. The number of floating-point operations (FLOPs) required for one inference was computed using a Python module and is reported in millions of FLOPs (MFLOPs) in Table 4 and Table 5. To further represent the computational needs of each model, the number of FLOPs per second (FLOPs/s) required to complete one inference within the time it takes to collect one input window at the sampling frequency was also calculated.
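The three counting rules above can be written out as a minimal Python sketch. The function names and the example inputs are hypothetical; they only illustrate the arithmetic described in the text and do not reproduce any row of Table 4 or Table 5.

```python
from math import prod


def svc_param_count(n_support_vectors: int, n_features: int) -> int:
    """Weights (support vectors x features) plus one intercept per support vector."""
    return n_support_vectors * n_features + n_support_vectors


def forest_param_count(nodes_per_tree) -> int:
    """Total number of nodes over all trees in the forest."""
    return sum(nodes_per_tree)


def dl_param_count(param_shapes) -> int:
    """Sum of the element counts of every parameter tensor (buffers ignored)."""
    return sum(prod(shape) for shape in param_shapes)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
print(svc_param_count(10, 3))                  # 10*3 + 10 = 40
print(forest_param_count([5, 7, 9]))           # 21
print(dl_param_count([(16, 1, 3), (16,)]))     # 48 + 16 = 64
```

If the models were built with scikit-learn and PyTorch (an assumption; the text does not name the libraries used for counting), the inputs would correspond to quantities such as the length of `support_vectors_`, each tree's `tree_.node_count`, and `numel()` of each parameter tensor.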

The columns in Table 4 and Table 5 are defined as follows:

N_SV: Number of support vectors;
N_f: Number of selected features;
Acc: Accuracy in percentage (on test set);
Params: Number of parameters in the model;
N_ITER: Number of optimization iterations of the SVC (reported as NA for the random forest classifier in Table 4);
#Train Epochs: Number of training set epochs required for the DL models;
#Train Samples: Number of samples in the training set of corresponding size;
N_MFLOP: Number of millions of FLOPs required for one inference of the model;
Mem Req: Approximate amount of memory required to host the model in kB;
N_MFLOP/s Real-Time: Approximate number of MFLOPs per second required to infer each model in a real-time fault detection implementation using the same sample rate (10 kHz) as the LU dataset.
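The real-time column can be reproduced from the per-inference cost and the time needed to collect one input window. The sketch below is a minimal illustration for LU rows at 10 kHz. The window lengths of 1000 and 4000 samples are read from the model configuration names and are an assumption about what those suffixes denote; the function name is hypothetical.

```python
def realtime_mflops_per_s(mflop_per_inference: float,
                          window_samples: int,
                          sample_rate_hz: float) -> float:
    """MFLOP/s needed to finish one inference while the next window is collected."""
    window_seconds = window_samples / sample_rate_hz
    return mflop_per_inference / window_seconds


LU_SAMPLE_RATE_HZ = 10_000  # sample rate of the LU dataset, as stated above

# N_MFLOP values copied from Table 5 and Table 4.
cnn = realtime_mflops_per_s(36.03, 1000, LU_SAMPLE_RATE_HZ)
svc = realtime_mflops_per_s(0.0562, 4000, LU_SAMPLE_RATE_HZ)
print(f"LU CNN-1000: {cnn:.1f} MFLOP/s")           # 360.3
print(f"LU selected SVC-4000: {svc:.2f} MFLOP/s")  # 0.14
```

A longer window gives the processor more time per inference, which is why the 4000-sample SVC needs a lower rate than its per-inference cost alone would suggest.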

4.2.1. Performance

The best performing model on all datasets was the Spatial-Channel Multi-Scale CNN (MSCNN), with an overall accuracy of 99.3 percent on the CWRU dataset, 85.0 percent on the test set of the Grouped class configuration of the LU dataset, and 82.2 percent on the test set of the Fully Separated class configuration of the LU dataset. The nearly 20 percent of LU samples that were misclassified related primarily to the gear faults. The CNN model performed similarly to the MSCNN on the Grouped class configuration of the LU dataset (within 1 percent). However, relative to the MSCNN, it misclassified roughly 13 percent more samples on the CWRU dataset and roughly 20 percent more on the Fully Separated class configuration of the LU dataset (Table 5).

Both SVC implementations performed similarly on both datasets, with overall accuracies near 72 percent on the CWRU dataset and 31 percent on the Grouped class configuration of the LU dataset. These results suggest that the SVM could be used in a simpler application, but anything more demanding than steady-state machinery, with an array of fault classes similar to that of the LU dataset, would not yield a favourable implementation. The random forest classifier, consisting of 500 trees, performed comparably to the deep learning models on the CWRU dataset and the LU validation set. However, when tested on the LU test set, which featured a different speed profile from the training set, it only slightly outperformed the SVC models, achieving an overall accuracy of 34.7 percent.

4.2.2. Model Size and Computational Demands

The random forest classifier was the largest model for each dataset. The smallest random forest classifier implementation required 28.4 percent more memory for parameters than the largest of the DL models. However, its approximated computational requirement in a theoretical real-time application, 0.64 MFLOPs/s, was significantly lower than that of the DL implementations. Despite their poorer classification accuracy, the SVM implementations had the lowest computational requirement in terms of FLOPs per inference. A computing device capable of 140,000 FLOPs per second would be needed to run the best performing SVM on the LU dataset in a real-time detection application. This is nearly 1,500 times less than the requirement of the best performing model, the spatial-channel multi-scale CNN. Additionally, the largest of the best performing SVCs on the CWRU dataset required roughly 64.9 kB of memory to host, whereas the best performing random forest classifier required 834.3 kB. The random forest classifier implementation scored well on the validation sets from the CWRU dataset and the LU dataset; however, when faced with the variation in speed profile between the LU training and test sets, it performed as poorly as the other ML methods (near 30 percent overall classification accuracy).
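The two headline ratios in this subsection follow directly from Table 4 and Table 5. The short Python check below is only a worked calculation; which table rows correspond to "smallest", "largest", and "best performing" is an interpretation of the text, noted in the comments.

```python
# Memory: smallest random forest versus largest DL model.
rfc_smallest_kb = 834.3  # CWRU random forest (Table 4)
dl_largest_kb = 649.7    # LU fully separated MSCNN (Table 5)
extra_memory_pct = (rfc_smallest_kb / dl_largest_kb - 1) * 100

# Compute: MSCNN versus the best performing SVC on the LU dataset.
mscnn_mflops_per_s = 205.4  # LU grouped MSCNN (Table 5)
svc_mflops_per_s = 0.14     # LU grouped selected SVC (Table 4)
flop_ratio = mscnn_mflops_per_s / svc_mflops_per_s

print(f"Random forest needs {extra_memory_pct:.1f} percent more memory")  # 28.4
print(f"MSCNN needs about {flop_ratio:.0f} times the FLOP rate")          # 1467
```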

4.2.3. Edge Computing Considerations

To help understand the computational requirements for inferring each model featured in this research, the specifications and costs of an Arduino Mega 2560 REV3, a Raspberry Pi 5, and a Jetson Nano are considered. It is important to note that these embedded devices may not meet the safety or weatherproofing standards necessary for many industrial environments. They were selected to illustrate potential capabilities rather than represent optimal use cases. An Arduino Mega would not be a practical option for data-driven embedded applications and is only listed to demonstrate the computational capabilities of a common, low-cost ECU.

The Raspberry Pi 5 and Jetson Nano appear capable of real-time inference for the models considered here, although training deep learning models on these devices would likely be slower and less practical than training on a workstation or GPU-enabled system. The compute headroom on these relatively low-cost devices leaves room for further model development and scaling, which could improve their ability to predict behaviour in machinery.

Table 6. Potential Edge-Computing Devices (Theoretical).

Specification	Arduino Mega 2560	Raspberry Pi 5	Jetson Nano
Price & Source	$66.40 CAD on store-usa.arduino.cc	$112.00 CAD on PiShop.ca	$299 CAD on amazon.ca
Memory	8 kB SRAM, 248 kB flash	4 GB or 8 GB	4 GB 64-bit LPDDR4, 1600 MHz, 25.6 GB/s
Controller/CPU	ATmega2560 – 8-bit AVR® RISC-based – 16 MHz	Broadcom BCM2712 2.4 GHz quad-core 64-bit Arm Cortex-A76 CPU	Nvidia Maxwell architecture with 128 NVIDIA CUDA® cores
Compute Performance	160 kFLOP/s (Assuming 1/100 FLOP per cycle)	9.6 GFLOP/s (Assuming 4 FLOP per cycle)	472 GFLOP/s (as claimed by Nvidia) [85]
Storage	-	128 GB MicroSD	16 GB eMMC 5.1
Additional Features	-	-	High-rate communication and GPIO pins
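A rough feasibility check against Table 6 can be sketched as follows. This is a simplified illustration, not a benchmark: it compares the theoretical peak compute and memory figures from Table 6 with the real-time MFLOP/s and Mem Req values from Table 4 and Table 5. Treating the Arduino's 8 kB SRAM as the memory budget for parameters, using the 4 GB variants of the other two devices, and ignoring activations, the operating system, and achievable (rather than peak) throughput are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_flops: float    # FLOP/s, theoretical values from Table 6
    memory_bytes: float  # working memory assumed available for parameters


@dataclass(frozen=True)
class Model:
    name: str
    mflops_per_s: float  # real-time requirement from Table 4 / Table 5
    mem_kb: float        # Mem Req from Table 4 / Table 5


def fits(model: Model, device: Device) -> bool:
    """True if both the compute and the memory requirement are within budget."""
    enough_compute = model.mflops_per_s * 1e6 <= device.peak_flops
    enough_memory = model.mem_kb * 1024 <= device.memory_bytes
    return enough_compute and enough_memory


DEVICES = [
    Device("Arduino Mega 2560", 160e3, 8 * 1024),  # SRAM only (assumption)
    Device("Raspberry Pi 5", 9.6e9, 4 * 1024**3),
    Device("Jetson Nano", 472e9, 4 * 1024**3),
]
MODELS = [
    Model("LU-Grouped selected SVC", 0.14, 88.1),
    Model("LU-Grouped MSCNN", 205.4, 432.9),
]

for model in MODELS:
    for device in DEVICES:
        verdict = "fits" if fits(model, device) else "does not fit"
        print(f"{model.name} on {device.name}: {verdict}")
```

Under these assumptions, the SVC's compute requirement is within the Arduino's theoretical budget, but its parameters exceed the SRAM, while both models fit within the theoretical limits of the Raspberry Pi 5 and the Jetson Nano.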

5. Conclusions

The results of this work demonstrate that, with further application-specific model tuning, the traditional techniques studied could be suitably applied to a system with complexity similar to that found in the CWRU dataset. However, they had significantly greater difficulty classifying the vibration signatures from the unseen speed profile in the LU dataset compared to the DL models. The novel DL technique (MSCNN) outperformed every other model on each dataset configuration. The CNN implementation performed similarly to the MSCNN on the grouped class configuration but could not distinguish one bearing fault type from another within the CWRU dataset and struggled with the greater number of classes in the fully separated class configuration of the LU dataset.

Computationally, the approximated memory and processing power required to run inference with the ML implementations (aside from the random forest classifier in terms of memory) were significantly less than would be needed to host the DL models. However, it is also demonstrated that the DL models are within reasonable bounds for processing on relatively low-cost, accessible embedded devices. The MSCNN requires more memory than the CNN but fewer floating-point operations per inference, and therefore fewer processing resources, making a case that the MSCNN is the better choice when both performance and computational demand are considered.

In conclusion, these results suggest that the further development of DL models for condition monitoring and fault detection applications has merit. However, further investment into experimental dataset generation should be considered to refine the selection criteria.

References

Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, vol. 20(no. 7), 1483–1510.
Tiboni, M.; Remino, C.; Bussola, C.; Amici, C. A Review on Vibration-Based Condition Monitoring of Rotating Machinery. Appl. Sci. 2022, vol. 12(no. 3), 972.
Wang, W.; Taylor, J.; Rees, R.J. Recent Advancement of Deep Learning Applications to Machine Condition Monitoring Part 2: Supplement Views and a Case Study. Acoust. Aust. 2021, vol. 49(no. 2), 221–228.
Wang, W.; Taylor, J.; Rees, R.J. Recent Advancement of Deep Learning Applications to Machine Condition Monitoring Part 1: A Critical Review. Acoust. Aust. 2021, vol. 49(no. 2), 207–219.
Wang, F.; Wang, K. Intelligent condition monitoring and diagnosis system: a computational intelligence approach. In Frontiers in artificial intelligence and applications; IOS Press: Amsterdam/Berlin, 2003; vol. 93.
Case Western Reserve University, Case Western Reserve University Bearing Data. Available online: https://engineering.case.edu/bearingdatacenter (accessed on 30 April 2024).
Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition Monitoring of Bearing Damage in Electromechanical Drive Systems by Using Motor Current Signals of Electric Motors: A Benchmark Data Set for Data-Driven Classification. PHM Soc. Eur. Conf. 2016, vol. 3(no. 1).
Bechhoefer, E. Condition Based Maintenance Fault Database for Testing of Diagnostic and Prognostics Algorithms. Available online: https://www.mfpt.org/fault-data-sets/.
Gangsar, P.; Tiwari, R. A support vector machine based fault diagnostics of Induction motors for practical situation of multi-sensor limited data case. Measurement 2019, vol. 135, 694–711.
Rojas, A.; Nandi, A.K. Detection and Classification of Rolling-Element Bearing Faults using Support Vector Machines. 2005 IEEE Workshop on Machine Learning for Signal Processing, Mystic, CT, USA; pp. 153–158.
Samanta, B.; Al-Balushi, K.R. Artificial neural network based fault diagnostics of rolling element bearings using time-domain features. Mech. Syst. Signal Process. 2003, vol. 17(no. 2), 317–328.
Jana, D.; Patil, J.; Herkal, S.; Nagarajaiah, S.; Duenas-Osorio, L. CNN and Convolutional Autoencoder (CAE) based real-time sensor fault detection, localization, and correction. Mech. Syst. Signal Process. 2022, vol. 169, 108723.
Arellano-Espitia, F.; Delgado-Prieto, M.; Martinez-Viol, V.; Saucedo-Dorantes, J.J.; Osornio-Rios, R.A. Deep-Learning-Based Methodology for Fault Diagnosis in Electromechanical Systems. Sensors 2020, vol. 20(no. 14), 3949.
Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst. 2022, vol. 33(no. 12), 6999–7019.
Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.G. Deep Learning Algorithms for Bearing Fault Diagnostics—A Comprehensive Review. IEEE Access 2020, vol. 8, 29857–29881.
Chen, C.C.; Liu, Z.; Yang, G.; Wu, C.C.; Ye, Q. An Improved Fault Diagnosis Using 1D-Convolutional Neural Network Model. Electronics 2020, vol. 10(no. 1), 59.
Huang, T.; Fu, S.; Feng, H.; Kuang, J. Bearing Fault Diagnosis Based on Shallow Multi-Scale Convolutional Neural Network with Attention. Energies 2019, vol. 12(no. 20), 3937.
Kim, Y.; Na, K.; Youn, B.D. A health-adaptive time-scale representation (HTSR) embedded convolutional neural network for gearbox fault diagnostics. Mech. Syst. Signal Process. 2022, vol. 167, 108575.
He, J.; Wu, P.; Tong, Y.; Zhang, X.; Lei, M.; Gao, J. Bearing Fault Diagnosis via Improved One-Dimensional Multi-Scale Dilated CNN. Sensors 2021, vol. 21(no. 21), 7319.
SpectraQuest Machinery Fault Simulator. Available online: https://spectraquest.com/machinery-fault-simulator/details/mfs/.
Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. Proc. European Conference on Computer Vision (ECCV), 2018; pp. 3–19.
Pedregosa, F. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, vol. 12, 2825–2830.
Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015.

Figure 1. Sample of the Square Wave Speed Profile.

Figure 2. Sample of the Sawtooth Wave Speed Profile.

Figure 3. SVM Model and Feature Selection Scheme.

Figure 4. Multi-Scale Convolutional Neural Network Architecture.

Figure 5. Best Model Accuracies on the Validation Set from the CWRU Dataset.

Figure 6. Best Model Accuracies on the Validation Set (Square Wave Speed Profile) from the LU Dataset, Fully Separated Class Configuration.

Figure 7. Best Model Accuracies on the Test Set (Sawtooth Wave Speed Profile) from the LU Dataset, Fully Separated Class Configuration.

Table 1. List of Experiment Files.

Fault Type	Location	Control Signal	Collection Time (s)	Exp. No.
Healthy 1	NA	Sawtooth	1200	0
Healthy 1	NA	Square	1200	1
Bearing – Inner Race	Motor End	Sawtooth	1200	2
Bearing – Inner Race	Motor End	Square	1200	3
Bearing – Inner Race	Pulley End	Sawtooth	1200	4
Bearing – Inner Race	Pulley End	Square	1200	5
Bearing – Outer Race	Motor End	Sawtooth	1200	6
Bearing – Outer Race	Motor End	Square	1200	7
Bearing – Outer Race	Pulley End	Sawtooth	1200	8
Bearing – Outer Race	Pulley End	Square	1200	9
Bearing – Ball	Motor End	Sawtooth	1200	10
Bearing – Ball	Motor End	Square	1200	11
Bearing – Ball	Pulley End	Sawtooth	1200	12
Bearing – Ball	Pulley End	Square	1200	13
Bearing – Combination	Motor End	Sawtooth	1200	14
Bearing – Combination	Motor End	Square	1200	15
Bearing – Combination	Pulley End	Sawtooth	1200	16
Bearing – Combination	Pulley End	Square	1200	17
Shaft – Cracked	NA	Sawtooth	1200	18
Shaft – Cracked	NA	Square	1200	19
Shaft – Centrally Bent	NA	Sawtooth	1200	20
Shaft – Centrally Bent	NA	Square	1200	21
Shaft – Coupling End Bent	NA	Sawtooth	1200	22
Shaft – Coupling End Bent	NA	Square	1200	23
Shaft – Hub Repair	NA	Sawtooth	1200	24
Shaft – Hub Repair	NA	Square	1200	25
Shaft Imbalance 1	NA	Sawtooth	1200	26
Shaft Imbalance 1	NA	Square	1200	27
Shaft Imbalance 1 + Bearing Inner Race	Motor End	Sawtooth	1200	28
Shaft Imbalance 1 + Bearing Inner Race	Motor End	Square	1200	29
Shaft Imbalance 1 + Bearing Inner Race	Pulley End	Sawtooth	1200	30
Shaft Imbalance 1 + Bearing Inner Race	Pulley End	Square	1200	31
Shaft Imbalance 2	NA	Sawtooth	1200	32
Shaft Imbalance 2	NA	Square	1200	33
Shaft Imbalance 2 + Bearing Outer Race	Motor End	Sawtooth	1200	34
Shaft Imbalance 2 + Bearing Outer Race	Motor End	Square	1200	35
Shaft Imbalance 2 + Bearing Outer Race	Pulley End	Sawtooth	1200	36
Shaft Imbalance 2 + Bearing Outer Race	Pulley End	Square	1200	37
Shaft Misalignment 1 – Parallel	NA	Sawtooth	1200	38
Shaft Misalignment 1 – Parallel	NA	Square	1200	39
Shaft Misalign 1 – Parallel + Bearing Inner Race	Motor End	Sawtooth	1200	40
Shaft Misalign 1 – Parallel + Bearing Inner Race	Motor End	Square	1200	41
Shaft Misalign 1 – Parallel + Bearing Inner Race	Pulley End	Sawtooth	1200	42
Shaft Misalign 1 – Parallel + Bearing Inner Race	Pulley End	Square	1200	43
Shaft Misalignment 2 – Angular	NA	Sawtooth	1200	44
Shaft Misalignment 2 – Angular	NA	Square	1200	45
Shaft Misalign 2 – Angular + Bearing Outer Race	Motor End	Sawtooth	1200	46
Shaft Misalign 2 – Angular + Bearing Outer Race	Motor End	Square	1200	47
Shaft Misalign 2 – Angular + Bearing Outer Race	Pulley End	Sawtooth	1200	48
Shaft Misalign 2 – Angular + Bearing Outer Race	Pulley End	Square	1200	49
Gear – Chipped Tooth	NA	Sawtooth	1200	50
Gear – Chipped Tooth	NA	Square	1200	51
Gear – Missing Tooth	NA	Sawtooth	1200	52
Gear – Missing Tooth	NA	Square	1200	53
Healthy 2	NA	Sawtooth	1200	54
Healthy 2	NA	Square	1200	55

Table 2. LU Dataset Class Configurations.

LU Full Class Separation	LU Grouped Classes
Healthy	Healthy
Bearing – Inner Race – Motor End	Bearing – Motor End
Bearing – Inner Race – Pulley End	Bearing – Pulley End
Bearing – Outer Race – Motor End	Shaft – Cracked
Bearing – Outer Race – Pulley End	Shaft – Bent
Bearing – Ball – Motor End	Shaft – Hub Repair
Bearing – Ball – Pulley End	Shaft – Imbalance
Bearing – Combination – Motor End	Shaft – Imbalance with Bearing – Motor End
Bearing – Combination – Pulley End	Shaft – Imbalance with Bearing – Pulley End
Shaft – Cracked	Shaft – Misalignment
Shaft – Centrally Bent	Shaft – Misalignment with Bearing – Motor End
Shaft – Coupling End Bent	Shaft – Misalignment with Bearing – Pulley End
Shaft – Hub Repair	Gear
Shaft – Imbalance 1
Shaft – Imbalance 1 with Bearing – Inner Race – Motor End
Shaft – Imbalance 1 with Bearing – Inner Race – Pulley End
Shaft – Imbalance 2
Shaft – Imbalance 2 with Bearing – Outer Race – Motor End
Shaft – Imbalance 2 with Bearing – Outer Race – Pulley End
Shaft – Misalignment 1 – Parallel
Shaft – Misalignment 1 – Parallel with Bearing – Inner Race – Motor End
Shaft – Misalignment 1 – Parallel with Bearing – Inner Race – Pulley End
Shaft – Misalignment 2 – Angular
Shaft – Misalignment 2 – Angular with Bearing – Outer Race – Motor End
Shaft – Misalignment 2 – Angular with Bearing – Outer Race – Pulley End
Gear – Chipped Tooth
Gear – Missing Tooth

Table 3. CWRU Dataset Class Configurations.

Class Label	Fault Type	Fault Location
Normal Data	NA	NA
Inner Race, Drive End	Inner Raceway Fault	Drive End of the Motor
Inner Race, Fan End	Inner Raceway Fault	Fan End of the Motor
Outer Race, Drive End	Outer Raceway Fault	Drive End of the Motor
Outer Race, Fan End	Outer Raceway Fault	Fan End of the Motor
Ball, Drive End	Rolling Element Fault	Drive End of the Motor
Ball, Fan End	Rolling Element Fault	Fan End of the Motor

Table 4. Computational Requirements of the Selected Machine Learning Models.

Model Configuration	N_SV	N_f	Acc	Params	N_ITER	# Train Samples	N_MFLOP	Mem Req	N_MFLOP/s Real-Time
CWRU-Default SVC-4000 Raw	1845	2888	73.0	16626	5652	2888	0.0387	64.9	0.12
CWRU-Random Forest-4000 Raw	NA	2888	91.6	213582	NA	2888	0.2144	834.3	0.64
CWRU-Selected SVC-4000 Raw	1762	2888	71.8	15879	27391	2888	0.0370	62.0	0.11
LU-Grouped-Default SVC-1000 Raw	11684	67200	30.0	81866	53338	67200	0.2220	319.8	2.22
LU-Grouped-Random Forest-4000 Raw	NA	16800	34.7	600602	NA	16800	0.5985	2346.1	1.50
LU-Grouped-Selected SVC-4000 Raw	2808	16800	32.0	22542	134847	16800	0.0562	88.1	0.14

Table 5. Computational Requirements of the Selected Deep Learning Models.

Model Configuration	Acc	Params	#Train Epochs	#Train Samples	N_MFLOP	Mem Req	N_MFLOP/s Real-Time
CWRU-CNN-1000 Standardized	85.9	80103	21	11572	36.03	312.9	432.36
CWRU-SpatialChannelMSCNN-1000 Standardized	99.3	87406	18	11572	20.52	341.4	246.24
LU-Full-CNN-1000 Standardized	61.8	84443	17	720000	36.03	329.9	360.3
LU-Full-SpatialChannelMSCNN-1000 Raw	82.2	166326	18	720000	20.6	649.7	206
LU-Grouped-CNN-1000 Standardized	84.3	81405	28	336000	36.03	318.0	360.3
LU-Grouped-SpatialChannelMSCNN-1000 Standardized	85.0	110830	23	336000	20.54	432.9	205.4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.