Preprint
Article

This version is not peer-reviewed.

Non-Stationary Airborne Acoustic Emission Analysis for CNC Drill Wear Classification Using Synchrosqueezed Wavelet Representation and Vision Transformer

Submitted:

02 May 2026

Posted:

04 May 2026


Abstract
Drill bits are among the most difficult components to maintain in CNC systems because of their complex geometries and the gradual nature of their wear. Drill bit wear is an unavoidable part of the machining process, and its progression directly affects hole accuracy, the smoothness of the machined surfaces, and overall machining efficiency. This paper addresses these challenges by implementing a classification framework for tool wear in CNC drill bits that combines the Synchrosqueezed Wavelet Transform (SSWT) with a Vision Transformer (ViT). During controlled drilling experiments, Acoustic Emission (AE) signals were captured for four tool conditions: Healthy Tool (HT), Low Wear (LW), Medium Wear (MW), and Severe Wear (SW). Wear was induced artificially by Electrochemical Machining (ECM) on drill bits of 3.0 mm, 3.2 mm, 3.4 mm, 3.6 mm, and 3.8 mm diameter. A National Instruments (NI) data acquisition system with LabVIEW software was used to record high-resolution AE data, and the SSWT method was applied to generate sharp time-frequency representations tailored to drill bit wear assessment. The resulting SSWT time-frequency maps were used as input to a Vision Transformer, which efficiently captures global relationships in the time-frequency domain. Unlike traditional convolution-based methods, the proposed transformer-based framework performs automated multi-domain fusion and feature learning. Under 10-fold cross-validation, the proposed SSWT-ViT framework demonstrated reliable generalisation, strong robustness, and high classification accuracy across the wear states. The proposed method is therefore suitable for intelligent real-time monitoring of CNC drill bit condition in industrial settings.

1. Introduction

In recent years, Computerised Numerical Control (CNC) machining has emerged as a foundational manufacturing technology for mass production, owing to the operational efficiency and high repeatability in yield that it offers. CNC drilling in particular is one of the most critical processes, being used in the aerospace, automotive, electronics, and medical sectors [1]. Drill bit wear plays a crucial role in determining process reliability, productivity, and machining accuracy. As drill bit wear progresses, the degraded mechanical interaction between the cutting tool and the workpiece can result in deterioration of surface integrity, increased cutting forces, severe tool failure, and loss of product quality, together with increased process costs.
Deciding what to monitor, when to monitor, and how much to monitor is essential for ensuring business consistency, product stability, and profitability. Traditional approaches to cutting tool wear assessment, such as wear estimation models, direct wear monitoring, and surface roughness measurement, have several drawbacks: they can be disruptive, costly, and time-consuming, and may require tool changes. Moreover, wear evaluation is often performed only after machining is complete, which prevents timely corrective action and undermines process efficiency and productivity. Consequently, there is growing interest in process monitoring systems that utilise artificial intelligence. AI-based process monitoring has the potential to reduce disruptions, increase reliability, and improve the financial management of machining operations [2,3].
The application of artificial intelligence has also advanced process monitoring and control systems in the field; accordingly, data acquisition, computer-based monitoring, and control are essential skills for modern condition monitoring engineers [4].

1.1. Real-Time Monitoring of Tool Wear with Acoustic Emission Signals

Industrial manufacturing suffers from tool wear issues, given their direct relation to the quality of the end product, the performance of the machines in use, and operational efficiency; rapid tool wear is a particular concern. Drill bits are among the hardest components to monitor owing to their complex geometries and gradual wear, and their condition affects hole accuracy, surface finish, and overall machining efficiency. Real-time monitoring of tool wear has become practical thanks to Acoustic Emission (AE) sensing technology. AE signals are informative in condition monitoring, as they capture occurrences of micro-cracking, sliding, and plastic deformation at the tool-workpiece interface. Tool monitoring using AE is, however, challenging, given the nonlinear, non-stationary, and noisy nature of the signals produced, and feature extraction from AE data has drawn particular attention in recent years. Traditional approaches assess tool wear with simple time-domain statistical metrics, such as the root-mean-square (RMS) value, kurtosis, skewness, and entropy, which provide informative insights into wear progression [5]. In the frequency domain, Fourier analysis can reveal the dominant frequency components corresponding to different levels of wear [6].
In the time-frequency domain, wavelet-based methods such as Wavelet Packet Decomposition (WPD) facilitate multi-resolution analysis of transients.
To improve feature representation while overcoming the limitations of traditional handcrafted features, this study utilises the Synchrosqueezed Wavelet Transform (SSWT), which provides high-resolution time-frequency maps for the analysis of AE signals. SSWT captures the subtle patterns associated with tool wear while suppressing noise. Vision Transformers (ViT), unlike traditional classifiers that depend on manually fused features, learn complex hierarchical features autonomously and globally across the time-frequency plane, with no handcrafted feature fusion.
An intelligent tool-wear classification framework for CNC drill bits is proposed in this paper, which utilises AE signals processed with SSWT and analysed with a Vision Transformer. The framework was tested and validated under four tool conditions: Healthy Tool (HT), Low Wear (LW), Medium Wear (MW), and Severe Wear (SW). The method is found to achieve high classification accuracy, low computational cost, and robustness, making it ideal for real-time industrial monitoring of CNC tool condition.
The remainder of the study is structured as follows: Section 2 discusses existing research on AE-based tool wear monitoring, as well as more advanced feature extraction techniques. Section 3 describes the research methodology, including the specifics of data collection, SSWT feature extraction, and the classification framework. Section 4 discusses the design of the experiments and the creation of the dataset, while Section 5 presents and analyses the results. Section 6 concludes the study and outlines potential areas for future research.

1.2. Research Gap in the Existing Work

  • Collecting multi-sensor data for traditional feature fusion is cumbersome and unfeasible. SSWT, on the other hand, can obtain high-resolution time-frequency characteristics and, therefore, reduces the complexity of data acquisition.
  • Previous studies have often relied on using one feature and representation; however, SSWT produces rich time-frequency maps spanning complex wear patterns.
  • Very few studies consider the temporal and spectral dimensions intra-signal; Vision Transformer is capable of learning global patterns within the SSWT maps automatically, bypassing the need for manual feature fusion.
  • Instance-based lazy classifiers have not been used for CNC drill bit wear, and Vision Transformers offer an end-to-end learning framework that surpasses traditional classifiers.
  • One of the defining characteristics of deep learning methodologies is the need for large volumes of data; however, the combination of SSWT and ViT mitigates this problem and leads to effective learning, even with moderately sized datasets, as a result of the efficient representation learning.
  • While most ML/DL methods have difficulties in real-time industrial applications, the SSWT + ViT framework provides a computationally efficient and robust tool wear classification in real-time.

1.3. Novelty in the Proposed Methodology

The main contribution of this paper is to simplify the system design by using a single Acoustic Emission (AE) sensor to capture the entire spectrum of tool wear using intra-signal feature representation. Rather than manually combining features from the time, frequency, and time-frequency domains, the Synchrosqueezed Wavelet Transform (SSWT) provides high-resolution time-frequency maps that contain valuable information about different wear states of a given CNC drill bit. As a single-tool wear-state representation, these maps enhance the distinctiveness of the AE signal, eliminating the need for multiple sensors.
The second novelty is the application of a Vision Transformer (ViT) for end-to-end wear classification. The ViT processes SSWT representations without hand-crafted multi-domain feature fusion and without relying on lazy learners, making it more efficient. This offers a cheaper, more efficient, and deeper solution than traditional deep learning pipelines, making it well suited to the demands of real-time industrial tool condition monitoring.

2.0. Literature Review

Monitoring tool wear is crucial for improving the efficiency of both the product and the process, which is why it has become a hotspot for manufacturing development. The wear detection field has traditionally relied upon direct measurement via optical and scanning electron microscopes, as well as other methods such as profilometry. Unfortunately, such intrusive techniques cannot monitor processes in real time and are time-consuming. As a result, indirect methods, such as tool vibration monitoring, measurement of cutting forces, and tracking of temperature and Acoustic Emission (AE), have become the norm in tool condition monitoring [7].

2.1. Acoustic Emission in Tool Wear Monitoring

A material's deformation and crack propagation release localised energy, which is emitted as AE signals. In machining, AE monitoring captures events associated with the tool and workpiece, such as chip generation, friction, and wear particle removal. As noted by Maia et al. (2024), AE features strongly indicate tool wear, providing a non-invasive diagnostic option [8].
Later studies emphasised the extraction of relevant features from AE signals. Statistical metrics such as RMS, standard deviation, and kurtosis were commonly computed in the time domain for ease and interpretability, while FFT-based spectral features in the frequency domain are associated with different wear stages. AE signals, however, are non-stationary; thus, time-frequency analysis methods such as the Wavelet Transform (WT) and Wavelet Packet Decomposition (WPD) were favoured. Liu and Wang showed that WPD could better isolate wear-related components in AE signals than the FFT, showcasing its effectiveness in sub-band decomposition [9].

2.2. Synchrosqueezed Wavelet Representation for Feature Extraction

Prior research has shown that the complex, nonstationary nature of machining signals is more effectively represented when information from multiple feature domains is integrated.
The method proposed by Daubechies et al. captures the flavour and philosophy of the EMD approach, although the component construction is done differently. The method is a hybrid of wavelet analysis and the reallocation method. The authors formulated an unambiguous mathematical definition for a class of functions that can be described as a superposition of a relatively small number of approximately harmonic components [10].
Further research confirmed that SSWT methods can enhance diagnosis. Guo et al. used the generalised horizontal synchrosqueezing transform to obtain a time-frequency distribution (TFD) that captures the impulse features of the tool's non-stationary vibration signal; the TFD then enables a two-dimensional (2D) Fourier transform to identify the periodic pulses [11]. Subsequently, a periodic frequency point energy proportion factor was proposed to assess varying degrees of tool wear.
Similarly, Peng et al. explored a maximum probability multi-synchrosqueezing transform (MPMSST), which incorporates the distribution probability during discretisation of the Instantaneous Frequency (IF), taking the frequency corresponding to the maximum distribution probability as the determined IF value. The singular value decomposition (SVD) method is used to extract the principal component of the TFR, and a tool wear state feature index is constructed from the sum of squares of the first 10 singular values [12]. Shi et al. proposed the Synchrosqueezed FRWT (SSFRWT), which shares many properties of its SSWT counterpart while offering attractive new features; the authors present a theoretical analysis of the SSFRWT, derive its basic properties, show that its discrete form admits efficient numerical implementation akin to that of the SSWT, and validate the theoretical derivations via simulations [13].
Current studies have focused on refined time-frequency analysis and deep learning-based approaches. Abdeltawab et al. used a hybrid CNN-BiLSTM to improve the accuracy of milling wear predictions. Despite these advances, the authors noted that traditional wavelet feature extraction and CNN architectures cannot capture the high-resolution time-frequency patterns and long-range dependencies present in AE signals [14]. Because SSWT offers improved resolution and can resolve more wear-related events, several researchers have demonstrated that SSWT produces scalograms compatible with the Vision Transformer (ViT) architecture, which employs self-attention to model global context and capture long-range dependencies. The SSWT-ViT model simplifies automated tool wear classification by removing the need for manual feature fusion.

2.3. Machine Learning and Tool Wear Classification

Tool wear has been classified using various machine learning classifiers. Support Vector Machines (SVMs) are among the most popular because of their generalisation capability and competence with high-dimensional data [15]. Artificial Neural Networks (ANNs) and deeper learning architectures are also gaining popularity for feature learning because of their accuracy; however, their demand for large datasets and computation is a downside [16].
Classifiers such as Random Forests (RF) and Gradient Boosting Machines have also proven efficient, providing interpretability through feature importance ranking [17]. A common drawback of these classifiers is the need for offline training, making them inflexible to new or evolving data.

2.4. Vision Transformer Based Fault Diagnosis

Dong et al. proposed an adaptive vision transformer (ViT) incorporating a Kolmogorov-Arnold network (KAN) based on a Markov transition field (MTF), termed MTF-AViTK. The network performs parallel processing and addresses long-term dependencies, better identifying tool wear states from cutting signals. A signal is converted into a 2D image via 2D time-series encoding; the ViT performs token patch position encoding on the 2D images and self-attention-based tool wear feature extraction in its encoder blocks. An adaptive multilayer perceptron (AdaptMLP) module is introduced to fine-tune the feature extraction process, and the extracted features are mapped by the KAN to identify the tool wear state [18].
Qiu et al. proposed a tool wear prediction method based on an LSTM-Transformer model, which combines the advantages of the long short-term memory (LSTM) network and the Transformer. First, the LSTM models the tool wear process sequence and captures long-term dependencies in the time series; the Transformer is then introduced to capture global dependencies in the sequence and improve the parallel computing ability of the model. Experimental results show that the model achieves excellent performance on the tool wear prediction task [19]. The synthesis of power spectral density (PSD), convolutional neural networks (CNN), and vision transformers (ViT) proposed by Si et al. is referred to as PSD-CVT. In this design, the CNN targets local feature extraction, capturing characteristic local features such as texture and image edges, while the attention mechanism in the ViT captures the global structure and long-range dependencies of the image. Tool wear prediction employs two fully connected layers with ReLU activations [20].
The methodology proposed by Li et al. transforms the cutting force signal acquired during milling into a time-frequency image via the continuous wavelet transform, then introduces a Contextual Transformer module after layer 1 and a Global Attention Mechanism module after layer 2 of the MobileViT network [21]. However, the motivation behind this study is the absence, in the existing literature, of work combining SSWT and ViT for CNC drill bit wear classification.

3.0. Methodology

For tool wear classification of CNC drill bits, the proposed framework involves the acquisition of Acoustic Emission (AE) signals, processing the signals with Synchrosqueezed Wavelet Transform (SSWT) for time-frequency feature extraction, and classifying the features with a Vision Transformer (ViT) in an end-to-end manner. This subsection delves into the constituent parts of the framework and the methodology used.
It also details the sensors used, signal conditioning, data acquisition systems, software tools, and AE signal collection. The framework incorporates SSWT to construct detailed time–frequency images of varying tool wear states, which are processed by the ViT to obtain automated hierarchical feature representations and classification, thereby removing the need for additional manual feature fusion or sensor fusion.

3.1. Dataset Collection

AE signals were generated during the CNC drilling trials, which covered the four tool wear states: Healthy Tool (HT), Low Wear (LW), Medium Wear (MW), and Severe Wear (SW). A wideband, high-sensitivity piezoelectric acoustic emission (AE) sensor was attached to the side of the CNC machine spindle to capture elastic wave emissions during drilling.
AE signals were captured at a sampling rate of 2 MHz and a resolution of 16 bits to record fast transient phenomena arising from interactions between the tool and the workpiece. Multiple drilling runs were performed on a standard workpiece (e.g., AISI 1045 steel) for each wear condition, while keeping spindle speed, feed rate, and depth of cut constant to minimise noise from external factors.
The AE signals were split into non-overlapping 100 ms segments and labelled according to tool wear state. Balancing the dataset in this way improved the performance of the supervised learning models.
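To make the segmentation step concrete, a minimal NumPy sketch is given below. It is illustrative only: `segment_signal` is a hypothetical helper (not the authors' code), while the 2 MHz sampling rate and 100 ms frame length follow the text.

```python
import numpy as np

def segment_signal(x, fs, frame_ms=100):
    """Split a 1-D AE signal into non-overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are discarded."""
    frame_len = int(fs * frame_ms / 1000)   # samples per 100 ms frame
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)

# Example: 1 s of signal at 2 MHz -> ten 100 ms segments of 200,000 samples
fs = 2_000_000
x = np.random.randn(fs)                     # placeholder for one AE recording
segments = segment_signal(x, fs)
print(segments.shape)                       # (10, 200000)
```

Each row of `segments` then receives the wear-state label of the recording it came from.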

3.2. Feature Extraction

To lower the complexity of classification, feature extraction is critical. Features of the AE signals acquired during drill bit condition monitoring span the time, frequency, and time-frequency domains, and this approach exploits the strengths of each representation.
Justification for using SSWT and Processing AE Signals
The Synchrosqueezed Wavelet Transform (SSWT) was used to analyse the AE signals because it handles the non-stationary and transient characteristics of CNC drilling operations particularly well. SSWT sharpens time-frequency analysis, providing precise localisation of energy across varying tool-workpiece interactions. It can therefore capture subtle variations in a single tool across different wear states, giving a more detailed and differentiated representation of the AE signal characteristics than standard wavelet decomposition methods.

3.3. Pre-AE Signal Processing

In this work, AE signals were preprocessed to increase signal reliability and consistency and to suppress noise before feature extraction. Signal acquisition was performed using NI hardware and LabVIEW software, which provide real-time signal amplification, conditioning, and noise reduction. Preprocessing included:
  • Band-pass filtering (100 kHz to 1 MHz) to remove background and mechanical noise.
  • Amplitude normalisation to bring signals from different experiments onto a standard scale.
  • Segmentation into fixed-length frames for uniform temporal resolution.
  • Outlier removal and envelope detection to minimise transient spikes and spurious bursts caused by the initial contact of tool and workpiece.
After this processing, the AE signals are ready to produce high-quality SSWT time-frequency maps for the Vision Transformer.
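The filtering and normalisation steps can be sketched as follows. This is an illustrative SciPy/NumPy sketch under stated assumptions, not the authors' pipeline: `preprocess_ae` is a hypothetical helper, the 100 kHz to 1 MHz band follows the text, and the upper band edge is clamped just below the Nyquist frequency of a 2 MHz recording.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_ae(x, fs, band=(100e3, 1e6), order=4):
    """Band-pass filter an AE segment and rescale its amplitude to [-1, 1].

    A zero-phase Butterworth filter (sosfiltfilt) is used so transients are
    not shifted in time; the 100 kHz-1 MHz band follows the text."""
    nyq = fs / 2
    hi = min(band[1], 0.99 * nyq)           # keep the upper edge below Nyquist
    sos = butter(order, [band[0] / nyq, hi / nyq], btype="bandpass", output="sos")
    y = sosfiltfilt(sos, x)
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y
```

Outlier removal and envelope detection would follow this step; they are omitted here because the text does not specify the exact procedure.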

3.4. Feature Representation Using SSWT

Traditional methods keep time, frequency, and time-frequency features separate; with SSWT they are combined in a single representation. SSWT transforms a time series into a time-frequency representation that captures the temporal position, frequency content, and amplitude (energy) of the signal over its entire span. Such maps are characterised by the following:
  • Varying energy levels associated with micro-cracking, friction, and progressive wear.
  • Dominant resonance frequencies tied to different wear mechanisms.
  • A temporal dimension of wear phenomena that helps the Vision Transformer capture and learn hierarchical global dependencies.
The SSWT transforms each AE signal segment into a 2-D time-frequency map, avoiding manual computation of features such as RMS, peak value, skewness, kurtosis, spectral centroid, dominant frequency, and sub-band energy. The Vision Transformer extracts and fuses relevant features from these maps, improving classification accuracy and robustness.
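A minimal, self-contained sketch of the SSWT computation is shown below. It is illustrative only (the study's implementation details are not given): an analytic Morlet CWT is computed via the FFT, the instantaneous frequency is estimated from the phase derivative, and wavelet energy is reassigned onto a linear frequency grid, following the formulation in Section 3.8. All parameter values are assumptions.

```python
import numpy as np

def sswt_map(x, fs, n_voices=32, n_octaves=6, n_bins=128):
    """Minimal synchrosqueezed wavelet transform of a real 1-D signal.

    Returns (Tx, freqs): reassigned energy over (frequency bin, time)."""
    n = len(x)
    X = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)    # angular freq, rad/s

    w0 = 6.0                                             # Morlet centre frequency
    a_min = w0 / (np.pi * fs)                            # scale near Nyquist
    scales = a_min * 2.0 ** (np.arange(n_voices * n_octaves) / n_voices)

    Wx = np.empty((len(scales), n), dtype=complex)
    dWx = np.empty_like(Wx)
    for i, a in enumerate(scales):
        # Analytic Morlet wavelet in the frequency domain (positive freqs only)
        psi_hat = np.exp(-0.5 * (a * omega - w0) ** 2) * (omega > 0)
        Wx[i] = np.fft.ifft(X * psi_hat)
        dWx[i] = np.fft.ifft(X * psi_hat * 1j * omega)   # d/db via freq domain

    # Instantaneous frequency estimate from the phase derivative (Hz)
    with np.errstate(divide="ignore", invalid="ignore"):
        inst_freq = np.imag(dWx / Wx) / (2 * np.pi)

    # Reassign CWT energy onto a linear frequency grid
    freqs = np.linspace(0.0, fs / 2, n_bins)
    df = freqs[1] - freqs[0]
    Tx = np.zeros((n_bins, n))
    valid = np.isfinite(inst_freq) & (inst_freq >= 0) & (inst_freq <= fs / 2)
    rows, cols = np.nonzero(valid)
    bins = np.round(inst_freq[rows, cols] / df).astype(int).clip(0, n_bins - 1)
    np.add.at(Tx, (bins, cols), np.abs(Wx[rows, cols]) ** 2)
    return Tx, freqs
```

For a pure tone the reassigned energy concentrates in a single frequency row, which is the sharpening effect the text attributes to synchrosqueezing.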

3.5. Improvements over the Traditional WPD-Based Feature Extraction

  • Sharper energy concentration, giving finer discrimination of tool wear states.
  • No need to merge features across the time, frequency, and time-frequency domains, since a unified feature map is available.
  • Improved noise robustness: SSWT diminishes the impact of AE signal spikes and background noise.
  • Feasibility of end-to-end learning: feeding SSWT maps to a Vision Transformer requires no feature engineering and captures global and local patterns at all scales.
The combination of SSWT and Vision Transformers thus provides a detailed and efficient classifier for CNC drill bit wear. This approach presents an alternative to WPD and hand-crafted feature extraction, providing a robust, scalable, intelligent tool condition monitoring solution.

3.6. SSWT Maps Normalisation to Fit the Vision Transformer

Within the proposed architecture, each AE signal segment is mapped to a 2D SSWT time-frequency map. For each SSWT map, the time-frequency values are compressed to the range [0, 1] during training to facilitate consistent scaling and stability with the Vision Transformer. This is done using min-max normalisation:
$X_{\text{norm}} = \dfrac{X - X_{\min}}{X_{\max} - X_{\min}}$
where $X$ represents the SSWT coefficients, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values in the SSWT matrix.
The time-frequency representations produced by SSWT are standardised so that all maps share the same intensity range, improving training stability, model performance, and convergence speed.
The normalised SSWT maps are then fed to the Vision Transformer. This design allows the Vision Transformer to automate feature extraction and classification without manual feature concatenation or multi-domain fusion.
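Min-max normalisation of a map is a one-liner; the sketch below mirrors the min-max normalisation described above and adds a guard for constant maps (an illustrative helper, not the authors' code).

```python
import numpy as np

def minmax_normalise(sswt_map):
    """Scale an SSWT energy map to [0, 1] via min-max normalisation."""
    x_min, x_max = sswt_map.min(), sswt_map.max()
    if x_max == x_min:                  # constant map: avoid division by zero
        return np.zeros_like(sswt_map, dtype=float)
    return (sswt_map - x_min) / (x_max - x_min)
```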

3.7. Classification Using Vision Transformer

The classification task entails mapping each SSWT time-frequency map from a segment of an AE signal to one of four tool wear states: Healthy Tool (HT), Low Wear (LW), Medium Wear (MW), and Severe Wear (SW). Unlike conventional methods, in which feature vectors are handcrafted and passed to a separately designed classifier, this method utilises a Vision Transformer (ViT) for hierarchical feature extraction and classification.
The ViT splits each input SSWT map into non-overlapping patches of a pre-defined size, then flattens and projects the patches into an embedding space. The resulting embeddings are processed by multi-head self-attention layers, allowing the model to capture local and global patterns without handcrafted feature fusion: local patterns correspond to short-term variations in the AE signals, while global patterns correspond to long-range temporal and spectral correlations. To predict the wear state, a classification token is prepended to the sequence of patch embeddings, and its representation at the output of the Transformer layers is used for classification.
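The patch-embedding step described above can be sketched in NumPy as follows. This is illustrative only; the patch size, embedding width, and all weights are arbitrary placeholders, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(img, patch, w_proj, cls_token, pos_emb):
    """ViT-style tokenisation: split the map into non-overlapping P x P
    patches, flatten and linearly project each one, prepend a class token,
    and add positional embeddings. Weights here are random placeholders."""
    h, w = img.shape
    p = patch
    n = (h // p) * (w // p)                          # N = HW / P^2
    patches = (img[: h - h % p, : w - w % p]
               .reshape(h // p, p, w // p, p)
               .swapaxes(1, 2)
               .reshape(n, p * p))                   # (N, P^2) flattened patches
    tokens = patches @ w_proj                        # (N, D) linear projection
    tokens = np.vstack([cls_token, tokens])          # prepend class token
    return tokens + pos_emb                          # (N + 1, D)

# Example: a 224 x 224 single-channel SSWT map, 16 x 16 patches, D = 64
P, D = 16, 64
img = rng.standard_normal((224, 224))
n_patches = (224 // P) ** 2                          # 196 patches
z0 = patch_embed(img, P,
                 w_proj=rng.standard_normal((P * P, D)) * 0.02,
                 cls_token=rng.standard_normal((1, D)),
                 pos_emb=rng.standard_normal((n_patches + 1, D)))
print(z0.shape)                                      # (197, 64)
```

The token sequence `z0` is what the stacked self-attention layers of Section 3.8 operate on.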

3.8. Mathematical Formulation of SSWT and Vision Transformer

Mathematical Formulation of SSWT for Tool Wear AE Signals is given as follows.

3.8.1. Acoustic Emission Signal Model

Let the discrete AE signal acquired during drilling be
$x(t) = \sum_{k=1}^{K} A_k(t)\,\cos\phi_k(t) + n(t)$
where:
  • $A_k(t)$: instantaneous amplitude (related to wear severity),
  • $\phi_k(t)$: instantaneous phase,
  • $\omega_k(t) = \dfrac{d\phi_k(t)}{dt}$: instantaneous frequency,
  • $n(t)$: measurement noise.
Tool wear causes non-stationary, multi-component frequency variations, making classical FFT unsuitable.

3.8.2. Continuous Wavelet Transform (CWT)

The CWT of the AE signal is defined as
$W_x(a,b) = \dfrac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\dfrac{t-b}{a}\right) dt$
where:
  • $a > 0$: scale parameter,
  • $b$: time shift,
  • $\psi(t)$: complex mother wavelet (Morlet commonly used),
  • $(\cdot)^{*}$: complex conjugate.

3.8.3. Instantaneous Frequency Estimation

For each time-scale point ( a , b ) , the instantaneous frequency is estimated as
$\omega_x(a,b) = \begin{cases} \dfrac{\partial}{\partial b} \arg W_x(a,b), & W_x(a,b) \neq 0 \\ 0, & \text{otherwise} \end{cases}$
This step captures frequency modulation caused by:
  • micro-crack initiation,
  • frictional rubbing,
  • tool edge chipping.

3.8.4. Synchrosqueezing Operation

The SSWT sharpens the time-frequency representation by reallocating wavelet coefficients from the scale domain to the frequency domain:
$T_x(b,\omega) = \int_{\{a:\, |\omega_x(a,b) - \omega| < \Delta\omega\}} W_x(a,b)\, a^{-3/2}\, da$
where:
  • $\omega$: true instantaneous frequency,
  • $\Delta\omega$: frequency resolution threshold.
This reassignment concentrates energy ridges, making wear-related frequency components more separable.

3.8.5. Hilbert-Synchrosqueezed Time-Frequency Energy

The time-frequency energy density used for tool wear assessment:
$E(b,\omega) = \left| T_x(b,\omega) \right|^2$
Wear progression manifests as:
  • increased high-frequency energy,
  • ridge broadening,
  • energy migration to lower frequencies during severe wear.
The SSWT image is then constructed as the normalised SSWT energy map:
$I_{SSWT} = \dfrac{E(b,\omega) - E_{\min}}{E_{\max} - E_{\min}}$
This 2-D representation serves as the input image for the Vision Transformer.
Mathematical Formulation of Vision Transformer (ViT) for Tool Wear Classification is as follows.
Let the SSWT image be:
$I \in \mathbb{R}^{H \times W \times C}$
where $C = 1$ (grayscale) or $C = 3$ (RGB).

3.8.6. Patch Embedding

The image is divided into non-overlapping patches of size P × P :
$N = \dfrac{HW}{P^2}$
Each patch is flattened:
$x_p^i \in \mathbb{R}^{P^2 C}, \quad i = 1, \ldots, N$
Linear projection:
$z_0^i = x_p^i E + E_{pos}$
where:
  • $E \in \mathbb{R}^{(P^2 C) \times D}$: trainable embedding matrix,
  • $E_{pos}$: positional embedding.
A class token $z_0^{\text{cls}}$ is prepended:
$Z_0 = \left[ z_0^{\text{cls}};\, z_0^1;\, \ldots;\, z_0^N \right]$

3.8.7. Multi-Head Self-Attention (MHSA)

For each transformer layer l :
$Q = Z_{l-1} W_Q, \quad K = Z_{l-1} W_K, \quad V = Z_{l-1} W_V$
Scaled dot-product attention:
$\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right) V$
Multi-head attention:
$\mathrm{MHSA}(Z) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W_O$
This enables global time-frequency dependency modeling, critical for wear pattern recognition.
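Scaled dot-product attention for a single head can be sketched as follows (an illustrative NumPy sketch, not the study's implementation; shapes and seeds are arbitrary):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for one head."""
    d_k = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))        # (N, N) attention weights
    return A @ V, A

rng = np.random.default_rng(0)
N, d_k = 5, 8                                  # 5 tokens, head dimension 8
Q, K, V = (rng.standard_normal((N, d_k)) for _ in range(3))
out, A = scaled_dot_product_attention(Q, K, V)
```

Each row of `A` is a probability distribution over all tokens, which is exactly how a patch at one time-frequency location can attend to any other.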

3.8.8. Feed-Forward Network (FFN)

Each attention block is followed by a two-layer FFN:
$\mathrm{FFN}(x) = \sigma\!\left(x W_1 + b_1\right) W_2 + b_2$
where $\sigma(\cdot)$ is the GELU activation.
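The feed-forward block can be sketched as follows (an illustrative NumPy sketch; the tanh approximation of GELU is assumed for $\sigma$, and the hidden width is an arbitrary placeholder):

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GELU activation used in ViT-style blocks."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def ffn(x, W1, b1, W2, b2):
    """FFN(x) = GELU(x W1 + b1) W2 + b2 -- the two-layer MLP in each block."""
    return gelu(x @ W1 + b1) @ W2 + b2
```

Applied token-wise, the FFN maps each $D$-dimensional token through a wider hidden layer and back to $D$ dimensions.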

3.8.9. Transformer Encoder Layer

$Z'_l = \mathrm{MHSA}\!\left(Z_{l-1}\right) + Z_{l-1}, \qquad Z_l = \mathrm{FFN}\!\left(Z'_l\right) + Z'_l$
Layer normalization is applied before each sub-block.


3.9. Advantages over Lazy Classifiers

End-to-End Learning: the ViT provides automated feature learning and eliminates the need for manual feature fusion across domains (time, frequency, and time-frequency).
Global Dependency Modelling: self-attention captures long-range correlations in the SSWT maps that lazy classifiers cannot exploit.
Robust to Noise: the combination of SSWT preprocessing and ViT attention layers increases resistance to transient spikes and background noise in AE signals.
Scalable and Real-Time: once trained, the model supports real-time industrial deployment, unlike distance-based lazy learners, which store and compare all training instances.

4.0. Experimental Setup

This section outlines the specific experiments, including workflows, data preparation, and the metrics used in evaluating the proposed CNC drill bit tool-wear classification framework.

4.1. CNC Drilling Experiments and Collection of AE Data

The experiments make use of a CNC drilling machine where a specific model of ultrasonic AE sensor (XYZ-1000, with a frequency response of 100 kHz–1 MHz) has been mounted. The sensor has been fixed with a magnet adapter directed towards the spindle, providing consistent mechanical coupling and optimal transmission of the drilling signal. Table 1 provides the specifications for the CNC drilling machine.
The workpiece material was AISI 1045 medium carbon steel with dimensions of 100 mm × 50 mm × 20 mm. The drills were High Speed Steel (HSS) bits with diameters ranging from 3.0 mm to 3.8 mm. The drilling parameters were:
• Vertical spindle speed: 1200 RPM
• Feed rate: 0.1 mm/rev
• Depth of cut: 10 mm
The following tool wear states were created, in line with industry practice, by applying Electrochemical Machining (ECM) to the drill bits:
• Healthy Tool (HT) – New or as-good-as-new drill bits.
• Low Wear (LW) – 0.3 mm of wear introduced.
• Medium Wear (MW) – 0.6 mm of wear introduced.
• Severe Wear (SW) – 0.9 mm of wear introduced.
A drilling operation was performed 20 times for each category, including drilling passes for healthy, low, medium, and severe wear conditions.
The AE signal for each drilling pass was recorded at a sampling rate of 2 MHz with 16-bit resolution. For the 3.0 mm drill, the four categories (healthy, low, medium, and severe wear) yielded 4 × 50 = 200 data sets. Data sets were collected in the same manner for the 3.2 mm, 3.4 mm, 3.6 mm, and 3.8 mm drills, giving a total of 200 × 5 = 1,000 data sets for analysis. Table 2 provides the breakdown.

4.2. Data Labelling and Segmentation

Each AE recording was divided into non-overlapping 100 ms segments, and each segment was labelled with one of the four tool wear categories. The resulting dataset contained 1,000 segments (250 per wear class).

4.3. Feature Extraction and Fusion

As described in section 3, the SSWT was used for the analysis of each of the AE signal segments. SSWT is different from other feature extraction methods, as it consolidates all of the temporal and spectral information into one 2D feature map and thus does not require multi-domain feature extraction. SSWT maps were subject to Min-Max scaling within the range of [0,1] prior to being fed to the Vision Transformer to facilitate consistent intensity values and stable model training.
Figure 1. Schematic diagram of the experimental setup and the proposed methodology.

4.4. Classification and Validation

Tool wear states were classified using the Vision Transformer (ViT), which classifies the 2D feature maps and performs end-to-end feature extraction. The model was built in Python using the PyTorch library.
Performance was evaluated using 10-fold cross-validation. In each fold, 90 percent of the SSWT maps were used to train the ViT and the remaining 10 percent to test it. This ensures that every segment is used for both training and validation, improving the reliability of the performance estimates.
A grid search method was employed to optimize the model hyperparameters, such as the number of transformer layers, embedding dimension, and learning rate, to maximize classification performance.
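The 10-fold protocol can be sketched as follows; a simple nearest-centroid classifier stands in for the ViT, the data are synthetic stand-ins for flattened SSWT maps, and the hyperparameter grid search is omitted for brevity.

```python
import numpy as np

def nearest_centroid_predict(X_tr, y_tr, X_te):
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

def k_fold_accuracy(X, y, k=10, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = nearest_centroid_predict(X[train], y[train], X[test])
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))  # Acc_CV = (1/k) * sum_i Acc_i

rng = np.random.default_rng(1)
# Four well-separated synthetic classes, 50 samples each
X = np.concatenate([rng.standard_normal((50, 16)) + 4 * c for c in range(4)])
y = np.repeat(np.arange(4), 50)
acc_cv = k_fold_accuracy(X, y)
print(acc_cv)
```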

4.5. Performance Metrics

The Vision Transformer model's performance was assessed using the following metrics. Accuracy is the ratio of correctly labelled AE segments to the total number of segments.
Precision is the ratio of correctly labelled AE segments to the total number of segments predicted for each class.
Recall, or sensitivity, is the ratio of correctly labelled AE segments to the total number of actual AE segments in each class.
The F1-score is the harmonic mean of precision and recall, balancing false positives and false negatives.
Cohen's Kappa quantifies the agreement between predicted and ground-truth labels, corrected for chance agreement [22].
Furthermore, the Confusion Matrices were evaluated in order to recognize the trends of misclassifications in different classes. This way, the evaluative potential of the SSWT representations, as well as the effectiveness of the Vision Transformer to differentiate among the states of the Healthy Tool (HT), Low Wear (LW), Medium Wear (MW), and Severe Wear (SW), was assessed.
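The metrics above can be computed directly from a confusion matrix, as in this NumPy sketch; the labels are synthetic illustrations, not experimental results.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes=4):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # per predicted class
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # per actual class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    # Cohen's kappa: observed accuracy corrected for chance agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / cm.sum() ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return accuracy, precision.mean(), recall.mean(), f1.mean(), kappa

# Synthetic labels: 4 classes x 25 segments, with 5 injected errors
y_true = np.repeat(np.arange(4), 25)
y_pred = y_true.copy()
y_pred[::20] = (y_pred[::20] + 1) % 4
acc, prec, rec, f1, kappa = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(kappa, 3))  # 0.95 0.933
```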
Figure 2. Signal collection from the CNC drilling machine.

5.0. Results and Discussion

This section discusses the classification performance of the proposed SSWT + Vision Transformer (ViT) framework for monitoring CNC drill bit tool wear, including quantitative evaluations, confusion matrix analyses, and a discussion of class-wise performance.
The dataset, summarized in Table 2, consists of 1,000 AE signal segments recorded across four wear states (HT, LW, MW, SW) and five drill diameters. Each segment was converted to a normalized SSWT time-frequency map, which served as input to the Vision Transformer.

5.1. Classification Metrics and Accuracy

The Vision Transformer classified the various tool wear states consistently. Table 3 reports the average classification metrics obtained through 10-fold cross-validation.
The model achieved an overall tool wear classification accuracy of 99.3%, indicating reliable generalization across all wear states. The Cohen's Kappa and F1-score values confirm strong agreement between the predicted and actual wear levels, demonstrating the ability of the SSWT representations to capture class-discriminative structure in the AE signals.

5.2. Confusion Matrix Analysis

The normalized confusion matrix of the Vision Transformer is presented in Figure 3, and the class-wise prediction performance is summarized in Table 4.
Most misclassifications occurred between adjacent wear states (LW ↔ MW, MW ↔ SW), reflecting the smaller AE signal differences in the intermediate wear stages. Overall, the confusion matrix indicates that the ViT discriminates between the wear states exceptionally well.

5.3. Feature Representation and Analysis

The Vision Transformer requires no manual feature engineering, since it learns directly from the SSWT maps. Nevertheless, exploratory analysis with Principal Component Analysis (PCA) on the SSWT representations revealed that the most class-discriminative feature components concentrated in specific regions of the frequency spectrum and in time windows containing transient AE events.
This confirms that the SSWT-based time–frequency representation effectively encodes tool-wear characteristics, thereby supporting the ViT in robust classification.

5.4. Analysis of Confusion Matrices

The normalised confusion matrix for the SSWT + Vision Transformer is presented in Table 4. Each SSWT map was classified into one of the four wear states (HT, LW, MW, SW), and the percentages indicate correct and incorrect predictions for each class.
Table 4. Normalised Confusion Matrix for SSWT + Vision Transformer.
Actual/Predicted HT LW MW SW
HT 99.4% 0.4% 0.1% 0.1%
LW 0.3% 99.0% 0.6% 0.1%
MW 0.1% 0.5% 98.8% 0.6%
SW 0.1% 0.1% 0.5% 98.7%
The Vision Transformer demonstrates excellent discrimination across all wear states. Most misclassifications occur between adjacent wear states (LW ↔ MW or MW ↔ SW), likely reflecting subtle variations in the AE signal patterns during intermediate wear stages.
In contrast to conventional classifiers, the Vision Transformer learns hierarchical features directly from the SSWT maps instead of relying on hand-crafted time, frequency, or sub-band features. These learned representations capture the features most relevant to tool wear, including short bursts of energy and time-frequency shifts in the AE signals.
Exploratory analysis using Principal Component Analysis (PCA) on the SSWT maps shows that energy concentration in specific time-frequency regions contributes most to the separation of wear classes. This is consistent with previous findings that tool wear is reflected in the AE signal dynamics: amplitude variations, bursts, and localized transients.
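The exploratory PCA can be sketched as follows; the SSWT maps here are synthetic, with a class-dependent high-energy region standing in for the wear-related energy concentration, and the map size is an arbitrary choice for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 30)
maps = []
for c in labels:
    m = rng.random((32, 32)) * 0.1        # low-level background energy
    m[8 * c:8 * c + 8, 12:20] += 1.0      # class-dependent energy region
    maps.append(m.ravel())
X = np.stack(maps)                        # 120 flattened 32x32 SSWT maps

# PCA via SVD on mean-centered data
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / (S ** 2).sum()       # explained variance ratios
scores = Xc @ Vt[:3].T                    # projections onto first 3 PCs
print(explained[:3].round(2), scores.shape)
```

Because the class-dependent energy regions dominate the variance, the first few principal components separate the four synthetic classes, mirroring the behaviour described for the real SSWT maps.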
In summary, the SSWT + Vision Transformer framework simplifies the process by eliminating manual feature extraction, while also providing robust, high-fidelity CNC drill bit wear classification.

5.5. Cross-Validation of the Proposed Methodology

In order to enhance the generalizability and reliability of the results, 10-fold cross-validation was used. The full dataset was randomly divided into 10 equal-sized folds; in each iteration, the model was trained on 9 folds and tested on the left-out fold, so that each fold served once as the validation subset over the 10 iterations. The mean performance across folds was then computed. The overall cross-validation accuracy is

$$\mathrm{Acc}_{\mathrm{CV}} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{Acc}_i$$

where $k = 10$ is the number of folds and $\mathrm{Acc}_i$ denotes the accuracy of the model on the $i$th fold.
This approach keeps computational cost reasonable while producing stable, reliable estimates, minimising the bias and variance associated with a single training-test split. Stratification was applied across the wear categories, including healthy tools, so that class proportions were preserved in the training and testing subsets, adding to the reliability of the performance assessment.

5.6. Comparison with Existing Methods

Compared with traditional approaches, the proposed framework outperforms most attempts reported in the literature. SVM-based methods reliably achieve 95-97% accuracy on comparable datasets but incur substantial computational cost, and deep learning models can reach higher accuracy only at considerable training expense.
The proposed method combines high accuracy, interpretability, and efficiency, making it well suited for real-time tool-condition monitoring in industrial CNC environments. Table 5 compares the prediction accuracy of the proposed classification framework with previously reported methods.

5.7. Limitations and Future Considerations

Although the SSWT + Vision Transformer model performs very well, this study used data sets created under controlled drilling conditions. Other conditions, such as varying cutting parameters, tool geometries, material compositions, and industrial background noise, were not accounted for, which may limit the model's performance in real-world situations. In addition, the security of data transmission was not considered in the proposed work; this could be addressed using a federated learning approach [23].
Table 5. Comparison of the proposed Methodology with previous research works using ML and DL.
Reference Methodology used/Core application Obtained Accuracy
Truong et al. [24] BRANN (Bayesian-regularised ANN) 73.33%
Gougam et al. [25] Hybrid CNN–ResNet–BiLSTM 98%
Kumar et al. [26] Encoder–Decoder LSTM 94.20%
Kumar et al. [26] Hybrid LSTM 97.85%
Bilgili et al. [27] LSTM (with industrial edge data) 98%
Zhang et al. [28] ResNet 97.7%
Hoang et al. [29] Gaussian Process Regression + ANFIS 97.57%
Proposed methodology Synchrosqueezed Wavelet Representation and Vision Transformer 99.3%
The following directions are suggested for future investigation:
Evaluating the framework under varying tool geometries, material compositions, and other machining process characteristics [30].
Adding adaptive online learning methods to the Vision Transformer so that it can be updated in real time as new wear patterns are detected through reliable image-based analysis [31].
Applying the framework to larger data sets, to more fully assess the potential of deep learning systems such as Vision Transformers in complex and variable industrial environments [32].
Using ensemble and hybrid methods that integrate SSWT, ViT, and other time-frequency deep learning approaches to improve robustness and reliability.

5.8. Practical Application of the Proposed Methodology

The automation, predictive maintenance, and quality assurance aspects of smart manufacturing today make this methodology extremely applicable [28]. Other advantages include:
Implementation of a Single Sensor: Only one AE sensor is required, with no additional sensors, making the system less invasive and more affordable.
Monitoring in real time: The SSWT time–frequency representations, when used in conjunction with the Vision Transformer, enable rapid inference and real-time adaptive decisions, which are valuable in fast-changing manufacturing environments. The industrial Internet of Things means that the framework can be used at the edge of cyber-physical production systems, supporting intelligent monitoring as part of Industry 4.0 [33].
Industry 4.0 techniques: Machine learning condition monitoring can be integrated with Digital Twin and Federated Learning methods [31] to improve transparency and data security in predicting the condition of CNC machine tools.
The system supports predictive maintenance by identifying the progression of tool wear, thereby aiding maintenance scheduling and reducing unplanned downtime.
As part of our evaluation of the SSWT–ViT framework, we considered the costs associated with its computation and its ability to integrate with systems, scale, and operate in real time. The ViT-based model, which can be trained with SSWT time-frequency representations, has low computational resource requirements and is thereby suitable for contemporary industrial PCs. The framework supports real-time tool wear monitoring and can integrate with current CNC monitoring systems using standard data acquisition systems. Also, it can be adjusted to accommodate various drill sizes and machining parameters. The framework has a compact preprocessing and inference pipeline, which enables it to operate with minimal disruption to the production process. All these factors highlight the framework’s suitability for actual use in industrial settings.

6.0. Conclusion

In this work, the Synchrosqueezed Wavelet Transform (SSWT) successfully analysed the non-linear, non-stationary AE signals produced during CNC drilling, yielding high-quality time-frequency maps that provide rich input to the Vision Transformer and enable hierarchical feature extraction without manual engineering. The combined SSWT and Vision Transformer framework performed best, achieving an accuracy of 99.3%. The model accurately classified the tools into the Healthy Tool (HT), Low Wear (LW), Medium Wear (MW), and Severe Wear (SW) categories, and distinguished the intermediate wear conditions. The results build on established approaches in non-linear signal analysis such as Ensemble Empirical Mode Decomposition [34] and showcase the applicability of transformer-based time-frequency methodologies for fault diagnosis [35]. The framework proved robust across drill sizes and operating conditions, confirming its applicability to real-time, smart CNC tool monitoring. This work lays the groundwork for automated, scalable predictive maintenance systems that can enhance machining efficiency, reduce downtime, and prolong the useful life of tools.
In conclusion, the study highlights that parameter optimization plays a crucial role in enhancing the performance and reliability of tool wear condition monitoring systems. The effective tuning of parameters using various optimization approaches, including conventional algorithms as well as advanced Machine Learning and Deep Learning techniques, significantly improves prediction accuracy and decision-making capabilities [36,37,38,39,40,41,42,43]. By leveraging these methods, it becomes possible to better capture complex patterns in sensor data, such as acoustic emission signals, leading to more precise detection and classification of tool wear states. Therefore, the integration of optimized models not only strengthens monitoring efficiency but also contributes to improved productivity, reduced downtime, and the advancement of intelligent manufacturing systems [44,45,46,47,48,49,50].

6.1. Key Conclusions

1. The SSWT effectively captures the non-stationary characteristics of AE signals for tool wear monitoring.
2. Vision Transformers require no handcrafted feature extraction because they automatically learn hierarchical features from the SSWT maps.
3. This method is adaptable to predictive maintenance and CNC drilling operations and applies to real-time industrial monitoring.
4. Future improvements could include larger datasets, hybrid models, and online learning to enhance adaptability and performance.
5. This research lays the groundwork for establishing AE-based tool wear monitoring and intelligent CNC machining applications aligned with Industry 4.0 and innovative manufacturing initiatives.

Competing interests

The authors declare that they have no competing interests.

References

  1. Xiqing, M.; Chuangwen, X. Tool Wear Monitoring of Acoustic Emission Signals from Milling Processes; 2009 First International Workshop on Education Technology and Computer Science: Wuhan, China, 2009; pp. 431–435. [Google Scholar] [CrossRef]
  2. Vijayalakshmi, K. Reliability improvement in component-based software development environment. Int. J. Inf. Syst. Change Manag. 2011, 5(2), 99–123. [Google Scholar] [CrossRef]
  3. Kamarudeen, M.; Vijayalakshmi, K. A machine learning-based financial management mobile application to enhance college students’ financial literacy. Proc. Int. Conf. Res. Educ. Sci. 2023, 9(1), 1237–1253. [Google Scholar]
  4. Davy, R.; Ajitha Priyadarsini, S.; Jeen Robert, R. B.; Kennedy, S. M. Cutting-edge tool wear monitoring in AISI4140 steel hard turning using least-square support vector machine. J. Chin. Inst. Eng. 2024, 47(3), 1–16. [Google Scholar] [CrossRef]
  5. Siddique, M.F.; Umar, M.; Ahmad, W.; et al. Advanced fault diagnosis in milling cutting tools using vision transformers with semi-supervised learning and uncertainty quantification. Sci. Rep. 2025, 15, 42460. [Google Scholar] [CrossRef]
  6. Twardowski, P.; Tabaszewski, M.; Wiciak-Pikuła, M.; Felusiak-Czyryca, A. Identification of tool wear using acoustic emission signal and machine learning methods. Precis. Eng. 2021, 72, 738–744. [Google Scholar] [CrossRef]
  7. Nagaraj, S.; Diaz-Elsayed, N. Tool Condition Monitoring in the Milling of Low- to High-Yield-Strength Materials. Machines 2025, 13, 276. [Google Scholar] [CrossRef]
  8. Maia, L. H. A.; Abrão, A. M.; Vasconcelos, W. L.; Júnior, J. L.; Fernandes, G. H. N.; Machado, Á. R. Enhancing Machining Efficiency: Real-Time Monitoring of Tool Wear with Acoustic Emission and STFT Techniques. Lubricants 2024, 12(11), 380. [Google Scholar] [CrossRef]
  9. Liu, J.; Wang, Z. Hybrid fusion of acoustic emission and vibration features for tool wear diagnosis. Mech. Syst. Signal Process. 2018, vol. 104, 468–482. [Google Scholar]
  10. Daubechies, I.; Lu, J.; Wu, H.-T. Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Appl. Comput. Harmon. Anal. 2011, 30(2), 243–261. [Google Scholar] [CrossRef]
  11. Guo, M.; Tu, X.; Abbas, S.; Zhuo, S.; Li, X. Time-frequency analysis-based impulse feature extraction method for quantitative evaluation of milling tool wear. Struct. Health Monit. 2023, 23(3), 1766–1778. [Google Scholar] [CrossRef]
  12. Peng, C.; Zheng, J.; Chen, T.; Jing, Z.; Wang, Z.; Su, Y.; Shi, Y. Tool wear feature extraction in BTA deep hole drilling process based on maximum probability multi-synchrosqueezing transform of spindle current signal. Measurement 2025, 241, 115780. [Google Scholar] [CrossRef]
  13. Shi, J.; Chen, G.; Zhao, Y.; Tao, R. Synchrosqueezed Fractional Wavelet Transform: A New High-Resolution Time-Frequency Representation. IEEE Trans. Signal Process. 2023, vol. 71, 264–278. [Google Scholar] [CrossRef]
  14. Abdeltawab, A.; Xi, Z.; Longjia, Z.; Galal, A. M. Wavelet-based hybrid CNN-BiLSTM approach in tool wear monitoring. Digit. Signal Process. 2026, 168(Part B), 105529. [Google Scholar] [CrossRef]
  15. Sun, T.; Li, Q.; Chen, Y. Random forests and gradient boosting machines in tool wear prediction. Expert Syst. Appl. 2019, vol. 116, 170–181. [Google Scholar]
  16. Aha, D. W.; Kibler, D.; Albert, M. K. Instance-based learning algorithms. Mach. Learn. 1991, vol. 6(no. 1), 37–66. [Google Scholar] [CrossRef]
  17. Altman, N. S. An introduction to kernel and nearest-neighbour nonparametric regression. Am. Stat. 1992, 46(3), 175–185. [Google Scholar] [CrossRef]
  18. Dong, S.; Meng, Y.; Yin, S.; Liu, X. Tool wear state recognition study based on an MTF and a vision transformer with a Kolmogorov–Arnold network. Mech. Syst. Signal Process. 2025, 228, 112473. [Google Scholar] [CrossRef]
  19. Qiu, J.; Liu, J.; Chu, Z.; Gao, Z.; Wu, X. Tool wear prediction based on LSTM-Transformer model. In Association for Computing Machinery; 2025. [Google Scholar] [CrossRef]
  20. Si, S.; Mu, D.; Si, Z. Intelligent tool wear prediction based on deep learning PSD-CVT model. Sci. Rep. 2024, 14, 20754. [Google Scholar] [CrossRef]
  21. Li, S.; Li, M.; Gao, Y. Deep Learning Tool Wear State Identification Method Based on Cutting Force Signal. Sensors 2025, 25(3), 662. [Google Scholar] [CrossRef]
  22. Han, N.; Pei, Y.; Song, Z. Signal Separation Operator Based on Wavelet Transform for Non-Stationary Signal Decomposition. Sensors 2024, 24, 6026. [Google Scholar] [CrossRef]
  23. Vijayalakshmi, K.; Sitharselvam, P. M.; Thamarai, I.; Ashok, J.; Sathish, G.; Mayakannan, S. Secure and private federated learning through encrypted parameter aggregation. In Handbook on federated learning: Advances, applications and opportunities; CRC Press, 2024; pp. 80–105. [Google Scholar] [CrossRef]
  24. Truong, T.T.; Airao, J.; Karras, P.; Hojati, F.; Azarhoushang, B.; Aghababaei, R. Data-driven prediction of tool wear using Bayesian-regularised artificial neural networks. arXiv. 2023. Available online: https://arxiv.org/abs/2311.18620.
  25. Gougam, F.; Afia, A.; Aitchikh, M.A.; Touzout, W.; Rahmoune, C.; Benazzouz, D. Computer numerical control machine tool wear monitoring through a data-driven approach. J. Intell. Manuf. 2024, 35(5), 1471–1485. [Google Scholar] [CrossRef]
  26. Kumar, S.; Kolekar, T.; Kotecha, K.; Patil, S.; Bongale, A. Performance evaluation for tool wear prediction based on Bi-directional, Encoder-Decoder, and Hybrid Long Short-Term Memory models. Int. J. Qual. Reliab. Manag. 2022, 39(7), 1551–1576. [Google Scholar] [CrossRef]
  27. Bilgili, D.; Kecibas, G.; Besirova, C.; Chehrehzad, M.R.; Burun, G.; Pehlivan, T.; Uresin, U.; Emekli, E.; Lazoglu, I. Tool flank wear prediction using high-frequency machine data from an industrial edge device. arXiv 2022, arXiv:2212.13905. [Google Scholar] [CrossRef]
  28. Zhang, S.; Yang, Y.; Xie, Y.; Tang, H.; Li, H.; Yao, L.; Yang, Y. GNSS Signal Extraction Using CEEMDAN–WPD for Deformation Monitoring of Ropeway Pillars. Remote Sens. 2025, 17(2), 224. [Google Scholar] [CrossRef]
  29. Hoang, L.V.; Tran, V.D.; Nguyen, Q.T. Adaptive neuro-fuzzy inference system and Gaussian regression model for tool wear prediction in milling with AE signal features. Eng. Technol. Q. J. 2023, 6(4), 55–65. Available online: https://www.journal.eu-jr.eu/engineering/article/view/2509 (accessed on 7 June 2025).
  30. Umar, M.; Siddique, M.F.; Ullah, N.; Kim, J.-M. Milling Machine Fault Diagnosis Using Acoustic Emission and Hybrid Deep Learning with Feature Optimization. Appl. Sci. 2024, 14, 10404. [Google Scholar] [CrossRef]
  31. Poolakkachalil, T. K.; Chandran, S.; Muralidharan, R.; Vijayalakshmi, K. Comparative analysis of lossless compression techniques in efficient DCT-based image compression system based on Laplacian transparent composite model and an innovative lossless compression method for discrete-color images. In Proceedings of the 3rd MEC International Conference on Big Data and Smart City (ICBDSC 2016); IEEE, 2016; pp. 155–160. [Google Scholar] [CrossRef]
  32. Piankitrungreang, P.; Chaiprabha, K.; Chungsangsatiporn, W.; Ratanasumawong, C.; Chancharoen, P.; Chancharoen, R. Acoustic-Based Machine Main State Monitoring for High-Speed CNC Drilling. Machines 2025, 13, 372. [Google Scholar] [CrossRef]
  33. Jayachandran, J.; Sivakumar, V.; K, V.; et al. Machine Learning-Enhanced MXene–Copper–Graphene THz Sensor for Accurate Salinity Sensing in Environmental Applications. Plasmonics 2025, 20, 11349–11359. [Google Scholar] [CrossRef]
  34. Wu, Z.; Huang, N. E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1(1), 1–41. [Google Scholar] [CrossRef]
  35. Orhan, A.; Yordanov, N.; Ertarğın, M.; Zhilevski, M.; Mikhov, M. A Comparative Study of Time–Frequency Representations for Bearing and Rotating Fault Diagnosis Using Vision Transformer. Machines 2025, 13, 737. [Google Scholar] [CrossRef]
  36. Suresh Babu, V.; Amuthakkannan, R.; Sriram Kumar, S.; Muruganandam, A. Optimal cutting parameters estimation to improve surface finish in turning operation in AISI 1045 using Taguchi’s robust design. Int. J. Ind. Syst. Eng. 2013, 15(1), 19–36. [Google Scholar] [CrossRef]
  37. Amuthakkannan, R. Effective software assembly for real-time systems using multi-level genetic algorithm. Int. J. Eng. Sci. Technol. (IJEST) 2011, 3(8), 6187–6201. [Google Scholar]
  38. Karuppasamy, P.; Wekalao, J.; Rajakannu, A. Graphene Terahertz Metasurface Sensor Enabled by AI for Rapid, High-Precision Sperm Detection in Fertility Assessment. Plasmonics 2026, 21, 619–641. [Google Scholar] [CrossRef]
  39. R, V.; Thangavel, G.; Wekalao, J.; et al. Ultra-High Sensitivity Terahertz Detection Using a 2D-Material-Based Metasurface: Design, Tuning, and Machine Learning Validation. Plasmonics 2025, 20, 6139–6150. [Google Scholar] [CrossRef]
  40. Muheki, J.; Elsayed, H.A.; Alfassam, H.E.; et al. Design and Optimization of a Hybrid Graphene–Gold–Silver Terahertz Metasurface Biosensor for High-Sensitivity Sperm Detection with Machine Learning for Behavior Prediction. J. Electron. Mater. 2026, 55, 2348–2371. [Google Scholar] [CrossRef]
  41. Wekalao, Jacob; Elsayed, Hussein A.; Bin-Jumah, May; Alqhtani, Haifa A.; Abukhadra, Mostafa R.; Bellucci, Stefano; Rajakannu, Amuthakkannan; Mehaney, Ahmed. Advanced terahertz-range dopamine detection using a 2D material-based metasurface biosensor. Appl. Opt. 2025, 64, 4625–4638. [Google Scholar] [CrossRef]
  42. Al Saadi, A. G. K.; Amuthakkannan, R. An impact of lean supply chain practices in oil and gas sector in Sultanate of Oman: A case study. J. Propuls. Technol. 2024, 45(1), 4224. [Google Scholar]
  43. Krishnasamy, O.; Thirugapillai, P.; Rajakannu, A.; Selvaraju, M. Optimization and multi-objective analysis of tensile, flexural and impact strength in nano-hybrid bio-composites reinforced with Helicteres isora and Holoptelea integrifolia fibers, and nanographene. Matéria (Rio de Janeiro) 2025, 30. [Google Scholar] [CrossRef]
  44. Elsayed, H. A.; Wekalao, J.; Alqhtani, H. A.; Bin-Jumah, M.; Abukhadra, M. R.; Bellucci, S.; Rajakannu, A.; Mehaney, A. Machine learning-enhanced terahertz plasmonic biosensor based on MXene-gold nanostructures for tuberculosis detection. Sens. Bio-Sens. Res. 2025, 49, 100852. [Google Scholar] [CrossRef]
  45. Aggarwal, K.; Wekalao, J.; Rajakannu, A. A Trimodal 2D Metasurface Biosensor with Bayesian Regression for Ultra-Sensitive Cancer Biomarker Detection. Plasmonics 2025, 20, 5977–5990. [Google Scholar] [CrossRef]
  46. Elsayed, H. A.; Wekalao, J.; Mehaney, A.; Alarifi, N. S.; Abukhadra, M. R.; Bellucci, S.; Hajjiah, A.; Rajakannu, A. Design and performance prediction of a multilayer metamaterial absorber for broadband solar-thermal energy conversion using random forest regression. Case Stud. Therm. Eng. 2025, 74, 106615. [Google Scholar] [CrossRef]
  47. Wekalao, J.; Mehaney, A.; Alarifi, N. S.; Abukhadra, M. R.; Elsayed, H. A.; Rajakannu, A. Advanced THz metasurface biosensor for label-free amino acid detection optimized with stacking ensemble algorithm. Phys. E Low.-Dimens. Syst. Nanostructures 2025, 172, 116287. [Google Scholar] [CrossRef]
  48. Anbazhagan, S.; U, A.K.; Rajakannu, A.; et al. AI-Augmented Terahertz Biosensor with MXene–Graphene Architecture for Sensitive Sperm Concentration Detection. Plasmonics 2025, 20, 10573–10587. [Google Scholar] [CrossRef]
  49. Jim Jose; Amuthakkannan, R. Design, development and analysis of FDM based portable rapid prototyping machine. Int. J. Latest Trends Eng. Technol. (IJLTET) 2014, 4(4), 324–232. [Google Scholar]
  50. Vijayalakshmi, K.; Ramaraj, N.; Amuthakkannan, R.; Kannan, S. M. A new algorithm in assembly for component-based software using dependency chart. Int. J. Inf. Syst. Change Manag. 2007, 2(3), 261–278. [Google Scholar] [CrossRef]
Table 1. Specification of CNC Drilling Machine.
S.No Description Dimensions/ Details
1. Work area 500 × 500 × 150 mm (X, Y, Z)
2. Outer Size 6.4 × 6.2 × 6.5 ft (X, Y, Z)
3. Speed, Power, and Cooling 24,000 RPM 2.2kW ATC, Water-Cooled spindle
4. Weight on the table 20 Kg
5. Linear Rail 20mm
6. Motor Hybrid Servo Motors
7. Collet size ER20
8. Drilling hits/min 80 hits/min
9. Resolution 50 µm, Accuracy: 50 µm
10. Rapid Traverse 7000 mm/min
11. Machine weight 600KG ex. accessories
12. Software Millsoft V1.12
13. Power supply 220v 50Hz 20A single-phase
Table 2. Data set collection.
Drill diameter Healthy Low Medium Severe Total
3.0 mm 50 50 50 50 200
3.2 mm 50 50 50 50 200
3.4 mm 50 50 50 50 200
3.6 mm 50 50 50 50 200
3.8 mm 50 50 50 50 200
Total 250 250 250 250 1000
Table 3. Classification performance of SSWT + Vision Transformer.
Metric Value
Accuracy (%) 99.3
Precision 0.99
Recall 0.99
F1-Score 0.99
Cohen’s Kappa 0.99
Table 4. Normalised Confusion Matrix for SSWT + ViT.
Actual\Predicted HT LW MW SW
HT 99.4% 0.4% 0.1% 0.1%
LW 0.3% 99.0% 0.6% 0.1%
MW 0.1% 0.5% 98.8% 0.6%
SW 0.1% 0.1% 0.5% 98.7%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.