In our analysis at the Inspire Institute of Sports (IIS) in Karnataka, India, we studied five elite boxers (three males, two females) weighing 60–75 kg. These athletes are trained by top coaches and are highly skilled. Using the MetaMotion IMU sensor by Mbientlab, we captured data on their movements and punch orientations. With a 200 Hz sampling frequency, a ±16 g accelerometer, and a ±2000 deg/s gyroscope, the sensor tracked motion along the x, y, and z axes in real time, as shown in Figure 2. Its Bluetooth capability enabled data transmission to smartphones and computers. The high-resolution data allowed a detailed examination of punching techniques. Sensors of this kind are suited to wearable devices, sports equipment, and robotics. The IMU sensor was secured with Velcro™ bands under the boxing gloves on both the left and right wrists.
2.1. Data Acquisition
The protocol enabled simultaneous collection of IMU data, as shown in Figure 3, and video recorded at 60 FPS for labeling and ground-truth validation. During the data collection session, eight elite boxers (6 orthodox boxers and 2 southpaw boxers) each executed a series of 320 shadowboxing punches covering 14 different punch types. These comprised long-range lead jabs to the head, long-range rear jabs to the head, long-range lead jabs to the body, long-range rear jabs to the body, long-range lead hooks to the head, long-range rear hooks to the head, mid-range lead hooks to the head, mid-range rear hooks to the head, mid-range lead steep hooks to the head, mid-range rear steep hooks to the head, mid-range lead hooks to the body, mid-range rear hooks to the body, mid-range lead uppercuts to the head, and mid-range rear uppercuts to the head.
In the preliminary stages of our research, we focused on classifying six distinct punch types, including a category for ‘no punch.’ These classifications considered actions of both the lead and rear hand. The punch types examined for the lead hand were the ‘long-range lead jab to the head (jab),’ ‘mid-range lead hook to the head (hook),’ and ‘mid-range lead uppercut to the head (uppercut).’ For the rear hand, we considered the ‘no punch’ category alongside the ‘long-range rear jab to the head (jab),’ ‘mid-range rear hook to the head (hook),’ and ‘mid-range rear uppercut to the head (uppercut).’
2.3. Feature Extraction
In our study, we conducted extensive feature extraction on punch kinematic data, focusing on the x, y, and z axes of the accelerometer and gyroscope data. On average, all punch types were completed within 0.9 seconds, equivalent to approximately 180 data samples at 200 Hz. This observation shaped our data analysis approach. We employed a rolling window with a fixed size of around 180 samples and an overlap of 179 samples (that is, the window advances one sample at a time), maintaining a sampling frequency of 200 Hz to capture time-frequency characteristics. The same configuration was applied to the time-domain statistical analysis, ensuring alignment with the temporal extent of punches. The corresponding spectrogram depicted the extracted features from all six axes, revealing spatiotemporal features such as power spectral density across frequency bands, which were combined with 9 statistical features to improve classification accuracy. By analyzing power spectral density, we could identify frequency ranges associated with different punch types. Notably, the jab showed greater spectral power on the x-axis, the hook on the y-axis, and the uppercut on the z-axis.
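The rolling-window scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the five statistics computed here are an assumed stand-in for the 9 statistical features (which the text does not enumerate), the input is a synthetic sine wave, and the names `rolling_windows` and `window_stats` are hypothetical.

```python
import math

FS_HZ = 200   # sampling frequency stated in the text
WINDOW = 180  # samples per window (0.9 s at 200 Hz)
STEP = 1      # an overlap of 179 samples means the window advances by 1 sample


def rolling_windows(signal, window=WINDOW, step=STEP):
    """Yield successive fixed-length windows over one sensor axis."""
    for start in range(0, len(signal) - window + 1, step):
        yield signal[start:start + window]


def window_stats(w):
    """Illustrative time-domain statistics for one window (assumed feature set)."""
    n = len(w)
    mean = sum(w) / n
    var = sum((v - mean) ** 2 for v in w) / n
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "min": min(w),
        "max": max(w),
        "rms": math.sqrt(sum(v * v for v in w) / n),
    }


# Synthetic 2-second, 5 Hz sine wave standing in for one accelerometer axis.
axis = [math.sin(2 * math.pi * 5 * t / FS_HZ) for t in range(2 * FS_HZ)]
features = [window_stats(w) for w in rolling_windows(axis)]
print(len(features), "windows of", WINDOW, "samples")
```

With a step of one sample, a recording of N samples yields N - 179 windows, so the feature matrix has nearly one row per raw sample. The same windows could be passed to a spectrogram routine to obtain the power spectral density features.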
2.4. Hierarchical Classification
In the initial phase of our hierarchical classification system, we use binary recognition to determine the presence or absence of a punch and to identify the start and end times of detected punches. In the second phase, we classify the detected punches into types such as jabs, hooks, and uppercuts. We performed punch classification using the random forest technique, allocating 80% of the data, which included 120 punches from each punch category, for training and reserving 20% of the data, comprising 50 punches from each category from unknown boxers, for testing. Our previous research [6] demonstrated the effectiveness of random forest for punch classification; its accuracy metrics are shown in Table 3. However, labeling a large amount of data, especially high-frequency data collected at 200 samples per second, is demanding in both time and cost. With our extensive dataset containing thousands of samples, manual labeling by a domain expert such as a boxing coach is impractical. Other researchers working on IoT wearable-sensor-based classification have faced the same challenge [5, 9, 18, 22].
To address these challenges, we adopted active learning, an approach drawn from the literature [24, 25, 26]. Unlike traditional machine learning methods, which often require a large portion of the dataset (typically 70% to 80%) for training, active learning substantially reduces this requirement. It achieves accurate classification with only a fraction of the data, around 15%, such as 18 punches from each category (jab, hook, and uppercut). This approach reduces the computational burden and minimizes the cost and time of labeling an extensive dataset.
Data labeling was conducted using video data as the ground truth, employing a percentage-based criterion with thresholds ranging from 10% to 100%. Specifically, under the 60% criterion, an instance was labeled as "punch" if at least 60% of the punch occurred within the fixed-length window of 0.8 seconds (the average punch action time), or 160 samples; otherwise, it was labeled as "no punch." The same rule was applied at each of the other thresholds, so an instance was labeled as "punch" only if it met the corresponding percentage criterion within the specified window duration.
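The percentage-based labeling rule can be illustrated with a short sketch. It assumes one reading of the criterion: the fraction is the part of the video-annotated punch interval that falls inside the window, measured in samples. The function name `label_window` and the example sample indices are hypothetical.

```python
WINDOW = 160  # samples: 0.8 s at 200 Hz, as stated in the text


def label_window(win_start, punch_start, punch_end, threshold=0.6, window=WINDOW):
    """Label a window 'punch' if at least `threshold` of the punch lies inside it.

    All positions are sample indices; intervals are half-open [start, end).
    """
    win_end = win_start + window
    overlap = max(0, min(win_end, punch_end) - max(win_start, punch_start))
    punch_len = punch_end - punch_start
    fraction = overlap / punch_len if punch_len > 0 else 0.0
    return "punch" if fraction >= threshold else "no punch"


# Hypothetical punch annotated from video at samples 1000 to 1160.
print(label_window(1000, 1000, 1160))  # whole punch inside the window
print(label_window(940, 1000, 1160))   # 100 of 160 punch samples inside (62.5%)
print(label_window(900, 1000, 1160))   # 60 of 160 punch samples inside (37.5%)
```

Changing `threshold` from 0.1 to 1.0 reproduces the range of criteria compared in the study under this interpretation.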
The 60% criterion yielded higher accuracy in punch recognition and classification than the other percentage criteria. Therefore, our domain expert used the 60% criterion to categorize the samples into four distinct classes, ‘no punch,’ ‘long-range jab to the head,’ ‘mid-range hook to the head,’ and ‘mid-range uppercut to the head,’ for both rear and lead hand punches.
2.4.1. Active Learning Technique with Query Strategy: Query By Committee (QBC)
We have opted for the Query by Committee (QBC) approach due to its effectiveness in punch classification, especially when using random forest or ensemble learning techniques. QBC is a robust active learning method that harnesses the combined intelligence of multiple weak learners such as the Naive Bayes classifier, k-nearest neighbor, and decision tree.
Steps Followed in Active Learning Using Query by Committee Technique:
- 1.
Randomly select 5% of the dataset for initial training, reserving 95% for testing.
- 2.
-
Utilize a weak learner committee (Naive Bayes classifier, k-nearest neighbor, decision tree, or ensemble learner) to train the model using the Query by Committee (QBC) strategy.
-
The Bayes classifier, grounded in Bayesian probability theory, excels in probabilistic classification of the punch:
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$
where
$P(C \mid X)$ = Posterior probability of class $C$ given spectro-temporal features $X$
$P(X \mid C)$ = Likelihood of spectro-temporal features $X$ given class $C$
$P(C)$ = Prior probability of punch class $C$
$P(X)$ = Marginal probability of spectro-temporal features $X$
-
For decision trees in punch classification and recognition tasks, two common metrics used for splitting nodes are entropy and Gini impurity. Entropy is a measure of impurity or disorder in a set. Gini impurity quantifies the probability of incorrectly classifying an instance randomly chosen from the set. We used the default hyperparameter settings: a maximum of 100 splits, with Gini's diversity index as the split criterion.
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i, \qquad \mathrm{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^2$$
where
$S$ is the set of punches at a node,
$c$ is the number of classes (2 classes for punch recognition and 3 classes for punch classification),
$p_i$ is the probability of punch class $i$.
KNN is particularly adept at capturing local patterns in the punch signal and adapting to the underlying data distribution. Mathematically, if $x$ has spatio-temporal features of the punch $(x_1, \dots, x_n)$, and $y$ is a data point in the training set with features $(y_1, \dots, y_n)$, then the distance $d$ between $x$ and $y$ can be calculated using the Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where $y_i$ is the feature of $y$ corresponding to $x_i$.
- 3.
Average the output of the committee to classify predicted punches.
- 4.
-
Compute entropy values for each sample using the Entropy sampling method.
- 5.
Arrange entropy values in descending order alongside corresponding samples.
- 6.
Identify samples with high entropy values, indicating uncertainty in classification.
- 7.
Involve a domain expert or boxing coach to label uncertain samples for the next 5% of the dataset.
- 8.
Add the annotated data to the initial 5% training set.
- 9.
Repeat steps 2 to 8 iteratively until the total training dataset reaches 15%, improving model accuracy and reducing uncertainty in predictions, as shown in Table 4.
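Steps 3 to 7 of the procedure above (averaging the committee output, scoring each sample by entropy, ranking, and selecting the most uncertain samples for expert labeling) can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes each committee member outputs a class-probability vector per sample, the function names are hypothetical, and the probabilities in the example are made up.

```python
import math


def average_committee(member_probs):
    """Average the class-probability vectors produced by all committee members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def entropy(probs):
    """Shannon entropy (in bits) of a class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_queries(pool_probs, fraction=0.05):
    """Rank unlabeled samples by entropy (descending) and pick the top fraction.

    pool_probs[i] holds one probability vector per committee member for sample i.
    Returns the indices to send to the domain expert, plus all entropy scores.
    """
    scores = [entropy(average_committee(m)) for m in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_query = max(1, round(fraction * len(ranked)))
    return ranked[:n_query], scores


# Four unlabeled windows, three committee members, two classes (punch / no punch).
pool = [
    [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]],  # committee agrees: low entropy
    [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]],  # committee split: high entropy
    [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]],
    [[0.7, 0.3], [0.3, 0.7], [0.6, 0.4]],
]
to_label, scores = select_queries(pool, fraction=0.25)
print("indices to send to the coach:", to_label)
```

In a full loop, the selected samples would be labeled by the coach, appended to the training set, and the committee retrained, repeating until 15% of the data is labeled.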
After training the model, we applied it to new athlete punch data for punch recognition, extracting the punch count and punch duration (start time and end time). We first attempted to identify punch start and end times by analyzing alternating runs of 0s (indicating punches) and 1s (indicating no punches) in the label output. However, misclassifications produced isolated 0s and 1s inside punch events, complicating the precise determination of punch start and end times. To resolve this issue, we rely on two observations: the average duration of a punch event is 0.8 seconds, and no event (punch or no punch) occurs within a 0.2-second interval. If the labels fluctuate within this time frame, the misclassified samples are reverted to the previous state. This process allows the punch count and punch duration to be determined accurately.
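The reversion rule can be sketched as a run-length filter. The sketch below assumes that any run of identical labels shorter than 0.2 seconds (40 samples at 200 Hz) is a misclassification and takes the label of the preceding run; the function names and the example label sequence are hypothetical.

```python
from itertools import groupby

FS_HZ = 200
MIN_RUN = round(0.2 * FS_HZ)  # 40 samples: no real event is shorter than 0.2 s


def smooth_labels(labels, min_run=MIN_RUN):
    """Revert runs shorter than min_run to the preceding state.

    The very first run is kept as-is because it has no preceding state.
    """
    out = []
    for value, group in groupby(labels):
        run = len(list(group))
        if out and run < min_run:
            value = out[-1]  # too short to be a real event: keep previous state
        out.extend([value] * run)
    return out


def punch_segments(labels, punch_label=0):
    """Return (start, end) sample indices, end exclusive, of each punch run."""
    segments, idx = [], 0
    for value, group in groupby(labels):
        run = len(list(group))
        if value == punch_label:
            segments.append((idx, idx + run))
        idx += run
    return segments


# 0 = punch, 1 = no punch, following the label convention in the text.
raw = [1] * 100 + [0] * 80 + [1] * 10 + [0] * 70 + [1] * 100 + [0] * 5 + [1] * 95
segments = punch_segments(smooth_labels(raw))
durations = [(end - start) / FS_HZ for start, end in segments]
print("punch count:", len(segments), "durations (s):", durations)
```

In this example the 10-sample gap inside the punch and the 5-sample spurious punch are both reverted, leaving a single 0.8-second punch.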
For the second level of the hierarchical classification, which identifies the specific punch type (e.g., jab, hook, or uppercut) for both the rear and lead hand from the previously identified punch samples, the same active learning technique was applied to train the model; its accuracy is shown in Table 5. The trained model was subsequently tested with new athlete data. The mode of the label output within each punch was taken, reducing the per-sample punch labels to a single label per punch. This approach enabled the punch count to be determined for each punch type.
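The mode-based reduction can be illustrated with a short sketch. It assumes the punch segments (start and end sample indices) are already known from the recognition stage; the function names, segment boundaries, and labels below are hypothetical.

```python
from collections import Counter


def segment_mode(labels):
    """Return the most frequent label in one punch segment."""
    return Counter(labels).most_common(1)[0][0]


def count_punch_types(type_labels, segments):
    """Collapse per-sample type labels to one label per punch and count each type."""
    counts = Counter()
    for start, end in segments:
        counts[segment_mode(type_labels[start:end])] += 1
    return counts


# Two detected punches of 160 samples each, with a few misclassified samples.
type_labels = ["jab"] * 150 + ["hook"] * 10 + ["hook"] * 140 + ["jab"] * 20
segments = [(0, 160), (160, 320)]  # (start, end) with end exclusive
counts = count_punch_types(type_labels, segments)
print(dict(counts))
```

Taking the mode makes the per-punch label robust to a minority of misclassified samples inside a segment.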