Preprint
Article

This version is not peer-reviewed.

Swarm Intelligence Based Deep Learning Approach for Human Activity Recognition in Wearable Internet of Healthcare Things (IoHT) Applications

Submitted:

09 June 2026

Posted:

10 June 2026


Abstract
Human Activity Recognition (HAR) is the automatic prediction of daily human activities such as walking, running, cooking, and office work. The medical sector can benefit greatly from it, particularly professionals working with the elderly, personal health care aides, and staff who keep patient records for future reference. A HAR system can take as input (a) video or still images of people performing activities, or (b) body-motion data gathered from sensors in smart devices (accelerometers, gyroscopes, etc.), smart homes, eldercare settings, and the Internet of Things (IoT). HAR applications rely heavily on recent advances in AI, notably Deep Learning (DL) and optimisation algorithms from Swarm Intelligence (SI). Here, we use open-source data from wearable sensors to build a reliable HAR system that combines DL and SI. We develop a lightweight feature extraction method called Residual Bidirectional Long Short-Term Memory (Res-BiLSTM) and present novel feature selection approaches based on the Marine Predator Algorithm (MPA) to choose the best set of features. We assess the proposed model on three publicly available HAR datasets from the UCI machine learning repository and compare it against DL architectures recently proposed for the HAR problem. The proposed model surpasses other state-of-the-art approaches, achieving an accuracy of 96.92%, a precision of 95.45%, a recall of 94.07%, and an F1 score of 96.15% on all three datasets. It also outperforms several reported results in robustness and activity detection, and it adapts to activity characteristics with fewer parameters and improved accuracy.

1. Introduction

Human Activity Recognition (HAR) has been an active research area for several decades, driven by the significant societal benefits it enables when applied to human-centric, real-world scenarios. In parallel, the rapid advancement of microelectronics and sensor technologies, along with the widespread adoption of smartphones, has accelerated the growth of ubiquitous sensing, which focuses on extracting meaningful insights from data captured by pervasive sensors [1]. With the growing need for applications in fields such as ambient assisted living, context-aware systems, pervasive and mobile computing, and surveillance-based security, smartphone-based HAR has become increasingly important. It can also detect activity unobtrusively, which makes it appropriate for everyday use [2]. The fundamental goal of HAR in many real-world contexts is to identify precisely the physical actions carried out by a single person or a small group. Some activities, such as running, jumping, walking, and sitting, are performed by one person using the whole body [3]. Others, such as hand gestures, involve movements of specific body parts [4]. Still others, such as cooking, are carried out by interacting with objects [5]. HAR can also cover unusual events, such as a fall [6]. The most common applications of HAR include healthcare monitoring, human-computer interaction, ambient assisted living, nursing homes, rehabilitation, and surveillance [7]. Because of its many potential uses, HAR has recently become a prominent research topic. According to the type of data used, HAR can be broadly classified as either vision-based or sensor-based.
Sensor-based systems analyse time-series data from accelerometers, gyroscopes, radars, and magnetometers [8], whereas vision-based techniques analyse video or image data from cameras. Because of its small size, low cost, and portability, the accelerometer is the most commonly used sensor for HAR. Figure 1 shows the general layout of a typical activity recognition system. Object sensors such as radio-frequency identification (RFID) tags are also used in home settings, despite the challenges of deploying them.
According to the studies, sensor-based HAR [9] is more private and easier to use than vision-based HAR [10]. Despite being less costly to create, vision-based HAR [11] is more affected by ambient factors such as camera angle, lighting, and overlap between individuals. Deep learning (DL) algorithms have gained significant traction recently because they excel at autonomously extracting meaningful characteristics from data, whether visual [12] or time-series. By learning high-level patterns directly from raw input, these models remove the need for manual feature engineering. This process not only preserves essential data relationships but also yields highly discriminative representations. In activity detection, DL algorithms have consistently outperformed classical Machine Learning (ML) methods on classification metrics such as F1 score, recall, accuracy, and precision [13]. Human behaviour recognition with a deep learning architecture, particularly a CNN, is a composite system made up of several critical steps. Feature extraction is an essential step in every classification application, so isolating the relevant features can improve the classification accuracy of the chosen approach [14]. Recent deep learning algorithms have made feature extraction easier. In this work, we integrate the Marine Predator Algorithm (MPA) into a lightweight DL model (Res-BiLSTM). The proposed model uses convolution layers with a skip-connection (residual connection) mechanism and a BiLSTM to extract features from the wireless Inertial Measurement Unit (IMU) datasets. With its feature learning and extraction capabilities, the proposed model is fine-tuned for human activity classification on publicly available sensor data. Classifier performance is heavily influenced by the feature selection approach, and many techniques are employed to obtain the most essential features.
Metaheuristic (MH) optimisation methods, including swarm intelligence (SI) algorithms, have recently demonstrated remarkable performance on feature selection and other complicated engineering problems [15]. Several MH methods have been used for feature selection, including GSA, artificial bee colony, the salp swarm algorithm, particle swarm optimisation, and differential evolution (DE) [16,17]. The MPA, an effective SI method, was developed in [18] and has been applied to problems such as time-series forecasting, global optimisation, image segmentation, and feature selection. As far as we are aware, this is the first use of the MPA for HAR. MH algorithms have known shortcomings, and the no-free-lunch (NFL) theorem states that no single algorithm can be the best on every problem; the MPA consequently has limitations as well. We therefore tackle the feature selection problem in HAR applications using three binary variants of the MPA in addition to the original algorithm. The variants use two commonly used transfer functions to carry out the binarization: MPAV employs the V-shaped transfer function, whereas MPAS and MPAS10 both employ the S-shaped function. To develop a complete HAR methodology, we chose three publicly available datasets that include extensive and complicated activities: Opportunity, UCI-HAR, and DAPHNET. They cover a variety of fall actions and everyday activities. To verify the quality of the MPA variants, we additionally compare them with well-known MH and SI algorithms. To summarise, the primary goals of this research are as follows:
  • Use advances in DL and SI to improve HAR applications, and investigate HAR feature selection with optimization algorithms in detail.
  • Construct a novel feature extraction technique that relies on the MPA to extract features from signals received by IMUs. The Res-BiLSTM is made up of several distinct components, including convolution layers, skip connections, and a BiLSTM, with the skip connections arranged in a parallel architecture.
  • Conduct comprehensive evaluation experiments to compare the proposed MPA variants with other cutting-edge DL algorithms and evaluate their performance.
The remainder of the paper is organised as follows. Section 2 reviews previous HAR studies. Section 3 covers the proposed approach and the dataset description. Section 4 details the results of the proposed method. Section 5 provides a summary and outlines future plans.

3. Methodology

Human activities are also hierarchical, in the sense that complex activities are composed of the simple moves or actions needed to perform them. In addition, they are translation-invariant, because different people perform the same activity in different ways and because different parts of the same activity can appear at different times [1]. Although earlier DL methods improved HAR system performance, they tended to overfit because they did not scale well. To tackle the activity recognition problem, we propose the Res-BiLSTM, which builds on the MPA.

3.1. Dataset Description

Table 2 summarises the data from the three publicly available sources [3,4,10,30]. The UCI-HAR dataset was built from recordings of 30 participants, which is the highest number of volunteers among the datasets.
Like the UCI-HAR dataset, the DAPHNET dataset comprises six activities; however, it contains the most samples. As discussed later, this dataset is imbalanced. The OPPORTUNITY dataset includes seventeen different actions, with data gathered by five kinds of sensors: accelerometers, gyroscopes, magnetometers, object sensors, and ambient sensors.
(1)
UCI-HAR
The UCI-HAR dataset [30] was constructed from recordings of 30 individuals aged 19 to 48 years. Each participant followed a set protocol during recording while wearing a smartphone (a Samsung Galaxy S II) with built-in inertial sensors around the waist. The dataset covers six basic daily activities: standing, sitting, lying down, walking, going upstairs, and going downstairs. It also includes postural transitions: sitting to standing, sitting to laying, laying to sitting, standing to laying, and laying to standing. There is a total of eight such transitions. Because postural transitions make up only a small percentage of the data, only the six basic activities were used as input samples in this work. The experiments were videotaped so that the data could be annotated manually. Three-dimensional acceleration and three-dimensional angular velocity were recorded at a constant rate of 50 Hz. Table 3 gives the details; the dataset contains 748,206 samples.
(2)
DAPHNET
The DAPHNET dataset [31] contains 294,739 samples in total, and Table 4 shows the percentage of samples associated with each activity. The dataset is clearly imbalanced: walking accounts for 38.6% of the samples, whereas standing makes up only 4.4%. Data were collected from 36 participants, who went about their usual routines with an Android phone in their front trouser pocket. The sensor used is an accelerometer sampling at 20 Hz, a motion sensor built into the smartphone. Six activities were labelled: standing (Std), sitting (Sit), walking (Walk), going upstairs (Up), going downstairs (Down), and jogging (Jog). A dedicated supervisor oversaw the data collection to ensure data quality. To illustrate the properties of the raw data on each axis, Figure 2 displays the acceleration waveform of each activity over a 2.56-second period (128 points in total).
(3)
OPPORTUNITY
The OPPORTUNITY dataset [32] contains 17 complicated motions and gestures recorded in a sensor-rich environment. It features four individuals performing various morning tasks in real-life settings. Sensors of various types were embedded in the environment, in objects, and on the subjects' bodies. The sensor configuration follows the OPPORTUNITY challenge recommendations [33]. We considered only the on-body sensors, which comprise twelve Bluetooth 3-axis acceleration sensors, two InertiaCube3 sensors on the feet, and five inertial measurement units mounted on a sports jacket. Table 5 summarises the gestures in this dataset, with the symbols of the motions given as letters in parentheses.

3.2. Pre-Processing

The following pre-processing of raw data obtained by motion sensors is necessary to feed the suggested network with a certain data dimension and enhance the model’s accuracy.
(1)
LINEAR INTERPOLATION
The subjects wear wireless sensors, and the datasets reflect realistic recording conditions. Consequently, some data can be lost during collection; such missing values are typically denoted NaN/0. To overcome this issue, this work uses linear interpolation to fill in the missing values. Figure 3 illustrates the segmentation of the sensor data.
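The gap-filling step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `interpolate_missing` is hypothetical, only NaN is treated as a missing marker (the text also mentions 0), and gaps at the start or end of a channel are assumed to take the nearest valid sample.

```python
import math

def interpolate_missing(values):
    """Fill NaN gaps in one sensor channel by linear interpolation.

    Assumption: leading/trailing gaps copy the nearest valid sample.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    if not known:
        raise ValueError("channel has no valid samples")
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]          # gap at the start
        elif right is None:
            filled[i] = values[left]           # gap at the end
        else:
            # Weight by the position of i between its valid neighbours.
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled
```

For example, a channel `[1.0, nan, nan, 4.0]` would be completed along the straight line between its two valid end points.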
(2)
SCALING AND NORMALIZATION
It is important to normalise the input data to the range of 0 to 1, as shown in Equation (1), because training models directly on large channel values can introduce training bias.
$$X_i' = \frac{X_i - X_i^{\min}}{X_i^{\max} - X_i^{\min}}, \quad i = 1, 2, \ldots, n$$ (1)
where $X_i'$ is the normalised value of the $i$-th channel, $X_i^{\max}$ and $X_i^{\min}$ are the maximum and minimum values of the $i$-th channel, respectively, and $n$ is the number of channels.
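As a small sketch of Equation (1), the following function scales each channel independently to [0, 1]. The name `min_max_normalize` and the handling of a constant channel (mapped to 0) are assumptions made for illustration.

```python
def min_max_normalize(channels):
    """Scale each channel (a list of samples) to [0, 1] as in Eq. (1)."""
    normalized = []
    for samples in channels:
        lo, hi = min(samples), max(samples)
        span = hi - lo
        if span == 0:
            # Assumption: a constant channel carries no information; map to 0.
            normalized.append([0.0 for _ in samples])
        else:
            normalized.append([(x - lo) / span for x in samples])
    return normalized
```

A channel `[2.0, 4.0, 6.0]` has minimum 2 and maximum 6, so its samples map to 0, 0.5, and 1.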

3.3. Proposed Residual Convolutional BiLSTM Network

The process begins with the collection of activity data using various devices such as Bluetooth, Wi-Fi, and radar sensors. The data are then preprocessed before recognition, which builds on the human activity identification capabilities of wearable sensing devices. Current approaches are slow and fail to differentiate between very similar actions, such as walking upstairs and downstairs. To address these issues, we propose a new architecture, Res-BiLSTM with MPA. As shown in Figure 4, the network has three main parts: the 1DCNN, the Res-BiLSTM, and the fully connected layer. The first part, the 1DCNN, extracts spatial features from the preprocessed data. By adjusting the stride of the convolution kernel, it shortens the time series, which reduces the time the model needs to recognise activities. The improved Res-BiLSTM network then extracts time-series features from the 1DCNN output. By combining the strengths of BiLSTM with residual connections, the Res-BiLSTM component improves the model's capacity to capture long-term dependencies in the time-series data, which enhances both recognition accuracy and the ability to model complicated temporal patterns. To further enhance the final recognition features, we introduce the MPA mechanism. By weighting the feature information of the Res-BiLSTM network, this mechanism enables the model to focus on the most relevant aspects of the input data. Highlighting the most relevant features increases the model's discriminative capacity and improves activity recognition accuracy. Finally, the fully connected layer and the SoftMax function classify the behaviour information. The output of this classification is the recognition result, a prediction of the current activity.
We will describe each part in depth in the sections that follow, including what they do and how they fit into our proposed model.

3.4. 1DCNN

Convolutional neural networks (CNNs) excel in image processing and human behaviour identification thanks to their powerful feature extraction on tensor data. In this research, we extract features using a one-dimensional convolutional neural network (1DCNN) [35]. As Figure 2 shows, in the collected sensing data the time series runs vertically and the multi-axis channel features obtained by the various sensors run horizontally. For spatial feature extraction we chose 1D convolution (a convolution in behavioural units) over the more conventional 2D convolution, because 1D convolution preserves the integrity of the sensor channels even with a large number of sensors, whereas 2D convolution destroys it. The input data are convolved with each filter and a nonlinear activation function is then applied, so the 1D convolution is computed as:
$$X_j = f\left(\sum_{i=1}^{n} W_i * x_j + b_i\right)$$ (2)
Here, $X_j$ denotes the output activation, $W_i$ represents the weight matrix of the $i$-th filter, and $x_j$ corresponds to the input sensing data convolved with $W_i$. The term $b_i$ indicates the bias associated with the $i$-th filter, $n$ is the total number of filters used in the layer, and $f(\cdot)$ denotes a non-linear activation function. In this work, the convolutional layers employ the Swish activation function [36]. Swish is particularly well suited for sensor-based data, as it alleviates the dead neuron issue associated with the negative input region in the ReLU activation function [37]. The mathematical formulation of the Swish function is given as follows:
$$\mathrm{swish}(x) = x \cdot \mathrm{sigmoid}(x)$$ (3)
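To make Equations (2) and (3) concrete, here is a deliberately simplified sketch: one channel, one filter, no padding. The function names and the "valid" output length are illustrative assumptions, not the configuration used in the paper.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x):
    """Eq. (3): swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)

def conv1d_swish(signal, kernel, bias=0.0):
    """Slide one filter along the time axis, then apply Swish (Eq. 2).

    Assumption: no padding, stride 1, single input channel.
    """
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        acc = bias
        for offset in range(k):
            acc += kernel[offset] * signal[start + offset]
        out.append(swish(acc))
    return out
```

Unlike ReLU, `swish` returns a small non-zero value for negative inputs, which is the property the text credits with alleviating dead neurons.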
After the activation stage, a pooling layer performs down-sampling with 'same' padding. The stride of the pooling kernel can be adjusted to shorten the time series in the 1DCNN layer, whose length changes as follows:
$$len_{out} = \left\lceil \frac{len_{input}}{s} \right\rceil$$ (4)
where $s$ is the stride of the pooling kernel, $len_{out}$ is the length of the pooled time series, and $len_{input}$ is the length of the input time series. During neural network training, vanishing or exploding gradients can arise because the probability distribution of the inputs to each layer changes continually.
This phenomenon is known as the internal covariate shift problem [38]. Batch normalisation (BN) was proposed in [19] in 2015 to mitigate it. BN computes the mean and standard deviation of each feature over a batch and rescales the data so that the mean becomes 0 and the variance becomes 1. By integrating normalisation into the training process, BN speeds up convergence during gradient descent and thereby shortens the training time of neural networks. The computation is as follows.
$$\hat{x}_{i,k} = \frac{x_{i,k} - \mu_k}{\sqrt{\sigma_k^2 + \varepsilon}}$$ (5)
Here, $x_{i,k}$ denotes the $k$-th dimension of the input sample $x_i$ from the training set $x$, $\mu_k$ represents the mean of the $k$-th feature computed across all training samples, and $\sqrt{\sigma_k^2 + \varepsilon}$ corresponds to the standard deviation of the same feature, where $\varepsilon$ is a small constant added for numerical stability. As illustrated in Figure 4, each block applies a fixed sequence of operations: convolution (Conv), batch normalization (BN), Swish activation, and max pooling. This block is stacked repeatedly to form a four-level hierarchical structure in the 1D-CNN component of the proposed model.
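Two pieces of the block described above lend themselves to a short sketch: the normalisation of Equation (5) and the way 'same'-padded pooling with stride $s$ shortens the time series. The code is illustrative only; it omits the learnable scale and shift that BN layers normally carry, and the stride of 2 used in the example is an assumption rather than a value reported in the text.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalise each feature k over the batch as in Eq. (5).

    batch: list of samples, each a list of k feature values.
    Assumption: no learnable scale/shift parameters.
    """
    n = len(batch)
    dims = len(batch[0])
    means = [sum(s[k] for s in batch) / n for k in range(dims)]
    variances = [sum((s[k] - means[k]) ** 2 for s in batch) / n
                 for k in range(dims)]
    return [[(s[k] - means[k]) / math.sqrt(variances[k] + eps)
             for k in range(dims)] for s in batch]

def pooled_length(len_input, stride):
    """Output length of 'same'-padded pooling with the given stride."""
    return math.ceil(len_input / stride)

def length_after_blocks(len_input, strides):
    """Apply the length reduction once per stacked Conv-BN-Swish-pool block."""
    for s in strides:
        len_input = pooled_length(len_input, s)
    return len_input
```

Under the assumed stride of 2 in each of the four blocks, a 128-point window would shrink to `length_after_blocks(128, [2, 2, 2, 2])`, that is, 8 time steps.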

3.5. Res-BiLSTM

Because human actions are essentially temporal, and the order in which events occur is crucial, activity recognition cannot rely solely on the spatial features extracted by the 1DCNN. Recurrent neural networks (RNNs) perform well on time-series data, but they are susceptible to information loss and vanishing gradients as the time series grows longer. The long short-term memory (LSTM) network was proposed in [39]. In contrast to basic RNNs, LSTMs can store longer-term temporal information efficiently and therefore outperform basic RNNs on longer time series. However, behavioural data at a given moment depend on both the preceding and the following moments. A bidirectional LSTM (BiLSTM) is an LSTM network that processes data in both directions. Because it captures bidirectional dependencies, a BiLSTM extracts time-series features better than an LSTM, which makes it a good tool for behavioural data. BiLSTM networks nevertheless fall short in capturing spatial features, and the vanishing-gradient problem worsens as more layers are stacked during training. In 2015, a group from Microsoft Research developed the residual network (ResNet) to address the vanishing-gradient problem [40]. That same year, a 152-layer ResNet won the ILSVRC competition. Figure 5 shows the structure of a residual block, which can be expressed as:
$$x_{i+1} = x_i + F(x_i, W_i)$$ (6)
Each residual block consists of two parts: the direct mapping $x_i$ and the residual part $F(x_i, W_i)$. The encoder component of the Transformer model uses the same structure. Building on the strengths of the BiLSTM network, our study presents a residual structure based on this architecture.
Normalisation techniques can also be employed in BiLSTM networks. Layer normalisation (LN) is computed in a similar way to batch normalisation (BN) and brings the same benefits to recurrent neural networks [20]:
$$\hat{x}^{(i)} = \frac{x^{(i)} - E\left[x^{(i)}\right]}{\sqrt{\mathrm{var}\left[x^{(i)}\right]}}$$ (7)
In this context, $x^{(i)}$ denotes the input vector corresponding to the $i$-th dimension, while $\hat{x}^{(i)}$ represents the normalized output obtained after applying layer normalization. In this study, a novel architecture that integrates a residual connection with layer normalization within a BiLSTM network is introduced. This combined framework is referred to as Res-BiLSTM, and its overall structure is illustrated in Figure 6. The recursive feature information $y$ can be described as:
$$x_t^{f(i+1)} = LN\left(x_t^{f(i)} + L\left(x_t^{f(i)}, W_i\right)\right)$$ (8)
$$x_t^{b(i+1)} = LN\left(x_t^{b(i)} + L\left(x_t^{b(i)}, W_i\right)\right)$$ (9)
$$y_t = \mathrm{concat}\left(x_t^{f}, x_t^{b}\right)$$ (10)
Here, $LN$ denotes layer normalization and $L$ denotes the processing of the input states by the LSTM network. The subscript $t$ in $x_t^{f(i+1)}$ denotes the $t$-th time step of the input time series, the superscript $f$ marks the forward hidden state, and $b$ marks the backward hidden state. The superscript index $i$ identifies the stacked layer in the network. The encoded representation $y_t$ at time $t$ is obtained by combining information from both the forward and backward states. As illustrated in Figure 7, the proposed Res-BiLSTM architecture consists of parallel forward and backward LSTM networks that jointly capture temporal dependencies in both directions.
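The data flow of Equations (8) to (10) can be sketched without a deep learning framework. In the sketch below, `toy_recurrent_layer` is a hypothetical stand-in for the LSTM $L(\cdot, W_i)$ (a simple tanh recurrence with a fixed weight), chosen only to keep the example short; the residual addition, layer normalisation, two directions, and final concatenation follow the equations. The small `eps` in `layer_norm` is an added assumption for numerical stability.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Eq. (7) applied to one feature vector (eps added for stability)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def toy_recurrent_layer(sequence, weight=0.5):
    """Hypothetical stand-in for an LSTM layer: per-feature tanh recurrence."""
    state = [0.0] * len(sequence[0])
    outputs = []
    for x in sequence:
        state = [math.tanh(weight * s + xi) for s, xi in zip(state, x)]
        outputs.append(state)
    return outputs

def residual_ln_layer(sequence, layer):
    """x^(i+1) = LN(x^(i) + L(x^(i))) at every time step (Eqs. 8 and 9)."""
    transformed = layer(sequence)
    return [layer_norm([a + b for a, b in zip(x, h)])
            for x, h in zip(sequence, transformed)]

def res_bidirectional(sequence, num_layers=2, layer=toy_recurrent_layer):
    """Stack residual layers in both time directions, then concatenate (Eq. 10)."""
    forward = sequence
    backward = sequence[::-1]            # process time in reverse
    for _ in range(num_layers):
        forward = residual_ln_layer(forward, layer)
        backward = residual_ln_layer(backward, layer)
    backward = backward[::-1]            # re-align with forward time order
    return [f + b for f, b in zip(forward, backward)]
```

Each output vector `y_t` has twice the input feature width, because the forward and backward states are concatenated rather than summed.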

3.6. Marine Predators’ Algorithm for Training the Weights and Hyperparameter Tuning

The MPA is a metaheuristic that searches for optimal solutions by simulating the foraging behaviour of marine predators. Their foraging tactics are based on two random walks, Brownian motion and Lévy flight, which are described mathematically below.
(1)
BROWNIAN MOVEMENT
In this stochastic process, the step lengths are drawn from a probability function defined by a Gaussian distribution. Its probability density function at point $x$ is:
$$f_B(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$ (11)
where $\mu = 0$ and $\sigma^2 = 1$.
(2)
LEVY FLIGHT
The step sizes of this random walk follow the Lévy distribution, which can be expressed as:
$$L(x_j) \approx |x_j|^{1-\alpha}$$ (12)
where $x_j$ stands for the flight length and $1 < \alpha \le 2$ is the power-law exponent. The integral form of the Lévy stable distribution is given in [2]:
$$f_L(x; \alpha, \gamma) = \frac{1}{\pi} \int_0^{\infty} \exp\left(-\gamma q^{\alpha}\right) \cos(xq)\, dq$$ (13)
where $\alpha$ is the distribution index, which controls the scale properties of the model, and $\gamma$ sets the scale unit. Equation (13) has a closed-form solution in two cases: for $\alpha = 2$ it gives a Gaussian distribution, and for $\alpha = 1$ it gives a Cauchy distribution. For large $x$, the integral in Equation (13) is evaluated with a series expansion:
$$f_L(x; \alpha, \gamma) \approx \frac{\gamma\, \Gamma(1+\alpha) \sin\left(\frac{\pi\alpha}{2}\right)}{\pi x^{(1+\alpha)}}, \quad x \to \infty$$ (14)
where $\Gamma$ stands for the Gamma function, with $\Gamma(1+\alpha) = \alpha!$ for integer $\alpha$. The Mantegna algorithm (Mantegna, 1994) was proposed to generate Lévy stable random numbers with a distribution index $\alpha$ in the range 0.3 to 1.99. It produces Lévy-distributed random numbers as:
$$Levy(\alpha) = 0.05 \times \frac{x}{|y|^{1/\alpha}}$$ (15)
where $x$ and $y$ are normally distributed variables with standard deviations $\sigma_x$ and $\sigma_y$, respectively:
$$x = \mathrm{Normal}\left(0, \sigma_x^2\right)$$ (16)
$$y = \mathrm{Normal}\left(0, \sigma_y^2\right)$$ (17)
and $\sigma_x$ is given by Equation (18):
$$\sigma_x = \left[\frac{\Gamma(1+\alpha) \sin\left(\frac{\pi\alpha}{2}\right)}{\Gamma\left(\frac{1+\alpha}{2}\right) \alpha\, 2^{\frac{\alpha-1}{2}}}\right]^{1/\alpha}, \quad \sigma_y = 1, \quad \alpha = 1.5$$ (18)
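The two random walks can be sampled with the standard library alone. The sketch below follows Equations (15) to (18) for the Lévy step and draws a standard normal sample for the Brownian step; the function names and the optional `rng` argument are illustrative choices.

```python
import math
import random

def mantegna_sigma(alpha=1.5):
    """Standard deviation sigma_x from Eq. (18)."""
    num = math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
    den = math.gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2)
    return (num / den) ** (1 / alpha)

def levy_step(alpha=1.5, rng=random):
    """One Levy-distributed step via the Mantegna method (Eqs. 15-17)."""
    x = rng.gauss(0.0, mantegna_sigma(alpha))   # x ~ Normal(0, sigma_x^2)
    y = rng.gauss(0.0, 1.0)                     # y ~ Normal(0, 1), sigma_y = 1
    return 0.05 * x / abs(y) ** (1 / alpha)

def brownian_step(rng=random):
    """One Brownian step: a standard normal sample (Eq. 11)."""
    return rng.gauss(0.0, 1.0)
```

For $\alpha = 1.5$, `mantegna_sigma` evaluates to roughly 0.697. Most Lévy steps are small, with occasional long jumps whenever `y` falls close to zero.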
(3)
MPA FORMULATION
Like other population-based metaheuristic algorithms, the MPA starts from an initial population distributed over the search space:
$$X_0 = X_{\min} + rand\left(X_{\max} - X_{\min}\right)$$ (19)
where $X_{\min}$ and $X_{\max}$ are the lower and upper bounds of the variables, respectively, and $rand$ is a vector of uniformly distributed random numbers between 0 and 1. According to the idea of natural selection, the fittest predators are also the best foragers (Viswanathan et al., 1999). The fittest predator therefore represents the best solution and is used to build the Elite matrix:
$$Elite = \begin{bmatrix} X_{1,1}^{I} & X_{1,2}^{I} & \cdots & X_{1,d}^{I} \\ X_{2,1}^{I} & X_{2,2}^{I} & \cdots & X_{2,d}^{I} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1}^{I} & X_{n,2}^{I} & \cdots & X_{n,d}^{I} \end{bmatrix}_{n \times d}$$ (20)
where $X^{I}$ is the fittest predator vector, which is replicated $n$ times to build the Elite matrix; $n$ is the number of search agents and $d$ is the number of dimensions. The rows of the Elite matrix search for prey based on the prey's positions. Both the predator and the prey are treated as search agents. Next, we create a second matrix, Prey, with the same dimensions as the Elite matrix:
$$Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,d} \end{bmatrix}_{n \times d}$$ (21)
where $X_{i,j}$ denotes the $j$-th dimension of the $i$-th prey. The predators update their positions based on the Prey matrix, and the whole optimisation process of the MPA relies on the Elite and Prey matrices.
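A minimal sketch of the initialisation in Equations (19) to (21) follows. It assumes a minimisation problem, so "fittest" means lowest cost; the function names and the `fitness` callable are hypothetical.

```python
import random

def init_prey(n, lower, upper, rng=random):
    """Eq. (19): n agents drawn uniformly between per-dimension bounds."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n)]

def build_elite(prey, fitness):
    """Eq. (20): replicate the fittest agent n times.

    Assumption: lower fitness is better (minimisation).
    """
    best = min(prey, key=fitness)
    return [list(best) for _ in prey]
```

With `fitness = lambda p: sum(v * v for v in p)`, for instance, every row of the Elite matrix would be a copy of the agent closest to the origin.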
(4)
Optimization scenarios in the MPA
The three main optimization stages of the MPA algorithm are based on different velocity ratios, simulating the entire life cycle of a predator and prey. The three stages are described in the following way:
First phase (Exploration phase): In this phase the prey moves rapidly, using Brownian motion to search for food, while the predator remains motionless and watches the prey move. The exploration phase takes place during the first third of the iterations and is expressed mathematically as [12]:
while Iteration < $\frac{1}{3}$ Maximum_Iteration:
$$\mathrm{stepsize}_i = R_{Br} \otimes \left(Elite_i - R_{Br} \otimes Prey_i\right), \quad i = 1, \ldots, n$$
$$Prey_i = Prey_i + P \cdot R \otimes \mathrm{stepsize}_i$$ (22)
Here, $R_{Br}$ is a vector of random numbers drawn from a Gaussian distribution and represents the Brownian movement. The symbol $\otimes$ denotes entry-by-entry multiplication. The product $R_{Br} \otimes Prey_i$ represents the movement of the prey. $P$ is a constant with a value of 0.5, and $R$ is a vector of uniformly distributed random numbers in the range 0 to 1. Maximum_Iteration is the total number of iterations allowed, whereas Iteration is the current iteration number.
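The first-phase update of Equation (22) can be written as a short sketch. Drawing a fresh Brownian and uniform number for every entry is an implementation assumption; the `rng` argument is only there so the random source can be swapped out.

```python
import random

def phase_one_update(prey, elite, P=0.5, rng=random):
    """Eq. (22): Brownian exploration step for every agent."""
    new_prey = []
    for prey_i, elite_i in zip(prey, elite):
        row = []
        for p, e in zip(prey_i, elite_i):
            r_b = rng.gauss(0.0, 1.0)      # Brownian entry of R_Br
            r = rng.random()               # uniform entry of R
            step = r_b * (e - r_b * p)     # stepsize_i
            row.append(p + P * r * step)
        new_prey.append(row)
    return new_prey
```

If every random draw happened to equal 1, the update would move each prey halfway towards its Elite counterpart, which gives a quick sanity check of the formula.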
Second phase (Transition phase between exploration and exploitation): In this stage, the predator and the prey move at almost the same pace, and the search shifts gradually from exploration to exploitation. The prey exploits using Lévy flight, whereas the predator explores using Brownian motion. Accordingly, the population is split into two equal halves, one responsible for exploration and the other for exploitation [4]:
while $\frac{1}{3}$ Maximum_Iteration < Iteration < $\frac{2}{3}$ Maximum_Iteration:
For the first half of the population,
$$\mathrm{stepsize}_i = R_{Lv} \otimes \left(Elite_i - R_{Lv} \otimes Prey_i\right), \quad i = 1, \ldots, n/2$$
$$Prey_i = Prey_i + P \cdot R \otimes \mathrm{stepsize}_i$$ (23)
Here, $R_{Lv}$ is a vector of random numbers drawn from the Lévy distribution and represents the Lévy flight. The product $R_{Lv} \otimes Prey_i$ models the Lévy motion of the prey, and adding the step size to the prey's position simulates the prey's movement. Because Lévy step sizes are mostly small, they are helpful for exploitation. For the second half of the population:
$$\mathrm{stepsize}_i = R_{Br} \otimes \left(R_{Br} \otimes Elite_i - Prey_i\right), \quad i = n/2, \ldots, n$$
$$Prey_i = Elite_i + P \cdot CF \otimes \mathrm{stepsize}_i$$ (24)
where $CF = \left(1 - \frac{Iteration}{Maximum\_Iteration}\right)^{\left(2 \frac{Iteration}{Maximum\_Iteration}\right)}$ is the convergence factor (CF), which adaptively adjusts the predator's step size and so helps predators manage their search space during exploitation. The product $R_{Br} \otimes Elite_i$ models the predator's Brownian motion, and the prey's positions are updated based on this predator motion.
Third phase (Exploitation phase):
In this stage, the predator moves faster than its prey and uses Lévy flight to exploit the search space and capture the prey. This stage takes place during the final third of the iterations and is expressed mathematically as [5]:
while Iteration > $\frac{2}{3}$ Maximum_Iteration:
$$\mathrm{stepsize}_i = R_{Lv} \otimes \left(R_{Lv} \otimes Elite_i - Prey_i\right), \quad i = 1, \ldots, n$$
$$Prey_i = Elite_i + P \cdot CF \otimes \mathrm{stepsize}_i$$ (25)
The product $R_{Lv} \otimes Elite_i$ represents the predator's motion under the Lévy strategy. Adding the step size to the Elite position simulates the predator's movement, which in turn updates the prey's position.
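Equations (24) and (25) share the same form and differ only in the random walk they use, so one sketch covers both. The caller supplies `random_step`, a function returning a Brownian sample for Equation (24) or a Lévy sample for Equation (25); that parameterisation, like the function names, is an illustrative assumption.

```python
def convergence_factor(iteration, max_iteration):
    """CF = (1 - t/T) ** (2 * t/T): shrinks the step size as iterations advance."""
    ratio = iteration / max_iteration
    return (1.0 - ratio) ** (2.0 * ratio)

def elite_based_update(prey_i, elite_i, cf, random_step, P=0.5):
    """Eqs. (24)/(25): step = R * (R * Elite - Prey); Prey = Elite + P * CF * step."""
    new_row = []
    for p, e in zip(prey_i, elite_i):
        r = random_step()              # Brownian (Eq. 24) or Levy (Eq. 25) sample
        step = r * (r * e - p)
        new_row.append(e + P * cf * step)
    return new_row
```

`convergence_factor` equals 1 at the first iteration and falls to 0 at the last, so the steps taken around the Elite position become progressively smaller.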
(5)
Eddy formation with the effect from FADs
Environmental effects such as eddy formation and Fish Aggregating Devices (FADs) have a significant impact on predator behaviour in the ocean. According to [1], sharks spend most of their time close to FADs; the rest of the time they take longer jumps in different dimensions to find areas with a different distribution of prey. These long jumps improve the algorithm's performance by preventing the MPA from stagnating at local optima. The FADs effect is modelled as:
$$Prey_i = \begin{cases} Prey_i + CF\left[X_{\min} + R \otimes \left(X_{\max} - X_{\min}\right)\right] \otimes U & \text{if } r \le FADs \\ Prey_i + \left[FADs(1 - r) + r\right]\left(Prey_{r1} - Prey_{r2}\right) & \text{if } r > FADs \end{cases}$$ (26)
where FADs = 0.2 is the probability of the FADs effect, $U$ is a binary vector whose entries are 0 or 1, and $r$ is a uniformly distributed random number between zero and one. The subscripts $r1$ and $r2$ denote random indices of the prey matrix.
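Equation (26) can be sketched as follows. The text does not say how the binary vector $U$ is drawn, so the sketch assumes each entry is 1 with probability FADs; the function name and the `rng` argument are likewise illustrative.

```python
import random

def fads_update(prey, lower, upper, cf, fads=0.2, rng=random):
    """Eq. (26): perturb each agent to help it escape local optima."""
    n = len(prey)
    new_prey = []
    for prey_i in prey:
        r = rng.random()
        if r <= fads:
            row = []
            for p, lo, hi in zip(prey_i, lower, upper):
                # Assumption: U entry is 1 with probability `fads`, else 0.
                u = 1 if rng.random() < fads else 0
                row.append(p + cf * (lo + rng.random() * (hi - lo)) * u)
        else:
            # Long jump along the difference of two randomly chosen agents.
            r1, r2 = rng.randrange(n), rng.randrange(n)
            scale = fads * (1.0 - r) + r
            row = [p + scale * (a - b)
                   for p, a, b in zip(prey_i, prey[r1], prey[r2])]
        new_prey.append(row)
    return new_prey
```

Because the first branch is scaled by CF, its perturbations fade as the run approaches the final iteration, while the second branch depends only on the spread of the current population.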
(6)
Memory of the marine predators
Marine predators have a remarkable memory for highly productive feeding sites, which allows them to reach optimal solutions quickly while avoiding local ones. MPA incorporates this feature by comparing the current solutions with those from the previous iteration and keeping the better of the two. The MPA pseudo-code is given in Algorithm 1:
Algorithm 1: Steps of MPA
Initialize a set of N solutions U.
while stop conditions are not met do
    Calculate fitness values and generate the Elite matrix.
    if $t < t_{max}/3$ then
        Apply Equation (22) to update solution values;
    else if $t_{max}/3 < t < 2 \times t_{max}/3$ then
        for the first half of the solutions ($i = 1, \dots, n/2$):
            Apply Equation (23) to update solution values;
        for the second half of the solutions ($i = n/2 + 1, \dots, n$):
            Apply Equation (24) to update solution values;
    else if $t > 2 \times t_{max}/3$ then
        Apply Equation (25) to update solution values;
    end if
    Apply Equation (26) (FADs effect) to update the current solutions.
    Update memory and Elite.
end while
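The phase logic of Algorithm 1 can be summarised by a small dispatch function. This is only a sketch: the function name is hypothetical, and how the exact boundary iterations ($t = t_{max}/3$ and $t = 2 t_{max}/3$) are assigned is an assumption, because the pseudo-code uses strict inequalities and leaves those cases open.

```python
def equation_for(t, t_max, i, n):
    """Return the number of the update equation applied to agent i at iteration t.

    i is a 1-based agent index and n is the population size.
    Boundary iterations are assigned to the later phase (an assumption).
    """
    if t < t_max / 3:
        return 22  # first phase: every agent uses Equation (22)
    if t < 2 * t_max / 3:
        # second phase: the population is split into two halves
        return 23 if i <= n // 2 else 24
    return 25  # third phase: every agent uses Equation (25)
```

For example, with `t_max = 90` and `n = 10`, agent 3 uses Equation (23) at iteration 45 while agent 8 uses Equation (24) at the same iteration. The FADs step of Equation (26) and the memory update are applied after this dispatch in every iteration.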
Equations (27)–(29) give the transfer functions used in the three binary forms of MPA: two S-shaped variants (MPAs and MPA10) and one V-shaped variant (MPAV).
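$$TF = \frac{1}{1 + e^{-\left(X(i,j) - 0.5\right)}} \qquad (27)$$
$$TF = \frac{1}{1 + e^{-10\left(X(i,j) - 0.5\right)}} \qquad (28)$$
$$TF = \left|\frac{2}{\pi}\tan^{-1}\left(\frac{\pi}{2}\,X(i,j)\right)\right| \qquad (29)$$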
where $X(i,j)$ is the $j$th dimension of the $i$th solution and $TF$ is the value of the transfer function. Each solution is then updated by comparing the $TF$ value with a randomly generated number in the range of 0 to 1.
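The three transfer functions and the comparison step can be sketched as follows. The sketch reads Equations (27) and (28) as sigmoids centred at 0.5 and Equation (29) as the absolute value of a scaled arctangent. The simple thresholding rule in `binarize` (a feature is selected when the random number falls below TF) is an assumption, as the text does not spell out the exact update rule, and positions are assumed to stay in a bounded range so that `math.exp` does not overflow.

```python
import math
import random


def tf_s(x):
    """S-shaped transfer function, Equation (27)."""
    return 1.0 / (1.0 + math.exp(-(x - 0.5)))


def tf_s10(x):
    """Steeper S-shaped transfer function, Equation (28)."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))


def tf_v(x):
    """V-shaped transfer function, Equation (29)."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * x))


def binarize(solution, transfer, rng=None):
    """Map a continuous solution to a 0/1 feature mask.

    Assumed rule: dimension j is selected (1) when a uniform random
    number in [0, 1) is smaller than TF(X(i, j)), otherwise it is dropped (0).
    """
    rng = rng or random.Random()
    return [1 if rng.random() < transfer(x) else 0 for x in solution]


# Usage: turn one continuous search agent into a feature-selection mask.
mask = binarize([0.1, 0.9, 0.5, 0.7], tf_s10, rng=random.Random(42))
```

The steeper slope in Equation (28) pushes positions above 0.5 towards selection far more decisively than Equation (27), which is the practical difference between the two S-shaped variants.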
(7)
Computational complexity
The proposed model operates in two phases. Phase one is feature extraction using the Res-BiLSTM model. Phase two is feature optimization, in which MPA and its variants select the most relevant features to boost accuracy, with classification handled by SVM [15] and the random forest (RF) algorithm [5]. The Res-BiLSTM model has 1.5 million parameters, which were trained on the datasets in question. The complexity of feature optimization, denoted $T(F)$, is given by Equation (30).
$$T(F) = O\left(t_{max} \times \left(N_s \times d + C_{FE} \times N_s\right)\right) \qquad (30)$$
where $N_s$ is the total number of search agents and $d$ is the dimension, i.e. the number of features. The cost of function evaluation, abbreviated as $C_{FE}$, is classifier-dependent. The training time for the SVM algorithm is $O(N_{TE}^2)$, while the training time for the RF algorithm is $O(N_{TR} \times N_{TE} \log N_{TE} \times d)$. Here, $N_{TE}$ is the number of training instances and $N_{TR}$ is the number of trees in the RF algorithm.

4. Results

This section first describes the experimental setup: how the various models were constructed and parameterised, how the datasets were configured, and the specifications of the machine used for the experiments. We then present and discuss the results obtained on each dataset and compare the outcomes of the various models. The proposed model architecture was implemented in TensorFlow [32] through the Keras API. TensorFlow serves as both an interface for expressing machine learning algorithms and an implementation for executing them. To speed up training on GPUs, we use TensorFlow 2.4.0, which supports eager execution. The Keras application programming interface (API) for TensorFlow simplifies the construction of artificial neural networks (ANNs) by hiding the underlying complexity.

4.1. Performance Measures

The proposed model is assessed on the HAR problem and compared with other models using three publicly available benchmark datasets; each dataset is described in more detail alongside its results. To keep comparisons consistent and meaningful, all models were trained and evaluated on identical train, validation, and test sets. These datasets have been public for some time and have been used in many other studies, with competing claims of superiority [18,19,23,25,26]. To establish a more consistent benchmark for deep learning applications and to allow comparison with newer methods, we prepared the dataset splits carefully in our experiments. For instance, since all datasets contain subject-specific information, we took care that the training and test sets never contain data from the same subjects. The DAPHNET dataset is an exception: because data belonging to the Freeze class is scarce, we used data from some of the same users in both the test and validation sets, although this data came from distinct experiments or drills. The subsections that follow cover a few other factors.
A common problem with data on human activities gathered in natural settings is inherent class imbalance: some classes have a large number of samples and others very few. Among our three datasets, the UCI HAR dataset is the most evenly distributed, with 13% of the training samples in the smallest class and 19% in the largest; the test and validation sets are similarly distributed. The Opportunity dataset is severely skewed towards the Drink from Cup class, which accounts for over 23% of the training data, compared with 2% for Close Drawer 2.
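The subject-wise separation described above can be sketched as follows. The subject identifiers are the UCI HAR assignments listed in Table 6; the function name and the `(subject_id, window, label)` record layout are hypothetical and chosen only for illustration.

```python
# Subject assignments for the UCI HAR dataset, taken from Table 6.
TRAIN_SUBJECTS = {1, 3, 5, 6, 11, 14, 15, 16, 17, 19, 21, 22, 23, 28, 29, 30}
TEST_SUBJECTS = {2, 9, 10, 13, 18, 24}
VALIDATION_SUBJECTS = {4, 12, 20}


def split_by_subject(records):
    """Route (subject_id, window, label) records into train/test/validation.

    Records from subjects that appear in none of the three sets are skipped,
    so no subject can contribute to more than one split.
    """
    splits = {"train": [], "test": [], "validation": []}
    for subject_id, window, label in records:
        if subject_id in TRAIN_SUBJECTS:
            splits["train"].append((window, label))
        elif subject_id in TEST_SUBJECTS:
            splits["test"].append((window, label))
        elif subject_id in VALIDATION_SUBJECTS:
            splits["validation"].append((window, label))
    return splits


# Usage with three toy records (the windows are placeholders).
demo = split_by_subject([(1, [0.1], "walk"), (2, [0.2], "sit"), (4, [0.3], "lay")])
```

Splitting by subject rather than by window prevents overlapping windows from the same person from leaking between the training and test sets, which would otherwise inflate the reported accuracy.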
The most straightforward metric for gauging a model's efficacy on a dataset is its accuracy, defined as the proportion of correctly predicted observations relative to the total number of observations. A high accuracy might suggest that the proposed model is ideal. However, accuracy is a reliable metric only for symmetric datasets, where the numbers of false positives and false negatives are very close to one another. Consequently, other metrics should also be considered when assessing the model's performance. The model's accuracy (A) is calculated as:
$$A = \frac{TP + TN}{TP + FP + FN + TN} \qquad (31)$$
where TP and TN are the numbers of correctly predicted positive and negative observations, and FP and FN are the numbers of incorrectly predicted positive and negative observations, respectively. Because larger classes tend to be classified better than smaller ones, overall classification accuracy is dominated by the larger classes and is not the right metric on its own for evaluating performance [16]. The F1 score, F1, gives equal weight to the correct classification of each class and combines each class's recall and precision. Precision (P) is defined as the proportion of correctly predicted positive observations relative to the total number of predicted positive observations:
$$P = \frac{TP}{TP + FP} \qquad (32)$$
Recall (R), often referred to as sensitivity, measures the ability of a model to correctly identify positive instances. It is calculated as the ratio of correctly predicted positive observations to the total number of actual positive observations present in the dataset:
$$R = \frac{TP}{TP + FN} \qquad (33)$$
The F1 score is the harmonic mean of P and R. To account for class imbalance, the per-class scores are weighted according to each class's share of the samples:
$$F1 = 2 \times \frac{R \times P}{R + P} \qquad (34)$$
in which R denotes recall and P denotes precision. The performance of the proposed models was assessed using several key metrics: total parameter count, accuracy, F1 score [3], and categorical or binary cross-entropy loss. Notably, larger and more complex architectures do not always yield superior results: certain compact models achieved lower loss values, highlighting the efficiency of simpler designs.
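Equations (31)–(34) can be computed per class in a one-vs-rest fashion and then combined. The sketch below uses only the standard library; the function names are hypothetical, and weighting each class's F1 by its share of the true labels is one reading of the weighting described above.

```python
def class_counts(y_true, y_pred, label):
    """Return (TP, TN, FP, FN) for one class treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    tn = len(y_true) - tp - fp - fn
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Equation (31)."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Equation (32); defined as 0 when nothing was predicted positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Equation (33); defined as 0 when the class has no true samples."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Equation (34): harmonic mean of precision and recall."""
    return 2 * r * p / (r + p) if r + p else 0.0


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's sample share."""
    total = len(y_true)
    score = 0.0
    for label in set(y_true):
        tp, _, fp, fn = class_counts(y_true, y_pred, label)
        support = tp + fn
        score += (support / total) * f1_score(precision(tp, fp), recall(tp, fn))
    return score


# Usage on a toy three-class example.
y_true = ["walk", "walk", "sit", "sit", "sit", "stand"]
y_pred = ["walk", "sit", "sit", "sit", "stand", "stand"]
overall_f1 = weighted_f1(y_true, y_pred)
```

In this toy example, four of the six predictions are correct, yet the per-class view shows that the walk class is recalled only half of the time, which is the kind of detail that overall accuracy hides.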

4.1.1. UCI HAR Dataset

The UCI HAR smartphone dataset, introduced by [18], contains recordings of 30 participants performing basic activities of daily living (BADL) while carrying a waist-mounted smartphone equipped with inertial sensors. The objective is to classify six distinct activities using triaxial angular velocity and triaxial linear acceleration data, all sampled at a constant rate of 50 Hz. These activities comprise three static postures (standing, sitting, and lying) and three dynamic movements (walking, walking downstairs, and walking upstairs). The data were segmented with sliding windows of 2.56 seconds and 50% overlap, resulting in 128 readings per window. To ensure signal quality, preprocessing applied a median filter and a third-order low-pass Butterworth filter with a 20 Hz cutoff frequency for noise reduction. The acceleration signals were further decomposed into body acceleration and gravity components using another Butterworth low-pass filter. This preprocessing pipeline yields nine signal channels, which serve as input to the deep learning models.
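The windowing step described above (2.56 s windows at 50 Hz with 50% overlap, giving 128 readings per window) can be sketched as follows. The function name is hypothetical and the signal is a placeholder; the filtering stages are not reproduced here.

```python
def sliding_windows(signal, window_size, overlap):
    """Cut a sequence into fixed-width windows with fractional overlap.

    signal: a list of readings; overlap: fraction in [0, 1), e.g. 0.5 for 50%.
    Trailing readings that do not fill a whole window are dropped.
    """
    step = int(window_size * (1.0 - overlap))
    if step <= 0:
        raise ValueError("overlap must leave a positive step between windows")
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


SAMPLING_RATE_HZ = 50
WINDOW_SECONDS = 2.56
window_size = round(WINDOW_SECONDS * SAMPLING_RATE_HZ)  # 128 readings

# Usage: 320 placeholder readings (6.4 s of one channel) yield four windows.
readings = list(range(320))
windows = sliding_windows(readings, window_size, overlap=0.5)
```

With 50% overlap, consecutive windows start 64 readings apart, so each reading (apart from those at the edges) contributes to two windows.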
To create a reproducible and comparable benchmark, we used the datareader.py program from [34] to divide the dataset into three sets: train, test, and validation. These sets are organised by subject, as shown in Table 6. The goal was to make sure that all of our models produced the same, verifiable outcomes and that the trained models could be applied to any user. The minimum and maximum class percentages indicate that the dataset is balanced across all six classes. We analysed the UCI HAR dataset with several models, and the results are shown in Table 7. Compared with the other models, the proposed model achieves the best test accuracy (96.12%) and F1 score (95.15%). The CNN-LSTM network comes in second, with an 89% F1 score and an accuracy that is 3.61% lower.
The CNN-LSTM network uses more than twice as many parameters as the proposed model, whereas the vanilla LSTM network uses substantially fewer parameters than the CNN-LSTM network (about 85,000 vs. 1,300,000). Figure 8 and Figure 9 show the accuracy and loss of the proposed model during training and validation. We used a learning rate scheduler to lower the learning rate whenever training reached a plateau, and we trained all of the models for a maximum of 350 epochs with early stopping patience set to 100 epochs.
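The interplay between the plateau-based learning rate scheduler and early stopping can be illustrated with a small, framework-free sketch. The maximum of 350 epochs and the early stopping patience of 100 epochs come from the text; the initial learning rate, the scheduler patience, the reduction factor, and the minimum learning rate are assumptions, since they are not reported here. The function replays a given sequence of validation losses rather than training a model.

```python
def replay_training(val_losses, max_epochs=350, stop_patience=100,
                    lr=1e-3, lr_patience=10, lr_factor=0.5, min_lr=1e-6):
    """Replay validation losses and report the best loss and the (epoch, lr) history.

    lr, lr_patience, lr_factor and min_lr are assumed values for illustration.
    """
    best = float("inf")
    since_best = 0       # epochs since the validation loss last improved
    since_lr_drop = 0    # non-improving epochs since the last learning rate cut
    history = []
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr_drop = 0
        else:
            since_best += 1
            since_lr_drop += 1
            if since_lr_drop >= lr_patience:
                lr = max(lr * lr_factor, min_lr)  # plateau: cut the learning rate
                since_lr_drop = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break  # early stopping: no improvement for stop_patience epochs
    return best, history


# Usage: two improving epochs followed by a long plateau.
losses = [1.0, 0.9] + [0.95] * 30
best_loss, lr_history = replay_training(
    losses, stop_patience=20, lr=0.1, lr_patience=5, lr_factor=0.5
)
```

Setting the early stopping patience well above the scheduler patience, as in the reported 100-epoch setting, gives the scheduler several chances to lower the learning rate before training is abandoned.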
Figure 10 shows a comparison of the confusion matrices from all six of our models, revealing that the suggested model is the most effective at distinguishing between the classes. The sitting and standing classes are typically mistaken for one another in most models due to their shared properties.

4.1.2. Opportunity Dataset

The Opportunity dataset for activity recognition contains real-life actions recorded in a very sensor-rich setting. Using 72 sensors across 10 modalities, embedded in objects and the environment as well as worn on the body, it records data from 12 participants over 15 networked sensor systems. For these reasons, it is an excellent choice for comparing different activity recognition methods. We considered only the inertial measurement unit data in columns 38 to 134. We included the triaxial accelerometer, gyroscope, and magnetometer readings but excluded the quaternion readings, which left 77 channels (signals) to work with. The data were captured at 30 Hz, and extracting 3-second windows gave 90 samples per window.
The Opportunity dataset originally poses an 18-class classification problem; we remove the extra label known as the null class, reducing the set to 17 classes. Table 8 lists these. The "Drink from Cup" class contains the largest amount of data, whereas the "Close Drawer 1" and "Close Drawer 2" classes are the smallest, accounting for approximately 2.5% of the dataset apiece. This unequal distribution across the classes shows that the dataset is imbalanced. Table 9 summarises the outcomes of model training on the Opportunity dataset.
On this dataset, the proposed model outperforms the competition. With an F1 score of 93.23%, it performed better than the CNN-LSTM and stacked BiLSTM models. Its accuracy of 95.14% and loss of 0.23 are significantly better than those of the stacked LSTM and CNN-LSTM networks. Figure 11 compares the confusion matrices of our six models and shows that the proposed model is the best at distinguishing between the various classes. As with the UCI HAR dataset, the proposed model's parameter count is lower only than those of the ResNet and LSTM models. Figure 12 and Figure 13 show the loss and accuracy of the proposed model during training and validation, respectively.
We trained all of the models, including the one we presented, for a maximum of 160 epochs, with early stopping patience set at 20 epochs. When training reached a plateau, we utilised a learning rate scheduler to iteratively cut the learning rate. Due to their shared properties, the “Drawer” related classes in most models are easily confused with one another.

4.1.3. Daphnet

The Daphnet dataset was created to test automatic algorithms for identifying freezing of gait using acceleration sensors worn on the hips and legs. Freezing of gait (FOG), the sudden and temporary inability to walk, affects over half of the people with severe Parkinson's disease (PD). It greatly diminishes a person's quality of life and makes falls more likely. Successful non-pharmacologic treatments are of particular importance for PD patients' gait deficits because these deficits are frequently resistant to pharmacologic treatment. The original study set out to test the feasibility of a wearable device that could track a person's steps in real time, analyse the data, and then offer support according to user preferences. Its authors created a wearable FOG detector that detects FOG in real time and plays a signal from the moment it detects FOG until the person starts walking again. This wearable assistive device was assessed in a study including ten individuals with PD, in which expert physiotherapists identified 237 FOG occurrences through post-hoc video analysis.
The dataset was captured in a controlled laboratory environment in which a large number of freeze events were intentionally provoked. Users walked in a straight path, walked with many turns, and performed a more realistic activity of daily living (ADL) task that involved entering several rooms, retrieving coffee, unlocking doors, etc. Freeze and No Freeze are the two activities included in this dataset. The data were captured at 64 Hz and segmented using 3-second fixed-width sliding windows with 50% overlap, giving 192 readings per window. As input to the DL models, we used the nine accelerometer signals, i.e. the triaxial accelerometers on the ankle, upper leg, and trunk. Figure 14 compares the confusion matrices of the six models on the DAPHNET dataset and shows that the proposed model is the best at distinguishing between the classes.
To create a reproducible and comparable benchmark, we used the datareader.py file from [34] to divide the dataset into three parts: train, test, and validation. These parts are based on the experiments conducted for each participant, as shown in Table 10. The large difference between the percentages of the No Freeze and Freeze classes shows that the dataset is severely imbalanced. Table 11 summarises the outcomes of model training on the Daphnet dataset. On this dataset, too, the proposed model outperforms the competition. With an F1 score of 94.07%, it performs 3% better than the LSTM, ResNet, and stacked LSTM models. Its accuracy of 96.32% and loss of 0.25 are better than those of the stacked LSTM and CNN-LSTM networks. As on the UCI HAR dataset, the proposed model has fewer parameters than the CNN-LSTM model. Figure 15 and Figure 16 show the accuracy and loss of the proposed model during training and validation. We trained the proposed model and the other models for a maximum of 160 epochs, with early stopping patience set at 20 epochs and a learning rate scheduler that reduces the learning rate when training plateaus.
As a result of data imbalance, the "Freeze" class is under-detected by the majority of models. Of all the models compared, the proposed one is the most effective at identifying these freeze events.

5. Conclusions

Building on advances in deep learning and swarm intelligence techniques, this study addressed human activity recognition using publicly available data from wearable sensors. To address the HAR problem, we proposed a novel feature extraction strategy that uses a residual convolutional BiLSTM to extract pertinent features from sensor input. For feature selection, we used recent swarm intelligence algorithms, which have proven to be very effective in this area. In our evaluation, we compared the proposed approach with various optimization techniques on three publicly available benchmark datasets: Opportunity, UCI-HAR, and Daphnet. The results demonstrated that the proposed Res-BiLSTM with MPA achieved the highest performance, as measured by several performance indicators and statistical tests. It improved classification accuracy and outperformed numerous optimization algorithms as well as state-of-the-art DL models. Further research is needed to address other concerns related to feature development, such as making use of unlabeled data for easy deployment in real-time HAR applications and lowering computation costs.

Funding

No funding was received for conducting this study.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Abdel-Basset, M.; Hawash, H.; Chakrabortty, R. K.; Ryan, M.; Elhoseny, M.; Song, H. ST-DeepHAR: Deep learning model for human activity recognition in IoHT applications. IEEE Internet Things J. 2020, 8(6), 4969–4979.
  2. Zhou, X.; Liang, W.; Kevin, I.; Wang, K.; Wang, H.; Yang, L. T.; Jin, Q. Deep-learning-enhanced human activity recognition for Internet of healthcare things. IEEE Internet Things J. 2020, 7(7), 6429–6438.
  3. Islam, M. M.; Nooruddin, S.; Karray, F.; Muhammad, G. Multi-level feature fusion for multimodal human activity recognition in Internet of Healthcare Things. Inf. Fusion 2023, 94, 17–31.
  4. Javeed, M.; Abdelhaq, M.; Algarni, A.; Jalal, A. Biosensor-based multimodal deep human locomotion decoding via internet of healthcare things. Micromachines 2023, 14(12), 2204.
  5. Yu, J.; Zhang, J. Monitoring and analysis of physical activity and health conditions based on smart wearable devices. J. Intell. Fuzzy Syst. 2024, (Preprint), 1–16.
  6. Khalid, A. M.; Khafaga, D. S.; Aldakheel, E. A.; Hosny, K. M. Human Activity Recognition Using Hybrid Coronavirus Disease Optimization Algorithm for Internet of Medical Things. Sensors 2023, 23(13), 5862.
  7. Algethami, S. A.; Alshamrani, S. S. A Deep Learning-Based Framework for Strengthening Cybersecurity in Internet of Health Things (IoHT) Environments. Appl. Sci. 2024, 14(11), 4729.
  8. Priyadarshini, I.; Sharma, R.; Bhatt, D.; Al-Numay, M. Human activity recognition in cyber-physical systems using optimized machine learning techniques. Clust. Comput. 2023, 26(4), 2199–2215.
  9. Hamza, K.; Riaz, Q.; Imran, H. A.; Hussain, M.; Krüger, B. Generisch-Net: A Generic Deep Model for Analyzing Human Motion with Wearable Sensors in the Internet of Health Things. Sensors 2024, 24(19), 6167.
  10. Hemalatha, T.; Kalaiselvi, T. C.; Gnana Kousalya, C.; Rohini, G. Multimodal deep learning for activity detection from iot sensors. IETE J. Res. 2024, 70(5), 5006–5018.
  11. Gaud, N.; Rathore, M.; Suman, U. MHCNLS-HAR: Multi-Headed CNN-LSTM Based Human Activity Recognition Leveraging a Novel Wearable Edge Device for Elderly Health Care. IEEE Sens. J. 2024.
  12. Thakur, D.; Guzzo, A.; Fortino, G. Attention-based multihead deep learning framework for online activity monitoring with smartwatch sensors. IEEE Internet Things J. 2023, 10(20), 17746–17754.
  13. Menaka, S. R.; Prakash, M.; Neelakandan, S.; Radhakrishnan, A. A novel WGF-LN based edge driven intelligence for wearable devices in human activity recognition. Sci. Rep. 2023, 13(1), 17822.
  14. Al-qaness, M. A.; Dahou, A.; Trouba, N. T.; Abd Elaziz, M.; Helmi, A. M. TCN-Inception: Temporal Convolutional Network and Inception modules for sensor-based human activity recognition. In Future Generation Computer Systems; 2024.
  15. Wazwaz, A.; Amin, K.; Semary, N.; Ghanem, T. Dynamic and Distributed Intelligence over Smart Devices, Internet of Things Edges, and Cloud Computing for Human Activity Recognition Using Wearable Sensors. J. Sens. Actuator Netw. 2024, 13(1), 5.
  16. Waghchaware, S.; Joshi, R. Machine learning and deep learning models for human activity recognition in security and surveillance: a review. In Knowledge and Information Systems; 2024; pp. 1–32.
  17. Ashwin, M.; Jagadeesan, D.; Raman Kumar, M.; Murugavalli, S.; Chaitanya Krishna, A.; Ammisetty, V. Novel hybrid optimization based adaptive deep convolution neural network approach for human activity recognition system. In Multimedia Tools and Applications; 2024; pp. 1–25.
  18. Dahou, A.; Al-qaness, M. A.; Abd Elaziz, M.; Helmi, A. Human activity recognition in IoHT applications using arithmetic optimization algorithm and deep learning. Measurement 2022, 199, 111445.
  19. Issa, M. E.; Helmi, A. M.; Al-Qaness, M. A.; Dahou, A.; Abd Elaziz, M.; Damaševičius, R. Human activity recognition based on embedded sensor data fusion for the internet of healthcare things. In Healthcare; MDPI, June 2022; Vol. 10, No. 6.
  20. Bolhasani, H.; Mohseni, M.; Rahmani, A. M. Deep learning applications for IoT in health care: A systematic review. Inform. Med. Unlocked 2021, 23, 100550.
  21. Nagarajan, S. M.; Deverajan, G. G.; Chatterjee, P.; Alnumay, W.; Ghosh, U. Effective task scheduling algorithm with deep learning for Internet of Health Things (IoHT) in sustainable smart cities. Sustain. Cities Soc. 2021, 71, 102945.
  22. Helmi, A. M.; Al-Qaness, M. A.; Dahou, A.; Damaševičius, R.; Krilavičius, T.; Elaziz, M. A. A novel hybrid gradient-based optimizer and grey wolf optimizer feature selection method for human activity recognition using smartphone sensors. Entropy 2021, 23(8), 1065.
  23. Islam, M. M.; Nooruddin, S.; Karray, F. Multimodal human activity recognition for smart healthcare applications. In 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC); IEEE, October 2022; pp. 196–203.
  24. Thanarajan, T.; Alotaibi, Y.; Rajendran, S.; Nagappan, K. Improved wolf swarm optimization with deep-learning-based movement analysis and self-regulated human activity recognition. AIMS Math. 2023, 8(5), 12520–12539.
  25. Bhattacharya, D.; Sharma, D.; Kim, W.; Ijaz, M. F.; Singh, P. K. Ensem-HAR: An ensemble deep learning model for smartphone sensor-based human activity recognition for measurement of elderly health monitoring. Biosensors 2022, 12(6), 393.
  26. Jain, R.; Semwal, V. B. A novel feature extraction method for preimpact fall detection system using deep learning and wearable sensors. IEEE Sens. J. 2022, 22(23), 22943–22951.
  27. Hnoohom, N.; Chotivatunyu, P.; Mekruksavanich, S.; Jitpattanakul, A. Multi-resolution CNN for lower limb movement recognition based on wearable sensors. In International Conference on Multi-disciplinary Trends in Artificial Intelligence; Springer International Publishing: Cham, November 2022; pp. 111–119.
  28. Alonazi, M.; Alshahrani, H. M.; Kouki, F.; Almalki, N. S.; Mahmud, A.; Majdoubi, J. Deep convolutional neural network with symbiotic organism search-based human activity recognition for cognitive health assessment. Biomimetics 2023, 8(7), 554.
  29. Waghchaware, S.; Joshi, R. Machine learning and deep learning models for human activity recognition in security and surveillance: a review. In Knowledge and Information Systems; 2024; pp. 1–32.
  30. Bebortta, S.; Singh, S. K. An intelligent framework towards managing big data in internet of healthcare things. International conference on computational intelligence in pattern recognition, 2022, April; Springer Nature Singapore: Singapore; pp. 520–530.
  31. Khaled, H.; Abu-Elnasr, O.; Elmougy, S.; Tolba, A. S. Intelligent system for human activity recognition in IoT environment. In Complex & Intelligent Systems; 2021; pp. 1–12.
  32. Ronald, M.; Poulose, A.; Han, D. S. iSPLInception: an inception-ResNet deep learning architecture for human activity recognition. IEEE Access 2021, 9, 68985–69001.
  33. Fra, V.; Forno, E.; Pignari, R.; Stewart, T. C.; Macii, E.; Urgese, G. Human activity recognition: suitability of a neuromorphic approach for on-edge AIoT applications. Neuromorphic Comput. Eng. 2022, 2(1), 014006.
  34. Boudjema, A.; Titouna, F.; Titouna, C. AReNet: Cascade learning of multibranch convolutional neural networks for human activity recognition. Multimed. Tools Appl. 2024, 83(17), 51099–51128.
  35. Zhao, Y.; Wang, J.; Zhang, Y.; Liu, H.; Chen, Z. A.; Lu, Y.; Gao, S. Flexible and wearable EMG and PSD sensors enabled locomotion mode recognition for IoHT-based in-home rehabilitation. IEEE Sens. J. 2021, 21(23), 26311–26319.
  36. Zheng, G. A novel attention-based convolution neural network for human activity recognition. IEEE Sens. J. 2021, 21(23), 27015–27025.
  37. Ige, A. O.; Noor, M. H. M. A deep local-temporal architecture with attention for lightweight human activity recognition. Appl. Soft Comput. 2023, 149, 110954.
  38. Al-qaness, M. A.; Dahou, A.; Abd Elaziz, M.; Helmi, A. M. Human activity recognition and fall detection using convolutional neural network and transformer-based architecture. Biomed. Signal Process. Control 2024, 95, 106412.
  39. Uddin, M. A.; Talukder, M. A.; Uzzaman, M. S.; Debnath, C.; Chanda, M.; Paul, S.; Aryal, S. Deep learning-based human activity recognition using CNN, ConvLSTM, and LRCN. Int. J. Cogn. Comput. Eng. 2024, 5, 259–268.
  40. Choudhury, N. A.; Soni, B. In-depth analysis of design & development for sensor-based human activity recognition system. Multimed. Tools Appl. 2024, 83(29), 73233–73272.
Figure 1. An overall system architecture of human activity recognition.
Figure 2. 2.56-second acceleration waveform per activity.
Figure 3. Segmentation of sensor data.
Figure 4. The difference between a) 1DCNN and b) 2DCNN in human activity recognition.
Figure 5. Proposed Res-BiLSTM model architecture.
Figure 6. Residual block in ResNet.
Figure 7. The Res-BiLSTM network is composed of forward and backward LSTM networks, each LSTM is added to the residual structure and LN, and the final encoding information forward state and backward state are spliced.
Figure 8. Proposed model accuracy graph on the UCI HAR dataset.
Figure 9. Proposed model loss graph on the UCI HAR dataset.
Figure 10. Proposed model confusion matrices on the UCI HAR dataset.
Figure 11. Proposed model confusion matrices on the opportunity dataset.
Figure 12. Proposed model loss graph on the opportunity dataset.
Figure 13. Proposed model accuracy graph on the opportunity dataset.
Figure 14. Proposed model confusion matrices on the DAPHNET dataset.
Figure 15. Proposed model accuracy graph on the DAPHNET dataset.
Figure 16. Proposed model Loss graph on the DAPHNET dataset.
Table 1. Some state-of-the-art algorithms to solve HAR.
| Reference | Year | Method | Dataset | Results | Limitations |
| --- | --- | --- | --- | --- | --- |
| Thanarajan et al. [24] | 2023 | PSO-Optimized CNN | MHEALTH Dataset | Achieved 92.5% accuracy, reduced computational cost by 15% compared to standard CNNs, robust against noise. | High convergence time during optimization. |
| Bhattacharya et al. [25] | 2022 | GA-Enhanced LSTM | UCI HAR Dataset | Improved temporal prediction with 91.8% accuracy, reduced false positives by 12%. | Struggles with real-time data processing, affecting deployment in dynamic environments. |
| Jain et al. [26] | 2022 | Ant Colony Optimization (ACO) + CNN | WISDM Dataset | Enhanced minor activity detection, achieving F1-Score of 88.2%; computational overhead reduced by 10%. | Poor generalization to new sensor data; requires dataset-specific tuning. |
| Priyadarshini et al. [8] | 2024 | Firefly Algorithm + Bi-LSTM | Opportunity Dataset | Precision of 90.1%, detected overlapping activities effectively, handled long-term dependencies well. | Slow optimization and difficulty scaling to larger datasets. |
| Menaka et al. [13] | 2024 | PSO-Inception V3 | PAMAP2 Dataset | Delivered 93.2% accuracy, handled multi-sensor fusion efficiently, reduced energy consumption by 18%. | Susceptible to overfitting on small training sets; requires regularization. |
| Hnoohom et al. [27] | 2022 | Swarm-Based RNN | RealWorld HAR Dataset | Achieved recall of 89.4%, with efficient identification of rare activities, improved battery life by 10%. | Complexity leads to high computational power demands for wearable devices. |
| Alonazi et al. [28] | 2023 | Bee Colony Optimization + DNN | HAPT Dataset | Achieved 90.5% accuracy, 20% faster convergence than standard methods, robust for varying user habits. | Limited scalability when new sensors or data types are introduced. |
| Waghchaware et al. [29] | 2024 | Particle Swarm Optimization (PSO) + CNN | WISDM Dataset | Detected walking and running with 88.9% accuracy, reduced training time by 25%, low memory usage. | Lacks privacy-preserving mechanisms for wearable healthcare devices. |
| Bebortta et al. [30] | 2023 | Grey Wolf Optimizer (GWO) + GRU | UCI HAR Dataset | Enhanced sequential motion recognition with 91.2% accuracy, reduced latency to 20 ms per prediction. | Lower performance in high-noise scenarios, requiring pre-processing. |
Table 2. Details of the public datasets.
Dataset | Sensors | Sampling rate | Volunteers | Samples
UCI-HAR | A, G | 50 Hz | 30 | 748,206
DAPHNET | A | 20 Hz | 36 | 294,739
Opportunity | A, G, M, O, A, M | 30 Hz | 4 | 701,366
Table 3. Activities of UCI-HAR.
Activity | Samples | Percentage
Walk | 121,191 | 15.3%
Up | 117,607 | 14.6%
Down | 108,861 | 15.4%
Sit | 125,577 | 15.9%
Stand | 137,205 | 17.5%
Lay | 137,765 | 17.3%
Table 4. Activities of DAPHNET.
Activity | Samples | Percentage
Walk | 42,300 | 37.6%
Jog | 41,277 | 32.2%
Down | 21,769 | 10.2%
Up | 90,327 | 9.2%
Stand | 52,739 | 5.4%
Sit | 46,297 | 4.3%
Table 5. Activities of Opportunity.
Door 1 | Open Drawer 1
Door 2 | Close Drawer 1
Fridge 1 | Open Drawer 2
Fridge 2 | Close Drawer 2
Door 1 | Open Drawer 3
Door 2 | Close Drawer 3
Clean Table | Open Drawer 1
Drink from Cup | Open Drawer 1
Table 6. Splitting the UCI HAR dataset.
Set | Subjects | Total Samples | Min | Max
Train | 1, 3, 5, 6, 11, 14, 15, 16, 17, 19, 21, 22, 23, 28, 29, 30 | 7342 | 13.5% | 18.1%
Test | 2, 9, 10, 13, 18, 24 | 1946 | 14.2% | 17.2%
Validation | 4, 12, 20 | 990 | 13.2% | 18.2%
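The subject-wise split in Table 6 can be illustrated with a minimal sketch. The subject IDs are taken from the table; the function name `split_by_subject`, the toy data, and the assumed window shape (128 time steps, 9 channels) are illustrative and are not taken from the paper's implementation.

```python
import numpy as np

# Subject IDs copied from Table 6.
TRAIN_SUBJECTS = {1, 3, 5, 6, 11, 14, 15, 16, 17, 19, 21, 22, 23, 28, 29, 30}
TEST_SUBJECTS = {2, 9, 10, 13, 18, 24}
VAL_SUBJECTS = {4, 12, 20}


def split_by_subject(windows, labels, subject_ids):
    """Return {split name: (windows, labels)}, selecting rows by subject ID."""
    subject_ids = np.asarray(subject_ids)
    splits = {}
    for name, subjects in (("train", TRAIN_SUBJECTS),
                           ("test", TEST_SUBJECTS),
                           ("val", VAL_SUBJECTS)):
        mask = np.isin(subject_ids, list(subjects))
        splits[name] = (windows[mask], labels[mask])
    return splits


# Hypothetical toy data: 4 windows for each of subjects 1..30.
rng = np.random.default_rng(0)
subject_ids = np.repeat(np.arange(1, 31), 4)
windows = rng.normal(size=(subject_ids.size, 128, 9))  # assumed window shape
labels = rng.integers(0, 6, size=subject_ids.size)     # six activity classes

splits = split_by_subject(windows, labels, subject_ids)
```

Because every subject appears in at most one set, windows from the same person never occur in both training and evaluation data, which is the usual motivation for splitting by subject rather than by sample.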
Table 7. Quantitative evaluation on the UCI HAR dataset.
Method | Accuracy (%) | Precision (%) | F1-score (%) | Recall (%)
ResNet | 93.57 | 82.34 | 87.23 | 90.12
Inception | 92.38 | 90.23 | 89.14 | 91.23
CNN+PSO | 91.70 | 88.45 | 82.32 | 90.12
LSTM | 89.72 | 87.54 | 85.23 | 86.14
BiLSTM | 91.81 | 92.12 | 88.23 | 90.15
This Work | 96.12 | 96.24 | 95.15 | 94.26
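The four metrics reported in Table 7 can be computed from a confusion matrix, as in the sketch below. It assumes macro averaging over classes, which the table does not specify; the function name `macro_metrics` and the toy labels are illustrative only.

```python
import numpy as np


def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 (as fractions)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # samples predicted as each class
    actual = cm.sum(axis=1)     # samples that truly belong to each class

    # Classes with an empty denominator contribute 0 to the macro average.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Hypothetical toy labels for three classes.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

Multiplying each value by 100 gives percentages on the same scale as the table. With a different averaging scheme (for example weighted by class support) the precision, recall, and F1 values would change, while accuracy would not.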
Table 8. Splitting the Opportunity dataset.
Set | Subjects | Samples | Min | Max
Train | S1-2, S1-4, S1-5, S1-Drill, S2-1, S2-3, S2-4, S2-5, S3-4, S3-5, S4-1, S4-2, S4-Drill | 3015 | 2.5% | 22.1%
Test | S2-2, S2-Drill, S3-1, S4-5 | 1179 | 2.2% | 21.1%
Validation | S1-1, S3-2, S3-Drill, S4-4 | 1077 | 3.2% | 18.9%
Table 9. Quantitative evaluation on the Opportunity dataset.
Method | Accuracy (%) | Precision (%) | F1-score (%) | Recall (%)
ResNet | 82.24 | 81.13 | 79.23 | 80.45
Inception | 82.41 | 79.54 | 81.45 | 80.12
CNN+PSO | 77.79 | 75.12 | 77.16 | 75.24
LSTM | 82.82 | 79.45 | 81.34 | 80.23
BiLSTM | 80.90 | 80.14 | 79.45 | 78.23
This Work | 95.14 | 94.45 | 93.23 | 95.00
Table 10. Splitting the DAPHNET dataset and data disparity.
Set | Subjects | Number of Samples | Min | Max
Train | S1-1, S1-3, S3-1, S3-2, S6-1, S6-2, S7-1, S8-1, S9-1, S10-1 | 7935 | 91.3% | 8.6%
Test | S2-1, S4-1, S5-1 | 2322 | 91.8% | 7.1%
Validation | S2-2, S3-3, S5-1 | 1612 | 83.0% | 15.0%
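The class disparity shown in Table 10 (roughly 91% versus 9% in the training set) is often handled with inverse-frequency class weights. The sketch below shows that arithmetic for the training-set shares from the table. It is an assumption for illustration: the source does not state that the proposed model uses class weighting, and the function name `inverse_frequency_weights` is hypothetical.

```python
import numpy as np


def inverse_frequency_weights(class_shares):
    """Weight each class by 1 / (n_classes * share), so rarer classes weigh more."""
    shares = np.asarray(class_shares, dtype=float)
    shares = shares / shares.sum()  # renormalise the rounded percentages
    return 1.0 / (shares.size * shares)


# Training-set class shares (in %) from Table 10.
train_shares = np.array([91.3, 8.6])
train_weights = inverse_frequency_weights(train_shares)
```

With these shares the minority class receives a weight roughly ten times that of the majority class, and the share-weighted average of the weights equals 1, so the overall scale of a weighted loss stays unchanged.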
Table 11. Quantitative evaluation on the DAPHNET dataset.
Method | Accuracy (%) | Precision (%) | F1-score (%) | Recall (%)
ResNet | 91.97 | 90.23 | 93.00 | 89.12
Inception | 90.97 | 91.45 | 90.34 | 90.23
CNN+PSO | 94.22 | 94.67 | 93.10 | 92.27
LSTM | 88.65 | 86.78 | 88.67 | 91.34
BiLSTM | 91.41 | 90.89 | 92.02 | 90.37
This Work | 96.92 | 95.45 | 94.07 | 96.15
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
