1. Introduction
Effective navigation in crowded and dynamic environments, as illustrated in Figure 1, is critical for a wide range of real-world robotic applications, such as autonomous delivery robots maneuvering through busy urban streets, service robots operating in congested shopping malls, and search-and-rescue robots deployed in disaster-stricken areas. Traditional navigation methods, such as those based on Simultaneous Localization and Mapping (SLAM), which rely on pre-mapping of environments, often underperform in these complex settings. The main limitation of SLAM-based approaches is their dependence on static maps and the assumption that the environment remains relatively unchanging [1,2], resulting in frequent localization errors and suboptimal navigation paths in dynamic environments.
In response to this challenge, Deep Reinforcement Learning (DRL) has emerged as a promising solution. DRL’s ability to learn directly from interactions with the environment enables it to adapt to a wide variety of scenarios, making it particularly effective in unknown and dynamic environments [3,4,5]. Unlike conventional methods, DRL-based approaches do not require a predefined map, instead modeling navigation tasks as Partially Observable Markov Decision Processes (POMDPs) [6,7], an extension of Markov Decision Processes (MDPs). A POMDP is defined by the tuple (S, A, T, R, Ω, O, γ), where S represents the state space, A denotes the action space, T is the transition probability, R is the reward function, Ω is the observation space, O is the set of conditional observation probabilities, and γ is the discount factor [8,9,10]. DRL techniques, including specific algorithms such as Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO), as well as broader frameworks like Actor-Critic methods, have demonstrated significant potential in enabling robots to navigate dynamic environments without relying on pre-built maps. These approaches effectively handle the complex, sequential decision-making processes inherent in such tasks [11,12].
However, existing DRL-based mapless navigation solutions are often constrained by their reliance on complex feature extraction processes and sensor fusion strategies, which typically require systems with a wide field of view (FOV), as shown in [13,14,15,16]. Other methods are effective in relatively static environments, as shown in [17,18,19], but struggle in highly dynamic settings characterized by partial observability. In such environments, robots receive incomplete observations, which increases decision uncertainty. Moreover, traditional MDP-based approaches face significant challenges in handling large or continuous state and action spaces, often requiring function approximation techniques that can complicate model convergence and degrade real-time performance. The heavy reliance on handcrafted feature extraction methods exacerbates these issues, reducing the generalization capabilities of the learned models.
Navigating in environments with limited FOV introduces additional complexities. The restricted observational capacity hampers a robot’s ability to accurately characterize its state, thereby reducing situational awareness. Furthermore, high-dimensional state representations derived from multimodal sensing data pose substantial difficulties for conventional DRL algorithms, particularly when generating continuous action spaces. This can lead to overfitting, limiting the robot’s adaptability to unseen scenarios.
To address these challenges, this paper proposes a DRL-based method specifically for visual target-driven navigation in crowded environments with limited FOV. Our approach is grounded in key principles aimed at enhancing generalization and practical applicability. First, we minimize hardware requirements by employing a single RGB-D camera with limited FOV, thereby reducing dependency on complex localization techniques and enhancing deployment versatility across a broad spectrum of robotic platforms. This minimalist sensor setup not only reduces costs but also simplifies the overall system architecture, making it more robust and accessible for various applications.
Second, we mitigate the risk of overfitting by incorporating dynamic obstacles into the training process, ensuring that the learned strategies are robust and adaptable to changing conditions in real-world scenarios. By introducing a randomly positioned visual target during training, we prevent the model from becoming overly dependent on specific environmental configurations, thereby enhancing the generalization capabilities.
Furthermore, to tackle the challenge associated with target tracking within a limited FOV, our method integrates a self-attention network (SAN). The SAN infers positional information of the lost target based on past observations, enabling the robot to effectively search for the target.
At the core of our approach is the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, chosen for its effectiveness in handling continuous action spaces, which enables efficient navigation in crowded environments. TD3 addresses the challenges associated with traditional DRL approaches by introducing several innovations, including twin Q-networks and delayed policy updates, which improve stability and performance in complex environments.
In summary, this paper addresses gaps in the current literature on DRL-based navigation in crowded environments by proposing a practical solution that reduces both observational and computational demands while maximizing adaptability. Our contributions are threefold:
Introducing a novel DRL-based architecture for mapless navigation in unseen and dynamic environments, which relies exclusively on a visual target without requiring any environment modeling or prior mapping.
Integrating a SAN to enhance the robot’s ability to search for and track targets in environments with dynamic obstacles and randomly positioned targets.
Simplifying sensor configurations by eliminating the need for complex localization techniques, thereby ensuring broader applicability in diverse operational scenarios.
The remainder of this paper is organized as follows:
Section 2 provides a review of related work on DRL-based navigation, with emphasis on visual target-driven approaches.
Section 3 details the system architecture of the proposed method. In
Section 4, we present the navigation policy representation.
Section 5 introduces the training algorithms used for the proposed model.
Section 6 presents the experimental results and compares our approach with existing methods.
Section 7 discusses the experimental results. Finally,
Section 8 concludes the paper and discusses directions for future research.
2. Related Work
Current DRL-based navigation systems can be broadly classified into three categories based on the environmental context in which robots operate: map-based navigation, mapless navigation in static environments, and mapless navigation in dynamic environments.
2.1. Map-Based Navigation
In environments where a pre-built map is available, researchers have explored integrating DRL with conventional SLAM to enhance obstacle avoidance capabilities. For instance, Chen et al. [20] proposed a navigation system that combines a DQN-based planner with SLAM to improve navigation performance in dynamic environments by utilizing a pre-built costmap. Similarly, Yao et al. [21] developed a DRL-based local planner for maneuvering through crowded pedestrian areas using a pre-built map. Although these applications demonstrate improved obstacle avoidance, their reliance on static pre-built maps limits their adaptability to dynamic environments, reducing their generalization to unseen or evolving scenarios.
2.2. Mapless Navigation in Static Environments
In relatively static environments, DRL-based systems have been developed for target-driven navigation without the need for pre-built maps. Kulhánek et al. [22] demonstrated the efficacy of visual navigation in a static living room environment, using an LSTM network to leverage past actions and rewards to track targets. They introduced the A2C with Auxiliary Tasks for Visual Navigation (A2CAT-VN) framework, optimized for static indoor environments. Hsu et al. [23] proposed a vision-based DRL approach that enables a robot to navigate a large-scale static environment by combining the current state image and the last action. Zhu et al. [24] proposed a Deep Siamese Actor-Critic Network, which employs a dual-stream network to embed current observations and target images into a shared space for target-driven navigation, achieving successful navigation in indoor environments without a pre-built map. Kulhánek et al. [25] further integrated the Parallel Advantage Actor-Critic (PPAC) algorithm with an LSTM for indoor navigation without dynamic obstacles, while Wu et al. [26] incorporated information-theoretic regularization into an A3C framework to enable mapless navigation towards novel targets. Despite these successes, most existing DRL-based visual navigation systems are constrained by their application to relatively static environments. They often rely on complex network architectures and deep feature extraction, which are insufficient to handle navigation in highly dynamic and crowded settings.
2.3. Mapless Navigation in Dynamic Environments
Mapless navigation in highly dynamic and crowded environments poses distinct challenges. Researchers have made progress in achieving navigation in dynamic settings without relying on pre-built maps, though their solutions often depend heavily on sensor fusion and feature extraction techniques. For instance, Anas et al. [
13] employed odometry and 2D laser data to develop a collision probability concept, while Shi et al. [
14] introduced the notions of Traversability, VO Feasibility, and Survivability to guide navigation. Sun et al. [
15] proposed a spatial feature encoder incorporating risk-aware and attention-based feature extraction strategies. Despite these advancements, the heavy reliance on handcrafted features and sophisticated sensor setups, such as combinations of lidar and depth cameras [
14], complicates real-world deployment and increases the risk of overfitting to specific environments. Furthermore, most current approaches are tailored to predefined target positions, which can cause overfitting and limit their generalization to unseen scenarios.
In contrast to existing approaches, our method presents several key advantages. First, unlike map-based techniques, our solution enables real-time navigation without requiring prior mapping, making it both more flexible and scalable in unknown scenarios. Second, while most DRL-based methods using visual input are optimized for static environments, our system is specifically designed to operate in dynamic settings. Furthermore, compared to other DRL approaches in dynamic environments, which often rely on complex sensor fusion, we achieve efficient navigation using only a single RGB-D camera. Finally, our approach handles randomly positioned visual targets and reduces the need for handcrafted feature extraction, enhancing adaptability in real-world applications.
3. System Architecture
3.1. Overall System Design
The architecture of the proposed DRL-based robot navigation approach, as shown in
Figure 2, integrates several components, including depth data processing via a convolutional neural network (CNN), YOLO-based target detection, a SAN for temporal feature extraction, and the TD3 algorithm for policy learning. Collectively, these components form a robust framework for visual target-driven robot navigation in crowded environments.
3.2. Depth and Visual Data Processing
The system receives input from an onboard RGB-D camera with a 70-degree horizontal FOV, a compact setup compared with the 360-degree FOV assumed in many existing works [27]. The depth data (720 × 1280 pixels), representing the 3D structure of the environment, is processed through an 8-layer CNN, as shown in Figure 3. Each layer uses 3×3 kernels followed by 2×2 max-pooling to down-sample the data, producing a 2048×2×5 feature representation, which is then flattened and passed through a fully connected layer to yield a 1024-dimensional feature vector for the decision-making pipeline.
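For concreteness, a minimal PyTorch sketch of a depth encoder matching these dimensions is given below; the channel progression, padding, and activation choices are our assumptions and are not specified in the text.

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """8 conv blocks (3x3 conv + 2x2 max-pool) mapping a 720x1280 depth image
    to a 2048x2x5 feature map, then a fully connected layer to a
    1024-dimensional vector. Channel widths are assumed, not from the paper."""
    def __init__(self):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 256, 512, 1024, 2048]  # assumed progression
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
        self.backbone = nn.Sequential(*blocks)
        self.fc = nn.Linear(2048 * 2 * 5, 1024)

    def forward(self, depth):            # depth: (B, 1, 720, 1280)
        feat = self.backbone(depth)      # (B, 2048, 2, 5)
        return self.fc(feat.flatten(1))  # (B, 1024)
```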
In parallel, YOLO is employed to perform real-time target detection using RGB data captured by the same RGB-D camera. YOLO predicts bounding boxes and class probabilities in a single pass, allowing continuous target tracking as the robot navigates. Once the target’s bounding box is detected, the depth value at its center is retrieved from the depth map, enabling the system to estimate the target’s distance and enhance situational awareness during navigation.
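The target-distance estimate can be read directly from the aligned depth map at the bounding-box center, for example as in the short sketch below; the bounding-box format and variable names are illustrative assumptions.

```python
import numpy as np

def target_distance(depth_map: np.ndarray, bbox) -> float:
    """Return the depth (in meters) at the center of a detected bounding box.

    depth_map: (720, 1280) depth image aligned with the RGB frame.
    bbox: (x_min, y_min, x_max, y_max) in pixel coordinates (assumed format).
    """
    x_min, y_min, x_max, y_max = bbox
    u = int((x_min + x_max) / 2)   # horizontal center (column)
    v = int((y_min + y_max) / 2)   # vertical center (row)
    return float(depth_map[v, u])
```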
3.3. Self-Attention Network for Temporal Feature Processing
One key challenge posed by the robot’s limited 70-degree FOV is its restricted ability to perceive the entire environment, making it harder to track the target when it moves outside the visual range. To mitigate this, a SAN is employed to compensate for the limited FOV by extracting temporal and spatial dependencies from past observations. This approach is inspired by the demonstrated effectiveness of self-attention in capturing complex dependencies in previous research [28,29]. By learning patterns and relationships between actions and their outcomes over time, the SAN enables the robot to maintain situational awareness beyond its current FOV.
In this study, the SAN processes sequences of 20 past target positions and 20 corresponding actions and outputs a 128-dimensional feature vector, as illustrated in Figure 4. This allows the robot to “reconstruct” a broader understanding of the environment, enabling it to track the target’s location even when it temporarily falls outside the camera’s view. By leveraging these temporal patterns, the SAN improves decision-making in target pursuit in dynamic environments.
The effectiveness of SAN in our approach lies in its ability to dynamically prioritize relevant spatial and temporal features, allowing the robot to respond when the target is lost from its current view. The self-attention mechanism works by allowing the model to assign varying levels of importance to different parts of its past observations, such as past target locations and actions. This means that when key information, like the target’s location, is missing from the current FOV, the model can attend to previous observations that still hold useful data about the target’s movement patterns. By computing relationships between queries, keys, and values, the robot can better predict where the target is likely to be, even when direct visual input is unavailable. This dynamic allocation of attention enables the robot to maintain robust situational awareness and make more informed navigation decisions, even in challenging environments.
Mathematically, this attention mechanism is represented by the following formula:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$
where the robot’s input sequence (i.e., past target positions and actions) is transformed into three distinct matrices: Query (Q), Key (K), and Value (V), and $d_k$ denotes the dimensionality of the keys. Queries assess relevance, Keys represent the elements being attended to, and Values provide the actual data being aggregated. This allows the robot to focus on the most critical parts of its environment and adjust its behavior accordingly, despite the limited FOV.
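A minimal PyTorch sketch of such a temporal self-attention module is shown below; the embedding size, number of heads, and mean-pooling step are assumptions used only to illustrate how 20 past target positions and 20 past actions can be mapped to the 128-dimensional feature described above.

```python
import torch
import torch.nn as nn

class TemporalSAN(nn.Module):
    """Self-attention over a history of 20 (target position, action) pairs,
    producing a 128-dimensional feature vector. Hyperparameters are assumed."""
    def __init__(self, pos_dim=2, act_dim=2, embed_dim=128, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(pos_dim + act_dim, embed_dim)  # per-step token
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.out = nn.Linear(embed_dim, 128)

    def forward(self, past_targets, past_actions):
        # past_targets: (B, 20, pos_dim), past_actions: (B, 20, act_dim)
        tokens = self.embed(torch.cat([past_targets, past_actions], dim=-1))
        attended, _ = self.attn(tokens, tokens, tokens)  # Q = K = V = tokens
        return self.out(attended.mean(dim=1))            # (B, 128)
```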
3.4. Twin Delayed Deep Deterministic Policy Gradient (TD3)
Building on the feature extraction capabilities of the CNN and SAN, TD3 is employed as the core policy-learning algorithm to manage continuous action spaces in the crowd navigation task. TD3 enhances stability by using twin critics to reduce Q-value overestimation and delayed policy updates to improve training efficiency [30]. Its strength in continuous action optimization makes it effective for real-time obstacle avoidance in dynamic environments, as demonstrated by its integration with methods like the Dynamic Window Approach (DWA) for LiDAR-based navigation [31] and Long Short-Term Memory (LSTM) networks for path-following in autonomous systems [32]. In dynamic environments with unpredictable obstacle movement, TD3 is advantageous for managing a continuous action space for obstacle avoidance. Its twin critics compute conservative value estimates, addressing overestimation bias, which is particularly critical in environments where rapid changes occur. The TD3 critic loss is defined as:
$$L(\theta_i) = \mathbb{E}\left[\left(Q_{\theta_i}(s, a) - y\right)^2\right], \quad i = 1, 2,$$
where $Q_{\theta_i}(s, a)$ represents the action-value function approximated by the i-th Q-network, with $\theta_i$ denoting its parameters. The term $y$ refers to the target Q-value, which is calculated as the minimum value between the two target Q-networks, addressing the issue of overestimation. Specifically, the target Q-value $y$ is computed as follows:
$$y = r + \gamma \min_{i=1,2} Q_{\theta_i'}\!\left(s', \pi_{\phi'}(s') + \epsilon\right).$$
Here, $Q_{\theta_i'}$ represents the target Q-network, whose parameters $\theta_i'$ are slowly updated from the current Q-network parameters $\theta_i$. This gradual update process helps stabilize training by providing more consistent target Q-values. The reward r is obtained after executing action a in state s, s′ is the resulting next state, and γ is the discount factor. The target policy $\pi_{\phi'}$ is used to compute the next action, with ϵ being a small clipped noise added to smooth the target policy.
In addition to twin critics, TD3 implements delayed policy updates and soft target-network updates to further stabilize the learning process. These enhancements ensure more accurate Q-value estimation and robust training, setting TD3 apart from other reinforcement learning algorithms such as DDPG. TD3’s robustness in managing continuous action spaces, coupled with its enhancements for stability and efficiency, makes it perform well in crowd navigation.
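The clipped double-Q target described above can be computed as in the following sketch; the network handles and hyperparameter values (discount, noise scale, clipping range) are illustrative assumptions rather than the exact settings used in our experiments.

```python
import torch

def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Compute y = r + gamma * min(Q1', Q2')(s', a~) with target-policy
    smoothing noise; hyperparameter values are illustrative."""
    with torch.no_grad():
        next_action = target_actor(next_state)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        q_min = torch.min(target_q1(next_state, next_action),
                          target_q2(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_min
```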
4. Navigation Policy Framework
In this section, we present the design of the navigation policy for the proposed system.
4.1. Observation Space
In our proposed method, the observation feature representation, which is used as the input to the TD3 network, is composed of three distinct components, each providing crucial information for effective navigation. First, a 1024-dimensional vector is extracted from the depth image via a CNN. This vector captures spatial features of the environment, helping the robot identify obstacles and free spaces. Second, the depth value of the target object, represented as a scalar, is derived from the depth at the center point of the bounding box detected by YOLO, providing essential target distance information. Lastly, a 128-dimensional vector is produced by a SAN, which processes the historical context of the last 20 actions and the last 20 target positions. This vector captures temporal dependencies and relationships between past actions and target positions, enhancing the agent's ability to make informed decisions based on prior experiences. By integrating these three components, the TD3 network is equipped with comprehensive spatial and temporal information, allowing it to navigate complex environments and efficiently track the target.
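For illustration, the three components can be concatenated into a single observation vector as follows; this is a sketch, and the concatenation order is an assumption.

```python
import torch

def build_observation(depth_feat, target_depth, san_feat):
    """Concatenate the three observation components into one vector.

    depth_feat:   (B, 1024) CNN feature from the depth image
    target_depth: (B, 1)    depth at the center of the YOLO bounding box
    san_feat:     (B, 128)  temporal feature from the self-attention network
    returns:      (B, 1153) observation fed to the actor and critic networks
    """
    return torch.cat([depth_feat, target_depth, san_feat], dim=-1)
```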
4.2. Action Space
The action space is represented by a two-dimensional vector a = (v, ω), where v denotes the linear velocity and ω the angular velocity. Actions are selected by a deterministic policy π conditioned on the current observation o, i.e., a = π(o). The linear velocity v ranges from 0 to 0.55 m/s, controlling the forward speed, while the angular velocity ω ranges from -1.5 to 1.5 rad/s, allowing the robot to adjust its heading during navigation.
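A tanh-bounded actor output can be mapped to these limits as in the sketch below; the tanh parameterization itself is an assumption, while the velocity ranges follow the values stated above.

```python
import torch

V_MAX = 0.55   # m/s, forward linear velocity limit
W_MAX = 1.5    # rad/s, angular velocity limit

def scale_action(raw_action: torch.Tensor) -> torch.Tensor:
    """Map an actor output in [-1, 1]^2 to (v, omega) within the robot limits."""
    v = (raw_action[..., 0] + 1.0) * 0.5 * V_MAX  # [0, 0.55] m/s
    omega = raw_action[..., 1] * W_MAX            # [-1.5, 1.5] rad/s
    return torch.stack([v, omega], dim=-1)
```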
4.3. Reward Function
The reward function in our approach is designed to guide the robot toward efficient and safe navigation in dynamic environments. It comprises the following components:
4.3.1. Step Penalty
This component encourages the robot to reach the target efficiently by imposing a constant penalty at each time step. It discourages prolonged navigation and oscillatory behavior.
4.3.2. Distance Reward
The distance reward is designed to incentivize the robot to move closer to the target. The reward is proportional to the change between the current distance to the target, denoted by $d_t$, and the previous distance, $d_{t-1}$. The robot is rewarded for reducing this distance and penalized for moving away from the target.
4.3.3. Camera View Reward
This reward component encourages the robot to keep the target within the camera’s view. The robot is rewarded when the target is visible and penalized when it is lost, based on whether the target’s bounding box is detected.
4.3.4. Target Position Reward
This reward component encourages the robot to center the target in its view: the closer the target is to the center of the camera’s view, the higher the reward, based on the position of the target’s bounding box within the image.
4.3.5. Obstacle Distance Penalty
This term penalizes the robot for moving too close to obstacles. The penalty is proportional to the change in distance between the agent and the nearest obstacle, where $d^{obs}_t$ and $d^{obs}_{t-1}$ represent the current and previous distances, respectively.
4.3.6. Collision Risk Penalty
A collision risk penalty is introduced to penalize the robot for operating in high-risk areas. As illustrated in
Figure 5, it is calculated as the ratio of the number of “dangerous-level” depth pixels $N_d$, whose depth is below a danger threshold, to the number of “warning-level” depth pixels $N_w$, whose depth is below a warning threshold:
$$\rho = \frac{N_d}{N_w}.$$
The corresponding penalty is proportional to this ratio. The collision risk penalty encourages the robot to avoid highly crowded areas and choose less crowded areas for navigation.
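A minimal sketch of this computation on a single depth frame is given below; the danger and warning thresholds and the penalty weight are illustrative assumptions, as the exact values are not reproduced here.

```python
import numpy as np

def collision_risk_penalty(depth_map: np.ndarray,
                           danger_thresh: float = 0.3,    # m, assumed
                           warning_thresh: float = 0.8,   # m, assumed
                           weight: float = 1.0) -> float:
    """Penalty proportional to the ratio of dangerous-level to warning-level
    depth pixels, as described in Section 4.3.6."""
    valid = depth_map > 0.0  # ignore invalid (zero) depth returns
    n_danger = np.count_nonzero(valid & (depth_map < danger_thresh))
    n_warning = np.count_nonzero(valid & (depth_map < warning_thresh))
    if n_warning == 0:
        return 0.0
    return -weight * (n_danger / n_warning)
```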
5. Model Training
This section outlines the training process for the proposed model, which integrates both CNN and SAN modules within a DRL framework.
5.1. Network Training
By combining the spatial information extracted by the CNN with the temporal dependencies captured by the SAN, the model addresses the limitations posed by the robot’s constrained FOV. This fusion enables more informed and anticipatory decision-making, particularly in dynamic and crowded environments.
Figure 6 illustrates the overall framework of our proposed enhanced TD3 algorithm, where the critic loss of the TD3 network is used not only to update the traditional critic networks but also to simultaneously update the CNN and SAN networks. Specifically, the state-action-reward-next-state tuples (s, a, r, s′) in the experience replay buffer are first processed with Gaussian noise to produce a smoothed target action. The TD3 algorithm then uses two critic networks (Q1 and Q2) to compute the Q-values, and the target Q-value y is generated by taking the minimum of the two target critic networks (Target Critic 1 and Target Critic 2), which is then used to calculate the critic loss.
In standard TD3, the critic loss is used solely to update the parameters of the critic networks themselves. In our design, however, the critic loss is further utilized to update the parameters of the CNN and SAN, enabling them to learn features that help reduce the Q-value prediction error. The parameters of the CNN and SAN are updated simultaneously by minimizing this loss function. The gradient descent update rules are given by:
$$\theta_{\mathrm{CNN}} \leftarrow \theta_{\mathrm{CNN}} - \alpha \nabla_{\theta_{\mathrm{CNN}}} L_{\mathrm{critic}}, \qquad \theta_{\mathrm{SAN}} \leftarrow \theta_{\mathrm{SAN}} - \alpha \nabla_{\theta_{\mathrm{SAN}}} L_{\mathrm{critic}},$$
where α is the learning rate, and $\nabla_{\theta_{\mathrm{CNN}}} L_{\mathrm{critic}}$ and $\nabla_{\theta_{\mathrm{SAN}}} L_{\mathrm{critic}}$ are the gradients of the critic loss with respect to the CNN and SAN parameters, respectively. Specifically, the CNN is primarily responsible for extracting visual information from RGB-D images, while the SAN extracts temporal features from historical target positions and action sequences. By aligning the CNN and SAN with the objective of the critic loss, they collectively learn state and action features that are beneficial for Q-value prediction, thereby enhancing the accuracy of the critic networks’ Q-value estimation.
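In practice, this joint update can be implemented by letting the critic loss backpropagate into the feature extractors, for example by placing their parameters in the same optimizer as the critics, as in the sketch below; module and parameter names are assumptions.

```python
import itertools
import torch
import torch.nn.functional as F

def make_joint_optimizer(cnn, san, critic1, critic2, lr=3e-4):
    """One optimizer over both critics and both feature extractors, so the
    critic loss backpropagates into the CNN and the SAN (learning rate assumed)."""
    params = itertools.chain(cnn.parameters(), san.parameters(),
                             critic1.parameters(), critic2.parameters())
    return torch.optim.Adam(params, lr=lr)

def critic_step(optimizer, q1_pred, q2_pred, target_y):
    """Single gradient step on the summed critic loss; because q1_pred and
    q2_pred are computed from CNN/SAN features, gradients also reach those
    feature extractors."""
    loss = F.mse_loss(q1_pred, target_y) + F.mse_loss(q2_pred, target_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```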
Moreover, this joint optimization method ensures consistency between the feature extraction networks and the policy network, reducing the issue of feature mismatch among networks and improving the stability and overall performance of the training process.
We provide the pseudocode, outlined in
Table 1, which details the steps and procedures employed during the model training process. The pseudocode covers algorithm initialization, the training process, and parameter updates to facilitate the reader’s understanding of the implementation.
In TD3, the critic network plays a crucial role in estimating Q-values by minimizing the critic loss, which measures the error between predicted and target Q-values. This process ensures accurate value functions and stable convergence. CNN and SAN are integral components of the framework, with the former extracting spatial features from RGB-D images and the latter capturing temporal dependencies from historical data. However, when optimized independently, these networks face limitations, as their feature extraction processes are not directly aligned with the reinforcement learning objective of minimizing Q-value prediction errors. This misalignment can lead to feature mismatch, reducing the effectiveness of the overall system.
To address this, we leverage the critic loss to simultaneously update CNN and SAN, aligning their training objectives with the reinforcement learning goal. This joint optimization provides several key benefits:
Direct Alignment with RL Objectives: CNN and SAN no longer optimize features independently but instead learn representations that directly minimize Q-value prediction errors, improving overall efficiency.
Reduced Feature Mismatch: By aligning feature extraction with the critic’s loss function, the features learned by CNN and SAN are more consistent and better suited to support the critic network.
Improved Stability and Performance: The unified training process enhances the stability of the learning process, accelerates convergence, and improves policy performance by ensuring that spatial and temporal features are optimized to complement each other.
Enhanced Feature Representation: CNN and SAN learn richer and more task-relevant features, enabling the critic network to model complex environments effectively.
Avoiding Optimization Conflicts: Jointly training CNN and SAN avoids potential conflicts or redundancies that may arise when these networks are optimized independently.
This integrated design allows the model to utilize spatial and temporal information more efficiently, compensating for limited FOV and enabling robust decision-making in dynamic and crowded environments.
5.2. Training Environment and Procedure
The model was trained on a workstation equipped with an NVIDIA RTX 3090 GPU, running the Robot Operating System (ROS) Noetic distribution and the Gazebo simulator. The simulated environment, as shown in
Figure 7, comprises a 5 m × 5 m square area populated with 12 dynamic obstacles.
These obstacles are cylindrical in shape with a height of 0.24 meters and a radius of 0.05 meters, exhibiting random Brownian motion with a maximum linear speed of 0.4 m/s. The autonomous robot, a TurtleBot Burger equipped with an Intel RealSense D435 depth camera, was tasked with navigating this environment. The robot’s mobility constraints were set with a maximum linear velocity of 0.55 m/s and a maximum angular velocity of 0.15 rad/s. The depth camera provides both depth and visual input, which are subsequently processed to extract crucial environment information for navigation.
Over the course of training, the robot underwent 120,000 episodes, totaling approximately 185 hours. Each episode initiated with the agent positioned randomly within the environment, aiming to reach a predefined target location while avoiding collisions with the moving obstacles. The primary training objective focused on maximizing the cumulative reward, designed to incentivize efficient and safe target pursuit. Throughout the training phase, the agent’s policy was continually refined via the TD3 algorithm. This refinement was supported by a SAN, which leveraged both spatial and temporal information from historical observations and target positions.
This training setup enabled the robot to learn navigation strategies within a crowded and dynamically changing environment, achieving real-time decision-making under limited FOV.
6. Results
We tested our approach in a simulated environment with different obstacle movement patterns and conducted evaluations of our method across three distinct scenarios to assess its performance relative to existing DRL-based approaches. Our comparison was made with: (1) map-based DRL approaches in static environments, (2) vision-based DRL approaches in static environments, and (3) a TD3-based approach in dynamic environments.
6.1. Trajectory Visualization in Dynamic Environment
To evaluate the robot's navigation performance in dynamic environments, we visualized its trajectory in a simulated space.
Figure 8 illustrates a typical trajectory of a robot navigating a 5 m × 5 m simulated space populated with eight cylindrical obstacles, each measuring 0.24 meters in height and 0.05 meters in radius. These obstacles are programmed to move in patterns of Brownian motion and crossing motion, with a maximum velocity of 0.4 m/s. Each frame captures the robot's state at different time intervals, showcasing its gradual approach toward the target in a dynamic environment. The variations in the trajectory confirm the robot's real-time adaptability to the movements of the obstacles, demonstrating the effectiveness and flexibility of the self-attention enhanced DRL approach in complex navigation tasks.
6.2. Comparison with Map-based DRL Approaches in Static Environments
Our experimental setup included a comparative evaluation against map-based DRL approaches, as summarized in
Table 2. Our method achieved a success rate of 0.97 in a map-less, target-driven simulation environment utilizing a single RGB-D camera. In the same environment, the DQN-SLAM approach achieved a success rate of 0.94 based on a pre-built map, utilizing LiDAR. Similarly, the PPO-SPD model reported a success rate of 0.996 within a narrow corridor populated by humans, employing both RGB-D and LiDAR sensors in an environment with a pre-built map.
It is important to highlight that these baseline methods depend on pre-built maps and a predefined fixed target location, substantially simplifying the navigation task compared to our map-less, visual target-driven setup. In our trials, we employed a 5 m × 5 m indoor environment with four randomly placed static cylindrical objects and a randomly located target object. Despite the more challenging conditions, our method demonstrated strong navigation capabilities, achieving a success rate on par with these map-based baselines without any prior mapping, illustrating its adaptability and efficacy in unstructured, mapless environments, which is a crucial advantage for real-world applications where pre-built maps are unavailable.
6.3. Comparison with Mapless Vision-based DRL Approaches in Static Environments
In further evaluations, our method was contrasted against other vision-based DRL strategies in a static environment, as detailed in
Table 3. Our approach recorded a success rate of 0.87 in an environment with eight moving obstacles, utilizing a single RGB-D sensor for perception. In contrast, the method developed by Wu et al. [
26], tested in a static indoor bathroom using the AI2-THOR framework, achieved a success rate of 0.627. The PPAC+LSTM method [
25] reported a perfect success rate of 1.0 in a simulated static indoor bathroom scenario, without any moving obstacle.
Our test environment, a 5 m × 5 m simulated space populated with eight cylindrical obstacles (each 0.24 m in height and 0.05 m in radius) moving at a maximum velocity of 0.4 m/s in Brownian motion, represents a more complex and realistic challenge compared to the static conditions of the baseline methods. This demonstrates the robustness and adaptability of our method, offering significant benefits for practical robotic navigation, especially when employing a vision sensor with limited FOV.
6.4. Comparison with a TD3-based Approach in Dynamic Environments
In further tests, we compared our method with a baseline TD3 algorithm [13] under varying obstacle densities (4, 8, and 12 moving obstacles). Our approach, utilizing a single RGB-D camera, achieved success rates of 0.93, 0.87, and 0.57, respectively, as shown in
Table 4.
We extensively tested the open-source program of [
13] in the same simulated environment and recorded its average performance in
Table 4. This baseline method, while achieving similar performance in low-density environments, showed significant performance declines at higher obstacle densities. These results indicate that our approach not only maintains competitive success rates but also achieves faster and more efficient navigation, validating its effectiveness in visual target-driven robot navigation tasks under varying dynamic conditions.
6.5. Ablation Study
To determine the efficacy of various reinforcement learning configurations, we conducted an ablation study comparing three configurations with respect to their learning convergence and reward maximization. The results are shown in
Figure 9.
6.5.1. TD3 + SAN
This configuration exhibited stable convergence throughout the training period. Initially, the reward values progressively increased from -3000, approaching a level of approximately 2000 within the first 90,000 episodes. After this phase, the reward plateaued around 2300, indicative of high performance and long-term stability.
6.5.2. Standalone TD3
In contrast, the standalone TD3 algorithm demonstrated faster convergence than TD3 + SAN, with rewards rising from the initial values and stabilizing near 1900. Despite some fluctuations observed between episodes 65,000 and 75,000, the final reward consistently hovered around 1900, suggesting quicker convergence but at a lower reward threshold compared to TD3 + SAN.
6.5.3. DQN + SAN
The DQN + SAN configuration showed the least favorable outcomes among the three tested configurations. Starting with an initial reward of -3400, it exhibited modest improvement during the first 45,000 episodes before stabilizing at a final reward of approximately 920. This configuration suffered from the highest degree of oscillation and demonstrated substantial instability, marking it as the least effective in this task.
It can be concluded from the ablation study that the TD3 + SAN configuration outperformed the others by achieving the highest final rewards and exhibiting significant training stability, albeit with a slower convergence rate. Conversely, the TD3 algorithm, though converging faster, failed to achieve the higher reward levels of TD3 + SAN. DQN + SAN displayed considerable instability and underperformance, reinforcing its unsuitability for robust task execution.
6.6. Experiment Using a Real Robot
To validate the proposed visual target-driven navigation approach, real-world experiments were conducted using a mobile robot, as shown in
Figure 10. The robot was equipped with an RGB-D camera, which provided a 70-degree FOV, and operated under ROS Noetic. The experiments were carried out in a 9.5 m × 7 m indoor environment, with the robot and the target positioned at opposite ends of the area. To simulate a dynamic environment, five pedestrians were included, moving at typical walking speeds to represent moving obstacles. A safety cone served as the visual target for the robot.
The failure condition was defined as the robot coming within 6 cm of a human or a wall, as detected by an ultrasonic sensor, to ensure safety. Additionally, if the robot failed to reach the target within 30 seconds, the trial was considered a failure. The success condition was defined as the robot coming within 15 cm of the target, as detected by the RGB-D camera.
Three distinct scenarios were tested to evaluate the performance of the trained policy: (a) pedestrians standing randomly along the pathway from the robot to the target without movement; (b) pedestrians positioned on either side of the pathway, with some crossing the pathway; and (c) pedestrians positioned on either side of the pathway, with some walking toward the target. The experimental setup is shown in
Figure 11. Each scenario was tested 20 times, and the success rate and average passing time for each trial are summarized in
Table 5.
7. Discussion
7.1. Benefits of TD3 + SAN
As shown in Figure 9, the incorporation of the SAN led to a significant improvement in performance, particularly in the TD3 + SAN configuration. This improvement can be attributed to SAN's ability to enable the agent to dynamically focus on relevant information and recover lost targets when the target temporarily leaves its FOV. By leveraging past observations and capturing global contextual relationships, SAN helps the agent infer the target’s approximate position and resume tracking effectively. Specifically, SAN offers two key advantages:
Target Recovery and Tracking: Through its relational reasoning capability, SAN correlates spatial and temporal patterns in previous observations, allowing the agent to infer the target’s location even in scenarios with occlusions or limited visibility.
Dynamic Focus on Critical Regions: The mechanism of SAN enables the robot to dynamically assign importance weights to different regions in the observation space. This allows the agent to prioritize target- and obstacle-related information, enhancing decision-making under uncertainty.
Compared to traditional approaches, SAN improves the agent’s robustness to partial observability, reduces instability, and ensures smoother navigation trajectories. These advantages are evident in the TD3 + SAN results, where the agent achieved higher rewards and greater long-term stability than TD3 alone.
7.2. Overcoming the Challenge of Limited FOV
In visual target-driven crowd navigation, the limited FOV often causes the target to be lost, hindering navigation performance. Our method integrates the SAN, which leverages information from past frames to help the agent recover and track the target effectively. Additionally, we utilize the TD3 algorithm, which is particularly well suited for handling continuous action spaces, enabling the agent to efficiently navigate and reach the target. TD3’s actor-critic mechanism, along with its key features of target value smoothing and delayed updates, ensures stable learning, allowing the agent to adapt to dynamic crowd environments even with a limited FOV. Moreover, the Camera View Reward and Target Position Reward components of our reward function effectively encourage the robot to keep the target within its view.
The results of TD3 + SAN demonstrate that the combination of SAN and TD3 enhances the agent's robustness and navigation performance under limited FOV conditions, validating the effectiveness of our approach.
7.3. Collaborative Training of CNN, SAN, and TD3 for Better Performance
The experimental results are shown in
Figure 9. TD3 + SAN demonstrates significant advantages in dynamic and crowded environments with a limited FOV. Compared to traditional TD3 and DQN + SAN, TD3 + SAN converges to a higher average reward, exhibits better stability, and achieves significantly better final reward values. This indicates that the collaborative optimization of the CNN and SAN effectively utilizes both visual and temporal features, compensating for the perception limitations imposed by the restricted FOV while enhancing decision-making robustness and performance in complex interaction scenarios.
8. Conclusions
In this study, we introduced a new visual target-driven navigation strategy for robots operating in crowded and dynamic environments with a limited FOV. Our approach utilizes the TD3 algorithm integrated with a SAN, enabling effective navigation without the need for pre-mapped environments.
Our contributions offer meaningful advancements in the field of autonomous robotic navigation. Firstly, we successfully reduced hardware and computational requirements by employing a single RGB-D camera, thereby simplifying the system's architecture and lowering the overall cost and complexity. Secondly, the incorporation of a SAN enabled our system to maintain awareness of the navigation target even with a limited sensory field, enhancing the robot’s ability to relocate and track dynamic targets efficiently. Finally, the robust performance of our approach was demonstrated through extensive simulation and real robot experiments, which showed superior navigation capabilities in terms of both obstacle avoidance and target pursuit in complex environments.
The experimental results validated that our model outperforms traditional DRL-based methods in dynamic environments, achieving higher success rates and shorter average target-reaching times. Notably, our method exhibited greater adaptability to changes within the environment, proving particularly effective in real-world scenarios where unpredictability and the presence of dynamic obstacles are common.
Future work will focus on further enhancing the adaptability and efficiency of the navigation system under even more challenging conditions, such as outdoor environments with more extreme variability. Additionally, integrating multi-modal sensory inputs will be considered to broaden the applicability of the technology across various robotic applications.
Author Contributions
Conceptualization, Y.L.; methodology, Y.L. and Q.L.; software, Q.L.; formal analysis, Y.L., Q.L. and Y.S.; resources, J.Y.; writing—original draft preparation, Y.L. and Q.L.; writing—review and editing, J.Y., Y.S. and B.W.; visualization, Q.L. and B.W.; supervision, J.Y.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- H. Yin, S. Li, Y. Tao, J. Guo, and B. Huang, "Dynam-SLAM: An Accurate, Robust Stereo Visual-Inertial SLAM Method in Dynamic Environments," IEEE Trans. Robot., vol. 39, no. 1, pp. 289-308, Feb. 2023. [CrossRef]
- S. M. Chaves, A. Kim, and R. M. Eustice, "Opportunistic sampling-based planning for active visual SLAM," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014, pp. 3073-3080.
- K. Lobos-Tsunekawa, F. Leiva, and J. Ruiz-del-Solar, "Visual navigation for biped humanoid robots using deep reinforcement learning," *IEEE Robotics and Automation Letters*, vol. 3, no. 4, pp. 3247-3254, Oct. 2018. [CrossRef]
- J. Jin, N. M. Nguyen, N. Sakib, D. Graves, H. Yao, and M. Jagersand, "Mapless navigation among dynamics with social-safety-awareness: a reinforcement learning approach from 2D laser scans," *arXiv preprint arXiv:1911.03074 [cs.RO]*, Nov. 2019. [CrossRef]
- L. Tai, G. Paolo, and M. Liu, "Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 31-36, Sep. 2017.
- M. M. Ejaz, T. B. Tang and C. -K. Lu, "Autonomous Visual Navigation using Deep Reinforcement Learning: An Overview," 2019 IEEE Student Conference on Research and Development (SCOReD), Bandar Seri Iskandar, Malaysia, 2019, pp. 294-299. [CrossRef]
- H. Bai, S. Cai, N. Ye, D. Hsu, and W. S. Lee, "Intention-aware online POMDP planning for autonomous driving in a crowd," in *2015 IEEE International Conference on Robotics and Automation (ICRA)*, Seattle, WA, USA, 2015, pp. 454-460. [CrossRef]
- C. Wang, J. Wang, Y. Shen, and X. Zhang, "Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach," IEEE Trans. Veh. Technol., vol. 68, no. 3, pp. 2124-2136, Mar. 2019. [CrossRef]
- J. Zeng, R. Ju, L. Qin, Y. Hu, Q. Yin, and C. Hu, "Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning," Sensors, vol. 19, no. 18, p. 3837, Sep. 2019. [CrossRef]
- K. Zhu and T. Zhang, "Deep reinforcement learning based mobile robot navigation: A review," Tsinghua Sci. Technol., vol. 26, no. 5, pp. 674-691, Oct. 2021. [CrossRef]
- N. D. Toan and K. G. Woo, "Mapless navigation with deep reinforcement learning based on the convolutional proximal policy optimization network," in *Proc. 2021 IEEE International Conference on Big Data and Smart Computing (BigComp)*, Jeju Island, Korea (South), 2021, pp. 298-301. [CrossRef]
- L. Sun, J. Zhai, and W. Qin, "Crowd navigation in an unknown and dynamic environment based on deep reinforcement learning," *IEEE Access*, vol. 7, pp. 109544-109554, 2019. [CrossRef]
- H. Anas, O. W. Hong, and O. A. Malik, "Deep reinforcement learning-based mapless crowd navigation with perceived risk of the moving crowd for mobile robots," *arXiv preprint arXiv:2304.03593 [cs.RO]*, Apr. 2023. [CrossRef]
- M. Shi, G. Chen, Á. Serra Gómez, S. Wu, and J. Alonso-Mora, "Evaluating dynamic environment difficulty for obstacle avoidance benchmarking," *arXiv preprint arXiv:2404.14848 [cs.RO]*, Apr. 2024. [CrossRef]
- X. Sun, Q. Zhang, Y. Wei, and M. Liu, "Risk-aware deep reinforcement learning for robot crowd navigation," *Electronics*, vol. 12, no. 23, Art. no. 4744, 2023. [CrossRef]
- C. Chen, Y. Liu, S. Kreiss, and A. Alahi, "Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning," arXiv preprint arXiv:1809.08835 [cs.RO], Sep. 2018. [CrossRef]
- N. Duo, Q. Wang, Q. Lv, H. Wei, and P. Zhang, "A deep reinforcement learning based mapless navigation algorithm using continuous actions," in *2019 International Conference on Robots & Intelligent System (ICRIS)*, Haikou, China, 2019, pp. 63-68. [CrossRef]
- M. Dobrevski and D. Skočaj, "Deep reinforcement learning for map-less goal-driven robot navigation," *International Journal of Advanced Robotic Systems*, vol. 18, no. 1, 2021. [CrossRef]
- X. Lei, Z. Zhang, and P. Dong, "Dynamic path planning of unknown environment based on deep reinforcement learning," Journal of Robotics, vol. 2018, pp. 1-10, 2018. [CrossRef]
- G. Chen, L. Pan, Y. Chen, P. Xu, Z. Wang, P. Wu, J. Ji, and X. Chen, "Robot navigation with map-based deep reinforcement learning," arXiv preprint arXiv:2002.04349 [cs.RO], Feb. 2020. [CrossRef]
- S. Yao, G. Chen, Q. Qiu, J. Ma, X. Chen and J. Ji, "Crowd-Aware Robot Navigation for Pedestrians with Multiple Collision Avoidance Strategies via Map-based Deep Reinforcement Learning," 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 2021, pp. 8144-8150. [CrossRef]
- J. Kulhánek, E. Derner, T. de Bruin, and R. Babuška, "Vision-based navigation using deep reinforcement learning," *arXiv preprint arXiv:1908.03627 [cs.RO]*, Aug. 2019. [CrossRef]
- S. Hsu, S. Chan, P. Wu, K. Xiao, and L. Fu, "Distributed deep reinforcement learning based indoor visual navigation," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018, pp. 2532-2537. [CrossRef]
- Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, "Target-driven visual navigation in indoor scenes using deep reinforcement learning," arXiv preprint arXiv:1609.05143 [cs.CV], Sep. 2016. [CrossRef]
- J. Kulhánek, E. Derner, and R. Babuška, "Visual navigation in real-world indoor environments using end-to-end deep reinforcement learning," arXiv preprint arXiv:2010.10903 [cs.RO], Oct. 2020.
- Q. Wu, K. Xu, J. Wang, M. Xu, X. Gong, and D. Manocha, "Reinforcement learning-based visual navigation with information-theoretic regularization," IEEE Robotics and Automation Letters, vol. 6, no. 2, Apr. 2021.
- L. Liu, D. Dugas, G. Cesari, R. Siegwart, and R. Dubé, "Robot navigation in crowded environments using deep reinforcement learning," in *Proc. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)*, Las Vegas, NV, USA, 2020, pp. 5671-5677. [CrossRef]
- H. Zhao, J. Jia, and V. Koltun, "Exploring Self-Attention for Image Recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020, pp. 100-110. [CrossRef]
- G. Humphreys and J. Sui, "Attentional control and the self: The Self Attention Network (SAN)," Cognitive Neuroscience, 2015. [CrossRef]
- S. Fujimoto, H. van Hoof, and D. Meger, "Addressing function approximation error in actor-critic methods," *arXiv preprint arXiv:1802.09477 [cs.AI]*, Feb. 2018. [CrossRef]
- H. Liu, Y. Shen, C. Zhou, Y. Zou, Z. Gao, and Q. Wang, "TD3 based collision free motion planning for robot navigation," arXiv preprint arXiv:2405.15460 [cs.RO], May 2024. [CrossRef]
- X. Qu, Y. Jiang, R. Zhang, and F. Long, "A deep reinforcement learning-based path-following control scheme for an uncertain under-actuated autonomous marine vehicle," *J. Mar. Sci. Eng.*, vol. 11, no. 9, p. 1762, Sep. 2023. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).