Every research paper selected for this survey uses one type of machine learning methodology. In this section, each paper is analyzed according to the following criteria. Which UAV category does the paper fall into: single-rotor, multi-rotor, hybrid VTOL, or fixed-wing? What did the authors achieve using machine learning? In addition, the following statistics are extracted: which machine learning methodology the authors used, when (month and year) the research was completed, and in which geographic region it was performed. The analyses below are summarized per UAV category.
3.1. Fixed-Wing
C. Yan et al. [10] investigated the leader–follower flocking problem of fixed-wing UAVs. Their algorithm generates roll-angle and velocity commands by training an end-to-end controller in continuous state and action spaces, based on the continuous actor–critic learning automaton of reinforcement learning. G. De Luca et al. [11] investigated the reliability of free and open-source algorithms for geographic object-based image classification of very-high-resolution imagery acquired by unmanned aerial vehicles, using supervised learning. H. Khanzada et al. [12] presented a comparative analysis of reinforcement learning algorithms for the flight control systems of fixed-wing UAVs. The authors compared deep deterministic policy gradient, twin delayed deep deterministic policy gradient, proximal policy optimization, trust region policy optimization, and soft actor-critic to determine their suitability for complex UAV dynamics. J. Li et al. [13] proposed a control system that combines a conventional proportional–integral–derivative guidance law with a deep Q-network of reinforcement learning. The system enables autonomous landing of fixed-wing UAVs and tunes the guidance parameters automatically. C. Tang et al. [14] studied the reward shaping used during training. They also examined the effects of hyperparameters and of different neural network topologies on training a fixed-wing UAV landing controller with the deep deterministic policy gradient algorithm of reinforcement learning. Y. Zhao et al. [15] studied collision avoidance for a variable number of fixed-wing UAVs in limited airspace. They formulated a set of flight scenarios as a multi-agent Markov game and established a self-learning framework based on the actor-critic algorithm of reinforcement learning.
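Several of the studies above build on value-based reinforcement learning; for example, [13] tunes guidance parameters with a deep Q-network. The Python code below is a minimal sketch of the underlying temporal-difference update, not the authors' implementation. It applies tabular Q-learning to a hypothetical five-state corridor. A deep Q-network replaces the table with a neural network approximator. The corridor, the rewards, and all hyperparameters are illustrative assumptions.

```python
import random

def q_learning_update(q, s, a, r, s_next, done, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Bootstrapped target: reward plus discounted best next-state value (none at terminal).
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]

def train_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Hypothetical 1-D corridor: action 0 moves left, action 1 moves right,
    and entering the last state ends the episode with reward 1."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            done = s_next == goal
            q_learning_update(q, s, a, 1.0 if done else 0.0, s_next, done, alpha, gamma)
            s = s_next
            if done:
                break
    return q
```

After `train_corridor()` returns, the greedy action in every non-goal state should be "move right". Reaching the goal sooner is discounted less, so the right-moving action always has the higher value.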
L. Lv et al. [16] developed a hybrid sparrow search optimization framework for navigating multiple fixed-wing unmanned aerial vehicles in complex terrain. The framework has a two-level architecture. At the lower level is a purpose-built operator suite tailored to mountain environments. At the higher level is a reinforcement learning agent that adapts the search strategy to the terrain complexity. A. Cui et al. [17] investigated fault diagnosis and health management of fixed-wing UAVs. They proposed a method built on five algorithms, all based on deep supervised learning:
- flight data generation;
- sample training and prediction with a long short-term memory network;
- grey prediction;
- combined prediction;
- health calculation and management.

A. Sezgin et al. [18] proposed a regression-based, energy-aware drone selection technique that forecasts energy consumption from drone type, payload, and mission distance. The authors compared decision tree, random forest, and linear regression models with hyperparameter optimization, in a supervised learning setting. F. Giral et al. [19] studied the guidance, navigation, and control systems of aerial vehicles, focusing on motion planning for fixed-wing UAVs. They presented two key applications based on reinforcement learning: waypoint tracking and dynamic target interception. N. Musavi et al. [20] proposed a realistic modeling framework, based on game theory and reinforcement learning, for managing the interaction between manned aircraft and fully autonomous unmanned aircraft systems. The unmanned aircraft are equipped with sense-and-avoid algorithms and operate in the national airspace system.
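Grey prediction, one of the five components in [17], commonly refers to the GM(1,1) grey model. The code below is a minimal, generic GM(1,1) forecaster. It assumes a short, positive, roughly exponential health-indicator series. It is not the implementation from [17], and the function name `gm11_forecast` is hypothetical.

```python
import math

def gm11_forecast(x0, steps=1):
    """Fit a GM(1,1) grey model to a positive series x0 and forecast `steps` values ahead."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    # 1) Accumulated generating operation (AGO): running sum of the raw series.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # 2) Background values: mean of consecutive AGO terms.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # 3) Least squares for the grey equation x0(k) + a*z(k) = b,
    #    i.e. an ordinary linear fit y = b + (-a) * z.
    m = len(z)
    s_z, s_y = sum(z), sum(y)
    s_zz = sum(v * v for v in z)
    s_zy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * s_zy - s_z * s_y) / (m * s_zz - s_z ** 2)
    a = -slope
    b = (s_y - slope * s_z) / m

    # 4) Time-response function (k = 0 is the first observation), then inverse AGO.
    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]
```

The least-squares step is an ordinary linear regression of each raw value on its background value. For a purely exponential series this fit is exact, and the forecast tracks the true growth closely.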
N. Santos [21] proposed a hybrid artificial neural network model, trained on a synthetic dataset using unsupervised learning. It incorporates a self-organizing map to identify feature points representing a cluster obtained from a binary image containing the UAV. From these points it estimates the actual UAV pose for the landing maneuver from a single frame. A. Guerra-Langan et al. [22] presented a 3-DOF longitudinal flight dynamics and control simulation model of a small fixed-wing UAV. The model controls airspeed using readings from a sensing array distributed over the wing. The authors also examined how the sensor layout affects controller performance, using supervised learning. E. Bohn et al. [23] studied attitude control of a fixed-wing UAV that operates directly on the original nonlinear dynamics and requires as little as 3 min of flight data, based on reinforcement learning. K. Borup et al. [24] presented a method for estimating the air data parameters of a small fixed-wing UAV using linear regression (supervised learning). The inputs come from an arrangement of low-cost, micro-electromechanical systems-based pressure sensors embedded in the UAV surface. A. Pasha et al. [25] proposed and evaluated an algorithm, based on unsupervised learning, that addresses inaccurate attitude-angle estimates and loss of control in fixed-wing UAVs. It predicts the missing data of a malfunctioning sensor from the data of the other, functioning sensors.
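[24] estimates air data with linear regression on MEMS pressure readings. The code below illustrates the general idea under explicit assumptions. A single hypothetical surface-pressure difference `dp` is mapped linearly to dynamic pressure. Airspeed then follows from Bernoulli's relation with a constant sea-level density. The sensor model, the function names, and the calibration values are hypothetical, and the code does not reproduce the authors' sensor arrangement.

```python
import math

RHO_SEA_LEVEL = 1.225  # kg/m^3, ISA sea-level density (assumed constant here)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = w * x + c; returns (w, c)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

def airspeed_from_dp(dp, w, c, rho=RHO_SEA_LEVEL):
    """Map a (hypothetical) sensor pressure difference in Pa to airspeed in m/s via
    the fitted dynamic pressure q = w*dp + c and Bernoulli's relation q = 0.5*rho*V^2."""
    q = max(w * dp + c, 0.0)  # clamp: dynamic pressure cannot be negative
    return math.sqrt(2.0 * q / rho)
```

In use, `fit_line` would be calibrated against reference airspeeds (for example, from a pitot tube or wind tunnel). Afterwards `airspeed_from_dp` converts new sensor readings.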
J. Jiang et al. [26] used a multispectral camera mounted on a fixed-wing unmanned aerial vehicle to acquire canopy spectral data of winter wheat at the jointing and booting stages. At the same time, they measured agronomic indicators (plant dry matter, plant nitrogen accumulation, and nitrogen nutrition index) together with agrometeorological and field management data. From these they established a spatially and temporally explicit model for diagnosing winter wheat nitrogen status on small-scale farms. The authors characterized the relationships between the agronomic variables and the UAV remote sensing data by comparing four supervised learning methods. M. Bronz et al. [27] used supervised learning to study the feasibility of real-time fault prediction on a fixed-wing UAV in real flight. The conditions included noisy measurements, communication limitations, and a warped wing structure that breaks the geometric symmetry. Z. Yu et al. [28] investigated fault-tolerant formation control for networked fixed-wing unmanned aerial vehicles subject to faults, using the actor-critic neural networks of reinforcement learning. X. Zhao et al. [29] developed a curriculum-learning-enhanced framework, based on hybrid deep reinforcement learning, for fixed-wing UAVs that jointly integrates encirclement and obstacle avoidance. It uses progressive task staging and reward optimization to accelerate convergence and improve policy stability. J. Tang et al. [30] proposed a trajectory tracking control method for a fixed-wing unmanned aerial vehicle (UAV) based on the deep deterministic policy gradient, trained in simulation with reinforcement learning. The method is integrated with a trajectory tracking controller that maps the UAV flight state to rudder control.
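The deep deterministic policy gradient used in [30] (and in [14]) keeps slowly moving target copies of its actor and critic networks. A minimal sketch of that soft (Polyak) target update follows. It assumes parameters are stored as a hypothetical dictionary of flat lists rather than as a deep learning framework's tensors.

```python
def soft_update(target, online, tau=0.005):
    """Polyak-average online parameters into the target network, updating the dict in place:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    for name, value in online.items():
        target[name] = [tau * v + (1.0 - tau) * t for v, t in zip(value, target[name])]
    return target
```

A small `tau` makes the critic's learning target change slowly, which stabilizes training. With `tau = 1` the update degenerates into a hard copy of the online weights.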
D. Xu et al. [31] investigated a low-cost UAV formation system consisting of one leader equipped with an intelligence chip and five followers without intelligence chips. They proposed a centralized, collision-free formation-keeping method for fixed-wing unmanned aerial vehicles using the proximal policy optimization of reinforcement learning. X. Zhuang et al. [32] proposed a penetration relative-motion theory and an aircraft control method based on a 6-DOF fixed-wing UAV model, to reduce response errors between control decisions and actual flight control. On the basis of a Markov decision process, they designed a UAV penetration decision and control method. The method achieves autonomous decision-making and control by integrating interceptor trajectory prediction with the proximal policy optimization of reinforcement learning. X. Yuan et al. [33] designed a radio surveillance framework in which a fixed-wing UAV acquires the radio fingerprint of a suspicious transmitter. A twin delayed deep deterministic policy gradient model, trained in simulation by deep reinforcement learning, lets the UAV learn its trajectory from the observed transmit rate of the suspicious transmitter. H. Liu et al. [34] studied optimal formation control of networked fixed-wing unmanned aerial vehicles under communication uncertainties and external disturbances, trained via off-policy reinforcement learning. X. Zhao et al. [35] developed a framework for optimizing the flight control strategies of fixed-wing unmanned aerial vehicles with proximal policy optimization, using deep reinforcement learning.
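Proximal policy optimization appears in [31], [32], and [35], and repeatedly later in this survey. Its defining ingredient is the clipped surrogate objective. The stdlib sketch below evaluates that objective for a batch of log-probabilities and advantage estimates. The clip range defaults to 0.2, the value used in the original PPO paper; treat it as an assumption here. A full implementation also needs a value-function loss, an entropy bonus, and advantage estimation, all of which are omitted.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate L = E[min(r*A, clip(r, 1-eps, 1+eps)*A)],
    with probability ratio r = exp(logp_new - logp_old). PPO maximizes this."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # The pessimistic (smaller) term removes the incentive to move the ratio too far.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping bounds how far a single update can move the new policy away from the policy that collected the data.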
Y. Yuan et al. [36] studied the ability of fixed-wing UAVs to generate evasive maneuvers autonomously. They designed a hierarchical goal-guided learning method that combines traditional off-policy deep reinforcement learning algorithms and enables the agent to evade a series of air-to-air missiles. M. Hayat et al. [37] proposed a Bayesian method for rice panicle segmentation in optical images taken by unmanned aerial vehicles over paddy fields, based on unsupervised learning. M. Tan et al. [38] proposed a hierarchical collaborative pursuit–evasion game framework based on target allocation, using gradient-assisted reinforcement learning. The framework comprises three layers (a target allocation layer, a maneuver decision-making layer, and a flight controller layer) that ensure stable flight of multiple fixed-wing UAVs. L. Li et al. [39] analyzed the current state of model-free, adaptive decision-making under complex nonlinear constraints, and of maneuverability in highly dynamic environments, for fixed-wing UAV control using reinforcement learning. Y. Guo et al. [40] proposed a multi-agent fixed-wing UAV flocking algorithm, named Cucker–Smale flocking, trained by reinforcement learning. It learns collision-free leader–follower flocking by exploiting information among followers through an attention mechanism and a special reward function. This design alleviates the problems of sparse rewards and poor policy convergence.
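The flocking work in [40] takes its name from the classical Cucker–Smale model. In that model, each agent aligns its velocity with the others through a communication weight that decays with distance. The sketch below simulates only that classical alignment law with explicit Euler steps. It contains no learning, leader, or attention mechanism. The gain `K`, the exponent `beta`, and the time step are illustrative assumptions.

```python
def cucker_smale_step(pos, vel, dt=0.1, K=1.0, beta=0.5):
    """One explicit-Euler step of the classical 2-D Cucker-Smale model:
    dv_i/dt = (1/N) * sum_j psi(|x_j - x_i|) * (v_j - v_i),
    with communication weight psi(r) = K / (1 + r^2)^beta."""
    n = len(pos)
    new_vel = []
    for i in range(n):
        ax = ay = 0.0
        for j in range(n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            psi = K / (1.0 + dx * dx + dy * dy) ** beta
            ax += psi * (vel[j][0] - vel[i][0])
            ay += psi * (vel[j][1] - vel[i][1])
        new_vel.append((vel[i][0] + dt * ax / n, vel[i][1] + dt * ay / n))
    new_pos = [(p[0] + dt * v[0], p[1] + dt * v[1]) for p, v in zip(pos, new_vel)]
    return new_pos, new_vel

def velocity_spread(vel):
    """Sum of squared deviations of velocities from their mean (0 means full alignment)."""
    n = len(vel)
    mx = sum(v[0] for v in vel) / n
    my = sum(v[1] for v in vel) / n
    return sum((v[0] - mx) ** 2 + (v[1] - my) ** 2 for v in vel)
```

The weights are symmetric (psi depends only on the distance between two agents), so the mean velocity is conserved while the spread shrinks toward zero.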
W. Hu et al. [41] presented a control scheme, developed based on reinforcement learning, that employs a nonlinear disturbance observer within a dynamic inversion framework for a fixed-wing unmanned aerial vehicle. M. Khan et al. [42] presented learning-empowered solutions based on deep supervised learning that improve multi-class UAV classification under challenging weather conditions. They used single-shot object detection algorithms for single-rotor, fixed-wing, and multi-rotor UAVs. H. Peng et al. [43] proposed a method that predicts the future locations of surrounding flying objects, using generative unsupervised learning. It classifies these objects into groups with similar levels of maneuverability, such as rotary-wing and fixed-wing UAVs, without prior knowledge of these classes. M. Rostami et al. [44] studied the endurance and energy efficiency of fixed-wing electric unmanned aerial vehicles. They used the Harris hawks optimization algorithm to regulate the UAV energy consumption split between the fuel cell and the battery, together with fuzzy logic-based programming and multi-factor reinforcement learning. S. Wang et al. [45] proposed a UAV fault detection algorithm based on contrastive learning and the spatial–temporal information of multivariate flight data, using self-supervised learning. They designed a series of specific sample transformations. The feature distribution of normal data was then modeled by comparing the similarity of differently transformed samples.
A. Fotouhi et al. [46] proposed a two-hop communication model between an end user and a macro-cell through a UAV base station. They optimized the flight trajectory of the fixed-wing UAV using the deep Q-learning of reinforcement learning. J. Valasek et al. [47] investigated machine-learning-based visual tracking of stationary and moving ground targets by fixed-wing unmanned air systems with non-gimbaled, fixed pan-and-tilt cameras. They developed an algorithm, trained by reinforcement learning, that determines an offline control policy for vehicle orientation and flight path. With this policy, a target can be tracked in the camera image frame without operator input. Y. Lin et al. [48] proposed a method that integrates automatic curriculum learning with multi-agent proximal policy optimization to train pursuers and evaders involving fixed-wing UAVs simultaneously. The method enables dynamic and adaptive encirclement strategies for autonomous decision-making using reinforcement learning. M. Zhang et al. [49] investigated prescribed-time optimal formation control with guidance trajectories for fixed-wing unmanned aerial vehicles subject to denial-of-service attacks. The optimal controller was designed by combining the backstepping technique with reinforcement learning. S. Hu et al. [50] developed an online control scheme for vision-based UAV-on-UAV tracking and monitoring, using a deep deterministic policy gradient model trained by deep reinforcement learning. In this scheme, a solar-powered fixed-wing UAV tracks a suspicious UAV target by keeping the target within its effective visual range.
Y. Xue et al. [51] proposed a low-cost collaborative decision-making method for fixed-wing UAV swarms based on a kinematic model and a target spatial correlation model, using deep reinforcement learning. The method reduces the control error between the algorithmic decision-making results and the flight guidance law, and it improves detection. M. Chowdhury et al. [52] developed a model-free, interchangeable flight controller for fixed-wing unmanned aircraft systems. They incorporated memory into the policy through a long short-term memory network and trained it in simulation with the proximal policy optimization of deep reinforcement learning. J. Mei et al. [53] studied how a fixed-wing unmanned aerial vehicle with insufficient battery energy can maximize data collection while still returning safely. They adopted a twin delayed deep deterministic policy gradient algorithm of reinforcement learning with three designed reward functions. A. Din et al. [54] presented a control architecture, based on reinforcement learning, for maximizing the glide range of an unconventionally designed fixed-wing UAV. It uses optimal dynamic programming in continuous state and control domains, coupled with nonlinear simulations. Y. Ou et al. [55] developed a hierarchical obstacle avoidance strategy, based on reinforcement learning, that combines a high-level navigator with a low-level controller. The strategy guides and controls a fixed-wing unmanned aerial vehicle to avoid a potential collision with a noncooperative dynamic obstacle.
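Twin delayed deep deterministic policy gradient (TD3), used in [53] and earlier in [33], differs from DDPG mainly in how it forms the critic's learning target. It adds clipped smoothing noise to the target action and takes the minimum of two target critics. A minimal single-transition sketch follows. The actor and critics are hypothetical plain Python callables standing in for neural networks. The noise settings (standard deviation 0.2, clipped at 0.5) follow the defaults reported in the TD3 paper. The delayed actor updates that give TD3 its name are not shown.

```python
import random

def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0,
               rng=random):
    """Clipped double-Q learning target of TD3 for one transition with a scalar action."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(a_low, min(a_high, target_actor(next_state) + noise))
    # Clipped double-Q: use the smaller of the two target critic estimates.
    q_next = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_next
```

Taking the minimum of the two critics counteracts the overestimation bias that a single bootstrapped critic tends to accumulate.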
X. Zhuang et al. [56] proposed a decoupled guidance and control theory for autonomous fixed-wing aircraft maneuvering that distinguishes between close-range and long-range engagements. It uses a Markov decision process with hybrid discrete and continuous action reinforcement learning. K. Haughn et al. [57] developed a state inference method to efficiently alleviate gusts on a smart-material, camber-morphing wing of uncrewed aerial vehicles, relying on extensive sensing networks and deep reinforcement learning. M. Raoufi et al. [58] developed a unified path planning and control framework, based on reinforcement learning, for uncrewed aerial vehicles operating in dynamic wildfire environments. The framework tracks fire evolution with the deep deterministic policy gradient, through an architecture that comprises high-level planning and low-level control components. C. Wang et al. [59] addressed the leader–follower flocking problem in continuous state and action spaces, based on reinforcement learning. They proposed a continuous actor–critic learning automaton algorithm that uses multilayer perceptrons to represent both the actor and the critic. W. Zhao et al. [60] investigated the non-stationary environment that arises when learning agents change their strategies in a multi-agent setting. They presented an improved fixed-wing multi-UAV control algorithm, based on reinforcement learning, that uses proximal policy optimization with centralized learning and decentralized execution.
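The continuous actor–critic learning automaton (CACLA) used in [59] (and in [10]) has a distinctive actor rule. The actor is pulled toward an executed action only when the temporal-difference error is positive. A minimal sketch with linear function approximation follows. [59] uses multilayer perceptrons instead, and the feature vectors, step sizes, and discount factor here are illustrative assumptions.

```python
def _dot(w, f):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * fi for wi, fi in zip(w, f))

def cacla_update(actor_w, critic_w, phi, phi_next, action, reward, done,
                 alpha_actor=0.1, alpha_critic=0.1, gamma=0.95):
    """One CACLA step with a linear actor mu(s) = actor_w . phi(s) and a linear
    critic V(s) = critic_w . phi(s); returns (actor_w, critic_w, td_error)."""
    v = _dot(critic_w, phi)
    v_next = 0.0 if done else _dot(critic_w, phi_next)
    td_error = reward + gamma * v_next - v
    # Critic: standard TD(0) update of the state-value estimate.
    new_critic = [w + alpha_critic * td_error * f for w, f in zip(critic_w, phi)]
    # Actor: regress toward the executed action only if it beat the critic's expectation.
    new_actor = list(actor_w)
    if td_error > 0:
        mu = _dot(actor_w, phi)
        new_actor = [w + alpha_actor * (action - mu) * f for w, f in zip(actor_w, phi)]
    return new_actor, new_critic, td_error
```

Because the actor update uses only the sign of the TD error, the actor learns from actions that turned out better than expected. It ignores how much better they were.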
B. Zhang et al. [61] proposed a collision-free policy for 3D fixed-wing UAV swarms, using multi-agent reinforcement learning. The policy is based on a scale-scalable approach combined with a reward function that balances tight swarm formations against internal collision avoidance. M. Qian et al. [62] studied a fault-tolerant tracking control scheme for multiple fixed-wing unmanned aerial vehicles under denial-of-service attacks, based on reinforcement learning. In this scheme, neural networks are combined with a long-term performance index to counter the adverse impact of bias faults and external disturbances. W. Yang et al. [63] developed an optimal formation control scheme within an actor-critic framework for fixed-wing multi-unmanned aerial vehicle systems subject to uncertainties and external disturbances, bypassing the limitations of reinforcement learning. T. Ao et al. [64] developed a leader–follower strategy for heterogeneous multi-UAV systems. Fixed-wing UAVs serve as leaders that provide dynamic relay communication coverage, while multi-rotor UAVs act as followers that perform multi-target search. The flight trajectories of the fixed-wing and multi-rotor UAVs are optimized separately using multi-agent reinforcement learning. J. Han et al. [65] proposed a collision prediction and avoidance method for fixed-wing unmanned aerial vehicles that identifies neighboring UAVs posing a significant threat. A trajectory tracking macro action was added to the action space, guided by a reward function that rewards approaching the preset flight paths. The model was trained in simulation scenarios, with updates implemented using the soft actor-critic algorithm of deep reinforcement learning.
B. Meng et al. [66] proposed a fixed-time, distributed, optimized formation control scheme based on reinforcement learning. It improves the steady-state and transient performance of fixed-wing UAV systems with uncertainties, communication link and actuator faults, and performance constraints. F. Xie et al. [67] presented a hybrid intelligent control strategy for fixed-wing unmanned aerial vehicles based on proximal policy optimization. It combines the stability of control theory with the adaptive learning capabilities of reinforcement learning, providing baseline control, stability, and tracking efficiency via an actor-critic mechanism. M. Ali et al. [68] developed algorithms for multi-agent fixed-wing UAV path planning in 2D and 3D simulated environments using reinforcement learning. They compared five algorithms across various scenarios: proximal policy optimization, soft actor–critic, deep deterministic policy gradient, trust region policy optimization, and multi-agent deep deterministic policy gradient. Q. Cheng et al. [69] studied a collision avoidance scenario in which a fixed-wing unmanned aerial vehicle must avoid colliding with an enemy UAV on its flight path to the goal point. They used a fixed-rule policy and a Markov decision process with temporal-difference reinforcement learning. C. Yan et al. [70] proposed a selective behavior cloning enhanced actor-critic algorithm for fixed-wing UAV navigation and obstacle avoidance that combines imitation learning with deep reinforcement learning. The task is formulated as a partially observable Markov decision process with sparse rewards. The method learns an end-to-end policy that maps imperfect sensor data to control signals.
M. Su et al. [71] presented a controller framework with a reward function that enables fixed-wing UAVs to navigate safely using only monocular onboard cameras. The framework integrates an adaptive entropy regularization mechanism into the proximal policy optimization of reinforcement learning. K. Julian et al. [72] presented two deep reinforcement learning approaches for training decentralized controllers for multiple autonomous fixed-wing UAVs. The controllers maximize forest fire coverage, track wildfire expansion, and outperform an online receding-horizon controller. Y. Cui et al. [73] reframed vertical wind shear as a harvestable energy source and introduced a dynamic-soaring framework based on dual-objective deep reinforcement learning. The framework specifically addresses fixed-wing UAV controller failures that occur under turbulence or wind-direction shifts. J. Wu et al. [74] presented a reactive, online decision-making maneuver controller for obstacle avoidance of fixed-wing unmanned aerial vehicles, based on deep reinforcement learning. The controller is paired with a composed interfered fluid dynamical system guidance law. Y. He et al. [75] proposed an algorithm with a hybrid continuous action space that addresses the long-term dependency issues inherent in fixed-wing UAV operations. It uses a continuous policy network to facilitate real-time flight path planning with reinforcement learning.
S. Jin et al. [76] proposed a temporal-sequence-constrained Q-learning framework, based on offline reinforcement learning, for tilt-wing unmanned aerial vehicles. Such vehicles require a unified control strategy across rotary-wing, fixed-wing, and transition modes. The framework integrates an encoder–decoder with recurrent networks to capture temporal dependencies. G. Chang et al. [77] proposed a distributed Hierarchical Cooperative Soft Actor-Critic with maximum entropy (HC-SAC) framework that manages decision-making in close-range air combat for fixed-wing unmanned air vehicle swarms, using traditional multi-agent reinforcement learning. Z. Liu et al. [78] studied trajectory optimization of fixed-wing uncrewed aerial vehicles to maximize energy efficiency. They proposed two algorithms operating on two timescales: a successive convex approximation strategy and a multi-agent reinforcement learning algorithm. S. Devaraju et al. [79] developed a base station-distributed pheromone mobility model that autonomously coordinates fixed-wing UAV movements using only information from neighboring UAVs in a decentralized network. The model uses a Q-learning policy variant of deep reinforcement learning. H. Du et al. [80] proposed a hierarchical two-layer motion planning algorithm for fixed-wing UAVs. It combines a global planner, which handles static obstacles and other UAVs, with local reinforcement learning to achieve kinodynamic, collision-free motion planning within the sensing range.
W. Wang et al. [81] formulated flocking and navigation control of a fixed-wing UAV swarm as a Markov decision process. They developed an oracle-guided two-stage training and execution scheme, with observations and rewards designed for multi-agent reinforcement learning. Y. Wang et al. [82] proposed an approximate optimal curve-path-tracking control algorithm for partially unknown nonlinear systems, using neural networks with integral reinforcement learning. The algorithm is applied to fixed-wing unmanned aerial vehicles subject to asymmetric control input constraints. D. Wada et al. [83] investigated fixed-wing UAV pitch control in wind tunnel tests based on deep reinforcement learning. They used three training approaches: a baseline with simple linear dynamics, high-fidelity modeling, and domain randomization. A. Viseras et al. [84] proposed a framework, based on reinforcement learning, that lets a team of autonomous unmanned aerial vehicles learn to monitor a fire front. The framework starts with two UAVs and is then extended to larger teams, using multiple individually trained agents and value decomposition networks.
3.3. Multi-Rotor
W. Cheng et al. [97] proposed a fault-tolerant control method for high-speed flight of a quadrotor UAV, based on deep reinforcement learning. J. Huang et al. [98] proposed an acceleration framework with incomplete information and an energy consumption model for quadrotor UAVs to characterize UAV communications. They also designed a UAV trajectory planning algorithm based on reinforcement learning. B. Jiang et al. [99] proposed a hybrid aerodynamic modeling method and a model predictive control design for a quadrotor tail-sitter UAV, using supervised learning. The model achieved high predictive accuracy at low computational cost and was used to optimize throttle, pitch angle, and roll angle for position tracking. R. Kestur et al. [100] studied the use and potential of low-altitude remote sensing UAVs in agriculture, particularly in small open agricultural fields. They tested two UAVs (a fixed-wing UAV and a quadrotor UAV) and compared unsupervised K-means clustering with supervised extreme learning.
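[100] contrasts unsupervised K-means clustering with a supervised learner for agricultural UAV imagery. The code below sketches plain Lloyd's K-means on two-dimensional feature vectors. Treating each point as a hypothetical per-pixel pair of spectral features is an assumption, and the code is not the pipeline of [100].

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                      + (p[1] - centroids[c][1]) ** 2) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

Unlike a supervised classifier, K-means needs no labeled pixels. The resulting clusters still have to be interpreted (for example, as crop or soil) after the fact.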
P. Jin et al. [101] investigated dynamic event-triggered robust optimal attitude tracking control for a quadrotor unmanned aerial vehicle in an uncertain environment. They developed an augmented system of the tracking error signal and the reference signal, which transforms the tracking problem into a stabilization problem. The Hamilton–Jacobi–Bellman equation and reinforcement learning are then used to handle random disturbances in the controller design. D. Silva et al. [102] proposed an algorithm, based on reinforcement learning, that accelerates the training of a drone agent for a counter unmanned aerial system. In the system, a hunter quadrotor drone guides an invading drone to a safe kill zone. C. Wang et al. [103] proposed a framework, using reinforcement learning, that generates an optimal vision-based landing policy for a quadrotor unmanned aerial vehicle on a moving unmanned ground vehicle. They also presented a landing vision system for rapid localization and pose estimation of the unmanned ground vehicle. H. He et al. [104] presented a framework that combines a learning-based inertial odometry module with differentiable model predictive control for quadrotor UAV attitude control, using supervised learning. Z. Adaika et al. [105] presented a data-driven fault diagnosis method for a quadcopter UAV. The method feeds the principal direction vector of the airframe vibration signals, computed by principal component analysis, to a support vector machine classifier trained by supervised learning.
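[105] uses the principal direction vector of airframe vibration signals as a feature for fault diagnosis. The sketch below computes that vector as the dominant eigenvector of the sample covariance of hypothetical 3-axis accelerometer samples, found by power iteration. The support vector machine classifier of [105] is not shown, and the function name is hypothetical.

```python
import math

def principal_direction(samples, iters=200):
    """Unit-length dominant eigenvector of the sample covariance of 3-axis
    vibration samples, found by power iteration (the first PCA direction)."""
    n = len(samples)
    mean = [sum(s[d] for s in samples) / n for d in range(3)]
    centered = [[s[d] - mean[d] for d in range(3)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(3)]
           for i in range(3)]
    v = [1.0, 1.0, 1.0]  # assumes the start vector is not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvectors are sign-ambiguous; make the largest-magnitude component positive.
    k = max(range(3), key=lambda i: abs(v[i]))
    return v if v[k] > 0 else [-x for x in v]
```

Fixing the sign makes the feature comparable across recordings, which matters when the vectors are fed to a downstream classifier.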
S. Khan et al. [106] proposed an optimized approach that uses a generative adversarial network to classify crops and weeds at an early growth stage. A quadcopter UAV acquires high-resolution images over two different croplands (pea and strawberry), and the classification systems using this UAV imagery are based on supervised learning. C. Zeng et al. [107] used supervised learning to study the optimal states that transitioning UAVs would assume under different mission and flight environment scenarios. These UAVs switch between fixed-wing, VTOL, and multi-rotor states. R. Hernandez-Hernandez et al. [108] proposed a hybrid multilayer extreme learning algorithm with fuzzy logic for active image classification on quadcopter UAVs, using supervised learning. G. Zhang et al. [109] proposed an adaptive Kalman filter that adjusts the noise covariance of UAV global navigation satellite system measurements under different positioning accuracies. The adjustment relies on an accuracy classification model trained by supervised learning. T. Alotaibi et al. [110] developed an algorithm, using a convolutional neural network and deep supervised learning, that captures images of obstacles at various flight altitudes and images of the horizon. These images help a quadcopter UAV fly at an appropriate altitude, and the algorithm was tested in multiple worlds in the Gazebo simulator.
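[109] adjusts the measurement-noise covariance of a Kalman filter according to a learned positioning-accuracy class. A minimal scalar sketch of that idea follows: a random-walk position filter that looks up the measurement variance R from a class label. The class names, the variances, and the process noise are hypothetical. The classifier that produces the label is not shown, nor is the multi-dimensional filter a real navigation system uses.

```python
# Hypothetical mapping from a predicted accuracy class to measurement variance (m^2).
R_BY_CLASS = {"high": 0.25, "medium": 4.0, "low": 25.0}

def kalman_step(x, p, z, accuracy_class, q=0.1):
    """One predict/update cycle of a scalar random-walk Kalman filter whose
    measurement noise R is switched according to a classifier's accuracy label."""
    # Predict: random-walk process model x_k = x_{k-1} + w, with w ~ N(0, q).
    p = p + q
    # Update with the class-dependent measurement variance.
    r = R_BY_CLASS[accuracy_class]
    gain = p / (p + r)
    x = x + gain * (z - x)
    p = (1.0 - gain) * p
    return x, p
```

A "high" accuracy label shrinks R, so the filter trusts the new fix more and moves further toward it than under a "low" label.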
T. Li et al. [111] developed an environment-sensing, multi-agent deep deterministic policy gradient algorithm with a control barrier function, using weakly supervised learning. The algorithm addresses path planning for a quadrotor UAV swarm in an unknown electromagnetic environment. W. Pavon et al. [112] developed a neural network-based approach, using supervised learning, that replicates the behavior of classical control systems to track the position of quadcopter UAVs accurately. D. Crowe et al. [113] investigated whether two methods, the k-nearest neighbor algorithm and the long short-term memory algorithm, can estimate wind velocity in the lowermost region of the atmosphere. The estimates use on-board inertial data from a multi-rotor UAV and are based on supervised learning. P. Yang et al. [114] developed a 1-D convolutional neural network model with adaptive batch normalization, based on supervised learning. It accomplishes fault diagnosis and classification by extracting features from low-dimensional multi-rotor UAV data. X. Liu et al. [115] presented a hybrid control method, based on reinforcement learning, to counter actuator faults, the most prevalent type of quadrotor UAV failure. The method preserves the stability and continuity of mission-critical tasks and enhances fault tolerance by training the control system with the proximal policy optimization algorithm.
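[113] compares k-nearest neighbors with long short-term memory networks for wind estimation from on-board inertial data. A minimal k-nearest-neighbor regressor follows. The idea of predicting wind speed from a hypothetical pair of features, such as mean roll and pitch angles, is an illustrative assumption and not the feature set of [113].

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict a scalar target (e.g. wind speed) as the mean of the targets of the
    k training samples nearest to `query` in Euclidean feature space."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

k-nearest neighbors has no training phase beyond storing labeled flights. Every query therefore costs a pass over the stored data, which is one practical contrast with a trained LSTM.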
J. de Souza et al. [116] proposed an alternative method, based on supervised learning, for training a multilayer perceptron neural network built on Mamdani fuzzy logic. The network controls the landing of a quadcopter UAV that carries an onboard computer running the Robot Operating System. A. Haddad et al. [117] presented the concept of dual-scale homogeneity, defined by scaled magnitudes and time in transformed coordinates that remain independent of system parameters. The concept yields consistent performance of a quadrotor UAV with a slung-load system. The authors also designed a parameter-dependent policy that homogenizes the quadrotor UAV with slung-load system, using reinforcement learning. G. Farid et al. [118] presented a motion planning approach for a quadrotor UAV, based on reinforcement learning, that learns optimal state values and related optimal policies for various regions of the UAV environment. With these, the UAV reaches multiple random targets in a specific 3-D cluttered space. T. Trad et al. [119] studied high- and low-level control systems, attitude stabilization, and position tracking of quadrotor UAVs, based on reinforcement learning. They designed a reward function and actor-critic network structures that stimulate high-order observable states and improve the agent's understanding of UAV dynamics. S. Sönmez et al. [120] proposed an approach, using reinforcement learning, that applies the deep deterministic policy gradient algorithm, an off-policy actor–critic method, to adjust the gains of a quadrotor UAV attitude controller during flight.
H. Zheng et al. [121] proposed an optimal formation control strategy for multiple quadrotor UAVs with model uncertainties and unknown disturbances. The strategy achieves prescribed transient and steady-state performance at an appointed time, based on a self-structuring neural network and actor-critic reinforcement learning. Q. Sun et al. [122] proposed a curiosity-driven method for aggressive quadrotor UAV flight missions, based on reinforcement learning. They introduced a similarity-based curiosity module to speed up training. C. Wang et al. [123] proposed a learning framework that lets a quadrotor UAV land on a moving UGV without knowing its motion dynamics. The framework has two components, a landing vision system and a landing control system, and both use reinforcement learning. Q. Luo et al. [124] proposed a framework, using deep reinforcement learning, that handles quadrotor UAV attitude dynamics during autonomous navigation. A deep neural network policy generates high-level velocity commands. A low-level control algorithm then translates these commands to achieve precise control of both the position and the rotation of the quadrotor UAV. C. Yu et al. [125] presented trajectory tracking control of an autonomous quadrotor unmanned aerial vehicle performing remote inspection, using reinforcement learning.
G. Wen et al. [126] developed an optimized attitude control for a quadrotor UAV that steers both the attitude angle position and velocity states to follow predefined reference signals, using a critic–actor architecture with fuzzy logic system approximation, based on reinforcement learning. Y. Chang et al. [127] presented an adaptive fault-tolerant control scheme for quadrotor UAVs subject to external disturbances, input uncertainties, and structural uncertainties, using a penalty function and a critic network that evaluates control performance, based on reinforcement learning. M. Alam et al. [128] proposed a Q-learning routing protocol, inspired by adaptive flocking control, for UAV aerial surveillance and emergency communication that obtains an optimal routing path in terms of delay, stable path selection (defined by predictive link duration), and energy consumption, based on reinforcement learning. V. Tran et al. [129] presented an adaptive control synthesis methodology for quadrotor UAV attitude and altitude stabilization based on the strictly negative imaginary property and fuzzy reinforcement learning. J. Quan et al. [130] proposed a stability-aware strategy for dynamically adjusting the clipping parameter of proximal policy optimization (PPO) for quadrotor UAVs; the strategy uses a Lipschitz continuity interpretation and adapts the clipping threshold in real time based on a stability variance, using reinforcement learning.
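The clipping threshold that J. Quan et al. [130] adapt is the ε in the standard PPO clipped surrogate objective, L = mean(min(r·A, clip(r, 1 − ε, 1 + ε)·A)), where r = π_new(a|s)/π_old(a|s) and A is the advantage estimate. The minimal sketch below evaluates this objective for a fixed ε to show how the threshold limits the policy update; the stability-variance rule from [130] that adjusts ε online is not reproduced, and the sample numbers are made up.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting policy; advantages: estimates of A.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# One sample whose probability ratio grew to 1.5 with a positive advantage.
logp_old = [math.log(0.2)]
logp_new = [math.log(0.3)]
adv = [1.0]
for eps in (0.1, 0.2, 0.3):
    print(eps, round(ppo_clip_objective(logp_new, logp_old, adv, eps), 3))
```

With a positive advantage the objective saturates at 1 + ε, so a larger ε permits larger policy steps; with a negative advantage the unclipped term is kept, which is why adapting ε trades update size against training stability.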
M. Khojasteh et al. [131] proposed a two-stage path-planning method, comprising a depth estimation module and a decision-making module, for a quadrotor UAV equipped with only a monocular camera. The first module used a convolutional encoder–decoder network to learn image depth from visual cues in a self-supervised manner, and its output served as the input to the second module, which used dueling double deep recurrent Q-network reinforcement learning. G. Ryou et al. [132] presented an algorithm that co-trains a planning policy, through multi-fidelity Bayesian optimization, and a reward estimator to effectively create a realistic dynamics model for high-speed quadrotor UAV trajectories, based on reinforcement learning. Y. Xia et al. [133] proposed a tracking scheme that lets quadrotor UAVs approach an escaping target under time constraints while guaranteeing low energy expenditure, using the multi-agent deep deterministic policy gradient method of deep reinforcement learning. J. Estevez et al. [134] presented a trajectory planning algorithm, built on a feature vector and a parameter vector, for transporting cable-suspended loads with three quadrotor UAVs so that the cargo moves smoothly without swinging, using reinforcement learning. O. Doukhi et al. [135] proposed a multi-nonlinear predictive control policy for quadrotor UAVs that is trained dynamically to select the adaptation parameters and feedforward control commands for the low-level controller, leveraging deep reinforcement learning.
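Two of the ingredients named for the decision-making module of M. Khojasteh et al. [131] can be shown compactly: the dueling aggregation Q(s, a) = V(s) + A(s, a) − mean over a′ of A(s, a′), and the double-DQN target, in which the online network selects the next action and the target network evaluates it. The sketch below operates on plain lists of made-up head outputs instead of trained networks and omits the recurrent (history-encoding) part; it illustrates the general technique, not the authors' implementation.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: pick a' with the online net, evaluate it with the target net."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best_action]


# Illustrative head outputs for the next state (made-up numbers, not from [131]).
q_online_next = dueling_q(value=1.0, advantages=[0.5, -0.5, 0.0])
q_target_next = dueling_q(value=0.8, advantages=[0.1, 0.3, -0.4])
y = double_dqn_target(reward=-0.1, done=False, gamma=0.9,
                      q_online_next=q_online_next, q_target_next=q_target_next)
print(q_online_next, round(y, 4))
```

Here the online network prefers action 0, so the target uses the target network's value for action 0 (0.9) rather than its own maximum (1.1); this decoupling is what reduces the overestimation bias of plain DQN.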
J. Chen et al. [136] proposed an algorithm that learns online planning for quadrotor UAV pursuit–evasion in unknown environments, derives a feasible policy via two-stage reward refinement, and deploys the policy on real quadrotors, tackling partial observability in cooperative policy learning and enabling higher exploration efficiency, using reinforcement learning. H. Han et al. [137] proposed an odd-symmetric actor to achieve stable and symmetric control performance and an even-symmetric critic to stabilize the training of a quadrotor UAV controller; to this end, the biases of the neural networks were eliminated and the absolute value operation was adopted to construct the activation function. The authors devised a cascade architecture in which each module was trained to control one symmetric subsystem of the quadrotor UAV, using deep reinforcement learning. G. Jiménez et al. [138] compared three reinforcement learning algorithms, DQN, SARSA, and actor–critic, on the problem of navigating a quadrotor UAV from a departure point A to an endpoint B while avoiding obstacles in the least possible time and over the shortest distance; the authors showed that DQN was the only algorithm that reached the target. X. Huo et al. [139] proposed an adaptive optimal collision-free control formulation that reduces the number of adaptive laws and the computational complexity, using the estimation capacity of neural networks and an actor–critic control scheme of reinforcement learning. H. Zhao et al. [140] developed a workflow that is safer than traditional learning and suited to quadrotor UAV control in industrial scenarios, especially when learning from offline datasets; it uses a more general behavior policy and a pessimistic estimation to foster offline deep reinforcement learning.
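The symmetry properties attributed to the actor and critic of H. Han et al. [137] follow from simple facts: a bias-free network with an odd activation such as tanh is an odd function, f(−s) = −f(s), and taking the absolute value after a bias-free first layer makes the output an even function, g(−s) = g(s). The minimal sketch below checks both properties on randomly initialized weights; the layer sizes, the use of tanh in the actor, and the state-only critic input are assumptions for illustration, not the architecture from [137].

```python
import math
import random

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Bias-free linear layer: returns m @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Hypothetical sizes: 3 state inputs, 8 hidden units, 1 output.
W1_actor, W2_actor = rand_matrix(8, 3), rand_matrix(1, 8)
W1_critic, W2_critic = rand_matrix(8, 3), rand_matrix(1, 8)


def odd_actor(state):
    """No biases + odd activation (tanh) => odd_actor(-s) == -odd_actor(s)."""
    hidden = [math.tanh(h) for h in matvec(W1_actor, state)]
    return matvec(W2_actor, hidden)[0]


def even_critic(state):
    """Absolute value after the first bias-free layer => even_critic(-s) == even_critic(s)."""
    hidden = [abs(h) for h in matvec(W1_critic, state)]
    return matvec(W2_critic, hidden)[0]


s = [0.3, -1.2, 0.7]          # e.g. a hypothetical attitude error vector
neg_s = [-x for x in s]
print(odd_actor(s), odd_actor(neg_s))
print(even_critic(s), even_critic(neg_s))
```

Because the properties hold for any weights, training cannot break them: a mirrored attitude error always produces the mirrored command, and the critic assigns the same value to both.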
X. Li et al. [141] proposed a trajectory planning method for three quadrotor UAVs, based on a value-function approximation algorithm, that addresses cable-suspended load transportation and reaches the target position faster without the load swinging, using reinforcement learning. J. Xie et al. [142] developed a model-free method, based on a partially observable Markov decision process, for UAV autonomous tracking and landing maneuvers, using an end-to-end neural network that combines deep deterministic policy gradient with heuristic rules, based on deep reinforcement learning. K. PB et al. [143] presented a framework that controls the altitude of a quadcopter UAV and stabilizes its x- and y-axes by training the UAV with Q-learning, a reinforcement learning method. L. Al-Haddad et al. [144] presented and demonstrated the efficacy of an optimized stochastic gradient descent logistic regression algorithm for classifying unbalance in quadrotor UAVs, based on supervised learning. J. Park et al. [145] proposed a fault diagnosis algorithm using a deep neural network to determine in real time whether a single rotor of an octocopter UAV has failed, based on supervised learning.
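A plain version of the classifier family used by L. Al-Haddad et al. [144], logistic regression fitted by stochastic gradient descent on the log-loss, is sketched below. The optimizations described in [144] are not reproduced, and the two input features and the synthetic "balanced/unbalanced" clusters are hypothetical stand-ins for real vibration data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_sgd(X, y, lr=0.1, epochs=50, seed=0):
    """Logistic regression fitted by plain stochastic gradient descent on log-loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                 # visit samples in a new random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]                 # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Synthetic, well-separated data standing in for two vibration features:
# label 0 = balanced propellers, label 1 = unbalanced (hypothetical).
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 0.3), data_rng.gauss(0.0, 0.3)] for _ in range(50)] + \
    [[data_rng.gauss(2.0, 0.3), data_rng.gauss(2.0, 0.3)] for _ in range(50)]
y = [0] * 50 + [1] * 50

w, b = train_logreg_sgd(X, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
print(w, b, accuracy)
```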
Y. Song et al. [146] proposed a quadrotor control simulator composed of two main components, a rendering engine built on Unity and a physics engine for dynamics simulation, and used it for 3-D path planning based on reinforcement learning. E. Mosweu et al. [147] designed and evaluated a controller for a multirotor UAV system that adapts its gains to changes in the system dynamics, using a Simulink-constructed model and a deep deterministic policy gradient network of reinforcement learning. A. Narmilan et al. [148] presented an approach for the preliminary detection of sugarcane white leaf disease using high-resolution multispectral sensors mounted on small UAVs, implementing the random forest, decision tree, and K-nearest neighbors methods of supervised learning. S. Chen et al. [149] proposed an energy-saving path-planning algorithm for quadcopter UAVs in turbulent wind environments that dynamically adjusts flight strategies to find the most energy-efficient flight paths, based on reinforcement learning. Y. Sun et al. [150] proposed a UAV detection method that carries out efficient and accurate multi-target detection through computer vision, using the YOLOv5-Lite model based on supervised learning.
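Of the classifiers listed for A. Narmilan et al. [148], K-nearest neighbors is the simplest to state: a sample receives the majority label among its k closest training samples. The minimal sketch below classifies pixels described by two hypothetical spectral-band reflectance values; the feature values, labels, and k = 3 are made up for illustration and are not data or settings from [148].

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical per-pixel features (reflectance in two spectral bands).
train_X = [(0.10, 0.60), (0.12, 0.58), (0.11, 0.62),   # healthy leaves
           (0.25, 0.35), (0.27, 0.30), (0.24, 0.33)]   # white-leaf symptoms
train_y = ["healthy"] * 3 + ["diseased"] * 3

print(knn_predict(train_X, train_y, (0.11, 0.59)))
print(knn_predict(train_X, train_y, (0.26, 0.32)))
```

An odd k avoids ties in this two-class case; in practice the features would typically be scaled first, since Euclidean distance is sensitive to band magnitudes.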
J. Cardenas et al. [151] designed a flight position controller based on deep neural networks and implemented it on a multi-rotor UAV, using data from diverse trajectories that include transient and steady-state information such as position, speed, acceleration, and motor output signals, based on supervised learning. H. Xiong et al. [152] proposed a formation-surrounding control method for a multi-quadrotor UAV pursuit–evasion system subject to external disturbances, constructing position and attitude tracking error subsystems of the quadrotors, and presented two control strategies that combine the feedforward control technique with reinforcement learning. J. Faigl et al. [153] proposed surveillance planning for a multi-rotor UAV that periodically takes snapshots of areas of interest, finding a fast and smooth 3-D trajectory that visits a given set of waypoint locations in the shortest possible time, based on unsupervised learning. J. Xue et al. [154] presented an algorithm combined with a domain randomization method to learn wind-resistant hovering policies, and designed a reward function that maintains continuous flight of quadrotor UAVs and guides the Q-learning process, based on deep reinforcement learning. N. Parvaresh et al. [155] proposed an algorithm to solve the location optimization problem for multi-rotor UAV base stations in the presence of mobile endpoints, using a continuous actor–critic method of deep reinforcement learning.
R. Polvara et al. [156] proposed a method that uses low-resolution images from a downward-looking camera to identify the position of a marker and land a quadrotor UAV on it, based on a hierarchy of deep Q-networks of reinforcement learning. C. Sabo et al. [157] developed a robotic solution in the form of a quadcopter that can carry the payload needed to replicate the sensing capabilities, including chemical sensing and a wide visual field of view, and that is controlled remotely to process the sensor data, using neural reinforcement learning. N. Smolyanskiy et al. [158] presented a micro aerial vehicle system for autonomously following trails in unstructured outdoor environments such as forests, using vision modules for environmental awareness and obstacle detection based on a DNN trained with soft labels, a supervised learning approach. P. Jardine et al. [159] proposed a trajectory planning algorithm for autonomous quadrotor UAVs based on model predictive control tuned with reinforcement learning. B. Chen et al. [160] proposed an observer-based team formation tracking control scheme for large-scale quadrotor UAV systems under external disturbances, measurement noise, couplings from neighboring quadrotor UAVs, and malicious attacks on the actuators and sensors of the networked control system via wireless communication, based on deep neural network reinforcement learning.
Q. Sun et al. [161] developed a vision-based hierarchical algorithm for quadrotor UAV navigation that integrates perception, obstacle avoidance, and motion control into a unified framework, with an echoic hindsight experience replay mechanism that accelerates convergence by transforming failed episodes into successful ones, using reinforcement learning. Z. Zhang et al. [162] proposed a control method that relies on collected data instead of modelling to handle coupling and reduce aerodynamic disturbances, and used the Lyapunov method to maintain the stability of the system, based on reinforcement learning. X. Lin et al. [163] proposed an event-triggered control strategy to stabilize a quadrotor UAV with actuator saturation, using a model-free scheme designed based on reinforcement learning. B. Yu et al. [164] presented a modular framework for low-level control of a quadrotor UAV, with direct control of yawing motion and real-world autonomous flight, based on reinforcement learning. P. Sharma et al. [165] developed a model-free algorithm, based on soft actor–critic, that enables a quadrotor to recover from a single rotor failure and then hover, land, and perform complex maneuvers, using reinforcement learning. Y. Gong et al. [166] proposed an optical flow algorithm for the visual navigation and positioning of multi-rotor drones that calculates the flight speed of the drone, the calculation errors of its rotation state, and the vertical take-off and landing state, based on the K-nearest neighbors method of supervised learning.
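The idea of turning failed episodes into successful ones, which Q. Sun et al. [161] build on, comes from hindsight experience replay (HER): transitions from an episode that missed its goal are stored a second time with the goal replaced by a state the agent actually reached, so the sparse reward becomes informative. The minimal sketch below shows the common "final" relabeling strategy on a toy grid episode; the "echoic" variant from [161] is not reproduced, and the `Transition` record and sparse reward are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[int, int]


@dataclass(frozen=True)
class Transition:
    state: Point
    action: int
    next_state: Point
    goal: Point
    reward: float


def sparse_reward(achieved: Point, goal: Point) -> float:
    """0 when the goal is reached, -1 otherwise (a common sparse reward choice)."""
    return 0.0 if achieved == goal else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Hindsight relabeling ('final' strategy): treat the state reached at the end
    of the episode as if it had been the goal and recompute every reward."""
    new_goal = episode[-1].next_state
    return [replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
            for t in episode]


# A failed episode: the UAV aimed for (3, 3) but stopped at (2, 0).
goal = (3, 3)
path = [(0, 0), (1, 0), (2, 0)]
episode = [Transition(s, 0, s2, goal, sparse_reward(s2, goal))
           for s, s2 in zip(path, path[1:])]

# Store both the original and the relabeled transitions for off-policy training.
replay_buffer = episode + her_relabel_final(episode)
print([t.reward for t in episode])
print([t.reward for t in replay_buffer[len(episode):]])
```

The original transitions carry only failure rewards, whereas the relabeled copy ends with a success, giving a goal-conditioned off-policy learner a useful training signal even before it ever reaches the true goal.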