Preprint
Review

This version is not peer-reviewed.

Application of Machine Learning Methodologies in Unmanned Aerial Vehicles: A Survey and Systematic Review

Submitted:

04 June 2026

Posted:

05 June 2026


Abstract
This research is a survey of published studies on unmanned aerial vehicles that utilize machine learning, and a systematic review of what these studies have accomplished over the past ten years. It focuses on the application of supervised learning, unsupervised learning, and reinforcement learning in four types of unmanned aerial vehicles: fixed-wing, hybrid VTOL, single-rotor, and multi-rotor. The survey finds that the application of all three types of machine learning (supervised, unsupervised, and reinforcement) has increased over the past 12 months, 24 months, five years, and ten years, with reinforcement learning showing the strongest increasing trend, followed by supervised learning. The application of unsupervised learning is also increasing, but at the lowest rate. Among the four periods, the past ten years showed a significant increase in machine learning application. By geographic region, China has the highest count of published papers, followed by North America and Europe. By category of unmanned aerial vehicle, multi-rotor UAVs have the highest count of machine learning applications, followed by fixed-wing and single-rotor UAVs. Although hybrid VTOL UAVs have important applications, their use of machine learning was the lowest in terms of published paper count.

1. Introduction

An unmanned aerial vehicle (UAV) is essentially a robot that can fly on its own. It can have its own autonomous navigation and control system that works in conjunction with sensors and a global positioning system (GPS), or it can be controlled remotely by a human operator. UAVs come in two versions: recoverable aircraft, which return to their original station after performing a mission, and unrecoverable aircraft, as in the case of cruise missiles. The first UAV dates to 1849, when a balloon carrier was used by the Austrian forces besieging Venice. Today, UAVs are used for military purposes, as in the case of drones, and for non-military purposes, such as spraying fields of crops in agriculture. Two similar terms are used in aviation, UAS and RPAS. UAS stands for unmanned aerial system, an aircraft and its associated elements operated with no pilot onboard; RPAS stands for remotely piloted aircraft system, a system controlled by a remote pilot or station [1]. The difference between the two is that a UAS must have special airspace accommodations, be kept away from other aircraft, and be piloted from a ground control station, whereas an RPAS can be integrated into the airspace of manned aircraft and piloted from a remote pilot station.
Figure 1. NUUVA V300 takes off with successful first flight in January 2025. Credit Pipistrel.
UAVs are used for both military and non-military purposes. In agriculture, they are used to monitor crops, assess soil irrigation, and spray pesticides and fertilizers. UAVs are also used to deliver meals, medicine, and small packages to remote areas; in media, for photography, film-making, and journalism; in public safety, for search and rescue, law enforcement surveillance, wildfire mapping, and disaster response; in infrastructure, for inspecting bridges and power lines and for creating detailed 3D geographic maps for urban planning; and even in entertainment, such as UAV racing. UAVs are tools for both commercial and public good: they can fly where humans cannot easily go, provide real-time situational awareness, and capture high-resolution data. It is desirable for a UAV system to remain undetected in operation, particularly in military applications, in order not to alert the enemy or criminals to a forthcoming operation and to protect the UAV from loss due to enemy countermeasures. In civilian applications, an undetected UAV minimizes environmental disturbance [1]. The principal means of detecting any air vehicle are its acoustic and electromagnetic emissions. Therefore, to reduce detectability, it is necessary to reduce the acoustic and electromagnetic emissions received from the UAV.
The use of unmanned aircraft systems was restricted to military operations until the mid-1990s. The Federal Aviation Administration Advisory Circular 91-57 of 1981 defined all unmanned aircraft as model aircraft, restricted to altitudes of 400 feet or less and kept away from noise-sensitive areas such as schools, hospitals, and temples [2]. These restrictions aimed to eliminate any possibility of collision with manned aircraft and to protect the population from injury caused by model aircraft crashes. By the mid-1990s, unmanned aerial vehicles had proved successful in reconnaissance missions.
The architecture of a typical unmanned aerial vehicle consists of several components, including a mainframe, control system, sensors, monitoring system, GPS connected to the internet, data processing software application system, and a landing system [3]. Together, these components pilot the UAV without human intervention. In some cases, however, UAVs are connected to control stations that monitor their task performance. Model-wise, a UAV consists of an aerodynamic model, a structural mass model, and a radar cross section model [4]. Design-wise, UAVs are classified into four main types: fixed-wing, such as airplanes; single-rotor, such as helicopters; multi-rotor, such as quadcopters; and hybrid VTOL, which combines wings and rotors for vertical takeoff and landing.

1.1. Fixed-Wing UAVs

Based on the structure of their lift-producing surfaces, these vehicles are similar to manned aircraft: their wings are attached rigidly to the body and do not move. They may have control surfaces, such as ailerons and a rudder, that deflect, but the wings themselves are fixed, as shown in Figure 2, below. Lift is generated by the aerodynamic shape of the wings, and therefore these vehicles need a short runway to accelerate to takeoff speed.
Physics and Dynamics of Fixed-Wing UAVs
The physics and dynamics of these aircraft revolve around a delicate balance of four fundamental forces: lift, drag, weight, and thrust. These aircraft generate lift from forward airspeed over their aerodynamic surfaces, which gives them distinctive flight characteristics, constraints, and control methods. Lift, denoted by L, is the upward force, primarily determined by the wing’s airfoil shape, surface area, air density, and the angle of attack. Drag, denoted by D, is the rearward retarding force generated by air resistance as the UAV moves forward. Weight, denoted by W, is the downward gravitational force. Thrust, denoted by T, is the forward force generated by the aircraft's propulsion system. In steady level flight, lift must exactly equal weight. That is,
$$L = W \tag{1}$$
And thrust must equal drag:
$$T = D \tag{2}$$
However, lift and drag are functions of the air density ($\rho$), wing surface area ($S$), true airspeed ($V$), and the dimensionless coefficients of lift ($C_L$) and drag ($C_D$), as shown below.
$$L = \frac{1}{2} \rho V^2 S \, C_L \tag{3}$$
$$D = \frac{1}{2} \rho V^2 S \, C_D \tag{4}$$
The angle of attack is the angle at which the wing meets the oncoming air; as it increases, so does lift. However, there is a critical limit called the stall angle: when it is exceeded, the airflow detaches from the wing’s upper surface, causing a sudden loss of lift and a drastic increase in drag.
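As a rough numerical illustration of the lift and drag relations above, the following Python sketch computes the airspeed at which lift balances weight and the thrust then needed to balance drag. The airframe values (mass, wing area, and the two coefficients) are hypothetical placeholders chosen only for illustration; they are not taken from any vehicle in this survey.

```python
import math

def lift(rho, v, s, c_l):
    """Lift in newtons: L = 0.5 * rho * V^2 * S * C_L."""
    return 0.5 * rho * v ** 2 * s * c_l

def drag(rho, v, s, c_d):
    """Drag in newtons: D = 0.5 * rho * V^2 * S * C_D."""
    return 0.5 * rho * v ** 2 * s * c_d

def level_flight_speed(weight, rho, s, c_l):
    """Airspeed at which lift equals weight (L = W) for a given C_L."""
    return math.sqrt(2.0 * weight / (rho * s * c_l))

# Hypothetical small fixed-wing UAV (illustrative values only).
RHO = 1.225        # air density at sea level, kg/m^3
WING_AREA = 0.5    # wing surface area, m^2
MASS = 3.0         # vehicle mass, kg
C_L, C_D = 0.6, 0.05

weight = MASS * 9.81
v_level = level_flight_speed(weight, RHO, WING_AREA, C_L)
# In steady level flight T = D, so the drag at v_level is the thrust required.
thrust_required = drag(RHO, v_level, WING_AREA, C_D)

print(f"level-flight airspeed: {v_level:.1f} m/s")
print(f"thrust required:       {thrust_required:.2f} N")
```

Because both forces share the factor ½ρV²S, the thrust required in steady level flight reduces to W·C_D/C_L, and doubling the airspeed at a fixed lift coefficient quadruples the lift.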
The flight dynamics of a fixed-wing UAV are described as those of a six-degree-of-freedom (6-DOF) body moving in 3D space. Translational motion includes movement along the longitudinal, lateral, and vertical axes. In addition, there are three rotational motions: rotation about the longitudinal axis is called roll (φ), rotation about the lateral axis is called pitch (θ), and rotation about the vertical axis is called yaw (ψ).
In terms of attitude control, a fixed-wing UAV is described as an under-actuated system: it cannot hover in place and must therefore maintain continuous forward motion from takeoff to landing. To change direction, it must perform a coordinated turn. Nevertheless, this type of UAV is highly efficient, typically offering significantly longer endurance, higher cruise speeds, and greater payload capacities than rotary-wing UAVs.
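The coordinated turn mentioned above can be made concrete with the standard textbook relations for a level, constant-speed turn. The text does not state these relations, so they are brought in here as an assumption: for bank angle φ, the load factor is n = 1/cos φ, the turn radius is R = V²/(g tan φ), and the turn rate is g tan φ / V. The sketch below only evaluates these formulas; the speed and bank angles are illustrative.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def load_factor(bank_deg):
    """Lift-to-weight ratio needed to hold altitude in a banked turn."""
    return 1.0 / math.cos(math.radians(bank_deg))

def turn_radius(speed, bank_deg):
    """Radius (m) of a level coordinated turn at the given airspeed (m/s)."""
    return speed ** 2 / (G * math.tan(math.radians(bank_deg)))

def turn_rate(speed, bank_deg):
    """Heading change rate (rad/s) in a level coordinated turn."""
    return G * math.tan(math.radians(bank_deg)) / speed

# Illustrative values: a UAV flying at 20 m/s at three bank angles.
for bank in (15.0, 30.0, 45.0):
    print(f"bank {bank:4.1f} deg: n = {load_factor(bank):.2f}, "
          f"R = {turn_radius(20.0, bank):6.1f} m, "
          f"rate = {math.degrees(turn_rate(20.0, bank)):.1f} deg/s")
```

Steeper bank angles tighten the turn but demand more lift, which ties the turn back to the stall limit: the wing must produce n times the weight without exceeding the stall angle.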

1.2. Hybrid VTOL UAVs

These vehicles are designed to take off and land vertically. They combine the long range and aerodynamic efficiency of fixed-wing aircraft with the hovering capability of rotary-wing aircraft, which eliminates the need for large takeoff and landing areas. Although this configuration offers optimized flight planning, it presents challenges due to its complex dynamics and uncertainties. Hybrid VTOL UAVs are designed for many non-military purposes, such as rescue operations, agricultural field operations, surveillance, geographic mapping, and power line inspection. A sample UAV of this type is shown in Figure 3, below. Amazon’s Prime Air delivery uses this type of UAV to deliver small packages. Although the technology used in these UAVs is still in its nascent stage, the autopilot does all the hard work of keeping the vehicle stable.
Physics and Dynamics of Hybrid-VTOL UAVs
The fundamental physics of a hybrid-VTOL UAV incorporates two entirely different aerodynamic states. The first is a hover (multirotor) mode that features low-speed flight and relies entirely on thrust generated by vertical-lift rotors. In this mode, the flight dynamics are governed by momentum theory and blade element theory, where aerodynamic lift is generated by the spinning rotor blades pushing air downwards. Hovering relies on generating thrust exactly equal to the aircraft’s weight, that is, W = T. The second is a cruise (fixed-wing) mode that features high-speed flight and relies on forward movement of the aircraft to generate aerodynamic lift across the wing surfaces. The equation for lift here is the same as equation (3), above.
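To give a feel for the hover condition W = T, the sketch below applies the classical momentum-theory result for an ideal rotor in hover, which the text names but does not write out: the induced velocity is v_i = sqrt(T / (2ρA)) and the ideal induced power is P = T·v_i, where A is the rotor disk area. This is an idealized lower bound that ignores profile drag and other losses, and the vehicle mass, rotor count, and rotor radius are hypothetical.

```python
import math

RHO = 1.225  # air density at sea level, kg/m^3
G = 9.81     # gravitational acceleration, m/s^2

def induced_velocity(thrust, disk_area, rho=RHO):
    """Ideal hover induced velocity (m/s) from momentum theory."""
    return math.sqrt(thrust / (2.0 * rho * disk_area))

def ideal_hover_power(thrust, disk_area, rho=RHO):
    """Ideal induced power (W) for one rotor producing `thrust` in hover."""
    return thrust * induced_velocity(thrust, disk_area, rho)

def hover_requirements(mass, n_rotors, rotor_radius):
    """Per-rotor thrust (N) and total ideal power (W) so that total T = W."""
    weight = mass * G
    thrust_per_rotor = weight / n_rotors
    disk_area = math.pi * rotor_radius ** 2
    total_power = n_rotors * ideal_hover_power(thrust_per_rotor, disk_area)
    return thrust_per_rotor, total_power

# Hypothetical hybrid VTOL: 5 kg, four lift rotors of 0.15 m radius.
thrust_each, power_total = hover_requirements(5.0, 4, 0.15)
print(f"thrust per rotor: {thrust_each:.2f} N")
print(f"ideal hover power: {power_total:.0f} W")
```

The ideal power scales with the inverse square root of disk area, which is one reason hovering on small lift rotors is costly and why these vehicles transition to wing-borne flight for cruise.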
In terms of attitude control, a hybrid-VTOL UAV is described by a 6-DOF dynamic model that couples translational and rotational movements. The translational dynamics include the external forces of lift, weight, aerodynamic drag, and thrust. The rotational dynamics are controlled by adjusting the pitch, roll, and yaw moments about the aircraft’s center of gravity. In hover, attitude is controlled through differential thrust or by tilting individual rotors. In cruise, attitude is controlled by traditional aerodynamic control surfaces, including ailerons, elevators, and rudders. In the multirotor mode, varying rotor speeds produce reaction torques that must be actively canceled out by counter-rotating propellers to prevent the UAV from spinning.

1.3. Single-Rotor UAVs

Similar to helicopters, these vehicles use one large main rotor, with longer blades and slower rotation, for lift and thrust, and a smaller rotor mounted on the tail to provide stability and yaw control. Although these vehicles are highly efficient and capable of hovering, they are mechanically complex and less stable than multi-rotor UAVs. Their advanced electronic controls manage blade pitch and tail rotor thrust for stable flight. They are best suited for specialized, demanding tasks, and they are more aerodynamically efficient for long-duration missions that require lifting heavy loads. They are used for a variety of tasks, including surveillance, land survey, and inspection. Figure 4, below, shows a sample single-rotor UAV.
Single-rotor UAVs have several advantages over quadcopter UAVs, including a higher payload capacity for embedded sensors, owing to the lower structural weight of a single actuator. They also offer greater versatility, since the platform can be custom designed to perform well in multiple environments. In addition, they can use different control schemes to ensure accurate trajectory tracking, a feature that is valuable for accurate sampling of pollution measurements.
Physics and Dynamics of Single-Rotor UAVs
The fundamental aerodynamic forces for a single-rotor UAV are the same as for all other aircraft: lift, weight, thrust, and drag. Lift is generated by the main rotor blades as they spin; the blades are shaped as airfoils, which creates a pressure difference between their upper and lower surfaces. Weight is the downward gravitational force acting on the UAV’s center of mass. Thrust is a directional force generated by tilting the main rotor’s thrust vector. Drag is the aerodynamic resistance of the fuselage and rotor blades moving through the air.
Conservation of angular momentum poses the most critical physical challenge for a single-rotor UAV. When the motor drives the main rotor in one direction, an equal and opposite reaction torque acts on the fuselage, tending to spin it in the opposite direction. To counter this, the UAV incorporates a vertically mounted tail rotor: the tail rotor’s blade pitch is adjusted to increase or decrease its thrust, either to exactly balance the main rotor’s torque or to intentionally rotate the nose of the aircraft left or right as needed.
In terms of attitude control, maneuvering this UAV requires mechanically manipulating the main rotor disk, which is achieved using a swashplate mechanism that provides pitch, roll, and collective pitch controls. In pitch control, tilting the rotor disk forward or backward redirects the total thrust vector, resulting in forward or backward flight. In roll control, tilting the rotor disk sideways generates a lateral force component that rolls the UAV left or right. In collective pitch control, changing the pitch angle of all main rotor blades simultaneously increases or decreases the overall lift across the entire rotor disk, resulting in a climb or descent without changing the rotational speed of the UAV’s motor.
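The torque balance between the main rotor and the tail rotor described above can be sketched with two elementary relations, stated here as simplifying assumptions: the reaction torque equals shaft power divided by rotor angular speed, and the tail rotor cancels it with a thrust equal to that torque divided by the tail arm length. All numerical values and function names are hypothetical.

```python
def main_rotor_torque(shaft_power, rotor_speed):
    """Reaction torque (N*m) from shaft power (W) and rotor speed (rad/s)."""
    return shaft_power / rotor_speed

def tail_rotor_thrust(torque, tail_arm, yaw_moment=0.0):
    """Tail thrust (N) that balances the main rotor torque.

    `yaw_moment` is an optional extra moment (N*m) commanded to turn the
    nose; zero means the fuselage heading is simply held steady.
    """
    return (torque + yaw_moment) / tail_arm

# Hypothetical single-rotor UAV: 1.5 kW at the shaft, rotor at 150 rad/s,
# tail rotor mounted 0.8 m behind the main rotor axis.
torque = main_rotor_torque(1500.0, 150.0)
hold = tail_rotor_thrust(torque, 0.8)
turn = tail_rotor_thrust(torque, 0.8, yaw_moment=2.0)
print(f"reaction torque: {torque:.1f} N*m")
print(f"tail thrust to hold heading: {hold:.2f} N")
print(f"tail thrust to yaw the nose: {turn:.2f} N")
```

A longer tail arm lowers the thrust the tail rotor must supply for the same torque, at the cost of a longer and heavier tail boom.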

1.4. Multi-Rotor UAVs

These vehicles have three or more rotors, as in quadcopters and octocopters, and use electric motors and fixed-pitch blades, which makes them mechanically simpler than single-rotor UAVs while providing similar vertical takeoff and landing capability. Flight control (pitch, roll, yaw, and altitude) is achieved by varying the speeds of the individual rotors. For instance, increasing the speed of one rotor raises that side of the vehicle, tilting it away from that rotor. These vehicles are used for photography, surveying, and delivery. A sample of these UAVs is shown in Figure 5, below. Although quadrotor UAVs have under-actuated dynamics, which make them vulnerable to external disturbances and system failures, they are well known for their versatility and agility and are used extensively across a wide range of areas.
Physics and Dynamics of Multi-Rotor UAVs
The core physics and aerodynamics of multi-rotor UAVs rely on the interaction between their rotating propellers and the surrounding air. Flight relies on precisely adjusting individual rotor speeds to generate varying amounts of aerodynamic thrust and torque. Three major effects are associated with the flight of multi-rotor UAVs. First is thrust: as a propeller spins, it accelerates air downwards, creating an upward reactive thrust force that is proportional to the square of the propeller’s angular velocity ($\Omega$). This relationship is expressed by
$$T = k_T \, \Omega^2 \tag{5}$$
Here, $k_T$ is the thrust coefficient. Second is drag torque: spinning a propeller through the air produces an opposing aerodynamic drag torque that pushes back against the motor and tends to rotate the UAV body (yaw) in the direction opposite to the propeller's spin. Third is the gyroscopic effect of the rotors spinning at high speed: if the UAV is pitching forward, for example, a quick yaw movement will result in an unwanted roll, which the flight controller must constantly compensate for.
The dynamics of a multi-rotor UAV are governed by the Newton-Euler equations of motion. A standard quadcopter has four motors positioned at equal distance from the center of gravity. For vertical motion, all four rotors increase or decrease their speed simultaneously. When the total thrust exceeds the force of gravity, the UAV accelerates upward. When thrust exactly equals gravity, the UAV hovers. In this case,
$$T_{\text{Total}} = m g \tag{6}$$
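Combining the thrust law T = k_T Ω² with the hover condition gives the rotor speed needed to hover: with n identical rotors sharing the weight equally, Ω = sqrt(m g / (n k_T)). The sketch below evaluates this; the thrust coefficient and mass are hypothetical values, and equal load sharing between rotors is an assumption.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def rotor_thrust(k_t, omega):
    """Thrust (N) of one rotor spinning at omega (rad/s): T = k_T * omega^2."""
    return k_t * omega ** 2

def hover_rotor_speed(mass, k_t, n_rotors=4):
    """Rotor speed (rad/s) at which n equal rotors together support the weight."""
    return math.sqrt(mass * G / (n_rotors * k_t))

# Hypothetical quadcopter: 1.2 kg, thrust coefficient 1e-5 N*s^2/rad^2.
K_T = 1.0e-5
MASS = 1.2
omega_hover = hover_rotor_speed(MASS, K_T)
print(f"hover rotor speed: {omega_hover:.0f} rad/s "
      f"({omega_hover * 60 / (2 * math.pi):.0f} rpm)")

# Because thrust grows with the square of speed, a 10 % speed increase
# yields about 21 % more thrust, which makes the vehicle climb.
ratio = rotor_thrust(K_T, 1.1 * omega_hover) / rotor_thrust(K_T, omega_hover)
print(f"thrust ratio at +10 % speed: {ratio:.2f}")
```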
For pitching motion, the front and rear motors adjust speeds relative to each other; for instance, the front motors slow down and the rear motors speed up. For rolling motion, the left and right motors adjust speeds relative to each other. Varying the speed of opposing motors creates a net torque that tilts the UAV. Once the UAV is tilted, the total thrust vector no longer points purely upward; a component of the thrust points in the horizontal direction and accelerates the UAV forward, backward, or sideways.
For yaw, the two motors of each diagonal pair spin in the same direction, opposite to that of the other diagonal pair, and yaw is commanded by adjusting the speed of one pair relative to the other. To hover without spinning, the sum of clockwise torques must exactly cancel the sum of counter-clockwise torques. To yaw about the vertical axis, this torque balance must be broken, which is done by speeding up one pair of diagonally opposed motors while slowing down the other pair.
In terms of attitude control, the autopilot follows a control allocation scheme that consists of high-level commands, control allocation, and input-output. The flight computer determines the desired torques ($\tau_x$, $\tau_y$, $\tau_z$), together with the total thrust required for the next maneuver. The control allocation step solves a matrix equation that tells each individual motor exactly how much speed to add or subtract to produce the desired torques. The result is then conveyed to the electronic speed controllers, which modulate the power delivered to the brushless DC motors.
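The control allocation step can be sketched for a quadcopter in a "plus" layout, where the motors sit at the front, right, rear, and left. The sign conventions, the motor numbering, and the parameter names below are assumptions made for this illustration; real autopilots differ in layout and conventions. With per-motor thrusts f1..f4, arm length l, and torque-to-thrust ratio c, the assumed model is T = f1+f2+f3+f4, τx = l(f4−f2), τy = l(f1−f3), τz = c(f1−f2+f3−f4), and the allocation is the closed-form inverse of that 4×4 system.

```python
import math

def allocate(total_thrust, tau_x, tau_y, tau_z, arm, c):
    """Per-motor thrusts (front, right, rear, left) for the desired commands.

    Closed-form inverse of the assumed plus-layout mixing model.
    """
    base = total_thrust / 4.0
    f_front = base + tau_y / (2.0 * arm) + tau_z / (4.0 * c)
    f_right = base - tau_x / (2.0 * arm) - tau_z / (4.0 * c)
    f_rear = base - tau_y / (2.0 * arm) + tau_z / (4.0 * c)
    f_left = base + tau_x / (2.0 * arm) - tau_z / (4.0 * c)
    return f_front, f_right, f_rear, f_left

def mix(f_front, f_right, f_rear, f_left, arm, c):
    """Forward model: total thrust and body torques from motor thrusts."""
    total = f_front + f_right + f_rear + f_left
    tau_x = arm * (f_left - f_right)                    # roll
    tau_y = arm * (f_front - f_rear)                    # pitch
    tau_z = c * (f_front - f_right + f_rear - f_left)   # yaw
    return total, tau_x, tau_y, tau_z

def rotor_speeds(thrusts, k_t):
    """Rotor speeds (rad/s) from thrusts via T = k_T * omega^2, clamped at zero."""
    return [math.sqrt(max(f, 0.0) / k_t) for f in thrusts]

# Hypothetical airframe: 0.2 m arms, c = 0.02 m, k_T = 1e-5 N*s^2/rad^2.
ARM, C, K_T = 0.2, 0.02, 1.0e-5
command = (12.0, 0.1, -0.05, 0.02)  # total thrust (N), tau_x, tau_y, tau_z (N*m)
thrusts = allocate(*command, ARM, C)
print("motor thrusts (N):", [round(f, 3) for f in thrusts])
print("rotor speeds (rad/s):", [round(w) for w in rotor_speeds(thrusts, K_T)])
```

Feeding the allocated thrusts back through `mix` reproduces the commanded thrust and torques, which is a quick sanity check that the allocation really is the inverse of the assumed model.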
UAV Use of Artificial Intelligence
Unmanned aerial vehicles are operated either remotely by ground centers or autonomously by machine learning software. In both cases, the data link portion of the UAV communication system is the most vulnerable part of the control system, as a bad actor can attack it covertly, by stealing onboard data, or overtly, by overriding the control system [5]. For unmanned aerial vehicles to become more efficient and autonomous and to perform complex tasks with minimal human intervention, they are equipped with software programs that control their flight. These algorithms are collectively called machine learning, a subset of artificial intelligence; they allow the vehicle to learn from datasets, adapt to different dynamic environments, and make intelligent decisions in real time. The mechanism that translates knowledge into intelligent navigation and control enables a UAV to process the information received from its sensors, including radar, cameras, and LiDAR, in order to detect and avoid obstacles, navigate complex environments such as dense forests and urban areas, and plan optimal routes, achieving autonomous flight control without human intervention.
In addition, when trained on large datasets of images and sensor data, these machine learning algorithms enable UAVs to search for, identify, classify, and track specific objects, people, or even animals. This type of application is crucial in search, surveillance, and rescue, in the inspection of infrastructure (for instance, when faults occur in power systems), and even in wildlife monitoring. These algorithms help UAVs to optimize their flight paths depending on factors such as speed, range of operation, and battery efficiency, under environmental conditions including wind and restricted zones. By analyzing flight data and sensor inputs, UAVs can predict when a certain component is likely to fail, an important step toward proactive maintenance that reduces operational downtime.
Figure 6. RQ-4 Global Hawk. Credit: U.S. Air Force.
As machine learning algorithms become more dominant in unmanned aerial vehicle applications, dependence on radar observations and their corresponding datasets is growing, which makes the process resource-intensive and limited by the available radar configurations. This motivates the search for empirical methods and model-based algorithms that offer more efficient alternatives to large datasets.
Figure 7. Quadcopter drone in action with a DJI Mavic 3 Pro camera. Credit: DJI.
Instead of being explicitly programmed to solve a particular problem using the laws of physics, a machine learning system learns the solution from datasets containing solutions to that problem and improves with experience. A machine learning algorithm is trained on large datasets, analyzes the patterns and correlations in them, and makes the best decision or prediction it can. In general, the larger and more representative the datasets the algorithm is trained on, the more accurate its predictions tend to be. Machine learning is broadly categorized, based on its algorithms, into three major categories: supervised learning, unsupervised learning, and reinforcement learning, as shown in Figure 8, below, where each category is associated with a certain set of algorithms [6].

2.1. Supervised Learning

Supervised learning trains a model on labeled data so that it maps input data to the correct output and predicts the output for new, unseen data based on the patterns it modeled during training. There are several types of supervised learning algorithms, as listed in Figure 8.
Linear Regression (LR): trained on a labeled dataset and based on a linear relationship between input variables and a target variable, it predicts a continuous output value. The best-fit line is obtained by minimizing the difference between the predicted values and the target values. This is achieved by defining a cost function and minimizing it using gradient descent or least squares [7,8].
Decision Tree (TR): constructs a tree-like model of decisions in the form of a series of if-else rules. This algorithm starts with a root node representing the whole dataset, then recursively splits the data into subsets based on the most informative input. The final nodes represent the predicted class label [7,8].
Random Forest (RF): constructs a multitude of decision trees during training and, compared with a single decision tree, significantly reduces overfitting and improves the accuracy and generalization ability of the model. During training, each individual decision tree in the forest is trained independently using its own bootstrap sample and random subset of inputs. During prediction, each tree in the forest predicts a class and the final prediction is determined by a majority vote among all the trees [7,8].
Support Vector Machines (SVM): separates data points into different classes or predicts a continuous value over an established optimal hyperplane. The goal is to establish a hyperplane with the largest possible margin from the support vectors, the nearest data points of any class [7,8].
K-Nearest Neighbors (KNN): classifies a data point based on the majority class of its ‘K’ nearest neighbors in the input space. It is a lazy learning nonparametric algorithm that stores the entire training dataset and the corresponding input in memory. It can infer a predicted value by identifying the closest labeled examples to a new data point [7,8].
Naïve Bayes (NB): based on Bayes’ theorem, this probabilistic classifier assumes no dependencies between input variables. It calculates the probability of each input variable within each class and the prior probability of each class; then, for a new data point, it calculates the posterior probability of each class and assigns the point to the class with the highest probability [7,8].
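Two of the algorithms listed above are small enough to sketch in plain Python: linear regression fitted by gradient descent on a mean-squared-error cost, and a k-nearest-neighbors classifier that takes a majority vote. This is a minimal illustration on made-up toy data, not the implementation used in any of the surveyed studies; the learning rate, epoch count, and data points are arbitrary choices.

```python
import math
from collections import Counter

def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def knn_predict(train, query, k=3):
    """Label of `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy regression data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"fitted line: y = {w:.3f} x + {b:.3f}")

# Toy classification data: two well-separated groups of labeled points.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
print("prediction near the first group:", knn_predict(train, (1.1, 1.0)))
print("prediction near the second group:", knn_predict(train, (5.0, 5.2)))
```

The contrast between the two is the point: linear regression compresses the training data into two parameters, whereas k-nearest neighbors keeps every training example and defers all work to prediction time, which is why it is called a lazy learner.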

2.2. Unsupervised Learning

Unsupervised learning deals with unlabeled data and aims to find patterns and relationships within the data without prior knowledge of the output. There are three major types of unsupervised learning algorithms, as listed in Figure 8.
K-Means Clustering (KMC): partitions the unlabeled data into ‘K’ clusters based on similarity, aiming to minimize the distance between data points and their cluster’s centroid. Each data point belongs to the cluster with the nearest mean. The algorithm first assigns data points to the nearest centroid, then recalculates the centroid’s position as the mean of all points assigned to it. It operates iteratively until the cluster assignments stabilize. It identifies the clusters without any pre-existing labels in the data and hence it is called unsupervised [7,8].
Hierarchical Clustering (HC): constructs a hierarchy of clusters as a tree, either by starting with individual data points and merging them, or by starting with one large cluster and splitting it into smaller ones. This algorithm does not require a predetermined number of clusters and consists of five processes: initialization, merging, distance calculations, iteration, and dendrogram visualization [7,8].
Principal Component Analysis (PCA): reduces dimensionality by transforming the data into a new set of uncorrelated variables, called principal components, that capture the most variance in the data. This algorithm does not use a target variable; instead, it identifies patterns in high-dimensional datasets by simplifying them while retaining most of the important information [7,8].
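The iterative assign-then-update loop of k-means clustering described above can be sketched in a few lines of plain Python. To keep the example deterministic, the initial centroids are simply the first k data points; that initialization is an assumption of this sketch (practical implementations usually use randomized schemes), and the data points are made up.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster `points` (tuples) into k groups; returns (centroids, assignments)."""
    centroids = [points[i] for i in range(k)]  # assumed deterministic init
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments have stabilized
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments

# Toy data: two visibly separated groups, with no labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print("assignments:", assignments)
print("centroids:", [tuple(round(v, 2) for v in c) for c in centroids])
```

On this data the first pass assigns the far group to the second centroid even though both initial centroids sit in the near group; the update step then pulls that centroid across, and the loop stops once the assignments no longer change.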

2.3. Reinforcement Learning

Reinforcement learning deals with an agent that interacts with an environment to learn, through trial and error, the optimal action to take; the agent receives either a reward or a penalty for each action. There are four major types of reinforcement learning algorithms, as listed in Figure 8.
Q-Learning (QL): an off-policy, value-based, model-free reinforcement learning algorithm. It trains an agent to make optimal decisions by learning the quality value (Q-value) of taking a specific action in a given environment state. This algorithm consists of the following processes: Q-table initialization, exploration and exploitation, action selection, reward, Q-value update, convergence, and optimal policy derivation. The Q-values are updated using the Bellman equation, in which the immediate reward and the maximum expected future reward are combined using a discount factor ($\gamma$) and a learning rate ($\alpha$) [7].
$$Q(S, A) = Q(S, A) + \alpha \left[ R + \gamma \max_{A'} Q(S', A') - Q(S, A) \right] \tag{7}$$
where $S$ represents the current state, $A$ an action, $S'$ the next state, $A'$ the next action, and $R$ the reward.
State-Action-Reward-State-Action (SARSA): similar to QL, it is a model-free temporal-difference algorithm, but it is on-policy: it learns by estimating the action-value function of the policy it is currently following. To update its action-value function Q(S, A) and improve the policy, this algorithm uses the sequence of events that gives it its name: state, action, reward, next state, and next action, followed by an update, repeated in a loop [7].
Actor-Critic (AC): combines both policy-based and value-based algorithms and consists of two components: an actor that learns the policy, and a critic that learns the value function, which guides the actor’s learning [7].
Deep Q-Network (DQN): combines Q-learning with a deep neural network to enable agents to learn optimal actions in environments with large state spaces, such as video games and robotics. Q-learning traditionally uses a table to store Q-values, but DQN replaces this table with a neural network that approximates the Q-value function, which allows it to handle high-dimensional inputs and generalize across states [7].
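The tabular Q-learning update, and how the SARSA update differs from it, can be sketched as follows. The environment is a hypothetical five-cell corridor invented for this illustration: the agent moves left or right and receives a reward of 1 on reaching the last cell. To keep the example deterministic, the table is trained by sweeping over every state-action pair rather than by ε-greedy exploration; this substitution is valid for Q-learning only because it is off-policy, and it is not how an agent would normally gather experience.

```python
from collections import defaultdict

ACTIONS = (-1, +1)   # move left, move right
GOAL = 4             # terminal cell of the hypothetical corridor 0..4

def step(state, action):
    """Deterministic corridor dynamics: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])

def greedy_action(q, s):
    """Action with the highest Q-value in state s."""
    return max(ACTIONS, key=lambda a: q[(s, a)])

# Train by repeated sweeps over all non-terminal state-action pairs.
q_table = defaultdict(float)   # Q-values start at zero; terminal values stay zero
for _ in range(200):
    for s in range(GOAL):
        for a in ACTIONS:
            s_next, r = step(s, a)
            q_learning_update(q_table, s, a, r, s_next, alpha=0.5, gamma=0.9)

policy = [greedy_action(q_table, s) for s in range(GOAL)]
print("greedy policy (should always move right):", policy)
print("Q-values for moving right:",
      [round(q_table[(s, +1)], 3) for s in range(GOAL)])
```

The learned values for moving right decay by the discount factor with each extra step from the goal, which is how the table encodes distance to the reward. The only difference in `sarsa_update` is that the `max` over next actions is replaced by the value of the action the agent actually takes next.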

1.5. Learning Model Structures

Although it is not a separate category, deep learning is distinguished by its multi-layered neural networks, which automatically learn complex features from datasets, whereas other machine learning methods often require manual feature extraction. That is, deep learning can learn directly from unstructured raw data, such as images and text, by constructing a hierarchy of features, while other machine learning methods need a human operator to identify and engineer the features of the dataset. Deep learning can be categorized by its neural network architecture, that is, the structure and organization of the network [7,8,9].
Neural networks (NN) are a primary type of learning-based model structure, applicable to all machine learning paradigms: supervised learning, unsupervised learning, reinforcement learning, and other learning methods, such as rule-based genetic fuzzy learning. The basic function of neural networks is to learn patterns and identify relationships in datasets. Neural networks consist of input and output layers, nodes or neurons, connections, and learning algorithms, the mechanism by which the network models data to make predictions. The components of neural networks are illustrated in Figure 9, below. The most commonly used neural network algorithms are listed below, with a brief description of each [9].
Convolutional Neural Networks (CNN): specifically designed for processing data with a grid-like structure, and used in image and video analysis. A CNN automatically and hierarchically extracts features from data, starting with simple patterns such as edges and progressing to more complex shapes, through a convolution process in which filters scan across the input to detect image features. This process produces a feature map that highlights where each filter detected its specific feature in the image [9].
Deep Neural Networks (DNN): used in image recognition, natural language processing, and autonomous systems. Deep neural networks are distinguished by their use of multiple hidden layers, where the number of hidden layers must be two or greater and can reach hundreds. Using multiple hidden layers allows the networks to learn complex patterns directly from large datasets without requiring explicit feature engineering. These networks learn by adjusting the weights of the connections between the layers’ neurons as they train on data [9].
Feedforward Neural Networks (FNN): used for image recognition, pattern recognition, regression analysis, and other tasks in which each input can be processed in a single forward pass. The information flows in one direction only, from the input layer through the hidden layers to the output layer, without looping or iteration. This type of neural network has no memory of previous inputs, because it does not retain information from one input to the next [9].
Graph Neural Networks (GNN): best for relational data, achieved by processing graph-structured data such as social networks and chemistry. It learns by passing data between connected nodes. The core operation in GNN involves nodes sending messages of information to their neighbors [9].
Invertible Neural Networks (INN): designed to be reversible, that is, the forward mapping from input to output can be perfectly inverted, such that the input can be obtained from the output. This is a unique property which allows a range of applications and it is achieved through a set of specific constraints. In inverse problem, it is required to determine the causes of a measured outcome: the outcome or output is known and the causes or input data are unknown [9].
Physics-Guided Neural Networks (PGNN): a powerful framework for solving complex science and engineering problems with limited data. This hybrid modeling approach integrates machine learning with physical laws expressed as mathematical equations, producing more accurate and physically consistent predictions. It is used in fields where systems governed by physical laws are difficult to model accurately [9].
Physics-Informed Neural Networks (PINN): a valuable algorithm with a wide range of engineering and science applications. It can be trained to predict the evolution of dynamical systems governed by ordinary or partial differential equations. Even with limited or noisy data, the governing equations guide the model toward physically consistent and interpretable results [9].
Recurrent Neural Networks (RNN): used for processing sequential data such as text or speech. An RNN has internal memory that retains information about previous inputs, which it achieves by looping through the sequence one step at a time. This feature makes RNNs a good choice for natural language processing, language modeling, speech recognition, image captioning, video analysis, and time-series forecasting [9].
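The convolution step described in the CNN entry above can be illustrated with a minimal NumPy sketch. The 5×5 image and the vertical-edge filter are hypothetical values chosen for illustration, not data from the surveyed studies, and the function name `conv2d_valid` is our own. As in most deep learning libraries, the operation shown is technically a cross-correlation: the filter is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the filter's response to one image patch.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

# Hypothetical 5x5 image: dark columns (0) on the left, bright columns (1) on the right.
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hypothetical vertical-edge filter: responds where intensity rises from left to right.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

In this sketch the feature map is zero where the filter covers only the uniform dark region and positive where it overlaps the dark-to-bright edge. In a trained CNN the filter values are learned from data rather than set by hand.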

3. Unmanned Aerial Vehicles and Automation

Each research paper selected for this survey used one type of machine learning methodology. In this section, each paper is analyzed according to the following criteria. Which UAV category does the paper fall in: single-rotor, multi-rotor, hybrid VTOL, or fixed-wing? What did the authors achieve using machine learning methodologies? The following statistical information is then extracted: which machine learning methodology the authors used, in what period (month and year) the research was completed, and in what geographic region the research was performed. The analyses below are summarized per UAV category.
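As a rough illustration of how such per-category statistics can be tallied, the following Python sketch counts papers by UAV category and learning type. The record layout and variable names are our own assumptions. The nine records are a small subset taken from the summaries in this section, so the resulting counts illustrate the bookkeeping only and are not the survey's totals.

```python
from collections import Counter

# (reference number, UAV category, learning type), as stated in the summaries
# of Sections 3.1-3.3. This is an illustrative subset, not the full dataset.
records = [
    (10, "fixed-wing", "reinforcement"),
    (11, "fixed-wing", "supervised"),
    (12, "fixed-wing", "reinforcement"),
    (17, "fixed-wing", "supervised"),
    (21, "fixed-wing", "unsupervised"),
    (86, "hybrid VTOL", "reinforcement"),
    (87, "hybrid VTOL", "supervised"),
    (97, "multi-rotor", "reinforcement"),
    (99, "multi-rotor", "supervised"),
]

# Count papers per (category, learning type) pair and per category.
counts = Counter((category, learning) for _, category, learning in records)
per_category = Counter(category for _, category, _ in records)

for (category, learning), n in sorted(counts.items()):
    print(f"{category:12s} {learning:14s} {n}")
```

Publication period and geographic region could be added as further fields in each record and tallied in the same way.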

3.1. Fixed-Wing

C. Yan et al. [10], investigated the leader–follower flocking problem of fixed-wing UAVs using an algorithm that generates roll angle and velocity commands by training an end-to-end controller in continuous state and action spaces, based on the continuous actor–critic learning automaton of reinforcement learning. G. De Luca et al. [11], investigated the reliability of free and open-source algorithms used in the geographical object-based image classification of very high resolution imagery surveyed by unmanned aerial vehicles using supervised learning. H. Khanzada et al. [12], presented a comparative analysis of reinforcement learning algorithms in flight control systems used in fixed-wing UAVs. The authors compared deep deterministic policy gradient, twin delayed deep deterministic policy gradient, proximal policy optimization, trust region policy optimization, and soft actor-critic to determine their suitability for complex UAV dynamics, using reinforcement learning. J. Li et al. [13], proposed a control system integrated with the conventional proportional–integral–derivative guidance law to facilitate the autonomous landing of fixed-wing UAVs and the automated tuning of parameters through the use of the deep Q-network of reinforcement learning. C. Tang et al. [14], studied the reward shaping used during training and the effects of hyper-parameters and different neural network topologies on the training of fixed-wing UAV landing control, using deep deterministic policy gradient of reinforcement learning. Y. Zhao et al. [15], studied the problem of collision avoidance for a variable number of fixed-wing UAVs in a limited airspace, formulated a set of flight scenarios using multi-agent Markov game theory, and established a self-learning framework by using the actor-critic algorithm of reinforcement learning.
L. Lv et al. [16], developed a hybrid sparrow search optimization framework for navigation of multiple fixed-wing unmanned aerial vehicles in complex topographies, with a two-level architecture: at the lower level, a purpose-built operator suite essential for mountain environments, and at the higher level, a learning agent that adapts the search strategy to the terrain complexity, based on reinforcement learning. A. Cui et al. [17], investigated the fault diagnosis and health management of fixed-wing UAVs and proposed a method comprising five algorithms: flight data generation, sample training prediction based on the long short-term memory network, gray prediction, combined prediction, and health calculation and management, all based on deep supervised learning. A. Sezgin et al. [18], proposed a regression-based energy-aware drone selection technique to forecast energy consumption based on drone type, payload, and mission distance. The authors compared decision tree, random forest, and linear regression for hyper-parameter optimization of supervised learning. F. Giral et al. [19], studied the guidance, navigation, and control systems of aerial vehicles, specifically focusing on motion planning for fixed-wing UAVs, and presented two key applications, waypoint tracking and dynamic target interception, based on reinforcement learning. N. Musavi et al. [20], proposed a realistic modeling framework to manage the interaction between manned and fully autonomous unmanned aircraft systems, equipped with sense-and-avoid algorithms, in the national airspace system by utilizing game theory and reinforcement learning.
N. Santos [21], proposed a hybrid artificial neural network model incorporating a self-organizing map to identify feature points representing a cluster obtained from a binary image containing the UAV, in order to estimate the actual UAV pose for the landing maneuver from a single frame; the network structure was trained using a synthetic dataset, based on unsupervised learning. A. Guerra-Langan et al. [22], presented a 3-DOF longitudinal flight dynamics and control simulation model of a small fixed-wing UAV to control the airspeed of the UAV using the readings from the sensing array distributed on the wing and to examine the sensor layout and its effect on the performance of the controller, using supervised learning. E. Bohn et al. [23], studied how to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as 3 min of flight data, based on reinforcement learning. K. Borup et al. [24], presented a method for estimating the air data parameters for a small fixed-wing UAV using an arrangement of low-cost micro-electromechanical systems-based pressure sensors embedded in the surface of the UAV, by implementing the linear regression of supervised learning. A. Pasha et al. [25], proposed an algorithm, trained and evaluated to overcome the problem of inaccurate attitude-angle estimates and loss of control in fixed-wing UAVs, that predicts missing data from a malfunctioning sensor using the data available from the other functioning sensors, using unsupervised learning.
J. Jiang et al. [26], used a multispectral camera mounted on a fixed-wing unmanned aerial vehicle to acquire canopy spectral data of winter wheat at the jointing and booting stages, while agronomic indicators of plant dry matter, plant nitrogen accumulation and nitrogen nutrition index, as well as agrometeorological and field management data, were measured synchronously to establish a spatially and temporally explicit model for the diagnosis of winter wheat nitrogen status on small-scale farms. The authors characterized the relationships between agronomic variables and UAV remote sensing by using and comparing four methods of supervised learning. M. Bronz et al. [27], studied the feasibility of real-time fault prediction in a real-flight fixed-wing UAV experiencing noisy measurements, communication limitations, and a wrapped wing structure that breaks the geometric symmetry, using supervised learning. Z. Yu et al. [28], investigated the fault-tolerant formation control problem for networked fixed-wing unmanned aerial vehicles against faults, using the actor-critic neural networks of reinforcement learning. X. Zhao et al. [29], developed a framework enhanced with curriculum learning, employing progressive task staging and reward optimization to accelerate convergence and improve policy stability, for fixed-wing UAVs that jointly integrates encirclement and obstacle avoidance, using hybrid deep reinforcement learning. J. Tang et al. [30], proposed a method for the trajectory tracking control of a fixed-wing unmanned aerial vehicle based on the deep deterministic policy gradient, integrated with a trajectory tracking controller whose functions range from the state of flight of the UAV to the rudder control, in a simulation trained with reinforcement learning.
D. Xu et al. [31], investigated a low-cost UAV formation system consisting of one leader equipped with an intelligence chip and five followers without intelligence chips, and proposed a centralized collision-free formation-keeping method for fixed-wing unmanned aerial vehicles using proximal policy optimization of reinforcement learning. X. Zhuang et al. [32], proposed a penetration relative motion theory and an aircraft control method based on the 6-DOF fixed-wing UAV model to reduce response errors between control decisions and actual flight control, and designed, based on a Markov decision process, a UAV penetration decision control method that implements autonomous decision-making and control for UAV penetration by integrating interceptor trajectory prediction with proximal policy optimization of reinforcement learning. X. Yuan et al. [33], designed a framework for radio surveillance, where a fixed-wing UAV is employed to acquire the radio fingerprint of a suspicious transmitter using a twin delayed deep deterministic policy gradient model to allow the UAV to learn its trajectory based on its observed transmit rate of the suspicious transmitter, in a simulation trained by deep reinforcement learning. H. Liu et al. [34], studied the optimal formation control of networked fixed-wing unmanned aerial vehicles under communication uncertainties and external disturbances, trained via off-policy reinforcement learning. X. Zhao et al. [35], developed a framework for optimizing the flight control strategies of fixed-wing unmanned aerial vehicles, integrated with proximal policy optimization, using deep reinforcement learning.
Y. Yuan et al. [36], studied the ability of fixed-wing UAVs to autonomously generate evasive maneuvers and designed a hierarchical goal-guided learning method that combines traditional off-policy deep learning algorithms and endows the agent with the ability to evade a series of air-to-air missiles, using reinforcement learning. M. Hayat et al. [37], proposed a Bayesian method to perform rice panicle segmentation with optical images taken by unmanned aerial vehicles over paddy fields, based on unsupervised learning. M. Tan et al. [38], proposed a hierarchical collaborative pursuit–evasion game framework based on target allocation and comprising three layers, a target allocation layer, a maneuver decision-making layer, and a flight controller layer, to ensure stable flight for multi-fixed-wing UAVs, using gradient-assisted reinforcement learning. L. Li et al. [39], presented an analysis of the current state of model-free adaptive decision-making with complex nonlinear constraints and of maneuverability in high-dynamic environments for fixed-wing UAV control using reinforcement learning. Y. Guo et al. [40], proposed a multi-agent fixed-wing UAV flocking algorithm named Cucker–Smale flocking to learn collision-free leader–follower flocking by utilizing the information among followers with the attention mechanism and a special reward function, to alleviate the problems of sparse reward and policy convergence, trained by reinforcement learning.
W. Hu et al. [41], presented a control scheme employing a nonlinear disturbance observer within a dynamic inverse framework for a fixed-wing unmanned aerial vehicle, developed based on reinforcement learning. M. Khan et al. [42], presented learning-empowered solutions to improve the multi-class UAV classification performance under challenging weather conditions using single-shot object detection algorithms for single-rotor, fixed-wing and multi-rotor UAVs, based on deep supervised learning. H. Peng et al. [43], proposed a method to predict the future locations of the surrounding flying objects and classify them into different groups with similar levels of maneuverability, such as rotary and fixed-wing UAVs, without prior knowledge regarding these classes, using generative unsupervised learning. M. Rostami et al. [44], studied endurance and energy efficiency of fixed-wing electric unmanned aerial vehicles by utilizing the Harris Hawk optimization algorithm to regulate the UAV fuel consumption between the fuel cell and the battery, using fuzzy logic-based programming and multi-factor reinforcement learning. S. Wang et al. [45], proposed a UAV fault detection algorithm based on contrastive learning and spatial–temporal information of multivariate flight data by designing a series of specific sample transformations; the feature distribution of normal data was modeled by comparing the similarity of differently transformed samples using self-supervised learning.
A. Fotouhi et al. [46], proposed a two-hop communication model between an end-user and a macro-cell through a UAV base station to optimize the flight trajectory of a fixed-wing UAV using deep Q-learning of reinforcement learning. J. Valasek et al. [47], investigated machine learning visual tracking of stationary and moving ground targets by fixed-wing unmanned air systems with non-gimbaling, fixed pan-and-tilt cameras and developed an algorithm to determine an offline control policy for vehicle orientation and flight path such that a target can be tracked in the image frame of the camera without the need for operator input, trained by reinforcement learning. Y. Lin et al. [48], proposed a method that integrated automatic curriculum learning with multi-agent proximal policy optimization to simultaneously train both pursuers and evaders involving fixed-wing UAVs, enabling dynamic and adaptive encirclement strategies in autonomous decision-making using reinforcement learning. M. Zhang et al. [49], investigated the problem of prescribed-time optimal formation control with guidance trajectories for all fixed-wing unmanned aerial vehicles subject to denial-of-service attacks, and the optimal controller was designed by combining the back-stepping technique with reinforcement learning. S. Hu et al. [50], developed an online control scheme for visual-based UAV-on-UAV tracking and monitoring, where a solar-powered, fixed-wing UAV tracks a suspicious UAV target by keeping the target inside its effective visual range, using a model based on deep deterministic policy gradient trained by deep reinforcement learning.
Y. Xue et al. [51], proposed a low-cost decision-making method for fixed-wing UAV swarm collaboration based on a kinematic model and a target spatial correlation model, to reduce the control error of algorithmic decision-making results and flight guidance law, and to improve detection, using deep reinforcement learning. M. Chowdhury et al. [52], developed a model-free interchangeable flight controller for fixed-wing unmanned aircraft systems by incorporating memory functions into the policy using long short-term memory, trained in simulation using proximal policy optimization of deep reinforcement learning. J. Mei et al. [53], studied how to maximize the data collection of a fixed-wing unmanned aerial vehicle under insufficient battery energy while making a safe return, by adopting a twin delayed deep deterministic policy gradient algorithm with three designed reward functions, using reinforcement learning. A. Din et al. [54], presented a control architecture for maximizing the glide range of a fixed-wing UAV with an unconventional design, utilizing optimal dynamic programming in continuous state and control space domains, coupled with nonlinear simulations based on reinforcement learning. Y. Ou et al. [55], developed a hierarchical obstacle avoidance strategy that combined a high-level navigator with a low-level controller to guide and control a fixed-wing unmanned aerial vehicle to avoid a potential collision with a noncooperative dynamic obstacle, based on reinforcement learning.
X. Zhuang et al. [56], proposed a decoupled guidance and control theory for autonomous fixed-wing aircraft maneuvering, distinguishing between close and long-range engagements using a Markov decision process with hybrid discrete and continuous action reinforcement learning. K. Haughn et al. [57], developed a state inference to efficiently alleviate gusts on a smart material camber-morphing wing of uncrewed aerial vehicles by relying on extensive sensing networks and using deep reinforcement learning. M. Raoufi et al. [58], developed a unified path planning and control framework for uncrewed aerial vehicles operating in dynamic wildfire environments using deep deterministic policy gradient for tracking fire evolution through an architecture comprising high-level planning and low-level control components, based on reinforcement learning. C. Wang et al. [59], developed an approach to solve the leader-follower flocking problem in continuous state and action spaces, and proposed an algorithm that uses multilayer perceptrons to represent both the actor and the critic, with a structure based on the continuous actor-critic learning automaton of reinforcement learning. W. Zhao et al. [60], investigated the problem of a non-stationary environment caused by the change of learning agent strategy in a multi-agent environment and presented an improved fixed-wing multi-UAV control algorithm using proximal policy optimization with centralized learning and decentralized execution, based on reinforcement learning.
B. Zhang et al. [61], proposed a collision-free policy for 3D fixed-wing UAV swarms based on scale–scalable combined with a reward function to achieve a balance between maintaining tight swarm formations and avoiding internal collisions, using multi-agent reinforcement learning. M. Qian et al. [62], studied a fault-tolerant tracking control scheme of multiple unmanned aerial vehicles under denial-of-service attacks, using neural networks combined with a long-term performance index of fixed-wing UAVs constructed to tackle the adverse impact of bias faults and external disturbances, based on reinforcement learning. W. Yang et al. [63], developed an optimal formation control scheme integrated within an actor-critic framework for fixed-wing multi-unmanned aerial vehicle systems subject to uncertainties and external disturbances, bypassing the limitations of reinforcement learning. T. Ao et al. [64], developed a leader-follower strategy for heterogeneous multi-UAVs such that fixed-wing UAVs serve as leaders to provide dynamic relay communication coverage, while multi-rotor UAVs act as followers for multi-target search tasks, to optimize flight trajectories separately for fixed-wing and multi-rotor UAVs by using multi-agent reinforcement learning. J. Han et al. [65], proposed a collision prediction and avoidance method for fixed-wing unmanned aerial vehicles to identify the neighboring UAVs that pose a significant threat; a trajectory tracking macro action was incorporated into the action space, guided by a reward function that rewards staying close to the preset flight paths, and the model was trained in simulation scenarios with updates implemented using a soft actor-critic of deep reinforcement learning.
B. Meng et al. [66], proposed a fixed-time distributed optimized formation control scheme to improve the steady-state and transient performances of fixed-wing UAV systems with uncertainties, communication link and actuator faults, and performance constraints, based on reinforcement learning. F. Xie et al. [67], presented a hybrid intelligent control strategy through proximal policy optimization, synergistically combining the stability of control theory with adaptive learning capabilities for fixed-wing unmanned aerial vehicles to provide baseline control, stability, and tracking efficiency via an actor-critic mechanism of reinforcement learning. M. Ali et al. [68], developed algorithms for multi-agent fixed-wing UAV path planning in 2D and 3D simulated environments and compared five algorithms, proximal policy optimization, soft actor–critic, deep deterministic policy gradient, trust region policy optimization, and multi–agent deep deterministic policy gradient, in various scenarios using reinforcement learning. Q. Cheng et al. [69], studied a collision avoidance scenario in which a fixed-wing unmanned aerial vehicle must avoid collision with an enemy UAV along its flight path to the goal point, using a fixed-rule policy and a Markov decision process with temporal-difference reinforcement learning. C. Yan et al. [70], proposed a selective behavior cloning enhanced actor-critic algorithm that models the task as a partially observable Markov decision process with sparse rewards and learns an end-to-end policy mapping imperfect sensor data to control signals, an approach for fixed-wing UAV navigation and obstacle avoidance that synthesizes imitation learning with deep reinforcement learning.
M. Su et al. [71], presented a controller framework equipped with a reward function to enable fixed-wing UAVs to navigate safely using only monocular onboard cameras, with an adaptive entropy regularization mechanism integrated into proximal policy optimization, using reinforcement learning. K. Julian et al. [72], presented two approaches for training decentralized multiple autonomous fixed-wing UAV controllers to maximize forest fire coverage, track wildfire expansions, and outperform an online receding-horizon controller, using deep reinforcement learning. Y. Cui et al. [73], presented a study that reframed vertical wind shear as a harvestable energy source and introduced a dynamic-soaring framework specifically to address the fixed-wing UAV controller's failure when turbulence or wind-direction shifts occur, through dual-objective deep reinforcement learning. J. Wu et al. [74], presented a reactive online decision-making maneuver controller of obstacle avoidance for fixed-wing unmanned aerial vehicles matched with a composed interfered fluid dynamical system guidance law based on deep reinforcement learning. Y. He et al. [75], proposed an algorithm with a hybrid continuous action space to address the long-term dependency issues inherent in fixed-wing UAV operations, based on a continuous policy network to facilitate real-time flight path planning using reinforcement learning.
S. Jin et al. [76], proposed a temporal sequence constrained Q-learning framework for tilt-wing unmanned aerial vehicles, which require a unified control strategy across rotary-wing, fixed-wing, and transition modes; the framework integrates an encoder–decoder with recurrent networks to capture temporal dependencies, based on offline reinforcement learning. G. Chang et al. [77], proposed a distributed Hierarchical Cooperative Soft Actor-Critic with maximum entropy (HC-SAC) framework to manage the decision making in close-range air combat for fixed-wing unmanned air vehicle swarms using traditional multi-agent reinforcement learning. Z. Liu et al. [78], studied the trajectory optimization problem of fixed-wing uncrewed aerial vehicles to maximize the energy efficiency and proposed two algorithms over two timescales: a successive convex approximation strategy and a multi-agent reinforcement learning strategy. S. Devaraju et al. [79], developed a base station-distributed pheromone mobility model to autonomously coordinate fixed-wing UAV movements, relying only on information from neighboring UAVs in a decentralized network, using a Q-learning policy variant of deep reinforcement learning. H. Du et al. [80], proposed a hierarchical two-layer motion planning algorithm for fixed-wing UAVs based on a global planner in the presence of static obstacles and other UAVs, to accomplish kino-dynamic and collision-free motion planning within the sensing range, using local reinforcement learning.
W. Wang et al. [81], developed a flocking and navigation control scheme for a fixed-wing UAV swarm, formulated as a Markov decision process, with an oracle-guided two-stage training and execution scheme and with observations and rewards based on multi-agent reinforcement learning. Y. Wang et al. [82], proposed an approximate optimal curve-path-tracking control algorithm for partially unknown nonlinear systems for fixed-wing unmanned aerial vehicles subject to asymmetric control input constraints, using neural networks with integral reinforcement learning. D. Wada et al. [83], investigated fixed-wing UAV pitch control in wind tunnel tests using three different training approaches: a baseline using simple linear dynamics, high-fidelity modeling, and domain randomization, based on deep reinforcement learning. A. Viseras et al. [84], proposed a framework to allow a team of autonomous unmanned aerial vehicles to learn how to monitor a fire front, starting with two UAVs and extended to a larger number of UAVs, by utilizing multiple single trained agents and value decomposition networks, based on reinforcement learning.

3.2. Hybrid VTOL

B. Yuksek et al. [85], investigated the transition flight phase of a tilt-rotor fixed-wing VTOL UAV and designed a low-level flight control system based on adaptive dynamic inversion methodology to compensate for aerodynamic effects and to provide safety and energy efficiency during the transition phase, using the deep deterministic policy gradient algorithm of reinforcement learning. S. Domitran et al. [86], studied how to make VTOL UAV flight more accurate, efficient and stable and proposed a flight control using the proximal policy optimization algorithm of reinforcement learning. S. Makhtar et al. [87], presented a graphical user interface for a VTOL UAV propeller fault classification system to enable the identification of different propeller conditions based on time-domain and frequency-domain acoustical features using supervised learning. F. Yu et al. [88], proposed a prior knowledge-guided learning method, integrated with the deep deterministic policy gradient algorithm, to improve learning efficiency for a VTOL aircraft using reinforcement learning. B. Ma et al. [89], developed a control algorithm for a vertical take-off and landing aircraft under wind disturbances where the tracking control is formulated as a Markov decision process and presented an appropriate system state reward function with three kinds of wind fields in the learning environment to expand the exploration space and simulate the effect of wind disturbances on the flight control, based on reinforcement learning. J. Xu et al. [90], proposed a method to automate the design process of hybrid unmanned aerial vehicles by training a model-free, model-agnostic neural network controller with an error convolution input, trained by reinforcement learning.
M. Ugur et al. [91], proposed a multi-agent control system that utilizes soft actor-critic modules designed to independently control each input with a tailored reward mechanism, to reduce training time compared to single-agent systems, and to lower energy consumption, based on reinforcement learning. S. Sonkar et al. [92], proposed an anchor-based object detection algorithm to identify objects from a real-time video stream transmitted from a low-altitude long-endurance fixed-wing hybrid VTOL UAV designed for asset monitoring, using supervised learning. V. Saj et al. [93], developed an algorithm for autonomously controlling a vertical take-off and landing unmanned aerial vehicle landing on a 6-DOF ship deck using only a monocular camera for tracking and localization, based on reinforcement learning.
K. Xia et al. [94], investigated motion planning and control for the autonomous landing of a vertical take-off and landing unmanned aerial vehicle on a moving target, proposed a funnel-shaped surface to maintain the relative position within a preassigned set for precise and safe landing, and designed an orientation constraint planning to avoid flipping over, using reinforcement learning. J. Yang et al. [95], proposed a dual attention-based multi-instance learning of real UAV flight data for pinpointing anomaly instances and automatically explaining anomalous events to establish the relationships between anomalous events and anomalous behaviors using supervised learning. A. Ali et al. [96], proposed an algorithm for simulation-to-real policy transfer of a vertical take-off and landing unmanned aerial vehicle designed for landing on offshore docking stations in maritime operations, based on deep reinforcement learning.

3.3. Multi-Rotor

W. Cheng et al. [97], proposed a fault-tolerant control method to achieve high-speed flight of a quadrotor UAV, based on deep reinforcement learning. J. Huang et al. [98], proposed an acceleration framework with incomplete information and an energy consumption model for quadrotor UAVs to characterize UAV communications, and designed a UAV trajectory planning algorithm based on reinforcement learning. B. Jiang et al. [99], proposed a hybrid aerodynamic modeling method and model predictive control design for a quadrotor tail-sitter UAV that exhibited high predictive accuracy at a low computational cost and was used to optimize the throttle, pitch angle, and roll angle for position tracking, using supervised learning. R. Kestur et al. [100], studied the use and potential of low-altitude remote sensing UAVs in agriculture, particularly in small open agricultural fields, by testing two UAVs, a fixed-wing UAV and a quadrotor UAV, and comparing unsupervised K-means clustering with extreme supervised learning.
P. Jin et al. [101], investigated a dynamic event-triggered robust optimal attitude tracking control problem for a quadrotor unmanned aerial vehicle in an uncertain environment and developed an augmented system of tracking error signal and reference signal to transform the tracking problem into a stabilization problem, using the Hamilton–Jacobi–Bellman equation to handle the random disturbances in the design of the controller, based on reinforcement learning. D. Silva et al. [102], proposed an algorithm to accelerate the training of a drone agent for a counter unmanned aerial system and guide an invading drone to a safe-killing zone using a hunter quadrotor drone whose training dataset was based on reinforcement learning. C. Wang et al. [103], proposed a framework for generating an optimal vision-based landing policy for a quadrotor unmanned aerial vehicle on a moving unmanned ground vehicle and presented a landing vision system for rapid localization and pose estimation of the unmanned ground vehicle, using reinforcement learning. H. He et al. [104], presented a framework combining a learning-based inertial odometry module and differentiable model predictive control for quadrotor unmanned aerial vehicle attitude control using supervised learning. Z. Adaika et al. [105], presented a data-driven fault diagnosis for a quadcopter UAV that utilizes the principal direction vector of airframe vibration signals computed by principal component analysis, with classification by support vector machines, using supervised learning.
S. Khan et al. [106], proposed an optimized approach offering a generative adversarial network for crop and weed classification at the early growth stage, in which a quadcopter UAV acquires high-resolution images in two different croplands, pea and strawberry, and the classification systems deploying UAV imagery are based on supervised learning. C. Zeng et al. [107], studied the optimal states that transitioning UAVs, switching between fixed-wing, VTOL, and multi-rotor states, would assume under different mission and flight environment scenarios, using supervised learning. R. Hernandez-Hernandez et al. [108], proposed a hybrid multilayer extreme learning algorithm with fuzzy logic theory for active image classification applied to quadcopter UAVs, using supervised learning. G. Zhang et al. [109], proposed an adaptive Kalman filter for adjusting the noise covariance of UAV global navigation system measurements under different positioning accuracies, based on a proposed accuracy classification model trained by supervised learning. T. Alotaibi et al. [110], developed an algorithm to capture images of the obstacles at various flight altitudes and images of the horizon to assist a quadcopter UAV in flying at an appropriate altitude from multiple worlds on the Gazebo simulator, using a convolutional neural network and deep supervised learning.
T. Li et al. [111], developed a multi-agent deep deterministic policy gradient algorithm based on environment sensing with a control barrier function to address the path planning problem for a quadrotor UAV swarm in an unknown electromagnetic environment, using weakly supervised learning. W. Pavon et al. [112], developed a neural-network-based approach designed to replicate the behavior of classical control systems to address the challenge of accurately tracking the position of quadcopter UAVs, using supervised learning. D. Crowe et al. [113], investigated the adequacy of two methods, the K-nearest neighbor algorithm and the long short-term memory algorithm, for estimating wind velocity in the lowermost region of the atmosphere from the on-board inertial data of a multi-rotor UAV, based on supervised learning. P. Yang et al. [114], developed a 1-D convolutional neural network model using the adaptive batch normalization algorithm, in which fault diagnosis and classification are accomplished by feature extraction from lower-dimensional multi-rotor UAV data, based on supervised learning. X. Liu et al. [115], presented a hybrid control method to combat actuator faults, the most prevalent type of quadrotor UAV failure, preserve the stability and continuity of mission-critical tasks, and enhance fault tolerance by using the proximal policy optimization algorithm to train the control system, based on reinforcement learning.
J. de Souza et al. [116], proposed an alternative method to train a multilayer perceptron neural network based on fuzzy Mamdani logic to control the landing of a quadcopter UAV with an onboard computer and a robot operating system, using supervised learning. A. Haddad et al. [117], presented a concept of dual-scale homogeneity, defined by scaled magnitudes and time in transformed coordinates that remain independent of system parameters, to achieve consistent performance of a quadrotor UAV with a slung load system. The authors also designed a parameter-dependent policy that homogenizes the quadrotor UAV with a slung load system using reinforcement learning. G. Farid et al. [118], presented a motion planning approach for a quadrotor UAV that learns the optimal state values and related optimal policies considering various regions of the UAV environment to reach multiple random targets in a specific 3-D cluttered space, based on reinforcement learning. T. Trad et al. [119], studied high- and low-level control systems, attitude stabilization, and position tracking of quadrotor UAVs and designed a reward function and actor-critic network structures to stimulate high-order observable states and improve the agent’s understanding of UAV dynamics, based on reinforcement learning. S. Sönmez et al. [120], proposed an approach applying a deep deterministic policy gradient algorithm, an off-policy actor–critic method, to adjust the gains of a quadrotor UAV attitude controller during flight, using reinforcement learning.
H. Zheng et al. [121], proposed an optimal formation control strategy for multiple quadrotor UAVs with model uncertainties and unknown disturbances to achieve prescribed transient and steady-state performance at an appointed time, based on a self-structuring neural network and actor-critic reinforcement learning. Q. Sun et al. [122], proposed a curiosity-driven method for aggressive quadrotor UAV flight missions and introduced a similarity-based curiosity module to speed up the training procedure, based on reinforcement learning. C. Wang et al. [123], proposed a learning framework for a quadrotor UAV to land on a moving UGV without knowing its motion dynamics. The framework has two components, a landing vision system and a landing control system, both using reinforcement learning. Q. Luo et al. [124], proposed a framework to solve quadrotor UAV attitude dynamics during autonomous navigation, in which high-level velocity commands are generated by a deep neural network policy and translated by a low-level control algorithm to achieve precise control of both the position and rotation of quadrotor UAVs, using deep reinforcement learning. C. Yu et al. [125], presented trajectory tracking control of an autonomous quadrotor unmanned aerial vehicle performing remote inspection, based on reinforcement learning.
G. Wen et al. [126], developed an optimized attitude control for a quadrotor UAV system to steer both attitude angle position and velocity states to follow the predefined reference signals using critic-actor architecture and fuzzy logic system approximation of reinforcement learning. Y. Chang et al. [127], presented an adaptive fault-tolerant control scheme for quadrotor UAVs subjected to external disturbances, input uncertainties, and structural uncertainties, with a penalty function, a critic network to evaluate control performance based on reinforcement learning. M. Alam et al. [128], proposed a Q-learning routing protocol inspired by adaptive flocking control for a UAV aerial surveillance and emergency communication to obtain an optimal routing path in terms of delay, stable path selection defined by predictive link duration, and energy consumption, based on reinforcement learning. V. Tran et al. [129], presented an adaptive control synthesis methodology for a quadrotor UAV attitude and altitude stabilization based on strictly negative imaginary property and fuzzy reinforcement learning. J. Quan et al. [130], proposed a stability-aware dynamic clipping parameter adjustment strategy that used a Lipschitz continuity interpretation and adapted the clipping threshold in real time based on a stability variance and quadrotor UAV proximal policy optimization of reinforcement learning.
M. Khojasteh et al. [131], proposed a path-planning method of two-stage structure comprising a depth estimation module and a decision-making module, for a quadrotor UAV equipped with only a monocular camera. The first module used a convolutional encoder-decoder network to learn image depth from visual cues self-supervised, with the output serving as input for the second module. The second module used dueling double deep recurrent Q reinforcement learning. G. Ryou et al. [132], presented an algorithm that involved co-training of a planning policy through multi-fidelity Bayesian optimization and a reward estimator to effectively create a realistic dynamics model for high-speed quadrotor UAV trajectories, based on reinforcement learning. Y. Xia et al. [133], proposed a tracking scheme for quadrotor UAVs to approximate an escape target, addressing time constraints and guaranteeing low energy expenditure, using multi-agent deep deterministic policy gradient of deep reinforcement learning. J. Estevez et al. [134], presented a trajectory planning algorithm, containing a feature vector and a parameter vector, for the transportation of cable-suspended loads employing three quadrotor UAVs to transport the cargo smoothly while avoiding its swing, using reinforcement learning. O. Doukhi et al. [135], proposed a multi-nonlinear predictive control policy for quadrotor UAVs, trained dynamically by selecting adaptation parameters and feedforward control commands for the low-level controller leveraging deep reinforcement learning.
J. Chen et al. [136], proposed an algorithm that learned online planning for quadrotor UAVs pursuit-evasion in unknown environments and derived a feasible policy via a two-stage reward refinement and deployed the policy on real quadrotors to tackle partial observability in cooperative policy learning and enable higher exploration efficiency, using reinforcement learning. H. Han et al. [137], proposed an odd symmetric actor to achieve stable and symmetric control performance, and an even critic to stabilize the training process of a quadrotor UAV such that the bias of neural networks was eliminated and the absolute value operation was adopted to construct the activation function. The authors devised a cascade architecture in which each module was trained to control a symmetric subsystem of a quadrotor UAV using deep reinforcement learning. G. Jiménez et al. [138], studied and compared three different algorithms: DQN, SARSA, and Actor-Critic of reinforcement learning. Addressing the problem of a quadrotor UAV navigating from departure point A to endpoint B while avoiding obstacles and using the least possible time and flying the shortest distance, the authors showed that DQN was the only algorithm achieving the target. X. Huo et al. [139], proposed an adaptive optimal collision-free control formulation for reducing the quantity of adaptive laws and computational complexity, using neural networks estimating capacity and the actor-critic control scheme of reinforcement learning. H. Zhao et al. [140], developed a workflow safer than traditional learning, suited for quadrotor UAVs control in industrial scenarios, especially when learning from offline datasets, and utilizes a more general behavior policy and a pessimistic estimation to foster offline deep reinforcement learning.
X. Li et al. [141], proposed a trajectory planning method for three quadrotor UAVs based on a value-function approximation algorithm to solve the problem of cable-suspended load transportation and reach the target position faster without the load swinging, using reinforcement learning. J. Xie et al. [142], developed a model-free method based on a partially observable Markov decision process and a UAV autonomous tracking and landing maneuver by an end-to-end neural network combining deep deterministic policy gradients and heuristic rules, based on deep reinforcement learning. K. PB et al. [143], presented a framework to control the altitude of a quadcopter UAV and stabilize its x- and y-axes by training the UAV using the Q-learning of reinforcement learning. L. Al-Haddad et al. [144], presented and demonstrated the efficacy of an optimized stochastic gradient descent logistic regression algorithm for classifying unbalances in quadrotor UAVs, based on supervised learning. J. Park et al. [145], proposed a fault diagnosis algorithm using deep neural network for a single rotor of octocopter UAV to determine the failure of the rotor in real time, based on supervised learning.
Y. Song et al. [146], proposed a quadrotor control simulator, composed of two main components, a rendering engine built on Unity and a physics engine for dynamics simulation, utilizing 3-D path planning based on reinforcement learning. E. Mosweu et al. [147], designed and evaluated a controller for a multirotor unmanned aerial vehicle system capable of adapting its gains in accordance with changes in the system dynamics utilizing a Simulink-constructed model employing a deep deterministic policy gradient network of reinforcement learning. A. Narmilan et al. [148], presented an approach on the preliminary detection of sugarcane white leaf disease by using high-resolution multispectral sensors mounted on small unmanned aerial vehicles (UAVs) by implementing random forest, decision tree, and K-nearest neighbors of supervised learning. S. Chen et al. [149], proposed an energy-saving path-planning algorithm in a turbulent wind environment that dynamically adjusted flight strategies for quadcopter UAVs to find the most energy-saving flight paths, based on reinforcement learning. Y. Sun et al. [150], proposed a UAV detection method, aiming to carry out efficient and accurate multi-target detection through computer vision technology using YOLOv5-Lite model based on supervised learning.
J. Cardenas et al. [151], designed a flight position controller based on deep neural networks and subsequent implementation for a multi-rotor UAV of diverse trajectories with transient and steady-state information such as position, speed, acceleration, and motor output signals, using supervised learning. H. Xiong et al. [152], proposed a formation-surrounding control method for multiple quadrotor UAVs pursuit-evasion system subject to external disturbances, by constructing position and attitude tracking error subsystems of quadrotors and presented two control strategies which combined the feedforward control technique and reinforcement learning. J. Faigl et al. [153], proposed a surveillance planning for a multi-rotor unmanned aerial vehicle to periodically take snapshots of areas of interest and find a fast and smooth 3-D trajectory by visiting a given set of waypoint locations in the shortest time possible, based on unsupervised learning. J. Xue et al. [154], presented an algorithm combined with a domain randomization method to learn wind-resistant hovering policies and designed a reward function to maintain a continuous flight of quadrotor UAVs and guide a Q-learning process based on deep reinforcement learning. N. Parvaressh et al. [155], proposed an algorithm to solve the location optimization problem of multi-rotor UAV base stations in the presence of mobile endpoints using a continuous actor-critic of deep reinforcement learning.
R. Polvara et al. [156], proposed a method that requires low-resolution images taken from a down-looking camera in order to identify the position of a marker and land a quadrotor UAV on it, based on a hierarchy of deep Q-networks of reinforcement learning. C. Sabo et al. [157], developed a robotic solution in the form of a quadcopter that can support the necessary payload to replicate the sensing capabilities including chemical sensing and a wide visual field-of-view controlled remotely to process the sensor data using neural reinforcement learning. N. Smolyanskiy et al. [158], presented a micro aerial vehicle system, for autonomously following trails in unstructured, outdoor environments such as forests, and utilized vision modules for environmental awareness of obstacle detection using DNN with soft labels for supervised learning. P. Jardine et al. [159], proposed a trajectory planning algorithm for autonomous quadrotor UAVs based on model predictive control tuned with reinforcement learning. B. Chen et al. [160], proposed an observer-based team formation tracking control scheme for large-scale quadrotor UAV systems under external disturbance, measurement noise, couplings from other neighboring quadrotor UAVs, and malicious attacks on actuator and sensor of the network control system via wireless communication, based on deep neural network reinforcement learning.
Q. Sun et al. [161], developed a vision-based hierarchical algorithm for quadrotor UAV navigation, integrating perception, obstacle avoidance, and motion control into a unified framework with an echoic hindsight experience replay mechanism to accelerate convergence by transforming failed episodes into successful ones using reinforcement learning. Z. Zhang et al. [162], proposed a control method that relied on collected data instead of modelling to handle coupling and reduce disturbance from aerodynamics and utilized Lyapunov method to maintain stability of the system, based on reinforcement learning. X. Lin et al. [163], proposed an event-triggered control strategy to stabilize a quadrotor unmanned aerial vehicle with actuator saturation with a model free scheme designed based on reinforcement learning. B. Yu et al. [164], presented a modular framework for a low-level control of a quadrotor UAV, with direct control of yawing motion and a real-world autonomous flight based on reinforcement learning. P. Sharma et al. [165], developed a model-free algorithm for a quadrotor to recover from a single rotor failure, based on soft-actor-critic that enabled the vehicle to hover, land, and perform complex maneuvers, using reinforcement learning. Y. Gong et al. [166], proposed an optical flow algorithm to calculate the flight speed of the drone, the calculation errors of its rotation state, and the vertical take-off and landing state, for the visual navigation and positioning of multi-rotor drones, based on the K-nearest neighbors method of supervised learning.

3.4. Single-Rotor

A. Alanazi [167], proposed enhancing attack detection capabilities and addressing the growing threat of global positioning system spoofing to small unmanned aerial vehicles by using a hybrid architecture integrating long short-term memory and gated recurrent unit models, based on supervised learning. X. Jiang et al. [168], designed a state representation learning method based on observation temporal permutation to improve the UAV state representation of the agent's policy network, such that the network outputs consistent action value estimates for observation sequences with the same content but different temporal orders, and the state representation predicts the target orientation variations in the observation sequence, which further regularizes and facilitates the state representation learning process, using supervised learning. Z. Mou et al. [169], proposed a multi-cluster graph attention learning algorithm for detecting the clusters of a hierarchical unmanned aerial vehicle swarm network in the US-NET, using self-supervised learning. M. Akhtar et al. [170], proposed a unified framework for the development and optimization of a tilt-rotor tri-copter UAV capable of performing vertical takeoff and landing and efficient hover-to-cruise transitions, utilizing algorithms such as deep deterministic policy gradient, trust region policy optimization, and proximal policy optimization of reinforcement learning.
S. Wen et al. [171], investigated the use of deep learning in combination with flow field methods, specifically the use of computational fluid dynamics model and the generative adversarial network in predicting the flow field dynamics of a single-rotor UAV, based on unsupervised learning. T. Hong et al. [172], proposed a method for real-time tracking of drone targets by using the existing monitoring network to obtain drone images in real time and employing deep learning methods by which single-rotor and multi-rotor UAVs in urban environments can be guided, using supervised learning. M. Hossam-E-Haider et al. [173], developed a learning framework for single-rotor UAVs real-time object recognition and sending alerts to the ground station, using the You Only Look Once (YOLO) deep supervised learning. X. Chen et al. [174], proposed a micro-motion feature classification method using K-band frequency modulated continuous wave radar, data acquisition of five types of rotor UAVs, and presented a solution based on multi-scale convolutional neural network of supervised learning. B. Yen et al. [175], proposed a method to accurately estimate an unmanned aerial vehicle's rotor noise power spectral density and capture the desired sound signals by utilizing the rotor characteristics and microphone signals, based on supervised learning. G. Marichal et al. [176], proposed a scheme for detecting incipient defects in spur gears in a single propeller system of a small-sized unmanned helicopter, and developed a hybrid genetic neuro-fuzzy system to express the fault diagnostic system as a set of fuzzy rules, using unsupervised learning.

4. Discussion of the Survey Results

The results of this survey of unmanned aerial vehicles are aggregated by the category of machine learning, the period (month and year) in which studies were completed, the category of UAV, and the geographic region. The methodologies covered are supervised learning, unsupervised learning, and reinforcement learning. Four periods of time are surveyed: the last 12 months, the last 24 months, the last five years, and the last 10 years. For each period, three tables of survey data and six charts are shown. Among the charts, a pie chart shows the percentage share of each methodology. A bar chart shows the total count of papers published per time slice, which is a month for the last 12 months, a quarter for the last 24 months, and a year for the last five years and the last 10 years. A line chart shows the trend of the learning methodologies over the corresponding period. A histogram shows the total publications per methodology during the entire period, and two further histograms show the total publications per UAV category and per geographic region.
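The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used to produce the survey tables: the records in `PAPERS`, the field order, and the helper names (`count_by`, `share_percent`, `count_by_slice`) are all hypothetical and were chosen for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (publication date, methodology, UAV category, region).
# Illustrative placeholders only; these are NOT data from the survey.
PAPERS = [
    (date(2025, 3, 14), "reinforcement", "Multi-Rotor", "China"),
    (date(2025, 3, 2), "supervised", "Multi-Rotor", "Europe"),
    (date(2025, 7, 21), "reinforcement", "Fixed-Wing", "North America"),
    (date(2024, 11, 5), "unsupervised", "Single-Rotor", "China"),
]


def count_by(papers, key):
    """Count papers grouped by the value returned by key(paper)."""
    return Counter(key(p) for p in papers)


def share_percent(counts):
    """Convert raw counts to percentage shares (the pie-chart values)."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}


# Time-slice functions: month, quarter, or year of publication.
def month_slice(paper):
    return (paper[0].year, paper[0].month)


def quarter_slice(paper):
    return (paper[0].year, (paper[0].month - 1) // 3 + 1)


def year_slice(paper):
    return paper[0].year


def count_by_slice(papers, slicer):
    """Count papers per (time slice, methodology) pair (the bar/line chart values)."""
    return Counter((slicer(p), p[1]) for p in papers)


method_counts = count_by(PAPERS, lambda p: p[1])      # per methodology
category_counts = count_by(PAPERS, lambda p: p[2])    # per UAV category
region_counts = count_by(PAPERS, lambda p: p[3])      # per geographic region
method_shares = share_percent(method_counts)
quarterly = count_by_slice(PAPERS, quarter_slice)
```

A `Counter` returns zero for any key it has not seen, so a time slice with no published papers simply reads as a count of zero.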

4.1. Survey Period: Last 12 Months

Table 1, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 12 months.
Figure 10, below, illustrates the percentage share of each machine learning methodology in the total publications for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 12 months.
Table 2, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications for each month of the last 12 months. A month that is not shown had no papers published.
Figure 11, below, is a multiple-bar chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 12 months.
Figure 12, below, is a multiple-line chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 12 months. It illustrates the trend of publications per machine learning methodology.
Figure 13, below, is a histogram of the total paper publications per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 12 months.
Figure 14, below, is a histogram of the total paper publications per UAV category, across all machine learning categories, during the last 12 months. Four UAV categories are covered in this study: Fixed-Wing, Hybrid VTOL, Multi-Rotor, and Single-Rotor.
Table 3, below, lists the number of published machine learning papers per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 12 months.
Figure 15, below, is a histogram of the total paper publications per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 12 months. Seven geographic regions are covered in this study: Australia, China, Europe, the Far East (Indonesia, India, Japan, and the Koreas), the Middle East, North America, and Russia. Russian scholars may publish their work mainly in Russian journals, which could explain why this research found few papers from that region.

4.2. Survey Period: Last 24 Months

Table 4, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 24 months.
Figure 16, below, illustrates the percentage share of each machine learning methodology in the total publications for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 24 months.
Table 5, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications for each quarter of the last 24 months. A quarter that is not shown had no papers published.
Figure 17, below, is a multiple-bar chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 24 months.
Figure 18, below, is a multiple-line chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 24 months. It illustrates the trend of publications per machine learning methodology.
Figure 19, below, is a histogram of the total paper publications per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 24 months.
Figure 20, below, is a histogram of the total paper publications per UAV category, across all machine learning categories, during the last 24 months. The four UAV categories are Fixed-Wing, Hybrid VTOL, Multi-Rotor, and Single-Rotor.
Table 6, below, lists the number of published machine learning papers per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 24 months.
Figure 21, below, is a histogram of the total paper publications per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 24 months. The seven geographic regions are Australia, China, Europe, the Far East (Indonesia, India, Japan, and the Koreas), the Middle East, North America, and Russia.

4.3. Survey Period: Last 5 Years

Table 7, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 5 years.
Figure 22, below, illustrates the percentage share of each machine learning methodology in the total publications for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 5 years.
Table 8, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications for each year of the last 5 years.
Figure 23, below, is a multiple-bar chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 5 years.
Figure 24, below, is a multiple-line chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 5 years. It illustrates the trend of publications per machine learning methodology.
Figure 25, below, is a histogram of the total paper publications per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 5 years.
Figure 26, below, is a histogram of the total paper publications per UAV category, across all machine learning categories, during the last 5 years. The four UAV categories are Fixed-Wing, Hybrid VTOL, Multi-Rotor, and Single-Rotor.
Table 9, below, lists the number of published machine learning papers per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 5 years.
Figure 27, below, is a histogram of the total paper publications per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 5 years. The seven geographic regions are Australia, China, Europe, the Far East (Indonesia, India, Japan, and the Koreas), the Middle East, North America, and Russia.

4.4. Survey Period: Last 10 Years

Table 10, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 10 years.
Figure 28, below, illustrates the percentage share of each machine learning methodology in the total publications for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 10 years.
Table 11, below, lists the number of published machine learning papers per methodology for unmanned aerial vehicles of different types in engineering and maintenance applications for each year of the last 10 years.
Figure 29, below, is a multiple-bar chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 10 years.
Figure 30, below, is a multiple-line chart that shows the number of published papers per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 10 years. It illustrates the trend of publications per machine learning methodology.
Figure 31, below, is a histogram of the total paper publications per machine learning methodology for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 10 years.
Figure 32, below, is a histogram of the total paper publications per UAV category, across all machine learning categories, during the last 10 years. The four UAV categories are Fixed-Wing, Hybrid VTOL, Multi-Rotor, and Single-Rotor.
Table 12, below, lists the number of published machine learning papers per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications over the entire period of the last 10 years.
Figure 33, below, is a histogram of the total paper publications per geographic region for unmanned aerial vehicles of different types in engineering and maintenance applications during the last 10 years. The seven geographic regions are Australia, China, Europe, the Far East (Indonesia, India, Japan, and the Koreas), the Middle East, North America, and Russia.

5. Future Challenges for Unmanned Aerial Vehicles

The primary challenges in 2025 for unmanned aerial vehicles are battery life, complex airspace integration, robust machine learning for full autonomy, regulatory hurdles, cyber security, weather resilience, and the integration of hybrid designs. In addition, each category of unmanned aerial vehicle has its own future challenges. For instance, fixed-wing UAVs require dedicated runways for takeoff and landing, single-rotor UAVs face the challenge of vibration, multi-rotor UAVs are limited in payload and range, and hybrid VTOL UAVs face complex control. On the artificial intelligence side, machine learning still faces gaps in training data and in real-time decision-making for truly autonomous missions. These issues are detailed below.
Fixed-Wing UAVs: the lack of hover capability means these vehicles need dedicated runways for takeoff and landing, a major issue for deployment in confined urban areas. Launching depends on runways or catapults, and landing needs open areas, so urban terrain and rugged areas impose limitations. High-speed flight creates aerodynamic challenges that affect aircraft stability. Lower spatial resolution also reduces data accuracy, another challenge for fixed-wing UAVs.
Single-Rotor UAVs: these UAVs have one large rotor, high vibration levels, and many interconnected parts. In close-quarters operations, larger blades carry higher kinetic energy and present considerable danger, a major safety issue, particularly in populated areas. Sensitive payloads can be affected by the high vibration levels. In addition, the large number of parts requires advanced control and maintenance systems.
Multi-Rotor UAVs: these vehicles have low payload-to-weight ratios and short endurance, which restrict mission scope and range. Multiple rotors consume high power for hovering and drain the UAV batteries quickly. They also generate vibrations that affect the structural integrity of the UAV and make it susceptible to wear. The payload-to-weight ratio is 1:1 or lower, and breaking this barrier is a major future challenge. In addition, these vehicles have short flight durations, typically less than 40 minutes, due to their inefficiency in counteracting gravity.
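The link between weight, hover power, and endurance can be made concrete with ideal momentum theory, in which the power needed to hover with total thrust T over total rotor disk area A in air of density ρ is P = T^(3/2) / sqrt(2ρA). The Python sketch below is an illustrative estimate only. Every parameter value (mass, rotor radius, battery energy, overall efficiency, usable battery fraction) is an assumption chosen for the example and is not drawn from the surveyed papers.

```python
import math

G = 9.81         # gravitational acceleration, m/s^2
RHO_SEA = 1.225  # sea-level air density, kg/m^3


def ideal_hover_power_w(mass_kg: float, disk_area_m2: float,
                        rho: float = RHO_SEA) -> float:
    """Ideal (momentum-theory) power to hover, in watts."""
    thrust_n = mass_kg * G
    return thrust_n ** 1.5 / math.sqrt(2.0 * rho * disk_area_m2)


def hover_endurance_min(mass_kg: float, disk_area_m2: float, battery_wh: float,
                        overall_efficiency: float = 0.5,
                        usable_fraction: float = 0.8) -> float:
    """Rough hover time in minutes.

    overall_efficiency lumps rotor, motor, and controller losses (assumed value);
    usable_fraction is the share of battery energy actually drawn (assumed value).
    """
    electrical_power_w = ideal_hover_power_w(mass_kg, disk_area_m2) / overall_efficiency
    return 60.0 * battery_wh * usable_fraction / electrical_power_w


# Hypothetical quadrotor: four rotors of 0.127 m radius, 100 Wh battery.
disk_area = 4 * math.pi * 0.127 ** 2
base = hover_endurance_min(2.0, disk_area, 100.0)    # 2 kg vehicle, no payload
loaded = hover_endurance_min(4.0, disk_area, 100.0)  # same vehicle with a 2 kg payload
print(f"unloaded: {base:.1f} min, with 1:1 payload: {loaded:.1f} min")
```

Under this simplified model, hover time scales with mass to the power of -1.5 for a fixed battery and rotor size, so a payload equal to the vehicle's own mass cuts the estimated endurance to roughly 35% of the unloaded value. Real vehicles deviate from this because efficiency itself varies with loading.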
Hybrid VTOL UAVs: the primary control challenge in these vehicles is managing the complex transition between vertical hover and horizontal forward flight while maintaining stability under different wind conditions. Another challenge is the parasitic drag generated by the VTOL rotors during cruise; unless retractable designs are successfully scaled, this drag can reduce vehicle efficiency by more than 30%. Balancing efficient and powerful combustion and electric systems is also a challenge, as is the need for urban infrastructure, including managing complex airspace and ensuring public safety and privacy with seamless integration into air traffic management. It is important to note that integrating fixed-wing and multi-rotor designs requires sophisticated machine learning control systems.
Machine learning for UAVs faces challenges that have to be addressed and resolved, including real-time autonomy, data and training, trust and cybersecurity, and energy-aware learning. Real-time autonomy involves complex technical issues such as reliable communication, airspace management, energy endurance, sensor and machine learning integration, public trust, and cyber security. These issues require advanced real-time path planning, fault-tolerant algorithms, and secure data management to achieve truly independent and safe operations in dynamically changing environments. Real-time path planning needs mechanisms that can quickly re-plan efficient and reliable routes while accounting for changing user locations and dynamic obstacles. For energy endurance, the UAV system must overcome battery limitations to achieve longer flight duration and higher payload capacity by optimizing materials, aerodynamics, and power solutions. Fault tolerance can be achieved by building predictive fault detection systems as well as redundant architectures to manage component failures. Cyber security can be managed with protection algorithms, encryption, and anomaly detection against data breaches and hijacking. Airspace integration can be achieved by developing standardized air traffic management and integrating UAVs with manned civil and military aviation.
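The idea of re-planning a route when a dynamic obstacle appears can be illustrated on a small grid. The Python sketch below uses breadth-first search on a 4-connected grid as a stand-in for the learned or optimization-based planners used in the surveyed papers. The function names, the grid model, and the obstacle schedule are all hypothetical simplifications.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def shortest_path(rows: int, cols: int, start: Cell, goal: Cell,
                  blocked: Set[Cell]) -> Optional[List[Cell]]:
    """Breadth-first search on a 4-connected grid; returns None if no route exists."""
    if start in blocked or goal in blocked:
        return None
    parents: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cell
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in blocked and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


def fly_with_replanning(rows: int, cols: int, start: Cell, goal: Cell,
                        obstacle_events: Dict[int, Set[Cell]]) -> Tuple[List[Cell], bool]:
    """Move one cell per step; re-plan only when new obstacles cut the current route.

    obstacle_events maps a step index to the cells that become blocked at that step.
    Returns the cells visited and whether the goal was reached.
    """
    blocked: Set[Cell] = set()
    position = start
    visited = [start]
    path = shortest_path(rows, cols, position, goal, blocked)
    step = 0
    while position != goal:
        new_obstacles = obstacle_events.get(step, set())
        if new_obstacles:
            blocked |= new_obstacles
            if path is None or any(cell in blocked for cell in path):
                path = shortest_path(rows, cols, position, goal, blocked)
        if path is None or len(path) < 2:
            return visited, False  # no feasible route remains
        position = path[1]
        path = path[1:]
        visited.append(position)
        step += 1
    return visited, True


# Hypothetical scenario: an obstacle appears on the planned route after the first move.
route, reached = fly_with_replanning(3, 4, (0, 0), (0, 3), {1: {(0, 2)}})
print(reached, route)
```

Re-planning only when the remaining route is actually blocked keeps the per-step cost low, which matters on an embedded flight computer. A practical system would also have to handle continuous space, vehicle dynamics, and sensing uncertainty, none of which are modeled here.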

6. Conclusion and Future Prospects

In this research, most of the papers published in the field of unmanned aerial vehicles that use machine learning as part of their artificial intelligence have been surveyed and analyzed to identify the methodologies they adopted. The main methodologies surveyed are supervised, unsupervised, and reinforcement learning. The survey also targeted the most widely used categories of unmanned aerial vehicles: fixed-wing, single-rotor, multi-rotor, and hybrid VTOL. The analysis focused on the number of published papers per machine learning methodology, per UAV category, and per geographic region over four periods: the past 12 months, the past 24 months, the past five years, and the past ten years. According to this survey, the application of all three types of machine learning (supervised, unsupervised, and reinforcement) has increased over the past 12 months, 24 months, five years, and ten years, with reinforcement learning showing the strongest increasing trend, followed by supervised learning, and unsupervised learning showing the weakest. Among the four periods, the past ten years showed a significant increase in machine learning application. By geographic region, China has the highest number of published papers, followed by North America and Europe. By UAV category, multi-rotor UAVs have the highest count of machine learning applications, followed by fixed-wing and single-rotor UAVs. Although hybrid VTOL UAVs have significant applications, they have the lowest published paper count.
The future prospects for UAVs are driven by machine learning and autonomy, with widespread adoption in infrastructure inspection, defense, agriculture, and even logistics. UAVs are transforming industries as they enable operations in remote areas, improve efficiency, and reduce costs. In fact, some industries are becoming dependent on UAVs even for complex tasks such as autonomous inspection, exploration, and swarming missions.

Data Availability Statement

Data used in this research will be available upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Austin, R. Unmanned Aircraft Systems: UAVS Design, Development and Deployment; Wiley, 2010. [Google Scholar]
  2. Barnhart, R.; Hottman, S.; Marshall, D.; Shappee, E. Introduction to Unmanned Aircraft Systems; CRC Press, 2012. [Google Scholar]
  3. Garg, P. Unmanned Aerial Vehicles; Mercury Learning and Information, 2021. [Google Scholar]
  4. Karali, H.; Inalhan, G.; Tsourdos, A. AI-Driven Multidisciplinary Conceptual Design of Unmanned Aerial Vehicles. AIAA SciTech Forum, 8-12 January 2024. [Google Scholar]
  5. Barrera, N. Unmanned Aerial Vehicles; Nova Science Publishers, 2021. [Google Scholar]
  6. Liu, G. Machine Learning with Python: Theory and Applications; World Scientific, 2023. [Google Scholar]
  7. Sahay, A.; Sahay, R. Machine Learning Fundamentals: Concepts, Models, and Applications; Business Expert Press, 2025. [Google Scholar]
  8. Kelleher, J.; MacNamee, B.; D’Arcy, A. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies; The MIT Press: Cambridge, Massachusetts, 2015. [Google Scholar]
  9. Buduma, N.; Buduma, N.; Papa, J. Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms, 2nd ed.; O’Reilly Media, 2022. [Google Scholar]
  10. Yan, C.; Xiang, X.; Wang, C. Fixed-Wing UAVs Flocking in Continuous Spaces: A Deep Reinforcement Learning Approach. Robot. Auton. Syst. 2020, Vol. 131. [Google Scholar] [CrossRef]
  11. De Luca, G.; Silva, J.; Cerasoli, S.; Araujo, J.; Campos, J.; Di Fazio, S.; Modica, G. Object-Based Land Cover Classification of Cork Oak Woodlands using UAV Imagery and Orfeo ToolBox. Remote Sens. 2019, Vol. 11. [Google Scholar] [CrossRef]
  12. Khanzada, H.; Maqsood, A.; Basit, A. Reinforcement learning for UAV flight controls: Evaluating continuous space reinforcement learning algorithms for fixed-wing UAVs. PLoS ONE 2025, Vol. 20. [Google Scholar] [CrossRef]
  13. Li, J.; Xu, S.; Wu, Y.; Zhang, Z. Automatic Landing Control for Fixed-Wing UAV in Longitudinal Channel Based on Deep Reinforcement Learning. Drones 2024, Vol. 8. [Google Scholar] [CrossRef]
  14. Tang, C.; Lai, Y. Deep Reinforcement Learning Automatic Landing Control of Fixed-Wing Aircraft Using Deep Deterministic Policy Gradient. International Conference on Unmanned Aircraft Systems, 2020. [Google Scholar]
  15. Zhao, Y.; Guo, J.; Bai, C.; Zheng, H. Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs. Complexity 2021, Vol. 2021. [Google Scholar] [CrossRef]
  16. Lv, L.; Jia, W.; He, R.; Sun, W. Adaptive Strategy for the Path Planning of Fixed-Wing UAV Swarms in Complex Mountain Terrain via Reinforcement Learning. Aerospace 2025, Vol. 12. [Google Scholar] [CrossRef]
  17. Cui, A.; Zhang, Y.; Zhang, P.; Dong, W.; Wang, C. Intelligent Health Management of Fixed-Wing UAVs: A Deep-Learning-Based Approach. 16th International Conference on Control, Automation, Robotics and Vision, 2020. [Google Scholar]
  18. Sezgin, A.; Sezgin, B. Predictive Energy Modeling for Drone Missions: A Regression-Based Decision Framework. IEEE International Conference on Environment and Electrical Engineering, 2025. [Google Scholar]
  19. Giral, F.; Gomez, I.; Le Clainche, S. Control and Motion Planning of Fixed-Wing UAV through Reinforcement Learning. Results Eng. 2024, Vol. 23. [Google Scholar] [CrossRef]
  20. Musavi, N.; Onural, D.; Gunes, K.; Yildiz, Y. Unmanned Aircraft Systems Airspace Integration: A Game Theoretical Framework for Concept Evaluations. J. Guid. Control Dyn. 2017, Vol. 40. [Google Scholar] [CrossRef]
  21. Santos, N. Fixed-Wing UAV Pose Estimation Using a Self-Organizing Map and Deep Learning. Robotics 2024. [Google Scholar]
  22. Guerra-Langan, A.; Araujo-Estrada, S.; Richards, A.; Windsor, S. Simulation of a Machine Learning Based Controller for a Fixed-Wing UAV with Distributed Sensors. AIAA SciTech Forum 2020, Vol. 1. [Google Scholar]
  23. Bohn, E.; Coates, E.; Reinhardt, D.; Johansen, T. Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments. IEEE Trans. Neural Netw. Learn. Syst. 2024, Vol. 35. [Google Scholar]
  24. Borup, K.; Fossen, T.; Johansen, T. A Machine Learning Approach for Estimating Air Data Parameters of Small Fixed-Wing UAVs Using Distributed Pressure Sensors. IEEE Trans. Aerosp. Electron. Syst. 2020, Vol. 56. [Google Scholar]
  25. Pasha, A.; Sankaralingam, L.; Rahman, M.; Alam, M.; Juhany, K. MEMS Fault-Tolerant Machine Learning Algorithm Assisted Attitude Estimation for Fixed-Wing UAVs. Eng. Appl. Artif. Intell. 2024, Vol. 129. [Google Scholar] [CrossRef]
  26. Jiang, J.; Atkinson, P.; Zhang, J.; Lu, R.; Zhou, Y.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; Liu, X. Combining Fixed-Wing UAV Multispectral Imagery and Machine Learning to Diagnose Winter Wheat Nitrogen Status at the Farm Scale. Eur. J. Agron. 2022, Vol. 138. [Google Scholar] [CrossRef]
  27. Bronz, M.; Baskaya, E.; Delahaye, D.; Puechmorel, S. Real-time Fault Detection on Small Fixed-Wing UAVs using Machine Learning. HAL 2020. [Google Scholar]
  28. Yu, Z.; Li, J.; Xu, Y.; Zhang, Y.; Jiang, B.; Su, C. Reinforcement Learning-Based Fractional-Order Adaptive Fault-Tolerant Formation Control of Networked Fixed-Wing UAVs with Prescribed Performance. IEEE Trans. Neural Netw. Learn. Syst. 2024, Vol. 35. [Google Scholar] [CrossRef]
  29. Zhao, X.; Tan, J.; Meng, W.; Yu, Z.; Yan, Y.; Zhang, Z. Cooperative Encirclement and Obstacle Avoidance of Fixed-Wing UAVs via MADDPG with Curriculum Learning. Drones 2025, Vol. 9. [Google Scholar] [CrossRef]
  30. Tang, J.; Xie, N.; Li, K.; Liang, Y.; Shen, X. Trajectory Tracking Control for Fixed-Wing UAV Based on DDPG. J. Aerosp. Eng. 2024, Vol. 37. [Google Scholar] [CrossRef]
  31. Xu, D.; Guo, Y.; Yu, Z.; Wang, Z.; Lan, R.; Zhao, R.; Xie, X.; Long, H. PPO-Exp: Keeping Fixed-Wing UAV Formation with Deep Reinforcement Learning. Drones 2023, Vol. 7. [Google Scholar] [CrossRef]
  32. Zhuang, X.; Li, D.; Wang, Y.; Liu, X.; Li, H. Optimization of High-Speed Fixed-Wing UAV Penetration Strategy Based on Deep Reinforcement Learning. Aerosp. Sci. Technol. 2024, Vol. 148. [Google Scholar]
  33. Yuan, X.; Hu, S.; Ni, W.; Wang, X.; Jamalipour, A. Deep Reinforcement Learning-Driven Reconfigurable Intelligent Surface-Assisted Radio Surveillance With a Fixed-Wing UAV. IEEE Trans. Inf. Forensics Secur. 2023, Vol. 18. [Google Scholar]
  34. Liu, H.; Ren, Z.; Duan, H.; Basin, M. Reinforcement Learning-Based Formation Control for Networked Fixed-Wing UAVs: Self-Triggered Observer–Feedforward–Feedback Design and Experiment. IEEE Trans. Syst. Man. Cybern. Syst. 2025. [Google Scholar]
  35. Zhao, X.; He, L.; Liu, X.; Han, K.; Li, J. A Novel Reinforcement Learning Framework for Optimizing Fixed-Wing UAV. Aerosp. Sci. Technol. 2025, Vol. 166. [Google Scholar] [CrossRef]
  36. Yuan, Y.; Yang, J.; Yu, Z.; Cheng, Y.; Jiao, P.; Hua, L. Hierarchical Goal-Guided Learning for the Evasive Maneuver of Fixed-Wing UAVs Based on Deep Reinforcement Learning. J. Intell. Robot. Syst. 2023, Vol. 109. [Google Scholar]
  37. Hayat, M.; Wu, J.; Cao, Y. Unsupervised Bayesian Learning for Rice Panicle Segmentation with UAV Images. Plant Methods 2020, Vol. 16. [Google Scholar] [CrossRef]
  38. Tan, M.; Sun, H.; Ding, D.; Zhou, H.; Liu, Y. Scalable Pursuit–Evasion Game for Multi-Fixed-Wing UAV Based on Dynamic Target Assignment and Hierarchical Reinforcement Learning. Drones 2025, Vol. 10. [Google Scholar]
  39. Li, L.; Zhang, X.; Wang, Y.; Qian, C.; Zhao, M. Reinforcement Learning Methods for Fixed-Wing Aircraft Control. 9th International Conference on Systems and Informatics, 2023. [Google Scholar]
  40. Guo, Y.; Xu, D.; Wang, C.; Li, J.; Long, H. An Invulnerable Leader–Follower Collision-Free Unmanned Aerial Vehicle Flocking System with Attention-Based Multi-Agent Reinforcement Learning. Eng. Appl. Artif. Intell. 2025, Vol. 160. [Google Scholar] [CrossRef]
  41. Hu, W.; Wang, Y.; Chen, Q.; Wang, P.; Wu, E.; Guo, Z.; Hou, Z. TD3 Agent-Based Nonlinear Dynamic Inverse Control for Fixed-Wing UAV Attitudes. IEEE Trans. Intell. Transp. Syst. 2025. [Google Scholar]
  42. Khan, M.; Dil, M.; Misbah, M.; Orakazi, F.; Alam, M.; Kaleem, Z. TransLearn-YOLOx: Improved-YOLO with Transfer Learning for Fast and Accurate Multiclass UAV Detection. International Conference on Communication, Computing and Digital Systems, 2023. [Google Scholar]
  43. Peng, H.; Razi, A.; Afghah, F.; Ashdown, J. A Unified Framework for Joint Mobility Prediction and Object Profiling of Drones in UAV Networks. J. Commun. Netw. 2018, Vol. 20. [Google Scholar] [CrossRef]
  44. Rostami, M.; Farajollahi, A.; Habibi, P. Power Management of Hybrid Fuel Cell Fixed Wing UAVs Using a Fuzzy Reinforcement Learning System Optimized with Meta-Heuristic Methods. Sci. Rep. 2025, Vol. 16. [Google Scholar] [CrossRef]
  45. Wang, S.; Jia, Z.; Liu, Z.; Tang, Y.; Qin, X.; Wang, X. Self-Supervised Contrast Learning Based UAV Fault Detection and Interpretation with Spatial–Temporal Information of Multivariate Flight Data. Expert Syst. With Appl. 2025, Vol. 267. [Google Scholar]
  46. Fotouhi, A.; Ding, M.; Hassan, M. Deep Q-Learning for Two-Hop Communications of Drone Base Stations. Sensors 2021, Vol. 21. [Google Scholar]
  47. Valasek, J.; Kirkpatrick, K.; May, J.; Harris, J. Intelligent Motion Video Guidance for Unmanned Air System Ground Target Surveillance. J. Aerosp. Inf. Syst. 2016, Vol. 13. [Google Scholar]
  48. Lin, Y.; Gao, H.; Xia, Y. Distributed Pursuit–Evasion Game Decision-Making Based on Multi-Agent Deep Reinforcement Learning. Electronics 2025, Vol. 14. [Google Scholar]
  49. Zhang, M.; Li, Y.; Hou, Z. Prescribed-Time Hierarchical Optimal Formation Control for UAVs Under DoS Attacks. Int. J. Robust. Nonlinear Control 2025, Vol. 35. [Google Scholar]
  50. Hu, S.; Yuan, X.; Ni, W.; Wang, X.; Jamalipour, A. Visual-Based Moving Target Tracking With Solar-Powered Fixed-Wing UAV: A New Learning-Based Approach. IEEE Trans. Intell. Transp. Syst. 2024, Vol. 25. [Google Scholar]
  51. Xue, Y.; Yu, Q.; Wang, T.; Huang, Y.; Yang, B.; Wang, H. Collaborative Control Strategy for Low-Cost Fixed-Wing UAV Swarms Based on Deep Q Network. Aerosp. Sci. Technol. 2025, Vol. 170. [Google Scholar] [CrossRef]
  52. Chowdhury, M.; Keshmiri, S. Interchangeable Reinforcement-Learning Flight Controller for Fixed-Wing UASs. IEEE Trans. Aerosp. Electron. Syst. 2024, Vol. 60. [Google Scholar]
  53. Mei, J.; Zhang, Y.; Tong, Z.; Li, K. Trajectory Design for Data Collection Under Insufficient UAV Energy: A Staged Actor–Critic Reinforcement Learning Approach. J. Syst. Archit. 2025, Vol. 168. [Google Scholar] [CrossRef]
  54. Din, A.; Mir, I.; Gul, F.; Mir, S.; Alhady, S.; Al Nasar, M.; Alkhazaleh, H.; Abualigah, L. Robust Flight Control System Design of a Fixed Wing UAV Using Optimal Dynamic Programming. Soft Comput. 2022, Vol. 27. [Google Scholar] [CrossRef]
  55. Ou, Y.; Xiong, H.; Jiang, H.; Zhang, Y.; Noack, B. Dynamic Obstacle Avoidance of Fixed-Wing Aircraft in Final Phase via Reinforcement Learning. IEEE Trans. Aerosp. Electron. Syst. 2024, Vol. 60. [Google Scholar] [CrossRef]
  56. Zhuang, X.; Li, D.; Li, H.; Wang, Y.; Zhu, J. A Dynamic Control Decision Approach for Fixed-Wing Aircraft Games via Hybrid Action Reinforcement. Sci. China-Inf. Sci. 2025, Vol. 68. [Google Scholar] [CrossRef]
  57. Haughn, K.; Harvey, C.; Inman, D. Deep Learning Reduces Sensor Requirements for Gust Rejection on a Small Uncrewed Aerial Vehicle Morphing Wing. Commun. Eng. 2024, Vol. 3. [Google Scholar]
  58. Raoufi, M.; Telikani, A.; Zhang, T.; Shen, J. Fire Front Path Planning and Tracking Control of Uncrewed Aerial Vehicles Using Deep Reinforcement Learning. Robot. Auton. Syst. 2025, Vol. 193. [Google Scholar] [CrossRef]
  59. Wang, C.; Yan, C.; Xiang, X.; Zhou, H. A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs. Asian Conference on Machine Learning 2019, Vol. 101. [Google Scholar]
  60. Zhao, W.; Chu, H.; Miao, X.; Guo, L.; Shen, H.; Zhu, C.; Zhang, F.; Liang, D. Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance. Sensors 2020, Vol. 20. [Google Scholar] [CrossRef]
  61. Zhang, B.; Wu, C.; Wu, L.; Zhuang, X. CPSS: Collision-Free Policy for 3D Fixed-Wing UAV Swarms Based on Scale–Scalable in Radar Threat Environments. Int. J. Aeronaut. Space Sci. 2026, Vol. 27. [Google Scholar] [CrossRef]
  62. Qian, M.; Ding, W.; Gao, Z.; Wang, C. Learning-Based Collision-Free Cooperative Fault-Tolerant Control for Multiple UAVs under Denial-of-Service Attacks. Commun. Nonlinear Sci. Numer. Simul. 2026, Vol. 152. [Google Scholar] [CrossRef]
  63. Yang, W.; Wu, M.; Wen, X.; Cao, J.; Lv, M.; Shi, Y. Adaptive Optimal Control for Distributed Fixed-Wing UAV Formations: Integration of Predefined-Time Convergence and Actor-Critic Learning. Neurocomputing 2025, Vol. 664. [Google Scholar] [CrossRef]
  64. Ao, T.; Li, H.; Zhang, K.; Shi, H.; Shi, L.; Liu, F.; Zhou, Y. Heterogeneous UAVs Trajectory Optimization for Post-Disaster Target Search Based on MARL with Graph Attention Network. IEEE Trans. Veh. Technol. 2025. [Google Scholar] [CrossRef]
  65. Han, J.; Zhu, Y.; Yang, J. A Deep Reinforcement Learning Method for Collision Avoidance with Dense Speed-Constrained Multi-UAV. IEEE Robot. Autom. Lett. 2025, Vol. 10. [Google Scholar] [CrossRef]
  66. Meng, B.; Zhang, K.; Jiang, B. Fixed-Time Optimal Fault-Tolerant Formation Control With Prescribed Performance for Fixed-Wing UAVs Under Dual Faults. IEEE Trans. Signal Inf. Process. Over Netw. 2023, Vol. 9. [Google Scholar]
  67. Xie, F.; Chai, Y.; Chen, X.; Zhang, K.; Liu, Q. Robust Fractional-Order PID Controller Design for Fixed-Wing UAVs Through Proximal-Policy-Optimization for Disturbance Rejection. Int. J. Mach. Learn. Cybern. 2025, Vol. 16. [Google Scholar] [CrossRef]
  68. Ali, M.; Maqsood, A.; Athar, U.; Khanzada, H. Comparative Evaluation of Reinforcement Learning Algorithms for Multi-Agent Unmanned Aerial Vehicle Path Planning in 2D and 3D Environments. Drones 2025, Vol. 9. [Google Scholar] [CrossRef]
  69. Cheng, Q.; Wang, X.; Yang, J.; Shen, L. Automated Enemy Avoidance of Unmanned Aerial Vehicles Based on Reinforcement Learning. Appl. Sci. 2019, Vol. 9. [Google Scholar] [CrossRef]
  70. Yan, C.; Sun, Y.; Jiang, Y.; Xiang, X.; Chen, M. Selective Imitation Enhanced Deep Reinforcement Learning for AAV Navigation and Obstacle Avoidance with Sparse Rewards. IEEE Trans. Intell. Transp. Syst. 2025, Vol. 26. [Google Scholar] [CrossRef]
  71. Su, M.; Chai, H.; Zhao, C.; Lyu, Y.; Hu, J. Lightweight Obstacle Avoidance for Fixed-Wing UAVs Using Entropy-Aware PPO. Drones 2025, Vol. 9. [Google Scholar]
  72. Julian, K.; Kochenderfer, M. Distributed Wildfire Surveillance with Autonomous Aircraft Using Deep Reinforcement Learning. J. Guid. Control Dyn. 2019, Vol. 42. [Google Scholar] [CrossRef]
  73. Cui, Y.; He, X.; Luo, C. A DRL-Based Framework for Energy-Optimal Trajectory Optimization Between Disjoint Regions in Wind Shear Environments Based on Dynamic Soaring. IEEE Access 2025, Vol. 13. [Google Scholar] [CrossRef]
  74. Wu, J.; Wang, H.; Liu, Y.; Zhang, M.; Wu, T. Learning-Based Fixed-Wing UAV Reactive Maneuver Control for Obstacle Avoidance. Aerosp. Sci. Technol. 2022, Vol. 126. [Google Scholar] [CrossRef]
  75. He, Y.; Hu, R.; Liang, K.; Liu, Y.; Zhou, Z. Deep Reinforcement Learning Algorithm with Long Short-Term Memory Network for Optimizing Unmanned Aerial Vehicle Information Transmission. Mathematics 2025, Vol. 13. [Google Scholar] [CrossRef]
  76. Jin, S.; Zhao, W. Temporal-Sequence Offline Reinforcement Learning for Transition Control of a Novel Tilt-Wing Unmanned Aerial Vehicle. Aerospace 2025, Vol. 12. [Google Scholar] [CrossRef]
  77. Chang, G.; Ren, S.; Zhang, S.; Zhang, X. Hierarchical Decision Making-Based Intelligent Game Confrontation on UAV Swarm. Aerospace 2025, Vol. 12. [Google Scholar] [CrossRef]
  78. Liu, Z.; Zhang, J.; Zeng, Y.; Ai, B. Energy-Efficient Multi-Agent Reinforcement Learning for UAV Trajectory Optimization in Cell-Free Massive MIMO Networks. IEEE Trans. Wirel. Commun. 2025, Vol. 24. [Google Scholar] [CrossRef]
  79. Devaraju, S.; Ihler, A.; Kumar, S. A Deep-Q-Learning-Based Base-Station-Connectivity-Aware Decentralized Pheromone Mobility Model for Autonomous UAV Networks. IEEE Trans. Aerosp. Electron. Syst. 2024, Vol. 60. [Google Scholar] [CrossRef]
  80. Du, H.; You, M.; Zhao, X. Globally Guided Deep V-Network-Based Motion Planning Algorithm for Fixed-Wing Unmanned Aerial Vehicles. Sensors 2024, Vol. 24. [Google Scholar] [CrossRef]
  81. Wang, W.; Wang, L.; Wu, J.; Tao, X.; Wu, H. Oracle-Guided Deep Reinforcement Learning for Large-Scale Multi-UAVs Flocking and Navigation. IEEE Trans. Veh. Technol. 2022, Vol. 71. [Google Scholar] [CrossRef]
  82. Wang, Y.; Wang, X.; Shen, L. Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints. Drones 2022, Vol. 6. [Google Scholar] [CrossRef]
  83. Wada, D.; Araujo-Estrada, S.; Windsor, S. Sim-to-Real Transfer for Fixed-Wing Uncrewed Aerial Vehicle: Pitch Control by High-Fidelity Modelling and Domain Randomization. IEEE Robot. Autom. Lett. 2022, Vol. 7. [Google Scholar] [CrossRef]
  84. Viseras, A.; Meissner, M.; Marchal, J. Wildfire Front Monitoring With Multiple UAVs Using Deep Q-Learning. IEEE Access 2025, Vol. 13. [Google Scholar]
  85. Yuksek, B.; Inalhan, G. Transition Flight Control System Design for Fixed-Wing VTOL UAV: A Reinforcement Learning Approach. In Proceedings of the AIAA SCITECH 2022 Forum, 2022; Vol. 2022. [Google Scholar]
  86. Domitran, S.; Babac, M. A Machine Learning Approach to Flight Control of a VTOL Tailsitter UAV. 44th International Convention on Information, Communication and Electronic Technology, 2021; Vol. 2021. [Google Scholar]
  87. Makhtar, S.; Sani, F.; Nor, E.; Kamarudin, N.; Ali, S. Faulty Classification System for VTOL UAV Acoustic Signal using Machine Learning. J. Kejuruter. 2025, Vol. 37. [Google Scholar] [CrossRef]
  88. Yu, F.; Tang, W.; Chen, J.; Wang, J.; Sun, X.; Chen, X. Deep Reinforcement Learning Based Energy Management Strategy for Vertical Take-Off and Landing Aircraft with Turbo-Electric Hybrid Propulsion System. Aerospace 2025, Vol. 12. [Google Scholar] [CrossRef]
  89. Ma, B.; Liu, Z.; Zhao, W.; Yuan, J.; Long, H.; Wang, X.; Yuan, Z. Target Tracking Control of UAV Through Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2023, Vol. 24. [Google Scholar]
  90. Xu, J.; Du, T.; Foshey, M.; Li, B.; Zhu, B.; Schulz, A.; Matusik, W. Learning to Fly: Computational Controller Design for Hybrid UAVs with Reinforcement Learning. ACM Trans. Graph. 2019, Vol. 38. [Google Scholar]
  91. Ugur, M.; Yesildirek, A. Improving Control Performance of Tilt-Rotor VTOL UAV with Model-Based Reward and Multi-Agent Reinforcement Learning. Aerospace 2025, Vol. 12. [Google Scholar]
  92. Sonkar, S.; Kumar, P.; George, R.; Yuvaraj, T.; Phillip, D.; Ghosh, A. Real-Time Object Detection and Recognition Using Fixed-Wing LALE VTOL UAV. IEEE Sens. J. 2022, Vol. 22. [Google Scholar] [CrossRef]
  93. Saj, V.; Lee, B.; Kalathil, D.; Benedict, M. Robust Reinforcement Learning Control for Vision-Based Ship Landing of VTOL-UAVs. J. Am. Helicopter Soc. 2025, Vol. 70. [Google Scholar] [CrossRef]
  94. Xia, K.; Huang, Y.; Zou, Y.; Zuo, Z. Reinforcement Learning Control for Moving Target Landing of VTOL UAVs With Motion Constraints. IEEE Trans. Ind. Electron. 2024, Vol. 71. [Google Scholar]
  95. Yang, J.; Tang, D.; Yu, J.; Zhang, J.; Liu, H. Explaining Anomalous Events in Flight Data of UAV With Deep Attention-Based Multi-Instance Learning. IEEE Trans. Veh. Technol. 2024, Vol. 73. [Google Scholar] [CrossRef]
  96. Ali, A.; Gupta, A.; Hashim, H. Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations. Appl. Soft Comput. 2024, Vol. 162. [Google Scholar] [CrossRef]
  97. Cheng, W.; Li, H.; Hou, Z. Fault-Tolerant Control for UAV Based on Deep Reinforcement Learning under Single Rotor Failure. 36th Chinese Control and Decision Conference, 2024. [Google Scholar]
  98. Huang, J.; Wu, B.; Duan, Q.; Dong, L.; Yu, S. A Fast UAV Trajectory Planning Framework in RIS-Assisted Communication Systems with Accelerated Learning via Multithreading and Federating. IEEE Trans. Mob. Comput. 2025, Vol. 24. [Google Scholar] [CrossRef]
  99. Jiang, B.; Li, B.; Chang, C.; Wen, C. Hybrid Aerodynamics-Based Model Predictive Control for a Tail-Sitter UAV. arXiv.org 2023. [Google Scholar]
  100. Kestur, R.; Angural, A.; Bashir, B.; Omkar, S.; Anand, G.; Meenavathi, M. Tree Crown Detection, Delineation and Counting in UAV Remote Sensed Images: A Neural Network Based Spectral–Spatial Method. J. Indian Soc. Remote Sens. 2018, Vol. 46. [Google Scholar] [CrossRef]
  101. Jin, P.; Ma, Q.; Xu, S. Dynamic Event-Triggered Robust Optimal Attitude Control of QUAV Using Reinforcement Learning. IEEE Trans. Aerosp. Electron. Syst. 2023, Vol. 59. [Google Scholar] [CrossRef]
  102. Silva, D.; Machado, R.; Coutinho, O.; Antreich, F. A Soft-Kill Reinforcement Learning Counter Unmanned Aerial System (C-UAS) With Accelerated Training. IEEE Access, 2023. [Google Scholar]
  103. Wang, C.; Wang, J.; Wei, C.; Zhu, Y.; Yin, D.; Li, J. Vision-Based Deep Reinforcement Learning of UAV-UGV Collaborative Landing Policy Using Automatic Curriculum. Drones 2023, Vol. 7. [Google Scholar] [CrossRef]
  104. He, H.; Qiu, Y.; Geng, J. Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control. arXiv.org 2025. [Google Scholar]
  105. Adaika, Z.; Boumehraz, M.; Mahmoudi, A. Data-driven Diagnosis of Quadcopter Thrust Fault Using Supervised Learning with Airframe Vibration Signals. 2nd International Conference on Electrical Engineering and Automatic Control, 2024.
  106. Khan, S.; Tufail, M.; Khan, M.; Khan, Z.; Iqbal, J.; Alam, M. A Novel Semi-Supervised Framework for UAV Based Crop/Weed Classification. PLoS ONE 2021, Vol. 16.
  107. Zeng, C.; Behjat, A. Uncertainty-aware Optimal Flight State Selection for a Transitioning UAV via Simulation-based Learning. AIAA Multidisciplinary Analysis and Optimization Conference, 2018.
  108. Hernandez-Hernandez, R.; Rubio-Solis, A. A Hybrid Multilayer Extreme Learning Machine for Image Classification with an Application to Quadcopters. arXiv.org 2025.
  109. Zhang, G.; Hsu, L. Intelligent GNSS/INS Integrated Navigation System for a Commercial UAV Flight Control System. Aerosp. Sci. Technol. 2018, Vol. 80.
  110. Alotaibi, T.; Jambi, K.; Khemakhem, M.; Eassa, F.; Bourennani, F. Outdoor Dataset for Flying a UAV at an Appropriate Altitude. Drones 2025, Vol. 9.
  111. Li, T.; Ma, Z.; Shao, J.; Zhao, Y.; Zhang, X.; Cheng, Y. Path Planning for Unmanned Aerial Vehicle Swarm Based on Electromagnetic Environment Sensing. Chin. J. Electron. 2025, Vol. 34.
  112. Pavon, W.; Chavez, J.; Guffanti, D.; Asiedu-Asante, A. Unmanned Aerial Vehicle Position Tracking Using Nonlinear Autoregressive Exogenous Networks Learned from Proportional-Derivative Model-Based Guidance. Math. Comput. Appl. 2025, Vol. 30.
  113. Crowe, D.; Pamula, R.; Cheung, H.; De Wekker, S. Two Supervised Machine Learning Approaches for Wind Velocity Estimation Using Multi-Rotor Copter Attitude Measurements. Sensors 2020, Vol. 20.
  114. Yang, P.; Li, W.; Wen, C.; Liu, P. Fault Diagnosis Method of Multi-Rotor UAV Based on One-Dimensional Convolutional Neural Network with Adaptive Batch Normalization Algorithm. Meas. Sci. Technol. 2024, Vol. 35.
  115. Liu, X.; Yuan, Z.; Gao, Z.; Zhang, W. Reinforcement Learning-Based Fault-Tolerant Control for Quadrotor UAVs Under Actuator Fault. IEEE Trans. Ind. Inform. 2024, Vol. 20.
  116. de Souza, J.; Marcato, A.; de Aguiar, E.; Jucá, M.; Teixeira, A. Autonomous Landing of UAV Based on Artificial Neural Network Supervised by Fuzzy Logic. J. Control Autom. Electr. Syst. 2019.
  117. Haddad, A.; Boiko, I.; Zweiri, Y. Reinforcement Learning Generalization for Quadrotor with Slung Load Systems through Homogeneity Transformations. IEEE Industrial Electronics Society, 2025.
  118. Farid, G.; Zhang, L.; Younas, T.; Ilyas, M.; Iqbal, A. A Multi-Goal Reinforcement Learning Framework for Motion Planning of a Quadrotor UAV in 3D Cluttered Environment with Unseen Random Goals. Appl. Soft Comput. 2025, Vol. 184.
  119. Trad, T.; Choutri, K.; Lagha, M.; Meshoul, S.; Khenfri, F.; Fareh, R.; Shaiba, H. Real-Time Implementation of Quadrotor UAV Control System Based on a Deep Reinforcement Learning Approach. Comput. Mater. Contin. 2024, Vol. 81.
  120. Sönmez, S.; Montecchio, L.; Martini, S.; Rutherford, M.; Rizzo, A.; Stefanovic, M.; Valavanis, K. Reinforcement Learning-Based PD Controller Gains Prediction for Quadrotor UAVs. Drones 2025, Vol. 9, 581.
  121. Zheng, H.; Liu, H.; Tian, X.; Mai, Q. Reinforcement Learning-Based Distributed Appointed-Time Optimal Trajectory Tracking Formation Control for Quadrotor UAVs. Applied Intelligence 2025.
  122. Sun, Q.; Fang, J.; Zheng, W.; Tang, Y. Aggressive Quadrotor Flight Using Curiosity-Driven Reinforcement Learning. IEEE Trans. Ind. Electron. 2022, Vol. 69.
  123. Wang, C.; Wang, J.; Ma, Z.; Xu, M.; Qi, K.; Ji, Z.; Wei, C. Integrated Learning-Based Framework for Autonomous Quadrotor UAV Landing on a Collaborative Moving UGV. IEEE Trans. Veh. Technol. 2024, Vol. 73.
  124. Luo, Q.; Li, Y.; Zeng, J.; Wu, G.; Wang, Y. Quadrotor Navigation Considering Attitude: A Deep Reinforcement Learning Method Using Tangent Path Rewards. Expert Systems With Applications 2025.
  125. Yu, C.; Yang, Y.; Cheng, Y.; Wang, Z.; Shi, M. Trajectory Tracking Control of an Unmanned Aerial Vehicle with Deep Reinforcement Learning for Tasks inside the EAST. Fusion Eng. Des. 2023, Vol. 194.
  126. Wen, G.; Yu, D.; Zhao, Y. Optimized Fuzzy Attitude Control of Quadrotor Unmanned Aerial Vehicle Using Adaptive Reinforcement Learning Strategy. IEEE Trans. Aerosp. Electron. Syst. 2024.
  127. Chang, Y.; Wang, Z.; Qiu, L.; Wang, S.; Bai, Y. Reinforcement Learning–Based Adaptive Fault-Tolerant Antidisturbance Control for UAVs Subjected to External Disturbances, Input Uncertainties, and Structural Uncertainties. J. Aerosp. Eng. 2025, Vol. 38.
  128. Alam, M.; Moh, S. Q-Learning-Based Routing Inspired by Adaptive Flocking Control for Collaborative Unmanned Aerial Vehicle Swarms. Veh. Commun. 2023, Vol. 40.
  129. Tran, V.; Mabrok, M.; Anavatti, S.; Garratt, M.; Petersen, I. Robust Fuzzy Q-Learning-Based Strictly Negative Imaginary Tracking Controllers for the Uncertain Quadrotor Systems. IEEE Trans. Cybern. 2023, Vol. 53.
  130. Quan, J.; Hu, W.; Ma, X.; Chen, G. Reinforcement Learning Stabilization for Quadrotor UAVs via Lipschitz-Constrained Policy Regularization. Drones 2025, Vol. 9.
  131. Khojasteh, M.; Salimi-Badr, A. Autonomous Quadrotor Path Planning Through Deep Reinforcement Learning With Monocular Depth Estimation. IEEE Open Journal of Vehicular Technology, 2024.
  132. Ryou, G.; Wang, G.; Karaman, S. Multi-Fidelity Reinforcement Learning for Time-Optimal Quadrotor Re-planning. Int. J. Robot. Res. 2025.
  133. Xia, Y.; Du, J.; Zhang, Z.; Wang, Z.; Xu, J.; Mi, W. Standoff Target Tracking for Networked UAVs With Specified Performance via Deep Reinforcement Learning. IEEE J. Sel. Top. Signal Process. 2024, Vol. 18.
  134. Estevez, J.; Lopez-Guede, J.; Valle-Echavarri, J.; Grana, M. Reinforcement Learning Based Trajectory Planning for Multi-UAV Load Transportation. IEEE Access 2024, Vol. 12.
  135. Doukhi, O.; Lee, D. Sim-to-Real Learning-Based Nonlinear MPC for UAV Navigation and Collision Avoidance in Unknown Cluttered Environments. IEEE Access 2025, Vol. 13.
  136. Chen, J.; Yu, C.; Li, G.; Tang, W.; Ji, S.; Yang, X.; Xu, B.; Yang, H.; Wang, Y. Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning. IEEE Robot. Autom. Lett. 2025, Vol. 10.
  137. Han, H.; Cheng, J.; Xi, Z.; Lv, M. Symmetric Actor–Critic Deep Reinforcement Learning for Cascade Quadrotor Flight Control. Neurocomputing 2023, Vol. 559.
  138. Jiménez, G.; Hueso, A.; Gómez-Silva, M. Reinforcement Learning Algorithms for Autonomous Mission Accomplishment by Unmanned Aerial Vehicles: A Comparative View with DQN, SARSA and A2C. Sensors 2023, Vol. 23.
  139. Huo, X.; Guo, Y. Adaptive Collision-Free Control for UAVs with Discrete-Time System Based on Reinforcement Learning. Adv. Mech. Eng. 2024, Vol. 16.
  140. Zhao, H.; Fu, H.; Yang, F.; Qu, C.; Zhou, Y. Data-Driven Offline Reinforcement Learning Approach for Quadrotor’s Motion and Path Planning. Chin. J. Aeronaut. 2024, Vol. 37.
  141. Li, X.; Zhang, J.; Han, J. Trajectory Planning of Load Transportation with Multi-Quadrotors Based on Reinforcement Learning Algorithm. Aerosp. Sci. Technol. 2021, Vol. 116.
  142. Xie, J.; Peng, X.; Wang, H.; Niu, W.; Zheng, X. UAV Autonomous Tracking and Landing Based on Deep Reinforcement Learning Strategy. Sensors 2020.
  143. PB, K.; Kumar, K.; Fernandes, V.; Arya, K. Reinforcement Learning for Altitude Hold and Path Planning in a Quadcopter. 6th International Conference on Control, Automation and Robotics, 2020.
  144. Al-Haddad, L.; Jaber, A. An Intelligent Quadcopter Unbalance Classification Method Based on Stochastic Gradient Descent Logistic Regression. 3rd Information Technology to Enhance e-learning and Other Application, 2022.
  145. Park, J.; Kim, J.; Jung, Y. Binary Classification Fault Diagnosis for Octocopter Using Deep Neural Network. Mediterranean Conference on Control & Automation, 2021.
  146. Song, Y.; Naji, S.; Kaufmann, E.; Loquercio, A.; Scaramuzza, D. Flightmare: A Flexible Quadrotor Simulator. arXiv.org 2021.
  147. Mosweu, E.; Seokolo, T.; Akano, T.; Motsamai, O. Implementation of Partially Tuned PD Controllers of a Multirotor UAV Using Deep Deterministic Policy Gradient. J. Electr. Syst. Inf. Technol. 2024, Vol. 11.
  148. Narmilan, A.; Gonzalez, F.; Salgadoe, A.; Powell, K. Detection of White Leaf Disease in Sugarcane Using Machine Learning Techniques over UAV Multispectral Images. Drones 2022, Vol. 6.
  149. Chen, S.; Mo, Y.; Wu, X.; Xiao, J.; Liu, Q. Reinforcement Learning-Based Energy-Saving Path Planning for UAVs in Turbulent Wind. Electronics 2024, Vol. 13.
  150. Sun, Y.; Li, X.; Li, J.; Liu, J. Yolov5-Lite-Based Real-Time UAV Detection for High-Security Zones: Ensuring Airspace Protection. 9th International Conference on Control, Robotics and Cybernetics, 2024.
  151. Cardenas, J.; Carrero, U.; Camacho, E.; Calderon, J. Intelligent Position Controller for Unmanned Aerial Vehicles (UAV) Based on Supervised Deep Learning. Machines 2023, Vol. 11.
  152. Xiong, H.; Zhang, Y. Reinforcement Learning-Based Formation-Surrounding Control for Multiple Quadrotor UAVs Pursuit-Evasion Games. ISA Trans. 2024, Vol. 145.
  153. Faigl, J.; Vana, P. Surveillance Planning With Bezier Curves. IEEE Robot. Autom. Lett. 2018, Vol. 3.
  154. Xue, J.; Liu, Z.; Liu, G.; Zhou, Z.; Zhang, K.; Tang, Y.; Wang, J. Robust Wind-Resistant Hovering Control of Quadrotor UAVs Using Deep Reinforcement Learning. IEEE Transactions on Intelligent Vehicles, 2024.
  155. Parvaresh, N.; Kantarci, B. A Continuous Actor–Critic Deep Q-Learning-Enabled Deployment of UAV Base Stations: Toward 6G Small Cells in the Skies of Smart Cities. IEEE Open J. Commun. Soc. 2023, Vol. 4.
  156. Polvara, R.; Patacchiola, M.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R.; Cangelosi, A. Autonomous Quadrotor Landing using Deep Reinforcement Learning. arXiv.org 2018.
  157. Sabo, C.; Cope, A.; Gurney, K.; Vasilaki, E.; Nowotny, T.; Marshall, J. An Inexpensive Flying Robot Design for Embodied Robotics Research. International Joint Conference on Neural Networks Proceedings, 2017.
  158. Smolyanskiy, N.; Kamenev, A.; Smith, J.; Birchfield, S. Toward Low-Flying Autonomous MAV Trail Navigation using Deep Neural Networks for Environmental Awareness. IEEE-RSJ International Conference on Intelligent Robots and Systems, 2017.
  159. Jardine, P.; Givigi, S.; Yousefi, S. Experimental Results for Autonomous Model-Predictive Trajectory Planning Tuned with Machine Learning. 11th Annual IEEE International Systems Conference, 2017.
  160. Chen, B.; Chao, P. Robust Decentralized H∞ Attack-Tolerant Observer-Based Team Formation Network Control of Large-Scale Quadrotor UAVs: HJIE-Reinforcement Learning-Based Deep Neural Network Method. IEEE Access 2023.
  161. Sun, Q.; Ji, J.; Mu, J.; Xu, J.; Kocarev, L.; Kurths, J.; Tang, Y. Vision-Based Hierarchical Reinforcement Learning for Quadrotor UAV Navigation. IEEE-ASME Trans. Mechatron. 2025, Vol. 30.
  162. Zhang, Z.; Yang, H.; Fei, Y.; Sun, C.; Yu, Y. Control of UAV Quadrotor Using Reinforcement Learning and Robust Controller. IET Control Theory and Applications, 2023.
  163. Lin, X.; Liu, J.; Yu, Y.; Sun, C. Event-Triggered Reinforcement Learning Control for the Quadrotor UAV with Actuator Saturation. Neurocomputing 2020.
  164. Yu, B.; Lee, T. Modular Reinforcement Learning for a Quadrotor UAV With Decoupled Yaw Control. IEEE Robot. Autom. Lett. 2025, Vol. 10.
  165. Sharma, P.; Poddar, P.; Sujit, P. A Model-free Deep Reinforcement Learning Approach To Maneuver A Quadrotor Despite Single Rotor Failure. International Conference on Emerging Technology in Autonomous Aerial Vehicles, 2025.
  166. Gong, Y.; Liu, X. Flight State Recognition for UAV Optical Flow Velocity Measurement. 4th International Conference on Mechanical Instrumentation and Automation, 2023.
  167. Alanazi, A. SSRL-UAVs: A Self-Supervised Deep Representation Learning Approach for GPS Spoofing Attack Detection in Small Unmanned Aerial Vehicles. Drones 2024, Vol. 8.
  168. Jiang, X.; Liu, T.; Liu, L.; Liu, Z.; Liu, Y. Observations Temporal Permutation-Based Self-Supervised Reinforcement Learning for UAV Active Object Detection. IEEE Trans. Geosci. Remote Sens. 2025, Vol. 63.
  169. Mou, Z.; Gao, F.; Liu, J.; Yun, X.; Wu, Q. Cluster Head Detection for Hierarchical UAV Swarm With Graph Self-Supervised Learning. IEEE Trans. Signal Process. 2024, Vol. 72.
  170. Akhtar, M.; Maqsood, A. Comparative Analysis of Deep Reinforcement Learning Algorithms for Hover-to-Cruise Transition Maneuvers of a Tilt-Rotor Unmanned Aerial Vehicle. Aerospace 2024.
  171. Wen, S.; Shen, N.; Zhang, J.; Lan, Y.; Han, J.; Yin, X.; Zhang, Q.; Ge, Y. Single-Rotor UAV Flow Field Simulation Using Generative Adversarial Networks. Comput. Electron. Agric. 2019, Vol. 67.
  172. Hong, T.; Yang, Q.; Wang, P.; Zhang, J.; Sun, W.; Tao, L.; Fang, C.; Cao, J. Multitarget Real-Time Tracking Algorithm for UAV IoT. Hindawi Wirel. Commun. Mob. Comput. 2021, Vol. 2021.
  173. Hossam-E-Haider, M.; Emon, A.; Kar, A.; Mubin, M. An Architectural Approach Using Machine Learning for Threat Detection in UAV-Based Defense System. International Conference on Electrical, Computer and Communication Engineering, 2025.
  174. Chen, X.; Zhang, H.; Song, J.; Guan, J.; Li, J.; He, Z. Micro-Motion Classification of Flying Bird and Rotor Drones via Data Augmentation and Modified Multi-Scale CNN. Remote Sens. 2022, Vol. 14.
  175. Yen, B.; Hioka, Y.; Mace, B. Improving Power Spectral Density Estimation of Unmanned Aerial Vehicle Rotor Noise by Learning from Non-Acoustic Information. International Workshop on Acoustic Signal Enhancement, 2018.
  176. Marichal, G.; Castillo, M.; Lopez, J.; Padron, I.; Artes, M. An Artificial Intelligence Approach for Gears Diagnostics in AUVs. Sensors 2016.
Figure 2. Raider, a fixed-wing hydrogen-powered drone. Credit: Heven Drones.
Figure 3. Hybrid-VTOL UAV.
Figure 4. Single-rotor UAV.
Figure 5. Multi-rotor (quadcopter) UAV.
Figure 8. Machine learning categories and associated algorithms.
Figure 9. Neural networks and associated algorithms.
Figure 10. Percentage share of publications per methodology, last 12 months.
Figure 11. Published paper count, last 12 months.
Figure 12. Methodology application trend, last 12 months.
Figure 13. Learning methodologies histogram, last 12 months.
Figure 14. Machine learning publications per unmanned aerial vehicle category, last 12 months.
Figure 15. Histogram of machine learning publications by geographic region, last 12 months.
Figure 16. Percentage share of publications per methodology, last 24 months.
Figure 17. Published paper count, last 24 months.
Figure 18. Methodology utilization trend, last 24 months.
Figure 19. Learning methodologies histogram, last 24 months.
Figure 20. Machine learning publications per unmanned aerial vehicle category, last 24 months.
Figure 21. Histogram of machine learning publications by geographic region, last 24 months.
Figure 22. Percentage share of publications per methodology, last 5 years.
Figure 23. Published paper count, last 5 years.
Figure 24. Methodology utilization trend, last 5 years.
Figure 25. Learning methodologies histogram, last 5 years.
Figure 26. Machine learning publications per unmanned aerial vehicle category, last 5 years.
Figure 27. Histogram of machine learning publications by geographic region, last 5 years.
Figure 28. Percentage share of publications per methodology, last 10 years.
Figure 29. Published paper count, last 10 years.
Figure 30. Methodology utilization trend, last 10 years.
Figure 31. Learning methodologies histogram, last 10 years.
Figure 32. Machine learning publications per unmanned aerial vehicle category, last 10 years.
Figure 33. Histogram of machine learning publications by geographic region, last 10 years.
Table 1. Published papers count per methodology for the last 12 months.

| Methodology | Papers |
| --- | --- |
| Reinforcement learning | 43 |
| Supervised learning | 10 |

Table 2. Total published papers per methodology per month for the last 12 months.

| Methodology | Publish year | Publish month | Papers |
| --- | --- | --- | --- |
| Reinforcement learning | 2025 | 2 | 1 |
| Reinforcement learning | 2025 | 3 | 2 |
| Reinforcement learning | 2025 | 4 | 3 |
| Reinforcement learning | 2025 | 5 | 2 |
| Reinforcement learning | 2025 | 6 | 2 |
| Reinforcement learning | 2025 | 7 | 2 |
| Reinforcement learning | 2025 | 8 | 7 |
| Reinforcement learning | 2025 | 9 | 5 |
| Reinforcement learning | 2025 | 10 | 3 |
| Reinforcement learning | 2025 | 11 | 7 |
| Reinforcement learning | 2025 | 12 | 7 |
| Reinforcement learning | 2026 | 1 | 2 |
| Supervised learning | 2025 | 2 | 1 |
| Supervised learning | 2025 | 4 | 2 |
| Supervised learning | 2025 | 5 | 1 |
| Supervised learning | 2025 | 6 | 1 |
| Supervised learning | 2025 | 7 | 3 |
| Supervised learning | 2025 | 8 | 1 |
| Supervised learning | 2025 | 10 | 1 |

Table 3. Published papers count per geographic region for the last 12 months.

| Geographic region | Papers |
| --- | --- |
| Australia | 1 |
| China | 33 |
| Europe | 1 |
| Far East | 6 |
| Middle East | 6 |
| N. America | 6 |

Table 4. Published papers count per methodology for the last 24 months.

| Methodology | Papers |
| --- | --- |
| Reinforcement learning | 74 |
| Supervised learning | 15 |
| Unsupervised learning | 2 |

Table 5. Total published papers per methodology per quarter for the last 24 months.

| Methodology | Publish year | Publish quarter | Papers |
| --- | --- | --- | --- |
| Reinforcement learning | 2024 | 1 | 3 |
| Reinforcement learning | 2024 | 2 | 10 |
| Reinforcement learning | 2024 | 3 | 8 |
| Reinforcement learning | 2024 | 4 | 6 |
| Reinforcement learning | 2025 | 1 | 7 |
| Reinforcement learning | 2025 | 2 | 7 |
| Reinforcement learning | 2025 | 3 | 14 |
| Reinforcement learning | 2025 | 4 | 17 |
| Reinforcement learning | 2026 | 1 | 2 |
| Supervised learning | 2024 | 1 | 1 |
| Supervised learning | 2024 | 2 | 2 |
| Supervised learning | 2024 | 3 | 1 |
| Supervised learning | 2024 | 4 | 1 |
| Supervised learning | 2025 | 1 | 1 |
| Supervised learning | 2025 | 2 | 4 |
| Supervised learning | 2025 | 3 | 4 |
| Supervised learning | 2025 | 4 | 1 |
| Unsupervised learning | 2024 | 1 | 1 |
| Unsupervised learning | 2024 | 3 | 1 |

Table 6. Published papers count per geographic region for the last 24 months.

| Geographic region | Papers |
| --- | --- |
| Australia | 1 |
| China | 55 |
| Europe | 5 |
| Far East | 7 |
| Middle East | 12 |
| N. America | 11 |

Table 7. Published papers count per methodology for the last 5 years.

| Methodology | Papers |
| --- | --- |
| Reinforcement learning | 104 |
| Supervised learning | 28 |
| Unsupervised learning | 2 |

Table 8. Total published papers per methodology per year for the last 5 years.

| Methodology | Publish year | Papers |
| --- | --- | --- |
| Reinforcement learning | 2021 | 4 |
| Reinforcement learning | 2022 | 7 |
| Reinforcement learning | 2023 | 17 |
| Reinforcement learning | 2024 | 29 |
| Reinforcement learning | 2025 | 45 |
| Reinforcement learning | 2026 | 2 |
| Supervised learning | 2021 | 3 |
| Supervised learning | 2022 | 5 |
| Supervised learning | 2023 | 4 |
| Supervised learning | 2024 | 6 |
| Supervised learning | 2025 | 10 |
| Unsupervised learning | 2024 | 2 |

Table 9. Published papers count per geographic region for the last 5 years.

| Geographic region | Papers |
| --- | --- |
| Australia | 4 |
| China | 78 |
| Europe | 9 |
| Far East | 15 |
| Middle East | 13 |
| N. America | 15 |

Table 10. Published papers count per methodology for the last 10 years.

| Methodology | Papers |
| --- | --- |
| Reinforcement learning | 119 |
| Supervised learning | 40 |
| Unsupervised learning | 7 |

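
The percentage shares shown in the methodology-share figures can be reproduced from counts such as those in Table 10. The following is a minimal Python sketch of that arithmetic, assuming a share is simply one methodology's paper count divided by the total across methodologies; the function name `percentage_shares` is illustrative and does not come from the survey.

```python
# Paper counts copied from Table 10 (last 10 years).
counts = {
    "reinforcement learning": 119,
    "supervised learning": 40,
    "unsupervised learning": 7,
}


def percentage_shares(counts):
    """Return each methodology's share of the total paper count, in percent.

    Assumption: a share is count / total * 100 (hypothetical helper).
    """
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = percentage_shares(counts)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

With the Table 10 counts, this gives roughly 71.7% for reinforcement learning, 24.1% for supervised learning, and 4.2% for unsupervised learning.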
Table 11. Total published papers per methodology per year for the last 10 years.

| Methodology | Publish year | Papers |
| --- | --- | --- |
| Reinforcement learning | 2017 | 3 |
| Reinforcement learning | 2018 | 1 |
| Reinforcement learning | 2019 | 4 |
| Reinforcement learning | 2020 | 6 |
| Reinforcement learning | 2021 | 5 |
| Reinforcement learning | 2022 | 7 |
| Reinforcement learning | 2023 | 17 |
| Reinforcement learning | 2024 | 29 |
| Reinforcement learning | 2025 | 45 |
| Reinforcement learning | 2026 | 2 |
| Supervised learning | 2017 | 1 |
| Supervised learning | 2018 | 4 |
| Supervised learning | 2019 | 2 |
| Supervised learning | 2020 | 5 |
| Supervised learning | 2021 | 3 |
| Supervised learning | 2022 | 5 |
| Supervised learning | 2023 | 4 |
| Supervised learning | 2024 | 6 |
| Supervised learning | 2025 | 10 |
| Unsupervised learning | 2016 | 1 |
| Unsupervised learning | 2018 | 2 |
| Unsupervised learning | 2019 | 1 |
| Unsupervised learning | 2020 | 1 |
| Unsupervised learning | 2024 | 2 |

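
One simple way to read a trend from the yearly counts in Table 11 is to compute the year-over-year change in papers for a given methodology. The sketch below does this for reinforcement learning; it is an illustration rather than the procedure used in the survey. The helper name `year_over_year` is hypothetical, and 2026 is left out on the assumption that its count covers only part of the year.

```python
# Reinforcement learning paper counts per year, copied from Table 11.
# Assumption: 2026 is omitted because its count appears to be partial.
rl_counts = {
    2017: 3, 2018: 1, 2019: 4, 2020: 6, 2021: 5,
    2022: 7, 2023: 17, 2024: 29, 2025: 45,
}


def year_over_year(counts):
    """Return the change in paper count relative to the previous listed year."""
    years = sorted(counts)
    return {cur: counts[cur] - counts[prev] for prev, cur in zip(years, years[1:])}


changes = year_over_year(rl_counts)
for year, delta in changes.items():
    print(f"{year}: {delta:+d} papers vs. previous year")
```

On these counts, the three largest yearly increases are the most recent ones (+10 in 2023, +12 in 2024, +16 in 2025).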
Table 12. Published papers count per geographic region for the last 10 years.

| Geographic region | Papers |
| --- | --- |
| Australia | 5 |
| China | 89 |
| Europe | 18 |
| Far East | 18 |
| Middle East | 14 |
| N. America | 22 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.