ARTICLE | doi:10.20944/preprints202209.0196.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Autonomous Vehicles; Reinforcement Learning; Explainable Reinforcement Learning; XRL
Online: 14 September 2022 (08:13:44 CEST)
While machine learning models power more and more everyday devices, there is a growing need to explain them. This especially applies to the use of Deep Reinforcement Learning in safety-critical solutions such as vehicle motion planning. In this paper, we propose a method for understanding what an RL agent's decisions are based on. The method relies on statistical analysis of a massive set of state-decision samples. It indicates which input features have an impact on the agent's decision, as well as the relationships between decisions, the significance of the input features, and their values. The method allows us to determine whether the agent's decision-making process is coherent with human intuition and where it contradicts it. We applied the proposed method to an RL motion-planning agent that is supposed to drive a vehicle safely and efficiently on a highway. We find that such analysis allows for a better understanding of the agent's decisions, inspecting its behavior, debugging the ANN model, and verifying the correctness of input values, which increases its credibility.
ARTICLE | doi:10.20944/preprints202308.0756.v1
Subject: Engineering, Control And Systems Engineering Keywords: reinforcement learning; meta learning; deep reinforcement learning; autonomous driving; robot operating system
Online: 10 August 2023 (05:42:54 CEST)
Reinforcement Learning (RL) has demonstrated considerable potential in solving challenges across various domains, notably in autonomous driving. Nevertheless, implementing RL in autonomous driving comes with its own set of difficulties, such as the overestimation phenomenon, extensive learning time, and sparse reward problems. Although solutions like Hindsight Experience Replay (HER) have been proposed to alleviate these issues, the direct utilization of RL in autonomous vehicles remains constrained due to the intricate fusion of information and the possibility of system failures during the learning process. In this paper, we present a novel RL-based autonomous driving system that combines Obstacle Dependent Gaussian (ODG) RL, Soft Actor-Critic (SAC), and meta-learning algorithms. Our approach addresses key issues in RL, including the overestimation phenomenon and sparse reward problems, by incorporating prior knowledge derived from the ODG algorithm. We evaluated our proposed algorithm on official F1 circuits, using high-fidelity racing simulations with complex dynamics. The results demonstrate exceptional performance, with our method achieving up to 89% faster learning speed compared to existing algorithms in these environments.
ARTICLE | doi:10.20944/preprints202112.0337.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Transfer learning; Reinforcement learning; Adaptive operator selection; Artificial bee colony
Online: 21 December 2021 (13:41:06 CET)
Over the past two decades, metaheuristic optimization algorithms (MOAs) have become increasingly popular, particularly for logistics, science, and engineering problems. A fundamental characteristic of such algorithms is their dependence on parameters or strategies. Online and offline strategies are employed to obtain optimal configurations of the algorithms. Adaptive operator selection is one of them: it determines whether or not to update a strategy from the strategy pool during the search process. In the field of machine learning, Reinforcement Learning (RL) refers to goal-oriented algorithms that learn from the environment how to achieve a goal. In MOAs, reinforcement learning has been utilised to control the operator selection process. Existing research, however, fails to show that learned information may be transferred from one problem-solving procedure to another. The primary goal of the proposed research is to determine the impact of transfer learning on RL and MOAs. As a test problem, the set union knapsack problem with 30 separate benchmark instances is used. The results are statistically compared in depth. According to the findings, the learning process improved the convergence speed while significantly reducing the CPU time.
ARTICLE | doi:10.20944/preprints202308.1429.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: DASH; video streaming; Wireless networks; QoE; Deep learning; Reinforcement Learning algorithms; Deep reinforcement learning; Bandwidth estimation
Online: 21 August 2023 (07:28:16 CEST)
Dynamic adaptive video streaming over HTTP (DASH) plays a crucial role in video transmission across networks. Traditional adaptive bitrate (ABR) algorithms adjust the quality of video segments based on network conditions and buffer occupancy. However, these algorithms rely on fixed rules within a complex environment, making it challenging to reach decisions that are optimal in the overall context. In this paper, we propose a novel Deep Reinforcement Learning-based approach to DASH streaming, focusing on maintaining consistent perceived video quality throughout the streaming session to enhance user experience. Our approach optimizes the Quality of Experience (QoE) by dynamically controlling the quality-distance factor between consecutive video segments. We evaluate this approach through a simulation model that encompasses diverse wireless network environments and various video sequences, and we compare it with state-of-the-art methods. The experimental results demonstrate significant improvements in QoE, ensuring users enjoy stable, high-quality video streaming sessions.
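The quality-distance control described above can be illustrated with a toy reward-shaping function (a sketch under our own assumptions; the weights and names are ours, not the paper's):

```python
def qoe_reward(q_t, q_prev, rebuffer_s, max_step=1, w_smooth=2.0, w_stall=4.0):
    """Illustrative QoE-style reward: favour high quality, penalise quality
    jumps beyond the allowed distance between consecutive segments, and
    penalise stalls. All weights are our own assumptions."""
    jump = abs(q_t - q_prev)
    smoothness_penalty = w_smooth * max(0, jump - max_step)
    return q_t - smoothness_penalty - w_stall * rebuffer_s

# A large jump from level 1 to 4 scores worse than a gentle step to level 2,
# steering the agent toward consistent perceived quality.
```

With such a reward, an RL agent trades raw bitrate against smoothness, which is the core idea of controlling the quality distance between segments.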
ARTICLE | doi:10.20944/preprints202203.0094.v1
Subject: Engineering, Automotive Engineering Keywords: Smart scheduling; Smart Reservations; Reinforcement Learning; Electric vehicle charging; Electric Vehicle Charging Management platform; DQN Reinforcement Learning algorithm
Online: 7 March 2022 (09:20:13 CET)
As the policies and regulations currently in place concentrate on environmental protection and greenhouse gas reduction, we are steadily witnessing a shift in the transportation industry towards electromobility. There are, though, several issues that need to be addressed to encourage the adoption of EVs at a larger scale. To this end, we propose a solution capable of addressing multiple EV charging scheduling issues, such as congestion management, scheduling a charging station in advance, and allowing EV drivers to plan optimized long trips using their EVs. The smart charging scheduling system we propose considers a variety of factors such as battery charge level, trip distance, nearby charging stations, other appointments, and average speed. Given the scarcity of data sets required to train the Reinforcement Learning algorithms, the novelty of the recommended solution lies in the scenario simulator, which generates the labelled datasets needed to train the algorithm. Based on the generated scenarios, we created and trained a neural network that uses a history of previous situations to identify the optimal charging station and time interval for recharging. The results are promising and for future work we are planning to train the DQN model using real-world data.
ARTICLE | doi:10.20944/preprints202005.0181.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: Reinforcement learning; Cartpole; Q Learning; Mathematical Modeling
Online: 10 May 2020 (18:02:43 CEST)
The prevalence of differential equations as a mathematical technique has refined the fields of control theory and constrained optimization, thanks to the ability to accurately model chaotic, unbalanced systems. In recent research, however, systems are increasingly nonlinear and difficult to model with differential equations alone. A newer approach is to use policy iteration and Reinforcement Learning (RL), techniques that center on an action-reward sequence for a controller. RL can be applied to control theory problems since it can be robustly applied in a dynamic environment such as the cartpole system (an inverted pendulum). This solution avoids PID and other dynamics-optimization schemes in favor of a more robust, reward-based control mechanism. This paper applies RL and Q-Learning to the classic cartpole problem, while also discussing the mathematical background and the differential equations used to model the system.
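The Q-Learning update this paper builds on can be sketched on a toy two-state MDP (tabular and illustrative only; this is not the authors' cartpole implementation, and the state/action names are ours):

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-Learning step: move Q[s][a] toward the bootstrapped target."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Toy 2-state MDP: taking "stay" in state 0 yields reward 1 and keeps state 0;
# every other state-action pair yields 0 and leads to state 1.
Q = {0: {"stay": 0.0, "move": 0.0}, 1: {"stay": 0.0, "move": 0.0}}
random.seed(0)
for _ in range(500):
    s = random.choice([0, 1])
    a = random.choice(["stay", "move"])
    s_next = 0 if (s == 0 and a == "stay") else 1
    r = 1.0 if (s == 0 and a == "stay") else 0.0
    q_update(Q, s, a, r, s_next)
# After training, Q[0]["stay"] dominates: the rewarding action has been learnt.
```

For cartpole, the same update applies after discretizing the continuous state (angle, angular velocity, position, velocity) into bins.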
ARTICLE | doi:10.20944/preprints202007.0598.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Reinforcement Learning; Simulation; Health Services Research; Operational Research
Online: 24 July 2020 (14:45:33 CEST)
Background and motivation: Combining Deep Reinforcement Learning (Deep RL) and Health Systems Simulations has significant potential, for both research into improving Deep RL performance and safety, and in operational practice. While individual toolkits exist for Deep RL and Health Systems Simulations, no framework to integrate the two has been established. Aim: Provide a framework for integrating Deep RL Networks with Health System Simulations, and to ensure this framework is compatible with Deep RL agents that have been developed and tested using OpenAI Gym. Methods: We developed our framework based on the OpenAI Gym framework, and demonstrate its use on a simple hospital bed capacity model. We built the Deep RL agents using PyTorch, and the Hospital Simulation using SimPy. Results: We demonstrate example models using a Double Deep Q Network or a Duelling Double Deep Q Network as the Deep RL agent. Conclusion: SimPy may be used to create Health System Simulations that are compatible with agents developed and tested on OpenAI Gym environments. GitHub repository of code: https://github.com/MichaelAllen1966/learninghospital
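The integration contract the framework relies on is the Gym-style reset/step interface: any simulation exposing it can host a Deep RL agent. A minimal dependency-free stand-in for a bed-capacity model (all names and dynamics are ours, not the repository's API):

```python
class SimpleBedEnv:
    """Minimal Gym-style environment: choose how many beds to staff each day.
    Illustrative stand-in for a SimPy hospital simulation."""
    def __init__(self, max_beds=20, horizon=7):
        self.max_beds, self.horizon = max_beds, horizon

    def reset(self):
        self.day, self.demand = 0, 10
        return (self.day, self.demand)              # observation

    def step(self, action):                         # action = beds staffed
        # Reward penalises both unmet demand and idle beds.
        reward = -abs(self.demand - action)
        self.day += 1
        self.demand = max(0, min(self.max_beds,
                                 self.demand + (1 if self.day % 2 else -1)))
        done = self.day >= self.horizon
        return (self.day, self.demand), reward, done, {}

env = SimpleBedEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, r, done, _ = env.step(obs[1])              # naive policy: staff = demand
    total += r
```

An agent developed against OpenAI Gym environments is drop-in compatible with any simulation exposing this reset/step contract, which is the point of the framework.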
REVIEW | doi:10.20944/preprints201811.0510.v2
Subject: Engineering, Control And Systems Engineering Keywords: deep reinforcement learning; imitation learning; soft robotics
Online: 23 November 2018 (11:57:55 CET)
The increasing trend of studying the innate softness of robotic structures, and amalgamating it with the benefits of extensive developments in the field of embodied intelligence, has led to the sprouting of a relatively new yet extremely rewarding sphere of technology. The fusion of current deep reinforcement learning algorithms with the physical advantages of soft bio-inspired structures points to the fruitful prospect of designing completely self-sufficient agents capable of learning, from observations collected in their environment, to achieve an assigned task. For soft robotic structures possessing countless degrees of freedom, it is often not easy (and sometimes not even possible) to formulate the mathematical constraints necessary for training a deep reinforcement learning (DRL) agent for the task at hand; hence, we resort to imitation learning techniques, since tasks like manipulation that are easy to perform manually can be comfortably mimicked by an agent. Deploying current imitation learning algorithms on soft robotic systems has been observed to provide satisfactory results, but challenges remain. This review article thus presents an overview of such algorithms, along with instances of them being applied to real-world scenarios and yielding state-of-the-art results, followed by brief descriptions of various pristine branches of DRL research that may be centers of future work in this field.
REVIEW | doi:10.20944/preprints202007.0693.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: 5G; deep learning; reinforcement learning; systematic review; cellular networks
Online: 29 July 2020 (11:12:32 CEST)
Over the last decade, the amount of data exchanged on the Internet increased by a staggering factor of over 100, and is expected to exceed 500 exabytes by 2020. This phenomenon is mainly due to the evolution of high-speed broadband Internet and, more specifically, the popularization and widespread use of smartphones and the associated accessible data plans. Although 4G with its long-term evolution (LTE) technology is seen as a mature technology, its radio technology and architecture continue to improve, such as within the scope of the LTE Advanced standard, a major enhancement of LTE. For the long run, however, the next generation of telecommunications (5G) is being considered and is gaining considerable momentum from both industry and researchers. In addition, with the deployment of Internet of Things (IoT) applications, smart cities, vehicular networks, e-health systems, and Industry 4.0, a new plethora of 5G services has emerged with diverse and technologically challenging design requirements. These include high mobile data volume per area, a high number of connected devices per area, high data rates, longer battery life for low-power devices, and reduced end-to-end latency. Several technologies are being developed to meet these new requirements, among them ultra-densification, millimeter-wave usage, antennas with massive multiple-input multiple-output (MIMO), antenna beamforming to increase spatial diversity, and edge/fog computing. Each of these technologies brings its own design issues and challenges. For instance, ultra-densification and MIMO increase the complexity of estimating channel conditions, and traditional channel state information (CSI) estimation techniques are no longer suitable for the new scenarios. As a result, new approaches to evaluating network conditions, such as continuously collecting and monitoring key performance indicators, become necessary.
Timely decisions are needed to ensure the correct operation of such networks. In this context, deep learning (DL) models can be seen as one of the main tools for processing monitoring data and automating decisions. As these models are able to extract relevant features from raw data (images, texts, and other types of unstructured data), the integration of 5G and DL looks promising and warrants exploration. As its main contribution, this paper presents a systematic review of how DL is being applied to solve 5G issues. We examine works from the last decade that addressed diverse 5G problems, such as physical-medium state estimation, network traffic prediction, user device location prediction, and autonomous network management. We also discuss the main research challenges of using DL models in 5G scenarios and identify several issues that deserve further consideration.
ARTICLE | doi:10.20944/preprints202212.0167.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Reinforcement Learning; Locomotion Disorder; IMU Sensor; Musculoskeletal simulation
Online: 9 December 2022 (01:12:43 CET)
Locomotor impairment is a highly prevalent and significant source of disability, and it substantially impacts the quality of life of a large population. Despite decades of research on human locomotion, simulating human movement to study the features of musculoskeletal drivers and clinical conditions remains challenging. Recent efforts to utilize reinforcement learning (RL) techniques show promise for simulating human locomotion and revealing musculoskeletal drivers. However, these simulations often fail to mimic natural human locomotion, because most reinforcement strategies do not consider any reference data regarding human movement. To address these challenges, in this study we designed a reward function based on trajectory optimization rewards (TOR) and bio-inspired rewards, which include rewards obtained from reference motion data captured by a single Inertial Measurement Unit (IMU) sensor. The sensor was mounted on the participants' pelvis to capture reference motion data. We also adapted the reward function by leveraging previous research on walking simulation for TOR. The experimental results showed that simulated agents with the modified reward function performed better at mimicking the collected IMU data, i.e., the simulated human locomotion was more realistic. Moreover, the IMU data used in this bio-inspired cost enhanced the agents' capacity to converge during training, so the models converged faster than those developed without reference motion data. Consequently, human locomotion can be simulated more quickly, across a broader range of environments, and with better simulation performance.
REVIEW | doi:10.20944/preprints202003.0309.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: economics; deep reinforcement learning; deep learning; machine learning
Online: 20 March 2020 (07:13:42 CET)
The popularity of deep reinforcement learning (DRL) methods in economics has increased exponentially. Drawing on a wide range of capabilities from reinforcement learning (RL) and deep learning (DL), DRL offers vast opportunities for handling sophisticated, dynamic business environments. DRL is characterized by scalability, with the potential to be applied to high-dimensional problems in conjunction with noisy and nonlinear patterns in economic data. In this work, we first give a brief review of DL, RL, and deep RL methods across diverse applications in economics, providing an in-depth insight into the state of the art. Furthermore, the architecture of DRL applied to economic applications is investigated, highlighting complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. The survey results indicate that DRL can provide better performance and higher accuracy than traditional algorithms on real economic problems, in the presence of risk parameters and ever-increasing uncertainty.
ARTICLE | doi:10.20944/preprints202209.0483.v1
Subject: Engineering, Control And Systems Engineering Keywords: deep reinforcement learning; data efficient; curriculum learning; transfer learning
Online: 30 September 2022 (10:35:06 CEST)
Sparse-reward, long-horizon tasks are a major challenge for deep reinforcement learning algorithms. One of the key barriers is data inefficiency: even in a simulation environment, it usually takes weeks to train the agent. In this study, a data-efficient training framework is proposed, in which a curriculum is designed for the agent in the simulation scenario. Different distributions of the initial state are set so that the agent receives more informative rewards throughout the training process. To bridge the sim-to-real gap, the parameters of the output layer of the value-function neural network are fine-tuned. An experiment on UAV maneuver control is conducted in the proposed training framework to verify that the method is more efficient. We demonstrate that data efficiency differs for the same data at different training stages.
ARTICLE | doi:10.20944/preprints202309.1896.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: biochips; Digital Microfluidic Biochips; deep reinforcement learning; optimization
Online: 27 September 2023 (11:24:50 CEST)
Digital Microfluidic Biochips (DMFBs), used in fields such as DNA analysis, clinical diagnosis, and PCR testing, have made biochemical experiments more compact, efficient, and user-friendly than previous approaches. However, their reliability is often compromised by an inability to adapt to errors. Errors in biochips can be categorized into two types: known errors and unknown errors. Known errors are detectable before the start of the routing process through sensors or cameras. Unknown errors, in contrast, become apparent only during the routing process and remain undetected by sensors or cameras; they can unexpectedly halt the routing process and are the biggest threat to the reliability of biochips. This paper introduces a deep reinforcement learning-based routing algorithm designed to manage not only known errors but also unknown errors. Our experiments demonstrate that the algorithm outperforms previous ones in routing success rate in scenarios that include both known and unknown errors. Additionally, our algorithm helps detect unknown errors during the routing process and identifies the most efficient routing path with high probability.
ARTICLE | doi:10.20944/preprints202305.1532.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep reinforcement learning; trading; clustering; hierarchy; prediction; risk; bitcoin
Online: 22 May 2023 (16:24:21 CEST)
We present a hierarchical reinforcement learning (RL) architecture that employs various low-level agents to act in the trading environment, i.e., the market. The highest-level agent selects among a group of specialised agents, and the selected agent then decides when to sell or buy a single asset for some period. This period can vary according to a termination function. We hypothesized that, due to different market regimes, a single agent is not enough when learning from such heterogeneous data; instead, multiple agents, each specialising in a subset of the data, will perform better. We use k-means clustering to partition the data and train each agent on a different cluster. Partitioning the input data also helps model-based RL (MBRL), where the models can be heterogeneous. We also add two simple decision-making models to the set of low-level agents, diversifying the pool of available agents and thus increasing overall behavioural flexibility. We perform multiple experiments showing the strengths of the hierarchical approach and test various prediction models at both levels. We also use a risk-based reward at the high level, which transforms the overall problem into a risk-return optimization. This type of reward shows a significant reduction in risk while only minimally reducing profits. Overall, the hierarchical approach shows significant promise, especially when the pool of low-level agents is highly diverse.
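The regime-partitioning step can be sketched with a tiny 1-D k-means on synthetic data (illustrative only; the paper clusters real market features, and all names here are ours):

```python
import random

def kmeans_1d(xs, k=2, iters=20, seed=0):
    """Tiny 1-D k-means: returns centroids and a cluster label per point."""
    rng = random.Random(seed)
    cents = rng.sample(xs, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - cents[j])) for x in xs]
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:                      # guard against an empty cluster
                cents[j] = sum(members) / len(members)
    return cents, labels

# Two synthetic "market regimes": low values around 1, high values around 10.
returns = [1.0, 1.2, 0.9, 1.1, 10.0, 10.3, 9.8, 10.1]
cents, labels = kmeans_1d(returns, k=2)
# Each cluster would then be handed to its own specialised low-level agent.
```

Training one agent per cluster is what lets each low-level agent specialise in a subset of the heterogeneous data.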
ARTICLE | doi:10.20944/preprints201909.0159.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: radio over fiber; nonlinearities mitigation; reinforcement learning (RL) method
Online: 16 September 2019 (10:37:01 CEST)
We propose a 10-Gb/s, 64-quadrature amplitude modulation (QAM) signal-based Radio over Fiber (RoF) system over 50 km of standard single-mode fiber, which utilizes a Reinforcement Learning (RL) SARSA-based decision method to select effective decisions that mitigate nonlinearity. The results demonstrate that the RL-SARSA algorithm achieves a significant reduction in bit error rate.
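The SARSA rule underlying such a decision method can be sketched as follows (the states and actions below are hypothetical placeholders, not the paper's signal parameters):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy SARSA: bootstrap from the action actually taken next,
    rather than the greedy action as in Q-learning."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q

# One step on a toy decision table (state/action names are illustrative).
Q = {("high_power", "pre_distort"): 0.0, ("low_power", "hold"): 0.0}
sarsa_update(Q, "high_power", "pre_distort", 1.0, "low_power", "hold")
```

The on-policy target is what distinguishes SARSA from Q-learning and makes it suitable when exploratory actions themselves affect the quality metric being optimized.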
ARTICLE | doi:10.20944/preprints202212.0233.v1
Subject: Engineering, Control And Systems Engineering Keywords: mobile robotics; neural networks; control systems; reinforcement learning; crowd navigation
Online: 13 December 2022 (08:33:17 CET)
For a mobile robot, navigation in a densely crowded space can be a challenging and sometimes impossible task, especially with traditional techniques. In this paper, we present a framework to train neural controllers for differential-drive mobile robots that must safely navigate a crowded environment while trying to reach a target location. To learn the robot's policy, we train a convolutional neural network using two reinforcement learning algorithms, Deep Q-Networks (DQN) and Asynchronous Advantage Actor-Critic (A3C), and develop a training pipeline that allows us to scale the process to several compute nodes. We show that the asynchronous training procedure in A3C can be leveraged to quickly train neural controllers and test them on a real robot in a crowded environment.
ARTICLE | doi:10.20944/preprints202101.0176.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: recommender system; tag-aware; deep reinforcement learning; user cold start
Online: 11 January 2021 (10:02:49 CET)
Recently, the application of deep reinforcement learning (DRL) in recommender systems has flourished, standing out by overcoming drawbacks of traditional methods and achieving high recommendation quality. The dynamics, long-term returns, and sparse data issues in recommender systems have been effectively addressed. But applying deep reinforcement learning brings problems of interpretability, overfitting, complex reward function design, and user cold start. This paper proposes a tag-aware recommender system based on deep reinforcement learning that avoids complex reward function design, taking advantage of tags to make up for the interpretability problems of recommender systems. Our experiments are carried out on the MovieLens dataset. The results show that the DRL-based recommender system is superior to traditional algorithms in minimum error, and that using tags to improve interpretability has little effect on accuracy. In addition, the DRL-based recommender system performs excellently on user cold-start problems.
ARTICLE | doi:10.20944/preprints202305.1862.v1
Subject: Computer Science And Mathematics, Robotics Keywords: path tracking; deep reinforcement learning; maximum entropy; inverse kinematics
Online: 26 May 2023 (05:24:45 CEST)
We propose a deep reinforcement learning-based manipulator path-tracking method to address the computational difficulty and non-uniqueness of path-tracking methods based on inverse kinematics. By transforming the path-tracking task into a sequential decision problem, our method adopts end-to-end learning for closed-loop control and avoids computing the inverse solution. We first explored the feasibility of deep reinforcement learning for manipulator path tracking. After verifying its feasibility, path tracking of a multi-degree-of-freedom (multi-DOF) manipulator was realized by incorporating a maximum-entropy deep reinforcement learning algorithm. The experimental results show that our method tracks manipulator paths effectively: it not only avoids computing the inverse kinematics solution, but also requires no dynamic model. We therefore believe our method is of great significance for the study of manipulator path tracking.
ARTICLE | doi:10.20944/preprints202305.0219.v1
Subject: Engineering, Bioengineering Keywords: ECG; modeling; reinforcement learning; lognormal; autonomic nervous system; model-driven analysis
Online: 4 May 2023 (07:48:29 CEST)
Modeling is essential to better understand the generative mechanisms responsible for experimental observations gathered from complex systems. In this work, we use such an approach to analyze the electrocardiogram (ECG). We present a systematic framework to decompose ECG signals into sums of overlapping lognormal components. We used reinforcement learning to train a deep neural network to estimate the modeling parameters from ECGs recorded in babies of 1 to 24 months of age. We demonstrate this model-driven approach by showing how the extracted parameters vary with age. After correction for multiple tests, 10 of 24 modeling parameters showed statistical significance below the 0.01 threshold, with absolute Kendall rank correlation coefficients in the [0.27, 0.51] range. The impact of this framework on fundamental science and clinical applications is likely to increase with further refinement of the modeling of the physiological mechanisms generating the ECG. By improving physiological interpretability, this approach can provide a window into latent variables important for understanding the heartbeat process and its control by the autonomic nervous system.
ARTICLE | doi:10.20944/preprints202006.0046.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Postural Balance; Deep Reinforcement Learning; Postural Stabilisation; Biomechanics
Online: 8 June 2020 (10:25:54 CEST)
Learning to maintain postural balance while standing requires significant fine coordination between the neuromuscular system and the sensory system. It is one of the key contributing factors in fall prevention, especially in the older population. Using artificial intelligence (AI), we can similarly teach an agent to maintain a standing posture, and thus teach the agent not to fall. In this paper, we investigate the learning progress of an AI agent and how it maintains a stable standing posture through reinforcement learning. During training, the AI agent learnt three policies. First, it learnt to maintain the Centre-of-Gravity and Zero-Moment-Point in front of the body. Then, it learnt to shift the load of the entire body onto one leg while using the other leg for fine-tuning the balancing action. Finally, it started to learn the coordination between the two pre-trained policies. This study shows the potential of using deep reinforcement learning in human movement studies. The learnt AI behaviour also exhibited attempts to achieve an unplanned goal because it correlated with the set goal (e.g. walking in order to prevent falling). The failed attempts to maintain a standing posture are an interesting by-product which can enrich fall detection and prevention research efforts.
ARTICLE | doi:10.20944/preprints202308.0081.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: evolutionary algorithm; evolution strategy; neural network; neuroevolution; reinforcement learning
Online: 2 August 2023 (04:52:52 CEST)
Evolutionary algorithms find applicability in the reinforcement learning of neural networks due to their independence from gradient-based methods. To train neural networks successfully with evolutionary algorithms, appropriate algorithms must be selected carefully from the many available variations. The author previously reported experimental evaluations of Evolution Strategy (ES) for reinforcement learning of neural networks, using the pendulum control task. In this study, the Acrobot control task is adopted as another task. Experimental results demonstrate that ES successfully trained a Multi-Layer Perceptron (MLP) to reach a remarkable 99.85% of the maximum height. However, the trained MLP failed to keep the chain end upright throughout an episode. It was also observed that employing 8 hidden units in the neural network yielded better results, with statistical significance, than 4, 16, or 32 hidden units. Furthermore, the findings indicate that a larger population size in ES led to a more extensive exploration of potential solutions over a greater number of generations, which aligns with the previous study.
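The gradient-free selection loop behind an Evolution Strategy can be sketched on a toy fitness function (a simple (1, λ) comma-selection variant under our own assumptions, not the paper's exact algorithm or network):

```python
import random

def evolve(fitness, dim=4, pop=16, sigma=0.1, gens=60, seed=1):
    """Minimal (1, lambda) Evolution Strategy: perturb the parent weight
    vector with Gaussian noise and keep the fittest offspring as the new
    parent. No gradients are ever computed."""
    rng = random.Random(seed)
    parent = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(gens):
        offspring = [[w + rng.gauss(0, sigma) for w in parent]
                     for _ in range(pop)]
        parent = max(offspring, key=fitness)
    return parent

# Stand-in for an episode return: peaks when all weights equal 0.5.
fit = lambda w: -sum((x - 0.5) ** 2 for x in w)
best = evolve(fit)
```

In the RL setting, `fitness` would instead run an episode with the candidate weights loaded into the MLP and return the episode's score; a larger `pop` widens the search per generation, as the abstract's population-size finding suggests.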
ARTICLE | doi:10.20944/preprints202304.0656.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: LEO satellite networks; satellite routing; multi-agent reinforcement learning; distributed routing
Online: 21 April 2023 (02:25:03 CEST)
Fast-convergence routing is an important issue for LEO constellation networks, due to their dynamically changing topology and time-varying transmission requests. Most existing research focuses on the OSPF routing algorithm, which cannot handle the network's frequent link-state changes. In this paper, we propose a Fast-Convergence Reinforcement Learning Satellite Routing Algorithm (FRL-SR) for LEO satellite networks, in which a satellite quickly obtains the network link status and adjusts its routing strategy accordingly. In FRL-SR, each satellite node is regarded as an agent, and the agent selects the port for packet forwarding according to its own routing policy. When the satellite network state changes, the agent sends 'hello' packets to its neighbor nodes to update their routing policies. Compared with traditional reinforcement learning, FRL-SR perceives network information faster and therefore converges faster. Additionally, FRL-SR can mask the dynamics of the satellite network topology and adaptively adjust the forwarding strategy according to the link state. Simulation results show that the proposed FRL-SR algorithm outperforms the Dijkstra algorithm in average delay, packet arrival ratio, and network load balance.
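The per-node learning step in such distributed RL routing resembles classic Q-routing; a minimal sketch (node names, link delays, and the update signature are ours, not FRL-SR's message format):

```python
def q_route_update(Q, node, dest, nbr, link_delay, nbr_estimate, alpha=0.5):
    """Distributed Q-routing update: a node refines its delivery-time
    estimate to `dest` via neighbour `nbr`, using the neighbour's own best
    estimate (as carried in 'hello'-style packets)."""
    old = Q[node][dest][nbr]
    Q[node][dest][nbr] = old + alpha * (link_delay + nbr_estimate - old)
    return Q[node][dest][nbr]

# Satellite A can reach destination D via neighbours B or C (initial guess: 10 ms).
Q = {"A": {"D": {"B": 10.0, "C": 10.0}}}
# B reports a 2 ms best estimate to D over a 1 ms link; C reports 8 ms over 1 ms.
q_route_update(Q, "A", "D", "B", 1.0, 2.0)
q_route_update(Q, "A", "D", "C", 1.0, 8.0)
best = min(Q["A"]["D"], key=Q["A"]["D"].get)   # forwarding port with lowest estimate
```

Because each node updates from its neighbours' reports rather than from a global view, the scheme adapts to link-state changes without recomputing shortest paths network-wide.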
ARTICLE | doi:10.20944/preprints202103.0592.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Electric Vehicles; batch reinforcement learning; dueling neural networks; fitted Q-iteration
Online: 24 March 2021 (13:44:36 CET)
We consider the problem of coordinating the charging of an entire fleet of electric vehicles (EVs), using a model-free, i.e., purely data-driven, reinforcement learning (RL) approach. The objective of the RL-based control is to optimize charging actions while fulfilling all EV charging constraints (e.g., timely completion of charging). In particular, we focus on batch-mode learning and adopt fitted Q-iteration (FQI). A core component of FQI is approximating the Q-function with a regression technique, from which the policy is derived. Recently, a dueling neural network architecture was proposed and shown to lead to better policy evaluation in the presence of many similar-valued actions, as applied in a computer-game context. The main research contributions of the current paper are that (i) we develop a dueling neural network approach for the joint coordination of an entire EV fleet, and (ii) we evaluate its performance and compare it to an all-knowing benchmark and to an FQI approach using the extra-trees regression technique, a popular approach in EV-related work. We present a case study where RL agents are trained with an epsilon-greedy approach for different objectives: (a) cost minimization, and (b) maximization of self-consumption of local renewable energy sources. Our results indicate that RL agents achieve significant cost reductions (70--80%) compared to a business-as-usual scenario without smart charging. Comparing the dueling neural network regression to extra-trees indicates that, for our case study's EV fleet parameters and training scenario, the extra-trees-based agents achieve higher performance in terms of both lower costs (or higher self-consumption) and stronger robustness, i.e., less variation among trained agents. This suggests that adopting dueling neural networks in this EV setting is not particularly beneficial, in contrast to the Atari game context in which the idea originated.
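The FQI loop described above can be sketched with plain per-state-action averaging standing in for the regressor (the paper plugs in dueling networks or extra-trees instead). The five-state chain MDP below is a hypothetical stand-in for the EV-charging dynamics:

```python
import random
from collections import defaultdict

def collect_batch(n=2000, seed=1):
    """Random-policy transitions on a 5-state chain (a toy stand-in for
    EV-charging dynamics): reward 1 on reaching terminal state 4."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        s = rng.randrange(4)                 # non-terminal states 0..3
        a = rng.randrange(2)                 # 0 = left, 1 = right
        s2 = max(0, s - 1) if a == 0 else s + 1
        batch.append((s, a, 1.0 if s2 == 4 else 0.0, s2, s2 == 4))
    return batch

def fitted_q_iteration(batch, gamma=0.9, iters=30):
    """FQI: repeatedly regress Q(s, a) onto bootstrapped targets computed
    from the fixed batch. Per-(s, a) averaging plays the regressor's role."""
    q = defaultdict(float)
    for _ in range(iters):
        targets = defaultdict(list)
        for s, a, r, s2, done in batch:
            boot = 0.0 if done else gamma * max(q[(s2, 0)], q[(s2, 1)])
            targets[(s, a)].append(r + boot)
        q = defaultdict(float, {k: sum(v) / len(v) for k, v in targets.items()})
    return q

q = fitted_q_iteration(collect_batch())
policy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in range(4)}
```

The key batch-mode property is visible here: the transition set is collected once and the regression target is rebuilt at every iteration, so swapping the averaging step for a dueling network or extra-trees changes only the regressor line.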
ARTICLE | doi:10.20944/preprints202307.0199.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: machine learning; deep reinforcement learning; transfer learning; fire; evacuation
Online: 4 July 2023 (10:38:05 CEST)
There is only a very short reaction time for people to find the best way out of a building in a fire outbreak. Software applications can assist the rapid evacuation of victims; however, building them is an arduous task that requires an understanding of advanced technologies. Since well-known path-finding algorithms (such as Dijkstra, Bellman-Ford, and A*) can lead to serious performance problems, we decided to make use of deep reinforcement learning techniques. A wide range of strategies, including random initialization of the replay buffer and transfer learning, were assessed in three projects involving schools of different sizes. The results showed that the proposal is viable and that, in most cases, the performance of transfer learning was superior. In addition, the study raised challenges to be addressed in future work.
ARTICLE | doi:10.20944/preprints202203.0199.v1
Subject: Engineering, Control And Systems Engineering Keywords: micropositioners; reinforcement learning; disturbance observer; deep deterministic policy gradient
Online: 15 March 2022 (07:58:27 CET)
The robust control of high-precision electromechanical systems, such as micropositioners, is challenging because of their inherent high nonlinearity, sensitivity to external interference, and the complexity of accurately identifying model parameters. To cope with these problems, this work investigates a disturbance observer-based deep reinforcement learning control strategy to realize high robustness and precise tracking performance. Reinforcement learning has shown great potential as an optimal control scheme; however, its application to micropositioning systems is still rare. Therefore, the deep deterministic policy gradient (DDPG) algorithm, embedded with an integral differential compensator (ID), is utilized in this work, which not only decreases the state error but also improves the transient response speed. In addition, an adaptive sliding-mode disturbance observer (ASMDO) is proposed to further eliminate the collective effect of lumped disturbances. Intensive tracking simulation experiments demonstrate the improved accuracy and response time of the proposed controller.
REVIEW | doi:10.20944/preprints202201.0050.v1
Subject: Engineering, Mechanical Engineering Keywords: turbulence; flow control; simulation; aerodynamics; machine learning; deep reinforcement learning
Online: 6 January 2022 (09:36:50 CET)
In this review we summarize existing trends in flow control used to improve the aerodynamic efficiency of wings. We first discuss active methods to control turbulence, starting with flat-plate geometries and building toward the more complicated flow around wings. Then, we discuss active approaches to control separation, a crucial aspect of achieving high aerodynamic efficiency. Furthermore, we highlight methods relying on turbulence simulation and discuss the various levels of modelling. Finally, we thoroughly review data-driven methods and their application to flow control, focusing on deep reinforcement learning (DRL). We conclude that this methodology has the potential to discover novel control strategies in complex turbulent flows of aerodynamic relevance.
ARTICLE | doi:10.20944/preprints202304.0734.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: Federated Learning; Node Selection; Deep Reinforcement Learning; Multi-Objective; Model Performance
Online: 23 April 2023 (03:02:36 CEST)
As a new distributed machine learning (ML) approach, federated learning (FL) shows great potential for preserving data privacy by enabling distributed data owners to collaboratively build a global model without sharing their raw data. However, heterogeneity in data distributions and hardware configurations makes it hard to select participants from among thousands of nodes. In this paper, we propose a multi-objective node selection approach to improve time-to-accuracy performance while resisting malicious nodes. We first design a deep reinforcement learning-assisted FL framework. The problem of multi-objective node selection under this framework is then formulated as a Markov decision process (MDP), which aims to reduce training time and improve model accuracy simultaneously. Finally, a deep Q-network (DQN)-based algorithm is proposed to efficiently determine the optimal set of participants for each iteration. Simulation results show that the proposed method not only significantly improves the accuracy and training speed of FL but also exhibits stronger robustness against malicious nodes.
ARTICLE | doi:10.20944/preprints202307.0386.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Combination optimization; grid section; deep reinforcement learning; annealing optimization algorithm
Online: 6 July 2023 (07:13:08 CEST)
Modern power systems integrate increasing amounts of new energy sources and large numbers of power-electronic devices, which makes online optimization and real-time control more challenging. Deep reinforcement learning (DRL) can process big data and high-dimensional features, and can independently learn and optimize decision-making in complex environments. In this paper, we explore a DRL-based online combination optimization method of grid sections for large, complex power systems. In our method, to improve the convergence speed of the model, we discretize the output actions of the units and simplify the action space. We also design a reinforcement learning loss function with strong constraints to further improve convergence speed and help the algorithm obtain a stable solution. Moreover, to avoid the local-optimum problem caused by discretizing the output actions, we use an annealing optimization algorithm to refine the granularity of the unit output. We verify our method on the IEEE 118-bus system. The experimental results show that our model converges quickly, performs well, and obtains stable solutions.
REVIEW | doi:10.20944/preprints202306.1901.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Drones; Machine Learning; Artificial Intelligence; Supervised learning; Unsupervised Learning; Reinforcement Learning
Online: 27 June 2023 (12:27:38 CEST)
The use of drones for various applications has become increasingly popular in recent years, and machine learning has played a significant role in this trend. In this paper, we provide a comprehensive survey of the classification and application of machine learning in drones. The paper begins with an overview of the different types of machine learning algorithms and their applications in drones, including supervised learning, unsupervised learning, and reinforcement learning. Next, we present a detailed analysis of various real-world applications of machine learning in drones, such as object recognition, route planning, obstacle avoidance, search area optimization, and autonomous search. The paper also discusses the challenges and limitations of using machine learning in drones, such as data privacy, data quality, and computational requirements. Finally, the paper concludes with a discussion of the future directions of machine learning in drones and its potential impact on various industries and fields. This paper provides a valuable resource for researchers, practitioners, and students interested in the intersection of machine learning and drones.
ARTICLE | doi:10.20944/preprints202309.1975.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: forecasting; reinforcement learning; power grid; planning and scheduling; uncertainty in AI; agent-based systems; deep learning; stochastic optimization
Online: 28 September 2023 (10:14:29 CEST)
Continuous greenhouse gas emissions are causing global warming and impacting the habitats of many animals. Researchers in the field of electric power are making efforts to mitigate this situation. Operating and maintaining the power grid in an economic, low-carbon, and stable manner is challenging. To address this issue, we propose a grid dispatching technique that combines prediction technology, reinforcement learning, and optimization technology. Prediction technology can forecast future power demand and solar power generation, while reinforcement learning and optimization technology can make charging and discharging decisions for energy storage devices based on current and future grid conditions. In the power system, the aggregation of distributed energy resources increases uncertainty, particularly due to the fluctuating generation of renewable energy. This requires advanced predictive control techniques to ensure long-term economic and decarbonization goals. In this paper, we present a real-time dispatching framework that integrates deep learning-based prediction, reinforcement learning-based decision-making, and stochastic optimization techniques. The framework can rapidly adapt to uncertainty caused by various factors in real-time data distribution and control processes. The proposed framework won the global championship in the NeurIPS Challenge 2022 and demonstrated its effectiveness in practical scenarios of intelligent building energy management.
ARTICLE | doi:10.20944/preprints202309.0448.v1
Subject: Engineering, Civil Engineering Keywords: 2D Netzgitterträger; NetzGT reinforcement; Non-metallic reinforcement; Carbon textile reinforcement
Online: 7 September 2023 (02:52:26 CEST)
The increasing popularity of carbon-reinforced concrete (CRC) is attributed to its exceptional tensile properties, low density, immunity to corrosion, and remarkable flexibility, which allows it to be easily shaped into various forms. This research investigates the feasibility of using a special 2D Netzgitterträger (NetzGT) reinforcement system, featuring a net-shaped fabricated textile made of multiple diagonally offset rovings with overlapping edge strands, as a viable alternative to traditional steel reinforcement in concrete beams. The reinforcement is manufactured from carbon rovings with three different diagonal angles of 50°, 60°, and 70°, respectively. Laboratory experiments were conducted to assess the mechanical behavior of beams reinforced with the 2D NetzGT system. Bending and shear tests were performed on beams with varying numbers of overlapped edge rovings and roving angles to evaluate the tensile capacity and failure characteristics of the beams. Increasing the number of overlapped edge rovings led to a noticeable increase in the maximum tensile force. Tensile tests on strands with increasing numbers of overlapped rovings were also performed to analyze their tensile strength. Additionally, single-yarn pull-out tests were conducted to examine the influence of the roving angle on the bond strength between the carbon textile roving and the concrete matrix.
ARTICLE | doi:10.20944/preprints202309.1441.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: fine grain image recognition; Inception-V3; reinforcement complementary learning; complementary learning; inter-class gap
Online: 21 September 2023 (08:54:27 CEST)
Fine-grained image categories (e.g., bird species) comprise many subclasses under a common parent category. Because the differences between subclasses are very subtle and mostly concentrated in multiple local regions, fine-grained image recognition is a very challenging task. At the same time, some fine-grained networks tend to focus on a single region when judging the target category, neglecting other auxiliary regional features. To this end, Inception-V3 is used as the backbone network, and an enhanced, complementary fine-grained image classification network is designed. While reinforcement learning is adopted to obtain more detailed fine-grained image features, the complementary network obtains complementary discriminative regions of the target through attention erasure, increasing the network's perception of the overall target. Finally, experiments are conducted on three open datasets: CUB-200-2011, FGVC-Aircraft, and Stanford Dogs. The experimental results show that the proposed model achieves better performance.
ARTICLE | doi:10.20944/preprints202309.1684.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Agent-based Modeling and Simulation; Reinforcement Learning; COVID-19; Social Simulation.
Online: 25 September 2023 (11:41:53 CEST)
This study assesses the impact of incorporating an adaptive learning mechanism into an agent-based model (ABM) simulating behavior on a university campus during a pandemic outbreak, with a particular focus on the COVID-19 pandemic. The aim is to reduce overcrowding and infections on campus through the use of Reinforcement Learning (RL). Our findings indicate that RL is a viable approach for effectively representing agents' behavior in this context. The results reveal specific temporal patterns of overcrowding violations. While our study successfully mitigated campus crowding, it had limited influence on the course of the epidemic. This highlights the necessity of comprehensive epidemic control strategies that consider the role of individual decision-making influenced by adaptive learning, along with targeted interventions. This research contributes to our understanding of adaptive learning within complex systems and offers valuable insights for shaping future public health policies in similar community settings. Future research directions encompass exploring various parameter settings and updating representations of the disease's natural history.
ARTICLE | doi:10.20944/preprints202308.0431.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: conceptualization; methodology; job allocation; reinforcement learning; stocker; digital twin; simulation; Industry 4.0
Online: 7 August 2023 (03:03:04 CEST)
In this study, reinforcement learning (RL) was used in factory simulation to optimize storage devices for use in Industry 4.0 and digital twins. First, we defined an RL environment, modeled it, and validated its ability to simulate a real physical system. Subsequently, we introduced a method to calculate reward signals and applied them to the environment to align the behavior of the RL agent with the task objective. A stocker simulation model, a storage device that emulates logistics in a manufacturing production area, was used to validate the effectiveness of RL. The results revealed that RL is a useful tool for automating and optimizing complex logistics systems and broadens the applicability of RL in logistics. We proposed a method for creating an agent through learning with the proximal policy optimization (PPO) algorithm, and the agent was optimized by configuring various learning options. Applying reinforcement learning resulted in an effectiveness gain of 30% to 100%, and the methods can be extended to other fields.
BRIEF REPORT | doi:10.20944/preprints202302.0210.v1
Subject: Engineering, Control And Systems Engineering Keywords: Aerial Manipulator, Deep Deterministic Policy Gradient, Fuzzy Reinforcement Learning, Sensors
Online: 13 February 2023 (09:09:53 CET)
A detailed literature review is performed in this study to address solutions for the full-body design and control of an aerial manipulator. Deep Reinforcement Learning methods have recently seen growing use in coping with various uncertainties. The pros and cons of these methods are explained, and the advantages of Fuzzy Reinforcement Learning methods are introduced. The state of the art, possible challenges, potential approaches, and a summary of desired precision devices are discussed in this study.
ARTICLE | doi:10.20944/preprints202206.0028.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: reinforcement learning (RL); online learning; mobile health; algorithm design; algorithm evaluation; decision support systems
Online: 2 June 2022 (06:05:06 CEST)
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating RL candidate algorithms using the PCS framework. We illustrate the use of the PCS framework for designing an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
REVIEW | doi:10.20944/preprints202208.0104.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: HEMS; Reinforcement Learning; Deep Neural Network; Q-Value; Policy Gradient; Natural Gradient; Actor-Critic; Residential, Commercial, Academic.
Online: 1 September 2022 (04:27:12 CEST)
The steep rise of reinforcement learning (RL) in various energy applications, together with the growing penetration of home automation in recent years, motivates this article. It surveys the use of RL in various home energy management system (HEMS) applications, with a focus on deep neural network (DNN) models in RL. The article provides an overview of reinforcement learning, followed by discussions of state-of-the-art value-based, policy-based, and actor-critic methods in deep reinforcement learning (DRL). To make the published reinforcement learning literature more accessible to the HEMS community, verbal descriptions are accompanied by explanatory figures and mathematical expressions using standard machine learning terminology. Next, a detailed survey of how reinforcement learning is used in different HEMS domains is presented, including which reinforcement learning algorithms are used in each HEMS application. It suggests that research in this direction is still in its infancy. Lastly, the article proposes four performance metrics for evaluating RL methods.
ARTICLE | doi:10.20944/preprints202010.0413.v1
Subject: Engineering, Civil Engineering Keywords: Real-time Control; Reinforcement Learning; Smart Stormwater Systems; Urban Flooding
Online: 20 October 2020 (15:03:45 CEST)
Climate change and development have increased urban flooding, requiring modernization of stormwater infrastructure. Retrofitting standard passive systems with controllable valves/pumps is promising, but requires real-time control (RTC). One method of automating RTC is reinforcement learning (RL), a general technique for sequential optimization and control in uncertain environments. The notion is that an RL algorithm can use inputs of real-time flood data and rainfall forecasts to learn a policy for controlling the stormwater infrastructure to minimize measures of flooding. In real-world conditions, rainfall forecasts and other state information, are subject to noise and uncertainty. To account for these characteristics of the problem data, we implemented Deep Deterministic Policy Gradient (DDPG), an RL algorithm that is distinguished by its capability to handle noise in the input data. DDPG implementations were trained and tested against a passive flood control policy. Three primary cases were studied: (i) perfect data, (ii) imperfect rainfall forecasts, and (iii) imperfect water level and forecast data. Rainfall episodes (100) that caused flooding in the passive system were selected from 10 years of observations in Norfolk, Virginia, USA; 85 randomly selected episodes were used for training and the remaining 15 unseen episodes served as test cases. Compared to the passive system, all RL implementations reduced flooding volume by 70.5% on average, and performed within a range of 5%. This suggests that DDPG is robust to noisy input data, which is essential knowledge to advance the real-world applicability of RL for stormwater RTC.
ARTICLE | doi:10.20944/preprints201805.0353.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: big data; big data system; energy; district heating; reinforcement learning
Online: 24 May 2018 (16:05:27 CEST)
This paper presents a study on improving the thermal efficiency of user equipment rooms in a district heating system based on reinforcement learning, and suggests a general method of constructing a learning network using deep Q-learning (DQN), a model-free reinforcement learning algorithm. In addition, we introduce a big-data platform and an integrated heat management system for the energy field, for processing the massive data produced by IoT sensors installed in a large number of thermal-energy control facilities.
ARTICLE | doi:10.20944/preprints202210.0360.v1
Subject: Engineering, Control And Systems Engineering Keywords: Reinforcement Learning, Q-learning, Fuzzy Q-learning, Attitude Control, Truss-braced Wing, Flight Control
Online: 24 October 2022 (10:24:33 CEST)
Attitude control of a novel regional truss-braced wing aircraft with low stability characteristics is addressed in this paper using Reinforcement Learning (RL). In recent years, RL has been increasingly employed in challenging applications, particularly autonomous flight control. However, a significant predicament confronting discrete RL algorithms is the dimension limitation of the state-action table, along with difficulties in defining the elements of the RL environment. To address these issues, a detailed mathematical model of the aircraft is first developed to shape an RL environment. Subsequently, Q-learning, the most prevalent discrete RL algorithm, is implemented in both the Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) frameworks to control the longitudinal mode of the air vehicle. To eliminate the residual fluctuations that result from discrete action selection, and to simultaneously track variable pitch angles, a Fuzzy Action Assignment (FAA) method is proposed that generates continuous control commands from the trained Q-table. Accordingly, it is demonstrated that by defining an accurate reward function and observing all crucial states (which is equivalent to satisfying the Markov property), the performance of the introduced control system surpasses that of a well-tuned Proportional-Integral-Derivative (PID) controller.
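The Q-learning-plus-FAA pipeline can be illustrated on a toy 1-D regulation task. Everything below (the dynamics x' = x + 0.5a, the bins, the temperature) is a hypothetical stand-in for the paper's pitch-attitude loop, not its model:

```python
import math
import random

ACTIONS = (-1.0, 0.0, 1.0)                 # discrete control increments

def bin_of(x):
    """Discretize x in [-2, 2] into 9 state bins."""
    return min(8, max(0, int((x + 2.0) / 0.5)))

def train_q(episodes=3000, gamma=0.9, lr=0.2, eps=0.2, seed=0):
    """Tabular Q-learning on a toy regulation task: x' = x + 0.5a,
    reward -|x|, i.e. drive the state to zero."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(9)]
    for _ in range(episodes):
        x = rng.uniform(-2, 2)
        for _ in range(20):
            s = bin_of(x)
            if rng.random() < eps:                       # explore
                a = rng.randrange(len(ACTIONS))
            else:                                        # exploit
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            x = min(2.0, max(-2.0, x + 0.5 * ACTIONS[a]))
            s2, r = bin_of(x), -abs(x)
            q[s][a] += lr * (r + gamma * max(q[s2]) - q[s][a])
    return q

def fuzzy_action(q_row, temp=0.2):
    """FAA sketch: blend the discrete actions with Q-value-derived weights
    to emit one continuous command instead of the chattering arg-max."""
    ws = [math.exp(v / temp) for v in q_row]
    return sum(w * a for w, a in zip(ws, ACTIONS)) / sum(ws)

q = train_q()
```

The point of the fuzzy step is visible in the output: `fuzzy_action` returns a value strictly inside (-1, 1), so the command varies smoothly between the discrete levels rather than switching bang-bang between them.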
COMMUNICATION | doi:10.20944/preprints202211.0234.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: injection laryngoplasty; neck CT; vocal fold localization; deep learning; reinforcement learning; mirror environment
Online: 14 November 2022 (04:16:28 CET)
Transcutaneous injection laryngoplasty is a well-known procedure for treating a paralyzed vocal fold by injecting augmentation material into it. Vocal fold localization therefore plays a vital role in preoperative planning, as the fold location is required to determine the optimal injection route. In this communication, we propose a mirror-environment-based reinforcement learning (RL) algorithm for localizing the right and left vocal folds in preoperative neck CT. RL-based methods have shown noteworthy results on general anatomical landmark localization problems in recent years. However, such methods require training an individual agent for each fold, even though the right and left vocal folds are located in close proximity and have high feature similarity. Exploiting the lateral symmetry between the right and left vocal folds, the proposed mirror environment allows a single agent to localize both folds by treating the left fold as a flipped version of the right fold. Localization of both folds can thus be trained in a single session that exploits the inter-fold correlation and avoids redundant feature learning. An experiment with 120 CT volumes showed improved localization performance and training efficiency compared with the standard RL method.
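The mirror-environment idea can be reduced to a 1-D toy: flip the observation, mirror the action, and a policy written for the right-side landmark localizes the left-side one. Everything below (positions, targets, the hand-coded policy) is hypothetical, not the paper's 3-D CT setting:

```python
def make_env(target):
    """Toy 1-D localization environment on positions 0..9: the agent moves
    a cursor left (-1) or right (+1) toward a hidden target landmark."""
    def step(pos, action):
        new = min(9, max(0, pos + action))
        reward = 1.0 if abs(new - target) < abs(pos - target) else -1.0
        return new, reward
    return step

def run(step, start, policy, steps=12):
    """Roll a policy in the raw environment."""
    pos = start
    for _ in range(steps):
        pos, _ = step(pos, policy(pos))
    return pos

def run_mirrored(step, start, policy, steps=12):
    """Mirror wrapper (the paper's idea, reduced to 1-D): the agent sees
    the flipped position and its action is flipped back, so one policy
    serves both the right-side and the left-side landmark."""
    pos = start
    for _ in range(steps):
        action = policy(9 - pos)        # flipped observation
        pos, _ = step(pos, -action)     # mirrored action
    return pos

def policy(p):
    """Hand-coded stand-in for an agent trained on the right fold (pos 8)."""
    return 1 if p < 8 else 0

right_pos = run(make_env(target=8), start=0, policy=policy)
left_pos = run_mirrored(make_env(target=1), start=9, policy=policy)
```

The same `policy` object drives both rollouts; only the wrapper differs, which is why a single training session suffices in the mirrored formulation.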
ARTICLE | doi:10.20944/preprints201808.0049.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: intelligent driving vehicle; trajectory planning; end-to-end; deep reinforcement learning; model transfer
Online: 2 August 2018 (13:06:39 CEST)
To address model error and tracking dependence in intelligent vehicle motion planning, we propose a model-transfer trajectory planning method for intelligent vehicles based on deep reinforcement learning, which directly obtains an effective control-action sequence. First, an abstract model of the real environment is extracted. On this basis, Deep Deterministic Policy Gradient (DDPG) and a vehicle dynamics model are used to jointly train a reinforcement learning model that decides the optimal driving maneuver. Second, the actual scene is transferred to an equivalent virtual abstract scene by the transfer model, and the control-action and trajectory sequences are calculated with the trained deep reinforcement learning model. Third, the optimal trajectory sequence is selected according to an evaluation function in the real environment. Finally, the results demonstrate that the proposed method can handle intelligent vehicle trajectory planning with continuous inputs and continuous outputs. The model-transfer method improves generalization performance. Compared with traditional trajectory planning, the proposed method outputs a continuous steering-angle control sequence while also reducing lateral control error.
ARTICLE | doi:10.20944/preprints202211.0011.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: Wi-Fi; contention-based access scheme; channel utilization optimization; machine learning; reinforcement learning; NS-3; NS3-gym
Online: 1 November 2022 (02:39:05 CET)
The collision avoidance mechanism adopted by the IEEE 802.11 standard is not optimal. The mechanism employs a binary exponential backoff (BEB) algorithm in the medium access control (MAC) layer. Such an algorithm increases the backoff interval whenever a collision is detected to minimize the probability of subsequent collisions. However, the expansion of the backoff interval degrades radio spectrum utilization (i.e., wastes bandwidth). The problem worsens when the network has to manage channel access for a large number of stations, leading to a dramatic decrease in network performance. Furthermore, a wrong backoff setting increases the probability of collisions, so stations experience numerous collisions before achieving the optimal backoff value. Therefore, to mitigate bandwidth wastage and, consequently, maximize network performance, this work proposes using reinforcement learning (RL) algorithms, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG), to tackle this optimization problem. For the simulations, the NS-3 network simulator is used along with NS3-gym, a toolkit that integrates an RL framework into NS-3. The results demonstrate that DQN and DDPG perform much better than BEB in both static and dynamic scenarios, regardless of the number of stations. Moreover, the performance difference is amplified as the number of stations increases, with DQN and DDPG showing a 27% increase in throughput with 50 stations compared to BEB. Furthermore, DQN and DDPG presented similar performances.
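To illustrate the optimization target with something far simpler than the paper's DQN/DDPG-over-NS-3 setup: in a slotted model where each of n stations transmits with probability 1/CW, a slot succeeds iff exactly one station transmits, and an agent can learn which contention window maximizes that success rate (the analytic optimum is near CW = n). The model and all numbers below are assumptions, not the paper's simulation:

```python
import random

CWS = (8, 16, 32, 64, 128)                   # candidate contention windows

def train_cw(n_stations=50, episodes=30000, lr=0.01, eps=0.1, seed=0):
    """Bandit-style RL over contention windows: reward 1 when exactly one
    station transmits in the slot (a successful transmission), 0 otherwise;
    epsilon-greedy selection with an exponential-average value estimate."""
    rng = random.Random(seed)
    q = [0.0] * len(CWS)
    for _ in range(episodes):
        if rng.random() < eps:                       # explore a random CW
            a = rng.randrange(len(CWS))
        else:                                        # exploit the best so far
            a = max(range(len(CWS)), key=lambda i: q[i])
        p = 1.0 / CWS[a]
        transmitters = sum(rng.random() < p for _ in range(n_stations))
        reward = 1.0 if transmitters == 1 else 0.0
        q[a] += lr * (reward - q[a])
    return q

q = train_cw()
best_cw = CWS[max(range(len(CWS)), key=lambda i: q[i])]
```

With 50 stations the small windows (CW = 8, 16) collide far too often, which mirrors the abstract's point that a wrong backoff setting wastes bandwidth; the learned choice lands on one of the larger windows.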
REVIEW | doi:10.20944/preprints202111.0044.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep reinforcement learning; model-based RL; hierarchy; trading; cryptocurrency; foreign exchange; stock market; risk; prediction; reward shaping
Online: 2 November 2021 (10:57:23 CET)
Deep reinforcement learning (DRL) has achieved significant results in many Machine Learning (ML) benchmarks. In this short survey we provide an overview of DRL applied to trading on financial markets, including a short meta-analysis using Google Scholar, with an emphasis on using hierarchy for dividing the problem space as well as using model-based RL to learn a world model of the trading environment which can be used for prediction. In addition, multiple risk measures are defined and discussed, which not only provide a way of quantifying the performance of various algorithms, but they can also act as (dense) reward-shaping mechanisms for the agent. We discuss in detail the various state representations used for financial markets, which we consider critical for the success and efficiency of such DRL agents. The market in focus for this survey is the cryptocurrency market.
ARTICLE | doi:10.20944/preprints202305.0944.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: Artificial intelligence of thing; Message queue system; Reinforcement learning approach; System throughput
Online: 12 May 2023 (11:55:58 CEST)
The convergence of artificial intelligence and the Internet of Things (IoT) has made remarkable strides in the realm of industry. The IoT devices in this computing model gather data from various sources and send it to edge servers for real-time processing, which presents a challenge to the existing message queue systems in Artificial Intelligence of Things (AIoT) Edge computing. These systems lack the adaptability to respond to the current state of the system, such as changes in the number of devices, message size, and frequency, and optimize the message transmission mechanism accordingly. Hence, it is critical to devise an approach that can effectively decouple message processing and mitigate workload fluctuations in the AIoT computing environment. To this end, this study introduces a distributed message queue system that is specifically tailored to the AIoT edge environment, utilizing a reinforcement learning approach to optimize message queue performance. Empirical findings reveal that this pioneering method significantly improves system throughput while handling varying message scenarios.
ARTICLE | doi:10.20944/preprints202302.0236.v1
Subject: Engineering, Civil Engineering Keywords: Tunnelling; Tunnel Boring Machine; Support pressure; Face stability; Reinforcement Learning; Machine Learning; Deep-Q-Network
Online: 14 February 2023 (06:10:35 CET)
In tunnel excavation with boring machines, the tunnel face is supported to avoid collapse and minimise settlement. This article proposes the use of reinforcement learning, specifically the Deep Q-Network algorithm, to predict the face support pressure. The approach is tested both analytically and numerically. By using the soil properties ahead of the tunnel face and the overburden depth as the input, the algorithm is capable of predicting the optimal tunnel face support pressure, adapting to changes in geological and geometrical conditions.
ARTICLE | doi:10.20944/preprints202211.0190.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: sustainability; smart cities; Internet of Things (IoT); multi-agent deep reinforcement learning; smart waste management; smart sensors
Online: 10 November 2022 (04:49:09 CET)
Over the last decade, the ever-increasing need to improve the livability of cities and outcomes for their residents has driven the adoption of technology to develop urbanised societies around the world, giving rise to smart cities. With the speed at which the world population is growing, the use of the Internet of Things in smart cities has greatly advanced the quality of life. One significant area of concern within the smart city framework is waste management. If the waste within a city is not adequately managed, it leads to health issues for its citizens. Additionally, waste management has a high impact on the environmental footprint, so a smart way of managing waste is of critical importance. Through our research, we analyse the challenges of waste management within a city to understand the impact of the problem on citizens and overall city operations. We then investigate ways to solve these problems using emerging technologies, such as the Internet of Things, to collect large volumes of valuable data arriving at an astronomical rate, and then apply multi-agent deep reinforcement learning algorithms to harness the power of big data and extract meaningful information and actionable insights. We ingest the data generated by our Internet of Things devices into our algorithm for three main purposes: providing notifications to an external system (for example, a map navigation engine, out of scope for this project but a future extension for route optimisation and waste vehicle tracking); extracting and reporting actionable insights from the underlying data; and consuming the extracted data for predictive forecasting, to draw out unknown patterns of waste fill levels across geographical locations and again send triggers and notifications to external systems (for example, a waste collection authority that can efficiently schedule waste collection vehicles and optimise their routes).
To achieve these outcomes, we propose a framework that is agnostic of the hardware it connects to and can effectively interface with a wide variety of hardware, keeping a level of abstraction in the architecture.
ARTICLE | doi:10.20944/preprints202111.0514.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: millimeter bands; fifth Generation; Handover; Deep Reinforcement Learning; and Jump Markov Linear System
Online: 29 November 2021 (07:50:19 CET)
The fifth Generation (5G) mobile networks use millimeter Waves (mmWaves) to offer gigabit data rates. However, unlike microwaves, mmWave links are prone to user and topographic dynamics. They easily get blocked and end up forming irregular cell patterns for 5G. This in turn causes too-early, too-late, or wrong handoffs (HOs). To mitigate HO challenges, sustain connectivity, and avert unnecessary HOs, we propose a HO scheme based on the Jump Markov Linear System (JMLS) and Deep Reinforcement Learning (DRL). JMLS is widely known to account for abrupt changes in system dynamics. DRL likewise emerges as an artificial intelligence technique for learning highly dimensional and time-varying behaviors. We combine the two techniques to account for time-varying, abrupt, and irregular changes in mmWave link behaviour by predicting likely deterioration patterns of target links. The prediction is optimized by meta-training techniques that also reduce the training sample size. Thus, the JMLS-DRL platform formulates intelligent and versatile HO policies for 5G. Results show that the proposed scheme reliably predicts target link behavior after HO. The scheme also averts unnecessary HOs and thus ably supports longer dwell times.
ARTICLE | doi:10.20944/preprints202203.0161.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: multi-agent systems; multi-agent reinforcement learning; internet of vehicles; urban area
Online: 11 March 2022 (05:13:15 CET)
Smart Internet of Vehicles (IoVs) combined with Artificial Intelligence (AI) will contribute to vehicle decision-making in the Intelligent Transportation System (ITS). The Multi-Vehicle Pursuit (MVP) game, in which multiple vehicles cooperate to capture mobile targets, is gradually becoming a hot research topic. Although there are some achievements in the field of MVP in open-space environments, urban areas bring complicated road structures and restricted moving spaces as challenges to the resolution of MVP games. We define an Observation-constrained MVP (OMVP) problem in this paper and propose a Transformer-based Time and Team Reinforcement Learning scheme (T3OMVP) to address it. First, a new multi-vehicle pursuit model is constructed based on decentralized partially observed Markov decision processes (Dec-POMDP) to instantiate this problem. Second, by introducing and modifying the transformer-based observation sequence, QMIX is redefined to adapt to the complicated road structure, restricted moving spaces, and constrained observations, so as to control vehicles to pursue the target by combining the vehicles' observations. Third, a multi-intersection urban environment is built to verify the proposed scheme. Extensive experimental results demonstrate that the proposed T3OMVP scheme achieves significant improvements of 9.66%~106.25% over state-of-the-art QMIX approaches. Code is available at https://github.com/pipihaiziguai/T3OMVP.
ARTICLE | doi:10.20944/preprints202308.2135.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: episodic memory; deep reinforcement learning; hierarchical reinforcement learning
Online: 31 August 2023 (09:39:49 CEST)
Deep reinforcement learning is one of the research hotspots in artificial intelligence and has been successfully applied in many research areas; however, low training efficiency and a high demand for samples limit its application. To address these problems, a hierarchical episodic control model extending episodic memory to the domain of hierarchical reinforcement learning is proposed in this paper. The model is theoretically justified and employs a hierarchical implicit memory planning approach for counterfactual trajectory value estimation. Starting from the final step and recursively moving back along the trajectory, a hidden plan is formed within the episodic memory. Experience is aggregated both along trajectories and across trajectories, and the model is updated using multi-headed backpropagation similar to bootstrapped neural networks. This model extends the parameterized episodic memory framework to the realm of hierarchical reinforcement learning and is theoretically analyzed to demonstrate its convergence and effectiveness. Experiments conducted in Four Room, Mujoco, and UE4-based active tracking environments highlight that the hierarchical episodic control model effectively enhances training efficiency. It demonstrates notable improvements in both low-dimensional and high-dimensional environments, even in cases of sparse rewards.
REVIEW | doi:10.20944/preprints202211.0431.v1
Subject: Social Sciences, Behavior Sciences Keywords: adventitious reinforcement; induction; noncontingent reinforcement; response-independent schedules; superstition
Online: 23 November 2022 (03:34:03 CET)
In 1948, Skinner described the behavior of pigeons under response-independent schedules as “superstitious,” and proposed that the responses were reinforced by contiguous, adventitious food deliveries. Subsequently, response-independent schedules have been of interest to both basic and applied researchers, first to understand the mechanisms involved, and later, as “noncontingent reinforcement” (NCR) to reduce undesirable behavior. However, the potential superstitious effects produced by these schedules have been challenged, with some researchers arguing that antecedent variables play a significant role. This paper examines the evidence for adventitious reinforcement from both laboratory and applied research, the results of which suggest that antecedent, non-operant functions may be important in fully understanding the effects of NCR. We propose an applied-basic research synthesis, in which attention to potential non-operant functions could provide a more complete understanding of response-independent schedules. We conclude with a summary of the applied implications of the non-operant functions of NCR schedules.
ARTICLE | doi:10.20944/preprints202304.0891.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: mobile edge computing (MEC); non-orthogonal multiple access (NOMA); video offloading; resource allocation; deep reinforcement learning (DRL)
Online: 25 April 2023 (07:03:35 CEST)
With the proliferation of video surveillance system deployments and related applications, real-time video analysis is critical for achieving intelligent monitoring, autonomous driving, etc. It is non-trivial to achieve high-accuracy and low-latency video stream analysis through traditional cloud computing. In this paper, we propose a non-orthogonal multiple access (NOMA)-based edge real-time video analysis framework with one edge server (ES) and multiple user equipments (UEs). A cost minimization problem composed of delay, energy, and accuracy is formulated to improve the QoE of UEs. In order to solve this problem efficiently, we propose a joint video frame resolution scaling, task offloading, and resource allocation algorithm based on the Deep Q-Learning Network (JVFRS-TO-RA-DQN), which effectively overcomes the sparsity of the single-layer reward function and accelerates training convergence. JVFRS-TO-RA-DQN consists of two DQN networks to reduce the curse of dimensionality, which select the offloading and resource allocation actions and the resolution scaling action, respectively. Experimental results show that JVFRS-TO-RA-DQN can effectively reduce the cost of transmission and computation and has better convergence performance compared to other baselines.
COMMUNICATION | doi:10.20944/preprints202104.0575.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Ethnopharmacology; Artificial Intelligence; Web Crawling; Active Learning; Reinforcement Learning; Text Mining; Big Data
Online: 23 June 2021 (11:47:32 CEST)
Ethnopharmacology experts face several challenges when identifying and retrieving documents and resources related to their scientific focus. The volume of sources that need to be monitored, the variety of formats utilized, and the varying quality of language use across sources present some of what we call “big data” challenges in the analysis of this data. This study aims to understand if and how experts can be supported effectively through intelligent tools in the task of ethnopharmacological literature research. To this end, we utilize a real case study of ethnopharmacology research focused on the Southern Balkans and the coastal zone of Asia Minor, and we propose a methodology for more efficient research in ethnopharmacology. Our work follows an “Expert-Apprentice” paradigm in an automatic URL extraction process through crawling, where the Apprentice is a Machine Learning (ML) algorithm utilizing a combination of Active Learning (AL) and Reinforcement Learning (RL), and the Expert is the human researcher. ML-powered research improved the domain expert's effectiveness 3.1-fold and efficiency 5.14-fold, fetching a total of 420 relevant ethnopharmacological documents in only 7 hours versus an estimated 36-hour human-expert effort. Therefore, utilizing Artificial Intelligence (AI) tools to support the researcher can boost the efficiency and effectiveness of the identification and retrieval of appropriate documents.
ARTICLE | doi:10.20944/preprints202207.0110.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: associative learning; molecular circuits; synthetic biology; mathematical modeling; Hill equation; Pavlov’s dog; reinforcement; dissociation; non-dimensionalization
Online: 7 July 2022 (04:38:20 CEST)
The development of synthetic biology has enabled us to make massive progress in biotechnology and to approach research questions from a brand new perspective. In particular, the design and study of gene regulatory networks in vitro, in vivo, and in silico have played an increasingly indispensable role in understanding and controlling biological phenomena. Among these questions, it is of great interest to understand how associative learning is formed at the molecular circuit level. Notably, mathematical models have been increasingly used to predict the behaviors of molecular circuits. Fernando's model, which is thought to be one of the first works in this line of research using the Hill equation, attempted to design a synthetic circuit that mimics Hebbian learning in the neural network architecture. In this article, we carry out an in-depth computational analysis of the model and demonstrate that the reinforcement effect can be achieved by choosing the proper parameter values. We also construct a novel circuit that can demonstrate forced dissociation, which was not observed in Fernando's model. Our work can readily serve as a reference for synthetic biologists who consider implementing circuits of this kind in biological systems.
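The Hill equation at the core of this line of modeling can be sketched directly. Below are the standard activating and repressing Hill functions (the parameter values are illustrative defaults, not values from the article):

```python
def hill_activation(x, K=1.0, n=2):
    """Activating Hill function: fractional activation as a function of
    regulator concentration x, with half-saturation constant K and
    Hill coefficient n (cooperativity)."""
    return x ** n / (K ** n + x ** n)

def hill_repression(x, K=1.0, n=2):
    """Repressing form: activity falls as the repressor x accumulates;
    the two forms sum to 1 for the same x, K, and n."""
    return K ** n / (K ** n + x ** n)
```

At x = K both forms give exactly one half, and larger n sharpens the switch-like transition, which is what makes Hill kinetics a convenient building block for circuits that mimic learning behavior.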
ARTICLE | doi:10.20944/preprints202212.0038.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Reconfigurable intelligent surface (RIS); integrated satellite-HAP-terrestrial networks (IS-HAP-TNs); deep reinforcement learning (DRL); optimization performance
Online: 2 December 2022 (02:35:56 CET)
In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted integrated satellite-high altitude platform-terrestrial network (IS-HAP-TN) that can improve network performance by exploiting the HAP's stability and the RIS's reflection. Specifically, the reflector RIS is installed on the side of the HAP to reflect signals from multiple ground user equipments (UEs) to the satellite. To maximise the system sum rate, we jointly optimize the transmit beamforming matrix at the ground UEs and the RIS phase shift matrix. Due to the unit-modulus constraint on the RIS reflective elements, this combinatorial optimization problem is difficult to tackle effectively with traditional solving methods. Therefore, this paper studies a deep reinforcement learning (DRL) algorithm to achieve online decision making for this joint optimization problem. In addition, simulation experiments verify that the proposed DRL algorithm outperforms the standard scheme in terms of system performance, execution time, and computing speed, making real-time decision making truly feasible.
ARTICLE | doi:10.20944/preprints202201.0147.v1
Subject: Engineering, Civil Engineering Keywords: hybrid FRP-steel reinforcement; ductility; hybrid reinforcement ratio; fiber element; neutral axis
Online: 11 January 2022 (14:04:26 CET)
An experimental study was carried out to evaluate the ductility of reinforced concrete beams longitudinally reinforced with hybrid FRP-steel bars. The specimens were fourteen reinforced concrete beams with and without hybrid reinforcement. The test variables were bar position, the longitudinal reinforcement ratio, and the type of FRP bars. The beams were loaded up to failure using a four-point bending test. The performance of the tested beams was observed using the load-deflection curve obtained from the test. Numerical analysis using the fiber element model was used to examine the growth of neutral axis depth due to the effect of the test variables. The neutral axis curves were then used to further estimate the neutral axis angle and the neutral axis displacement index. The test results show that the position of the reinforcement greatly influences the flexural behavior of beams with hybrid reinforcement. It was observed that the flexural capacity of beams with hybrid reinforcement is 4% to 50% higher than that of beams with conventional steel bars, depending on bar position and the longitudinal reinforcement ratio. The ductility decreases as the hybrid reinforcement ratio (Af/As) increases. This study also showed that the numerical model developed can predict the flexural behavior of beams with hybrid reinforcement with reasonable accuracy.
Subject: Computer Science And Mathematics, Computer Science Keywords: reinforcement learning; bitrate streaming; world-models; video streaming; model-based reinforcement learning
Online: 20 August 2020 (07:02:57 CEST)
Adaptive bitrate (ABR) algorithms optimize the quality of streaming experiences for users in client-side video players, especially in unreliable or slow mobile networks. Several rule-based heuristic algorithms can achieve stable performance, but they sometimes fail to adapt properly to changing network conditions. Fluctuating bandwidth may cause algorithms to default to behavior that creates a negative experience for the user. ABR algorithms can be generated with reinforcement learning, a decision-making paradigm in which an agent learns to make optimal choices through interactions with an environment. Training reinforcement learning algorithms for bitrate streaming requires building a simulator for an agent to experience interactions quickly; training an agent in the real environment is infeasible due to the long step times. This project explores using supervised learning to construct a world-model, or learned simulator, from recorded interactions. A reinforcement learning agent trained inside the learned model, rather than a hand-built simulator, can outperform rule-based heuristics. Furthermore, agents trained inside the learned world-model can outperform model-free agents in low-sample regimes. This work highlights the potential for world-models to quickly learn simulators and to be used to generate optimal policies.
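The world-model idea described above reduces to supervised next-state prediction on logged transitions. The sketch below is a toy illustration under invented linear dynamics (the state variables, dynamics, and linear model class are all our assumptions; a real ABR world-model would be a neural network over buffer level, throughput history, and chunk sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real environment": state = (buffer_level, throughput),
# action = bitrate index. Used here only to generate a log of transitions.
def true_step(s, a):
    return np.array([0.9 * s[0] - 0.1 * a + 0.2 * s[1], 0.95 * s[1]])

X, Y = [], []
s = np.array([1.0, 1.0])
for _ in range(200):
    a = rng.integers(0, 3)
    X.append(np.concatenate([s, [a]]))
    s_next = true_step(s, a) + rng.normal(0.0, 0.01, 2)  # noisy observation
    Y.append(s_next)
    s = s_next

# Supervised world-model: least-squares next-state predictor fit to the log.
W, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)

def model_step(s, a):
    """Learned simulator: predicts the next state from (state, action)."""
    return np.concatenate([s, [a]]) @ W
```

An agent can then be trained on imagined rollouts of `model_step` instead of the slow real environment, which is the core of the world-model approach.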
ARTICLE | doi:10.20944/preprints202010.0156.v1
Subject: Engineering, Energy And Fuel Technology Keywords: Artificial intelligence; Deep reinforcement learning; Demand Response; Dynamic pricing; Energy management system; Microgrid; Neural networks; Price-responsive loads; Smart grid; Thermostatically controlled loads
Online: 7 October 2020 (11:21:03 CEST)
In this paper, we study the performance of various deep reinforcement learning algorithms to enhance the energy management system of a microgrid. We propose a novel microgrid model that consists of a wind turbine generator, an energy storage system, a set of thermostatically controlled loads, a set of price-responsive loads, and a connection to the main grid. The proposed energy management system is designed to coordinate among the different flexible sources by defining the priority resources, direct demand control signals, and electricity prices. Seven deep reinforcement learning algorithms were implemented and are empirically compared in this paper. The numerical results show that the deep reinforcement learning algorithms differ widely in their ability to converge to optimal policies. By adding an experience replay and a semi-deterministic training phase to the well-known asynchronous advantage actor-critic algorithm, we achieved the highest model performance as well as convergence to near-optimal policies.
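The experience-replay component that this paper adds to the asynchronous advantage actor-critic algorithm is a standard building block: stored transitions are sampled uniformly so that updates see decorrelated experience. A minimal sketch (the class and method names are ours, not the paper's):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay: stores (s, a, r, s', done) transitions
    and samples decorrelated minibatches for off-policy updates."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of trajectories.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In an energy-management setting, each transition would hold the microgrid state (e.g., storage level, prices, load) and the chosen control action; the agent periodically samples minibatches from the buffer to update its networks.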
ARTICLE | doi:10.20944/preprints202306.1005.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: Continuous Authentication; Static Authentication; Behavioral Biometrics; Reinforcement Learning (RL); Q-learning; Keystroke Dynamics; Anomaly Detection; Machine Learning; Supervised Learning; User Authentication; Identification
Online: 14 June 2023 (07:42:17 CEST)
This article focuses on developing a continuous authentication system using behavioral biometrics to recognize users accessing computing devices. The user's distinct behavioral biometric is captured through keystroke dynamics, and reward-based reinforcement learning (RL) ideas are applied to recognize users throughout the session. The suggested system adds an extra layer of security to traditional authentication methods, forming a robust continuous authentication system that can be added to static authentication systems. The methodology involves training an RL model to detect unusual user typing patterns and flag suspicious activity. Each user has an agent trained on their historical data, which is preprocessed and used to create episodes for the agent to learn from. The environment involves fetching observations and randomly corrupting them to learn out-of-order behavior. The observation vector includes both running features and summary features. The reward function is binary and minimalistic. A Principal Component Analysis (PCA) model is used to encode the running features, and the Double Deep Q-Network (DDQN) algorithm with a fully connected neural network is used as the policy net. The evaluation achieved an average training accuracy and EER (equal error rate) of 94.7% and 0.0126, and a test accuracy and EER of 81.06% and 0.0323, for all users when the number of encoder features was increased. Therefore, it is concluded that by continuously learning and adapting to changing behavior patterns, this approach can provide more secure and personalized authentication, lowering the possibility of unauthorized access and cyberattacks. Overall, the use of reinforcement learning and behavioral biometrics for continuous authentication has the potential to significantly enhance security in the digital age and is effective in identifying each user.
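The Double Deep Q-Network algorithm named above decouples action selection from action evaluation: the online network picks the greedy next action, and the target network scores it, which reduces the overestimation bias of vanilla DQN. A minimal sketch of the target computation (the array shapes and names are illustrative, not taken from the article):

```python
import numpy as np

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN bootstrap target for one transition.

    next_q_online / next_q_target: Q-value vectors over actions at the
    next state, from the online and target networks respectively."""
    if done:
        return reward  # terminal transition: no bootstrap term
    best_action = int(np.argmax(next_q_online))   # online net selects
    return reward + gamma * next_q_target[best_action]  # target net evaluates
```

The policy net is then regressed toward these targets for sampled transitions, exactly as in standard DQN training loops.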
ARTICLE | doi:10.20944/preprints202207.0461.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Malaria; digital; epidemic; mixed infections; reinforcement
Online: 29 July 2022 (11:25:46 CEST)
Malaria is a long-standing disease and one of the top life-threatening diseases, yet its treatment has not changed, while the world has already embraced the Fourth Industrial Revolution (4IR). A wave of research on digitizing monitoring mechanisms for such a deadly disease has surfaced. Automated malaria screening is one of the detection processes gaining popularity in the research domain. However, the process needs to be coupled with other processes aiming at a nationally or regionally contextualised malaria monitoring system. This paper proposes a digital malaria monitoring system in the context of an African country or region. One advantage of such a digital system is that it enables a novel disease spread forecasting model based on the dynamics of different malaria types. The architecture of the diagnosis system is described, and the disease spread is mathematically modelled as a SPITR (Susceptible-Protected-Infected-Treated-Recovered) epidemic model, which is further analysed. The forecasting model is expressed and analysed, and experiments are conducted using a Monte Carlo simulation method. The design of the monitoring system has inspired how predictions can be made in complex cases such as mixed infections. Results show that reinforcing the model parameters makes a significant improvement in disease prediction.
ARTICLE | doi:10.20944/preprints202110.0408.v1
Subject: Chemistry And Materials Science, Polymers And Plastics Keywords: PA66; PA66GF; weaves; reinforcement; overmoulding; composites
Online: 27 October 2021 (12:43:39 CEST)
The need to develop novel lightweight materials and their manufacturing processes arises from new aerospace, automotive, and construction requirements. Within this context, this research work proposes to develop a novel thermoplastic composite material with high mechanical properties. These composites will be based on thermoplastic matrices made from polyamide and 35% short-glass-fiber-filled polyamide, reinforced with different types of fabrics. As reinforcement, glass fiber fabrics will be used as the base. They will be treated with different processes, both chemical and physical, to promote adherence to the matrix. Textile overmoulding technology was selected for manufacturing these composites. This technology was primarily developed to manufacture aesthetic lined components and has achieved widespread adoption. Once these new composites are manufactured, they will be submitted to different tests to evaluate their behavior regarding adhesion, impact strength, and stiffness. An improvement in stiffness and impact absorption is expected.
REVIEW | doi:10.20944/preprints201804.0019.v1
Subject: Engineering, Civil Engineering Keywords: building rehabilitation; energy efficiency; seismic reinforcement
Online: 2 April 2018 (10:26:38 CEST)
Most European cities are characterized by very large areas, often formed by low-quality buildings, from several points of view. The possibility of renovating them is strategic both to improve the quality of life and to enable economic recovery for building companies. In the last decades, the attention of the scientific community has been addressed to energy renovation, thanks to the strong activities of the European Community in this field. However, since a relevant part of the EC territory is at risk of earthquake, the possibility of combining energy and seismic renovation actions may be strategic for many countries. In particular, Italy and Romania are linked by a common social tradition that springs from the Roman Empire. Nowadays, this link is stronger, thanks to common interests in social, cultural, and business fields. Therefore, the investigation of possible synergies between seismic and energy renovation strategies may be of real interest for both countries. This paper represents the first step in this direction. After an overview of regulations and common practices for buildings with reinforced concrete structures in both states, some key combined renovation interventions are described and discussed, as well as the advantages and perspectives of integrated renovation approaches.
ARTICLE | doi:10.20944/preprints201704.0118.v2
Subject: Engineering, Civil Engineering Keywords: bond; concrete; reinforcement; damage-plasticity; failure
Online: 25 August 2017 (08:01:21 CEST)
The structural performance of reinforced concrete relies heavily on the bond between reinforcement and concrete. In nonlinear finite element analyses, bond is modelled either by merged (also called perfect bond) or by coincident-with-slip (also called bond-slip) approaches. Here, the performance of these two approaches for modelling the failure of reinforced concrete was investigated using a damage-plasticity constitutive model in LS-DYNA. Firstly, the influence of element size on the response of tension-stiffening analyses with the two modelling approaches was investigated. Then, the results of the two approaches were compared for plain and fibre-reinforced tension stiffening and a drop-weight impact test. It was shown that only the coincident-with-slip approach provided mesh-insensitive results. However, both approaches were capable of reproducing the overall response of the experiments, in the form of loads and displacements, satisfactorily for the meshes used.
ARTICLE | doi:10.20944/preprints202101.0115.v1
Subject: Physical Sciences, Acoustics Keywords: machine learning; virtual diagnostics; reinforcement learning control
Online: 6 January 2021 (11:58:41 CET)
We discuss the implementation of a suite of virtual diagnostics at the FACET-II facility currently under commissioning at SLAC National Accelerator Laboratory. The diagnostics will be used for prediction of the longitudinal phase space along the linac, spectral reconstruction of the bunch profile, and non-destructive inference of transverse beam quality (emittance) using edge radiation at the injector dogleg and bunch compressor locations. These measurements will be folded into adaptive feedbacks and ML-based reinforcement learning controls to improve the stability and optimize the performance of the machine for different experimental configurations. In this paper we describe each of these diagnostics with expected measurement results based on simulation data and discuss progress towards implementation in regular operations.
ARTICLE | doi:10.20944/preprints201805.0348.v1
Subject: Engineering, Mechanical Engineering Keywords: failure criteria; curauá fibers; reinforcement direction; ANOVA.
Online: 24 May 2018 (10:23:36 CEST)
Natural fibers are being increasingly used in different areas of engineering, including as composite reinforcement. Among these fibers, curauá stands out for its good mechanical properties and adherence to resin. Nevertheless, little is known about the behavior of this material in the manufacture of a composite or whether classic failure theory can be used in this case. In this context, the present study assesses the mechanical properties of two laminas made of unidirectional curauá fiber with volumetric fiber percentages of 30% and 22%, and compares the results with the values obtained for four failure criteria reported in the literature, using analysis of variance (ANOVA). To that end, tensile tests were conducted in the direction of the fiber and at other loading angles, in addition to Iosipescu shear tests. The results show that the maximum stress criterion does not represent the failure behavior of these materials and that the best fit was obtained with the Hashin criterion.
ARTICLE | doi:10.20944/preprints201608.0220.v1
Subject: Chemistry And Materials Science, Polymers And Plastics Keywords: composite fibers; flexural strength; polyester matrix; reinforcement
Online: 29 August 2016 (10:46:01 CEST)
Composite fiber materials are superior materials due to their high strength and light weight. Composites reflect the properties of their constituents in proportion to the volume fraction of each phase. There are different fiber reinforcement types, and each affects the flexural, tensile, and compressive strength. When selecting a composite for a specific application, the forces exerted on the composite must be known in order to determine the reinforcement type. Unidirectional fiber reinforcement allows very strong load resistance, but only in one direction, whereas randomly oriented fiber reinforcement resists less load but maintains that resistance in all directions. These materials are said to be anisotropic. Certain composite fibers, taking their weights into consideration, are physically stronger than conventional metals. This research deals with the analysis of three composite materials with different reinforcement types, volume fractions, and phase contents. It was found that material A (glass epoxy) was the strongest, with a flexural strength of 534 MPa in the longitudinal and 420 MPa in the transverse direction. The flexural stresses of material B (glass silicone) and material C (glass polyester) were both found in the 120 to 135 MPa range. Differences were due to differences in matrix composition and reinforcement type.
BRIEF REPORT | doi:10.20944/preprints202307.0118.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Graph Neural Networks; Reinforcement Learning; Over-smoothing; Heterophily
Online: 4 July 2023 (03:28:51 CEST)
This report gives a comprehensive summary of two challenges facing graph convolutional networks (GCNs), over-smoothing and heterophily, and outlines future directions to explore.
ARTICLE | doi:10.20944/preprints202304.1162.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Reinforcement learning; Decision tree; Explainable AI; Rule learning
Online: 28 April 2023 (10:14:59 CEST)
The demand for explainable and transparent models increases with the continued success of reinforcement learning. In this article, we explore the potential of generating shallow decision trees (DTs) as simple and transparent surrogate models for opaque deep reinforcement learning (DRL) agents. We investigate three algorithms for generating training data for axis-parallel and oblique DTs with the help of DRL agents ("oracles") and evaluate these methods on classic control problems from OpenAI Gym. The results show that one of our newly developed algorithms, iterative training, outperforms traditional sampling algorithms, resulting in well-performing DTs that often even surpass the oracle from which they were trained. Even higher-dimensional problems can be solved with surprisingly shallow DTs. We discuss the advantages and disadvantages of the different sampling methods and the insights into the decision-making process made possible by the transparent nature of DTs. Our work contributes to the development of not only powerful but also explainable RL agents and highlights the potential of DTs as a simple and effective alternative to complex DRL models.
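The iterative training idea described in this abstract (relabeling states visited by the current tree with the DRL "oracle" and retraining, in the spirit of DAgger) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hand-coded oracle rule, the random-state environment stand-in, and the depth/round hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def oracle(state):
    """Stand-in for a trained DRL 'oracle' policy: a hand-coded rule
    (act 1 when the third state feature is positive), purely illustrative."""
    return int(state[2] > 0.0)

def rollout(policy, n_steps=200):
    """Toy stand-in for environment interaction: random states.
    A real version would step an OpenAI Gym env with the given policy."""
    return rng.uniform(-1, 1, size=(n_steps, 4))

# Round 0: label oracle-visited states, fit a shallow surrogate tree.
X = rollout(oracle)
y = np.array([oracle(s) for s in X])
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Iterative training: states visited under the DT policy are relabeled
# by the oracle and added to the training set each round.
for _ in range(5):
    visited = rollout(lambda s: tree.predict([s])[0])
    X = np.vstack([X, visited])
    y = np.concatenate([y, [oracle(s) for s in visited]])
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

agree = tree.predict(X) == y
print(f"oracle agreement on collected states: {agree.mean():.2f}")
```

The resulting depth-3 tree is directly inspectable (e.g. via `sklearn.tree.export_text`), which is the transparency argument the abstract makes.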
REVIEW | doi:10.20944/preprints202211.0302.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: AI; UAV; IoT; mobile edge computing; reinforcement learning
Online: 16 November 2022 (09:06:37 CET)
With the rapid development of the Internet of Things (IoT), the number of devices in the network is increasing dramatically, which creates the challenge that infrastructure such as base stations alone cannot provide high-quality service to all devices. Therefore, due to their flexibility and economy, unmanned aerial vehicles (UAVs) are widely used to increase the performance of IoT networks. UAVs can not only provide communication services for IoT devices in the absence of a network, but can also perform video surveillance, cargo transportation, pesticide spraying, and other specialized tasks. However, due to the complexity of the scenario and the need for real-time decision making, it is challenging to schedule UAVs in the network using traditional optimization methods, and growing attention has focused on using AI to optimize UAVs in the network. In this paper, we focus on AI-enabled UAV optimization methods in IoT networks and give a comprehensive overview of what AI-enabled methods to use, and how, to increase the performance of UAV-assisted IoT networks. Moreover, a brief analysis of the challenges of using AI methods in IoT networks and some potential research directions are given.
ARTICLE | doi:10.20944/preprints202203.0119.v1
Subject: Engineering, Automotive Engineering Keywords: smart scheduling; smart reservations; reinforcement learning; electric vehicle charging; electric vehicle charging management platform; neural network; DQN reinforcement Learning algorithm
Online: 8 March 2022 (08:54:48 CET)
The widespread adoption of electromobility constitutes one of the measures designed to reduce air pollution caused by traditional fossil fuels. However, several factors are currently impeding this process, ranging from insufficient charging infrastructure, battery capacity, and long queueing and charging times, to psychological factors. On top of range anxiety, the frustration of EV drivers is further fueled by the uncertainty of finding an available charging point on their route. To address this issue, we propose a solution that bypasses the limitations of the Reserve now function of the OCPP standard, enabling drivers to make charging reservations for the upcoming days, especially when planning a longer trip. We created an algorithm that generates reservation intervals based on the charging station's reservation and transaction history. Subsequently, we ran a series of test cases that yielded promising results, with no overlapping reservations.
ARTICLE | doi:10.20944/preprints202308.1865.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Reinforcement Learning; Episodic Control; Synthetic Aperture Radar; Image Registration
Online: 29 August 2023 (09:36:31 CEST)
For Synthetic Aperture Radar (SAR) image registration, successive processes following feature extraction are required by both the traditional feature-based method and the deep learning method. Among these processes, the feature matching process—whose time and space complexity are related to the number of feature points extracted from sensed and reference images, as well as the dimension of feature descriptors—proves to be particularly time-consuming. Additionally, the successive processes introduce data sharing and memory occupancy issues, requiring an elaborate design to prevent memory leaks. To address these challenges, this paper introduces the OptionEM-based reinforcement learning framework to achieve end-to-end SAR image registration. This framework outputs registered images directly without requiring feature matching and calculation of the transformation matrix, leading to significant processing time savings. The Transformer architecture is employed to learn image features, while a correlation network is introduced to learn the correlation and transformation matrix between image pairs. Reinforcement learning, as a decision process, can dynamically correct errors, making it more efficient and robust compared to supervised learning mechanisms like deep learning. We present a hierarchical reinforcement learning framework combined with episodic memory to mitigate the inherent problem of invalid exploration in generalized reinforcement learning algorithms. This approach effectively combines coarse and fine registration, further enhancing training efficiency. Experiments conducted on three sets of SAR images, acquired by TerraSAR-X and Sentinel-1A, demonstrate that the proposed method’s average runtime is sub-second, achieving subpixel registration accuracy.
ARTICLE | doi:10.20944/preprints201912.0229.v1
Subject: Engineering, Civil Engineering Keywords: masonry structures; stiffening walls; wall joints; connectors; bed joint reinforcement
Online: 17 December 2019 (10:46:57 CET)
Joints between walls are very important for the structural analysis of any masonry building at the global and local level. This issue was often neglected in the case of traditional joints and relatively squat walls. Nowadays, the issue of wall joints is becoming particularly important due to the continuous strive to simplify structures and introduce new technologies and materials. Eurocode 6 and other standards (USA, Canadian, Chinese, and Japanese) recommend inspecting joints between walls, but no detailed procedures have been specified. This paper presents our own tests on joints between walls made of autoclaved aerated concrete (AAC) masonry units. Tests included reference models composed of two wall panels joined perpendicularly with a masonry bond (6 models), and models with traditional steel and modified connectors (12 models). The shape and size of the test models and the structure of the test stand were determined on the basis of an analysis of the current knowledge, pilot studies, and FEM numerical analyses. The analysis referred to the morphology and failure mechanism of the models. Load-displacement relationships for different types of joints were compared, and the obtained results were referred to the results for the reference models. The mechanism of cracking and failure was found to vary, and clear differences in the behaviour and load capacity of each type of joint were observed. Individual working phases of the joints were determined and defined, and an empirical approach was suggested to determine the forces and displacements of wall joints.
ARTICLE | doi:10.20944/preprints201804.0021.v1
Subject: Business, Economics And Management, Economics Keywords: interbank market; contagion risk; multi-agent system; reinforcement learning agents
Online: 2 April 2018 (10:51:49 CEST)
In this study, we examine the relationship between bank-level lending and borrowing decisions and risk preferences in the dynamics of interbank lending. We develop an agent-based model that incorporates individual bank decisions using the temporal difference reinforcement learning algorithm with empirical data on 6,600 U.S. banks. The model can successfully replicate the key characteristics of interbank lending and borrowing relationships documented in the recent literature. A key finding of this study is that risk preferences at the individual bank level can lead to unique interbank market structures, which are suggestive of the market's capacity to respond to surprises.
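In its simplest tabular form, the temporal-difference learning rule such agent-based models build on can be sketched as below. The two-action "lend vs. hold" toy environment, its payoff numbers, and the hyperparameters are invented for illustration and are not the paper's calibrated interbank model.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy stand-in: a bank chooses between lending (risky, higher mean return)
# and holding (safe, small return); states encode a coarse liquidity level.
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1   # step size, discount, exploration

def env_step(s, a):
    if a == 0:                        # hold: safe, small return
        return s, 0.01
    r = rng.normal(0.03, 0.05)        # lend: higher mean, higher variance
    s2 = min(n_states - 1, s + (1 if r > 0 else 0))
    return s2, r

s = 0
for _ in range(20000):
    # epsilon-greedy action selection
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = env_step(s, a)
    # TD update: move Q(s, a) toward the bootstrapped one-step target
    Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
    s = s2

print(Q.round(3))
```

In the paper's multi-agent setting, many such learners interact on the interbank network; the update rule above is the single-agent core.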
REVIEW | doi:10.20944/preprints202303.0292.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: Bayesian optimization; Gaussian process regression; acquisition function; machine learning; reinforcement learning
Online: 16 March 2023 (01:36:11 CET)
Machine learning forks into three main branches: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning holds much potential for artificial intelligence (AI) applications because it solves real problems by a progressive process in which possible solutions are improved and fine-tuned continuously. This progressive approach, which reflects an ability to adapt, is appropriate to the real world, where most events occur and change continuously and unexpectedly. Moreover, data is getting too huge for supervised and unsupervised learning to draw valuable knowledge from at one time. Bayesian optimization (BO) models an optimization problem in a probabilistic form called a surrogate model and then directly maximizes an acquisition function created from that surrogate model, in order to implicitly and indirectly maximize the target function and find the solution of the optimization problem. A popular surrogate model is the Gaussian process regression model. The process of maximizing the acquisition function is based on repeatedly updating the posterior probability of the surrogate model, which improves after every iteration. Taking advantage of an acquisition or utility function is also common in decision theory, but the semantic meaning behind BO is that BO solves problems in a progressive and adaptive way, updating the surrogate model from a small piece of data at each step, according to the ideology of reinforcement learning. Undoubtedly, BO is a reinforcement learning algorithm with many potential applications, and thus it is surveyed in this research with attention to its mathematical ideas. Moreover, the solution of optimization problems is important not only to applied mathematics but also to AI.
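The BO loop described in this abstract (fit a Gaussian process surrogate, maximize an acquisition function, evaluate the target, update the surrogate) can be sketched with an expected-improvement acquisition. The toy objective, the RBF kernel length scale, the grid-based acquisition maximization, and the iteration budget are assumptions for illustration, not prescriptions from the survey.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def target(x):
    """Objective to maximize on [0, 1] (treated as unknown by the loop)."""
    return np.sin(6 * x) * x

def expected_improvement(mu, sigma, best):
    """EI acquisition: expected gain over the best observed value."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(3, 1))            # small initial sample
y = target(X).ravel()
grid = np.linspace(0, 1, 200).reshape(-1, 1)  # candidate points

for _ in range(15):                           # progressive refinement
    gp = GaussianProcessRegressor(kernel=RBF(0.1), alpha=1e-6).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, target(x_next))

print(f"best x = {X[np.argmax(y)][0]:.3f}, f(x) = {y.max():.3f}")
```

Each iteration refits the posterior from one new observation, which is the progressive, small-data-at-a-time behavior the abstract likens to reinforcement learning.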
ARTICLE | doi:10.20944/preprints202107.0545.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: UAVs; Wireless Power Transfer; RF energy harvesting; MIMO; Deep Reinforcement Learning.
Online: 23 July 2021 (13:29:51 CEST)
Unmanned Aerial Vehicles (UAVs), used in civilian applications such as emergency medical deliveries, precision agriculture, and wireless communication provisioning, face the challenge of limited flight time due to their reliance on the on-board battery. Therefore, developing efficient mechanisms for in-situ power transfer to recharge UAV batteries holds potential for extending their mission time. In this paper, we study the use of the far-field wireless power transfer (WPT) technique from specialized transmitter UAVs (tUAVs) carrying Multiple Input Multiple Output (MIMO) antennas for transferring wireless power to receiver UAVs (rUAVs) in a mission. The tUAVs can fly and adjust their distance to the rUAVs to maximize energy transfer. The use of MIMO antennas further boosts the energy reception by narrowing the energy beam toward the rUAVs. The complexity of the dynamic operating environment increases with the growing number of tUAVs and rUAVs with varying levels of energy consumption and residual power. We propose an intelligent trajectory selection algorithm for the tUAVs based on a deep reinforcement learning model called Proximal Policy Optimization (PPO) to optimize the energy transfer gain. Simulation results demonstrate that with the use of PPO, the system achieves a tenfold flight time extension compared to no wireless recharging. Further, PPO outperforms the benchmark movement strategies of 'Traveling Salesman Problem' and 'Low Battery First' when used by the tUAVs.
ARTICLE | doi:10.20944/preprints201806.0127.v1
Subject: Engineering, Energy And Fuel Technology Keywords: Complicated structural region; Directional drilling; Grouting reinforcement; Coal floor; Karst aquifer
Online: 7 June 2018 (16:07:36 CEST)
Water inrush from the coal floor constitutes one of the main disasters in mine construction and production, and always brings high risks and losses to safe coal mine production. As the mining depth of coal fields in North China gradually increases, especially in complicated structural regions, the threat posed by limestone karstic water in the coal floor to the safe stoping of mines has become increasingly prominent. In this paper, the Taoyuan coalmine was taken as an example, for which directional borehole grouting technology was utilized to reinforce the coal seam floor prior to mining. The factors affecting the grouting effect were also analyzed: the geological structure, the crustal stress, and the range of slurry diffusion. The layout principle of grouting drilling was put forward and the directional drilling structure was designed. The water level observations in the end hole indicated that the target stratum was accurate and reliable. The effect of grouting was validated through the audio-frequency electric perspective method and hole drilling in the track trough. The results demonstrated that the effect of grouting in the third limestone and the rock stratum above the third limestone of the coal seam floor was apparent. Simultaneously, no water inrush occurred following the actual mining of the working face, which further demonstrated that the grouting reinforcement effect was apparent. The research findings are of high significance for the prevention and control of floor water disasters and water conservation in deep complex structural areas.
ARTICLE | doi:10.20944/preprints202303.0260.v1
Subject: Engineering, Chemical Engineering Keywords: Additive manufacturing; Concrete; Particle bed; Reinforcement; SPI; WAAM; Rheology; Temperature; Concrete strength
Online: 14 March 2023 (13:24:27 CET)
Selective Paste Intrusion (SPI) is an additive manufacturing (AM) process in which thin layers of aggregates are selectively bonded by cement paste only where the structure is to be produced. In this way, concrete elements with complex geometries and structures can be created. Reinforcement is required to increase the flexural strength of the concrete elements and, thus, enable their applicability in practice. Integrating the reinforcement is a difficult task, particularly in the case of SPI, due to the layer-wise printing method. Especially with respect to possible complex structures, the production of the reinforcement needs to be adapted to SPI, thereby offering a high degree of freedom. One concept for reinforcement integration is combining the two additive manufacturing processes SPI and Wire and Arc Additive Manufacturing (WAAM). However, since the two processes serve different fields of application, their compatibility is not necessarily given. Ongoing investigations show that the temperatures caused by WAAM adversely affect both the cement paste rheology required for sufficient paste penetration into the particle bed and the overall concrete strength. This paper provides an overview of ongoing research focusing on different cooling strategies and their effects on the compressive strength of SPI-printed concrete parts.
ARTICLE | doi:10.20944/preprints202109.0177.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Sustainable wireless connectivity; Energy saving; UAV; Communication system; 5G; Positioning; Reinforcement learning
Online: 9 September 2021 (11:28:38 CEST)
An unmanned aerial vehicle (UAV)-based communication system is a promising solution to meet the coverage and capacity requirements of future wireless networks. However, UAV-enabled communication is constrained by its coverage, energy consumption, and flying regulations, and the number of works focusing on the sustainability aspect of UAV-assisted networking has so far been limited in the literature. In this paper, we propose a solution to this limitation; particularly, we design a $Q$-learning-based UAV positioning scheme for sustainable wireless connectivity considering key constraints, namely altitude regulations, no-fly zones, and transmit power. The objective is to find the optimal position of the UAV base station (BS) and minimize the energy consumption while maximizing the number of users covered. Moreover, a weighting mechanism is developed, where the energy consumption and the number of users covered can be prioritized according to network/battery conditions. The proposed Q-learning-based solution is compared to a baseline k-means clustering method, where the UAV BS is positioned at the centroid location that minimizes the cumulative distance between the UAV BS and the users. The results demonstrate that the proposed solution outperforms the baseline k-means clustering-based method in terms of the number of users covered while achieving the desired minimization of the energy consumption.
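A tabular Q-learning positioning scheme of the kind this abstract describes can be sketched on a toy grid. The grid size, Manhattan coverage radius, energy costs, and the 0.8 coverage weight below are illustrative assumptions standing in for the paper's weighting mechanism and constraint set.

```python
import numpy as np

rng = np.random.default_rng(2)
GRID = 5                                     # candidate UAV-BS positions: 5x5 grid
users = rng.integers(0, GRID, size=(20, 2))  # illustrative user locations
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # move N/S/W/E or hover

def step(pos, a, w=0.8):
    """Move the UAV BS, then score weighted coverage minus energy use."""
    nxt = tuple(np.clip(np.add(pos, a), 0, GRID - 1))
    covered = np.sum(np.abs(users - nxt).sum(axis=1) <= 2)  # Manhattan radius 2
    energy = 0.5 if a != (0, 0) else 0.1                    # moving costs more
    return nxt, w * covered / len(users) - (1 - w) * energy

Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.9, 0.2
pos = (0, 0)
for _ in range(5000):
    # epsilon-greedy exploration over the five movement actions
    a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[pos]))
    nxt, r = step(pos, ACTIONS[a])
    Q[pos][a] += alpha * (r + gamma * Q[nxt].max() - Q[pos][a])
    pos = nxt

best = np.unravel_index(np.argmax(Q.max(axis=2)), (GRID, GRID))
print("learned UAV-BS position:", best)
```

Raising `w` toward 1 prioritizes user coverage over energy saving, mirroring the network/battery-condition weighting the paper proposes.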
ARTICLE | doi:10.20944/preprints202011.0675.v1
Subject: Engineering, Automotive Engineering Keywords: fiber reinforcement; SHCC; ECC; impact testing; split Hopkinson tension bar; structural inertia
Online: 26 November 2020 (15:03:39 CET)
The performance of a normal-strength SHCC under impact loading was studied using the results obtained from a split Hopkinson tension bar (SHTB). The focus of the investigation is to explain the mechanisms behind the peculiar rate-dependent behavior of SHCC under tensile loading. With the help of frames obtained by high-speed cameras and the subsequent Digital Image Correlation (DIC) analysis, the stress-strain relation of the SHCC obtained in SHTB was analyzed. The investigation of the composite’s behavior was supported by constituent-level experiments on the non-reinforced matrix of the SHCC and on the fiber-matrix bond. In the case of the constituent matrix, the well-known apparent increase in the tensile strength of the cement-based matrix and its influence on the behavior of SHCC was studied. For this purpose, experiments on the SHCC specimens with different geometries were performed in the SHTB. The results obtained from these experiments and those obtained by DIC show that commonly used analytical models, in which the specimen is assumed elastic, cannot capture the effects of structural inertia on the results. Thus, an alternative novel method based on the results of DIC has been used to explain and quantify the contribution of structural inertia. The rate-dependent behavior of the fiber-matrix bond was studied by performing high-speed single fiber pullout tests in a miniaturized split Hopkinson tension bar. This novel experimental technique enabled explanation of the rate-dependent bridging action of the fibers in SHCC. Based on the results, the enhanced behavior of SHCC under impact loading is explained.
ARTICLE | doi:10.20944/preprints201901.0021.v1
Subject: Engineering, Civil Engineering Keywords: Rebar location; FRP reinforcement; NDT Methods; GPR Testing; Ultrasonic Testing; Electromagnetic Testing
Online: 3 January 2019 (13:10:29 CET)
The increasing use of non-metallic reinforcement is problematic, as it has to be detected at the stage of accepting construction works, or later, when expert opinions are prepared for the building. In contrast to metallic reinforcement, this type of reinforcement is difficult to locate using non-destructive techniques. The small diameters of the rebars and their location in a tested element are troublesome. This article describes an attempt to locate non-metallic reinforcement in a concrete element and in masonry. Tests were performed using an ultrasonic tomograph and GPR with a broad range of frequencies.
ARTICLE | doi:10.20944/preprints201809.0227.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: location-aware; cooperative anti-jamming; Markov decision process; Markov game; reinforcement learning
Online: 13 September 2018 (03:26:04 CEST)
This paper investigates the cooperative anti-jamming distributed channel selection problem in UAV communication networks. Considering the existence of malicious jamming and co-channel interference, a location-aware cooperative anti-jamming scheme is designed for the purpose of maximizing the users' utilities. Users in the UAV group cooperate with each other via location information sharing. When the received interference energy is lower than the mutual interference threshold, users conduct channel selection strategies independently. Otherwise, users take joint actions in a cooperative anti-jamming pattern under the impact of mutual interference. For the independent anti-jamming channel selection problem under no mutual interference, a Markov decision process framework is introduced, whereas for the cooperative anti-jamming channel selection case under the influence of co-channel mutual interference, a Markov game framework is employed. Furthermore, motivated by reinforcement learning with a "Cooperation-Decision-Feedback-Adjustment" idea, we design a location-aware cooperative anti-jamming distributed channel selection algorithm (LCADCSA) to obtain the optimal anti-jamming channel strategies for the users in a distributed way. In addition, the channel switching cost and cooperation cost, which have a great impact on the users' utilities, are introduced. Finally, simulation results show that the proposed algorithm converges to a stable solution with which the UAV group can avoid the malicious jamming as well as co-channel interference effectively.
ARTICLE | doi:10.20944/preprints202304.0826.v1
Subject: Engineering, Civil Engineering Keywords: numerical analysis; virtual tracking element; segmental joints; stainless steel corrugated plate; tunnel reinforcement
Online: 24 April 2023 (05:14:49 CEST)
Shield tunnels inevitably endure various kinds of damage as their service time increases. Meanwhile, steel corrugated plate has been used extensively under multiple conditions and has been proven effective in strengthening segmental joints in full-scale tests. A numerical model is proposed to probe the feasibility of using a new Stainless Steel Corrugated Plate (SSCP) to reinforce shield tunnel segments. A new method, called virtual tracking element technology, is employed to simulate the realistic stress state of the segmental joint. Moreover, a segmental joint component analysis and a parametric study are conducted based on the numerical model. The results demonstrate that: (1) the virtual tracking element technology proves valid and efficient for simulating the secondary stress state of the segmental joint; (2) the SSCP reinforcement is not fully utilized when the grade of the segmental concrete is C50, and it has an abundant safety margin for potential overload; (3) the SSCP reinforcement performs well regardless of the burial depth, and reinforcement in advance is recommended.
ARTICLE | doi:10.20944/preprints202201.0041.v1
Subject: Engineering, Civil Engineering Keywords: building remodeling; concentrated loads; FRP reinforcement; FRP strips; shear capacity; vertical concrete cantilever
Online: 5 January 2022 (13:01:16 CET)
Renovation, restoration, remodeling, refurbishment, and retrofitting of buildings often imply modifying the behavior of the structural system. Modification sometimes includes applying forces (i.e., concentrated loads) to beams that were previously subjected to distributed loads only. For a reinforced concrete structure, the new condition causes a beam to bear a concentrated load with the crack pattern that was produced by the distributed loads that acted in the past. If the concentrated load is applied at or near the beam's midspan, the new shear demand reaches its maximum around the midspan. But around the midspan, the cracks are vertical or quasi-vertical, and no inclined bar is present. So, the actual shear capacity around the midspan not only is low, but can also be substantially lower than the new demand. In order to bring the beam capacity up to the demand, fiber-reinforced-polymer composites can be used. This paper presents a design method to increase the concentrated load-carrying capacity of reinforced concrete beams whose load distribution has to be changed from distributed to concentrated, and an analytical model to predict the concentrated load-carrying capacity of a beam in the strengthened state.
ARTICLE | doi:10.20944/preprints202008.0534.v1
Subject: Engineering, Civil Engineering Keywords: jute fibre; reinforcement; modified compaction test; California bearing ratio test; stabilization; shear strength
Online: 25 August 2020 (03:30:40 CEST)
This paper focuses on the stabilisation of soil using jute fibre as a soil stabilizer. Stabilisation is the process of modifying the properties of a soil to improve its engineering performance and use it for a variety of engineering works. This study examines the potential of soil stabilization with jute fibre cut into roughly 30 mm lengths as a stabilizer. Varying percentages of jute fibre pieces, namely 0.5%, 1%, 1.5%, and 2%, were mixed with the soil. Laboratory tests, such as the California Bearing Ratio (CBR) test, modified compaction tests, and direct shear strength tests, were conducted to observe the change in the engineering properties of the soil. On the basis of the experiments performed, it can be concluded that the stabilization of soil using 30 mm pieces of jute as a stabilizer improves the strength characteristics of the soil so that it becomes usable as a reinforcing material for the construction of roadways, parking areas, site development projects, airports, and many other situations where sub-soils are not suitable for construction.
ARTICLE | doi:10.20944/preprints201902.0193.v1
Subject: Engineering, Civil Engineering Keywords: retrofitting; earthquakes; masonry; historical buildings; active reinforcement; Mohr’s circles; CAM system; Φ system
Online: 20 February 2019 (12:18:11 CET)
The present paper deals with the retrofitting of unreinforced masonry (URM) buildings subjected to in-plane shear and out-of-plane loading when struck by an earthquake. After an introductory comparison between some of the latest punctual and continuous active retrofitting methods, the authors focused on the two most effective active continuous techniques, the CAM system and the Φ system, which also improve the box-type behavior of buildings. These two retrofitting systems allow us to increase both the static and dynamic load-bearing capacity of masonry buildings. Nevertheless, information on how they actually modify the stress field in static conditions is lacking, and sometimes questionable, in the literature. Therefore, we performed a static analysis in the Mohr/Coulomb plane, with the dual intent of clarifying which of the two is preferable under static conditions and whether the models currently used to design the retrofitting systems are fully adequate.
REVIEW | doi:10.20944/preprints202308.1539.v2
Subject: Biology And Life Sciences, Life Sciences Keywords: machine learning; reinforcement learning; deep learning; Gaussian process; artificial neural networks; real-time diagnostics
Online: 25 September 2023 (05:19:01 CEST)
Plasma technology shows tremendous potential for revolutionizing oncology research and treatment. Reactive oxygen and nitrogen species and electromagnetic emissions generated through gas plasma jets have attracted significant attention due to their selective cytotoxicity towards cancer cells. To leverage the full potential of plasma medicine, researchers have explored the use of mathematical models and various subsets of machine learning, such as reinforcement learning and deep learning. This review emphasizes the significant application of AI algorithms in adaptive plasma systems, paving the way for precision and dynamic cancer treatment. Realizing the full potential of AI in plasma medicine requires research efforts, data sharing, and interdisciplinary collaborations. Unravelling the complex mechanisms, developing real-time diagnostics, and optimizing AI models will be crucial to harnessing the true power of plasma technology in oncology. The integration of personalized and dynamic plasma therapies, alongside AI and diagnostic sensors, presents a transformative approach to cancer treatment with the potential to improve outcomes globally.
ARTICLE | doi:10.20944/preprints202308.1040.v1
Subject: Engineering, Civil Engineering Keywords: Beams with openings; Basalt fiber-reinforced polymer (BFRP); Stiffness; Ductility; Energy; Hybrid Reinforcement; Strengthening
Online: 15 August 2023 (02:44:01 CEST)
Beams with openings have always confused designers due to missing design guidelines. In this research, six hybrid reinforced beams, reinforced with mixed steel and basalt fiber reinforced polymer (BFRP) bars, had constant cross-sections of 150 mm x 300 mm and a clear span of 1800 mm. Five beams had symmetrical rectangular openings with dimensions of 150 mm x 250 mm located at a distance of 250 mm (equivalent to the beam effective depth) from the beam support, in addition to a solid beam that served as a control reference. The studied parameters included the effect of using internal reinforcement (steel or BFRP bars) provided along the opening or an external BFRP sheet incorporated around the opening corners. The double enhancement with internal steel reinforcement bars plus an external strengthening BFRP sheet was also investigated. The results showed that the opened beam without enhancement lost almost 74.66% of the maximum load compared with the solid beam. Placing internal steel or BFRP bars around the openings increased the maximum load by 62.07% and 59.68%, respectively, in comparison to the non-enhanced opened beam. Using an external BFRP sheet to strengthen the opening corners of the beam enhanced the maximum load by 76.39% compared with the non-enhanced opened beam. When the beam was doubly enhanced with an external BFRP sheet and internal steel reinforcement around the openings, the maximum load increased by 137.40% compared with the non-enhanced opened beam. Ultimately, to further analyze the experimental results and confirm their findings, the study was extended to include numerical analysis using three-dimensional finite element modeling, and the results correlated very well with the experimental ones.
REVIEW | doi:10.20944/preprints202307.1929.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: onboard microgrid; intelligent transportation; energy management; artificial intelligence; digital twin; machine learning; reinforcement learning
Online: 28 July 2023 (07:35:21 CEST)
In the past decades, the world has been actively working towards the global goal of net-zero emissions. To decrease emissions, there is a notable trend of electrification in transportation: a transition from traditional fuel-based systems to electrical power systems onboard different transportation platforms. For this transition, it is important to study the electrical structure, specifically the onboard microgrid, powered by various energy sources. In this paper, traditional energy management strategies for onboard microgrid systems are discussed; these usually require complicated optimization algorithms and high computation capabilities. Driven by recent advancements in information technologies, artificial intelligence and digital twins have gained much interest within the transportation sector. These technologies can effectively utilize data to achieve intelligent decision-making, optimize resource utilization, and reduce energy consumption. This paper presents an overview of the usage of these emerging information technologies in energy management strategies, providing an overall summary and classification of practical applications. In addition, after examining the potential challenges associated with artificial intelligence and digital twins, this paper also discusses future trends in the field.
ARTICLE | doi:10.20944/preprints202209.0387.v1
Subject: Engineering, Marine Engineering Keywords: maritime autonomy; autonomous ship; safety; digital twin; deep reinforcement learning; collision avoidance; situational awareness
Online: 26 September 2022 (08:55:58 CEST)
The use of digital twins for the development of Autonomous Maritime Surface Vessels (AMSVs) has enormous potential to address the increasing need for water-based navigation and safety at sea. To address the lack of broad, integrated digital twin implementations with live data, along with the absence of a digital twin-driven framework for AMSV design and development, an application framework for the development of a fully autonomous vessel using an integrated digital twin in a 3D simulation environment is presented. Our framework has four layers to ensure that the simulation and the real-world boat, as well as the environment, are as close as possible. Åboat, an experimental research platform for maritime automation and autonomous surface ship applications, equipped with two trolling electric motors, cameras, LiDARs, an IMU, and GPS, has been used as the case study to provide a proof of concept. Åboat and its sensors, along with the environment, have been replicated in a 3D simulation environment. Using the proposed application framework, we develop obstacle detection and path planning systems based on machine learning which leverage live data from the 3D simulation environment to mirror the complex dynamics of the real world.
ARTICLE | doi:10.20944/preprints201802.0049.v1
Subject: Chemistry And Materials Science, Polymers And Plastics Keywords: polymer-matrix composites; natural fiber reinforcement; interface/interphase; microstructural analysis; crystallization behavior; rheological behavior
Online: 6 February 2018 (00:36:44 CET)
To improve the interfacial bonding of sisal fiber reinforced polylactide biocomposites, polylactide (PLA) and sisal fibers (SF) were melt-blended to fabricate bio-based composites via in situ reactive interfacial compatibilization with the addition of an epoxy-functionalized oligomer (ADR). FTIR analysis and SEM characterization demonstrated that the PLA molecular chains were bonded to the fiber surface and that the epoxy-functionalized oligomer played a hinge-like role between the sisal fibers and the PLA matrix, resulting in improved interfacial adhesion between the fibers and the matrix. The interfacial reaction and microstructures of the composites were further investigated by thermal and rheological analyses, which indicated that the mobility of the PLA molecular chains in the composites was restricted by the introduction of the ADR oligomer, which in turn reflected the improved interfacial interaction between SF and the PLA matrix. These conclusions were further supported by the activation energies of glass transition relaxation (ΔEa) of the composites, calculated via dynamic mechanical analysis. The mechanical properties of the PLA/SF composites were simultaneously reinforced and toughened by the addition of the ADR oligomer. The interfacial interaction and the structure-property relationship of the composites are the key points of this study.
ARTICLE | doi:10.20944/preprints202308.0039.v1
Subject: Engineering, Aerospace Engineering Keywords: hypersonic morphing vehicle; predictor-corrector guidance; Q-learning; B-spline curve; Monte Carlo reinforcement learning
Online: 1 August 2023 (08:27:13 CEST)
Aiming at the problem of a hypersonic morphing vehicle avoiding no-fly zones and reaching its target, an improved predictor-corrector guidance method is proposed. Firstly, the aircraft motion model and the constraint model are established. Then, the basic algorithm is given: the Q-learning method is used to design the attack angle and sweep angle schemes to ensure that the aircraft can fly over low-altitude zones. A B-spline curve is used to design the location of flight path points, and the bank angle scheme is designed according to the predictor-corrector method so that the aircraft can fly around and avoid high-altitude zones. Next, the Monte Carlo reinforcement learning (MCRL) method is used to improve the predictor-corrector method, and a Deep Neural Network (DNN) is used to fit the reward function. The improved method generates trajectories with better performance. Simulation results verify the effectiveness of the proposed algorithm.
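For readers unfamiliar with the tabular Q-learning used here to select angle schemes, a minimal sketch of one update step follows; the state/action discretization, reward, and learning parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy setup: 4 discretized flight states, 3 candidate (attack, sweep) angle schemes.
Q = np.zeros((4, 3))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
# Q[0, 1] is now 0.1: alpha * (reward + gamma * 0 - 0)
```

In the paper's setting, the state would encode the vehicle's flight condition and the reward would penalize no-fly-zone violations; this sketch only shows the update rule itself.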
ARTICLE | doi:10.20944/preprints201710.0141.v1
Subject: Engineering, Control And Systems Engineering Keywords: Vertical coal bunker; Coal given chamber; Floor heave; Wall-mounted coal bunker; Reinforcement; Self-bearing system
Online: 20 October 2017 (15:31:57 CEST)
Serious damage caused by floor heave in the coal given chamber (CGC) of a vertical coal bunker is one of the challenges faced in underground coal mines. Engineering practice shows that it is more difficult to maintain the CGC than a roadway. More importantly, repairing the CGC during mining poses major safety risks and reduces production. Based on the case of the serious collapse that occurred in the bearing structure of the CGC at the lower part of the 214# coal bunker in Xiashijie mine, China, this work analysed (i) the main factors influencing floor heave and (ii) the failure mechanism of the load-bearing structure in the CGC, using FLAC2D numerical models and expansion experiments. The analysis indicates that the floor heave, caused mainly by mine water, is the basic reason for the instability and repeated failure of the CGC in the 214# coal bunker. A new coal bunker, built without a CGC, is then proposed and put into practice to replace the 214# coal bunker. The FLAC3D software program is adopted to establish a numerical model of the wall-mounted coal bunker (WMCB), and the stability of the rock surrounding the WMCB is simulated and analysed. The results show that: (1) the rock surrounding the sandstone segment is basically stable; (2) the surrounding rock in the coal seam segment, which moves into the inside of the bunker, is the main deformation zone of the entire rock mass surrounding the bunker. This surrounding rock is controlled effectively by means of high-strength bolt–cable combined supporting technology. According to the geological conditions of the WMCB, a self-bearing system, which includes (i) H-steel beams, (ii) H-steel brackets, and (iii) self-locking anchor cables, is established and serves as a substitute for the CGC to transfer the whole weight of the bunker to stable surrounding rock.
The stability of the new coal bunker has been verified by field testing, and the coal mine has gained an economic benefit of 158.026174 million RMB over three years. The new WMCB thus made production more effective and can provide a helpful reference for the construction of vertical bunkers under similar geological conditions.
ARTICLE | doi:10.20944/preprints202309.1644.v1
Subject: Chemistry And Materials Science, Ceramics And Composites Keywords: fly ash; fiber reinforcement; cement stabilization; compressive strength; indirect tensile strength; flexural strength; resilient modulus; subbase and base.
Online: 25 September 2023 (06:36:19 CEST)
It is necessary to address the scarcity of crushed stone for pavement structural layers, and fly ash can prove to be a promising solution, as more than 270 tonnes of fly ash are generated in India. Although numerous studies have been conducted on the use of fly ash in treated and untreated forms, the high volume of fine particles and the brittleness of stabilized fly ash pose a great problem for its use in subbase and base layers. Moreover, the stiffness or modulus of stabilized fly ash is a vital elastic parameter used in mechanistic pavement design. Hence, in this study an extensive experimental investigation is carried out on the strength and stiffness properties, such as compressive strength, indirect tensile strength, flexural strength, cyclic indirect tensile modulus, and flexural modulus, of fiber-reinforced cement-stabilized fly ash, stone dust, and aggregate mixtures. The stone dust and aggregates were added to enhance the gradation of the composite mixture. The study presents the effect of fiber on strength and stiffness properties. The experimental results reveal that the addition of polypropylene (PP) fibers up to 0.25 wt.% enhances the compressive strength, and any further addition of fiber results in a decrease of the strength. However, indirect tensile strength and flexural strength increase with fiber percentage up to 0.5 wt.%. Cement content is observed to be the dominant parameter for stabilized materials. Suitable relationships have been developed between strength and modulus parameters for stabilized mixtures. Based on the strength and stiffness study, 70% fly ash with 30% stone dust-aggregate, and 60% fly ash with 40% stone dust-aggregate, each with 6% cement, can be considered for the base layer. Based on the indirect tensile strength and flexural strength behavior, 0.35% is considered the optimum fiber percentage.
Subject: Engineering, Civil Engineering Keywords: structural safety assessment; experimental monitoring; strain transducers; reinforcement; civil engineering; optical fiber sensors; life time structural monitoring; Brillouin
Online: 4 June 2020 (03:54:44 CEST)
This work describes a new transducer prototype for continuous monitoring in both the structural and geotechnical fields. The transducer essentially consists of a wire of optical fiber embedded between two fiber tapes (fiberglass or carbon fiber) and glued with a matrix of polyester resin. The ends of the optical fiber wire have been connected to a control unit whose detection system is based on Brillouin optical time-domain frequency analysis. Three laboratory tests were carried out to evaluate the sensor's reliability and accuracy. In each experiment, the transducer was applied to a sample of inclinometer casing set in different configurations and with different constraint conditions. The experimental data collected were compared with theoretical models and with data obtained from different measuring instruments, in order to validate and calibrate the transducer at the same time. Several diagrams allow comparison of the transducer's measurements and highlight its suitability for the monitoring and maintenance of structures. The characteristics of the transducer suggest its use as a mixed system for reinforcing and monitoring, especially in the lifetime maintenance of critical infrastructures such as transportation and service networks and historical heritage.
REVIEW | doi:10.20944/preprints202212.0305.v1
Subject: Computer Science And Mathematics, Robotics Keywords: robotics; cloth-like deformable objects; deep reinforcement learning; deep imitation learning; human-robot interaction; knot theory; general embodied AI
Online: 16 December 2022 (12:51:23 CET)
Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not show a detectable level of compression strength while two points on the article are pushed towards each other, and include objects such as ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state-action dynamics as significant obstacles for perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods on four major task families in this domain: cloth-shaping, rope manipulation, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms, and summarise the future direction for the development of the field.
ARTICLE | doi:10.20944/preprints202205.0027.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Additive manufacturing (AM); Wire arc additive manufacturing (WAAM); Weld cladding; Residual stresses; Reinforcement; Hole drilling method; LS-Dyna; Numerical Simulation
Online: 5 May 2022 (08:46:31 CEST)
Cladding is typically used to protect components from wear and corrosion while also improving the aesthetic value and reliability of the substrate. The cladding process induces significant residual stresses due to the temperature difference between the substrate and the clad layer. However, these residual stresses could be effectively utilized by modifying processes and geometrical parameters. This paper introduces a novel methodology for using the weld-cladding process as a cost-effective alternative to various existing reinforcement techniques. The numerical analyses are performed to maximize the reinforcement of a cylindrical tool. The investigation of how the weld cladding develops compressive stresses on the specimen in response to a change in the weld beads and the welding sequence is presented. For the benchmark shape, experimental verification of the numerical model is performed. The impact of the distance between the weld beads and the effect of the tool diameter is numerically investigated. Furthermore, the variation in compressive stresses due to temperature fluctuations during the extrusion process has been evaluated. The results showed that adequate compressive stresses are generated on the welded parts through the cladding process after cooling. Hence, the targeted reinforcement of the substrate can be achieved by optimizing the welding sequence and process parameters.
ARTICLE | doi:10.20944/preprints202108.0018.v1
Subject: Physical Sciences, Radiation And Radiography Keywords: deep reinforcement learning; source search and localization; active search; gamma radiation; source parameter estimation; sequential decision making; non-convex environment
Online: 2 August 2021 (11:14:24 CEST)
Rapid search and localization of nuclear sources can be an important aspect of preventing human harm from illicit material in dirty bombs or from contamination. In the case of a single mobile radiation detector, there are numerous challenges to overcome, such as weak source intensity, multiple sources, background radiation, and the presence of obstructions, i.e., a non-convex environment. In this work, we investigate the sequential decision-making capability of deep reinforcement learning in the nuclear source search context. A novel neural network architecture (RAD-A2C) based on the actor critic (A2C) framework and a particle filter gated recurrent unit for localization is proposed. Performance is studied in a randomized 20 x 20 m convex and non-convex environment across a range of signal-to-noise ratios (SNRs) for a single detector and single source. RAD-A2C performance is compared to both an information-driven controller that uses a bootstrap particle filter and to a gradient search (GS) algorithm. We find that RAD-A2C has comparable performance to the information-driven controller across SNRs in the convex environment, at lower computational complexity per action. RAD-A2C far outperforms the GS algorithm in the non-convex environment, with a greater than 95% median completion rate for up to seven obstructions.
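The A2C framework underlying RAD-A2C combines a policy (actor) with a learned state value (critic). A minimal NumPy sketch of the per-step A2C quantities follows; the logits, values, and reward are illustrative placeholders, not the paper's network or environment.

```python
import numpy as np

def a2c_losses(logits, value, action, reward, next_value, gamma=0.99):
    """Per-step A2C quantities: one-step TD advantage,
    policy loss -log pi(a|s) * A, and critic loss A^2."""
    probs = np.exp(logits - logits.max())           # numerically stable softmax
    probs /= probs.sum()
    advantage = reward + gamma * next_value - value  # one-step TD advantage
    policy_loss = -np.log(probs[action]) * advantage # advantage treated as constant
    value_loss = advantage ** 2                      # critic regression loss
    return advantage, policy_loss, value_loss

# Toy example: 3 actions, the agent took action 1 and received reward 1.0.
adv, pl, vl = a2c_losses(np.array([0.0, 1.0, 0.0]), value=0.5,
                         action=1, reward=1.0, next_value=0.0)
# adv = 1.0 + 0.99 * 0.0 - 0.5 = 0.5; vl = 0.25
```

In RAD-A2C the actor and critic share a recurrent representation fed by the particle-filter gated recurrent unit; this sketch shows only the generic A2C objectives.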
ARTICLE | doi:10.20944/preprints201907.0311.v1
Subject: Engineering, Automotive Engineering Keywords: Cyber-Physical Systems; reliability assessment; Internet-of-Things; LiDAR sensor; driving assistance; obstacle recognition; reinforcement learning; Artificial Intelligence-based modelling
Online: 28 July 2019 (12:38:28 CEST)
Currently, the most important challenge in any assessment of state-of-the-art sensor technology and its reliability is to achieve road traffic safety targets. The research reported in this paper focuses on the design of a procedure for evaluating the reliability of Internet-of-Things (IoT) sensors and on the use of a Cyber-Physical System (CPS) to implement that evaluation procedure. An important requirement for the generation of real critical situations under safe conditions is the capability of managing a co-simulation environment in which both real and virtual sensory data can be processed. An IoT case study consisting of a LiDAR-based collaborative map is then proposed, in which both real and virtual computing nodes with their corresponding sensors exchange information. Specifically, the sensor chosen for this study is an Ibeo Lux 4-layer LiDAR sensor with added IoT capabilities. Implementation is through an artificial-intelligence-based modeling library for sensor data-prediction error at a local level, and a self-learning-based decision-making model supported by a Q-learning method at a global level. Its aim is to determine the best model behavior and to trigger the updating procedure, if required. Finally, an experimental evaluation of this framework is also performed using simulated and real data.
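The global-level Q-learning model described above must repeatedly choose which prediction-error model currently behaves best. One common way to do this is ε-greedy selection over learned Q-values; the sketch below is a generic illustration under that assumption (the model set and Q-values are hypothetical, not taken from the paper).

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a model index: explore with probability epsilon,
    otherwise exploit the model with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Toy global level: Q-values for three candidate prediction-error models.
q = [0.2, 0.8, 0.4]
best = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0 always exploits, index 1
```

The exploration rate trades off trying alternative models against sticking with the one that has performed best so far.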
ARTICLE | doi:10.20944/preprints202306.0883.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Artificial intelligence (AI); classification; cognitive; detection; multibeam multifunction phased array radar (MMPAR); reinforcement learning (RL); swarm; tracking; unmanned aerial vehicles (UAVs)
Online: 13 June 2023 (07:31:23 CEST)
Detecting, tracking, and classifying unmanned aerial vehicles (UAVs) in a swarm presents significant challenges due to their small and diverse radar cross-sections, multiple flight altitudes and velocities, and close trajectories. To overcome these challenges, adjustments of the radar parameters and/or of the radar's position (for airborne platforms) are often required during runtime. These runtime adjustments help to overcome anomalies in the detection, tracking, and classification of UAVs, and are performed either manually or through fixed algorithms, each of which has its limitations in complex and dynamic scenarios. In this work, we propose the use of multi-agent reinforcement learning (RL) to carry out the runtime adjustment of the radar parameters and the position of the radar platform. The radar used in our work is a multibeam multifunction phased array radar (MMPAR) placed onboard UAVs. The simulations show that cognitive adjustment of the MMPAR parameters and of the airborne platform's position using RL helps to overcome anomalies in the detection, tracking, and classification of UAVs in a swarm. A comparison with other artificial intelligence (AI) algorithms shows that RL performs better due to runtime learning of the environment through rewards.
ARTICLE | doi:10.20944/preprints202303.0316.v1
Subject: Engineering, Architecture, Building And Construction Keywords: long-span steel structure truss; variable axial force cable; 3D3S finite element model; joint plate analysis; variable system reinforcement combination stiffness; load domain
Online: 17 March 2023 (03:33:39 CET)
Long-span steel structure trusses are widely used in factory buildings, but with increasing service time and dynamic load fatigue, a considerable number of long-span trusses under dynamic load develop serious transverse cracks at the bottom of the middle span and oblique deformation of the web during operation. The U-shaped cracks at the bottom and web, as well as the mid-span downward deflection of the main truss, reduced the functionality of the factory building truss structure, and the original crane load had to be limited, which affected the normal safety and durability of the structure. Therefore, the application principle of the variable axial force cable system in long-span factory building truss structures and 3D3S software modeling were used. Analyzing and studying the reinforcement method of large-span powerhouse trusses can provide practical experience for subsequent similar projects. In view of the above phenomena, the large-span powerhouse trusses of Hongcheng Powerhouse 1 and No. 2, located in Tonglu, Zhejiang Province, are used as the research object, and the variable axial force cable method is proposed to strengthen them and lift the load. Considering the truss span, a 22 m cable system with a controlling force of 400 kN is proposed for Powerhouse 1, and a 24 m variable axial force cable system is proposed for Powerhouse 2. The force model of the large-span truss is established using the finite element method commonly used to analyze truss forces. Under two working conditions, the influence of the reinforcement is analyzed and compared in three aspects: stiffness, bearing capacity, and stability. The phenomenon of uneven stress distribution is also analyzed. The stress distribution characteristics of each node are understood by simulating the most unfavorable node plates, those with the greatest internal force, before and after reinforcement.