Preprint
Article

This version is not peer-reviewed.

Learning-Based Routing for Autonomous Shuttles Under Stochastic Demand in Urban Mobility Systems Using Generative Adversarial Imitation Learning and Reinforcement Learning

Submitted: 10 February 2026

Posted: 11 February 2026


Abstract
Extensive research has been conducted to develop technologies that enable paratransit systems to operate autonomously, including advanced sensing technologies and associated software. However, there remains a significant gap in research on adaptive operational algorithms for such systems in urban environments. Autonomous Shuttles (AS) are an emerging technology that has gained attention from industry, government, and academia as a novel public transit solution. AS have the potential to enable Ride-shared Autonomous Mobility on Demand (RAMoD), which can improve accessibility and service equity for transportation-disadvantaged populations across urban and surrounding regions. To address this gap, this study applies an imitation-learning-assisted Deep Reinforcement Learning (DRL) approach to develop a routing method for AS under stochastic and dynamic passenger demand. The proposed framework integrates Generative Adversarial Imitation Learning (GAIL) with Proximal Policy Optimization (PPO) to enable real-time pickup and drop-off decision-making without centralized re-optimization. The DRL agent was trained for approximately 1.5 million steps and evaluated across twenty episodes with stochastic passenger generation. Its performance was benchmarked against a deterministic Dial-a-Ride Problem (DARP) solver implemented using Google’s OR-Tools, which employs a Cheapest Insertion heuristic with Local Search refinement. Comparative analysis showed median percentage differences of 37%, -6%, 20%, and 44% in passenger wait time, in-vehicle time, total service time, and episode completion time, respectively, relative to the DARP baseline. The OR-Tools implementation was selected as the benchmark because established step-wise evaluation methods for dynamic routing optimization in simulation environments are lacking.
These findings demonstrate the potential of learning-based routing policies to support scalable, demand-responsive autonomous mobility services and future smart urban transportation systems.
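As an illustration of how a median percentage difference relative to a baseline can be computed, the sketch below pairs each evaluation episode of a learned policy with the same episode solved by a baseline and takes the median of the per-episode relative differences. This is only one plausible reading of the metric; the abstract does not state whether the study uses the median of per-episode differences or the difference of medians. The function name `median_pct_difference` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import median


def median_pct_difference(policy_values, baseline_values):
    """Median of per-episode percentage differences relative to the baseline.

    Assumption: episodes are paired, so element i of each sequence
    refers to the same demand scenario.
    """
    if len(policy_values) != len(baseline_values):
        raise ValueError("both sequences must cover the same episodes")
    diffs = [
        100.0 * (p - b) / b  # positive means the policy value is larger
        for p, b in zip(policy_values, baseline_values)
    ]
    return median(diffs)


# Hypothetical wait times (minutes) for three paired episodes; not study data.
drl_wait = [6.0, 9.0, 5.0]
darp_wait = [5.0, 6.0, 5.0]
result = median_pct_difference(drl_wait, darp_wait)
print(f"median percentage difference: {result:.1f}%")
```

Under this convention, a positive value means the learned policy's metric is larger than the baseline's and a negative value means it is smaller.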
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated