Submitted: 31 May 2024
Posted: 03 June 2024
Abstract
Keywords:
1. Introduction

- We have explored the challenges and requirements specific to trajectory interpolation for autonomous vehicles, considering the real-time nature and safety criticality of autonomous driving systems.
- We have examined the use of LSTM and GRU networks for restoring pedestrian trajectories, given their ability to model temporal dependencies and contextual interactions.
- We have designed an encoder-decoder imputation model comprising LSTM blocks for the encoder and GRU blocks for the decoder.
- Our model evaluation has been conducted on the inD dataset, marking one of the earliest instances of research on this dataset for the trajectory imputation task.
2. Related Work
3. Gap Analysis
4. Dataset Observation and Preprocessing
4.1. Dataset Observation
- Preserving naturalistic behavior: Because it avoids visible sensors resembling traffic surveillance cameras, the inD dataset ensures that road users are not influenced by the measurement method.
- Having satisfactory size: Trajectories from thousands of road users provide the size and variety needed for robust data-driven algorithms.
- Covering different recording locations and times: The inD dataset captures measurements at multiple sites on public roads and at different times of day, covering varied road layouts, traffic densities and traffic rules, which increases its relevance for automated driving scenarios.
- Detecting and tracking all road user types: Unlike datasets limited to specific categories, the inD dataset tracks all road users, recognizing the importance of capturing interactions across diverse user types.
- Tracking road users with high accuracy: The inD dataset provides trajectories with a positioning error of less than 0.1 m for every road user type, capturing the details of user movement precisely.
- Including infrastructure details: By precisely recording road layouts and local traffic rules, the inD dataset accounts for the dependence of road user behavior on these factors.
| Dataset | Location | # Trajectories | # Locations | Road User Types | Data Frequency | Method |
|---|---|---|---|---|---|---|
| BIWI ETH | University building entry | 360 | 1 | pedestrian | 2.5 Hz | stationary sensor |
| Interaction | Urban intersection | 18642 | 4 | vehicles | 10 Hz | drone/camera |
| Ko-PER | Urban intersection | 350 | 1 | pedestrian, bicycle, car, truck | 25 Hz | stationary sensor |
| Crowds UCY/Zara | Campus, urban street | 909 | 3 | pedestrian | 2.5 Hz | stationary sensor |
| Stanford Drone | Campus | 10240 | 8 | pedestrian, bicycle, car, skateboard, cart, bus | 25 Hz | drone |
| BIWI | Hotel sidewalk, hotel entry | 389 | 1 | pedestrian | 2.5 Hz | stationary sensor |
| VRU Trajectory | Urban intersection | 3278 | 1 | pedestrian, bicycle | 25 Hz | stationary sensor |
| DUT | Campus | 1862 | 2 | pedestrian, vehicles | 23.98 Hz | drone |
| CITR | Designed experiment | 340 | 1 | pedestrian, golf-cart | 29.97 Hz | drone |
| inD | Urban intersection | 13599 | 4 | pedestrian, bicycle, car, truck, bus | 25 Hz | drone |
4.2. Point of Interest Dataset Generation
4.3. Making Incomplete Trajectories
4.3.1. Masking Strategy
- Masking Ratio: We have introduced a hyperparameter, the masking ratio, which represents the percentage of data points to be masked in each trajectory. Its value has been tuned during training and can be further adjusted for optimal performance.
- Random Selection: We have randomly selected segments of the trajectory, with a length proportional to the masking ratio, and marked them as missing. These segments are replaced with a special token, [MASK], indicating the presence of missing data.
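The masking strategy above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a trajectory is a list of per-timestep feature tuples, the hypothetical `max_segment` parameter caps the length of one masked segment, the segments together hide about `ratio` of the points (rounded to an integer), and hidden points are replaced by the string `"[MASK]"`.

```python
import random

MASK = "[MASK]"  # placeholder token for missing points


def mask_trajectory(points, ratio, max_segment=5, seed=None):
    """Hide about `ratio` of the points in random contiguous segments.

    Returns the masked trajectory (hidden points replaced by MASK) and the
    sorted indices that were hidden. `max_segment` is a hypothetical cap on
    the length of a single segment.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    rng = random.Random(seed)
    n = len(points)
    target = round(ratio * n)  # number of points to hide
    hidden = set()
    while len(hidden) < target:
        # Never request more points than are still needed to reach the target.
        seg_len = min(rng.randint(1, max_segment), target - len(hidden))
        start = rng.randrange(0, n - seg_len + 1)
        hidden.update(range(start, start + seg_len))
    masked = [MASK if i in hidden else p for i, p in enumerate(points)]
    return masked, sorted(hidden)


trajectory = [(float(t), 0.5 * t) for t in range(20)]  # toy (x, y) points
masked, hidden_idx = mask_trajectory(trajectory, ratio=0.25, seed=0)
print(len(hidden_idx), masked[:6])
```

Overlapping segments simply merge, so the loop keeps drawing until exactly the target number of points is hidden.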
4.3.2. Creating Incomplete Trajectories
4.4. Padding and Binary Masking
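Padding and binary masking can be illustrated as follows. The layout here is an assumption, since the section gives no details: each trajectory is a list of `dim`-dimensional points with `None` marking a missing point, every trajectory is padded to the maximum length in the batch, and the binary mask holds 1 for an observed point and 0 for a missing or padded one.

```python
def pad_and_mask(trajectories, dim, pad_value=0.0):
    """Pad variable-length trajectories to a common length and build a binary mask.

    mask[i][t] == 1 if point t of trajectory i is observed,
    0 if it is missing (None) or lies beyond the trajectory's end (padding).
    """
    max_len = max((len(t) for t in trajectories), default=0)
    padded, masks, lengths = [], [], []
    for traj in trajectories:
        rows, mask = [], []
        for t in range(max_len):
            point = traj[t] if t < len(traj) else None
            if point is None:
                rows.append([pad_value] * dim)  # filler value for missing or padded steps
                mask.append(0)
            else:
                rows.append(list(point))
                mask.append(1)
        padded.append(rows)
        masks.append(mask)
        lengths.append(len(traj))
    return padded, masks, lengths


batch = [[(1.0, 2.0), None, (3.0, 4.0)], [(5.0, 6.0)]]
padded, mask, lengths = pad_and_mask(batch, dim=2)
print(mask, lengths)  # [[1, 0, 1], [1, 0, 0]] [3, 1]
```

Keeping the original lengths alongside the mask lets a model distinguish padding from genuinely missing points if needed.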
5. Methodology
5.1. LSTM Encoder-GRU Decoder Architecture with Teacher Forcing
5.1.1. LSTM Encoder
5.1.2. GRU Decoder with Teacher Forcing
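Teacher forcing can be sketched independently of the network itself. In the minimal example below, the hypothetical `step(h, x)` function stands in for one GRU decoder step plus its output layer; with probability `ratio` the next decoder input is the ground-truth point, otherwise it is the decoder's own previous prediction. The ratio and the per-step sampling scheme are illustrative assumptions, not settings reported in this work.

```python
import random


def decode_with_teacher_forcing(step, h0, start, targets, ratio, seed=None):
    """Unroll a decoder for len(targets) steps.

    step(h, x) -> (h_next, y_pred) is a stand-in for one GRU decoder step plus
    its output layer. With probability `ratio` the next input is the ground-truth
    point (teacher forcing); otherwise the decoder's own prediction is fed back.
    """
    rng = random.Random(seed)
    h, x = h0, start
    outputs = []
    for target in targets:
        h, y = step(h, x)
        outputs.append(y)
        x = target if rng.random() < ratio else y
    return outputs


def toy_step(h, x):
    # Hypothetical "decoder" that predicts the next value as input + 1.
    return h, x + 1.0


targets = [10.0, 20.0, 30.0]
print(decode_with_teacher_forcing(toy_step, None, 0.0, targets, ratio=1.0))  # [1.0, 11.0, 21.0]
print(decode_with_teacher_forcing(toy_step, None, 0.0, targets, ratio=0.0))  # [1.0, 2.0, 3.0]
```

With `ratio=1.0` every input comes from the ground truth; with `ratio=0.0` the decoder runs free, as it must at inference time when no ground truth is available.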

5.1.3. Justification for Using GRU
- (1) Sequential Modeling: Trajectory data is inherently sequential, because each point depends on the ones before it. GRUs excel at sequential modeling because they maintain a hidden state that evolves over time, allowing them to capture dependencies in sequential data.
- (2) Gating Mechanisms: GRUs use gating mechanisms that let them reset and update their hidden state dynamically. The reset gate lets the model choose which past information to disregard, while the update gate determines how much new information to incorporate. This flexibility helps mitigate the vanishing gradient problem and facilitates the capture of long-term dependencies.
- (3) Efficient Training: GRUs are computationally efficient, which makes them suitable for the long sequences often encountered in trajectory data. This efficiency may contribute to faster convergence during training.
- (4) Parameter Efficiency: GRUs have fewer parameters than LSTM networks (three gate blocks instead of four for the same input and hidden sizes). This is particularly beneficial for trajectory data with limited samples, since fewer parameters help prevent overfitting and may lead to better generalization to new trajectories.
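To make the reset and update gates concrete, here is one GRU time step in plain Python. It follows the common formulation $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$, $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$, $\tilde{h}_t = \tanh(W_n x_t + r_t \odot (U_n h_{t-1}) + b_n)$, $h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1}$, which matches the gate convention of PyTorch's `nn.GRU` except that a single bias per gate is used here. The parameter names are illustrative, not taken from the authors' code.

```python
import math


def _sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def _matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h, p):
    """One GRU step: returns the new hidden state for input x and previous state h.

    p holds illustrative weights: W_* (hidden x input), U_* (hidden x hidden), b_* (hidden).
    """
    wx_r, uh_r = _matvec(p["W_r"], x), _matvec(p["U_r"], h)
    wx_z, uh_z = _matvec(p["W_z"], x), _matvec(p["U_z"], h)
    wx_n, uh_n = _matvec(p["W_n"], x), _matvec(p["U_n"], h)
    r = [_sigmoid(a + b + c) for a, b, c in zip(wx_r, uh_r, p["b_r"])]  # reset gate
    z = [_sigmoid(a + b + c) for a, b, c in zip(wx_z, uh_z, p["b_z"])]  # update gate
    # Candidate state: the reset gate scales the contribution of the previous state.
    n = [math.tanh(a + ri * u + c) for a, ri, u, c in zip(wx_n, r, uh_n, p["b_n"])]
    # Interpolate between the candidate and the previous state.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("W_r", "U_r", "W_z", "U_z", "W_n", "U_n")}
params.update(b_r=[0.0, 0.0], b_z=[0.0, 0.0], b_n=[0.0, 0.0])
print(gru_step([1.0, 1.0], [1.0, -2.0], params))  # [0.5, -1.0]

keep = dict(params, b_z=[100.0, 100.0])  # update gate saturated near 1
print(gru_step([1.0, 1.0], [1.0, -2.0], keep))  # approximately [1.0, -2.0]
```

With all weights zero, both gates equal 0.5 and the new state is half the old one. A large positive update-gate bias drives $z_t$ toward 1, so the previous hidden state is carried over almost unchanged, which is how the update gate preserves long-range information.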
5.2. Training
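- Optimization: To update the model parameters, the Adam optimizer has been used. Adam computes an adaptive learning rate for each parameter from estimates of the first moment (mean) and the second moment (uncentered variance) of the gradients. With gradient $g_t$ at step $t$, the update rule is:
  $$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,$$
  $$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - lr \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$
  where $m_t$ and $v_t$ are the biased first and second-moment estimates, $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions, $\beta_1$ and $\beta_2$ are the exponential decay rates, $\epsilon$ is a small constant to prevent division by zero, and $lr$ is the learning rate.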
- Gradient Clipping: Gradient clipping has been employed to prevent exploding gradients during backpropagation. If the norm of the gradient $g$ exceeds a predefined threshold (max_grad_norm), the gradient is rescaled so that its norm equals the threshold:
  $$g \leftarrow g \cdot \frac{\text{max\_grad\_norm}}{\lVert g \rVert} \quad \text{if } \lVert g \rVert > \text{max\_grad\_norm}.$$
- Learning Rate Scheduler: The learning rate has been scheduled with the ReduceLROnPlateau scheduler (a PyTorch scheduler that decreases the learning rate when a monitored metric stops improving). It reduces the learning rate when the validation loss plateaus, enabling the model to fine-tune its parameters more effectively. The update rule is:
  $$lr_{\text{new}} = \text{factor} \cdot lr_{\text{old}},$$
  where $lr_{\text{old}}$ is the previous learning rate and factor is a user-defined multiplicative factor (set to 0.3 in this work; PyTorch's own default is 0.1).
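The three training components above can be re-implemented in plain Python to show how they interact. This is an illustrative sketch, not the authors' training code: in PyTorch one would use `torch.optim.Adam`, `torch.nn.utils.clip_grad_norm_` and `torch.optim.lr_scheduler.ReduceLROnPlateau` instead. `Adam` uses the common defaults β₁ = 0.9, β₂ = 0.999, ε = 1e-8; `PlateauScheduler` is a hypothetical, simplified version of the plateau logic ("min" mode with a relative threshold, no cooldown) whose `patience=2` is an arbitrary choice, while `factor=0.3` follows the text. The toy run minimizes a 2-D quadratic and uses the training loss in place of a validation loss.

```python
import math


def clip_by_norm(grads, max_grad_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_grad_norm:
        return list(grads)
    scale = max_grad_norm / norm
    return [g * scale for g in grads]


class Adam:
    """Plain-Python Adam for a flat list of parameters."""

    def __init__(self, n_params, beta1=0.9, beta2=0.999, eps=1e-8):
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = [0.0] * n_params  # biased first-moment estimates
        self.v = [0.0] * n_params  # biased second-moment estimates
        self.t = 0

    def step(self, params, grads, lr):
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


class PlateauScheduler:
    """Simplified plateau logic: multiply lr by `factor` after `patience` non-improving steps."""

    def __init__(self, lr, factor=0.3, patience=2, threshold=1e-4):
        self.lr, self.factor, self.patience, self.threshold = lr, factor, patience, threshold
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best * (1 - self.threshold):  # relative improvement
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor  # lr_new = factor * lr_old
            self.bad_epochs = 0
        return self.lr


# Toy run: minimise (x - 3)^2 + (y + 1)^2 with all three pieces combined.
params = [0.0, 0.0]
opt = Adam(len(params))
sched = PlateauScheduler(lr=0.1)
for epoch in range(300):
    grads = [2 * (params[0] - 3.0), 2 * (params[1] + 1.0)]
    grads = clip_by_norm(grads, max_grad_norm=1.0)
    params = opt.step(params, grads, sched.lr)
    loss = (params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2
    sched.step(loss)  # training loss stands in for a validation loss here
print(params, sched.lr)
```

Because Adam normalizes each coordinate by $\sqrt{\hat{v}_t}$, its very first step has magnitude close to $lr$ regardless of the gradient scale; clipping mainly guards against sudden gradient spikes later in training.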
5.3. Hyperparameters
5.3.1. Model Parameters
- Dimensionality of input trajectories: the number of features (dimensions) of each input point.
- Number of hidden units: the hidden size, which determines the model's capacity to capture and represent temporal dependencies.
- Dimensionality of output trajectories: as with the input size, the number of features in each output prediction.
- Number of layers: the depth of the recurrent network, which influences its ability to model complex patterns.
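The four model hyperparameters above determine the size of the network. The sketch below counts trainable parameters using PyTorch's layout for `nn.LSTM` and `nn.GRU` (an input-to-hidden matrix, a hidden-to-hidden matrix and two bias vectors per gate block). It assumes, since this is not specified here, that the decoder consumes points with the output dimensionality and that a linear layer maps the decoder's hidden state to the output. It also makes the parameter-efficiency argument quantitative: for equal sizes, a GRU layer has exactly three quarters of an LSTM layer's parameters.

```python
def rnn_params(gate_blocks, input_size, hidden_size, num_layers):
    """Trainable parameters of a stacked, unidirectional PyTorch-style RNN.

    gate_blocks is 4 for an LSTM and 3 for a GRU; each block has an
    input-to-hidden matrix, a hidden-to-hidden matrix and two bias vectors.
    """
    total = 0
    for layer in range(num_layers):
        in_dim = input_size if layer == 0 else hidden_size  # deeper layers read the layer below
        total += gate_blocks * (hidden_size * in_dim + hidden_size * hidden_size + 2 * hidden_size)
    return total


def encoder_decoder_params(input_size, hidden_size, output_size, num_layers):
    encoder = rnn_params(4, input_size, hidden_size, num_layers)   # LSTM encoder
    decoder = rnn_params(3, output_size, hidden_size, num_layers)  # GRU decoder (input size assumed = output size)
    head = hidden_size * output_size + output_size                 # assumed linear output layer
    return {"encoder": encoder, "decoder": decoder, "head": head,
            "total": encoder + decoder + head}


print(rnn_params(4, 10, 20, 1), rnn_params(3, 10, 20, 1))  # 2560 1920
print(encoder_decoder_params(input_size=2, hidden_size=4, output_size=2, num_layers=1))
```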
5.3.2. Training Parameters
- Gradient clipping threshold: the maximum permitted norm of the gradients during backpropagation. It improves training stability and prevents gradients from exploding.
- Maximum gradient norm for scaling gradients: the norm to which gradients are rescaled when clipping is applied. If the computed gradient norm exceeds this value, the gradients are scaled down to meet it.
6. Experimental Results and Discussion
6.1. L1 Loss
6.2. MSE Loss
6.3. ADE Loss
6.4. Quantile Loss
6.5. Graph Representations of Predicted Trajectory and Ground Truth Trajectory

6.6. Comparison with Other Models
7. Conclusion
Funding
References
- Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering 1960, 82, 35–45. https://asmedigitalcollection.asme.org/fluidsengineering/article-pdf/82/1/35/5518977/35_1.pdf.
- Dellaert, F.; Fox, D.; Burgard, W.; Thrun, S. Monte Carlo localization for mobile robots. In Proceedings of the 1999 IEEE International Conference on Robotics and Automation (Cat. No.99CH36288C); 1999; Vol. 2, pp. 1322–1328.
- Cristianini, N.; Ricci, E. Support Vector Machines. In Encyclopedia of Algorithms; Kao, M.Y., Ed.; Springer US: Boston, MA, 2008; pp. 928–932.
- Goodfellow, I.J.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; https://www.deeplearningbook.org.
- Graves, A.; Fernández, S.; Schmidhuber, J. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In Proceedings of the Artificial Neural Networks: Formal Models and Their Applications – ICANN 2005, Berlin, Heidelberg, 2005; Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S., Eds.; pp. 799–804.
- Liao, Y.; Lin, R.; Zhang, R.; Wu, G. Attention-based LSTM (AttLSTM) neural network for Seismic Response Modeling of Bridges. Computers & Structures 2023, 275, 106915.
- Ullah, M.; Ullah, H.; Khan, S.D.; Cheikh, F.A. Stacked LSTM Network for Human Activity Recognition Using Smartphone Data. In Proceedings of the 2019 8th European Workshop on Visual Information Processing (EUVIP); 2019; pp. 175–180.
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting, 2015, arXiv:1506.04214.
- Lea, C.; Vidal, R.; Reiter, A.; Hager, G.D. Temporal Convolutional Networks: A Unified Approach to Action Segmentation. In Proceedings of the Computer Vision – ECCV 2016 Workshops; Hua, G.; Jégou, H., Eds., Cham; 2016; pp. 47–54.
- Deng, B.; Yan, J.; Lin, D. Peephole: Predicting Network Performance Before Training, 2017, arXiv:1712.03351.
- Cao, W.; Wang, D.; Li, J.; Zhou, H.; Li, L.; Li, Y. BRITS: Bidirectional Recurrent Imputation for Time Series. Advances in Neural Information Processing Systems 2018, 31.
- Kreindler, D.; Lumsden, C.J. The effects of the irregular sample and missing data in time series analysis. Nonlinear Dynamics, Psychology, and Life Sciences 2006, 10, 187–214.
- Nawaz, A.; Huang, Z.; Wang, S.; Akbar, A.; AlSalman, H.; Gumaei, A. GPS trajectory completion using end-to-end bidirectional convolutional recurrent encoder-decoder architecture with attention mechanism. Sensors 2020, 20, 5143.
- Zheng, Y. GPS Trajectories with transportation mode labels, 2010.
- Ma, J.; Yang, C.; Mao, S.; Zhang, J.; Periaswamy, S.C.; Patton, J. Human Trajectory Completion with Transformers. In Proceedings of the ICC 2022 - IEEE International Conference on Communications; IEEE, 2022; pp. 3346–3351.
- Sikeridis, D.; Papapanagiotou, I.; Devetsikiotis, M. CRAWDAD unm/blebeacon, 2022.
- Fujii, R.; Vongkulbhisal, J.; Hachiuma, R.; Saito, H. A two-block RNN-based trajectory prediction from incomplete trajectory. IEEE Access 2021, 9, 56140–56151.
- Pellegrini, S.; Ess, A.; Schindler, K.; van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision; IEEE, 2009; pp. 261–268.
- Lerner, A.; Chrysanthou, Y.; Lischinski, D. Crowds by example. In Proceedings of the ACM Transactions on Graphics (TOG); ACM, 2007; Vol. 26, pp. 1–10.
- Al-Molegi, A.; Jabreel, M.; Martínez-Ballesté, A. Move, Attend and Predict: An attention-based neural model for people’s movement prediction. Pattern Recognition Letters 2018, 112, 34–40.
- Yang, D.; Zhang, D.; Yu, Z.; Yu, Z. Fine-grained preference-aware location search leveraging crowdsourced digital footprints from LBSNs. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing; ACM, 2013; pp. 479–488.
- Wang, Z.; Zhang, S.; Yu, J.J. Reconstruction of Missing Trajectory Data: A Deep Learning Approach. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC); 2020; pp. 1–6.
- Soydaner, D. Attention mechanism in neural networks: where it comes and where it goes. Neural Computing and Applications 2022, 34, 13371–13385.
- Xue Yang, Jin Chen, Q.G.H.G.W.X. Enhanced Spatial–Temporal Savitzky–Golay Method for Reconstructing High-Quality NDVI Time Series: Reduced Sensitivity to Quality Flags and Improved Computational Efficiency. IEEE Transactions on Geoscience and Remote Sensing 2022.
- Pellegrini, S.; Ess, A.; Schindler, K.; van Gool, L. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision; 2009; pp. 261–268.
- Robicquet, A.; Sadeghian, A.; Alahi, A.; Savarese, S. Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes. In Proceedings of the Computer Vision – ECCV 2016; Leibe, B.; Matas, J.; Sebe, N.; Welling, M., Eds., Cham; 2016; pp. 549–565.
- Yang, D.; Li, L.; Redmill, K.; Ozguner, U. Top-view Trajectories: A Pedestrian Dataset of Vehicle-Crowd Interaction from Controlled Experiments and Crowded Campus. 06 2019, pp. 899–904.
- Krajewski, R.; Bock, J.; Kloeker, L.; Eckstein, L. The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC); 2018; pp. 2118–2125.
- Zhan, W.; Sun, L.; Wang, D.; Shi, H.; Clausse, A.; Naumann, M.; Kümmerle, J.; Königshof, H.; Stiller, C.; de La Fortelle, A.; et al. INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps. CoRR, 1910.
- Strigel, E.; Meissner, D.; Seeliger, F.; Wilking, B.; Dietmayer, K. The Ko-PER intersection laserscanner and video dataset. In Proceedings of the 2014 17th IEEE International Conference on Intelligent Transportation Systems (ITSC 2014); 2014; pp. 1900–1901.
- Bock, J.; Krajewski, R.; Moers, T.; Runde, S.; Vater, L.; Eckstein, L. The inD Dataset: A Drone Dataset of Naturalistic Road User Trajectories at German Intersections. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV); 2020; pp. 1929–1934.
- https://github.com/adhocmaster/drone-dataset-tools [Accessed on March 07, 2024].
- Muktadir, G.M.; Ikram, Z.; Whitehead, J. Pedestrian Crossing Dataset Extraction From InD Dataset, 2023.











| Symbol | Meaning |
|---|---|
|  | Trajectory of an entity at time t, consisting of spatial coordinates, velocity, and orientation |
|  | Position of the entity at time t |
|  | Velocity of the entity at time t |
|  | Heading of the entity at time t |
|  | Complete trajectory comprising all data points |
|  | Number of missing points in the trajectory to be imputed |
|  | Incomplete trajectory with missing points |
|  | Masking ratio for simulating missing data in trajectory sequences |
|  | Trajectory of the i-th example |
|  | Length of the trajectory |
|  | Maximum length across different trajectories |
|  | Binary mask indicating observed and missing points in a trajectory |
|  | Hidden state sequence of the LSTM encoder |
|  | Hidden state sequence of the GRU decoder |
| Y | Input sequence for the GRU decoder |
|  | Output sequence of the GRU decoder |
|  | Reset gate at time step t |
|  | Update gate at time step t |
|  | Candidate hidden state at time step t |
| Authors | Method | Dataset | Evaluation Metric | Limitation | Relevance to Our Work |
|---|---|---|---|---|---|
| Nawaz et al. | An encoder aggregates the input data into a context vector, which a decoder uses for trajectory prediction, aided by beam search for sequence optimization. | Geolife Trajectory dataset | Average Displacement Error (ADE) | The method can identify patterns in common scenarios but may fail on complex, crowded trajectory movements. | The inD dataset we have used contains complex, crowded trajectory movements, which our model has handled successfully. |
| Cao et al. | The BRITS-I algorithm improves predictive accuracy with bidirectional recurrent dynamics, allowing errors to be back-propagated from both past and future observations in time-series analysis. | PhysioNet, Beijing PM2.5, Electricity | Mean absolute error | Treating missing values as variables adds uncertainty and noise, potentially degrading model performance. | Our model applies data imputation techniques to estimate the missing values from the available data, thereby reducing the effects of noise and uncertainty. |
| Ma et al. | A transformer-based deep neural network whose multi-head self-attention mechanism enables good performance. | BLE data packets generated by 46 participants | Model training loss, prediction accuracy metrics | Although the approach may perform well on BLE data, its effectiveness on other types of trajectory data has not been explored. | This limitation also applies to our research: apart from the drone dataset, our model has not been evaluated on GPS or other kinds of datasets. |
| Al-Molegi et al. | MAP consists of three core components: RNNs, an attention model, and a softmax classifier. The RNNs capture spatial sequences, generating a high-level representation that summarizes historical movement. | Geolife | Recall, F1, and Precision | A notable constraint is the model's inability to handle new or previously unseen locations. | This limitation also applies to our research: we have not tested our model's ability to handle unseen locations. |
| Fujii et al. | A two-block Recurrent Neural Network (RNN) architecture. | ETH and UCY | ADE and FDE | Introducing a Bayesian filter framework adds complexity, which may affect scalability. | Owing to its complex architecture, our model may also face scalability limitations. |
| Wang et al. | An RNN encoder-decoder architecture with an attention mechanism. | Geolife | Accuracy | Apart from accuracy with varying p and lambda, no other evaluation metric has been used. | We have used L1 loss, MSE loss, ADE loss and quantile loss to evaluate our model from different aspects. |
| Loss Type | Minimum Value |
|---|---|
| L1 Train Loss | 0.2264 |
| L1 Validation Loss | 0.2231 |
| Quantile Train Loss | 0.1686 |
| Quantile Validation Loss | 0.1636 |
| MSE Train Loss | 1.4728 |
| MSE Validation Loss | 1.5025 |
| ADE Test Loss | 4.14 |
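For reference, the metrics in the table can be computed as sketched below. This plain-Python version assumes that predictions and targets are flat lists of floats (for L1, MSE and quantile loss) or lists of (x, y) points (for ADE); the quantile used in this work is not stated, so `q=0.5` is only a placeholder default. At q = 0.5 the quantile (pinball) loss is exactly half the L1 loss.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error over flat lists of values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred, target):
    """Mean squared error over flat lists of values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def ade(pred_xy, true_xy):
    """Average displacement error: mean Euclidean distance between matched points."""
    return sum(math.dist(p, t) for p, t in zip(pred_xy, true_xy)) / len(pred_xy)


def quantile_loss(pred, target, q=0.5):
    """Pinball loss: under-prediction is weighted by q, over-prediction by 1 - q."""
    total = 0.0
    for p, t in zip(pred, target):
        err = t - p
        total += max(q * err, (q - 1.0) * err)
    return total / len(pred)


pred, target = [1.0, 2.0, 3.0], [1.5, 2.0, 2.0]
print(l1_loss(pred, target), quantile_loss(pred, target))  # 0.5 0.25
print(ade([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]))  # 2.5
```

Unlike L1 and MSE, which average over individual coordinates, ADE averages the Euclidean distance per point, so its value is not directly comparable to the other entries in the table.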
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).