Submitted:
25 August 2024
Posted:
27 August 2024
Abstract

Keywords:
1. Introduction
2. Related Work
3. Attention-Linear Trajectory Prediction
3.1. Problem Formulation
3.2. Overview of the Structure
3.3. Attention-Linear Components
3.3.1. Trajectory Series Attention
3.3.2. Map Representation
3.3.3. Interactive Attention
3.3.4. Linear Module
3.4. Training
4. Experiments
4.1. Experimental Settings
4.2. Experimental Results
5. Conclusion
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Z. Zhou, L. Ye, J. Wang, K. Wu, and K. Lu, "HIVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. [CrossRef]
- J. Mercat, T. Gilles, N. El Zoghby, G. Sandou, D. Beauvois, and G. P. Gil, "Multi-Head Attention for Multi-Modal Joint Vehicle Motion Forecasting," in 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 9638-9644, IEEE, May 2020. [CrossRef]
- C. Yu, X. Ma, J. Ren, H. Zhao, and S. Yi, "Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction," in Proceedings of the European Conference on Computer Vision (ECCV), 2020. [CrossRef]
- N. Nayakanti, R. Al-Rfou, A. Zhou, K. Goel, K. S. Refaat, and B. Sapp, "Wayformer: Motion forecasting via simple & efficient attention networks," in 2023 IEEE International Conference on Robotics and Automation (ICRA), pp. 2980-2987, IEEE, 2023. [CrossRef]
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," in Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), 2017.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022, 2021. [CrossRef]
- L. Dong, S. Xu, and B. Xu, "Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5884-5888, IEEE, 2018. [CrossRef]
- T. Brown et al., "Language Models Are Few-Shot Learners," in Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.
- H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting," in Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), Virtual Conference, vol. 35, pp. 11106-11115, AAAI Press, 2021.
- T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, "FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting," in Proceedings of the International Conference on Machine Learning (ICML), 2022.
- H. Wu, J. Xu, J. Wang, and M. Long, "Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting," Advances in Neural Information Processing Systems, vol. 34, pp. 22419-22430, 2021.
- A. Zeng, M. Chen, L. Zhang, and Q. Xu, "Are Transformers Effective for Time Series Forecasting?," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, pp. 11121-11128, 2023. [CrossRef]
- Y. Zhang and J. Yan, "Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting," in Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- V. Ekambaram, A. Jati, N. Nguyen, P. Sinthong, and J. Kalagnanam, "Tsmixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023. [CrossRef]
- Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, "iTransformer: Inverted Transformers are Effective for Time Series Forecasting," arXiv preprint arXiv:2310.06625, 2023.
- M. F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays, "Argoverse: 3D Tracking and Forecasting with Rich Maps," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8748-8757, 2019.
- B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes, and D. Ramanan, "Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting," arXiv preprint arXiv:2301.00493, 2023.
- J. Gao, C. Sun, H. Zhao, Y. Shen, D. Anguelov, C. Li, and C. Schmid, "VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11525-11533, 2020. [CrossRef]
- M. Liang, B. Yang, R. Hu, Y. Chen, R. Liao, S. Feng, and R. Urtasun, "Learning Lane Graph Representations for Motion Forecasting," in Proceedings of the European Conference on Computer Vision (ECCV), 2020. [CrossRef]
- T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778. [CrossRef]
- J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450, 2016.
- K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991. [CrossRef]
- H. Zhao, J. Gao, T. Lan, C. Sun, B. Sapp, B. Varadarajan, Y. Shen, Y. Shen, Y. Chai, C. Schmid, and C. Li, "TNT: Target-Driven Trajectory Prediction," in Proceedings of the Conference on Robot Learning (CoRL), pp. 895-904, PMLR, 2021.
- J. Mercat, T. Gilles, N. El Zoghby, G. Sandou, D. Beauvois, and G. P. Gil, "Multi-Head Attention for Multi-Modal Joint Vehicle Motion Forecasting," in 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 9638-9644, IEEE, May 2020. [CrossRef]
- S. Khandelwal, W. Qi, J. Singh, A. Hartnett, and D. Ramanan, "What-If Motion Prediction for Autonomous Driving," arXiv preprint arXiv:2008.10587, 2020.
- J. Gao, C. Sun, H. Zhao, Y. Shen, D. Anguelov, C. Li, and C. Schmid, "VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11525-11533, 2020.
- J. Schmidt, J. Jordan, F. Gritschneder, and K. Dietmayer, "CRAT-Pred: Vehicle Trajectory Prediction with Crystal Graph Convolutional Neural Networks and Multi-Head Self-Attention," in 2022 International Conference on Robotics and Automation (ICRA), pp. 7799-7805, IEEE, May 2022.
- J. Ngiam, B. Caine, V. Vasudevan, Z. Zhang, H. T. L. Chiang, J. Ling, R. Roelofs, A. Bewley, C. Liu, A. Venugopal, and D. Weiss, "Scene Transformer: A unified architecture for predicting multiple agent trajectories," arXiv preprint arXiv:2106.08417, 2021.






| Methods | minADE₁ | minFDE₁ | MR₁ |
|---|---|---|---|
| Constant Velocity [17] | 3.550 | 7.890 | - |
| NN+map [17] | 3.454 | 7.882 | 0.871 |
| LSTM ED+social [17] | 2.290 | 5.220 | 0.680 |
| TNT [25] | 2.174 | 4.959 | 0.709 |
| Jean [26] | 1.860 | 4.171 | 0.685 |
| WIMP [27] | 1.823 | 4.030 | 0.628 |
| VectorNet [28] | 1.810 | 4.010 | - |
| CRAT-Pred [29] | 1.816 | 4.057 | 0.623 |
| SceneTransformer [30] | 1.810 | 4.055 | 0.592 |
| Attention-Linear | 1.792 | 3.967 | 0.611 |
Model configuration and final evaluation metrics:

| TSA | iTSA | Map | LinearM | minADE₁ | minFDE₁ | MR₁ | minADE₆ | minFDE₆ | MR₆ |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | ✗ | ✗ | ✗ | 2.138 | 4.725 | 0.694 | 1.367 | 2.767 | 0.451 |
| ✗ | ✗ | ✗ | ✓ | 2.407 | 5.469 | 0.712 | 1.150 | 2.229 | 0.340 |
| ✓ | ✓ | ✓ | ✗ | 1.803 | 3.803 | 0.617 | 0.957 | 1.655 | 0.212 |
| ✓ | ✗ | ✓ | ✓ | 1.544 | 3.411 | 0.561 | 0.785 | 1.270 | 0.136 |
| ✓ | ✓ | ✓ | ✓ | 1.527 | 3.381 | 0.559 | 0.776 | 1.239 | 0.129 |
Component configuration and final evaluation metrics:

| LinearM | Positional-En | Sparse-Att | Normal-Att | minADE₁ | minFDE₁ | MR₁ |
|---|---|---|---|---|---|---|
| ✓ | ✗ | ✗ | ✓ | 1.699 | 3.710 | 0.595 |
| ✓ | ✗ | ✓ | ✗ | 1.527 | 3.381 | 0.559 |
| ✗ | ✓ | ✓ | ✗ | 1.619 | 3.471 | 0.587 |
| ✓ | ✓ | ✓ | ✗ | 1.511 | 3.357 | 0.557 |
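The tables above report the standard Argoverse forecasting metrics: minADE (average displacement error of the best of K candidate trajectories, where "best" is the one with the smallest endpoint error), minFDE (the endpoint error of that best candidate), and MR (miss rate, the fraction of agents whose best endpoint error exceeds 2.0 m). A minimal per-agent sketch of these metrics (the function name and array shapes are illustrative, not taken from the paper):

```python
import numpy as np

def min_ade_fde_mr(preds, gt, miss_threshold=2.0):
    """Argoverse-style metrics for K candidate trajectories of one agent.

    preds: (K, T, 2) array of K predicted trajectories over T future steps.
    gt:    (T, 2) array, the ground-truth future trajectory.
    Returns (minADE, minFDE, miss) for this agent; miss rate is the
    average of `miss` over all evaluated agents.
    """
    # Per-candidate displacement error at every time step: shape (K, T)
    dists = np.linalg.norm(preds - gt[None], axis=-1)
    ade = dists.mean(axis=1)        # average displacement error per candidate
    fde = dists[:, -1]              # final-step displacement error per candidate
    best = fde.argmin()             # candidate whose endpoint is closest to gt
    min_ade = ade[best]             # ADE of the best-endpoint candidate
    min_fde = fde[best]
    miss = min_fde > miss_threshold # counts toward the miss rate (MR)
    return min_ade, min_fde, miss
```

Note the design choice: Argoverse selects the "best" candidate by endpoint error and reports its ADE as minADE; some papers instead take the minimum of ADE over all K candidates, which yields a slightly lower number.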
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).