1. Introduction
1.1. Research Background and Motivation
The rapid development of autonomous driving technology has brought unprecedented challenges to the prediction and understanding of pedestrian behaviors in mixed traffic environments. While autonomous vehicles have achieved significant advancements in perception, localization, and control systems, accurate prediction of pedestrian trajectory intentions remains a critical challenge for safe autonomous driving operations [
1]. Pedestrian trajectory prediction plays a vital role in autonomous driving systems, contributing to both safety enhancement and planning optimization in dynamic environments.
In mixed traffic environments consisting of autonomous vehicles and pedestrians, it is essential for autonomous vehicles to predict the intentions and trajectories of pedestrians that may pose potential risks [
2]. The uncertainty inherent in human intentions makes accurate prediction of pedestrian trajectories particularly challenging. Traditional trajectory prediction methods based on physical models or simple pattern recognition techniques struggle to capture the complex dynamic interactions between pedestrians and vehicles, limiting their practical applications in real-world scenarios [
3].
Recent advancements in deep learning have enabled more sophisticated approaches to trajectory prediction. These approaches demonstrate superior capabilities in learning complex motion patterns and interaction features from large-scale trajectory datasets. The emergence of attention mechanisms has particularly revolutionized the field of sequence prediction, showing remarkable potential in capturing long-term dependencies and spatial-temporal correlations in pedestrian movements [
4]. The attention mechanism allows models to focus on the most relevant historical trajectory points and surrounding context information, leading to more accurate predictions.
The integration of spatial and temporal attention mechanisms provides a promising framework for understanding both the spatial relationships between pedestrians and vehicles and the temporal evolution of movement patterns. This dual attention approach enables the model to capture not only the immediate spatial context but also the long-term behavioral patterns that influence pedestrian movements. By incorporating both spatial and temporal dependencies, the prediction model can better understand and forecast pedestrian intentions in complex traffic scenarios.
1.2. Research Objectives and Contributions
This research addresses the fundamental challenges in pedestrian trajectory intention prediction through the development of a novel spatio-temporal attention framework. The primary objective is to enhance the accuracy and reliability of pedestrian trajectory predictions in autonomous driving scenarios by effectively modeling the complex interactions between pedestrians and vehicles [
5].
The main contributions of this research are threefold. A spatio-temporal attention mechanism is proposed to encode both spatial and temporal features from historical trajectory data. This mechanism enables the model to capture complex dependencies across different time scales while considering the spatial context of the surrounding environment. The architecture integrates multiple attention heads to process different aspects of the trajectory information simultaneously, allowing for more comprehensive feature extraction.
A novel intention recognition module is developed to explicitly model the relationship between trajectory patterns and pedestrian intentions. This module leverages the encoded spatio-temporal features to classify different types of movement intentions, providing valuable information for subsequent trajectory generation. The intention recognition results are incorporated into the trajectory prediction process, improving the accuracy and interpretability of the predictions.
The research introduces an end-to-end trainable network architecture that combines intention recognition with trajectory prediction. This unified framework allows for joint optimization of both tasks, leading to improved overall performance. The model architecture is designed to be computationally efficient while maintaining high prediction accuracy, making it suitable for real-time applications in autonomous driving systems.
Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed approach. The model achieves significant improvements in both prediction accuracy and intention recognition compared to existing state-of-the-art methods [
6]. The experiments include comprehensive ablation studies that validate the contribution of each component in the proposed architecture. The results highlight the importance of integrating spatio-temporal attention mechanisms with intention recognition for accurate trajectory prediction.
This research advances the field of pedestrian trajectory prediction by introducing a novel architecture that effectively combines spatio-temporal attention mechanisms with intention recognition. The proposed approach provides a practical solution for autonomous driving systems, contributing to improved safety and efficiency in mixed traffic environments. The findings of this research lay the foundation for future developments in intention-aware trajectory prediction systems.
2. Related Work
2.1. Traditional Methods for Pedestrian Trajectory Prediction
Traditional approaches to pedestrian trajectory prediction have primarily focused on physics-based models and pattern recognition techniques. The Social Force Model (SFM) represents a fundamental framework in this domain, modeling pedestrian movements through attractive and repulsive forces. Under the SFM paradigm, pedestrians are influenced by their destination goals through attractive forces, while obstacles and other agents generate repulsive forces [
7]. This physics-based approach has demonstrated effectiveness in simulating basic pedestrian behaviors and interactions in controlled environments.
Pattern recognition-based methods have extended beyond simple physical models by incorporating statistical analysis and machine learning techniques. These approaches typically extract hand-crafted features from historical trajectories and apply various statistical models to predict future positions. Gaussian Process models have been applied to capture the uncertainty in pedestrian movements, providing probabilistic predictions of future trajectories [
8]. Hidden Markov Models (HMMs) and Kalman Filters have also been employed to model the sequential nature of pedestrian movements.
Traditional methods have established important foundational concepts in trajectory prediction, including the consideration of social interactions and environmental constraints. The incorporation of behavioral models and social rules has enhanced the ability to predict realistic pedestrian movements. These methods have also highlighted the importance of considering both individual goals and collective behaviors in trajectory prediction tasks.
2.2. Deep Learning-Based Methods for Trajectory Prediction
The advent of deep learning has revolutionized pedestrian trajectory prediction by enabling more sophisticated feature extraction and pattern recognition capabilities. Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks, have emerged as powerful tools for sequence modeling in trajectory prediction [
9]. These architectures can capture complex temporal dependencies in pedestrian movements and learn intricate patterns from large-scale trajectory datasets.
Attention mechanisms have significantly advanced the state-of-the-art in trajectory prediction. The integration of spatial and temporal attention allows models to focus on relevant historical information and spatial contexts dynamically. Multi-head attention architectures have demonstrated superior performance in capturing different aspects of motion patterns simultaneously. The transformer architecture, built upon self-attention mechanisms, has enabled more effective modeling of long-term dependencies in trajectory sequences.
Social interaction modeling has become a crucial component in deep learning approaches. Graph Neural Networks (GNNs) have been employed to model the relationships between pedestrians and their surrounding agents explicitly. These models can capture complex social interactions and spatial dependencies through message passing between nodes in the graph structure. The incorporation of social pooling layers has enabled the aggregation of information from neighboring agents, improving the prediction accuracy in crowded scenarios.
Recent advances have focused on developing end-to-end trainable architectures that combine multiple components for trajectory prediction. Generative models, including Conditional Variational Autoencoders (CVAEs) and Generative Adversarial Networks (GANs), have been introduced to model the multimodal nature of future trajectories [
10]. These approaches can generate diverse and realistic trajectory predictions by learning the underlying distribution of pedestrian movements.
The integration of intention recognition with trajectory prediction has emerged as a promising direction in deep learning approaches. Models that explicitly consider pedestrian intentions have demonstrated improved prediction accuracy by capturing higher-level behavioral patterns. The combination of intention recognition modules with trajectory prediction networks has enabled more interpretable and accurate predictions.
Deep learning methods have also addressed the challenges of real-time prediction in autonomous driving scenarios. Efficient network architectures and optimization techniques have been developed to meet the computational constraints of real-world applications. The incorporation of domain knowledge and physical constraints into deep learning models has improved the robustness and reliability of trajectory predictions.
Research in deep learning-based trajectory prediction continues to evolve, with increasing focus on developing more sophisticated architectures that can handle complex scenarios and interactions. The combination of multiple deep learning techniques and the integration of traditional insights have led to significant improvements in prediction accuracy and robustness.
3. Methodology
3.1. Problem Definition and Framework Overview
The pedestrian trajectory intention prediction problem in autonomous driving scenarios can be formulated as a spatio-temporal sequence prediction task. Given historical trajectory observations Xt = {x1, x2, ..., xt} and surrounding context information St = {s1, s2, ..., st}, the goal is to predict both the movement intention I and future trajectory positions Yt+1:t+n = {yt+1, yt+2, ..., yt+n}. Each trajectory point xt consists of position coordinates (px, py), velocity (vx, vy), and acceleration (ax, ay) in a 2D space.
Table 1 presents the key notations used throughout the methodology description:
The proposed framework consists of five main components, as illustrated in
Figure 1:
The figure presents a multi-component architecture diagram showing the data flow from input trajectory sequences through various processing modules. The diagram uses different colored blocks for each component, with arrows indicating the information flow. The key components include the spatio-temporal feature extractor (blue), multi-head attention module (green), intention recognition module (yellow), and trajectory generator (red).
The architecture demonstrates how raw trajectory data is processed through multiple attention layers before being split into intention recognition and trajectory prediction branches. The visualization emphasizes the parallel processing of spatial and temporal information streams.
3.2. Spatio-Temporal Feature Extraction Module
The spatio-temporal feature extraction module employs a hierarchical structure to capture both local and global motion patterns.
Table 2 outlines the architecture details of this module:
The extracted features form a comprehensive representation matrix F ∈ R^(128×T), incorporating both spatial and temporal information. The effectiveness of different feature combinations is shown in
Table 3:
3.3. Multi-Head Attention Mechanism for Trajectory Encoding
The multi-head attention mechanism processes the extracted features through H parallel attention heads.
Figure 2 illustrates the detailed structure of the attention mechanism:
This figure shows a complex multi-head attention mechanism with parallel processing streams. The visualization includes attention weight matrices, feature transformation paths, and the concatenation process. Matrix multiplication operations and softmax normalizations are represented through color-coded arrows and blocks.
The attention mechanism calculates the importance of different time steps through scaled dot-product attention:
αh = softmax(QhKh^T/√dk)Vh
where Qh, Kh, and Vh represent query, key, and value matrices for head h, respectively.
Table 4 shows the attention head configurations:
3.4. Intention Recognition Module
The intention recognition module processes the attention-encoded features through a specialized network structure.
Figure 3 provides a detailed visualization of this module:
The figure presents a detailed network architecture for intention recognition, showing multiple processing layers with skip connections. The visualization includes feature dimension transformations, activation functions, and the final classification layer. Different colored blocks represent various processing stages.
The intention recognition performance for different movement categories is presented in
Table 5:
3.5. Trajectory Generation Network
The trajectory generation network combines the recognized intention with encoded features to produce future trajectory predictions. The network architecture employs a sequence-to-sequence structure with attention-based decoding. Generated trajectories are sampled using a mixture density network output layer, producing a multi-modal distribution of possible future trajectories [
11].
The training process optimizes a combined loss function:
where Lint represents the intention classification loss, Ltraj denotes the trajectory prediction loss, and Lreg is a regularization term. The loss weights λ1, λ2, and λ3 are empirically set to balance different objectives.
4. Experiments and Results
4.1. Datasets and Implementation Details
The proposed model has been evaluated on two widely-used public datasets: ETH-UCY and Stanford Drone Dataset (SDD). The ETH-UCY dataset contains 5 subsets of pedestrian trajectories captured in different scenarios, with a total of 1,536 pedestrians and approximately 32,000 trajectory samples. The SDD dataset provides complex interactions between pedestrians and vehicles in a campus environment, including 185,000 trajectory samples from 6 different locations.
Table 6 presents the detailed statistics of the experimental datasets:
The implementation details of our model are specified in
Table 7:
4.2. Evaluation Metrics
The model performance is evaluated using multiple metrics to assess both trajectory prediction accuracy and intention recognition capability. The primary metrics include Average Displacement Error (ADE), Final Displacement Error (FDE), and Intention Recognition Accuracy (IRA).
Figure 4 illustrates the evaluation process and metric calculations:
The figure presents a comprehensive visualization of the evaluation metrics calculation process. Multiple colored trajectories represent predicted paths, while black lines show ground truth trajectories. The visualization includes error measurements at different time steps and intention classification results.
The diagram demonstrates how ADE and FDE are calculated by measuring the distances between predicted and actual trajectories at various time points.
4.3. Comparison with State-of-the-Art Methods
The proposed model has been compared with several state-of-the-art methods, and the results are presented in
Table 8:
Figure 5 shows the comparative analysis of prediction accuracy across different time horizons:
This visualization presents a multi-line graph showing the prediction error curves for different methods across various time horizons. The x-axis represents prediction time steps (0.5s to 4.0s), while the y-axis shows the displacement error. Different colored lines represent various methods, with confidence intervals shown as shaded regions.
The graph demonstrates the superior performance of our method, particularly in long-term predictions.
4.4. Ablation Studies
Comprehensive ablation studies have been conducted to analyze the contribution of each component.
Table 9 presents the detailed results:
4.5. Qualitative Analysis
Figure 6 presents qualitative results in various challenging scenarios:
The figure shows a complex multi-panel visualization comparing predicted trajectories with ground truth in different scenarios. Each panel represents a different challenging case, including crowded scenes, intersections, and interaction scenarios. Predicted trajectories are shown with uncertainty estimates, and intention recognition results are visualized through color-coded overlays.
The visualization demonstrates the model’s ability to handle various complex scenarios and generate accurate predictions with appropriate uncertainty estimates.
Table 10 provides detailed analysis of prediction performance in different environmental conditions:
The experimental results demonstrate the robustness and effectiveness of the proposed method across various scenarios and conditions. The spatio-temporal attention mechanism shows particular effectiveness in handling complex interactions between pedestrians and vehicles, while the intention recognition module significantly improves prediction accuracy in scenarios involving direction changes or complex maneuvers [
12].
5. Conclusion and Future Work
5.1. Summary of Contributions
This research presents a novel spatio-temporal attention mechanism for pedestrian trajectory intention prediction in autonomous driving scenarios. The proposed framework demonstrates significant improvements in both prediction accuracy and computational efficiency compared to existing state-of-the-art methods. The integration of multi-head attention mechanisms with intention recognition capabilities has proven effective in capturing complex pedestrian behaviors and interactions in mixed traffic environments.
The experimental results validate the effectiveness of our approach across multiple datasets and scenarios. The model achieves an average displacement error reduction of 12.8% compared to existing methods, while maintaining real-time performance capabilities suitable for autonomous driving applications [
13,
14]. The intention recognition module demonstrates robust performance with an accuracy of 93% across various environmental conditions and interaction scenarios [
15].
The architecture’s modular design enables effective feature extraction and representation learning at multiple scales. The spatio-temporal feature extraction module successfully captures both local motion patterns and global interaction contexts, providing a comprehensive understanding of pedestrian behaviors [
16]. The multi-head attention mechanism demonstrates superior capability in modeling complex dependencies between historical trajectories and environmental factors [
17].
The research contributions advance the field of pedestrian trajectory prediction through several key innovations. The proposed attention mechanism effectively combines spatial and temporal information, enabling more accurate long-term predictions. The intention recognition module provides interpretable results while improving overall prediction accuracy [
18]. The framework’s computational efficiency makes it practical for real-world autonomous driving applications.
5.2. Limitations and Future Research Directions
Despite the demonstrated effectiveness of the proposed approach, several limitations and potential areas for improvement have been identified. The current model performance shows degradation in extremely crowded scenarios where multiple interactions occur simultaneously [
19]. The prediction accuracy also decreases in scenarios with unusual pedestrian behaviors or rare interaction patterns not well represented in the training data [
20].
Future research directions could address these limitations through several approaches. The development of more sophisticated attention mechanisms could improve the model’s ability to handle complex multi-agent interactions [
21]. Advanced techniques for modeling group behaviors and collective motion patterns could enhance prediction accuracy in crowded environments. The integration of additional contextual information, such as detailed environmental semantics and traffic rules, could provide more comprehensive understanding of pedestrian intentions [
22].
The exploration of adaptive prediction horizons based on scene complexity and interaction dynamics presents another promising research direction. Dynamic adjustment of model parameters according to environmental conditions could improve both prediction accuracy and computational efficiency [
23]. The investigation of uncertainty estimation techniques could provide more reliable confidence measures for predicted trajectories [
24].
Extended research could focus on the generalization of the proposed framework to different types of road users and varying environmental conditions. The development of transfer learning techniques could enable efficient adaptation to new scenarios with minimal additional training. The integration of the prediction framework with downstream planning and control systems presents opportunities for end-to-end optimization of autonomous driving behaviors [
25].
The incorporation of additional sensor modalities, such as RGB cameras and LiDAR data, could provide richer environmental understanding and improve prediction accuracy [
26]. Multi-modal fusion techniques could enable more robust feature extraction and representation learning. The development of explainable prediction models could enhance trust and interpretability in autonomous driving systems.
Long-term research objectives include the development of prediction models capable of handling rare events and anomalous behaviors. Advanced training techniques utilizing synthetic data generation could address the scarcity of unusual interaction scenarios in real-world datasets. The investigation of continual learning approaches could enable model adaptation to evolving traffic patterns and behavioral changes.
The research findings establish a foundation for future developments in pedestrian trajectory prediction systems, while highlighting important challenges and opportunities in the field [
27,
28]. The continued advancement of these technologies will play a crucial role in improving the safety and efficiency of autonomous driving systems in mixed traffic environments [
29].
6. Acknowledgment
I would like to extend my sincere gratitude to Hangyu Xie, Yining Zhang, Zhongwen Zhou, and Hong Zhou for their groundbreaking research on privacy-preserving medical data analysis as published in their article titled “Privacy-Preserving Medical Data Collaborative Modeling: A Differential Privacy Enhanced Federated Learning Framework” [30]. Their innovative insights into privacy-preserving attention mechanisms have significantly influenced my understanding of spatio-temporal feature extraction and have provided valuable inspiration for my trajectory prediction research.
I would like to express my heartfelt appreciation to Zhongwen Zhou, Siwei Xia, Mengying Shu, and Hong Zhou for their pioneering study on medical image analysis using deep learning approaches, as published in their article titled “Fine-grained Abnormality Detection and Natural Language Description of Medical CT Images Using Large Language Models” [31]. Their comprehensive analysis of multi-head attention mechanisms and feature extraction techniques have significantly enhanced my knowledge of interaction modeling, inspiring the development of my research methodology.
References
- Gao, K. Li, X., Chen, B., Hu, L., Liu, J., Du, R., & Li, Y. (2023). Dual transformer based prediction for lane change intentions and trajectories in mixed traffic environment. IEEE Transactions on Intelligent Transportation Systems 24, 6203–6216.
- Xue, Q., Zhang, Z., Liu, S., Guo, P., Liu, Q., Wang, Q., & Zhao, J. (2024, September). Evaluation on Backbones for Pedestrian Trajectory Prediction. In 2024 4th International Conference on Computer Science and Blockchain (CCSB) (pp. 496-499). IEEE.
- Wang, C., Li, H., & Lu, W. (2022). Fast prediction of vehicle driving intentions and trajectories based on lightweight methods. IEEE Journal of Radio Frequency Identification, 6, 917-921.
- Liu, S., Zhu, Y., Yao, P., Mao, T., & Wang, Z. (2024, April). SpectrumNet: Spectrum-Based Trajectory Encode Neural Network for Pedestrian Trajectory Prediction. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7075-7079). IEEE.
- Golchoubian, M., Ghafurian, M., Dautenhahn, K., & Azad, N. L. (2023). Pedestrian trajectory prediction in pedestrian-vehicle mixed environments: A systematic review. IEEE Transactions on Intelligent Transportation Systems.
- Real-time Anomaly Detection in Dark Pool Trading Using Enhanced Transformer NetworksGuanghe, C., Zheng, S., & Liu, Y. (2024). Real-time Anomaly Detection in Dark Pool Trading Using Enhanced Transformer Networks. Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online), 3(4), 320-329.
- Guanghe, C., Zheng, S., & Liu, Y. (2024). Real-time Anomaly Detection in Dark Pool Trading Using Enhanced Transformer Networks. Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online), 3(4), 320-329.
- Chen, J., Yan, L., Wang, S., & Zheng, W. (2024). Deep Reinforcement Learning-Based Automatic Test Case Generation for Hardware Verification. Journal of Artificial Intelligence General science (JAIGS) ISSN: 3006-4023, 6(1), 409-429.
- Zhang, Haodong, et al. “Enhancing facial micro-expression recognition in low-light conditions using attention-guided deep learning.” Journal of Economic Theory and Business Management 1.5 (2024): 12-22.
- Ma, X., Lu, T., & Jin, G. AI-Driven Optimization of Rare Disease Drug Supply Chains: Enhancing Efficiency and Accessibility in the US Healthcare System.
- Ma, D., Jin, M., Zhou, Z., & Wu, J. Deep Learning-Based ADLAssessment and Personalized Care Planning Optimization in Adult Day Health Centers.
- Ju, C. , Liu, Y., & Shu, M. Performance Evaluation of Supply Chain Disruption Risk Prediction Models in Healthcare: A Multi-Source Data Analysis.
- Lu, T. , Zhou, Z., Wang, J., & Wang, Y. (2024). A Large Language Model-based Approach for Personalized Search Results Re-ranking in Professional Domains. The International Journal of Language Studies (ISSN: 3078-2244), 1(2), 1-6.
- Yan, L. , Zhou, S., Zheng, W., & Chen, J. (2024). Deep Reinforcement Learning-based Resource Adaptive Scheduling for Cloud Video Conferencing Systems.
- Chen, J. , Yan, L., Wang, S., & Zheng, W. (2024). Deep Reinforcement Learning-Based Automatic Test Case Generation for Hardware Verification. Journal of Artificial Intelligence General science (JAIGS) ISSN: 3006-4023, 6(1), 409-429.
- Yu, P. , Xu, Z., Wang, J., & Xu, X. (2025). The Application of Large Language Models in Recommendation Systems. arXiv:2501.02178.
- Yi, J. , Xu, Z., Huang, T., & Yu, P. (2025). Challenges and Innovations in LLM-Powered Fake News Detection: A Synthesis of Approaches and Future Directions. arXiv:2502.00339.
- Huang, T. , Xu, Z., Yu, P., Yi, J., & Xu, X. (2025). A Hybrid Transformer Model for Fake News Detection: Leveraging Bayesian Optimization and Bidirectional Recurrent Unit. arXiv:2502.09097.
- Wang, J. , Xu, X., Yu, P., & Xu, Z. (2025). Hierarchical Multi-Stage BERT Fusion Framework with Dual Attention for Enhanced Cyberbullying Detection in Social Media.
- Huang, T. , Yi, J., Yu, P., & Xu, X. (2025). Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies.
- Liang, X., & Chen, H. (2024, July). One cloud subscription-based software license management and protection mechanism. In Proceedings of the 2024 International Conference on Image Processing, Intelligent Control and Computer Engineering (pp. 199-203).
- Xu, J. , Wang, Y., Chen, H., & Shen, Z. (2025). Adversarial Machine Learning in Cybersecurity: Attacks and Defenses. International Journal of Management Science Research, 8(2), 26-33.
- Chen, H. , Shen, Z., Wang, Y., & Xu, J. (2024). Threat Detection Driven by Artificial Intelligence: Enhancing Cybersecurity with Machine Learning Algorithms.
- Xu,J.;Chen,H.;Xiao,X.;Zhao,M.;Liu,B. (2025).Gesture Object Detection and Recognition Based on YOLOv11.Applied and Computational Engineering,133,81-89.
- Weng, J., & Jiang, X. (2024). Research on Movement Fluidity Assessment for Professional Dancers Based on Artificial Intelligence Technology. Artificial Intelligence and Machine Learning Review, 5(4), 41-54.
- Jiang, C. , Jia, G., & Hu, C. (2024). AI-Driven Cultural Sensitivity Analysis for Game Localization: A Case Study of Player Feedback in East Asian Markets. Artificial Intelligence and Machine Learning Review, 5(4), 26-40.
- Ma, D. (2024). AI-Driven Optimization of Intergenerational Community Services: An Empirical Analysis of Elderly Care Communities in Los Angeles. Artificial Intelligence and Machine Learning Review, 5(4), 10-25.
- Ma, D. , & Ling, Z. (2024). Optimization of Nursing Staff Allocation in Elderly Care Institutions: A Time Series Data Analysis Approach. Annals of Applied Sciences, 5(1).
- Zheng, S. , Zhang, Y., & Chen, Y. (2024). Leveraging Financial Sentiment Analysis for Detecting Abnormal Stock Market Volatility: An Evidence-Based Approach from Social Media Data. Academia Nexus Journal, 3(3).
- Xie, H. , Zhang, Y., Zhongwen, Z., & Zhou, H. (2024). Privacy-Preserving Medical Data Collaborative Modeling: A Differential Privacy Enhanced Federated Learning Framework. Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online), 3(4), 340-350.
- Zhou, Z. , Xia, S., Shu, M., & Zhou, H. (2024). Fine-grained abnormality detection and natural language description of medical CT images using large language models. International Journal of Innovative Research in Computer Science & Technology, 12(6), 52-62.
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).