PreprintArticleVersion 2Preserved in Portico This version is not peer-reviewed
Examining the Potential of Generative Language Models for Aviation Safety Analysis: Case Study and Insights using the Aviation Safety Reporting System (ASRS)
Version 1
: Received: 4 July 2023 / Approved: 4 July 2023 / Online: 4 July 2023 (10:08:18 CEST)
Version 2
: Received: 4 July 2023 / Approved: 10 July 2023 / Online: 11 July 2023 (07:13:20 CEST)
Tikayat Ray, A.; Bhat, A.P.; White, R.T.; Nguyen, V.M.; Pinon Fischer, O.J.; Mavris, D.N. Examining the Potential of Generative Language Models for Aviation Safety Analysis: Case Study and Insights Using the Aviation Safety Reporting System (ASRS). Aerospace2023, 10, 770.
Tikayat Ray, A.; Bhat, A.P.; White, R.T.; Nguyen, V.M.; Pinon Fischer, O.J.; Mavris, D.N. Examining the Potential of Generative Language Models for Aviation Safety Analysis: Case Study and Insights Using the Aviation Safety Reporting System (ASRS). Aerospace 2023, 10, 770.
Tikayat Ray, A.; Bhat, A.P.; White, R.T.; Nguyen, V.M.; Pinon Fischer, O.J.; Mavris, D.N. Examining the Potential of Generative Language Models for Aviation Safety Analysis: Case Study and Insights Using the Aviation Safety Reporting System (ASRS). Aerospace2023, 10, 770.
Tikayat Ray, A.; Bhat, A.P.; White, R.T.; Nguyen, V.M.; Pinon Fischer, O.J.; Mavris, D.N. Examining the Potential of Generative Language Models for Aviation Safety Analysis: Case Study and Insights Using the Aviation Safety Reporting System (ASRS). Aerospace 2023, 10, 770.
Abstract
This research investigates the potential application of generative language models, especially ChatGPT, in aviation safety analysis as a means to enhance the efficiency of safety analyses and accelerate the time it takes to process incident reports. In particular, ChatGPT was leveraged to generate incident synopses from narratives, which were subsequently compared with ground truth synopses from the Aviation Safety Reporting System (ASRS) dataset. The comparison was facilitated by using embeddings from Large Language Models (LLMs), with aeroBERT demonstrating the highest similarity due to its aerospace-specific fine-tuning. A positive correlation was observed between synopsis length and their cosine similarity. In a subsequent phase, human factor issues involved in incidents as identified by ChatGPT were compared to human factor issues identified by safety analysts. A concurrence rate of 61% was found, with ChatGPT demonstrating a cautious approach towards attributing human factor issues. Finally, the model was used to attribute incidents to relevant parties. As no dedicated ground truth column existed for this task, a manual evaluation was conducted. ChatGPT attributed the majority of incidents to the Flight Crew, ATC, Ground Personnel, and Maintenance. This study opens new avenues for leveraging AI in aviation safety analysis.
Keywords
Aviation Safety Reporting System; ASRS; Aviation Safety; Human Factors; Large Language Models; LLM; ChatGPT; Generative Language Models; GPT-3.5; aeroBERT; BERT; InstructGPT; Prompt Engineering
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Commenter: Archana Tikayat Ray
Commenter's Conflict of Interests: Author