1. Introduction
Variable and complex weather and sea conditions have always made maritime transport a high-risk activity. Maritime accidents, which range from collisions and groundings to onboard fires and explosions, occur infrequently but can have highly destructive consequences [1]. When an accident does occur, it typically leads to substantial financial loss, human casualties, or extensive environmental pollution [2]. A fundamental step towards reducing the risks of maritime navigation is therefore a thorough analysis of the multiple causes of these accidents [3,4,5]. It is also increasingly important to carry out dynamic risk assessment of maritime operations [6,7,8]. In addition, predictive models that estimate the probability of loss of life in the event of an accident can contribute significantly to reducing future maritime disasters [9,10,11,12].
Over the years, a large body of research has been devoted to maritime traffic safety, including accident causation analysis [3,4,5,13], accident consequence assessment [14,15,16], and accident loss computation [17,18], and has produced a range of evaluation methods. Hu et al. [3] combined the Human Factors Analysis and Classification System (HFACS) with Structural Equation Modeling (SEM) to analyze the causal factors behind Marine Traffic Accidents (MTAs). Chou et al. [4] integrated the Automatic Identification System, a Geographic Information System, and an electronic chart (e-chart) to examine the relationships between environmental factors, geographical locations, and common causes of marine accidents. By overlaying vessel traffic flows, accident sites, and environmental data on a shared e-chart, their work gave port authorities insights for managing ship traffic flow and reducing marine accidents near ports. Xue et al. [5] proposed a comprehensive analytical framework for investigating the characteristics and causes of ship accidents, using ten years of historical data from the fluctuating backwater area of the Three Gorges Reservoir region. Through statistical and comparative analysis of the historical data, they summarized and visualized vessel accident categories and severity, the vessel types involved, spatial-temporal distribution characteristics, and accident losses, together with the underlying causes and lessons learned from the accidents. Fu et al. [13] developed a bivariate probit model to analyze 311 Arctic ship accidents from 1998 to 2017. Their study identified gross tonnage, ship type, ship age, accident type, accident year, accident location, wind, and sea ice as the main contributors to accident severity, and found a negative correlation between serious accidents and accidents resulting in pollution. As research has identified a wide range of potential causal factors and accident data have become more detailed, a growing number of scholars have turned to the consequences of accidents, specifically accident consequence assessment and loss computation. These topics are of particular interest to managers concerned with accidents that cause significant economic damage and human casualties. For instance, Chen et al. [15] presented an evidence-based Fuzzy Bayesian Network method for building probabilistic models of marine accidents, which makes it possible to assess accidents likely to have severe consequences. Similarly, Ventikos and Giannopoulos [16] introduced a criterion for assessing risk and consequences in maritime transport from a societal perspective, yielding a marine risk assessment framework that allows accidents of different scales and characteristics to be compared while reflecting the level of risk that society is prepared to tolerate. Chen et al. [17] developed an improved entropy weight-TOPSIS model to analyze and evaluate marine total loss incidents worldwide from 1998 to 2018. Although influential, these studies analyze accidents mainly from the standpoint of either accident causation or accident consequences; few address both in a bidirectional analysis.
In accident scenario analysis, methods such as event tree analysis and accident tree analysis are frequently used to build models of traffic accident scenario evolution [19,20,21,22,23]. However, most accident-cause analyses are broad rather than specific, which limits their ability to produce targeted recommendations for preventing similar events [24]. To bridge this gap, a Bayesian network (BN)-based approach to maritime accident scenario modeling can be used. A BN is a probabilistic graphical model that can represent and reason over uncertain knowledge and uncertain relationships among variables. This modeling approach suits the complex and dynamic character of maritime activities and is effective at identifying the factors that contribute to maritime accidents [3,5]. By combining historical data with expert knowledge, such a model can estimate both the likelihood of an accident and the potential consequences of a range of accident scenarios [14,16]. BNs are widely applied to uncertain multi-factor causal inference, accident causation analysis, and scenario prediction in road and waterway transportation [25,26,27,28,29,30,31,32]. For example, Zou and Yue [33] combined probabilistic risk analysis with BN theory to explore the causes of road traffic accidents, and Yuan et al. [34] built a scenario-based prediction model for the consequences of fire accidents in oil and gas storage and transportation emergency processes, using a defuzzification method and a dynamic BN model. Zhao et al. [35] used an ISM-BN model to assess the impact of various factors on maritime safety and identified the critical risk components for different accident types. Afenyo et al. [36] used a BN model to describe an Arctic shipping accident scenario and identify its critical causal elements. Similarly, Jiang et al. [37] proposed a BN-based risk analysis approach for evaluating maritime accidents along the 21st-century Maritime Silk Road (MSR), identifying the main influencing factors in order to support accident prevention and the safety and sustainability of maritime transportation. Si et al. [38] applied a BN structure learning algorithm that combined kernel density estimation with a model weighted average strategy to analyze the causes of container ship collisions from a limited sample of collision data. Fan et al. [39] and Hänninen et al. [40] proposed similar BN-based risk analysis approaches for understanding the factors contributing to maritime transport accidents, with the latter focusing on maritime safety management and its relationship with maritime traffic safety. Despite these successes, the studies above share three limitations: 1) a shortage of sample data from maritime accidents, 2) a labor-intensive and time-consuming data collection process, and 3) the difficulty of obtaining accident loss records. In summary, while waterway transportation research has concentrated on accident causality reasoning and accident causation analysis, accident scenario modeling remains underexplored.
In light of this, this paper aims to build a BN model of the evolution of maritime accident scenarios using global maritime accident data. The data come from the Global Integrated Shipping Information System (GISIS) established by the International Maritime Organization (IMO), which has been widely used in maritime accident studies [41,42,43,44,45,46,47]. A BN-based maritime traffic accident scenario modeling approach is proposed to analyze the causes of maritime traffic accidents, perform dynamic risk assessment of shipping activities, and predict the probability and consequences of accidents. By identifying the influencing factors and simulating various accident scenarios, the proposed approach can help maritime stakeholders implement appropriate preventive measures and improve the safety of maritime transportation. The model can also infer the most likely causal factors leading to specific accident consequences, thereby providing technical support for shipping safety management strategies.
The rest of this article is structured as follows.
Section 2 briefly introduces the structure and construction of Bayesian networks and then describes the data-driven tree augmented network (TAN) method.
Section 3 builds the TAN model from data on 5,660 maritime accidents and carries out sensitivity analysis and validation of the model.
Section 4 uses the bidirectional reasoning ability of the TAN model to forecast accident chains and analyze accident causes. Finally, Section 5 concludes the paper.
2. BN Structure Learning—TAN
A BN is a Directed Acyclic Graph (DAG) composed of nodes and directed edges, and is widely used to represent the dependencies between variables and the strength of their associations. The directed arcs represent the relationships between variables, and the strength of each relationship is specified by a conditional probability table.
There are two primary approaches to generating BN structures: 1) the expert knowledge method, and 2) the data-driven method. In the expert knowledge method, the BN structure is built by subjectively evaluating the causal relationships between variables. Conversely, the data-driven method is employed to uncover the interdependence between variables, based on the learning algorithm of the BN model and data correlations. In this study, since sufficient sample data were collected, the data-driven method was used to construct the BN structure.
Data-driven Bayesian approaches can be classified into three main categories: 1) the Naive Bayesian Network (NBN); 2) the Augmented Naive Bayesian Network (ABN); and 3) the Tree Augmented Network (TAN). Among these, TAN learning combines the simplicity and robustness of NBN computation with the ability to characterize interaction dependencies among variables, thus providing insights into the key factors leading to the outcomes of specific accidents [48]. Therefore, this paper employs the data-driven TAN approach to construct the BN structure.
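A BN encodes the joint probability distribution over a set of random variables $U$. Let $U = \{X_1, \dots, X_n, C\}$, where $n$ denotes the number of influencing factors, $X_1, \dots, X_n$ represent the influencing factors, and $C$ is the class variable (accident type). The set of parent nodes of $C$ in $U$ is empty, meaning $\Pi_C = \emptyset$. Moreover, each $X_i$ has at most one other node besides $C$ with an edge pointing to it. The joint probability distribution adheres to the following equation:
$$P(X_1, \dots, X_n, C) = P(C) \prod_{i=1}^{n} P(X_i \mid \Pi_{X_i}) \qquad (1)$$
where $\Pi_{X_i}$ denotes the set of parent nodes of $X_i$, which contains $C$ and at most one other influencing factor.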
In the process of learning the TAN structure, Chow and Liu [49] proposed an approach to optimize and construct the BN structure using the conditional mutual information of each attribute pair. The function is defined as:
$$I_P(X_i; X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)} \qquad (2)$$
where $I_P$ denotes the conditional mutual information; $x_i$ is a state of the influencing factor $X_i$; $x_j$ is a state of the influencing factor $X_j$; and $c$ is a state of the class variable $C$.
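The structure-learning idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Netica implementation used in this paper: it estimates the conditional mutual information from raw counts and then builds a maximum-weight spanning tree over the influencing factors with Prim's algorithm. The function names, the choice of the first factor as the tree root, and the toy column names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from three equal-length lists of discrete states."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # P(x, y | c) / (P(x | c) * P(y | c)) expressed with raw counts
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        total += (count / n) * math.log(ratio)
    return total


def tan_feature_tree(columns, class_column):
    """Return directed (parent, child) edges of a maximum-weight spanning tree.

    columns maps a factor name to its list of observed states; edge weights
    are the conditional mutual information given the class column.
    """
    names = list(columns)
    weight = {}
    for a, b in combinations(names, 2):
        w = conditional_mutual_information(columns[a], columns[b], class_column)
        weight[(a, b)] = weight[(b, a)] = w
    in_tree = [names[0]]  # assumption: the first factor is used as the root
    edges = []
    while len(in_tree) < len(names):
        # Prim's algorithm: attach the outside node with the heaviest edge
        parent, child = max(
            ((p, c) for p in in_tree for c in names if c not in in_tree),
            key=lambda edge: weight[edge],
        )
        edges.append((parent, child))
        in_tree.append(child)
    return edges


if __name__ == "__main__":
    accident_type = [0, 0, 0, 0, 1, 1, 1, 1]
    toy = {
        "ship_type": list("aabbaabb"),
        "tonnage": list("ssllssll"),
        "period": list("dndndndn"),
    }
    print(tan_feature_tree(toy, accident_type))
```

In a complete TAN, the class node is then added as a parent of every influencing factor, so that each factor has the class and at most one other factor as parents.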
3. Global Maritime Accident TAN Model
3.1. Data Collection
This paper uses the Marine Casualties and Incidents (MCI) database in GISIS, which is managed by the IMO [50]. GISIS is a comprehensive, global maritime information system. In accordance with IMO regulations, every country with sovereignty over its territorial sea is required to report maritime accidents occurring within its waters to the IMO. The MCI database contains two types of information related to global maritime accidents: first, factual data gathered from various sources; and second, detailed data obtained from casualty investigation reports submitted to the IMO.
The MCI database houses global maritime accident data dating back to 1973. Between 1973 and 2000, the annual number of recorded maritime accidents was quite limited. From 2001 onwards, the number of accidents documented in the MCI database has been more consistent. However, the accident timestamps during 2001-2004 are only accurate to the day, which is not sufficient for studying the specific time periods in which the accidents occurred. Consequently, we excluded the low-quality data from the early years and utilized a total of 5,660 maritime accident records from 2005 to 2020 to construct the BN model.
3.2. Node Variable Definitions
Based on previous studies of maritime accident factors [1,43,51,52], there are 16 primary factors contributing to maritime accidents: ship type, hull type, ship age, ship length, ship gross tonnage, ship operation, voyage segment, ship speed, ship condition, ship equipment or device condition, ship design, interaction information, weather condition, ocean condition, time period, and channel traffic condition. Combining these factors with the information available in the MCI database, we selected seven node variables for the BN model: accident quarter, accident period, accident type, ship type involved, gross tonnage of the ship involved, life loss contingency (referred to below as accident consequence), and accident severity.
Since BN nodes require discrete variables, the continuous variables in the accident statistics must be discretized. We divided the accident quarter into first quarter (January, February, March), second quarter (April, May, June), third quarter (July, August, September), and fourth quarter (October, November, December). Accident periods were categorized as dawn (0:00-5:59), early morning (6:00-8:59), morning (9:00-11:59), noon (12:00-13:59), afternoon (14:00-16:59), early evening (17:00-19:59), and evening (20:00-23:59). To discretize the gross tonnage of the ships involved, we applied the Centroid Clustering (CC) algorithm to the collected data. The CC algorithm uses minimization of the error sum of squares as its objective function and terminates when the number of iterations reaches a preset maximum of 5,000. The optimal classification yielded four gross tonnage groups: (1-18,500 t), (18,501-57,500 t), (57,501-120,000 t), and (120,001-403,342 t), where 403,342 t is the maximum gross tonnage among the ships in the collected data.
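The discretization rules above can be written down directly as lookup functions. The sketch below is illustrative only: the function and constant names are hypothetical, the class boundaries are copied from the text rather than re-derived by clustering, and the hour is assumed to be an integer from 0 to 23.

```python
from bisect import bisect_left

# (first hour, last hour, label), taken from the period definitions in the text
PERIODS = [
    (0, 5, "dawn"),
    (6, 8, "early morning"),
    (9, 11, "morning"),
    (12, 13, "noon"),
    (14, 16, "afternoon"),
    (17, 19, "early evening"),
    (20, 23, "evening"),
]

# Upper bounds (inclusive) of the four gross tonnage classes reported in the text
TONNAGE_UPPER = [18_500, 57_500, 120_000, 403_342]
TONNAGE_LABELS = [
    "1-18,500 t",
    "18,501-57,500 t",
    "57,501-120,000 t",
    "120,001-403,342 t",
]


def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return (month - 1) // 3 + 1


def period_of(hour):
    """Map an hour of the day (0-23) to the accident period label."""
    for first, last, label in PERIODS:
        if first <= hour <= last:
            return label
    raise ValueError(f"invalid hour: {hour}")


def tonnage_class(gross_tonnage):
    """Map a gross tonnage value to one of the four tonnage classes."""
    if not 1 <= gross_tonnage <= TONNAGE_UPPER[-1]:
        raise ValueError(f"gross tonnage out of range: {gross_tonnage}")
    return TONNAGE_LABELS[bisect_left(TONNAGE_UPPER, gross_tonnage)]


if __name__ == "__main__":
    print(quarter_of(2), period_of(15), tonnage_class(30_000))
```

Using inclusive upper bounds with `bisect_left` keeps boundary values such as 18,500 t in the lower class, which matches the intervals as written.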
Furthermore, this paper classifies non-routine accidents, such as missing ships and life-saving equipment accidents, together with the many accident types that are recorded irregularly or rarely, accounting for no more than 5%, as "Other" [53]. Likewise, multipurpose ships, tugboats, supply and offshore vessels, unspecified ship types, and other ship types, representing no more than 10%, are categorized as "Other" [53].
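One way to apply such a grouping rule in code is shown below. This is a sketch under an explicit assumption: the threshold is applied to each category's individual share of the records, which is only one possible reading of the rule above. The function name and the toy category labels are hypothetical.

```python
from collections import Counter


def lump_rare(values, max_share, other_label="Other"):
    """Relabel every category whose individual share is <= max_share."""
    counts = Counter(values)
    n = len(values)
    # Assumption: the threshold applies per category, not to the merged group
    rare = {category for category, count in counts.items() if count / n <= max_share}
    return [other_label if value in rare else value for value in values]


if __name__ == "__main__":
    accident_types = (
        ["collision"] * 12
        + ["grounding"] * 6
        + ["missing ship"]
        + ["life-saving equipment"]
    )
    print(Counter(lump_rare(accident_types, max_share=0.05)))
```

With a per-category threshold, the merged "Other" group can end up larger than the threshold itself, since it collects many individually rare categories.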
Table 1 presents the names, classifications, frequency of occurrence, and percentages of each discrete variable category.
3.3. TAN Modeling
Based on the data processing results, we examined the relationship between six influencing factors and accident consequences. We used the "learning network" function of the Netica software to develop a TAN model based on Equation (2), ensuring that all connections between nodes are meaningful. The initial structure of the TAN, depicted in Figure 1, results from data-driven TAN training and reflects realistic correlations between the variables.
Based on the TAN model, Netica's built-in structure learning and parameter learning modules automatically learn the Conditional Probability Table (CPT) parameters from the sample dataset. By constructing the TAN and obtaining the CPT, we can calculate the posterior probability of each variable. The statistical results of the probabilistic variables facilitate the analysis of maritime safety considerations and aid in accident prevention. Figure 2 presents the TAN results for the random variables of interest.
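To make the parameter-learning step concrete, the following sketch estimates a CPT by counting records for each parent configuration. It is not Netica's algorithm: the additive (Laplace) smoothing constant `alpha`, the function name, and the toy records are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def learn_cpt(child, parents, states, alpha=1.0):
    """Estimate P(child | parents) from paired records.

    child   -- list of observed child states, one per record
    parents -- list of tuples with the parent states of the same records
    states  -- all possible child states
    alpha   -- additive smoothing constant (an assumption of this sketch)
    """
    counts = defaultdict(Counter)
    for child_state, parent_config in zip(child, parents):
        counts[parent_config][child_state] += 1
    cpt = {}
    for parent_config, counter in counts.items():
        total = sum(counter.values()) + alpha * len(states)
        cpt[parent_config] = {s: (counter[s] + alpha) / total for s in states}
    return cpt


if __name__ == "__main__":
    # Toy records: loss of life conditioned on accident type
    life_loss = ["yes", "no", "no", "no"]
    accident_type = [("collision",), ("collision",), ("collision",), ("fire",)]
    for config, row in learn_cpt(life_loss, accident_type, ["yes", "no"]).items():
        print(config, row)
```

Smoothing avoids zero probabilities for state combinations that never occur in the sample, which matters when some parent configurations are backed by only a few records.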
3.4. Sensitivity Analysis and Model Validation
3.4.1. Sensitivity Analysis
In the Netica software, we selected the accident type as the target node and conducted a sensitivity analysis on this node to identify the factors with the greatest influence on the target node within the TAN model.
The mutual information value measures the sensitivity between two random variables: a higher value indicates that the target node is more sensitive to the influencing factor, and a lower value indicates that it is less sensitive. We used the sensitivity analysis function in Netica to calculate the mutual information value, percentage, and variance for each influencing factor with respect to accident type, as displayed in Table 2. According to Table 2, accident consequence and accident severity are the factors to which accident type is most sensitive, with mutual information values of 0.14246 and 0.14033, respectively, which are notably higher than those of the other four factors. The results indicate that accident consequence and accident severity are the two most informative factors for determining the type of accident, followed by ship type, gross tonnage of the ship, time period, and quarter.
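The ranking logic behind such a sensitivity table can be illustrated with a short sketch. Note the assumptions: this version estimates mutual information directly from raw records rather than from the fitted network, reports it in bits, and expresses the percentage as a share of the target's entropy. The function names and toy data are hypothetical and do not reproduce the values in Table 2.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy of a list of discrete states, in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits."""
    n = len(xs)
    n_xy = Counter(zip(xs, ys))
    n_x = Counter(xs)
    n_y = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (n_x[x] * n_y[y]))
        for (x, y), c in n_xy.items()
    )


def rank_by_sensitivity(factors, target):
    """Return (name, mutual information, percent of target entropy), highest first."""
    h = entropy(target)
    rows = []
    for name, column in factors.items():
        mi = mutual_information(column, target)
        percent = 100.0 * mi / h if h > 0 else 0.0
        rows.append((name, mi, percent))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    accident_type = [0, 0, 1, 1]
    toy_factors = {"quarter": [0, 1, 0, 1], "severity": [0, 0, 1, 1]}
    for row in rank_by_sensitivity(toy_factors, accident_type):
        print(row)
```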
3.4.2. Model Validation
To validate the TAN model's effectiveness, we randomly selected three offshore accident cases from 2021 with differing accident consequences and severities, labeled events 1, 2, and 3. We input the case data into the model for scenario analysis; Table 3 presents the data for these accident cases.
For each of the three events, we set the probability of the known nodes (quarter, time period, vessel type, accident consequence, accident severity, and gross tonnage of the vessel) to 100% and then observed the predicted accident types and their probabilities. As shown in Table 3, the probability of the "other" accident type in event 1 was 75.1%; the probability of grounding and stranding in event 2 was 38.0%; and the probability of collision in event 3 was 44.4%. The predicted accident types for all three events matched the accident types in the original data, indicating that the model's predictions are accurate to some extent. Because "other" accident types occur far more often in the original data than collision or grounding and stranding, the data-driven TAN model performs better in predicting the probability of "other" accident types (e.g., Figure 3a) and gives only average results for collision and for grounding and stranding accidents (e.g., Figure 3b and Figure 3c).
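Setting every known node to 100% and reading off the accident type probabilities corresponds to computing the posterior of the class node under full evidence. The sketch below shows this calculation for a small TAN with two influencing factors. All names and probability values are invented for illustration and are not taken from the model or tables in this paper.

```python
def tan_posterior(prior, cpts, parents, evidence):
    """Posterior over the class node when every factor is observed.

    prior    -- {class_state: P(class_state)}
    cpts     -- {factor: {(class_state, parent_state_or_None): {state: prob}}}
    parents  -- {factor: parent factor name, or None if the class is the only parent}
    evidence -- {factor: observed state} covering all factors
    """
    scores = {}
    for class_state, p_class in prior.items():
        score = p_class
        for factor, state in evidence.items():
            parent = parents[factor]
            parent_state = evidence[parent] if parent is not None else None
            score *= cpts[factor][(class_state, parent_state)][state]
        scores[class_state] = score
    z = sum(scores.values())  # normalizing constant P(evidence)
    return {class_state: s / z for class_state, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    prior = {"collision": 0.3, "other": 0.7}
    parents = {"ship_type": None, "severity": "ship_type"}
    cpts = {
        "ship_type": {
            ("collision", None): {"bulk": 0.6, "tanker": 0.4},
            ("other", None): {"bulk": 0.2, "tanker": 0.8},
        },
        "severity": {
            ("collision", "bulk"): {"serious": 0.5, "minor": 0.5},
            ("collision", "tanker"): {"serious": 0.25, "minor": 0.75},
            ("other", "bulk"): {"serious": 0.25, "minor": 0.75},
            ("other", "tanker"): {"serious": 0.1, "minor": 0.9},
        },
    }
    evidence = {"ship_type": "bulk", "severity": "serious"}
    print(tan_posterior(prior, cpts, parents, evidence))
```

The predicted accident type for a case is the class state with the largest posterior, which can then be compared with the recorded accident type.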
4. Model Reasoning
The TAN model has the ability to reason bidirectionally and helps explain the most likely scenarios associated with a specific accident type. The data-driven TAN-based model examines the correlations between various influencing factors of maritime accidents and accident types, as well as accident consequences. This analysis enables the prediction of the likelihood of various accident scenarios and the extrapolation of accident consequences under specific conditions.
4.1. Accident Chain Forecast
After building the TAN model in the Netica software, we obtained the relationships between the influencing factors and the accident type and life loss contingency, together with the probability of each node. By adjusting the confidence bar of a single node or of multiple nodes, we observed how the probability of the target node changed, and from this judged the likely development and consequences of the accident.
We first simulated the parameters of the conditions for maritime accidents by changing a single node and observing the target node's changes. When changing the ship type, more significant changes occur in the probability of each accident type. For example, when the ship type is set to a chemical ship, the probability of fire and explosion accident type increases significantly. When the ship type is a bulk carrier, the probability of collision accident type increases notably. The study shows that different ship types can lead to significant differences in the occurrence of accident types. Additionally, the ship's gross tonnage and the accident's quarter and time also impact the accident type.
Since the accident type is determined jointly by several nodes, the influence of a single node gives only a partial picture. Therefore, we set the accident quarter to the first quarter (with the variable node's confidence bar set to 100%), the ship type to general cargo ship, and the gross tonnage to 1-18,500 t, as shown in Figure 4. The resulting changes in the accident type and accident severity node probabilities from early morning to evening are shown in Figure 5. As seen in Figure 5, throughout the day the probability of fire and explosion on ships is low, except in the afternoon period, when it reaches 18.2%; the probabilities of ship capsizing, machinery damage, and poor communication are also low, below 10%. By contrast, the probabilities of ship collision and of grounding and stranding accidents are significantly higher, at around 20%. The probability of grounding and stranding accidents is also significantly higher at dawn and in the evening than in other periods.
In summary, the highest probabilities of collision and grounding occur at dawn, with the accident severity being particularly serious. Collision is most likely to occur at noon, with high accident severity. Particularly severe collision and grounding accidents are more likely to occur in the evening. It is worth mentioning that although the probabilities of collision and of grounding and stranding are higher in this scenario, the probability of life loss is relatively low, and the accident consequences are less affected by the time of the accident.
4.2. Accident Cause Analysis
The bidirectional reasoning of the TAN model comprises causal reasoning and diagnostic reasoning. Causal reasoning can be applied to accident chain prediction, while diagnostic reasoning can be applied to accident cause analysis. Diagnostic reasoning fixes the type, consequence, and severity of an accident and infers the most likely states of the influencing factors, giving a more intuitive understanding of the causes and mechanisms of offshore accidents.
Figure 6 shows the scenario in which the accident type is set to collision, the accident severity to general accident, and the consequence to no loss of life. In this scenario, the node probabilities for container ships, general cargo ships, and chemical ships are significantly higher than those for other ship types, and the probability of collision accidents for these three ship types is higher than for other ship types. Additionally, the accident time and ship tonnage nodes show that ships with a tonnage of 1-18,500 t are more prone to collisions during the dawn hours. Furthermore, accidents involving container ships, general cargo ships, and chemical ships of this tonnage have a lower probability of causing loss of life. This could be because it is easier for personnel to escape from small ships in distress, or because the rescue of small ships in distress is relatively easy and more successful.
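Diagnostic reasoning of this kind amounts to conditioning on the accident type (and any other fixed nodes) and normalizing over the remaining variables. The sketch below does this by brute-force enumeration over a very small TAN. The node names, states, and all probabilities are hypothetical illustration values, and enumeration is used only because the example is tiny.

```python
from itertools import product


def joint(assignment, prior, cpts, parents, class_name):
    """Joint probability of a full assignment under the TAN factorization."""
    class_state = assignment[class_name]
    p = prior[class_state]
    for factor, parent in parents.items():
        parent_state = assignment[parent] if parent is not None else None
        p *= cpts[factor][(class_state, parent_state)][assignment[factor]]
    return p


def query(target, evidence, domains, prior, cpts, parents, class_name="accident_type"):
    """P(target | evidence), summing the joint over all consistent assignments."""
    names = list(domains)
    scores = dict.fromkeys(domains[target], 0.0)
    for combo in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, combo))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[target]] += joint(assignment, prior, cpts, parents, class_name)
    z = sum(scores.values())
    return {state: score / z for state, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    domains = {
        "accident_type": ["collision", "other"],
        "ship_type": ["bulk", "tanker"],
        "severity": ["serious", "minor"],
    }
    prior = {"collision": 0.3, "other": 0.7}
    parents = {"ship_type": None, "severity": "ship_type"}
    cpts = {
        "ship_type": {
            ("collision", None): {"bulk": 0.6, "tanker": 0.4},
            ("other", None): {"bulk": 0.2, "tanker": 0.8},
        },
        "severity": {
            ("collision", "bulk"): {"serious": 0.5, "minor": 0.5},
            ("collision", "tanker"): {"serious": 0.25, "minor": 0.75},
            ("other", "bulk"): {"serious": 0.25, "minor": 0.75},
            ("other", "tanker"): {"serious": 0.1, "minor": 0.9},
        },
    }
    evidence = {"accident_type": "collision", "severity": "serious"}
    print(query("ship_type", evidence, domains, prior, cpts, parents))
```

The same `query` function also covers the causal direction: fixing the influencing factors as evidence and asking for the accident type gives the forecast used for accident chain prediction.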
5. Conclusions
Ship safety has always been a major concern in the maritime transportation industry. In this paper, a TAN model for maritime traffic accident risk assessment is constructed to analyze the relationship between the consequences of maritime accidents and various influencing factors, and to use model simulation to analyze how different risk factors affect different types of maritime accidents.
The TAN model was constructed from a total of 5,660 maritime accident records from 2005 to 2020. Apart from the "other" accident types, the accident type with the highest probability of occurrence among maritime traffic accidents is collision, followed by grounding and stranding, and then fire and explosion.
The sensitivity analysis and validation of the constructed model showed that accident consequence and accident severity are the two most informative factors in determining the type of accident, followed by ship type, gross tonnage of the ship, time period, and quarter. The constructed model can effectively predict the likelihood of various accident scenarios and project accident consequences under specific conditions.
According to the causal reasoning analysis of the TAN model, under the condition of "first quarter, general cargo ship, and ship's gross tonnage of 1-18,500 t," the probability of ship collision and grounding and stranding accidents is higher, while the probability of life loss is relatively low, and the consequences of the accident are less affected by the time of the accident. According to the analysis of model diagnostic reasoning, in the general collision accident chain without loss of life, container ships, general cargo ships, and chemical ships are the main types of ships involved in such accidents. Ships with a tonnage of 1-18,500 t are more likely to have such accidents in the dawn hours, but the probability of causing loss of life is lower.