The proposed ontology-driven process mining approach consists of two main contributions: the DIAG meta-model (Section 3.1) and the DIAG algorithm (Section 3.2).
Together, these contributions provide the cognitive ability for process mining analytics. Such an approach could outline a potential progression from discovery to conformance checking and, eventually, the enhancement of business processes.
3.1. DIAG meta-model
The DIAG meta-model is shown in Figure 2. This conceptual model makes explicit use of a domain-specific ontology to define the process domain and promote interoperability. However, we go beyond a purely conceptual model: the DIAG meta-model is implemented in the R.IO-DIAG tool.
This meta-model is made up of eight packages, each of which contains a number of classes for tracking patients’ activities using location information. For the initial experimentation, we have developed the following packages: Healthcare Resources, Organization, Objectives, Location Event Logs, Processes, Functions, Healthcare Functions, and Context.
Inside the Context package, the Potential Assignable Causes (PAC) class has four inherited sub-classes: environmental causes, equipment-related causes, human-related causes, and rules and procedures. This class and its sub-classes serve as resources for explainability and for the reasoning rules used to diagnose process drifts.
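As a minimal sketch of how the PAC class and its four sub-classes could be represented in code, the snippet below uses an enumeration for the categories and a small record for a single cause. The names `PACCategory`, `PotentialAssignableCause`, and the `description` field are assumptions made for illustration; they are not taken from R.IO-DIAG.

```python
from dataclasses import dataclass
from enum import Enum


class PACCategory(Enum):
    """The four PAC sub-classes named in the text."""
    ENVIRONMENTAL = "environmental cause"
    EQUIPMENT = "equipment-related cause"
    HUMAN = "human-related cause"
    RULES_AND_PROCEDURES = "rules and procedures"


@dataclass(frozen=True)
class PotentialAssignableCause:
    """A cause that a domain expert attaches to a possible deviation."""
    category: PACCategory
    description: str  # free-text note from the expert (hypothetical field)


# Hypothetical entry, for illustration only.
pac = PotentialAssignableCause(PACCategory.EQUIPMENT, "location tag battery depleted")
print(pac.category.value)
```

Because the categories live in one enumeration, adding a fifth category later would only require one more member, which matches the remark that the classification is not limited to four classes.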
These deviations are modeled as a subset of the fact class, which captures unexpected events in patients’ pathways. The context package primarily includes the domain knowledge that healthcare experts can provide before the analysis phase and the application of process mining methods. While we have outlined four categories of potential assignable causes, this classification is not limited to four classes. As experiments become more complex, it may be necessary to add other categories or to identify non-linear relationships.
Additionally, we have modeled the objectives package, which includes the classes needed to evaluate the quality of business processes and patients’ pathways according to the detected drifts. In this package, we have the objective class, which can be realized by the process class. Objectives define the CTQ (Critical To Quality) characteristics, which in turn define the KPIs (key performance indicators) used to evaluate the quality of a process. This evaluation is based on specification/target values that are set by the performance objectives of an organization.
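To make the relationship between a KPI and its specification values concrete, the following sketch checks an observed value against a lower and an upper specification limit. It is only an illustration under stated assumptions: the class name `KPI`, its fields, and the waiting-time figures are hypothetical and do not come from the meta-model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KPI:
    """A key performance indicator with its specification limits."""
    name: str
    lower_spec: float  # lowest acceptable value (assumed field)
    upper_spec: float  # highest acceptable value (assumed field)

    def meets_target(self, observed: float) -> bool:
        """True when the observed value lies within the specification."""
        return self.lower_spec <= observed <= self.upper_spec


# Hypothetical KPI: waiting time should stay between 0 and 30 minutes.
waiting_time = KPI("waiting time (min)", 0.0, 30.0)
print(waiting_time.meets_target(42.0))
```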
The organization class is defined within the organization package, which includes other important information that should be modeled before launching process mining analyses. In this package, we identify the resources and components that help to run business processes, according to the capacity and competence of the organization.
Furthermore, to better assess the quality of processes, we defined the function package. Within it, the function class is the parent of the healthcare functions. This entity contains a value class with three sub-classes that identify the value of executing a certain function (i.e., value-added, non-value added, business value-added). This helps to understand whether, how, and to what extent a deviation or an activity would impact a process.
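The three value sub-classes can be sketched as an enumeration, with a lookup that tells whether a deviation touches value-added work. The function names in the mapping and the helper `affects_value` are hypothetical and serve only to illustrate the idea.

```python
from enum import Enum


class Value(Enum):
    """The three value sub-classes named in the text."""
    VALUE_ADDED = "value-added"
    NON_VALUE_ADDED = "non-value added"
    BUSINESS_VALUE_ADDED = "business value-added"


# Hypothetical mapping from healthcare functions to value categories.
function_value = {
    "consultation": Value.VALUE_ADDED,
    "waiting": Value.NON_VALUE_ADDED,
    "registration": Value.BUSINESS_VALUE_ADDED,
}


def affects_value(function_name: str) -> bool:
    """True when a deviation on this function touches value-added work."""
    return function_value.get(function_name) is Value.VALUE_ADDED


print(affects_value("consultation"), affects_value("waiting"))
```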
Moreover, we have described the model for integrating location data within the location event-logs package and presented how such information could be defined as resources of an organization for executing business processes.
These modeled domain specifications help to augment the capability of process mining analyses to be more cognitive and capable of diagnosing drifts and issues and assessing the quality of a business process.
As mentioned earlier, to assess the effectiveness of this meta-model, we used it as the underlying architecture of a cognitive process mining tool named R.IO-DIAG [8]. This open-source tool incorporates the meta-model as its core semantic engine to capture the important relationships in an experimental setting.
This meta-model is realized by constructing a knowledge graph, which makes it easier to update and extend the knowledge base. The significance of this meta-model lies in its ability to provide the necessary semantics for conducting process-oriented analyses and diagnoses of business processes. Without such a semantic foundation, it would not be feasible to diagnose unanticipated events in patients’ pathways automatically.
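One simple way to picture such a knowledge graph is as a set of (subject, predicate, object) triples. The sketch below is an assumption about a possible representation, not the storage used by R.IO-DIAG; the class `KnowledgeGraph`, the predicate names, and the identifiers are all hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)  # subject -> {(predicate, object)}

    def add(self, subject, predicate, obj):
        self._edges[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked to `subject` through `predicate`, sorted."""
        return sorted(o for p, o in self._edges.get(subject, ()) if p == predicate)


kg = KnowledgeGraph()
kg.add("patient_1", "has_resource", "location_tag_1")
kg.add("patient_1", "runs", "process_1")
# Extending the knowledge base later is just one more triple.
kg.add("patient_1", "has_resource", "location_tag_2")

print(kg.objects("patient_1", "has_resource"))
```

The point of the design is that new knowledge is added without changing any schema, which is the property the text attributes to the knowledge graph.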
Figure 3 and Figure 4 provide screenshots of how domain knowledge is modeled in the tool. An online demonstration video is also available; it shows in more detail how the meta-model provides interoperability, how domain knowledge is incorporated into process mining analyses, and how the meta-model is implemented within the tool’s architecture.
Figure 3 and Figure 4 serve as illustrative examples of how the conceptual DIAG meta-model is used in practice to provide cognition to process mining analyses. For instance, these figures show how location tags are assigned as a resource to each patient to run a process, and how functions are modeled. This information is used to enrich conventional process discovery event logs. In the following, we illustrate the DIAG algorithm’s operation in light of this enrichment.
3.2. DIAG algorithm
As outlined in Section 2.3 and shown in Figure 1, before diagnosing a cause, it is necessary to detect any drifts or deviations in the patients’ pathways. To achieve this, the DIAG algorithm builds on the stable heuristic miner algorithm [25] and extends it to take the domain knowledge into account while discovering process models. Once this step is completed, the DIAG algorithm matches the discovered drifts with the PACs modeled by the domain experts in R.IO-DIAG. The steps of this method are presented in Algorithm 1. A running example in the following subsection illustrates our approach and each step of the algorithm.
3.2.1. An illustrative example
Thanks to the DIAG meta-model, potential assignable causes (PACs) of organizational actions can be identified and their repercussions recorded. A knowledge graph incorporates this information, with vertices representing the activities and their PACs. An illustration of this domain knowledge can be seen in Table 2. This method enables a better comprehension of the connections between the various process entities and of how PACs may affect healthcare operations.
Now, let’s assume during a data-gathering procedure, an event log like the one below is collected as L:
As shown in the first two lines of Algorithm 1, both the domain knowledge and the event log are received as inputs. This differs from conventional process discovery methods, which take only the event log. The activities are extracted from the event log, and a data frame of the domain knowledge is built and merged into the raw event log for further analysis.
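The input step can be sketched in plain Python as follows. This is a hedged illustration, not the R.IO-DIAG code: the event log is assumed to be a list of traces (each a list of activity labels), the domain knowledge is assumed to be a list of rows with the `activity`, `deviation`, and `PAC` columns used in Algorithm 1, and the helper names `extract_activities` and `enrich` are hypothetical.

```python
def extract_activities(event_log):
    """Distinct activities, in order of first appearance in the log."""
    seen = {}
    for trace in event_log:
        for activity in trace:
            seen.setdefault(activity, None)
    return list(seen)


def enrich(event_log, domain_knowledge):
    """Attach the known (deviation, PAC) pairs to every activity in the log."""
    by_activity = {}
    for row in domain_knowledge:
        pair = (row["deviation"], row["PAC"])
        by_activity.setdefault(row["activity"], []).append(pair)
    return {a: by_activity.get(a, []) for a in extract_activities(event_log)}


# Hypothetical two-trace log and a single expert-provided row.
event_log = [["a", "b", "c"], ["a", "b", "k"]]
domain_knowledge = [{"activity": "b", "deviation": "k", "PAC": "environmental cause"}]

enriched = enrich(event_log, domain_knowledge)
print(enriched["b"])
```

Activities without any expert-provided row keep an empty list, so the enrichment never drops events from the raw log.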
Executing the stable heuristic miner yields two thresholds, UCL (Upper Control Limit) and LCL (Lower Control Limit). Then, as shown in lines 10 to 19, the lists unstable_activities, deviating_activities, and stable_activities are populated.
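The classification loop in lines 10 to 19 can be sketched as below. Several points are assumptions: each activity is represented by a numeric score that is compared with the limits, the first comparison in Algorithm 1 is read as "below the lower control limit" and the second as "above the upper control limit", and the limits are passed in as plain numbers `lcl` and `ucl` rather than computed by the stable heuristic miner. The scores and limits in the usage lines are made up.

```python
def classify(scores, lcl, ucl):
    """Split activities into deviating, unstable, and stable groups."""
    groups = {"deviating": [], "unstable": [], "stable": []}
    for activity, score in scores.items():
        if score < lcl:
            groups["deviating"].append(activity)
        elif score > ucl:
            groups["unstable"].append(activity)
        else:
            groups["stable"].append(activity)
    return groups


# Hypothetical scores and control limits, for illustration only.
scores = {"a": 0.50, "b": 0.55, "c": 0.95, "k": 0.05}
groups = classify(scores, lcl=0.20, ucl=0.80)

# Deviating and unstable activities together form the deviating behaviors.
deviating_behaviors = groups["deviating"] + groups["unstable"]
print(groups, deviating_behaviors)
```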
To diagnose the causes of drifts, a matrix is generated alongside the data, and the algorithm matches the domain knowledge with the data frame of deviating behaviors. This matrix is used to extract the edges (connections) among activities and helps to detect the causes of deviations. Finally, each type of extracted behavior is shown in a different color, which helps domain experts to distinguish between stable behaviors, drifts, and the corresponding causes of deviations. Consequently, Algorithm 1 produces the model depicted in Figure 5.
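The matching step can be read as a join that keeps every deviating edge and looks up its PAC in the domain knowledge. The sketch below illustrates that reading under assumptions: the helper name `diagnose` is hypothetical, an unmatched edge receives the value 0, and the sample rows reuse activity labels from the running example, with the edge `("d", "g")` added as a made-up unmatched case.

```python
def diagnose(deviating_edges, domain_knowledge):
    """Attach a PAC to every deviating edge; unmatched edges get 0."""
    known = {
        (row["activity"], row["deviation"]): row["PAC"]
        for row in domain_knowledge
    }
    return [
        {"from_activity": src, "to_activity": dst, "PAC": known.get((src, dst), 0)}
        for src, dst in deviating_edges
    ]


domain_knowledge = [
    {"activity": "b", "deviation": "k", "PAC": "environmental cause"},
    {"activity": "c", "deviation": "j", "PAC": "human-related cause"},
    {"activity": "c", "deviation": "h", "PAC": "rules and procedures"},
]
# ("d", "g") is a hypothetical deviation that no expert has described.
deviating_edges = [("b", "k"), ("c", "j"), ("c", "h"), ("d", "g")]

diagnosis_edges = diagnose(deviating_edges, domain_knowledge)
for edge in diagnosis_edges:
    print(edge)
```

Keeping the unmatched edges, rather than dropping them, is what lets the resulting model show where the domain knowledge is incomplete.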
The activities and edges shown in black express stable behaviors, that is, activities that are typically present in multiple iterations of the process and are considered major activities in the execution of patients’ pathways. The red color indicates activities and edges that exhibit higher variation than the normal, stable state of the entire data set. The dashed edges are drifting connections among activities. The activities in green are drifting activities.
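Building the node table that carries a color attribute can be sketched as follows. The color strings are the `attribute_color` values from lines 21 to 23 of Algorithm 1; how they map onto the rendered figure is left to the drawing step. The helper name `build_nodes` and the sample activity lists are hypothetical.

```python
def build_nodes(stable, deviating, unstable):
    """One record per activity, tagged with the color of its group."""
    nodes = []
    for group, color in ((stable, "white"), (deviating, "green"), (unstable, "red")):
        nodes.extend({"activity": a, "attribute_color": color} for a in group)
    return nodes


# Hypothetical groups, for illustration only.
all_nodes = build_nodes(stable=["a", "b"], deviating=["k"], unstable=["c"])
for node in all_nodes:
    print(node)
```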
Once the ensemble of behaviors has been discovered, we can enhance the model by incorporating information from the knowledge graph associated with each activity, healthcare function, and observed drift. For instance, the deviation between activity 'b' and activity 'k' corresponds to an environmental cause. The deviation between activity 'c' and activity 'j' is related to a human error. The edge between activity 'c' and activity 'h' is related to a change in rules and procedures. When an edge shows a value of 0, the modeled PACs did not match that deviation; in other words, the domain knowledge was insufficient. The next section introduces a case study in a hospital living lab, designed to assess the effectiveness and applicability of our proposal in practice.
Algorithm 1. DIAG algorithm
 1: input event log L;
 2: input DomainKnowledge;
 3: Identify activities;
 4: DomainKnowledge.df = data.frame(DomainKnowledge["activity"], DomainKnowledge["deviation"], DomainKnowledge["PAC"]);
 5: Execute stable heuristic miner;
 6: Detect UCL & LCL;
 7: unstable_activities = [ ];
 8: deviating_activities = [ ];
 9: stable_activities = [ ];
10: for i in activities do
11:     if i < LCL then
12:         deviating_activities = append(i, deviating_activities);
13:     else if i > UCL then
14:         unstable_activities = append(i, unstable_activities);
15:     else
16:         stable_activities = append(i, stable_activities);
17:     end if
18:     deviating_behaviors = merge(deviating_activities, unstable_activities);
19: end for
20: Comment: Verifying deviations with the domain knowledge
    … = as.matrix(merge(DomainKnowledge.df, deviating_behaviors, by.x = c("activity", "deviation"), by.y = c("from_activity", "to_activity"), all.y = TRUE));
21: stable_nodes = data.frame(stable_activities, attribute_color = "white");
22: deviating_nodes = data.frame(deviating_activities, attribute_color = "green");
23: unstable_nodes = data.frame(unstable_activities, attribute_color = "red");
24: all_nodes = combine(stable_nodes, deviating_nodes, unstable_nodes);
25: …
26: devise.graph(all_nodes, diagnosis.edges);