3.1. Imprecise Dirichlet Model
The Imprecise Dirichlet Model (IDM) is a Bayesian statistical approach for handling uncertainty. It extends the Dirichlet distribution [11] to estimate probability distributions when information is insufficient. Consider a multinomial distribution with N possible outcomes, whose Dirichlet prior probability density function (PDF) is:
$$f(\theta)=\frac{\Gamma\left(\sum_{n=1}^{N}\alpha_n\right)}{\prod_{n=1}^{N}\Gamma(\alpha_n)}\prod_{n=1}^{N}\theta_n^{\alpha_n-1} \tag{1}$$
In the formula, θ = (θ1, θ2, …, θN) represents the probabilities of the N outcomes, so that 0 ≤ θn ≤ 1 (n = 1, 2, …, N) and θ1 + θ2 + … + θN = 1; α1, α2, …, αN are the positive parameters of the Dirichlet distribution; and Γ(⋅) is the gamma function, which here supplies the normalizing constant of the density.
Once the sample observations M are obtained, the prior Dirichlet PDF is updated by Bayes' theorem. The update yields the posterior Dirichlet PDF, which re-evaluates the parameters in light of the observed data [12]. The posterior PDF is given in equation (2):
$$f(\theta\mid M)=\frac{\Gamma\left(\sum_{n=1}^{N}(\alpha_n+m_n)\right)}{\prod_{n=1}^{N}\Gamma(\alpha_n+m_n)}\prod_{n=1}^{N}\theta_n^{\alpha_n+m_n-1} \tag{2}$$
In the formula, M = {m1, m2, …, mN} is the set of sample observations, and mn is the number of occurrences of state n of the random variable. After obtaining the posterior PDF, the parameter θn is estimated by its posterior expected value, i.e.
$$E(\theta_n\mid M)=\frac{\alpha_n+m_n}{\sum_{j=1}^{N}\alpha_j+\sum_{j=1}^{N}m_j} \tag{3}$$
In the deterministic Dirichlet model, when no observations are available, the probability θn of the n-th outcome is determined by the parameters α alone, i.e. E(θn) = αn / (α1 + α2 + … + αN), where αn is called the prior weight of the n-th outcome. The sum of the prior weights is often written as the parameter s and is called the equivalent sample size of the Dirichlet distribution. In the probability estimation process, s represents the influence of the prior distribution on the posterior probability [13]; that is, the larger the value of s, the more observations are needed to move the estimate away from the prior [14]. As shown in equation (3), one disadvantage of the deterministic Dirichlet model is that when few observations are available, the estimate is significantly affected by the prior distribution. If the prior is not set reasonably, the estimates based on the deterministic Dirichlet model may become inaccurate, which in turn affects the final decision and prediction.
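The prior sensitivity described above can be illustrated with a minimal sketch. The function name `dirichlet_posterior_mean`, the three-state counts, and both priors are hypothetical values chosen for illustration; they are not taken from the paper. Both priors have the same equivalent sample size (s = 3) but distribute it differently.

```python
def dirichlet_posterior_mean(alpha, counts):
    """Posterior expected value of each theta_n under a single Dirichlet prior.

    alpha  : positive prior weights alpha_n (assumed values)
    counts : observed occurrences m_n of each state
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    total = sum(alpha) + sum(counts)
    return [(a + m) / total for a, m in zip(alpha, counts)]


# Only three observations: state 0 seen twice, state 1 once, state 2 never.
counts = [2, 1, 0]

uniform_prior = [1.0, 1.0, 1.0]  # s = 3, spread evenly (illustrative)
skewed_prior = [0.3, 0.3, 2.4]   # s = 3, concentrated on state 2 (illustrative)

est_uniform = dirichlet_posterior_mean(uniform_prior, counts)
est_skewed = dirichlet_posterior_mean(skewed_prior, counts)

# Index of the most probable state under each prior.
best_uniform = est_uniform.index(max(est_uniform))
best_skewed = est_skewed.index(max(est_skewed))
print(est_uniform, best_uniform)
print(est_skewed, best_skewed)
```

With so few observations, the most probable state is state 0 under the uniform prior but state 2 (never observed) under the skewed prior, which is the kind of prior-driven result the text warns about.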
To overcome the shortcomings of the deterministic Dirichlet model, the IDM uses a set of Dirichlet prior distributions instead of a single Dirichlet distribution. In the IDM, the corresponding prior PDF can be expressed as:
$$f(\theta)=\frac{\Gamma(s)}{\prod_{n=1}^{N}\Gamma(s\,r_n)}\prod_{n=1}^{N}\theta_n^{s\,r_n-1},\qquad \sum_{n=1}^{N}r_n=1 \tag{4}$$
In the formula, rn (n = 1, 2, …, N) is the n-th prior weight factor, and s·rn in equation (4) plays the same role as αn. As rn varies within the interval [0, 1], f(θ) covers all possible prior PDFs for a given predetermined s, which avoids the effect of an unreasonable prior choice.
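Then, following the Bayesian updating process, the posterior PDF of the IDM given the observations M can be expressed as:
$$f(\theta\mid M)=\frac{\Gamma\left(s+\sum_{n=1}^{N}m_n\right)}{\prod_{n=1}^{N}\Gamma(s\,r_n+m_n)}\prod_{n=1}^{N}\theta_n^{s\,r_n+m_n-1} \tag{5}$$
In the formula, $\sum_{n=1}^{N}m_n$ represents the total number of observations.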
Thus, the interval-valued probabilities of all outcomes in the IDM, $[\underline{\theta}_1,\overline{\theta}_1]$, $[\underline{\theta}_2,\overline{\theta}_2]$, …, $[\underline{\theta}_N,\overline{\theta}_N]$, can be estimated from the posterior PDF by calculating the expected value, as follows:
$$[\underline{\theta}_n,\ \overline{\theta}_n]=\left[\frac{m_n}{s+\sum_{j=1}^{N}m_j},\ \frac{m_n+s}{s+\sum_{j=1}^{N}m_j}\right] \tag{6}$$
The bounds in equation (6) are obtained at the limits of rn, namely 0 and 1 [15]. Thus, the imprecise probability of each state of the random variable can be estimated from small-sample data. The IDM removes the adverse effect of an unreasonable prior setting on event probability estimation when the sample size is small.
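As a minimal sketch of the interval estimate, the snippet below computes the standard IDM bounds, lower = mn / (m + s) and upper = (mn + s) / (m + s), where m is the total number of observations. The function name `idm_intervals`, the counts, and the choice s = 1 are illustrative assumptions, not values from the paper.

```python
def idm_intervals(counts, s=1.0):
    """Lower and upper posterior expectations of each theta_n under the IDM.

    counts : observed occurrences m_n of each state
    s      : equivalent sample size (s = 1 is an illustrative assumption)
    """
    if s <= 0:
        raise ValueError("s must be positive")
    if any(m < 0 for m in counts):
        raise ValueError("counts must be non-negative")
    denom = sum(counts) + s
    # Lower bound corresponds to r_n -> 0, upper bound to r_n -> 1.
    return [(m / denom, (m + s) / denom) for m in counts]


small_sample = idm_intervals([2, 1, 0], s=1.0)
large_sample = idm_intervals([200, 100, 0], s=1.0)

for low, high in small_sample:
    print(f"[{low:.3f}, {high:.3f}]")
# [0.500, 0.750]
# [0.250, 0.500]
# [0.000, 0.250]
```

Every interval has width s / (m + s), so the intervals narrow as observations accumulate: in `large_sample` the width drops to 1/301.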
3.2. Naive Credal Classifier
The Naive Credal Classifier (NCC) is a classifier based on Naive Bayes (NB) that improves robustness by introducing imprecise probabilities. Its core idea is to model prior uncertainty with a set of prior probabilities, so that classification stays reliable on incomplete or small data sets: for an uncertain instance, the NCC can return several possible categories. Bayesian learning updates the prior with the likelihood, which represents the evidence in the data, to obtain a posterior probability that can be used for decision making [16]. Formally, a classifier is a function that maps instances of a set of variables (called attributes or features) to a state of the class variable. The credal network uses the classical Bayesian network inference method to evaluate the state xc of the class variable Xc: after observing the specific value xE of the evidence variables XE, the probability P(xc|xE) can be calculated as follows:
$$P(x_c\mid x_E)=\frac{P(x_c,x_E)}{P(x_E)} \tag{7}$$
$$P(x_c,x_E)=\sum_{X_M}\prod_{i=1}^{I}P(x_i\mid\pi_i),\qquad P(x_E)=\sum_{X_M,X_c}\prod_{i=1}^{I}P(x_i\mid\pi_i) \tag{8}$$
In the formula, I is the number of multi-state random variables in the Bayesian network; P(xi|πi) is the conditional probability mass function; xi is the observed value of the i-th random variable Xi, with Xi ∈ X, where X denotes all random variables in the network; πi is an observed value of Πi, the parent nodes of Xi; XM = X \ (XE ∪ Xc); and $\sum_{X_M}$ denotes marginalization (the total probability operation) over the states of the variables in XM.
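The inference step can be sketched by brute-force enumeration on a very small network. The chain Xc → XM → XE, its state names, and all probability values below are hypothetical and serve only to show how the product of conditional probabilities is summed over the unobserved variable and then normalized.

```python
# Hypothetical chain network Xc -> XM -> XE with binary states (assumed values).
p_c = {"c0": 0.6, "c1": 0.4}
p_m_given_c = {
    "c0": {"m0": 0.7, "m1": 0.3},
    "c1": {"m0": 0.2, "m1": 0.8},
}
p_e_given_m = {
    "m0": {"e0": 0.9, "e1": 0.1},
    "m1": {"e0": 0.4, "e1": 0.6},
}


def joint(c, m, e):
    """Joint probability as the product of each node's conditional probability."""
    return p_c[c] * p_m_given_c[c][m] * p_e_given_m[m][e]


def posterior_class(e):
    """P(xc | xE = e): sum out XM, then normalize over the class states."""
    unnormalized = {
        c: sum(joint(c, m, e) for m in ("m0", "m1")) for c in p_c
    }
    evidence = sum(unnormalized.values())  # P(xE = e)
    return {c: value / evidence for c, value in unnormalized.items()}


posterior = posterior_class("e1")
print(posterior)  # c0 is about 0.4286 (3/7), c1 is about 0.5714 (4/7)
```

Enumeration is exponential in the number of unobserved variables, so it is only practical for small networks like this one.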
Bayesian classifiers classify by comparing the computed posterior probabilities; the category with the largest posterior probability is the classification result. However, when the number of samples is insufficient, a Bayesian classifier may return biased, prior-dependent results: depending on the prior employed, it may identify different classes as the most likely. Because any single choice of prior is to some degree arbitrary, such classifications are highly uncertain [17]. The credal network classifier relaxes the classification results of Bayesian classifiers by accepting imprecise probabilistic representations [18]. In a Bayesian classifier, each category of the class variable has a single-valued probability. In contrast, in a credal network classifier, the probability of each class is expressed as an interval-valued probability, that is, an imprecise probability.
To deal with the uncertainty of node random variables, the concept of the Credal Set (CS) is introduced into the credal network [19]. The credal set describes the imprecise probabilistic properties of a node random variable. Mathematically, the credal set K(Xi) is defined as a closed convex set that covers all possible probability mass functions P(Xi) of the random variable Xi. It is defined as follows:
$$K(X_i)=\mathrm{CH}\left\{P(X_i):\ \sum_{x_i\in\Omega_{X_i}}P(x_i)=1\right\} \tag{9}$$
In the formula, K(Xi) represents the closed convex set consisting of all possible probability mass functions P(Xi) of the random variable Xi; CH denotes the convex hull; the constraint $\sum_{x_i\in\Omega_{X_i}}P(x_i)=1$ means that the probabilities of all states must sum to 1; and $\Omega_{X_i}$ is the set of possible values of Xi.
As shown in equation (9), there may be many combinations of prior distribution and observed data, so the credal set contains infinitely many probability mass functions. However, it contains only a finite number of extreme mass functions, which are called the vertices of the credal set and denoted ext[K(Xi)]. These extreme mass functions are the vertices of the convex hull, and they can be obtained by combining the endpoints of the probability intervals. Classification with a credal network classifier consists of computing the lower and upper bounds of the conditional probability of Xc = xc given XE = xE. This can be done by computing on the Bayesian networks that correspond to the extreme joint mass functions, as follows:
$$\underline{P}(x_c\mid x_E)=\min_{P(X)\in\mathrm{ext}[K(X)]}\frac{\sum_{X_M}P(X)}{\sum_{X_M,X_c}P(X)},\qquad \overline{P}(x_c\mid x_E)=\max_{P(X)\in\mathrm{ext}[K(X)]}\frac{\sum_{X_M}P(X)}{\sum_{X_M,X_c}P(X)} \tag{10}$$
In the formula, P(X) represents the joint probability mass function of all random variables; K(X) is the convex hull of a set of joint mass functions, i.e. the credal set; ext[K(X)] denotes the set of extreme joint mass functions of K(X); and P(X) ∈ ext[K(X)] means that P(X) is selected from ext[K(X)].
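The statement that the vertices can be obtained by combining interval endpoints can be sketched for a single variable. The snippet assumes the credal set is given by per-state probability intervals plus the sum-to-one constraint; under that assumption each vertex has at most one state strictly inside its interval, so it suffices to fix all but one state at an endpoint and solve for the remaining one. The function name `credal_vertices` and the example intervals are hypothetical.

```python
from itertools import product


def credal_vertices(lower, upper, tol=1e-9):
    """Enumerate the extreme mass functions of
    {p : lower[i] <= p[i] <= upper[i] for all i, sum(p) == 1}.
    """
    n = len(lower)
    vertices = []
    for free in range(n):
        others = [i for i in range(n) if i != free]
        # Put every other state at its lower (0) or upper (1) endpoint.
        for choice in product((0, 1), repeat=n - 1):
            p = [0.0] * n
            for i, pick in zip(others, choice):
                p[i] = upper[i] if pick else lower[i]
            # The free state takes whatever mass is left.
            p[free] = 1.0 - sum(p[i] for i in others)
            if lower[free] - tol <= p[free] <= upper[free] + tol:
                key = tuple(round(x, 9) for x in p)
                if key not in vertices:  # the same vertex can be reached twice
                    vertices.append(key)
    return vertices


# Illustrative intervals: the IDM bounds for counts (2, 1, 0) with s = 1.
lower = [0.50, 0.25, 0.00]
upper = [0.75, 0.50, 0.25]
vertices = credal_vertices(lower, upper)
print(vertices)
```

For these intervals the enumeration finds three vertices, (0.75, 0.25, 0), (0.5, 0.5, 0) and (0.5, 0.25, 0.25). The number of candidates grows as N·2^(N−1), so this brute-force approach suits only variables with few states.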
In this paper, the IDM is used to model the prior. Because the IDM returns imprecise probabilities, it can be integrated directly into the credal network classifier, which combines the IDM with the credal classifier.
3.3. Naive Credal Classifier Classification Control Standards
Bayesian classifiers determine the category of a sample by maximizing the posterior probability. Using a Bayesian network, the classifier applies Bayes' theorem to calculate the probability of each category given the known input evidence x. It then compares these probabilities and selects the category with the highest posterior probability as its classification decision. As shown in Figure 1, the values P(c1|x), P(c2|x), …, P(c5|x) on the axis are calculated by a Bayesian classifier, and the classification result is c1 because P(c1|x) is the greatest posterior probability.
Figure 2 and Figure 3 illustrate the diagnostic logic and output of the credal network classifier. As shown in Figure 2, after computation, the Naive Credal Classifier identifies category C1 as having a lower bound of the posterior imprecise probability that is higher than the upper bounds of all other categories. Consequently, C1 is designated as the sole diagnostic result under evidence X. However, in Figure 3, the lower bound of the posterior imprecise probability of C1 is lower than the upper bound of that of C2, so the two probability intervals overlap, as shown by the shaded area in the figure. In this case, the Naive Credal Classifier cannot determine a single classification result and instead returns a set of possible categories {C1, C2}, indicating that the sample may belong to C1 or C2 given the evidence.
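The decision logic of the two figures can be sketched as an interval-dominance check: a category is discarded only if some other category's lower bound exceeds its upper bound. This is one reading of the figures, not necessarily the exact rule used in the paper, and the function name `credal_classify` and the interval values are hypothetical.

```python
def credal_classify(intervals):
    """Return the categories that are not interval-dominated.

    intervals : dict mapping category name -> (lower, upper) posterior bounds
    """
    undominated = []
    for name, (_, upper) in intervals.items():
        dominated = any(
            other_lower > upper
            for other, (other_lower, _) in intervals.items()
            if other != name
        )
        if not dominated:
            undominated.append(name)
    return undominated


# Separated intervals, as in the Figure 2 situation (illustrative numbers).
separated = {"C1": (0.55, 0.70), "C2": (0.20, 0.35), "C3": (0.05, 0.15)}
# Overlapping intervals, as in the Figure 3 situation (illustrative numbers).
overlapping = {"C1": (0.40, 0.60), "C2": (0.30, 0.50), "C3": (0.05, 0.15)}

print(credal_classify(separated))    # ['C1']
print(credal_classify(overlapping))  # ['C1', 'C2']
```

When exactly one category survives, the output coincides with a single-class decision; otherwise the returned set makes the remaining ambiguity explicit.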
As can be seen, compared with Bayesian classifiers, the credal network classifier allows a wider probability margin when diagnosing the category of a sample. When it returns a single category, the credal network classifier delivers higher judgment reliability [19]. When the posterior imprecise probability intervals of the most probable categories overlap, it returns a set containing several possible categories, which reduces the risk of misdiagnosis.