We start with the definition of a logical distance based on the uninorm and the absorbing norm and then present the suggested algorithm. In the last subsection, we present numerical simulations and compare the suggested algorithm with known methods.
3.1. Logical Distance Based on the Uninorm and Absorbing Norm
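Let two truth values from the interval [0, 1] be given. For a given neutral element of the uninorm and a given absorbing element of the absorbing norm, the fuzzy logical dissimilarity of these values is defined as follows:

Here, the fuzzy logical similarity of the two values enters the definition.
If the neutral and absorbing elements are equal, the notation of both functions is shortened accordingly.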
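The text reproduced here does not give concrete formulas for the uninorm and the absorbing norm. As an illustration only, the Python sketch below assumes one well-known choice of uninorm, the representable (generator-based) uninorm built from the logit function, and constructs an operation with an absorbing element by multiplying generator values. Both forms, and the names `uninorm` and `absorbing_norm`, are assumptions for illustration; they are not necessarily the operators used by the suggested method.

```python
import math

# Assumption: illustrative operators defined on the open interval (0, 1);
# they are not necessarily the operators used in the suggested method.


def _logit(x: float) -> float:
    return math.log(x / (1.0 - x))


def _sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def uninorm(x: float, y: float, theta: float) -> float:
    """Representable uninorm with neutral element theta:
    g^{-1}(g(x) + g(y)) with generator g(t) = logit(t) - logit(theta)."""
    return _sigmoid(_logit(x) + _logit(y) - _logit(theta))


def absorbing_norm(x: float, y: float, vartheta: float) -> float:
    """Commutative, associative operation with absorbing element vartheta:
    h^{-1}(h(x) * h(y)) with h(t) = logit(t) - logit(vartheta), so h(vartheta) = 0."""

    def h(t: float) -> float:
        return _logit(t) - _logit(vartheta)

    return _sigmoid(h(x) * h(y) + _logit(vartheta))
```

For example, `uninorm(0.3, 0.5, 0.5)` returns approximately 0.3, because 0.5 is the neutral element, and `absorbing_norm(0.9, 0.4, 0.4)` returns approximately 0.4, because 0.4 is the absorbing element.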
Lemma 1.
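If the neutral and absorbing elements are equal, then the fuzzy logical dissimilarity is a semi-metric in the algebra on [0, 1] formed by the uninorm and the absorbing norm.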
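Proof. To prove the lemma, we need to check the following three properties:
- a. The dissimilarity of any value with itself is zero.
By the commutativity and identity properties of the uninorm, combined with the property of the absorbing element, the dissimilarity of a value with itself vanishes.
- b. The dissimilarity of two distinct values is positive.
If the two values differ, the property is verified separately for each of the two possible cases.
- c. The dissimilarity is symmetric.
It follows directly from the commutativity of the absorbing norm.
The lemma is proven. □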
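Properties a to c can be sanity-checked numerically for any candidate one-dimensional dissimilarity. The sketch below (the name `check_semi_metric` is hypothetical) evaluates the three properties on a finite set of truth values; a passing check is evidence, not a proof.

```python
from typing import Callable, Optional, Sequence


def check_semi_metric(d: Callable[[float, float], float],
                      points: Optional[Sequence[float]] = None,
                      tol: float = 1e-9) -> dict:
    """Check properties a (d(x, x) = 0), b (d(x, y) > 0 for x != y) and
    c (d(x, y) = d(y, x)) of a semi-metric on a finite set of truth values."""
    if points is None:
        points = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
    a = all(abs(d(x, x)) <= tol for x in points)
    b = all(d(x, y) > tol for x in points for y in points if x != y)
    c = all(abs(d(x, y) - d(y, x)) <= tol for x in points for y in points)
    return {"a": a, "b": b, "c": c}
```

For example, the squared difference `lambda x, y: (x - y) ** 2` passes all three checks although it violates the triangle inequality, which illustrates that a semi-metric need not be a metric. For functions undefined at 0 or 1, pass `points` strictly inside the interval.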
Using the dissimilarity function, we define the fuzzy logical distance between two values as follows:
Lemma 2.
If the neutral and absorbing elements are equal, then the fuzzy logical distance is a semi-metric in the algebra of real numbers on the interval [0, 1].
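Proof. The distance is defined through the dissimilarity for any pair of values and any admissible parameters; hence, by properties a and b of the semi-metric (Lemma 1), the following holds:
- a. If the two values are equal, their dissimilarity is zero, and so is their distance.
- b. If the two values differ, their dissimilarity is positive, and so is their distance.
- c. Symmetry of the distance follows directly from the symmetry of the dissimilarity. □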
An example of the fuzzy logical distance between values in [0, 1] for a particular choice of the parameters is shown in Figure 1.a. For comparison, Figure 1.b shows the Euclidean distance between the same values.
The fuzzy logical distance separates close values better than the Euclidean distance and is less sensitive to differences between distant values.
Now let us extend the introduced fuzzy logical distance to multidimensional variables.
Let the data consist of vectors of the same dimension, so that each vector is a point in a multidimensional space.
The fuzzy logical dissimilarity of two such points is defined as follows:

Here, as above, the fuzzy logical similarity of the two points enters the definition.
Then, as in the one-dimensional case, the fuzzy logical distance between two points is defined as follows:
Lemma 3.
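If the neutral and absorbing elements are equal, then the multidimensional fuzzy logical dissimilarity is a semi-metric in the corresponding algebra on the unit hypercube.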
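Proof. This statement is a direct consequence of Lemma 1 and the properties of the uninorm.
- a. The dissimilarity of any point with itself is zero.
If the two points coincide, then property a of Lemma 1 holds in each dimension, which gives the claim.
- b. The dissimilarity of two distinct points is positive.
If the two points differ, the claim follows from Lemma 1 applied in each dimension.
- c. The dissimilarity is symmetric.
It follows directly from the symmetry of the dissimilarity in each dimension and the commutativity of the uninorm. □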
Lemma 4.
If the neutral and absorbing elements are equal, then the multidimensional fuzzy logical distance is a semi-metric on the unit hypercube.
Proof. The proof repeats that of Lemma 2. □
The suggested algorithm uses the multidimensional fuzzy logical distance as the distance measure between data instances.
3.2. The c-Means Algorithm with Fuzzy Logical Distance
The suggested algorithm treats the data instances as truth values and applies Algorithm 1 with the fuzzy logical distance to these values.
As above, assume that the raw data are represented by multidimensional vectors, each of which is a data instance. Since the fuzzy logical distance requires values from the unit hypercube, the data must be normalized, and the algorithm should be applied to the normalized data. After the cluster centers are found, the inverse normalization must be applied to them.
Normalization can be conducted by several methods; Appendix A presents a simple Algorithm 3 for normalization by linear transformation. The inverse transformation is provided by Algorithm 4, also presented in Appendix A.
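Appendix A is not reproduced here. As a minimal sketch, assuming that normalization by linear transformation means per-dimension min-max scaling into [0, 1], the forward and inverse transformations might look as follows; the names `normalize` and `denormalize` are hypothetical.

```python
def normalize(data):
    """Scale each dimension linearly into [0, 1] (assumed min-max form).

    Returns the scaled data together with the per-dimension offsets and
    spans needed for the inverse transformation.
    """
    dims = len(data[0])
    lo = [min(row[k] for row in data) for k in range(dims)]
    hi = [max(row[k] for row in data) for k in range(dims)]
    # A constant dimension gets span 1.0 to avoid division by zero.
    span = [h - l if h > l else 1.0 for l, h in zip(lo, hi)]
    scaled = [[(row[k] - lo[k]) / span[k] for k in range(dims)] for row in data]
    return scaled, lo, span


def denormalize(points, lo, span):
    """Inverse linear transformation, e.g. for the found cluster centers."""
    return [[p[k] * span[k] + lo[k] for k in range(len(p))] for p in points]
```

The offsets and spans returned by `normalize` must be kept, since the inverse transformation applied to the cluster centers needs them.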
In general, the suggested Algorithm 2 follows the Bezdek fuzzy c-means Algorithm 1 but differs in the distance function, in the initialization of the cluster centers, and, consequently, in the determination of the number of clusters.

The main difference between the suggested Algorithm 2 and both the original Bezdek fuzzy c-means Algorithm 1 [13,14] and the known Gustafson-Kessel [16] and Gath-Geva [17] algorithms is the use of the fuzzy logical distance. This distance requires normalization of the data and inverse normalization of the results.
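For reference, a minimal pure-Python sketch of Bezdek-style fuzzy c-means with a pluggable distance function follows. It is not Algorithm 2: the fuzzy logical distance is not implemented here, and the Euclidean default, the fuzzifier `m = 2`, the stopping rule on the maximal change of the membership degrees, and all names are assumptions. Passing a different `dist` is where a semi-metric such as the fuzzy logical distance would plug in.

```python
def euclidean(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def fuzzy_c_means(data, centers, dist=euclidean, m=2.0, eps=1e-6, max_iter=300):
    """Fuzzy c-means with a pluggable distance (illustrative sketch).

    data    -- list of points (lists of floats)
    centers -- initial cluster centers, e.g. an even grid
    dist    -- nonnegative dissimilarity between a point and a center
    Returns the final centers and the membership matrix u[i][j].
    """
    centers = [list(v) for v in centers]
    n_points, n_clusters, dims = len(data), len(centers), len(data[0])
    power = 2.0 / (m - 1.0)

    def memberships(cs):
        u = []
        for x in data:
            d = [dist(x, v) for v in cs]
            if min(d) == 0.0:
                # The point coincides with one or more centers: share membership among them.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                u.append([h / sum(hits) for h in hits])
            else:
                # Standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
                u.append([1.0 / sum((d[j] / d[k]) ** power for k in range(n_clusters))
                          for j in range(n_clusters)])
        return u

    u = memberships(centers)
    for _ in range(max_iter):
        u_old = u
        # Centers are weighted means of the data with weights u_ij^m.
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n_points)]
            total = sum(w)
            if total > 0.0:
                centers[j] = [sum(w[i] * data[i][k] for i in range(n_points)) / total
                              for k in range(dims)]
        u = memberships(centers)
        change = max(abs(u[i][j] - u_old[i][j])
                     for i in range(n_points) for j in range(n_clusters))
        if change < eps:
            break
    return centers, u
```

With the Euclidean distance, the membership update is the standard Bezdek update; with another dissimilarity, the same formula is simply applied to its values.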
The other difference is that the cluster centers must be initialized as a grid over the algorithm's domain. Such initialization is required because the algorithm converges quickly; an even distribution of the initial cluster centers prevents clusters from being missed. A simple Algorithm 5 for creating a grid in the unit square is outlined in Appendix A.
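A grid initialization in the spirit of Algorithm 5 can be sketched as follows. Placing the centers at cell midpoints rather than on the boundary, extending the grid from the unit square to the unit hypercube, and the name `grid_centers` are assumptions of this sketch.

```python
from itertools import product


def grid_centers(per_axis, dims=2):
    """Evenly spaced initial centers at the cell midpoints of a
    per_axis**dims grid over the unit hypercube (hypothetical helper)."""
    axis = [(i + 0.5) / per_axis for i in range(per_axis)]
    return [list(p) for p in product(axis, repeat=dims)]
```

With `per_axis` points per axis, the grid contains `per_axis ** dims` initial centers, so the initial number of clusters follows from the grid resolution rather than being chosen directly.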
Let us consider two main properties of the suggested algorithm.
Theorem 1. Algorithm 2 converges.
Proof. Convergence of Algorithm 2 follows directly from the fact that the fuzzy logical distance is a semi-metric (see Lemma 4).
Indeed, this property holds for the distances used in lines 3 and 7 of the algorithm, and therefore the algorithm converges. □
Theorem 2.
The time complexity of Algorithm 2 is determined by the dimensionality of the data, the number of clusters, the number of instances in the data, and the number of iterations.
Proof. First, consider the calculation of the fuzzy logical distances. The complexity of this operation is evaluated for each dimension; thus, the complexity of calculating the distances is the per-dimension complexity multiplied by the dimensionality of the data.
Now consider the lines of the algorithm. Normalization of the data (line 1) and initialization of the cluster centers (line 2) are performed once (see Algorithm 3 in Appendix A).
Initialization of the membership degrees (line 3) is evaluated for each dimension and then summed over all dimensions.
In the do-while loop, the membership degrees are saved (line 5), the cluster centers are calculated given the membership degrees (line 6), and the membership degrees are recalculated (line 7); as for line 3, the cost of line 7 is evaluated for each dimension and summed over all dimensions.
Finally, the inverse normalization of the cluster centers (line 9) is also performed once.
Thus, the initial (lines 1-3) and final (line 9) operations contribute a fixed number of steps, each iteration contributes the cost of lines 5-7, and the total complexity follows by multiplying the per-iteration cost by the number of iterations and adding the cost of the initial and final operations. □
Note that in the considerations above we assumed that the neutral and absorbing elements are equal, which ensured the semi-metric properties of the fuzzy logical distance. In practice, however, these parameters can differ, and, although no formal proofs are available for this case, using the function with different neutral and absorbing elements can provide substantially better clustering.
3.3. Numerical Simulations
In the first series of simulations, we demonstrate that the suggested Algorithm 2 with the fuzzy logical distance yields more precise cluster centers than the original Algorithm 1 with the Euclidean distance.
For this purpose, we generate a single cluster of normally distributed instances with a known center and apply Algorithms 1 and 2 to these data. As measures of the algorithms' quality, we use the mean squared error (MSE) of the calculated cluster centers with respect to the actual center and the mean of the standard deviations of the calculated cluster centers.
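The two quality measures can be computed from the centers found in repeated trials as sketched below. The exact definitions are not spelled out in the text, so treating the MSE as the mean squared Euclidean error per trial and the spread as the average population standard deviation over coordinates are assumptions; `center_quality` is a hypothetical name.

```python
import statistics


def center_quality(found_centers, true_center):
    """Return (MSE, mean std) for centers found in repeated trials.

    MSE: mean over trials of the squared Euclidean error to the known center.
    Mean std: average over coordinates of the population standard deviation
    of the found centers (both definitions are assumptions).
    """
    n = len(found_centers)
    mse = sum(sum((c[k] - t) ** 2 for k, t in enumerate(true_center))
              for c in found_centers) / n
    stds = [statistics.pstdev([c[k] for c in found_centers])
            for k in range(len(true_center))]
    return mse, sum(stds) / len(stds)
```

A smaller mean standard deviation corresponds to found centers that are concentrated more tightly, which is the criterion used below to compare the algorithms.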
To avoid the influence of the magnitudes of the instance values, the simulations used normalized data. An example of the data (white circles) and the results of the algorithms (gray diamonds for Algorithm 1 and black pentagrams for Algorithm 2) are shown in Figure 2.
The simulations were conducted as series of trials with different numbers of clusters. The results were tested and compared using one-sample and two-sample Student's t-tests.
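The t statistics behind these comparisons can be computed with the Python standard library alone, as in the sketch below; the pooled-variance form of the two-sample test and the function names are assumptions. Obtaining p-values additionally requires the Student t distribution, for example from a statistics package, or a comparison of |t| with a tabulated critical value for the given degrees of freedom.

```python
import math
import statistics


def t_one_sample(sample, mu0):
    """One-sample Student's t statistic for H0: mean == mu0 (df = n - 1)."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


def t_two_sample(a, b):
    """Two-sample Student's t statistic with pooled variance (df = na + nb - 2)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
```

For example, `t_two_sample([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` gives about -3.674 with 4 degrees of freedom.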
In the simulations, we assumed that an algorithm whose cluster centers for a single cluster deviate less is more precise in calculating the cluster centers. In other words, an algorithm whose cluster centers are concentrated near the actual cluster center is better than one whose cluster centers are more dispersed.
The results of the simulations are summarized in Table 1, which presents the results of clustering two-dimensional data distributed around a known mean.
Both algorithms yield cluster centers close to the actual cluster center, and the errors in calculating the centers are extremely small. Additional testing with Student's t-test showed that the differences between the obtained cluster centers are not statistically significant.
At the same time, the suggested Algorithm 2 yields a smaller standard deviation than Algorithm 1, and this difference is statistically significant. Hence, the suggested Algorithm 2 yields more precise cluster centers than Algorithm 1.
To illustrate these results, consider simulations of the algorithms on data with several predefined clusters.
Consider the application of Algorithms 1 and 2 to data with two predefined clusters with fixed centers. The resulting cluster centers for two different numbers of clusters are shown in Figure 3. The notation in the figure is the same as in Figure 2.
The cluster centers obtained by Algorithm 2 (black pentagrams) are concentrated closer to the actual cluster centers than those obtained by Algorithm 1 (gray diamonds). Moreover, some of the cluster centers obtained by Algorithm 2 coincide, whereas all cluster centers obtained by Algorithm 1 are located at different points.
The observed effect is even clearer on data with more clusters.
Figure 4 shows the results of Algorithms 1 and 2 applied to data with several predefined clusters.
The cluster centers calculated by Algorithm 2 are concentrated at the real centers of the clusters and, as above, several centers coincide. Hence, the suggested Algorithm 2 determines the cluster centers more accurately and, consequently, yields more accurate clustering.
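Because several of the centers found by Algorithm 2 coincide, the number of distinct centers can serve as an estimate of the number of clusters. A minimal sketch of merging centers that coincide up to a tolerance follows; the greedy grouping rule, the tolerance value, and the name `distinct_centers` are assumptions for illustration.

```python
def distinct_centers(centers, tol=1e-3):
    """Greedily group centers whose coordinates differ by at most tol from
    the first member of a group; return the mean of each group."""
    groups = []
    for c in centers:
        for g in groups:
            if max(abs(p - q) for p, q in zip(c, g[0])) <= tol:
                g.append(c)
                break
        else:
            groups.append([c])
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]
```

The length of the returned list is then the estimated number of clusters; the result depends on the chosen tolerance.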
To illustrate the usefulness of the suggested algorithm for recognizing the number of clusters and analyzing the data structure, we compare its results with those obtained by the MATLAB® fcm function [15] with three distance measures: the Euclidean distance, the Mahalanobis distance [16], and the exponential distance [17]. These algorithms were applied to a set of data instances distributed around several centers; the results are shown in Figure 5.
The suggested algorithm (Figure 5.a) correctly recognizes the real centers of the clusters and places the cluster centers close to them. In contrast, the known algorithms implemented in MATLAB® do not recognize the real centers of the clusters and, consequently, determine neither the correct number of clusters nor the locations of their centers. The function fcm with the Euclidean and Mahalanobis distance measures (Figure 5.b,c) yields two cluster centers, and with the exponential distance (Figure 5.d) it yields three cluster centers. Note that the Euclidean and Mahalanobis distance measures lead to very similar results.
Thus, the real cluster centers, that is, the centers of the distributions of the data instances, can be recognized in two stages. First, the cluster centers are found by applying the suggested algorithm to the raw data; second, k-means is applied to the cluster centers found at the first stage. The resulting cluster centers indicate the real cluster centers.
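The second stage of this procedure can be sketched with plain Lloyd k-means applied to the first-stage centers. The suggested algorithm itself is not implemented here; the random initialization from the input points, the fixed seed, and the name `k_means` are assumptions of this sketch.

```python
import random


def k_means(points, k, iters=100, seed=0):
    """Lloyd's k-means (illustrative second stage on first-stage centers)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            groups[nearest].append(p)
        # Recompute centroids; an empty group keeps its previous centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids
```

Here `points` would be the cluster centers returned by the first stage and `k` the expected number of real clusters, for instance the number of groups of coinciding centers.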