Implementation of Fuzzy C-Means and Possibilistic C-Means Clustering Algorithms, Cluster Tendency Analysis and Cluster Validation

In this paper, several two-dimensional clustering scenarios are presented. In these scenarios, soft partitioning clustering algorithms (Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM)) are applied. Afterward, VAT is used to investigate the clustering tendency visually, and then, to check cluster validity, three types of indices (PC, DI, and DBI) are used. Observation of the clustering algorithms shows that each of them has its limitations; however, PCM is more robust to noise than FCM, because in FCM a noise point must still be assigned as a member of some cluster.

Unlike supervised classification, where the labels of the training data are provided, in unsupervised learning the data set comes without labels. The job of this paper is first to identify whether there is any cluster substructure in the data, then to apply clustering algorithms to separate the clusters, and finally to use VAT to observe the clustering tendency.
While approaching a clustering problem, three issues must be taken care of:
1. We have to determine whether the data set has cluster substructure for 1 < c < n, where n is the number of data points in the dataset and c is the number of clusters.
2. After confirming the presence of substructure, we need to apply clustering algorithms so that the computer can identify the clusters. Several types of clustering algorithms can be used here, e.g., FCM, PCM, HCM, and mean shift.
3. After recognizing the clusters, cluster validity analysis should be applied to validate the clustering.
In this paper, all of those issues were considered step by step. In the case of FCM, a membership matrix is created that signifies the degree to which each data point belongs to each cluster. However, FCM has the constraint that, for a given data point, the membership values over all clusters must sum to 1, which makes FCM more vulnerable to noise. In contrast, this constraint is removed in PCM, which makes it more robust to noise.

A. The Basics of Fuzzy C-Means Algorithm
In the Fuzzy C-Means algorithm, each cluster is represented by a parameter vector θ_j, where j = 1,…,c and c is the total number of clusters. In FCM, it is assumed that a data point from the dataset X does not exclusively belong to a single group; instead, it may belong to more than one cluster simultaneously, each to a certain degree. The variable u_ij symbolizes the degree of membership of x_i in cluster C_j. A data point is more likely to belong to the cluster for which its membership value is higher. The membership values of a particular data point over all clusters must sum to 1. The algorithm involves an additional parameter q (≥ 1), called the fuzzifier. The preferred value of the fuzzifier is 2; however, other values were also tried in this paper to observe the difference. The higher the value of q, the less generalized the algorithm becomes. FCM stems from the minimization of the cost function [20, 21]

J(θ, U) = Σ_{i=1…N} Σ_{j=1…c} u_ij^q ||x_i − θ_j||².

FCM is one of the most popular clustering algorithms. It is iterative and starts with some initial estimates. Each iteration contains the following steps:
1. The grade of membership u_ij of the data point x_i in cluster C_j, i = 1,…,N and j = 1,…,c, is computed taking into account the Euclidean or Mahalanobis distance of x_i from all the θ_j's.
2. The representatives θ_j are then updated as the weighted means of all data vectors.
To terminate the algorithm, several criteria can be applied: if the change in the θ_j values or in the grades of membership between two successive iterations is small enough, the algorithm can be terminated; alternatively, the number of iterations can be fixed in advance.
The FCM algorithm is sensitive to outliers because of the constraint

Σ_{j=1…c} u_ij = 1, i = 1,…,N, (4)

which indicates that even a noise point must receive a high membership value in some cluster.
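The FCM iteration described above (membership update from distances, weighted-mean center update, small-change termination test) can be sketched as follows. This is a minimal illustration using squared Euclidean distance only; the function and parameter names are our own and not the paper's actual implementation:

```python
import numpy as np

def fcm(X, c, q=2.0, max_iter=100, tol=1e-5, init=None, seed=0):
    """Minimal Fuzzy C-Means sketch: alternate the membership update
    u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1/(q-1)) and the weighted-mean
    center update until memberships stop changing."""
    rng = np.random.default_rng(seed)
    if init is None:
        # initialize centers at randomly chosen data points
        theta = X[rng.choice(len(X), size=c, replace=False)]
    else:
        theta = np.asarray(init, dtype=float)
    U = None
    for _ in range(max_iter):
        # squared Euclidean distance of every point to every center
        d2 = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)              # avoid division by zero
        # membership update; each row of U sums to 1 by construction
        U_new = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (q - 1))).sum(axis=2)
        # centers: means of the data weighted by u_ij^q
        Uq = U_new ** q
        theta = (Uq.T @ X) / Uq.sum(axis=0)[:, None]
        if U is not None and np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return theta, U
```

The `init` argument is only a convenience for reproducible experiments; in general the centers (or memberships) are initialized randomly, as described above.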

B. The Basics of Possibilistic C-Means Algorithm
This algorithm, known as PCM, is also appropriate for unraveling compact clusters. It is a mode-seeking algorithm, and its framework is similar to the one used in FCM: each data vector x_i is associated with a cluster C_j via a scalar u_ij. However, the constraint that the u_ij's for a given x_i sum to 1 is removed. As a consequence, the u_ij's for a given x_i are no longer interrelated, and they cannot be interpreted as grades of membership of the vector in cluster C_j, since that term implies that the sum of the u_ij's for each x_i is constant. Instead, u_ij is interpreted as the degree of compatibility between x_i and C_j, which is independent of the degrees of compatibility between x_i and the remaining clusters. PCM stems from the minimization of the cost function [22]

J(θ, U) = Σ_{i=1…N} Σ_{j=1…c} u_ij^q ||x_i − θ_j||² + Σ_{j=1…c} η_j Σ_{i=1…N} (1 − u_ij)^q.

The first term of the equation is the same as the FCM cost function. The second term is added because, without it, direct minimization over U leads to the trivial all-zero solution; this term also favors large memberships. As in FCM, a parameter q (≥ 1) is involved; however, it does not act as a fuzzifier in PCM. Like FCM, PCM is iterative and starts with some initial estimates:
1. The degree of compatibility u_ij of the data vector x_i to cluster C_j, i = 1,…,N, j = 1,…,c, is computed, taking into account the squared Euclidean distance of x_i from θ_j and the parameter η_j.
2. The representatives θ_j are updated, as in FCM, as the weighted means of all data vectors (u_ij^q weights each data vector x_i).
As with FCM, several termination criteria can be applied: if the change in the θ_j values or in the degrees of compatibility between two successive iterations is small enough, the algorithm can be terminated; alternatively, the number of iterations can be fixed in advance. In contrast to FCM, PCM does not impose a clustering structure on the input data X. PCM is sensitive to the initial θ_j values and to the estimates of the η_j's.
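The PCM iteration above can be sketched in the same style. Here the typicality update u_ij = 1 / (1 + (d_ij²/η_j)^(1/(q−1))) is the standard PCM form; `theta0` and `eta` are illustrative parameters that, in practice, would come from a preceding FCM run, as the text notes that PCM is sensitive to both:

```python
import numpy as np

def pcm(X, theta0, eta, q=2.0, max_iter=100, tol=1e-5):
    """Minimal Possibilistic C-Means sketch: typicality update followed by
    a weighted-mean center update; rows of U need not sum to 1."""
    theta = np.asarray(theta0, dtype=float)
    eta = np.asarray(eta, dtype=float)
    for _ in range(max_iter):
        # squared Euclidean distance of every point to every center
        d2 = ((X[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)
        # degree of compatibility of x_i with C_j; clusters are independent
        U = 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (q - 1)))
        # centers: means of the data weighted by u_ij^q
        Uq = U ** q
        theta_new = (Uq.T @ X) / Uq.sum(axis=0)[:, None]
        if np.abs(theta_new - theta).max() < tol:
            theta = theta_new
            break
        theta = theta_new
    return theta, U
```

Because each cluster's typicalities are computed independently, a distant noise point gets a near-zero u_ij for every cluster instead of being forced into one, which is exactly the robustness property discussed above.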
In this paper, the Fuzzy C-Means algorithm was applied first, which provided the cluster centers. Then the distance between each data point and every cluster center was measured: if a data point exhibits the minimum distance from cluster center k, then the data point is considered to belong to that cluster. In this way, every data point was assigned to a particular cluster. Both Euclidean and Mahalanobis distances were utilized when measuring the distance.
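The hardening step just described (assigning each point to its nearest center, with either Euclidean or Mahalanobis distance) might look like the following sketch; the function name and the use of a single shared covariance matrix are our own assumptions:

```python
import numpy as np

def assign_clusters(X, centers, cov=None):
    """Assign each point to its nearest cluster center.

    With cov=None the squared Euclidean distance is used; otherwise the
    Mahalanobis distance under a shared covariance matrix `cov`.
    """
    diff = X[:, None, :] - centers[None, :, :]      # shape (n, c, d)
    if cov is None:
        d2 = (diff ** 2).sum(axis=2)                # squared Euclidean
    else:
        VI = np.linalg.inv(cov)                     # inverse covariance
        # squared Mahalanobis distance: diff^T VI diff for every (i, j)
        d2 = np.einsum('ncd,de,nce->nc', diff, VI, diff)
    return d2.argmin(axis=1)                        # index of nearest center
```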

III. CLUSTER TENDENCY ANALYSIS
Cluster tendency analysis can be done by visually inspecting the reordered distance matrix of the given dataset, an approach known as the visual assessment of cluster tendency (VAT) [23] algorithm. In the VAT algorithm, the Euclidean distance matrix between the samples is computed first. This distance matrix is then reordered into an ordered dissimilarity matrix such that similar data points are located close to each other. The ordered dissimilarity matrix is finally converted into an ordered dissimilarity image, known as the VAT image, in which each pixel represents a dissimilarity. If the image is scaled on the gray intensity scale, white pixels indicate high dissimilarity and black pixels indicate low dissimilarity, as is evident from the diagonal pixels, where the dissimilarity entry is zero because it is measured between a data point and itself. However, VAT is computationally expensive and has limited ability to discover more sophisticated patterns inside the data. Therefore, improved VAT (iVAT) [24] can also be utilized. The iVAT algorithm uses a graph-theoretic distance transform to enhance the effectiveness of VAT in complex cases where VAT fails to provide a detailed and clear cluster tendency.
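The VAT reordering described above can be sketched with a Prim-like nearest-neighbor ordering: start from a point involved in the largest dissimilarity, then repeatedly append the unselected point closest to the already-selected set. This is a minimal illustration, not the paper's implementation:

```python
import numpy as np

def vat_order(D):
    """Return a VAT reordering of the symmetric dissimilarity matrix D.

    The VAT image is then the reordered matrix D[order][:, order]; dark
    diagonal blocks in that image suggest cluster substructure.
    """
    n = D.shape[0]
    # start from a row of the largest dissimilarity entry
    start = np.unravel_index(D.argmax(), D.shape)[0]
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        rem = list(remaining)
        # pick the unselected point nearest to any already-selected point
        sub = D[np.ix_(order, rem)]
        nxt = rem[sub.min(axis=0).argmin()]
        order.append(nxt)
        remaining.remove(nxt)
    return np.array(order)
```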

IV. CLUSTER VALIDATION
After cluster tendency analysis, cluster validity analysis was applied, which checks the validity of the number of clusters assumed by the clustering algorithm. Cluster tendency analysis provides a visual indication of the number of clusters; cluster validity analysis, on the other hand, offers numerical values of validity indices that indicate the appropriate number of clusters. In this paper, three cluster validity indices were utilized for cluster validation.

Partition Coefficient (PC): It uses the membership matrix to compute the index, as shown in equation (6):

PC = (1/N) Σ_{i=1…N} Σ_{j=1…c} u_ij².

A value of PC close to 1 indicates a crisp partition, whereas a value close to 1/c indicates a maximally fuzzy one.

Dunn Index (DI): It is the ratio of the minimum distance between clusters to the maximum cluster diameter, so a higher value of DI means more compact, better-separated clusters.

Davies-Bouldin Index (DBI): From its definition in equation (10), it is visible that the value of DBI is the ratio of intra-class scatter to inter-class separation, averaged over the clusters. Hence, a lower value of DBI means better clustering.
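Two of the indices above can be sketched directly from their descriptions. This minimal illustration assumes the fuzzy membership matrix for PC and crisp (hardened) labels for DBI; the function names are our own:

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/N) * sum of squared memberships; closer to 1 is crisper."""
    return (U ** 2).sum() / U.shape[0]

def davies_bouldin(X, labels, centers):
    """DBI: average over clusters of the worst ratio of within-cluster
    scatter to between-center separation; lower is better."""
    c = centers.shape[0]
    # s[j]: mean distance of cluster j's members to its center
    s = np.array([np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
                  for j in range(c)])
    total = 0.0
    for i in range(c):
        ratios = [(s[i] + s[j]) / np.linalg.norm(centers[i] - centers[j])
                  for j in range(c) if j != i]
        total += max(ratios)
    return total / c
```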

A. The Performance of Fuzzy C-Means Clustering Algorithm
After building the Fuzzy C-Means algorithm, it was applied to different data sets (e.g., two well-separated clusters, three close clusters, four clusters, five clusters, clusters with noise, etc.). The Fuzzy C-Means algorithm performs reasonably well on noise-free data with well-separated clusters. Since it is evident that FCM works well with two-cluster data sets, data sets with three, four, and five clusters were used afterward to observe the performance of FCM.

B. The Performance of Possibilistic C-Means Clustering Algorithm
As with FCM, the same datasets were applied to the PCM algorithm. First, the data set with two well-separated clusters was used, and the response of the PCM algorithm was as follows.

Figure 9: The dataset with two clusters and the PCM output.

PCM was able to find the two clusters. Then another dataset was applied in which the two clusters were not completely separable. PCM was capable of clustering the data; however, it also mis-clustered some of the points. Next, data with three clusters was applied.

Figure 11: The dataset with three clusters and the PCM output.

After the three-cluster data, four-cluster data was used, and PCM clustered it nicely. From the tables above, it is visible that for well-separated data with a small number of clusters the clustering indices perform reasonably well, and DBI is more accurate than PC and DI.

VII. IMPLEMENTATION OF CLUSTER TENDENCY ANALYSIS
For cluster tendency analysis, VAT was used. The results of the cluster tendency analysis for both Fuzzy C-Means and Possibilistic C-Means clustering are shown in Table 8.
From Table 8, it is evident that if the data set has well-separated clusters, then cluster tendency analysis shows a neat result. As the separation decreases and the number of clusters increases, the tendency analysis becomes ambiguous. For example, in the case of four clusters, the cluster tendency analysis for PCM did not show a clear pattern.

This paper has discussed several aspects of the cluster analysis process, and the analyses performed here lead to some significant observations. In the case of two well-separated clusters, both FCM and PCM showed reasonably good performance. However, if noise is added to the dataset, FCM may produce a wrong result, because the noise points shift the cluster centers. PCM, on the other hand, is free from this shortcoming, as it does not impose the FCM membership constraint; it is a mode-seeking algorithm, which makes it more stable on noisy data.