Preprint Article. Version 1. Preserved in Portico. This version is not peer-reviewed.

# Converting Ensemble Clustering Problem to a Mathematical Optimization Problem and Providing an Approach to Solve Based on Optimization Toolbox

Version 1 : Received: 17 April 2018 / Approved: 17 April 2018 / Online: 17 April 2018 (15:58:22 CEST)

How to cite: Salehpour, S.; Parvin, H. Converting Ensemble Clustering Problem to a Mathematical Optimization Problem and Providing an Approach to Solve Based on Optimization Toolbox. Preprints 2018, 2018040227. https://doi.org/10.20944/preprints201804.0227.v1

## Abstract

People today face large volumes of data that must be stored or displayed. One of the key methods for managing such data is to group it into clusters. Clustering plays a critical role in information retrieval, where it organizes large collections into a few meaningful clusters. A main motivation for clustering is to reveal the hidden, inherent structure of a data set. Ensemble clustering algorithms combine the outputs of multiple clustering algorithms to reach a single overall clustering. Without relying on additional information, ensemble clustering methods fuse several primary partitions of the data to find a better one. Because different clustering algorithms view the data from different perspectives, they can produce different partitions of the same data. Combining the partitions obtained from different algorithms can yield a high-quality partition, even when the clusters are densely packed together. Most studies in this area use all of the initial clusters. This study proposes a new method that uses only the most stable clusters instead of all primary clusters. A consensus function based on co-association matrices is used to select the more stable clusters. The most stable clusters are selected using a cluster stability criterion based on the F-measure. Optimization functions are then used to optimize the resulting final clusters. The optimizer used in this article is a genetic algorithm, which finds the final clusters that participate in the consensus. Experimental results on several datasets show that the proposed method produces clusters with high stability.
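The two building blocks named in the abstract, the co-association matrix and the F-measure, can be illustrated with a minimal Python sketch. This is a generic textbook formulation and not necessarily the authors' exact definition: the function names `co_association` and `f_measure`, the toy `partitions` data, and the choice to compare two clusters as sets of point indices are all assumptions made for illustration.

```python
import numpy as np

def co_association(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    labels = np.asarray(partitions)  # shape: (n_partitions, n_points)
    # same[p, i, j] is True when points i and j have the same label in partition p
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

def f_measure(cluster_a, cluster_b):
    """F-measure between two clusters given as collections of point indices."""
    a, b = set(cluster_a), set(cluster_b)
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy ensemble: three base partitions of four data points.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
C = co_association(partitions)
score = f_measure({0, 1}, {0, 1, 2})
```

In this toy ensemble, points 0 and 1 are grouped together in every partition, so their co-association entry is 1, while points 0 and 3 are never grouped together and get 0. A stability criterion of the kind described above could, for instance, average the F-measure of a cluster against its best-matching clusters in the other partitions; that aggregation step is not specified in the abstract and is left out here.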

## Keywords

Ensemble clustering; cluster stability; F-measure; co-association matrix; genetic algorithm

## Subject

Computer Science and Mathematics, Data Structures, Algorithms and Complexity