Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting

Version 1: Received: 1 August 2023 / Approved: 2 August 2023 / Online: 3 August 2023 (02:48:26 CEST)

A peer-reviewed article of this Preprint also exists.

Carpineto, C.; Romano, G. Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting. Appl. Sci. 2023, 13, 10050.

Abstract

Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article we present a new approach that makes double use of domain knowledge, namely to build the initial partitions as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC (double-Constrained Consensus Clustering), was more effective than plain CC at combining base constrained partitions. We then argue that CCC is especially well-suited to profiling counterfeit e-commerce websites, because constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement of average performance.
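
To make the idea of logical implication between constraints concrete, the following is a minimal Python sketch of one common way to compute a deductive closure of must-link and cannot-link constraints. It is not taken from the article: the function name `constraint_closure`, the integer item indices, and the two inference rules used (must-link is transitive; a cannot-link between two items extends to every pair drawn from their must-link groups) are assumptions made for illustration, and the article's own closure and minimal-cover procedures may differ.

```python
from itertools import combinations


def constraint_closure(n, must_link, cannot_link):
    """Illustrative closure of pairwise constraints over items 0..n-1.

    Assumed rules (not necessarily those of the article):
      1. must-link is symmetric and transitive;
      2. if a cannot-link holds between a and b, it holds between every
         item must-linked to a and every item must-linked to b.
    Returns two sets of (i, j) pairs with i < j.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Rule 1: merge items connected by must-link constraints.
    for a, b in must_link:
        parent[find(a)] = find(b)

    # Collect the must-link groups (members are stored in increasing order).
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    ml_closure = {
        pair
        for members in groups.values()
        for pair in combinations(members, 2)
    }

    # Rule 2: propagate each cannot-link across the two groups it separates.
    cl_closure = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"inconsistent constraints on items {a} and {b}")
        for x in groups[ra]:
            for y in groups[rb]:
                cl_closure.add((min(x, y), max(x, y)))

    return ml_closure, cl_closure


# Hypothetical example: items 0, 1, 2 end up must-linked; item 3 is separated
# from all of them; item 4 is unconstrained.
ml, cl = constraint_closure(5, [(0, 1), (1, 2)], [(2, 3)])
```

In this hypothetical example, the two stated must-link constraints imply a third one between items 0 and 2, and the single cannot-link constraint implies two more, between item 3 and each of items 0 and 1. A consensus step that sees only the stated constraints would miss these implied pairs, which is the kind of gap a deductive closure is meant to fill.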

Keywords

semi-supervised consensus clustering; ensemble clustering; constrained clustering; analysis of clustering constraints; online anti-counterfeiting; clustering fraudulent websites; detection of counterfeit affiliate programs

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
