Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Unsupervised Feature Selection for Histogram-Valued Symbolic Data by Hierarchical Conceptual Clustering
Received: 30 March 2021 / Approved: 31 March 2021 / Online: 31 March 2021 (07:53:39 CEST)
A peer-reviewed article of this Preprint also exists.
Ichino, M.; Umbleja, K.; Yaguchi, H. Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering. Stats 2021, 4, 359-384.
Abstract
This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described by a fixed number of equal-probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. In this sense, the compactness plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a cluster quality measure. We also show that the average compactness of each feature with respect to objects and/or clusters over several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect geometrically thin structures embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data through visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method using an artificial data set and real histogram-valued data sets.
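The agglomeration loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formal definitions, so everything concrete here is an assumption made for illustration: each object is stored as per-feature quantile values, the join of two descriptions is taken to be the smallest bin-rectangles covering both, and the compactness is taken to be the mean bin width normalized by each feature's overall span. The function names (`bins_from_quantiles`, `join`, `compactness`, `agglomerate`, `average_feature_compactness`) are hypothetical, and averaging over all merge steps is a simplification of "several clustering steps".

```python
import numpy as np

def bins_from_quantiles(q):
    """Turn quantile values of shape (d, m+1) into m bins per feature, shape (d, m, 2)."""
    q = np.asarray(q, dtype=float)
    return np.stack([q[:, :-1], q[:, 1:]], axis=-1)

def join(a, b):
    """Assumed join: for every feature and bin, the smallest interval covering both inputs."""
    low = np.minimum(a[..., 0], b[..., 0])
    high = np.maximum(a[..., 1], b[..., 1])
    return np.stack([low, high], axis=-1)

def compactness(c, span):
    """Assumed concept size: mean bin width per feature, normalized by the feature span.
    Returns (overall value, per-feature values)."""
    widths = (c[..., 1] - c[..., 0]) / span[:, None]
    per_feature = widths.mean(axis=1)
    return per_feature.mean(), per_feature

def agglomerate(objects):
    """Repeatedly merge the pair whose join has the smallest compactness."""
    stacked = np.stack(objects)  # shape (n, d, m, 2)
    span = stacked[..., 1].max(axis=(0, 2)) - stacked[..., 0].min(axis=(0, 2))
    clusters = dict(enumerate(objects))
    next_id = len(objects)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, p in enumerate(ids):
            for q in ids[i + 1:]:
                merged = join(clusters[p], clusters[q])
                total, per_feature = compactness(merged, span)
                if best is None or total < best[0]:
                    best = (total, p, q, merged, per_feature)
        total, p, q, merged, per_feature = best
        del clusters[p], clusters[q]
        clusters[next_id] = merged
        merges.append((p, q, total, per_feature))
        next_id += 1
    return merges

def average_feature_compactness(merges):
    """Average per-feature compactness over all merge steps (smaller = more informative)."""
    return np.mean([m[3] for m in merges], axis=0)

# Toy data: feature 0 separates two groups, feature 1 is identical for all objects.
objs = [bins_from_quantiles(x) for x in (
    [[0, 1, 2], [0, 5, 10]],
    [[0.5, 1.5, 2.5], [0, 5, 10]],
    [[8, 9, 10], [0, 5, 10]],
    [[8.5, 9.5, 10.5], [0, 5, 10]],
)]
merges = agglomerate(objs)
scores = average_feature_compactness(merges)
```

In this toy example the two tight groups on feature 0 are merged first, and feature 0 receives the smaller average compactness, which matches the abstract's reading that features with small average compactness are the informative ones. The merge list could also be fed to a dendrogram routine, although its exact format here is an arbitrary choice.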
Keywords
unsupervised feature selection; histogram-valued data; compactness; hierarchical conceptual clustering; multi-role measure; visualization
Subject
Computer Science and Mathematics, Algebra and Number Theory
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.