Preprint
Article

This version is not peer-reviewed.

Unsupervised Hierarchical Visual Taxonomy of Marble Natural Stone Using Cluster-Aware Self-Supervised Vision Transformers

Submitted: 16 March 2026

Posted: 17 March 2026


Abstract

The marble industry relies on proprietary commercial names rather than objective visual categories, creating market inefficiencies for stakeholders who select stones based on appearance. Supervised classification methods perpetuate this problem by replicating inconsistent commercial labels instead of discovering intrinsic visual structure. We propose an unsupervised pipeline that combines a two-stage training strategy (pure self-supervised pretraining followed by cluster-aware fine-tuning of a DINO Vision Transformer) with UMAP dimensionality reduction and Ward's agglomerative hierarchical clustering. Systematic ablation studies on 1,540 marble images spanning 10 commercial varieties validate each design choice: cluster-aware training at k = 10 yields better-separated embeddings than the self-supervised baseline (Silhouette Score 0.778 vs. 0.761; Davies–Bouldin Index 0.293 vs. 0.364), UMAP compression to five dimensions resolves the noise pathologies that affect clustering in the high-dimensional embedding space, and Ward's linkage produces the most compact partitions among the linkage criteria compared. The resulting taxonomy reveals three phenomena invisible to commercial classification: cross-category merging of visually indistinguishable stones sold under different market names, intra-category splitting of heterogeneous sub-populations within a single variety, and coherent grouping where commercial and visual boundaries coincide. We further show that standard extrinsic metrics are misaligned with the objectives of unsupervised taxonomy when the reference labels encode the very inconsistencies the method aims to resolve. This work provides a validated methodology for data-driven visual classification in the natural stone industry and a transferable template for domains with unreliable labelling conventions.
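The post-embedding stages described above (dimensionality reduction, Ward's clustering, intrinsic scoring, and the extrinsic-metric misalignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic Gaussian vectors stand in for DINO ViT embeddings (the two-stage training itself is not shown), PCA stands in for UMAP so that the sketch needs only NumPy and scikit-learn, and the helper `reduce_and_cluster` and the `visual_group` and `commercial` labellings are hypothetical. The commercial labels deliberately merge and split visual groups, so the same clustering can be scored against both references.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, davies_bouldin_score, silhouette_score


def reduce_and_cluster(features, n_components=5, n_clusters=10, seed=0):
    """Project embeddings to a few dimensions, then cut a Ward tree into k clusters.

    Hypothetical helper. PCA is a stand-in for UMAP; with umap-learn installed the
    reduction could be umap.UMAP(n_components=n_components, random_state=seed).fit_transform(features).
    """
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(features)
    labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(reduced)
    return reduced, labels


# Synthetic stand-in for DINO ViT embeddings: 10 visual prototypes x 154 images
# (1,540 in total), each a 384-dimensional vector plus isotropic Gaussian noise.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(10, 384))
visual_group = np.repeat(np.arange(10), 154)
features = centers[visual_group] + rng.normal(size=(visual_group.size, 384))

# Hypothetical commercial names that disagree with the visual groups.
commercial = visual_group.copy()
# One name spans two visual groups, which a visual clustering splits (intra-category splitting).
commercial[visual_group == 9] = 8
# One visual group is sold under two names, which a visual clustering merges (cross-category merging).
group0 = np.flatnonzero(visual_group == 0)
commercial[group0[::2]] = 9

reduced, labels = reduce_and_cluster(features)

# Intrinsic metrics: higher Silhouette and lower Davies-Bouldin mean tighter, better-separated clusters.
sil = silhouette_score(reduced, labels)
db = davies_bouldin_score(reduced, labels)

# Extrinsic metric: the same clustering scored against two different reference labellings.
ari_visual = adjusted_rand_score(visual_group, labels)
ari_commercial = adjusted_rand_score(commercial, labels)

print(f"Silhouette={sil:.3f}  Davies-Bouldin={db:.3f}")
print(f"ARI vs visual groups={ari_visual:.3f}  ARI vs commercial names={ari_commercial:.3f}")
```

In this synthetic setting the clustering should recover the visual groups, so any drop in the adjusted Rand index against the commercial names comes only from those names merging and splitting visual groups, which is the kind of misalignment between extrinsic metrics and taxonomy objectives the abstract describes. For the full hierarchy rather than a flat cut at k clusters, `scipy.cluster.hierarchy.linkage(reduced, method="ward")` returns the merge tree.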

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated