Generative Adversarial Networks (GANs) have demonstrated remarkable capabilities for synthesizing photorealistic textures, yet deploying conditional GANs (cGANs) in industrial settings faces two barriers: the prohibitive cost of annotating proprietary data and the uncertain alignment between automated metrics and human perception. This study addresses both challenges for marble texture synthesis. We adapt an unsupervised segmentation pipeline combining Simple Linear Iterative Clustering (SLIC) superpixels, Gaussian Mixture Models (GMMs), and Graph Cut optimization to extract vein structures from 289 industrial scans without manual annotation. We then benchmark four cGAN architectures (a baseline cGAN, Pix2Pix, BicycleGAN, and GauGAN) using a dual-evaluation protocol that contrasts automated assessment via pixel-based, structural, statistical, and learned distributional metrics with human-centered assessment. Results reveal a significant metric–perception discrepancy: Pix2Pix achieved the best Fréchet Inception Distance (FID) yet received the lowest human ratings due to checkerboard artifacts, whereas GauGAN produced textures statistically indistinguishable from real marble (Visual Turing Pass Rate, VTPR: 0.533; Mean Opinion Score on Marble Authenticity, MOS-MA: 2.89) despite an inferior FID (87.3). These findings establish three contributions: (1) an unsupervised, annotation-free segmentation pipeline; (2) empirical evidence that automated metrics alone are insufficient for architecture selection; and (3) a dual-evaluation framework validating human-in-the-loop assessment as essential for industrial deployment.
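For concreteness, the sketch below illustrates the kind of SLIC + GMM + Graph Cut pipeline the abstract describes. It is a minimal sketch under stated assumptions, not the authors' implementation: the library choices (scikit-image, scikit-learn, PyMaxflow), the per-superpixel mean-intensity feature, the assumption that veins correspond to the darker GMM component, and all parameter values (`n_segments`, `compactness`, `smoothness`) are illustrative.

```python
# Illustrative sketch of an annotation-free vein segmentation pipeline:
# SLIC superpixels -> 2-component GMM -> graph-cut refinement.
# Libraries and parameters are assumptions, not the paper's exact setup.
import numpy as np
import maxflow                                   # pip install PyMaxflow
from skimage import io, color
from skimage.segmentation import slic
from sklearn.mixture import GaussianMixture

def segment_veins(path, n_segments=600, smoothness=2.0):
    rgb = io.imread(path)
    gray = color.rgb2gray(rgb)                   # float intensities in [0, 1]

    # 1) SLIC groups pixels into perceptually homogeneous superpixels.
    labels = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)

    # 2) Fit a 2-component GMM to per-superpixel mean intensities; the
    #    darker component is assumed (illustratively) to capture veins.
    feats = np.array([gray[labels == s].mean()
                      for s in range(labels.max() + 1)]).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(feats)
    vein = int(np.argmin(gmm.means_.ravel()))    # index of darker component

    # Per-pixel posteriors under the fitted GMM become graph-cut unary costs.
    post = gmm.predict_proba(gray.reshape(-1, 1)).reshape(gray.shape + (2,))
    eps = 1e-6
    cost_vein = -np.log(post[..., vein] + eps)   # cost of labeling "vein"
    cost_bg = -np.log(post[..., 1 - vein] + eps) # cost of labeling "background"

    # 3) Graph Cut: GMM unaries plus a Potts smoothness term on a
    #    4-connected grid; the min s-t cut gives a spatially coherent mask.
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(gray.shape)
    g.add_grid_edges(nodes, smoothness)           # pairwise smoothness term
    g.add_grid_tedges(nodes, cost_vein, cost_bg)  # terminal (unary) capacities
    g.maxflow()
    return g.get_grid_segments(nodes)             # True where labeled vein

if __name__ == "__main__":
    mask = segment_veins("marble_scan.png")       # hypothetical input scan
    print("vein pixel fraction:", mask.mean())
```

In practice, richer per-superpixel features (color statistics or texture descriptors) could replace the mean intensity used here; the grid-level graph cut is one common refinement choice and the paper's exact formulation may differ.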