Engineering design increasingly uses generative AI to explore large form spaces, yet concept-driven generation is useful only if observers consistently perceive the intended attribute. We propose a ranking-based human validation layer that tests whether AI-generated concept-intensity gradients are interpretable, reliable, and usable. For each Product–Concept pair, a controlled generative workflow produced six variants (A–F) intended to express the concept with monotonically increasing intensity. In an online study, 26 design engineers ranked the variants by perceived intensity, with an optional not-applicable (NA) flag for cases where the product category could not be recognised. We analyse the rankings with heatmap diagnostics, inter-observer agreement, monotonic alignment with the intended order, and Plackett–Luce aggregation with uncertainty estimates, and we use NA trends to bound the workflow's operational range. Across nine pairs, most gradients aligned with the intended direction, but performance depended on both the concept and the product context, revealing stable as well as failure-prone segments of the gradients. The approach provides an evidence-based gate for concept implementation in AI-driven generative design.
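The abstract names Plackett–Luce aggregation, monotonic alignment with the intended order, and uncertainty estimation as the core analyses. The sketch below illustrates one way such an analysis could be computed: it fits Plackett–Luce worths with Hunter's (2004) MM algorithm, checks alignment against the intended A–F gradient with Kendall's tau, and bootstraps over observers for uncertainty. The function names (`plackett_luce_mm`, `sample_ranking`), the simulated data, and the bootstrap choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import kendalltau

def plackett_luce_mm(rankings, n_items, n_iter=500, tol=1e-9):
    """Fit Plackett-Luce worths via Hunter's (2004) MM algorithm.
    Each ranking lists item indices from most to least intense."""
    gamma = np.full(n_items, 1.0 / n_items)
    wins = np.zeros(n_items)  # times each item 'wins' a choice stage
    for r in rankings:
        wins[np.asarray(r[:-1])] += 1.0  # every non-last item wins once
    for _ in range(n_iter):
        denom = np.zeros(n_items)
        for r in rankings:
            r = np.asarray(r)
            # tail[s] = total worth of the items still unranked at stage s
            tail = np.cumsum(gamma[r][::-1])[::-1]
            for s in range(len(r) - 1):
                denom[r[s:]] += 1.0 / tail[s]
        new = wins / denom
        new /= new.sum()
        if np.max(np.abs(new - gamma)) < tol:
            return new
        gamma = new
    return gamma

def sample_ranking(worth, rng):
    """Draw one Plackett-Luce ranking (most intense first)."""
    items, order = list(range(len(worth))), []
    w = np.asarray(worth, dtype=float)
    while items:
        p = w[items] / w[items].sum()
        pick = rng.choice(items, p=p)
        order.append(int(pick))
        items.remove(pick)
    return order

# Hypothetical data: 26 observers rank six variants A..F (indices 0..5),
# where intended intensity increases with the index.
rng = np.random.default_rng(42)
true_worth = np.exp(np.linspace(-1.5, 1.5, 6))  # assumed underlying gradient
rankings = [sample_ranking(true_worth, rng) for _ in range(26)]

worth = plackett_luce_mm(rankings, 6)
tau, _ = kendalltau(worth, np.arange(6))  # alignment with intended A->F order
print("estimated worths:", np.round(worth, 3))
print("Kendall tau vs intended order:", round(tau, 2))

# Uncertainty via a bootstrap over observers (one plausible choice).
boot = np.array([
    plackett_luce_mm([rankings[i] for i in rng.integers(26, size=26)], 6)
    for _ in range(200)
])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print("95% bootstrap intervals per variant:", np.round(lo, 3), np.round(hi, 3))
```

Under this reading, a Kendall tau near 1 with narrow, non-overlapping bootstrap intervals would indicate a gradient that observers reliably perceive in the intended direction, while overlapping intervals between adjacent variants would flag the failure-prone segments the abstract describes.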