Submitted:
21 August 2025
Posted:
22 August 2025
Abstract

Keywords:
1. Introduction
2. Related Works
2.1. Vision-Based Generative Design
2.2. Leveraging Urban Imagery for Human Perception Studies
2.3. Perceptual and Affective Quality of Our Environment
3. Methodology
3.1. Model Pipeline
4. Parameter Analysis
4.1. Mask Prompt for Localised Inpainting
4.2. Image Diversity
4.3. Guidance and Scale Parameters
5. Experiments
5.1. Geolocalisation Experiment
5.2. Perceptual Alignment Experiment
5.3. Perceptual Text Prompts
5.4. Model-Based Evaluations
5.5. Human-Based Evaluation
6. Results
6.1. Geolocalisation Experiment
6.2. Perceptual Experiment
| Type | Contrastive Attribute | | | N |
| Obj | Colorful vs Dull-color | 0.85 | 0.58 | 72 |
| Obj | Angular vs Curvy | 0.72 | 0.38 | 73 |
| Obj | Symmetrical vs Asymmetrical | 0.64 | 0.23 | 71 |
| Obj | Textured vs Smooth | 0.65 | 0.30 | 70 |
| Aff | Welcoming vs Uninviting | 0.64 | 0.35 | 72 |
| Aff | Safe vs Unsafe | 0.55 | 0.24 | 72 |
| Aff | Relaxing vs Tense | 0.54 | 0.12 | 73 |
| Aff | Stimulating vs Not Stimulating | 0.53 | 0.27 | 74 |
| HiA | Utopian vs Dystopian | 0.59 | 0.24 | 72 |
| HiA | Harmonious vs Discordant | 0.58 | 0.28 | 71 |

| Type | Multi-class Category | |
| Obj | Red-green-blue-yellow-purple-orange | 0.86 |
| Obj | Brick-glass-stone-wood | 0.65 |
| Aff | Exciting-depressing-calm-stressful | 0.36 |
7. Discussion
8. Conclusion
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Declaration of Generative AI and AI-Assisted Technologies in the Writing Process
Conflicts of Interest
References
- Alexander, C. A new theory of urban design; Vol. 6, Center for Environmental Struc, 1987.
- Hillier, B.; Hanson, J. The social logic of space; Cambridge university press, 1989.
- Batty, M.; Longley, P.A. Fractal cities: a geometry of form and function; Academic press, 1994.
- Koenig, R.; Miao, Y.; Aichinger, A.; Knecht, K.; Konieva, K. Integrating urban analysis, generative design, and evolutionary optimization for solving urban design problems. Environment and Planning B: Urban Analytics and City Science 2020, 47, 997–1013.
- Wortmann, T. Model-based Optimization for Architectural Design: Optimizing Daylight and Glare in Grasshopper. Technology|Architecture+Design 2017.
- Vermeulen, T.; Knopf-Lenoir, C.; Villon, P.; Beckers, B. Urban layout optimization framework to maximize direct solar irradiation. Computers, Environment and Urban Systems 2015, 51, 1–12.
- Jang, S.; Roh, H.; Lee, G. Generative AI in architectural design: Application, data, and evaluation methods. Automation in Construction 2025, 174, 106174.
- Parish, Y.I.; Müller, P. Procedural modeling of cities. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, 2001, pp. 301–308.
- Müller, P.; Wonka, P.; Haegler, S.; Ulmer, A.; Van Gool, L. Procedural modeling of buildings. In ACM SIGGRAPH 2006 Papers; 2006; pp. 614–623.
- Jiang, F.; Ma, J.; Webster, C.J.; Chiaradia, A.J.; Zhou, Y.; Zhao, Z.; Zhang, X. Generative urban design: A systematic review on problem formulation, design generation, and decision-making. Progress in planning 2024, 180, 100795.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Communications of the ACM 2020, 63, 139–144.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223–2232.
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 8110–8119.
- Hartmann, S.; Weinmann, M.; Wessel, R.; Klein, R. Streetgan: Towards road network synthesis with generative adversarial networks 2017.
- Chaillou, S. Archigan: Artificial intelligence x architecture. In Proceedings of the Architectural intelligence: Selected papers from the 1st international conference on computational design and robotic fabrication (CDRF 2019). Springer, 2020, pp. 117–127.
- Wu, W.; Fu, X.M.; Tang, R.; Wang, Y.; Qi, Y.H.; Liu, L. Data-driven interior plan generation for residential buildings. ACM Transactions on Graphics (TOG) 2019, 38, 1–12.
- Wu, A.N.; Biljecki, F. GANmapper: geographical data translation. International Journal of Geographical Information Science 2022, 36, 1394–1422.
- Law, S.; Hasegawa, R.; Paige, B.; Russell, C.; Elliott, A. Explaining holistic image regressors and classifiers in urban analytics with plausible counterfactuals. International Journal of Geographical Information Science 2023, 37, 2575–2596.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10684–10695.
- Ma, H.; Zheng, H. Text Semantics to Image Generation: A method of building facades design base on Stable Diffusion model. In Proceedings of the International Conference on Computational Design and Robotic Fabrication. Springer, 2023, pp. 24–34.
- Zhou, F.; Li, H.; Hu, R.; Wu, S.; Feng, H.; Du, Z.; Xu, L. ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis. arXiv preprint, arXiv:2409.17049 2024.
- Shang, Y.; Lin, Y.; Zheng, Y.; Fan, H.; Ding, J.; Feng, J.; Chen, J.; Tian, L.; Li, Y. UrbanWorld: An Urban World Model for 3D City Generation. arXiv preprint, arXiv:2407.11965 2024.
- Zhuang, J.; Li, G.; Xu, H.; Xu, J.; Tian, R. TEXT-TO-CITY: controllable 3D urban block generation with latent diffusion model. In ACCELERATED DESIGN, Proceedings of the 29th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), 2024, pp. 169–178.
- Cui, X.; Feng, X.; Sun, S. Learning to generate urban design images from the conditional latent diffusion model. IEEE Access 2024.
- Zhang, H.; Zhang, R. Generating accessible multi-occupancy floor plans with fine-grained control using a diffusion model. Automation in Construction 2025, 177, 106332.
- Shabani, M.A.; Hosseini, S.; Furukawa, Y. Housediffusion: Vector floorplan generation via a diffusion model with discrete and continuous denoising. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 5466–5475.
- Zhang, Z.; Fort, J.M.; Mateu, L.G. Exploring the potential of artificial intelligence as a tool for architectural design: A perception study using Gaudí’s works. Buildings 2023, 13, 1863.
- Zhong, X.; Chen, W.; Guo, Z.; Zhang, J.; Luo, H. Image inpainting using diffusion models to restore eaves tile patterns in Chinese heritage buildings. Automation in Construction 2025, 171, 105997.
- Ibrahim, M.R.; Haworth, J.; Cheng, T. Understanding cities with machine eyes: A review of deep computer vision in urban analytics. Cities 2020, 96, 102481.
- Biljecki, F.; Ito, K. Street view imagery in urban analytics and GIS: A review. Landscape and Urban Planning 2021, 215, 104217.
- Salesses, P.; Schechtner, K.; Hidalgo, C.A. The collaborative image of the city: mapping the inequality of urban perception. PloS one 2013, 8, e68400.
- Naik, N.; Philipoom, J.; Raskar, R.; Hidalgo, C. Streetscore-predicting the perceived safety of one million streetscapes. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2014, pp. 779–785.
- Dubey, A.; Naik, N.; Parikh, D.; Raskar, R.; Hidalgo, C.A. Deep learning the city: Quantifying urban perception at a global scale. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 196–212.
- Kaplan, R.; Kaplan, S. The experience of nature: A psychological perspective; Cambridge university press, 1989.
- Ulrich, R.S. Aesthetic and affective response to natural environment. In Behavior and the natural environment; Springer, 1983; pp. 85–125.
- Gregory, R.L. The intelligent eye. 1970.
- Neisser, U. Cognitive psychology: Classic edition; Psychology press, 2014.
- Scherer, K.R. Appraisal theory. In Handbook of Cognition and Emotion/John Wiley & Sons 1999.
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Advances in neural information processing systems 2020, 33, 6840–6851.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9 2015, proceedings, part III 18. Springer, 2015; pp. 234–241.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114 2013. Presented at the 2nd International Conference on Learning Representations (ICLR), 2014.
- Podell, D.; English, Z.; Lacey, K.; Blattmann, A.; Dockhorn, T.; Müller, J.; Penna, J.; Rombach, R. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint, arXiv:2307.01952 2023.
- Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in neural information processing systems 2022, 35, 25278–25294.
- Labs, B.F. FLUX. https://github.com/black-forest-labs/flux, 2024.
- Lüddecke, T.; Ecker, A. Image segmentation using text and image prompts. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 7086–7096.
- Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Jiang, Q.; Li, C.; Yang, J.; Su, H.; et al. Grounding dino: Marrying dino with grounded pre-training for open-set object detection. In Proceedings of the European Conference on Computer Vision. Springer, 2024, pp. 38–55.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. arXiv preprint, arXiv:2106.09685 2021.
- Zhang, L.; Rao, A.; Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp.
- Hao, Y.; Chi, Z.; Dong, L.; Wei, F. Optimizing prompts for text-to-image generation. Advances in Neural Information Processing Systems 2024, 36.
- Vivanco Cepeda, V.; Nayak, G.K.; Shah, M. Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization. Advances in Neural Information Processing Systems 2024, 36.
- Evans, J.S.B.; Stanovich, K.E. Dual-process theories of higher cognition: Advancing the debate. Perspectives on psychological science 2013, 8, 223–241.
- Wason, P.C.; Evans, J.S.B. Dual processes in reasoning? Cognition 1974, 3, 141–154.
- Kahneman, D. Thinking, fast and slow. Farrar, Straus and Giroux 2011.
- Reimers, N. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint, arXiv:1908.10084 2019.
- Zhai, X.; Mustafa, B.; Kolesnikov, A.; Beyer, L. Sigmoid loss for language image pre-training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11975–11986.
- Russell, J.A.; Weiss, A.; Mendelsohn, G.A. Affect grid: a single-item scale of pleasure and arousal. Journal of personality and social psychology 1989, 57, 493.
- Doersch, C.; Singh, S.; Gupta, A.; Sivic, J.; Efros, A.A. What makes paris look like paris? Communications of the ACM 2015, 58, 103–110.
- Chen, D.; Chen, R.; Zhang, S.; Wang, Y.; Liu, Y.; Zhou, H.; Zhang, Q.; Wan, Y.; Zhou, P.; Sun, L. Mllm-as-a-judge: Assessing multimodal llm-as-a-judge with vision-language benchmark. In Proceedings of the Forty-first International Conference on Machine Learning, 2024.
- Wallace, B.; Dang, M.; Rafailov, R.; Zhou, L.; Lou, A.; Purushwalkam, S.; Ermon, S.; Xiong, C.; Joty, S.; Naik, N. Diffusion model alignment using direct preference optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8228–8238.
| 1 | CreativeML Open RAIL++-M License |
| 2 | Terms borrowed from the valence and arousal grid of [56] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).