Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Automatic Estimation for Visual Quality Changes of Street Space Via Street-View Images and Multimodal Large Language Models

Version 1: Received: 22 November 2023 / Approved: 23 November 2023 / Online: 23 November 2023 (05:33:11 CET)

How to cite: Liang, H.; Zhang, J.; Li, Y.; Zhu, Z.; Wang, B. Automatic Estimation for Visual Quality Changes of Street Space Via Street-View Images and Multimodal Large Language Models. Preprints 2023, 2023111473. https://doi.org/10.20944/preprints202311.1473.v1

Abstract

Estimating the Visual Quality of Street Space (VQoSS) is pivotal for urban design, environmental sustainability, and civic engagement, among other applications. Recent advances, notably in deep learning, have enabled large-scale analysis. However, traditional deep learning approaches are hampered by extensive data annotation requirements and limited adaptability across diverse VQoSS tasks. Multimodal Large Language Models (MLLMs) have recently demonstrated proficiency in various computer vision tasks, positioning them as promising tools for automated VQoSS assessment. In this paper, we pioneer the application of MLLMs to VQoSS change estimation, and our empirical findings affirm their effectiveness. In addition, we introduce Street Quality GPT (SQ-GPT), a model that distills knowledge from GPT-4, currently the most powerful MLLM but not freely accessible, without requiring any human effort. SQ-GPT approaches GPT-4’s performance and is viable for large-scale VQoSS change estimation. In a case study of Nanjing, we showcase the practicality of SQ-GPT and of the knowledge distillation pipeline. Our work promises to be a valuable asset for future urban studies research.

Keywords

Visual Quality of Street Space, Street-View Images, Large Language Models, Multimodal, Deep Learning

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
