Preprint
Article

This version is not peer-reviewed.

Classical Architectural Imagery and Digital Twins: AIGC-Driven Methods and Evaluation Strategy for the Spatial Reconstruction of Ancient Academies

Submitted: 07 June 2026

Posted: 08 June 2026

Abstract
The digital reconstruction of ancient architecture and landscapes has become a prominent topic in current scholarship. Existing studies predominantly focus on the three-dimensional reconstruction of archaeological sites or damaged historic buildings, while effective methods for the restoration or image translation of architectural imagery recorded in ancient classical texts remain insufficient. Taking the ancient academy, China’s distinctive educational architectural heritage, as a case study, this paper proposes three methods (TI, TLTI, and TPTI) driven by Chinese AI Generated Content (AIGC) tools. The methods use simplified images of architectural remains and their spatial features to reconstruct the architecture and landscapes of ancient Chinese academies. The findings demonstrate that AIGC can be effectively applied to the reconstruction of landscape spaces and material architectural art within a Chinese cultural context, enabling the re-presentation of architectural developmental histories that were traditionally reliant on textual interpretation in the history of technology and art. This research offers a novel methodological contribution to the integration of heritage science and the history of technology and art by activating cultural heritage in non-Western contexts through artificial intelligence and analyzing the new forms of cognition and knowledge reconstruction that emerge as a result.

1. Introduction

Digital reconstruction of ancient architecture and landscapes represents a form of digital twin methodology, employing digital technologies to recreate virtual representations of physical buildings or landscapes. Current reconstruction approaches include 3D modeling, laser scanning, and holographic reconstruction. Such methods are particularly suitable for extant architectural remains or physical structures. For example, Gao (2024) employed Multi-View Stereo (MVS) 3D reconstruction technology to reconstruct Cave 285 of the Mogao Grottoes in Dunhuang [1]. Similarly, Yuan and Han (2024) employed three-dimensional reconstruction technologies to convert ancient pagoda heritage into virtual environments for game development [2]. However, these methods rely heavily on existing physical structures, advanced professional technologies such as MVS, and scientific and efficient heritage image acquisition strategies to achieve high-quality 3D modeling outcomes. In contrast, when dealing with architectural imagery documented in ancient gazetteers and classical texts, where no physical remains are available, such methods face significant limitations in achieving effective image translation or reconstruction. At present, large Chinese AI models operating primarily within Chinese-language contexts, such as Jimeng and Doubao, are capable of performing two-dimensional and three-dimensional image transformations through multimodal AIGC approaches. This development raises a new research question for the digital reconstruction of ancient architecture and landscapes: whether large language models can be leveraged for three-dimensional image reconstruction to accomplish image translation or restoration.
Ancient Chinese academy architecture represents a spatial embodiment of traditional Chinese education and ritual culture. It shaped the intellectual formation and cognition of local scholars and constitutes an important form of material cultural heritage worthy of preservation and transmission. China possesses a rich architectural heritage. Beyond the grand and monumental imperial architectural heritage exemplified by the Forbidden City in Beijing, there also exist regional academies that once served as institutions of Confucian education for scholars across different localities. Owing to historical developments, most academy buildings now survive only in local gazetteers and classical texts. Ancient Chinese academies developed during the Tang and Song dynasties as distinctive regional educational centers established either by the state or by private individuals. Their purpose was to provide local scholars with classical Confucian education, preparation for the imperial civil service examinations, and the cultivation of local talent. Accordingly, academy architecture varied in scale, from small to medium to large complexes. For example, Bailudong Academy in Jiujiang, Jiangxi Province, known as the foremost academy in China, covers an area of approximately 3,800 square meters and could accommodate hundreds of students attending lectures. Ancient Chinese academy architecture is characterized by three key features. First, the architectural environment recreates a Confucian ritual space [3], typically organized as an axial and symmetrical complex centered on a lecture hall displaying a portrait of Confucius. Second, an academy consists of a group of buildings comparable to a modern university, including structures functioning similarly to libraries (scripture repositories) as well as traditional Chinese landscape elements. Third, academy architecture often served as an important urban landmark, representing local identity and regional culture (Figure 1).
Consequently, how to employ digital-intelligent technologies for the three-dimensional reconstruction of academy architecture documented in classical texts has become a crucial research topic in cultural heritage protection and transmission.
Prompts constitute a critical component in AIGC-based image generation. Models operating in a Chinese-language context are capable of understanding classical Chinese content found in ancient local gazetteers, which provides a distinct advantage for scientifically grounded image translation, as relevant historical materials can be incorporated into the prompt-engineering process. Oppenlaender et al. (2025) argue that users should be able to discern prompt quality, compose prompts, and iteratively refine them [4]. Liu and Chilton (2022) proposed prompt engineering guidelines to facilitate improved outputs from text-to-image generative models [5]. Other studies examining the effectiveness of prompt usage include Ekin (2023) [6], Burlin (2023) [7], and Khan (2024) [8], all of which analyze how prompt design can enhance artistic creativity, practice, and outcomes. This research seeks to propose and validate a human-machine collaborative methodological framework for the spatial reconstruction of ancient architecture, applicable to specific scenarios where no physical remains exist.

2. Materials and Methods

This study primarily uses four academy and landscape illustrations selected from local gazetteers of Hunan Province, China: “Chengnan Academy Illustration” from the Shanhua County Gazetteer (1747, Qianlong reign, Qing dynasty) [9]; “Nantai Academy Illustration” from the Liuyang County Gazetteer (Tongzhi reign, Qing dynasty) [10]; “Yuelu Academy” from the Yuelu Gazetteer of Changsha Prefecture (Qing dynasty) [11]; and “Shigu Academy” from the Hengzhou Prefecture Gazetteer (Guangxu reign, Qing dynasty) [12]. These images are woodblock prints, primarily from thread-bound editions produced through block printing (Table 1). The layout of these books is designed for left-to-right reading, and due to the binding format, a single complete image is often divided into two sections. Most gazetteer images are accompanied by titles, and some include the names of the illustrators. As these gazetteers have been preserved for several centuries, the printed lines of the woodblock illustrations have gradually faded, resulting in blurred architectural and landscape details. Specifically, the academy images in local gazetteers exhibit several common characteristics: they mainly employ scattered-point perspective and line-drawing techniques; the proportions of buildings lack explicit scale annotations; academies situated within landscapes are often concealed within mountains and forests; architectural and landscape representations are highly simplified; and the spatial distances between buildings are unclear. These features collectively constitute major challenges for digital twin reconstruction and image translation using AIGC large models.

This study proposes three methods for revitalizing ancient Chinese academy imagery, as follows (Figure 2):
Method TI — Image to Image: This method adopts a Chinese AI reference-image-based image generation approach, in which images are generated directly from reference images combined with simple prompts, in order to examine the AIGC image-generation workflow.
Method TLTI — Image to Line Drawing to Image: This method generates more detailed overhead line-drawing representations of academy architecture. First, AI is used to generate a line drawing, which is then transformed by AI into an overhead photographic representation of Hunan academy architecture.
Method TPTI — Image to Prompts to Image: This method employs Qwen (Qwen3-Max) to generate prompts—specifically, to “convert image content into prompts that Jimeng can use for image generation.” Images are then generated from these prompts to evaluate the image-generation performance of Jimeng.
Due to varying image sizes, the research standardized all images to a 16:9 ratio when generating the final output for easier comparison and analysis.
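The standardization step can be illustrated with a minimal sketch. The text does not say whether images were cropped, padded, or regenerated at 16:9, so the centered crop below is an assumption, and `center_crop_box` is a hypothetical helper name rather than part of any tool used in the study. The function only computes the crop rectangle; applying it to an image is left to whatever imaging library the reader prefers.

```python
from fractions import Fraction

TARGET_RATIO = Fraction(16, 9)  # aspect ratio stated in the text


def center_crop_box(width: int, height: int, ratio: Fraction = TARGET_RATIO):
    """Return (left, top, right, bottom) of the largest centered crop with the given ratio.

    Assumption: standardization is done by cropping; the source does not specify this.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if Fraction(width, height) > ratio:
        # Image is too wide: keep the full height and trim the sides.
        crop_h = height
        crop_w = int(height * ratio)
    else:
        # Image is too tall (or already matches): keep the full width and trim top and bottom.
        crop_w = width
        crop_h = int(width / ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(center_crop_box(1024, 1024))  # square input trimmed to a 1024 x 576 band
```

A 640 × 360 image already has a 16:9 ratio and contains 230,400 pixels, which matches the pixel count reported in the histogram analysis, although the source does not state the image dimensions.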
In the present study, prompt engineering follows an approach in which AI models first generate prompts from images, and the prompts are then refined and corrected against the textual descriptions in local gazetteers. Inputting the prompt “generate a bird’s-eye view photo of Hunan Academy architecture from a line drawing” into the Qianwen (Qwen) large model yields a more detailed and enriched image prompt (Table 2). Some prompt terms were further optimized for Yuelu Academy and Shigu Academy based on the generation results.
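The three workflows and the prompt-refinement step described above can be summarized as a short sketch. It is not the interface of Jimeng, Doubao, or Qwen: the models are represented by plain callables that the reader would supply, and every function and parameter name here is hypothetical. Whether the reference image is passed again in the second stage of TPTI is not stated in the text; this sketch passes it, which is an assumption.

```python
from typing import Callable, Optional

# Hypothetical model interfaces, supplied by the reader.
ImageToImage = Callable[[str, str], str]  # (input image, prompt) -> generated image
ImageToText = Callable[[str, str], str]   # (input image, instruction) -> prompt text


def method_ti(reference: str, generate: ImageToImage, prompt: str) -> str:
    """TI: reference image plus a simple prompt, one generation step."""
    return generate(reference, prompt)


def method_tlti(reference: str, generate: ImageToImage,
                line_prompt: str, photo_prompt: str) -> str:
    """TLTI: reference image -> line drawing -> photographic image."""
    line_drawing = generate(reference, line_prompt)
    return generate(line_drawing, photo_prompt)


def method_tpti(reference: str, describe: ImageToText, generate: ImageToImage,
                instruction: str,
                refine: Optional[Callable[[str], str]] = None) -> str:
    """TPTI: reference image -> prompt text -> image.

    `refine` stands for the manual correction of the generated prompt
    against gazetteer descriptions.
    """
    prompt = describe(reference, instruction)
    if refine is not None:
        prompt = refine(prompt)
    # Assumption: the reference image accompanies the refined prompt.
    return generate(reference, prompt)


# Stub models that only record the call chain, for illustration.
fake_generate = lambda image, prompt: f"gen({image}|{prompt})"
fake_describe = lambda image, instruction: f"desc({image})"
print(method_tlti("chengnan.png", fake_generate, "line drawing", "aerial photo"))
```

Keeping the models as injected callables separates the workflow logic from any particular service, so the same three functions could be run against different image generators for comparison.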

3. Results

Applying the three methods above to five types of academy images (Figure 3) yields observations in three areas of multimodal activation for ancient Chinese academy landscape and architectural imagery: generation results, generation effectiveness, and generative hallucinations.

3.1. Results of Academy Landscape Image Generation

Overall, Jimeng AI and Doubao AI produce complete photorealistic images. Keywords and images related to Hunan academies can be processed by these large models, and the generated buildings, landscapes, and atmosphere largely align with the original academy image characteristics. The effectiveness of the three methods is as follows:
The first method, TI (Image to Image), involves direct image translation, generating images from an ancient reference image combined with simple keywords. Current Jimeng and Doubao models can recognize the basic composition, state, architectural and landscape style, and textual content of the buildings in the original image. However, due to the brevity and limitations of the prompts, as well as the stylized and rigid depiction of architectural forms in the original images, the generated images, while realistic, resemble architectural drawings inserted into a real-world scene rather than authentic representations of buildings.
The second method, TLTI (Image to Line Drawing to Image), involves two-stage image translation: AI generates a line-drawing sketch, which is then translated by AI into a photorealistic architectural aerial photo. This method refines architectural details and clarity, with the workflow observable in two styles: architectural design drawings and scene renderings. While it still suffers from architectural drawings appearing to be inserted into scenes, it mitigates the stiffness of some building facades caused by blurred lines in the first method, and the multiple stages help the model understand the original image.
The third method, TPTI (Image to Prompts to Image), employs two-stage multimodal translation via language and images. The AI model first generates detailed keywords to help the second translation stage achieve a deeper understanding of the architecture and scenes depicted in the image. AI-generated descriptive keywords for architectural and landscape details, culture, and scenes can guide the image generation model to infer the probable original appearance of the reference image and generate more realistic scene effects. The overall atmosphere of the images reflects Chinese landscape aesthetics, demonstrating the effectiveness of multimodal methods in improving visual presentation quality. The main issue is a relatively high frequency of erroneous scene content.

3.2. Generation Effectiveness

TI Method: Generated works conform to the context of academy illustrations and visual style. The drawback is its high dependency on the quality of the original image.
TLTI Method: Features a clear workflow. Generated works are sharper, particularly effective for activating and creating digital twins from simple line-drawing sketches of academy building complexes.
TPTI Method: Generates richer keywords, which can further help control and define the AI’s final generation and image presentation. The disadvantage is that the generated keywords can be ambiguous and lack clear prioritization of elements. This can lead the large model to misinterpret keywords, resulting in the generation of non-existent buildings or landscape effects not present in the original image, or the blurring/omission of important content from the image.

3.3. Generative Hallucinations

Overall, all three methods have strengths and weaknesses, and the hallucinations in AI-generated images cannot be overlooked (Figure 3). Traditional Hunan academy buildings typically feature white walls and grey tiles, as seen in existing examples such as the Yuelu Academy and the Shigu Academy in Hunan, China. The buildings along the central axis of the actual Yuelu Academy are primarily educational and ceremonial structures, characterized by golden glazed tiles and red brick walls symbolizing their importance. These features are not prominently displayed in the generated images, although they may have been added during restoration work in the modern or contemporary period. Furthermore, architectural records of the Yuelu Academy in the Newly Revised Gazetteer of the Yuelu Academy indicate its status as a premier academy with complex architecture and landscapes. Zhao Ning's records in the Yuelu Gazetteer of Changsha Prefecture [11] describe structures such as the Jingyi Hall, Zunjing Pavilion, dormitories, study cells, halls, and the Xuansheng (Confucius) Hall, as well as landscapes such as the Winding Stream Pavilion, Baiquan Pavilion, Yuanbai, Baihe Spring, and hundreds of fir trees, ancient cypresses, and pines. These include relics from the Song dynasty onwards as well as newly constructed buildings. Therefore, it is essential to cross-verify AI-generated results thoroughly against local gazetteer records and existing structures for supplementation and correction. The imbalance of building proportions should also be noted.
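The cross-verification described above can be organized as a simple checklist comparison. The sketch below is illustrative only: `cross_check` is a hypothetical helper, the recorded names are a subset of those quoted from the gazetteer, and the observed set is an invented placeholder standing in for what a human reviewer might tag in one generated image. It is not a result from this study.

```python
def cross_check(recorded: set[str], observed: set[str]) -> dict[str, list[str]]:
    """Compare structures named in gazetteer records with those tagged in a generated image."""
    return {
        # Recorded in the gazetteer but absent from the image: candidates for supplementation.
        "missing": sorted(recorded - observed),
        # Present in the image but not in the records: possible hallucinations.
        "unrecorded": sorted(observed - recorded),
        # Present in both.
        "confirmed": sorted(recorded & observed),
    }


# Names taken from the gazetteer description quoted in the text.
recorded = {"Jingyi Hall", "Zunjing Pavilion", "Baiquan Pavilion", "Winding Stream Pavilion"}
# Placeholder reviewer tags, invented for this example.
observed = {"Jingyi Hall", "Zunjing Pavilion", "unlabelled tower"}

report = cross_check(recorded, observed)
print(report["missing"])     # structures to add or correct
print(report["unrecorded"])  # content to check as possible hallucination
```

An "unrecorded" item is not automatically an error, since the gazetteer illustration itself is simplified, but it marks content that needs checking against the text and any surviving structures.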

3.4. Histogram-Based Numerical Analysis of Generated Images

A Photoshop-based comparison of indexed colors in the generated images of Chengnan Academy shows that, with the relatively simple keyword-based TI method, Jimeng AI produces images with comparatively dull colors. With the TLTI and TPTI methods, which incorporate line-drawing steps and detailed keywords, the generated images exhibit more diverse colors, suggesting that the model processes the complex content of the images more thoroughly when given more information. Doubao AI behaves differently: with the TI and TLTI methods, the generated images display very rich colors with minimal variation between them, suggesting that the model improvises freely even with simple keywords. When the TPTI method is applied, however, the color palette becomes more restricted, suggesting that Doubao handles complex content more cautiously (Figure 4).
Using Jimeng AI as an example, an analysis of histogram data and the corresponding visual effects of the generated images reveals the following findings (Figure 5). For the Chengnan Academy image generated using the TI method, the average RGB value is 57.90, with a standard deviation of 50.92 and a median of 44, across a total of 230,400 pixels. This indicates that the image is relatively dark overall, resulting in low light–shadow contrast and a tranquil academy atmosphere. For the Chengnan Academy image generated using the TLTI method, the average RGB value is 89.00, with a standard deviation of 63.25 and a median of 72, across 230,400 pixels, indicating a high-contrast image with extensive dark and bright regions and strong illumination effects. For the Chengnan Academy image generated using the TPTI method, the average RGB value is 57.54, with a standard deviation of 52.65 and a median of 38, across 230,400 pixels. The overall tone is therefore dark, similar in brightness to the image generated by the TI method, and the shadows of tall trees create a sense of tranquility for the academy. However, due to the optimized prompts, the academy building itself is bathed in sunlight, resulting in a bright, well-balanced contrast with a clear centerpiece.
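The histogram figures above (mean, standard deviation, median, and pixel count) can be recomputed outside an image editor. The sketch below is a minimal illustration under stated assumptions: each pixel is reduced to one 0-255 value by averaging its three channels, and the population standard deviation is used. The source does not say which channel or which formula produced the reported values, so the output of this code is not guaranteed to match them. `to_gray` and `histogram_stats` are hypothetical helper names, and the sample pixels are synthetic.

```python
from statistics import mean, median, pstdev


def to_gray(pixels):
    """Collapse (R, G, B) tuples to one 0-255 value per pixel by channel averaging (an assumption)."""
    return [round((r + g + b) / 3) for r, g, b in pixels]


def histogram_stats(values):
    """Summarize a sequence of 0-255 intensity values with histogram-style statistics."""
    values = list(values)
    if not values:
        raise ValueError("no pixel values supplied")
    return {
        "mean": mean(values),
        "std_dev": pstdev(values),  # population standard deviation (an assumption)
        "median": median(values),
        "pixels": len(values),
    }


# Tiny synthetic example, not data from the study.
sample = to_gray([(0, 0, 0), (255, 255, 255), (30, 60, 90), (90, 60, 30)])
stats = histogram_stats(sample)
print(stats["mean"], stats["median"], stats["pixels"])
```

On this scale a mean well below the midpoint of 127.5 indicates a predominantly dark image, and a larger standard deviation indicates a wider spread between dark and bright regions, which is the reading applied to the TI, TLTI, and TPTI images above.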

4. Discussion

Based on the above results, three multimodal activation methods for ancient Chinese academy landscapes and architectural images can be identified, each of which may be selected and optimized according to the characteristics of the original academy images and the intended activation goals. When the objective is to create photographic-style realistic scenes, all three workflows are capable of producing satisfactory results, and Method TI is the fastest with both Jimeng and Doubao AI. When the aim is to highlight the architectural details and aesthetics of academies in isolation, the modeling-oriented effects of Method TLTI are more prominent, as this workflow emphasizes architectural layout representations.
Histogram analysis further indicates that, from the TI to the TLTI to the TPTI methods, AI increasingly aligns with the creator’s intentions through a more refined understanding of prompts. Although reference images themselves are highly intuitive, without the support of detailed prompts, AI-generated images tend to exhibit weaker comprehension of light–shadow contrast and reduced detail refinement. Therefore, incorporating detailed keywords into the workflow is essential for improving the quality of generated outputs and enhancing overall workflow effectiveness. Accordingly, an image quality evaluation method should encompass architectural proportions, color accuracy, level of detail, luminance, and overall ambiance.

5. Conclusions

This study analyzes three methods and practical workflows for the multimodal activation of landscape and architectural images of ancient Chinese academies. By employing Chinese-context AIGC tools to generate realistic images of ancient Chinese academy landscapes and architecture, the study examines the characteristics of AIGC-generated images and conducts histogram-based numerical analyses of these images. Given the abundance of academy-related images and descriptive texts preserved in ancient Chinese literature, the future activation of ancient academy architecture and landscapes represents a highly meaningful research direction. Through the revitalization of academy architecture and landscapes, traditional educational and ritual spaces can be reconstructed, thereby assisting the public in understanding the history of Chinese education and the historical evolution of spatial environments.

Author Contributions

The author confirms sole responsibility for the conception and design of the study, data collection, analysis and interpretation of results, and the drafting, revision, and final approval of the manuscript.

Funding

This research was funded by the Philosophy and Social Science Foundation of Hunan Province, grant number 25YBA344. The APC was funded by the Philosophy and Social Science Foundation of Hunan Province.

Acknowledgments

The author thanks the National Library of China and Shidianguji for the images of ancient Chinese academies.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIGC Artificial Intelligence Generated Content
AI Artificial Intelligence

References

  1. Gao, M. Research on Cultural Relic Restoration and Digital Presentation Based on 3D Reconstruction MVS Algorithm: A Case Study of Mogao Grottoes’ Cave 285. In 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024); Atlantis Press, 2024; pp. 744–754.
  2. Yuan, K.; Han, M. Creating and Preserving the Architectural Heritage and Environment of Ancient Towers in the Game Space Based on Digital Measurement and Computing Technology: Fangshan Ancient Pagoda, Beijing. In 2024 IEEE Smart World Congress (SWC); IEEE, 2024; pp. 88–93.
  3. Conan-Wu, X. L. Lure of the Supreme Joy: Pedagogy and Environment in the Neo-Confucian Academies of Zhu Xi; Brill, 2024; Vol. 164.
  4. Oppenlaender, J.; Linder, R.; Silvennoinen, J. Prompting AI Art: An Investigation into the Creative Skill of Prompt Engineering. Int. J. Hum.–Comput. Interact. 2025, 41, 10207–10229.
  5. Liu, V.; Chilton, L. B. Design Guidelines for Prompt Engineering Text-to-Image Generative Models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022; pp. 1–23.
  6. Ekin, S. Prompt Engineering for ChatGPT: A Quick Guide to Techniques, Tips, and Best Practices. Authorea Preprints 2023.
  7. Burlin, C. Explainability to Enhance Creativity: A Human-centered Approach to Prompt Engineering and Task Allocation in Text-to-Image Models for Design.
  8. Khan, I. The Quick Guide to Prompt Engineering: Generative AI Tips and Tricks for ChatGPT, Bard, Dall-E, and Midjourney; John Wiley & Sons, 2024.
  9. National Library of China. Shanhua County Gazetteer [善化县志]. n.d. Available online: http://read.nlc.cn/OutOpenBook/OpenObjectBook?aid=403&bid=56204.
  10. Wang, R.X. (Ed.) Liuyang County Gazetteer [浏阳县志]; Shidian Guji Digital Library, 1871.
  11. Zhao, N. (Ed.) Yuelu Gazetteer of Changsha Prefecture [长沙府岳麓志]; Shidian Guji Digital Library, 1685.
  12. Rao, Q. (Ed.) Hengzhou Prefecture Gazetteer [衡州府志]; Shidian Guji Digital Library, 1662.
Figure 1. Schematic Diagram of an Axially Symmetrical Academy Architectural Complex.
Figure 2. Workflow for AIGC-Based Revitalization of Ancient Chinese Academy Imagery.
Figure 3. Generated image effect.
Figure 4. Comparative Analysis of Indexed Colors in Generated Images of Chengnan Academy.
Figure 5. Histogram-Based Numerical Analysis.
Table 1. Images of Ancient Chinese Academies
Name | Image characteristics | Source
Chengnan Academy Illustration | Academy architectural illustration with labeled building names | National Library of China, http://read.nlc.cn/OutOpenBook/OpenObjectBook?aid=403&bid=56204
Nantai Academy Illustration | Academy layout illustration with labeled building names | https://www.shidianguji.com/book/NLG312001078228/chapter/1ljkoye2ttn9r?page_from=searching_page&version=3
Yuelu Academy Illustration | Overall view of the academy site, panoramic view of Yuelu Mountain | https://www.shidianguji.com/book/CADAL02086491/chapter/1lgyqn7verlma?page_from=searching_page&version=5&paragraphId=7534273749740568627&lineId=126
Shigu Academy Illustration | Panoramic view of the academy with landscape depiction | https://www.shidianguji.com/book/NLG312001078282/chapter/1liwfpwbnbkll?page_from=searching_page&version=5&paragraphId=7571542153912057906&lineId=
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
