Digital reconstruction of ancient architecture and landscapes is a form of digital twin methodology that uses digital technologies to create virtual representations of physical buildings or landscapes. Current reconstruction approaches include 3D modeling, laser scanning, and holographic reconstruction. Such methods are particularly suitable for extant architectural remains or physical structures. For example, Gao (2024) employed Multi-View Stereo (MVS) 3D reconstruction technology to reconstruct Cave 285 of the Mogao Grottoes in Dunhuang [1]. Similarly, Yuan, K., and Han (2024) employed three-dimensional reconstruction technologies to convert ancient pagoda heritage into virtual environments for game development [2]. However, these methods rely heavily on existing physical structures, on advanced professional technologies such as MVS, and on rigorous and efficient strategies for acquiring heritage images in order to achieve high-quality 3D models. In contrast, for architectural imagery documented in ancient gazetteers and classical texts, where no physical remains are available, such methods face significant limitations in achieving effective image translation or reconstruction. At present, large-scale AI models operating primarily within Chinese-language contexts, such as Jimeng and Doubao, are capable of performing two-dimensional and three-dimensional image transformations through multimodal AIGC approaches. This development introduces a new research direction for the digital reconstruction of ancient architecture and landscapes: whether large language models can be leveraged for three-dimensional image reconstruction to accomplish image translation or restoration.
Ancient Chinese academy architecture is a spatial embodiment of traditional Chinese education and ritual culture. It shaped the intellectual formation and cognition of local scholars and constitutes an important form of material cultural heritage worthy of preservation and transmission. China possesses a rich architectural heritage. Beyond the monumental imperial architecture exemplified by the Forbidden City in Beijing, there were also regional academies that served as institutions of Confucian education for scholars across different localities. Owing to historical developments, most academy buildings now survive only in local gazetteers and classical texts. Ancient Chinese academies developed during the Tang and Song dynasties as distinctive regional educational centers established either by the state or by private individuals. Their purpose was to provide local scholars with classical Confucian education, to prepare them for the imperial civil service examinations, and to cultivate local talent. Accordingly, academy architecture varied in scale, from small to large complexes. For example, Bailudong Academy in Jiujiang, Jiangxi Province, known as the foremost academy in China, covers an area of approximately 3,800 square meters and could accommodate hundreds of students attending lectures. Ancient Chinese academy architecture is characterized by three key features. First, the architectural environment recreates a Confucian ritual space [3], typically organized as an axial and symmetrical complex centered on a lecture hall displaying a portrait of Confucius. Second, an academy consists of a group of buildings comparable to a modern university, including structures functioning similarly to libraries (scripture repositories) as well as traditional Chinese landscape elements. Third, academy architecture often served as an important urban landmark, representing local identity and regional culture (Figure 1). Consequently, how to employ digital and intelligent technologies for the three-dimensional reconstruction of academy architecture documented in classical texts has become a crucial research topic in cultural heritage protection and transmission.
Prompts constitute a critical component in AIGC-based image generation. Models operating in a Chinese-language context are capable of understanding the classical Chinese found in ancient local gazetteers, which provides a distinct advantage for scientifically grounded image translation, as relevant historical materials can be incorporated into the prompt-engineering process. Oppenlaender et al. (2025) argue that users should be able to discern prompt quality, compose prompts, and iteratively refine them [4]. Liu and Chilton (2022) proposed prompt engineering guidelines to facilitate improved outputs from text-to-image generative models [5]. Other studies examining the effectiveness of prompt usage include Ekin (2023) [6], Burlin (2023) [7], and Khan (2024) [8], all of which analyze how prompt design can enhance artistic creativity, practice, and outcomes. This research seeks to propose and validate a human-machine collaborative methodological framework for the spatial reconstruction of ancient architecture, applicable to specific scenarios where no physical remains exist.