Preprint
Article

This version is not peer-reviewed.

The Application of ResNet-34 Model Integrating Transfer Learning in the Recognition and Classification of Overseas Chinese Frescoes

A peer-reviewed article of this preprint also exists.

Submitted: 21 July 2023

Posted: 21 July 2023


Abstract
The unique characteristics of frescoes on overseas Chinese buildings attest to the integration of Chinese and Western cultures and to its historical background. Sound analysis and preservation of overseas Chinese frescoes can support the sustainable development of culture and history. This research adopts image analysis technology based on artificial intelligence and proposes a ResNet-34 model integrating transfer learning. This deep learning model can identify and classify the source region of overseas Chinese frescoes, and can effectively deal with problems such as the small number of fresco images on overseas Chinese buildings, their poor quality, the difficulty of feature extraction, and the similarity of pattern, text, and style. The experimental results show that the training process of the proposed model is stable. On the constructed Jiangmen and Haikou fresco dataset (JHD), the final accuracy is 98.41% and the recall rate is 98.53%. These evaluation indicators are superior to those of classic models such as AlexNet, GoogLeNet, and VGGNet. The results suggest that the model has strong generalization ability and is not prone to overfitting, and that it can effectively identify and classify the cultural connotations and regions of frescoes.

1. Introduction

The ancestral buildings of overseas Chinese in China embody an architectural culture through which overseas Chinese introduced Mediterranean, Indian, and Nanyang cultures into their ancestral country via the Maritime Silk Road. The multicultural fusion frescoes painted on these buildings also have a strong local painting style, which has spread to multiple cultural regions in the southeastern coastal areas of China [1]. This article conducts a comparative study of the pattern style and color of Jiangmen overseas Chinese frescoes, representing the Wuyi Overseas Chinese Cultural District, and Haikou overseas Chinese frescoes, representing the Qionglei Cultural District. The architectural fresco styles of Jiangmen and Haikou are exemplary cases of integrating local culture with foreign Western culture. The frescoes on overseas Chinese buildings are a direct and reliable external language that reflects the local cultural genes. However, because the overseas Chinese cultures, historical backgrounds, and levels of Sino-foreign exchange differ between the two regions, the painting style, decorative patterns, and styling design of their architectural frescoes also differ.
Fresco painting, also known as wall art, is one of the earliest forms of painting in human history. Digital protection and restoration can preserve fresco information intact and replicate it as needed. By collecting and processing high-quality fresco images and establishing an original, systematic database, frescoes can be displayed virtually, which also gives future generations a clear window onto the historical background of the frescoes and the culture of the overseas Chinese hometowns. Lerme et al. [2] proposed an artificial intelligence technique for virtual image reconstruction that achieved digital restoration of frescoes. Jiang et al. [3] addressed natural weathering and detachment in Dunhuang frescoes and used computer-based virtual restoration to assist in their replication and protection. Dondi et al. [4] used a large dataset of hundreds of thousands of artificially generated fresco images as reference objects to identify frescoes of different colors from different periods, and to match and repair severely damaged frescoes.
There are many studies on the recognition and classification of frescoes using traditional methods such as feature extraction followed by a classifier. For example, Cao et al. [5], addressing the small number of mural images and the difficulty of feature extraction, proposed an Inception-v3 model fused with transfer learning to identify and classify mural paintings, and effectively extracted high-level features. Tang et al. [6] used extracted fresco image features as a measure of the overall similarity between two images. Teixeira et al. [7] used feature extraction algorithms to select corresponding key points between fresco fragments and reference images in order to repair damaged frescoes. Although traditional recognition and classification methods can extract certain features from frescoes, the diversity of texture and color in frescoes means that these methods cannot learn richer features, which results in insufficient generalization ability for feature extraction and classification. With the continuous development of deep learning, convolutional neural networks have made many excellent achievements in image recognition and classification. They are widely used in image tasks such as medical image segmentation [8], resource prediction [9], human motion recognition [10], and cell image classification [11]. In recent years, convolutional neural networks have gradually been applied to fresco image restoration, reconstruction, and classification.
Previous fresco recognition and classification methods extract features insufficiently, and traditional manual recognition often cannot reach a consensus on the source of a fresco. A method that can scientifically and convincingly identify the source area of frescoes is therefore particularly important. In this paper, a large number of overseas Chinese frescoes were collected in the Jiangmen and Haikou areas, and a ResNet-34 model integrating transfer learning, built on the pre-trained ResNet-34 network, is proposed to identify the area to which a fresco belongs and to complete the task of identifying and classifying overseas Chinese frescoes. This research is a cross-disciplinary application of artificial intelligence technology and cultural anthropology. By identifying and categorizing the frescoes of overseas Chinese in Haikou and Jiangmen, this study investigates why the frescoes drawn by overseas Chinese took on their style under the influence of Nanyang and North American cultures, as well as the proportions in which local cultural elements appear. The study has practical and cultural significance for the protection and restoration of overseas Chinese cultural heritage from a hundred years ago. Compared with the manual data processing of traditional anthropological field research, this study strengthens the depth and breadth of cultural research and is both innovative and practical.

2. Related theories

2.1. Convolutional Neural Network (CNN)

The Convolutional Neural Network (CNN) is one of the representative neural networks in the field of deep learning [12,13,14]. It was first put forward by LeCun et al. [15] and has been rapidly developed and applied in recent years. The design originates from visual neuroscience research on simple cells and complex cells in the visual cortex of animals, and constructs a neural network model by simulating how these cells process visual information. Generally, a CNN comprises input, convolution, normalization, activation, pooling, fully connected, and softmax output operations. LeNet marks the official debut of CNN [15], followed by AlexNet and VGG [16,17]. ResNet is now widely used [18], and CNNs continue to improve and have been applied well in various fields [19,20,21,22,23,24].

2.2. Convolution layer

The convolution layer of a convolutional neural network operates over two dimensions, height and width, so two-dimensional convolution is the most common operation. Convolution is usually realized through a cross-correlation operation, that is, by applying a two-dimensional kernel array (also known as a convolution kernel) to the input data to obtain new two-dimensional data. The kernel then moves one step across the input data, and each step is one convolution operation. Through repeated convolution over the input data, each convolution kernel extracts one feature of the data, so n convolution kernels extract n features. The operation principle of convolution is shown in formula (1), where X is the input matrix with entries x(i, j) and W is the convolution kernel with entries w(m, n).
$$S(i,j) = (X * W)(i,j) = \sum_{m}\sum_{n} x(i+m,\, j+n)\, w(m,n) \tag{1}$$
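As a minimal sketch of formula (1), the following pure-Python function evaluates the double sum at every valid position, assuming no padding and a stride of 1 (neither is specified in the text). The input `X` and kernel `W` are made-up toy values, not data from this study.

```python
def cross_correlate2d(x, w):
    """Valid (no padding, stride 1) 2D cross-correlation of input x with kernel w."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # S(i, j) = sum over m, n of x(i + m, j + n) * w(m, n)
            s = 0
            for m in range(kh):
                for n in range(kw):
                    s += x[i + m][j + n] * w[m][n]
            row.append(s)
        out.append(row)
    return out


# Toy 3 x 3 input and 2 x 2 kernel (illustrative values only).
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
W = [[1, 0],
     [0, -1]]
S = cross_correlate2d(X, W)
print(S)
```

Each output entry here is the top-left value of a window minus its bottom-right value, which makes the result easy to check by hand.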

2.3. Normalization Layer and Activation Function

The batch data is normalized before output, and a normalization layer is placed between the layers of the network. This keeps the intermediate outputs of the network numerically stable, helps prevent vanishing or exploding gradients during back propagation, and makes training more stable.
Activation functions, such as ReLU, Sigmoid, and Tanh, increase the nonlinearity of a neural network and enhance its learning ability. The ReLU activation function retains only positive inputs and sets negative inputs to 0. The derivative for positive inputs is 1 during back propagation, which alleviates the vanishing gradient problem and is computationally efficient. The Sigmoid activation function limits the output range to (0, 1), making the output smooth and the derivative easy to calculate. However, because its derivative lies in (0, 0.25], gradients shrink easily when propagated through many layers. The Tanh activation function limits the output range to (-1, 1) and has a derivative in (0, 1]. Compared with the Sigmoid activation function, the vanishing gradient problem is alleviated.
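The derivative ranges quoted above can be checked numerically. The sketch below is illustrative only; the function names are chosen here and are not taken from the study's code. It uses the standard closed forms: the Sigmoid derivative is s(x)(1 - s(x)) and the Tanh derivative is 1 - tanh(x)^2.

```python
import math


def relu(x):
    """ReLU keeps positive inputs and maps everything else to 0."""
    return x if x > 0 else 0.0


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of Sigmoid; its maximum value 0.25 is reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of Tanh; its maximum value 1 is reached at x = 0."""
    return 1.0 - math.tanh(x) ** 2


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), relu_grad(x), round(sigmoid_grad(x), 4), round(tanh_grad(x), 4))
```

Because a Sigmoid derivative never exceeds 0.25, a product of such factors across many layers shrinks quickly, which is the gradient loss described above.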

2.4. ResNet-34 network

The residual network (ResNet) is a convolutional neural network proposed by four scholars at Microsoft Research. It won the image classification and object recognition tasks of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015 [18]. In principle, the deeper a network model is, the more information it can capture and the richer its features are. However, experiments show that as the network deepens, optimization becomes harder and accuracy on both the test data and the training data decreases. This is related to the chain rule in the back propagation algorithm. If the gradients between layers lie in (0, 1), they shrink layer by layer and eventually vanish. Conversely, if the gradient passed from layer to layer is greater than 1, it grows layer by layer and eventually explodes. Simply stacking layers therefore tends to cause network degradation. To make deeper networks train better, He et al. proposed a new network structure, ResNet [25].
The advantage of the residual network is that it is easy to optimize and can improve accuracy by adding considerable depth. Its internal residual block uses a shortcut connection to alleviate the vanishing gradient problem caused by increasing depth in deep neural networks. ResNet-34 is used as the primary network in this article. Table 1 lists the specific parameters of ResNet-34 [18].
The pooling layer is mainly used to reduce the dimension of features, reduce the number of parameters, reduce overfitting, and improve the fault tolerance of the model. Common pooling operations include maximum pooling and average pooling. The operation is similar to convolution: a two-dimensional window slides over the input data, and the maximum or average value of the data inside the window is taken. Maximum pooling selects the maximum feature value in the region, which better retains texture features. Average pooling takes the average feature value in the region, which better retains background features. The calculation of maximum pooling and average pooling is shown in Figure 1.
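The two pooling variants can be sketched as follows. This is an illustrative example only: the window size of 2, the non-overlapping stride, and the toy `feature_map` are assumptions made here and are not taken from Figure 1.

```python
def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride equals size)."""
    out = []
    for i in range(0, len(x) - size + 1, size):
        row = []
        for j in range(0, len(x[0]) - size + 1, size):
            # Collect all values inside the current window.
            window = [x[i + a][j + b] for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        out.append(row)
    return out


# Toy 4 x 4 feature map (illustrative values only).
feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)
print(avg_pooled)
```

Both outputs are 2 x 2, a quarter of the input size, which is the dimension reduction described above.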
After successive feature extraction, the neural network ends in a fully connected layer. The fully connected layer is connected to all nodes of the preceding layer. To reduce overfitting of the neural network model, dropout is generally used to discard the extracted features with a certain probability. The fully connected layer can increase the nonlinearity of the neural network, reduce the network parameters, and produce the final mapping results.

2.5. Image Feature Extraction

The frescoes of overseas Chinese residences are rich in color, and there are significant differences in the color expression of frescoes in the Jiangmen and Haikou regions. This article will use the histogram method to extract the color features of images in frescoes, and analyze the color features in fresco images through calculations of different color ratios. The definition of color histogram is shown in Formula (2):
$$H(m) = \frac{n_m}{N}, \quad m = 0, 1, \ldots, L-1 \tag{2}$$
In formula (2), $m$ is the grayscale level to which a pixel belongs, $n_m$ is the number of pixels at grayscale level $m$, $N$ is the total number of pixels, and $L$ is the total number of grayscale levels. Because fresco images are drawn on walls, their texture is more complex than that of typical natural images. This study uses local binary patterns (LBP) to calculate the texture features of frescoes. The LBP descriptor remains unchanged under grayscale transformations of the image, and can provide more than 90% of the features of fresco images. The LBP algorithm is defined in formula (3):
$$LBP(x_c, y_c) = \sum_{p=0}^{P-1} 2^{p}\, s(i_p - i_c) \tag{3}$$
In formula (3), $(x_c, y_c)$ is the central pixel of the neighborhood, with pixel value $i_c$; $i_p$ are the pixel values of the other pixels in the neighborhood; $P$ is the number of neighboring pixels; and $s(x)$ is the sign function, defined in formula (4).
$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{4}$$
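Formulas (2) to (4) can be sketched in a few lines of pure Python. This is a minimal illustration, not the implementation used in the study. The 8-pixel neighborhood and the clockwise ordering in `OFFSETS` are assumptions, since the text does not state the neighborhood size or bit order, and `patch` is a made-up 3 x 3 example.

```python
def gray_histogram(img, levels=256):
    """Normalized histogram H(m) = n_m / N for a 2D list of integer gray levels."""
    counts = [0] * levels
    total = 0
    for row in img:
        for v in row:
            counts[v] += 1
            total += 1
    return [c / total for c in counts]


# Assumed ordering: 8 neighbors, clockwise, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """LBP code of the pixel at (r, c): sum over p of 2^p * s(i_p - i_c)."""
    center = img[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] - center >= 0:  # s(x) = 1 when x >= 0
            code += 2 ** p
    return code


# Toy 3 x 3 patch with gray levels in 0..9 (illustrative values only).
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
hist = gray_histogram(patch, levels=10)
code = lbp_code(patch, 1, 1)

# A monotonically increasing gray-level transform leaves the LBP code unchanged.
brighter = [[2 * v + 1 for v in row] for row in patch]
print(hist[6], code, lbp_code(brighter, 1, 1))
```

The last line illustrates the invariance mentioned above: the code depends only on the sign of each difference, so any transform that preserves the ordering of gray levels produces the same pattern.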

2.6. Transfer Learning

Transfer learning is a knowledge transfer method that maps a source domain with a large amount of labeled data to the target domain of the task. The number of frescoes in this study is limited and far smaller than the roughly 10 million samples in ImageNet, so it is difficult to train a deep network model from scratch, because good training results depend on sufficient data. Transfer learning can be used to address this problem [26]. Because the parameters of a trained model transfer well as feature extractors, they can be reused directly when extracting features from other datasets, which not only improves the efficiency of network model development, but also strengthens model performance and accelerates training. This paper adopts a fine-tuning transfer strategy. During the training process, only the softmax layer is changed, and the other layers load the weight parameters that ResNet-34 has learned on the ImageNet dataset.

3. Materials and Methods

3.1. Study Area and Datasets

Jiangmen city, located in Guangdong province, China, is a famous homeland of diaspora Chinese in South China. Haikou city, located in Hainan province, China, is also a gathering place for overseas Chinese. The geographical locations and sampling points of the two cities are shown in Figure 2. The experimental data set of this study was captured and collected by high-resolution SLR camera and high-definition smart phone. The classification of frescoes was studied using the styles and colors of fresco patterns from different regions as sample features.
The Jiangmen data were collected from overseas Chinese buildings in Jiangmen. Their architectural style combines Chinese and Western elements, mainly Baroque, imitation Renaissance, Roman arcade, and Ionic column styles, with a small amount of traditional South China style. The Jiangmen arcades constitute the historical imprint and urban memory of the changes in Jiangmen's urban architectural landscape caused by overseas Chinese investing in real estate in their hometown. The Haikou data were collected in the old arcade street of Haikou. The history of the Haikou frescoes is closely related to the early opening up of Haikou. In the late Qing Dynasty, Haikou was one of the ports open to the outside world and served as the island's window to the outside world. People who had made a living in Nanyang returned home to invest and build on a large scale, so the Haikou arcades have a strong Nanyang style. With an area of 25000 square kilometers, there are nearly 500 arcade buildings. This study collected a total of 2385 overseas Chinese architectural frescoes in the Jiangmen and Haikou regions.

3.2. Data Preprocessing

Data preprocessing eliminates duplicate, erroneous, and poor-quality images. To facilitate the construction and training of the convolutional neural network, each sample image is resized to 224 × 224 pixels. Generally, the larger the amount of training data, the higher the recognition accuracy of the system, so a large number of training samples is needed to train a successful neural network. In practice, however, it is difficult to find a large amount of original data that can be used for training. It is therefore necessary to expand the data before providing it to the model, that is, to apply data augmentation.
Data augmentation mainly serves to increase the amount of training data, improve the generalization ability and robustness of the model, and reduce overfitting. The augmentation methods adopted in this paper (as shown in Figure 3) mainly include Gaussian noise [27], contrast enhancement [28], image sharpening [29], and image rotation [30], which expand the database to a total of 11380 images. We constructed the Jiangmen Haikou fresco image dataset (JHD). The cross validation method was used to divide the dataset. For the preprocessed dataset, the images of each category were randomly divided into three parts: 80% as the training set, 10% as the test set, and 10% as the validation set. CNN is the most popular neural network model for image classification. In image classification, a group of images is given, each labeled with a single category, and the task is to predict the category of each image in a new group of test images and to measure the accuracy of the prediction. In this paper, there are two different styles of wall painting. We use ResNet-34 to train the model and identify the image features. After fine-tuning the parameters, our model can better classify the architectural frescoes.
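The 80/10/10 division can be sketched as a seeded random shuffle followed by slicing. This is an assumption about how such a split might be done, not the authors' procedure. The file names are hypothetical placeholders, and in practice the function would be applied to each category separately, as described above.

```python
import random


def split_dataset(items, train_pct=80, test_pct=10, seed=0):
    """Shuffle items and split them into train, test, and validation subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = len(items) * train_pct // 100
    n_test = len(items) * test_pct // 100
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]      # the remainder becomes the validation set
    return train, test, val


# Hypothetical file names standing in for the 11380 augmented images.
paths = [f"img_{i:05d}.jpg" for i in range(11380)]
train_set, test_set, val_set = split_dataset(paths)
print(len(train_set), len(test_set), len(val_set))
```

Integer arithmetic is used for the subset sizes so that the three parts always add up to the full dataset with no rounding surprises.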

3.3. Classification Model of Expatriate Frescoes Integrating Transfer Learning

Because fresco images are of low quality and few in number, the model in this paper is first pre-trained on the large ImageNet dataset, and the knowledge learned there is then transferred to the JHD fresco dataset in order to extract deep features from the fresco images and to identify and classify them. The classification model for overseas Chinese fresco images proposed in this article consists of a feature extraction section and a classification section. The feature extraction section uses a convolutional neural network, a color histogram, and an LBP texture histogram. The classification section is the Softmax layer. The classification model is shown in Figure 4.
As can be seen from Figure 4, the proposed classification model of overseas Chinese frescoes integrating transfer learning is divided into three parts for the regional classification of frescoes. First, a pre-trained ResNet-34 model is used to extract high-dimensional features from the frescoes. To better express the features extracted by the front-end convolutional layers, three consecutive fully connected layers are used to extract the deep features of the fresco image. Second, the color histogram is used to extract the color features of the fresco, and the LBP texture histogram is used to extract its texture features. Finally, the high-dimensional features extracted by the pre-trained model are fused with these artistic features to generate the feature vectors that feed the output nodes of the Softmax layer.
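The text does not state how the deep and artistic features are fused. One common choice, assumed here purely for illustration, is to L2-normalize each feature group and concatenate them into a single vector. The function names and the tiny feature vectors below are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that no feature group dominates."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def fuse_features(deep, color_hist, lbp_hist):
    """Concatenate the normalized deep, color, and texture features (assumed scheme)."""
    return l2_normalize(deep) + l2_normalize(color_hist) + l2_normalize(lbp_hist)


# Tiny made-up vectors; real deep features would have hundreds of dimensions.
deep = [3.0, 4.0]
color_hist = [0.5, 0.5]
lbp_hist = [1.0, 0.0, 0.0]
fused = fuse_features(deep, color_hist, lbp_hist)
print(len(fused), fused)
```

Normalizing before concatenation is a design choice rather than a requirement: it keeps a long deep-feature vector with large activations from overwhelming the much shorter histogram features.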

3.4. Improved Classification Model for Overseas Chinese Frescoes

3.4.1. Integrating Transfer Learning to Enhance the Stability of the Model

Because existing overseas Chinese frescoes are strongly local in character, limited in quantity, severely damaged, and of poor image quality, collecting and preprocessing their images is relatively difficult. Fresco images must be collected from different cities for regional classification, which makes it even harder to collect and organize a large amount of data. To improve the learning efficiency of the model and better extract the deep features of the fresco image, and to overcome the model instability caused by the complexity of fresco features and the cliff problem during feature extraction, this method is based on the ResNet-34 model and integrates transfer learning. The purpose of transfer learning is to transfer valuable information learned in one field to another. Using transfer learning can improve the stability and generalization of the model, so that the final classification results are not affected by changes in fresco image pixels.
The method of integrating transfer learning in this paper is to pre-train the ResNet-34 model on the large ImageNet dataset to extract the shallow features of the image, and then to apply the transferred knowledge, as the output of the model bottleneck layer, to the JHD fresco dataset. The model freezes the convolutional layers that precede the fully connected and Softmax layers of ResNet-34, trains new fully connected and Softmax layers for deep extraction of image features from frescoes, and completes the model training and fresco image classification tasks in a relatively short time.

3.4.2. Introducing Cross Entropy Function to Stabilize Model Gradient

To address the vanishing gradient problem of the model and to assess the gap between the true value and the predicted value, this study uses the combination of the cross entropy function and the Softmax function as the loss function. It can effectively mitigate problems such as slow or stagnant weight updates in hidden layers caused by vanishing gradients. Cross entropy indicates the distance between the actual output and the expected output of the model. The smaller the cross entropy, the closer the actual output is to the expected result and the better the model performs. During backpropagation, the greater the error between the true and predicted values, the greater the amplitude of parameter adjustment, and the faster the model converges. At the end of the experiment, the cross entropy values recorded during training are output, which can be used to judge whether the model is overfitted.
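The link between prediction error and the size of the parameter update can be illustrated with the standard identity for Softmax combined with cross entropy: the gradient of the loss with respect to the logits equals the predicted probabilities minus the one-hot target. The sketch below uses made-up two-class logits and is not taken from the study's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross entropy against a one-hot target: -log of the true-class probability."""
    return -math.log(probs[target])


def grad_wrt_logits(probs, target):
    """Gradient of softmax cross entropy with respect to the logits: p - y."""
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]


target = 0                    # true class index
good = softmax([3.0, 0.0])    # confident and correct
bad = softmax([0.0, 3.0])     # confident and wrong

for name, probs in (("good", good), ("bad", bad)):
    loss = cross_entropy(probs, target)
    grad = grad_wrt_logits(probs, target)
    print(name, round(loss, 4), [round(g, 4) for g in grad])
```

The wrong prediction yields both a larger loss and a larger gradient magnitude, so the corresponding weights receive a larger correction, consistent with the description above.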

3.4.3. Increasing the Number of Fully Connected Layers to Enhance Image Feature Expression

When the original network model is used directly to extract features from fresco images of overseas Chinese buildings, image feature extraction is often insufficient. In this experiment, after fine-tuning the parameters of all layers of the pre-trained ResNet-34 model, three consecutive fully connected layers were constructed after the bottleneck layer of the network in order to better learn and express the high-dimensional image features extracted by the front-end network. To avoid gradient dispersion issues, the Softmax layer is selected to classify image features.

3.5. Classification Process of Overseas Chinese Architectural Frescoes

The framework for regional classification of overseas Chinese architectural frescoes by the ResNet-34 model integrating transfer learning is shown in Figure 5, and is divided into the following six stages.
Stage 1: Fresco image preprocessing stage. The input data for this stage is the original image dataset of overseas Chinese architectural frescos, and the output data is the training set, testing set, and validation set of the frescos. The specific steps are as follows: (1) Modify the size of each fresco image in the original dataset, and the unified format is 224 × 224 pixels, eliminating duplicates, errors, and poor quality images; (2) Expand the image dataset using preprocessing methods such as Gaussian noise, salt and pepper noise, histogram equalization, and rotations of the image to obtain the JHD fresco dataset; (3) Using a random function on the JHD dataset images, 80% of the fresco images were used as the training set, 10% of the fresco images were used as the test set, and 10% of the fresco images were used as the validation set.
Stage 2: Model pre-training stage. At this stage, the input data is the training set, and the output data is the transfer model. The specific steps are as follows: (1) Pre-train the ResNet-34 model on the large ImageNet dataset; (2) Fine-tune the parameters of the model, and record the changes in learning rate and accuracy at different iterations; (3) Train on the training set of fresco images to obtain the trained ResNet-34 model; (4) Obtain the transfer model.
Stage 3: Image art feature extraction stage. At this stage, the input data is the training set, and the output data is the artistic features of fresco images. The specific steps are as follows: (1) Use the color histogram algorithm to extract the color features of fresco images; (2) Use the LBP texture histogram algorithm to extract the texture features of fresco images; (3) Obtain the artistic features of the fresco.
Stage 4: Feature fusion stage. At this stage, the input data is the high-level features and artistic features of the frescoes, while the output data is the fusion features of the fresco images. The specific steps are as follows: (1) Obtain the deep features of the frescoes extracted by the pre-trained model; (2) Obtain the color and texture features of the frescoes; (3) Integrate the deep features, color features, and texture features to obtain the fusion features.
Stage 5: Model testing stage. At this stage, the input is the test set, and the output is the test accuracy. The steps are as follows: (1) Import the test set into the trained transfer model; (2) Compile the classification results to obtain the final accuracy.
Stage 6: Model validation stage. At this stage, the input is the validation set, and the output is the validation accuracy of fresco image classification. The steps are as follows: (1) Import the validation set into the trained transfer model; (2) Compile the validation results.

4. Results and Discussion

4.1. Experimental Environment

The hardware environment of this experiment is as follows: the processor is an Intel Core i7-9700 CPU, the RAM is 16 GB, and the GPU is an NVIDIA GeForce RTX 3090. The SPM12 and CAT12 toolkits in Matlab R2016b were used for image preprocessing. The network model is built in Python with an open-source deep learning framework, with an input image size of 224 × 224 × 3. The hyperparameter settings are epochs = 100, batch_size = 32, initial learning rate = 0.01, weight_decay = 0.0001, and dropout = 0.5.

4.2. Evaluation Index

Accuracy is used to evaluate the test results. The calculation formula of the accuracy is shown in formula (5):
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{5}$$
Here, TP (true positives) is the number of samples that are actually positive and are classified as positive by the classifier; FP (false positives) is the number of samples that are actually negative but are classified as positive by the classifier; FN (false negatives) is the number of samples that are actually positive but are classified as negative by the classifier; TN (true negatives) is the number of samples that are actually negative and are classified as negative by the classifier.
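Formula (5) can be computed directly from the four counts. The sketch below also includes recall, which the abstract reports, using its standard definition TP / (TP + FN); that definition is not given in this section and is stated here as an assumption. The labels are made-up toy predictions, with "Jiangmen" arbitrarily treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, and TN for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Formula (5): correctly classified samples over all samples."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Standard recall: share of actual positives that were found."""
    return tp / (tp + fn)


# Toy labels for eight images (illustrative only, not experimental results).
y_true = ["Jiangmen", "Jiangmen", "Haikou", "Haikou",
          "Jiangmen", "Haikou", "Haikou", "Jiangmen"]
y_pred = ["Jiangmen", "Haikou", "Haikou", "Haikou",
          "Jiangmen", "Jiangmen", "Haikou", "Jiangmen"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="Jiangmen")
print(tp, fp, fn, tn, accuracy(tp, fp, fn, tn), recall(tp, fn))
```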

4.3. Results and Analysis

4.3.1. Model Training and Validation

In this experiment, the model was trained and tested many times. The number of training epochs is set to 100, the batch_size is set to 32, the Adam optimizer is used to speed up the convergence of the model, the learning rate is set to 0.0001, and the cross entropy loss function is used to realize the feature extraction and classification of arcade patterns. As can be seen from Table 2, the model performs best when the learning rate is 0.0001. Figure 6 shows the changes in accuracy and cross entropy during training. The training accuracy increases steadily and, after about 60 epochs, stabilizes at around 98%. The cross entropy in Figure 6 decreases as training proceeds and likewise stabilizes after about 60 epochs. In summary, the model in this article performs well during training and is not prone to overfitting.
The learning rate is a scale factor that adjusts the weights during training. Too large a learning rate causes the model to fluctuate and fail to converge, while too small a learning rate makes the model converge too slowly, wasting training time and computing resources. In this experiment, with the number of iteration steps held constant, the learning rate was set to 0.0001, 0.001, and 0.01 in separate groups of experiments, and the results were then compared. The comparison of accuracy under different learning rates is shown in Table 2. As can be seen from Table 2, when the learning rate is 0.0001, the model in this paper shows good performance, and the final accuracy is as high as 98.41%. Compared with the two groups of experiments using the other learning rates, the accuracy of this model is higher by 8.61% and 4.04%.

4.3.2. Comparison of Different Fresco Features

In the experiment of identifying fresco areas where overseas Chinese reside, the color features, texture features, and painting style of fresco images will have a significant impact on the experimental results. The painting style of frescos is a reflection of the culture of the time, and it is also based on color characteristics. The recognition accuracy between regions with significant differences in fresco color and texture is higher. Based on the above research, this section conducts comparative experiments from two aspects.
(1) To verify the impact of the color features of fresco images on recognition performance, a portion of the fresco images are selected for color adjustment, including increasing the grayscale value, increasing the saturation, and inverting the colors, before proceeding with fresco region recognition. Figure 7 is an example of color adjustment for a fresco image. In Figure 7, (a) is the original 224 × 224 fresco image used in the experiment, (b) increases the grayscale value of (a), (c) increases the saturation of (a), and (d) applies an inverse color transformation to (a).
Table 3 represents the probability that the fresco images of overseas Chinese residences are correctly recognized as preset regional labels. From Table 3, it can be seen that after adjusting the color of the fresco image, the final recognition accuracy has decreased. The accuracy decreased by 61.53, 1.76, and 27.15 percentage points after increasing the grayscale value, saturation, and inverse transformation, respectively. The above experiments indicate that after the loss of some color features in overseas frescos, the color histogram does not extract the rich color features of the fresco image, resulting in poor learning and classification of features when recognizing regions.
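The three kinds of color adjustment can be sketched per pixel as follows. The paper does not state how each adjustment was implemented, so everything here is an assumption: "increasing the grayscale value" is read as blending a pixel toward its luminance gray, saturation is scaled in HSV space, and the function names and default factors are hypothetical.

```python
import colorsys


def toward_gray(pixel, amount=1.0):
    """Blend an RGB pixel (0-255 ints) toward its luminance gray.

    amount=1.0 yields a fully gray pixel (an assumed reading of the paper).
    """
    r, g, b = pixel
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma weights
    return tuple(round(c + amount * (gray - c)) for c in pixel)


def increase_saturation(pixel, factor=1.5):
    """Scale the HSV saturation of an RGB pixel, clamped to 1.0."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def invert(pixel):
    """Inverse color transformation: reflect each channel around 255."""
    return tuple(255 - c for c in pixel)


if __name__ == "__main__":
    sample = (200, 120, 40)  # an arbitrary warm-toned pixel
    print(toward_gray(sample), increase_saturation(sample), invert(sample))
```

Applying one of these functions to every pixel of an image gives an adjusted copy that can be passed to the classifier in place of the original.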
(2) In order to verify the impact of texture features on the recognition effect of fresco images, and considering that the change in image resolution directly affects the calculation of texture features, a portion of fresco images are selected for resolution adjustment before region recognition. Due to the influence of image resolution on the texture features of the image, this experiment expands the resolution of the original image to 2 and 4 times, and applies them to the network model in this paper, respectively, to determine the accuracy of the image being recognized as the correct region.
Table 4 represents the probability that overseas Chinese architectural fresco images are correctly recognized as preset area labels. From Table 4, it can be seen that as the image resolution increases, the texture features of the image become more blurry, and the final recognition accuracy also decreases.
From Table 3 and Table 4, it can be analyzed that the color features of the image have a significant impact on the accuracy of the final image recognition, while the texture features of the image have a relatively small impact. From this, it can be concluded that color features play a decisive role in the recognition and classification experiments of fresco regions in this article.
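The comparison drawn from Tables 3 and 4 can be reproduced as simple arithmetic. The sketch below only restates the reported accuracies; the dictionary names and the helper `drop_in_points` are illustrative and are not part of the authors' code.

```python
from decimal import Decimal

BASELINE = Decimal("98.41")  # accuracy on the original 224x224 images

# Accuracies reported in Table 3 (color) and Table 4 (resolution).
color_accuracy = {
    "increase grayscale value": Decimal("36.88"),
    "increase saturation": Decimal("96.65"),
    "inverse transformation": Decimal("71.26"),
}
resolution_accuracy = {
    "448x448": Decimal("91.58"),
    "896x896": Decimal("89.79"),
}


def drop_in_points(table):
    """Return the accuracy drop from the baseline, in percentage points."""
    return {name: BASELINE - acc for name, acc in table.items()}


color_drop = drop_in_points(color_accuracy)
resolution_drop = drop_in_points(resolution_accuracy)

if __name__ == "__main__":
    for name, drop in {**color_drop, **resolution_drop}.items():
        print(f"{name}: -{drop} points")
```

The largest color-related drop (61.53 points) is far greater than the largest resolution-related drop (8.62 points), although the saturation change alone costs only 1.76 points.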

4.3.3. Comparison of Performance between Different Models

To better demonstrate the advantages of combining the ResNet model with transfer learning for fresco region recognition and classification, this paper compares the accuracy of the proposed model with that of classical deep learning network models, namely AlexNet [31], GoogLeNet [32], and VGGNet [33]. The comparative experiments use the same software and hardware environment, the same hyperparameter settings, and the same image preprocessing method during training. The accuracy rate and recall rate are then used to analyze the classification results of the different models. The accuracy rate, as used here, describes the prediction results: it indicates how many of the samples predicted as positive are truly positive. The recall rate describes the original samples: it indicates how many of the positive samples are predicted correctly. As shown in Table 5, the proposed model combining transfer learning and ResNet-34 achieves an accuracy rate of 98.41% and a recall rate of 98.53%. AlexNet-10 and AlexNet-S6 achieve accuracy rates of 85.41% and 86.68% and recall rates of 83.79% and 84.55%, respectively. GoogLeNet achieves an accuracy rate of 90.29% and a recall rate of 89.61%, and VGGNet-16 achieves an accuracy rate of 89.26% and a recall rate of 88.15%. Table 5 therefore shows that the model used in this paper performs well in classifying the fresco images. These experiments support the soundness of the approach and the effectiveness of the method.
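The two definitions above match what is commonly called precision and recall. A minimal sketch of both, computed from per-class counts, is given below; the function names are illustrative and the counts in the example are made up, not taken from the experiments in this paper.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that are predicted as positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical counts for one class, for illustration only.
    tp, fp, fn = 90, 10, 30
    print(f"precision = {precision(tp, fp):.2%}")
    print(f"recall    = {recall(tp, fn):.2%}")
```

Here `tp`, `fp`, and `fn` are the numbers of true positives, false positives, and false negatives for a given region label.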

5. Conclusion

To address the small number and poor quality of fresco images on overseas Chinese buildings, the difficulty of feature extraction, and the similarity between pattern text and painting style, this paper proposes a ResNet-34 model and method integrating transfer learning, and applies it to the recognition and classification of overseas Chinese architectural frescoes. The model is first pre-trained on the large ImageNet dataset to obtain the transfer model, and is then trained on the fresco training set to obtain the final ResNet-34 model, which addresses the training problem caused by the small amount of data in the overseas Chinese architectural fresco dataset. The dataset is also expanded with data enhancement and expansion algorithms. The final classification accuracy on the test set is 98.41%; the method shortens the data operation time and can extract the features of architectural images and classify them. Every evaluation index is better than those of classical convolutional neural networks such as AlexNet, GoogLeNet, and VGGNet. Compared with the AlexNet-10, AlexNet-S6, GoogLeNet, and VGGNet-16 network models, the accuracy of our model for overseas Chinese hometown building recognition is improved by 13.00, 11.73, 8.12 and 9.15 percentage points, respectively. The experimental results show that the proposed model has stable recognition and classification performance, higher accuracy, and faster convergence. This is applied research that brings advanced computer methods to the architectural frescoes of overseas Chinese residences. Architectural ornamentation containing cultural information in the Wuyi Overseas Chinese cultural area and the Qionglei cultural area is classified so that the sources and cultural areas of other ornamentation images can be accurately located.
Finally, the research results of this paper can provide technical support for subsequent cultural traceability, cultural identification, and cultural protection. In the experiment, owing to the hardware environment and to the small differences in painting style and color among some overseas Chinese architectural fresco images, the model in this paper cannot extract good color features for the color gradients and cliffs of frescoes. In future research, we will continue to expand the JHD image dataset, conduct further research based on the characteristics of the frescoes themselves, and improve classification accuracy, making the regional classification of overseas architectural frescoes faster and more effective.

Author Contributions

Conceptualization, L.G. and T.Y.; methodology, L.G. and B.W.; software, X.Z.; validation, L.G., T.Y. and B.W.; formal analysis, L.G.; investigation, T.Y.; resources, L.G.; data curation, T.Y. and J.L.; writing—original draft preparation, L.G.; writing—review and editing, T.Y.; visualization, L.G.; supervision, L.G.; project administration, L.G.; funding acquisition, L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R & D Program of China (2022YFC3303200), the Teaching reform project of Guangdong province (GDJX2020009) and the Guangdong province philosophy and social science planning discipline joint project (GD20XSH06).

Data Availability Statement

In this study, the authors used a publicly available dataset for analysis: ImageNet, which is available at http://image-net.org/.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gao, L.; Wu, Y.; Yang, T.; Zhang, X.; Zeng, Z.; Chan, C.K.D.; Chen, W. Research on Image Classification and Retrieval Using Deep Learning with Attention Mechanism on Diaspora Chinese Architectural Heritage in Jiangmen, China. Buildings. 2023, 13, 275.
  2. Lerme, N.; Hegarat-Mascle, S.L.; Zhang, B.; Aldea, E. Fast and efficient reconstruction of digitized frescoes. Pattern Recognition Letters. 2020, 138, 417–423.
  3. Jiang, C.; Jiang, Z.; Shi, D. Computer-Aided Virtual Restoration of Frescoes Based on Intelligent Generation of Line Drawings. Mathematical Problems in Engineering. 2022, 1, 1–9.
  4. Dondi, P.; Lombardi, L.; Setti, A. DAFNE: A dataset of fresco fragments for digital anastylosis. Pattern Recognition Letters. 2020, 138, 631–637.
  5. Cao, J.; Yan, M.; Jia, Y.; Tian, X. Application of inception-v3 model integrated with transfer learning in dynasty identification of ancient murals. Journal of Computer Applications. 2021, 11, 3219–3227.
  6. Tang, D.; Lu, D.; Yang, B.; Xu, D. Similarity metrics between mural images with constraints of the overall structure of contours. Journal of Image and Graphics. 2013, 8, 968–975.
  7. Teixeira, T.S.; Andrade, M.L.S.C.; Luz, M.R. Reconstruction of frescoes by sequential layers of feature extraction. Pattern Recognition Letters. 2021, 147, 172–178.
  8. Su, H.; Gao, L.; Lu, Y.; Jing, H.; Hong, J.; Huang, L.; Chen, Z. Attention-guided cascaded network with pixel-importance-balance loss for retinal vessel segmentation. Frontiers in Cell and Developmental Biology. 2023, 11, 1196191.
  9. Gao, L.; Wang, K.; Zhang, X.; Wang, C. Intelligent identification and prediction mineral resources deposit based on deep learning. Sustainability. 2023, 15, 10269.
  10. Najeeb, R.M.; Syed, A.R.A.; Usman, U.S.; Asma, C.; Nirvana, P. Cascading pose features with CNN-LSTM for multiview human action recognition. Signals. 2023, 4, 40–55.
  11. Cedric, A.; Lionel, H.; Chiara, P.; Ondrej, M.; Olivier, C.; William, P.; Francesca, A.; Kiran, P.; Sophie, M. CNN-Based cell analysis: From image to quantitative representation. Frontiers in Physics. 2022, 9, 776805.
  12. Lawrence, S.; Giles, C.L.; Tsoi, A.C.; Back, A.D. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks. 1997, 1, 98–113.
  13. Xu, X.B.; Ma, F.; Zhou, J.M.; Du, C.W. Applying convolutional neural networks (CNN) for end-to-end soil analysis based on laser-induced breakdown spectroscopy (LIBS) with less spectral preprocessing. Computers and Electronics in Agriculture. 2022, 199, 107171.
  14. Murugan, G.; Moyal, V.; Nandankar, P.; Pandithurai, O.; Pimo, E.S. A novel CNN method for the accurate spatial data recovery from digital images. Materials Today: Proceedings. 2021, 80, 1706–1712.
  15. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998, 11, 2278–2324.
  16. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 2017, 60, 84–90.
  17. Szegedy, C.; et al. Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015; 1–9.
  18. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016; 770–778.
  19. Smith, Y.; Zajicek, G.; Werman, M.; Pizov, G.; Sherman, Y. Similarity measurement method for the classification of architecturally differentiated images. Computers and Biomedical Research. 1999, 32, 1–12.
  20. Akbarimajd, A.; Hoertel, N.; Hussain, M.A.; Neshat, A.A.; Marhamati, M.; Bakhtoor, M.; Momeny, M. Learning-to-augment incorporated noise-robust deep CNN for detection of COVID-19 in noisy X-ray images. Journal of Computational Science. 2022, 63, 101763.
  21. He, H.J.; Xu, H.Z.; Zhang, Y.; Gao, K.; Li, H.X.; Ma, L.F.; Li, J. Mask R-CNN based automated identification and extraction of oil well sites. International Journal of Applied Earth Observation and Geoinformation. 2022, 112, 102875.
  22. Kim, Y.H.; Park, K.R. MTS-CNN: Multi-task semantic segmentation-convolutional neural network for detecting crops and weeds. Computers and Electronics in Agriculture. 2022, 199, 107146.
  23. Polsinelli, M.; Cinque, L.; Placidi, G. A light CNN for detecting COVID-19 from CT scans of the chest. Pattern Recognition Letters. 2020, 140, 95–100.
  24. Kabir, S.; Patidar, S.; Xia, X.; Liang, Q.H.; Neal, J.; Pender, G. A deep convolutional neural network model for rapid prediction of fluvial flood inundation. Journal of Hydrology. 2020, 590, 125481.
  25. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Identity Mappings in Deep Residual Networks. Computer Vision - ECCV 2016, PT IV. 2016, 9908, 630–645.
  26. Lv, Y.Z.; Xue, J.N.; Duan, F.; Sun, Z.; Li, J.H. An exploratory study of transfer learning frameworks in the context of few available shots of neurophysiological signals. Computers and Electrical Engineering. 2022, 101, 108091.
  27. Rafael, L.J.; Jose, M.M.; Casilari, E. Analytical and empirical evaluation of the impact of Gaussian noise on the modulations employed by Bluetooth Enhanced Data Rates. EURASIP Journal on Wireless Communications and Networking. 2012, 2012, 1–11.
  28. Zhou, Z.; Shi, Z.; Ren, W. Linear contrast enhancement network for low-illumination image enhancement. IEEE Transactions on Instrumentation and Measurement. 2023, 72, 1–16.
  29. Pham, T.D. Kriging-weighted laplacian kernels for grayscale image sharpening. IEEE Access. 2022, 10, 57094–57106.
  30. Liu, K.; Tian, Y.Z. Research and analysis of deep learning image enhancement algorithm based on fractional differential. Chaos, Solitons & Fractals. 2020, 131, 109507.
  31. Singh, I.; Goyal, G.; Chandel, A. AlexNet architecture based convolutional neural network for toxic comments classification. Journal of King Saud University - Computer and Information Sciences. 2022, 34, 7547–7558.
  32. AK, A.; Topuz, V.; Midi, I. Motor imagery EEG signal classification using image processing technique over GoogLeNet deep learning algorithm for controlling the robot manipulator. Biomedical Signal Processing and Control. 2022, 72, 103295.
  33. Feng, S.; Zhao, L.; Shi, H.; Wang, M.; Shen, S.; Wang, W. One-dimensional VGGNet for high-dimensional data. Applied Soft Computing. 2023, 135, 110035.
Figure 1. Pool Operation.
Figure 2. Geographical Location and Sampling Points of the Research Area.
Figure 3. Example of Image Data Enhancement.
Figure 4. Classification Model of Overseas Chinese Frescoes Integrating Transfer Learning.
Figure 5. The framework of the ResNet-34 model Integrated with Transfer Learning to Classify the Expatriate Frescoes.
Figure 6. Changes in Accuracy and Cross Entropy during Training.
Figure 7. Color Adjustment of Fresco Images.
Table 1. ResNet-34 Parameter Table.
| Layer name | Output size | 34-layer |
| --- | --- | --- |
| conv1 | 112×112 | 7×7, 64, stride 2 |
| Max pooling | | 3×3, stride 2 |
| conv2_x | 56×56 | [3×3, 64; 3×3, 64] × 3 |
| conv3_x | 28×28 | [3×3, 128; 3×3, 128] × 4 |
| conv4_x | 14×14 | [3×3, 256; 3×3, 256] × 6 |
| conv5_x | 7×7 | [3×3, 512; 3×3, 512] × 3 |
| Global Average Pooling | | |
| Fully Connected Layer | | |
| Softmax | | |
Table 2. Comparison of accuracy at different learning rates.
| Number | LR | Accuracy |
| --- | --- | --- |
| 1 | 0.01 | 89.80% |
| 2 | 0.001 | 94.37% |
| 3 | 0.0001 | 98.41% |
Table 3. Comparison of Regional Recognition Accuracy for Fresco Images with Different Color Features.
| Color Feature | Accuracy |
| --- | --- |
| Original Image | 98.41% |
| Increase Grayscale Value | 36.88% |
| Increase Saturation | 96.65% |
| Inverse Transformation | 71.26% |
Table 4. Comparison of regional recognition accuracy of different resolutions of overseas Chinese fresco images.
| Resolution | Accuracy |
| --- | --- |
| 224×224 | 98.41% |
| 448×448 | 91.58% |
| 896×896 | 89.79% |
Table 5. Performance Comparison of Different Models.
| Model | Accuracy | Recall rate |
| --- | --- | --- |
| AlexNet-10 | 85.41% | 83.79% |
| AlexNet-S6 | 86.68% | 84.55% |
| GoogLeNet | 90.29% | 89.61% |
| VGGNet-16 | 89.26% | 88.15% |
| Ours | 98.41% | 98.53% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.