Forest fire recognition based on GNN with dynamic feature similarity of multi-view images

Forest fire identification is important for forest resource protection. Effective monitoring of forest fires requires the deployment of multiple monitors with different viewpoints, yet most traditional recognition models can only effectively recognize images from a single source. They ignore the correlation information between images from different viewpoints, which leads to inaccurate visual similarity estimation for multi-source samples and causes missed detections and high false alarm rates. To solve these problems, this paper proposes a similarity-guided graph neural network model based on the dynamic characteristics of images. The method establishes pairs (nodes) that represent different-viewpoint images and gallery images, and converts the input features of the nodes on the graph into relational features of the different gallery pairs. Dynamically updating the features of the image gallery using the new feature-bank relationship enables the estimation of the similarity between images and improves the recognition rate of the model. In addition, to reduce complicated pre-processing and effectively extract the key features in the images, this paper also proposes a dynamic feature extraction method for fire regions based on image segmentability. By setting thresholds in the HSV color space, the fire region is segmented from the image and the fire region frames are used for dynamic feature extraction. Experimental results on an open-source forest fire dataset and our collected forest fire dataset show that the performance of the proposed method is improved by 4% compared with ResNet. The proposed scheme adapts to different fire scenarios and has good generalization and interference resistance.


Introduction
Forest fire is the most threatening disaster in forest ecosystems, and early detection of fire sources before they turn into a catastrophic event is the key to preventing fires [1]. With the development of electronic data and machine vision, forest fire monitoring based on computer vision has become a research hotspot in the field of forest fire prevention. Usually, fire identification is performed through flame detection, and most fire identification systems are specifically designed for flame detection. However, the fire has already become serious by the time flames appear, and flames are sometimes difficult to identify under complex weather conditions [2].
In forest fire identification, color recognition is one of the earliest methods; it achieves flame recognition through the motion, spatial and temporal features of flame color patterns [3], but requires a small recognition distance and a large flame size. Many current fire recognition algorithms are based on CNNs; however, they have a high false alarm rate and can only recognize static fire images, i.e., none of them can recognize the dynamic similar features of fire [4]. On this basis, researchers proposed using the background difference method to find moving pixels, then using a color model to find flame-colored regions, and performing spatio-temporal analysis on these regions to identify irregular and flickering fire features [5]. However, only images from the same viewpoint can be used for recognition, and the accuracy is low. Currently, some scholars use deep CNN models to identify fire regions by training classifiers, which avoids the tedious and time-consuming process of feature extraction and automatically learns the rich features in the original fire data [6]. Unfortunately, these models suffer from the same problem: they recognize images from a single source effectively, but their training and detection performance on multi-source samples is poor. Moreover, they often cannot effectively handle fire-like interference in the recognition process, which leads to false alarms. Thus, to reduce the complex pre-processing of forest fire images and effectively extract key features from the images, this paper proposes a dynamic feature extraction method for fire regions based on image segmentability, which uses dynamic features as model inputs to improve the robustness of the network in recognizing forest fires from different perspectives.
These schemes can only effectively identify images from the same source and ignore the correlation information between images from different viewpoints, so the visual similarity estimation for multi-source samples is inaccurate [7]. Images taken from different monitoring viewpoints in the same forest fire monitoring scene share the same fire characteristics, e.g., fire area, background colors, and thermal radiation. To improve the identification of forest fires, some researchers have learned discriminative features or designed various metric distances to better measure the similarity between images from different viewpoints [8]. However, these researchers only consider the similarity between two images, while ignoring the intrinsic similarities within the image set as a whole. For example, when we try to estimate the similarity between the probe image and the target image, most feature learning and metric learning is done by training on and observing the pairwise relationships between the images individually, ignoring other relationships between different images of the source. To overcome this problem, the valuable relationships among the images need to be identified. The literature [9] suggests the use of manifold learning, a method that considers the similarity of each group of images in the set. This method maps the images onto a manifold so as to make the local geometry as smooth as possible. The re-ranking method is also used for local similarity estimation between images [10]; it combines the similarities between top-ranked images. However, both manifold learning and re-ranking methods have shortcomings: (1) most manifold learning and re-ranking methods are unsupervised, so the training data cannot be fully utilized; (2) these two methods do not involve the training process, which is not conducive to feature learning.
Fortunately, graph neural networks (GNNs) have gained importance due to their strong ability to generalize over graph data [11]. GNNs pass messages over the graph structure, and the final node representations are obtained by decomposing the graph. Because GNNs operate on graph node representations, training is end-to-end, which facilitates the learning of feature representations compared with manifold learning and re-ranking. The network combines graph computation and deep learning to obtain a deep learning framework with robust similarity estimation and recognition.
Thus, in this paper, a GNN is proposed for forest fire recognition under multiple views. For a mini-batch consisting of multiple images, the initial visual features of the images are first learned pairwise in a supervised manner; each pair is then responsible for generating a similarity score and is treated as a node of the graph. In addition, the pairwise relational features associated with each node are updated and optimized by propagating deeply learned messages among the nodes. Based on this, image recognition is performed using feature fusion weights to obtain robust similarity estimates for images from different viewpoints.
Experiments on a public fire dataset and the fire dataset built in this paper show that the method identifies forest fires well, adapts to different fire situations, and has good generalization and anti-interference ability.
The main contributions of this paper are as follows.
1) A graph neural network based on image similarity is proposed, which generates a graph to represent the pairwise relationships (nodes) between images based on images from different viewpoints, and uses the updated feature relationships of the nodes to estimate the similarity between images, thus greatly improving the recognition rate of forest fires.
2) To reduce complex pre-processing and effectively extract the key features in the images, this paper proposes a dynamic feature extraction method for fire areas based on image segmentability. The dynamic features of forest fire images are used as model inputs to improve the robustness of the network in recognizing forest fires.
3) We open-source a set of tagged fire detection benchmark datasets by combining some previous open-source forest fire images of good quality with our collection, and expect that this benchmark will help further research in this area.
4) The experimental results of the proposed method and deep learning framework on different fire datasets in this paper show that the method can better identify forest fires in different scenes with strong generalization ability and interference resistance.

Related Work
Most forest fire recognition algorithms are based on visual analysis of flame texture features, flame color features, motion features, etc. For example, [12] studied the dynamic behavior and irregularity recognition of fires in the RGB and HSI color spaces. [13] used the separation of color components from luminance in YCbCr space to design classification rules. [14] studied the shape of flames and the motion of rigid objects, and proposed using optical flow information and flame behavior to extract features that distinguish flames. [15] combined shape, color, and motion attributes to form a multi-expert system framework for real-time flame recognition. [16] found experimentally that flames in HSV color space show lower chromaticity. In [17], based on the RGB color model, the flame pixels are first extracted, and then the flames are recognized based on their growth and disorder features. [18] calculated the motion direction of the fire by a fast estimation method and accumulated the motion direction over time to identify the fire based on its spreading characteristics. A fire identification algorithm based on spectral, spatial, and temporal features and fuzzy logic has also been proposed, and a real-time forest fire alarm system was designed based on it.
With the development of machine learning and graph neural networks, fire identification based on computer vision has become a new direction. For example, [19] designed a convolutional neural network for identifying forest fires, used an alternative random parameter initialization method to address the small training sample size, and achieved a better fire classification result. [20] combined traditional recognition methods with neural networks: AdaBoost and LBP (Local Binary Pattern) algorithms were first used for initial recognition of images to extract flame candidate areas, and convolutional neural networks were then used for feature extraction and classification. [21] used a deep belief network for flame recognition. [22] trained a ResNet network [23] on imbalanced data by exploiting the difference in quantity between fire images and normal images, and then used the network to recognize flames. [24] proposed a stacked denoising autoencoder network algorithm and applied it to more than a dozen different scenarios including forest fires. [25] proposed a cascaded CNN algorithm, which uses two independent convolutional networks to identify static and dynamic features of flames separately, and combines the results of the two networks to determine whether flames are present. [26] designed DnCNN networks to recognize flame images and compared them with networks such as VGG and ZF-Net. These models can accurately recognize images from the same viewpoint, but their recognition of multi-source samples is not accurate. For this reason, [27] showed that GNNs can effectively use inter-graph relational information to improve image recognition. [28] proposed two methods for constructing deep convolutional networks on graphs: one is a spectral method based on the graph Laplacian, and the other is a spatial construction that extends the properties of convolutional filters to general graphs. GCNs have also been employed to identify disaster behavior.
Unlike existing GNN methods, the method proposed in this paper uses training label supervision to generate more accurate feature fusion weights in graph message passing, and thus effectively identifies fire images from different views.

Image segmentation
The HSV (hue, saturation, value) model provides a more intuitive, human-oriented way of describing color than the RGB color model. The way a neural network perceives color is closely related to the HSV components [29]. Let $x$ denote a pixel in the HSV color space, and let $x_h$, $x_s$, and $x_v$ be the H, S, and V component values of $x$, respectively. We can then obtain the fire color distribution from sample images containing forest fire regions, whose sample color values form the pixel component values shown in Figure 1. A Gaussian mixture model is used to represent the fire shape, and the pixels whose colors fall within the range of the distribution model are taken as fire pixels.

Figure 1. H, S and V component display
To further reduce the computational effort, three 2D projection planes are used instead of the 3D distribution model: the flame colors of the fire samples are projected onto the HS, HV and SV planes. In each plane, the extent of the color distribution can be easily represented by one or two rectangles, so that a relatively simple 2D color distribution can be defined.
Based on the color range, the image is segmented and candidate fire areas are obtained. As shown in Figure 2, the forest fire scene is clearly segmented.
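The rectangle-based color test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HSV ranges in `FIRE_HSV_RANGE` are hypothetical values chosen for the example (the paper does not list its thresholds), a single box is used instead of the one or two rectangles per projection plane, and the standard-library `colorsys` module stands in for an image library.

```python
import colorsys

# Hypothetical HSV box for flame-like colors (assumed values, not from the paper):
# hue in [0, 60] degrees, saturation in [0.4, 1.0], value in [0.6, 1.0].
FIRE_HSV_RANGE = {"h": (0.0, 60.0), "s": (0.4, 1.0), "v": (0.6, 1.0)}


def is_fire_pixel(rgb, hsv_range=FIRE_HSV_RANGE):
    """Return True if an (R, G, B) pixel with 0-255 channels lies inside the HSV box."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h, s and v are all in [0, 1]
    h_deg = h * 360.0
    return (
        hsv_range["h"][0] <= h_deg <= hsv_range["h"][1]
        and hsv_range["s"][0] <= s <= hsv_range["s"][1]
        and hsv_range["v"][0] <= v <= hsv_range["v"][1]
    )


def segment_fire(image):
    """Build a binary mask (1 = candidate fire pixel) from rows of RGB tuples."""
    return [[1 if is_fire_pixel(pixel) else 0 for pixel in row] for row in image]


image = [
    [(255, 120, 0), (30, 90, 40)],       # orange flame, dark green foliage
    [(255, 220, 60), (120, 120, 120)],   # yellow flame, gray smoke
]
mask = segment_fire(image)
```

Checking all three components at once is equivalent to requiring that the pixel fall inside the corresponding rectangle on each of the HS, HV and SV planes.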

Extraction of fire features
In this subsection, features such as the area, roundness and contour of the fire area are extracted from the fire area segmented in the image. A forest fire is an unstable flame at the start, and the number of fire pixels increases as the fire area grows, so the fire area is an important feature of fire [31]. To identify the degree of area variation of a fire, the change in the size of the fire area is calculated from two consecutive images. If the result exceeds a predefined threshold, fire growth is judged to have occurred.
Given a segmented fire area, the Laplace operator is used to retrieve its boundary, from which the connected boundary chain code, and hence the perimeter L of the boundary, can be easily obtained. The roundness of the fire area is calculated from its perimeter and area and indicates the complexity of the shape of the fire area, i.e., the more complicated the shape, the larger its value. Roundness also helps to remove the recognition interference of irregular bright objects in early fire recognition.
Contour: Since the shape of the fire area varies with air flow, the intensity of the fire can be measured by calculating the contour undulation. Assume that there are N points on the boundary, written in complex form as $z_i = x_i + j y_i$, where $(x_i, y_i)$ is the coordinate of the i-th point of the fire zone boundary traversed clockwise. The discrete Fourier transform of $z_i$ is then computed, where Φ represents the center of gravity of the one-dimensional boundary. According to reference 416, only a few dozen Fourier coefficients are really needed to describe the profile, and based on experience the first 32 are chosen. The difference between the Fourier descriptors of two consecutive images is denoted $D_i$. If $D_i$ is greater than $T_d$ and this lasts longer than $T_m$, where $T_d$ and $T_m$ are statistical thresholds obtained from experiments, a drastic change in shape has occurred and a fire may have occurred.
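As a rough illustration of the area-growth test, the sketch below counts fire pixels in two consecutive binary masks and compares the relative change with a threshold. The relative-change formula and the value of `GROWTH_THRESHOLD` are assumptions made for this example; the paper only states that a predefined threshold is used.

```python
GROWTH_THRESHOLD = 0.2  # assumed value; the paper does not report its threshold


def fire_area(mask):
    """Number of fire pixels in a binary mask given as rows of 0/1 values."""
    return sum(sum(row) for row in mask)


def area_growth(prev_mask, curr_mask):
    """Relative change in fire area between two consecutive masks."""
    prev_area = fire_area(prev_mask)
    curr_area = fire_area(curr_mask)
    if prev_area == 0:
        # No fire before: any new fire pixel counts as unbounded growth.
        return float("inf") if curr_area > 0 else 0.0
    return (curr_area - prev_area) / prev_area


def is_growing(prev_mask, curr_mask, threshold=GROWTH_THRESHOLD):
    """Judge fire growth when the relative area change exceeds the threshold."""
    return area_growth(prev_mask, curr_mask) > threshold


previous = [[0, 1, 0], [0, 1, 0]]   # 2 fire pixels
current = [[0, 1, 1], [0, 1, 0]]    # 3 fire pixels
growing = is_growing(previous, current)
```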
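The text does not give the roundness formula, so the following sketch assumes the common definition $L^2 / (4\pi A)$, which equals 1 for a perfect disc and grows as the shape becomes more complicated, consistent with the behavior described above. For brevity the perimeter is approximated by counting exposed pixel edges rather than by the Laplace operator and chain code used in the paper.

```python
import math


def area_and_perimeter(mask):
    """Area = number of fire pixels; perimeter = number of exposed pixel edges."""
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or not mask[rr][cc]:
                    perimeter += 1
    return area, perimeter


def roundness(mask):
    """Assumed definition: L^2 / (4 * pi * A); larger means a more complex shape."""
    area, perimeter = area_and_perimeter(mask)
    if area == 0:
        return 0.0
    return perimeter ** 2 / (4 * math.pi * area)


compact = [[1, 1], [1, 1]]   # 2x2 block: area 4, perimeter 8
elongated = [[1, 1, 1, 1]]   # 1x4 strip: area 4, perimeter 10
```

With equal areas, the elongated strip yields the larger roundness value, which is the property the feature relies on.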
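A minimal sketch of the contour feature follows. It computes the first 32 discrete Fourier coefficients of the complex boundary sequence and a change measure between two frames. The 1/N normalization and the particular change measure (sum of absolute differences of coefficient magnitudes, skipping coefficient 0) are assumptions for illustration, since the original formulas are not available in the text.

```python
import cmath
import math


def fourier_descriptors(boundary, num_coeffs=32):
    """First num_coeffs DFT coefficients of z_i = x_i + j*y_i for boundary points (x, y)."""
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    count = min(num_coeffs, n)
    return [
        sum(z[i] * cmath.exp(-2j * math.pi * i * w / n) for i in range(n)) / n
        for w in range(count)
    ]


def contour_change(desc_a, desc_b):
    """Assumed change measure between two frames.

    Coefficient 0 is the boundary centroid, so it is skipped to make the
    measure insensitive to a pure translation of the fire region.
    """
    return sum(abs(abs(a) - abs(b)) for a, b in zip(desc_a[1:], desc_b[1:]))


square = [(0, 0), (0, 2), (2, 2), (2, 0)]
shifted = [(x + 5, y + 3) for x, y in square]
stretched = [(0, 0), (0, 4), (2, 4), (2, 0)]

d_square = fourier_descriptors(square)
d_shifted = fourier_descriptors(shifted)
d_stretched = fourier_descriptors(stretched)
```

In a full pipeline the change value would be compared with $T_d$ over a duration $T_m$; both thresholds come from experiments and are not reproduced here.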

Dynamic characteristics of fires
To observe the spread of a forest fire, the dynamic characteristics of the fire features over continuous images are important for fire detection [26]. We define a dynamic characteristic over n continuous images. To ensure real-time fire detection, n should be a relatively small number. In general, the characteristic frequency of flame flicker is about 10 Hz and the recorded video has 30 frames per second. Based on the requirements of real scenes, n is set to 5, so the dynamic characteristics are defined over every 5 consecutive images of fire features. An n × m matrix is therefore constructed for the flame features in the images, where n is the number of consecutive images and m is the number of features per image. Thus, for any forest fire image, there are associated dynamic features, i.e., the mean and mean squared deviation of the image matrix. In the machine learning model of this paper, the above image segmentation information, fire features and fire dynamic features are fed into the model as auxiliary information along with the forest fire images for training.
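The dynamic features can be illustrated with a small example: each row of the matrix holds the features of one frame, and the column-wise statistics summarize the window. The feature columns used here (area and roundness) and the reading of "mean squared deviation" as the population standard deviation are assumptions for this sketch.

```python
from statistics import mean, pstdev


def dynamic_features(feature_rows):
    """Column-wise mean and deviation of an n x m matrix of per-frame fire features.

    feature_rows: n rows (consecutive frames), each with m feature values.
    """
    columns = list(zip(*feature_rows))
    means = [mean(column) for column in columns]
    deviations = [pstdev(column) for column in columns]  # assumed deviation measure
    return means, deviations


# n = 5 consecutive frames, m = 2 hypothetical features: (area, roundness)
window = [
    (10, 1.5),
    (12, 1.5),
    (14, 1.5),
    (16, 1.5),
    (18, 1.5),
]
means, deviations = dynamic_features(window)
```

A steadily growing area with a constant roundness shows up as a non-zero deviation in the first column and a zero deviation in the second.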

The proposed graph neural network
We divide the test dataset into a probe set and a gallery set to evaluate the algorithm for identifying forest fires. Given a probe image and gallery images taken from different viewpoints, the goal of the forest fire recognition model is to robustly determine the visual similarity between the probe and each gallery image. In this paper, we train the model on mini-batches, in which different image pairs are usually judged individually: each pair of images is evaluated separately, so that it is independent of the influence of other image pairs.
Our proposed method is designed to make better use of this information to boost feature learning, as shown in Figure 3. In the algorithm, each node is generated using images from one probe and multiple gallery images as inputs, and it outputs the similarity score of each probe-gallery image pair. During end-to-end training, the deeply learned messages are passed between nodes to update the relational features associated with each node and obtain more accurate similarity score estimates.

Graph representation and node features
In the GNN framework of this paper, a probe image and M gallery images are given to construct an undirected complete graph $G(V, E)$, where $V$ is the set of nodes, each consisting of a probe-gallery image pair. First, we need to estimate the similarity score of each probe-gallery image pair. Generally speaking, the input of any node encodes the relation between its corresponding probe and gallery images. The scheme in this paper acquires the input relational features as shown in Figure 3a. Given a probe image and M gallery images, each input image is fed into the CNN for pairwise relational feature encoding. In this paper we use ResNet-50 [26] as the CNN structure. The last global average pooling features of the two images in ResNet-50 are subtracted element-wise to obtain the pairwise relational features. The pairwise features are processed into difference features $d_i$, which encode the relation between the probe and the i-th gallery image and also serve as the input features of the i-th node on the graph. Since the task on the graph is node classification, a simple approach is to feed the input features of each node into a linear classifier and obtain the output similarity score, without considering the pairwise relationships between nodes. The loss function of the model in this paper takes the cross-entropy form [32]:
$$L = -\sum_{i=1}^{M} \left[ y_i \log f(d_i) + (1 - y_i) \log\left(1 - f(d_i)\right) \right] \qquad (7)$$
where $f(\cdot)$ denotes the linear classifier followed by a sigmoid function [27], and $y_i$ denotes the label of the i-th probe-gallery image pair, with 1 indicating that the probe and the i-th gallery image belong to the same identity. Figure 3 depicts the basic model and the deep message passing implementation of the graph architecture in this paper. The basic model in Figure 3a can be used not only to compute the similarity of probe-gallery image pairs, but also to perform deep message passing and update the relational features of the probe-gallery image pairs. In Figure 3b, in order to deliver more effective messages, the image relational features $d_i$ are first fed into a two-layer message network for feature encoding. Using the similarity scores of the gallery images, the probe-gallery relational features are fused to derive the message passing and feature fusion scheme; the objective function is shown in equation (7).
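As an illustration of the node classification baseline, the sketch below scores each node with a linear classifier followed by a sigmoid and evaluates a binary cross-entropy loss summed over the nodes. The toy difference features, weights and the summed (rather than averaged) form of the loss are assumptions for the example, not values from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def node_scores(features, weights, bias=0.0):
    """Similarity score of each node: linear classifier followed by a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(weights, feature)) + bias)
        for feature in features
    ]


def cross_entropy(scores, labels, eps=1e-12):
    """Binary cross-entropy summed over nodes; label 1 means same identity."""
    total = 0.0
    for score, label in zip(scores, labels):
        score = min(max(score, eps), 1.0 - eps)  # guard against log(0)
        total -= label * math.log(score) + (1 - label) * math.log(1.0 - score)
    return total


# Toy difference features for M = 3 probe-gallery pairs (hypothetical values)
features = [[0.1, 0.2], [2.0, 1.5], [0.0, 0.0]]
weights = [-1.0, -1.0]  # small differences should give high similarity
scores = node_scores(features, weights, bias=1.0)
loss = cross_entropy(scores, labels=[1, 0, 1])
```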

Similarity guidance
Clearly, the simple node classification model (equation (7)) ignores the valuable information between different probe-gallery pairs. To utilize this important information, edges $E$ must be created on the graph $G$. $G$ is fully connected, and $E$ denotes the set of relationships between different probe-gallery pairs, where $W_{ij}$ is a scalar edge weight that denotes the importance of the relationship between node i and node j. The relational features of the nodes are refined iteratively over the edges, where t is the number of iterations. The refined relational feature then replaces the relational feature $d_i$ in equation (7) for loss function calculation and GNN training.
The training loss in equation (11) can be back-propagated to update the framework structure and the model.
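Since the edge-weight and update formulas are not available in the text, the following is only a sketch of one plausible similarity-guided update, assuming softmax-normalized edge weights and a convex combination of each node's own feature with the weighted features of the other nodes. The raw node features are used as messages here, whereas the paper first encodes them with a two-layer message network; `alpha` is assumed to play the role of the weighting parameter α reported in the setup.

```python
import math


def edge_weights(similarity):
    """Row-wise softmax over off-diagonal similarity scores; W[i][i] stays 0."""
    n = len(similarity)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        exps = {j: math.exp(similarity[i][j]) for j in range(n) if j != i}
        total = sum(exps.values())
        for j, value in exps.items():
            weights[i][j] = value / total
    return weights


def propagate(features, weights, alpha=0.9):
    """One update step: d_i <- (1 - alpha) * d_i + alpha * sum_j W_ij * d_j."""
    n, dim = len(features), len(features[0])
    updated = []
    for i in range(n):
        fused = [
            sum(weights[i][j] * features[j][k] for j in range(n))
            for k in range(dim)
        ]
        updated.append(
            [(1 - alpha) * features[i][k] + alpha * fused[k] for k in range(dim)]
        )
    return updated


similarity = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W = edge_weights(similarity)
refined = propagate([[1.0], [0.0], [0.0]], W, alpha=0.5)
```

Repeating `propagate` t times corresponds to t iterations of refinement; the refined features would then be passed to the classifier and loss.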

Forest fire data production
Open-source datasets of forest fire images are few and of low clarity, so a dedicated forest fire dataset needs to be produced. In this paper, we produce a dataset by collecting a large number of forest fire videos from relevant websites.

Related Technologies
Crawling is a technique that automatically obtains information or resources from the Internet for a given purpose. This paper uses a crawling technique based on the Python language; the toolkits used for crawling are Requests and Beautiful Soup [17].
OpenCV [31] is an open-source library of computer vision functions (API), and this paper uses its Python interface. CascadeClassifier is the cascade classifier for target recognition in OpenCV; it is used by importing a specific classifier file, for example one based on Local Binary Patterns (LBP), to recognize targets in images [23].

Concrete implementation
Producing this dataset requires crawling forest fire image websites and then cutting out and saving the fire part of each image. The most important goal in this process is to reduce the difficulty of manual rechecking while ensuring quality and speed, which means that the output of the program may contain only a very small number of non-fire images; a filtering module is therefore designed in this paper.
Since single-threaded data writing, data analysis, and waiting for server responses take more time and do not make full use of the bandwidth, this paper uses a multi-threaded approach to crawl, so that the crawling speed depends mainly on the bandwidth limit.
All the fire images saved by the crawler are then processed: the cascade classifier of OpenCV is used to locate the fire part in each saved image, and the picture is cut and saved. Because the accuracy is not high enough when using only OpenCV for fire recognition, a large number of images that do not contain fire will be generated when there are enough original images, so the Dlib module is used to filter and remove the unqualified images.
The specific implementation consists of three parts: the first is image collection using crawlers; the second is the fire recognition module, which uses the OpenCV module to crop the fire regions; and the third is the screening module, which uses the Dlib module to screen the cropped images and remove the unqualified ones. The specific method is shown in Figure 4, and the content of the finished dataset is shown in Figure 5.
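A multi-threaded crawl of this kind might be organized as in the sketch below. To keep the example self-contained, it uses only the standard library: `html.parser` stands in for Beautiful Soup, and the page download is a pluggable `fetch` callable (hypothetical name) instead of a hard-wired Requests call. The page names and HTML snippets are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.links.append(src)


def extract_image_links(html):
    """Return the image URLs found in one HTML document."""
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.links


def crawl(page_urls, fetch, max_workers=8):
    """Download pages concurrently and gather their image links in page order.

    fetch: callable mapping a URL to the page's HTML text.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch, page_urls))
    links = []
    for html in pages:
        links.extend(extract_image_links(html))
    return links


# Offline stand-in for the network so the example runs without Internet access.
fake_pages = {
    "page-a": '<html><img src="fire1.jpg"><img src="fire2.jpg"></html>',
    "page-b": '<html><p>no image</p><img src="fire3.jpg"></html>',
}
links = crawl(["page-a", "page-b"], fetch=fake_pages.__getitem__)
```

With Requests installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`; threads help here because most of the time is spent waiting for server responses.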

Data set
In this paper, we use the technique described in the Forest fire data production section to collect a total of 2826 forest fire images from the Web, including images at the beginning of the fire and images while it is burning, together with an additional 932 non-forest-fire images.
The xBD dataset [29] is one of the public datasets of annotated high-resolution satellite images [30]. This natural disaster image dataset, updated by MIT, encompasses 19 disaster events and contains 22068 images with a resolution of 1024 × 1024, in which each building has an identifier. We use only the forest fire data in this dataset.

Setup
The models in this paper are implemented using the Keras and TensorFlow frameworks. The running environment is as follows: the operating system is Ubuntu 19.04, the GPU is a GeForce GTX 1080Ti, and the machine has an Intel i7-10500U processor, 16 GB of RAM, and a 1 TB hard disk.
In this paper, we use two deep learning models, ResNet and DenseNet [28], for comparison, and set the learning rate to 0.01 and the batch size to 64.
Our proposed GNN is based on ResNet-50 for forest fire recognition. All input images are resized to 256×128. The base CNN model is first pre-trained on all datasets with an initial learning rate of 0.01; the learning rate is reduced by a factor of 10 after 50 epochs and then kept fixed for another 50 training epochs. The weights of the linear classifier used to obtain the image similarity are initialized with the weights of the linear classifier trained in the base model training phase, and the model is optimized using Adam [23] with the weighting parameter α set to 0.9.
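The step schedule described for pre-training can be written as a small function. This is a sketch of the schedule only; the function and parameter names are illustrative and not taken from the authors' code.

```python
def learning_rate(epoch, initial_lr=0.01, drop_epoch=50, factor=10.0):
    """Step schedule: initial_lr for the first drop_epoch epochs, then initial_lr / factor."""
    return initial_lr if epoch < drop_epoch else initial_lr / factor


# 100 epochs in total: 50 at 0.01 followed by 50 at 0.001
schedule = [learning_rate(epoch) for epoch in range(100)]
```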

Results
The number of parameters, training time and test results of each model are shown in Table 1. As Table 1 shows, the training time of our GNN model on every dataset is well below that of DenseNet, because the GNN in this paper can effectively use images from different views or multiple sources, which lets the model converge quickly through the accumulation of information. While achieving a training time comparable to that of ResNet, our model has fewer parameters than ResNet, and the GNN has higher accuracy than ResNet because we first perform dynamic feature extraction on the fire images, which lets the model learn the deep information of the images quickly, and because our GNN model also exploits the similarity of fire images from different sources or views. Meanwhile, the total number of parameters of our design is comparatively the smallest, reducing the memory overhead. Our model achieves good accuracy on different training sets, so the model does not overfit, and the scheme of this paper has strong generalization ability and robustness to effectively identify fire images from different viewpoints, which lets it meet the fire monitoring needs of various scenarios. The loss curve plotted in Figure 6 clearly shows that our method converges faster than the baseline schemes during training on the xBD dataset: the training process is very stable, and the model is trained in 27k steps, compared to the 32k steps needed by the latest DenseNet. The accuracy of our scheme increases almost linearly during training, and as shown in the right panel of Figure 6, the training accuracy is consistently the best, even with some slight noise, which is within the allowable range.
In general, it can be seen from Figure 6 that the method in this paper converges quickly and stably. This can be attributed to the dynamic feature settings, which can be learnt adaptively, and to the similarity guidance mechanism, which deals with the problem of heterogeneous data from multiple sources during the training of the model and makes training stable, avoiding the vanishing gradient problem of overly deep models. The unstable training accuracy and poor convergence of ResNet show that the design of the GNN framework in this paper mitigates overfitting, and that our framework has better generalization ability than ResNet.
To justify the use of dynamic features, a series of experiments were conducted on the fire dataset. One feature was discarded in each evaluation test, including image segmentation, boundary chain code with contour, and roundness. As can be seen from Table 2, discarding any of the features reduces the accuracy of the algorithm. If the norm is removed, the number of false alarms increases considerably. In conclusion, each feature is essential. Combined, they yield a high recognition rate and significantly reduce the false positive rate, improving the robustness of the recognition.
Table 2 (excerpt). Roundness is not used: 94.1, 1.24, 92.04
The dynamic features accurately describe the physical and optical properties of the forest fire. Thus, the method in this paper has a lower false alarm rate than the traditional color-space-based RGB model. We also compare the performance of the dynamic features of forest fires with that of the RGB model; the experimental results are shown in Table 3.
Table 3 (excerpt). False-Positive: 0.12, 13.8
In Table 3, the recognition accuracy of the dynamic features is higher than that of the RGB model for positive samples. Likewise, the recognition accuracy of the RGB model is lower than that of the dynamic features for the negative samples, i.e., the false positive rate of the RGB model is higher than that of the dynamic features.
We further tested the flame luminosity and smoke comparison results of our proposed dynamic features, as shown in Figure 7. To improve the accuracy of identification and reduce the number of false positives, our dynamic feature approach provides more physical features for identification; because the RGB color space is converted into multiple single-spectrum spaces, the temperature distribution of the detected feature blocks can be quickly estimated using the two-color pyrometry method. In Figure 7, the left panel is the identified region and the middle panel is the temperature distribution of the estimated dynamic features, which allows the model to learn deep information about the forest fire images. The right panel shows the RGB black-and-white pattern; the temperature distribution mapped by our method is clearly more realistic. However, the method proposed in this paper is not 100% effective either. It has difficulty handling scenes of smoldering fires, which do not produce bright light and are accompanied by smoke interference. The GNN node classification was visualized on the collected dataset, as shown in Figure 8. Each point corresponds to a node on the graph and each color corresponds to its node class. It is observed that the nodes of certain classes are clustered, while the nodes of other classes are separated. For example, the nodes of Class-1 (magenta) and of Class-9 (green) each cluster together, so points of the same color are close to each other but far from the other classes. This results from the similarity guidance based on the dynamic features of the images from different viewpoints.
The node relational features between different-viewpoint images and gallery images allow an approximate classification of each image. Some points of different colors overlap, indicating that these relational features are effective for updating the information of other nodes; they also take into account the similarity of images from different sources and update the dynamic features of the image gallery accordingly to build different types of graph nodes. Points of different colors are also well separated, indicating that our framework can better learn the correlations and differences of images from different sources or perspectives.
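For readers who want to reproduce this kind of comparison, the sketch below shows how recognition accuracy and false positive rate are typically computed from binary predictions. The sample predictions and labels are made up for illustration and are unrelated to the values in Tables 2 and 3.

```python
def recognition_metrics(predictions, labels):
    """Accuracy, false positive rate and recall for binary fire / non-fire labels."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up predictions for five images: 1 = fire, 0 = non-fire
metrics = recognition_metrics(predictions=[1, 1, 0, 0, 1], labels=[1, 0, 0, 0, 1])
```

In an ablation study, the same function would be applied to the predictions obtained with each feature removed in turn.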

Conclusion
This paper first proposes a graph neural network based on the similarity of forest fire images from different viewpoints to estimate the similarity between images. A dynamic feature approach for segmentable fire regions in images is also proposed, which improves the robustness of the network in discriminating forest fires by reducing complex preprocessing and effectively extracting key features from the images. We also contribute a dataset of forest fire images that we have produced. The experimental results of our proposed method and several deep learning methods on the fire datasets show that our method is adaptable to different fire scenarios, with high generalization and interference resistance.
In the future, we plan to design fire identification and monitoring systems with dynamic viewpoints, for example, deploying drones to regularly patrol and monitor open fires in forest areas.

Conflicts of Interest:
The authors declare no conflict of interest.