A New Visual Attention Model Designed for SAR Images Based on Texture Saliency

Object detection in synthetic aperture radar (SAR) images is a fundamental but challenging problem in satellite image interpretation; it plays an important role in a wide range of applications and has received significant attention in recent years. The human visual system detects targets through visual saliency with extraordinary speed and reliability, but computational visual modeling of SAR image scenes remains a challenge. This paper designs a visual attention model for SAR images. First, we propose a novel approach for computing the local texture coarseness of the input image, from which the model constructs the corresponding feature maps. Next, a new feature-fusion mechanism replaces the linear additive mechanism of classical models to obtain the final saliency map. Moreover, by taking into account the gray values of the focus of attention (FOA) in all feature maps, the model chooses the best saliency representation; filtering and threshold segmentation of the saliency maps then extract the salient regions accurately through a multi-scale competition strategy, completing visual saliency detection in SAR images. Finally, the paper gives the framework based on the classical ITTI model. Several types of satellite data, such as TerraSAR-X (TS-X) and Radarsat-2, are used to evaluate the performance of the visual models. The results show that our model outperforms classical models: it reduces the false alarms caused by speckle noise, and its detection speed is improved by 25% to 45%.


Introduction
Synthetic aperture radar (SAR), a kind of advanced active microwave sensor with all-weather, all-day, multi-polarization advantages, has received increasing attention from countries seeking remote sensing detection technology [1]. As the basis of classification and identification, target detection in SAR images is an important aspect of SAR applications [2]. SAR images are characterized by low contrast, a low signal-to-noise ratio, and a limited number of gray levels. These characteristics make targets in SAR images subject to noise interference and reduce the contrast between a target and its surrounding environment, which complicates target detection. Moreover, with the successful launch of TerraSAR-X (TS-X), Radarsat-2, and other next-generation SAR sensors, SAR is gradually evolving toward higher resolution and larger swath widths. The quality of SAR images is approaching that of optical images, and the features of SAR images are becoming complex. Traditional detection systems cannot interpret and analyze the complex features of high-resolution SAR images in a timely and effective manner [3,4]. In short, because of the contradiction between the large amount of information in SAR images and limited computer processing power, target detection in SAR images is currently a serious research problem [5].
However, when facing complex scenarios, the human visual system can quickly focus on several interesting targets, a capability known as the visual attention mechanism [6,7]. Some scholars have made great progress in modeling human visual intelligence, adopting the visual attention mechanism to select useful information from rich and complex data to complete target detection, which greatly improves processing efficiency. Consequently, researchers have tried to formulate mathematical models that simulate the human visual system.
At present, the study of visual attention models covers two main directions. (1) Data-driven visual models, which can be divided into models based on the image time domain and models based on the image frequency domain. The model proposed by Itti was the first to simulate the "attention" concept of the human eye at the mathematical level [8] and established a visual attention model in the time domain. Frequency-domain models transfer image processing from the time domain to the frequency domain: for example, the spectral residual model presented by Hou et al. [9] uses the amplitude spectrum of the fast Fourier transform as the saliency representation of the image and then returns to the time domain to obtain the saliency map, and the model proposed by Yu, which combines the discrete cosine transform [10], is also widely used. Although such models obtain salient regions in a simple way, their saliency maps contain many false alarms. (2) Purpose-driven visual models, the most representative of which are the function model based on psychological thresholds proposed by Itti [11,12] and the model proposed by Oliva et al. based on Bayesian learning [13]. The shortcoming of such models is their lack of self-adaptability.
Currently, the commonly used visual models are the ITTI model, the AIM model, and the spectral residual model [14]. The ITTI attention model in particular has been widely used in computer vision, because it imitates the formation of bottom-up saliency in the human visual system to realize saliency detection in images.
Although computer vision research has made great progress in recent years and produced a series of achievements, the human eye is still more efficient at processing and analyzing information in realistic scenes. Most existing visual attention models are designed for natural scene images, for which they can obtain saliency maps and extract regions of interest. However, there are significant differences between SAR images and natural scene images. For instance, in SAR images the characteristics of speckle noise in the background are similar to those of targets [15]. This makes it difficult for classical visual models to obtain accurate results when extracting object regions from such SAR images.
The difficulties mainly include the following aspects. (1) Extraction of low-level early visual features. The features selected in the classical ITTI model are local features such as brightness, color, and orientation, while the global features of the target regions are not considered. Therefore, the model cannot accurately handle regions of interest whose local features are not obvious in the detected image. In particular, texture, shape, and other features that are important for targets in SAR images are not considered in the ITTI model, which also explains its poor performance. (2) Strategy of feature fusion. In the ITTI model, the fusion strategy for feature maps is a linear combination.
The model typically obtains the total saliency map by directly adding the feature maps, ignoring the priority relationships among different features. This weakens any dominant feature map in the merging process and thus leads to missed detections of target areas. In conclusion, the ITTI model is not well adapted to the extraction of salient regions from SAR images.
Motivated by this, we propose an improved visual attention model for SAR images based on texture saliency. First, we calculate and extract texture and other features that describe the SAR image well; in this step, we design a new calculation method for local texture coarseness. Then we construct the corresponding feature saliency maps. Furthermore, a new feature-fusion mechanism replaces the linear additive mechanism of classical models to obtain the overall saliency map, and a measurement method for texture saliency is given.
Finally, the gray-scale values of the focus of attention in the feature saliency maps are taken into account, and our model chooses the best saliency representation. Through a multi-scale competition strategy, filtering and threshold segmentation of the saliency maps accurately select the salient regions, completing visual saliency detection in SAR images.
The paper is organized as follows. Section 1 reviews the existing computational models of visual attention with a brief description of their strengths and shortcomings, and its second part describes the motivation for the present work. Section 2 describes the proposed model by means of pseudo-code and a graphical illustration. Section 3 analyzes and evaluates the performance of our local texture coarseness calculation method through mathematical derivation and experiments. Section 4 describes the experiments carried out to validate the proposed method on the task of salient region detection. Finally, Section 5 concludes the work, highlighting its current shortcomings and briefly discussing future directions.

Measurement of Texture Saliency
Under normal circumstances, local features are used to distinguish target pixels from neighborhood pixels, while global features can measure the saliency of similar areas across the image from a global perspective, further highlighting the target areas. Considering this, this paper brings the texture and shape features, which play a dominant role in SAR images, into the visual feature extraction stage. Humans possess a refined texture-sensing mechanism that can distinguish fine texture differences; the features humans use to distinguish textures include coarseness, contrast, complexity, and orientation.
Texture is one of the important properties used to identify targets and regions of interest [16]. Texture features exist on the surface of every object and contain important information about the object's surface structure and its connection to the surrounding environment. Texture reflects the visual property of homogeneity and is independent of color or brightness in images [17]. Therefore, a visual attention model based on texture saliency is of great significance.
In this paper, four feature operators are designed: the local coarseness, the standard deviation, the orientation, and the global contrast of the SAR image.

Local texture coarseness
Tamura et al. proposed the Tamura texture features based on psychological research into human visual perception of texture, and these features have been widely used in image recognition and image retrieval in recent years [18,19]. The Tamura texture features include six properties that correspond to texture attributes from a psychological point of view: coarseness, contrast, orientation, linearity, regularity, and roughness. Among them, coarseness is the most basic and important; in a narrow sense, texture is coarseness.
Coarseness reflects the granularity of a texture: when two texture patterns differ only in the dimension of their elements, the pattern with the larger element dimension and fewer repeating units is the coarser one [20]. The calculation of Tamura texture coarseness can be divided into the following steps.

(1) Calculate the average intensity of the pixels in an active window of size 2^k × 2^k around each pixel. Assuming I(i,j) is the input image, the average intensity is

A_k(x,y) = Σ_{i=x−2^{k−1}}^{x+2^{k−1}−1} Σ_{j=y−2^{k−1}}^{y+2^{k−1}−1} I(i,j) / 2^{2k},  k = 0, 1, 2, ..., Lmax,  (1)

where Lmax is the maximum window scale.

(2) For each pixel, calculate the average intensity difference between the non-overlapping windows in the horizontal and vertical directions separately:

E_{k,h}(x,y) = |A_k(x+2^{k−1}, y) − A_k(x−2^{k−1}, y)|,  E_{k,v}(x,y) = |A_k(x, y+2^{k−1}) − A_k(x, y−2^{k−1})|.  (2)

(3) Take the scale that maximizes the average intensity difference as the optimum size at each pixel:

Z_opt(x,y) = 2^{k_max},  k_max = arg max_k max(E_{k,h}(x,y), E_{k,v}(x,y)),  (3)

where Z_opt is the optimum size at the current pixel. If k > k_max and E_k > t·E_max, then k_max = k; in the original algorithm, t takes the empirical value 0.9.

(4) Calculate the mean value of Z_opt(x,y) over all pixels of the M×N image to obtain the coarseness of the input image:

F_crs = (1 / (M × N)) Σ_x Σ_y Z_opt(x,y).  (4)

It can be seen that Tamura coarseness measures texture coarseness from a global perspective: it can only extract coarseness from an entire image or a larger image block, and cannot accurately measure local texture coarseness. Owing to this limitation of Tamura's algorithm, we propose a new local texture coarseness algorithm with stronger noise robustness.
The principle of Tamura's algorithm is illustrated in Figure 1: the larger the element dimension and the fewer the repeating units, the greater the texture coarseness.
Normally, a complex texture is composed of simple texture elements [21]. However, the texture element is still a vague concept, and there is no good mathematical model to describe it [22]. In this section, we construct mathematical models to analyze these problems. A typical texture element in an image is an image block with uniform gray value; in the limit, the block is an isolated pixel. The image in Figure 1(a) can be considered as two texture elements with different dimensions and gray values, with dimensions d and D. If the image contains only one texture element, the optimal size is Z_opt(x,y), and the output is shown in Figure 1(c). When M = N = 1, we have F_crs = Z_opt. Therefore, the optimal size Z_opt(x,y) can be used to compute the local texture coarseness at pixel (x,y); the output of Z_opt(x,y) should be as in Figure 1(d).

The algorithm proceeds as follows:

Step 1: Set the size of the active windows to 3×3.
Step 2: For k = 1 to Lmax, calculate the deviation scale of the two windows (Lb).
Step 3: Calculate the eccentricity of the two windows.
Step 4: For each pixel, calculate the average intensity differences E_{k,h} and E_{k,v} between the non-overlapping windows in the horizontal and vertical directions.
Step 5: Calculate the optimum size Z_opt at each pixel.
Step 6: Determine the parameter k_max, distinguishing three situations: pixels on a boundary, pixels within a larger-dimension texture element, and pixels within a smaller-dimension texture element. For k = 0, compute the mean of the local non-zero maxima of E_0 over all pixels, denoted T1; pixels exceeding this level are points on the boundary of a texture element, while the other two conditions identify points within the larger-dimension and the smaller-dimension elements, respectively.
Step 7: Calculate the local coarseness of each pixel according to its optimal size; to increase contrast, a power transformation is applied to Z_opt.

Figure (a) shows the windows used to take the difference in Tamura's algorithm, which have the same size. In our model, the two windows that take the difference are eccentric overlapping windows: there is a deviation in the window sizes, as shown in Figure (b).
Our model thus completes the measurement of local texture coarseness: the local coarseness feature F_lc(x,y) is the power-transformed optimal size Z_opt(x,y). After normalization and saliency treatment of the feature matrix, the feature map is obtained as follows. In this and later formulas, N(·) is the normalization operation: each feature map is normalized to the range [0, N], where N is any gray value within the gray range of the input image; this step reduces the saliency of the background and yields the normalized map F'. We then multiply F' by a coefficient:

S = F' × (M − M̄)²,

where M is the global maximum of F' and M̄ is the average gray value of the remaining pixels of F' (those whose gray value is not M). S is the initial saliency map of the current feature.
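The normalization N(·) and the coefficient weighting described above can be sketched as follows; reading the garbled coefficient formula as the ITTI-style (M − M̄)² weighting, and taking N = 255 for the gray range, are assumptions.

```python
import numpy as np

def normalize_map(F, N=255.0):
    """N(.): rescale a feature map to the range [0, N]."""
    F = np.asarray(F, dtype=float)
    rng = F.max() - F.min()
    return (F - F.min()) / rng * N if rng > 0 else np.zeros_like(F)

def saliency_from_feature(F, N=255.0):
    """S = N(F) * (M - Mbar)^2, where M is the global maximum of the
    normalized map and Mbar the mean gray value of the remaining pixels
    (assumed reading of the text, following ITTI's N(.) operator)."""
    Fp = normalize_map(F, N)
    M = Fp.max()
    rest = Fp[Fp != M]
    Mbar = rest.mean() if rest.size else 0.0
    return Fp * (M - Mbar) ** 2
```

A map with one isolated peak keeps a large (M − M̄)² coefficient, while a map of near-uniform responses is suppressed, which is the intended background-inhibition effect.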

Standard deviation
Normally, the standard deviation effectively reflects local features of images, such as edge and shape features. SAR images contain rich edge information; extracting the standard deviation of the image enhances target edge and contour information and helps separate targets from the background [23]. The standard deviation also preserves the difference between targets and background at low computational complexity.
The standard deviation is calculated by sliding a filter over the detected image. The size of the sliding window trades off the time consumed by the model against the effectiveness of background saliency suppression. Assuming I(i,j) is the input image, the filter size is N×N, and the central coordinate of the filter is (m,n), the standard deviation is derived as follows. First, calculate the average value of the local area in the image,

m_x(m,n) = (1 / N²) Σ_{i=m−M}^{m+M} Σ_{j=n−M}^{n+M} I(i,j),

where M is a parameter related to the filter size, M = 0.5 × (N − 1). The standard deviation F_std is then

F_std(m,n) = sqrt( (1 / N²) Σ_{i=m−M}^{m+M} Σ_{j=n−M}^{n+M} (I(i,j) − m_x(m,n))² ).

Pixels on the boundary of target areas differ more from their neighborhood than pixels inside homogeneous regions, so their standard deviation values are larger and appear as highlights in the feature map. From the perspective of image comprehension, the F_std values of all pixels constitute the standard deviation feature matrix. After normalization and saliency treatment of this feature matrix, the feature map of standard deviation is obtained:

S_std = N(F_std) × (M − M̄)².
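The two-step computation above can be sketched with the algebraically equivalent identity std² = E[I²] − E[I]², which avoids an explicit second pass over each window; the 5×5 window is an assumed example size.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std(img, N=5):
    """Local standard deviation in an N x N sliding window:
    Fstd = sqrt(E[I^2] - E[I]^2), equivalent to the two-step formula."""
    img = np.asarray(img, dtype=float)
    m = uniform_filter(img, size=N)           # local mean m_x
    m2 = uniform_filter(img * img, size=N)    # local mean of squares
    # clamp tiny negative values caused by floating-point cancellation
    return np.sqrt(np.maximum(m2 - m * m, 0.0))
```

On a flat region the result is zero, while windows straddling an edge produce the bright boundary responses described in the text.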

Orientation
The orientation feature is usually used to represent targets with different directions [24]. In our model, the orientation features of the input image are extracted by Gabor filters of the form

G(x,y) = exp(−(x'² / (2α²) + y'² / (2β²))) · cos(2π x'/λ),
x' = x cos θ_k + y sin θ_k,  y' = −x sin θ_k + y cos θ_k,

where θ_k is the orientation of the sine wave, λ is its wavelength, and α and β are the standard deviations of the Gaussian function along the X-axis and Y-axis, respectively.
The orientation of the sine wave, θ_k, is obtained from

θ_k = kπ/n,  k = 0, 1, ..., n − 1.

Since the orientations of the sine wave are periodic, our model selects four orientations (n = 4): 0°, 45°, 90°, and 135°. After the remaining parameters are calculated, they are substituted into the filter formula, giving Gabor filters in four orientations and hence four orientation feature maps: Fori(0), Fori(45), Fori(90), and Fori(135).

Global contrast

Assuming I(i,j) is the input image of size M×N, the global contrast feature value val(i,j) measures the difference between the current pixel and the whole image. The binarization result of the feature is obtained by the criterion

g(i,j) = val(i,j) if val(i,j) > γ, and g(i,j) = 0 otherwise,

where g(i,j) is the final global contrast feature value of the current pixel and γ is an empirical parameter associated with the grayscale. Finally, the result is normalized.
S_global = N(g) × (M − M̄)²,

where S_global represents the feature map of global contrast.
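A minimal sketch of the four-orientation Gabor bank described above; the kernel size and the values of λ, α, and β are assumptions, since the text leaves the parameter calculation unspecified.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(theta, lam=8.0, alpha=3.0, beta=3.0, size=15):
    """Even (cosine) Gabor kernel at orientation theta.

    alpha/beta are the Gaussian std devs along the rotated x/y axes;
    all parameter values here are placeholder assumptions.
    """
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 / (2 * alpha ** 2) + yr ** 2 / (2 * beta ** 2)))
    return g * np.cos(2 * np.pi * xr / lam)

def orientation_maps(img, n=4):
    """Fori(theta_k) for theta_k = k*pi/n, i.e. 0, 45, 90, 135 degrees."""
    img = np.asarray(img, dtype=float)
    return {45 * k: np.abs(convolve(img, gabor_kernel(k * np.pi / n)))
            for k in range(n)}
```

Vertical stripes (intensity varying along x) should excite the 0° filter far more than the 90° one, which is the selectivity the orientation channel relies on.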

Calculation of Saliency Map
In most cases, visual models end with a saliency map that is a synthesis of all the feature maps; the different feature maps correspond to different channels of "attention" [21].
The magnitude of the responses of the feature maps over salient regions differs considerably. Some feature maps contain more than one salient region with strong responses, while others may include only one salient region with a relatively weak response. Therefore, feature fusion needs to be based on the significance levels of the features rather than on simple linear addition.
From the features extracted from the SAR image in the previous section, the feature maps are obtained. The multi-feature fusion strategy used in our model is as follows. First, the feature maps of local coarseness, standard deviation, and orientation are combined by linear addition followed by normalization; this step enhances the saliency of the salient areas in the image.
Then the inconspicuous regions are eliminated by multiplying with the global contrast feature map, which at the same time strengthens the saliency of regions that appear in both the local feature maps and the global feature map. We calculate the weighted sum of the local feature maps, normalize it to obtain a coefficient map, and multiply it by the global feature map; the total saliency map is then

S = c × N(S_lc + S_std + S_ori) × S_global,

where the normalized sum acts as a sample weight and c is an empirical adjustment parameter ranging between 1.5 and 2.2.
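The fusion strategy admits the following sketch; treating the normalized sum of the three local maps as the "sample weight" that multiplies the global-contrast map point-wise is one plausible reading of the garbled equation.

```python
import numpy as np

def fuse(S_lc, S_std, S_ori, S_global, c=2.0):
    """Multiplicative fusion (assumed reading of the text):
    add and normalize the local maps, scale by the empirical
    parameter c in [1.5, 2.2], then gate by the global map so that
    regions not salient globally are suppressed."""
    local = S_lc + S_std + S_ori
    rng = local.max() - local.min()
    local = (local - local.min()) / rng if rng > 0 else local * 0.0
    return c * local * S_global
```

Unlike plain addition, a peak present only in the local maps but absent from the global-contrast map is zeroed out, which is exactly the false-alarm suppression the text motivates.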

Saliency Detection
Our model optimizes the competition strategy of the classical ITTI model and uses multi-scale segmentation of the saliency map to detect and extract salient regions. The focus of attention (FOA) is the focus of visual attention; in general, it is the pixel with the maximum gray value in the saliency map S. If several pixels share the maximum gray value, our model mimics the way the human visual system handles multiple regions of interest and regards the region nearest to the center of the image as the most significant area of visual attention. In this situation, the FOA is calculated as follows.
First, determine the distance from the center of the image to each candidate pixel:

d(x,y) = sqrt((x − x0)² + (y − y0)²),

where (x0, y0) are the coordinates of the center of the input image and (x,y) are the coordinates of a candidate FOA pixel. Finally, the FOA is the maximum-gray-value candidate that minimizes d(x,y).
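The FOA selection above (maximum gray value, ties broken by distance to the image center) can be sketched as:

```python
import numpy as np

def focus_of_attention(S):
    """FOA: pixel with the maximum gray value in the saliency map;
    ties are broken by taking the candidate nearest the image center."""
    S = np.asarray(S, dtype=float)
    ys, xs = np.where(S == S.max())                  # all maximal pixels
    y0, x0 = (S.shape[0] - 1) / 2.0, (S.shape[1] - 1) / 2.0
    d = np.sqrt((xs - x0) ** 2 + (ys - y0) ** 2)     # distance to center
    i = int(np.argmin(d))
    return int(ys[i]), int(xs[i])
```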

Acquisition of binarization template
According to the method for determining the FOA, our model calculates the FOA of the saliency map S, obtains the four pixels corresponding to this FOA point in the four feature maps, and takes the feature map with the maximum gray value among the four as the next saliency map S' for detecting salient regions; the binarization operation is then carried out using the FOA of S' as the center. In the binarization operation, global threshold segmentation is realized by the traditional Otsu method. The judgment criterion of binarization is

B(x,y) = 1 if S'(x,y) ≥ T0, and B(x,y) = 0 otherwise,

where sVal is the gray value of the FOA pixel in S', T0 is the threshold for image segmentation, and B(x,y) is the binarization result.
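A sketch of the binarization step, with Otsu's threshold implemented directly in NumPy; reading the criterion as B = 1 where S' ≥ T0 is an assumption.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Classic Otsu: the threshold maximizing between-class variance."""
    hist, edges = np.histogram(np.asarray(img).ravel(), bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)                       # class-0 probability
    mu = np.cumsum(p * centers)             # cumulative mean
    mu_t = mu[-1]                           # global mean
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(sigma_b)]

def binarize(S_prime):
    """B(x, y) = 1 where S'(x, y) >= T0 (assumed reading of the criterion)."""
    T0 = otsu_threshold(S_prime)
    return (S_prime >= T0).astype(np.uint8)
```

On a bimodal map the threshold falls between the two modes, so targets and background separate cleanly.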
The model performs the following two steps on the binarization results. First, Gaussian filtering is applied to the binary image; the filter parameters are determined from prior knowledge of the targets, and the filter size in this article is set to the estimated pixel size of the actual target in the image. Second, our model judges the regions remaining after Gaussian filtering.
Assume Num' is the number of pixels in the region to be detected and N'×M' is the size of the input image. The proportional parameter used in the judgment, ratio, is calculated as

ratio = Num' / (N' × M').  (23)

Let β be the ratio between the estimated pixel count of an actual target and the total number of pixels in the image. The criterion is: if ratio is no larger than β, the current region belongs to the target area; otherwise, it belongs to the background area. Finally, the binary image judged to be the target area is saved.

The multi-scale acquisition of salient regions
To prevent the search for the next salient region from entering an infinite loop, an "inhibition of return" (IOR) operation is performed. Specifically, once the binarization template of a salient region has been obtained, that region is set to zero in the saliency map, and the search for the next FOA begins; this repeats until all salient regions have been retrieved. The detection result for the salient regions is obtained through these operations.
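The inhibition-of-return loop, together with the area criterion of Eq. (23), can be sketched as follows; the fixed 5×5 window around the FOA and the value of β are placeholders, since in the full model the region comes from the binarization template.

```python
import numpy as np

def detect_salient_regions(S, max_regions=5, beta=0.2):
    """Sketch of the competition with inhibition of return (IOR):
    find the FOA, judge its region with ratio = Num' / (N' x M'),
    zero the region in S, and repeat until S is exhausted."""
    S = np.asarray(S, dtype=float).copy()
    regions, half = [], 2                                # 5x5 stand-in region
    for _ in range(max_regions):
        if S.max() <= 0:
            break
        y, x = np.unravel_index(np.argmax(S), S.shape)   # current FOA
        y0, y1 = max(0, y - half), min(S.shape[0], y + half + 1)
        x0, x1 = max(0, x - half), min(S.shape[1], x + half + 1)
        num = int((S[y0:y1, x0:x1] > 0).sum())
        ratio = num / S.size                             # Eq. (23)
        if ratio <= beta:                                # target-area criterion
            regions.append((int(y), int(x)))
        S[y0:y1, x0:x1] = 0                              # inhibition of return
    return regions
```

Because each accepted region is zeroed before the next search, the same peak is never revisited, which is exactly what IOR guarantees.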
The optimization and improvement of the competitive strategy in our model are mainly reflected in the following aspects.

(1) By comparing and analyzing the gray value of the FOA in each saliency map, the best saliency representation at every position is selected;

Evaluation of Texture Saliency Extraction Algorithm
A new method is proposed to measure the texture saliency.In this section, the extraction algorithm is discussed from the following two aspects.

Complexity Analysis of Algorithm
In terms of the operation process, this algorithm does not need to construct a Gaussian pyramid. The ITTI model uses a nine-level pyramid to simulate the human visual attention system and realize a multi-scale representation of images. However, ships and other targets occupy relatively small areas of the detected SAR image, so excessive down-sampling blurs them and discards target information. In terms of time complexity, our algorithm handles pixels in two categories: the pixels of an image can be divided into points on texture boundaries and points inside texture elements.
For an interior point of a texture element, when the current window size k is smaller than the texture dimension, E_k ≈ 0; when k becomes larger than the dimension of the texture element, clearly E_k ≫ 0, the maximum value is E_max, and k_max = k. When the texture element is very large, all E_k values are small and similar; in this case k_max = Lmax, which is judged using the constraints Numel(ΔE_k < T2) = Lmax − 1 and E_max < T3.
For boundary points, E_k is large (E_k ≫ 0) already at k = 0, so we set k_max = 0. Because E_0 contains the information of the original texture boundary, the condition E_0 > T3 is used to identify boundary points.
The value of T3 is set as the average of all local non-zero maxima of E_0 over the pixels; for non-boundary points we then have E_0 < T3 at k = 0.
In short, the internal points of texture elements can be distinguished from boundary points quickly and effectively.

The mathematical model
Images are disturbed by noise during acquisition and propagation. The extraction algorithm proposed in this paper is applied to SAR images, which contain more noise than ordinary optical images. Therefore, the noise robustness of the algorithm must be considered.
Consider additive noise n(i,j), so that the observed intensity is I'(i,j) = I(i,j) + n(i,j). The window average then becomes

A_k'(x,y) = A_k(x,y) + (1/N_k) Σ_{(i,j)∈A_k} n(i,j),

where N_k is the total number of pixels in the window area A_k. Suppose the two compared window areas lie in the same texture element, and the radius r of the probability distribution of n(i,j) is small, with N_k and N_k' both much greater than r. By Khinchin's law of large numbers, the window average of the noise converges to its mean μ_n:

(1/N_k) Σ_{(i,j)∈A_k} n(i,j) → μ_n.

Therefore the noise terms cancel in the difference:

E_k' ≈ |A_k(x1,y1) + μ_n − A_k(x2,y2) − μ_n| = E_k.

Clearly, the larger N_k is, the better this condition is met and the stronger the noise suppression. In fact, after the average intensity differences in the first steps of the algorithm are computed, E_{k,h} and E_{k,v} are intensity differences of the mean-filtered original image, so the algorithm should also have good noise robustness in theory.
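The derivation above can be written compactly (a reconstruction of the garbled formulas, with (x1,y1) and (x2,y2) denoting the centers of the two compared windows):

```latex
\begin{align}
  I'(i,j) &= I(i,j) + n(i,j), \\
  A_k'(x,y) &= A_k(x,y) + \frac{1}{N_k}\sum_{(i,j)\in A_k} n(i,j)
    \;\xrightarrow{\,N_k \gg r\,}\; A_k(x,y) + \mu_n, \\
  E_k' &= \bigl|A_k'(x_1,y_1) - A_k'(x_2,y_2)\bigr|
    \approx \bigl|A_k(x_1,y_1) + \mu_n - A_k(x_2,y_2) - \mu_n\bigr| = E_k .
\end{align}
```

The key point is that the same noise mean μ_n enters both window averages and cancels in their difference, so E_k is asymptotically unaffected by zero-mean or constant-mean additive noise.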

Analysis of simulation experiments
To demonstrate the effectiveness of the extraction algorithm, experiments were carried out on several images, including images from Brodatz's texture library and some natural scene images, and the results were compared with Novianto's algorithm based on the local fractal dimension [22]. To show the consistency of the coarseness feature maps obtained by the two methods, the coarseness feature map of the fractal dimension method was inverted. In the coarseness feature map extracted by the proposed algorithm, the value of each pixel corresponds to the local coarseness of the image, so the texture coarseness distribution of the original image is given accurately; the proposed algorithm is as effective as the fractal dimension method. When Gaussian white noise with variance 10 is added to the original image, the two methods again yield coarseness feature maps. It is easy to see that the noise has little impact on the result of the proposed algorithm but badly affects the result of the fractal dimension method. In the experiments we found that even when a 5×5 window is used instead of the 3×3 window in the fractal dimension algorithm, the feature map is still greatly influenced by noise, and simple post-processing such as median filtering cannot remove it.
In short, our algorithm has good noise robustness.

Evaluation of Salient Regions Extraction
This section presents saliency detection experiments based on visual saliency and the design theory of the proposed model, and gives the model's results on SAR images. Since TS-X imagery is typical high-resolution SAR data, several representative TS-X images are selected.
Compared with classical models, the advantages and disadvantages of our model are analyzed.

Evaluation Index of Detection Results
To verify whether the model is valid for SAR images, a TS-X image of 4015×3616 pixels is selected as experimental data. The imaged area is the Strait of Gibraltar; the sampling rate of the image is 1.25 meters, and the polarization mode is HH. The sea distribution in this area is complex, with many ships, strong speckle noise, and a region of non-uniform ocean background. Combining GPS and AIS data analysis, we collected and organized the geographical information of the Strait of Gibraltar and determined the number and locations of ships in the SAR image data, improving the objectivity of the experimental evaluation. For better evaluation of the detection algorithms, the detection rate (Pt) and the Figure of Merit are defined. Table 1 compares the performance of our model and two other traditional visual attention models; the evaluation indexes include the number of targets detected, the detection rate, and the Figure of Merit. The saliency map produced by the ITTI model, shown in Figure 11(b), has high missed-detection and false alarm rates; (c) is the saliency map obtained by the Hou model, which cannot distinguish the target regions from background clutter, so the obtained saliency map is meaningless and the model fails. The saliency map obtained by the model proposed in this paper, (d), shows a better detection effect: our model inhibits the saliency of background clutter, filters out background features, highlights the target contour shape, and enhances the saliency of targets.

Conclusions
Considering the characteristics of SAR images, this paper analyzed the basic theory of classical visual models, focusing on their poor performance when applied to SAR images with complex backgrounds. A new visual model for detecting targets in SAR images was presented. First, the model extracts several special features that describe the SAR image well; after a series of calculations, the feature maps are obtained. Second, the model combines the feature maps into the final saliency map through a new feature-fusion mechanism. Finally, the extraction of salient regions is achieved through a multi-scale competition strategy, realizing saliency detection in SAR images.
Finally, the performance of our visual model and of classical visual models was simulated in a uniform clutter environment, and a number of experiments were performed on TS-X images with complex backgrounds. The results show that our model performs better, with a lower false alarm rate and better contour shapes, and has a clear advantage in the saliency detection of targets in SAR images.
The next phase of research includes: (1) when extracting visual features, how to combine the attributes of targets to select features, and how to select and extract features that represent the targets accurately with a small amount of computation; (2) how to establish a new feature-fusion mechanism that can adaptively adjust the proportion of each feature; (3) when calculating the feature maps, how to further improve the parameter settings of the filters.
In Figure 1(a), a spike of width d is arranged with period D. The optimum output size of each pixel is shown in Figure 1(b). As can be seen from Figure 1(b), the optimal size is an expression in d and D, and the final output is F_crs = (3d + D)/4, which accords with the facts when d and D are large.

Fig. 1 Analysis of local texture coarseness. In summary, the local coarseness is largest at the center of a texture element and minimal at its boundary points; pixels between the center and the boundary take intermediate values, decreasing with distance from the center. Experiments on many texture images found that the values of T2 and T3 are related to the minimum of E.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 13 October 2017 | doi:10.20944/preprints201710.0093.v1

Last but not least, since our model does not build a Gaussian pyramid, the final orientation feature map (Sori) is obtained by the linear addition of the four orientation feature maps.

Global contrast is a quantification of the difference between the current local region and the whole image [25]. Generally, the gray value of target areas in a SAR image is relatively high, and the saliency map of the global contrast feature can be used to enhance the difference between the target and the background, highlighting the significance of targets. The pixel-level calculation of the global contrast feature is described above.

Fig. 4 Results of the Gaussian pyramid structure. Figure 4(a) is the original SAR image; (b) to (f) are the sub-images obtained by the Gaussian pyramid. In (c) to (f) the sub-images become visibly blurred and target information is lost, so extracting features from them to build saliency maps has no practical significance and only wastes time.

Fig. 5 Processing result of a natural scene texture image from Brodatz's texture library; the original size of the image is 320×320 pixels.

Fig. 11 Detection results for a complex scene SAR image: (a) the complex scene SAR image; (b) the saliency detection result of the ITTI model; (c) the saliency map of Hou's model; (d) the saliency detection result of our model.

Figure 11 is the result of a simulation experiment on a complex scene SAR image, in which the target and background regions are relatively similar and the background is very complex.