Entomological remote dark field signal extraction by maximum noise fraction and unsupervised clustering for species identification

In-situ characterization of flying insects by remote sensing spectroscopy is an emerging research field. Most analysis techniques in remote sensing spectroscopy rely on an intensity threshold, which introduces uncertainty in the number of detected specimens. In this manuscript, we investigate the analysis of passive remote sensing spectroscopy data with the maximum noise fraction method. The results show that this analysis technique can help overcome the background noise present in spectroscopic measurements.


Introduction
Remote sensing spectroscopy is an indirect measurement technique used to characterize aerial fauna and atmospheric particles. The instruments used generally consist of a reflector that collects the light backscattered by the object of interest, a refractor to extend the incident beam, and sensors that measure the backscattered light [1]. Several remote sensing spectroscopy instruments have been developed in recent decades to measure the spectral footprint [2, 3], the wing beat frequency [4], and the optical cross section of fly species [5, 6]. These instruments are sometimes calibrated to measure the flight direction [4] and the detection range of insects [7, 8]. However, they do not simultaneously capture the image of the specimens studied and their identification parameters, namely the spectral footprint, the wing beat frequency, and the optical cross section. In addition, the intensities backscattered by insects are extracted with an intensity threshold, which sometimes omits the intensities backscattered by small specimens [2]. A threshold derived from the median intensity, the interquartile range of the intensities, and an arbitrary constant related to the signal-to-noise ratio of each data file has been proposed [9]. The choice of this constant is subjective and leads to a loss of information. Blind source separation methods appear to be an attractive alternative [10][11][12]. These information extraction techniques have been widely used in biomedical engineering, speech recognition, and telecommunications. An extension of this technique has been proposed under the name maximum noise fraction (MNF) [13]. To overcome the shortcomings of conventional indirect measurement systems, we investigate a technique for the simultaneous indirect measurement of the spectral footprint and image capture of flying specimens, together with an MNF-based analysis for feature extraction.
An unsupervised cluster analysis was then performed to identify insect species.
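To illustrate why the arbitrary constant matters, the following Python sketch applies a threshold of the assumed form median + k × IQR to a single intensity time series and counts the contiguous runs of samples above it. The exact formula of [9] is not reproduced here, so this form, the function names `iqr_threshold` and `count_events`, and the toy signal in the usage note are illustrative assumptions only.

```python
import numpy as np

def iqr_threshold(intensity, k):
    """Threshold of the ASSUMED form median + k * IQR (k is the arbitrary constant)."""
    q1, med, q3 = np.percentile(intensity, [25, 50, 75])
    return med + k * (q3 - q1)

def count_events(intensity, k):
    """Count contiguous runs of samples above the threshold; each run is one detected event."""
    x = np.asarray(intensity, dtype=float)
    above = x > iqr_threshold(x, k)
    # A run starts wherever 'above' switches from False to True.
    starts = above & ~np.concatenate(([False], above[:-1]))
    return int(starts.sum())
```

On a toy signal with one strong and one weak event, a small k counts both events while a larger k keeps only the strong one, which is the kind of information loss described above.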

Experimental setup and synchronous acquisition of footprint and image
The equipment consists of two receivers, a reflector (Skywatcher telescope; D = 200 mm; F = 1000 mm; coated optics) and a refractor (Bresser Messier AR 90 achromatic refractor telescope, 90/900, f/10.0; coated optics), mounted in parallel on an aluminum bar with a separation distance L = 500 mm. The assembly is fixed on an equatorial mount. The position of the counterweight is adjusted to find the equilibrium position in the xy plane and in the yz plane; it is often necessary to shift the position of the telescope to adjust the balance in the yz plane. Finding the equilibrium point makes the instrument easier to handle. It also allows the instrument to remain fixed at low atmospheric pressures, which minimizes the margins of error in the measurements (Fig. 1). Before any measurement, the reflector and the refractor are aligned on a black box located 1 m above the ground, about 60 m from the experimental device (Fig. 1). During this alignment phase, the opening of the black box is observed first through an eyepiece and then with a CCD camera connected to the computer. Aligning the reflector and the refractor allows the spectral and spatial information of a flying specimen to be acquired synchronously. Once the alignment is complete, the CCD camera is connected to one telescope and to the computer to record the image, and the spectrometer is connected to the other telescope and to the computer to record the spectral imprint of any insect entering the probe volume, defined by the intersection of the solid angles of the refractor and the reflector. The spectrometer and the camera are controlled by a MATLAB program that we developed for automated acquisition. Reference measurements are carried out by dropping white Lambertian spheres in free fall in the focal plane of the black box. The release time is noted so that the reference data stored on the computer can be identified later.
The integration time of the spectrometer was set to 20 ms, and 1000 samples were acquired per file.
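With an integration time of 20 ms and 1000 samples, one file nominally spans 1000 × 20 ms = 20 s. The Python sketch below assigns a nominal time to each spectrum and looks up the camera frame closest to a given detection time, one possible way to support time-based matching between spectra and images. It assumes no dead time between integrations and camera timestamps on the same clock as the spectrometer; `sample_times` and `nearest_frame` are hypothetical helpers, not part of the authors' MATLAB program.

```python
import numpy as np

INTEGRATION_TIME_S = 0.020  # spectrometer integration time given in the text (20 ms)
N_SAMPLES = 1000            # samples per file given in the text

def sample_times(start_time_s, n_samples=N_SAMPLES, dt=INTEGRATION_TIME_S):
    """Nominal timestamp of each spectrum, ASSUMING no dead time between integrations."""
    return start_time_s + dt * np.arange(n_samples)

def nearest_frame(event_time_s, frame_times_s):
    """Index of the camera frame closest in time to a spectral event (hypothetical helper)."""
    frame_times_s = np.asarray(frame_times_s, dtype=float)
    return int(np.argmin(np.abs(frame_times_s - event_time_s)))
```

If the acquisition software adds a readout overhead per sample, `dt` would have to be replaced by the measured sample period.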

Method for extracting and determining the spectral footprint
The recorded raw data, $I_{raw,t}$, are stored in 16-bit files containing 3648 pixels × 1000 samples. The intensities backscattered by the black box and the noise generated by the electronic chain of the spectrometer are added to every measurement at each instant. These contributions constitute the static intensity, $I_{static,t}$. An intensity computed over the interquartile range is insensitive to transient intensities backscattered by insects, atmospheric particles, plant leaves, aerosols, and other undesirable contributions such as smoke entering the probe volume; this intensity was calculated and taken as $I_{static,t}$. $I_{raw,t}$ was then centered on $I_{static,t}$ to obtain the intensity $I_t$:

$$ I_t = I_{raw,t} - I_{static,t} \tag{1} $$

The intensity $I_t$ is assumed to contain the intensity backscattered by an insect, $I_{insect,t}$, and the intensities backscattered in the probe volume by unknown objects, treated as noise, $I_{noisy,t}$, at time $t$. $I_{insect,t}$ and $I_{noisy,t}$ are assumed to be independent, orthogonal, and zero-mean. We can write:

$$ I_t = I_{insect,t} + I_{noisy,t} \tag{2} $$

The principle of the maximum noise fraction is to find a basis vector, $\phi$, for which the noise fraction $f_\phi$ defined by equation (3) is optimal [13]:

$$ f_\phi = \frac{\phi^{T} B \phi}{\phi^{T} A \phi} \tag{3} $$

where $A$ is the covariance matrix of $I_t$ and $B$ that of $I_{noisy,t}$.
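The text does not specify how the interquartile-range intensity is computed. As one plausible reading, the Python sketch below takes, for each pixel, the mean of the samples lying between the first and third quartiles (the interquartile mean) as $I_{static,t}$ and subtracts it from every sample. It assumes the data are arranged as an array of shape (n_samples, n_pixels), e.g. (1000, 3648); the function names are hypothetical.

```python
import numpy as np

def interquartile_mean(raw, axis=0):
    """Per-pixel mean of the samples between the 25th and 75th percentiles (ASSUMED estimator)."""
    raw = np.asarray(raw, dtype=float)
    q1, q3 = np.percentile(raw, [25, 75], axis=axis, keepdims=True)
    inside = (raw >= q1) & (raw <= q3)  # transient spikes fall outside this band
    return np.sum(raw * inside, axis=axis) / np.sum(inside, axis=axis)

def center_raw(raw):
    """Subtract the static intensity of each pixel from every sample (rows = samples)."""
    raw = np.asarray(raw, dtype=float)
    static = interquartile_mean(raw, axis=0)
    return raw - static, static
```

Because a brief insect passage affects only a few samples, it lies outside the interquartile band and leaves the static estimate unchanged.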
To find the value of $\phi$ that makes $f_\phi$ optimal, the derivative of $f_\phi$ with respect to $\phi$ is set to zero, which leads to equation (4), a generalized eigenvalue problem:

$$ B\phi = f_\phi A \phi \tag{4} $$

with

$$ A = \mathrm{Cov}(I_t), \qquad B = \mathrm{Cov}(I_{noisy,t}) \tag{5} $$

Knowledge of the background noise, $I_{noisy,t}$, is therefore required to derive $B$. Several existing techniques can estimate $I_{noisy,t}$; differencing neighboring pixels is commonly used in image processing. Here, $I_{noisy,t}$ was evaluated as the difference between the intensities of neighboring samples:

$$ I_{noisy,t} = I_{t+1} - I_t \tag{6} $$

After extraction of the spectra of interest, the spectral footprint, $R_{insect}$, characteristic of each species, was calculated from the spectrum of the specimen, $I_{insect}$, and the spectrum of the Lambertian sphere, $I_{ref}$, according to equation (7). The background intensity is not subtracted in this equation, because the background noise was already separated from the spectrum of interest during its extraction (Fig. 2).

$$ R_{insect} = \frac{I_{insect}}{I_{ref}} \tag{7} $$

Figure 2: Examples of spectra for two events. (a, b) Raw data centered on the static intensity: (a) reference spectra and (b) spectrum of an insect passing through the probe volume. (c, d) Spectra obtained after applying the maximum noise fraction method, ordered by decreasing signal-to-noise ratio (SNR). The spectrum of interest is the high-intensity spectrum (blue curves). In (c), the highest-intensity spectrum is the solar spectrum, which is used as the illumination source; in (d), it is the spectrum of an insect detected by the acquisition system. Lower-intensity spectra are considered background noise.
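The Python sketch below shows one way to compute the MNF transform described above as a generalized symmetric eigenproblem with `scipy.linalg.eigh`, taking the noise fraction as the ratio of noise variance to total variance, as in the usual MNF formulation. It assumes centred data with time samples as rows and spectrometer pixels as columns, and estimates the noise from differences of neighbouring samples. The small ridge term and the division of the difference covariance by 2 are our additions, not part of the original description; all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def neighbour_difference_noise(I):
    """Noise estimate I_{t+1} - I_t from neighbouring samples (rows)."""
    I = np.asarray(I, dtype=float)
    return I[1:] - I[:-1]

def mnf(I, ridge=1e-9):
    """Solve B phi = f A phi for centred data I of shape (n_samples, n_pixels).

    Returns the noise fractions f in increasing order (i.e. decreasing SNR)
    and the matching basis vectors phi as columns.
    """
    I = np.asarray(I, dtype=float)
    A = np.cov(I, rowvar=False)  # covariance of the centred data I_t
    # A difference of two samples doubles the noise variance; halving B rescales
    # every f by the same factor and changes neither their order nor phi.
    B = np.cov(neighbour_difference_noise(I), rowvar=False) / 2.0
    # Ridge (our addition) keeps A positive definite when n_pixels > n_samples.
    A = A + ridge * np.trace(A) / A.shape[0] * np.eye(A.shape[0])
    f, phi = eigh(B, A)  # generalized symmetric eigenproblem, f ascending
    return f, phi

def spectral_footprint(I_insect, I_ref):
    """Ratio of the extracted insect spectrum to the Lambertian-sphere reference."""
    return np.asarray(I_insect, dtype=float) / np.asarray(I_ref, dtype=float)
```

The projections `I @ phi` give the MNF components; the first column has the lowest noise fraction and therefore the highest SNR.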

Identification of insects by their spectral footprints and their associated images by unsupervised classification
The spectral footprints obtained were normalized and then pooled, and the Euclidean distances between them were calculated. The number of classes (clusters) was determined with an elbow-detection criterion, the L-method [15][16][17]. The L-method divides the curve of the associated distances as a function of the number of clusters into two parts and fits a straight line to each (Fig. 3a); the selected split is the one that minimizes the total root mean square error, $RMSE_T$ (Fig. 3b), computed from the root mean square error of the left fit (Fig. 3a, red curve), $rmse_L$, the root mean square error of the right fit (Fig. 3a, green curve), $rmse_R$, the number of clusters, $a$, and the maximum number of clusters, $b$, according to equation (8):

$$ RMSE_T = \frac{a-1}{b-1}\, rmse_L + \frac{b-a}{b-1}\, rmse_R \tag{8} $$

Once the number of classes was determined, the Euclidean distances were used to build the classes according to their similarity. The image corresponding to each spectrum was identified manually from the time at which the insect was detected by the measurement system. Figure 4 shows the central spectra of the 14 insect classes identified, and Fig. 5 shows the image associated with the central spectrum of each class.
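The Python sketch below illustrates the clustering and class-count selection described above. It normalizes each footprint to unit Euclidean norm, builds an average-linkage hierarchical clustering on Euclidean distances with SciPy, uses the merge distance as a function of the number of clusters as the evaluation curve, and applies the weighted error of equation (8) to choose the number of clusters. The normalization, the linkage criterion and the default maximum number of clusters are assumptions, since the text does not state them; the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def normalize_spectra(R):
    """Scale each footprint (row) to unit Euclidean norm (ASSUMED normalization)."""
    R = np.asarray(R, dtype=float)
    return R / np.linalg.norm(R, axis=1, keepdims=True)

def _fit_rmse(x, y):
    """RMSE of a least-squares straight-line fit to (x, y)."""
    coeffs = np.polyfit(x, y, 1)
    return float(np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2)))

def l_method(x, y):
    """Return the cluster count a minimizing the weighted RMSE of eq. (8)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = x[-1]  # maximum number of clusters considered
    best_a, best_rmse = None, np.inf
    for c in range(2, len(x) - 1):  # left x[:c] and right x[c:], each >= 2 points
        a = x[c - 1]  # number of clusters at the split
        rmse_t = ((a - 1) / (b - 1) * _fit_rmse(x[:c], y[:c])
                  + (b - a) / (b - 1) * _fit_rmse(x[c:], y[c:]))
        if rmse_t < best_rmse:
            best_a, best_rmse = a, rmse_t
    return int(best_a)

def cluster_footprints(R, max_clusters=20, method="average"):
    """Hierarchical clustering of footprints with the class count chosen by the L-method."""
    X = normalize_spectra(R)
    Z = linkage(X, method=method, metric="euclidean")
    n = X.shape[0]
    ks = np.arange(2, min(max_clusters, n) + 1)
    if len(ks) < 4:
        raise ValueError("the L-method needs at least 4 points on the evaluation curve")
    heights = Z[n - ks, 2]  # merge distance that reduces k clusters to k - 1
    k = l_method(ks, heights)
    return k, fcluster(Z, t=k, criterion="maxclust")
```

Given the labels, the central spectrum of each class could then be taken, for example, as the member closest to the class mean.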

Conclusion
We have implemented a device for the synchronous acquisition of the image and the spectral imprint of flying specimens by passive remote sensing. We used the maximum noise fraction method to extract the spectra backscattered by flying specimens from sets of 1000 samples containing background noise. An unsupervised classification over a short measurement period grouped the insect species detected by our system, so that each class of insect could be associated with an acquired image. Because the camera used was slow, the images obtained were saturated. Since the simultaneous acquisition of the image and the spectral imprint is a key objective, this study can be extended by using a faster camera.