A fast, lightweight deep fusion learning approach for detecting the macula fovea in ultra-widefield fundus images

Macula fovea detection is a crucial prerequisite for screening and diagnosing macular diseases. Without early detection and proper treatment, any abnormality involving the macula may lead to blindness. However, given the shortage of ophthalmologists and the time-consuming nature of manual evaluation, neither the accuracy nor the efficiency of the diagnostic process can be guaranteed. In this project, we propose a deep learning approach for macula fovea detection on ultra-widefield fundus (UWF) images. This study collected 2300 UWF images from Shenzhen Aier Eye Hospital in China. Methods based on the U-shape network (Unet) and Fully Convolutional Networks (FCN) were trained on 1800 fundus images (before amplification), validated on 400 images (before amplification), and tested on 100 images. Three professional ophthalmologists were invited to mark the fovea. In addition, an anatomy-based method is investigated, derived from the spatial relationship between the macula fovea and the optic disc center in UWF images; its parameters were set based on the experience of ophthalmologists and verified to be effective. Results are measured by calculating the Euclidean distance between each proposed approach and the ground truth, which is obtained by the ultra-widefield swept-source optical coherence tomography (UWF-OCT) approach. Through a comparison of the proposed methods, we conclude that the Unet-based deep learning approach outperforms the other methods on the macula fovea detection task, with outcomes comparable to the ground-truth method.


Introduction
The macula is located approximately at the center of the retina and is responsible for central vision [1]. Retinal disease abnormalities depositing at the edge of the macula, such as Age-related Macular Degeneration (AMD), Diabetic Macular Edema (DME) and Diabetic Retinopathy (DR), can be responsible for central vision blockage, maculopathy and retinal detachment [1,2]. Macular lesions are among the most serious maculopathies, causing an abrupt and irreversible change in vision, and are a major cause of blindness [3,4]. Maculopathy can result in choroidal inflammation and retinal pigment epithelial detachment: 63% of patients develop hemorrhagic lesions [5], 7% of macula patients suffer central serous choroidopathy symptoms [5], some show vitreous hemorrhage [6], and a disciform scar can occur within months or years [4], during which period the patients' vision is impaired about 60% of the time [7]. New lesions always occur around the old ones [8]. Patients previously diagnosed with histo scars have roughly a one-in-four chance of macular recurrence within 3 years of their first recovery [9]. Early diagnosis and timely treatment would significantly help prevent this irreversible damage [2]. However, without the presence of pigmentation and hemorrhage, detecting macular lesion membranes is laborious [10], and the limited number of ophthalmologists is highly incompatible with the large number of patients [2].
Opportunities and challenges arise in macula fovea detection tasks when traditional ophthalmologist diagnosis meets novel deep learning approaches. The macula fovea is defined as the center of the macula and is composed of densely packed cone cells. The fovea allows us to maintain sharp, clear central vision and to discriminate colors, so detecting it plays a vital role in retinal screening and early detection [1]. Conventional methods are manual diagnoses based on Fluorescein Angiography (FA), color fundus photographs, Optical Coherence Tomography (OCT) and the Retinal Thickness Analyzer (RTA), which involve retinal imaging and bio-microscopy technology [2,11]. Such a detection process is time-consuming, cumbersome, costly, and often physically uncomfortable for patients. Automated diagnosis and computer-aided assessment methods are therefore preferred for fovea detection. Sinthanayothin et al. (2002) detected the fovea as the darkest area near the optic disc (OD), applying template matching in view of the increased pigmentation around the fovea [11]. Singh et al. (2008) described a method for fovea localization derived from distinct appearance structures [12]. Sekhar et al. (2008) employed the Hough transform and morphological operations to explore the spatial distribution of the OD and macula fovea, based on which their study introduced a novel approach for OD and fovea detection [13]. These traditional methods performed well, but they remain time-consuming and labor-intensive. Thus, since advanced artificial intelligence technologies present great advantages in many fields, researchers have brought them into ophthalmology. Sedai et al. (2017) presented an end-to-end pixelwise segmentation framework based on a fully convolutional neural network (FCN) deep learning (DL) algorithm, tested on 400 retinal fundus images with an average error of 14±7 pixels [14]. Al-Bander et al. (2018) demonstrated a deep learning method based on a convolutional neural network (CNN) to detect the fovea and OD; they also compared the results with ground truth marked by experienced ophthalmologists, reporting high accuracies of 97% and 96.6% for fovea and OD detection respectively [15]. Bian et al. (2020) proposed an automatic fovea localization, OD segmentation and glaucoma classification framework based on the U-shape network (Unet) architecture, achieving an average distance of 47.8 for fovea detection [16].
Ultra-widefield fundus (UWF) imaging is an essential improvement on, and supplement to, multiple imaging paradigms such as OCT and scanning laser ophthalmoscopy (SLO). UWF imaging applies a 200° panoramic photographing technique to acquire various retinal modalities, including fundus autofluorescence (FAF), pseudocolor, indocyanine green angiography (ICGA) and fluorescein angiography (FA) [17]. With a larger view and richer information, Optos UWF provides novel insights for retinal diagnosis [17,18]. Applications of deep learning to macula fovea detection on normal fundus images with a 45° field of view (FOV) have been validated by renowned researchers [14,15,19], but applications on UWF images remain few. Yang (2019) proposed a joint method for OD and fovea localization based on a spatially constrained Faster R-CNN in the context of UWF, an idea not considered before [18]. However, given the lack of open-source UWF datasets [20], most deep learning-based methods are trained on 45° fundus images.
This study proposes a DL method for fovea detection on 2300 UWF images based on a lightweight Unet architecture, in which 100 UWF photographs are randomly chosen as the test dataset. Three experienced ophthalmologists were invited to locate the fovea center manually. Comparison methods include, but are not limited to, the FCN (Fully Convolutional Network), as well as an anatomy-based approach built on the spatial relationship between the OD and the fovea. Taking the results of the OCT-induced approach as the ground truth, this research examines the efficiency, accuracy and stability of the proposed methods. The contributions are as follows. (1) A three-layer lightweight Unet architecture is proposed. (2) A set of parameters for the anatomy-based method is experimented with and verified to be effective. (3) Five approaches are experimented with and compared on the collected dataset: the FCN-based method, the Unet-based method, the anatomy-based method, the OCT-induced method and manual marking by human experts, using a time indicator and ten statistical indexes as evaluation criteria. (4) Features of automated and non-automated methods are investigated. (5) The proposed Unet-based method is verified in terms of cost, accuracy and stability, exhibiting a potential for wide clinical application in macular fovea detection.
The structure of this paper is organized as follows. Methodology and experimental setup are introduced in Section 2. Section 3 analyzes the experimental results and extends the discussion. The conclusion is presented in Section 4.

Datasets
UWF imaging is recognized as a significant innovation in ophthalmic scanning and diagnosis. With a panoramic FOV of 200° or above, the vasculature and peripheral retina can be observed; such images are therefore expected to become the standard of care for clinical diagnosis [17]. Some normal fundus image datasets are openly accessible online, such as the STARE dataset [21], which includes 400 images categorized into 14 retinal diseases. At the time of writing, no open-source UWF image dataset is available, probably because UWF images are difficult to obtain, especially in underdeveloped areas [20]. Recognized UWF datasets include DeepDR and other in-house datasets. DeepDR, from the IEEE International Symposium on Biomedical Imaging, includes 256 UWF images and 2000 regular fundus images collected from Chinese patients. The in-house datasets are collected from hospitals for lesion detection research, such as Diabetic Retinopathy (DR) [22], Rhegmatogenous Retinal Detachment (RRD) [22], Retinitis Pigmentosa (RP) [22], Idiopathic Macular Hole (MH) [23] and Proliferative Diabetic Retinopathy (PDR) [24] detection. The UWF datasets for MH, RP, RRD and PDR were collected from Tsukazaki Hospital and include images from both healthy people and patients with lesions [22][23][24].
As Table 1 shows, this research collected 2300 Optos images from Shenzhen Aier Eye Hospital (Guangdong, China). The UWF images are captured by Optos at a resolution of 2600 × 2048 pixels. 1800 (81.82%) and 400 (18.18%) UWF images are set as the training and validation datasets respectively; they are labeled by a qualified ophthalmologist and reviewed by another ophthalmology expert. After resizing and amplification, the image resolution is set to 448 × 448, and the training and validation datasets contain 8182 and 1818 images respectively. The test dataset includes 100 UWF images from 50 patients. The in-house dataset is properly pruned of sensitive information, such as patients' personal data. This study adhered to the declaration of Shenzhen Aier Eye Hospital, and appropriate permission and approval were obtained from said hospital.

Data Labeling and preprocessing
Experts mark the macular fovea on a blank canvas over the original UWF images, which have a resolution of 2600 × 2048 pixels. A circle with a 30-pixel radius is drawn around the marked fovea as its center. The ophthalmologists are invited from Shenzhen Aier Eye Hospital in China and are experts with more than 7 years of clinical experience in color fundus image diagnosis.
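The circular label described above can be represented as a binary mask on a blank canvas; a minimal numpy sketch (the function name and the example fovea coordinates are illustrative, not from the paper):

```python
import numpy as np

def fovea_label_mask(height, width, cx, cy, radius=30):
    """Binary label: a filled circle of the given radius centred on the
    expert-marked fovea point (cx, cy), on an otherwise blank canvas."""
    yy, xx = np.ogrid[:height, :width]
    return ((xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2).astype(np.uint8)

# canvas matching the original UWF resolution; the fovea point is illustrative
mask = fovea_label_mask(2048, 2600, cx=1300, cy=1024)
```

The mask is 1 inside the 30-pixel circle and 0 elsewhere, so it can be used directly as a segmentation target.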
As Figure 2 shows, preprocessing includes image resizing and amplification. We resized the original images to 448 × 448 pixels and amplified the training dataset for the DL models from 2200 to 10000 images. The amplification process includes rotation (rotation range 30), width shift (shift range 0.3), height shift (shift range 0.3), zoom (zoom range 0.3), horizontal flip and vertical flip. Through this process, this study obtained accurately labeled training and validation datasets of appropriate size.
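The flip and shift parts of such an amplification step can be sketched in pure numpy; rotation and zoom would normally come from a library (the listed parameters match, for example, Keras's ImageDataGenerator with rotation_range=30 and zoom_range=0.3). This is a sketch, not the paper's pipeline; note that np.roll wraps pixels around, whereas a real shift would pad instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask, shift_range=0.3):
    """Apply one random flip/shift augmentation to an image and its
    label mask together, so the fovea label stays aligned."""
    if rng.random() < 0.5:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    h, w = image.shape[:2]
    dy = int(rng.integers(-int(shift_range * h), int(shift_range * h) + 1))
    dx = int(rng.integers(-int(shift_range * w), int(shift_range * w) + 1))
    # shift image and mask by the same offset (wrap-around for simplicity)
    image = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    mask = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return image, mask

img = rng.random((448, 448))
lbl = np.zeros((448, 448), dtype=np.uint8)
aug_img, aug_lbl = augment(img, lbl)
```

Applying the same transform to image and mask is the essential point: augmenting only the image would silently corrupt the fovea labels.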

Methods
A great number of deep learning models have been implemented for medical use [25][26][27] since AI research found its way into the medical field. For macular fovea detection tasks, however, algorithms must also consider retinal indicator quantification. Furthermore, in clinical practice the time invested is an important factor in evaluating performance [26]. In this research we compare an FCN architecture and a 3-layer lightweight Unet structure on the task of locating the macular fovea.

Fully Convolutional Network
The Fully Convolutional Network (FCN) was proposed by Jonathan Long et al. [28] for semantic segmentation, i.e., segmenting images at the pixel level. They replaced the fully connected layers in classic CNN models such as AlexNet with locally connected convolutional layers to form FCN models. FCN is best understood in contrast with the Convolutional Neural Network. As the mainstream technique for image tasks, CNN-based segmentation methods share a common drawback: high computational cost and consequently lower efficiency. Conventional CNNs are suitable for almost any kind of image classification and/or regression task; they accept a whole image as input and produce mathematical descriptions (a value or a probability). The main difference is that an FCN accepts input of arbitrary (or customized) size and produces output of corresponding size, which yields its key characteristic: end-to-end, pixel-to-pixel learning. Structurally, an FCN uses only locally connected layers (pooling, convolution, etc.) instead of fully connected layers at the late stage of processing, which requires fewer parameters and can reduce the computational cost compared to conventional CNN models. Fully connected layers process the image as a whole, while locally connected convolutional layers process it in parts; using only the latter allows the convnet to slide over images of larger size, removing the limitation on input image size. An FCN takes an image of arbitrary size as input; after multiple convolutions and pooling steps the produced feature maps become smaller, with poorer resolution.
To restore the processed result to an image of corresponding size, deconvolution is used to upsample the feature map of the last convolutional layer, producing a prediction for each pixel while maintaining the spatial coordinates of the original input image.
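The upsampling step can be illustrated with a naive stride-2 transposed convolution (deconvolution); a minimal numpy sketch of the operation, not the paper's implementation:

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Naive transposed convolution: each input pixel scatters a scaled
    copy of the kernel into the output, enlarging the feature map."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h * stride + kh - stride, w * stride + kw - stride))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += x[i, j] * kernel
    return out

feat = np.ones((4, 4))                        # small, low-resolution feature map
up = transposed_conv2d(feat, np.ones((2, 2))) # doubled to 8 x 8
```

With a 2 × 2 kernel and stride 2 the scattered blocks tile the output exactly, doubling the spatial resolution; in a trained network the kernel weights are learned rather than fixed.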
FCN may thus be considered a more efficient solution than some conventional CNN methods, but it also has disadvantages: its process excludes the spatial regularization step seen in some common segmentation models, which may lead to a lack of spatial consistency. And because FCN conducts pixelwise prediction, correlations between pixels can be overlooked.

Unet
Unet was first proposed by Olaf Ronneberger et al. [29] in 2015 to achieve segmentation of biomedical images. The model is called U-net because its symmetric processing structure resembles the letter 'U', and its mechanism is widely recognized as an encoder-decoder relationship. Two symmetric processes are contained in this architecture: encoding by downsampling (feature extraction), and decoding by upsampling and concatenation. For an input of size 224 × 224, for example, multiple convolutions and pooling steps during feature extraction produce 4 feature maps of sizes 112 × 112, 56 × 56, 28 × 28 and 14 × 14. Deconvolution (upsampling) then produces a new 28 × 28 feature map from the 14 × 14 one, which is combined with the extracted feature map of corresponding size. Progressively, after multiple deconvolutions and concatenations, the final result is an image of the same size as the input. A difference between the aforementioned FCN and U-net is the step of merging feature maps: FCN sums the corresponding pixels of the feature maps, whereas U-net concatenates them along the channel dimension. U-net has been widely applied to medical image processing and is also used as a baseline in much related research.
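The two merging strategies can be contrasted directly; a toy numpy illustration with channel-first arrays:

```python
import numpy as np

# two feature maps of identical shape: 2 channels, 4 x 4 spatial
a = np.ones((2, 4, 4))
b = np.full((2, 4, 4), 2.0)

fcn_merge = a + b                             # FCN: elementwise sum, still 2 channels
unet_merge = np.concatenate([a, b], axis=0)   # U-net: channel concat, now 4 channels
```

The sum blends the two sources irreversibly, while concatenation keeps both intact and lets the following convolution learn how to weight them.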
U-net has clear advantages specifically for medical image processing, given the nature of medical images. They are usually area-specific and narrow-structured (for example, CT and MRI scans of various cancers), so most features extracted from the original image can be important; merging feature maps along corresponding channels preserves the integrity of every valuable feature. On the other hand, medical image datasets are hard to obtain and compile, and for some diseases there may not be enough images; large deep learning models such as DeepLabv3+ can then easily overfit. The simpler-structured U-net may therefore handle the task better, while also giving researchers more room to modify and optimize the model according to their research objectives.
This study proposes a 3-layer lightweight Unet structure for fovea detection; the architecture of the proposed network is shown in Figure 3. Compared with the traditional Unet architecture, the input and output image size is defined as 448 × 448; through multiple convolutions and max-pooling steps, 4 feature maps are produced at sizes 224 × 224, 112 × 112, 56 × 56 and 28 × 28. Correspondingly, through convolution and upsampling, the output is restored to 448 × 448. The other change is that, instead of the 4 layers of Unet, this study proposes a 3-layer architecture, which is more lightweight and resource-saving.
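The shape flow of such an encoder-decoder can be traced without any learned weights. A numpy sketch with three downsampling stages, using max pooling, nearest-neighbour upsampling and channel concatenation (the real model interleaves convolutions at every stage, and the paper lists feature maps down to 28 × 28, so the exact depth may differ):

```python
import numpy as np

def max_pool2(x):
    # 2 x 2 max pooling over (channels, H, W): halves the spatial dims
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2(x):
    # nearest-neighbour 2x upsampling (stand-in for learned deconvolution)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet3_shape_flow(x):
    skips = []
    for _ in range(3):                  # encoder: three downsampling stages
        skips.append(x)
        x = max_pool2(x)                # 448 -> 224 -> 112 -> 56
    for skip in reversed(skips):        # decoder: upsample + skip concat
        x = np.concatenate([upsample2(x), skip], axis=0)
    return x                            # spatial size restored to 448 x 448

out = unet3_shape_flow(np.zeros((1, 448, 448)))
```

The decoder output regains the 448 × 448 input resolution, with channels accumulating from the skip connections; a final 1 × 1 convolution would map these channels to the fovea probability map.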

OCT-induced Method and anatomy-based method
In the experiment we utilized ultra-widefield swept-source OCT (UWF-OCT) [30] with a depth of 5 mm and a scan width of 23 mm, performed on the 100 test subjects from 50 patients. This method has been utilized by multiple researchers for ophthalmic disease tasks and is defined here as the ground-truth method [30][31][32][33].
The other method is based on the anatomical spatial relationship between the optic disc and the macular fovea in UWF images. The OD is regarded as the brightest area of the fundus image: in the experiment we identify the brightest pixels in the UWF image, calculate the width w of the OD mask and locate its center (x_o, y_o). The fovea lies to the left of the OD center for the left eye, and otherwise to the right. The coordinates (x_f, y_f) of the macular fovea are calculated from (x_o, y_o), w and two parameters α and β according to equations (1) to (3), with the horizontal direction of the offset flipped between the left and right eye. The parameters α and β were decided by the experts, ophthalmologists with more than 7 years of experience in macular fovea detection on color fundus images, and are set to (α = 3.04, β = 0.409).
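Under one plausible formulation, assuming that the horizontal offset of the fovea from the OD center scales the OD width by α and the vertical offset scales it by β (the paper's exact equations (1) to (3) are not reproduced here, so this is a hypothetical sketch, not the authors' method):

```python
def locate_fovea(od_center, od_width, eye, alpha=3.04, beta=0.409):
    """Hypothetical anatomy-based fovea localization: offset the OD center
    horizontally by alpha * od_width (toward the temporal side of each eye)
    and vertically by beta * od_width."""
    x_o, y_o = od_center
    dx = alpha * od_width
    # left eye: fovea to the left of the OD center; right eye: to the right
    x_f = x_o - dx if eye == "left" else x_o + dx
    y_f = y_o + beta * od_width
    return x_f, y_f

# illustrative OD center and width, not values from the paper
fovea = locate_fovea(od_center=(1000.0, 1000.0), od_width=100.0, eye="left")
```

Whatever the exact equations, the key property is that the prediction needs only the OD mask, making the method essentially free once the OD is segmented.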

Results and Discussion
The 100 UWF test images are evaluated with the aforementioned methods: the FCN deep learning model, the Unet algorithm, the anatomy-based method, the OCT-induced method and manual marking by three experts. The Euclidean distance between each proposed approach and the OCT-induced result is calculated, the OCT-induced method being regarded as the ground truth in ophthalmology. Ten indicators are calculated to describe the errors: mean, mode, median, max, min, variation ratio, mean difference, standard deviation, discrete coefficient and the difference between max and min (range).
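Most of these indicators can be computed directly from the vector of Euclidean distance errors; a sketch (mode and variation ratio, which require binning the continuous distances, are omitted):

```python
import numpy as np

def error_stats(pred, gt):
    """Summary statistics of the Euclidean distance between predicted
    and ground-truth fovea locations."""
    d = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    return {
        "mean": d.mean(),
        "median": np.median(d),
        "max": d.max(),
        "min": d.min(),
        "std": d.std(),
        "range": d.max() - d.min(),           # max minus min
        "discrete_coef": d.std() / d.mean(),  # std relative to the mean
        # mean difference: average absolute pairwise difference of errors
        "mean_diff": np.abs(d[:, None] - d[None, :]).mean(),
    }

# toy example: one prediction 5 pixels off, one exactly on target
stats = error_stats([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```

The discrete coefficient and mean difference capture stability (dispersion of the errors), which is how the comparison below separates the automatic methods from manual marking.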
As Table 2 shows, the deep learning models show a great advantage in time consumption in the experiment. Moreover, according to Table 3 and Figure 4, manual fovea detection consumed significantly more time than the automatic detection methods. The parameters of the anatomy-based method are verified to be effective, since its maximum error is lower than 300 and is not the highest among the five selected approaches. Comparing the ophthalmologists' results, there is a large gap between the max and min of their distance errors, showing high inconsistency between the three experts; their variation ratio, mean difference, standard deviation, discrete coefficient and range are all relatively high. Poor stability thus falls on the manual marking method due to the high dispersion of its errors. Compared with the automatic models, manual fovea marking shows a shortage of efficiency, stability and accuracy.
Furthermore, with the lowest mean, mode, max, min, mean difference, standard deviation and discrete coefficient, and relatively low median and range values, the Unet deep learning algorithm presents the lowest error and highest stability, exhibiting better performance than the other approaches.

Conclusions
In this paper, we experimented on a dataset of 2300 UWF images collected from Shenzhen Aier Eye Hospital in China. Five different methods were compared: the FCN-based method, the Unet-based method, the anatomy-based method, the OCT-induced method and manual marking by three experts; the objective was to find the best-performing method for macular fovea detection. The OCT-induced method refers to the ultra-widefield swept-source OCT-based method, recognized as the ground truth in ophthalmology. Considering the limited resources in clinical practice, this study proposed a 3-layer lightweight Unet architecture for fovea diagnosis. There is a spatial relationship between the macula fovea and the OD center in UWF images, and a set of parameters for detecting the fovea from the location of the OD center was verified to be effective. Compared with the ground truth on the test dataset, the deep learning models show a great advantage in time saving over the other approaches, while the manual marking method showed shortages in resource saving, accuracy and stability. The proposed lightweight Unet-architecture model performed best in efficiency, stability and precision. Thus, the proposed 3-layer lightweight Unet model exhibits the potential to be widely implemented for macular fovea detection in clinical practice.
The dataset we used contains 2300 UWF images, which is still insufficient for further optimization and development of new advanced methods. Furthermore, access to UWF images remains extremely limited; for researchers with fewer resources or those unable to obtain the necessary (ethical and legal) permits, further related research will be restricted. With thriving research on deep learning techniques, ophthalmology, and their interdisciplinary studies, inspiration and motivation are needed so that clinically used deep learning models can be compared and improved continuously. In future research, medical DL models and algorithms are expected to become more objective-oriented and, possibly, explainable.

Patents
The Chinese patent "The method of Macular fovea detection" (No. 202010199266.9) is related to this project.

Supplementary Materials:
The "data access authorization & medical data ethics supporting document" from Shenzhen Aier Eye Hospital (in both Chinese and English) for this study is available online at https://github.com/luckanny111/macular_fovea_detection. Informed Consent Statement: Not applicable; the dataset was desensitized before use in this paper. Data Availability Statement: The dataset is an in-house dataset collected from Shenzhen Aier Eye Hospital.