ARTICLE | doi:10.20944/preprints202305.0319.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: hyperspectral images; convolutional neural networks; graph convolutional networks; feature fusion
Online: 5 May 2023 (07:40:07 CEST)
Convolutional neural networks (CNNs) have attracted much attention as a commonly used method for hyperspectral image (HSI) classification in recent years. However, CNNs can only be applied to Euclidean data and, because they extract features locally, have limited ability to model relationships. Each pixel of a hyperspectral image contains a set of spectral bands that are correlated and interact with each other, and methods designed for Euclidean data cannot effectively capture these correlations. In contrast, the graph convolutional network (GCN) can operate on non-Euclidean data, but the superpixel segmentation it requires to reduce computational cost usually leads to oversmoothing and to local detail features being ignored. To overcome these problems, we construct a fusion network based on GCN and CNN that contains two branches: a graph convolutional branch based on superpixel segmentation and a convolutional branch with an added attention mechanism. The graph convolutional branch extracts structural features and captures the relationships between nodes, while the convolutional branch extracts detailed features in local fine regions. Because the features extracted by the two branches differ, fusing their complementary features improves classification performance. To validate the proposed algorithm, experiments were conducted on three widely used datasets, namely Indian Pines, Pavia University, and Salinas; overall accuracies of 98.78%, 98.99%, and 98.69% were obtained on the three datasets, respectively. The results show that the proposed fusion network obtains richer features and achieves high classification accuracy.
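The graph-convolutional branch described above aggregates features over superpixel nodes. A minimal sketch of that operation, assuming a simple mean-aggregation propagation rule H' = D⁻¹(A + I)HW on a toy two-node graph (this is a generic illustration, not the paper's architecture):

```python
import numpy as np

def graph_conv(A, H, W):
    """One mean-aggregation graph convolution layer: H' = D^{-1} (A + I) H W.

    A: (n, n) adjacency matrix without self-loops, H: (n, f) node features,
    W: (f, f') weight matrix. Self-loops are added before averaging.
    """
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse degree matrix
    return D_inv @ A_hat @ H @ W               # average neighbours, then project

# Two connected superpixel nodes with scalar features 1 and 3:
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
H = np.array([[1.0],
              [3.0]])
W = np.array([[1.0]])   # identity projection, for illustration only
print(graph_conv(A, H, W))  # each node averages itself and its neighbour -> 2
```

Stacking several such layers on a superpixel graph is what tends to oversmooth node features, which is the motivation for pairing the GCN branch with a CNN branch.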
ARTICLE | doi:10.20944/preprints202004.0271.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Keratoconus; smartphone; cornea; convolutional neural network
Online: 16 April 2020 (12:38:42 CEST)
Nowadays, smartphones are promising platforms for disease diagnosis and remote health care applications owing to their ubiquity. Here, a novel convolutional neural network method for detecting keratoconus, implemented wholly on a smartphone, is proposed. The proposed method achieves an overall detection accuracy above 72.9% across all stages of keratoconus. Preliminary results indicate detection rates of 90%, 83%, 64%, and 52% for the severe, advanced, moderate, and mild stages of the disease, respectively.
ARTICLE | doi:10.20944/preprints201811.0583.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Station logo; Convolutional Neural Network; Detection
Online: 26 November 2018 (10:57:17 CET)
A station logo is the means by which a TV station asserts copyright; identifying it enables analysis and understanding of the video and helps ensure that the broadcast TV signal is not illegally interfered with. In this paper, we design a station logo detection method based on a Convolutional Neural Network that exploits characteristics of station logos, such as their small variation in scale-to-height ratio and their relatively fixed position. First, to preprocess the station logo data and extract features, video samples are collected, filtered, framed, labeled, and processed. Then, the data are divided proportionally into training and test sets, and the logo detection model is trained. Finally, the model is tested on the samples to evaluate its practical effectiveness. Simulation experiments prove its validity.
REVIEW | doi:10.20944/preprints202208.0313.v3
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional Neural Network; domain; natural language processing; computer vision; semantic parsing
Online: 18 August 2022 (07:39:33 CEST)
The convolutional neural network (CNN), a class of artificial neural network (ANN), is attracting the interest of researchers across research domains. CNNs were invented for computer vision, but they have also proven useful for semantic parsing, sentence modeling, and other natural language processing tasks. In this paper, we discuss the basics of CNN models and their scope, to provide a reference and baseline for researchers interested in using CNN models in their own work.
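The operation at the heart of every CNN layer can be sketched in a few lines. This is a generic "valid" cross-correlation in plain numpy, not any particular framework's implementation; the example image and kernel are arbitrary:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a CNN layer:
    slide the kernel over the image and take the elementwise-product sum."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])   # a simple diagonal difference filter
print(conv2d_valid(image, kernel))  # -> 2x2 output, all entries -4
```

A trained CNN learns the kernel values from data; stacking such layers with nonlinearities between them gives the feature hierarchies discussed in the review.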
ARTICLE | doi:10.20944/preprints201811.0546.v4
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional Neural Network (CNN), Deep learning, Architecture, Applications
Online: 14 February 2019 (10:01:31 CET)
With the rise of the artificial neural network (ANN), machine learning has taken a forceful turn in recent times. One of the most spectacular kinds of ANN design is the Convolutional Neural Network (CNN), a technology that combines artificial neural networks with up-to-date deep learning strategies. In deep learning, the CNN is at the center of spectacular advances. This type of artificial neural network has been applied to image recognition tasks for decades and has attracted the attention of researchers in many countries in recent years, as the CNN has shown promising performance in several computer vision and machine learning tasks. This paper describes the underlying architecture and various applications of the Convolutional Neural Network.
ARTICLE | doi:10.20944/preprints202005.0455.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: pattern recognition; deep convolutional neural network; Brahmi script; CNN
Online: 28 May 2020 (07:33:32 CEST)
Significant progress has been made in pattern recognition technology. However, one obstacle that has not yet been overcome is the recognition of words in the Brahmi script, specifically the identification of characters, compound characters, and words. This study proposes a deep convolutional neural network (DCNN) with dropout to recognize Brahmi words, and a series of experiments is performed on a standard Brahmi dataset. The method was systematically tested on an accessible Brahmi image database, achieving a recognition rate of 92.47% with the dropout-equipped CNN, which is among the best reported in the literature for this task.
ARTICLE | doi:10.20944/preprints201912.0252.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: time series; deep learning; convolutional neural network; recurrence plot; financial market prediction
Online: 19 December 2019 (07:39:54 CET)
An application of a deep convolutional neural network and recurrence plots for financial market movement prediction is presented. Though its information is challenging and subjective to interpret, the pattern formed by a recurrence plot provides useful insight into the underlying dynamical system. We used recurrence plots of seven financial time series to train a deep neural network for financial market movement prediction. Our approach was tested on our dataset and achieved an average classification accuracy of 53.25%. The result suggests that a well-trained deep convolutional neural network can learn from recurrence plots and predict the direction of a financial market.
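A recurrence plot as used above has a standard definition: a binary matrix marking which pairs of points in a time series come within a threshold of each other. A minimal sketch, with the threshold `eps` chosen arbitrarily for illustration:

```python
import numpy as np

def recurrence_plot(x, eps):
    """Binary recurrence matrix: R[i, j] = 1 when |x_i - x_j| <= eps."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # all pairwise distances
    return (dist <= eps).astype(int)

# A short periodic series recurs after its period, producing the
# characteristic diagonal-stripe texture:
series = [0.0, 1.0, 0.0, 1.0]
R = recurrence_plot(series, eps=0.1)
print(R)
```

For multivariate or embedded series the scalar distance is replaced by a vector norm, but the thresholding idea is the same; the resulting image is what gets fed to the CNN.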
ARTICLE | doi:10.20944/preprints202103.0220.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Convolutional Neural Network; Deep Learning; Environmental Monitoring
Online: 8 March 2021 (13:37:58 CET)
Accurately mapping individual tree species in densely forested environments is crucial to forest inventory. When only RGB images are considered, this is a challenging task for many automatic photogrammetry processes, mainly because of the spectral similarity between species in RGB scenes, which hinders most automatic methods. State-of-the-art deep learning methods could be capable of identifying tree species in RGB images with attractive cost, accuracy, and computational load. This paper presents a deep learning-based approach to detect an important multi-use palm tree species (Mauritia flexuosa; i.e., Buriti) in aerial RGB imagery. In South America, this palm tree is essential for many indigenous and local communities because of its characteristics, and the species is also a valuable indicator of water resources, which makes mapping its location beneficial. The method is based on a Convolutional Neural Network (CNN) that identifies and geolocates singular tree species in a high-complexity forest environment, and it considers the likelihood of every pixel in the image being recognized as a possible tree by implementing confidence-map feature extraction. This study compares the performance of the proposed method against state-of-the-art object detection networks using a dataset of 1,394 airborne scenes in which 5,334 palm trees were manually labeled. The results returned a mean absolute error (MAE) of 0.75 trees and an F1-measure of 86.9%, better than both Faster R-CNN and RetinaNet under equal experimental conditions. The proposed network detected the palm trees quickly, with a per-image detection time of 0.073 seconds (standard deviation 0.002) on the GPU.
In conclusion, the presented method efficiently handles a high-density forest scenario, can accurately map the location of single species such as the M. flexuosa palm tree, and may be useful for future frameworks.
ARTICLE | doi:10.20944/preprints202007.0650.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Myocarditis; Diagnosis; Convolutional Neural Network; Cardiac MRI; prediction
Online: 26 July 2020 (17:44:05 CEST)
Myocarditis is an inflammation of the middle layer of the heart wall, typically caused by a viral infection, that can affect the heart muscle and its electrical system. It remains one of the most challenging diagnoses in cardiology. Myocarditis is the prime cause of unexpected death in approximately 20% of adults less than 40 years of age. Cardiac MRI (CMR) is considered a noninvasive, gold-standard diagnostic tool for suspected myocarditis and plays an indispensable role in diagnosing various cardiac diseases. However, the performance of CMR depends heavily on the clinical presentation and on non-specific features such as chest pain, arrhythmia, and heart failure. Besides, other imaging factors such as artifacts, technical errors, pulse sequence, acquisition parameters, contrast agent dose, and, more importantly, qualitative visual interpretation can affect the result of the diagnosis. This paper introduces a new deep learning-based model, Convolutional Neural Network-Clustering (CNN-KCL), to diagnose myocarditis early and accurately. To the best of our knowledge, a convolutional neural network has never before been used for the diagnosis of myocarditis. In this study, we used 47 subjects from Tehran's Omid Hospital to diagnose myocarditis, with a total of 10,425 examined samples. Our results demonstrate that CNN-KCL achieves 92.3% myocarditis diagnosis accuracy, significantly better than results reported in previous studies.
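The clustering half of a hybrid CNN-plus-clustering model like CNN-KCL can be illustrated with plain k-means; the abstract does not specify the exact clustering formulation, so this is a generic 1-D sketch on hypothetical feature values:

```python
import numpy as np

def kmeans_1d(x, centroids, n_iter=10):
    """Plain k-means on 1-D data: alternately assign points to the nearest
    centroid, then move each centroid to the mean of its points. A generic
    stand-in for the clustering step of a CNN-plus-clustering pipeline."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(centroids, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - c[None, :]), axis=1)
        c = np.array([x[labels == k].mean() for k in range(len(c))])
    return c, labels

# Hypothetical scalar features forming two well-separated groups:
centers, labels = kmeans_1d([0.0, 1.0, 10.0, 11.0], centroids=[0.0, 11.0])
print(centers)  # -> [0.5, 10.5]
```

In practice the inputs would be CNN-derived feature vectors rather than raw scalars, and the cluster assignments would feed back into the diagnosis decision.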
ARTICLE | doi:10.20944/preprints202103.0180.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: convolutional neural networks; activation functions; biomedical classification; ensembles; MeLU variants
Online: 5 March 2021 (10:05:38 CET)
Recently, much attention has been devoted to finding highly efficient and powerful activation functions for CNN layers. Because activation functions inject different nonlinearities between layers that affect performance, varying them is one method for building robust ensembles of CNNs. The objective of this study is to examine the performance of CNN ensembles made with different activation functions, including six new ones presented here: 2D Mexican ReLU, TanELU, MeLU+GaLU, Symmetric MeLU, Symmetric GaLU, and Flexible MeLU. The highest performing ensemble was built with CNNs having different activation layers that randomly replaced the standard ReLU. A comprehensive evaluation of the proposed approach was conducted across fifteen biomedical data sets representing various classification tasks. The proposed method was tested on two basic CNN architectures: Vgg16 and ResNet50. Results demonstrate the superiority in performance of this approach. The MATLAB source code for this study will be available at https://github.com/LorisNanni.
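The ensemble-building idea above, randomly replacing the standard ReLU with different activation functions across CNNs, can be sketched as follows. Standard activations are used as stand-ins, since the MeLU and GaLU variants named in the abstract are defined in the paper itself:

```python
import numpy as np

# Generic activation variants; the paper's MeLU/GaLU families are more
# elaborate learnable functions, so these are illustrative stand-ins.
def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)

def tanh_act(x):
    return np.tanh(x)

def build_ensemble(n_members, rng):
    """Assign each ensemble member a randomly chosen activation, mirroring
    the idea of replacing ReLU layers at random across the CNNs."""
    pool = [relu, leaky_relu, tanh_act]
    return [pool[rng.integers(len(pool))] for _ in range(n_members)]

rng = np.random.default_rng(0)
members = build_ensemble(5, rng)
print([f.__name__ for f in members])
```

Each member would then be trained independently and the ensemble prediction formed by averaging or voting; the diversity comes purely from the differing nonlinearities.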
ARTICLE | doi:10.20944/preprints201812.0296.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: staircase recognition; Convolutional Neural Networks (CNN); re-configurable robot; contour detection
Online: 25 December 2018 (05:33:12 CET)
Multi-floor environments are usually ignored when designing an autonomous robot for indoor cleaning applications. However, for efficient operation in such environments, the ability of a robotic platform to traverse staircases is crucial, and staircase detection and localization are highly important for planning the traversal. This paper describes a deep learning approach to staircase detection and localization using Convolutional Neural Networks (CNNs) within the Robot Operating System (ROS). We use an object detection network to detect staircases in images, and we localize them using a contour detection algorithm that finds the target point, a point close to the center of the first step, and the angle of approach to that point. Experiments are performed with images captured on different types of staircases from different viewpoints and angles. Results show that the approach identifies the presence of a staircase in the working environment very accurately and locates the target point with good accuracy.
ARTICLE | doi:10.20944/preprints201811.0612.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: geophysical signal processing; pattern recognition; temporal convolutional neural networks; seismology; deep learning; nuclear treaty monitoring
Online: 29 November 2018 (03:37:48 CET)
The detection of seismic events at regional and teleseismic distances is critical to Nuclear Treaty Monitoring. Traditionally, detecting regional and teleseismic events has required the use of an expensive multi-instrument seismic array; however in this work, we present DeepPick, a novel seismic detection algorithm capable of array-like performance from a single trace. We achieve this directly, by training our single-trace detector against labeled events from an array catalog, and by utilizing a deep temporal convolutional neural network. The training data consists of all arrivals in the International Seismological Centre Catalog for seven seismic arrays over a five year window from 1 Jan 2010 to 1 Jan 2015, yielding a total training set of 608,362 detections. The test set consists of the same seven arrays over a one year window from 1 Jan 2015 to 1 Jan 2016. We report our results by training the algorithm on six of the arrays and testing it on the seventh, so as to demonstrate the transportability and generalization of the technique to new stations. Detection performance against this test set is outstanding. Fixing a type-I error rate of 1%, the algorithm achieves an overall recall rate of 73% on the 141,095 array beam picks in the test set, yielding 102,394 correct detections. This is more than 4 times the 23,259 detections found in the analyst-reviewed single-trace catalogs over the same period, and represents an 8dB improvement in detector sensitivity over current methods. These results demonstrate the potential of our algorithm to significantly enhance the effectiveness of the global treaty monitoring network.
ARTICLE | doi:10.20944/preprints201807.0086.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: vibration measurement; frequency prediction; deep learning; convolutional neural network; photogrammetry; computer vision; non-contact measurement
Online: 5 July 2018 (08:31:00 CEST)
Vibration measurement serves as the basis for various engineering practices such as natural frequency or resonant frequency estimation. As image acquisition devices become cheaper and faster, vibration measurement and frequency estimation through image sequence analysis continue to receive increasing attention. In the conventional photogrammetry and optical methods of frequency measurement, vibration signals are first extracted before implementing the vibration frequency analysis algorithm. In this work, we demonstrated that frequency prediction can be achieved using a single feed-forward convolutional neural network. The proposed method is verified using a vibration signal generator and excitation system, and the result obtained was compared with that of an industrial contact vibrometer in a real application. Our experimental results demonstrate that the proposed method can achieve acceptable prediction accuracy even in unfavorable field conditions.
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: precipitation downscaling; convolutional neural networks; long short term memory networks; hydrological simulation
Online: 2 April 2019 (12:37:11 CEST)
Precipitation downscaling is widely employed to enhance the resolution and accuracy of precipitation products from general circulation models (GCMs). In this study, we propose a novel statistical downscaling method to improve the resolution and accuracy of GCM precipitation prediction in monsoon regions. We develop a deep neural network composed of convolutional and Long Short-Term Memory (LSTM) recurrent modules to estimate precipitation from well-resolved atmospheric dynamical fields. The proposed model is compared against the GCM precipitation product and classical downscaling methods in the Xiangjiang River Basin in South China. Results show considerable improvement over the ERA-Interim reanalysis precipitation. The model also outperforms benchmark downscaling approaches, including (1) quantile mapping, (2) the support vector machine, and (3) the convolutional neural network. To test the robustness of the model and its applicability to practical forecasting, we apply the trained network to precipitation prediction forced by retrospective forecasts from the ECMWF model. Compared to the ECMWF precipitation forecast, our model makes better use of the resolved dynamical fields for more accurate precipitation prediction at lead times from 1 day up to 2 weeks. This superiority decreases with forecast lead time, as the GCM's skill in predicting atmospheric dynamics is diminished by chaotic effects. Finally, we build a distributed hydrological model and force it with different sources of precipitation input. Hydrological simulation forced with the neural network precipitation estimate shows a significant advantage over simulation forced with the original ERA-Interim precipitation (the NSE value increases from 0.06 to 0.64), and its performance is only slightly worse than simulation forced with observed precipitation (NSE = 0.82). This further proves the value of the proposed downscaling method and suggests its potential for hydrological forecasts.
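Quantile mapping, the first benchmark named above, replaces each model value with the observed value at the same quantile of the reference distributions. A minimal empirical sketch; the halving bias in the toy data is a hypothetical illustration:

```python
import numpy as np

def quantile_map(vals, model_ref, obs_ref):
    """Empirical quantile mapping: interpolate each model value through the
    sorted model reference sample onto the sorted observed reference sample,
    so each value lands at the same rank in the observed distribution."""
    return np.interp(vals, np.sort(model_ref), np.sort(obs_ref))

# Hypothetical bias: the model halves precipitation relative to observations.
model_ref = np.array([1.0, 2.0, 3.0, 4.0])
obs_ref = np.array([2.0, 4.0, 6.0, 8.0])
corrected = quantile_map(np.array([2.0, 4.0]), model_ref, obs_ref)
print(corrected)  # -> [4. 8.]
```

Because the mapping is fitted per location from historical pairs, it corrects distributional bias but, unlike the neural network above, cannot exploit the surrounding atmospheric dynamical fields.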
ARTICLE | doi:10.20944/preprints202211.0094.v1
Subject: Engineering, Mechanical Engineering Keywords: Bearing fault feature extraction; Blind deconvolution (BD); Multi-task optimization; Convolutional neural network
Online: 4 November 2022 (13:41:46 CET)
Blind deconvolution (BD) is one of the effective methods for pre-processing vibration signals to assist in bearing fault diagnosis. Currently, most BD methods design an optimization criterion and use frequency- or time-domain information independently to optimize a deconvolution filter that recovers the weak periodic impulses related to incipient faults. However, random noise interference may cause the optimizer to overfit: time-domain-based BD methods tend to extract a fault-unrelated single-peak impulse, while frequency-domain-based BD methods tend to retain only the maximum-energy frequency component, losing the fault-related harmonic frequency components. To solve this issue, we propose a hybrid criterion that combines kurtosis for time-domain optimization with the $G-l_1/l_2$ norm for the frequency domain. Because these two criteria are monotonically increasing and decreasing, respectively, they mutually constrain each other to avoid overfitting. We then design a multi-task one-dimensional convolutional neural network with time and frequency branches to find an optimal solution for this hybrid criterion; the multi-task network optimizes the two domains simultaneously. Experimental results show that our proposed method outperforms other state-of-the-art methods.
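Kurtosis, the time-domain criterion in the hybrid objective, is the normalized fourth moment of the deconvolved signal; maximizing it favours impulsive content. A short sketch on hypothetical signals showing why impulses raise it:

```python
import numpy as np

def kurtosis(x):
    """Time-domain kurtosis E[(x - mu)^4] / E[(x - mu)^2]^2, the criterion
    commonly maximized by blind deconvolution to favour impulsive signals."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

flat = np.array([1.0, -1.0, 1.0, -1.0])                    # no impulses
impulsive = np.array([0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0])  # one spike
print(kurtosis(flat))        # -> 1.0
print(kurtosis(impulsive))   # much larger: the spike dominates the 4th moment
```

This also illustrates the overfitting failure mode the abstract describes: a single large spike, fault-related or not, maximizes kurtosis, which is why the frequency-domain term is needed as a counterweight.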
ARTICLE | doi:10.20944/preprints202305.1228.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: grape; Appearance quality; Classification; Convolutional neural network; Transfer learning; Support vector machine
Online: 17 May 2023 (10:28:16 CEST)
Grapes are a globally popular fruit, with grape cultivation worldwide being second only to citrus. This article focuses on the low efficiency and accuracy of traditional manual grading of red grape external appearance and proposes a small-sample red grape external appearance grading model based on transfer learning with convolutional neural networks (CNNs). Initially, the CNN transfer learning method was used to transfer the pre-trained AlexNet, VGG16, GoogleNet, InceptionV3, and ResNet50 network models on the ImageNet image dataset to the red grape image grading task. By comparing the classification performance of the CNN models of these five different network depths with fine-tuning, ResNet50 with a learning rate of 0.001 and a loop number of 10 was determined to be the best feature extractor for red grape images. Moreover, given the small number of red grape image samples in this study, different convolutional layer features output by the ResNet50 feature extractor were analyzed layer by layer to determine the effect of deep features extracted by each convolutional layer on SVM classification performance. This analysis helped to obtain a ResNet50+SVM red grape external appearance grading model based on the optimal ResNet50 feature extraction strategy. Experimental data showed that the classification model constructed using the feature parameters extracted from the 10th node of the ResNet50 network achieved an accuracy rate of 95.08% for red grape grading. These research results provide a reference for the online grading of red grape clusters based on external appearance quality and have certain guiding significance for the quality and efficiency of grape industry circulation and production.
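The final stage above, an SVM fitted on CNN-extracted features, can be sketched with a linear hinge-loss classifier trained by sub-gradient descent. The toy two-dimensional "features", learning rate, and regularization constant are all illustrative assumptions, not the paper's settings:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM via sub-gradient descent on the regularized hinge loss;
    a stand-in for the SVM fitted on ResNet50-extracted features."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                        # inside margin: push out
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                 # correct side: only shrink w
                w -= lr * lam * w
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)

# Toy "deep features" for two appearance grades (+1 / -1):
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
print(predict(X, w, b))  # -> [ 1.  1. -1. -1.]
```

The appeal of this two-stage design for small samples is that only the light-weight SVM is fitted to the grape data, while the heavy feature extractor stays frozen from ImageNet pre-training.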
ARTICLE | doi:10.20944/preprints202201.0068.v1
Subject: Engineering, Mechanical Engineering Keywords: Simulated annealing; Wavelet packet transform; Convolutional neural network
Online: 6 January 2022 (10:27:40 CET)
Bearings are widely used in various types of electrical machinery and equipment; as core components, their failures often cause serious consequences. At present, most parameter-adjustment methods still rely on manual tuning, which is susceptible to prior knowledge, easily falls into a local optimal solution rather than finding the global optimum, and requires substantial resources. Therefore, this paper proposes a new bearing fault diagnosis method based on the wavelet packet transform and a convolutional neural network optimized by a simulated annealing algorithm. The experimental results show that, compared with traditional bearing fault diagnosis methods, the proposed method is more accurate in feature extraction and fault classification. At the same time, compared with traditional manual tuning of artificial neural network parameters, this paper introduces the simulated annealing algorithm to adjust the parameters of the neural network automatically, thereby obtaining an adaptive bearing fault diagnosis method. To verify its effectiveness, the Case Western Reserve University bearing database was used for testing and the method was compared with traditional intelligent bearing fault diagnosis methods. The results show that the proposed method performs well in bearing fault diagnosis and provides a new way of thinking about parameter adjustment and fault classification algorithms in this field.
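The simulated annealing loop used for automatic parameter adjustment can be sketched on a toy one-dimensional objective; the cooling schedule, step size, and objective below are illustrative choices, not the paper's settings:

```python
import math
import random

def simulated_annealing(f, x0, n_iter=5000, t0=1.0, seed=0):
    """Minimize f by simulated annealing: propose random neighbours, always
    accept improvements, and accept worse moves with probability
    exp(-delta/T), where the temperature T decays over iterations."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for k in range(1, n_iter + 1):
        t = t0 / k                           # simple 1/k cooling schedule
        cand = x + rng.uniform(-0.5, 0.5)    # random neighbour
        fc = f(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / max(t, 1e-12)):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# Toy stand-in for "validation loss as a function of one hyperparameter":
best_x, best_f = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
print(best_x)  # close to the true minimum at 3
```

The occasional acceptance of worse moves at high temperature is what lets the search escape the local optima that manual tuning tends to get stuck in; in the paper's setting, `f` would be the network's validation error as a function of its parameters.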
ARTICLE | doi:10.20944/preprints202304.0996.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: Convolutional Neural Network; Deep Learning; Photoplethysmography; Respiratory Rate; Time Series
Online: 26 April 2023 (13:17:24 CEST)
Respiratory rate is an important biomarker that indicates changes in the clinical condition of critically ill patients, so a surveillance tool that can accurately monitor the changing respiratory rate in real time is needed. After investigating various machine learning models, we propose a new machine learning model for real-time respiratory rate estimation from the photoplethysmogram. A new photoplethysmogram-driven respiratory rate dataset (StMary) was collected with a photoplethysmogram signal collector in the surgical intensive care unit of a tertiary referral hospital. For 50 patients and 50 healthy volunteers, a 2-minute photoplethysmogram was collected twice for each subject. The signals were input to the deep neural network model we built to estimate each subject's respiratory rate; the dataset was split into training, validation, and testing sets, and 4-fold cross-validation was used. Our deep neural network model, trained on StMary and two public datasets (BIDMC and CapnoBase) individually or on selectively merged datasets, showed a low error rate in respiratory rate measurements. The model trained on StMary achieved a low mean absolute error (1.0273±0.8965), and the model trained on all three datasets (CapnoBase, BIDMC, and StMary) showed a lower error rate (1.7359±1.6724) than the model trained on CapnoBase and BIDMC alone (1.9480±1.6751). These results verify the performance of the model in estimating respiratory rate from the photoplethysmogram, and our dataset can serve as clinical research data supporting artificial intelligence models that estimate respiratory rate, as well as surveillance tools that test whether their monitoring functions work properly.
REVIEW | doi:10.20944/preprints202206.0167.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; convolutional neural network; brain tumor classification; clinical application
Online: 13 June 2022 (04:57:42 CEST)
Deep learning has shown remarkable results in every field, especially the biomedical field, due to its ability to exploit large-scale datasets. A convolutional neural network (CNN) is a widely used deep learning approach for solving medical imaging problems. Over the past few years, many studies have focused on CNN-based techniques for brain tumor diagnosis. There are, however, still some critical challenges that CNNs face toward clinical application. This study presents a comprehensive review of the current literature on CNN architectures for brain tumor classification. We compare the key achievements in the performance evaluation metrics of the applied classification algorithms. In addition, this review assesses the clinical effectiveness of the included studies to elaborate on the limitations of, and directions for, future work in this area. No review focusing on the clinical effectiveness of previous works in this field has been published. We believe that this study can elevate the application of CNN-based deep learning methods in clinical practice and serve as a quick reference for biomedical researchers interested in this field.
ARTICLE | doi:10.20944/preprints202209.0190.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: green coffee bean; lightweight framework; deep convolutional neural network; explainable model; random optimization
Online: 14 September 2022 (04:04:05 CEST)
In recent years, the demand for coffee has increased tremendously. During production, green coffee beans are traditionally screened manually for defective beans before being packed into coffee bean packages; however, this method is not only time-consuming but also prone to human error due to fatigue. Therefore, this paper proposes a lightweight deep convolutional neural network (LDCNN) for a green coffee bean quality detection system, combining depthwise separable convolution (DSC), squeeze-and-excitation blocks (SE blocks), skip blocks, and other components. To counter the effect of the lightweight model's low parameter count during training, rectified Adam (RA), lookahead (LA), and gradient centralization (GC) were included to improve efficiency; the model was also deployed on an embedded system. Finally, the local interpretable model-agnostic explanations (LIME) model was employed to explain the model's predictions. The experimental results indicate that the model reaches an accuracy of up to 98.38% and an F1 score as high as 98.24% when detecting the quality of green coffee beans, achieving high accuracy with low computing time and few parameters. Moreover, the interpretability analysis verified that the lightweight model in this work is reliable, giving screening personnel a basis for understanding its judgments and thereby improving the model's classification and prediction.
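The parameter savings from depthwise separable convolution, one of the building blocks listed above, follow from a simple count: one k×k filter per input channel plus a 1×1 pointwise projection, instead of a full k×k filter per input-output channel pair. Bias terms are ignored and the channel sizes below are arbitrary examples:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def dsc_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution: one k x k filter per
    input channel, then a 1 x 1 pointwise projection across channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 16, 32)   # 3*3*16*32 = 4608
dsc = dsc_params(3, 16, 32)    # 3*3*16 + 16*32 = 656
print(std, dsc, round(std / dsc, 1))  # roughly 7x fewer weights
```

This roughly k²-fold reduction (for large channel counts) is what makes DSC-based models light enough for the embedded deployment described above.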
ARTICLE | doi:10.20944/preprints202210.0112.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: ARIMA; convolutional neural network; Kalman filter; passenger flow; transportation; short-term prediction; stochastic model
Online: 10 October 2022 (03:05:34 CEST)
Passenger flow prediction is very significant to transportation sustainability, given the chaotic traffic jams road users encounter on their way to offices, schools, or markets early in the day and during closing periods. This problem is peculiar to the transportation system of the Federal University of Technology Minna, Nigeria. The prevailing technique of passenger flow estimation, however, is non-parametric: it depends on fixed planning and is easily affected by noise. In this research, we propose a hybrid intelligent passenger frequency prediction model using the Auto-Regressive Integrated Moving Average (ARIMA) linear model, a Convolutional Neural Network (CNN), and a Kalman Filter Algorithm (KFA). The frequency of passenger arrivals at the bus terminals is obtained and enumerated through closed-circuit television (CCTV) and modeled using a Markovian Queueing Systems Model (MQSM). The ARIMA model was used for learning and prediction, and its results were compared with those of the combined CNN-KFA technique. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to examine the stationarity of data with different features. The performance of the models in describing short-term passenger flow frequency at each terminal was analyzed and evaluated using the Mean Absolute Percentage Error (MAPE) and Mean Squared Error (MSE). The CNN-Kalman filter model was fitted to the short-term series, and its MAPE values are below 10%. The MSE shows that the CNN-Kalman filter model has the overall best performance, outperforming the ARIMA model 83.33% of the time and providing high forecasting accuracy.
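The Kalman filtering step of a CNN-KFA style hybrid can be sketched with a scalar constant-state filter; the noise covariances and the constant measurement stream below are illustrative assumptions, not values from the study:

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=0.01, r=0.1):
    """Scalar Kalman filter for a constant-state model: predict (uncertainty
    grows by q), then correct toward each measurement by the Kalman gain."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p += q                    # predict: state assumed constant
        k = p / (p + r)           # Kalman gain balances prior vs measurement
        x += k * (z - x)          # update estimate with the new measurement
        p *= (1 - k)              # shrink uncertainty after the update
        estimates.append(x)
    return estimates

# A hypothetical steady arrival rate of 5 passengers per interval:
est = kalman_1d([5.0] * 10)
print(est[-1])  # converges toward 5 from the initial guess of 0
```

In the hybrid model, the filter plays this smoothing/correction role on top of the CNN's short-term predictions, damping the noise that the abstract identifies as the weakness of the non-parametric baseline.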
ARTICLE | doi:10.20944/preprints202110.0375.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: Brain-Computer Interface (BCI), Convolutional neural network (CNN), Electroencephalogram (EEG), Explainable artificial intelligence (XAI)
Online: 26 October 2021 (11:45:00 CEST)
Functional connectivity (FC) is a potential candidate that can increase the performance of brain-computer interfaces (BCIs) in the elderly because of its compensatory role in neural circuits. However, it is difficult to decode FC by current machine learning techniques because of a lack of its physiological understanding. To investigate the suitability of FC in BCI for the elderly, we propose the decoding of lower- and higher-order FCs using a convolutional neural network (CNN) in six cognitive-motor tasks. The layer-wise relevance propagation (LRP) method describes how age-related changes in FCs impact BCI applications for the elderly compared to younger adults. Seventeen younger (24.5±2.7 years) and twelve older (72.5±3.2 years) adults were recruited to perform tasks related to hand-force control with or without mental calculation. CNN yielded a six-class classification accuracy of 75.3% in the elderly, exceeding the 70.7% accuracy for the younger adults. In the elderly, the proposed method increases the classification accuracy by 88.3% compared to the filter-bank common spatial pattern (FBCSP). LRP results revealed that both lower- and higher-order FCs were dominantly overactivated in the prefrontal lobe depending on task type. These findings suggest a promising application of multi-order FC with deep learning on BCI systems for the elderly.
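The layer-wise relevance propagation (LRP) used to explain the CNN's decisions can be illustrated with its standard epsilon rule on a single linear layer; this is a generic minimal sketch, not the authors' implementation:

```python
def lrp_epsilon(x, w, relevance_out, eps=1e-6):
    """Redistribute output relevance to inputs via the LRP epsilon rule
    for a single linear layer z_j = sum_i x[i] * w[i][j]."""
    n_in, n_out = len(w), len(w[0])
    z = [sum(x[i] * w[i][j] for i in range(n_in)) for j in range(n_out)]
    r_in = [0.0] * n_in
    for j in range(n_out):
        # eps stabilizes the division when z[j] is close to zero
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            r_in[i] += x[i] * w[i][j] / denom * relevance_out[j]
    return r_in

r = lrp_epsilon([1.0, 2.0], [[1.0], [1.0]], [3.0])  # ≈ [1.0, 2.0]
```

Relevance is (approximately) conserved across the layer, so the scores attributed to the FC inputs sum to the relevance of the output.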
ARTICLE | doi:10.20944/preprints202208.0029.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: heme distortion; pocket conformation; convolutional neural network; machine learning
Online: 2 August 2022 (03:20:13 CEST)
Heme proteins serve diverse and pivotal biological functions. Therefore, clarifying the mechanisms of these diverse functions of heme is a crucial scientific topic. Distortion of heme porphyrin is one of the key factors regulating the chemical properties of heme. Here, we constructed convolutional neural network models for predicting heme distortion from the tertiary structure of the heme-binding pocket to examine their correlation. For saddling, ruffling, doming, and waving distortions, the experimental structure and predicted values were closely correlated. Furthermore, we assessed the correlation between the cavity shape and molecular structure of heme and demonstrated that hemes in protein pockets with similar structures exhibit near-identical structures, indicating the regulation of heme distortion through the protein environment. These findings indicate that the tertiary structure of the heme-binding pocket regulates the distortion of heme porphyrin, thereby controlling the chemical properties of heme relevant to the protein function; this implies a structure–function correlation in heme proteins.
ARTICLE | doi:10.20944/preprints202212.0010.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: structure–function correlation; active site conformation; convolutional neural network; machine learning
Online: 1 December 2022 (04:11:37 CET)
Structure–function relationships in proteins have long been a crucial scientific topic. Heme proteins have diverse and pivotal biological functions; therefore, clarifying their structure–function correlation is significant for understanding their functional mechanisms and is informative for various fields of science. In this study, we constructed convolutional neural network models for predicting protein functions from the tertiary structures of the heme-binding sites (active sites) of heme proteins to examine the structure–function correlation. As a result, we succeeded in classifying oxygen-binding proteins (OB), oxidoreductases (OR), proteins with both functions (OB–OR), and electron transport proteins (ET) with high accuracy. Although the misclassification rate between OR and ET was high, the rates between OB and ET and between OB and OR were almost zero, indicating that the prediction model works well between protein groups with very different functions. However, predicting the function of proteins modified with amino acid mutation(s) remains a challenge. Our findings indicate a structure–function correlation in the active site of heme proteins. This study is expected to be applied to the prediction of more detailed protein functions, such as catalytic reactions.
ARTICLE | doi:10.20944/preprints202010.0502.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Statistical downscaling; Generative Adversarial Network; Combination of Errors; Convolutional Neural Network; multi-scale structural similarity index; Wasserstein GAN
Online: 25 October 2020 (19:33:49 CET)
Despite numerous studies in statistical downscaling methodology, there remains a lack of methods that can downscale from precipitation modeled in global climate models to regional-level high-resolution gridded precipitation. This paper reports a novel downscaling method using a Generative Adversarial Network (GAN), CliGAN, which can downscale large-scale annual maximum precipitation given by simulations of multiple atmosphere-ocean global climate models (AOGCMs) from the Coupled Model Inter-comparison Project 6 (CMIP6) to regional-level gridded annual maximum precipitation data. This framework utilizes a convolution encoder-dense decoder network to create a generative network and a similar network to create a critic network. The model is trained using an adversarial training approach. The critic uses the Wasserstein distance loss function, and the generator is trained using a combination of the adversarial Wasserstein distance loss, structural loss with the multi-scale structural similarity index (MSSIM), and content loss with the Nash-Sutcliffe Model Efficiency (NS). The MSSIM index allowed us to gain insight into the model’s regional characteristics and shows that relying exclusively on point-based error functions, widely used in statistical downscaling, may not be enough to reliably simulate regional precipitation characteristics. Further use of structural loss functions within CNN-based downscaling methods may lead to higher quality downscaled climate model products.
ARTICLE | doi:10.20944/preprints202111.0230.v1
Subject: Engineering, Automotive Engineering Keywords: Convolutional neural network; Driver drowsiness; ECG signal; Heart rate variability; Wavelet scalogram
Online: 12 November 2021 (15:01:50 CET)
Driver drowsiness is one of the leading causes of traffic accidents. This paper proposes a new method for classifying driver drowsiness using deep convolutional neural networks trained on wavelet scalogram images of electrocardiogram (ECG) signals. Three different classes of drowsiness were defined based on video observation of driving tests performed in a simulator in manual and automated modes. The Bayesian optimization method is employed to optimize the hyperparameters of the designed neural networks, such as the learning rate and the number of neurons in every layer. To assess the results of the deep network method, Heart Rate Variability (HRV) data is derived from the ECG signals, some features are extracted from this data, and finally, random forest and k-nearest neighbors (KNN) classifiers are used as two traditional methods to classify the drowsiness levels. Results show that the trained deep network achieves balanced accuracies of about 77% and 79% in the manual and automated modes, respectively, whereas the best balanced accuracies obtained using the traditional methods are about 62% and 64%. We conclude that the designed deep networks working with wavelet scalogram images of ECG signals significantly outperform KNN and random forest classifiers trained on HRV-based features.
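As a rough sketch of the traditional HRV baseline, two standard features (SDNN and RMSSD) can be computed from RR intervals extracted from the ECG; the function and interval values below are illustrative, not the paper's feature set:

```python
from math import sqrt

def hrv_features(rr_ms):
    """Two standard HRV features from a list of RR intervals (ms):
    SDNN (overall variability) and RMSSD (beat-to-beat variability)."""
    n = len(rr_ms)
    mean = sum(rr_ms) / n
    sdnn = sqrt(sum((r - mean) ** 2 for r in rr_ms) / n)
    diffs = [rr_ms[i + 1] - rr_ms[i] for i in range(n - 1)]
    rmssd = sqrt(sum(d * d for d in diffs) / len(diffs))
    return sdnn, rmssd

sdnn, rmssd = hrv_features([800, 810, 790, 805])  # illustrative RR series
```

Features such as these would then feed the random forest and KNN classifiers, whereas the deep network consumes the scalogram images directly.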
ARTICLE | doi:10.20944/preprints202203.0288.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: computer-aided detection; convolutional neural network; COVID-19; deep learning; image classification
Online: 22 March 2022 (02:19:50 CET)
One of the critical tools for the early detection and subsequent evaluation of the incidence of lung diseases is chest radiography. This study presents a real-world implementation of a convolutional neural network (CNN)-based Carebot Covid app to detect COVID-19 from chest X-ray (CXR) images. Our proposed model takes the form of a simple and intuitive application. The CNN can be deployed as a STOW-RS prediction endpoint for direct implementation into DICOM viewers. The results of this study show that the deep learning model based on DenseNet and ResNet architectures can detect SARS-CoV-2 from CXR images with a precision of 0.981, a recall of 0.962, and an AP of 0.993.
ARTICLE | doi:10.20944/preprints201903.0039.v2
Subject: Engineering, Control And Systems Engineering Keywords: Handwritten digit recognition; Convolutional Neural Network (CNN); Deep learning; MNIST dataset; Epochs; Hidden Layers; Stochastic Gradient Descent; Backpropagation
Online: 20 September 2019 (10:12:26 CEST)
In recent times, with the rise of the Artificial Neural Network (ANN), deep learning has brought a dramatic twist to the field of machine learning by making it more artificially intelligent. Deep learning is used remarkably in a vast range of fields because of its diverse range of applications, such as surveillance, health, medicine, sports, robotics, and drones. In deep learning, the Convolutional Neural Network (CNN) is at the center of spectacular advances, mixing the Artificial Neural Network (ANN) with up-to-date deep learning strategies. It has been used broadly in pattern recognition, sentence classification, speech recognition, face recognition, text categorization, document analysis, scene recognition, and handwritten digit recognition. The goal of this paper is to observe the variation in the accuracy of a CNN classifying handwritten digits using various numbers of hidden layers and epochs, and to compare the resulting accuracies. For this performance evaluation of the CNN, we performed our experiment using the Modified National Institute of Standards and Technology (MNIST) dataset. The network is trained using stochastic gradient descent and the backpropagation algorithm.
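The training procedure named here, stochastic gradient descent with backpropagation, can be illustrated on a minimal example: a single sigmoid unit with a cross-entropy loss on a toy dataset (not MNIST, and far simpler than the paper's CNN):

```python
import random
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train_sgd(data, epochs=200, lr=0.5, seed=0):
    """SGD on a single sigmoid unit with cross-entropy loss.
    data: list of ((x1, x2), label) pairs."""
    random.seed(seed)
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)           # 'stochastic': random sample order
        for (x1, x2), y in data:
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            err = p - y                # dL/dz for cross-entropy + sigmoid
            w1 -= lr * err * x1        # backpropagated gradients
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

# Toy linearly separable (OR-like) dataset
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, b = train_sgd(list(data))
preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
```

The same update rule, applied layer by layer through the chain rule, is what trains the paper's deeper CNN.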
BRIEF REPORT | doi:10.20944/preprints201902.0257.v2
Subject: Engineering, Control And Systems Engineering Keywords: convolutional neural networks; pattern recognition; machine learning
Online: 12 March 2019 (10:18:12 CET)
This paper presents a study and implementation of a convolutional neural network to identify and recognize humpback whale specimens from the unique patterns of their tails. Starting from a dataset composed of images of whale tails, all the phases of the process of creation and training of a neural network are detailed – from the analysis and pre-processing of images to the elaboration of predictions, using TensorFlow and Keras frameworks. Other possible alternatives are also explained when it comes to tackling this problem and the complications that have arisen during the process of developing this paper.
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: convolutional neural networks; horse emotion recognition; horse emotion
Online: 7 June 2021 (12:42:05 CEST)
Creating intelligent systems capable of recognizing emotions is a difficult task, especially when looking at emotions in animals. This paper describes the process of designing a “proof of concept” system to recognize emotions in horses. This system is formed by two elements, a detector and a model. The detector is a fast region-based convolutional neural network that detects horses in an image. The model is a convolutional neural network that predicts the emotions of those horses. These two elements were trained with multiple images of horses until they achieved high accuracy in their tasks. 400 images of horses were collected and labeled to train both the detector and the model while 80 were used to validate the system. Once the two components were validated, they were combined into a testable system that would detect equine emotions based on established behavioral ethograms indicating emotional affect through head, neck, ear, muzzle and eye position. The system showed an accuracy of between 69% and 74% on the validation set, demonstrating that it is possible to predict emotions in animals using autonomous intelligent systems. Such a system has multiple applications including further studies in the growing field of animal emotions as well as in the veterinary field to determine the physical welfare of horses or other livestock.
ARTICLE | doi:10.20944/preprints202104.0501.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: convolutional neural networks; dilated neural networks; optimality
Online: 19 April 2021 (15:00:30 CEST)
One of the most effective image processing techniques is the use of convolutional neural networks, where we combine intensity values at grid points in the vicinity of each point. To speed up computations, researchers have developed a dilated version of this technique, in which only some points are processed. It turns out that the most efficient case is when we select points from a sub-grid. In this paper, we explain this empirical efficiency by proving that the sub-grid is indeed optimal, in some reasonable sense. To be more precise, we prove that, under all reasonable optimality criteria, the optimal subset of the original grid is either a sub-grid or a sub-grid-like set.
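The sub-grid selection behind dilated convolution can be sketched in one dimension: each kernel tap reads input points spaced `dilation` apart rather than adjacent points. This is a generic illustration, not the paper's construction:

```python
def dilated_conv1d(signal, kernel, dilation=2):
    """1D convolution where the kernel taps sit on a sub-grid of the
    input: tap k reads signal[i + k * dilation]."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

out = dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1], dilation=2)
# taps at i and i+2: [1+3, 2+4, 3+5, 4+6] → [4, 6, 8, 10]
```

With `dilation=1` this reduces to ordinary convolution; larger dilations widen the receptive field at the same computational cost, which is the efficiency the paper analyzes.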
ARTICLE | doi:10.20944/preprints202102.0318.v3
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Machine Learning; Artificial Intelligence; Androgen Receptor; Random Forest; Deep Neural Network; Convolutional
Online: 24 February 2021 (13:14:01 CET)
Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and food chain, with a proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on human, chimp, and rat effects by chemicals have been used to build machine learning classifiers and regressors and evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined physicochemically-relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically-based features and specific DNNs provided the best results as determined by AUC (0.87), MCC (0.47), and other metrics, and by their interpretability and the chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving the assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML
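The MCC values used above to compare models can be reproduced from a binary confusion matrix with the standard formula; a minimal sketch:

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a binary confusion matrix.
    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays meaningful on unbalanced sets such as the one discussed here, which is why it is the headline metric.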
ARTICLE | doi:10.20944/preprints201901.0319.v1
Subject: Chemistry And Materials Science, Nanotechnology Keywords: cascaded neural networks; memristor crossbar; convolutional neural networks
Online: 31 January 2019 (06:54:33 CET)
The multiply-accumulate calculation using a memristor crossbar array is an important method for realizing neuromorphic computing. However, memristor array fabrication technology is still immature, and it is difficult to fabricate large-scale arrays with high yield, which restricts the development of memristor-based neuromorphic computing technology. Cascading small-scale arrays to achieve the neuromorphic computational ability of large-scale arrays is therefore of great significance for promoting the application of memristor-based neuromorphic computing. To address this issue, we present a memristor-based cascaded framework with some basic computation units; several neural network processing units can be cascaded by this means to improve the processing capability on the dataset. Besides, we introduce a split method to reduce the load on the input terminals. Compared with VGGNet and GoogLeNet, the proposed cascaded framework can achieve 93.54% Fashion-MNIST accuracy with 4.15M parameters. Extensive experiments with Ti/AlOx/TaOx/Pt devices we fabricated show that the circuit simulation results can still provide a high recognition accuracy, and the recognition accuracy loss after circuit simulation can be controlled at around 0.26%.
ARTICLE | doi:10.20944/preprints202101.0579.v2
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Network Interpretation; Image Classification; Convolutional Neural Network; Integrated Gradient
Online: 22 November 2021 (14:06:52 CET)
A convolutional neural network (CNN) is sometimes understood as a black box in the sense that while it can approximate any function, studying its structure will not give us any insights into the nature of the function being approximated. In other words, the discriminative ability does not reveal much about the latent representation of a network. This research aims to establish a framework for interpreting CNNs by profiling them in terms of interpretable visual concepts and verifying them by means of Integrated Gradient. We also ask the question, "Do different input classes have a relationship, or are they unrelated?" For instance, could there be an overlapping set of highly active neurons that identify different classes? Could there be a set of neurons that are useful for one input class but misleading for a different one? Intuition answers these questions positively, implying the existence of a structured set of neurons inclined to a particular class. Knowing this structure has significant value; it provides a principled way of identifying redundancies across the classes. Here, interpretability profiling has been done by evaluating the correspondence between individual hidden neurons and a set of human-understandable visual semantic concepts. We also propose an integrated-gradient-based class-specific relevance mapping approach that takes into account the spatial position of the region of interest in the input image. Our relevance score verifies the interpretability scores in terms of neurons tuned to a particular concept/class. Further, we perform network ablation and measure the performance of the network based on our approach.
ARTICLE | doi:10.20944/preprints201910.0137.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: topology optimization; convolutional neural network; high-resolution
Online: 12 October 2019 (03:56:19 CEST)
Topology optimization is a pioneering design method that can provide various candidates with high mechanical properties. However, high resolution is highly desired for the optimum structures, which normally leads to a computationally intractable problem, especially for the famous Solid Isotropic Material with Penalization (SIMP) method. In this paper, an efficient and high-resolution topology optimization method is proposed based on the Super-Resolution Convolutional Neural Network (SRCNN) technique in the framework of SIMP. The SRCNN involves four processes: refining, patch extraction and representation, non-linear mapping, and reconstruction. High computational efficiency is achieved by a pooling strategy, which can balance the number of finite element analyses (FEA) against the output mesh in the optimization process. To further reduce the high computational cost of 3D topology optimization problems, a combined treatment method using 2D SRCNN is built as another speeding-up strategy. A number of typical examples show that the high-resolution topology optimization method adopting SRCNN has excellent applicability and high efficiency for 2D and 3D problems with arbitrary boundary conditions, any design domain shape, and varied loads.
ARTICLE | doi:10.20944/preprints202303.0221.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: polyp segmentation; computer vision; ensemble; transformers; convolutional neural networks
Online: 13 March 2023 (07:31:25 CET)
In the realm of computer vision, semantic segmentation is the task of recognizing objects in images at the pixel level. This is done by performing a classification of each pixel. The task is complex and requires sophisticated skills and knowledge about the context to identify objects’ boundaries. The importance of semantic segmentation in many domains is undisputed. In medical diagnostics, it simplifies the early detection of pathologies, thus mitigating the possible consequences. In this work, we provide a review of the literature on deep ensemble learning models for polyp segmentation and we develop new ensembles based on convolutional neural networks and transformers. The development of an effective ensemble entails ensuring diversity between its components. To this end, we combine different models (HarDNet-MSEG, Polyp-PVT, and HSNet) trained with different data augmentation techniques, optimization methods, and learning rates, which we experimentally demonstrate to be useful to form a better ensemble. Most importantly, we introduce a new method to obtain the segmentation mask which is more suitable for combining transformers in an ensemble. In our extensive experimental evaluation, the proposed ensembles exhibit state-of-the-art performance.
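One simple way to combine segmentation outputs into an ensemble is per-pixel probability averaging followed by thresholding; the sketch below illustrates this generic approach, not the paper's transformer-specific mask-combination method:

```python
def fuse_masks(masks, threshold=0.5):
    """Average per-pixel probabilities from several segmentation models
    and threshold to obtain a binary ensemble mask.
    masks: list of HxW grids of probabilities in [0, 1]."""
    n = len(masks)
    h, w = len(masks[0]), len(masks[0][0])
    return [
        [1 if sum(m[i][j] for m in masks) / n >= threshold else 0
         for j in range(w)]
        for i in range(h)
    ]

# Two toy 2x2 probability maps from two hypothetical models
m1 = [[0.9, 0.2], [0.6, 0.1]]
m2 = [[0.8, 0.4], [0.3, 0.2]]
fused = fuse_masks([m1, m2])
```

Averaging rewards pixels on which diverse models agree, which is why ensuring diversity between components (different architectures, augmentations, learning rates) matters for the ensemble.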
ARTICLE | doi:10.20944/preprints202002.0231.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Convolutional Neural Networks; ensemble of classifiers; activation functions; image classification; skin detection
Online: 17 February 2020 (01:50:08 CET)
In recent years, the field of deep learning has achieved considerable success in pattern recognition, image segmentation, and many other classification fields. There are many studies and practical applications of deep learning on image, video, or text classification. In this study, we suggest a method for changing the architecture of the best-performing CNN models with the aim of designing new models to be used as stand-alone networks or as components of an ensemble. We propose to replace each activation layer of a CNN (usually a ReLU layer) with a different activation function stochastically drawn from a set of activation functions: in this way, the resulting CNN has a different set of activation function layers.
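The stochastic replacement of activation layers can be sketched as a draw, at model-build time, of one activation per layer from a fixed pool; the pool and function names below are illustrative, not the paper's exact set:

```python
import random
from math import exp

def relu(x):
    return max(0.0, x)

def leaky_relu(x):
    return x if x > 0 else 0.01 * x

def elu(x):
    return x if x > 0 else exp(x) - 1.0

def stochastic_activations(n_layers, pool=(relu, leaky_relu, elu), seed=0):
    """Draw one activation per layer at model-build time, so each
    generated network gets a different sequence of activation layers."""
    rng = random.Random(seed)
    return [rng.choice(pool) for _ in range(n_layers)]

# Two seeds -> two candidate networks with (possibly) different layers
acts_a = stochastic_activations(5, seed=1)
acts_b = stochastic_activations(5, seed=2)
```

Repeating the draw with different seeds yields a family of architecturally varied networks, which is the diversity the ensemble then exploits.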
ARTICLE | doi:10.20944/preprints202008.0113.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Scene classification; Deep Learning; Convolutional Neural Networks; Feature learning
Online: 5 August 2020 (06:19:27 CEST)
State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures to achieve very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low-dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground-truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant gains across a wide set of benchmark datasets in terms of classification, while providing additional evidence related to class membership and classification confidence.
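A loss of this shape can be sketched as the categorical cross-entropy plus a weighted squared distance between the low-dimensional representation and its class representative; the weight `alpha` and the exact distance term are assumptions for illustration, not the paper's definition:

```python
from math import log

def combined_loss(probs, label, embedding, class_center, alpha=0.5):
    """Categorical cross-entropy on the softmax output plus a squared
    Euclidean distance pulling the low-dimensional representation toward
    its class representative. alpha is an illustrative weight."""
    ce = -log(probs[label])
    dist = sum((e - c) ** 2 for e, c in zip(embedding, class_center))
    return ce + alpha * dist

# Same prediction, embedding on vs. off its class center:
on_center = combined_loss([0.25, 0.75], 1, [1.0, 0.0], [1.0, 0.0])
off_center = combined_loss([0.25, 0.75], 1, [1.0, 0.0], [0.0, 0.0])
```

The distance term penalizes representations that drift from their class representative even when the softmax prediction is already correct, yielding the extra evidence about class membership mentioned above.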
ARTICLE | doi:10.20944/preprints202207.0056.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: deep learning; convolutional neural networks; classification; machine learning; IoT
Online: 5 July 2022 (04:22:49 CEST)
In videos, human actions are three-dimensional (3D) signals that carry spatiotemporal knowledge of human behavior, which can be investigated using 3D convolutional neural networks (CNNs). However, 3D CNNs have not yet matched the performance of their well-established two-dimensional (2D) equivalents on still photographs: broad 3D convolutional memory and spatiotemporal fusion face training difficulties that prevent 3D CNNs from achieving remarkable results. In this paper, we implement a hybrid deep learning architecture that combines STIP and 3D CNN features to effectively enhance performance on 3D videos. The implementation provides more detailed and deeper mapping for training in each cycle of space-time fusion, and the training model further improves the results after handling complicated model evaluations. A video classification model is used, and an intelligent 3D network protocol for multimedia data classification using deep learning is introduced to further understand space-time associations in human activities. The well-known UCF101 dataset is used to evaluate the performance of the proposed hybrid technique, which substantially outperforms the initial 3D CNNs; compared with state-of-the-art frameworks from the literature for action recognition on UCF101, it achieves an accuracy of 95%.
ARTICLE | doi:10.20944/preprints202304.1061.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: driving a car; driving behavior; electrooculography; convolutional neural networks
Online: 27 April 2023 (08:14:38 CEST)
To drive safely, the driver must be aware of the surroundings, pay attention to road traffic, and be ready to adapt to new circumstances. Most studies on driving safety focus on detecting anomalies in driver behavior and monitoring the cognitive capabilities of drivers. In our study, we propose a classifier for basic activities in driving a car, based on an approach that could similarly be applied to the recognition of basic activities in daily life, namely, using electrooculographic (EOG) signals and a one-dimensional convolutional neural network (1D CNN). Our classifier achieved an accuracy of 80% for the 16 primary and secondary activities. The accuracies related to primary driving activities, including crossroad, parking, and roundabout, were 97.9%, 96.8%, 97.4%, and 99.5%, respectively. The F1 score for secondary driving actions (0.99) was higher than for primary driving activities (0.93–0.94). Furthermore, using the same algorithm, it was possible to distinguish four secondary activities related to activities of daily life from secondary activities performed when driving a car.
ARTICLE | doi:10.20944/preprints202302.0396.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional Neural Network; Ensemble Learning; Transfer Learning; Fine-tuning; Plankton Classification; foraminifera
Online: 23 February 2023 (03:37:23 CET)
This paper presents a study of an automated system for identifying planktic foraminifera at the species level. The system uses a combination of deep learning methods, specifically Convolutional Neural Networks (CNNs), to analyze digital images of foraminifera taken at different illumination angles. The dataset is composed of 1437 groups of sixteen grayscale images, one group for each foraminifer, that are then converted to RGB images with various processing methods. These RGB images are fed into a set of CNNs, organized in an Ensemble Learning (EL) environment. The ensemble is built by training different networks using different approaches for creating the RGB images. The study finds that an ensemble of CNN models trained on different RGB images improves the system's performance compared to other state-of-the-art approaches. The proposed system was also found to outperform human experts in classification accuracy.
ARTICLE | doi:10.20944/preprints202305.1490.v1
Subject: Engineering, Civil Engineering Keywords: Surrogate Model; Convolutional Neural Network; Physics-Informed Neural Networks; Elliptic PDE; FEM
Online: 22 May 2023 (09:48:22 CEST)
This study explores what role artificial intelligence techniques could play in future numerical analysis. In this paper, a convolutional neural network technique based on a modified loss function is proposed as a surrogate for the finite element method (FEM). Several surrogate-based physics-informed neural networks (PINNs) are developed to solve representative boundary value problems based on elliptic partial differential equations (PDEs). Results from the proposed surrogate-based approach are in good agreement with those from conventional FEM. It is found that modification of the loss function can improve the prediction accuracy of the neural network, indicating that, to some extent, artificial intelligence techniques could replace conventional numerical analysis as a capable surrogate model.
ARTICLE | doi:10.20944/preprints201801.0019.v1
Subject: Computer Science And Mathematics, Analysis Keywords: high resolution remote sensing image; convolutional neural networks; full convolution networks; Bayesian convolutional neural networks; building extraction; conditional probability density function
Online: 3 January 2018 (04:46:44 CET)
When extracting buildings from high-resolution remote sensing images with meter/sub-meter accuracy, the shade of trees and interference from roads are the main factors reducing extraction accuracy. We propose a Bayesian Convolutional Neural Network (BCNET) model based on standard fully convolutional networks (FCN) to solve these problems. First, buildings with no shade (or with shade artificially removed) are taken as Sample A, woodland as Sample B, and roads as Sample C, and three sample libraries are set up. These sample libraries are learned separately to obtain their respective feature-vector sets; the feature-vector sets are then modeled with Gaussian mixtures to estimate the conditional probability density functions of the mixture of noise objects and roofs. The standard FCN is improved in two aspects: (1) atrous convolution is introduced; (2) the conditional probability density function is taken as the activation function of the last convolution. Experiments on unmanned aerial vehicle (UAV) images show that the BCNET model can effectively eliminate the influence of trees and roads, and the building extraction accuracy can reach 97%.
TECHNICAL NOTE | doi:10.20944/preprints201811.0529.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Calving Front; Image Segmentation; U-Net; Convolutional Neural Network; Machine Learning; Greenland
Online: 21 November 2018 (14:05:00 CET)
The continuous and precise mapping of glacier calving fronts is essential for monitoring and understanding rapid glacier changes in Antarctica and Greenland, which have the potential for significant sea level rise within the current century. This effort has been mostly restricted to the slow and painstaking manual digitalization of the calving front positions in thousands of satellite imagery products. Here, we have developed a machine learning toolkit to robustly and automatically detect glacier calving front margins in satellite imagery. The toolkit is based on semantic image segmentation using Convolutional Neural Networks (CNN) with a modified U-Net architecture to isolate the calving fronts from satellite images after having been trained with a dataset of images and their corresponding manually-determined calving fronts. As a case study, we train our neural network on a varied set of Landsat images with lowered resolutions from Jakobshavn, Sverdrup, and Kangerlussuaq glaciers, Greenland, and test the results on novel images from Helheim glacier, Greenland, to evaluate the performance of the approach. The neural network is able to identify the calving front in new images with a mean deviation of 96.3 m from the true fronts, equivalent to 1.97 pixels on average, while the corresponding error for manually-determined fronts on the same resolution images is 92.5 m. We find that the trained neural network significantly outperforms common edge detection techniques, and can be used to continuously map out calving-ice fronts with a variety of data products.
ARTICLE | doi:10.20944/preprints202005.0430.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Activity Context Sensing; Smartphones; Deep Convolutional Neural Networks; Smart devices
Online: 26 May 2020 (11:33:55 CEST)
With the widespread adoption of embedded sensing capabilities in mobile devices, there has been unprecedented development of context-aware solutions. This allows the proliferation of various intelligent applications such as those for remote health and lifestyle monitoring, intelligent personalized services, etc. However, activity context recognition based on multivariate time series signals obtained from mobile devices in unconstrained conditions is naturally prone to class imbalance problems. This means that recognition models tend to predict classes with the majority of samples whilst ignoring classes with the fewest samples, resulting in poor generalization. To address this problem, we propose to augment the time series signals from inertial sensors with signals from ambient sensing to train deep convolutional neural network (DCNN) models. DCNNs provide the characteristics that capture local dependency and scale invariance of these combined sensor signals. Consequently, we developed a DCNN model using only inertial sensor signals and then developed another model that combined signals from both inertial and ambient sensors, aiming to address the class imbalance problem by improving the performance of the recognition model. Evaluation and analysis of the proposed system using data with imbalanced classes show that the system achieved better recognition accuracy when data from inertial sensors are combined with those from ambient sensors such as environmental noise level and illumination, with an overall improvement of 5.3% in accuracy.
ARTICLE | doi:10.20944/preprints202301.0031.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Spatially Adaptive De-normalization (SPADE); Super-Resolution; Convolutional Neural Network; Generative Adversarial Network
Online: 9 January 2023 (02:15:56 CET)
With the development of deep learning technology, various structures and research methods for super-resolution restoration of natural images and document images have been introduced. In particular, a number of recent studies have been conducted on image restoration using generative adversarial networks. Super-resolution restoration is an ill-posed problem because of complex constraints, such as the fact that many high-resolution images can be restored from the same low-resolution image, and the difficulty of restoring features like edges, light smudging, and blurring. In this study, we utilized the spatially adaptive de-normalization (SPADE) structure for document image restoration to solve previous problems such as edge unclearness, difficulty in capturing text features, and image color transition. Consequently, it can be confirmed that character edges and ambiguous strokes are restored more clearly when contrasted with previously suggested methods. The proposed method's PSNR and SSIM scores are also 8% and 15% higher, respectively, compared to the previous methods.
ARTICLE | doi:10.20944/preprints202003.0035.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: meta-learning; lie group; machine learning; deep learning; convolutional neural network
Online: 3 March 2020 (11:09:53 CET)
Deep learning has achieved many successes in numerous fields, but when training samples are extremely limited, deep learning models often underfit or overfit the few available samples. Meta-learning was proposed to address the difficulties of few-shot learning and fast adaptation. A meta-learner learns to retain common knowledge by training on a large set of tasks sampled from a certain data distribution, equipping it with generalization ability when facing unseen new tasks. Due to the limited samples, most approaches use only shallow neural networks to avoid overfitting and to reduce the difficulty of the training process, which wastes much extra information when adapting to unseen tasks. Euclidean space-based gradient descent also makes the meta-learner's updates inaccurate. These issues make it hard for many meta-learning models to extract features from samples and update network parameters. In this paper, we propose a novel method that uses a multi-stage joint training approach to overcome the bottleneck during the adaptation process. To accelerate the adaptation procedure, we also constrain the network to the Stiefel manifold, so that the meta-learner can perform more stable gradient descent in a limited number of steps. Experiments on mini-ImageNet show that our method reaches better accuracy under 5-way 1-shot and 5-way 5-shot conditions.
ARTICLE | doi:10.20944/preprints201812.0090.v3
Subject: Engineering, Control And Systems Engineering Keywords: deep convolutional neural networks; multi-class segmentation; global convolutional network; channel attention; transfer learning; ISPRS Vaihingen; Landsat-8
Online: 4 January 2019 (11:47:42 CET)
In the remote sensing domain, it is crucial to perform semantic segmentation on raster images, e.g., of rivers, buildings, forests, etc. A deep convolutional encoder--decoder (DCED) network is the state-of-the-art semantic segmentation method for remotely sensed images. However, the accuracy is still limited, since the network is not designed for remotely sensed images and the training data in this domain are deficient. In this paper, we propose a novel CNN for semantic segmentation, particularly for remote sensing corpora, with three main contributions. First, we propose applying a recent CNN called a global convolutional network (GCN), since it can capture different resolutions by extracting multi-scale features from different stages of the network. Additionally, we further enhance the network by improving its backbone with a larger number of layers, which is suitable for medium resolution remotely sensed images. Second, "channel attention" is introduced in our network in order to select the most discriminative filters (features). Third, "domain-specific transfer learning" is introduced to alleviate the scarcity issue by utilizing other remotely sensed corpora with different resolutions as pre-trained data. The experiments were then conducted on two datasets: (i) medium resolution data collected from the Landsat-8 satellite and (ii) very high resolution data from the ISPRS Vaihingen Challenge Dataset. The results show that our networks outperformed DCED in terms of $F1$ by 17.48% and 2.49% on the medium and very high resolution corpora, respectively.
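The "channel attention" idea above, reweighting feature maps so that the most discriminative filters dominate, can be sketched as a squeeze-and-excitation-style block. This is a generic illustration, not the paper's exact architecture; the reduction ratio and random weights are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1: (C//r, C) and w2: (C, C//r) stand in for learned projection
    weights (random here, since this is only an illustration)."""
    squeeze = features.mean(axis=(1, 2))                  # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # per-channel weights in (0, 1)
    return features * excite[:, None, None]               # rescale each channel

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))   # 8 channels, 4x4 spatial
w1 = rng.standard_normal((2, 8))         # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
out = channel_attention(feats, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the excitation weights lie in (0, 1), each channel is attenuated in proportion to how informative the block judges it to be.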
ARTICLE | doi:10.20944/preprints201809.0361.v3
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: deep learning; convolutional neural networks; polar mesocyclones; satellite data processing; pattern recognition
Online: 29 October 2018 (10:16:49 CET)
Polar mesocyclones (MCs) are small marine atmospheric vortices. The class of intense MCs, called polar lows, is accompanied by extremely strong surface winds and heat fluxes and thus largely influences deep ocean water formation in the polar regions. Accurate detection of polar mesocyclones in high-resolution satellite data is challenging and time-consuming when performed manually. Existing algorithms for the automatic detection of polar mesocyclones are based on conventional analysis of cloudiness patterns and involve different empirically defined thresholds of geophysical variables. As a result, the various detection methods typically yield very different results when applied to the same dataset. We develop a conceptually novel approach for the detection of MCs based on deep convolutional neural networks (DCNNs). As a first step, we demonstrate that a DCNN model is capable of performing binary classification of 500x500 km patches of satellite images with respect to the presence of MC patterns. The training dataset is based on a reference database of MCs manually tracked in the Southern Hemisphere from satellite mosaics. We use a subset of this database with MC diameters in the range of 200-400 km. This dataset is further used for testing several different DCNN setups: a DCNN built "from scratch", a DCNN based on VGG16 pre-trained weights using the transfer learning technique, and a DCNN based on VGG16 with fine tuning. Each of these networks is applied to both infrared (IR) and a combination of infrared and water vapor (IR+WV) satellite imagery. The best skill (97% in terms of binary classification accuracy) is achieved with a model that averages the estimates of an ensemble of different DCNNs.
The algorithm can be further extended to an automatic identification and tracking scheme and applied to other atmospheric phenomena characterized by a distinct signature in satellite imagery.
TECHNICAL NOTE | doi:10.20944/preprints202009.0678.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: multi-frame super resolution; wide activation super resolution; 3D convolutional neural network; deep learning
Online: 27 September 2020 (11:54:56 CEST)
The small satellite market continues to grow year after year, with a compound annual growth rate of 17% estimated for the period between 2020 and 2025. Low-cost satellites can send a vast number of images to be post-processed on the ground to improve quality and extract detailed information. In this domain lies the resolution enhancement task, where a low-resolution image is automatically converted to a higher resolution. Deep learning approaches to Super-Resolution (SR) have reached the state of the art on multiple benchmarks; however, most of them were studied in a single-frame fashion. With satellite imagery, multi-frame images can be obtained under different conditions, making it possible to add more information per image and improve the final analysis. In this context, we developed a model that recently topped the European Space Agency's Multi-frame Super Resolution (MFSR) competition and applied it to the PROBA-V dataset of multi-frame satellite images. The model is based on proven methods that worked on 2D images, adapted to 3D: the Wide Activation Super Resolution (WDSR) family. We show that a simple 3D CNN residual architecture with WDSR blocks and a frame-permutation technique as data augmentation can achieve better scores than more complex models. Moreover, the model requires few hardware resources, both for training and evaluation, so it can be run directly on a personal laptop.
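The frame-permutation augmentation mentioned above exploits the fact that, in multi-frame super-resolution, the ordering of the low-resolution frames of a scene is arbitrary, so shuffled stacks are valid extra training samples. A sketch under that assumption (the function name is ours, not from the paper):

```python
import numpy as np

def frame_permutations(frames, n_augment, seed=0):
    """Generate augmented multi-frame stacks by permuting frame order.

    frames: (T, H, W) stack of low-resolution frames of the same scene."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_augment):
        perm = rng.permutation(frames.shape[0])  # random reordering of the T frames
        out.append(frames[perm])
    return out

stack = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)  # 2 toy frames
augmented = frame_permutations(stack, n_augment=4)
print(len(augmented), augmented[0].shape)  # 4 (2, 3, 3)
```

Each augmented stack contains exactly the original frames, only reordered, so the super-resolved target stays unchanged.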
ARTICLE | doi:10.20944/preprints202009.0524.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: COVID-19; chest X-ray images; deep convolutional neural network; COV-MCNet; deep learning
Online: 23 September 2020 (03:31:30 CEST)
The COVID-19 pandemic has made the quick identification and screening of COVID-19 patients even more difficult for medical specialists. Therefore, a significant study is necessary on detecting COVID-19 cases using automated diagnosis methods, which can aid in controlling the spread of the virus. In this paper, we propose a Deep Convolutional Neural Network-based multi-classification approach (COV-MCNet) using eight different pre-trained architectures (VGG16, VGG19, ResNet50V2, DenseNet201, InceptionV3, MobileNet, InceptionResNetV2, and Xception), which are trained and tested on X-ray images of COVID-19, Normal, Viral Pneumonia, and Bacterial Pneumonia cases. The results for the 3-class task (Normal vs. COVID-19 vs. Viral Pneumonia) showed that the ResNet50V2 model provides the highest classification performance (accuracy: 95.83%, precision: 96.12%, recall: 96.11%, F1-score: 96.11%, specificity: 97.84%) compared to the rest of the models. The results for the 4-class task (Normal vs. COVID-19 vs. Viral Pneumonia vs. Bacterial Pneumonia) demonstrated that the pre-trained DenseNet201 model provides the highest classification performance (accuracy: 92.54%, precision: 93.05%, recall: 92.81%, F1-score: 92.83%, specificity: 97.47%). Notably, the ResNet50V2 (3-class) and DenseNet201 (4-class) models in the proposed COV-MCNet framework showed higher accuracy than the other six models. This indicates that the designed system can produce promising results for detecting COVID-19 cases as more data become available. The proposed multi-classification network (COV-MCNet) significantly speeds up the existing radiology-based method, which will be helpful to the medical community and clinical specialists for early diagnosis of COVID-19 cases during this pandemic.
ARTICLE | doi:10.20944/preprints201906.0270.v2
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Land cover mapping; Convolutional neural networks; UNET; Sentinel-2
Online: 9 August 2019 (11:54:37 CEST)
The Sentinel-2 satellite mission offers high resolution multispectral time series image data, enabling the production of detailed land cover maps globally. At this scale, the trade-off between processing time and result quality is a central design decision. Currently, this machine learning task is usually performed using pixelwise classification methods. The radical shift of the computer vision field away from hand-engineered image features and towards more automation through representation learning comes with many promises, including higher quality results and less engineering effort. In this paper we assess fully convolutional neural network architectures as replacements for a Random Forest classifier in an operational context for the production of high resolution land cover maps from Sentinel-2 time series at the country scale. Our contributions include a framework for working with Sentinel-2 L2A time series image data, an adaptation of the U-Net model for dealing with sparse annotation data while maintaining high resolution output, and an analysis of those results in the context of operational production of land cover maps.
ARTICLE | doi:10.20944/preprints201908.0068.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; convolutional neural networks (CNN); transfer learning; class activation mapping (CAM); building defects; structural-health monitoring
Online: 6 August 2019 (04:18:29 CEST)
Clients are increasingly looking for fast and effective means to quickly and frequently survey and communicate the condition of their buildings so that essential repairs and maintenance can be done proactively and in a timely manner before they become too dangerous and expensive. Traditional methods for this type of work commonly comprise engaging building surveyors to undertake a condition assessment, which involves a lengthy site inspection to produce a systematic recording of the physical condition of the building elements, including cost estimates of immediate and projected long-term costs of renewal, repair, and maintenance of the building. Current asset condition assessment procedures are extensively time-consuming, laborious, and expensive, and pose health and safety threats to surveyors, particularly at height and on roof levels that are difficult to access. We propose a method for automated detection and localisation of key building defects from images using deep learning and convolutional neural networks. The proposed model is based on a pre-trained VGG-16 classifier with Class Activation Mapping (CAM) for object localisation. The model has proven to be robust and able to accurately detect and localise mould growth, stains, and paint deterioration defects arising from dampness in buildings. The approach is being developed with the potential to scale up to support automated detection of defects and deterioration of buildings in real time using mobile devices and drones.
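Class Activation Mapping, as used above, localizes a defect by taking a weighted sum of the final convolutional feature maps, with the weights drawn from the classifier row of the predicted class. A minimal sketch of the standard CAM computation, not the authors' code; the feature maps, weights, and class names are random stand-ins:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """CAM: weighted sum of the last conv layer's feature maps.

    feature_maps: (K, H, W) activations from the final conv layer.
    fc_weights:   (num_classes, K) weights of the classifier layer
                  that follows global average pooling."""
    weights = fc_weights[class_idx]                    # (K,) weights for this class
    cam = np.tensordot(weights, feature_maps, axes=1)  # (H, W) evidence map
    cam = np.maximum(cam, 0.0)                         # keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                               # normalize to [0, 1]
    return cam

rng = np.random.default_rng(1)
maps = rng.random((16, 7, 7))      # stand-in for final conv activations
w = rng.standard_normal((3, 16))   # 3 hypothetical classes: mould, stain, paint
heatmap = class_activation_map(maps, w, class_idx=0)
print(heatmap.shape)  # (7, 7)
```

Upsampling the resulting heatmap to the input size and overlaying it on the photo gives the defect localization described in the abstract.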
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: microcombs; optical neural networks; neuromorphic computing; artificial intelligence; Kerr microcombs; convolutional neural network
Online: 16 November 2020 (13:30:14 CET)
Convolutional neural networks (CNNs), inspired by biological visual cortex systems, are a powerful category of artificial neural networks that can extract the hierarchical features of raw data to greatly reduce network parametric complexity and enhance predictive accuracy. They are of significant interest for machine learning tasks such as computer vision, speech recognition, playing board games and medical diagnosis [1-7]. Optical neural networks offer the promise of dramatically accelerating computing speed to overcome the inherent bandwidth bottleneck of electronics. Here, we demonstrate a universal optical vector convolutional accelerator operating beyond 10 Tera-FLOPS (floating point operations per second), generating convolutions of images of 250,000 pixels with 8-bit resolution for 10 kernels simultaneously, enough for facial image recognition. We then use the same hardware to sequentially form a deep optical CNN with ten output neurons, achieving successful recognition of the full 10 digits with 900-pixel handwritten digit images at 88% accuracy. Our results are based on simultaneously interleaving temporal, wavelength and spatial dimensions enabled by an integrated microcomb source. This approach is scalable and trainable for much more complex networks for demanding applications such as unmanned vehicles and real-time video recognition.
ARTICLE | doi:10.20944/preprints202011.0527.v1
Subject: Engineering, Aerospace Engineering Keywords: Aircraft Maintenance Inspection; Anomaly Detection; Defect Inspection; Convolutional Neural Networks; Mask R-CNN; Generative Adversarial Networks; Image Augmentation
Online: 20 November 2020 (09:16:13 CET)
Convolutional Neural Networks combined with autonomous drones are increasingly seen as enablers for partially automating the aircraft maintenance visual inspection process. Such an innovative concept can have a significant impact on aircraft operations. By supporting aircraft maintenance engineers in detecting and classifying a wide range of defects, the time spent on inspection can be significantly reduced. Examples of defects that can be automatically detected include aircraft dents, paint defects, cracks and holes, and lightning strike damage. Additionally, this concept could also increase the accuracy of damage detection and reduce the number of aircraft inspection incidents related to human factors like fatigue and time pressure. In our previous work, we applied a recent Convolutional Neural Network architecture known as Mask R-CNN to detect aircraft dents. Mask R-CNN was chosen because it enables the detection of multiple objects in an image while simultaneously generating a segmentation mask for each instance. The previously obtained F1 and F2 scores were 62.67% and 59.35%, respectively. This paper extends the previous work by applying different techniques to improve and evaluate prediction performance experimentally. The approaches used include (1) balancing the original dataset by adding images without dents; (2) increasing data homogeneity by focusing on wing images only; (3) exploring the potential of three augmentation techniques, namely flipping, rotating, and blurring, in improving model performance; and (4) using a pre-classifier in combination with Mask R-CNN. The results show that a hybrid approach combining Mask R-CNN and augmentation techniques leads to improved performance, with an F1 score of 67.50% and an F2 score of 66.37%.
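The three augmentation techniques evaluated above (flipping, rotating, blurring) are simple array operations on the training images. A NumPy sketch of how such an augmented set might be generated; the blur kernel size and the particular set of transforms are arbitrary choices for illustration:

```python
import numpy as np

def box_blur(image, k=3):
    """Naive k x k mean blur with edge padding."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def augment(image):
    """Return the original image plus flipped, rotated, and blurred copies."""
    return [
        image,
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image),       # 90-degree rotation
        box_blur(image),       # mean blur
    ]

img = np.arange(16, dtype=float).reshape(4, 4)
augmented = augment(img)
print(len(augmented))  # 5
```

For detection tasks like the one above, the same geometric transforms must of course also be applied to the segmentation masks.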
REVIEW | doi:10.20944/preprints202110.0135.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: convolutional neural networks (CNNs); deep learning; computer-aided diagnosis; colorectal polyps; colorectal cancer; colonoscopy
Online: 8 October 2021 (10:50:53 CEST)
As a relatively high percentage of adenoma polyps are missed, a computer-aided diagnosis (CAD) tool based on deep learning can aid the endoscopist in diagnosing colorectal polyps or colorectal cancer, decreasing the polyp miss rate and preventing colorectal cancer mortality. The Convolutional Neural Network (CNN) is a deep learning method that has achieved better results over the last decade in detecting and segmenting specific objects in images than conventional models such as regression, support vector machines, or artificial neural networks. In recent years, in medical imaging studies, CNN models have achieved promising results in detecting masses and lesions in various body organs, including colorectal polyps. In this review, the structure and architecture of CNN models and how colonoscopy images are processed as input and converted to output are explained in detail. In most primary studies in the colorectal polyp detection and classification field, the CNN model has been regarded as a black box, since the calculations performed at different layers during training have not been clarified precisely. Furthermore, I discuss the differences between CNNs and conventional models, examine how to train a CNN model for diagnosing colorectal polyps or cancer, and evaluate model performance after the training process.
ARTICLE | doi:10.20944/preprints202105.0429.v1
Subject: Medicine And Pharmacology, Other Keywords: Acute lymphoblastic leukemia; Deep convolutional neural networks; Ensemble image classifiers; C-NMC-2019 dataset.
Online: 19 May 2021 (07:42:23 CEST)
Although automated Acute Lymphoblastic Leukemia (ALL) detection is essential, it is challenging due to the morphological similarity between malignant and normal cells. The traditional ALL classification strategy is arduous and time-consuming, often suffers from inter-observer variation, and necessitates experienced pathologists. This article automates the ALL detection task, employing deep Convolutional Neural Networks (CNNs). We explore a weighted ensemble of deep CNNs to obtain a better ALL cell classifier. The weights are estimated from the ensemble candidates' corresponding metrics, such as accuracy, F1-score, AUC, and kappa values. Various data augmentation and pre-processing steps are incorporated to achieve better generalization of the network. We train and evaluate the proposed model on the publicly available C-NMC-2019 ALL dataset. Our proposed weighted ensemble model achieved a weighted F1-score of 88.6%, a balanced accuracy of 86.2%, and an AUC of 0.941 on the preliminary test set. Qualitative results displaying the gradient class activation maps confirm that the introduced model has a concentrated learned region, whereas the ensemble candidate models, such as Xception, VGG-16, DenseNet-121, MobileNet, and InceptionResNet-V2, individually produce coarse and scattered learned areas in most example cases. Since the proposed ensemble yields better results for the target task, it can be tried in other domains of medical diagnostic applications.
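The weighted ensemble described above fuses the candidate CNNs' class probabilities, with each model's weight derived from its validation metrics. A schematic sketch; normalizing the metric scores into weights by their sum is one plausible reading, not necessarily the paper's exact formula:

```python
import numpy as np

def weighted_ensemble(prob_list, metric_scores):
    """Fuse per-model class probabilities with metric-derived weights.

    prob_list:     list of (N, num_classes) softmax outputs, one per model.
    metric_scores: one validation score per model (e.g. F1 or kappa)."""
    scores = np.asarray(metric_scores, dtype=float)
    weights = scores / scores.sum()                 # normalize weights to sum to 1
    stacked = np.stack(prob_list)                   # (M, N, num_classes)
    fused = np.tensordot(weights, stacked, axes=1)  # weighted average -> (N, num_classes)
    return fused.argmax(axis=1), fused

# Two toy models disagreeing on the second sample; the better-scoring model wins.
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.8, 0.2], [0.6, 0.4]])
labels, fused = weighted_ensemble([p1, p2], metric_scores=[0.9, 0.6])
print(labels)  # [0 1]
```

Because the weights sum to one, the fused output remains a valid probability distribution per sample.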
ARTICLE | doi:10.20944/preprints202304.0645.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Lip Reading; Multiclass Classification; Turkish Lip Reading Dataset; Deep Learning; Convolutional Neural Networks; Lip Detection
Online: 20 April 2023 (10:07:48 CEST)
Automated lip reading is a research problem that has developed considerably in recent years. In some cases, lip reading is evaluated both visually and audibly. Lip reading models can be used to detect specific words in footage from security cameras, but audio-visual databases cannot be used in this situation, since the sound of the pronounced word cannot be obtained in all cases. In this study, we collected a new, image-only Turkish dataset. The dataset was produced from YouTube videos, an uncontrolled environment. For this reason, the images have challenging parameters in terms of environmental factors such as light, angle, and color, as well as the personal characteristics of the face. Despite differing facial features such as moustaches, beards, and make-up, the visual speech recognition problem was addressed on 10 classes, including single words and two-word phrases, using Convolutional Neural Networks (CNN) without any intervention on the data. The proposed study, using only visual data, obtained an automated visual speech recognition model with a deep learning approach. In addition, since this study uses only visual data, the computational cost and resource usage are lower than in multi-modal studies. It is also the first known study to address the lip reading problem with a deep learning algorithm using a new dataset belonging to the Ural-Altaic languages.
ARTICLE | doi:10.20944/preprints202002.0334.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: deep learning; drone imagery; hyperspectral image classiﬁcation; tree species classification; 3D convolutional neural networks
Online: 24 February 2020 (01:13:13 CET)
Interest in drone solutions for forestry applications is growing. Using drones, datasets can be captured flexibly and at high spatial and temporal resolutions when needed. Fundamental tasks in forestry applications include the detection of individual trees, tree species classification, biomass estimation, etc. Deep Neural Networks (DNN) have shown superior results compared with conventional machine learning methods such as the Multi-Layer Perceptron (MLP) in cases of large input data. The objective of this research was to investigate 3D convolutional neural networks (3D-CNN) for classifying three major tree species in a boreal forest: pine, spruce, and birch. The proposed 3D-CNN models were employed to classify tree species at a test site in Finland. The classifiers were trained with a dataset of 3039 manually labelled trees, and the accuracies were then assessed using independent datasets of 803 records. To find the most efficient combination of features, we compared the performance of 3D-CNN models trained with hyperspectral (HS) channels, RGB channels, and the canopy height model (CHM), separately and combined. We demonstrate that the proposed 3D-CNN model with RGB and HS layers produces the highest classification accuracy. The producer accuracies of the best 3D-CNN classifier on the test dataset were 99.6%, 94.8%, and 97.4% for pines, spruces, and birches, respectively. The best 3D-CNN classifier produced ~5% better classification accuracy than the MLP with all layers. Our results suggest that the proposed method provides excellent classification results with acceptable performance metrics for HS datasets. Pine was detectable in most layers, spruce was most detectable in the RGB data, and birch was most detectable in the HS layers. Furthermore, the RGB datasets provide acceptable results for many low-accuracy applications.
ARTICLE | doi:10.20944/preprints201809.0481.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Brain-Computer Interfaces, spectrogram-based convolutional neural network model(pCNN), Deep Learning, EEG, LSTM, RCNN
Online: 25 September 2018 (08:58:34 CEST)
Non-invasive, electroencephalography (EEG)-based brain-computer interfaces (BCIs) for motor imagery translate the subject's motor intention into control signals by classifying the EEG patterns caused by different imagination tasks, e.g. hand movements. This type of BCI has been widely studied and used as an alternative mode of communication and environmental control for disabled patients, such as those suffering from a brainstem stroke or a spinal cord injury (SCI). Notwithstanding the success of traditional machine learning methods in classifying EEG signals, these methods still rely on hand-crafted features. The extraction of such features is a difficult task due to the high non-stationarity of EEG signals, which is a major cause of the stagnating progress in classification performance. Remarkable advances in deep learning methods allow end-to-end learning without any feature engineering, which could benefit BCI motor imagery applications. We developed three deep learning models: 1) a long short-term memory (LSTM) network; 2) the proposed spectrogram-based convolutional neural network model (pCNN); and 3) a recurrent convolutional neural network (RCNN), for decoding motor imagery movements directly from raw EEG signals without (manual) feature engineering. Results were evaluated on our own publicly available EEG data collected from 20 subjects and on an existing dataset known as the 2b EEG dataset from the "BCI Competition IV". Overall, better classification performance was achieved with the deep learning models than with state-of-the-art machine learning techniques, which could chart a route ahead for developing new robust techniques for EEG signal decoding. We underpin this point by demonstrating the successful real-time control of a robotic arm using our CNN-based BCI.
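A spectrogram-based CNN like the pCNN above first turns each raw EEG channel into an image-like time-frequency representation. A minimal NumPy sketch of that preprocessing step (the window length, hop size, and toy signal are our illustrative choices, not the paper's settings):

```python
import numpy as np

def spectrogram(signal, win_len=64, hop=32):
    """Magnitude spectrogram of a 1-D signal via a windowed FFT (STFT).

    The resulting (freq_bins, n_frames) array is the kind of image-like
    input a spectrogram-based CNN would consume."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + win_len] * window  # windowed segment
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T    # (freq_bins, n_frames)

# Toy EEG-like signal: a 10 Hz sine sampled at 250 Hz for 2 seconds.
t = np.arange(0, 2, 1 / 250)
eeg = np.sin(2 * np.pi * 10 * t)
spec = spectrogram(eeg)
print(spec.shape)  # (33, 14)
```

Stacking one such spectrogram per EEG channel yields a multi-channel "image" that a standard 2D CNN can classify.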
ARTICLE | doi:10.20944/preprints202304.0320.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Ovarian Tumours; UNet; Convolutional Neural Networks; VGG 16; DenseNet; ResNet; Dice score; Jaccard score
Online: 13 April 2023 (10:50:53 CEST)
The difficulty of detecting tumors in earlier stages is a major cause of patient mortality, despite advancements in treatment and research regarding ovarian cancer. Deep learning algorithms are applied as a diagnostic tool to CT scan images of the ovarian region. The images go through a series of pre-processing techniques, and the tumor is then segmented using the UNet model. Instances are then classified into two categories, benign and malignant tumors. Classification is performed using deep learning models such as CNN, ResNet, DenseNet, Inception-ResNet, VGG16, and Xception, along with machine learning models such as Random Forest, Gradient Boosting, AdaBoost, and XGBoost. DenseNet 121 emerges as the best model on this dataset, obtaining an accuracy of 95.7%, even after optimization is applied to the machine learning models. The current work demonstrates the comparison of multiple CNN architectures among themselves and with common machine learning algorithms, with and without optimization techniques applied.
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: automatic detection; chest X-ray; convolutional neural network; COVID-19; deep learning; feature extraction; image classification; pneumonia
Online: 27 April 2021 (14:08:53 CEST)
One of the critical tools for early detection and subsequent evaluation of the incidence of lung diseases is chest radiography. At a time when the speed and reliability of results, especially for COVID-19 positive patients, is important, the development of applications that would facilitate the work of untrained staff involved in the evaluation is also crucial. Our model takes the form of a simple and intuitive application, into which you only need to upload X-rays: tens or hundreds at once. In just a few seconds, the physician will determine the patient's diagnosis, including the percentage accuracy of the estimate. While the original idea was a mere binary classifier that could tell if a patient was suffering from pneumonia or not, in this paper we present a model that distinguishes between a bacterial disease, a viral infection, or a finding caused by COVID-19. The aim of this research is to demonstrate whether pneumonia can be detected or even spatially localized using a uniform, supervised classification.
ARTICLE | doi:10.20944/preprints201808.0112.v2
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: remote sensing; image classification; fully connected conditional random fields (FC-CRF); convolutional neural networks (CNN)
Online: 28 November 2018 (07:11:42 CET)
The interpretation of land use and land cover (LULC) is an important issue in high-resolution remote sensing (RS) image processing and land resource management. Fully training a new or existing convolutional neural network (CNN) architecture for LULC classification requires a large number of remote sensing images, so fine-tuning a pre-trained CNN for LULC detection is preferred. To improve the classification accuracy for high-resolution remote sensing images, an additional feature descriptor and a classifier for post-processing are necessary. A fully connected conditional random field (FC-CRF), combining the fine-tuned CNN layers, spectral features, and fully connected pairwise potentials, is proposed for classifying high-resolution remote sensing images. First, an existing CNN model is adopted and its parameters are fine-tuned on the training datasets; the probability of each image pixel belonging to each class is then calculated. Second, the spectral features and the digital surface model (DSM) are combined with a support vector machine (SVM) classifier to determine the probability of each pixel belonging to each LULC class. Combined with the probabilities produced by the fine-tuned CNN, new feature descriptors are built. Finally, the FC-CRF produces the classification results, where the unary potentials come from the new feature descriptors and the SVM classifier, and the pairwise potentials come from the three-band RS imagery and the DSM. Experimental results show that the proposed classification scheme performs well, with a total accuracy of about 85%.
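The per-pixel class probabilities that the fine-tuned CNN produces are conventionally obtained by applying a softmax to the per-class scores; a minimal, self-contained sketch of that standard step (illustrative only, not the authors' code):

```python
import math

def softmax(logits):
    """Convert raw per-class scores (logits) into probabilities summing to 1.

    Subtracting the max logit first keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Per-pixel logits for, say, four LULC classes (values are illustrative):
probs = softmax([2.0, 1.0, 0.1, -1.0])
```

Each pixel's probability vector can then be fused with the SVM output to form the unary potentials of the FC-CRF.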
ARTICLE | doi:10.20944/preprints201706.0012.v3
Subject: Engineering, Control And Systems Engineering Keywords: deep convolutional neural networks; road segmentation; conditional random fields; landscape metrics; satellite images; aerial images; THEOS
Online: 5 June 2017 (06:39:54 CEST)
Object segmentation on remotely sensed images, both aerial (very high resolution, VHR) and satellite (high resolution, HR), has been applied in many domains, especially road extraction, where the segmented objects serve as a mandatory layer in geospatial databases. Several attempts to apply deep convolutional neural networks (DCNN) to extract roads from remote sensing images have been made; however, accuracy is still limited. In this paper, we present an enhanced DCNN framework specifically tailored to road extraction from remote sensing images by applying landscape metrics (LMs) and conditional random fields (CRFs). To improve the DCNN, a modern activation function, the exponential linear unit (ELU), is employed in our network, resulting in a higher number of, and yet more accurate, extracted roads. To further reduce falsely classified road objects, a solution based on the adoption of LMs is proposed. Finally, to sharpen the extracted roads, a CRF method is added to our framework. The experiments were conducted on the Massachusetts road aerial imagery and THEOS satellite imagery data sets. The results show that our proposed framework outperformed SegNet, the state-of-the-art object segmentation technique for remote sensing imagery, in most cases in terms of precision, recall, and F1.
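The ELU activation mentioned above has a simple closed form: identity for positive inputs and a smooth exponential saturation toward -α for negative ones, unlike ReLU's hard zero. A minimal sketch (α = 1 is the common default; this is not the authors' implementation):

```python
import math

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x for x > 0, and a smooth curve
    alpha * (exp(x) - 1) that saturates toward -alpha for x < 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

The non-zero gradient on negative inputs is what helps push mean activations toward zero and speed up convergence.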
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Car Detection; Convolutional Neural Networks; Deep Learning; Faster R-CNN; Unmanned Aerial Vehicles; You Only Look Once (Yolo).
Online: 12 March 2020 (08:57:09 CET)
In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles in aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as the UAV's altitude, camera resolution, and object size. The objective of this work is to conduct a robust comparison between these two cutting-edge algorithms. Using a variety of metrics, we show that YOLOv3 yields better performance in most configurations, except that it exhibits lower recall and less confident detections when object sizes and scales in the testing dataset differ markedly from those in the training dataset.
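Detector comparisons such as this one conventionally score a predicted box against ground truth with Intersection-over-Union (IoU); a self-contained sketch of the standard computation (illustrative, not tied to either detector's code):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to 0 when boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

A prediction is usually counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is how precision and recall are derived.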
ARTICLE | doi:10.20944/preprints201910.0195.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: car detection; convolutional neural networks; deep learning; you only look once (yolo); faster r-cnn; unmanned aerial vehicles
Online: 17 October 2019 (12:29:29 CEST)
In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles in aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as the UAV’s altitude, camera resolution, and object size. The objective of this work is to conduct a robust comparison between these two cutting-edge algorithms. Using a variety of metrics, we show that neither of the two algorithms outperforms the other in all cases.
ARTICLE | doi:10.20944/preprints202111.0078.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: GRaVN; machine learning; convolutional neural networks; CNN; raman spectroscopy; analogue missions; planetary science; random undersampling; random oversampling; CanMoon
Online: 3 November 2021 (09:24:38 CET)
During planetary exploration mission operations, one of the key responsibilities of the instrument teams is to determine data viability for subsequent analysis. During the 2019 CanMoon Lunar Sample Return Analogue Mission, the Lead Raman Specialist manually examined each spectrum to provide quality assurance/validation. This non-trivial process requires years of experience to complete accurately. Given the proven efficacy of Convolutional Neural Networks (CNNs) in classification tasks, and the increased use of automation and control loops on planetary space platforms for navigation and science targeting, an opportunity presents itself to approach this validation problem with CNNs. We present the Generalised Raman Validation Network (GRaVN), a neural network focused specifically on extracting the generalised structure of Raman spectra for quality assurance/validation. This work demonstrates the viability of utilising a CNN for validation activities in Raman spectroscopy. Utilising only two hidden layers, a configuration was developed that provided a good level of accuracy on a manually curated dataset. This indicates that such a system could be useful as part of an autonomous control loop during planetary exploration activities.
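The random undersampling listed in the keywords is commonly implemented by discarding majority-class samples until every class matches the smallest class; a minimal sketch (the function name and fixed seed are illustrative choices, not the authors'):

```python
import random

def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples so every class ends up
    with as many samples as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for y, group in by_class.items():
        for s in rng.sample(group, n_min):  # keep a random subset of size n_min
            balanced.append((s, y))
    return balanced
```

Random oversampling, also listed in the keywords, is the mirror-image operation: minority-class samples are duplicated (sampled with replacement) up to the majority-class count.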
ARTICLE | doi:10.20944/preprints202305.0796.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: keyword light-weight; insulator and defect detection; YOLOv5; Ghost module; convolutional block attention module; unmanned aerial vehicles
Online: 11 May 2023 (05:21:17 CEST)
Insulator defect detection is of great significance for maintaining the stability of power transmission lines. YOLOv5, a state-of-the-art object detection network, has been widely used for insulator and defect detection. However, YOLOv5 has limitations, such as a poor detection rate and a high computational load, when detecting small insulator defects. To solve these problems, we propose a light-weight network for insulator and defect detection. In this network, we introduce the Ghost module into the YOLOv5 backbone and neck to reduce the number of parameters and the model size, enhancing suitability for unmanned aerial vehicles (UAVs). Besides, we add small-object detection anchors and layers for small defect detection. In addition, we optimize the YOLOv5 backbone by applying the convolutional block attention module (CBAM) to focus on information critical to insulator and defect detection and to suppress uncritical information. Experimental results show that the mAP@0.5 and mAP@0.5:0.95 of our model reach 99.4% and 91.7%, while the parameter count and model weight are reduced to 3,807,372 and 8.79 MB, so the model can easily be deployed to embedded devices such as UAVs. Detection takes 10.9 ms per image, which meets the real-time detection requirement.
ARTICLE | doi:10.20944/preprints202108.0272.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Remaining Useful Life; Deep Neural Network; Convolutional Neural Network; Genetic Optimization; Neural Network Optimization; Support Vector Regression; Depth Maps; Normal Maps; 3D Point Clouds.
Online: 12 August 2021 (10:40:23 CEST)
In the current industrial landscape, increasingly pervaded by technological innovations, the adoption of optimized strategies for asset management is becoming a critical success factor. Among the various strategies available, the “Prognostics and Health Management” strategy is able to support maintenance management decisions more accurately, through continuous monitoring of equipment health and “Remaining Useful Life” forecasting. In the present study, Convolutional Neural Network-based Deep Neural Network techniques are investigated for the Remaining Useful Life prediction of a punch tool, whose degradation is caused by working surface deformations during the machining process. Surface deformation is determined using a 3D scanning sensor capable of returning point clouds with micrometric accuracy during the operation of the punching machine, avoiding both downtime and human intervention. The 3D point clouds thus obtained are transformed into two-dimensional image-type maps, i.e., maps of depths and normal vectors, to fully exploit the potential of convolutional neural networks for extracting features. Such maps are then processed by comparing 15 genetically optimized architectures with the transfer learning of 19 pre-trained models, using a classic machine learning approach, Support Vector Regression, as a benchmark. The achieved results clearly show that, in this specific case, the optimized architectures provide performance (MAPE = 0.058) far superior to that of transfer learning, which instead performs only moderately better than Support Vector Regression (MAPE = 0.416 vs. 0.857).
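The MAPE figures quoted above follow the standard definition, the mean of absolute errors relative to the true values; a short sketch for reference:

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, as a fraction (0.058 = 5.8%).

    Assumes no true value is zero, as in Remaining Useful Life data."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Lower is better, which is why MAPE = 0.058 for the genetically optimized architectures dominates both 0.416 and 0.857.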
ARTICLE | doi:10.20944/preprints201711.0053.v3
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: ultrasound; b-mode; skeletal muscle; fascicle orientation; pennation angle; fiber orientation; fiber tract; fascicle tract; convolutional neural network; deconvolutional neural network
Online: 19 January 2018 (14:05:16 CET)
Direct measurement of strain within muscle is important for understanding muscle function in health and disease. Current technology (kinematics, dynamometry, electromyography) provides limited ability to measure strain within muscle. Regional fiber orientation and length are related to active/passive strain within muscle. Currently, ultrasound imaging provides the only non-invasive means of observing regional fiber orientation within muscle during dynamic tasks. Previous attempts to automatically estimate fiber orientation from ultrasound are inadequate: they often require manual region selection and feature engineering, provide low-resolution estimates (one angle per muscle), and rarely attempt deep muscles. Here, we propose deconvolutional neural networks (DCNN) for estimating fiber orientation at the pixel level. Dynamic ultrasound image sequences of the calf muscles were acquired (25 Hz) from 8 healthy volunteers (4 male, ages: 25–36, median 30). A combination of expert annotation and interpolation/extrapolation provided labels of regional fiber orientation for each image. We then trained DCNNs both with and without dropout using leave-one-out cross-validation. Our results demonstrated robust estimation of regional fiber orientation with approximately 3° error, an improvement on previous methods. The methods presented here provide new potential to study muscle in disease and health.
ARTICLE | doi:10.20944/preprints201902.0203.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: Northern Corn Leaf Blight (Exserohilum); Gray Leaf Spot (Cerospora); Common Rust (Puccinia sorghi); Convolutional Neural Networks (CNN); Neuroph Studio
Online: 21 February 2019 (13:04:05 CET)
Plant leaf diseases can affect a plant’s leaves to the extent that the plant collapses and dies completely. These diseases may drastically reduce the supply of vegetables and fruits to the market and depress the agricultural economy. In the literature, different laboratory methods of plant leaf disease detection have been used; these methods were time-consuming and could not cover large areas. This study applies the principles of Convolutional Neural Networks (CNN) to model a network for image recognition and classification of these diseases. Neuroph was used to train a CNN that recognized and classified images of maize leaf diseases collected with a smartphone camera. The novel way of training and the methodology used expedite a quick and easy implementation of the system in practice. The developed model was able to distinguish 3 different types of maize leaf disease from healthy leaves. The Northern Corn Leaf Blight (Exserohilum), Common Rust (Puccinia sorghi), and Gray Leaf Spot (Cercospora) diseases were chosen for this study as they affect most parts of Southern Africa’s maize fields.
ARTICLE | doi:10.20944/preprints202108.0392.v1
Subject: Engineering, Control And Systems Engineering Keywords: image quality assessment; real-time image processing; image functions adaptation; convolutional neural network; face alignment; deep neural network; random forest
Online: 18 August 2021 (17:06:02 CEST)
In recent years, data providers have been generating and streaming large numbers of images. In particular, processing images that contain faces has received great attention due to its numerous applications, such as entertainment and social media apps. The enormous number of images shared on these applications presents serious challenges and requires massive computing resources to ensure efficient data processing. However, in real application scenarios, images are subject to a wide range of distortions during processing, transmission, sharing, or a combination of many factors. There is therefore a need to guarantee acceptable delivery content, even though some distorted images do not have access to their original version. In this paper, we present a framework developed to estimate image quality while processing a large number of images in real time. Our quality evaluation is measured by integrating a deep network with random forests. In addition, a face alignment metric is used to assess the facial features. Experiments were conducted on two artificially distorted benchmark datasets, LIVE and TID2013. We show that our proposed approach outperforms state-of-the-art methods, with a Pearson Correlation Coefficient (PCC) and a Spearman Rank Order Correlation Coefficient (SROCC) against subjective human scores of 0.942 and 0.931, while reducing the processing time from 4.8 ms to 1.8 ms.
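The PCC reported above is the standard Pearson correlation between predicted quality scores and subjective human scores; a self-contained sketch of the computation (illustrative, not the paper's evaluation code):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

SROCC is the same statistic applied to the ranks of the scores rather than the raw values, which makes it insensitive to any monotonic rescaling of the predictions.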
ARTICLE | doi:10.20944/preprints202003.0313.v3
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: object detection; faster region-based convolutional neural network (FRCNN); single-shot multibox detector (SSD); super-resolution; remote sensing imagery; edge enhancement; satellites
Online: 29 April 2020 (13:33:56 CEST)
The detection performance of small objects in remote sensing images has not been satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) showed remarkable image enhancement performance, but reconstructed images usually miss high-frequency edge information; object detection performance therefore degrades for small objects on recovered noisy and low-resolution remote sensing images. Inspired by the success of edge-enhanced GAN (EEGAN) and ESRGAN, we applied a new edge-enhanced super-resolution GAN (EESRGAN) to improve the quality of remote sensing images and used different detector networks in an end-to-end manner, where the detector loss was backpropagated into the EESRGAN to improve detection performance. We proposed an architecture with three components: the ESRGAN, an edge-enhancement network (EEN), and a detection network. We used residual-in-residual dense blocks (RRDB) for both the ESRGAN and the EEN; for the detector network, we used a faster region-based convolutional network (FRCNN, a two-stage detector) and a single-shot multibox detector (SSD, a one-stage detector). Extensive experiments on a public dataset (cars overhead with context) and a self-assembled satellite dataset (oil and gas storage tanks) showed the superior performance of our method compared to standalone state-of-the-art object detectors.
ARTICLE | doi:10.20944/preprints202006.0031.v3
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep learning; Convolutional Neural Network; Coronavirus; COVID-19; radiology; CT scan; Medical image analysis; Automatic medical diagnosis; lung CT scan dataset
Online: 5 September 2020 (03:36:20 CEST)
COVID-19 is a severe global problem, and AI can play a significant role in preventing losses by monitoring and detecting infected persons at an early stage. This paper proposes a fast and accurate fully automated method to detect COVID-19 from a patient's CT scan images. We introduce a new dataset that contains 48,260 CT scan images from 282 normal persons and 15,589 images from 95 patients with COVID-19 infections. In the first stage, the system runs our proposed image processing algorithm to discard CT images in which the inside of the lung is not properly visible. This step reduces processing time and false detections. In the next stage, we introduce a novel method for increasing the classification accuracy of convolutional networks. We implemented it using the ResNet50V2 network and a modified feature pyramid network, alongside our designed architecture, to classify the selected CT images as COVID-19 or normal with higher accuracy than other models. After these two phases, the system determines the patient's condition using a selected threshold. We evaluated our system in two different ways. In the single-image classification stage, our model achieved 98.49% accuracy on more than 7,996 test images. In the patient identification phase, the system rapidly and correctly identified 234 of the 245 patients. We also inspected the classified images with the Grad-CAM algorithm to indicate the areas of infection and to evaluate the correctness of our model's classifications.
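The patient-level decision described above, aggregating per-slice classifications with a selected threshold, can be sketched as follows; the aggregation rule and the threshold values here are illustrative assumptions, since the abstract does not spell them out:

```python
def patient_is_positive(slice_probs, prob_threshold=0.5, min_fraction=0.1):
    """Aggregate per-slice COVID-19 probabilities into one patient-level call:
    positive if at least `min_fraction` of slices exceed `prob_threshold`.

    Both thresholds are hypothetical values, not the paper's."""
    flagged = sum(1 for p in slice_probs if p > prob_threshold)
    return flagged / len(slice_probs) >= min_fraction
```

Requiring a minimum fraction of flagged slices, rather than a single positive slice, makes the patient-level call robust to occasional per-slice false positives.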
ARTICLE | doi:10.20944/preprints201808.0034.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: android; malware; convolutional neural network
Online: 2 August 2018 (06:12:48 CEST)
The Android platform now holds roughly eighty percent of the smartphone market, which makes it attackers' primary target. With a growing amount of private data on smartphones and weak security defences, attackers can launch attacks on users' smartphones in multiple ways (e.g., using different coding styles to confuse malware-detection software). Existing Android malware detection methods use multiple features, such as security-sensitive API calls, system calls, control-flow structure, and data information flow, and then apply machine learning to decide whether an app is malware. Each feature captures a particular property of an app and has its limitations: it may suit some specific attacks but not others. Most current detection methods use only one of these features, and they mostly analyse the code directly; however, in the face of code obfuscation and zero-day attacks, such feature extraction can misjudge an app. It is therefore necessary to design an effective analysis technique to prevent malware. In this paper, we use the importance of words extracted from the APK: because of code obfuscation, some malware authors merely rename variables, which general static analysis cannot judge correctly. We pass these importance values through our proposed method to generate an image, and finally use a convolutional neural network to decide whether the APK file is malware.
ARTICLE | doi:10.20944/preprints201807.0119.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional Neural Network; Single Shot Detector; Regional Convolutional Neural Network; Machine Learning; Visualization-Localization
Online: 6 July 2018 (14:38:52 CEST)
The emerging use of visualization techniques in pathology and microbiology has been accelerated by machine learning (ML) approaches to image preprocessing, classification, and feature extraction on an increasingly complex series of datasets. Modern Convolutional Neural Network (CNN) architectures have developed into an umbrella of vast image reinforcement and recognition methods, including combined classification-localization of single- and multi-object featured images. As a subtype of neural network, CNNs create a rapid order of complexity by initially detecting borderlines, edges, and colours in images for dataset construction, eventually becoming capable of mapping intricate objects and conformities. This paper investigates the disparities between TensorFlow object detection APIs, specifically the Single Shot Detector (SSD) MobileNet V1 and the Faster R-CNN Inception V2 models, to sample the trade-offs between accuracy/precision and real-time visualization capability. The setting of rapid ML medical image analysis is theoretically framed in regions with limited access to pathology and disease-prevention departments (e.g., third-world and impoverished countries). Dark-field microscopy datasets of an initial 62 XML/JPG-annotated training files were processed under Malaria and Syphilis classes. Model training was halted as soon as loss values regularized and converged.
ARTICLE | doi:10.20944/preprints202305.2163.v1
Subject: Engineering, Bioengineering Keywords: chickpea; convolutional neural network; transfer learning; classification
Online: 31 May 2023 (03:32:49 CEST)
Chickpea is one of the most widely consumed pulses globally because of its high protein content. The morphological features of chickpea seed, such as colour and texture, are observable and play a major role in classifying different chickpea varieties. This process is often carried out by human experts and is time-consuming, inaccurate, and expensive. The objective of the study was to design an automated chickpea classifier using an RGB colour-image-based model that considers the morphological features of the chickpea seed. As part of the data acquisition process, five hundred and fifty images were collected per variety for four varieties of chickpea (CDC-Alma, CDC-Consul, CDC-Cory, and CDC-Orion) using an industrial RGB camera and a mobile phone camera. Three CNN-based models, NasNet-A (mobile), MobileNetV3 (small), and EfficientNetB0, were evaluated using a transfer learning-based approach. The classification accuracy was 97%, 99%, and 98% for the NasNet-A (mobile), MobileNetV3 (small), and EfficientNetB0 models, respectively. The MobileNetV3 model was chosen for further deployment on Android mobile and Raspberry Pi 4 devices based on its higher accuracy and light-weight architecture. The classification accuracy for the four chickpea varieties was 100% when the MobileNetV3 model was deployed on both the Android mobile and Raspberry Pi 4 platforms.
ARTICLE | doi:10.20944/preprints202211.0226.v1
Subject: Computer Science And Mathematics, Analysis Keywords: deep learning; convolutional neural networks; remote sensing
Online: 14 November 2022 (01:20:07 CET)
Deep Learning is an extremely important research topic in Earth Observation. Current use cases range from semantic image segmentation and object detection to more common computer vision problems such as object identification. Earth Observation is an excellent source of problems and data for Machine Learning in general and Deep Learning in particular, and it can be argued that both fields of research will benefit greatly from this recent trend. In this paper we take several state-of-the-art Deep Learning network topologies and provide a detailed analysis of their performance on semantic image segmentation for building footprint detection. The dataset comprises high-resolution images depicting urban scenes. We focus on single-model performance on simple RGB images. Several methods are commonly applied to increase prediction accuracy in deep learning, such as ensembling, alternating between optimisers during training, and using pretrained weights to bootstrap new models. These methods, although effective, are not indicative of single-model performance. Instead, in this paper, we present different variations of these state-of-the-art topologies and study how these variations affect both training convergence and out-of-sample, single-model performance.
ARTICLE | doi:10.20944/preprints202111.0186.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Explainable AI; Convolutional Neural Network; Network Compression
Online: 9 November 2021 (15:03:27 CET)
Model understanding is critical in many domains, particularly those involving high-stakes decisions, e.g., medicine, criminal justice, and autonomous driving. Explainable AI (XAI) methods are essential for working with black-box models such as Convolutional Neural Networks. This paper evaluates the explainability of the Deep Neural Network (DNN) traffic sign classifier from the Programmable Systems for Intelligence in Automobiles (PRYSTINE) project. The resulting explanations were then used to compress the PRYSTINE CNN classifier by removing its vague kernels, and the classifier's precision was evaluated under different pruning scenarios. The proposed methodology was realised by creating original traffic sign and traffic light classification and explanation code. First, the status of the network's kernels was evaluated for explainability: a post-hoc, local, meaningful-perturbation-based forward explainable method was integrated into the model to evaluate the status of each kernel, which made it possible to distinguish high- and low-impact kernels in the CNN. Second, the vague kernels of the last layer before the fully connected layer were excluded by withdrawing them from the network. Third, the network's precision was evaluated at different kernel compression levels. It is shown that with this XAI-based kernel compression, pruning 5% of the kernels leads to only a 1% loss in traffic sign and traffic light classification precision. The proposed methodology is valuable wherever execution time and processing capacity constraints prevail.
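The kernel-compression step, dropping the kernels with the lowest explanation-based importance scores, can be sketched as an index-selection routine; the scoring and function below are illustrative assumptions, not the PRYSTINE code:

```python
def prune_kernels(importances, fraction=0.05):
    """Return indices of kernels to keep after dropping the `fraction`
    of kernels with the lowest importance scores.

    `importances` would come from an XAI method such as the
    perturbation-based analysis described in the paper."""
    n_drop = int(len(importances) * fraction)
    ranked = sorted(range(len(importances)), key=lambda i: importances[i])
    dropped = set(ranked[:n_drop])  # lowest-scoring kernels
    return [i for i in range(len(importances)) if i not in dropped]
```

Re-evaluating precision at each compression level then reveals how far pruning can go before accuracy degrades, e.g. the paper's 5% pruning for a 1% precision loss.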
ARTICLE | doi:10.20944/preprints202007.0379.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Transfer Learning; Convolutional Neural Networks; Emotion Recognition
Online: 17 July 2020 (13:58:18 CEST)
This paper presents the first research on mouth-based Emotion Recognition (ER) adopting a Transfer Learning (TL) approach. Transfer Learning is paramount for mouth-based ER because few data sets are available, and most of them include emotional expressions simulated by actors rather than a real-world categorization. Using TL we can use less training data than training a whole network from scratch, and thus more efficiently fine-tune the network with emotional data and improve the convolutional neural network's accuracy in the desired domain. The proposed approach aims at improving Emotion Recognition dynamically, taking into account not only new scenarios but also situations modified with respect to the initial training phase, because the image of the mouth can be available even when the whole face is visible only from an unfavourable perspective. Typical applications include the automated supervision of bedridden critical patients in a healthcare management environment, and portable applications supporting disabled users who have difficulties in seeing or recognizing facial emotions. This work builds on previous preliminary work on mouth-based emotion recognition using CNN deep learning, and has the further benefit of testing and comparing a set of networks on large data sets for face-based emotion recognition that are well known in the literature. The final result is not directly comparable with work on full-face ER, but it highlights the significance of the mouth in emotion recognition, obtaining consistent performance on the visual emotion recognition domain.
ARTICLE | doi:10.20944/preprints202306.0260.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: chromosome classification; convolutional neural networks; ensemble; data augmentation
Online: 5 June 2023 (08:02:04 CEST)
Object classification is a crucial task in deep learning, which involves the identification and categorization of objects in images or videos. Although humans can easily recognize common objects, such as cars, animals, or plants, performing this task on a large scale can be time-consuming and error-prone. Therefore, automating this process using neural networks can save time and effort while achieving higher accuracy. Our study focuses on the classification step of human chromosome karyotyping, an important medical procedure that helps diagnose genetic disorders. Traditionally, this task is performed manually by expert cytologists, which is a time-consuming process that requires specialized medical skills. Therefore, automating it through deep learning can be immensely useful. To accomplish this, we implemented and adapted existing preprocessing and data augmentation techniques to prepare the chromosome images for classification. We used ResNet-50 convolutional neural networks and an ensemble approach to classify the chromosomes, obtaining state-of-the-art performance on the tested dataset.
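A common way to realise the ensemble step, averaging per-class scores from several trained networks and taking the argmax, can be sketched as follows; the fusion rule is an assumption for illustration, since the abstract does not specify it:

```python
def ensemble_predict(model_scores):
    """Average per-class scores from several models and pick the top class.

    `model_scores` is a list of per-model score lists, one score per class."""
    n_models = len(model_scores)
    n_classes = len(model_scores[0])
    avg = [sum(m[c] for m in model_scores) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])
```

Averaging softmax outputs tends to cancel the uncorrelated errors of individual models, which is why ensembles of identically-architected networks trained differently often beat any single member.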
COMMUNICATION | doi:10.20944/preprints202209.0041.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning; Convolutional Neural Networks; Medical Image Segmentation
Online: 5 September 2022 (03:12:55 CEST)
Convolutional neural network architectures have become increasingly complex, which has only slowly improved performance on well-known benchmark datasets in recent years. In this research, we analyze the true need for such complexity. We introduce G-Net light, a lightweight modified GoogLeNet with an improved filter count per layer to reduce feature overlap and complexity. Additionally, by limiting the number of pooling layers in the proposed architecture, we exploit skip connections to minimize the loss of spatial information. The proposed architecture is evaluated on three publicly available retinal vessel segmentation datasets. G-Net light outperforms other vessel segmentation architectures while reducing the number of trainable parameters.
REVIEW | doi:10.20944/preprints202206.0179.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Optical neural networks; neuromorphic processor; microcomb; convolutional accelerator
Online: 13 June 2022 (09:58:01 CEST)
Optical neural networks (ONNs), or optical neuromorphic hardware accelerators, have the potential to dramatically enhance the computing power and energy efficiency of mainstream electronic processors, due to their ultra-large bandwidths of up to tens of terahertz together with their analog architecture that avoids the need for reading and writing data back and forth. Different multiplexing techniques have been used to demonstrate ONNs, amongst which wavelength-division multiplexing (WDM) techniques make full use of the unique advantages of optics in terms of broad bandwidths. Here, we review recent advances in WDM-based ONNs, focusing on methods that use integrated microcombs to implement ONNs. We present results for human image processing using an optical convolution accelerator operating at 11 tera-operations per second. The open challenges and limitations of ONNs that need to be addressed for future applications are also discussed.
ARTICLE | doi:10.20944/preprints202111.0047.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Data augmentation; Deep Learning; Convolutional Neural Networks; Ensemble.
Online: 2 November 2021 (11:18:23 CET)
Convolutional Neural Networks (CNNs) have gained prominence in the research literature on image classification over the last decade. One shortcoming of CNNs, however, is their lack of generalizability and tendency to overfit when presented with small training sets. Augmentation directly confronts this problem by generating new data points that provide additional information. In this paper, we investigate the performance of more than ten different sets of data augmentation methods, with two novel approaches proposed here: one based on the Discrete Wavelet Transform and the other on the Constant-Q Gabor transform. Pretrained ResNet50 networks are fine-tuned on each augmentation method. Combinations of these networks are evaluated and compared across three benchmark data sets of images representing diverse problems and collected by instruments that capture information at different scales: a virus data set, a bark data set, and a LIGO glitches data set. Experiments demonstrate the superiority of this approach. The best ensemble proposed in this work achieves state-of-the-art performance across all three data sets. This result shows that varying data augmentation is a feasible way of building an ensemble of classifiers for image classification (code available at https://github.com/LorisNanni).
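The abstract does not detail the Discrete Wavelet Transform augmentation; one plausible reading, sketched below purely as an assumption, is to decompose each image with a one-level Haar transform and rescale the detail sub-bands before reconstructing (function names and the `detail_gain` parameter are ours, not the authors' method):

```python
import numpy as np

def haar2d(img):
    """One-level 2D Haar decomposition of an even-sized grayscale image."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation band
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = ll + lh + hl + hh
    out[0::2, 1::2] = ll + lh - hl - hh
    out[1::2, 0::2] = ll - lh + hl - hh
    out[1::2, 1::2] = ll - lh - hl + hh
    return out

def dwt_augment(img, detail_gain=0.5):
    """Create a variant of an image by scaling its detail sub-bands."""
    ll, lh, hl, hh = haar2d(img)
    return ihaar2d(ll, detail_gain * lh, detail_gain * hl, detail_gain * hh)

img = np.arange(16, dtype=float).reshape(4, 4)
augmented = dwt_augment(img, detail_gain=0.5)
```

With `detail_gain=1.0` the round trip is the identity; other gains yield plausibly perturbed training images.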
ARTICLE | doi:10.20944/preprints202104.0753.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional extreme learning machine; Deep learning; Multimedia analysis
Online: 28 April 2021 (15:31:14 CEST)
Many works have recently identified the need to combine deep learning with extreme learning to strike a balance between training speed and accuracy, especially in the domain of multimedia applications. Considering this new paradigm, namely the convolutional extreme learning machine (CELM), we present a systematic review that investigates alternative deep learning architectures that use the extreme learning machine (ELM) for faster training to solve problems based on image analysis. We detail each of the architectures found in the literature, their application scenarios, benchmark datasets, main results, and advantages, and present the open challenges for CELM. We follow a well-structured methodology and establish relevant research questions that guide our findings. We hope that the observation and classification of such works can leverage the CELM research area, providing a good starting point to cope with some of the current problems in image-based computer vision analysis.
ARTICLE | doi:10.20944/preprints202104.0412.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: deep learning; hydraulic conductivity; convolutional neural networks; groundwater
Online: 15 April 2021 (12:25:05 CEST)
We confirm that energy dissipation weighting provides the most accurate approach to determining the effective hydraulic conductivity (Keff) of a binary K grid. A deep learning algorithm (UNET) can infer Keff with extremely high accuracy (R2 > 0.99). The UNET architecture could be trained to infer the energy dissipation weighting pattern from an image of the K distribution with high fidelity, although it was less accurate for cases with highly localized structures that controlled flow. Furthermore, the UNET architecture learned to infer the energy dissipation weighting even if it was not trained on this information directly. However, the weights were represented within the UNET in a way that was not immediately interpretable by a human user. This reiterates the idea that even if ML/DL algorithms are trained to make some hydrologic predictions accurately, they must be designed and trained to provide each user-required output if their results are to be used to improve our understanding of hydrologic systems most effectively.
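The energy-dissipation weighting scheme itself is not reproduced in the abstract, but any estimate of the effective conductivity of a binary K grid must fall between the classical harmonic-mean (flow across layers) and arithmetic-mean (flow along layers) bounds; a minimal sketch under that standard result (function name ours):

```python
import numpy as np

def keff_bounds(k_grid):
    """Classical bounds on effective hydraulic conductivity Keff:
    arithmetic mean = upper bound (flow parallel to layering),
    harmonic mean  = lower bound (flow perpendicular to layering)."""
    k = np.asarray(k_grid, dtype=float).ravel()
    upper = k.mean()
    lower = len(k) / np.sum(1.0 / k)
    return lower, upper

# Binary grid: half the cells at K = 1, half at K = 100.
k = np.array([[1.0, 100.0], [100.0, 1.0]])
lo, hi = keff_bounds(k)
```

A weighting-based estimate such as the one the UNET learns to reproduce should land strictly between these two values for any heterogeneous field.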
ARTICLE | doi:10.20944/preprints202011.0331.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Convolutional neural network; Unmanned Aerial Vehicles; Deep learning.
Online: 12 November 2020 (08:57:18 CET)
The evolution in imaging technologies and artificial intelligence algorithms, coupled with improvements in UAV technology, has enabled the use of unmanned aircraft in a wide range of applications. The feasibility of this kind of approach for cattle monitoring has been demonstrated by several studies, but practical use is still challenging due to the particular characteristics of this application, such as the need to track mobile targets and the extensive areas that need to be covered in most cases. The objective of this study was to investigate the feasibility of using a tilted angle to increase the area covered by each image. Deep Convolutional Neural Networks (Xception architecture) were used to generate the models for the experiments, which covered aspects like ideal input dimensions, effect of the distance between animals and sensor, effect of classification error on the overall detection process, and impact of physical obstacles on the accuracy of the model. Experimental results indicate that oblique images can be successfully used under certain conditions, but some practical limitations need to be addressed in order to make this approach appealing.
ARTICLE | doi:10.20944/preprints201910.0061.v1
Subject: Physical Sciences, Applied Physics Keywords: noctilucent clouds; linear discriminant analysis; convolutional neural networks
Online: 7 October 2019 (11:07:56 CEST)
In this paper, we present a framework to study the spatial structure of noctilucent clouds formed by ice particles in the upper atmosphere at mid and high latitude during summer. We study noctilucent cloud activity in optical images taken from three different locations and under different atmospheric conditions. In order to identify and distinguish noctilucent cloud activity from other objects in the scene, we employ linear discriminant analysis (LDA) with feature vectors ranging from simple metrics to higher-order local autocorrelation (HLAC), and histogram of oriented gradients (HOG). Finally, we propose a Convolutional Neural Networks (CNN) based method for the detection of noctilucent clouds. The results clearly indicate that the CNN based approach outperforms LDA based methods used in this article. Furthermore, we outline suggestions for future research directions to establish a framework that can be used for synchronizing the optical observations from ground based camera systems with echoes measured with radar systems like EISCAT in order to obtain independent additional information on the ice clouds.
ARTICLE | doi:10.20944/preprints201907.0115.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Automated Weeding; Mobile Convolutional Neural Networks; Semantic Segmentation
Online: 8 July 2019 (12:29:21 CEST)
Automated weeding is an important research area in agrorobotics. Weeds can be removed mechanically or with the precise usage of herbicides. Deep Learning techniques have achieved state-of-the-art results in many computer vision tasks; however, their deployment on low-cost mobile computers is still challenging. This paper presents an advanced version of a previously presented system. The described system contains several novelties compared both with its previous version and with related work. It is part of an automatic weeding machine project developed by the Warsaw University of Technology and MCMS Warka Ltd. The obtained model reaches satisfying accuracy at over 10 FPS on the Raspberry Pi 3B+ computer. It was tested on four different plant species at different growth stages and lighting conditions. The system performs semantic segmentation and is based on Convolutional Neural Networks. Its custom architecture mixes U-Net, MobileNets, DenseNet and ResNet concepts. The amount of manual ground-truth labelling needed was significantly decreased by knowledge distillation: the final model is trained to mimic an ensemble of complex models on a large database of unlabeled data. A further decrease in inference time was obtained by two custom modifications: the use of separable convolutions in the DenseNet block and a reduction of the number of channels in each layer. In the authors' opinion, the described novelties can be easily transferred to other agrorobotics tasks.
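Knowledge distillation, used here to cut labeling effort, trains the small model to match the ensemble's softened output distribution. A minimal sketch of the standard soft-target loss (the temperature value and function names are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student against the teacher ensemble's
    softened outputs -- the standard soft-target distillation loss."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean()
```

Because the teacher's soft targets are produced automatically, the loss can be evaluated on large unlabeled databases, which is exactly what reduces the manual annotation burden.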
ARTICLE | doi:10.20944/preprints202206.0053.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Brain-Computer Interface Systems; Convolutional Neural Network; Deep Learning
Online: 6 June 2022 (03:23:36 CEST)
Objective: A trained T1-class Convolutional Neural Network (CNN) model will be used to examine its ability to identify motor imagery when fed pre-processed electroencephalography (EEG) data. In theory, if the model has been trained accurately, it should be able to identify a class and label it accordingly. The CNN model will then be restored and used to identify the same class of motor imagery data using much smaller data samples, in an attempt to simulate live data. Approach: PyCharm, a Python platform, will be used to house and process the CNN. The raw data used for training the CNN will be sourced from the PhysioBank website. The EEG signal data will then be pre-processed using the Brainstorm software, a toolbox used in conjunction with MATLAB. The sample data used to validate and test the trained CNN will also be extracted from Brainstorm, but in a much smaller size compared to the training data, which comprises thousands of images. The sample size is comparable to a person wearing a Brain-Computer Interface (BCI) providing approximately 20 seconds of motor imagery signal data. Results: The raw EEG data was successfully extracted and pre-processed. The deep learning model was trained using the extracted image data along with the corresponding labels. After training, it was able to identify the T1 class label with 100 percent accuracy. The Python code was then modified to restore the trained model and feed it test sample data, on which it recognised 6 out of 10 lines of T1 signal image data. This result suggests that the initial training of the model required a different, more varied approach, so that it could detect varying sample signal image data. The outcome could mean that the model can be used in applications where multiple patients wearing the same BCI hardware control a device or interface.
ARTICLE | doi:10.20944/preprints202204.0177.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Plant disease; Machine vision; UAV; Smartphone; Convolutional Neural Network
Online: 19 April 2022 (07:44:29 CEST)
Stripe rust (caused by Puccinia striiformis f. sp. tritici) is one of the most devastating diseases of wheat and causes large-scale epidemics and severe yield loss. Applying fungicides during early epidemic development is crucial to controlling the disease but is often challenged by resource-limited human visual scouting. Deep learning has the potential to process images and videos captured from affordable devices to empower high-throughput phenotyping for early detection of stripe rust for timely application of fungicides and improve control efficiency. Here, we developed RustNet, a neural network-based image classifier, for efficiently monitoring fields for stripe rust. RustNet was built on a ResNet-18 architecture pre-trained with ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) dataset using transfer learning. RGB images and videos of multiple wheat fields with different wheat types (winter and spring wheat), conditions (irrigated and non-irrigated), and locations were acquired using smartphones or unmanned aerial vehicles near the canopy. A semi-automated image labeling approach was conducted to improve labeling efficiency by combining automated machine labeling and human correction. Cross-validations across multiple categories (sensor platforms, wheat types, and locations) achieved Area Under Curve from 0.72 to 0.87. Independent validation on a published dataset from Germany achieved accuracies ranging from 0.79 to 0.86. The visualization of the last convolutional layer of RustNet demonstrated the identification of pixels with stripe rust. RustNet is freely available at https://zzlab.net/RustNet.
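The cross-validation results above are reported as Area Under Curve. For reference, AUC can be computed directly from classifier scores via the rank (Mann-Whitney) formulation, sketched here (function name ours, not from the RustNet code base):

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    sample is scored above a randomly chosen negative one."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count as half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking, 1.0 to perfect separation of diseased and healthy images, which puts the reported 0.72 to 0.87 range in context.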
Subject: Environmental And Earth Sciences, Oceanography Keywords: breaking waves; optical flow; convolutional neural networks; image classification
Online: 11 October 2021 (15:49:36 CEST)
The use of convolutional neural networks (CNNs) in image classification has become the standard method of approaching computer vision problems. Here we apply pre-trained networks to classify images of non-breaking, plunging and spilling breaking waves. The CNNs are used as basic feature extractors and a classifier is then trained on top of these networks. The dynamic nature of breaking waves is exploited by using image sequences to gain extra information and improve the classification results. We also see improved classification performance when using pre-computed image features such as the optical flow between image pairs. The inclusion of the dynamic information improves the classification between breaking wave classes. We also provide corrections to the methodology of the article from which the data originates, to achieve a more accurate assessment of performance.
ARTICLE | doi:10.20944/preprints202008.0641.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: brain tumor; machine learning; ensemble methods; convolutional neural networks
Online: 8 June 2021 (13:53:28 CEST)
In this paper, we propose methods for brain tumor detection in MRI images based on ensemble learning. We build upon prior research on ensemble methods by testing the concatenation of pre-trained models: features extracted via transfer learning are merged and segmented by classification algorithms or a stacked ensemble of those algorithms. The proposed approach achieved accuracy scores of 0.98, outperforming a benchmark VGG-16 model. Considerations regarding granular computing are also given in the paper.
ARTICLE | doi:10.20944/preprints201911.0019.v1
Subject: Engineering, Control And Systems Engineering Keywords: community detection; social network; convolutional neural network; auto-encoder
Online: 3 November 2019 (15:51:34 CET)
With the fast development of the mobile Internet, online social network platforms have been developing rapidly for the purposes of making friends, sharing information, etc. On these platforms, users who are related to each other form social networks. The literature has shown that social networks have community structure. Through the study of community structure, the characteristics and functions of the network structure and the dynamical evolution mechanism of networks can be used to predict user behaviour and control information dissemination. Therefore, this study proposes a deep community detection method which includes (1) a matrix reconstruction method, (2) a spatial feature extraction method and (3) a community detection method. The original adjacency matrix of the social network is reconstructed based on opinion leaders and near neighbours to obtain a spatial proximity matrix. The spatial eigenvectors of the reconstructed adjacency matrix are extracted by a convolutional neural network-based auto-encoder to improve modularity. In the experiments, four open datasets of practical social networks were selected to evaluate the proposed method, and the experimental results show that the proposed deep community detection method obtained higher modularity than other methods. Therefore, the proposed method can effectively detect high-quality communities in social networks.
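Modularity, the quality measure the experiments compare on, can be computed for any candidate partition with the standard Newman formula; a minimal sketch (the two-triangle toy graph is our illustration, not one of the paper's datasets):

```python
import numpy as np

def modularity(adj, communities):
    """Newman modularity Q of a partition of an undirected graph:
    Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    A = np.asarray(adj, dtype=float)
    k = A.sum(axis=1)            # node degrees
    two_m = A.sum()              # twice the number of edges
    c = np.asarray(communities)
    same = (c[:, None] == c[None, :])   # delta(c_i, c_j)
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Two triangles joined by one bridge edge; the natural split scores high.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
q = modularity(A, [0, 0, 0, 1, 1, 1])
```

For this toy graph the natural two-community split gives Q = 5/14, while a scrambled assignment scores negative, which is the behaviour a community detector exploits.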
ARTICLE | doi:10.20944/preprints202110.0359.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: brain; pituitary adenoma; Dysembryoplastic neuroepithelial tumor; DNET; Ganglioglioma; deep learning; digital pathology; convolutional neural network; computer vision; machine learning; convolutional neural network; CNN
Online: 26 October 2021 (14:10:11 CEST)
Background: Processing whole-slide images (WSI) to train neural networks can be intricate and laborious. We developed an open-source library covering recurrent tasks in the processing of WSI and in evaluating the performance of the trained networks for classification tasks. Methods: Two histopathology use-cases were selected. First, we aimed to train a CNN to distinguish H&E-stained slides obtained from neuropathologically classified low-grade epilepsy-associated dysembryoplastic neuroepithelial tumor (DNET) and ganglioglioma (GG). In the second project, we trained a convolutional neural network (CNN) to predict the hormone expression of pituitary adenomas from hematoxylin and eosin (H&E) stained slides alone. In the same approach, we also addressed the prediction of clinically silent corticotroph adenoma. We included four clinico-pathological disease conditions in a multilabel approach. Results: Our best performing CNN achieved an area under the curve (AUC) of 0.97 for the receiver operating characteristic (ROC) for corticotroph adenoma, 0.86 for silent corticotroph adenoma and 0.98 for gonadotroph adenoma. Our DNET-GG classifier achieved an AUC of 1.00 for the ROC curve. All scores were calculated with the help of our library on predictions on a case basis. Conclusions: Our comprehensive library helps standardize the workflow and minimize the workload in training CNNs. It is also compatible with fastai. Indeed, our new CNNs reliably extracted neuropathologically relevant information from the H&E staining alone. This approach will supplement the clinico-pathological diagnosis of brain tumors, which is currently based on cost-intensive microscopic examination and variable panels of immunohistochemical stainings.
REVIEW | doi:10.20944/preprints201805.0484.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: deep learning; deep convolutional neural networks; dcnn; convolutional neural networks; cnn; robot learning; transfer learning; robotic grasping; robotic grasp detection; human-robot collaboration
Online: 31 May 2018 (17:27:23 CEST)
In order for robots to attain more general-purpose utility, grasping is a necessary skill to master. Such general-purpose robots may use their perception abilities to visually identify grasps for a given object. A grasp describes how a robotic end-effector can be arranged on an object so that the object is held securely between the gripper's jaws and lifted without slippage. Traditionally, grasp detection requires expert human knowledge to analytically form a task-specific algorithm, but this is an arduous and time-consuming approach. During the last five years, deep learning methods have enabled significant advancements in robotic vision, natural language processing, and automated driving applications. The successful results of these methods have driven robotics researchers to explore the application of deep learning methods to task-generalised robotic applications. This paper reviews the current state of the art in the application of deep learning methods to generalised robotic grasping, and discusses how each element of the deep learning approach has improved the overall performance of robotic grasp detection. A number of the most promising approaches are evaluated, and the most successful for grasp detection is identified as the one-shot detection method. The availability of suitable volumes of appropriate training data is identified as a major obstacle to effective utilisation of deep learning approaches, and the use of transfer learning techniques is identified as a potential mechanism to address this. Finally, current trends in the field and future potential research directions are discussed.
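The review does not spell out how grasp detections are scored, but the field's commonly used "rectangle metric" accepts a prediction when its box overlap and gripper angle are both close enough to a ground-truth grasp. A simplified sketch with axis-aligned boxes (the full metric uses rotated rectangles; the 0.25 IoU and 30-degree thresholds shown are the conventional ones, and the function names are ours):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union

def grasp_correct(pred, truth, iou_thresh=0.25, angle_thresh=30.0):
    """Rectangle metric: correct if box IoU exceeds iou_thresh and the
    gripper orientation is within angle_thresh degrees (mod 180)."""
    (box_p, ang_p), (box_t, ang_t) = pred, truth
    d = abs(ang_p - ang_t) % 180.0
    d = min(d, 180.0 - d)          # gripper angle wraps every 180 degrees
    return iou(box_p, box_t) > iou_thresh and d < angle_thresh
```

Note the angle comparison wraps at 180 degrees, since a parallel-jaw gripper rotated by half a turn produces the same grasp.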
ARTICLE | doi:10.20944/preprints202304.1031.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; convolutional neural networks; livestock; pose estimation; animal behavior
Online: 27 April 2023 (04:19:46 CEST)
Automatic and real-time pose estimation is important in monitoring animal behavior, health and welfare. In this paper, we utilized pose estimation to monitor the farrowing process, in order to prevent piglet mortality and preserve the health and welfare of the sow. State-of-the-art Deep Learning (DL) methods have lately been used for animal pose estimation. The aim of this paper was to probe the generalization ability of five common DL networks (ResNet50, ResNet101, MobileNet, EfficientNet and DLCRNet) for sow and piglet pose estimation. These architectures predict body parts of several piglets and the sow directly from input video sequences. Real farrowing data from a commercial farm was used for training and validation of the proposed networks. The experimental results demonstrated that MobileNet was able to detect seven body parts of the sow with a median test error of 0.61 pixels.
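The 0.61-pixel figure is a median error over predicted body-part coordinates; such a statistic can be computed from keypoint predictions as below (the array layout and function name are our illustration, not the paper's evaluation code):

```python
import numpy as np

def median_pixel_error(pred, truth):
    """Median Euclidean distance (pixels) between predicted and
    ground-truth body-part coordinates, both shaped (n_parts, 2)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    distances = np.linalg.norm(diff, axis=1)   # one distance per body part
    return float(np.median(distances))
```

The median is preferred over the mean here because a single badly missed keypoint (e.g. an occluded piglet) would otherwise dominate the score.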
CASE REPORT | doi:10.20944/preprints202304.0259.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: TensorFlow; Convolutional Neural Network; Machine Learning; Artificial Intelligence; Deep Learning
Online: 12 April 2023 (08:19:41 CEST)
Advances in computing technology provide an opportunity to grow and develop an effective crop protection system. This study presents a new way of diagnosing plant diseases using a deep convolutional neural network. Deep learning is a strengthened form of machine learning. It uses a neural network that mimics the human brain to extract information. Although powerful, its training process necessitates millions of tagged data points. The study's methodology consists of three key stages: data acquisition, pre-processing, and model building.
REVIEW | doi:10.20944/preprints202208.0067.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; 3D reconstruction; convolutional neural networks; texture-less surfaces
Online: 2 August 2022 (12:17:08 CEST)
3D reconstruction from a single 2D input is a classic problem in the field of computer vision. With the advancements in deep learning, the performance of 3D reconstruction has also significantly improved. The reconstruction task is more difficult for objects with no textures or complex deformations. This paper serves as a review of recent literature on 3D reconstruction from a single view, with a focus on deep learning methods from 2018 to 2021. Due to the lack of standard datasets or 3D shape representation methods, it is hard to make direct comparisons between all reviewed methods. However, this paper reviews different approaches for reconstructing 3D shapes as depth maps, surface normals, point clouds, and meshes, along with the various loss functions and evaluation metrics used to train and evaluate these methods.
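For point-cloud outputs, one widely used loss and evaluation metric (the review notes that metrics vary across papers, so this is representative rather than universal) is the symmetric Chamfer distance; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (n,3) and q (m,3):
    mean nearest-neighbour distance from p to q plus from q to p."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Full pairwise distance matrix; fine for small clouds, O(n*m) memory.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Because it matches each point to its nearest neighbour in the other cloud, the metric needs no point correspondences, which is what makes it usable across the differing shape representations the review surveys.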
ARTICLE | doi:10.20944/preprints202206.0368.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: hand gesture classification; transfer learning; three-dimensional convolutional; LSTM network
Online: 27 June 2022 (13:36:40 CEST)
This paper introduces a multi-class hand gesture recognition model developed to identify a set of defined hand gesture sequences in two-dimensional RGB video recordings. The work presents an action detection classifier that looks at both the appearance and spatiotemporal parameters of consecutive frames. The classifier utilizes a convolution-based network combined with a long short-term memory unit. To mitigate the need for a large-scale dataset, the model is first trained on an available dataset and then fine-tuned on the hand gestures of relevance via transfer learning. Validation curves performed over a batch size of 64 indicate an accuracy of 93.95% (± 0.37) with a mean Jaccard index of 0.812 (± 0.105) for 22 participants. The presented model illustrates the possibility of training a model with a small set of data (113,410 fully labelled frames). The proposed pipeline embraces a small-sized architecture that could facilitate its adoption.
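The reported mean Jaccard index measures the overlap between predicted and annotated gesture intervals; a per-sequence sketch, assuming labels can be treated as sets of frame indices (the function name and set formulation are our illustration):

```python
def jaccard_index(pred_frames, true_frames):
    """Jaccard index between the set of frames the model labels as a
    gesture and the set the annotator labelled: |A & B| / |A | B|."""
    p, t = set(pred_frames), set(true_frames)
    if not p and not t:
        return 1.0   # both empty: trivially perfect agreement
    return len(p & t) / len(p | t)
```

A score of 0.812 therefore means that, on average, roughly four fifths of the union of predicted and annotated gesture frames were agreed upon.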