ARTICLE | doi:10.20944/preprints201811.0579.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep learning, Cognitive, LSTM, Neural network, Ngrams
Online: 26 November 2018 (10:06:05 CET)
Cognitive neuroscience is the study of how the human brain functions on tasks like decision making, language, perception, and reasoning. Deep learning is a class of machine learning algorithms that use neural networks designed to model the responses of neurons in the human brain; learning can be supervised or unsupervised. N-gram token models are used extensively in language prediction: they are probabilistic models for predicting the next word or token. As statistical models of word or token sequences, they are called language models (LMs) and are essential in building language prediction models. We explore a broader sandbox ecosystem for AI, specifically deep learning applications on unstructured content on the web.
ARTICLE | doi:10.20944/preprints202201.0465.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: genetic algorithm; deep neural network; hidden layer; optimal architecture; intrusion detection
Online: 31 January 2022 (13:26:18 CET)
Computer network attacks are evolving in parallel with the evolution of hardware and neural network architecture. Despite major advancements in Network Intrusion Detection System (NIDS) technology, most implementations still depend on signature-based intrusion detection systems, which can’t identify unknown attacks. Deep learning can help NIDS to detect novel threats since it has a strong generalization ability. The deep neural network’s architecture has a significant impact on the model’s results. We propose a genetic algorithm based model to find the optimal number of hidden layers and the number of neurons in each layer of the deep neural network (DNN) architecture for the network intrusion detection binary classification problem. Experimental results demonstrate that the proposed DNN architecture shows better performance than classical machine learning algorithms at a lower computational cost.
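The search over hidden-layer counts and widths described above can be sketched as a simple genetic algorithm over variable-length genomes (one gene per hidden layer). This is an illustrative sketch, not the paper's implementation: the `fitness` function here is a synthetic stand-in for the validation score a trained DNN would return, and all names and parameter values are hypothetical.

```python
import random

def fitness(genome):
    # Stand-in for "train a DNN with this architecture and score it on
    # validation data"; this toy surrogate peaks at 3 layers of 64 neurons.
    penalty = abs(len(genome) - 3) * 10
    penalty += sum(abs(w - 64) for w in genome) / 8
    return -penalty  # higher is better

def random_genome(rng):
    # A genome is a list of hidden-layer widths; length = number of layers.
    return [rng.choice([16, 32, 64, 128]) for _ in range(rng.randint(1, 5))]

def crossover(a, b, rng):
    cut = rng.randint(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate=0.2):
    out = list(genome)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.choice([16, 32, 64, 128])
    if rng.random() < rate and len(out) < 6:
        out.append(rng.choice([16, 32, 64, 128]))  # grow a layer
    return out

def evolve(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]  # keep the best quarter
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
```

In the paper's setting, each fitness evaluation is expensive (a full DNN training run), which is exactly why an elitist GA with a small population is an attractive search strategy.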
ARTICLE | doi:10.20944/preprints202005.0347.v1
Subject: Engineering, Mechanical Engineering Keywords: deep learning; maximum mean discrepancy; gearbox; fault detection
Online: 22 May 2020 (05:21:56 CEST)
In the past years, various intelligent machine learning and deep learning algorithms have been developed and widely applied for gearbox fault detection and diagnosis. However, the real-time application of these intelligent algorithms has been limited, mainly because a model developed using data from one machine or operating condition suffers serious diagnosis performance degradation when applied to another machine, or to the same machine under a different operating condition. The reason for poor model generalization is the distribution discrepancy between the training and testing data. This paper proposes to address this issue using a deep learning-based cross-domain adaptation approach for gearbox fault diagnosis. Labelled data from the training dataset and unlabelled data from the testing dataset are used to achieve the cross-domain adaptation task. A deep convolutional neural network (CNN) is used as the main architecture. Maximum mean discrepancy is used as a measure to minimize the distribution distance between the labelled training data and unlabelled testing data. The study proposes to reduce the discrepancy between the two domains in multiple layers of the designed CNN, adapting the representations learned from the training data for application to the testing data. The proposed approach is evaluated using experimental data from a gearbox under significant speed variation and multiple health conditions. Appropriate benchmarking against both traditional machine learning methods and other domain adaptation methods demonstrates the superiority of the proposed method.
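Minimizing the maximum mean discrepancy between domains, as described above, amounts to penalizing the distance between kernel mean embeddings of source and target feature batches. A minimal NumPy sketch of the biased squared-MMD estimate with a Gaussian kernel (function names and the single-bandwidth kernel are my own simplifications; the paper applies this loss inside multiple CNN layers):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Biased (V-statistic) estimate of squared Maximum Mean Discrepancy:
    # E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]. Zero when the two feature
    # distributions match; used as an auxiliary loss during training.
    kxx = gaussian_kernel(source, source, sigma).mean()
    kyy = gaussian_kernel(target, target, sigma).mean()
    kxy = gaussian_kernel(source, target, sigma).mean()
    return kxx + kyy - 2 * kxy
```

In the domain-adaptation setup, `source` would hold features of labelled training samples and `target` features of unlabelled testing samples at a given layer, and `mmd2` would be added to the classification loss.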
ARTICLE | doi:10.20944/preprints201811.0546.v4
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional Neural Network (CNN), Deep learning, Architecture, Applications
Online: 14 February 2019 (10:01:31 CET)
With the rise of artificial neural networks (ANNs), machine learning has taken a dramatic turn in recent times. One of the most remarkable kinds of ANN design is the Convolutional Neural Network (CNN), a technology that combines artificial neural networks with modern deep learning strategies. In deep learning, the CNN is at the center of spectacular advances. This type of network has been applied to image recognition tasks for decades and has attracted the attention of researchers in many countries in recent years, as it has shown promising performance in several computer vision and machine learning tasks. This paper describes the underlying architecture and various applications of the Convolutional Neural Network.
ARTICLE | doi:10.20944/preprints202304.0141.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: plume rise; deep learning; plume cloud recognition
Online: 10 April 2023 (04:11:22 CEST)
Estimating plume cloud height is essential for various applications, such as global climate models. Smokestack plume rise is the constant height at which the plume cloud is carried downwind as its momentum dissipates and the plume cloud and the ambient temperatures equalize. Although different parameterizations are used in most air-quality models to predict the plume rise, they have been unable to estimate it properly. This paper proposes a novel framework to monitor smokestack plume clouds and make long-term, real-time measurements of the plume rise. For this purpose, a three-stage framework is developed based on Deep Convolutional Neural Networks (DCNNs). In the first stage, an improved Mask R-CNN, called Deep Plume Rise Network (DPRNet), is applied to recognize the plume cloud. Then, image processing analysis and least squares theory are respectively used to detect the plume cloud’s boundaries and fit an asymptotic model into their centerlines. The y-component coordinate of this model’s critical point is considered the plume rise. In the last stage, a geometric transformation phase converts image measurements into real-life ones. A wide range of images with different atmospheric conditions, including day, night, and cloudy/foggy, have been selected for the DPRNet training algorithm. Obtained results show that the proposed method outperforms widely-used networks in smoke border detection and recognition.
ARTICLE | doi:10.20944/preprints202005.0455.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: pattern recognition; deep convolutional neural network; Brahmi script; CNN
Online: 28 May 2020 (07:33:32 CEST)
Significant progress has been made in pattern recognition technology. However, one obstacle that has not yet been overcome is the recognition of words in the Brahmi script, specifically the identification of characters, compound characters, and words. This study proposes a deep convolutional neural network (DCNN) with dropout to recognize Brahmi words, and a series of experiments is performed on a standard Brahmi dataset. The method was systematically tested on an accessible Brahmi image database, achieving a recognition rate of 92.47%, which is among the best reported in the literature for this task.
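The dropout regularizer the study relies on can be illustrated in a few lines. This is the standard inverted-dropout forward pass, not the authors' code:

```python
import numpy as np

def dropout(x, rate, rng, train=True):
    """Inverted dropout: during training, zero a fraction `rate` of the
    activations and rescale the survivors by 1/(1-rate) so the expected
    activation is unchanged; at test time the input passes through as-is."""
    if not train or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate  # keep with probability 1 - rate
    return x * mask / (1.0 - rate)
```

Randomly silencing units this way prevents co-adaptation of features, which is why it helps a DCNN trained on a comparatively small script dataset generalize.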
BRIEF REPORT | doi:10.20944/preprints202207.0419.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: computer vision; deep learning; CoughNet model
Online: 27 July 2022 (10:01:54 CEST)
This work addresses two key problems in identifying people infected with COVID-19: first, identification accuracy is not high enough; second, present identification methods such as nucleic acid testing are expensive in many countries. Methods: I therefore designed a fast identification method for COVID-19 patients based on deep learning. After the model (CoughNet) learned more than 6,000 cough spectrograms from both COVID-19 patients and healthy people, its accuracy in distinguishing COVID-19 patients from healthy people exceeded 99% on the test set. Structure: This paper is divided into three parts: the first introduces the background and research status; the second introduces the research methods; the third introduces the specific experimental process.
ARTICLE | doi:10.20944/preprints201912.0252.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: time series; deep learning; convolutional neural network; recurrence plot; financial market prediction
Online: 19 December 2019 (07:39:54 CET)
An application of a deep convolutional neural network and recurrence plots for financial market movement prediction is presented. Though it is challenging and subjective to interpret its information, the pattern formed by a recurrence plot provides useful insight into the dynamical system. We used recurrence plots of seven financial time series to train a deep neural network for financial market movement prediction. Our approach was tested on our dataset and achieved an average classification accuracy of 53.25%. The result suggests that a well-trained deep convolutional neural network can learn from a recurrence plot and predict a financial market's direction.
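The recurrence-plot input described above is straightforward to construct: a binary matrix marking which pairs of time points have similar values. A minimal sketch for a scalar series (the default threshold of 10% of the series range is an assumption; the paper does not state its choice):

```python
import numpy as np

def recurrence_plot(series, eps=None):
    """Binary recurrence matrix R[i, j] = 1 when the series values at
    times i and j lie within `eps` of each other. The resulting image
    is what the CNN is trained on."""
    x = np.asarray(series, dtype=float)
    if eps is None:
        eps = 0.1 * (x.max() - x.min())  # assumed threshold choice
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(np.uint8)
```

The matrix is symmetric with an all-ones main diagonal; its texture (diagonal lines, blocks, isolated points) encodes the dynamics that the network learns to classify.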
ARTICLE | doi:10.20944/preprints202006.0368.v1
Subject: Business, Economics And Management, Finance Keywords: Fraud Detection; Recurrent Neural Network; PaySim; Financial Transactions; Deep Learning
Online: 30 June 2020 (11:34:34 CEST)
Online transactions are becoming more popular in the present situation, in which the globe is facing the previously unknown disease COVID-19 and national authorities have asked people to use cashless transactions as far as possible, though practically this is not always feasible. Since the number of cashless transactions has increased rapidly during the COVID-19 lockdown period, fraudulent transactions are increasing rapidly as well. Fraud can be analysed by examining the series of a customer's previous transactions: banks and other transaction authorities normally warn their customers when a transaction deviates from the established patterns, treating it as possibly fraudulent. For fraud detection during COVID-19, banks and credit card companies apply various methods such as data mining, decision trees, rule-based mining, neural networks, fuzzy clustering, and machine learning, which try to establish customers' normal usage patterns from their past activities. The objective of this paper is to find such fraudulent transactions during this unmanageable situation. Digital payment schemes are often threatened by fraudulent activities, and detecting fraud during money transfers may save customers from financial loss. This paper focuses on mobile money transactions and suggests a Deep Learning (DL) framework that monitors and detects fraudulent activities. By implementing and applying a recurrent neural network on the PaySim-generated synthetic financial dataset, deceptive transactions are identified. The proposed method detects deceptive transactions with an accuracy of 99.87%, an F1-score of 0.99, and an MSE of 0.01.
ARTICLE | doi:10.20944/preprints202106.0613.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: LRTI; URTI; Asthma; Cough Classification; Respiratory Pathology Classification; MFCCs; BiLSTM; Deep Neural Networks
Online: 25 June 2021 (09:45:00 CEST)
Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish children with healthy coughs from those with pathological coughs caused by asthma, upper respiratory tract infection (URTI), or lower respiratory tract infection (LRTI). To train a deep neural network model, we collected a new dataset of cough sounds labelled with clinicians' diagnoses. The chosen model is a bidirectional long short-term memory network (BiLSTM) based on Mel Frequency Cepstral Coefficient (MFCC) features. When trained to classify two classes of coughs -- healthy or pathological (in general or belonging to a specific respiratory pathology) -- the model reaches an accuracy exceeding 84% against the label provided by the physicians' diagnosis. To classify a subject's respiratory pathology condition, the results of multiple cough epochs per subject were combined; the resulting prediction accuracy exceeds 91% for all three respiratory pathologies. However, when the model is trained to discriminate among all four classes of coughs, overall accuracy drops: one class of pathological cough is often misclassified as another. If one requires only that healthy coughs be classified as healthy and pathological coughs as having some kind of pathology, then the overall accuracy of the four-class model is above 84%. A longitudinal study of the MFCC feature space, comparing pathological and recovered coughs collected from the same subjects, revealed that pathological coughs occupy the same feature space irrespective of the underlying condition, making them harder to differentiate using MFCC features alone.
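The abstract says per-epoch results were combined per subject but not how; one plausible combiner is a majority vote over the per-cough predictions. This is an assumption for illustration, not the paper's stated method:

```python
from collections import Counter

def subject_prediction(epoch_labels):
    """Combine per-cough-epoch class predictions into one subject-level
    label by majority vote (assumed combiner; the paper does not specify).
    Ties are broken by sorted label order so the result is deterministic."""
    counts = Counter(epoch_labels)
    return max(sorted(counts), key=lambda lbl: counts[lbl])
```

Aggregating over epochs is what lifts per-cough accuracy (84%+) to the reported per-subject accuracy (91%+), since independent per-epoch errors tend to be outvoted.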
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: waste classification; transfer learning; deep learning; recognition classification
Online: 23 February 2020 (14:01:01 CET)
Using machine learning or deep learning to solve the problem of garbage recognition and classification is an important application of computer vision, but because garbage datasets are incompletely established and complex network models perform poorly on smart terminal devices, existing garbage classification models work poorly. This paper presents a waste classification and identification method based on transfer learning and a lightweight neural network. The lightweight network MobileNetV2 is transferred and rebuilt; the reconstructed network is used for feature extraction, and the extracted features are fed into an SVM to identify six types of garbage. The model was trained and validated on 2,527 labelled garbage images from the TrashNet dataset, ultimately achieving a classification accuracy of 98.4%. This shows that the method can effectively improve classification accuracy and speed, and that it overcomes the over-fitting problem that small, weakly labelled datasets pose in deep learning, making the model robust.
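The second stage described above, classifying extracted features with an SVM, can be sketched with a linear SVM trained by hinge-loss sub-gradient descent. The MobileNetV2 feature extractor is omitted to keep the example self-contained (feed it any feature matrix), and the linear kernel and SGD solver are assumptions; the paper does not specify its SVM configuration.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Binary linear SVM via sub-gradient descent on the regularized hinge
    loss. X: (n, d) feature matrix (e.g. CNN embeddings); y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:  # point inside the margin: push it out
                w = (1 - lr * reg) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:           # correctly classified: only shrink w
                w = (1 - lr * reg) * w
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)
```

For the six-class TrashNet problem, one would extend this to multiclass, e.g. with a one-vs-rest ensemble of such binary classifiers.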
ARTICLE | doi:10.20944/preprints202209.0060.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Autonomous Driving; Deep Learning; LIDAR Data; Wavelets; 3D Object Detection
Online: 5 September 2022 (13:03:00 CEST)
3D object detection is crucial for autonomous driving to understand the driving environment. Since the pooling operation causes information loss in the standard CNN, we have designed a wavelet multiresolution analysis-based 3D object detection network without a pooling operation. Additionally, instead of using a single filter like the standard convolution, we use the lower-frequency and higher-frequency coefficients as filters. These filters capture more relevant parts than a single filter, enlarging the receptive field. The model comprises a discrete wavelet transform (DWT) and an inverse wavelet transform (IWT) with skip connections to encourage feature reuse for contrasting and expanding layers. The IWT enriches the feature representation by fully recovering the details lost during the downsampling operation. Element-wise summation is used for the skip connections to decrease the computational burden. We train the model for the Haar and Daubechies (Db4) wavelets. The two-level wavelet decomposition result shows that we can build a lightweight model without losing significant performance. The experimental results on the KITTI BEV and 3D evaluation benchmarks show that our model outperforms the PointPillars base model by up to 14% while reducing the number of trainable parameters. Code will be released.
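The Haar analysis/synthesis pair at the heart of the DWT/IWT blocks can be sketched in one dimension; the IWT's perfect reconstruction is what lets the network recover details lost during downsampling. (This is a one-level, 1-D sketch only; the model applies multi-level 2-D transforms inside the network.)

```python
import numpy as np

def haar_dwt(x):
    """One level of the 1-D Haar transform: split an even-length signal
    into a low-frequency (approximation) half and a high-frequency
    (detail) half, each at half the original resolution."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)
    high = (even - odd) / np.sqrt(2)
    return low, high

def haar_iwt(low, high):
    """Inverse Haar transform: recombines the two bands and perfectly
    reconstructs the original signal (no information loss, unlike pooling)."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    out = np.empty(low.size + high.size)
    out[0::2], out[1::2] = even, odd
    return out
```

The contrast with pooling is the key design point: max-pooling discards everything but the maxima, while the DWT keeps the discarded detail in the high-frequency band so the IWT can restore it in the expanding path.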
REVIEW | doi:10.20944/preprints202104.0202.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Spiking Neural Network (SNN); Biological Inspiration; Deep Learning; Neuromorphic Computing
Online: 7 April 2021 (12:13:16 CEST)
Recent advances in deep learning have elevated the multifaceted nature of its many applications. Artificial neural networks are now a genuinely old technique in the vast area of computer science; the principal ideas and models are more than fifty years old. In this modern computing era, however, scientists have introduced third-generation intelligent models. In the biological neuron, ion channels control the flow of ions across the membrane by opening and closing in response to voltage changes caused by intrinsic currents and externally arriving signals. The third-generation Spiking Neural Network (SNN) narrows the gap between deep learning, machine learning, and neuroscience in a biologically inspired manner, connecting neuroscience and machine learning to establish efficient high-level computing. Spiking neural networks compute using spikes, which are discrete events that occur at points in time, as opposed to continuous values. This paper is a review of the biologically inspired spiking neural network and its applications in different areas. The author presents a brief introduction to SNNs, covering their mathematical structure, applications, and implementation, together with an overview of machine learning, deep learning, and reinforcement learning. This review can help artificial intelligence researchers gain a compact intuition for spiking neural networks.
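The spike-based computation the review describes is commonly modelled with a leaky integrate-and-fire (LIF) neuron: the membrane potential leaks toward rest, integrates the input current, and emits a discrete spike (then resets) on crossing a threshold. A minimal sketch; the parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def lif_neuron(current, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of
    input currents; returns a 0/1 spike train of the same length."""
    v = v_reset
    spikes = []
    for i in current:
        # Euler step of dv/dt = (-v + i) / tau: leak plus input integration.
        v += dt * (-v + i) / tau
        if v >= v_thresh:
            spikes.append(1)   # discrete spike event
            v = v_reset        # membrane resets after firing
        else:
            spikes.append(0)
    return np.array(spikes)
```

This captures the contrast drawn in the abstract: the neuron communicates through discrete events in time rather than continuous activation values, which is what neuromorphic hardware exploits for efficiency.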
ARTICLE | doi:10.20944/preprints202103.0220.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Convolutional Neural Network; Deep Learning; Environmental Monitoring
Online: 8 March 2021 (13:37:58 CET)
Accurately mapping individual tree species in densely forested environments is crucial to forest inventory. When considering only RGB images, this is a challenging task for many automatic photogrammetry processes. The main reason for that is the spectral similarity between species in RGB scenes, which can be a hindrance for most automatic methods. State-of-the-art deep learning methods could be capable of identifying tree species with an attractive cost, accuracy, and computational load in RGB images. This paper presents a deep learning-based approach to detect an important multi-use species of palm trees (Mauritia flexuosa; i.e., Buriti) on aerial RGB imagery. In South-America, this palm tree is essential for many indigenous and local communities because of its characteristics. The species is also a valuable indicator of water resources, which comes as a benefit for mapping its location. The method is based on a Convolutional Neural Network (CNN) to identify and geolocate singular tree species in a high-complexity forest environment, and considers the likelihood of every pixel in the image to be recognized as a possible tree by implementing a confidence map feature extraction. This study compares the performance of the proposed method against state-of-the-art object detection networks. For this, images from a dataset composed of 1,394 airborne scenes, where 5,334 palm-trees were manually labeled, were used. The results returned a mean absolute error (MAE) of 0.75 trees and an F1-measure of 86.9%. These results are better than both Faster R-CNN and RetinaNet considering equal experiment conditions. The proposed network provided fast solutions to detect the palm trees, with a delivered image detection of 0.073 seconds and a standard deviation of 0.002 using the GPU. 
In conclusion, the presented method deals efficiently with a high-density forest scenario, can accurately map the location of single species like the M. flexuosa palm tree, and may be useful for future frameworks.
ARTICLE | doi:10.20944/preprints202301.0148.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Uncertainty quantification; Deep learning, Alzheimer; MRI; MCD; Classification
Online: 9 January 2023 (06:58:45 CET)
One of the most common forms of dementia is Alzheimer’s disease (AD), which leads to progressive mental deterioration. Unfortunately, there is no definitive diagnosis or cure that can stop the condition from progressing. Diagnosis is often performed based on clinical history and neuropsychological data, including magnetic resonance imaging (MRI). Deep neural network (DNN) algorithms are gaining popularity for medical diagnosis and have been used widely for the analysis of MRI data, since they can extract hidden features from thousands of training images automatically. However, they cannot judge how confident they are about their predictions. To use DNNs in safety-critical applications such as medical diagnosis, uncertainty quantification of their predictions is crucial. Monte Carlo dropout (MCD) has been widely used for this purpose; however, it may lead to overconfident and miscalibrated results. This paper proposes a framework in which the MCD algorithm’s hyper-parameters are, for the first time, optimized during training using Bayesian optimization. The optimization leads to high predictive entropy being assigned to erroneous predictions, making it possible to recognize risky predictions. The proposed framework is applied to AD diagnosis, which has not been done before. We compare our method with existing methods in the literature on several uncertainty quantification criteria. The results of comprehensive experiments on the Kaggle dataset, using a deep model pre-trained on ImageNet, show that the proposed algorithm quantifies uncertainty much better than the existing methods.
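The predictive-entropy criterion mentioned above is computed from the mean softmax over T stochastic (dropout-enabled) forward passes; high entropy flags risky predictions. A minimal sketch of the statistic itself (the MCD sampling and the Bayesian optimization of its hyper-parameters are omitted):

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Entropy of the mean class-probability vector over T Monte Carlo
    dropout passes. mc_probs: (T, n_classes) array of softmax outputs.
    Higher entropy means the averaged prediction is less certain."""
    mean_p = mc_probs.mean(axis=0)
    return -np.sum(mean_p * np.log(mean_p + 1e-12))  # epsilon avoids log(0)
```

In the paper's framework, a threshold on this value would separate predictions the clinician can trust from those that should be flagged for review.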
ARTICLE | doi:10.20944/preprints201607.0085.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: CNN; Deep Learning; AlexNet; VGGNet; Texture Descriptor; Garment Categories; 13 Garment Trend Identification; Design Classification for Garments.
Online: 27 July 2016 (15:39:53 CEST)
Automatic identification of garment design classes for recommending fashion trends is important nowadays because of the rapid growth of online shopping. By learning the properties of images efficiently, a machine can achieve better classification accuracy. Several methods based on hand-engineered feature coding exist for identifying garment design classes, but most of the time those methods do not achieve good results. Recently, deep Convolutional Neural Networks (CNNs) have shown better performance on various object recognition tasks. A deep CNN uses multiple levels of representation and abstraction, which helps a machine understand data (images, sound, and text) more accurately. In this paper, we apply deep CNNs to identify garment design classes. To evaluate performance, we used two well-known CNN models, AlexNet and VGGNet, on two different datasets. We also propose a new CNN model based on AlexNet, which outperforms the existing state-of-the-art by a significant margin.
ARTICLE | doi:10.20944/preprints201811.0612.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: geophysical signal processing; pattern recognition; temporal convolutional neural networks; seismology; deep learning; nuclear treaty monitoring
Online: 29 November 2018 (03:37:48 CET)
The detection of seismic events at regional and teleseismic distances is critical to Nuclear Treaty Monitoring. Traditionally, detecting regional and teleseismic events has required the use of an expensive multi-instrument seismic array; however in this work, we present DeepPick, a novel seismic detection algorithm capable of array-like performance from a single trace. We achieve this directly, by training our single-trace detector against labeled events from an array catalog, and by utilizing a deep temporal convolutional neural network. The training data consists of all arrivals in the International Seismological Centre Catalog for seven seismic arrays over a five year window from 1 Jan 2010 to 1 Jan 2015, yielding a total training set of 608,362 detections. The test set consists of the same seven arrays over a one year window from 1 Jan 2015 to 1 Jan 2016. We report our results by training the algorithm on six of the arrays and testing it on the seventh, so as to demonstrate the transportability and generalization of the technique to new stations. Detection performance against this test set is outstanding. Fixing a type-I error rate of 1%, the algorithm achieves an overall recall rate of 73% on the 141,095 array beam picks in the test set, yielding 102,394 correct detections. This is more than 4 times the 23,259 detections found in the analyst-reviewed single-trace catalogs over the same period, and represents an 8dB improvement in detector sensitivity over current methods. These results demonstrate the potential of our algorithm to significantly enhance the effectiveness of the global treaty monitoring network.
ARTICLE | doi:10.20944/preprints202105.0636.v1
Subject: Engineering, Automotive Engineering Keywords: cultural heritage; environment; deep learning; artificial intelligence; neural network.
Online: 26 May 2021 (13:06:34 CEST)
This work aims to contribute to a better understanding of the use of public street spaces. (1) Background: With a multidisciplinary approach, the objective of this work is to propose an experimental method that is reproducible on a large scale. (2) Study area: The applied methodology uses artificial intelligence to analyze Google Street View (GSV) images at street level. (3) Method: The purpose is to validate a methodology that allows us to characterize and quantify the use (by pedestrians and cars) of several squares in Rome belonging to different historical periods. (4) Results: Through machine vision techniques, typical of artificial intelligence and based on convolutional neural networks, a historical reading of the selected squares is proposed, with the aim of interpreting the dynamics of use and identifying ongoing critical issues. (5) Conclusions: This work validated the usefulness of applying artificial intelligence to the analysis of GSV images at street level.
ARTICLE | doi:10.20944/preprints202302.0299.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: deepfake detection; CNN; deep neural network; computer vision; scale invariant feature transform; histogram of oriented gradients
Online: 17 February 2023 (06:51:37 CET)
Deepfakes are manipulated or altered images or videos created using deep learning models with high levels of photorealism. The two popular methods of producing a deepfake are based on either convolutional neural networks (CNNs) or autoencoders. Deepfakes created using CNNs show comparatively higher realism, yet often leave artifacts and distortions in the generated media that can be detected using machine learning and deep learning algorithms. In recent years, there has been an influx of periocular image and video data because of the increased usage of face masks. When masks are worn, much of what is used for facial recognition is hidden, leaving only the periocular region visible to an observer. This loss of vital information makes media easier to misidentify, so deepfakes are less likely to be flagged as fake. In this work, feature extraction methods such as Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), and CNNs are used to train an ensemble deep learning model that detects deepfakes in videos on a frame-by-frame level based on the periocular region. Our proposed model distinguishes original and manipulated images with accuracies around 98.9 percent, improving on previous works by combining SIFT and HOG for deepfake detection in convolutional neural networks.
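Of the hand-crafted features listed above, HOG is the easiest to sketch: per-cell histograms of gradient orientations weighted by gradient magnitude. A minimal single-cell version (real HOG adds block normalization and bilinear interpolation between bins; names here are my own):

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Gradient-orientation histogram for one HOG cell: finite-difference
    gradients, with gradient magnitude accumulated into unsigned
    orientation bins covering [0, 180) degrees."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())  # magnitude-weighted vote
    return hist
```

Concatenating such histograms over a grid of cells yields the HOG descriptor that, alongside SIFT keypoints and CNN features, feeds the ensemble detector.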
Subject: Computer Science And Mathematics, Computer Science Keywords: Indoor Localization; Sensor Fusion; Multimodal Deep Neural Network; Multimodal Sensing; WiFi Fingerprinting; Pedestrian Dead Reckoning
Online: 13 October 2021 (12:14:39 CEST)
Many engineered approaches have been proposed over the years for solving the hard problem of performing indoor localisation using smartphone sensors. However, specialising these solutions for difficult edge cases remains challenging. Here we propose an end-to-end hybrid multimodal deep neural network localisation system, MM-Loc, relying on zero hand-engineered features, learning them automatically from data instead. This is achieved by using modality-specific neural networks to extract preliminary features from each sensing modality, which are then combined by cross-modality neural structures. We show that our choice of modality-specific neural architectures is capable of estimating the location with good accuracy independently. But for better accuracy, a multimodal neural network fusing the features of early modality-specific representations is a better proposition. Our proposed MM-Loc solution is tested on cross-modality samples characterised by different sampling rates and data representation (inertial sensors, magnetic and WiFi signals), outperforming traditional approaches for location estimation. MM-Loc elegantly trains directly from data unlike conventional indoor positioning systems, which rely on human intuition.
ARTICLE | doi:10.20944/preprints201907.0121.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Artificial Neural Networks; Deep Learning; Generative Neural Networks; Incremental Learning; Novelty detection; Catastrophic Interference
Online: 8 July 2019 (14:29:28 CEST)
Deep learning models belong to the family of artificial neural networks and, as such, suffer from catastrophic interference when they learn sequentially. In addition, most of these models have a rigid architecture that prevents the incremental learning of new classes. To overcome these drawbacks, in this article we propose the Self-Improving Generative Artificial Neural Network (SIGANN), an end-to-end deep neural network system that eases the catastrophic forgetting problem when learning new classes. In this method, we introduce a novelty detection model to automatically detect samples of new classes; moreover, an adversarial auto-encoder is used to produce samples of previous classes. The system consists of three main modules: a classifier module implemented using a deep convolutional neural network, a generator module based on an adversarial autoencoder, and a novelty detection module implemented using an OpenMax activation function. Using the EMNIST dataset, the model was trained incrementally, starting with a small set of classes. The simulation results show that SIGANN is able to retain previous knowledge with only gradual forgetting across learning sequences. Moreover, SIGANN can detect new classes that are hidden in the data and therefore proceed with incremental class learning.
ARTICLE | doi:10.20944/preprints201910.0056.v1
Subject: Engineering, Control And Systems Engineering Keywords: Fusarium head blight disease; color imaging; deep neural network
Online: 6 October 2019 (04:11:58 CEST)
Fusarium head blight (FHB) disease is extensively distributed worldwide. The disease damages grain quality and reduces yield. Detecting this disease in a high-throughput way is crucial to planters and breeders. Our study focused on developing a method for processing wheat color images and accurately detecting diseased areas using deep learning and image processing techniques. Color images of wheat at the milky stage were collected and processed to construct datasets, which were used to retrain a deep convolutional neural network model using transfer learning. Testing results showed that the model can detect spikes, with a coefficient of determination of 0.80 between the manual count and the detected number of spikes. The model was assessed, and the mean average precision on the testing dataset was 0.9201. On the basis of the spike detection results, a new color feature was applied to obtain a gray image of each spike. Then, a modified region growing algorithm was implemented to segment and detect the diseased areas of each spike. Results show that the region growing algorithm performs better than K-means and Otsu's method in segmenting FHB disease. Overall, this study demonstrates that deep learning techniques enable the accurate detection of FHB in wheat using color images, and that the proposed method can effectively detect spikes and diseased areas, thereby improving the efficiency of FHB detection.
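The abstract does not specify the authors' modification to region growing, but a plain 4-connected region grower over a gray image can be sketched as follows; the tolerance rule and toy image are illustrative assumptions:

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, tol=10):
    """Grow a region from `seed`, absorbing 4-connected neighbours whose
    grey value differs from the seed value by at most `tol`."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = int(gray[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] \
                    and abs(int(gray[nr, nc]) - seed_val) <= tol:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# Toy grey image: a bright 'diseased' patch (value 200) on a darker spike (50).
img = np.full((6, 6), 50, dtype=np.uint8)
img[1:4, 1:4] = 200
diseased = region_grow(img, seed=(2, 2), tol=10)
```

Seeded inside the bright patch, the grower recovers exactly the 3x3 diseased area; a per-spike seed and tolerance would play the role of the paper's color-feature-derived gray image.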
REVIEW | doi:10.20944/preprints202102.0340.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: Cybersecurity; Deep Learning; Artificial Neural Network; Artificial Intelligence; Cyber-Attacks; Cybersecurity Analytics; Cyber Threat Intelligence
Online: 16 February 2021 (15:31:02 CET)
Deep learning (DL), which originated from artificial neural networks (ANN), is one of the major technologies enabling today's smart cybersecurity systems and policies to function in an intelligent manner. Popular deep learning techniques, such as Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN or ConvNet), Recurrent Neural Network (RNN) or Long Short-Term Memory (LSTM), Self-organizing Map (SOM), Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), Generative Adversarial Network (GAN), Deep Transfer Learning (DTL or Deep TL), Deep Reinforcement Learning (DRL or Deep RL), and their ensembles and hybrid approaches, can be used to intelligently tackle diverse cybersecurity issues. In this paper, we aim to present a comprehensive overview of these neural network and deep learning techniques according to today's diverse needs. We also discuss the applicability of these techniques to various cybersecurity tasks such as intrusion detection, identification of malware or botnets, phishing, prediction of cyber-attacks (e.g. denial of service, DoS), and detection of fraud or cyber-anomalies. Finally, we highlight several research issues and future directions within the scope of our study. Overall, the ultimate goal of this paper is to serve as a reference point and guideline for academia and professionals in the cyber industries, especially from the deep learning point of view.
ARTICLE | doi:10.20944/preprints201807.0086.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: vibration measurement; frequency prediction; deep learning; convolutional neural network; photogrammetry; computer vision; non-contact measurement
Online: 5 July 2018 (08:31:00 CEST)
Vibration measurement serves as the basis for various engineering practices such as natural frequency or resonant frequency estimation. As image acquisition devices become cheaper and faster, vibration measurement and frequency estimation through image sequence analysis continue to receive increasing attention. In conventional photogrammetry and optical methods of frequency measurement, vibration signals are first extracted before the vibration frequency analysis algorithm is applied. In this work, we demonstrate that frequency prediction can be achieved using a single feed-forward convolutional neural network. The proposed method is verified using a vibration signal generator and excitation system, and the results obtained are compared with those of an industrial contact vibrometer in a real application. Our experimental results demonstrate that the proposed method can achieve acceptable prediction accuracy even under unfavorable field conditions.
ARTICLE | doi:10.20944/preprints201808.0130.v1
Subject: Engineering, Mechanical Engineering Keywords: SHM; Electromechanical Impedance; Piezoelectricity; Intelligent Fault Diagnosis; Machine Learning; CNN; Deep Learning
Online: 6 August 2018 (21:51:53 CEST)
Convolutional Neural Network (CNN) applications have recently emerged in Structural Health Monitoring (SHM) systems, focusing mostly on vibration analysis. However, the SHM literature clearly shows a lack of applications combining PZT (Lead Zirconate Titanate) based methods with CNNs. Likewise, applications using CNNs together with the Electromechanical Impedance (EMI) technique in SHM systems are rare. To encourage this combination, an innovative SHM solution combining EMI-PZT and CNN is presented here. To accomplish this, the EMI signature is split into several parts, and the Euclidean distances among them are computed to form an RGB (red, green and blue) frame. As a result, we introduce a dataset formed from the EMI-PZT signals of 720 frames, encompassing a total of 4 types of structural conditions for each PZT. In a case study, the CNN-based method was experimentally evaluated using three PZTs glued onto an aluminum plate. The results reveal effective pattern classification, yielding a 100% hit rate that outperforms other SHM approaches. Furthermore, the method needs only a small dataset for training the CNN, providing several advantages for industrial applications.
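The EMI-to-image construction (split the signature into segments, compute pairwise Euclidean distances, stack the distance matrices as channels) can be sketched roughly as below. The segment count, channel assignment, and 8-bit scaling are assumptions, not the paper's exact recipe:

```python
import numpy as np

def emi_to_channel(signature, n_parts=24):
    """Split an EMI signature into n_parts equal segments, compute the
    pairwise Euclidean distances among the segments, and scale the
    distance matrix to 0-255 for use as one image channel."""
    sig = np.asarray(signature, dtype=float)
    sig = sig[: len(sig) - len(sig) % n_parts]      # drop any remainder
    seg = sig.reshape(n_parts, -1)
    diff = seg[:, None, :] - seg[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    if dist.max() > 0:
        dist = 255 * dist / dist.max()
    return dist.astype(np.uint8)

# Stack three channels built from three (synthetic) EMI signatures as an RGB frame.
rng = np.random.default_rng(1)
frame = np.stack([emi_to_channel(rng.normal(size=480)) for _ in range(3)], axis=2)
```

The result is a small square image per measurement, which is what lets a standard image-classification CNN consume one-dimensional impedance signatures.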
ARTICLE | doi:10.20944/preprints202007.0209.v1
Subject: Engineering, Control And Systems Engineering Keywords: Deep learning; Head Related Transfer Function (HRTF); Restoration; Ambisonics; Spatial Audio; Spherical harmonic; Audio signal processing; Denoising; Auto-Encoder; Neural Network
Online: 10 July 2020 (08:58:11 CEST)
Spherical harmonic (SH) interpolation is a commonly used method to spatially up-sample sparse Head Related Transfer Function (HRTF) datasets to denser HRTF datasets. However, depending on the number of sparse HRTF measurements and SH order, this process can introduce distortions in high frequency representation of the HRTFs. This paper investigates whether it is possible to restore some of the distorted high frequency HRTF components using machine learning algorithms. A combination of Convolutional Auto-Encoder (CAE) and Denoising Auto-Encoder (DAE) models is proposed to restore the high frequency distortion in SH interpolated HRTFs. Results are evaluated using both Perceptual Spectral Difference (PSD) and localisation prediction models, both of which demonstrate significant improvement after the restoration process.
ARTICLE | doi:10.20944/preprints201812.0211.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: VHR image; building roof; segmentation; GF2; deep convolution neural network
Online: 18 December 2018 (04:07:47 CET)
This paper presents a novel approach for semantic segmentation of building roofs in dense urban environments with a Deep Convolutional Neural Network (DCNN), using imagery acquired by a Chinese Very High Resolution (VHR) satellite mission, GaoFen-2 (GF-2). To provide an operational end-to-end workflow for accurate building roof mapping with feature extraction as well as image segmentation, a fully convolutional DCNN with both convolutional and deconvolutional layers is designed to perform the VHR image analysis for labeling pixels. Because of the diverse urban patterns and building styles in large areas, sample image data sets of building roof and non-building roof were collected over different metropolitan regions in China. We selected typical cities with dense urban environments in each metropolitan region as study areas for collecting training and test samples. A high-performance cluster with GPU-mounted workstations was employed to perform the model training and optimization. With the building roof samples collected over different cities, a predictive model with multiple NN layers was developed for building roof labeling. The validation of the building roof map shows that the overall accuracy (OA) and the mean Intersection Over Union (mIOU) of the DCNN-based segmentation are 94.67% and 0.85 respectively, while the CRF-refined segmentation achieved an OA of 94.69% and an mIOU of 0.83. The results suggest that the proposed approach is a promising solution for building roof mapping with VHR images over large areas across different urban and building patterns. With the operational acquisition of GF-2 VHR imagery, it is expected that an automated pipeline can be developed for operational built-up area monitoring and timely updates of the building roof map over large areas.
Subject: Engineering, Automotive Engineering Keywords: traffic engineering; traffic incident detection; CNN-XGBoost; Convolution Neural Network; Deep Learning
Online: 15 April 2020 (14:13:35 CEST)
Accurate and efficient traffic incident detection methods can effectively alleviate traffic congestion caused by traffic incidents, prevent secondary accidents and improve the safety of urban road traffic. Aiming at the problems that traditional machine learning incident detection methods cannot fully extract the parameter characteristics of traffic flow and are not suitable for multi-dimensional, non-linear massive data, we propose a new traffic incident detection method (CNN-XGBoost). This method combines the respective advantages of Convolutional Neural Networks (CNN) and Extreme Gradient Boosting (XGBoost). Firstly, we preprocessed the original freeway traffic incident detection data set by constructing the initial variable set, data normalization, data balancing and dimension reorganization. Secondly, we use the CNN to automatically extract the deep features of the incident detection data, and use XGBoost as a classifier on the extracted features for expressway traffic incident detection. Finally, we use the data set from Hangzhou expressway microwave detectors in China to carry out simulation experiments on CNN-XGBoost. The experimental results show that, compared with XGBoost, CNN, Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT) and other methods, the CNN-XGBoost method can effectively improve the accuracy of expressway traffic incident detection and has better generalization ability.
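The two-stage pipeline (learned features first, then a separate boosted classifier) can be illustrated without any deep learning or XGBoost dependency. Below, random 1-D convolution kernels stand in for the CNN feature extractor and a nearest-centroid classifier stands in for XGBoost; the toy "incident" windows simply have an elevated mean, all of which are assumptions for illustration only:

```python
import numpy as np

def conv_features(x, kernels):
    """Minimal 1-D convolution + ReLU + global max pooling feature
    extractor (a stand-in for the paper's CNN)."""
    feats = []
    for k in kernels:
        conv = np.convolve(x, k, mode='valid')
        feats.append(np.maximum(conv, 0.0).max())   # ReLU + global max pool
    return np.array(feats)

class NearestCentroid:
    """Stand-in classifier for XGBoost, keeping the sketch dependency-free."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

rng = np.random.default_rng(2)
kernels = [rng.normal(size=5) for _ in range(8)]

# Toy traffic-flow windows: incident windows have an elevated mean
# (e.g. occupancy rising while speed drops).
normal = [rng.normal(size=60) for _ in range(20)]
incident = [rng.normal(size=60) + 3.0 for _ in range(20)]
X = np.stack([conv_features(s, kernels) for s in normal + incident])
y = np.array([0] * 20 + [1] * 20)

clf = NearestCentroid().fit(X, y)
accuracy = (clf.predict(X) == y).mean()
```

Swapping the stand-in classifier for `xgboost.XGBClassifier` on the same feature matrix `X` would reproduce the structure of the proposed method.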
ARTICLE | doi:10.20944/preprints202209.0190.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: green coffee bean; lightweight framework; deep convolutional neural network; explainable model; random optimization
Online: 14 September 2022 (04:04:05 CEST)
In recent years, the demand for coffee has increased tremendously. During production, green coffee beans are traditionally screened manually for defective beans before being packed; however, this method is not only time-consuming but also increases the rate of human error due to fatigue. Therefore, this paper proposes a lightweight deep convolutional neural network (LDCNN) for a green coffee bean quality detection system, combining depthwise separable convolution (DSC), squeeze-and-excite (SE) blocks, skip blocks, and other components. To mitigate the training difficulties caused by the low parameter count of the lightweight model, rectified Adam (RA), lookahead (LA), and gradient centralization (GC) were included to improve efficiency; the model was also deployed on an embedded system. Finally, the local interpretable model-agnostic explanations (LIME) model was employed to explain the model's predictions. The experimental results indicated that the accuracy rate could reach 98.38% and the F1 score 98.24% when detecting the quality of green coffee beans, yielding higher accuracy with lower computing time and fewer parameters. Moreover, the interpretability analysis verified that the lightweight model in this work is reliable, providing a basis for screening personnel to understand its judgments, thereby improving the classification and prediction of the model.
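Of the building blocks listed, the squeeze-and-excite (SE) block is easy to sketch: global average pooling per channel, a small bottleneck, and sigmoid gating of the channels. The dimensions and random weights below are illustrative, not the LDCNN's actual configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-excite: global average pooling per channel ('squeeze'),
    a two-layer bottleneck producing per-channel weights in (0, 1)
    ('excite'), then channel-wise rescaling of the input feature map."""
    s = x.mean(axis=(0, 1))                 # squeeze: shape (C,)
    z = np.maximum(s @ w1, 0.0)             # bottleneck + ReLU
    scale = sigmoid(z @ w2)                 # per-channel weights in (0, 1)
    return x * scale, scale

rng = np.random.default_rng(3)
C, r = 16, 4                                # channels, reduction ratio
x = rng.normal(size=(8, 8, C))              # an H x W x C feature map
w1 = rng.normal(size=(C, C // r))
w2 = rng.normal(size=(C // r, C))
out, scale = se_block(x, w1, w2)
```

Because the block only reweights channels, it adds very few parameters (2·C²/r here), which is why it pairs well with depthwise separable convolutions in lightweight networks.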
ARTICLE | doi:10.20944/preprints202304.0996.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: Convolutional Neural Network; Deep Learning; Photoplethysmography; Respiratory Rate; Time Series
Online: 26 April 2023 (13:17:24 CEST)
Respiratory rate is an important biomarker that indicates changes in the clinical condition of critically ill patients, so a surveillance tool that can accurately monitor the changing respiratory rate in real time is needed. By investigating various machine learning models, we propose a new model for real-time respiratory rate estimation from the photoplethysmogram. A new photoplethysmogram-driven respiratory rate dataset (StMary) was collected from the surgical intensive care unit of a tertiary referral hospital using a photoplethysmogram signal collector. For 50 patients and 50 healthy volunteers, a 2-minute photoplethysmogram was collected twice for each subject. To estimate the respiratory rate of a subject, the signal was input into the deep neural network model we built; the dataset was split into training, validation and testing sets, and 4-fold cross-validation was applied. Our deep neural network model, trained with StMary and the two public datasets (BIDMC and CapnoBase) individually or with selectively merged datasets, showed a low error rate in respiratory rate measurement. The model trained with StMary showed a low mean absolute error (1.0273±0.8965), and the model trained with all three datasets (CapnoBase, BIDMC and StMary) showed a lower error rate (1.7359±1.6724) than the model trained with CapnoBase and BIDMC alone (1.9480±1.6751). We verified the performance of the model in estimating respiratory rate from the photoplethysmogram, and our dataset can contribute as clinical research data supporting artificial intelligence models that estimate respiratory rate and tools that test whether their monitoring function works properly.
ARTICLE | doi:10.20944/preprints202211.0437.v3
Subject: Engineering, Civil Engineering Keywords: deep neural network; long short-term memory; suspended sediment; discharge
Online: 16 December 2022 (08:08:08 CET)
The dynamics of suspended sediment involve inherent non-linearity and complexity as a result of both the spatial variability of basin characteristics and temporal climatic patterns. Because of this complexity, the conventional sediment rating curve (SRC) and other empirical methods produce inaccurate predictions. Deep neural networks (DNNs) have emerged over the last few decades as one of the advanced modeling techniques capable of addressing inherent non-linearity in hydrological processes. DNN algorithms are used to perform predictive analysis and to investigate the interdependencies among the most pivotal water quantity and quality parameters, i.e., discharge, suspended sediment concentration (SSC), and turbidity. In this study, the Long Short-Term Memory (LSTM) algorithm of DNNs is used to model the discharge-suspended sediment relationship for Stony Clove Creek. The simulations were run using primary data on discharge, SSC and turbidity. For the development of the DNN models and the examination of input-vector effects, combinations of different input vectors (namely discharge and SSC) for the current and previous days are considered. Furthermore, a suitable modelling approach with an appropriate model input structure is suggested based on model performance indices for the training and testing phases. The performance of the developed models is assessed using statistical indices such as root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). Statistically, the DNN-based models performed well in simulating the daily SSC against the observed sediment concentration series. The study demonstrates the suitability of the DNN approach for the simulation and estimation of daily SSC, opening up new research avenues for applying hybrid soft computing models in hydrology.
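The core LSTM recurrence used in such discharge-SSC models can be sketched as a single cell unrolled over a daily input window. The layer sizes, random weights, and linear read-out below are illustrative assumptions, and no training is performed:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h, c, W, U, b):
    """One LSTM step: the stacked weights yield the input (i), forget (f)
    and output (o) gates plus the candidate cell state (g)."""
    z = W @ x_t + U @ h + b                      # shape (4H,)
    H = h.size
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g                        # gated cell-state update
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(4)
n_in, n_hid = 2, 8                               # e.g. [discharge, SSC] inputs
W = rng.normal(scale=0.5, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.5, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(30, n_in)):          # a 30-day input window
    h, c = lstm_step(x_t, h, c, W, U, b)

w_out = rng.normal(scale=0.5, size=n_hid)
ssc_pred = w_out @ h                             # next-day SSC estimate
```

The forget gate is what lets the model carry information from previous days' discharge and SSC forward, which is the property the study relies on when comparing input vectors with different lags.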
REVIEW | doi:10.20944/preprints202206.0167.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; convolutional neural network; brain tumor classification; clinical application
Online: 13 June 2022 (04:57:42 CEST)
Deep learning has shown remarkable results in every field, especially in the biomedical field, due to its ability to exploit large-scale datasets. A convolutional neural network (CNN) is a widely used deep learning approach for solving medical imaging problems. Over the past few years, many studies have focused on CNN-based techniques for brain tumor diagnosis. There are, however, still some critical challenges that CNNs face on the way to clinical application. This study presents a comprehensive review of the current literature on CNN architectures for brain tumor classification. We compare the key achievements in the performance evaluation metrics of the applied classification algorithms. In addition, this review assesses the clinical effectiveness of the included studies to elaborate on the limitations of and directions for future work in this area. No review focusing on the clinical effectiveness of previous works in this field has been published. We believe that this study has the potential to elevate the application of CNN-based deep learning methods in clinical practice and can also serve as a quick reference for biomedical researchers interested in this field.
ARTICLE | doi:10.20944/preprints202302.0086.v2
Subject: Engineering, Civil Engineering Keywords: Deep neural network; long short-term memory; water quality; discharge; stream-water
Online: 17 April 2023 (07:21:31 CEST)
Multivariate predictive analysis of stream-water (SW) parameters (discharge, water level, temperature, dissolved oxygen, pH, turbidity, and specific conductance) is a pivotal task in water resource management during this era of rapid climate change. The highly dynamic and evolving nature of meteorological and climatic features has a significant impact on the temporal distribution of the SW variables, making SW variable forecasting even more complicated for diversified water-related issues. To predict the SW variables, various physics-based numerical models are used with numerous hydrologic parameters, and extensive lab-based investigation and calibration are required to reduce the uncertainty involved in those parameters. However, in the age of data-informed analysis and prediction, several deep learning algorithms have shown satisfactory performance in dealing with sequential data. In this research, comprehensive Exploratory Data Analysis (EDA) and feature engineering were performed to prepare the dataset and obtain the best performance from the predictive model. A Long Short-Term Memory (LSTM) neural network regression model is trained on several years of daily data to predict the SW variables up to one week ahead of time (lead time) with satisfactory performance. The performance of the proposed model is found to be highly adequate through comparison of the predicted data with the observed data, visualization of the error distribution, and a set of error metrics. Higher performance is achieved by increasing the number of epochs and through hyperparameter tuning. This model can be transferred to other locations with proper feature engineering and optimization to perform univariate predictive analysis, and it can potentially be used for real-time SW variable prediction.
ARTICLE | doi:10.20944/preprints201812.0258.v1
Subject: Chemistry And Materials Science, Surfaces, Coatings And Films Keywords: copper; polymer coatings; polyvinyl alcohol; silver nanoparticles; deep learning; CNN
Online: 21 December 2018 (07:51:06 CET)
In order to design effective protective coatings against corrosion, polyvinyl alcohol (PVA), both as a pure compound and as a composite with silver nanoparticles (nAg/PVA), was electrodeposited on a copper surface employing electrochemical techniques such as linear potentiometry and cyclic voltammetry. A new paradigm was used to distinguish the features of the coatings: a deep Convolutional Neural Network (CNN) was implemented to automatically and hierarchically extract discriminative characteristics from optical microscopy images. The main arguments for a CNN implementation in the surface science of materials are the following: artificial intelligence techniques can be successfully applied to learn differences between surface coatings; given their popularity for image processing, CNNs can model images related to the coating problem; and deep learning is able to extract features that distinguish material surfaces. To provide an overview of the copper surface, the CNN was applied to microscope slides (CNN@microscopy) and inherently learnt distinctive characteristics for each class of surface morphology. This CNN-driven analysis of surface morphology, free of interference from the human factor, was used in our study to extract the similarities and differences between unprotected and protected surfaces and to establish the performance of PVA and nAg/PVA in retarding copper corrosion.
ARTICLE | doi:10.20944/preprints202001.0283.v3
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Autonomous vehicle; Self-driving; Real Driving Behavior; Deep Neural Network; LSV-DNN
Online: 30 November 2020 (11:16:54 CET)
Considering the significant advancements in autonomous vehicle technology, this field attracts considerable research interest. To drive vehicles autonomously, control of the steering angle, throttle, and brakes must be learned. The behavioral cloning method is used to imitate human driving behavior: we created a dataset of driving on different routes and under different conditions and, using the designed model, obtained the output used for controlling the vehicle. In this paper, the Learning of Self-driving Vehicles Based on Real Driving Behavior Using Deep Neural Network Techniques (LSV-DNN) is proposed. We designed a convolutional network that uses real driving data obtained through the vehicle's camera and computer. The driver's responses during driving are recorded in different situations; by converting the real driving video to images and transferring the data to an Excel file, obstacle detection is carried out with the best accuracy and speed using version 3 of the Yolo algorithm. This way, the network learns the driver's response to obstacles in different locations: it is trained with the Yolo version 3 obstacle detection output and then outputs the steering angle and the amounts of brake, gas, and vehicle acceleration. The LSV-DNN is evaluated here via extensive simulations carried out in a Python and TensorFlow environment. We evaluated the network error using the loss function. By comparison with other methods conducted on simulator data, we obtained good performance results for the designed network on the KITTI benchmark data, data collected using a private vehicle, and the data we collected.
Subject: Engineering, Automotive Engineering Keywords: virtual sensor; automotive control; active suspension; vehicle state estimation; neural networks; deep learning; long-short term memory; sequence regression
Online: 24 September 2021 (12:42:07 CEST)
With the automotive industry moving towards automated driving, sensing is an increasingly important enabling technology. Virtual sensors fuse data from various vehicle sensors and provide estimates of quantities that are hard or too expensive to measure directly, or that require continuous monitoring. In this paper, virtual sensing is discussed for the case of vehicle suspension control, where information about the relative velocity of the unsprung mass at each vehicle corner is required. The corresponding goal can be identified as a regression task with multiple sequence inputs. The hypothesis is that the state-of-the-art Bidirectional Long Short-Term Memory (BiLSTM) method can solve it. In this paper, a virtual sensor has been proposed and developed by training a neural network model. Simulations were performed using an experimentally validated full vehicle model in IPG CarMaker, providing the reference data used for Neural Network (NN) training. An extensive dataset covering 26 scenarios was used to obtain training, validation and testing data. Bayesian search was used to select the best neural network structure, with root mean square error as the metric. The best network consists of 167 BiLSTM units, 256 fully connected hidden units and 4 output units. Error histograms and spectral analysis of the predicted signal compared to the reference signal are presented. The results demonstrate the good applicability of neural-network-based virtual sensors for estimating the vehicle's unsprung mass relative velocity.
ARTICLE | doi:10.20944/preprints202210.0059.v1
Subject: Engineering, Control And Systems Engineering Keywords: Artificial Intelligence; Cybersecurity; Remote Control; Fake Signals; Replay Attack; Deep Learning; ResNet50; Transfer Learning
Online: 6 October 2022 (09:16:56 CEST)
Keyless systems have replaced the old-fashioned method of inserting a physical key into the keyhole to, e.g., unlock the door, because such methods are inconvenient and easy for threat actors to exploit. Keyless systems use radio frequency (RF) technology as an interface to transmit signals from the key fob to the vehicle. However, keyless systems are susceptible to being compromised by a threat actor who intercepts the transmitted signal and performs a replay attack. In this paper, we propose a transfer-learning-based model to identify replay attacks launched against remote keyless controlled vehicles. Specifically, the system makes use of a pre-trained ResNet50 deep neural network to classify the wireless remote signals used to lock or unlock the doors of a remote-controlled vehicle system. The signals are classified into three classes: real signal, fake signal with high gain, and fake signal with low gain. We trained our model for 100 epochs (3800 iterations) on KeFRA 2022, a modern dataset. The model records a final validation accuracy of 99.71% and a final validation loss of 0.29% at a low inference time of 50 ms using the SGD solver. The experimental evaluation revealed the superiority of the proposed model.
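The transfer-learning recipe (freeze a pre-trained trunk, train only a classification head) can be sketched without ResNet50 itself. Below, a fixed random projection stands in for the frozen trunk, and the three-class toy data stands in for the RF signal captures; only the softmax head is trained. Everything here is an illustrative assumption, not the paper's setup:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(5)

# A frozen random projection + ReLU stands in for the pre-trained
# ResNet50 trunk: its weights are never updated below.
W_frozen = rng.normal(scale=0.1, size=(64, 32))
def extract(X):
    return np.maximum(X @ W_frozen, 0.0)

# Toy stand-in for RF key-fob captures: 64-dimensional vectors in three
# classes (real, fake high gain, fake low gain).
means = rng.normal(scale=2.0, size=(3, 64))
y = np.repeat(np.arange(3), 20)
X = means[y] + rng.normal(size=(60, 64))
F = extract(X)                                # frozen features

# Train only the classification head: softmax regression by gradient
# descent on the cross-entropy loss.
W_head = np.zeros((32, 3))
onehot = np.eye(3)[y]
for _ in range(1000):
    P = softmax(F @ W_head)
    W_head -= 0.01 * F.T @ (P - onehot) / len(y)

accuracy = (softmax(F @ W_head).argmax(axis=1) == y).mean()
```

Training only the head is what makes transfer learning cheap: with the trunk frozen, the optimisation is a small convex problem over the head weights.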
ARTICLE | doi:10.20944/preprints202103.0302.v2
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Searaser; Flow-3D; Prediction; Long short-term memory; deep neural network; Root mean square error
Online: 13 April 2021 (09:51:25 CEST)
Accurate forecasts of ocean wave energy can not only reduce investment costs but are also essential for the management and operation of electrical power. This paper presents an innovative approach based on Long Short-Term Memory (LSTM) to predict the power generation of an economical wave energy converter named "Searaser". The data for analysis were obtained by collecting experimental data from another study and data extracted from a numerical simulation of Searaser. The simulation was performed with the Flow-3D software, which has high capability in analyzing fluid-solid interactions. The relation between wind speed and output power, missing in previous studies, needed to be investigated in this field; therefore, in this study the wind speed and output power are related with an LSTM method. Moreover, it can be inferred that the LSTM network is able to predict power in terms of wave height more accurately and faster than the numerical solution. The network output figures show great agreement, and the root mean square error of the mean value is 0.49, reflecting the accuracy of the LSTM method. Furthermore, the mathematical relation between the generated power and the wave height was introduced by fitting a power function to the LSTM results.
ARTICLE | doi:10.20944/preprints202009.0516.v1
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: gully erosion susceptibility; deep learning neural network; particle swarm optimization; Shiran watershed
Online: 22 September 2020 (09:48:07 CEST)
This study aims to evaluate a new approach to modeling gully erosion susceptibility based on a deep learning neural network (DLNN) model and an ensemble of the particle swarm optimization (PSO) algorithm with DLNN (PSO-DLNN), comparing these approaches with common artificial neural network (ANN) and support vector machine (SVM) models in the Shiran watershed, Iran. For this purpose, 13 independent variables affecting gully erosion susceptibility in the study area were prepared: altitude, slope, aspect, plan curvature, profile curvature, drainage density, distance from river, land use, soil, lithology, rainfall, stream power index (SPI), and topographic wetness index (TWI). In addition, 132 gully erosion locations were identified during field visits. The data for modeling were divided into training (70%) and testing (30%) sets. Receiver operating characteristic (ROC) parameters, including sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV) and area under the curve (AUC), were used to evaluate the performance of the models. The results showed that the AUC from ROC on the testing dataset for PSO-DLNN is 0.89, corresponding to superb accuracy. The remaining models also achieved high accuracy close to that of PSO-DLNN: the AUC values from ROC for DLNN, SVM and ANN on the testing datasets are 0.87, 0.85 and 0.84 respectively. The PSO algorithm updated and optimized the weights of the DLNN model, and as a result the efficiency of this model in predicting gully erosion susceptibility increased. Therefore, it can be concluded that the DLNN model and its ensemble with the PSO algorithm can be used as a novel and practical method for predicting gully erosion susceptibility, helping planners and managers to manage and reduce the risk of this phenomenon.
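The PSO component can be illustrated with a minimal swarm minimising a stand-in objective. The inertia and attraction coefficients below are common defaults, and the sphere function replaces the actual DLNN training loss, so this is a sketch of the optimiser only:

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=200, seed=6):
    """Minimal particle swarm optimisation: each particle tracks its
    personal best, the swarm tracks a global best, and velocities blend
    inertia with attraction to both bests."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v
        vals = np.array([objective(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Sphere function as a stand-in for the DLNN training loss over its weights.
best, best_val = pso(lambda w: float((w ** 2).sum()), dim=5)
```

In the paper's setting, `dim` would be the number of DLNN weights and `objective` the model's training error, so each particle encodes one candidate weight vector.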
ARTICLE | doi:10.20944/preprints202209.0398.v1
Subject: Engineering, Civil Engineering Keywords: river discharge; hydro informatics; water resource; data-driven; deep learning; LSTM
Online: 26 September 2022 (11:30:24 CEST)
River flow prediction is a pivotal task in water resource management during this era of rapid climate change. The highly dynamic and evolving nature of climatic variables, e.g., precipitation, has a significant impact on the temporal distribution of river discharge, making discharge forecasting even more complicated for diversified water-related issues such as flood prediction and irrigation planning. To predict the discharge, various physics-based numerical models are used with numerous hydrologic parameters, and extensive lab-based investigation and calibration are required to reduce the uncertainty involved in those parameters. However, in the age of data-driven predictions, several deep learning algorithms have shown satisfactory performance in dealing with sequential data. In this research, a Long Short-Term Memory (LSTM) neural network regression model is trained on over 80 years of daily data to forecast the discharge time series up to 3 days ahead of time. The performance of the model is found satisfactory through comparison of the predicted data with the observed data, visualization of the error distribution, and a Root Mean Squared Error (RMSE) of 0.09. Higher performance is achieved by increasing the number of epochs and through hyperparameter tuning. This model can be transferred to other locations with proper feature engineering and optimization to perform univariate predictive analysis, and it can potentially be used for real-time river discharge prediction.
ARTICLE | doi:10.20944/preprints202209.0231.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: neural networks; regularization; deep networks
Online: 15 September 2022 (13:06:13 CEST)
Numerous approaches address over-fitting in neural networks: imposing a penalty on the parameters of the network (L1, L2, etc.); changing the network stochastically (drop-out, Gaussian noise, etc.); or transforming the input data (batch normalization, etc.). In contrast, we aim to ensure that a minimum amount of supporting evidence is present when fitting the model parameters to the training data. At the single-neuron level, this is equivalent to ensuring that both sides of the separating hyperplane (for a standard artificial neuron) contain a minimum number of data points, noting that these points need not belong to the same class for the inner layers. We first benchmark the results of this approach on the standard Fashion-MNIST dataset, comparing it to various regularization techniques. Interestingly, we note that by nudging each neuron to divide, at least in part, its input data, the resulting networks make use of every neuron, avoiding hyperplanes that lie completely on one side of their input data (which is equivalent to feeding a constant into the next layers). To illustrate this point, we study the prevalence of saturated nodes throughout training, showing that neurons are activated more frequently and earlier in training when using this regularization approach. A direct consequence of the improved neuron activation is that deep networks become easier to train. This is crucially important when the network topology is not known a priori and fitting often remains stuck in a suboptimal local minimum. We demonstrate this property by training networks of increasing depth (and constant width): most regularization approaches result in increasingly frequent training failures (over different random seeds), whilst the proposed evidence-based regularization significantly outperforms them in its ability to train deep networks.
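A minimal sketch of the evidence idea at the single-neuron level might look as follows; the hinge-style penalty and the minimum-support threshold `m_min` are assumptions of this illustration, not the authors' exact loss:

```python
import numpy as np

def evidence_penalty(W, b, X, m_min=5):
    """Penalty encouraging each neuron's hyperplane to keep at least
    `m_min` training points on each side.
    W: (n_features, n_neurons), b: (n_neurons,), X: (n_samples, n_features)."""
    pre = X @ W + b                      # pre-activations per neuron
    left = np.sum(pre < 0, axis=0)       # points on the negative side
    right = np.sum(pre >= 0, axis=0)     # points on the non-negative side
    support = np.minimum(left, right)    # evidence on the weaker side
    # zero once every neuron has enough support on both sides
    return np.maximum(0, m_min - support).sum()
```

A neuron whose hyperplane leaves all data on one side gets the maximal penalty, which is exactly the "constant into the next layers" failure mode described above.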
ARTICLE | doi:10.20944/preprints202109.0130.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: machine learning; deep learning; calibration; air quality; low-cost sensors; exposure assessment
Online: 7 September 2021 (14:24:56 CEST)
Although commercially available low-cost air quality sensors have low accuracy, such sensor systems are being used to collect data for the regulation of PM2.5 emissions caused by industrial activities and to estimate personal exposure to PM2.5. In this work, to address the accuracy problem of low-cost PM sensors, we developed a new PM2.5 calibration model combining a deep neural network (DNN) optimized for the calibration problem with an LSTM optimized for time-dependent characteristics. First, two datasets were generated to test the accuracy and generalization performance of the PM2.5 calibration machine learning (ML) model. PM2.5 concentrations, temperature, and humidity were sampled by a low-cost sensor and a gravimetric PM2.5 measuring instrument for a sufficiently long time. The proposed model was compared with a benchmark (multiple linear regression) model and with raw low-cost sensor results. In terms of root mean square error (RMSE) for PM2.5 concentrations, the proposed model reduced error by 41-60% compared to the raw low-cost sensor data and by 30-51% compared to the benchmark model. The R2 values of the ML model, the MLR model, and the raw data were 93%, 80%, and 59%, respectively. The developed model also showed consistent calibration performance when calibrated with new sensors in different locations. Low-cost sensors combined with the ML model not only improve on the calibration performance of the benchmark but can also be applied to sensor monitoring systems for various epidemiologic investigations and regulatory decisions.
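The multiple-linear-regression benchmark and the quoted percentage error reduction can be sketched as follows (a generic illustration of the benchmark, not the authors' preprocessing or DNN+LSTM model):

```python
import numpy as np

def fit_mlr(X, y):
    """Multiple linear regression benchmark: least-squares fit of the
    reference PM2.5 on sensor features (e.g. sensor PM2.5, T, RH)."""
    A = np.column_stack([X, np.ones(len(X))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef

def error_reduction(rmse_raw, rmse_model):
    """Percent RMSE reduction relative to a baseline, as reported above."""
    return 100.0 * (rmse_raw - rmse_model) / rmse_raw
```

Halving the RMSE relative to the raw sensor corresponds to `error_reduction(rmse_raw, rmse_model) == 50.0`, in the middle of the 41-60% range reported.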
ARTICLE | doi:10.20944/preprints201903.0039.v2
Subject: Engineering, Control And Systems Engineering Keywords: Handwritten digit recognition; Convolutional Neural Network (CNN); Deep learning; MNIST dataset; Epochs; Hidden Layers; Stochastic Gradient Descent; Backpropagation
Online: 20 September 2019 (10:12:26 CEST)
In recent times, with the rise of Artificial Neural Networks (ANN), deep learning has brought a dramatic twist to the field of machine learning, making it more intelligent. Deep learning is used remarkably in a vast range of fields because of its diverse applications, such as surveillance, health, medicine, sports, robotics, and drones. In deep learning, the Convolutional Neural Network (CNN) is at the center of spectacular advances, mixing artificial neural networks with up-to-date deep learning strategies. It has been used broadly in pattern recognition, sentence classification, speech recognition, face recognition, text categorization, document analysis, scene recognition, and handwritten digit recognition. The goal of this paper is to observe how the accuracy of a CNN classifying handwritten digits varies with the number of hidden layers and epochs, and to compare the resulting accuracies. For this performance evaluation of CNN, we conducted our experiment on the Modified National Institute of Standards and Technology (MNIST) dataset. The network is trained using stochastic gradient descent and the backpropagation algorithm.
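The training procedure named at the end, stochastic gradient descent with backpropagation, can be illustrated on a toy one-hidden-layer network (far smaller than the CNNs evaluated in the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(params, x, y, lr=0.1):
    """One stochastic-gradient-descent update via backpropagation for a
    one-hidden-layer sigmoid network and squared-error loss."""
    W1, b1, W2, b2 = params
    # forward pass
    h = sigmoid(W1 @ x + b1)
    o = sigmoid(W2 @ h + b2)
    # backward pass for loss 0.5 * (o - y)^2
    do = (o - y) * o * (1 - o)           # output-layer error signal
    dh = (W2.T @ do) * h * (1 - h)       # backpropagated hidden error
    # gradient-descent parameter updates
    W2 = W2 - lr * np.outer(do, h)
    b2 = b2 - lr * do
    W1 = W1 - lr * np.outer(dh, x)
    b1 = b1 - lr * dh
    return W1, b1, W2, b2
```

Repeating this update over shuffled training samples is the essence of the procedure; convolutional layers change the forward/backward computations but not the SGD loop itself.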
ARTICLE | doi:10.20944/preprints202203.0288.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: computer-aided detection; convolutional neural network; COVID-19; deep learning; image classification
Online: 22 March 2022 (02:19:50 CET)
One of the critical tools for early detection and subsequent evaluation of the incidence of lung diseases is chest radiography. This study presents a real-world implementation of a convolutional neural network (CNN) based Carebot Covid app to detect COVID-19 from chest X-ray (CXR) images. Our proposed model takes the form of a simple and intuitive application, and the underlying CNN can be deployed as a STOW-RS prediction endpoint for direct integration into DICOM viewers. The results of this study show that the deep learning model based on DenseNet and ResNet architectures can detect SARS-CoV-2 from CXR images with a precision of 0.981, recall of 0.962, and AP of 0.993.
ARTICLE | doi:10.20944/preprints202304.0022.v1
Subject: Engineering, Aerospace Engineering Keywords: non-destructive testing; deep learning; automated defect recognition (ADR); semantic segmentation; digital X-ray radiography
Online: 3 April 2023 (10:25:31 CEST)
In response to the growing inspection demand exerted by process automation in component manufacturing, non-destructive testing (NDT) continues to explore automated approaches that utilize deep learning algorithms for defect identification, including within digital X-ray radiography images. This necessitates a thorough understanding of the implications of image quality parameters for the performance of these deep learning models. This study investigates the influence of two image quality parameters, Signal-to-Noise Ratio (SNR) and Contrast-to-Noise Ratio (CNR), on the performance of a U-Net deep learning segmentation model. Input images were acquired with varying combinations of exposure factors, such as kilovoltage, milliamperage, and exposure time, which altered the resultant quality. The data were sorted into 5 datasets according to their measured SNR and CNR values, and the deep learning model was trained 5 distinct times, utilizing a unique dataset for each training session. Training the model with high CNR values yielded an Intersection over Union (IoU) metric of 0.9594 on test data of the same category, but this drops to 0.5875 when tested on lower-CNR test data. The results of this study emphasize the importance of balancing the training dataset with respect to the investigated quality parameters in order to enhance the performance of deep learning segmentation models in NDT radiography applications.
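The three quantities the study revolves around, SNR, CNR, and IoU, can be computed as in the following sketch; note that the exact SNR/CNR measurement protocol used in the study may differ from these common definitions:

```python
import numpy as np

def snr(region):
    """Signal-to-noise ratio of a region: mean over standard deviation
    (one common definition)."""
    return float(np.mean(region) / np.std(region))

def cnr(region_a, region_b):
    """Contrast-to-noise ratio between two regions: mean difference over
    pooled noise."""
    noise = np.sqrt((np.var(region_a) + np.var(region_b)) / 2.0)
    return float(abs(np.mean(region_a) - np.mean(region_b)) / noise)

def iou(mask_pred, mask_true):
    """Intersection over Union for binary segmentation masks, the metric
    quoted for the U-Net above."""
    inter = np.logical_and(mask_pred, mask_true).sum()
    union = np.logical_or(mask_pred, mask_true).sum()
    return float(inter / union)
```

The reported drop from 0.9594 to 0.5875 IoU is precisely this ratio shrinking as predicted masks overlap the ground truth less on lower-CNR images.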
ARTICLE | doi:10.20944/preprints202210.0448.v1
Subject: Arts And Humanities, Art Keywords: computational creativity; deep learning; feature extraction; image analysis; machine perception; painting classification; residual networks; transfer learning
Online: 28 October 2022 (09:37:03 CEST)
With the increasing availability of large digitized fine art collections, automated analysis and classification of paintings is becoming an interesting area of research. However, due to domain specificity, implicit subjectivity, and pervasive nuances that vaguely separate art movements, analyzing art using machine learning techniques poses significant challenges. Residual networks, or variants thereof, are one of the most popular tools for image classification tasks and can extract relevant features for well-defined classes. In this case study, we focus on the classification of a selected painting, 'Portrait of the Painter Charles Bruni' by Johann Kupetzky, and the analysis of the performance of the proposed classifier. We show that the features extracted during residual network training can be useful for image retrieval within search systems in online art collections.
REVIEW | doi:10.20944/preprints202104.0421.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: non-intrusive load monitoring; load disaggregation; NILM; review; deep learning; deep neural networks; machine learning
Online: 15 April 2021 (15:05:09 CEST)
This paper reviews non-intrusive load monitoring (NILM) approaches that employ deep neural networks to disaggregate appliances from low-frequency data, i.e., data with sampling rates lower than the AC base frequency. We first review the many degrees of freedom of these approaches and what has already been done in the literature, and compile the main characteristics of the reviewed publications in an extensive overview table. The second part of the paper discusses selected aspects of the literature and the corresponding research gaps. In particular, we perform a performance comparison with respect to reported MAE and F1-scores and observe recurring elements in the best-performing approaches, namely data sampling intervals below 10 s, a large field of view, the usage of GAN losses, multi-task learning, and post-processing. Subsequently, multiple input features, multi-task learning, and related research gaps are discussed, the need for comparative studies is highlighted, and finally, missing elements for a successful deployment of NILM approaches based on deep neural networks are pointed out. We conclude the review with an outlook on possible future scenarios.
ARTICLE | doi:10.20944/preprints202102.0318.v3
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Machine Learning; Artificial Intelligence; Androgen Receptor; Random Forest; Deep Neural Network; Convolutional
Online: 24 February 2021 (13:14:01 CET)
Substances that can modify the androgen receptor pathway in humans and animals are entering the environment and the food chain, with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on human, chimp, and rat effects of chemicals have been used to build machine learning classifiers and regressors and to evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined, physicochemically relevant features developed for this work outperforming graph convolutional, random forest, and large-featurization approaches. The results show that these user-provided structure-, ligand-, and statistically based features and specific DNNs provided the best results, as determined by AUC (0.87), MCC (0.47), and other metrics, as well as by the interpretability and chemical meaning of the descriptors/features. In addition, the same features in the DNN method performed better than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving the assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML
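The MCC values quoted above follow from the standard confusion-matrix formula, which is worth having at hand because, unlike accuracy, it remains informative on the unbalanced sets mentioned:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0    # convention: 0 when undefined
```

An MCC of 0.47, as reported for the DNN, indicates moderate correlation between predictions and labels, well above the logistic model's 0.20.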
ARTICLE | doi:10.20944/preprints202009.0524.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: COVID-19; chest X-ray images; deep convolutional neural network; COV-MCNet; deep learning
Online: 23 September 2020 (03:31:30 CEST)
The COVID-19 pandemic situation has created even more difficulties for medical specialists in the quick identification and screening of COVID-19 patients. Therefore, a significant study is necessary on detecting COVID-19 cases using an automated diagnosis method, which can aid in controlling the spread of the virus. In this paper, the study proposes a Deep Convolutional Neural Network-based multi-classification approach (COV-MCNet) using eight different pre-trained architectures (VGG16, VGG19, ResNet50V2, DenseNet201, InceptionV3, MobileNet, InceptionResNetV2, and Xception), which are trained and tested on X-ray images of COVID-19, Normal, Viral Pneumonia, and Bacterial Pneumonia cases. The results from the 3-class task (Normal vs. COVID-19 vs. Viral Pneumonia) show that the ResNet50V2 model provides the highest classification performance (accuracy: 95.83%, precision: 96.12%, recall: 96.11%, F1-score: 96.11%, specificity: 97.84%) compared to the rest of the models. The results from the 4-class task (Normal vs. COVID-19 vs. Viral Pneumonia vs. Bacterial Pneumonia) demonstrate that the pre-trained DenseNet201 model provides the highest classification performance (accuracy: 92.54%, precision: 93.05%, recall: 92.81%, F1-score: 92.83%, specificity: 97.47%). Notably, the ResNet50V2 (3-class) and DenseNet201 (4-class) models in the proposed COV-MCNet framework showed higher accuracy than the remaining six models. This indicates that the designed system can produce promising results for detecting COVID-19 cases as more data become available. The proposed multi-classification network (COV-MCNet) significantly speeds up the existing radiology-based method, which will be helpful to the medical community and clinical specialists for early diagnosis of COVID-19 cases during this pandemic.
ARTICLE | doi:10.20944/preprints202108.0011.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Transformer; spike; neural decoding; CNN; RNN; LSTM; deep learning; information; neuroscience
Online: 2 August 2021 (09:51:43 CEST)
Neural decoding from spiking activity is an essential tool for understanding the information encoded in population neurons, especially in applications like brain-computer interfaces (BCI). Various quantitative methods have been proposed, each showing superiority under different scenarios. From the machine learning perspective, the decoding task is to map high-dimensional spatial and temporal neuronal activity to low-dimensional physical quantities (e.g., velocity, position). Because of the complex interactions and abundant dynamics among neural circuits, good decoding algorithms usually have the capability of capturing flexible spatiotemporal structures embedded in the input feature space. Recently, Transformer-based models have become widely used in processing natural language and images due to their superior performance in handling long-range and global dependencies. Hence, in this work we examine the potential applications of Transformers in neural decoding and introduce two Transformer-based models. Besides adapting the Transformer to neuronal data, we also propose a data augmentation method to overcome the data shortage issue. We test our models on three experimental datasets, and their performances are comparable to the previous state-of-the-art (SOTA) RNN-based methods. In addition, Transformer-based models show increased decoding performance when the input sequences are longer, while LSTM-based models deteriorate quickly. Our research suggests that Transformer-based models are important additions to the existing neural decoding solutions, especially for large datasets with long temporal dependencies.
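The core operation the two Transformer-based decoders build on is scaled dot-product attention, which can be sketched in a few lines (an illustration of the mechanism, not the models proposed in the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention. Q, K, V: (seq_len, d).
    Every output position is a weighted mix over ALL input positions,
    which is why long-range dependencies are handled directly."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # pairwise similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights
```

Because the weights connect every time bin to every other in one step, performance need not degrade with sequence length, in contrast to the step-by-step recurrence of an LSTM.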
ARTICLE | doi:10.20944/preprints202109.0285.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: remote sensing; deep learning; image classification
Online: 16 September 2021 (13:38:55 CEST)
Autonomous image recognition has numerous potential applications in the field of planetary science and geology. For instance, the ability to classify images of rocks would give geologists immediate feedback without having to bring samples back to the laboratory. Planetary rovers could also classify rocks in remote places, and even on other planets, without needing human intervention. Shu et al. classified 9 different types of rock images using a Support Vector Machine (SVM) with image features extracted autonomously, achieving a test accuracy of 96.71%. In this research, Convolutional Neural Networks (CNN) have been used to classify the same set of rock images. Results show that a 3-layer network obtains an average accuracy of 99.60% across 10 trials on the test set. A version of Self-taught Learning was also implemented to demonstrate the generalizability of the features extracted by the CNN. Finally, one model was chosen to be deployed on a mobile device to demonstrate practicality and portability. The deployed model achieves perfect classification accuracy on the test set while taking only 0.068 seconds to make a prediction, equivalent to about 14 frames per second.
ARTICLE | doi:10.20944/preprints202105.0117.v2
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: decision trees; deep feed-forward network; neural trees; consistency; optimal rate of convergence
Online: 9 November 2021 (16:54:30 CET)
Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. On the other hand, deep learning methods have boosted the capacity of machine learning algorithms and are now being used for non-trivial applications in various applied domains. But training a fully-connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, intended to be much faster than deep feed-forward networks and for which it is not essential to specify in advance the number of hidden units in the hidden layers of the neural network. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to build a mathematical formulation of neural trees and gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose near-optimal sparse neural trees (NSNT), which are shown to be asymptotically consistent and robust in nature. Additionally, the proposed NSNT model obtains a fast rate of convergence that is near-optimal up to a logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification datasets and 40 regression datasets) from the UCI machine learning repository. We establish that the proposed method is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification tree, and near-optimal nonlinear trees) for the majority of the datasets.
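The key idea, simulating a decision tree with a neural network, can be illustrated for a single split: a one-hot weight vector with bias equal to minus the threshold reproduces the split under a step activation (a simplified sketch; the NSNT construction is more elaborate and uses smooth activations):

```python
import numpy as np

def stump_to_neuron(feature_idx, threshold, n_features):
    """Encode the tree split `x[feature_idx] > threshold` as one neuron:
    a one-hot weight vector and bias = -threshold."""
    w = np.zeros(n_features)
    w[feature_idx] = 1.0
    b = -threshold
    return w, b

def neuron_decision(w, b, x):
    """Step activation: the neuron fires exactly when the split is taken."""
    return int(np.dot(w, x) + b > 0)
```

Deeper trees compose such neurons layer by layer, with later layers combining the split indicators into leaf assignments.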
ARTICLE | doi:10.20944/preprints202301.0208.v1
Subject: Physical Sciences, Biophysics Keywords: Deep belief network; Diabetes; Prediction; Risk Factors; Deep Learning
Online: 12 January 2023 (03:54:15 CET)
Diabetes mellitus is a common life-threatening disease, and patients may already be suffering from other diabetes-related conditions such as heart attack, stroke, hypertension, blurry vision, blindness, foot ulcers, amputation, kidney damage, and other organ failures before diagnosis. Early detection can help reduce the fatality of this disease. Deep learning models have proven very useful in disease detection and computer-aided diagnosis. In this work, we propose a deep unsupervised machine learning model for early detection of diabetes using voting ensemble feature selection and deep belief neural networks (DBN). The dataset was obtained from an online repository containing responses of prediagnosed patients to direct questionnaires administered in Sylhet Diabetes Hospital in Sylhet, Bangladesh. The dataset was preprocessed, and features were reduced using the ensemble feature selector. The DBN model was pretrained and tuned to obtain optimal performance, and was also compared with models without multiple hidden layers. The DBN performed at its relative best, with an F1-measure, precision, and recall of 1.00, 0.92, and 1.00, respectively. We conclude that DBN is a useful tool for unsupervised early prediction of Type II diabetes mellitus.
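A voting ensemble feature selector can be sketched generically as majority voting over the top-ranked features of several base selectors; the `top_k` and `min_votes` parameters here are illustrative assumptions, not the paper's settings:

```python
from collections import Counter

def voting_feature_selection(rankings, top_k=5, min_votes=2):
    """Majority-vote feature selector: each base selector contributes its
    top-k features; keep features chosen by at least `min_votes` selectors."""
    votes = Counter()
    for ranked in rankings:
        votes.update(ranked[:top_k])           # one vote per selector
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

The reduced feature set would then be fed to the DBN, which is pretrained layer by layer before fine-tuning.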
ARTICLE | doi:10.20944/preprints201910.0376.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: artificial neural network; deep learning; LSTM; speech processing
Online: 31 October 2019 (16:40:30 CET)
Speech signals are degraded in real-life environments as a result of background noise and other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that makes adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals under such adverse conditions, several deep learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have shown surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we present a proposal to evaluate hybrid neural network models that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to pure LSTM networks, given a fixed number of layers. The evaluation is based on quality measurements of the signal's spectrum, training time of the networks, and statistical validation of the results. The results support the claim that hybrid networks represent an important solution for speech signal enhancement, with advantages in efficiency but without a significant drop in quality.
REVIEW | doi:10.20944/preprints202202.0050.v1
Subject: Engineering, Bioengineering Keywords: Neuroprosthetics; Brain Computer Interface; Neural Implants; Deep Brain Stimulation
Online: 3 February 2022 (11:06:15 CET)
Recent progress in microfabrication techniques has allowed the rapid development of neural implants, which are becoming established as effective tools for clinical practice, especially for treating traumatic and neurodegenerative disorders. Microelectrode arrays have already been used in numerous neural interface devices. Almost all neural implants have been developed based on the BCI (Brain-Computer Interface) paradigm; when a BCI system uses invasive techniques, it is referred to as a BMI, or Brain-Machine Interface. BMIs hold promise for neurorehabilitation of motor and sensory function, cognitive state evaluation, and the treatment of neurological disorders. This article presents a directed overview of the field of neural implants. The aim of this review is to give a brief introduction to neural prosthetics as well as their exciting applications in treating neurological disorders, together with a deeper discussion of their functionality. BCI systems and their different types, their functionality, their pros and cons, how other neural implants developed, and their present status are covered. The possibilities and possible futures of deep brain stimulation (DBS), Neuralink, and motor and sensory neural prosthetics are further discussed.
ARTICLE | doi:10.20944/preprints202301.0579.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: hybrid modeling; deep neural networks; deep learning; SBML; systems biology; computational modeling
Online: 31 January 2023 (08:51:13 CET)
In this paper we propose a computational framework that merges mechanistic modeling with deep neural networks obeying the Systems Biology Markup Language (SBML) standard. Over the last 20 years, the systems biology community has developed a large number of mechanistic models in SBML that are currently stored in public databases. With the proposed framework, existing SBML mechanistic models may be upgraded to hybrid systems through the incorporation of deep neural networks into the model core, using a freely available Python tool. The so-formed hybrid mechanistic/neural network models are trained with a deep learning algorithm based on the adaptive moment estimation method (ADAM), stochastic regularization, and semidirect sensitivity equations. The trained hybrid models are encoded in SBML and stored back in model databases, where they can be further analyzed as regular SBML models. The application of this approach is illustrated with three well-known case studies: the threonine synthesis model in Escherichia coli, the P58IPK signal transduction model, and the yeast glycolytic oscillations model. The proposed framework is expected to greatly facilitate the widespread use of hybrid modeling techniques for systems biology applications.
ARTICLE | doi:10.20944/preprints202304.1088.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: deep learning; image aesthetics assessment; image enhancement
Online: 28 April 2023 (03:15:16 CEST)
Image aesthetic assessment (IAA) with neural attention has made significant progress due to its effectiveness in object recognition. Current studies have shown that the features learned by convolutional neural networks (CNN) at different learning stages carry meaningful information: shallow features contain the low-level information of images, while deep features perceive the image semantics and themes. Inspired by this, we propose a visual enhancement network with feature fusion (FF-VEN). It consists of two sub-modules, the visual enhancement module (VE module) and the shallow and deep feature fusion module (SDFF module). The former uses an adaptive filter in the spatial domain to simulate human eyes according to the region of interest (ROI) extracted by neural feedback. The latter not only extracts the shallow and deep features via transverse connections, but also uses a feature fusion unit (FFU) to fuse the pooled features together so as to maximize their information contribution. Experiments on the standard AVA dataset and the Photo.net dataset show the effectiveness of FF-VEN.
Subject: Engineering, Control And Systems Engineering Keywords: deep learning; signal detection; wideband spectrogram; centerline
Online: 12 May 2020 (12:50:41 CEST)
Wideband signal detection is an important problem in wireless communication. With the rapid development of deep learning (DL) technology, DL-based methods are being applied in wireless technology to obvious effect. In this paper, we propose a novel neural network for multi-type signal detection that can locate signals and recognize signal types in a wideband spectrogram. Our network utilizes key point estimation to locate the rough centerline of the signal region and identify its class. Several regressions are then carried out to obtain properties such as the local offset and the border offsets of the bounding box, which are further synthesized for a finer location. Experimental results demonstrate that our method performs more accurately than other DL-based object detection methods previously employed for the same task. Moreover, our method runs considerably faster than existing methods and abandons anchor generation, making it more favorable for real-time applications.
REVIEW | doi:10.20944/preprints202104.0739.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep neural network; survey; document images; review paper; deep learning; performance evaluation; page object detection; graphical page objects; document image analysis; page segmentation
Online: 28 April 2021 (10:17:49 CEST)
In any document, graphical elements like tables, figures, and formulas contain essential information, and the processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components, which leads to a high-level conceptual understanding of the documents and makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images, discussing the most relevant approaches and the state of the art in graphical page object detection. This work provides a comprehensive understanding of the current state of the art and the related challenges. Furthermore, we discuss the leading datasets along with quantitative evaluations. Finally, we briefly discuss promising directions for further improvements.
ARTICLE | doi:10.20944/preprints202211.0130.v1
Subject: Computer Science And Mathematics, Other Keywords: IoT; localization; LoRaWAN; Deep Learning
Online: 8 November 2022 (01:06:12 CET)
In the field of low-power wireless networks, one of the techniques on which many researchers are focusing their efforts is positioning methodologies such as fingerprinting in dense urban areas. This paper presents an experimental study aimed at quantifying the mean location estimation error in densely urbanized areas. Using a dataset made available by the University of Antwerp, a neural network was implemented with the aim of providing the position of the end-devices. In this way it was possible to measure the mean location estimation error in an area with high urban density. The results show a mean end-device localization error of less than 150 meters. This result would make it possible to use fingerprinting instead of alternative, energy-consuming methodologies such as GPS in IoT (Internet of Things) applications where battery life is the primary requirement to be met.
ARTICLE | doi:10.20944/preprints202002.0180.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; neural attention; loans; loan origination; machine learning
Online: 14 February 2020 (02:45:01 CET)
In this paper we address the problem of understanding why a deep learning model decides that an individual is or is not eligible for a loan. We propose a novel approach for inferring which attributes matter most for the decision in each specific individual case. Specifically, we leverage concepts from neural attention to devise a novel feature-wise attention mechanism. As we show on real-world datasets, our approach offers unique insights into the importance of various features by producing a decision explanation for each specific loan case. At the same time, we observe that our novel mechanism generates decisions that are much closer to those of human experts than the existing competitors.
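A feature-wise attention mechanism of the kind described can be sketched as a learned scoring of each input attribute followed by a softmax, so the weights double as a per-case explanation (a minimal illustration with hypothetical shapes, not the proposed architecture):

```python
import numpy as np

def feature_attention(x, W_att, b_att):
    """Feature-wise attention: a learned layer scores each input attribute,
    a softmax turns scores into importances, and the input is reweighted
    before entering the rest of the network.
    x: (n_features,), W_att: (n_features, n_features), b_att: (n_features,)."""
    scores = W_att @ x + b_att          # one relevance score per feature
    e = np.exp(scores - scores.max())   # numerically stable softmax
    alpha = e / e.sum()                 # attention weights, sum to 1
    return alpha * x, alpha             # reweighted features + explanation
```

For a loan applicant, `alpha` indicates which attributes (income, history, etc.) carried the most weight in that specific decision.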
ARTICLE | doi:10.20944/preprints202304.0804.v1
Subject: Computer Science And Mathematics, Robotics Keywords: registration; point clouds; urban scene; deep learning
Online: 23 April 2023 (13:59:52 CEST)
Urban scene point clouds pose significant challenges for registration due to their large data volume, similar scenarios, and dynamic objects. In this paper, we propose PCRMLP, a model for urban scene point cloud registration that achieves registration performance comparable to prior learning-based methods. In contrast to previous works, which focus on extracting features and estimating correspondences, our model estimates the transformation implicitly from concrete instances. An instance-level urban scene representation method is introduced to extract instance descriptors via semantic segmentation and DBSCAN, which enables the model to obtain robust instance features, filter dynamic objects, and estimate the transformation in a more logical manner. A lightweight network consisting of MLPs is then employed to obtain the transformation in an encoder-decoder manner. We validate the approach on the KITTI dataset. Experimental results demonstrate that PCRMLP can obtain a satisfactory coarse transformation from instance descriptors in just 0.0028 s. With a subsequent ICP refinement module, the proposed method achieves higher registration accuracy and computational efficiency than prior learning-based works.
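The instance extraction step clusters semantically segmented points with DBSCAN. A minimal, illustrative DBSCAN on 2D points (not the paper's implementation; `eps` and `min_pts` are toy values) might look like:

```python
import math

def dbscan(points, eps=1.5, min_pts=2):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1                     # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point: absorb, do not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:   # j is core: expand the cluster
                queue.extend(j_nbrs)
    return labels

# Two tight pairs and one isolated point -> two clusters plus noise.
pts = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (50, 50)]
labels = dbscan(pts)
```

Each resulting cluster would then be summarized into an instance descriptor (e.g. a centroid plus shape statistics) before being fed to the MLP registration network.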
ARTICLE | doi:10.20944/preprints202301.0075.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: EMG; optimization; genetic algorithm; deep learning
Online: 4 January 2023 (09:21:39 CET)
Hand gesture recognition has many valuable applications in engineering and health care. This study proposes a novel model that can accurately distinguish hand gestures using surface electromyogram (sEMG) signals from the forearm muscles. A convolutional neural network (CNN), whose hyperparameters strongly impact the final model's accuracy, was employed in the recognition stage. The number of convolutional layers, the kernels per layer, and the neurons in the dense layer were selected for optimization, while the remaining parameters, such as the learning rate, batch size, and number of epochs, were chosen based on trial and error and prior knowledge. The optimal values of the selected hyperparameters were obtained using a genetic algorithm so as to achieve maximum recognition accuracy. The UC2018 Dual-Myo database was used for training and testing the model based on EMG signals characterizing the activity of eight different hand gestures. The final structure of the model consisted of two convolutional layers with 131 and 28 kernels, a dense layer with 111 neurons, and a softmax layer with eight neurons. Upon optimizing the hyperparameters with the genetic algorithm, the accuracy of the proposed model increased from 91.86% to 96.4% at best and 95.3% on average in real-time applications, and to 99.6% in offline mode. Future work is warranted on improving the architecture and reducing the computational cost.
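To make the genetic-algorithm search concrete, the sketch below evolves a (kernels-in-layer-1, kernels-in-layer-2, dense-neurons) genome with truncation selection, uniform crossover, and integer mutation. The fitness function is a hypothetical stand-in whose optimum is placed at the architecture reported above; in the actual study, fitness would be the recognition accuracy of the trained CNN:

```python
import random

random.seed(0)

def fitness(genome):
    """Stand-in for validation accuracy: peaks at (131, 28, 111),
    echoing the architecture reported in the abstract."""
    k1, k2, dense = genome
    return -((k1 - 131) ** 2 + (k2 - 28) ** 2 + (dense - 111) ** 2)

def mutate(genome, scale=8):
    return tuple(max(1, g + random.randint(-scale, scale)) for g in genome)

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))

# Random initial population of candidate architectures.
population = [tuple(random.randint(1, 256) for _ in range(3)) for _ in range(20)]
for generation in range(60):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                    # truncation selection (elitist)
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

best = max(population, key=fitness)
```

Because evaluating real fitness means training a CNN per genome, practical runs keep the population and generation counts small and often cache evaluated genomes.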
ARTICLE | doi:10.20944/preprints201811.0400.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Super-Resolution; Deep-learning; Generative Adversarial Networks; CMOS sensors
Online: 16 November 2018 (10:44:26 CET)
The Complementary Metal-Oxide-Semiconductor (CMOS) image sensor has a wide range of applications. However, given the limitations imposed by weather conditions and hardware cost, it is hard to capture high-resolution images with a CMOS sensor. Recently, Super-Resolution (SR) techniques for image restoration have been gaining attention due to their excellent performance. Thanks to their powerful learning ability, Generative Adversarial Networks (GANs) have proved able to achieve great success. In this paper, we propose Advanced Generative Adversarial Networks (AGAN) to efficiently address these issues: 1) we design a Laplacian pyramid framework as a pre-trained module, which provides multi-scale features for our input; 2) at each feature block, a convolutional skip-connection network, which may retain latent information, is significant for the generative model in reconstructing a plausible-looking image; 3) considering that edge details usually play an important role in image generation, a novel perceptual loss function is defined for training to seek the optimal parameters. The approach is effective in achieving excellent and compelling quality for images captured by CMOS sensors. Quantitative and qualitative evaluations demonstrate that our algorithm not only takes full advantage of Convolutional Neural Networks (CNNs) to improve image quality, but also outperforms previous GAN algorithms on the super-resolution task.
ARTICLE | doi:10.20944/preprints202008.0113.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Scene classification; Deep Learning; Convolutional Neural Networks; Feature learning
Online: 5 August 2020 (06:19:27 CEST)
State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures to achieve very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and the ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low-dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground truth. By extending the typical cross-entropy loss with a distance learning function, our proposed approach achieves significant gains in classification across a wide set of benchmark datasets, while providing additional evidence of class membership and classification confidence.
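A schematic version of such a combined objective, with an assumed pull/push form for the distance term (the abstract does not give the exact function, so this is one common contrastive-style choice), could be:

```python
import math

def cross_entropy(probs, label):
    """Standard categorical cross-entropy for one example."""
    return -math.log(probs[label])

def distance_loss(embedding, class_representatives, label, margin=1.0):
    """Assumed form: pull the embedding toward its own class representative
    and push it at least `margin` away from every other representative."""
    pull = math.dist(embedding, class_representatives[label])
    push = sum(max(0.0, margin - math.dist(embedding, rep))
               for cls, rep in class_representatives.items() if cls != label)
    return pull + push

def total_loss(probs, embedding, reps, label, weight=0.5):
    """Cross-entropy on the classifier branch plus a weighted distance
    term on the embedding branch."""
    return cross_entropy(probs, label) + weight * distance_loss(embedding, reps, label)

# Toy example: confident correct prediction, embedding near its class centre.
example = total_loss([0.7, 0.2, 0.1], (0.1, 0.1),
                     {0: (0.0, 0.0), 1: (3.0, 4.0)}, 0)
```

The `weight` hyperparameter balances how strongly the embedding geometry is shaped relative to raw classification accuracy.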
ARTICLE | doi:10.20944/preprints202304.0203.v4
Subject: Engineering, Electrical And Electronic Engineering Keywords: Electric Vehicles; Battery Management System; Lithium-ion batteries; Deep Learning
Online: 19 April 2023 (03:34:32 CEST)
This paper presents an improved SOC estimation method for lithium-ion batteries in Electric Vehicles using a Bayesian-optimized feedforward network. This Bayesian-optimized neural network method minimizes a scalar objective function by extracting the hyperparameters (the hidden neurons in both layers) using a surrogate model. The hyperparameters are then built into the model, and data samples are trained and validated. The performance of the proposed deep learning neural network is evaluated using two reasonably sized data samples extracted from the Panasonic 18650PF Li-ion Mendeley datasets for training and validation. RNN and LSTM neural network algorithms share the core property of retaining past information and/or hidden states for better SOC estimation. The distinguishing feature of the proposed method, however, is the inclusion of Bayesian optimization, which chooses the optimal number of hidden neurons in the two layers. Analysis of the results shows that the Bayesian-optimized feedforward algorithm has the lowest average MAPE (0.20%) and is the best selection compared with the average MAPE of five other deep learning algorithms. In the last quarter of the fuel gauge, where range anxiety is severe, the feedforward with Bayesian optimization algorithm remains the best selection (with a MAPE of 0.64%).
ARTICLE | doi:10.20944/preprints202305.1522.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Recommendation system; Contrast learning; Deep Learning
Online: 22 May 2023 (11:55:55 CEST)
Modelling both long- and short-term user interests from historical data is crucial for accurate recommendations. However, unifying these metrics across multiple application domains can be challenging, and existing approaches often rely on complex, intertwined models which can be difficult to interpret. To address this issue, we propose a lightweight, plug-and-play interest enhancement module that fuses interest vectors from two independent models. After analyzing the dataset, we identify deviations in the recommendation performance of the long- and short-term interest models. To compensate for these differences, we use feature enhancement and loss correction during training. In the fusion process, we explicitly split long-term interest features with longer duration into multiple local features. We then use a shared attention mechanism to fuse the multiple local features with the short-term interest features to obtain interaction features. To correct for bias between the models, we introduce a contrastive learning task that monitors the similarity between local features, short-term features, and interaction features, adaptively reducing the distance between similar features. Our proposed module combines and compares multiple independent long-term and short-term interest models on datasets from multiple domains. As a result, it not only accelerates the convergence of the models but also achieves outstanding performance in challenging recommendation scenarios.
ARTICLE | doi:10.20944/preprints201804.0286.v1
Subject: Business, Economics And Management, Finance Keywords: electricity price forecasting; deep learning; gated recurrent units; long short term memory; artificial intelligence, turkish day-ahead market
Online: 23 April 2018 (11:38:27 CEST)
Accurate electricity price forecasting has become a substantial requirement since the liberalization of the electricity markets. Due to the challenging nature of electricity prices, which exhibit high volatility, sharp price spikes, and seasonality, various types of electricity price forecasting models still compete and cannot outperform each other consistently. Neural networks have been successfully used in machine learning problems, and Recurrent Neural Networks (RNNs) have been proposed to address time-dependent learning problems. In particular, Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU) are tailor-made for time series price estimation. In this paper, we propose using Gated Recurrent Units as a new technique for electricity price forecasting. We trained a variety of algorithms with a rolling three-year window and compared the results with the RNNs. In our experiments, 3-layered GRUs outperformed all other neural network structures and state-of-the-art statistical techniques in a statistically significant manner in the Turkish day-ahead market.
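The rolling-window scheme described above can be sketched as a walk-forward splitter: train on a fixed-length window, test on the period immediately after it, then slide forward. The window lengths and prices below are illustrative, not Turkish market data:

```python
def rolling_windows(series, train_len, test_len):
    """Walk-forward splits: each split trains on `train_len` consecutive
    observations and tests on the `test_len` observations that follow."""
    splits = []
    start = 0
    while start + train_len + test_len <= len(series):
        train = series[start:start + train_len]
        test = series[start + train_len:start + train_len + test_len]
        splits.append((train, test))
        start += test_len                    # slide forward by the test period
    return splits

# Toy example: six "years" of average prices, three-year training window.
prices = [41.2, 39.8, 44.5, 47.1, 45.0, 50.3]
splits = rolling_windows(prices, train_len=3, test_len=1)
```

Unlike random cross-validation, this respects time ordering, so the model is never evaluated on prices that precede its training data.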
ARTICLE | doi:10.20944/preprints202207.0056.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: deep learning; convolutional neural networks; classification; machine learning; IoT
Online: 5 July 2022 (04:22:49 CEST)
Human actions in videos are three-dimensional (3D) signals that encode the spatiotemporal structure of human behavior. 3D convolutional neural networks (CNNs) are a promising tool for investigating this structure, yet they have not matched the performance of their well-established two-dimensional (2D) equivalents on still images: the added spatiotemporal dimension makes 3D CNNs difficult to train and prevents them from achieving remarkable evaluation results. In this paper, we implement a hybrid deep learning architecture that combines STIP (space-time interest point) features with 3D CNN features to effectively enhance performance on 3D videos. The implementation produces a more detailed and deeper mapping for training in each cycle of space-time fusion, and the trained model further improves the results on complicated model evaluations. This intelligent 3D network protocol for multimedia data classification using deep learning is introduced to further understand space-time associations in human activities. The well-known UCF101 dataset is used to evaluate the performance of the proposed hybrid technique, which substantially beats the initial 3D CNNs. Compared with state-of-the-art frameworks from the literature for action recognition on UCF101, the proposed method achieves an accuracy of 95%.
ARTICLE | doi:10.20944/preprints202009.0416.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: adversarial learning; deep cross-modal hashing; self-attention mechanism
Online: 18 September 2020 (04:16:58 CEST)
Recently, deep cross-modal hashing networks have received increasing interest due to their superior query efficiency and low storage cost. However, most existing methods pay little attention to the hash representation learning part, which means the semantic information of the data cannot be fully used. Furthermore, they may neglect the high-ranking relevance and consistency of hash codes. To solve these problems, we propose a Self-Attention and Adversary Guided Hashing Network (SAAGHN). Specifically, it employs a self-attention mechanism in the hash representation learning part to extract rich semantic relevance information. Meanwhile, in order to keep the hash codes invariant, adversarial learning is adopted in the hash code learning part. In addition, to generate higher-ranking hash codes and avoid early local minima, a new batch semi-hard cosine triplet loss and a cosine quantization loss are proposed. Extensive experiments on two benchmark datasets show that SAAGHN outperforms other baselines and achieves state-of-the-art performance.
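One plausible reading of a semi-hard cosine triplet loss, sketched here for a single anchor (the paper's exact mining rule and batch formulation may differ), is to pick the most similar negative that is still less similar to the anchor than the positive, then penalize it up to a margin:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two 2D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def cosine_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Semi-hard mining: among negatives less similar to the anchor than the
    positive, take the hardest one and apply a hinge with a margin."""
    pos_sim = cosine_sim(anchor, positive)
    semi_hard = [n for n in negatives if cosine_sim(anchor, n) < pos_sim]
    if not semi_hard:
        return 0.0
    neg_sim = max(cosine_sim(anchor, n) for n in semi_hard)
    return max(0.0, neg_sim - pos_sim + margin)

# A negative at similarity ~0.7 versus a positive at 0.8 -> loss ~0.1.
loss = cosine_triplet_loss((1.0, 0.0), (0.8, 0.6),
                           [(0.7, 0.7141428), (0.0, 1.0)])
```

In a batch setting, mining runs over all valid triplets in the mini-batch rather than over a fixed negative list.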
ARTICLE | doi:10.20944/preprints202211.0488.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Capsule network; differential features; deep learning; micro-expression recognition; spatiotemporal features
Online: 28 November 2022 (03:12:12 CET)
Micro-expression (ME) is one of the key psychological stress reactions: a modest, spontaneous facial mechanism. MEs have significant applicability in a variety of psychology-related sectors because of their precision and unpredictability with regard to psychological manifestations. Nevertheless, current micro-expression recognition (MER) algorithms have poor accuracy, the quantity of ME data is limited, and this research issue has not been thoroughly investigated. We therefore present a deep learning approach based on a spatio-temporal capsule network (STCP-Net). STCP-Net has four components: a jitter reduction module, a differential feature extraction module, an STCP module, and a fully connected layer. The first two modules are designed to extract diverse differential features more precisely and to limit the influence of head jitter. The STCP module extracts spatio-temporal features layer by layer, taking the temporal and spatial relationships between features into account. We run extensive trials using Leave-One-Subject-Out (LOSO) cross-validation on the CASME II dataset. The results and analysis demonstrate that the algorithm is innovative and efficient.
BRIEF REPORT | doi:10.20944/preprints202305.0768.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Climate; Contiguous United States; Deep Neural Network; Land Cover; Large Wildfire
Online: 10 May 2023 (14:46:12 CEST)
Over the last several decades, large wildfires have become increasingly common across the United States, causing disproportionate impacts on forest health and function, human well-being, and the economy. Here, we examine the severity of large wildfires across the contiguous United States over the past decade (2011-2020) using a wide array of meteorological, vegetational, and topographical features in a Deep Neural Network model. A total of 4,538 wildfire incidents were used in the analysis, covering 87,305 square miles of burned area. We observed the highest numbers of large wildfires in California, Texas, and Idaho, with lightning causing 43% of these incidents. Importantly, the results indicate that the severity of wildfire occurrences is highly correlated with the climatological forcings, land cover, location, and elevation of the ecosystem. Overall, the results may serve as a useful guide for managing landscapes under changing climate and disturbance regimes.
ARTICLE | doi:10.20944/preprints201905.0228.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep learning, LSTM, Machine learning, Post-filtering, Signal processing, Speech Synthesis
Online: 17 May 2019 (16:16:53 CEST)
Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map the synthetic speech to natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as the initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that the initialization achieves better enhancement of the statistical parametric speech spectrum in most cases when compared to the common random initialization of the networks.
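The two-stage idea, auto-associative pre-training followed by post-filter training that starts from the pre-trained weights, can be illustrated with a deliberately tiny one-weight "network" (real post-filters are LSTMs over cepstral parameter sequences; the data here are invented scalars):

```python
def train_linear(pairs, w0=0.0, lr=0.01, epochs=200):
    """One-weight 'network': fit y ~ w * x by gradient descent on squared error."""
    w = w0
    for _ in range(epochs):
        for x, y in pairs:
            w += lr * (y - w * x) * x
    return w

# Stage 1 (auto-associative): map features onto themselves, so w -> 1.
features = [(x, x) for x in [0.5, 1.0, 1.5, 2.0]]
w_init = train_linear(features)

# Stage 2 (post-filter): map "synthetic" features to "natural" ones,
# starting from the auto-associative weight instead of zero.
pairs = [(x, 1.1 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w_post = train_linear(pairs, w0=w_init)
```

The point of the scheme is that stage 2 starts near the identity mapping, so the post-filter only has to learn the (small) residual between synthetic and natural parameters.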
ARTICLE | doi:10.20944/preprints201810.0494.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: unsupervised training; features learning; deep learning; time series forecasting
Online: 22 October 2018 (12:24:43 CEST)
A continuous Deep Belief Network (cDBN) with two hidden layers is proposed in this paper, addressing the weak feature learning ability of DBNs when dealing with continuous data. In a cDBN, the input data are trained in an unsupervised way using continuous versions of the transfer functions, contrastive divergence is used in the hidden-layer training process to raise the convergence speed, an improved dropout strategy is implemented during unsupervised training to realize feature learning by de-cooperating the units, and the network is then fine-tuned using the back-propagation algorithm. In addition, the hyper-parameters are analysed through a stability analysis to ensure that the network can find the optimum. Finally, experiments on the Lorenz chaotic series, the CATS benchmark, and real-world tasks such as CO2 and wastewater parameter forecasting show that the cDBN achieves higher accuracy, a simpler structure, and faster convergence than other methods.
ARTICLE | doi:10.20944/preprints201805.0276.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: blood pressure; oscillometric measurement; statistical analysis; normality; confidence interval; deep belief networks
Online: 21 May 2018 (12:54:26 CEST)
Oscillometric blood pressure (BP) devices currently estimate a single point but neither identify fluctuations in BP nor distinguish them from variations in response to physiological properties. In this paper, to analyze BP normality based on oscillometric measurements, we use statistical approaches including kurtosis, skewness, Kolmogorov-Smirnov, and correlation tests. Then, to mitigate uncertainties, we use a deep neural network (DNN) to determine the confidence limits (CLs) of BP measurements based on their normality. First, we perform statistical tests to verify the normality of the BP measurements for individual subjects, and we validate that the distribution of the BP estimates fits the Gaussian distribution very well. We use a rank test in the DNN regression model to demonstrate the independence of the artificial SBP and DBP estimations. The proposed DNN regression model decreases the standard deviation of error (SDE) of the mean error (ME) and of the mean absolute error (MAE), and reduces the uncertainty of the CLs and SDEs of the proposed technique. Overall, the proposed methodology provides accurate BP estimations and reduces the uncertainties associated with the CLs and SDEs based on the DNN regression estimator.
REVIEW | doi:10.20944/preprints202011.0152.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: EEG signal recognition; machine learning in EEG; neural networks in EEG; dry electrode EEG; deep learning EEG
Online: 3 November 2020 (14:07:29 CET)
In the last decade, unprecedented progress in the development of neural networks has influenced dozens of industries, among them signal processing for electroencephalography (EEG). Although electroencephalography appeared in the first half of the 20th century, the physical principles of its operation have not changed to this day. The signal processing techniques, however, have progressed significantly in this area thanks to neural networks: over the past five years, more than 1,000 publications on the use of machine learning for EEG have appeared in popular libraries. The many different neural network models complicate understanding of the real situation in this area. In this manuscript, we provide the most comprehensive overview of research in which neural networks were used for EEG signal processing.
ARTICLE | doi:10.20944/preprints202107.0699.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: adaptive computing; dynamic deep neural structure; adpative convolution; dynamic training
Online: 30 July 2021 (12:25:45 CEST)
The colossal depth of deep neural networks sometimes suffers from ineffective backpropagation of the gradients through all the layers, whereas the strong performance of shallower multilayer neural structures proves their ability to increase the gradient signals in the early stages of training, which are easily backpropagated for global loss corrections. Shallow neural structures are always a good starting point for encouraging sturdy feature characteristics of the input. In this research, a shallow deep-neural structure called PrimeNet is proposed. PrimeNet aims to dynamically identify and encourage the quality visual indicators from the input to be used by the subsequent deep network layers, and to increase the gradient signals in the lower stages of the training pipeline. In addition, layerwise training is performed with the help of locally generated errors, which means the gradient is not backpropagated to previous layers and the hidden-layer weights are updated during the forward pass, making this structure a backpropagation-free variant. PrimeNet has obtained state-of-the-art results on various image datasets, attaining the dual objective of (1) a compact dynamic deep neural structure which (2) eliminates the problem of backward locking. The PrimeNet unit is proposed as an alternative to traditional convolution and dense blocks for faster and more memory-efficient training, outperforming previously reported results for adaptive methods in parallel and multilayer deep neural systems.
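The local-error idea, each layer updating its own weights from a locally generated error during the forward pass with nothing backpropagated to earlier layers, can be caricatured with one-unit linear layers and an auxiliary local readout. This is a sketch of the principle, not PrimeNet itself; the layer sizes, targets, and the omission of a nonlinearity are all simplifications:

```python
import random

random.seed(0)

class LocalLayer:
    """One-unit layer trained from a locally generated error: the update
    happens during the forward pass; no gradient crosses layer boundaries."""
    def __init__(self):
        self.w = random.uniform(0.1, 0.5)     # forward weight (kept positive)
        self.aux = random.uniform(0.1, 0.5)   # local readout used only for the local loss

    def forward(self, x, target, lr=0.02):
        h = self.w * x                         # output passed to the next layer
        err = target - self.aux * h            # local prediction error
        self.aux += lr * err * h               # local update of the readout
        self.w += lr * err * self.aux * x      # local update of the layer weight
        return h                               # note: err is never sent backwards

layers = [LocalLayer(), LocalLayer()]
data = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]    # toy target: y = 2x
for _ in range(500):
    for x, y in data:
        h = x
        for layer in layers:
            h = layer.forward(h, y)
```

Because every update uses only quantities available during the forward pass, layers can in principle be trained in a pipeline without waiting for a backward sweep, which is the backward-locking problem the abstract refers to.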
ARTICLE | doi:10.20944/preprints202005.0430.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Activity Context Sensing; Smartphones; Deep Convolutional Neural Networks; Smart devices
Online: 26 May 2020 (11:33:55 CEST)
With the widespread embedding of sensing capabilities in mobile devices, there has been unprecedented development of context-aware solutions. This allows the proliferation of various intelligent applications, such as those for remote health and lifestyle monitoring and intelligent personalized services. However, activity context recognition based on multivariate time series signals obtained from mobile devices in unconstrained conditions is naturally prone to class imbalance problems: recognition models tend to predict the classes with the majority of the samples whilst ignoring the classes with the fewest samples, resulting in poor generalization. To address this problem, we propose to augment the time series signals from inertial sensors with signals from ambient sensing to train deep convolutional neural network (DCNN) models. The DCNN captures the local dependency and scale invariance of these combined sensor signals. Consequently, we developed one DCNN model using only inertial sensor signals and another that combined signals from both inertial and ambient sensors, in order to investigate the class imbalance problem by improving the performance of the recognition model. Evaluation and analysis of the proposed system on data with imbalanced classes show that it achieved better recognition accuracy when data from inertial sensors were combined with data from ambient sensors such as environmental noise level and illumination, with an overall improvement of 5.3% in accuracy.
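The input construction for such a model can be sketched as stacking synchronized inertial and ambient streams into multichannel windows, which is the usual way time series are fed to a DCNN. Window width, step, and all signal values below are invented for illustration:

```python
def make_windows(inertial, ambient, width, step):
    """Slide a fixed-width window over synchronized sensor streams and stack
    inertial and ambient channels into one multichannel sample per window."""
    channels = inertial + ambient            # e.g. [ax, ay, az] + [noise, lux]
    windows = []
    for start in range(0, len(channels[0]) - width + 1, step):
        windows.append([ch[start:start + width] for ch in channels])
    return windows

# Toy streams: 3 accelerometer axes plus ambient noise level and illumination.
accel = [[0.1 * t for t in range(10)] for _ in range(3)]
ambient = [[50 + t for t in range(10)], [300] * 10]
samples = make_windows(accel, ambient, width=4, step=2)
```

Each sample is then a channels-by-time array, so adding ambient sensing simply means adding channels rather than changing the network's convolutional structure.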
TECHNICAL NOTE | doi:10.20944/preprints202009.0678.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: multi-frame super resolution; wide activation super resolution; 3D convolutional neural network; deep learning
Online: 27 September 2020 (11:54:56 CEST)
The small satellite market continues to grow year after year, with a compound annual growth rate of 17% estimated for the period between 2020 and 2025. Low-cost satellites can send a vast number of images to be post-processed on the ground to improve their quality and extract detailed information. In this domain lies the resolution enhancement task, where a low-resolution image is automatically converted to a higher resolution. Deep learning approaches to Super-Resolution (SR) have reached the state of the art on multiple benchmarks; however, most of them were studied in a single-frame fashion. With satellite imagery, multi-frame images can be obtained under different conditions, giving the possibility of adding more information per image and improving the final analysis. In this context, we developed, and applied to the PROBA-V dataset of multi-frame satellite images, a model that recently topped the European Space Agency's Multi-Frame Super-Resolution (MFSR) competition. The model is based on proven methods that worked on 2D images, tweaked to work in 3D: the Wide Activation Super-Resolution (WDSR) family. We show that better scores can be achieved with a simple 3D CNN residual architecture with WDSR blocks and a frame permutation technique as data augmentation than with more complex models. Moreover, the model requires few hardware resources, both for training and evaluation, so it can be run directly on a personal laptop.
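The frame permutation augmentation exploits the fact that the low-resolution frames are views of the same scene, so their order can be shuffled to create additional training samples. A minimal sketch (string stand-ins replace the actual image arrays, and the per-sample limit is an invented parameter):

```python
import itertools

def frame_permutations(frames, limit=6):
    """Generate up to `limit` reorderings of a multi-frame sample; each
    permutation is a valid extra training example for multi-frame SR."""
    perms = []
    for perm in itertools.permutations(frames):
        perms.append(list(perm))
        if len(perms) == limit:
            break
    return perms

lr_frames = ["f0", "f1", "f2"]   # stand-ins for low-resolution image arrays
augmented = frame_permutations(lr_frames)
```

With more frames the factorial growth makes enumerating all permutations impractical, so real pipelines typically sample a random subset per epoch.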
ARTICLE | doi:10.20944/preprints202206.0043.v1
Subject: Medicine And Pharmacology, Orthopedics And Sports Medicine Keywords: deep learning; lumbarnet; lumbar spine; spondylolisthesis; u-net
Online: 3 June 2022 (10:15:57 CEST)
Spondylolisthesis, a common spinal condition, is a relative forward or backward displacement between the upper and lower vertebrae caused by one vertebra deviating from the smooth curvature of a normal spine. Aging-related illnesses such as degenerative spondylolisthesis are especially burdensome on the social welfare and health-care systems of an aging society, and on radiologists and clinical physicians in particular. We therefore propose a computer-aided diagnosis algorithm, named LumbarNet, for vertebral slippage detection on clinical X-ray images. By combining i) a P-grade, ii) a piecewise slope detection scheme, and iii) a dynamic shift detection routine, LumbarNet is specialized for analyzing complex structural patterns in lumbar spine X-ray images and outcompetes other U-Net-based methods. Extensive experiments on lumbar spine X-ray images from standard clinical practice showed that LumbarNet achieved a mean intersection-over-union value of 0.88 in vertebral region detection and an accuracy of 88.83% in vertebral slippage detection.
ARTICLE | doi:10.20944/preprints202003.0035.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: meta-learning; lie group; machine learning; deep learning; convolutional neural network
Online: 3 March 2020 (11:09:53 CET)
Deep learning has achieved many successes in a wide range of fields, but when the trainable samples are extremely limited, deep learning models often underfit or overfit to the few samples. Meta-learning was proposed to address the difficulties of few-shot learning and fast adaptation. A meta-learner learns to retain common knowledge by training on a large set of tasks sampled from a certain data distribution, equipping it to generalize when facing unseen new tasks. Due to the limitation on samples, most approaches use only a shallow neural network to avoid overfitting and to reduce the difficulty of the training process, which wastes much extra information when adapting to unseen tasks. Gradient descent in Euclidean space also makes the meta-learner's updates inaccurate. These issues make it hard for many meta-learning models to extract features from samples and update network parameters. In this paper, we propose a novel method that uses a multi-stage joint training approach to get past the bottleneck in the adaptation process. To accelerate the adaptation procedure, we also constrain the network to the Stiefel manifold, so that the meta-learner can perform more stable gradient descent in a limited number of steps. Experiments on mini-ImageNet show that our method reaches better accuracy under the 5-way 1-shot and 5-way 5-shot conditions.
ARTICLE | doi:10.20944/preprints201810.0756.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Dermoscopic Image, Skin Lesion, Melanoma, Simulation, Generative Adversarial Networks, Deep Learning
Online: 1 November 2018 (17:54:12 CET)
Automated skin lesion analysis is one of the trending fields that have gained attention among dermatologists and healthcare practitioners. Skin lesion restoration is an essential preprocessing step for lesion enhancement and hence for accurate automated analysis and diagnosis. Digital hair removal is a non-invasive method of image enhancement that solves the hair-occlusion artefact in previously captured images, and several methods have been proposed for hair delineation and removal. However, manual annotation is one of the main challenges that hinder the validation of these methods on large numbers of images or their comparison on benchmark datasets. In the present work, we propose a realistic hair simulator based on context-aware image synthesis, using image-to-image translation via conditional generative adversarial networks, to generate different hair occlusions in skin images along with ground-truth masks of the hair locations. In addition, we explore three loss functions, the L1-norm, the L2-norm, and the structural similarity index (SSIM), to maximise the synthesis quality. To quantitatively evaluate the realism of the image synthesis, t-SNE feature mapping and the Bland-Altman test are employed as objective metrics. Experimental results show the superior performance of our proposed method compared to previous hair synthesis methods, with plausible colours and preservation of the integrity of the lesion texture.
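The three candidate synthesis losses can be written down directly on flattened intensity lists in [0, 1]. The SSIM here is the single-window (global) variant with the standard small constants; real implementations compute it over local windows, so this is a simplification:

```python
def l1_loss(a, b):
    """Mean absolute difference between two images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def l2_loss(a, b):
    """Mean squared difference between two images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ssim(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Global (single-window) SSIM on flattened intensities in [0, 1]."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

# Identical images: zero pixel losses, SSIM of 1.
generated = [0.2, 0.5, 0.8, 0.5]
real = [0.2, 0.5, 0.8, 0.5]
```

Training with an SSIM term rewards preserved local structure rather than just per-pixel agreement, which is why it is a common companion to L1/L2 in synthesis work.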
ARTICLE | doi:10.20944/preprints201908.0068.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; convolutional neural networks (CNN); transfer learning; class activation mapping (CAM); building defects; structural-health monitoring
Online: 6 August 2019 (04:18:29 CEST)
Clients are increasingly looking for fast and effective means to survey and communicate the condition of their buildings quickly and frequently, so that essential repairs and maintenance can be carried out proactively, before problems become dangerous and expensive. Traditional methods for this type of work commonly involve engaging building surveyors to undertake a condition assessment: a lengthy site inspection producing a systematic record of the physical condition of the building elements, including estimates of the immediate and projected long-term costs of renewal, repair and maintenance. Current asset condition assessment procedures are extensively time-consuming, laborious, and expensive, and pose health and safety risks to surveyors, particularly at height and at roof level, which are difficult to access. We propose a method for automated detection and localisation of key building defects from images using deep learning and convolutional neural networks. The proposed model is based on a pre-trained VGG-16 classifier with Class Activation Mapping (CAM) for object localisation. The model has proven robust, accurately detecting and localising mould growth, stains, and paint deterioration arising from dampness in buildings. The approach is being developed with the potential to scale up to automated, real-time detection of building defects and deterioration using mobile devices and drones.
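Class Activation Mapping, as used above, projects the final convolutional feature maps onto the classifier weights of the predicted class to obtain a coarse localisation heatmap. The sketch below is a minimal NumPy rendering of that projection under assumed shapes (channel-last features, a global-average-pool classifier); it is an illustration of the CAM idea, not the paper's implementation.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Project final conv feature maps onto one class's classifier weights.

    features: (H, W, C) activations from the last convolutional layer.
    fc_weights: (C, n_classes) weights of the global-average-pool classifier.
    """
    cam = features @ fc_weights[:, class_idx]   # (H, W) class evidence map
    cam = np.maximum(cam, 0)                    # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                   # normalize to [0, 1] for overlay
    return cam
```

The resulting map can be upsampled to the input resolution and overlaid on the image to highlight, for example, the region of mould growth driving the prediction.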
Subject: Chemistry And Materials Science, Biomaterials Keywords: Microscopy Image Segmentation; Deep Learning; Data Augmentation; Synthetic Training Data; Parametric Models
Online: 1 March 2021 (13:07:00 CET)
The analysis of microscopy images has always been an important yet time-consuming process in materials science. Convolutional Neural Networks (CNNs) have been used very successfully for a number of tasks, such as image segmentation. However, training a CNN requires a large amount of hand-annotated data, which can be a problem for materials science data. We present a procedure to generate synthetic data based on ad-hoc parametric data modelling to enhance the generalization of trained neural network models. Such an approach is especially beneficial in situations where little data can be gathered, and may make it feasible to train a neural network at all. Furthermore, we show that targeted data generation, by adaptively sampling the parameter space of the generative models, gives superior results compared to generating random data points.
ARTICLE | doi:10.20944/preprints201809.0361.v3
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: deep learning; convolutional neural networks; polar mesocyclones; satellite data processing; pattern recognition
Online: 29 October 2018 (10:16:49 CET)
Polar mesocyclones (MCs) are small marine atmospheric vortices. The class of intense MCs, called polar lows, is accompanied by extremely strong surface winds and heat fluxes, and thus largely influences deep ocean water formation in the polar regions. Accurate detection of polar mesocyclones in high-resolution satellite data is challenging and, when performed manually, time-consuming. Existing algorithms for the automatic detection of polar mesocyclones are based on conventional analysis of cloudiness patterns and involve various empirically defined thresholds of geophysical variables. As a result, different detection methods typically produce very different results when applied to a single dataset. We develop a conceptually novel approach to MC detection based on deep convolutional neural networks (DCNNs). As a first step, we demonstrate that a DCNN model is capable of binary classification of 500x500 km patches of satellite images with respect to the presence of MC patterns. The training dataset is based on a reference database of MCs manually tracked in the Southern Hemisphere from satellite mosaics; we use a subset of this database with MC diameters in the range of 200-400 km. This dataset is further used for testing several DCNN setups: a DCNN built from scratch, a DCNN based on VGG16 pre-trained weights using transfer learning, and a DCNN based on VGG16 with fine tuning. Each of these networks is applied to both infrared (IR) imagery and a combination of infrared and water vapor (IR+WV) imagery. The best skill (97% binary classification accuracy) is achieved by a model that averages the estimates of an ensemble of different DCNNs.
The algorithm can be further extended to an automatic identification and tracking scheme and applied to other atmospheric phenomena characterized by a distinct signature in satellite imagery.
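The best-performing configuration above averages the probability estimates of several DCNNs. A minimal sketch of that ensemble step, assuming each model outputs a per-patch MC-presence probability and using an assumed 0.5 decision threshold:

```python
import numpy as np

def ensemble_classify(prob_list, threshold=0.5):
    """Average the per-model MC-presence probabilities, then threshold.

    prob_list: list of 1-D arrays, one probability per image patch per model.
    Returns (binary labels, averaged probabilities).
    """
    mean_probs = np.mean(np.stack(prob_list), axis=0)
    return (mean_probs >= threshold).astype(int), mean_probs
```

Averaging before thresholding lets models that disagree cancel out, which is typically why the ensemble outperforms any single DCNN setup.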
ARTICLE | doi:10.20944/preprints202208.0197.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Deep neural networks; Adversarial Attacks; Poisoning; Backdoors; Trojans; Taxonomy; Ontology; Knowledge Base; Explainable AI; Green AI
Online: 10 August 2022 (09:39:07 CEST)
Deep neural networks (DNNs) have delivered cutting-edge performance in several fields. With the broader deployment of DNN models in critical applications, the security of DNNs has become an active yet nascent research area. According to recent studies, attacks against DNNs can have catastrophic results, and poisoning attacks, including backdoor and Trojan attacks, are among the growing threats. A wide-angle view of these evolving threats is essential to better understand the security issues. In this regard, creating a semantic model and a knowledge graph for poisoning attacks can reveal the relationships between attacks across intricate data and enhance the security knowledge landscape. In this paper, we propose a DNN Poisoning Attacks Ontology (DNNPAO) to enhance knowledge sharing and enable further advancements in the field. To this end, we performed a systematic review of the relevant literature to identify the current state of the art. We collected 28,469 papers from the IEEE, ScienceDirect, Web of Science, and Scopus databases; 712 of these were screened in a rigorous process, and 55 poisoning attacks on DNNs were identified and classified. From these we extracted a taxonomy of poisoning attacks as a scheme for developing DNNPAO, and subsequently used DNNPAO as a framework to create a knowledge base. Our findings open new lines of research within the field of AI security.
ARTICLE | doi:10.20944/preprints202207.0265.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: GOES-R; GRU; Deep Learning; Wildfires; Active Fires; Early Detection; Monitoring
Online: 18 July 2022 (10:28:06 CEST)
Early detection of wildfires from sun-synchronous satellites has been limited by their low temporal resolution and the fast spread of wildfires in the early stage. NOAA's geostationary weather satellites (GOES-R) can acquire images every 15 minutes at 2 km spatial resolution and have been used for early fire detection. However, advanced processing algorithms are needed to provide timely and reliable detection of wildfires. In this research, a deep learning framework based on Gated Recurrent Units (GRUs) is proposed to detect wildfires at an early stage using GOES-R dense time series data. The GRU model performs well at temporal modelling while keeping a simple architecture, making it suitable for efficiently processing time-series data. Thirty-six wildfires in North and South America under the coverage of the GOES-R satellites were selected to assess the effectiveness of the GRU method. The detection times based on GOES-R are compared with the 375 m resolution VIIRS active fire products in NASA's Fire Information for Resource Management System (FIRMS). The results show that GRU-based GOES-R detections of the wildfires are earlier than those of the VIIRS active fire products in most of the study areas. In addition, in mid- and low-latitude regions, the proposed method locates the active fires at the early stage more precisely than the GOES-R Active Fire Product.
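The GRU update that gives the model its simple yet effective temporal behaviour can be written in a few lines. Below is a minimal NumPy sketch of a single GRU cell rolled over a time series, with assumed parameter names and a hypothetical hidden size; the paper's actual network (layer count, output head, training) is not reproduced here.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, p):
    """One GRU update. p maps names to weights: W* (input->hidden),
    U* (hidden->hidden), b* (biases)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])          # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])          # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1 - z) * h + z * h_tilde                          # gated interpolation

def gru_encode(series, p, hidden=8):
    """Run the GRU over a (T, d) time series; the final hidden state summarizes it."""
    h = np.zeros(hidden)
    for x in series:
        h = gru_step(x, h, p)
    return h
```

In a detection setting, the final hidden state for each pixel's brightness time series would be fed to a small classifier head that outputs a fire/no-fire probability.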
ARTICLE | doi:10.20944/preprints202109.0180.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: point cloud registration; template point cloud; multiple partial point cloud; deep learning
Online: 10 September 2021 (10:26:10 CEST)
With the advancement of photoelectric technology and computer image processing, visual measurement methods based on point clouds are gradually being applied to the 3D measurement of large workpieces. Point cloud registration is a key step in 3D measurement, and its accuracy directly affects the accuracy of the measurement. In this study, we designed MPCR-Net, a novel network for multiple partial point cloud registration. First, an ideal point cloud is extracted from the CAD model of the workpiece and used as the global template. Next, a deep neural network searches for corresponding point groups between each partial point cloud and the global template point cloud. Then, the rigid body transformation matrix is learned from these correspondence point groups to register each partial point cloud. Finally, the iterative closest point algorithm optimizes the registration results to obtain the final point cloud model of the workpiece. We conducted point cloud registration experiments on untrained models and actual workpieces and, by comparing MPCR-Net with existing point cloud registration methods, verified that it improves the accuracy and robustness of 3D point cloud registration.
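Once corresponding point groups are found, the rigid transformation aligning them has a closed-form least-squares solution via SVD (the Kabsch algorithm). The sketch below shows that classical step in NumPy; it is offered as background for the registration stage, not as MPCR-Net's learned estimator.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t (Kabsch/SVD) such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

Iterative closest point, used as the final refinement step in the pipeline, repeatedly re-estimates correspondences and reapplies exactly this solve until convergence.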
ARTICLE | doi:10.20944/preprints202305.0282.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Artificial intelligence; Neural Networks; Deep learning; Multitasking learning; Solar photovoltaic; Smart grids; Multiple Electrical Disturbances; Power quality
Online: 5 May 2023 (03:12:18 CEST)
Electrical power quality is one of the main elements in power generation systems and, at the same time, one of the most significant challenges to their stability and reliability, because such architectures employ various switching devices, different kinds of power generators, and non-linear loads for different industrial processes. Hence the need to classify and analyze power quality disturbances (PQDs) in order to prevent and analyze the degradation of system reliability caused by their non-linear and non-stationary oscillatory nature. This paper presents a novel multitasking deep neural network (MDL) for the classification and analysis of multiple electrical disturbances. The characteristics are extracted with Empirical Mode Decomposition (EMD), a specialized and adaptive methodology for non-stationary signals. The methodology's design, development, and various performance tests are carried out at 28 different difficulty levels, varying severity, disturbance duration, and noise in the 20 dB to 60 dB range. The MDL was developed with a dataset diverse in difficulty and noise, comprising 4500 records of different samples of multiple electrical disturbances. The analysis and classification methodology achieves an average accuracy of 95% with multiple disturbances, and an average accuracy of 90% in analyzing signal aspects important for the study of electrical power quality, such as the crest factor, per-unit voltage analysis, Short Term Flicker Perceptibility (Pst), and Total Harmonic Distortion (THD), among others.
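Two of the power quality indicators named above, the crest factor and THD, have standard signal-level definitions that can be computed directly. The NumPy sketch below assumes a fundamental frequency `f0` aligned to an FFT bin and an assumed harmonic count; it illustrates the textbook definitions, not the paper's MDL-based estimator.

```python
import numpy as np

def crest_factor(signal):
    """Peak amplitude divided by RMS; sqrt(2) for a pure sinusoid."""
    return np.max(np.abs(signal)) / np.sqrt(np.mean(signal ** 2))

def thd(signal, fs, f0, n_harmonics=5):
    """Total harmonic distortion from FFT magnitudes at multiples of f0."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)

    def mag(f):
        return spectrum[np.argmin(np.abs(freqs - f))]  # nearest-bin magnitude

    fundamental = mag(f0)
    harmonics = np.sqrt(sum(mag(k * f0) ** 2 for k in range(2, n_harmonics + 2)))
    return harmonics / fundamental
```

For example, a 50 Hz sinusoid with a 10% third harmonic yields a THD of 0.1, which is the kind of ground truth against which a learned analysis model can be checked.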
Subject: Engineering, Bioengineering Keywords: AI; deep-learning; neural-networks; graph neural-networks; cheminformatics; molecular property; machine-learning; computational chemistry; lipophilicity; solubility
Online: 1 October 2021 (14:29:01 CEST)
The accurate prediction of molecular properties such as lipophilicity and aqueous solubility is of great importance and poses challenges in several stages of the drug discovery pipeline. Machine learning methods like graph-based neural networks (GNNs) have shown exceptionally good performance in predicting these properties. In this work, we introduce a novel GNN architecture, the directed edge graph isomorphism network (D-GIN). It is composed of two distinct sub-architectures (D-MPNN and GIN) and improves on the accuracy of its sub-architectures across various learning and featurization strategies. We argue that combining models with different key aspects helps make graph neural networks deeper while simultaneously increasing their predictive power. Furthermore, we address a current limitation in the assessment of deep-learning models, namely the comparison of single-training-run performance metrics, and offer a more robust alternative.
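Both sub-architectures named above are message-passing networks: each layer updates a node (atom) by aggregating its neighbours' features. The NumPy sketch below shows one generic aggregation round under assumed weight shapes; it illustrates the message-passing principle common to D-MPNN and GIN, not either model's exact update rule.

```python
import numpy as np

def message_passing_layer(h, adj, w_self, w_nbr):
    """One neighbourhood-aggregation round for node features h (N, d).

    adj: (N, N) binary adjacency matrix; w_self, w_nbr: (d_out, d) weights.
    Each node combines its own features with the sum of its neighbours', then ReLU.
    """
    agg = adj @ h                                        # sum messages from neighbours
    return np.maximum(h @ w_self.T + agg @ w_nbr.T, 0.0)
```

Stacking several such layers and pooling the node states into a molecule-level vector gives the representation from which properties like lipophilicity are regressed.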
ARTICLE | doi:10.20944/preprints202304.0645.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Lip Reading; Multiclass Classification; Turkish Lip Reading Dataset; Deep Learning; Convolutional Neural Networks; Lip Detection
Online: 20 April 2023 (10:07:48 CEST)
Automated lip reading is a research problem that has developed considerably in recent years. In some cases lip reading is evaluated both visually and audibly. A lip reading model can be used to detect specific words in footage from security cameras, but audio-visual databases cannot be used in this setting, since the sound of the spoken word is not always available. In this study, we collected a new Turkish dataset containing only images. The dataset is produced from YouTube videos, an uncontrolled environment, so the images vary in difficult parameters such as light, angle, colour, and the personal characteristics of the face. Despite differing facial features such as moustaches, beards, and make-up, a visual speech recognition model was trained on 10 classes, including single words and two-word phrases, using Convolutional Neural Networks (CNNs) without any intervention on the data. The proposed study, using only visual data, yields an automated visual speech recognition model based on a deep learning approach. Moreover, because only visual data are used, the computational cost and resource usage are lower than in multi-modal studies. It is also, to our knowledge, the first study to address the lip reading problem with a deep learning algorithm on a new dataset from the Ural-Altaic language family.
ARTICLE | doi:10.20944/preprints202005.0002.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Grammar Evolution; Deep Learning; Network Architecture Search; Grammar Guided Genetic Programming
Online: 2 May 2020 (11:36:18 CEST)
Photogrammetry involves aerial photography of the earth's surface and subsequent processing of the images to provide a more accurate depiction of the area (orthophotography). It is used by the Spanish Instituto Geográfico Nacional to update road cartography, but requires a significant amount of manual labour because all tiled images must be visually inspected. Deep learning techniques (artificial neural networks with more than one hidden layer) can perform road detection, but it is still unclear how to find the optimal network architecture. Our system applies grammar-guided genetic programming to the search for deep neural network architectures. In this kind of evolutionary algorithm, all individuals in the population (here, candidate network architectures) are constrained by the rules of a grammar that defines valid and useful structural patterns to guide the search process. The grammar used includes well-known complex structures (e.g. Inception-like modules) combined with a custom-designed mutation operator that dynamically links the mutation probability to structural diversity. Pilot results show that the system is able to design models for road detection whose test accuracies are similar to those reached by state-of-the-art models when evaluated on a dataset from the Spanish National Aerial Orthophotography Plan.
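Grammar-guided search generates only architectures derivable from the grammar's rules. The sketch below uses a toy grammar (an assumption for illustration; the paper's actual grammar with Inception-like modules is richer) to show how a random derivation yields a valid layer sequence, which is the genotype a genetic-programming search would then evolve.

```python
import random

# Toy grammar (illustrative assumption, not the paper's actual grammar):
# an architecture is a stem, a recursive body of blocks, and a classifier head.
GRAMMAR = {
    "net": [["stem", "body", "head"]],
    "stem": [["conv3x3"], ["conv5x5"]],
    "body": [["block"], ["block", "body"]],
    "block": [["conv3x3"], ["conv3x3", "pool"], ["inception"]],
    "head": [["gap", "dense"]],
}

def derive(symbol, rng, depth=0, max_depth=8):
    """Expand a nonterminal into a flat list of layer tokens."""
    if symbol not in GRAMMAR:
        return [symbol]                       # terminal layer token
    rules = GRAMMAR[symbol]
    # Past max_depth, force the first (non-recursive) rule so expansion halts.
    rule = rules[0] if depth >= max_depth else rng.choice(rules)
    tokens = []
    for s in rule:
        tokens.extend(derive(s, rng, depth + 1, max_depth))
    return tokens
```

Because every individual is produced by `derive`, mutation and crossover can operate on rule choices while still guaranteeing structurally valid networks.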
ARTICLE | doi:10.20944/preprints201912.0059.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: hyperspectral image classification; deep learning; channel-wise attention mechanism; spatial-wise attention mechanism
Online: 12 February 2020 (05:40:08 CET)
In recent years, researchers have paid increasing attention to hyperspectral image (HSI) classification using deep learning methods. To improve accuracy while reducing the number of training samples required, we propose a double-branch dual-attention mechanism network (DBDA) for HSI classification. Two branches are designed in DBDA to capture the abundant spectral and spatial features contained in HSI. Furthermore, a channel attention block and a spatial attention block are applied to these two branches respectively, enabling DBDA to refine and optimize the extracted feature maps. A series of experiments on four hyperspectral datasets shows that the proposed framework outperforms state-of-the-art algorithms, especially when training samples are severely lacking.
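The channel attention idea above, reweighting each spectral feature channel by a learned gate, can be sketched compactly. The NumPy snippet below is an illustrative stand-in in the squeeze-and-excite spirit, with assumed weight shapes; it is not DBDA's exact attention block.

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """Gate each channel of a (H, W, C) feature map by a learned scalar.

    w1: (C_mid, C) bottleneck weights; w2: (C, C_mid) expansion weights.
    """
    squeeze = fmap.mean(axis=(0, 1))                # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid channel weights in (0, 1)
    return fmap * gate                              # broadcast over H and W
```

A spatial attention block is the transposed idea: pool over channels and produce one gate per spatial position instead of one per channel.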
REVIEW | doi:10.20944/preprints202110.0135.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: convolutional neural networks (CNNs); deep learning; computer-aided diagnosis; colorectal polyps; colorectal cancer; colonoscopy
Online: 8 October 2021 (10:50:53 CEST)
Because a relatively high percentage of adenoma polyps are missed, a computer-aided diagnosis (CAD) tool based on deep learning can aid the endoscopist in diagnosing colorectal polyps or colorectal cancer, decreasing the polyp miss rate and helping to prevent colorectal cancer mortality. The Convolutional Neural Network (CNN) is a deep learning method that has, over the last decade, achieved better results in detecting and segmenting specific objects in images than conventional models such as regression, support vector machines or artificial neural networks. In recent years, in studies on medical imaging, CNN models have obtained promising results in detecting masses and lesions in various body organs, including colorectal polyps. In this review, the structure and architecture of CNN models, and how colonoscopy images are processed as input and converted to output, are explained in detail. In most primary studies in the field of colorectal polyp detection and classification, the CNN model has been regarded as a black box, since the calculations performed at the different layers during training have not been clarified precisely. Furthermore, I discuss the differences between CNNs and conventional models, examine how to train a CNN model for diagnosing colorectal polyps or cancer, and evaluate model performance after the training process.
ARTICLE | doi:10.20944/preprints202105.0429.v1
Subject: Medicine And Pharmacology, Other Keywords: Acute lymphoblastic leukemia; Deep convolutional neural networks; Ensemble image classifiers; C-NMC-2019 dataset.
Online: 19 May 2021 (07:42:23 CEST)
Although automated Acute Lymphoblastic Leukemia (ALL) detection is essential, it is challenging due to the morphological similarity between malignant and normal cells. The traditional ALL classification strategy is arduous and time-consuming, often suffers from inter-observer variation, and requires experienced pathologists. This article automates the ALL detection task employing deep Convolutional Neural Networks (CNNs). We explore a weighted ensemble of deep CNNs to obtain a better ALL cell classifier; the weights are estimated from the ensemble candidates' corresponding metrics, such as accuracy, F1-score, AUC, and kappa values. Various data augmentations and pre-processing steps are incorporated to achieve better generalization of the network. We train and evaluate the proposed model on the publicly available C-NMC-2019 ALL dataset. Our proposed weighted ensemble model achieved a weighted F1-score of 88.6%, a balanced accuracy of 86.2%, and an AUC of 0.941 on the preliminary test set. Qualitative results displaying the gradient class activation maps confirm that the introduced model attends to a concentrated learned region, whereas the individual ensemble candidates, such as Xception, VGG-16, DenseNet-121, MobileNet, and InceptionResNet-V2, produce coarse and scattered learned areas for most example cases. Since the proposed ensemble yields better results for the target task, it could also be applied in other domains of medical diagnostics.
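The metric-derived weighting described above can be sketched as normalizing each candidate's validation score into a weight and averaging the models' class probabilities accordingly. The snippet below is a minimal NumPy illustration under the assumption of score-proportional weights; the paper's exact weight estimation may differ.

```python
import numpy as np

def metric_weights(scores):
    """Normalize per-model scores (e.g. F1, kappa) into ensemble weights."""
    s = np.asarray(scores, dtype=float)
    return s / s.sum()

def weighted_ensemble(prob_list, weights):
    """Weighted average of per-model class-probability arrays (n_samples, n_classes)."""
    return np.tensordot(weights, np.stack(prob_list), axes=1)
```

A model with a higher validation F1-score thus pulls the combined prediction towards its own output, while weaker candidates still contribute proportionally.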
ARTICLE | doi:10.20944/preprints201809.0481.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Brain-Computer Interfaces, spectrogram-based convolutional neural network model (pCNN), Deep Learning, EEG, LSTM, RCNN
Online: 25 September 2018 (08:58:34 CEST)
Non-invasive, electroencephalography (EEG)-based brain-computer interfaces (BCIs) for motor imagery translate the subject's motor intention into control signals by classifying the EEG patterns caused by different imagination tasks, e.g. hand movements. This type of BCI has been widely studied and used as an alternative mode of communication and environmental control for disabled patients, such as those suffering from a brainstem stroke or a spinal cord injury (SCI). Notwithstanding the success of traditional machine learning methods in classifying EEG signals, these methods still rely on hand-crafted features, whose extraction is difficult due to the high non-stationarity of EEG signals; this is a major cause of the stagnating progress in classification performance. Remarkable advances in deep learning allow end-to-end learning without any feature engineering, which could benefit BCI motor imagery applications. We developed three deep learning models for decoding motor imagery movements directly from raw EEG signals without (manual) feature engineering: 1) a long short-term memory network (LSTM); 2) a proposed spectrogram-based convolutional neural network model (pCNN); and 3) a recurrent convolutional neural network (RCNN). Results were evaluated on our own, publicly available, EEG data collected from 20 subjects and on the existing 2b EEG dataset from "BCI Competition IV". Overall, better classification performance was achieved with the deep learning models than with state-of-the-art machine learning techniques, which could chart a route ahead for developing new robust techniques for EEG signal decoding. We underpin this point by demonstrating the successful real-time control of a robotic arm using our CNN-based BCI.
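The pCNN above operates on spectrograms rather than raw samples. A spectrogram is just a short-time Fourier transform of the EEG channel; the NumPy sketch below shows the standard computation with assumed window and hop sizes, not the paper's exact preprocessing parameters.

```python
import numpy as np

def spectrogram(signal, win=64, hop=32):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform.

    Returns an array of shape (freq_bins, time_frames).
    """
    w = np.hanning(win)
    frames = [np.abs(np.fft.rfft(signal[s:s + win] * w))
              for s in range(0, len(signal) - win + 1, hop)]
    return np.array(frames).T
```

Stacking one such time-frequency image per EEG channel yields the multi-channel input that a 2-D CNN can classify like an ordinary image.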
ARTICLE | doi:10.20944/preprints202212.0405.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: hyperspectral data; few-shot learning; deep features; convolution kernels; edge-preserving filtering
Online: 22 December 2022 (01:44:48 CET)
In recent years, different deep learning frameworks have been introduced for hyperspectral image (HSI) classification. However, the proposed network models have high model complexity and do not provide high classification accuracy when few-shot learning is used. This paper presents an HSI classification method that combines a random patches network (RPNet) and recursive filtering (RF) to obtain informative deep features. The proposed method first convolves image bands with random patches to extract multi-level deep RPNet features. Thereafter, the RPNet feature set is subjected to dimension reduction through principal component analysis (PCA), and the extracted components are filtered using the RF procedure. Finally, the HSI spectral features and the obtained RPNet-RF features are combined to classify the HSI using a support vector machine (SVM) classifier. To test the performance of the proposed RPNet-RF method, experiments were performed on three widely known datasets using a few training samples per class, and the classification results were compared with those obtained by other advanced HSI classification methods adapted for small training samples. The comparison showed that RPNet-RF classification achieves higher values of evaluation metrics such as overall accuracy and the Kappa coefficient (https://github.com/UchaevD/RPNet-RF).
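The core RPNet idea, convolving the image with patches sampled from the image itself so no kernel training is needed, can be sketched directly. The NumPy snippet below is a naive, loop-based illustration with assumed patch size and count; a practical implementation would use vectorized or FFT-based convolution.

```python
import numpy as np

def sample_patches(band, k, n, rng):
    """Draw n random k x k patches from one image band to act as kernels."""
    H, W = band.shape
    rows = rng.integers(0, H - k + 1, size=n)
    cols = rng.integers(0, W - k + 1, size=n)
    return np.stack([band[r:r + k, c:c + k] for r, c in zip(rows, cols)])

def random_patch_features(band, patches):
    """Valid-mode convolution of the band with every patch, one feature map each."""
    k = patches.shape[1]
    H, W = band.shape
    out = np.empty((len(patches), H - k + 1, W - k + 1))
    for i, p in enumerate(patches):
        for r in range(H - k + 1):
            for c in range(W - k + 1):
                out[i, r, c] = np.sum(band[r:r + k, c:c + k] * p)
    return out
```

In the full method, these training-free feature maps are stacked across levels, reduced with PCA, and smoothed by recursive filtering before the SVM stage.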
ARTICLE | doi:10.20944/preprints202002.0334.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: deep learning; drone imagery; hyperspectral image classification; tree species classification; 3D convolutional neural networks
Online: 24 February 2020 (01:13:13 CET)
Interest in drone solutions for forestry applications is growing. Using drones, datasets can be captured flexibly and at high spatial and temporal resolution whenever needed. Fundamental tasks in forestry applications include the detection of individual trees, tree species classification, biomass estimation, and so on. Deep Neural Networks (DNNs) have shown superior results compared with conventional machine learning methods such as the Multi-Layer Perceptron (MLP) when input data are large. The objective of this research was to investigate 3D convolutional neural networks (3D-CNNs) for classifying three major tree species in a boreal forest: pine, spruce, and birch. The proposed 3D-CNN models were employed to classify tree species at a test site in Finland. The classifiers were trained with a dataset of 3039 manually labelled trees, and the accuracies were then assessed using an independent dataset of 803 records. To find the most efficient feature combination, we compared the performance of 3D-CNN models trained on hyperspectral (HS) channels, RGB channels, and the canopy height model (CHM), separately and combined. The proposed 3D-CNN model with RGB and HS layers produced the highest classification accuracy: the producer accuracies of the best 3D-CNN classifier on the test dataset were 99.6%, 94.8%, and 97.4% for pines, spruces, and birches, respectively. The best 3D-CNN classifier produced ~5% higher classification accuracy than the MLP with all layers. Our results suggest that the proposed method provides excellent classification results with acceptable performance metrics for HS datasets. The pine class was detectable in most layers; spruce was most detectable in the RGB data, while birch was most detectable in the HS layers. Furthermore, the RGB datasets alone provide acceptable results for many low-accuracy applications.
ARTICLE | doi:10.20944/preprints202009.0458.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: machine learning; deep learning; physiological maturity; computer vision; plant breeding; phenology; Glycine max (L.) Merr.
Online: 19 September 2020 (10:08:43 CEST)
Soybean maturity is a trait of critical importance for the development of new soybean cultivars; nevertheless, its characterization based on visual ratings poses many challenges. High-throughput phenotyping methodologies based on unmanned aerial vehicle (UAV) imagery have been proposed as an alternative to traditional visual ratings of pod senescence. However, the lack of scalable and accurate methods to extract the desired information from the images remains a significant bottleneck in breeding programs. The objective of this study was to develop an image-based high-throughput phenotyping system for evaluating soybean maturity in breeding programs. Images were acquired twice a week, from when the earliest lines began maturation until the latest ones were mature. Two complementary convolutional neural networks (CNNs) were developed to predict the maturity date: the first using a single image date, and the second using the five best image dates identified by the first model. The proposed CNN architecture was validated using more than 15,000 ground-truth observations from five trials, including data from three growing seasons and two countries. The trained model showed good generalization capability, with a root mean squared error lower than two days in four out of five trials. Four methods of estimating prediction uncertainty showed potential for identifying different sources of error in the maturity date predictions. The architecture solves limitations of previous research and can be used at scale in commercial breeding programs.
ARTICLE | doi:10.20944/preprints202201.0090.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: formula detection; Hybrid Task Cascade network; mathematical expression detection; document image analysis; deep neural networks; computer vision
Online: 6 January 2022 (12:56:23 CET)
This work presents an end-to-end trainable approach for detecting mathematical formulas in scanned document images. Since many OCR engines cannot work reliably with formulas, it is essential to isolate them to obtain clean text for information extraction from the document. Our proposed pipeline comprises a hybrid task cascade network with deformable convolutions and a ResNeXt-101 backbone; both modifications improve detection. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets, achieving an overall accuracy of 96% on the ICDAR-2017 POD dataset, an overall error reduction of 13%. The results on the Marmot datasets are also improved for both isolated and embedded formulas: we achieved an accuracy of 98.78% for isolated formulas and an overall accuracy of 90.21% for embedded formulas, corresponding to error reduction rates of 43% and 17.9%, respectively.