ARTICLE | doi:10.20944/preprints202311.1851.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Speech enhancement; Noise suppression; Deep learning; Variational autoencoders
Online: 29 November 2023 (06:25:59 CET)
This paper presents an approach to enhancing the clarity and intelligibility of speech in digital communications compromised by various background noises. Using deep learning techniques, specifically a Variational Autoencoder (VAE) with 2D convolutional filters, we aim to suppress background noise in audio signals. Our method focuses on four simulated environmental noise scenarios: storms, wind, traffic, and aircraft. The training dataset was obtained by combining public speech recordings (the TED-LIUM 3 dataset, which contains audio from the popular TED Talk series) with these background noises. The audio signals were transformed into 2D power spectrograms, on which our VAE model was trained to filter out the noise and reconstruct clean audio. Our results demonstrate that the model outperforms existing state-of-the-art solutions in noise suppression. Although differences between noise types were observed, it was difficult to conclude definitively which background noise most adversely affects speech quality. Results were assessed both objectively (mathematical metrics) and subjectively (human listening tests on a set of audio samples). Notably, wind noise showed the smallest deviation between the noisy and cleaned audio and was subjectively perceived as the most improved scenario. Future work involves refining the phase calculation of the cleaned audio and creating a more balanced dataset to minimize differences in audio quality across scenarios. Practical applications of the model to real-time streaming audio are also envisaged. This research contributes to the field of audio signal processing by offering a deep learning solution tailored to various noise conditions, enhancing digital communication quality.
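As a rough illustration of the spectrogram front-end this abstract describes (the frame length, hop size, and sampling rate below are illustrative choices, not the authors' settings), a minimal NumPy sketch of turning an audio signal into a 2D power spectrogram:

```python
import numpy as np

def power_spectrogram(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, and return the power
    spectrogram |STFT|^2 of shape (frames, frequency bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # one-sided spectrum
    return np.abs(spectrum) ** 2             # power only; phase is discarded

# Toy "noisy speech": a 440 Hz tone at 8 kHz plus white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.normal(size=fs)
spec = power_spectrogram(noisy)
print(spec.shape)  # (61, 129): n_frames x (frame_len // 2 + 1)
```

Because only the magnitude is kept, reconstructing clean audio requires a phase estimate, which is exactly the refinement the abstract lists as future work.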
ARTICLE | doi:10.20944/preprints202108.0570.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: anomaly detection; anomaly segmentation; self-attention; transformers; autoencoders
Online: 31 August 2021 (11:47:08 CEST)
Anomaly detection and segmentation aim to distinguish abnormal images from normal ones and to localize the anomalous regions. Feature-reconstruction-based methods have become one of the mainstream approaches to this task. They rest on two assumptions: (1) the features extracted by a neural network are a good representation of the image, and (2) an autoencoder trained solely on the features of normal images cannot reconstruct the features of anomalous regions well. Both assumptions are hard to meet. In this paper, we propose a new anomaly segmentation method based on feature reconstruction. Our approach consists of two parts: (1) we use a pretrained vision transformer (ViT) to extract the features of the input image, and (2) we design a self-attention autoencoder to reconstruct those features. We argue that the self-attention operation, which has a global receptive field, benefits feature-reconstruction-based methods in both feature extraction and reconstruction. Experiments show that our method outperforms state-of-the-art approaches for anomaly segmentation on the MVTec dataset, and it is both effective and time-efficient.
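The scoring step behind feature-reconstruction methods can be sketched independently of the ViT and autoencoder: locations where the reconstruction deviates from the extracted features get high anomaly scores. A minimal NumPy illustration (shapes and the squared-L2 score are assumptions, not the paper's exact formulation):

```python
import numpy as np

def anomaly_map(features, reconstructed):
    """Per-location anomaly score: squared L2 distance between the
    extracted feature map and its reconstruction over the channel axis.
    Both arrays have shape (H, W, C)."""
    return np.sum((features - reconstructed) ** 2, axis=-1)

# Toy 4x4 feature map with one poorly reconstructed location.
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 8))
recon = feat.copy()
recon[2, 3] += 1.0            # simulate reconstruction failure at (2, 3)
amap = anomaly_map(feat, recon)
print(np.unravel_index(np.argmax(amap), amap.shape))  # (2, 3)
```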
ARTICLE | doi:10.20944/preprints202101.0344.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: attitude estimation; autoencoders; deep learning; denoising; Kalman filter; underwater environment
Online: 18 January 2021 (14:22:52 CET)
One of the main issues in underwater robot navigation is accurate vehicle positioning, which depends heavily on the orientation estimation phase. The systems employed for this purpose are affected by different types of noise, mainly related to the sensors and to the irregular noise of the underwater environment. Filtering algorithms can reduce these effects if properly configured, but configuration usually requires refined techniques and time. This paper presents DANAE++, an improved denoising autoencoder based on DANAE, which can recover Kalman filter IMU/AHRS orientation estimates from any kind of noise, regardless of its nature. This deep learning architecture already proved robust and reliable, and the enhanced implementation achieves significant improvements in both results and performance. In fact, DANAE++ denoises the three attitude angles simultaneously, which we also verify on the estimates provided by the better-performing Extended Kalman Filter (EKF). Further tests could make this method suitable for real-time navigation tasks.
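For readers unfamiliar with the filtering stage that DANAE++ post-processes, a minimal scalar Kalman filter sketch for one noisy attitude angle (this is a generic textbook filter with illustrative noise variances, not the authors' IMU/AHRS configuration):

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.25):
    """Minimal scalar Kalman filter for a slowly varying angle.
    q: process noise variance, r: measurement noise variance."""
    x, p = measurements[0], 1.0        # state estimate and its variance
    out = []
    for z in measurements:
        p += q                         # predict: variance grows
        k = p / (p + r)                # Kalman gain
        x += k * (z - x)               # correct with the innovation
        p *= (1 - k)
        out.append(x)
    return np.array(out)

# Toy case: a constant 30-degree attitude observed with noisy measurements.
rng = np.random.default_rng(1)
z = 30.0 + 0.5 * rng.normal(size=200)
est = kalman_1d(z)
print(np.std(est[50:] - 30.0) < np.std(z[50:] - 30.0))  # True: filtered error shrinks
```

A denoising autoencoder such as DANAE++ would then be trained to map sequences like `est` even closer to the true attitude, regardless of the residual noise shape.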
ARTICLE | doi:10.20944/preprints201906.0062.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Hyperspectral Imagery, Machine Learning, Atmospheric Compensation, Autoencoders, Radiative Transfer Modeling
Online: 7 June 2019 (14:45:54 CEST)
The increasing spatial and spectral resolution of hyperspectral imagers yields detailed spectroscopy measurements from both space-based and airborne platforms. Machine learning algorithms have achieved state-of-the-art material classification performance on benchmark hyperspectral data sets; however, these techniques often do not consider varying atmospheric conditions experienced in a real-world detection scenario. To reduce the impact of atmospheric effects in the at-sensor signal, atmospheric compensation must be performed. Radiative Transfer (RT) modeling can generate high-fidelity atmospheric estimates at detailed spectral resolutions, but is often too time-consuming for real-time detection scenarios. This research utilizes machine learning methods to perform dimension reduction on the transmittance, upwelling radiance, and downwelling radiance (TUD) data to create high accuracy atmospheric estimates with lower computational cost than RT modeling. The utility of this approach is investigated using the instrument line shape for the Mako long-wave infrared hyperspectral sensor. This study employs physics-based metrics and loss functions to identify promising dimension reduction techniques. As a result, TUD vectors can be produced in real-time allowing for atmospheric compensation across diverse remote sensing scenarios.
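As a linear stand-in for the dimension reduction this abstract describes (the paper evaluates several ML methods; PCA via SVD is shown here only as the simplest example, and the "TUD" data below is synthetic):

```python
import numpy as np

def pca_compress(X, k):
    """Project rows of X onto the top-k principal components and
    reconstruct; returns (codes, reconstruction)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    codes = Xc @ Vt[:k].T              # low-dimensional representation
    recon = codes @ Vt[:k] + mean      # back to the spectral space
    return codes, recon

# Toy "TUD" vectors: 100 samples lying on a 3-dimensional subspace of R^50.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 50))
X = rng.normal(size=(100, 3)) @ basis
codes, recon = pca_compress(X, k=3)
print(codes.shape, np.allclose(recon, X))  # (100, 3) True
```

Real TUD spectra are not exactly low rank, which is why the paper turns to learned (autoencoder) reductions with physics-based losses; the point of the sketch is only the compress-then-reconstruct pattern.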
ARTICLE | doi:10.20944/preprints202308.0131.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep generative model (DGM); Variational Autoencoders (VAE); Generative Adversarial Network (GAN)
Online: 2 August 2023 (03:39:21 CEST)
Generative artificial intelligence (GenAI) has been developing rapidly, with remarkable achievements such as ChatGPT and Bard. The deep generative model (DGM) is a branch of GenAI that excels at generating raster data such as images and sound, thanks to the strength of deep neural networks (DNNs) in inference and recognition. The built-in inference mechanism of a DNN, which simulates the synaptic plasticity of human neural networks, fosters the generation ability of DGMs, which produce surprising results with the support of statistical flexibility. Two popular DGM approaches are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). VAE and GAN each have their own strengths, although both rest on the same underlying statistical theory, as well as on the considerable complexity hidden in the layers of a DNN, which acts as an effective encoding/decoding function without a concrete specification. In this research, I try to unify VAE and GAN into a consistent and consolidated model called Adversarial Variational Autoencoders (AVA), in which VAE and GAN complement each other: the VAE serves as a strong generator by encoding data via the excellent machinery of Kullback-Leibler divergence, while the GAN assesses the reliability of the data, that is, whether it is realistic or fake. In other words, AVA aims to improve the accuracy of generative models, and it extends the functionality of simple generative models. Methodologically, this research combines applied mathematical concepts with careful computer programming techniques in order to implement and solve complicated problems as simply as possible.
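The two ingredients AVA combines can be written down concretely: the VAE contributes a Kullback-Leibler regularizer on the encoder's Gaussian posterior, and the GAN contributes a discriminator cross-entropy assessing realism. A NumPy sketch of such a combined objective (the weighting and the stand-in values are illustrative assumptions, not the paper's exact loss):

```python
import numpy as np

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian encoder,
    summed over latent dimensions and averaged over the batch."""
    return np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))

def bce(p, y):
    """Binary cross-entropy for the discriminator
    (p: predicted probability 'real', y: 1 real / 0 generated)."""
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# A combined VAE + GAN objective on stand-in values:
rng = np.random.default_rng(0)
mu, log_var = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
recon_err = 0.1                      # stand-in reconstruction term
p_fake = np.full(4, 0.4)             # discriminator output on generated data
loss = recon_err + kl_divergence(mu, log_var) + bce(p_fake, np.ones(4))
print(loss > 0)  # True
```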
ARTICLE | doi:10.20944/preprints202309.1733.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Adversarial robustness; adversarial attacks; adversarial purification; knowledge distillation; image classification; convolutional autoencoders
Online: 26 September 2023 (05:39:42 CEST)
Despite their impressive performance on many vision tasks, deep neural networks are known to be vulnerable to intentionally added noise in input images. To combat these adversarial examples (AEs), improving the adversarial robustness of models has emerged as an important research topic, and work has proceeded in various directions including adversarial training, image denoising, and adversarial purification. This paper focuses on adversarial purification, a kind of pre-processing that removes noise before AEs enter a classification model. The advantage of adversarial purification is that it can improve robustness without altering the model itself, whereas other defense techniques such as adversarial training suffer from a decrease in model accuracy. Our proposed purification framework uses a Convolutional Autoencoder as a base model to capture the features of images and their spatial structure. We further improve the adversarial robustness of our purification model by distilling knowledge from teacher models. To this end, we train two Convolutional Autoencoders (teachers), one with adversarial training and the other with normal training. Then, through ensemble knowledge distillation, we transfer their ability to denoise and restore original images to the student model (the purification model). Extensive experiments confirm that our student model achieves high purification performance (i.e., how accurately a pre-trained classification model classifies purified images). An ablation study confirms the positive effect of ensemble knowledge distillation from two teachers.
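One plausible form of the two-teacher distillation objective described above is a weighted sum of reconstruction terms: match the clean image and stay close to both teachers' outputs. The weights and the plain-MSE formulation below are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def ensemble_distillation_loss(student_out, teacher_adv, teacher_std,
                               clean, alpha=0.5, beta=0.25):
    """Student purification loss: reconstruct the clean image while
    matching the adversarially trained and normally trained teachers.
    All inputs share the same shape; alpha/beta are illustrative weights."""
    mse = lambda a, b: np.mean((a - b) ** 2)
    return (alpha * mse(student_out, clean)
            + beta * mse(student_out, teacher_adv)
            + beta * mse(student_out, teacher_std))

rng = np.random.default_rng(0)
img = rng.random((8, 8))
# If the student output agrees with the clean image and both teachers,
# the loss vanishes.
loss_perfect = ensemble_distillation_loss(img, img, img, img)
print(loss_perfect)  # 0.0
```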
ARTICLE | doi:10.20944/preprints201906.0104.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Deep Learning, Generative Adversarial Networks (GANs), Machine Learning, Autoencoders, Voice Conversion, Ethics, CycleGANs
Online: 12 June 2019 (11:17:52 CEST)
The upsurge of Generative Adversarial Networks (GANs) over the previous five years has led to advances in unsupervised data manipulation, sourced feature translation, and precise input-output synthesis through competitive optimization of the discriminator and generator networks. More specifically, the recent rise of cycle-consistent GANs enables style transfer from a discrete source (input A) to a target domain (input B) by preprocessing object features for a multi-discriminative adversarial network. Traditionally, cyclical adversarial networks have been exploited for unpaired image-to-image translation and domain adaptation by determining mapped relationships between an input A graphic and an input B graphic. However, this mechanism of domain adaptation can also be applied to the complex acoustical features of human speech. Although well-established datasets, such as the 2018 Voice Conversion Challenge repository, paved the way for female-male voice transformation, cycle-GANs have rarely been re-engineered for voices outside those datasets. More critically, cycle-GANs have massive potential to extract surface-level and hidden features and distort an input A source into a texturally unrelated target voice. By preprocessing, compressing, and packaging unique acoustical voice properties, CycleGANs can learn to decompose speech signals and implement new translation models while preserving emotion, the intent of words, rhythm, and accents. Given the potential of the CycleGAN autoencoder for realistic unsupervised voice-to-voice conversion and feature adaptation, the researchers examine the ethical implications of controlling source input A to manipulate target voice B, particularly in cases of defamation and sabotage of target B's words. This paper analyzes the potential of cycle-consistent GANs for deceptive voice-to-voice conversion by manipulating interview excerpts of political candidates.
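The "cycle-consistent" property that makes unpaired translation possible is a loss term requiring that translating A to B and back recovers the original input. A NumPy sketch of that loss (the toy linear "generators" below are stand-ins for the real networks; with an exactly invertible pair the loss is zero):

```python
import numpy as np

def cycle_consistency_loss(x, g_ab, g_ba):
    """L1 cycle loss || G_BA(G_AB(x)) - x ||_1 averaged over elements;
    g_ab and g_ba are the two generators, here plain callables."""
    return np.mean(np.abs(g_ba(g_ab(x)) - x))

# Toy "generators" that are exact inverses of each other.
g_ab = lambda x: 2.0 * x
g_ba = lambda x: x / 2.0
x = np.linspace(-1, 1, 16)      # stand-in acoustic feature vector
print(cycle_consistency_loss(x, g_ab, g_ba))  # 0.0
```

In voice conversion the generators are neural networks and the loss is only driven toward zero during training, which is what lets the model preserve content (words, rhythm) while altering timbre.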
Subject: Computer Science And Mathematics, Information Systems Keywords: trajectory data analytics; air traffic flows; anomaly detection; air traffic management; machine learning; autoencoders
Online: 21 December 2019 (12:23:31 CET)
A large amount of data is produced every day by stakeholders of the Air Traffic Management (ATM) system, in particular airline operators, airports, and air navigation service providers (ANSPs). Most of this data is kept private for many reasons, including commercial and security concerns. Beyond raw data, shared information is precious, as it feeds the intelligent decision-support tools designed to smooth daily operations. We present a framework to detect, identify, and characterize anomalies in past aircraft trajectory data. It is based on an open-source collection of ADS-B-based aircraft trajectories, and the extracted information can benefit a wide range of stakeholders: Air Traffic Control (ATC) training centres could run more realistic simulations; ANSPs could improve capacity indicators; academics could improve safety models and risk estimations; and commercial stakeholders, such as airlines and airports, could use the information to improve short-term predictions and optimize their operations. The technique applies autoencoding artificial neural networks to flows of trajectories, which provide a useful reading grid associating cluster analysis with a quantified level of abnormality. In particular, we find that the highest anomaly scores correspond to poor weather conditions, whereas lower-scoring anomalies relate to ATC tactical actions.
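In autoencoder-based anomaly detection of this kind, trajectories are typically scored by their reconstruction error and flagged above a threshold. A NumPy sketch of that final scoring step (the percentile threshold and the synthetic error distribution are assumptions for illustration):

```python
import numpy as np

def flag_anomalies(errors, pct=95.0):
    """Flag trajectories whose autoencoder reconstruction error exceeds
    the given percentile of the error distribution."""
    threshold = np.percentile(errors, pct)
    return errors > threshold, threshold

rng = np.random.default_rng(0)
errors = rng.gamma(shape=2.0, scale=1.0, size=1000)  # typical flights
errors[:3] += 20.0                                   # three unusual flights
flags, thr = flag_anomalies(errors)
print(flags[:3].all(), flags.sum())  # True 50
```

Ranking flights by `errors` rather than just thresholding is what yields the graded "level of abnormality" the abstract mentions, from weather-driven outliers down to ATC tactical actions.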
ARTICLE | doi:10.20944/preprints202311.0231.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Single image super-resolution; deep learning; autoencoders; convolutional neural networks; convolution; transpose convolution; skipped connections
Online: 3 November 2023 (09:49:18 CET)
Single Image Super Resolution (SSIR) is a problem in computer vision where the goal is to create high-resolution images from low-resolution ones. It has important applications in fields such as medical imaging and security surveillance. While traditional methods such as interpolation and reconstruction-based models have been used in the past, deep learning techniques have recently gained attention due to their superior performance and computational efficiency. This article proposes an Autoencoder-based Deep Learning Model for SSIR, in particular a light model that uses fewer parameters without compromising performance. The down-sampling part of the Autoencoder mainly uses 3-by-3 convolutions and has no subsampling layers. The up-sampling part uses transpose convolutions and residual connections from the down-sampling part. The model is trained using a subset of the VILRC ImageNet database. The model is evaluated with the quantitative metrics PSNR and SSIM, as well as with qualitative measures such as perceptual quality. PSNR and SSIM figures as high as 76.06 and 0.93 are reported.
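PSNR, the main quantitative metric cited above, is straightforward to compute from the mean squared error between the reference image and the super-resolved estimate. A minimal NumPy sketch (the [0, 1] intensity range and toy images are assumptions for illustration):

```python
import numpy as np

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    super-resolved estimate, both scaled to [0, max_val]."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(max_val**2 / mse)

ref = np.zeros((4, 4))
est = ref + 0.01                   # uniform per-pixel error of 0.01
print(round(psnr(ref, est), 1))    # 40.0
```

Higher is better: halving the per-pixel error raises the PSNR by about 6 dB, which gives a feel for what the reported figures mean.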
ARTICLE | doi:10.20944/preprints202305.0982.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Alzheimer’s Disease; SH-SY5Y cells; Nuclear Magnetic Resonance (NMR); Convolutional autoencoders; Embedding of NMR spectra; Data augmentation.
Online: 15 May 2023 (05:42:08 CEST)
Alzheimer’s Disease (AD) affects the quality of life of millions of people worldwide and represents one of the biggest challenges for the whole society. The SH-SY5Y neuroblastoma cell line is often used as an in vitro model of neuronal function and is widely applied to study the molecular events leading to AD. In the last few years, basic research on SH-SY5Y cells has provided interesting insights for the discovery of new drugs and biomarkers for improved AD treatment and diagnosis. At the same time, untargeted NMR metabolomics is widely applied to biological fluids for (i) metabolic profile analysis, (ii) screening for differential metabolites, (iii) analysis of metabolic pathways, and (iv) the discovery of new biomarkers. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) have proved to be powerful methods for processing NMR data and are useful in signal quantization, even if more sophisticated (typically non-linear) techniques are needed to obtain compact yet information-rich embeddings of complex spectra. In this paper, a compression technique based on convolutional autoencoders is proposed, which can perform a high dimensionality reduction of the spectral signal (by a factor of more than 300) while maintaining informative features (guaranteed by a reconstruction error always smaller than 5%). Moreover, before compression, an ad hoc preprocessing method was devised to remedy the scarcity of available data. The compressed spectral data were then used to train SVM classifiers to distinguish diseased from healthy cells, achieving an accuracy close to 78%, a significantly better performance than with PCA-compressed data.
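The fidelity criterion quoted above (reconstruction error below 5%) can be checked with a simple relative-error metric. A NumPy sketch on a synthetic spectrum (the smooth stand-in signal and noise level are assumptions; the exact error definition used by the authors may differ):

```python
import numpy as np

def relative_reconstruction_error(x, x_hat):
    """Relative L2 reconstruction error, the kind of criterion the
    compressed spectra must keep below 5%."""
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

rng = np.random.default_rng(0)
spectrum = np.sin(np.linspace(0, 10, 3000)) ** 2      # smooth stand-in spectrum
x_hat = spectrum + 0.01 * rng.normal(size=3000)       # decoder output, small error
err = relative_reconstruction_error(spectrum, x_hat)
print(err < 0.05)  # True: this reconstruction would pass the 5% bound
```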
ARTICLE | doi:10.20944/preprints202109.0389.v1
Subject: Engineering, Control And Systems Engineering Keywords: Deep learning; Variational Autoencoders (VAEs); data representation learning; generative models; unsupervised learning; few shot learning; latent space; transfer learning
Online: 22 September 2021 (16:04:22 CEST)
Despite the importance of few-shot learning, the lack of labeled training data in the real world makes it extremely challenging for existing machine learning methods, since a limited dataset does not represent the data variance well. In this research, we suggest employing a generative approach using variational autoencoders (VAEs), which can be used specifically to optimize few-shot learning tasks by generating new samples with greater intra-class variation. The purpose of our research is to increase the size of the training dataset using various methods in order to improve the accuracy and robustness of few-shot face recognition. Specifically, we employ the VAE generator to enlarge the training dataset, including both the base and the novel sets, while using transfer learning as the backend. Based on extensive experiments, we analyze various data augmentation methods to observe how each affects the accuracy of face recognition. We conclude that our proposed face generation method can effectively improve the recognition accuracy to 96.47% using both the base and the novel sets.
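The generative augmentation idea rests on sampling around a face's latent code: draw z = mu + sigma * eps and decode each z into a new intra-class variant. A NumPy sketch of the sampling step (latent dimension, values, and variance are illustrative assumptions; the decoder is omitted):

```python
import numpy as np

def sample_variants(mu, log_var, n, rng):
    """Draw n latent samples z = mu + sigma * eps around one identity's
    latent code: the reparameterisation used to create intra-class
    variations for augmentation."""
    std = np.exp(0.5 * log_var)
    eps = rng.normal(size=(n, mu.size))
    return mu + std * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])    # latent code of one identity (toy)
log_var = np.full(3, -2.0)         # small variance -> subtle variants
z = sample_variants(mu, log_var, n=1000, rng=rng)
print(z.shape)  # (1000, 3): a thousand augmented latent codes
```

Each row of `z` would be passed through the VAE decoder to produce a new face image of the same identity, enlarging the base and novel sets before training the recognizer.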
ARTICLE | doi:10.20944/preprints202005.0444.v1
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: restricted Boltzmann machine; contrastive divergence; extreme learning machine; online sequential extreme learning machine; autoencoders; deep belief network; deep learning
Online: 27 May 2020 (08:18:39 CEST)
The main contribution of this paper is a new iterative training algorithm for restricted Boltzmann machines. The proposed learning path is inspired by the online sequential extreme learning machine, an extreme learning machine variant that deals with time-accumulated sequences of data of fixed or varying size. Recursive least squares rules are integrated for weight adaptation, avoiding learning-rate tuning and local-minimum issues. The proposed approach is compared with one of the well-known training algorithms for Boltzmann machines, "contrastive divergence", in terms of time, accuracy, and algorithmic complexity under the same conditions. The results on data reconstruction strongly favour the proposed rules.
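The recursive least squares (RLS) update that removes the learning-rate hyperparameter is standard and can be sketched on its own, outside the RBM context (the linear-regression toy problem below is an illustration, not the paper's training setup):

```python
import numpy as np

def rls_update(w, P, h, t, lam=1.0):
    """One recursive-least-squares step: update weights w and inverse
    correlation matrix P from feature vector h and target t, with
    forgetting factor lam. No learning rate to tune."""
    h = h.reshape(-1, 1)
    k = P @ h / (lam + h.T @ P @ h)    # gain vector
    e = t - (w.T @ h).item()           # a-priori prediction error
    w = w + k * e
    P = (P - k @ h.T @ P) / lam
    return w, P

# Fit a known linear map t = a . h online; w should converge to a.
rng = np.random.default_rng(0)
a = np.array([[1.0], [-2.0]])
w, P = np.zeros((2, 1)), 1e3 * np.eye(2)
for _ in range(200):
    h = rng.normal(size=2)
    w, P = rls_update(w, P, h, (a.T @ h.reshape(-1, 1)).item())
print(np.allclose(w, a, atol=1e-4))  # True
```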
ARTICLE | doi:10.20944/preprints202107.0385.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Visual Question Generation; Visual Question Answering; Variational Autoencoders; Radiology Images; Domain Knowledge; UMLS; Data Augmentation; Computer Vision; Natural Language Processing; Artificial Intelligence; Medical Domain.
Online: 16 July 2021 (16:18:56 CEST)
Visual Question Generation (VQG) from images is a rising research topic in both natural language processing and computer vision. Although there have been some recent efforts toward generating questions from images in the open domain, the VQG task in the medical domain has not been well studied so far due to the lack of labeled data. In this paper, we introduce a goal-driven VQG approach for radiology images called VQGRaD that generates questions targeting specific image aspects such as modality and abnormality. In particular, we study generating natural language questions based on the visual content of the image and on additional information such as the image caption and the question category. VQGRaD encodes the dense vectors of different inputs into two latent spaces, which allows generating, for a specific question category, relevant questions about the images, with or without their captions. We also explore the impact of domain knowledge incorporation (e.g., medical entities and semantic types) and data augmentation techniques on visual question generation in the medical domain. Experiments performed on the VQA-RAD dataset of clinical visual questions showed that VQGRaD achieves a BLEU score of 61.86% and outperforms strong baselines. We also performed a blinded human evaluation of the grammaticality, fluency, and relevance of the generated questions. The human evaluation demonstrated the higher quality of VQGRaD's outputs and showed that incorporating medical entities improves the quality of the generated questions. Using the test data and evaluation process of the ImageCLEF 2020 VQA-Med challenge, we found that the proposed data augmentation technique, which generates new training samples by applying different kinds of transformations, can mitigate the lack of data, avoid overfitting, and bring a substantial improvement in medical VQG.
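BLEU, the automatic metric reported above, rewards n-gram overlap between a generated question and a reference. A simplified unigram (BLEU-1) sketch with clipped counts and no brevity penalty (the example sentences are invented; the paper's metric is full BLEU):

```python
from collections import Counter

def bleu1(candidate, reference):
    """Unigram (BLEU-1) modified precision of a generated question
    against one reference; simplified, with no brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)   # clipped unigram counts
    return sum(overlap.values()) / len(cand)

print(bleu1("what modality is shown", "what modality is used"))  # 0.75
```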
ARTICLE | doi:10.20944/preprints202203.0403.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: behavioral change prediction; learned features; deep feature learning; handcrafted features; bidirectional long-short term memory; autoencoders; temporal convolutional neural network; clinical decision support system; multisensory stimulation therapy; physiological signals.
Online: 31 March 2022 (08:38:58 CEST)
Predicting change from multivariate time series has relevant applications ranging from medicine to engineering. Multisensory stimulation therapy for patients with dementia aims to change the patient’s behavioral state; for example, patients who exhibit a baseline of agitation may be paced toward a relaxed state. This study aims to predict changes in behavioral state from the analysis of physiological and neurovegetative parameters, in order to support the therapist during the stimulation session. To extract valuable indicators for predicting changes, both handcrafted and learned features were evaluated and compared. The handcrafted features were defined starting from the CATCH22 feature collection, while the learned ones were extracted with a Temporal Convolutional Network; the behavioral state was then predicted by a Bidirectional Long Short-Term Memory Autoencoder operating jointly with it. Compared with the state of the art, the learned-features approach exhibits superior performance, with accuracy rates of up to 99.42% for a 70-second time window and up to 98.44% for a 10-second window.
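The time windows quoted above imply a standard preprocessing step: slicing the multivariate physiological recording into fixed-length windows before feature extraction. A NumPy sketch (the 1 Hz sampling rate, 4 channels, and 10-second step are illustrative assumptions, not the study's acquisition settings):

```python
import numpy as np

def sliding_windows(series, win, step):
    """Cut a multivariate series (time x channels) into fixed-length
    windows: the inputs a feature extractor or TCN would consume."""
    starts = range(0, series.shape[0] - win + 1, step)
    return np.stack([series[s:s + win] for s in starts])

fs = 1                                 # 1 sample per second (illustrative)
signals = np.zeros((600, 4))           # 10 minutes of 4 physiological channels
w70 = sliding_windows(signals, win=70 * fs, step=10 * fs)
w10 = sliding_windows(signals, win=10 * fs, step=10 * fs)
print(w70.shape, w10.shape)  # (54, 70, 4) (60, 10, 4)
```

Each window would then yield either CATCH22-style handcrafted features or a learned embedding, from which the behavioral state change is predicted.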