ARTICLE | doi:10.20944/preprints202303.0517.v1
Subject: Biology And Life Sciences, Behavioral Sciences Keywords: Autism spectrum disorder; Auditory stream segregation; Hearing assistive technology; Speech-in-noise perception; Tonal language speakers
Online: 30 March 2023 (02:52:15 CEST)
Purpose: Hearing assistive technology (HAT) has been shown to be a viable solution to the speech-in-noise perception (SPIN) issue in children with autism spectrum disorder (ASD); however, little is known about its efficacy in tonal language speakers. This study compared sentence-level SPIN performance between Chinese children with ASD and neurotypical (NT) children and evaluated HAT use in improving SPIN performance and easing SPIN difficulty. Methods: Children with ASD (n=26) and NT children (n=19) aged 6-12 performed two adaptive tests in steady-state noise and three fixed-level tests in quiet and steady-state noise with and without using HAT. Speech recognition thresholds (SRT) and accuracy rates were assessed using adaptive and fixed-level tests, respectively. Parents or teachers of the ASD group completed a questionnaire regarding children’s listening difficulty under six circumstances before and after a ten-day trial period of HAT use. Results: Although the two groups of children had comparable SRTs, the ASD group showed a significantly lower SPIN accuracy rate than the NT group. Also, a significant impact of noise was found in the ASD group’s accuracy rate, but not in the NT group’s. There was a general improvement in the ASD group’s SPIN performance with HAT and a decrease in their listening difficulty ratings across all conditions after the device trial. Conclusion: The findings indicated inadequate SPIN in the ASD group using a relatively sensitive measure to gauge SPIN performance among children. The markedly increased accuracy rate in noise during HAT-on sessions for the ASD group confirmed the feasibility of HAT for improving SPIN performance in controlled laboratory settings, and the reduced post-use ratings of listening difficulty further confirmed the benefits of HAT use in daily scenarios.
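The abstract refers to adaptive tests that estimate speech recognition thresholds (SRT). The paper's exact procedure is not given; a common choice is a 1-up/1-down staircase that converges on the SNR yielding 50% correct. A minimal sketch in Python, with a hypothetical deterministic listener and made-up starting SNR and step size:

```python
def staircase_srt(respond, start_snr=10.0, step=2.0, trials=30):
    """Simulate a simple 1-up/1-down adaptive staircase.

    respond: function snr -> bool (True if the sentence was repeated
    correctly). Returns the presented SNRs and an SRT estimate taken as
    the mean of the last few reversal points.
    """
    snr = start_snr
    snrs, reversals = [], []
    last_dir = None
    for _ in range(trials):
        snrs.append(snr)
        direction = -1 if respond(snr) else +1  # harder after a correct answer
        if last_dir is not None and direction != last_dir:
            reversals.append(snr)
        last_dir = direction
        snr += direction * step
    tail = reversals[-6:] if len(reversals) >= 6 else reversals
    return snrs, sum(tail) / len(tail)

# Hypothetical listener: always correct above 0 dB SNR, never below
snrs, srt = staircase_srt(lambda snr: snr > 0.0)
```

With this idealized listener the staircase oscillates around its threshold, so the reversal average lands between the two bracketing SNRs.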
ARTICLE | doi:10.20944/preprints202309.1636.v1
Subject: Social Sciences, Language And Linguistics Keywords: Autism spectrum conditions; Atypical resource allocation; Listening effort; Pupillometry; Speech-in-noise recognition
Online: 26 September 2023 (03:10:24 CEST)
Purpose: School-age children with autism spectrum conditions (ASC) often experience difficulties in speech-in-noise (SiN) perception, leading to increased listening effort that impacts their well-being and academic performance. This study aimed to investigate the SiN processing challenges faced by Mandarin-speaking children with ASC and their impact on listening effort. Methods: Participants completed sentence recognition tests in both quiet and noisy conditions, with a steady-state noise masker presented at 0 dB signal-to-noise ratio in the noisy condition. We compared recognition accuracy and task-evoked pupil responses from 23 Mandarin-speaking children with ASC to 19 age-matched neurotypical (NT) counterparts to gauge their behavioral performance and listening effort during these auditory tasks. Results: The ASC group demonstrated notably decreased accuracy in noise compared to their NT peers, suggesting poorer SiN perception. Pupillometric data further revealed significantly larger peak dilations in the ASC group than in the NT group under comparable conditions. Importantly, the ASC group's peak dilation in quiet mirrored the NT group's in noise. However, the ASC group exhibited shorter peak latencies and reduced mean dilations than the NT group in similar conditions. Such patterns suggest the ASC group might initially experience a heightened cognitive load but utilize fewer cognitive resources as the task continued, indicating an atypical allocation of cognitive resources and a potential tendency towards relatively superficial and automated auditory processing. Conclusion: Our findings highlight the unique SiN processing challenges children with ASC face, underscoring the importance of a nuanced, individual-centric approach for interventions and support.
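Peak dilation, peak latency, and mean dilation are standard summaries of a task-evoked pupil response. As an illustration only (the study's preprocessing, baseline window, and units are not stated here), they can be computed from a baseline-corrected trace:

```python
def pupil_metrics(trace, sample_rate_hz, baseline_samples):
    """Task-evoked pupil response metrics from a pupil-size trace.

    trace: pupil diameter samples; the first `baseline_samples` points
    are the pre-stimulus baseline. Returns (peak_dilation,
    peak_latency_s, mean_dilation), all relative to the baseline mean.
    """
    baseline = sum(trace[:baseline_samples]) / baseline_samples
    evoked = [x - baseline for x in trace[baseline_samples:]]
    peak = max(evoked)
    latency = evoked.index(peak) / sample_rate_hz   # seconds after onset
    mean_dil = sum(evoked) / len(evoked)
    return peak, latency, mean_dil

# Toy trace: flat 4.0 mm baseline, then a dilation peaking at 4.5 mm
trace = [4.0] * 10 + [4.1, 4.3, 4.5, 4.4, 4.2]
peak, latency, mean_dil = pupil_metrics(trace, sample_rate_hz=10, baseline_samples=10)
```

The pattern the abstract describes (larger peak but shorter latency and smaller mean dilation) shows why all three summaries are reported: they capture different aspects of the same curve.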
REVIEW | doi:10.20944/preprints202309.0505.v1
Subject: Medicine And Pharmacology, Otolaryngology Keywords: cochlear implant; patient-reported outcomes; pure tone average; speech in noise; music perception
Online: 7 September 2023 (11:22:04 CEST)
Electric stimulation via a Cochlear Implant (CI) enables people with severe to profound sensorineural hearing loss to regain speech understanding and music appreciation, allowing them to actively engage in social life. Three main manufacturers (Cochlear, MED-EL and Advanced Bionics “AB”) offer CI systems, confronting CI recipients and otolaryngologists with a difficult decision, as no comprehensive overview or meta-analysis of performance outcomes following CI implantation is currently available. The main goal of this scoping review is to provide evidence that data and standardized speech and music performance tests are available for performing such comparisons. To this end, a literature search was conducted to find studies that address speech and music outcomes in CI recipients. From a total of 1592 papers, 188 abstracts were analyzed and 147 articles were found suitable for full-text examination, of which 42 studies were included for synthesis. A total of 16 studies used the consonant-nucleus-consonant (CNC) word recognition test in quiet at 60 dB SPL. We found that, aside from technical comparisons, only very few publications compare speech outcomes across manufacturers of CI systems. Evidence suggests, though, that these data are available in large CI centers in Germany and the US. Future studies should therefore leverage large data cohorts to perform such comparisons, which could provide critical evaluation criteria and assist both CI recipients and otolaryngologists in making informed, performance-based decisions.
ARTICLE | doi:10.20944/preprints202104.0651.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: speech processing, data augmentation, speech emotion recognition, generative adversarial networks
Online: 26 April 2021 (10:49:55 CEST)
With the increasing mechanization of daily life, speech processing has become crucial for the interaction between humans and machines. Deep neural networks require a database with enough data for training: the more features are extracted from the speech signal, the more samples are needed to train these networks. Adequate training can be ensured only when there is access to sufficient and varied data in each class. If there is not enough data, it is possible to use data augmentation methods to obtain a database with enough samples. One of the obstacles to developing speech emotion recognition systems is the data sparsity problem in each class for neural network training. The current study focuses on building a cycle generative adversarial network for data augmentation in a speech emotion recognition system. For each of the five emotions employed, an adversarial generative network is designed to generate data that is very similar to the main data in that class while remaining distinguishable from the emotions of the other classes. These networks are trained adversarially to produce feature vectors like those of each class in the main feature space, and the generated samples are then added to the training sets in the database to train the classifier network. Instead of the common cross-entropy error used to train generative adversarial networks, the Wasserstein divergence is used to produce high-quality artificial samples and to remove the vanishing gradient problem. The suggested network was tested for speech emotion recognition using EMODB as the training, testing, and evaluation set, and the quality of the artificial data was evaluated using two classifiers: a Support Vector Machine (SVM) and a Deep Neural Network (DNN). Moreover, it is shown that by extracting and reproducing high-level features from acoustic features, speech emotion recognition separating five primary emotions can be performed with acceptable accuracy.
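The vanishing-gradient argument for preferring a Wasserstein criterion can be seen in one dimension: even when real and generated samples do not overlap at all, the Wasserstein-1 distance still varies smoothly with the gap, whereas density-ratio-based losses saturate. A minimal sketch with toy 1-D samples (not the paper's feature vectors):

```python
def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance between two equal-size samples.

    For sorted samples it reduces to the mean absolute difference of the
    order statistics. Unlike cross-entropy-style divergences, it stays
    finite and informative for non-overlapping distributions, which is
    what mitigates vanishing gradients in GAN training.
    """
    a, b = sorted(a), sorted(b)
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

real = [0.0, 1.0, 2.0]
fake_far = [10.0, 11.0, 12.0]   # no overlap: W1 still reflects the gap
fake_near = [0.5, 1.5, 2.5]     # closer samples give a smaller distance
```

The distance shrinks continuously as the generated samples approach the real ones, so the generator always receives a useful training signal.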
ARTICLE | doi:10.20944/preprints202106.0296.v1
Subject: Social Sciences, Psychology Keywords: reading comprehension; speech-in-noise recognition; natural F0 contours; flattened F0 contours; Chinese character decoding
Online: 10 June 2021 (13:36:17 CEST)
Theories of reading comprehension emphasize decoding and listening comprehension as two essential components. The current study aimed to investigate how Chinese character decoding and context-driven auditory semantic integration contribute to reading comprehension in Chinese middle school students. Seventy-five middle school students were tested. Context-driven auditory semantic integration was assessed with speech-in-noise tests in which the fundamental frequency (F0) contours of spoken sentences were either kept natural or acoustically flattened, with the latter requiring a higher degree of contextual information. Statistical modelling with hierarchical regression was conducted to examine the contributions of Chinese character decoding and context-driven auditory semantic integration to reading comprehension. Performance on Chinese character decoding and auditory semantic integration scores with the flattened (but not natural) F0 sentences significantly predicted reading comprehension. Furthermore, the contributions of these two factors to reading comprehension were better fitted with an additive model than with a multiplicative model. These findings indicate that reading comprehension in middle schoolers is associated with not only character decoding but also the listening ability to make better use of the sentential context for semantic integration in a severely degraded speech-in-noise condition. The results add to our understanding of the multi-faceted nature of reading comprehension in children. Future research could further address the age-dependent development and maturation of reading skills by examining and controlling other important cognitive variables, and apply neuroimaging techniques such as functional magnetic resonance imaging to reveal the neural substrates for the contribution of auditory semantic integration and the observed additive model to reading comprehension.
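Hierarchical regression quantifies a predictor's contribution as the increase in R^2 when it is entered after the others. A minimal two-step sketch with toy scores (hypothetical data, not the study's), limited to the two predictors the abstract names:

```python
def r2(y, xs):
    """R^2 of an OLS regression of y on up to two predictor columns,
    with an intercept, via centered normal equations."""
    n = len(y)
    yc = [v - sum(y) / n for v in y]
    xc = [[v - sum(col) / n for v in col] for col in xs]
    if len(xc) == 1:
        b = [sum(a * c for a, c in zip(xc[0], yc)) / sum(a * a for a in xc[0])]
    else:  # two centered predictors: solve the 2x2 normal equations
        s11 = sum(a * a for a in xc[0]); s22 = sum(a * a for a in xc[1])
        s12 = sum(a * c for a, c in zip(xc[0], xc[1]))
        sy1 = sum(a * c for a, c in zip(xc[0], yc))
        sy2 = sum(a * c for a, c in zip(xc[1], yc))
        det = s11 * s22 - s12 * s12
        b = [(sy1 * s22 - sy2 * s12) / det, (sy2 * s11 - sy1 * s12) / det]
    fitted = [sum(bi * col[i] for bi, col in zip(b, xc)) for i in range(n)]
    ss_res = sum((a - f) ** 2 for a, f in zip(yc, fitted))
    return 1 - ss_res / sum(a * a for a in yc)

decoding = [1.0, 2.0, 3.0, 4.0, 5.0]          # toy character-decoding scores
integration = [2.0, 1.0, 4.0, 3.0, 5.0]        # toy semantic-integration scores
reading = [d + 2 * i for d, i in zip(decoding, integration)]
step1 = r2(reading, [decoding])                # decoding alone
step2 = r2(reading, [decoding, integration])   # decoding + integration
delta_r2 = step2 - step1                       # unique contribution of integration
```

The positive delta_r2 is the quantity hierarchical regression reports: variance in reading explained by semantic integration beyond what decoding already accounts for.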
ARTICLE | doi:10.20944/preprints202310.0722.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Audio-visual speech; emotion recognition; children
Online: 13 October 2023 (07:11:19 CEST)
Detecting and understanding emotions is critical for our daily activities. As emotion recognition (ER) systems develop, we start looking at more difficult cases than acted adult audio-visual speech. In this work, we investigate the automatic classification of audio-visual emotional speech of children. Our specific interest is in better exploiting the cross-modal relationships between the selected modalities: video and audio. To underscore the importance of developing ER systems for real-world environments, we present a corpus of children’s emotional audio-visual speech that we collected. We select a state-of-the-art model as a baseline for comparison and present several modifications focused on deeper learning of the cross-modal relationships. By conducting experiments with our proposed approach and the selected baseline model, we observe a relative improvement in performance of 2%. Finally, we conclude that focusing more on the cross-modal relationships may be beneficial for building ER systems for child-machine communication and environments where qualified professionals work with children.
ARTICLE | doi:10.20944/preprints202311.1851.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Speech enhancement; Noise suppression; Deep learning; Variational autoencoders
Online: 29 November 2023 (06:25:59 CET)
This paper presents an approach to enhancing the clarity and intelligibility of speech in digital communications compromised by various background noises. Utilizing deep learning techniques, specifically a Variational Autoencoder (VAE) with 2D convolutional filters, we aim to suppress background noise in audio signals. Our method focuses on four simulated environmental noise scenarios: storms, wind, traffic, and aircraft. The training dataset was obtained by combining audio from public sources (the TED-LIUM 3 dataset, which includes recordings from the popular TED Talks series) with these background noises. The audio signals were transformed into 2D power spectrograms, upon which our VAE model was trained to filter out the noise and reconstruct clean audio. Our results demonstrate that the model outperforms existing state-of-the-art solutions in noise suppression. Although differences between noise types were observed, it was challenging to definitively conclude which background noise most adversely affects speech quality. Results were assessed with objective methods (mathematical metrics) and subjective ones (a set of audios rated by human listeners). Notably, wind noise showed the smallest deviation between the noisy and cleaned audio and was perceived subjectively as the most improved scenario. Future work involves refining the phase calculation of the cleaned audio and creating a more balanced dataset to minimize differences in audio quality across scenarios. Additionally, practical applications of the model in real-time streaming audio are envisaged. This research contributes significantly to the field of audio signal processing by offering a deep learning solution tailored to various noise conditions, enhancing digital communication quality.
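The VAE operates on 2D power spectrograms, i.e., a time-by-frequency grid of squared magnitudes. As a toy illustration of that representation only (a naive DFT without windowing or overlap, unlike a production STFT):

```python
import cmath
import math

def power_spectrogram(samples, frame_len=8):
    """Toy power spectrogram: split into non-overlapping frames, take a
    naive DFT of each, keep the squared magnitudes of the non-negative
    frequency bins. Returns a time x frequency grid."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            spectrum.append(abs(coeff) ** 2)
        frames.append(spectrum)
    return frames

# Pure tone at 2 cycles per 8-sample frame: energy lands in bin 2
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
spec = power_spectrogram(tone)
```

Each row of the grid is one time frame; stacking the rows gives the 2D image-like input that 2D convolutional filters expect.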
BRIEF REPORT | doi:10.20944/preprints202310.0690.v2
Subject: Computer Science And Mathematics, Other Keywords: Keyword Detection; Audio Models; Speech Processing
Online: 7 November 2023 (02:34:57 CET)
This study introduces an original, comprehensive system centered on identifying specific terms that indicate a user's position, particularly the discrete values representing latitude and longitude. The system not only detects these terms but also retrieves the corresponding numerical data for accurate and efficient determination of locations. This study can be applied in various fields, notably aiding offline operations of military personnel, who often lack internet access. In such scenarios, precise awareness of location is vital for strategic manoeuvres, rescue operations, and navigating unfamiliar landscapes. The system supports these personnel by allowing them to extract exact location coordinates from spoken terms, thereby enhancing their situational awareness even in challenging surroundings. Apart from its military utility, the project holds broader significance: teams responding to emergencies, personnel involved in disaster management, and exploratory missions can all benefit from this technology during disruptions in communication infrastructure. Furthermore, travelers, adventurers, and outdoor enthusiasts can utilize this system to accurately determine their positions in remote areas without relying on online maps. We used offline speech recognition techniques to precisely transcribe spoken terms, achieving an accuracy of over 91.3% and a word error rate of 4.2%. For sound recognition, the OpenAI Whisper model was used; a conversion process from SpeechRecognition to AudioSegmentation was implemented, followed by transforming the audio into .wav format, to ensure seamless compatibility with the Whisper model and uninterrupted audio input. We also developed the interface of the app using Streamlit so that it can be used efficiently. By training the system to identify specific linguistic cues linked to location, it achieves robust detection and extraction of relevant terms.
This approach eliminates the necessity for constant internet connectivity, rendering it exceptionally useful in remote, offline, and resource-limited situations.
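A minimal sketch of the detect-and-retrieve step that follows transcription (the phrase grammar and keyword set here are assumptions for illustration, not the paper's actual vocabulary):

```python
import re

# Assumed phrase format: "... latitude <number> ... longitude <number> ..."
COORD = re.compile(
    r"latitude\s+(-?\d+(?:\.\d+)?)\s*.*?longitude\s+(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_coords(transcript):
    """Detect location terms in a transcript and retrieve the numeric
    latitude/longitude values, with a basic range sanity check."""
    m = COORD.search(transcript)
    if not m:
        return None
    lat, lon = float(m.group(1)), float(m.group(2))
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    return None

coords = extract_coords("my position is latitude 48.85 and longitude 2.35")
```

Because both the ASR model and this pattern matching run locally, no step requires internet connectivity.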
ARTICLE | doi:10.20944/preprints202208.0109.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: speech emotion recognition; affective computing; data augmentations; wav2vec 2.0; SVM
Online: 4 August 2022 (14:09:21 CEST)
Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tends to be more effective, there may be a trade-off in which more data does not yield a better model. This paper reports experiments investigating the effects of data augmentation in speech emotion recognition. The investigation aims at finding the most useful type of data augmentation and the most useful number of data augmentations for speech emotion recognition. The experiments are conducted on the Japanese Twitter-based emotional speech corpus. The results show that for speaker-independent data, two data augmentations, glottal source extraction and silence removal, exhibited the best performance among the techniques evaluated, even compared with larger sets of augmentations. For text-independent data (including speaker-and-text-independent data), more data augmentations tend to improve speech emotion recognition performance. The results highlight the trade-off between the number of data augmentations and the performance of speech emotion recognition, showing the necessity of choosing a proper data augmentation technique for a specific application.
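Silence removal, one of the two best-performing augmentations named above, is typically done by dropping low-energy frames. A sketch with an assumed frame length and energy threshold (the paper's actual parameters are not given here):

```python
def remove_silence(samples, frame_len=4, threshold=0.01):
    """Energy-based silence removal: keep only frames whose mean
    squared amplitude exceeds the threshold."""
    voiced = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy > threshold:
            voiced.extend(frame)
    return voiced

# Toy signal: silence, a voiced burst, silence
signal = [0.0, 0.0, 0.0, 0.0, 0.5, -0.4, 0.6, -0.5, 0.0, 0.0, 0.0, 0.0]
voiced = remove_silence(signal)
```

As an augmentation, the trimmed version is added alongside the original recording, giving the model a second view of the same utterance.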
ARTICLE | doi:10.20944/preprints201903.0047.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Gender Recognition; Speech Signal; Deep Learning; Evolutionary Search; PSO search; Wolf Search
Online: 4 March 2019 (13:42:02 CET)
The speech entailed in the human voice comprises essentially para-linguistic information used in many voice-recognition applications. Gender voice recognition is considered one of the pivotal attributes to be detected from a given voice, a task that involves certain complications. In order to distinguish gender from a voice signal, a set of techniques has been employed to determine relevant features to be utilized for building a model from a training set. This model is useful for determining the gender (i.e., male or female) from a voice signal. The contributions are threefold: (i) providing analysis of well-known voice signal features using a prominent dataset, (ii) studying various machine learning models from different theoretical families to classify voice gender, and (iii) using three prominent feature selection algorithms to find promising near-optimal features for improving the classification models. Experimental results show the importance of some sub-features over others, which is vital for enhancing the performance of the classification models. Experimentation reveals best recall values of 99.97% and 99.7% for two models, Deep Learning (DL) and Support Vector Machine (SVM); with feature selection, the best recall value is 100% for the SVM technique.
ARTICLE | doi:10.20944/preprints202102.0156.v1
Subject: Social Sciences, Anthropology Keywords: ANN; NN; Speech Recognition; interaction; hybrid method
Online: 5 February 2021 (10:58:40 CET)
Human and computer interaction has become a part of our day-to-day life. Speech is one of the essential and comfortable ways of interacting with devices as well as with other human beings. Devices, particularly smartphones, have multiple sensors such as cameras and microphones. Speech recognition is the process of converting the acoustic signal received by a smartphone into a set of words. The efficient performance of a speech recognition system greatly enhances the interaction between humans and machines by making the latter more receptive to user needs. The recognized words can be applied in many applications such as command and control, data entry, and document preparation. This research paper highlights speech recognition through an Artificial Neural Network (ANN). Also, a hybrid model is proposed for audio-visual speech recognition of the Tamil and Malay languages through a Self-Organizing Map (SOM) and a Multilayer Perceptron (MLP). The effectiveness of the different Neural Network (NN) models utilized in speech recognition will be examined.
ARTICLE | doi:10.20944/preprints202305.0247.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Speech emotion recognition; one-dimensional neural network; LSTM; CNN; MFCCs
Online: 4 May 2023 (09:45:11 CEST)
In recent years, with the popularity of smart mobile devices, the interaction between devices and users, especially in the form of voice interaction, has become increasingly important. If smart devices can understand more of users' emotional states through voice data, more customized services can be provided for users. This paper proposes a novel machine learning model for speech emotion recognition, called CLDNN, which combines convolutional neural networks (CNN), long short-term memory neural networks (LSTM), and deep neural networks (DNN). To make the designed system recognize audio signals in a way closer to how the human auditory system does, this article uses the Mel-frequency cepstral coefficients (MFCCs) of audio data as the input of the machine learning model. First, the MFCCs of the voice signal are extracted as the input of the model, and the feature values of the data are calculated using several local feature learning blocks (LFLB) composed of one-dimensional CNNs. Because audio signals are time-series data, the feature values obtained from the LFLBs are then input into an LSTM layer to enhance learning at the time-series level. Finally, fully connected layers are used for classification and prediction. Three databases, RAVDESS, EMO-DB, and IEMOCAP, are used for the experiments in this paper. The experimental results show that the proposed method improves accuracy compared to other related research in speech emotion recognition.
ARTICLE | doi:10.20944/preprints202302.0035.v1
Subject: Medicine And Pharmacology, Otolaryngology Keywords: Speech-in-noise hearing difficulties; Hidden hearing loss (HHL); hearing aids; self-report; Reaction time; Ecologically momentary assessment (EMA)
Online: 2 February 2023 (08:37:41 CET)
Objective: This study assessed hearing aid benefits for people with a normal audiogram but hearing-in-noise problems in everyday listening situations. Design: Exploratory double-blinded case control study whereby participants completed retrospective questionnaires, ecological momentary assessments, speech-in-noise testing, and mental effort testing with and without hearing aids. Twenty-seven adults reporting speech-in-noise problems but normal air-conduction pure-tone audiometry took part in the study. They were randomly separated into an experimental group who trialled mild-gain hearing aids with advanced directional processing and a control group fitted with hearing aids with no gain or directionality. Results: Self-reports showed mild-gain hearing aids reduce hearing-in-noise difficulties and provide a better hearing experience (i.e., improved understanding, participation, and mood). Despite the self-reported benefits, the laboratory tests did not reveal a benefit from the mild-gain hearing aids, with no group differences on speech-in-noise tests or mental effort measures. Further, participants found the elevated cost of hearing aids to be a barrier for their adoption. Conclusions: Hearing aids benefit the listening experience in some listening situations for people with normal audiogram who report hearing difficulties in noise. Decreasing the price of hearing aids may lead to greater accessibility to those seeking remediation for their communication needs.
ARTICLE | doi:10.20944/preprints202212.0426.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Speech Recognition; Keyword Spotting; Child abuse; Federated Learning; Whisper; Wav2vec2.0
Online: 22 December 2022 (09:27:37 CET)
The growth in online child exploitation material is a significant challenge for European Law Enforcement Agencies (LEAs). One of the most important sources of such online information corresponds to audio material that needs to be analyzed to find evidence in a timely and practical manner. That is why LEAs require a next-generation AI-powered platform to process audio data from online sources. We propose the use of speech recognition and keyword spotting to transcribe audiovisual data and to detect the presence of keywords related to child abuse. The considered models are based on two of the most accurate neural-based architectures to date: Wav2vec2.0 and Whisper. The systems are tested under an extensive set of scenarios in different languages. Additionally, keeping in mind that obtaining data from LEAs is very sensitive, we explore the use of federated learning to build more robust systems for the addressed application while maintaining the privacy of LEA data. The considered models achieved a word error rate between 11% and 25%, depending on the language. In addition, the systems are able to recognize a set of spotted words with true-positive rates between 82% and 98%, depending on the language. Finally, federated learning strategies show that they can maintain and even improve the performance of the systems when compared to centrally trained models. The proposed systems lay the basis for an AI-powered platform for automatic analysis of audio in the context of forensic applications within child abuse. The use of federated learning is also promising for the addressed scenario, where data privacy is an important issue to be managed.
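At transcript level, the spotting step reduces to checking a watch list of terms against the recognized tokens and scoring the true-positive rate. A simplified sketch with invented, benign example data (real systems match against lattices or subword units rather than whole tokens):

```python
def spot_keywords(transcript, keywords):
    """Keyword spotting over an ASR transcript: return which of the
    watched keywords occur in the recognized text."""
    tokens = set(transcript.lower().split())
    return {kw for kw in keywords if kw.lower() in tokens}

def true_positive_rate(hits, ground_truth):
    """Fraction of keywords truly present that the system spotted."""
    return len(hits & ground_truth) / len(ground_truth)

transcript = "the meeting is at the old harbour tonight"
watched = {"harbour", "tonight", "airport"}
truly_present = {"harbour", "tonight"}
hits = spot_keywords(transcript, watched)
tpr = true_positive_rate(hits, truly_present)
```

The reported 82-98% true-positive rates are exactly this ratio, computed against annotated ground truth per language.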
ARTICLE | doi:10.20944/preprints201811.0163.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Cymatics, Speech recognition, Mel-Frequency Cepstral Coefficients (MFCC), Dynamic time warping (DTW), Chladni plates
Online: 7 November 2018 (13:42:22 CET)
This paper proposes an original approach for achieving a Cymatics-based visual perception of isolated speech commands. The idea is to smartly combine effective speech processing and analysis methods with the phenomenon of Cymatics. In this context, an effective approach for automatic isolated-speech message recognition is proposed. The incoming speech segment is enhanced by applying appropriate pre-emphasis filtering, noise thresholding, and zero-alignment operations. The Mel-Frequency Cepstral Coefficients (MFCCs), Delta coefficients, and Delta-Delta coefficients are extracted from the enhanced speech segment. The Dynamic Time Warping (DTW) technique is then employed to compare these extracted features with reference templates. The comparison outcomes are used to make the classification decision. The classification decision is transformed into a methodical excitation. Finally, this excitation is converted into systematic visual perceptions via the phenomenon of Cymatics. The system functionality is tested with an experimental setup and results are presented. The approach is novel and can be employed in various applications such as visual art, encryption, education, archeology, architecture, and the integration of impaired people.
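The DTW template-matching step named in this abstract can be sketched as the classic dynamic programme (scalar features here for brevity; MFCC frames would be vectors compared with a frame-wise distance):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences,
    O(len(a) * len(b)) dynamic programming with steps (diagonal,
    insertion, deletion)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

template = [0.0, 1.0, 2.0, 1.0, 0.0]
utterance = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]   # same shape, time-stretched
```

Because the warp absorbs the stretched frame, the distance to the matching template is zero here, which is exactly why DTW suits commands spoken at varying speeds.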
ARTICLE | doi:10.20944/preprints202205.0066.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: code-switching; automatic speech recognition; low resource languages; language modelling
Online: 6 May 2022 (09:09:31 CEST)
We present improvements in n-best rescoring of code-switched speech achieved by n-gram augmentation as well as optimised pretraining of long short-term memory (LSTM) language models with larger corpora of out-of-domain monolingual text. In addition, we consider the application of large pretrained transformer-based architectures. Our experimental evaluation is performed on an under-resourced corpus of code-switched speech comprising four bilingual code-switched sub-corpora, each containing a Bantu language (isiZulu, isiXhosa, Sesotho, or Setswana) and English. We find in our experiments that, by combining n-gram augmentation with the optimised pretraining strategy, speech recognition errors are reduced for each individual bilingual pair by 3.51% absolute on average over the four corpora. Importantly, we find that speech recognition even at language boundaries improves by 1.14%, despite the additional data being monolingual. Utilising the augmented n-grams for lattice generation, we then contrast these improvements with those achieved after fine-tuning pretrained transformer-based models such as distilled GPT-2 and M-BERT. We find that, even though these language models have not been trained on any of our target languages, they can improve speech recognition performance even in zero-shot settings. After fine-tuning on in-domain data, these large architectures offer further improvements, achieving a 4.45% absolute decrease in overall speech recognition errors and a 3.52% improvement over language boundaries. Finally, a combination of the optimised LSTM and fine-tuned BERT models achieves a further gain of 0.47% absolute on average for three of the four language pairs compared to M-BERT.
We conclude that the careful optimisation of the pretraining strategy used for neural network language models can offer worthwhile improvements in speech recognition accuracy even at language switches, and that much larger state-of-the-art architectures such as GPT-2 and M-BERT promise even further gains.
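N-best rescoring, the technique at the heart of this abstract, combines the recognizer's score for each hypothesis with an external language-model score and re-ranks. A minimal sketch with invented hypotheses and an assumed interpolation weight:

```python
def rescore_nbest(hypotheses, lm_scores, weight=0.5):
    """Re-rank an n-best list by interpolating the recognizer's
    (acoustic) log-score with an external language-model log-score.
    `weight` trades off the two models and is typically tuned on a
    development set."""
    scored = [
        (weight * lm + (1 - weight) * ac, text)
        for (text, ac), lm in zip(hypotheses, lm_scores)
    ]
    return max(scored)[1]

# Hypothetical 3-best list: (hypothesis text, acoustic log-score)
nbest = [
    ("i saw him yesterday", -10.0),
    ("i saw hymn yesterday", -9.5),
    ("eye saw him yester day", -9.0),
]
lm = [-5.0, -8.0, -12.0]   # log-scores from the external language model
best = rescore_nbest(nbest, lm)
```

The LM overturns the acoustically favoured but implausible hypotheses, which is how a better-pretrained LSTM or transformer LM translates directly into fewer recognition errors.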
ARTICLE | doi:10.20944/preprints202108.0433.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Speech emotion recognition; Feature extraction; Heterogeneous parallel network; Spectral features; Prosodic features; Multi-feature fusion
Online: 23 August 2021 (12:16:40 CEST)
Speech emotion recognition remains a challenging task in natural language processing. It places strict requirements on the effectiveness of both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves 84.65%, 79.67%, and 56.50% unweighted average recall on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model consistently achieves better performance.
ARTICLE | doi:10.20944/preprints202307.0886.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: signal analysis; mode recognition; noise coding; deep learning; attention mechanism
Online: 13 July 2023 (12:21:44 CEST)
Target recognition mainly includes three approaches: optical image-based, echo detection-based, and passive signal analysis-based methods. Among them, the passive signal-based method is closely integrated with practical applications due to its strong environmental adaptability. Based on passive radar signal analysis, we design an "end-to-end" model that cascades a noise estimation network with a recognition network to identify working modes in noisy environments. The noise estimation network is implemented based on U-Net and adopts a feature extraction and reconstruction method to adaptively estimate the noise level of each sample, which helps the recognition network reduce noise interference. Focusing on the characteristics of radar signals, the recognition network is realized based on a Multi-Scale Convolutional Attention Network (MSCANet). First, deep group convolution is used to isolate channel interaction in the shallow network. Then, through the multi-scale convolution module, finer-grained features of the signal are extracted without increasing the complexity of the model. Finally, the self-attention mechanism is used to suppress the influence of low-correlation and negatively correlated channels and spatial locations. This method overcomes the problem that conventional methods are seriously disturbed by noise. We validated the proposed method in 81 kinds of noise environments, achieving an average accuracy of 94.65%. Additionally, we discuss the performance of six machine learning algorithms and four deep learning algorithms. Compared to these methods, the proposed MSCANet achieves an accuracy improvement of approximately 17%. Our method demonstrates better generalization and robustness.
ARTICLE | doi:10.20944/preprints202309.1202.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: speech emotion recognition; deep learning; Deep Belief Network; deep neural network; Convolutional Neural Network; LSTM; attention mechanism
Online: 19 September 2023 (08:24:22 CEST)
Speech Emotion Recognition (SER) is an interesting and difficult problem to handle. In this paper, we address it through the implementation of deep learning networks. We have designed and implemented six different deep learning networks: a Deep Belief Network (DBN), a simple deep neural network (SDNN), an LSTM network (LSTM), an LSTM network with an added attention mechanism (LSTM-ATN), a convolutional neural network (CNN), and a convolutional neural network with an added attention mechanism (CNN-ATN), with the aim, beyond solving the SER problem itself, of testing the impact of the attention mechanism on the results. Dropout and batch normalization techniques are also used to improve the generalization ability of the models (preventing overfitting) and to speed up training. The Surrey Audio-Visual Expressed Emotion database (SAVEE) and the Ryerson Audio-Visual Database (RAVDESS) were used for training and evaluation of our models. The results showed that the networks with the attention mechanism outperformed the others. Furthermore, CNN-ATN was the best among the tested networks, achieving an accuracy of 74% on SAVEE and 77% on RAVDESS and exceeding existing state-of-the-art systems on the same datasets.
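The attention mechanism added in LSTM-ATN and CNN-ATN typically scores each time step's hidden state, normalizes the scores with a softmax, and pools the sequence into a weighted sum. A minimal NumPy sketch of such pooling (the scoring vector `w` stands in for learned parameters; this is our simplified illustration, not the paper's exact layer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Score each time step, softmax the scores, return the weighted sum."""
    scores = hidden_states @ w             # (T,) one score per time step
    alphas = softmax(scores)               # attention weights, sum to 1
    return alphas @ hidden_states, alphas  # context (D,), weights (T,)

rng = np.random.default_rng(0)
H = rng.normal(size=(10, 4))   # 10 time steps, 4-dim hidden states
w = rng.normal(size=4)         # scoring vector (learned in practice, random here)
context, alphas = attention_pool(H, w)
print(context.shape)  # (4,)
```

The context vector then feeds the emotion classifier head instead of (or alongside) the last hidden state.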
ARTICLE | doi:10.20944/preprints201810.0739.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Event-Driven Processing, Speech recognition, Adaptive Resolution Analysis, Features extraction, Dynamic Time Warping, Classification
Online: 31 October 2018 (08:14:15 CET)
This paper proposes a novel approach, based on adaptive-rate processing and analysis, for isolated speech recognition. The idea is to smartly combine event-driven signal acquisition and windowing with adaptive-rate processing, analysis, and classification to realize effective isolated speech recognition. The incoming speech signal is digitized with an event-driven A/D converter (EDADC). The EDADC output is windowed with an activity-selection process. These windows are then resampled uniformly with an adaptive-rate interpolator. The resampled windows are denoised with an adaptive-rate filter, and their spectra are computed with an adaptive-resolution short-time Fourier transform (ARSTFT). Next, the magnitude, Delta, and Delta-Delta spectral coefficients are extracted. The Dynamic Time Warping (DTW) technique is employed to compare these extracted features with reference templates, and the comparison outcomes are used to make the classification decision. The system functionality is tested on a case study and results are presented. The devised approach achieves an 8.2-fold reduction in the number of acquired samples compared with the classical one, implying a significant computational gain and power-consumption reduction over classical counterparts. An average subject-dependent isolated speech recognition accuracy of 96.8% is achieved. This shows that the proposed approach is a potential candidate for automatic speech recognition applications such as rehabilitation centers, smart call centers, and smart homes.
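Of the stages above, the Dynamic Time Warping comparison is the most self-contained. The classic DTW recurrence, sketched in plain Python (the adaptive-rate front end and feature extraction are omitted; real use would compare multidimensional feature vectors, not scalars):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0 — warping absorbs the repeat
```

Classification then picks the reference template with the smallest DTW distance to the test utterance.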
REVIEW | doi:10.20944/preprints202011.0152.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: EEG signal recognition; machine learning in EEG; neural networks in EEG; dry electrode EEG; deep learning EEG
Online: 3 November 2020 (14:07:29 CET)
In the last decade, unprecedented progress in the development of neural networks has influenced dozens of industries, among them signal processing for electroencephalography (EEG). Although electroencephalography appeared in the first half of the 20th century, its physical principles of operation have not changed to this day; its signal processing techniques, however, have progressed significantly through the use of neural networks. As evidence, more than 1,000 publications on the use of machine learning for EEG have appeared in popular libraries over the past five years. The many different neural network models complicate the process of understanding the real situation in this area. In this manuscript, we provide the most comprehensive overview to date of research applying neural networks to EEG signal processing.
ARTICLE | doi:10.20944/preprints202112.0134.v1
Subject: Computer Science And Mathematics, Robotics Keywords: Human Robot Interaction (HRI); social robot; Speech Emotion Recognition (SER); Gender Recognition, affective states
Online: 8 December 2021 (14:31:07 CET)
The real challenge in Human Robot Interaction (HRI) is to build machines capable of perceiving human emotions so that robots can interact with humans in a proper manner. It is well known from the literature that emotion varies according to many factors; among these, gender is one of the most influential, so an appropriate gender-dependent emotion recognition system is recommended. In this paper, a two-level hierarchical Speech Emotion Recognition (SER) system is proposed: the first level is a Gender Recognition (GR) module for identifying the speaker's gender; the second is a gender-specific SER block. In this work, attention was focused on optimising the first level of the proposed architecture. The system was designed to be installed on social robots for monitoring hospitalised and home-dwelling elderly patients. Hence it is important to reduce the computational effort of the architecture while minimising hardware bulk, so that the system is suitable for social robots. The algorithm was executed on Raspberry Pi hardware. For training, the Italian emotional database EMOVO was used. Results show a GR accuracy of 97.8%, comparable with values found in the literature.
ARTICLE | doi:10.20944/preprints202004.0001.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: stuttering; power spectra; speech preparation; imagined speech; simulated speech
Online: 1 April 2020 (07:52:09 CEST)
Purpose: The present study investigated power spectral dynamics in the stuttering state of adults who stutter (AWS) while they answered written questions, using quantitative electroencephalography (qEEG). Materials and Methods: A 64-channel EEG setup was used for data acquisition in 9 AWS. Since speech, and especially stuttering, causes significant noise in the EEG, the three conditions of speech preparation (SP), imagined speech (IS), and simulated speech (SS) in a 7-band format were chosen, and the signals were source-localized using the standard low-resolution electromagnetic tomography (sLORETA) tool in fluent and disfluent states. Results: Having extracted enough fluent and disfluent utterances, significant differences were noted. Consistent with previous studies, the lack of beta suppression in SP, especially in beta2 and beta3 and somewhat in the gamma band, was localized to the supplementary motor area (SMA) and premotor area in the disfluent state. The delta frequency band was the best marker of stuttering shared across all three experimental conditions. Decreased delta power in the SMA of both hemispheres and the right premotor area during SP, in fronto-central regions and the right angular gyrus during IS, and in the SMA of both hemispheres during SS were notable qEEG features of disfluent speech. Conclusion: The dynamics of the beta and delta frequency bands may potentially explain the neural networks involved in stuttering. Based on the above, neurorehabilitation may be better formulated for the treatment of speech disfluency, namely stuttering.
ARTICLE | doi:10.20944/preprints202103.0221.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Speech enhancement; Kalman filter; Kalman gain; robustness metric; sensitivity metric; LPC; whitening filter; real-life noise
Online: 8 March 2021 (13:39:44 CET)
Inaccurate estimates of the linear prediction coefficients (LPC) and the noise variance introduce bias into the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods proposed tuning the biased Kalman gain, particularly under stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. We then construct a whitening filter (with coefficients computed from the estimated noise) and apply it to the noisy speech, yielding a pre-whitened speech from which the speech LPC parameters are computed. A KF is then constructed with the estimated parameters, where a robustness metric offsets the bias in the Kalman gain during speech absence and a sensitivity metric does so during speech presence, achieving better noise reduction; the noise variance and the speech model parameters together serve as a speech activity detector. The reduced-bias Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.
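The central tension here — a Kalman gain biased by poor noise-variance estimates — is visible already in the scalar case, where the gain is the ratio of prior uncertainty to total uncertainty. A one-dimensional sketch (the paper's filter operates on LPC-based state-space models, which this omits):

```python
def kalman_gain_step(p_prior, noise_var):
    """One scalar Kalman update: gain and posterior error variance.
    An overestimated noise_var shrinks the gain (over-smoothing);
    an underestimated one inflates it (noise leaks through)."""
    k = p_prior / (p_prior + noise_var)
    p_post = (1.0 - k) * p_prior
    return k, p_post

# With equal prior uncertainty and noise variance, the gain is 0.5
k, p = kalman_gain_step(1.0, 1.0)
print(k, p)  # 0.5 0.5
```

The robustness/sensitivity metrics in the paper can be read as correction terms applied to this gain, switched by speech presence.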
ARTICLE | doi:10.20944/preprints202112.0196.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Speech Rehabilitation; Speech Quality Assessment; LSTM
Online: 13 December 2021 (10:10:36 CET)
This article treats the problem of assessing speech quality during speech rehabilitation as a classification problem. To this end, a classifier based on an LSTM neural network is built to divide speech signals into two classes: before the operation and immediately after it. Speech before the operation is the reference that rehabilitation aims to approach. The degree to which the evaluated signal belongs to the reference class serves as the speech assessment. An experimental assessment of rehabilitation sessions was carried out, and the resulting assessments were compared with expert assessments of phrasal intelligibility.
ARTICLE | doi:10.20944/preprints202005.0383.v1
Subject: Social Sciences, Psychology Keywords: child speech; speech production; speech perception; learning; consonant age of acquisition
Online: 24 May 2020 (16:07:44 CEST)
Purpose: Perceptual learning and production practice are basic mechanisms that children depend on to acquire adult levels of speech accuracy. In this study, we examined perceptual learning and production practice as they contributed to changes in speech accuracy in three- and four-year-old children. Our primary focus was manipulating the order of perceptual learning and baseline production practice to better understand when and how these learning mechanisms interact. Method: Sixty-five typically-developing children between the ages of three and four were included in the study. Children were asked to produce CVCCVC nonwords like /bozjəm/ and /tʌvtʃəp/ that were described as the names of make-believe animals. All children completed two separate experimental blocks: a baseline block in which participants heard each nonword once and repeated it, and a test block in which the perceptual input frequency of each nonword varied between 1 and 10. Half of the participants completed a baseline-test order; half completed a test-baseline order. Results: Greater accuracy was observed for nonwords produced in the second experimental block, reflecting a production practice effect. Perceptual learning resulted in greater accuracy during the test for nonwords that participants heard 3 or more times. However, perceptual learning did not carry over to baseline productions in the test-baseline design, suggesting that it reflects a kind of temporary priming. Finally, a post hoc analysis suggested that the size of the production practice effect depended on the age of acquisition of the consonants that comprised the nonwords. Conclusions: The study provides new details about how perceptual learning and production practice interact with each other and with phonological aspects of the nonwords, resulting in complex effects on speech accuracy and learning of form-referent pairs. These findings may ultimately help speech-language pathologists maximize their clients’ improvement in therapy.
ARTICLE | doi:10.20944/preprints201910.0259.v1
Subject: Chemistry And Materials Science, Metals, Alloys And Metallurgy Keywords: nanoindentation; pop-in; crystal plasticity; hardness; avalanches; noise; face-centered cubic
Online: 22 October 2019 (15:32:12 CEST)
We present a high-throughput nanoindentation study of in-situ bending effects on incipient plastic deformation behavior of polycrystalline and single-crystalline pure aluminum and pure copper at ultra-nano depths (<200 nm). We find that hardness displays a statistically inverse dependence on in-plane stress for indentation depths smaller than 10 nm, and the dependence disappears for larger indentation depths. In addition, plastic noise in the nanoindentation force and displacement displays statistically robust noise features, independently of applied stresses. Our experimental results suggest the existence of a regime in FCC crystals where ultra-nano hardness is sensitive to residual applied stresses, but plasticity pop-in noise is insensitive to it.
ARTICLE | doi:10.20944/preprints202306.0223.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Voice Cloning; Speech Synthesis; Speech Quality Evaluation
Online: 5 June 2023 (02:27:49 CEST)
Voice cloning, an emerging field in the speech processing area, aims to generate synthetic utterances that closely resemble the voices of specific individuals. In this study, we investigate the impact of various techniques on improving the quality of voice cloning, specifically focusing on a low-quality dataset. To contrast our findings, we also use two high-quality corpora for comparative analysis. We conduct exhaustive evaluations of the quality of the gathered corpora in order to select the most suitable audios for the training of a Voice Cloning system. Following these measurements, we conduct a series of ablations by removing audios with lower SNR and higher variability in utterance speed from the corpora in order to decrease their heterogeneity. Furthermore, we introduce a novel algorithm that calculates the fraction of aligned input characters by exploiting the attention matrix of the Tacotron 2 Text-to-Speech (TTS) system. This algorithm provides a valuable metric for evaluating the alignment quality during the voice cloning process. We present the results of our experiments, demonstrating that the performed ablations significantly increase the quality of synthesised audios for the challenging low-quality corpus. Notably, our findings indicate that models trained on a 3-hour corpus from a pre-trained model exhibit comparable audio quality to models trained from scratch using significantly larger amounts of data.
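The abstract does not spell out how the fraction of aligned input characters is computed from the Tacotron 2 attention matrix; one plausible instantiation, sketched under our own assumptions (a per-character max-attention threshold; the function name and threshold are ours), is:

```python
import numpy as np

def aligned_fraction(attention, threshold=0.5):
    """Fraction of input characters that receive a dominant attention
    weight (> threshold) at some decoder step."""
    attention = np.asarray(attention)        # shape: (decoder_steps, num_chars)
    attended = attention.max(axis=0) > threshold
    return float(attended.mean())

A = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.3]])   # third character never dominates
print(aligned_fraction(A))  # ≈ 0.667
```

A low fraction flags utterances where the TTS attention skipped or smeared over parts of the input text, which the authors use as an alignment-quality metric.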
REVIEW | doi:10.20944/preprints201805.0096.v1
Subject: Chemistry And Materials Science, Polymers And Plastics Keywords: metamaterials; aviation noise; aeroacoustics; noise absorption; noise reflection; noise trapping; acoustic cloaking
Online: 4 May 2018 (15:05:21 CEST)
Metamaterials, man-made composites scaled smaller than the wavelength, have demonstrated a huge potential in their applications in acoustics, opening up for sub-wavelength acoustic absorbers, acoustic invisibility, perfect acoustic mirrors and acoustic lenses for hyper focusing, acoustic illusions and enabling new degrees of freedom in the control of the acoustic field. The zero, or even negative, refractive sound index of metamaterials offers possibilities in control of the acoustic pattern and sound at sub-wavelength scales. Despite the tremendous growth of the research on acoustic metamaterials during the last decade, the potential of metamaterial-based technologies in aeronautics is still not fully explored and its utilization is still in its infancy. Thus the principal concepts mentioned above could very well provide means to develop devices that would allow the mitigation of the impact of the civil aviation noise on the community. This paper gives a review of the state of the art of the most relevant works on acoustic metamaterials, analyzing them against their potential applicability in aeronautics, and in this process identifying possible implementation areas and interesting metabehaviors. It also identifies some technical challenges and possible future directions for research with the goal of unveiling the potential of metamaterials technologies in aeronautics.
ARTICLE | doi:10.20944/preprints202106.0687.v1
Subject: Physical Sciences, Acoustics Keywords: automatic speech recognition (ASR); automatic assessment tools; foreign language pronunciation; pronunciation training; computer-assisted pronunciation training (CAPT); automatic pronunciation assessment; learning environments; minimal pairs
Online: 29 June 2021 (07:31:41 CEST)
General-purpose automatic speech recognition (ASR) systems have improved in quality and are being used for pronunciation assessment. However, the assessment of isolated short utterances, such as words in minimal pairs for segmental approaches, remains an important challenge, even more so for non-native speakers. In this work, we compare the performance of our own tailored ASR system (kASR) with that of Google ASR (gASR) for the assessment of Spanish minimal-pair words produced by 33 native Japanese speakers in a computer-assisted pronunciation training (CAPT) scenario. Participants in a pre/post-test training experiment spanning four weeks were split into three groups: experimental, in-classroom, and placebo. The experimental group used the CAPT tool described in the paper, which we specially designed for autonomous pronunciation training. Statistically significant improvement is revealed for the experimental and in-classroom groups, and moderate correlation values between gASR and kASR results were obtained, besides strong correlations between the post-test scores of both ASR systems and the CAPT application scores at the final stages of application use. These results suggest that both ASR alternatives, in their current configuration, are valid for assessing minimal pairs in CAPT tools. A discussion of possible ways to improve our system and possibilities for future research is included.
ARTICLE | doi:10.20944/preprints201909.0094.v1
Subject: Medicine And Pharmacology, Other Keywords: sleep quality; road traffic noise; actimetry; indoor noise; noise measurements; noise annoyance; noise sensitivity; time of day
Online: 9 September 2019 (08:45:43 CEST)
It is unclear which noise exposure time window and which noise characteristics during nighttime are most detrimental for sleep quality in real-life settings. We conducted a field study with 105 volunteers wearing a wrist actimeter to record their sleep for seven days, together with concurrent outdoor noise measurements at their bedroom window. Actimetry-recorded sleep latency increased by 5.6 minutes (95% confidence interval: 1.6 to 9.6 minutes) per 10 dB(A) increase in noise exposure during the first hour after bedtime. Actimetry-assessed sleep efficiency was significantly reduced by 2-3 percent per 10 dB(A) increase in measured outdoor noise (Leq,1h) for the last three hours of sleep. For subjectively reported sleepiness, noise exposure during the last hour prior to wake-up was most crucial, with an increase in the sleepiness score of 0.31 units (95% CI: 0.08 to 0.54) per 10 dB(A) Leq,1h. Associations for estimated indoor noise were not more pronounced than for outdoor noise. Considering noise events in addition to equivalent sound pressure levels (Leq) only marginally improved the statistical models. Our study provides evidence that matching the nighttime noise exposure time window to the individual's diurnal sleep-wake pattern results in a better estimate of detrimental nighttime noise effects on sleep. We found that noise exposure at the beginning and the end of sleep is most crucial for sleep quality.
ARTICLE | doi:10.20944/preprints201807.0106.v1
Subject: Social Sciences, Cognitive Science Keywords: auditory-visual speech perception; bipolar disorder; speech perception
Online: 6 July 2018 (05:21:19 CEST)
The focus of this study was to investigate how individuals with bipolar disorder integrate auditory and visual speech information compared to non-disordered individuals, and whether auditory-visual speech integration differs between the manic and depressive episodes of bipolar disorder. It was hypothesized that the bipolar groups' auditory-visual speech integration would be less robust than the control group's. Further, it was predicted that those in the manic phase of bipolar disorder would integrate visual speech information more than their depressive-phase counterparts. To examine these hypotheses, the McGurk effect paradigm was used with typical auditory-visual (AV) speech as well as auditory-only (AO) and visual-only (VO) stimuli. Results showed that the disordered and non-disordered groups did not differ on auditory-visual (AV) integration or auditory-only (AO) speech perception, but did differ on visual-only (VO) stimuli. These results pave the way for further research in which behavioural and physiological data are collected simultaneously, which will allow us to understand the full dynamics of how auditory and visual speech information (the latter relatively impoverished in bipolar disorder) are actually integrated in people with bipolar disorder.
ARTICLE | doi:10.20944/preprints202201.0184.v2
Subject: Environmental And Earth Sciences, Environmental Science Keywords: wind turbine; noise annoyance; fear; worry; noise sensitivity; noise management
Online: 8 June 2022 (12:31:13 CEST)
Wind energy in Europe is expected to grow at a steady, high pace, but opposition from residents to local wind farm plans is one of the obstacles to further growth. A large body of evidence shows that local populations want to be involved and respected for their concerns, but in practice this is a complex process that cannot be solved with simple measures such as financial compensation. The visual presence and the acoustic impact of a wind farm are important concerns for residents. Generally, environmental noise management aims to reduce the exposure of the population, usually based on acoustics and restricted to a limited number of sources (such as transportation or industry) and sound descriptors (such as Lden). Individual perceptions are taken into account only at an aggregate, statistical level (such as the percentage of exposed, annoyed or sleep-disturbed persons in the population). Individual perceptions of and reactions to sound vary in intensity and over different dimensions (such as pleasure/fear or distraction). Sound level is in fact a weak predictor of the perceived health effects of sound; the positive or negative perception of and attitude to the source of the sound is a better predictor of its effects. This article aims to show how the two perspectives (based on acoustics and on perception) can lead to a combined approach to the management of a wind farm aimed at reducing annoyance, not primarily sound level. An important aspect in this approach is what the sound means to people: is it associated with the experience of having no say in plans, does it lead to anxiety or worry, is it appropriate? The available knowledge will be applied to wind farm management: planning as well as operation.
ARTICLE | doi:10.20944/preprints202107.0376.v1
Subject: Engineering, Automotive Engineering Keywords: power substation; transformer noise; low-frequency noise; noise masking; soundscape
Online: 16 July 2021 (14:33:26 CEST)
Low-frequency audible noise generated by the magnetostriction effect inherent to the operation of power transformers has become a major drawback, especially where the electrical substation is located in urban areas subject to strict environmental regulations that impose sound pressure limits, differing for day and night periods. Such regulations apply a +5 dB penalty if a tonal noise component is present, which is clearly the case for magnetostriction noise, typically concentrated at twice the industrial frequency (50 Hz or 60 Hz, depending on the country). The strategy used to eliminate the tonal characteristics, thereby helping to establish compliance with the applicable regulation and to alleviate the discomfort caused to the human ear, consisted of superimposing on the substation noise a masking sound synthesized from "sounds of nature" at suitable intensities, flattening the noise spectrum while enhancing the soundscape. The masking system (heavy-duty speakers driven by a microprocessor platform) was validated in an urban scenario already under litigation. Measurement results confirmed that the masking solution was capable of flattening the tonal frequencies, and this beneficial effect led to the dismissal of the public civil action filed by the neighbors. The proposed solution is ready to be replicated in other scenarios.
ARTICLE | doi:10.20944/preprints201807.0588.v1
Subject: Public Health And Healthcare, Nursing Keywords: Noise; Noise Levels; Noise Measurement; Medical Intensive Care Units; Nursing
Online: 30 July 2018 (12:05:35 CEST)
This study was undertaken to investigate and analyze noise pollution in the medical intensive care unit (MICU) of a large Chinese governmental hospital and compare it to the WHO guidelines. This cross-sectional study was conducted in a MICU at a public governmental teaching hospital in Fujian province between July and August of 2017. A WENSN® WS1361 integrated sound level meter (China) was used to record noise levels continuously, every five seconds, for one week. After this measurement, the decibel meter was used to record levels at different locations in isolation rooms and open bays, both occupied and unoccupied, and to record sound events occurring in the ICU to identify noise sources. Peak and average noise levels were obtained from the meter, and data were downloaded from the WS1361 to a laptop computer. The measured mean equivalent sound pressure level (LAeq) and standard deviation over the one-week period were 66.64±7.57 dB(A), with acute spikes reaching 119.7 dB(A); the average sound level over a 24-hour period on a work day was 68.03±5.07 dB(A). These values are higher than the current daytime environmental noise limits of 40-45 decibels recommended in China and by the WHO. Mean work-day noise was significantly louder than weekend noise (t=16.85; P=0.000), and there was a statistical difference between day-time and night-time shifts (t=34.67; P=0.000). The isolation rooms were significantly quieter than the open-bay rooms (t=46.15; P=0.000), and sound levels in occupied and unoccupied rooms also differed significantly (t=17.26; P=0.000). Two types of noise resources, comprising twenty kinds of sources, were identified and measured, with mean noise levels ranging from 61.33 to 79.21 dB(A). This study shows that noise levels in the intensive care unit exceeded the recommended limits. Further study of the influence of noise on patients and staff is needed, and noise-reduction strategies must be implemented in the ICU.
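The equivalent sound pressure level LAeq reported above is an energy average, not an arithmetic one: decibel readings are converted to linear energy, averaged, and converted back. A short sketch (A-weighting of the individual readings is assumed to have happened upstream in the meter):

```python
import math

def leq(levels_db):
    """Equivalent continuous sound level: energy average of dB readings."""
    mean_energy = sum(10 ** (l / 10) for l in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

# Averaging 60 dB and 70 dB gives ~67.4 dB, not 65 dB: loud events dominate
print(round(leq([60, 70]), 1))  # 67.4
```

This is why acute spikes such as the 119.7 dB(A) events pull the weekly mean upward far more than an arithmetic average would suggest.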
ARTICLE | doi:10.20944/preprints202210.0480.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Speech Recognition; Automatic Speech Recognition; Language Identification; Wav2Vec2; Multilingual
Online: 31 October 2022 (10:06:34 CET)
This paper documents the development of a special case of a multilingual Automatic Speech Recognition model, specifically tailored to serve the two languages spoken by the majority of Latin America: Portuguese and Spanish. The bilingual model combines Language Identification and Speech Recognition, is built on the Wav2Vec2.0 architecture, and is trained on several open and private speech datasets. In this model, the feature encoder is trained jointly for all tasks, while different context encoders are trained for each task. The model is evaluated separately on two tasks: language identification and speech recognition. The results indicate that the model achieves good performance on speech recognition and average performance on language identification, despite training on a small quantity of speech material. The average accuracy of the language identification module on the MLS dataset is 66.75%. The average Word Error Rate in the same scenario is 13.89%, better than the average 22.58% achieved by the commercial speech recognizer developed by Google.
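The Word Error Rate quoted above is the word-level Levenshtein distance between hypothesis and reference, normalized by the reference length. A self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("el gato negro", "el gato blanco"))  # one substitution → 0.333...
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is an error rate rather than an accuracy.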
ARTICLE | doi:10.20944/preprints201910.0336.v1
Subject: Public Health And Healthcare, Other Keywords: noise of baseball stadium; recreational noise exposure; survey of noise exposure; noise-induced hearing loss
Online: 29 October 2019 (10:52:24 CET)
This study measures the noise levels in a baseball stadium and analyzes baseball fans' attitudes toward the effect of recreational noise exposure on their hearing. In the stadium, noise levels were measured in four seating sections using a sound level meter during games. The average LAeq of the 16 measurements was 91.7 dBA, with significantly higher noise levels in the red and navy sections. LZeq analysis by frequency showed that noise levels were significantly higher at low frequencies than at other frequencies. For the survey sample, 688 randomly selected participants completed a 16-question survey on their noise exposure during the game and on the potential risk of hearing loss. Despite the very high noise levels, 70% of the respondents preferred sitting in either the red or the navy section to be closer to the cheerleaders and to obtain a good view. Most respondents reported that they did not consider wearing earplugs, and one-third experienced hearing muffled speech after the game. We conclude that the noise levels in baseball stadiums are high enough to cause later hearing damage and/or tinnitus, and we expect these results to improve public education regarding safe noise exposure during popular sports activities.
ARTICLE | doi:10.20944/preprints201802.0071.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: narrowband PLC; impulsive noise; noise modeling
Online: 8 February 2018 (15:37:40 CET)
Narrowband power line communication (PLC) is currently considered an attractive communication system in smart grid environments for applications such as advanced metering infrastructure (AMI). In this paper, we present a comprehensive comparison and analysis, in the time and frequency domains, of noise measured in China and Italy. In particular, the impulsive noise in these two countries is analyzed and modeled using two probability-based models: the Middleton Class A (MCA) model and the α-stable distribution model. The results show that the noise measured in China is rich in impulsive noise and can be modeled well by the α-stable distribution model, while the noise measured in Italy has less impulsive noise and is better modeled by the MCA model.
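Middleton Class A noise, one of the two models above, is a Poisson-driven mixture of Gaussians: each sample's variance depends on how many impulsive sources are simultaneously active. A NumPy sampling sketch with illustrative parameter values (A, gamma, sigma2 are ours, not fitted to the measured data):

```python
import numpy as np

def middleton_class_a(n, A=0.1, gamma=0.01, sigma2=1.0, rng=None):
    """Sample Middleton Class A noise: Gaussian background plus
    Poisson-driven impulsive bursts. A is the impulsive index,
    gamma the Gaussian-to-impulsive power ratio."""
    rng = rng if rng is not None else np.random.default_rng(0)
    m = rng.poisson(A, size=n)                      # impulses active per sample
    var = sigma2 * (m / A + gamma) / (1 + gamma)    # conditional variance
    return rng.normal(0.0, np.sqrt(var))

x = middleton_class_a(10000)
print(x.shape)  # (10000,)
```

Smaller A makes impulses rarer but individually stronger, which is the regime where heavy-tailed alternatives such as the α-stable model can fit better.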
ARTICLE | doi:10.20944/preprints201608.0227.v1
Subject: Engineering, Civil Engineering Keywords: railway noise; railway vibration; squeal noise vibration; screeching noise vibration; impact noise vibration; abatement; mitigation; life cycle analysis
Online: 29 August 2016 (12:39:40 CEST)
In recent years, the railway industry has focused on researching and developing methods to mitigate the noise and vibration produced by wheel/rail contact along track infrastructure. This work has yielded a wide range of abatement measures now available to industry professionals. Although many options are on the market, their practical implementation depends on the general constraints that affect most technological applications in engineering. The progression of these technologies has made it easier to select the most adequate method for each scenario, but further studies are needed to properly assess whether each one is fit for purpose. Every implementation must be analyzed against budget and timeframe limitations, including construction, maintenance, and inspection costs and time allocation, while also aiming at benefits such as environmental impact control and reduced wear of the infrastructure. Many situations and facilities in a railway project design require noise and vibration mitigation, and each design assigns them different priorities. Traditionally, the disturbance railways cause to the community is generated by wheel/rail contact sound radiation, which manifests in different ways depending on the movement of the rolling stock and the track alignment, such as rolling noise, impact noise, and curve noise. More specifically, in special trackworks such as turnouts, the main area of this study, two noise types must be evaluated: impact noise and screeching noise. The latter is similar to curve squeal, so its mitigation methods are assigned as if abating curve squeal in turnouts and crossings. Impact noise, on the other hand, arises from the rolling stock moving through the joints and discontinuities (i.e. gaps) that compose these special components of a railway track. A life cycle analysis is therefore essential and is applied here to squeal and impact noise on special trackwork. The evaluation is based on a literature review, and total costs were taken from industry reports for coherence. A life cycle analysis typically covers 50 years, so that period was assumed. As for the general parameters, a densely populated area was considered, to estimate values for a community with very strict limits on noise and vibration.
ARTICLE | doi:10.20944/preprints202309.0497.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Arabic Hate Speech; Natural Language Processing (NLP); Machine Learning; Arabic Hate Speech Detection; Arabic Hate Speech Corpus
Online: 7 September 2023 (07:14:15 CEST)
Hate speech detection in Arabic presents a multifaceted challenge due to its broad and diverse linguistic terrain. With multiple dialects and rich cultural subtleties, Arabic requires tailored measures to successfully address hate speech online. To address this issue, academics and developers have used natural language processing (NLP) methods and machine learning algorithms adapted to the complexities of Arabic text. However, many proposed methods have been hampered by the lack of a comprehensive dataset/corpus of Arabic hate speech. In this research, we propose a novel multi-class public Arabic dataset comprising 403,688 annotated tweets categorized as extremely positive, positive, neutral, or negative based on the presence of hate speech. Using this dataset, we additionally characterize the performance of multiple machine learning models for hate speech identification in Arabic Jordanian dialect tweets. Specifically, the Word2Vec, TF-IDF, and AraBert text representation models were applied to produce word vectors that feed the classification models. Seven machine learning classifiers were then evaluated: Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), AdaBoost (Ada), XGBoost (XGB), and CatBoost (CatB). The experimental evaluation revealed that, in this challenging and unstructured setting, our gathered and annotated datasets were rather efficient and produced encouraging results, enabling academics to delve further into this crucial field of study.
ARTICLE | doi:10.20944/preprints202107.0322.v3
Subject: Physical Sciences, Condensed Matter Physics Keywords: fluctuations; noise spectra; longitudinal and transverse electric fields; Nyquist noise; photon number noise
Online: 17 January 2022 (09:04:34 CET)
We derive the thermal noise spectrum of the Fourier transform of the electric field operator for a given wave vector, starting from the quantum-statistical definitions, and relate it to the frequency- and wave-vector-dependent complex conductivity in a homogeneous, isotropic system of electromagnetically interacting electrons. We analyze the longitudinal and transverse cases separately, with their peculiarities. The Nyquist formula for vanishing frequency and wave vector, as well as its modification for non-vanishing frequencies and wave vectors, follows immediately. Furthermore, we also discuss the noise of the photon occupation numbers. We stress that no additional assumptions at all were used in this straightforward proof.
ARTICLE | doi:10.20944/preprints201708.0035.v1
Subject: Environmental And Earth Sciences, Other Keywords: noise measurement; road traffic noise; neighborhood noise; informal settings; developing country; South Africa.
Online: 9 August 2017 (06:03:00 CEST)
In developing countries, noise exposure and its negative health effects have been little explored. The present study aimed to assess the noise exposure of adults living in informal settings in the Western Cape Province, South Africa. We conducted continuous one-week outdoor noise measurements at 134 homes in four different areas. These data were used to develop a land use regression (LUR) model to predict the A-weighted day-evening-night equivalent sound level (Lden) from geographic information system (GIS) variables. Mean noise exposure was 60.0 A-weighted decibels (dB(A)) during the day (6:00-18:00; interquartile range 56.9-62.9 dB(A)) and 52.9 dB(A) at night (22:00-6:00; 49.3-55.8 dB(A)), and the average Lden was 63.0 dB(A) (60.1-66.5 dB(A)). The main predictors of the LUR model were related to road traffic and household density. Model performance was low (adjusted R2 = 0.130), suggesting that influences other than those represented by the geographic predictors are relevant to noise exposure. This is one of the few studies on noise exposure in low- and middle-income countries, and it demonstrates that noise exposure levels are high in these settings.
ARTICLE | doi:10.20944/preprints202103.0513.v1
Subject: Engineering, Automotive Engineering Keywords: Automatic Voice Query Service; Automatic Speech Recognition; Multi-Accented Mandarin Speech Recognition
Online: 22 March 2021 (10:55:53 CET)
An automatic voice query service (AVQS) can greatly reduce labor costs and improve response efficiency for users. Automatic speech recognition (ASR) is one of the important components of an AVQS. However, the many dialect areas in China force the AVQS to serve multi-accented Mandarin users with a single acoustic model in the ASR, which severely limits the accuracy of ASR for multi-accented speech. In this paper, a new AVQS framework is proposed to improve response accuracy. First, a fusion feature combining iVector and filterbank acoustic features is used to train a Transformer-CTC model. Second, the Transformer-CTC model is used to construct the end-to-end ASR. Finally, a keyword matching algorithm for the AVQS based on fuzzy mathematical theory is proposed to further improve response accuracy. The results show that the final accuracy of the proposed AVQS framework reaches 91.5%. The proposed framework can satisfy the service requirements of different areas in mainland China. This research is of great significance for exploring the application value of artificial intelligence in real-world scenarios.
Subject: Engineering, Electrical And Electronic Engineering Keywords: thermal noise; negative feedback; low-noise resistors; theory; design
Online: 31 May 2020 (19:02:49 CEST)
The concept of using special electrical circuit design to realize a "cold resistor", that is, an active resistor circuit with a lowered effective noise temperature, was first introduced about 80 years ago. Later on, various kinds of artificial resistors were applied in different research areas, such as gravitational wave detection, photo-amplifiers, and quartz oscillators, and their proofs of concept were experimentally validated. Unfortunately, a complete theory was still missing, even though several attempts had been published, sometimes with errors. In this paper, we describe a correct and complete circuit-theoretical model of a cold resistor system. The results are confirmed by computer simulations. A design tool for this circuit is also presented.
REVIEW | doi:10.20944/preprints201910.0078.v1
Subject: Engineering, Mechanical Engineering Keywords: drones; aerodynamics; aeroacoustics; rotor noise; airframe noise; porous material
Online: 8 October 2019 (06:11:47 CEST)
In the last decade, the drone market has grown rapidly for both civil and military purposes. Due to their versatility, demand for drones is constantly increasing, with several industrial players joining the venture to transfer urban mobility to the air. This has exacerbated the problem of noise pollution, mainly due to the relatively low altitude of these vehicles and the proximity of their routes to extremely densely populated areas. In particular, the aerodynamic and aeroacoustic optimization of the propulsive system, and of its interaction with the airframe, are key aspects of the design of aerial vehicles, determining the success or failure of their mission. The industrial challenge is to find the best performance in terms of loading, efficiency, and weight and, at the same time, the most silent configuration. For this reason, research has focused first on localizing the noise sources and then, in further analysis, on the noise generation mechanisms, with particular attention to directivity and scattering. The aim of the present study is to review the noise source mechanisms and the state-of-the-art technologies available in the literature for noise suppression, focusing especially on the low-Reynolds-number fluid dynamics of the propulsive system and on the interaction of the propulsive-system flow with the airframe.
ARTICLE | doi:10.20944/preprints202006.0117.v1
Subject: Medicine And Pharmacology, Other Keywords: Image Noise Removal; Image Enhancement; MFNR; Speckle noise; Median Filter
Online: 9 June 2020 (05:00:26 CEST)
Speckle noise is one of the most difficult noises to remove, especially in medical applications. It is a nuisance in ultrasound imaging systems, which are used in about half of all medical screening systems. Noise removal is thus an important step toward reliable, automated, and potentially low-cost systems. Herein, a generalized approach, MFNR (Multi-Frame Noise Removal), is used: a complete noise removal system based on kernel density estimation (KDE). Any given type of noise can be removed if its probability density function (PDF) is known, and we extract the PDF parameters using KDE. Noise removal and detail preservation are not at odds with each other, as they are in single-frame noise removal methods. Our results show practically complete noise removal with the MFNR algorithm compared to standard noise removal tools, using peak signal-to-noise ratio (PSNR) as the comparison metric. This paper extends our previous work, in which the MFNR algorithm was shown to be a general-purpose, complete noise removal tool for all types of noise.
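The multi-frame idea behind KDE-based noise estimation can be sketched in a few lines: average repeated frames of a static scene to approximate the clean image, then estimate the noise PDF from the pooled residuals. The following is an illustrative sketch using SciPy's `gaussian_kde`, not the authors' MFNR implementation; the frame count and noise level in the toy check are assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

def noise_pdf_from_frames(frames):
    """Estimate the noise PDF from repeated frames of a static scene.

    The per-pixel mean across frames approximates the clean image; the
    residuals pool into one large sample from the noise distribution,
    whose density is then estimated with a Gaussian KDE.
    """
    frames = np.asarray(frames, dtype=float)
    clean = frames.mean(axis=0)           # multi-frame average
    residuals = (frames - clean).ravel()  # pooled noise samples
    return gaussian_kde(residuals)

# toy check: Gaussian noise of std 2 around a flat image of value 10
rng = np.random.default_rng(0)
frames = 10.0 + rng.normal(0.0, 2.0, size=(20, 64, 64))
kde = noise_pdf_from_frames(frames)
```

With the noise PDF in hand, one can match it to a known family (Gaussian, Rayleigh for speckle envelopes, etc.) and choose a removal strategy accordingly.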
ARTICLE | doi:10.20944/preprints202301.0008.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Secondary emotions; emotional speech synthesis; fundamental frequency contour; Fujisaki model; low-resource; empathetic speech
Online: 3 January 2023 (07:29:37 CET)
A low-resource emotional speech synthesis system for empathetic speech, based on modelling prosody features, is presented here. Secondary emotions, identified as needed for empathetic speech, are modelled and synthesised in this paper. As secondary emotions are subtle in nature, they are more difficult to model than primary emotions. They are also less explored, and this is one of the few studies to model secondary emotions in speech. Current speech synthesis research uses large databases and deep learning techniques to develop emotion models. There are many secondary emotions, and hence developing a large database for each of them is expensive. This research presents a proof of concept using hand-crafted feature extraction and modelling of these features with a low-resource-intensive machine learning approach, thus creating synthetic speech with secondary emotions. A quantitative model-based transformation is used to shape the fundamental frequency contour of the emotional speech, while speech rate and mean intensity are modelled via rule-based approaches. Using these models, an emotional text-to-speech synthesis system is developed that synthesises five secondary emotions: anxious, apologetic, confident, enthusiastic, and worried. A perception test to evaluate the synthesised emotional speech is also conducted.
REVIEW | doi:10.20944/preprints201912.0082.v1
Subject: Public Health And Healthcare, Public, Environmental And Occupational Health Keywords: noise; noise induced hearing loss; noise apps; weather stressors; psychological stressors; tractor safety; seatbelt use; dust; air quality
Online: 6 December 2019 (11:37:43 CET)
Numerous hazards are found on farms. Most are ignored, and farmers may pay for this later in ill health, injury, or death. The current article discusses some of the common issues: dust and air quality concerns; environmental (weather) and psychological stressors; noise and hearing protection; and tractor safety and seatbelt use. Finally, recommendations for overcoming these hazards are discussed.
ARTICLE | doi:10.20944/preprints202311.1456.v1
Subject: Engineering, Automotive Engineering Keywords: booming noise; electric vehicle; tailgate; guide bumper; inner panel; noise reduction
Online: 23 November 2023 (08:18:44 CET)
This article investigates the source of booming noise emanating from the tailgates of electric vehicles, along with strategies to mitigate it. Booming noises were measured during on-road vehicle tests to pinpoint their origins. Additionally, operational deflection shapes (ODS) were extracted from the tailgate vibration signals to gain insight into the tailgate's dynamic behavior. Modal tests were conducted on the tailgate to determine its dynamic characteristics and were compared with driving test results to reveal the mechanism responsible for tailgate-induced booming noise. Such noise was established to be primarily due to tailgate modes resulting from a combination of rigid-body motion in the fore-aft direction and deformation in the central section of the panel. An analytical model of the tailgate was developed using commercial finite element analysis software to propose measures for reducing booming noise, and experimental findings validated the model's accuracy. Structural enhancements were implemented to increase the panel stiffness and improve the connection between the vehicle and the tailgate via bushings, damping the booming noise resulting from tailgate motion. Under random force inputs, analytical results demonstrated a 13.8% reduction in maximum deformation of the tailgate model in the improved structural configuration with increased panel stiffness.
ARTICLE | doi:10.20944/preprints202304.0575.v3
Subject: Engineering, Bioengineering Keywords: Inner Speech; Imagined Speech; EEG Decoding; Brain-Computer Interface; BCI; LSTM; Wavelet Scattering Transformation; WST.
Online: 15 May 2023 (05:43:54 CEST)
In this paper, we propose imagined-speech brain wave pattern recognition using deep learning. Multiple features were extracted concurrently from eight-channel electroencephalography (EEG) signals. To obtain classifiable EEG data with fewer sensors, we placed the EEG sensors on carefully selected spots on the scalp. To reduce the dimensionality and complexity of the EEG dataset and to avoid overfitting during training, we utilized the wavelet scattering transformation, which extracts the most stable features by passing the EEG dataset through a series of filtration processes, applied to each individual command. A low-cost 8-channel EEG headset was used with MATLAB 2023a to acquire the EEG data. A Long Short-Term Memory recurrent neural network (LSTM-RNN) was used to decode the identified EEG signals into four audio commands: Up, Down, Left, and Right. The proposed imagined-speech brain wave pattern recognition approach achieved a 92.50% overall classification accuracy, which is promising for designing trustworthy imagined-speech brain-computer interface (BCI) real-time systems. For a fuller evaluation of classification performance, other metrics were also considered: precision, recall, and F1-score were 92.74%, 92.50%, and 92.62%, respectively.
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Gaussian noise; variance estimation.
Online: 5 August 2021 (15:24:18 CEST)
This article describes an algorithm for estimating the variance of Gaussian noise. The data are first smoothed with a Savitzky-Golay polynomial filter. The absolute differences between the original and smoothed data are then sorted in ascending order, and the initial part of this sequence is selected for analysis. The mean of the selected differences can be used to estimate the noise variance. By selecting points in this way, the impact of cosmic-ray noise and other artifacts is reduced. Applying the proposed method to artificial and real spectra shows that it effectively estimates the noise variance, and the algorithm contains no user-defined parameters.
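The algorithm is compact enough to sketch directly. The version below is an illustrative reimplementation, not the authors' code: it uses SciPy's `savgol_filter`, the window, polynomial order, and kept fraction are assumed defaults rather than values from the paper, and the constant 3.08 (which maps the mean of the lower half of |N(0, sigma^2)| back to sigma) is our own calibration:

```python
import numpy as np
from scipy.signal import savgol_filter

def estimate_noise_sigma(y, window=11, polyorder=3, keep=0.5, calib=3.08):
    """Estimate the Gaussian noise level (std) of a smooth signal.

    Smooth with a Savitzky-Golay filter, sort the absolute residuals in
    ascending order, keep only the initial (smallest) part to suppress
    spikes such as cosmic-ray artifacts, and rescale the mean of the
    kept part. Smoothing leaves the estimate slightly biased low.
    """
    y = np.asarray(y, dtype=float)
    smoothed = savgol_filter(y, window_length=window, polyorder=polyorder)
    residuals = np.sort(np.abs(y - smoothed))
    kept = residuals[: max(1, int(len(residuals) * keep))]
    return calib * kept.mean()

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 2000)
noisy = np.sin(x) + rng.normal(0.0, 0.1, x.size)
print(estimate_noise_sigma(noisy))  # close to the true sigma of 0.1
```

Discarding the upper part of the sorted residuals is what makes the estimator robust: isolated spikes land in the tail of the sorted sequence and never enter the mean.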
ARTICLE | doi:10.20944/preprints202103.0587.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: UOWC; scattering; scattering noise
Online: 24 March 2021 (13:11:37 CET)
In underwater optical wireless communication (UOWC), scattering of the propagating light beam causes both intensity and phase variations, which limit the transmission link range and channel bandwidth, respectively. The scattering of photons propagating through the channel is a random process, which results in channel-dependent scattering noise. In this work, we introduce for the first time an analytical model for this noise and investigate its effect on the bit error rate of a UOWC system for three water types and a range of transmission link spans. We show that, over a short range in turbid water or a longer range in clear water, the number of photons experiencing scattering is high, leading to increased scattering noise.
CONCEPT PAPER | doi:10.20944/preprints202108.0194.v1
Subject: Social Sciences, Sociology Keywords: congruence; voice; speech; communication; identity; personality
Online: 9 August 2021 (12:41:06 CEST)
Purpose: We present a theoretical framework that formalizes and defines the constructs of communicative congruence and communicative dysphoria that is rooted within a comprehensive and mechanistic theory of personality. Background: Voice therapists have likely encountered a patient who states that a therapeutic target voice “isn’t me.” The ability to accurately convey a person’s sense of self, or identity, through their voice, speech, and communication behaviors seems to have high relevance to both patients and clinicians alike. However, to date, we lack a mechanistic theoretical framework through which to understand and interrogate the phenomenon of congruence between one’s communication behaviors and their sense of self. Results: We review the initial notion of congruence, first proposed by Carl Rogers. We then review several theories on selfhood, identity, and personality. After reviewing these theories, we explain how our proposed constructs fit within our chosen theory, the Cybernetic Big Five Theory of Personality. We then discuss similarities and differences to a similarly named construct, the Vocal Congruence Scale. Next, we review how these constructs may come to bear on an existing theory relevant to voice therapy, the Trans Theoretical Model of Health Behavior Change. Finally, we state testable hypotheses for future exploration, which we hope will establish a foundation for future investigations into communicative congruence. Conclusion: To our knowledge, the present paper is the first to explicitly define communicative congruence and communicative dysphoria. We embed these constructs within a comprehensive and mechanistic theory of personality and, in doing so, hope to provide a rigorous and comprehensive theoretical framework that will allow us to test and better understand these proposed constructs.
ARTICLE | doi:10.20944/preprints202011.0646.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: social media; hate speech; text classification
Online: 25 November 2020 (14:12:07 CET)
The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but it has also brought risks and harms. Because the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community in automated hate speech detection has increased. In this study, we analyse six publicly available datasets by combining them into a single homogeneous dataset, classified into three classes: abusive, hateful, or neither. We create a baseline model and improve its performance scores using various optimisation techniques. After attaining a competitive performance score, we create a tool which identifies and scores a page with an effective metric in near-real time, using the results as feedback to re-train our model. We demonstrate the competitive performance of our multilingual model on two languages, English and Hindi, with comparable or superior performance to most monolingual models.
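Text classifiers of this kind typically start from a term-weighting representation such as TF-IDF. The pure-Python sketch below illustrates that step only; it is not the authors' pipeline, and the smoothing variant (log(N/df) + 1) and the toy documents are assumptions for illustration:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF vectors for a list of tokenized documents.

    tf is the within-document term frequency; idf uses the common
    log(N/df) + 1 variant so that terms present in every document
    still receive a non-zero weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for doc in docs:
        counts, total = Counter(doc), len(doc)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors

docs = [["hate", "speech", "online"], ["speech", "detection"]]
print(tf_idf(docs)[0]["speech"])  # the shared term gets the lowest idf
```

The resulting sparse vectors would then feed any of the usual classifiers (logistic regression, SVM, gradient boosting) to produce the abusive/hateful/neither labels.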
ARTICLE | doi:10.20944/preprints202310.1690.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Neural Networks; Noise Detection; Noise Filtering; Classroom Recording; Classroom Analysis
Online: 26 October 2023 (09:02:56 CEST)
Audio recording in classrooms is a common practice in educational research, with applications ranging from detecting classroom activities to analysing student behaviour. Previous research has employed neural networks for classroom activity detection and speaker role identification. However, these recordings are often affected by background noise that can hinder further analysis, and prior work has only sought to identify noise with general-purpose filters not specifically designed for classrooms. Although high-end microphones and environmental monitoring can mitigate this problem, such solutions can be costly and potentially disruptive to the natural classroom environment. In this context, we propose a neural network model that specifically detects and filters out background noise in classroom recordings. This model allows the use of lower-quality recordings without compromising analysis capability, facilitating data collection in natural educational environments and reducing the costs associated with high-end recording equipment.
ARTICLE | doi:10.20944/preprints201911.0388.v1
Subject: Public Health And Healthcare, Public, Environmental And Occupational Health Keywords: social noise; auditory, non-auditory noise effects; personal music players; university students
Online: 30 November 2019 (10:07:18 CET)
Purpose: The study aimed to quantify the effects of social noise (personal music players (PMP) and high-intensity noise exposure events) and road traffic noise exposure in a sample of Slovak university students living and studying in Bratislava. Methods: 1,003 university students (306 males and 697 females, average age 23.13±2) were enrolled in the study; 347 lived in a student housing facility exposed to road traffic noise (LAeq = 67.6 dB) and 656 in the control facility (LAeq = 53.4 dB). Respondents completed a validated ICBEN 5-grade noise annoyance questionnaire. PMP exposure was quantified by converting subjective evaluations of volume settings and listening duration. In cooperation with an ENT specialist, we arranged audiometric examinations on a pilot sample of 41 volunteers. Results: Of the total sample of 1,003 students, 794 (79.16%) reported using a PMP in the course of the last week, for an average of 285 minutes. There was a significant difference in PMP use between the exposed (85.59%) and the control group (75.76%) (p=0.01). Among PMP users, 30.7% exceeded the LAV (the lower action value for industry, LAeq,8h = 80 dB). Audiometry on the pilot sample of volunteers (n=41) indicated a hearing threshold shift at higher frequencies in 22% of subjects. Conclusions: The results of this study of young healthy individuals show the importance of exposure to environmental noise from different sources (transportation, neighborhood, construction, entertainment facilities, etc.) as well as social noise, and the need for prevention and intervention.
ARTICLE | doi:10.20944/preprints201806.0326.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: environmental noise monitoring; noise sensing; A-weighting; hardware platform; wireless sensor network
Online: 20 June 2018 (15:59:02 CEST)
Wireless sensor networks can provide a cheap and flexible infrastructure for measuring noise pollution. However, processing the gathered data is challenging on resource-constrained nodes, because each node has its own limited power supply, a low-performance, low-power microcontroller unit, and limited processing resources and memory. We propose a sensor node for monitoring indoor ambient noise. The node is based on a hardware platform with limited computational resources and uses a number of simplifications to approximate a more complex and costly signal processing stage. Furthermore, to reduce both the communication between the sensor node and a sink node and the power consumed by the IEEE 802.15.4 (ZigBee) transceiver, we perform digital A-weighting filtering and a non-calibrated calculation of the sound pressure level on the node. According to experimental results, the proposed sound level meter can accurately measure noise levels of up to 100 dB, with a mean difference of less than 2 dB compared to a Class 1 sound level meter, and the device can continuously monitor indoor noise for several days. Despite the limitations of the hardware platform, the presented node is a promising low-cost and low-power solution for indoor ambient noise monitoring.
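The core on-node computation, an equivalent sound pressure level over a block of samples, reduces to an RMS and a logarithm referenced to 20 µPa. The sketch below is a hypothetical illustration of that step only (A-weighting and calibration, which the node also performs, are omitted here):

```python
import numpy as np

P_REF = 20e-6  # reference pressure in pascals (20 micropascals)

def sound_pressure_level(samples):
    """Equivalent sound pressure level of a block of pressure samples, in dB re 20 uPa."""
    rms = np.sqrt(np.mean(np.square(np.asarray(samples, dtype=float))))
    return 20.0 * np.log10(rms / P_REF)

# a steady 0.02 Pa signal corresponds to 60 dB SPL
print(sound_pressure_level(np.full(1000, 0.02)))  # approximately 60.0
```

On a microcontroller, the same computation is typically done in fixed point with a lookup-table logarithm, which is exactly the kind of simplification the abstract refers to.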
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Gaussian noise; Speckle noise; Mean square error (MSE); Denoising filters; Maximum difference value (MD); Peak signal to noise ratio (PSNR)
Online: 4 June 2020 (05:52:55 CEST)
Noise reduction in medical images is a perplexing undertaking for researchers in digital image processing. Noise causes critical disturbances and degrades the quality of medical images, particularly ultrasound images in biomedical imaging. An image is normally considered a collection of data, and the presence of noise degrades its quality; restoring the original image is therefore vital to extracting maximum information from it. Medical images are degraded by noise during transmission and acquisition. Noise reduces image contrast and resolution, thereby decreasing the diagnostic value of the medical image. This paper focuses mainly on Gaussian, pepper, uniform, salt, and speckle noise. Different filtering techniques can be adopted for noise reduction to improve the visual quality and recoverability of images. Here, four types of noise were applied to medical images, and several filtering methods, including Gaussian, median, mean, and Wiener filters, were applied for noise reduction. Filter performance was estimated using mean square error (MSE), peak signal-to-noise ratio (PSNR), average difference (AD), and maximum difference (MD), with the goal of diminishing the noise without corrupting the medical image data.
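The comparison metrics named above are straightforward to compute. A minimal NumPy sketch (hypothetical helper names; the 255 peak assumes 8-bit images) is:

```python
import numpy as np

def mse(a, b):
    """Mean square error between two images."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.mean(d ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB, for images with the given peak value."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def max_diff(a, b):
    """Maximum absolute pixel difference (MD)."""
    return float(np.max(np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))))

clean = np.zeros((4, 4))
noisy = clean + 16.0
print(round(psnr(clean, noisy), 2))  # 24.05
```

Higher PSNR and lower MSE/MD after filtering indicate better denoising, which is how the filters in the paper are ranked against each other.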
ARTICLE | doi:10.20944/preprints202303.0158.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: speech enhancement; online applicability; real-time factor
Online: 8 March 2023 (15:25:56 CET)
Deep-learning-based speech enhancement techniques have recently attracted growing interest, since their impressive performance can potentially benefit a wide variety of digital voice communication systems. However, this performance has been evaluated mostly in offline audio processing scenarios (i.e. feeding the model, in one go, a complete audio recording, which may extend several seconds). It is of great interest to evaluate and characterize the current state of the art in applications that process audio online (i.e. feeding the model a sequence of audio segments and concatenating the results at the output end). Although evaluations and comparisons between speech enhancement techniques have been carried out before, as far as the author knows, the work presented here is the first to evaluate the performance of such techniques in relation to their online applicability. Specifically, this work measures how the output signal-to-interference ratio (as a separation metric) and the response time and memory usage (as online metrics) are affected by the input length (the size of the audio segments), in addition to the amount of noise, the amount and number of interferences, and the amount of reverberation. Three popular models were evaluated, given their availability in public repositories and their online viability: MetricGAN+, Spectral Feature Mapping with Mimic Loss, and Demucs-Denoiser. The characterization was carried out using a systematic evaluation protocol based on the Speechbrain framework. Several intuitions are presented and discussed, and some recommendations for future work are proposed.
ARTICLE | doi:10.20944/preprints202302.0465.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: CGA-MGAN; Gated Attention Unit; Speech Enhancement
Online: 27 February 2023 (09:24:31 CET)
In recent years, neural networks based on attention mechanisms have been increasingly widely used in speech recognition, separation, enhancement, and other fields. In particular, the convolution-augmented transformer has achieved good performance, as it combines the advantages of convolution and self-attention. Recently, the gated attention unit (GAU) has been proposed; compared with traditional multi-head self-attention, approaches with GAU are effective and computationally efficient. In this article, we propose a network for speech enhancement called CGA-MGAN, a Metric GAN based on convolution-augmented gated attention. CGA-MGAN captures local and global correlations in speech signals simultaneously through the fusion of convolution and gated attention units. Experiments on Voice Bank + DEMAND show that the proposed CGA-MGAN achieves excellent performance (3.47 PESQ, 0.96 STOI, and 11.09 dB SSNR) at a relatively small model size (1.14 M).
ARTICLE | doi:10.20944/preprints202301.0580.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Electronic monitoring; hate speech; data leakage; prediction.
Online: 31 January 2023 (08:59:39 CET)
Technological innovations and the expansion of Internet access have produced significant changes in the configuration of organizations and, consequently, in the relationships between employees and employers. This new scenario generates the need for greater monitoring in the workplace in order to control inappropriate behavior or situations that may lead to harm. Two important problems are the dissemination of hate through networks and data leakage, both of which can have social, psychological, and financial impacts. Monitoring tools can thus be incorporated to assist in surveillance and ensure the achievement of organizational objectives. This paper presents a workplace computer monitoring solution that integrates spyware techniques and text sentiment classification with a distributed microservices architecture, aiming to collect a range of information and generate alerts to managers regarding hate speech and vulnerabilities. Preliminary tests have been conducted to evaluate the performance of the spyware integrated with the prediction models.
ARTICLE | doi:10.20944/preprints202211.0017.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: text-to-speech; naturalness; intelligibility; Brazilian Portuguese
Online: 1 November 2022 (04:37:04 CET)
This paper compares the performance of three text-to-speech (TTS) models released from June 2021 to January 2022 in order to establish a baseline for Brazilian Portuguese. The experimental setup uses the TTS-Portuguese dataset to fine-tune the following TTS models: the VITS end-to-end model, and the GlowTTS and GradTTS acoustic models, both paired with the HiFi-GAN vocoder. Performance metrics are arranged into objective and subjective metrics. As subjective metrics, naturalness and intelligibility are measured based on the mean opinion score (MOS). Results show that the GradTTS + HiFi-GAN model achieved a naturalness of 4.07 MOS, close to the performance of current commercial models.
ARTICLE | doi:10.20944/preprints201712.0058.v1
Subject: Social Sciences, Language And Linguistics Keywords: speech synthesis; evaluation; hesitation; virtual agents; interaction
Online: 11 December 2017 (07:03:14 CET)
Conversational spoken dialogue systems that interact with the user rather than merely reading text can be equipped with hesitations to manage the dialogue flow and the users' attention. Based on a series of empirical studies, we built an elaborated hesitation synthesis strategy for dialogue systems that inserts hesitations of scalable extent wherever needed in the ongoing utterance. So far, evaluations of hesitating systems have shown that synthesis quality is affected negatively by hesitations, but that there is improvement in interaction quality. We argue that due to its conversational nature, hesitation synthesis needs interactive evaluation rather than traditional MOS-based questionnaires. To prove this point, we dually evaluate our system’s speech synthesis component: on the one hand, linked to the dialogue system evaluation, on the other hand, in the traditional MOS way. This way we are able to analyze and discuss differences that arise due to the evaluation methodology. Our results suggest that MOS scales are not sufficient to assess speech synthesis quality, which has implications for future research that are discussed in this paper. Furthermore, our results indicate that hesitations work well to increase task performance and that an elaborated strategy is necessary to avoid likability issues.
BRIEF REPORT | doi:10.20944/preprints201912.0397.v1
Subject: Computer Science And Mathematics, Software Keywords: noise measurement app; usability; smartphone
Online: 31 December 2019 (02:16:57 CET)
This study aims to assess the use of a smartphone app (DecibelX) as a lower-cost alternative to measuring noise levels with a traditional Sound Level Meter (SLM). The study compares the accuracy of the app to readings taken with an SLM and a dosimeter, and also evaluates the app's performance for pure-tone and narrow-band noise. A usability study additionally identifies strengths and weaknesses related to the usability of the app.
ARTICLE | doi:10.20944/preprints201909.0178.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: motion capture; evaluation; noise modelling; noise color; Allan variance; simulated annealing; ant colony optimization
Online: 17 September 2019 (03:59:00 CEST)
Optical motion capture systems are the state of the art in motion acquisition; however, like any measurement system, they are not error-free: noise is an intrinsic feature. Prior works mostly employ a simple noise model, expressing the uncertainty as a single variance. In this work we demonstrate the existence of several types of noise and show how to quantify them using the Allan variance. For the automated readout of the noise coefficients, we solve the multidimensional regression problem using sophisticated metaheuristics in an exploration-exploitation scheme. Besides classic types of noise, we identified the presence of correlated noise and periodic distortion in our facility. We also had the opportunity to observe the influence of a camera failure on the overall performance.
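The Allan-variance approach to separating noise types can be illustrated as follows: for white noise the Allan variance falls as 1/m with cluster size m, while other noise types (random walk, flicker, periodic) produce different log-log slopes. This is a minimal sketch using the non-overlapping estimator; the paper's actual pipeline fits the noise coefficients with metaheuristics:

```python
import numpy as np

def allan_variance(x, m):
    """Non-overlapping Allan variance of signal x for cluster size m samples."""
    n_clusters = len(x) // m
    cluster_means = x[:n_clusters * m].reshape(n_clusters, m).mean(axis=1)
    # Half the mean squared difference of adjacent cluster averages.
    return 0.5 * float(np.mean(np.diff(cluster_means) ** 2))

rng = np.random.default_rng(0)
white = rng.standard_normal(100_000)   # unit-variance white noise
avar_1 = allan_variance(white, 1)      # near 1.0
avar_100 = allan_variance(white, 100)  # near 0.01, i.e. falls as 1/m
```

Plotting `allan_variance` against m on log-log axes and reading the slopes of the segments is what identifies which noise types are present.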
ARTICLE | doi:10.20944/preprints201901.0088.v1
Subject: Engineering, Control And Systems Engineering Keywords: signal-to-noise ratio; nighttime light imaging; time sequence images; Luojia 1-01; radiative transfer model; radiometric calibration; in-orbit test
Online: 9 January 2019 (15:43:53 CET)
Signal-to-noise ratio (SNR) is an important index for evaluating the radiation performance and image quality of optical imaging systems under low-illumination backgrounds. Under nighttime lighting conditions, the illumination of remote sensing objects is low and varies greatly, usually ranging from several lux to tens of thousands of lux, so nighttime light remote sensing imaging requires detectors with high sensitivity and a large dynamic range. Luojia 1-01 is the first professional nighttime light remote sensing satellite in the world. In this paper, we took the nighttime light remote sensing camera carried on the satellite as the research object and proposed an in-orbit SNR test method based on time-series images to overcome the problem of low spatial resolution. We first analyzed the process of luminous flux transmission between objects and the satellite and established a radiative transfer model. By combining the parameters of the large-relative-aperture optical system and the high-sensitivity CMOS device, we established an SNR model and specifically analyzed the effect of exposure time and quantization bits on SNR. Finally, we used the proposed in-orbit test method to calculate the SNR of lighting images acquired by the satellite. The measured result is in good agreement with the model-predicted data: under an illumination of 10 lx, the SNR of typical objects can reach 27.02 dB, which is much better than the 20 dB required for engineering applications.
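The core idea of an SNR test based on time-series images, using the temporal statistics of each pixel over repeated frames of a stable source rather than spatial statistics, can be sketched as follows. This is an illustrative estimator only; the paper's actual model also accounts for the radiative transfer chain, exposure time, and quantization bits:

```python
import numpy as np

def temporal_snr_db(frames):
    """SNR (dB) from a time sequence of images of a temporally stable scene:
    per-pixel temporal mean divided by temporal standard deviation."""
    frames = np.asarray(frames, dtype=float)            # shape (n_frames, H, W)
    mean = frames.mean(axis=0)
    std = np.maximum(frames.std(axis=0, ddof=1), 1e-12)  # guard zero-noise pixels
    return float(20.0 * np.log10((mean / std).mean()))

# Synthetic example: 1000 DN signal with 20 DN temporal noise, i.e. SNR of 50.
rng = np.random.default_rng(0)
frames = 1000.0 + 20.0 * rng.standard_normal((200, 32, 32))
snr_db = temporal_snr_db(frames)
```

Because each pixel is evaluated along the time axis, the method works even when the spatial resolution is too low to find a uniform patch within a single frame, which is the motivation stated in the abstract.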
ARTICLE | doi:10.20944/preprints202310.1967.v1
Subject: Medicine And Pharmacology, Dentistry And Oral Surgery Keywords: tongue frenulum; ankyloglossia; swallowing; tongue mobility; speech; occlusion
Online: 31 October 2023 (07:59:09 CET)
(1) Background: The incidence of ankyloglossia ranges from 0.02% to 10.7%. The literature describes the effect of ankyloglossia on selected dysfunctions of the stomatognathic system; however, no studies could be found reporting the influence of ankyloglossia on the occurrence of several disorders in a single group of subjects. The aim of the present study was to assess the effect of the lingual frenulum on swallowing, speech, occlusion, and periodontal status. (2) Methods: The subjects were 172 patients, 86 with ankyloglossia (study group) and 86 with a normal tongue frenulum (control group). In all subjects, the length of the tongue frenulum, the type of swallowing, tongue mobility, occlusion, periodontal status, and speech abnormalities were assessed. (3) Results: All subjects from the control group and all those with mild ankyloglossia showed normal tongue mobility. Limited tongue mobility was found in 29.4% of subjects with moderate and 70.6% of subjects with severe ankyloglossia. Rhotacism was observed in 21.3% of subjects with a normal frenulum, 2.1% with mild, 38.3% with moderate, and 38.3% with severe ankyloglossia. Malocclusion or crowding was diagnosed in 7.4%, 33.9%, and 20.7% of subjects with mild, moderate, and severe ankyloglossia, respectively (62% in total), versus 21.6% of subjects in the control group. No abnormalities of the periodontium in the area of the lingual surfaces of the crowns of the lower central incisors were found in any of the examined persons. Among patients with an infantile type of swallowing, 24.4% had a normal length of the tongue frenulum, 11.1% mild, 28.9% moderate, and 35.6% severe ankyloglossia. Among patients presenting a mature type of swallowing, 58.7% had a normal length of the frenulum. (4) Conclusions: 1. A shortened tongue frenulum correlates with an "infantile swallowing pattern". 2. Moderate or severe ankyloglossia significantly limits tongue mobility. 3. A short tongue frenulum is related to speech disorders.
ARTICLE | doi:10.20944/preprints202306.1186.v1
Subject: Engineering, Bioengineering Keywords: SAEF; audiology competencies; audiometry simulation; speech language; students.
Online: 16 June 2023 (07:39:25 CEST)
The information society has transformed human life, and technology is almost everywhere, including health and education. For example, years ago, speech and language therapy students required a long time and high-cost equipment to develop competencies in the healthcare of the auditory and vestibular systems. The high cost of the equipment limited its practical use to classes, hindering students' autonomy in developing those competencies. This was a real issue, even more so in times of pandemic, when online education was essential. This article describes SAEF, an open-source software simulator for autonomously developing procedural audiology therapy competencies, along with its user acceptance and the validity of the experiments and results. SAEF delivers immediate feedback and performance results. The results obtained validate students' and educators' acceptance of SAEF in audiology therapy education, and they invite the authors to continue developing simulator software solutions in other health education contexts. SAEF was developed using open-source technology to facilitate its accessibility, classification, and sustainability.
ARTICLE | doi:10.20944/preprints202211.0047.v1
Subject: Medicine And Pharmacology, Otolaryngology Keywords: hearing therapy; speech therapy; cochlear implant; digital application
Online: 2 November 2022 (06:10:30 CET)
Background: In order to achieve the best possible hearing and understanding with a cochlear implant (CI), regular hearing and speech therapy is necessary after implantation. This treatment should also be accessible to the growing proportion of hearing-impaired people with a migration background, which requires an alternative to therapy conducted in the therapist's native language. The aim of this study was to evaluate six multilingual conversation applications with regard to their usefulness for therapy. Material and Methods: The six most commonly used applications were reviewed by native speakers in terms of accuracy of content, grammatical translation, and pronunciation for English, Spanish, Arabic, Turkish, and Russian. The number of available languages, availability, cost, and additional features were also analyzed. The accuracy of the content, the grammatical translation, and the pronunciation were statistically evaluated, and the differences were highlighted. The results of the different applications were compared with the performance of a native speaker. Results: All applications tested differed significantly from the native-speaker level, with Google Translate showing the closest approximation. All apps offer translations for multiple languages and, with exceptions, are available in both app stores. Furthermore, all apps have additional features that facilitate therapy. Conclusion: Multilingual conversation apps can make speech therapy in a foreign language much easier when used with patients. An adaptation of the software to the specific requirements of hearing and speech therapy is necessary to achieve a linguistic level corresponding to the therapist's native language and to enable easy use in therapy.
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Multimodal Machine Learning; Deep Learning; Hate Speech Detection
Online: 15 March 2021 (13:46:27 CET)
Hateful and abusive speech presents a major challenge for all online social media platforms. Recent advances in Natural Language Processing and Natural Language Understanding allow more accurate detection of hate speech in textual streams. This study presents a multimodal approach to hate speech detection, combining Computer Vision and Natural Language Processing models for abusive context detection. Our study focuses on Twitter messages and, more specifically, on hateful, xenophobic, and racist speech in Greek aimed at refugees and migrants. In our approach, we combine transfer learning and fine-tuning of Bidirectional Encoder Representations from Transformers (BERT) and Residual Neural Networks (ResNet). Our contribution includes the development of a new dataset for hate speech classification, consisting of tweet IDs, along with the code to obtain their visual appearance as they would have been rendered in a web browser. We have also released a pre-trained language model trained on Greek tweets, which has been used in our experiments. We report a consistently high level of accuracy (accuracy score = 0.970, F1-score = 0.947 in our best model) in racist and xenophobic speech detection.
ARTICLE | doi:10.20944/preprints202010.0342.v1
Subject: Social Sciences, Safety Research Keywords: online hate; hate speech; online disinhibition; online safety
Online: 16 October 2020 (08:27:29 CEST)
Today’s youth have almost universal access to the internet and frequently engage in social networking activities using various social media platforms and devices. This is a phenomenon that hate groups are exploiting when disseminating their propaganda. This study seeks to better understand youth exposure to hateful material in the online space by exploring predictors of such exposure including demographic characteristics (age, gender and race), academic performance, online behaviours, online disinhibition, risk perception, and parents/guardians’ supervision of online activities. We implemented a cross-sectional study design, using a paper questionnaire, in two high schools in Massachusetts (USA), focusing on students 14 to 19 years old. Logistic regression models were used to study the association between independent variables (demographics, online behaviours, risk perception, parental supervision) and exposure to hate online. Results revealed an association between exposure to hate messages in the online space and time spent online, academic performance, communicating with a stranger on social media, and benign online disinhibition. In our sample, benign online disinhibition was also associated with students’ risk of encountering someone online that tried to convince them of racist views. This study represents an important first step in understanding youth’s risk factors of exposure to hateful material online.
ARTICLE | doi:10.20944/preprints201911.0346.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: speech; Parkinson’s disease; deep brain stimulation; voice; articulation
Online: 28 November 2019 (02:57:03 CET)
Deep brain stimulation (DBS) of the subthalamic nucleus (STN) has become an effective and widely used tool in the treatment of Parkinson’s disease (PD). STN-DBS has varied effects on speech. Clinical speech ratings suggest worsening following STN-DBS, but quantitative intelligibility, perceptual, and acoustic studies have produced mixed and inconsistent results. Improvements in phonation and declines in articulation have frequently been reported during different speech tasks under different stimulation conditions. Questions remain about preferred STN-DBS stimulation settings. Seven right-handed, native speakers of English with PD treated with bilateral STN-DBS were studied off medication at three stimulation conditions: stimulators off, 60 Hz (low frequency stimulation - LFS), and the typical clinical setting of 185 Hz (High frequency - HFS). Spontaneous speech was recorded in each condition and excerpts were prepared for transcription (intelligibility) and difficulty judgements. Separate excerpts were prepared for listeners to rate abnormalities in voice, articulation, fluency, and rate. Intelligibility for spontaneous speech was reduced at both HFS and LFS when compared to STN-DBS off. Speech produced at HFS was more intelligible than that produced at LFS, but HFS made the intelligibility task (transcription) subjectively more difficult. Both voice quality and articulation were judged to be more abnormal with STN-DBS on. STN-DBS reduced the intelligibility of spontaneous speech at both LFS and HFS but lowering the frequency did not improve intelligibility. Voice quality ratings with STN-DBS were correlated with the ratings made without stimulation. This was not true for articulation ratings. STN-DBS exacerbated an existing voice disorder and may have introduced new articulatory abnormalities.
ARTICLE | doi:10.20944/preprints201910.0376.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: artificial neural network; deep learning; LSTM; speech processing
Online: 31 October 2019 (16:40:30 CET)
Speech signals are degraded in real-life environments as a result of background noise and other factors, and processing such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that makes adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep-learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have presented surprising results in tasks related to the time-dependent processing of signals such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we evaluate hybrid neural network models that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to pure LSTM networks, given a fixed number of layers. The evaluation was based on quality measurements of the signal's spectrum, the training time of the networks, and statistical validation of the results. The results help to affirm that hybrid networks represent an important solution for speech signal enhancement, offering advantages in efficiency without a significant drop in quality.
ARTICLE | doi:10.20944/preprints201910.0231.v1
Subject: Computer Science And Mathematics, Robotics Keywords: Android; arduino; bluetooth; grass cutter; sensors; speech recognition
Online: 20 October 2019 (02:03:44 CEST)
We present an Arduino-based automatic robotic system for cutting grass or lawns, mostly healthy grass that needs to be cut neatly, as in a public park or a private garden. The purpose of this project is to design a programmable, solar-powered, pattern-cutting grass robot that no longer requires time-consuming manual grass cutting and that can be operated wirelessly from a safe distance using an Android smartphone via Bluetooth. The robot is capable of cutting the grass in the required shapes and patterns, and the cutting blade can be adjusted to maintain different grass lengths. The main focus was to design a prototype that can work with little or no physical user interaction. The proposed work is accomplished using an Arduino microcontroller, DC geared motors, an IR obstacle detection sensor, a motor shield, a relay module, a DC battery, a solar panel, and a Bluetooth module. The grass-cutting robot can be moved remotely to the location in the lawn where the user wants to cut the grass, either directly or in desired patterns. The user can press the desired pattern button in the mobile application, and the system will start cutting grass in the corresponding design, such as a circle, spiral, rectangle, or continuous pattern. Also, with the assistance of sensors positioned at the front of the vehicle, an automatic barrier detection system is introduced to enhance safety: the IR obstacle detector sensors detect obstacles, and if any obstacle is found in front of the robot while traveling, it avoids the barrier by turning or stopping automatically as appropriate, thereby preventing a collision. A further aim of this project is the creation of a grass cutter that relieves users from mowing their own grass and reduces environmental and noise pollution.
The proposed system is designed as a lab-scale prototype to experimentally validate the efficiency, accuracy, and affordability of the system. The experimental results show that the proposed work provides all-in-one capability (simple and pattern-based grass cutting with the mobile application, plus obstacle detection), is very easy to use, and can be assembled as a simple hardware circuit. We note that the proposed system can be implemented on a large scale under real conditions in the future, which will be useful in robotics applications and in cutting grass on playing grounds for sports such as cricket, football, and hockey.
ARTICLE | doi:10.20944/preprints202012.0728.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: omics data; hierarchical clustering; noise quantification
Online: 29 December 2020 (14:02:28 CET)
Identifying groups that share common features among datasets through clustering analysis is a typical problem in many fields of science, particularly in post-omics and systems biology research. In this respect, quantifying how well a measure can cluster or organize intrinsic groups is important, since there is currently no statistical evaluation of how ordered a clustered vector is, or of how much noise is embedded in it. Much of the literature focuses on how well the clustering algorithm orders the data, with several external and internal statistical measures; but no measure has been developed to statistically quantify the noise in a vector arranged by a clustering algorithm, i.e., how much of the clustering is due to randomness. Here, we present a quantitative methodology, based on autocorrelation, to assess this problem.
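A minimal sketch of the autocorrelation idea: a label vector arranged by a good clustering keeps like values adjacent, giving high lag-1 autocorrelation, while a random arrangement of the same labels gives a value near zero. This is an illustration only; the paper develops a full statistical methodology, not just this single statistic:

```python
import numpy as np

def lag1_autocorr(v):
    """Lag-1 autocorrelation of an arranged vector as a simple order score."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    return float(np.sum(v[:-1] * v[1:]) / np.sum(v * v))

ordered = np.repeat([0, 1, 2, 3], 50)      # perfectly clustered arrangement
rng = np.random.default_rng(0)
shuffled = rng.permutation(ordered)        # same labels, random order
```

Comparing the observed score against the distribution of scores over many permutations then gives a p-value for how much of the arrangement is due to chance.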
DATA DESCRIPTOR | doi:10.20944/preprints201810.0179.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: imaging; CMOS; camera; SNR; noise; performance
Online: 9 October 2018 (09:38:23 CEST)
Expensive cameras meant for research applications are usually characterized by their manufacturers, and detailed specifications are available for them. Suppliers of inexpensive cameras usually do not provide such detailed information. This data set provides the acquisition speed and noise characteristics of a monochrome 1.2-megapixel CMOS camera, the QHY5L-II M. The source code provided along with this data set can also be used to acquire similar data for other QHY cameras. This enables the use of such cost-effective cameras for scientific applications in other fields, beyond their designed use in astronomy.
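One standard way to extract a temporal-noise figure of the kind reported in such data sets is the frame-differencing (photon-transfer) technique: subtracting two frames of the same static scene cancels the fixed-pattern component, and the remaining variance is twice the single-frame temporal noise. A sketch of that assumed method, not necessarily the exact procedure used for this data set:

```python
import numpy as np

def read_noise_dn(frame_a, frame_b):
    """Temporal (read) noise in DN from two frames of the same static scene:
    differencing cancels fixed-pattern noise; the difference carries sqrt(2)
    times the single-frame temporal noise."""
    diff = np.asarray(frame_a, float) - np.asarray(frame_b, float)
    return float(diff.std(ddof=1) / np.sqrt(2.0))

# Synthetic frames: per-pixel fixed-pattern offsets plus 3 DN temporal noise.
rng = np.random.default_rng(0)
pattern = rng.uniform(90, 110, size=(256, 256))
frame_a = pattern + 3.0 * rng.standard_normal((256, 256))
frame_b = pattern + 3.0 * rng.standard_normal((256, 256))
noise = read_noise_dn(frame_a, frame_b)
```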
ARTICLE | doi:10.20944/preprints201802.0153.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: microwave filters; vibration sensitivity; acoustic noise
Online: 26 February 2018 (07:56:14 CET)
A novel method for characterizing the vibrational sensitivity of discrete SAW filters is presented. The proposed approach allows the characterization of filters under vibration and the extraction of a behavioural model. Filters are treated as transducers, so that externally induced vibrational energy is partially transformed into an undesired simultaneous amplitude and phase modulation of the input RF signal; this spurious modulation can potentially affect link quality.
ARTICLE | doi:10.20944/preprints202012.0164.v1
Subject: Engineering, Automotive Engineering Keywords: variational Bayesian; multiple-fading factors; time-varying noise covariance matrices; inaccurate noise; target tracking; update monitoring strategy
Online: 7 December 2020 (14:54:14 CET)
The performance of adaptive Kalman filter estimation degrades when the statistical characteristics of the process and measurement noise matrices are inaccurate and time-varying in a linear Gaussian state-space model. To address this problem, we propose an adaptive Kalman filter based on variational Bayesian inference with multiple fading factors and an update monitoring strategy. The inverse Wishart distribution is selected as the measurement noise model, and the system state vector and measurement noise covariance matrix are estimated with the variational Bayesian method. The process noise covariance matrix is estimated by the maximum a posteriori principle, and an update monitoring strategy with adjustment factors is used to keep the updated matrix positive semi-definite. These optimal estimates are then introduced as time-varying parameters into the multiple fading factors to improve the estimation accuracy of the one-step predicted state covariance matrix. The application of the proposed algorithm to target tracking is simulated. The results show that, compared with current filters, the proposed algorithm has better accuracy and convergence performance, and achieves simultaneous estimation of inaccurate, time-varying process and measurement noise covariance matrices.
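The role of a fading factor can be illustrated with a scalar random-walk filter: a factor lam >= 1 inflates the predicted covariance, discounting stale information so the gain stays responsive when the assumed noise statistics are wrong. This is a deliberately simplified single-factor sketch, not the paper's full multi-factor variational Bayesian algorithm:

```python
import numpy as np

def fading_kalman_step(x, P, z, q, r, lam=1.0):
    """One scalar random-walk Kalman update with fading factor lam >= 1."""
    P_pred = lam * P + q           # inflated one-step predicted covariance
    K = P_pred / (P_pred + r)      # Kalman gain
    x_new = x + K * (z - x)        # state update with innovation (z - x)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Track a constant level of 5.0 from noisy measurements.
rng = np.random.default_rng(0)
x, P = 0.0, 1.0
for z in 5.0 + 0.5 * rng.standard_normal(200):
    x, P = fading_kalman_step(x, P, z, q=0.01, r=0.25, lam=1.02)
```

In the paper, a separate fading factor per state dimension (driven by the variational noise estimates) plays the role of the single `lam` above.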
ARTICLE | doi:10.20944/preprints202305.1060.v1
Subject: Social Sciences, Education Keywords: EFL; language functions; speech acts; teacher’s perception; textbook evaluation
Online: 15 May 2023 (15:54:12 CEST)
This study analyzes speech acts and language functions from a pragmatic viewpoint. Halliday's (1975) language functions and Searle's (1976) speech acts were adapted to analyze the functional aspects of the conversations in English as a Foreign Language (EFL) learners' textbooks, and the study also explores teachers' perceptions of teaching and learning with these textbooks and of the communicative knowledge required regarding the functions of language in daily activity. The participants were thirteen Kurdish teachers of high school English in Iraqi Kurdistan using the Sunrise textbooks for grades 10, 11, and 12. Through semi-structured interviews, it was found that the conversations in these textbooks are insufficient from a pragmatic point of view. Recommendations are offered for textbook designers, teachers, and material developers to make up for the shortcomings of the textbooks. The findings reveal that the conversation texts in the Sunrise textbooks do not meet a systematic standard of pragmatic competence for English language learners, and the book designers should be made aware of these shortcomings if the series is to develop learners' speaking skills. The implications of this paper can be helpful in comparing its results with those of similar studies to check whether there is a universal pattern in performing speech acts and language functions, and in increasing learners' knowledge of pragmatics in general and of the language functions and speech acts investigated in this study in particular.
ARTICLE | doi:10.20944/preprints202211.0041.v1
Subject: Social Sciences, Language And Linguistics Keywords: older adults; whispered speech; lexical tone; vowel; duration; intensity
Online: 2 November 2022 (03:53:54 CET)
Purpose: This study aimed to examine how aging and modifications of critical acoustic parameters may affect the perception of whispered speech as a degraded signal. Method: Forty Mandarin-speaking adults were included in the study. Part 1 of the study compared the perception of Mandarin lexical tones, vowels, and syllables in older and younger adults in whispered vs. phonated speech conditions. Parts 2 and 3 further examined how modification of duration and intensity cues contributed to the perceptual outcomes. Results: Perception of whispered tones was compromised in older and younger adults. Older adults identified lexical tones less accurately than their younger counterparts, particularly for phonated T2, T3 and whispered T3. Aging also negatively affected the vowel identification of /i, u/ in the whispered condition. Syllable-level accuracy was largely dependent on the accuracy of lexical tones and vowels. Furthermore, reduced duration led to the decreased accuracy of phonated T3 and whispered T2, T3 but increased accuracy of phonated T4. Reduced intensity lowered the recognition accuracy for phonated vowels /i, ɤ, o, y/ in older adults and /i, u/ in younger adults, and it also lowered the accuracy of whispered vowels /a, ɤ/ in older adults. Contrary to our expectation, increased duration and intensity did not improve older adults’ speech perception in either phonated or whispered conditions. Conclusion: The results suggest that aging adversely affected speech perception in both phonated and whispered conditions with more challenges in identifying whispered speech for older adults. While older adults’ diminished performance may be potentially due to problems with processing the degraded temporal and spectral information of the target speech sounds, it cannot be simply compensated for by increasing the duration and intensity of the target sounds beyond the audible level.
ARTICLE | doi:10.20944/preprints202210.0424.v1
Subject: Social Sciences, Language And Linguistics Keywords: emotional speech processing; communication channel; emotion category; task type
Online: 27 October 2022 (08:04:59 CEST)
How language mediates emotional perception and experience is poorly understood. The present event-related potential (ERP) study examined the explicit and implicit processing of emotional speech to differentiate the relative influences of communication channel, emotion category, and task type in the prosodic salience effect. Thirty participants (15 women) were presented with spoken words denoting happiness, sadness, and neutrality in either the prosodic or the semantic channel. They were asked to judge the emotional content (explicit task) and the speakers' gender (implicit task) of the stimuli. Results indicated that emotional prosody (relative to semantics) triggered larger N100 and P200 amplitudes with greater delta, theta, and alpha inter-trial phase coherence (ITPC) values in the corresponding early time windows, and continued to produce larger LPC amplitudes and faster responses during late stages of higher-order cognitive processing. The relative salience of prosody and semantics was modulated by emotion and task, though such modulatory effects varied across different processing stages. The prosodic salience effect was reduced for sadness processing and in the implicit task during early auditory processing and decision-making, but reduced for happiness processing in the explicit task during conscious emotion processing. Additionally, across-trial synchronization of the delta, theta, and alpha bands predicted the ERP components, with higher ITPC values significantly associated with stronger N100, P200, and LPC enhancement. These findings reveal the neurocognitive dynamics of emotional speech processing, with prosodic salience tied to stage-dependent emotion- and task-specific effects, offering insights for research that reconciles language and emotion processing from cross-linguistic/cultural and clinical perspectives.
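Inter-trial phase coherence (ITPC), the across-trial synchronization measure used here, is the magnitude of the mean unit phasor across trials: 1 for perfect phase locking, near 0 for random phases. A minimal sketch on raw phase angles; in ERP practice the phases come from a time-frequency decomposition of each trial:

```python
import numpy as np

def itpc(phases):
    """Inter-trial phase coherence: |mean of exp(i * phase)| across trials."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases)))))

rng = np.random.default_rng(0)
locked = np.full(10_000, 0.3)                       # identical phase every trial
jittered = 0.3 + 0.4 * rng.standard_normal(10_000)  # phase-locked with jitter
random_ph = rng.uniform(0, 2 * np.pi, 10_000)       # no phase locking
```

Computed per frequency band and time window, this is how the delta/theta/alpha ITPC values reported above would be obtained.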
ARTICLE | doi:10.20944/preprints202105.0777.v1
Subject: Social Sciences, Psychology Keywords: statistical learning; experiment interaction; phonology; child speech; language acquisition
Online: 31 May 2021 (13:37:09 CEST)
When participants in a statistical learning paradigm are asked to learn from two incompatible or competing inputs, they often fail to learn from one or both inputs. This study presents the results of two experiments that were both completed by one group of typically developing four-year-old children. One experiment targeted word-medial consonant patterns (phonotactics), whereas the other targeted strong-weak and weak-strong stress patterns (prosody). The order of the experiments was critical for learning outcomes in the phonotactics experiment: When children learned phonotactics first, their production accuracy increased following exposure to a high frequency input. When children learned phonotactics second, however, their production accuracy dropped when they were exposed to the high frequency input. Results from the prosody experiment were inconclusive, with limited evidence of any learning effect. Overall, the results suggest that children may conflate learning experiences, and patterns learned from an initial experimental input compete with patterns in a subsequent experiment. When considering natural language acquisition, the results suggest that an isolated episode of learning may lead to generalizations that are incompatible with later input, and possibly, with larger patterns in the language.
REVIEW | doi:10.20944/preprints202009.0197.v2
Subject: Social Sciences, Psychology Keywords: academic freedom; free speech; censorship; free inquiry; thought suppression
Online: 12 October 2020 (10:07:22 CEST)
This paper explores the suppression of ideas within academic scholarship by academics, either by self-suppression or because of the efforts of other academics. Legal, moral, and social issues distinguishing freedom of speech, freedom of inquiry, and academic freedom are reviewed. How these freedoms and protections can come into tension is then explored by an analysis of denunciation mobs who exercise their legal free speech rights to call for punishing scholars who express ideas they disapprove of and condemn. When successful, these efforts, which constitute legally protected speech, will suppress certain ideas. Real-world examples over the past five years of academics who have been sanctioned or terminated for scholarship targeted by a denunciation mob are then explored.
REVIEW | doi:10.20944/preprints202311.0344.v1
Subject: Engineering, Aerospace Engineering Keywords: aeroacoustics; turbomachinery noise; analytical modeling; computational aeroacoustics
Online: 6 November 2023 (10:32:44 CET)
This paper presents an updated review of prediction methods for the aerodynamic noise of ducted rotor-stator stages. Ducted rotating-blade technologies are in continuous evolution and increasingly used in aeronautical propulsion units, power generation, and air-conditioning systems. Different needs arise from the early design stage to the final definition of a machine: fast-running, approximate analytical approaches are best suited to the former, and high-fidelity numerical simulations to the latter. Recent advances in both are discussed, with emphasis on their pros and cons.
ARTICLE | doi:10.20944/preprints202309.0585.v1
Subject: Social Sciences, Language And Linguistics Keywords: babble noise; lexical tone; emotional prosody; masking
Online: 8 September 2023 (11:14:25 CEST)
How people recognize linguistic and emotional prosody in different listening conditions is essential for understanding the complex interplay between social context, cognition, and communication. The perception of both lexical tones and emotional prosody depends on prosodic features including pitch, intensity, duration, and voice quality. However, it is unclear which aspect of prosody is perceptually more salient and resistant to noise. This study aimed to investigate the relative perceptual robustness of emotional prosody and lexical tone recognition in quiet and in the presence of multi-talker babble noise. Forty young adults with normal hearing listened to monosyllables either with or without background babble noise and completed two identification tasks, one for emotion recognition and the other for lexical tone recognition. Compared with emotional prosody, lexical tones were more perceptually salient in multi-talker babble noise. Native Mandarin Chinese participants identified lexical tones more accurately and quickly than vocal emotions at the same signal-to-noise ratio. Lexical tone perception is also more robust against babble speech noise degradation than emotional prosody perception for native Mandarin Chinese listeners. Acoustic and cognitive dissimilarities between linguistic prosody and emotional prosody may have led to the phenomenon, which calls for further explorations into the underlying psychobiological and neurophysiological mechanisms.
REVIEW | doi:10.20944/preprints202101.0493.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: cartography; seismic waves; subsurface; ambient noise survey
Online: 25 January 2021 (12:41:22 CET)
This review presents a cartographic and site-effect-oriented view of the Shillong region of northeast India. Starting from the existing tectonics, the prevalent geological settings of the study area are comprehensively delineated. The seismically prone area is then reviewed in the context of site effects, supplemented with available borehole information. Resonance frequency estimates from an ambient noise survey, together with receiver functions, are outlined and implicate a heterogeneous subsurface. This helps segregate the region into two distinct profiles, providing deeper insight into the probable subsurface and its heterogeneity. Finally, the influence of topography on the strata is highlighted and interpreted.
ARTICLE | doi:10.20944/preprints202011.0725.v1
Subject: Engineering, Automotive Engineering Keywords: Communications engineering; impulsive noise; variational Bayesian inference
Online: 30 November 2020 (12:02:13 CET)
Impulsive noise is the main limiting factor for transmission over channels affected by electromagnetic interference. We study the estimation of (correlated) Gaussian signals in impulsive noise scenarios. In this work, we analyze some existing as well as some novel estimation algorithms. Their performance is compared, for the first time, under different channel conditions, including the Markov-Middleton scenario, where the impulsive noise switches between different noise states. Following a modern approach in digital communications, the receiver design is based on a factor graph model and implements a message passing algorithm. The correlation among signal samples, as well as among noise states, brings about a loopy factor graph, where an iterative message passing schedule must be employed. As is well known, approximate variational inference techniques are necessary in these cases. We propose and analyze different algorithms and provide a complete performance comparison among them, showing that the Expectation Propagation, Transparent Propagation, and Parallel Iterative Schedule approaches all reach performance close to the optimal under different channel conditions.
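The abstract does not give the estimator's equations, but the per-sample core of Bayesian estimation under impulsive noise, a posterior-mean (MMSE) estimate of a Gaussian signal observed through multi-state Gaussian-mixture noise, can be sketched as follows. The function name and the two-state parameters below are illustrative assumptions; the full Markov-Middleton correlation structure and factor-graph message passing are not reproduced here.

```python
import math

def gauss_pdf(y, var):
    # Zero-mean Gaussian density evaluated at y
    return math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mmse_estimate(y, var_x, noise_states):
    # y = x + n, x ~ N(0, var_x); noise_states is a list of
    # (probability, noise_variance) pairs defining the mixture noise n.
    weights, shrunk = [], []
    for p, var_n in noise_states:
        total = var_x + var_n
        weights.append(p * gauss_pdf(y, total))   # posterior state weight (unnorm.)
        shrunk.append(var_x / total * y)          # per-state Wiener/MMSE estimate
    z = sum(weights)
    return sum(w / z * s for w, s in zip(weights, shrunk))
```

Large observations are automatically attributed to the impulsive state and shrunk heavily, which is the behaviour that makes mixture-aware receivers robust where a plain linear estimator fails.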
ARTICLE | doi:10.20944/preprints202006.0197.v1
Subject: Physical Sciences, Mathematical Physics Keywords: Covid-19; fluctuations; noise; epidemics; lockdown; promiscuity
Online: 16 June 2020 (07:42:04 CEST)
Most popular statistical models of epidemic evolution focus on the dynamics of average relevant quantities and overlook the role of small fluctuations in the model parameters. Models for Covid-19 are no exception. In this paper we show that the role of time-correlated fluctuations, far from being negligible, can in fact determine the spreading of an epidemic and, most importantly, the resurgence of exponential diffusion in the presence of time-limited episodes of promiscuous behaviour.
REVIEW | doi:10.20944/preprints201608.0236.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: noise pollution; mechanical wood industries; equipment; control
Online: 31 August 2016 (09:03:57 CEST)
High levels of noise are a disturbance to the human environment. Noise in industry is also an occupational hazard because of its attendant effects on workers’ health. Noise presents health and social problems in industrial operations, and its source is related to the machinery used in the industries. One of the distinctive features of the noise associated with wood machinery is the level and duration of exposure. Equipment used in a factory can be extremely loud, producing noise at decibel levels high enough to cause environmental health and safety concerns. The mechanically driven transport and handling equipment, cutting, milling and shaping machines, and dust extractor installations in the wood industry all generate noise. The sources of noise pollution have increased due to non-compliance with basic safety practices, and the increased use of locally fabricated machines in the industry has raised the levels of noise and vibration. The effects of industrial noise pollution discussed here include increased blood pressure, increased stress, fatigue, vertigo, headaches, sleep disturbance, annoyance, speech problems, reading/learning impairment, aggression, anxiety, and withdrawal. As presented in this paper, noise control techniques include sound insulation, sound absorption, vibration damping, and vibration isolation.
ARTICLE | doi:10.20944/preprints202006.0091.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Breast Cancer Screening; Digital Image Elasto Tomography (DIET); Image Noise Removal, Image Enhancement; Multiple Frame Noise Removal (MFNR)
Online: 7 June 2020 (14:53:34 CEST)
Breast cancer is a leading cause of death among women. Conventional screening methods, such as mammography and ultrasound diagnosis, are expensive and have significant limitations. Digital Image Elasto Tomography (DIET) is a new noninvasive breast cancer screening system that has the potential to be a low-cost and reliable breast cancer screening tool. It is based on modal analysis of the breast mass and stereographic 3D image analysis to detect stiffer abnormal tissues. However, camera sensor noise, especially Gaussian noise, is a major source of Optical Flow (OF) error in this approach to tumor detection. This work studies the performance of different conventional filters, including the standard Gaussian filter, at removing this noise to produce more robust screening results. A radical approach, Multiple Frame Noise Removal (MFNR), is proposed for use in this type of medical image processing instead of a Gaussian filter or other typical image noise removal tools. It is a multiple-frame noise removal method in which the probability density function (PDF) of the noise is extracted by characterizing the same pixel positions across multiple images. The noise becomes deterministic, and hence easily removed. The proposed algorithm was applied to a data set from 10 phantom breast tests with a prototype DIET system, and 10 in-vivo samples from healthy women. Comparisons were made to an optimal Gaussian filter of the form commonly used. Reductions in OF error on these digitally imaged data sets were used to compare performance. Refinement of the images for medical applications requires higher PSNR, which was successfully achieved using the MFNR algorithm. In this study, the algorithm was used to improve the imaging results of a DIET system. The conventional wisdom that states that noise removal and detail preservation are contrasting effects does not hold for MFNR.
ARTICLE | doi:10.20944/preprints201708.0012.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: power gating; read decoupling; read-write static noise margin; dynamic noise margin; read-write energy; schmitt trigger; leakage power
Online: 4 August 2017 (11:08:55 CEST)
An ultra-low power (ULP), power-gated static random access memory (SRAM) is presented for Internet of Things (IoT) applications, operating in the sub-threshold voltage range from 300 mV to 500 mV. The proposed SRAM can operate at low supply voltages with high static and dynamic noise margins. IoT applications require battery-powered, low-leakage memory architectures in the subthreshold regime with low power consumption. Therefore, to improve power consumption along with cell stability, a power-gated 10T SRAM is presented. The proposed cell uses a power-gating p-MOS transistor to reduce the leakage (static) power in standby mode. Moreover, due to the Schmitt triggering and read decoupling of the 10T SRAM, its static and dynamic behavior in read, write, and standby modes shows enhanced tolerance across different process, voltage, and temperature (PVT) conditions. The proposed SRAM shows better results in terms of leakage power, read static noise margin (RSNM), write static noise margin (WSNM), write-ability or write trip point (WTP), read-write energy, and dynamic read margin (DRM). These parameters are evaluated at 8-kilobit (Kb) capacity and compared with existing SRAM architectures. The leakage power is reduced to 1/81 and 1/75 of that of the conventional 6T (C6T) SRAM and the read-decoupled 8T (RD8T) SRAM, respectively, at 300 mV VDD. The RSNM, WSNM, WTP, and DRM values are improved by 3×, 2×, 11.11%, and 31.8%, respectively, compared to the C6T SRAM. Similarly, the proposed 10T has 1.48×, 25%, and 9.75% better RSNM, WSNM, and WTP values, respectively, than the RD8T SRAM at 300 mV VDD.
CASE REPORT | doi:10.20944/preprints202212.0561.v1
Subject: Social Sciences, Psychology Keywords: Potocki–Lupski syndrome; 17p11.2; PTLS; autism; ASD; EEG; language; speech
Online: 29 December 2022 (13:00:18 CET)
Potocki-Lupski Syndrome (PTLS) is a rare condition associated with a duplication of 17p11.2 that may underlie a wide range of congenital abnormalities and heterogeneous behavioral phenotypes. Along with developmental delay and intellectual disability, autism-specific traits are often reported to be the most common among patients with PTLS. To contribute to the discussion of the role of autism spectrum disorder (ASD) in the PTLS phenotype, we present a case of a female adolescent with a de novo dup(17)(p11.2p11.2) without ASD features, focusing on in-depth clinical, behavioral, and electrophysiological (EEG) evaluations. Among the EEG features, we found atypical peak-slow-wave patterns and a unique saw-like 13 Hz sharp wave not previously described in any other patient. The power spectral density of the resting-state EEG was typical in our patient, although measures of non-linear EEG dynamics, Hjorth complexity and fractal dimension, were drastically attenuated compared with the patient’s neurotypical peers. Here we also summarize results from previously published reports of PTLS, which point to an approximately 21% occurrence of ASD in PTLS; this figure may be biased, given methodological limitations. More consistent findings among PTLS patients were intellectual disability and speech and language disorders.
DATA DESCRIPTOR | doi:10.20944/preprints202212.0118.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Lip reading; Visual speech recognition; Turkish dataset; Face parts detection
Online: 7 December 2022 (06:50:33 CET)
The proposed dataset was obtained from daily Turkish words and phrases pronounced by various people in videos posted on YouTube. The purpose of collecting the dataset is to enable detection of the spoken word by recognizing patterns or classifying lip movements with supervised, unsupervised, and semi-supervised machine learning algorithms. Most lip-reading datasets consist of people recorded on camera with fixed backgrounds under identical conditions, but the dataset presented here consists of images suited to machine learning models developed for real-life challenges. It contains a total of 2335 instances taken from TV series, movies, vlogs, and song clips on YouTube. The images in the dataset vary due to factors such as the way people say words, accent, speaking rate, gender, and age. Furthermore, the instances consist of videos with different angles, shadows, resolutions, and brightness that were not created manually. The most important feature of our lip-reading dataset is that it contributes to the pool of non-synthetic Turkish datasets, which currently lacks variety. With this dataset, machine learning studies can be carried out in many areas, such as the defense industry and social life.
ARTICLE | doi:10.20944/preprints202203.0333.v1
Subject: Engineering, Control And Systems Engineering Keywords: Hate speech detection; Social media; Machine learning; Multi-model learning
Online: 25 March 2022 (02:10:12 CET)
Users on social networking platforms are free to express themselves. At the same time, this has created a forum for disagreement and for hate directed at individuals or groups on the basis of, for example, race or sexual orientation. Identifying hate online is a challenging task. Researchers from around the world have contributed major methods for detecting hate speech, but owing to the issue's complexity, many problems remain unresolved. In this research, we offer a multi-model learning strategy for detecting hate speech on Twitter. We used the Kaggle TwitterHate dataset, which contains 31,962 tweets with binary hate/non-hate labels, to evaluate our technique. The suggested method is tested using commonly used machine learning classifiers within the multi-model technique. Using TF-IDF features, we obtained an accuracy of 96.29%, with precision of 96%, recall of 96%, and F1-score of 96%.
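As a sketch of the feature-extraction step this pipeline relies on (the classifiers and the multi-model combination are not shown in the abstract), a minimal TF-IDF computation in plain Python might look like the following; common library implementations differ slightly, e.g. by smoothing the IDF term.

```python
import math
from collections import Counter

def tfidf(corpus):
    # corpus: list of tokenized documents (lists of words).
    # Returns one {term: tf-idf weight} dict per document.
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))   # document frequency
    vectors = []
    for doc in corpus:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors
```

Terms that appear in every document, such as stop words, receive zero weight, while terms distinctive to a document are up-weighted; the resulting sparse vectors are what a downstream classifier consumes.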
ARTICLE | doi:10.20944/preprints201805.0274.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: artificial intelligence; semantic web; natural language; Google cloud speech; SPARQL
Online: 21 May 2018 (12:38:00 CEST)
The main barrier to the Semantic Web is the difficulty of the SPARQL language, which is necessary to extract information from the knowledge representation, also known as an ontology. To make the Semantic Web accessible to people who do not know SPARQL, friendlier interfaces are essential, and natural language is a good alternative. This paper shows the implementation of a friendly prototype interface to query and retrieve, by voice, information from a website built with Semantic Web tools, so that end users avoid the complicated SPARQL language. To achieve this, the interface recognizes a spoken query and converts it into text, processes the text through a Java program to identify keywords, generates a SPARQL query, extracts the information from the website, and reads it aloud to the user. In our work, the Google Cloud Speech API performs the Speech-to-Text conversion, and Text-to-Speech conversion is done with SVOX Pico. As results, we measured three variables: the success rate of queries, the response time per query, and a usability survey. The values of these variables allow the evaluation of our prototype. Finally, the proposed interface provides a new approach to the problem, using the Cloud as a Service and reducing barriers of access to the Semantic Web for people without technical knowledge of Semantic Web technologies.
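The authors' Java implementation and their Google Cloud Speech integration are not reproduced here, but the keyword-to-SPARQL step can be sketched as simple query templating. The function name and the keyword-to-property mapping below are hypothetical illustrations, not the paper's actual ontology vocabulary.

```python
def build_sparql(keywords, prop_map):
    # Map recognized keywords to triple patterns via a hypothetical
    # keyword -> ontology-property table, then assemble a SELECT query.
    clauses = [f"?s {prop_map[k]} ?{k} ." for k in keywords if k in prop_map]
    return "SELECT * WHERE { " + " ".join(clauses) + " }"
```

For example, a recognized keyword "name" mapped to `foaf:name` yields `SELECT * WHERE { ?s foaf:name ?name . }`, which can then be sent to any SPARQL endpoint; unrecognized keywords are simply dropped.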
REVIEW | doi:10.20944/preprints202311.1003.v1
Subject: Environmental And Earth Sciences, Sustainable Science And Technology Keywords: automotive; noise; vibration; harshness; human health; environment; sustainability
Online: 15 November 2023 (11:52:15 CET)
This paper presents the initial part of a larger research project on the impact of the automotive sector on human health, quality of life, and the environment. After a brief introduction on the need for responsible and sustainable approaches and a few questions on the subject, some considerations and statistical data are presented, based on a study of the literature on the impact of road traffic and vehicle noise and vibration on people's comfort, health, and quality of life. The research results show, based on official statistics, the significant harmful impact of noise and vibration from motor vehicles and road transport on people's health and quality of life, especially in urban areas during the day. The significant year-on-year increase in the number of electric and hybrid vehicles, a reality and a necessity nowadays, together with the awareness that electric vehicles are not perfectly quiet and comfortable, opens new research opportunities. It requires the development of new standards, materials, tools, equipment, and test methods in the field of NVH, in a sustained synergistic approach by all stakeholders, to meet the needs and demands of today's consumers and to comply with existing regulations and standards on environmental protection and sustainable development.
ARTICLE | doi:10.20944/preprints202309.1508.v1
Subject: Engineering, Mechanical Engineering Keywords: high-strength steel; Barkhausen noise; surface heterogeneity; asymmetry
Online: 22 September 2023 (09:17:26 CEST)
This study deals with two different aspects of the high-strength low-alloyed 1100 MC steel. The first is associated with the remarkable heterogeneity in the surface state produced during sheet rolling with respect to the sheet width. The variable-thickness surface layer exhibits a microstructure different from that of the deeper bulk. Variation of the thickness of the thermally softened near-surface region strongly affects Barkhausen noise, as well. This technique can be considered a reliable tool for monitoring the aforementioned heterogeneity. It can also be reported that the opposite sides of the sheet are different with respect to the surface state, heterogeneity distribution, and corresponding Barkhausen noise. These aspects indicate the different conditions during hot rolling followed by rapid quenching on the upper and lower rollers. The second aspect is related to the remarkable asymmetry of Barkhausen noise emission with respect to two consecutive bursts. This asymmetry is due to the presence of remnant magnetisation in the sheet produced during manufacturing. The remnant magnetisation is coupled to the magnetic field produced by the excitation coil of the Barkhausen noise sensor and strongly contributes to the aforementioned asymmetry. As soon as sufficient removal of this remnant magnetisation is carried out in the vanishing magnetic field (demagnetisation), the aforementioned remarkable asymmetry is fully lost.
ARTICLE | doi:10.20944/preprints202308.1378.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: data synthesis; unknown noise; interpolation; sample optimization; robust
Online: 18 August 2023 (12:00:13 CEST)
Most existing data synthesis methods are designed to tackle problems such as dataset imbalance, data anonymization, and insufficient sample size. Effective synthesis methods are lacking for small datasets that contain a large number of features and unknown noise, where the goal is to expand the size of the dataset. We propose a data synthesis method named Adaptive Subspace Interpolation for Sample Optimization (ASISO). The idea is to divide the original feature space into several subspaces with an equal number of samples, and then interpolate between samples in adjacent subspaces. This method can adaptively adjust the size of a dataset containing unknown noise, and the expanded data typically contain minimal error relative to the actual data. Moreover, it adjusts the structure of the samples, which can significantly reduce the proportion of samples with large errors. In addition, the hyperparameters of this method have an intuitive interpretation and usually require little calibration. Experimental results on artificial data and benchmark datasets demonstrate that ASISO is a robust and stable method for optimizing samples.
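The abstract describes the method only at a high level; a one-dimensional, non-adaptive sketch of the subspace-interpolation idea might look like the following, where the subspace count `k` stands in for the method's hyperparameters and the adaptive adjustment is omitted.

```python
def asiso_1d(samples, k):
    # Sort the samples, split them into k subspaces of equal size,
    # then create synthetic points by interpolating matched samples
    # from adjacent subspaces (midpoint = linear interpolation).
    s = sorted(samples)
    size = len(s) // k
    groups = [s[i * size:(i + 1) * size] for i in range(k)]
    synthetic = []
    for g1, g2 in zip(groups, groups[1:]):
        for a, b in zip(g1, g2):
            synthetic.append((a + b) / 2)
    return synthetic
```

Because each synthetic point lies between samples of neighboring subspaces, it stays inside the region the data already occupies, which is one plausible reason interpolated points tend to carry small error relative to the true distribution.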
ARTICLE | doi:10.20944/preprints202202.0311.v2
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: Gravitational noise; Gravitational waves; correlation analysis; digital filters
Online: 4 May 2022 (12:31:43 CEST)
Analyzing the records of the Advanced LIGO and Virgo gravitational observatories, we found a specific time-domain asymmetry inherent only to the signals of their gravitational detectors. Experiments with different periodic signals and with Gaussian and non-Gaussian noise led to the conclusion that the noise of gravitational detectors is an unusual mixture of signals. The gravitational-wave signals were detected and recognized using a specialized Pearson correlation analyzer. It turned out that the detector signals include a significant (−6 dB) component that has the properties of records of reliably recognized gravitational waves. This suggests that the gravitational noise is largely due to processes of merging astronomical objects. Since this specific signal is registered by the detectors continuously, the field of gravitational oscillations in the sub-kilohertz band can be considered detected. A method of analysis has also been developed to estimate the contribution of the gravitational noise component to the total signal energy. With its help, it will be possible not only to pass to radio-frequency estimation of the magnitude of gravitational disturbances but also, possibly, to construct a map of the gravitational noise of the sky.
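The specialized analyzer itself is not described in the abstract; its core operation, however, can be assumed to resemble a sliding-window Pearson correlation between a template waveform and the detector record, sketched below with illustrative function names.

```python
import math

def pearson(a, b):
    # Pearson correlation coefficient of two equal-length sequences
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def scan(signal, template):
    # Slide the template across the signal; windows where |r| is
    # close to 1 mark likely occurrences of the template waveform.
    m = len(template)
    return [pearson(signal[i:i + m], template)
            for i in range(len(signal) - m + 1)]
```

A correlation near 1 at some offset indicates the template shape is present there regardless of its amplitude, which is what makes correlation analysis suitable for picking a weak waveform out of detector noise (real pipelines also guard against zero-variance windows).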
ARTICLE | doi:10.20944/preprints202006.0090.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Noise Removal; Image Enhancement; MFNR; multi-dimensional data
Online: 7 June 2020 (14:51:03 CEST)
In research applications across several areas, noise removal is indispensable for the accuracy of final results. Noise arises from physical principles such as background electronic noise, quantum effects, and wave rebound effects, to name a few. Noise removal can help improve results in medicine, astronomy, defense, and numerous other fields, and addressing this limitation would yield potentially low-cost, automatic, and reliable systems. In this paper, a generalized new approach, Multi-Frame Noise Removal (MFNR), is proposed for noise removal. Given any type of data, the probability density function (PDF) of the noise can be determined; herein, we extracted the noise PDF parameters using Kernel Density Estimation (KDE). Once characterized, the noise becomes effectively deterministic and can therefore be removed, so the method can serve as a general-purpose noise removal tool. Data points at the same position in multiple frames allow us to determine the noise PDF characteristics and hence to remove the noise. The conventional wisdom which states that noise removal and detail preservation are contrary to each other is not true for MFNR. Experimental results validate the proposed method, showing practically complete noise reduction, depending on the number of frames used, compared with existing benchmark methods.
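As a toy illustration of the per-position statistic this approach relies on (the paper's actual parameter-extraction procedure is not given in the abstract), one can estimate a single position's noise-free value as the mode of a kernel density estimate built from the same position across many frames. The bandwidth and grid resolution below are illustrative assumptions.

```python
import math

def kde(samples, x, bandwidth):
    # Gaussian kernel density estimate evaluated at point x
    n = len(samples)
    return sum(math.exp(-((x - s) / bandwidth) ** 2 / 2) for s in samples) / (
        n * bandwidth * math.sqrt(2 * math.pi))

def mfnr_value(samples, bandwidth=2.0, steps=200):
    # samples: the same pixel/data position observed in many frames.
    # Return the KDE mode over a grid as the noise-free estimate.
    lo, hi = min(samples), max(samples)
    best_x, best_d = lo, -1.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        d = kde(samples, x, bandwidth)
        if d > best_d:
            best_x, best_d = x, d
    return best_x
```

With more frames the density estimate sharpens around the true value, so detail is preserved per position rather than smoothed away spatially, which is the intuition behind the claim that the usual noise-removal/detail-preservation trade-off does not apply here.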