ARTICLE | doi:10.20944/preprints201905.0342.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: cadastral boundaries; automation; feature extraction; object based image analysis
Online: 29 May 2019 (04:37:50 CEST)
The objective of fast-tracking the mapping and registration of large numbers of unrecorded land rights globally has led to the experimental application of Artificial Intelligence (AI) in the domain of land administration, and specifically to the application of automated visual cognition techniques to cadastral mapping tasks. In this research, we applied rule-based systems within Object Based Image Analysis (OBIA) and compared their ability, against human analysis, to extract visible cadastral boundaries from very high resolution (VHR) WorldView-2 imagery in both rural and urban settings. In our experiments, machine-based techniques automatically delineated a good proportion of rural parcels with explicit polygons: the correctness of the automatically extracted boundaries was 47.4%, against 74.24% for humans, and the completeness was 45% for the machine, against 70.4% for humans. In the urban area, by contrast, the automatic results were counterintuitive: even though urban plots and buildings are clearly marked by visible features such as fences and roads and are readily perceptible to the eye, automation produced geometrically and topologically poorly structured data that could be compared geometrically neither with human-digitised data nor with actual cadastral data from the field. These results provide an updated snapshot of the performance of contemporary machine-driven feature extraction techniques compared to conventional manual digitising.
ARTICLE | doi:10.20944/preprints201812.0067.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: built-up area; classification; Landsat 8- OLI; feature engineering; feature learning; CNN; accuracy evaluation
Online: 5 December 2018 (12:06:34 CET)
Detailed built-up area information is valuable for mapping complex urban environments. Although a large number of classification algorithms for built-up areas have been developed, they are rarely tested from the perspective of feature engineering and feature learning. We therefore launched an investigation to provide a full test of OLI imagery for 15-m resolution built-up area classification in 2015 in Beijing, China. Training a classifier requires many sample points, and we propose a method based on the ESA's 38-meter global built-up area data of 2014, Open Street Map and MOD13Q1-NDVI to generate a large number of sample points rapidly and automatically. Our aim is to examine the influence of single pixels and image patches under traditional feature engineering and modern feature learning strategies. In feature engineering, we consider spectra, shape and texture as the input features, and SVM, random forest (RF) and AdaBoost as the classification algorithms. In feature learning, the convolutional neural network (CNN) is used as the classification algorithm. In total, 26 built-up land cover maps were produced. Experimental results show that: (1) the approaches based on feature learning are generally better than those based on feature engineering in terms of classification accuracy, and the performance of ensemble classifiers, e.g. RF, is comparable to that of CNN. The two-dimensional CNN and the 7-neighborhood RF have the highest classification accuracy, at nearly 91%. (2) Overall, the classification effect and accuracy based on image patches are better than those based on single pixels. Features that highlight the information of the target category (for example, PanTex and EMBI) can help improve classification accuracy.
ARTICLE | doi:10.20944/preprints202306.0755.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional neural network; Chest CT images; Classification; Adaptive Feature Extraction
Online: 12 June 2023 (04:29:17 CEST)
Deep convolutional neural networks (CNNs) are widely used in medical image processing because of their demonstrated performance. Recently, the emergence of new lung diseases and the possibility of detecting their symptoms early have attracted many researchers to classify diseases by training deep CNNs on lung CT images. The trained networks are expected to distinguish between lung indications in different diseases, especially at their early stages. To this end, we propose an efficient deep CNN called AFEX-Net, with adaptive feature extraction layers that successfully extract distinguishing features and classify chest CT images. The efficiency of the proposed network has two aspects: it is a lightweight network with a low number of parameters and fast training, and it has adaptive pooling layers and adaptive activation functions that increase its compatibility with the input data. The proposed network has been evaluated on a dataset with more than 10K chest CT slices, for which an efficient pre-processing method was developed to remove any bias from the images. Additionally, we evaluated the performance of the proposed model on the public COVID-CTset dataset to assess its generalisability. The obtained results confirm the competence of the proposed network on medical images, where prompt and accurate learning is required.
ARTICLE | doi:10.20944/preprints202108.0067.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Feature extraction; independent component analysis; 3D inversion; physical properties
Online: 3 August 2021 (09:45:30 CEST)
A major problem in post-inversion geophysical interpretation is the extraction of geological information from inverted physical property models, which do not necessarily represent all underlying geological features. No matter how accurate the inversions are, each inverted physical property model is sensitive to limited aspects of subsurface geology and is insensitive to other geological features that are otherwise detectable with complementary physical property models. Therefore, specific parts of the geological model can be reconstructed from different physical property models. To show how this reconstruction works, we simulated a complex geological system that comprises an original layered earth model that has undergone several geological deformations and alteration overprints. A linear combination of complex geological features comprised three physical property distributions: electrical resistivity, induced polarization chargeability, and magnetic susceptibility models. This study proposes a multivariate feature extraction approach to extract information about the underlying geological features comprising the bulk physical properties. We evaluated our method in numerical simulations and compared three feature extraction algorithms to assess the tolerance of each method to geological artifacts and noise. We show that the fast independent component analysis (fast-ICA) algorithm by negentropy maximization is a robust method for geological feature extraction that can handle the added unknown geological noise. The post-inversion physical properties are also used to reconstruct the underlying geological sources. We show that the sharpness of the inverted images is an important constraint on the feature extraction process. Our method successfully separates geological features in multiple 3D physical property models.
This methodology is reproducible for any number of lithologies and physical property combinations and can recover the latent geological features, including the background geological patterns from overprints of chemical alteration.
ARTICLE | doi:10.20944/preprints202308.1580.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Face Recognition; image classification, Feature Extraction
Online: 22 August 2023 (13:22:53 CEST)
It is crucial to select the right machine learning classifier for image classification and face recognition. This study examines the effectiveness of four different face recognition classifiers: Support Vector Machines (SVM), Random Forest, K-Nearest Neighbors (KNN), and Neural Networks. An analysis of the Labeled Faces in the Wild (LFW) dataset was carried out using Principal Component Analysis (PCA). Classifiers are rigorously trained and evaluated based on the extracted features. Comparing classifier performance is an insightful way to identify their strengths and weaknesses, and a visual representation of each classifier's performance gives a more complete understanding of its capabilities. Through the selection of the most appropriate classifier, the study results contribute to advancements in image classification, recognition, and biometric identification. According to the comparative analysis, the Neural Network classifier proved exceptionally accurate and proficient at identifying faces from the LFW dataset when combined with PCA for feature extraction.
ARTICLE | doi:10.20944/preprints201906.0245.v1
Subject: Engineering, Automotive Engineering Keywords: feature extraction; corner detection; FAST algorithm; Harris detector; UAV
Online: 25 June 2019 (08:27:29 CEST)
Many corner detection techniques have already been used to extract information from UAV images for various photogrammetric and mapping activities. Among these techniques are Features from Accelerated Segment Test (FAST) and the Harris corner detector. It is widely agreed that the evaluation of detectors is of great importance because it assesses and helps to enhance the accuracy of the detected features. This research evaluates the performance of FAST-9 and FAST-12 as well as the Harris detector in terms of repeatability rate, completeness, and correctness under different threshold values. Each method is evaluated in terms of its ability to detect objects in UAV images (crowd and car features). Then the features detected in common by both FAST versions and the Harris detector are extracted, in order to determine which method performs best under different image conditions (e.g., illumination variations, camera position and orientation, and image noise). The results show that the size of the threshold plays a crucial role in determining the number of detected feature points. An increase in the threshold value leads to a decrease in the number of detected points and vice versa. Thus, the correctness decreases whereas the completeness increases as a function of the threshold values. Furthermore, the relationship between FAST-9 and the Harris detector is slightly better than that between FAST-12 and the Harris detector, because the number of common features between FAST-9 and the Harris detector is relatively higher than that between FAST-12 and the Harris detector.
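The threshold behaviour described above can be illustrated with a minimal pure-Python sketch of the segment test. This is not the authors' implementation: the 16-pixel circle of radius 3 is the usual FAST layout, while the synthetic image, the helper names and the threshold values are assumptions chosen only for illustration.

```python
# Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as circular."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:  # doubling the list handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_corner(image, x, y, threshold, n):
    """Segment test: n contiguous circle pixels all brighter or all darker."""
    centre = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [value > centre + threshold for value in ring]
    darker = [value < centre - threshold for value in ring]
    return max(longest_circular_run(brighter), longest_circular_run(darker)) >= n


def detect_corners(image, threshold, n):
    """Return the set of (x, y) pixels passing the segment test (n=9 or n=12)."""
    height, width = len(image), len(image[0])
    return {(x, y)
            for y in range(3, height - 3)
            for x in range(3, width - 3)
            if is_corner(image, x, y, threshold, n)}


def make_test_image(size=20):
    """Hypothetical test data: dark background with one bright square."""
    return [[200 if 6 <= x < 14 and 6 <= y < 14 else 50 for x in range(size)]
            for y in range(size)]


if __name__ == "__main__":
    image = make_test_image()
    for threshold in (10, 100, 200):
        print(threshold,
              len(detect_corners(image, threshold, 9)),
              len(detect_corners(image, threshold, 12)))
```

Because raising the threshold only makes the brighter/darker conditions stricter, the set detected at a higher threshold is always a subset of the set detected at a lower one, and every FAST-12 corner is also a FAST-9 corner.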
Subject: Computer Science And Mathematics, Information Systems Keywords: local feature extraction; scale-space representation; laplacian of gaussian; convolution template
Online: 8 October 2019 (10:33:37 CEST)
This paper presents a novel method to extract local features which, instead of calculating local extrema, computes global maxima in a discretized scale-space representation. To avoid obtaining precise scales by interpolation and to achieve perfect rotation invariance, the method adopts two essential techniques: increasing the width of kernels in pixels and using a disk-shaped convolution template. Since the size of a convolution template is finite and finite templates can introduce computational error into the convolution, we discuss this problem in detail and derive an upper bound on the computational error. The upper bound is used in the method to ensure that all features obtained are computed within a given tolerance. In addition, a relative threshold is used to determine features, which improves robustness in scenes with changing illumination. Simulations show that this new method attains high repeatability in various situations, including scale change, rotation, blur, JPEG compression, illumination change and even viewpoint change.
ARTICLE | doi:10.20944/preprints201804.0192.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: abnormal ECG; ECG processing, feature extraction; heart beat classification, abnormality detection
Online: 16 April 2018 (06:28:25 CEST)
Automated electrocardiogram (ECG) processing is an important technique that helps to identify abnormalities in the heart before any formal diagnosis. This research presents a real-time, lightweight R-assisted feature extraction algorithm and a heartbeat classification scheme that achieves highly accurate abnormality detection. In the proposed algorithm, we extract fifteen features from each heartbeat taken from raw Lead-II ECG signals. The features carry medically valuable information, such as the locations, amplitudes and energy of the ECG waves (P, Q, R, S and T waves), which is then used by various classification algorithms to detect any abnormality that might be present in the heartbeat. We used four popular databases from Physionet and extracted ten thousand ECG signals from each for training the models and benchmarking results. Four classification models, i.e. Naïve Bayes, k-Nearest Neighbor, Neural Network and Decision Tree, were used for abnormality detection, validating the efficiency of the system.
ARTICLE | doi:10.20944/preprints202309.1397.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Relation extraction; Subject feature; Attention mechanism; Railway traffic in Tibet
Online: 21 September 2023 (03:30:34 CEST)
To address the deficiency of existing relation extraction models in effectively extracting relational triplets pertaining to railway traffic knowledge in Tibet, this paper constructs a Tibet Railway Traffic text dataset and provides an enhanced relation extraction model. The proposed model incorporates subject feature enhancement and relational attention mechanisms. It leverages a pre-trained model as the embedding layer to obtain vector representations of text. Subsequently, the subject is extracted and its semantic information is augmented using an LSTM neural network. Furthermore, during object extraction, the multi-head attention mechanism enables the model to prioritize relations associated with the aforementioned features. Finally, objects are extracted based on the subjects and relations. The proposed method has been comprehensively evaluated on multiple datasets, including the Tibet Railway Traffic text dataset and two public datasets. On the Tibet dataset, the model achieves an F1-score of 93.3%, surpassing the baseline model CasRel by 0.8%, which indicates the superior applicability of the proposed model. On the two public datasets, NYT and WebNLG, the model achieves F1-scores of 91.1% and 92.6%, respectively, outperforming the baseline CasRel by 1.5% and 0.8%, which highlights the good generalization ability of the proposed model.
ARTICLE | doi:10.20944/preprints202208.0201.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: auto-encoder; high sparse binary data; feature extraction; SNV integration
Online: 10 August 2022 (10:27:32 CEST)
Genomics, involving tens of thousands of genes, is a complex system that determines phenotype. An interesting and vital issue is how to integrate highly sparse genomics data with a mass of minor effects into a prediction model to improve prediction power. We find that deep learning methods work well for extracting features by transforming highly sparse dichotomous data into lower-dimensional continuous data in a non-linear way. This idea may benefit risk prediction based on genome-wide data, e.g. by integrating most of the information in the genotype data. Hence, we developed a multi-stage strategy to extract information from highly sparse binary genotype data and applied it to risk prediction. Specifically, we first reduced the number of biomarkers to a moderate size via a univariable regression model. Then a trainable auto-encoder was used to extract compact representations from the reduced data. Next, we solved a LASSO problem over a grid of tuning parameter values to select the optimal combination of extracted features. Finally, we applied this feature combination to two prognostic models and evaluated the predictive effect of the models. The results of simulation studies and a real data application indicated that these highly compressed transformation features improved predictive performance and did not easily lead to over-fitting.
ARTICLE | doi:10.20944/preprints202211.0094.v1
Subject: Engineering, Mechanical Engineering Keywords: Bearing fault feature extraction; Blind deconvolution (BD); Multi-task optimization; Convolutional neural network
Online: 4 November 2022 (13:41:46 CET)
Blind deconvolution (BD) is an effective method for pre-processing vibration signals and assisting in bearing fault diagnosis. Currently, most BD methods design an optimization criterion and use frequency-domain or time-domain information independently to optimize a deconvolution filter, which recovers weak periodic impulses related to incipient faults. However, random noise interference may cause the optimizer to overfit. Time-domain-based BD methods tend to extract a fault-unrelated single peak impulse, and frequency-domain-based BD methods tend to retain the maximum-energy frequency component, which loses the fault-related harmonic frequency components. To solve these issues, we propose a hybrid criterion that combines the kurtosis for time-domain optimization and the $G-l_1/l_2$ norm for the frequency domain. These two criteria are monotonically increasing and decreasing, respectively, so they mutually constrain each other to avoid overfitting. We then design a multi-task one-dimensional convolutional neural network with time and frequency branches to achieve an optimal solution for this hybrid criterion. The multi-task neural network realizes the simultaneous optimization of the two domains. Experimental results show that our proposed method outperforms other state-of-the-art methods.
ARTICLE | doi:10.20944/preprints201803.0266.v1
Subject: Engineering, Mechanical Engineering Keywords: variational mode decomposition; random decrement technique; crankshaft bearing; engine; feature extraction
Online: 30 March 2018 (10:01:18 CEST)
The vibration signal of an engine contains strong background noise and many kinds of modulating components, which makes faults difficult to diagnose. Variational mode decomposition (VMD) is a recently introduced adaptive signal decomposition algorithm with a solid theoretical foundation and good noise robustness compared with empirical mode decomposition (EMD). VMD can effectively avoid the endpoint effect and modal aliasing. However, VMD cannot effectively eliminate the random noise in the signal, so the random decrement technique is introduced to solve this problem. Based on a crankshaft bearing fault simulation experiment, the vibration signals of four wear states are decomposed by VMD, and the modal components with smaller permutation entropy are selected as fault components. The fault component is then processed by the random decrement technique, and its Hilbert envelope spectrum is obtained. The feature extraction results of the proposed method are better than those of the fault feature extraction methods based on EMD and EEMD. The simulation analysis and the simulation test of the crankshaft bearing fault verify the effectiveness of the proposed method.
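As a rough illustration of the random decrement step, the sketch below averages all signal segments that begin where the signal crosses a trigger level upwards, so that zero-mean random noise tends to cancel in the average. The abstract does not state the trigger condition used by the authors; the level up-crossing trigger, the function names and the synthetic test signal are all assumptions.

```python
import math
import random


def random_decrement(signal, trigger_level, segment_length):
    """Average all segments that start where the signal up-crosses trigger_level.

    Returns an empty list if the trigger condition is never met.
    """
    starts = [i for i in range(1, len(signal) - segment_length + 1)
              if signal[i - 1] < trigger_level <= signal[i]]
    if not starts:
        return []
    signature = [0.0] * segment_length
    for start in starts:
        for k in range(segment_length):
            signature[k] += signal[start + k]
    return [value / len(starts) for value in signature]


def make_noisy_signal(n=500, seed=0):
    """Hypothetical test data: a sinusoid with additive Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(0.2 * i) + rng.gauss(0.0, 0.1) for i in range(n)]


if __name__ == "__main__":
    signature = random_decrement(make_noisy_signal(), trigger_level=0.5,
                                 segment_length=40)
    print(len(signature), signature[:3])
```

In the workflow described above, the input would be the fault component selected after VMD rather than a synthetic sinusoid, and the resulting signature would then be passed to the envelope analysis.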
ARTICLE | doi:10.20944/preprints202309.0667.v1
Subject: Engineering, Architecture, Building And Construction Keywords: traditional village; roof feature line; slope segmentation; cloth simulation filter; UAV
Online: 11 September 2023 (10:12:44 CEST)
The extraction of roof feature lines is an important foundation for large-scale, batch 3D modeling. However, current traditional point cloud segmentation algorithms do not give satisfactory results when extracting the roof feature lines of traditional Chinese residential buildings. In this paper, taking Jingping Village in Western Hunan as an example, we propose a method that combines multiple algorithms, based on slope segmentation of roof patches, to extract feature lines. Firstly, the VDVI and CSF algorithms are used to extract the building and roof point clouds from the MVS point cloud. Secondly, village buildings are classified according to their roof features, and the 3D roof point cloud is projected into 2D regular grid data. Finally, the roof slope is segmented by slope direction, and internal and external feature lines are obtained after refinement through Canny edge detection and Hough straight line detection. Results reveal that this method effectively extracts feature lines of low-building roofs in traditional villages, with slope-based roof surface segmentation accuracy surpassing 99.6%. This method significantly outperforms the RANSAC algorithm and region segmentation algorithm.
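The vegetation-filtering part of the first step can be sketched as follows, assuming that VDVI here refers to the visible-band difference vegetation index in its common form, (2G - R - B) / (2G + R + B), computed from the RGB colour of each point. The abstract does not give the threshold or implementation details, so the threshold value and function names below are purely illustrative.

```python
def vdvi(red, green, blue):
    """Visible-band difference vegetation index for one RGB sample."""
    denominator = 2 * green + red + blue
    if denominator == 0:
        return 0.0  # black pixel: index undefined, treat as non-vegetation
    return (2 * green - red - blue) / denominator


def split_vegetation(points, threshold=0.05):
    """Split (x, y, z, r, g, b) points into vegetation and non-vegetation.

    The threshold is a hypothetical value and would need tuning on real data.
    """
    vegetation, other = [], []
    for point in points:
        r, g, b = point[3:6]
        (vegetation if vdvi(r, g, b) > threshold else other).append(point)
    return vegetation, other


if __name__ == "__main__":
    cloud = [
        (0.0, 0.0, 1.0, 50, 200, 50),    # green, tree-like point
        (1.0, 0.0, 3.0, 120, 120, 120),  # grey, roof-like point
    ]
    vegetation, other = split_vegetation(cloud)
    print(len(vegetation), len(other))
```

Points that remain after this colour-based filter would still include ground, which is what a ground filter such as CSF is meant to separate in the workflow described above.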
ARTICLE | doi:10.20944/preprints202004.0524.v2
Subject: Biology And Life Sciences, Virology Keywords: unsupervised learning; tensor decomposition; feature selection; COVID-19; drug discovery; gene expression
Online: 3 June 2020 (05:29:09 CEST)
Background: COVID-19 is a critical pandemic that has affected human communities worldwide, and there is an urgent need to develop effective drugs. Although there are a large number of candidate drug compounds that may be useful for treating COVID-19, the evaluation of these drugs is time-consuming and costly. Thus, screening to identify potentially effective drugs prior to experimental validation is necessary. Method: In this study, we applied the recently proposed method tensor decomposition (TD)-based unsupervised feature extraction (FE) to gene expression profiles of multiple lung cancer cell lines infected with severe acute respiratory syndrome coronavirus 2. We identified drug candidate compounds that significantly altered the expression of the 163 genes selected by TD-based unsupervised FE. Results: Numerous drugs were successfully screened, including many known antiviral drug compounds such as C646, chelerythrine chloride, canertinib, BX-795, sorafenib, QL-X-138, radicicol, A-443654, CGP-60474, alvocidib, mitoxantrone, QL-XII-47, geldanamycin, fluticasone, atorvastatin, quercetin, motexafin gadolinium, trovafloxacin, doxycycline, meloxicam, gentamicin, and dibromochloromethane. The screen also identified ivermectin, which was first identified as an anti-parasite drug and was recently included in clinical trials for SARS-CoV-2. Conclusions: The drugs screened using our strategy may be effective candidates for treating patients with COVID-19.
ARTICLE | doi:10.20944/preprints201812.0237.v1
Subject: Engineering, Mechanical Engineering Keywords: signal processing; sparse regression; system identification; impulse response; optimization; feature generation; structural dynamics; time series classification
Online: 19 December 2018 (16:21:41 CET)
Time recordings of impulse-type oscillation responses are short and highly transient. These characteristics may complicate the usage of classical spectral signal processing techniques for a) describing the dynamics and b) deriving discriminative features from the data. However, common model identification and validation techniques mostly rely on steady-state recordings, characteristic spectral properties and non-transient behavior. In this work, a recent method, which allows reconstructing differential equations from time series data, is extended for higher degrees of automation. With special focus on short and strongly damped oscillations, an optimization procedure is proposed that fine-tunes the reconstructed dynamical models with respect to model simplicity and error reduction. This framework is analyzed with particular focus on the amount of information available to the reconstruction, noise contamination and non-linearities contained in the time series input. Using the example of a mechanical oscillator, we illustrate how the optimized reconstruction method can be used to identify a suitable model and to extract features from uni-variate and multivariate time series recordings in an engineering-compliant environment. Moreover, the determined minimal models allow for identifying the qualitative nature of the underlying dynamical systems as well as testing for the degree and strength of non-linearity. The reconstructed differential equations would then be potentially available for classical numerical studies, such as bifurcation analysis. These results represent a physically interpretable enhancement of data-driven modeling approaches in structural dynamics.
ARTICLE | doi:10.20944/preprints201611.0052.v1
Subject: Physical Sciences, Acoustics Keywords: empirical mode decomposition; intrinsic mode function; permutation entropy; multi-scale permutation entropy; feature extraction
Online: 9 November 2016 (10:24:35 CET)
To address the problem of feature extraction from underwater acoustic signals in complex ocean environments, a new method for feature extraction from ship radiated noise is presented, based on empirical mode decomposition theory and permutation entropy. It analyzes the separability of the permutation entropies of the intrinsic mode functions of three types of ship radiated noise signals, and discusses the permutation entropy of the intrinsic mode function with the highest energy. In this study, ship radiated noise signals measured from three types of ships are decomposed into a set of intrinsic mode functions with the empirical mode decomposition method. Then, the permutation entropies of all intrinsic mode functions are calculated with appropriate parameters. The permutation entropies differ clearly in the intrinsic mode functions with the highest energy; thus, the permutation entropy of the intrinsic mode function with the highest energy is taken as a new characteristic parameter for extracting the feature of ship radiated noise. After that, three other characteristic parameters, namely the energy difference between high and low frequency, permutation entropy, and multi-scale permutation entropy, are compared with the permutation entropy of the intrinsic mode function with the highest energy. The four characteristic parameters are found to be at the same level for similar ships, whereas the parameters differ for different types of ships. A comparison of the fluctuation ranges and average values of the four characteristic parameters demonstrates that the permutation entropy of the intrinsic mode function with the highest energy offers better separability than the other three parameters. Hence, the feature of ship radiated noise can be extracted efficiently with the method.
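For reference, permutation entropy in its usual (Bandt-Pompe) form can be sketched in a few lines: each window of the series is mapped to the ordinal pattern that sorts it, and the Shannon entropy of the pattern frequencies is computed. The abstract only says that "appropriate parameters" were used, so the default order and delay below are assumptions, as is the normalisation by log(order!).

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D sequence.

    order: embedding dimension (length of each ordinal pattern)
    delay: time delay between the samples of one pattern
    """
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the given order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: indices that sort the window (ties keep original order).
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    entropy = sum(-(c / n_patterns) * math.log(c / n_patterns)
                  for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(order))  # maps the result to [0, 1]
    return entropy


if __name__ == "__main__":
    ramp = list(range(50))                  # perfectly regular series
    zigzag = [i % 2 for i in range(51)]     # alternating series
    print(permutation_entropy(ramp))
    print(permutation_entropy(zigzag, order=2))
```

In the method described above, this quantity would be evaluated on the intrinsic mode function with the highest energy rather than on the raw signal; the decomposition itself is not shown here.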
ARTICLE | doi:10.20944/preprints202306.2037.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Automatic Feature Extraction; Cadastral mapping; Fit-for-purpose; Interactive delineation; Mean-shift segmentation; Random Forest classification; Land administration
Online: 29 June 2023 (03:03:19 CEST)
Fit-for-purpose land administration (FFPLA) seeks to simplify cadastral mapping by lowering the costs and time associated with conventional surveying methods. The approach can be applied to both the initial establishment and the on-going maintenance of the system. In Ethiopia, cadastral maintenance remains an on-going challenge, especially in rapidly urbanizing peri-urban areas, where farmers' land rights and tenure security are often jeopardized. Automatic Feature Extraction (AFE) is an emerging FFPLA approach, proposed as an alternative for mapping and updating cadastral boundaries. This study explores the role of the AFE approach for updating cadastral boundaries in the vibrant peri-urban areas of Addis Ababa. Open-source software solutions are utilized to assess the (semi-)automatic extraction of cadastral boundaries from orthophotos (segmentation), the designation of 'boundary' and 'non-boundary' outlines (classification), and the delimitation of cadastral boundaries (interactive delineation). Both qualitative and quantitative assessments of the achieved results (validation) are undertaken. A high-resolution orthophoto of the study area and a reference cadastral boundary shapefile are used, respectively, for extracting the parcel boundaries and validating the interactive delineation results. Qualitative (visual) assessment verified the complete extraction of newly constructed cadastral boundaries in the study area, although non-boundary outlines such as footpaths and artefacts are also retrieved. For the buffer overlay analysis, the interactively delineated boundary lines and the reference cadastre were buffered within the spatial accuracy limits for urban and rural cadastres. As a result, the quantitative assessment delivered 52% correctness and 32% completeness for buffer widths of 0.4 m and 0.6 m, respectively, for the interactively delineated and reference boundaries.
The study further demonstrated the potentially significant role AFE could play in delivering fast, affordable, and reliable cadastral mapping. Further investigation, based on user input and expert evaluation, could help to improve the approach and apply it to a real-world setting.
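The buffer overlay measures can be approximated with a small pure-Python sketch. It assumes the common definitions (correctness: share of the extracted boundary lying within the buffer around the reference; completeness: share of the reference lying within the buffer around the extracted boundary) and approximates line length by evenly sampled points. The abstract does not spell out the authors' exact procedure or how the two buffer widths were applied, so the single `buffer_width` parameter, the sampling spacing and the coordinates are illustrative assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def sample_polyline(line, spacing):
    """Place points roughly every `spacing` units along a polyline."""
    points = []
    for a, b in zip(line, line[1:]):
        steps = max(1, math.ceil(math.hypot(b[0] - a[0], b[1] - a[1]) / spacing))
        points.extend((a[0] + (b[0] - a[0]) * i / steps,
                       a[1] + (b[1] - a[1]) * i / steps) for i in range(steps))
    points.append(line[-1])
    return points


def share_within_buffer(lines, other_lines, buffer_width, spacing=0.1):
    """Fraction of sampled points on `lines` within buffer_width of `other_lines`."""
    segments = [(a, b) for line in other_lines for a, b in zip(line, line[1:])]
    samples = [p for line in lines for p in sample_polyline(line, spacing)]
    hits = sum(1 for p in samples
               if any(point_segment_distance(p, a, b) <= buffer_width
                      for a, b in segments))
    return hits / len(samples)


if __name__ == "__main__":
    # Hypothetical coordinates in metres.
    reference = [[(0.0, 0.0), (10.0, 0.0)]]
    extracted = [[(0.0, 0.2), (5.0, 0.2)], [(0.0, 5.0), (5.0, 5.0)]]
    correctness = share_within_buffer(extracted, reference, buffer_width=0.4)
    completeness = share_within_buffer(reference, extracted, buffer_width=0.4)
    print(correctness, completeness)
```

A production workflow would more likely buffer and intersect the geometries in a GIS or geometry library; the point-sampling approach here only keeps the sketch free of dependencies.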
ARTICLE | doi:10.20944/preprints201703.0134.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: spatial-spectral feature; very high spatial resolution image; classification; Tobler’s First Law of Geography
Online: 17 March 2017 (05:06:12 CET)
Aerial image classification has become popular and has attracted extensive research efforts in recent decades. The main challenge lies in the very high spatial resolution but relatively insufficient spectral information of such imagery. To this end, spatial-spectral feature extraction is a popular strategy for classification. However, parameter determination for that feature extraction is usually time-consuming and depends excessively on experience. In this paper, an automatic spatial feature extraction approach based on cross-analysis of image raster and segmental vector data is proposed for the classification of very high spatial resolution (VHSR) aerial imagery. First, multi-resolution segmentation is used to generate strongly homogeneous image objects and extract the corresponding vectors. Second, to automatically explore the region of a ground target, two rules, derived from Tobler’s First Law of Geography (TFL) and a topological relationship of vector data, are integrated to constrain the extension of a region around a central object. Third, the shape and size of the extended region are described. A final classification map is obtained through a supervised classifier using shape, size, and spectral features. Experiments on three real VHSR aerial images (0.1 to 0.32 m) evaluate the effectiveness and robustness of the proposed approach. Comparisons with state-of-the-art methods demonstrate the superiority of the proposed method in VHSR image classification.
ARTICLE | doi:10.20944/preprints202308.0528.v1
Subject: Engineering, Bioengineering Keywords: Speech Imagery; Mental Task; Machine Learning; Feature Extraction; Common spatial pattern (CSP); Filter bank Common Spatial Pattern (FBCSP); Brain – Computer Interface (BCI); Principal Components Analysis (PCA); Feature Selection; Channel Selection; Mutual Information; Lagrange Formula; Deep Learning; SVM Classifier
Online: 7 August 2023 (10:23:13 CEST)
Brain signal processing is now used in a wide range of brain-computer interface (BCI) applications. Most researchers focus either on developing new methods or on improving existing models to identify the optimum standalone feature set. Our research pursues four ideas: one introduces a new communication model, and the other three improve existing models or methods. 1) A new communication imagery model based on a mental task instead of speech imagery: because speech imagery is very difficult, and it is impossible to imagine the sound of every character in every language, we introduce a new mental task model, called lip-sync imagery, that can be used for all characters in all languages. This paper implements lip-sync imagery for two sounds (characters or letters). 2) New combination signals: selecting an unsuitable frequency domain can lead to inefficient feature extraction, so domain selection is important for processing. Combining limited frequency ranges is a preliminary step toward creating a fragmentary continuous frequency. For the first model, two intervals of 4 Hz were examined and tested as filter banks. The primary purpose is to identify the combination of 4 Hz-wide filter banks from the 4 Hz to 40 Hz frequency domain that, as a new combination signal (8 Hz), yields efficient features by increasing distinctive patterns and decreasing similar patterns of brain activity. 3) A new bond graph classifier to supplement the SVM classifier: the performance of a linear SVM decreases on very noisy data, so we introduce a new linear bond graph classifier to supplement the linear SVM on noisy data. 4) A deep formula recognition model: it converts the data of the first layer into a formula model (formula extraction model). The main goal is to reduce the noise in the subsequent layers for the coefficients of the formulas. The output of the last layer is the coefficients selected by different functions in different layers. Finally, the classifier extracts the root interval of the formulas, and the diagnosis is made based on the root interval. Results were obtained for all four ideas and range from 55% to 98%: the lowest is 55% for the deep formula recognition model, and the highest is 98% for the new combination signals.
REVIEW | doi:10.20944/preprints202012.0377.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Feature Selection, Feature Ranking, Grouping, Clustering, Biological Knowledge.
Online: 15 December 2020 (12:10:44 CET)
In the last two decades, there have been massive advancements in high throughput technologies, which resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This problem refers to a well-studied problem in the machine learning domain, i.e., the feature selection problem. In biological data analysis, most of the computational feature selection methodologies were taken from other fields, without considering the nature of the biological data. For gene expression data analysis, most of the existing feature selection methods rely on expression values alone to select the genes; and biological knowledge is integrated at the end of the analysis in order to gain biological insights or to support the initial findings. Thus, integrative approaches that utilize the biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics that are applied to the gene expression data, and the biological background information which is provided as external datasets. Since the integrative approach attracted attention in the gene expression domain, lately the gene selection process shifted from being purely data-centric to more incorporative analysis with additional biological knowledge.
ARTICLE | doi:10.20944/preprints202304.1204.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: dimensionality reduction; autoencoder; feature extraction; feature selection; guiding layer; regularization
Online: 29 April 2023 (04:14:20 CEST)
In the era of big data, feature engineering has proved its efficiency and importance in dimensionality reduction and in extracting useful information from original features. Feature engineering can be expressed as dimensionality reduction and is divided into two types of methods: feature selection and feature extraction. Each has its pros and cons, and many studies have sought to combine them. The sparse autoencoder (SAE) is a representative deep feature learning method that combines feature selection with feature extraction. However, existing SAEs do not consider feature importance during training, which causes them to extract irrelevant information. In this paper, we propose a parallel guiding sparse autoencoder (PGSAE) that guides the information through two parallel guiding layers and sparsity constraints. The parallel guiding layers preserve the main distribution using the Wasserstein distance, a metric of the difference between distributions, and suppress the leverage of guiding features to prevent overfitting. We perform experiments on four datasets that differ in dimensionality and number of samples. The proposed PGSAE method achieves better classification performance than other dimensionality reduction methods.
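As a rough illustration of the distribution metric named in this abstract, the sketch below computes the 1-Wasserstein distance between two one-dimensional empirical samples of equal size. The function name `wasserstein_1d` and the equal-size, uniform-weight setting are assumptions made for this example; the abstract does not say how PGSAE evaluates the distance inside its guiding layers.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.

    Assumption for this sketch: both samples carry uniform weights and have
    the same length. In that case the optimal transport plan pairs the sorted
    values, so the distance is the mean absolute difference of sorted samples.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Hypothetical activations: the second sample is the first shifted by 5.
original = [0.0, 1.0, 3.0]
shifted = [5.0, 6.0, 8.0]
print(wasserstein_1d(original, shifted))
```

A pure shift of a distribution moves every point by the same amount, so the distance equals the size of the shift, which is what makes the metric a natural measure of how far one distribution has drifted from another.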
ARTICLE | doi:10.20944/preprints202303.0391.v1
Subject: Medicine And Pharmacology, Veterinary Medicine Keywords: prognosis and health management, preprocessing data, feature extraction, feature selection.
Online: 22 March 2023 (04:31:53 CET)
In the chemical processing industries, sensors for pumps are among the most commonly used machinery. Condition-based maintenance (CBM) and prognosis and health management (PHM) determine the most cost-effective time to overhaul pumps. To determine the status of the pump, a signal-emitting accelerometer is employed. Stationarity-based feature extraction from amplitude signals is used to process the signal. Utilizing the time-domain function, multiple statistical results were produced. Eight fault codes were classified using a support vector machine. The enormous number of data points necessitated the use of feature selection. In terms of accuracy, precision, recall, and F1 score, the Chi-square feature selection method outperforms the other approaches.
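The chi-square selection step mentioned above can be sketched as follows. This is a minimal illustration that assumes each feature has already been discretized into bins; the names `chi_square_score`, `rank_features`, `rms_bin`, and `noise_bin` are hypothetical and do not come from the paper.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of a discretized feature against labels."""
    n = len(labels)
    if n == 0 or len(feature) != n:
        raise ValueError("feature and labels must be non-empty and equal length")
    joint = Counter(zip(feature, labels))   # observed counts per (bin, class)
    feature_totals = Counter(feature)       # row totals
    label_totals = Counter(labels)          # column totals
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            observed = joint.get((f_value, l_value), 0)
            score += (observed - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return feature names sorted from most to least class-dependent."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical binned features for four signal windows and two fault codes.
labels = ["ok", "ok", "fault", "fault"]
columns = {"rms_bin": [0, 0, 1, 1], "noise_bin": [0, 1, 0, 1]}
print(rank_features(columns, labels))
```

A feature whose bins line up with the fault codes receives a large statistic, while a feature distributed independently of the codes scores zero, so keeping the top-ranked features discards the uninformative ones.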
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: COVID-19; machine learning; feature significance; feature correlation; risk factors
Online: 2 June 2021 (14:54:10 CEST)
The COVID-19 pandemic affected the whole world, but not all countries were impacted equally. This opens the question of what factors can explain the initial faster spread in some countries compared to others. Many such factors are overshadowed by the effect of the countermeasures, so we studied the early phases of the infection when countermeasures have not yet taken place. We collected the most diverse dataset of potentially relevant factors and infection metrics to date for this task. Using it, we show the importance of different factors and factor categories as determined by both statistical methods and machine learning (ML) feature selection (FS) approaches. Factors related to culture (e.g., individualism, openness), development, and travel proved the most important. A more thorough factor analysis was then made using a novel rule discovery algorithm. We also show how interconnected these factors are and caution against relying on ML analysis in isolation. Importantly, we explore potential pitfalls found in the methodology of similar work and demonstrate their impact on COVID-19 data analysis. Our best models using the decision tree classifier can predict the infection class with roughly 80% accuracy.
ARTICLE | doi:10.20944/preprints202309.0133.v1
Subject: Engineering, Aerospace Engineering Keywords: Star image registration; Radial module feature; Rotation angle feature; Robustness; Real-time
Online: 4 September 2023 (07:16:38 CEST)
Star image registration is the most important step in astronomical image differencing, stacking, and mosaicking. It demands high robustness, accuracy, and real-time performance from the algorithm, yet no high-performance registration algorithm exists in this field. In this paper, we propose a star image registration algorithm that relies only on radial module features (RMF) and rotation angle features (RAF) and offers excellent robustness, high accuracy, and good real-time performance. Test results on a large amount of simulated and real data show that the overall performance of the proposed algorithm is significantly better than that of the four classical baseline algorithms in the presence of rotation, insufficient overlapping area, false stars, position deviation, magnitude deviation, and complex sky background, making it a more suitable star image registration algorithm.
ARTICLE | doi:10.20944/preprints202306.0180.v1
Subject: Engineering, Control And Systems Engineering Keywords: Condition monitoring; Induction motor; Inter-turn short-circuit; Feature calculation; Feature reduction
Online: 2 June 2023 (10:22:01 CEST)
Electrical rotating machines such as Induction Motors (IMs) are widely used in industrial applications because of their robust elements, high efficiency, and versatility. Nevertheless, faults in IMs are inherent to their operating conditions. The inter-turn short-circuit (ITSC) is one of the most common failures affecting IMs; it arises from electrical stresses that degrade the stator winding insulation. This work proposes a diagnosis methodology for the assessment and detection of incipient ITSC in IMs, based on the processing of vibration, stator current, and magnetic stray-flux signals. The novelty and contribution lie in characterizing different physical magnitudes by estimating a set of statistical time-domain features, and in fusing and reducing them through the Linear Discriminant Analysis technique within a feature-level fusion approach. The fusion and reduction of information from different physical magnitudes then enables automatic fault detection and identification by a simple Neural-Network (NN) structure. The proposed method is evaluated on a complete set of experimental data, and the results demonstrate that fusing information from different sources (physical magnitudes) improves the accuracy of ITSC detection in IMs, which makes this proposal feasible to incorporate into condition-based maintenance programs in industry.
ARTICLE | doi:10.20944/preprints202201.0258.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Skin cancer; Deep learning; Hybrid feature extractor; Local binary pattern; Feature extraction
Online: 18 January 2022 (12:43:50 CET)
Skin cancer is a serious disease worldwide. Because of the poor contrast and apparent resemblance between skin and lesions, automatic identification of skin cancer is complicated. The death rate can be greatly reduced if melanoma skin cancer is detected quickly using dermoscopy images. In this research, an anisotropic diffusion filtering method is applied to dermoscopy images to remove multiplicative speckle noise, and the fast-bounding box (FBB) method is applied to segment the skin cancer region. The proposed method comprises two feature extractors: a hybrid feature extractor (HFE) and a CNN feature extractor based on the VGG19 convolutional neural network. The HFE combines three feature extraction approaches into a single fused feature vector: Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and Speeded-Up Robust Features (SURF). The CNN extracts additional features from the test and training datasets. The two feature vectors are fused to build the classification model, which classifies dermoscopy images as melanoma or non-melanoma skin cancer. The proposed methodology is evaluated on two ordinary datasets and achieves an accuracy of 99.85%, a sensitivity of 91.65%, and a specificity of 95.70%, which makes it more successful than previous machine learning algorithms.
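Of the three hand-crafted descriptors named in this abstract, the Local Binary Pattern is the simplest to show in a few lines. The sketch below is a basic 3x3 LBP on a grayscale image stored as nested lists; the clockwise bit order starting at the top-left neighbour and the `>=` comparison are conventions assumed for this example, and the paper's exact LBP variant is not stated in the abstract.

```python
def lbp_code(patch):
    """8-bit LBP code of the centre pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:  # neighbour at least as bright as centre
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of an image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


# Hypothetical 4x4 image with constant intensity: every neighbour equals the
# centre, so all four interior pixels receive the code 255.
flat = [[7] * 4 for _ in range(4)]
print(lbp_histogram(flat)[255])
```

The histogram, rather than the per-pixel codes, is what would typically be concatenated with other descriptors into a fused feature vector.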
ARTICLE | doi:10.20944/preprints202301.0304.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Disease Dental Caries; Gradient Boosting Decision Tree; Feature Selection; Machine Learning; Feature importance
Online: 17 January 2023 (09:25:45 CET)
Caries is a prevalent oral disease that primarily affects children and teenagers. Advances in machine learning have caught the attention of scientists working with decision support systems to predict early tooth decay. Current research has developed machine learning algorithms for caries classification that reach high accuracy, especially for image data. Unfortunately, most studies on dental caries focus only on classification and prediction tasks, whereas dental caries prevention is more important. Therefore, this study aims to design efficient features for a machine-learning-based decision support system that can identify the various risk factors that cause dental caries and support its prevention. The data used in this work were obtained from the 2018 Korean Children's Oral Health Survey and comprise nine datasets. The experimental results show that combining the mRMR and GINI Feature Importance methods when training the GBDT model achieved the optimum performance of 95%, 93%, 99%, and 88% for accuracy, F1 score, precision, and recall, respectively. The proposed method thus provides an effective predictive model for dental caries.
ARTICLE | doi:10.20944/preprints202206.0390.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Object detection; Feature fusion network; Multiple feature selection; Angle prediction; Pixel Attention Mechanism
Online: 29 June 2022 (03:09:52 CEST)
The object detection task is usually affected by complex backgrounds. In this paper, a new image object detection method is proposed, which can perform multi-feature selection on multi-scale feature maps. By this method, a bidirectional multi-scale feature fusion network is designed to fuse semantic features and shallow features to improve the detection effect of small objects in complex backgrounds. When the shallow features are transferred to the top layer, a bottom-up path is added to reduce the number of network layers experienced by the feature fusion network, reducing the loss of shallow features. In addition, a multi-feature selection module based on the attention mechanism is used to minimize the interference of useless information on subsequent classification and regression, allowing the network to adaptively focus on appropriate information for classification or regression to improve detection accuracy. Because the traditional five-parameter regression method has severe boundary problems when predicting objects with large aspect ratios, the proposed network treats angle prediction as a classification task. The experimental results on the DOTA dataset, the self-made DOTA-GF dataset and the HRSC 2016 dataset show that, compared with several popular object detection algorithms, the proposed method has certain advantages in detection accuracy.
ARTICLE | doi:10.20944/preprints202302.0196.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Recommendation; GNN; Information; Feature; Structure
Online: 13 February 2023 (03:28:02 CET)
With the rapid development of the Internet industry, the abundance of information available online has created the problem of information overload. Recommendation systems have been developed to address it by providing users with personalized and relevant information or products. Recommendation algorithms, as the core of recommendation systems, have attracted much attention and are a hot research topic. The classical recommendation algorithms fall into three major categories: collaborative filtering, content-based, and hybrid recommendation algorithms. Although these algorithms are widely used in various fields, with the proliferation of information they have limitations and can no longer meet current demands. This paper proposes a new recommendation algorithm, SFRRG, that fuses structure and feature information in graph neural networks to improve the rating-prediction performance of the recommendation system. The effectiveness of the proposed algorithm is demonstrated through experiments on various data sets and comparisons with existing recommendation algorithms.
ARTICLE | doi:10.20944/preprints202211.0534.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Recommendation; GNN; Feature; Path; Embedding
Online: 29 November 2022 (04:19:09 CET)
Recommender systems are effective information filtering systems that obtain information from the user's explicit or implicit behavior. On the one hand, they find items that may be of interest to the user; on the other hand, they facilitate the interaction between the user and the item to increase revenue. Recommender systems have been widely used in fields such as e-commerce, travel recommendation, online books and movies, and social networks, where they satisfy the intrinsic implicit needs of users through personalized services. In recent years, the development of deep learning has further improved the performance of recommendation systems. Nevertheless, when the number of users and products increases, a recommendation system may face sparsity and cold-start problems and thus fail to deliver personalized recommendations. Knowledge graphs, which are structured data, have become the choice of many algorithms because of the high quality and wide scale of their data, and recommendation algorithms combined with knowledge graphs have therefore emerged as a popular new direction in recommendation systems. These algorithms preserve the rich connections between different entities. Moreover, when the features of an entity are constructed, entities far from the central entity can also be used, so entities are no longer connected only directly. To address the shortcomings of existing recommendation algorithms, this paper designs the recommendation algorithm GPRE using graph neural networks. GPRE focuses on expressing the user's features. The graph neural network gives GPRE a strong generalization capability for modeling, which provides long-range semantics between users and entities, as well as selective entity selection in the auxiliary graph neural network.
Explicit semantic links are established between remote and central nodes to reduce the introduction of noise. In this paper, experiments are conducted on real-world datasets and the results are compared with baselines. The experimental results show that GPRE performs well on the experimental dataset.
ARTICLE | doi:10.20944/preprints202002.0415.v1
Subject: Medicine And Pharmacology, Other Keywords: electromyography; EMG; feature extraction; feature selection; myoelectric control; classification; pattern recognition; prosthetics; wearables; amputee
Online: 28 February 2020 (02:09:05 CET)
Myoelectric control is the cornerstone of many assistive technologies used in clinical practice, such as prosthetics and orthoses, and human-computer interaction, such as virtual reality control. Although the performance of such devices exceeds 90% in controlled environments, myoelectric devices still face challenges in robustness to variability of daily living conditions. Within this survey, the intrinsic physiological mechanisms limiting practical implementations of myoelectric devices were explored: the limb position effect and the contraction intensity effect. The degradation of electromyography (EMG) pattern recognition in the presence of these factors was demonstrated on six datasets, where performance was 13% and 20% lower in realistic environments compared to controlled environments for the limb position and contraction intensity effect, respectively. The experimental designs of limb position and contraction intensity literature were surveyed. Current state-of-the-art training strategies and robust algorithms for both effects were compiled and presented. Recommendations for future limb position effect studies include: the collection protocol providing exemplars of 6 positions (four limb positions and three forearm orientations), three-dimensional space experimental designs, transfer learning approaches, and multi-modal sensor configurations. Recommendations for future contraction intensity effect studies include: the collection of dynamic contractions, nonlinear complexity features, and proportional control.
ARTICLE | doi:10.20944/preprints201712.0057.v1
Subject: Environmental And Earth Sciences, Other Keywords: dimension reduction; feature extraction; hyperspectral image; weighted feature space; low rank representation; spectral clustering
Online: 11 December 2017 (06:55:22 CET)
Containing hundreds of spectral bands (features), hyperspectral images (HSIs) are highly capable of discriminating land cover classes. Traditional HSI data processing methods assign the same importance to all bands in the original feature space (OFS), although different spectral bands play different roles in identifying samples of different classes. To explore the relative importance of each feature, in this paper we learn a weighting matrix and obtain the relative weighted feature space (RWFS) as an enriched feature space for HSI data analysis. To overcome the difficulty of limited labeled samples, which is common in HSI data analysis, we extend our method to a semisupervised framework. To transfer available knowledge to unlabeled samples, we employ graph-based clustering in which low rank representation (LRR) is used to define the similarity function for the graph. After the RWFS is constructed, any dimension reduction method and classification algorithm can be employed in it. Experimental results on two well-known HSI data sets show that some dimension reduction algorithms perform better in the new weighted feature space.
ARTICLE | doi:10.20944/preprints202108.0433.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Speech emotion recognition; Feature extraction; Heterogeneous parallel network; Spectral features; Prosodic features; Multi-feature fusion
Online: 23 August 2021 (12:16:40 CEST)
Speech emotion recognition remains a demanding task in natural language processing. It places strict requirements on the effectiveness of both the feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves 84.65%, 79.67%, and 56.50% unweighted average recall on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model consistently achieves better performance.
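The metric reported here, unweighted average recall (UAR), is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The sketch below shows the computation; the function name and the toy labels are illustrative only and are not taken from the paper.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, with every class weighted equally."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: four "angry" clips and one "sad" clip,
# scored against a classifier that always answers "angry".
y_true = ["angry", "angry", "angry", "angry", "sad"]
y_pred = ["angry"] * 5
print(unweighted_average_recall(y_true, y_pred))
```

In this toy case plain accuracy would be 4 out of 5, while the UAR is only one half because the "sad" class is never recognized. That sensitivity to minority classes is why UAR is commonly preferred on imbalanced emotion corpora.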
ARTICLE | doi:10.20944/preprints202111.0243.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Feature Selection; Malaria Diagnosis; Supervised learning
Online: 15 November 2021 (10:36:16 CET)
Malaria remains an important cause of death, especially in sub-Saharan Africa, with about 228 million malaria cases worldwide and an estimated 405,000 deaths in 2019. Currently, malaria is diagnosed in health facilities using a microscope (BS) or a malaria rapid diagnostic test (MRDT), and in areas where these tools are inadequate, presumptive treatment is given. In addition, self-diagnosis and self-treatment are practiced in some households. Given the high rate of self-medication with malaria drugs, this study aimed to identify, using feature selection methods, the most significant features for predicting malaria in Tanzania, which can then be used to develop a machine learning model for malaria diagnosis. A dataset of malaria symptoms and clinical diagnoses was extracted from patients’ files at four (4) identified health facilities in the regions of Kilimanjaro and Morogoro. These regions were selected to represent the high-endemic areas (Morogoro) and low-endemic areas (Kilimanjaro) of the country. The dataset contained 2556 instances and 36 variables. A random forest classifier, a tree-based method, was used to select the most important features for malaria prediction. Region-specific features were obtained to facilitate accurate prediction. The feature ranking indicated that fever is universally the most influential feature for predicting malaria, followed by general body malaise, vomiting, and headache. However, these features are ranked differently across the regional datasets. Subsequently, six predictive models, built on the important features selected by the feature selection method, were used to evaluate the performance of the features. The features identified comply with the malaria diagnosis and treatment guidelines provided by WHO and Tanzania Mainland. This compliance was sought in order to produce a prediction model that fits the current health care provision system in Tanzania.
ARTICLE | doi:10.20944/preprints202110.0042.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: classification; ensemble; subspace; sparsity; feature ranking
Online: 4 October 2021 (10:36:37 CEST)
We propose a new ensemble classification algorithm, named Super Random Subspace Ensemble (Super RaSE), to tackle the sparse classification problem. The proposed algorithm is motivated by the Random Subspace Ensemble algorithm (RaSE). The RaSE method was shown to be a flexible framework that can be coupled with any existing base classifier. However, the success of RaSE largely depends on the proper choice of the base classifier, which is unfortunately unknown to us. In this work, we show that Super RaSE avoids the need to choose a base classifier by randomly sampling a collection of classifiers together with the subspace. As a result, Super RaSE is more flexible and robust than RaSE. In addition to the vanilla Super RaSE, we also develop the iterative Super RaSE, which adaptively changes the base classifier distribution as well as the subspace distribution. We show that the Super RaSE algorithm and its iterative version perform competitively on a wide range of simulated datasets and two real data examples. The new Super RaSE algorithm and its iterative version are implemented in a new version of the R package RaSEn.
ARTICLE | doi:10.20944/preprints202308.2105.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: convolutional neural networks; feature selection; transfer learning; feature fusion; gray wolf optimization; deep learning; skin lesion
Online: 31 August 2023 (10:24:52 CEST)
Melanoma is widely recognized as one of the most lethal forms of skin cancer, with its incidence showing an upward trend in recent years. Nonetheless, the timely detection of this malignancy substantially enhances the likelihood of patients’ long-term survival. Several computer-based methods have recently been proposed in the pursuit of diagnosing skin lesions at their early stages. Despite achieving some level of success, there still remains a margin of error that the machine learning community considers to be an unresolved research challenge. This study presents a novel framework for the classification of skin lesions. The framework incorporates deep features to generate a highly discriminant feature vector, while also maintaining the integrity of the original feature space. Recent deep models including Darknet53, DenseNet201, InceptionV3, and InceptionResNetV2 are employed in our study for the purpose of feature extraction. Additionally, transfer learning is leveraged to enhance the performance of our approach. In the subsequent phase, the extracted feature information from the chosen pre-existing models is combined, with the aim of preserving maximum information, prior to undergoing the process of feature selection using a novel entropy-controlled grey wolf optimization (ECGWO) algorithm. The integration of fusion and selection techniques is employed to initially incorporate the feature vector with a high level of information and subsequently eliminate redundant and irrelevant feature information. The efficacy of our design is substantiated through the evaluation on three benchmark dermoscopic datasets, namely PH2, ISIC-MSK, and ISIC-UDA. In order to validate the proposed methodology, a comprehensive evaluation is conducted, including a rigorous comparison with established techniques in the field.
ARTICLE | doi:10.20944/preprints202305.2209.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: breast cancer; Convolutional Neural Network (CNN); computer aided diagnosis (CAD); feature selection; feature classification; mammography images
Online: 31 May 2023 (09:02:30 CEST)
The prompt and accurate diagnosis of breast lesions, including the distinction between cancer, non-cancer, and suspicious cancer, plays a crucial role in the prognosis of breast cancer. In this paper, we introduce a novel method based on feature extraction and reduction for detection of breast cancer in mammography images. First, we extract features from multiple pre-trained convolutional neural network (CNN) models, and then concatenate them. The most informative features are selected based on their mutual information with the target variable. Subsequently, the selected features can be classified using a machine learning algorithm. We evaluate our approach using four different machine learning algorithms, and our results demonstrate that the neural network-based classifier yields an accuracy as high as 92% for the RSNA dataset which is a new dataset that provides two views and additional features such as age. We compare our proposed algorithm with state-of-the-art methods and demonstrate its superiority, particularly in terms of accuracy and sensitivity. For the MIAS dataset, we achieve an accuracy as high as 94.5%, and for the DDSM dataset, an accuracy of 96% is attained. These results highlight the effectiveness of our method in accurately diagnosing breast lesions and surpassing existing approaches.
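The selection step described above, ranking features by their mutual information with the target, can be sketched as follows. The example assumes the features have already been discretized, since CNN activations are continuous and would need binning or a continuous estimator in practice. The helper names `mutual_information` and `select_top_k` and the toy feature names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Mutual information in bits between two discrete sequences."""
    n = len(target)
    if n == 0 or len(feature) != n:
        raise ValueError("sequences must be non-empty and of equal length")
    joint = Counter(zip(feature, target))
    feature_counts = Counter(feature)
    target_counts = Counter(target)
    mi = 0.0
    for (f_value, t_value), count in joint.items():
        p_joint = count / n
        p_independent = (feature_counts[f_value] / n) * (target_counts[t_value] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi


def select_top_k(features, target, k):
    """Names of the k features sharing the most information with the target."""
    scores = {name: mutual_information(col, target) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical binarized features from two backbones and a binary diagnosis.
target = [0, 0, 1, 1]
features = {"backbone_a_f1": [0, 0, 1, 1], "backbone_b_f7": [0, 1, 0, 1]}
print(select_top_k(features, target, k=1))
```

Scoring each feature independently keeps the step cheap even when features from several backbones are concatenated, at the cost of ignoring redundancy between the selected features.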
ARTICLE | doi:10.20944/preprints202007.0688.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: Computer aided diagnosis (CAD), brain magnetic resonance imaging (MRI) scans, feature extraction, feature reduction, classifiers, classification rule.
Online: 29 July 2020 (10:14:43 CEST)
Manual interpretation of these huge volumes of images is susceptible to inter-reader variability and human error. Thus, an accurate automated CAD scheme is highly desirable in clinical pathological diagnosis. In this research, a plethora of machine learning paradigms (e.g. feature extraction, dimensionality reduction and supervised classification methods) were explored, evaluated, compared and analyzed to identify the optimal pathway for the binary classification of brain MR images (normal vs neoplastic). An external validation dataset was used to test the generalizability of the optimal predictive models implemented. Relevant and informative features were selected to construct a cross-validated decision tree, and eventually a simple rule set was built based on the decision tree. The experimental results show that almost all pattern recognition paradigms achieve high accuracy with careful selection of the number of attributes. LDA+ELM with 55 features is the optimal pipeline, which achieves perfect classification when training and test data are from the same source, and achieves (accuracy=97.5%, AUC=0.989, sensitivity=95% and specificity=100%) under a balanced test dataset; (accuracy=99.5%, AUC=0.988, sensitivity=95% and specificity=100%). The cross-validated decision tree model also shows comparable performance: accuracy=98.8%, AUC=99.1%, sensitivity=99.6% and specificity=98.2%. Three highly relevant and robust attributes are visualized and selected for construction of decision tree models, and finally a rule set is read directly off the decision tree. This rule set can potentially serve as a fast and accurate classification algorithm.
Subject: Engineering, Marine Engineering Keywords: Internal wave recognition; automation; CNN; feature extraction
Online: 14 August 2023 (04:38:20 CEST)
The internal wave recognition algorithm in an ocean data buoy system can be used to realize the real-time and flexible observation of internal waves, but there is no accurate automatic recognition method. To meet the need for automatic, real-time, and reliable internal wave recognition, an automatic internal wave recognition algorithm has been proposed for a tightly profiled intelligent buoy system. The sea profile temperature data collected by the Bailong buoy system in the Andaman Sea in 2018 were used to train and test the internal wave recognition neural network model, which consists of two parts: feature extraction and feature classification. The experiment compares the long short-term memory network (LSTM), convolutional neural networks (CNN) with different numbers of layers, and a deep neural network (DNN) without a feature extraction network, and adjusts the number of convolutional kernels and the convolutional stride to improve the feature extraction efficiency. Experiments show that the best results are obtained when a CNN layer is used as the feature extraction network, the convolutional stride is 4, and the number of convolutional kernels is 5. The recall reaches 95.31% and the precision is 97.53%. The internal wave identification delay of the algorithm is 5.0862 minutes, the number of parameters is 1593, and the number of calculations is 3024. The algorithm can be directly deployed to the ocean data buoy system to meet the demand for automatic, real-time and reliable internal wave identification at the buoy end.
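To make the roles of the stride and the number of kernels concrete, here is a minimal sketch of a strided one-dimensional convolution used as a feature extractor. The kernel width of 3, the kernel values and the temperature samples are assumptions chosen for illustration; the abstract does not state them, and this is not the authors' network.

```python
def conv1d(signal, kernel, stride):
    """Valid (no padding) 1-D convolution, computed as cross-correlation."""
    width = len(kernel)
    return [
        sum(signal[start + j] * kernel[j] for j in range(width))
        for start in range(0, len(signal) - width + 1, stride)
    ]

def extract_features(signal, kernels, stride):
    """One strided feature map per kernel."""
    return [conv1d(signal, kernel, stride) for kernel in kernels]

# Hypothetical temperature samples and five hand-written kernels of width 3.
profile = [20.0, 19.5, 19.0, 17.0, 15.0, 14.5, 14.0, 13.8, 13.5, 13.4, 13.3]
kernels = [[1, 0, -1], [0.5, 0, -0.5], [1, -2, 1], [1 / 3, 1 / 3, 1 / 3], [0, 1, 0]]
feature_maps = extract_features(profile, kernels, stride=4)
```

With a stride of 4, each feature map has roughly a quarter as many values as the input, which is one reason a larger stride reduces the amount of computation in the later layers.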
REVIEW | doi:10.20944/preprints202305.0663.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: depression detection; fusion; feature extraction; deep learning
Online: 9 May 2023 (13:20:58 CEST)
This study compares the performance of existing studies on multimodal emotion recognition, and proposes a model that fuses two modalities with the speaker's text and voice signals as input values and detects depression. Based on the DAIC-WOZ dataset, voice features were extracted using CNN, text features were extracted using Transformers, and two modalities were fused through a tensor fusion network. We also build a model to detect whether the speaker is depressed or not using LSTM in the final layer. This study suggests the possibility of increasing access to mental illness diagnosis by enabling patients to detect depression on their own in daily conversations. If the model proposed in this study is developed and the voice conversation system is connected, it will be easier for patients who cannot visit the hospital periodically or who are reluctant to visit the hospital to check their condition and seek recovery. Furthermore, it can be expanded to multi-label classification for various mental diseases and used as a simple self-mental disease diagnosis tool.
ARTICLE | doi:10.20944/preprints202102.0260.v3
Subject: Computer Science And Mathematics, Discrete Mathematics And Combinatorics Keywords: Feature Selection; Discrete Data; Heuristics; Running average
Online: 7 December 2021 (11:28:35 CET)
By applying a running average (with window size d), we can transform discrete data into broad-range, continuous values. When we have more than two columns and one of them contains the classification tags (the Class Column), we can compare and sort the features (the Non-class Columns) based on the R² coefficient of the regression on the running averages. Parameter tuning can help us select the best features (the non-class columns that correlate best with the Class Column). “Window size” and “Ordering” can be tuned to achieve this goal. This optimization problem is hard, and we need an algorithm (or heuristic) to simplify the tuning. We demonstrate a novel heuristic, called Simulated Distillation (SimulaD), which can help us obtain reasonably good results for this optimization problem.
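The scoring idea can be sketched as follows, under the assumption that both the feature column and the class column are smoothed with the same window and that the score is the R² of a simple linear regression between the two smoothed series. The SimulaD tuning of window size and ordering is not shown; the column names and data are hypothetical.

```python
def running_average(values, d):
    """Means of every length-d window (valid positions only)."""
    if not 1 <= d <= len(values):
        raise ValueError("window size d must be between 1 and len(values)")
    window_sum = sum(values[:d])
    out = [window_sum / d]
    for i in range(d, len(values)):
        window_sum += values[i] - values[i - d]   # slide the window by one
        out.append(window_sum / d)
    return out

def r_squared(xs, ys):
    """R^2 of the least-squares line y = a*x + b (squared Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0                                # a constant series explains nothing
    return (sxy * sxy) / (sxx * syy)

def rank_features(features, class_column, d):
    """Sort feature names by R^2 between smoothed feature and smoothed class."""
    class_avg = running_average(class_column, d)
    scores = {name: r_squared(running_average(col, d), class_avg)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

class_column = [0, 0, 1, 1, 0, 1, 1, 1]
features = {"f1": [0, 0, 2, 2, 0, 2, 2, 2], "f2": [1, 0, 1, 0, 1, 0, 1, 0]}
order, scores = rank_features(features, class_column, d=3)
```

Because the result depends on the row ordering and on d, both would be tuned in practice, which is the optimization problem the abstract refers to.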
ARTICLE | doi:10.20944/preprints202111.0024.v1
Subject: Computer Science And Mathematics, Analysis Keywords: Fake news detection; Deep learning; Feature Engineering
Online: 1 November 2021 (15:34:46 CET)
The rapid infiltration of fake news is a flaw in the otherwise valuable internet, a virtually global network that allows for the simultaneous exchange of information. While a common, and normally effective, approach to such classification tasks is designing a deep learning-based model, the subjectivity behind the writing and production of misleading news invalidates this technique. Deep learning models are unexplainable in nature, making the contextualization of results impossible because they lack the explicit features used in traditional machine learning. This paper emphasizes the need for feature engineering to effectively address this problem: containing the spread of fake news at the source, not after it has become globally prevalent. Insights from extracted features were used to manipulate the text, which was then tested on deep learning models. The previously unknown yet substantial impact that the original features had on deep learning models was successfully depicted in this study.
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: feature subset selection; disease classification; subtype detection
Online: 14 June 2021 (10:30:21 CEST)
Biologists seek to identify a small number of significant features that are important, non-redundant, and relevant from diverse omics data. For example, statistical methods like LIMMA and DEseq distinguish differentially expressed genes between a case and control group from the transcript profile. Researchers also apply various column subset selection algorithms on genomics datasets for a similar purpose. Unfortunately, genes selected by such statistical or machine learning methods are often highly co-regulated, making their performance inconsistent. Here, we introduce a novel feature selection algorithm that selects highly disease-related and non-redundant features from a diverse set of omics datasets. We successfully applied this algorithm to three different biological problems: a) disease to normal sample classification, b) multiclass classification of different disease samples, and c) disease subtypes detection. Considering classification ROC-AUC, False-positive, and False-negative rates, our algorithm outperformed other gene selection and differential expression (DE) methods for all six types of cancer datasets from TCGA considered here for binary and multiclass classification problems. Moreover, genes picked by our algorithm improved the disease subtyping accuracy for four different cancer types over the state-of-the-art methods. Hence, we posit that our proposed feature reduction method can support the community to solve various problems, including the selection of disease-specific biomarkers, precision medicine design, and disease sub-type detection.
ARTICLE | doi:10.20944/preprints202011.0412.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Process; ontological category; life concept; essential feature
Online: 16 November 2020 (10:49:11 CET)
Although knowledge about biological systems has advanced exponentially in recent decades, it is surprising to realize that the very definition of Life keeps presenting theoretical challenges. Even if several lines of reasoning seek to identify the essence of the life phenomenon, most of these thoughts contain fundamental problems in their basic conceptual structure. Most concepts fail to identify necessary and sufficient features to define life. Here, we analyzed the main conceptual frameworks regarding theoretical aspects supporting life concepts, such as (i) the physical, (ii) the cellular and (iii) the molecular approaches. Based on ontological analysis, we propose that Life should not be positioned under the ontological category of Matter. Rather, life should be better understood under the top-level ontology of “Process”. Exercising an epistemological approach, we propose that the essential characteristic pervading each and every living being is the presence of organic codes. Therefore, we explore theories in biosemiotics in order to propose a clear concept of life as a macrocode composed of multiple inter-related coding layers. Consequently, we suggest a clear distinction between the concept of life and living beings, a distinction that is not evident in theoretical terms. From the proposed concept, we suggest that the evolutionary process is a fundamental characteristic for life’s maintenance but not for its definition. The current proposition opens a fertile field of debate in astrobiology, biosemiotics and robotics.
ARTICLE | doi:10.20944/preprints202009.0521.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: electroencephalographic; feature selection; machine learning; prediction model
Online: 22 September 2020 (11:27:03 CEST)
In recent years, research has focused on developing mechanisms to assess subjects' cognitive workload when performing activities that demand high concentration, such as driving a vehicle. These mechanisms have implemented several tools to analyze cognitive workload, of which electroencephalographic (EEG) signals are the most used due to their high precision. However, one of the main challenges in using EEG signals is finding the appropriate information to identify cognitive states. Here we show a new feature selection model for pattern recognition using information from EEG signals based on machine learning techniques, called GALoRIS. GALoRIS combines Genetic Algorithms and Logistic Regression to create a new fitness function that identifies and selects the critical EEG features that contribute to recognizing high and low cognitive workload, and structures a new dataset capable of optimizing the model's predictive process. We found that GALoRIS identifies data related to high and low cognitive workload of subjects while driving a vehicle using information extracted from multiple EEG signals, reducing the original dataset by more than 50% and maximizing the model's predictive capacity, achieving a precision rate greater than 90%.
ARTICLE | doi:10.20944/preprints202001.0318.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: efficient binary symbiotic; feature selection; classification; optimization
Online: 26 January 2020 (08:30:17 CET)
Feature selection is one of the main data preprocessing steps in machine learning. Its goal is to reduce the number of features by removing redundant and noisy features. Feature selection methods must consider the accuracy of classification algorithms while performing feature reduction on a dataset. Meta-heuristic algorithms are the most successful and promising methods for solving this problem. The symbiotic organisms search algorithm is one of the successful meta-heuristic algorithms; it is inspired by three kinds of interaction between organisms in nature: parasitism, commensalism and mutualism. In this paper, three binary methods based on the symbiotic organisms search algorithm are presented for solving the feature selection problem. In the first and second methods, several S-shaped and V-shaped transfer functions, respectively, are used for binarizing the symbiotic organisms search algorithm. These methods are called BSOSS and BSOSV. The third method, called EBSOS, is an advanced binary version of the symbiotic organisms search algorithm that introduces two new binarization operators, BMP and BCP. The proposed methods are run on 18 standard UCI datasets and compared to base and important meta-heuristic algorithms. The test results show that the EBSOS method has the best performance among the three proposed approaches for binarization of the symbiotic organisms search algorithm. Finally, the proposed EBSOS approach was compared to other meta-heuristic methods including the genetic algorithm, binary bat algorithm, binary particle swarm algorithm, binary flower pollination algorithm, binary grey wolf algorithm, binary dragonfly algorithm, and binary chaotic crow search algorithm. The results of different experiments showed that the proposed EBSOS approach performs better than the other methods in terms of feature count and accuracy. Furthermore, the proposed EBSOS approach was evaluated in practice on spam email detection, and the results of this experiment also verified its performance. In addition, the proposed EBSOS approach was combined with classifiers including SVM, KNN, NB and MLP to evaluate its performance in the detection of spam emails. The obtained results showed that the proposed EBSOS approach significantly improved the accuracy and speed of all the classifiers in spam email detection.
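As an illustration of what S-shaped and V-shaped transfer functions do, the following sketch uses one common member of each family (the logistic sigmoid and |tanh|). The abstract mentions several functions of each shape without listing them, so these two choices and the update rules below are assumptions based on common practice in binary meta-heuristics, not the paper's exact definitions.

```python
import math
import random

def s_shaped(x):
    """Logistic sigmoid: maps a real value to a probability in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)                      # avoids overflow for very negative x
    return e / (1.0 + e)

def v_shaped(x):
    """|tanh(x)|: maps a real value to a flip probability in [0, 1)."""
    return abs(math.tanh(x))

def binarize_s(position, rng):
    """S-shaped rule: each bit becomes 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]

def binarize_v(position, current_bits, rng):
    """V-shaped rule: each current bit is flipped with probability v_shaped(x)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]

rng = random.Random(0)                   # seeded for reproducibility
continuous = [2.5, -1.0, 0.0, 4.0]       # hypothetical continuous organism
mask = binarize_s(continuous, rng)       # 1 = feature selected, 0 = dropped
```

The resulting bit vector acts as a feature mask: a candidate solution selects exactly the features whose bit is 1.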
ARTICLE | doi:10.20944/preprints201908.0011.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: rain cell; tracking; PIV; feature-based verification
Online: 1 August 2019 (10:16:12 CEST)
This study proposes a new algorithm termed rain cell identification and tracking (RCIT) to identify and track rain cells from high resolution weather radar data. Previous algorithms have limitations when tracking non-consequent rain cells owing to their use of maximum correlation coefficient methods and their lack of an alternative way to handle the variation stages of rain cells during their life cycles. To address these deficiencies, various methods are implemented in the new algorithm. These include the particle image velocimetry (PIV) method for motion estimation and the rain cell matching rule to obtain the stage changes of rain cells. High resolution (5-min and 1-km) radar reflectivity data from three rainy days over the German federal state of North Rhine Westphalia (NRW) are used to evaluate the proposed algorithm. The performance of the new algorithm is compared with a radar reflectivity map and verified by two object-oriented methods: structure–amplitude–location (SAL) and geometric index. The verification results suggest that the performance of the new algorithm is good. Application of the RCIT algorithm to the selected cases shows that the inner structure of rainfall events in the experimental region presents extreme value distributions, with most rainfall events having a short duration and low intensity. The new algorithm can effectively capture the stage changes of rain cells during their life cycles. The proposed algorithm can serve as the basis for further hydro-meteorological applications such as spatial and temporal analysis of rainfall events and short-term flood forecasting.
ARTICLE | doi:10.20944/preprints201704.0174.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Hierarchical search; Image retrieval; Multi-feature fusion
Online: 26 April 2017 (18:51:42 CEST)
To address the poor generalization performance, low retrieval accuracy and high time consumption of existing content-based image retrieval systems, this paper proposes a hierarchical image retrieval method based on multi-feature fusion. The retrieval accuracies on Corel5K, UKbench and Holidays are 68.23 (Top 1), 3.73 (N-S) and 88.20 (mAP), respectively. The experimental results show that the proposed method effectively overcomes the deficiencies of single-feature retrieval and saves significant time at the cost of a small loss of accuracy.
ARTICLE | doi:10.20944/preprints202003.0284.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: deep learning; composite hybrid feature selection; machine learning; stack hybrid classification; CT-image; MPEG7 edge histogram feature extraction; CNN
Online: 18 March 2020 (08:32:46 CET)
The paper demonstrates the analysis of Corona Virus Disease based on a probabilistic model. It involves a technique for classification and prediction that recognizes typical and diagnostically most important CT image features relating to Corona Virus. The main contributions of the research include predicting the probability of recurrence in no-recurrence (first-time detection) cases when applying our proposed approach for feature extraction. A combination of conventional statistical and machine learning tools is applied for feature extraction from CT images through four image filters in combination with the proposed composite hybrid feature selection (CHFS). The selected features were classified by the stack hybrid classification system (SHC). An experimental study with real data demonstrates the feasibility and potential of the proposed approach for this purpose.
ARTICLE | doi:10.20944/preprints201703.0206.v1
Subject: Engineering, Control And Systems Engineering Keywords: signal processing; feature selection; feature fusion; data fusion; gender recognition; sensor fusion; heart rate variability (HRV), electromyography (EMG); stepper
Online: 28 March 2017 (02:38:01 CEST)
Gender recognition is trivial for a physiotherapist, but it is considered a challenge for computers. Electromyography (EMG) and heart rate variability (HRV) were utilized in this work for gender recognition during a stepping exercise using a stepper. The relevant features were extracted and selected. The selected features were then fused to automatically predict gender. However, selecting features for gender classification so as to ensure better accuracy is a challenge. Thus, in this paper, a feature selection approach based on both the performance and the diversity between two features, derived from the rank-score characteristic (RSC) function in a combinatorial fusion approach (CFA), was employed. Then, the features from the selected feature sets were fused using a CFA. The results were compared with other fusion techniques such as naive Bayes (NB), decision tree (J48), k-nearest neighbor (KNN) and support vector machine (SMO). In addition, the results were compared with previous research on gender recognition. The experimental results showed that the CFA was efficient for feature selection. The fusion method was also able to improve the accuracy of gender recognition. The CFA provides much better gender classification accuracy, 94.51%, compared with Nazarloo's work (90.34%) and other classifiers.
ARTICLE | doi:10.20944/preprints202308.1933.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Pancreatic cancer; NIH PLCO dataset; feature selection; classification
Online: 29 August 2023 (10:11:39 CEST)
Background: Pancreatic cancer (PC) is a disease with poor prognosis and survival rate. There is a pertinent need to identify the risk factors of this disease. The purpose of this study is to identify a subset of factors (a.k.a. features) as predictors of PC from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer dataset consisting of responses to 65 questions about demographics, cancer and health history, medication usage, and smoking habits from 154,897 participants. Method: There are two challenges to selecting the subset of features that predict PC with highest probability: the problem is computationally intractable, and the PLCO dataset is highly imbalanced. We use an innovative method to use the dataset in a balanced way, without involving up- or down-sampling. We use nine feature selection methods to select the optimal subset of features from the preprocessed and balanced dataset. Results: Our preprocessed dataset consists of 32 risk factors (8 demographics, 5 cancer history, 13 health history, 2 medication usage, 4 smoking habits). Risk factors belonging to cancer and health history, followed by smoking habits, were consistently chosen by the feature selection methods. We also discuss findings in the medical sciences literature that corroborate our findings. Conclusions: The study found that risk factors belonging to cancer and health history are the most prominent ones for PC. In particular, having previously been diagnosed with PC is chosen as the most prominent risk factor by the majority of methods. While most of our findings are consistent with the literature, some of them shed light on novel factors that may not have received their due attention from the research community.
ARTICLE | doi:10.20944/preprints202307.1609.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: surface scanner; photogrammetry; close-range photogrammetry; feature tracks
Online: 25 July 2023 (03:05:48 CEST)
A close-range photogrammetric approach and its implementation using a CNC device and a macro camera are proposed, together with a tailored image acquisition approach implemented using this device. To increase reconstruction robustness and accuracy, the key point features detected in the acquired images are tracked across multiple views from multiple viewpoints at multiple distances. This approach reduces spurious correspondences and, as a result, the estimation accuracy of calibration parameters is increased and reconstruction errors are reduced. Qualitative and quantitative evaluations demonstrate the efficacy and accuracy of the proposed approach, which exhibits micrometre resolution and low implementation cost.
ARTICLE | doi:10.20944/preprints202307.0581.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: feature selection; taguchi-method; weighted average; classification; ensemble
Online: 10 July 2023 (09:39:48 CEST)
Feature selection is a crucial step in machine learning, aiming to identify the most relevant features in high-dimensional data, in order to reduce the computational complexity of model development and improve its generalization performance. Ensemble feature ranking methods combine the results of several feature selection techniques to identify a subset of the most relevant features for a given task. In many cases, they produce a more comprehensive ranking of features than the individual methods used in them. This paper presents a novel approach to ensemble feature ranking, which uses a weighted average of the individual ranking scores calculated by the individual methods. The optimal weights are determined using a Taguchi-type design of experiments. The proposed methodology significantly improves classification performance on the CSE-CIC-IDS2018 dataset, particularly for attack types where traditional average-based feature ranking score combinations resulted in low classification metrics.
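A weighted-average combination of ranking scores can be sketched as below. The weights are simply given here; finding them with a Taguchi-type design of experiments, as the abstract describes, is not shown. The min-max normalisation, the method names and the scores are assumptions for illustration.

```python
def normalise(scores):
    """Min-max scale one method's scores to [0, 1] so methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {feature: 0.0 for feature in scores}
    return {feature: (s - lo) / (hi - lo) for feature, s in scores.items()}

def ensemble_rank(method_scores, weights):
    """Rank features by the weighted average of their normalised scores."""
    total = sum(weights[m] for m in method_scores)
    scaled = {m: normalise(s) for m, s in method_scores.items()}
    features = next(iter(method_scores.values())).keys()
    combined = {
        f: sum(weights[m] * scaled[m][f] for m in method_scores) / total
        for f in features
    }
    return sorted(combined, key=combined.get, reverse=True), combined

# Hypothetical scores from two individual ranking methods.
method_scores = {
    "method_a": {"flow_duration": 10.0, "pkt_len_mean": 5.0, "dst_port": 0.0},
    "method_b": {"flow_duration": 0.1, "pkt_len_mean": 0.9, "dst_port": 0.5},
}
plain_order, _ = ensemble_rank(method_scores, {"method_a": 1, "method_b": 1})
weighted_order, _ = ensemble_rank(method_scores, {"method_a": 3, "method_b": 1})
```

With equal weights this reduces to the plain average; changing the weights can reorder the features, which is the degree of freedom the experimental design would tune.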
ARTICLE | doi:10.20944/preprints202305.1277.v1
Subject: Engineering, Architecture, Building And Construction Keywords: Occupant behavior; Machine learning; Feature selection; Parameter tuning
Online: 18 May 2023 (05:27:29 CEST)
In this study, machine learning was used to predict and analyze the behavior of occupants in Gifu City residences during winter. Global warming is currently progressing worldwide, and it is important to control greenhouse gas emissions from the perspective of adaptation and mitigation. Occupant behavior is highly individualized and must be analyzed to accurately determine a building's energy consumption. The accuracy of heating behavior prediction was studied using three different methods: logistic regression, support vector machine (SVM), and deep neural network (DNN). The generalization ability of the SVM and the DNN was improved by parameter tuning. Parameter tuning of the SVM showed that the values of C and gamma affected the prediction accuracy. The prediction accuracy improved by approximately 11.9%, confirming the effectiveness of parameter tuning for the SVM. Parameter tuning of the DNN showed that the numbers of layers and neurons affected the prediction accuracy. Although parameter tuning also improved the prediction accuracy of the DNN, the rate of increase was lower than that of the SVM.
ARTICLE | doi:10.20944/preprints202304.0124.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: YOLOv8; small size targets; target detection; feature fusion
Online: 7 April 2023 (11:35:23 CEST)
Traditional camera sensors rely on human eyes for observation. However, the human eye is prone to fatigue when observing targets of different sizes for a long time in complex scenes, and human cognition is limited, which often leads to judgment errors and greatly reduces efficiency. Target recognition technology is an important technology for judging the target category in camera sensors. To solve this problem, this paper proposes a small-size target detection algorithm for special scenarios. Its advantage is that it not only has higher precision for small-size target detection, but also ensures that the detection accuracy for each size is not lower than that of existing algorithms. In this paper, a new down-sampling method is proposed, which better preserves the context feature information. The feature fusion network is improved to effectively combine shallow information and deep information. A new network structure is proposed to effectively improve the detection accuracy of the model. In terms of accuracy, it is better than YOLOX, YOLOXR, YOLOv3, scaled YOLOv5, YOLOv7-Tiny and YOLOv8. Three authoritative public data sets were used in this experiment: a) On the VisDrone data set (small-size targets), DC-YOLOv8 is 2.5% more accurate than YOLOv8. b) On the TinyPerson data set (minimal-size targets), DC-YOLOv8 is 1% more accurate than YOLOv8. c) On the PASCAL VOC2007 data set (normal-size targets), DC-YOLOv8 is 0.5% more accurate than YOLOv8.
ARTICLE | doi:10.20944/preprints202204.0254.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: multi-target tracking; DeepSORT; feature extraction; target detection
Online: 27 April 2022 (09:01:45 CEST)
Pedestrian multi-target tracking technology plays an important role in artificial intelligence, driverless vehicles, virtual reality and other fields. DeepSORT, a detection-based pedestrian multi-target tracking algorithm, is widely used in industry. It tracks multiple pedestrian targets continuously and keeps their IDs unchanged. To improve the applicability and tracking accuracy of the DeepSORT algorithm, this paper improves the IOU distance measurement in the matching process. In addition, ResNet50 is used as the feature extraction backbone network and combined with FPN (Feature Pyramid Network) to fuse multi-layer pedestrian appearance features, improving the tracking accuracy of the DeepSORT algorithm. The proposed algorithm is verified on the public data set MOT-16, and its tracking accuracy is improved by 4.1%.
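For reference, the standard IOU distance used when matching tracks to detections can be sketched as follows. This is the conventional baseline measure (one minus intersection over union), not the improved measure proposed in the paper, whose details the abstract does not give. Boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def iou_distance_matrix(track_boxes, detection_boxes):
    """Cost matrix with entries 1 - IOU; rows are tracks, columns are detections."""
    return [[1.0 - iou(t, d) for d in detection_boxes] for t in track_boxes]

tracks = [(0, 0, 2, 2), (10, 10, 12, 12)]        # hypothetical predicted boxes
detections = [(1, 1, 3, 3), (10, 10, 12, 12)]    # hypothetical detector output
cost = iou_distance_matrix(tracks, detections)
```

A tracker would feed such a cost matrix to an assignment step, typically rejecting pairs whose distance exceeds a threshold.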
ARTICLE | doi:10.20944/preprints202109.0236.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Spam Detection; Feature extraction; N-grams; Machine Learning
Online: 14 September 2021 (11:36:36 CEST)
Recently, spam emails have become a significant problem with the expanding usage of the Internet. It is, to some extent, an obvious step to filter emails. A spam filter is a system that detects undesired and malicious emails and blocks them from getting into the users' inboxes. Spam filters check emails for something "suspicious" in terms of text, email address, header, attachments, and language. In our proposed approach, we use different features such as word2vec, word n-grams, character n-grams, and a combination of variable-length n-grams for comparative analysis. Different machine learning models such as support vector machine (SVM), decision tree (DT), logistic regression (LR), and multinomial naïve Bayes (MNB) are trained on the extracted features. We use different evaluation metrics such as precision, recall, f1-score, and accuracy to evaluate the experimental results. Among them, SVM provides 97.6% accuracy, 98.8% precision, and a 94.9% f1-score using a combination of n-gram features.
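The n-gram features mentioned above can be sketched in a few lines. The lower-casing, whitespace tokenisation and the example message are assumptions for illustration; the abstract does not describe the preprocessing actually used.

```python
from collections import Counter

def word_ngrams(text, n):
    """Contiguous sequences of n lower-cased, whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n):
    """Contiguous sequences of n characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def variable_length_char_ngrams(text, n_min, n_max):
    """Counts of all character n-grams with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(char_ngrams(text, n))
    return counts

message = "Win a free prize"                     # hypothetical email text
bigrams = word_ngrams(message, 2)
features = variable_length_char_ngrams("spam", 2, 3)
```

The resulting counts would be turned into a fixed-length vector (for example by indexing a vocabulary) before being passed to a classifier such as an SVM.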
ARTICLE | doi:10.20944/preprints202103.0447.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: COVID-19; ICU; feature selection; classification; ARIMA model
Online: 17 March 2021 (14:56:27 CET)
Since December 2019, the world has been fighting coronavirus disease (COVID-19). This disease is caused by a novel coronavirus termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This work focuses on the applications of machine learning algorithms in the context of COVID-19. Firstly, regression analysis is performed to model the number of confirmed cases and deaths. Our experiments show that the autoregressive integrated moving average (ARIMA) model can reliably model the increase in the number of confirmed cases and can predict future cases. Secondly, a number of classifiers are used to predict whether a COVID-19 patient needs to be admitted to an intensive care unit (ICU) or semi-ICU. For this, classification algorithms are applied to a dataset of 5644 samples. Using this dataset, the most significant attributes are selected by feature selection with an ExtraTrees classifier, and Proteina C reativa (mg/dL) is found to be the highest-ranked feature. In our experiments, random forest, logistic regression, support vector machine, XGBoost, stacking and voting classifiers are applied to the top 10 selected attributes of the dataset. Results show that the random forest and hard voting classifiers achieve the highest classification accuracy, near 98%, and the highest recall, 98%, in predicting the need for admission to ICU/semi-ICU units.
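Hard voting, one of the ensemble schemes mentioned above, simply takes the majority label across the base classifiers for each sample. The sketch below shows only that combination step; the per-model predictions are hypothetical values, not results from the study.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Majority label per sample across the models' predicted labels."""
    votes_per_sample = zip(*predictions_per_model)
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]

# Hypothetical predictions (1 = needs ICU/semi-ICU, 0 = does not) for 4 patients.
random_forest = [1, 0, 1, 1]
logistic_regression = [1, 0, 0, 1]
support_vector_machine = [0, 0, 1, 1]
ensemble = hard_vote([random_forest, logistic_regression, support_vector_machine])
```

Using an odd number of base classifiers avoids ties in the binary case.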
ARTICLE | doi:10.20944/preprints202011.0541.v1
Subject: Engineering, Automotive Engineering Keywords: QC denoise automation; feature transformation techniques; classification methods
Online: 20 November 2020 (12:12:03 CET)
Seismic imaging is the main technology used for subsurface hydrocarbon prospection. It provides an image of the subsurface using the same principles as ultrasound medical imaging. As for any data acquired through hydrophones (pressure sensors) and/or geophones (velocity/acceleration sensors), the raw seismic data are heavily contaminated with noise and unwanted reflections that need to be removed before further processing. Therefore, the noise attenuation is done at an early stage and often while acquiring the data. Quality control (QC) is mandatory to give confidence in the denoising process and to ensure that a costly data re-acquisition is not needed. QC is done manually by humans and comprises a major portion of the cost of a typical seismic processing project. It is therefore advantageous to automate this process to improve cost and efficiency. Here, we propose a supervised learning approach to build an automatic QC system. The QC system is an attribute-based classifier that is trained to classify three types of filtering (mild = under filtering, noise remaining in the data; optimal = good filtering; harsh = over filtering, the signal is distorted). The attributes are computed from the data and represent geophysical and statistical measures of the quality of the filtering. The system is tested on a full-scale survey (9000 km2) to QC the results of the swell noise attenuation process in marine seismic data.
ARTICLE | doi:10.20944/preprints202010.0048.v1
Subject: Engineering, Automotive Engineering Keywords: QC denoise automation; feature transformation techniques; classification methods
Online: 2 October 2020 (15:32:31 CEST)
Seismic imaging is the main technology used for subsurface hydrocarbon prospection. It provides an image of the subsurface using the same principles as ultrasound medical imaging. It is based on emitting a sound (pressure) wave through the subsurface and recording the reflected echoes using hydrophones (pressure sensors) and/or geophones (velocity/acceleration sensors). Contrary to medical imaging, which is done in real time, subsurface seismic imaging is an offline process that involves a huge volume of data and needs considerable computing power. The raw seismic data are heavily contaminated with noise and unwanted reflections that need to be removed before further processing. Therefore, the noise attenuation is done at an early stage and often while acquiring the data. Quality control (QC) is mandatory to give confidence in the denoising process and to ensure that a costly data re-acquisition is not needed. QC is done manually by humans and comprises a major portion of the cost of a typical seismic processing project. It is therefore advantageous to automate this process to improve cost and efficiency. Here, we propose a supervised learning approach to build an automatic QC system. The QC system is an attribute-based classifier that is trained to classify three types of filtering (mild = underfiltering, noise remaining in the data; optimal = good filtering; harsh = overfiltering, the signal is distorted). The attributes are computed from the data and represent geophysical and statistical measures of the quality of the filtering. The system is tested on a full-scale survey (9000 km2) to QC the results of the swell noise attenuation process in marine seismic data. The results are encouraging and helped identify localized issues that were difficult for a human to spot.
ARTICLE | doi:10.20944/preprints202006.0048.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Pattern Recognition; Feature extraction; SVM; HOG; Zonal density
Online: 5 June 2020 (14:03:45 CEST)
Significant progress has been made in pattern recognition technology. However, one obstacle that has not yet been overcome is the recognition of words in the Brahmi script, specifically the recognition of characters, compound characters, and words, because of their complex structure. For this kind of complex pattern recognition problem, it is always difficult to decide which feature extraction method and classifier would be the best choice. Moreover, different feature extraction methods and classifiers offer complementary information about the patterns to be classified. Therefore, combining feature extraction methods and classifiers in an intelligent way can be beneficial compared to using any single feature extraction method. With these facts in mind, this study proposes combining HOG and zonal density features with an SVM to recognize Brahmi words: information provided by structural and statistical features is combined using an SVM classifier for word-level script recognition from Brahmi word images. The Brahmi word dataset contains 6,475 training and 536 testing images of Brahmi words in 170 classes, and the database is made freely available. The word samples from this database are classified based on the confidence scores provided by the support vector machine (SVM) classifier, while HOG and zonal density are used to extract the features of Brahmi words. The maximum accuracy achieved by the system is 95.17%, which is better than that reported in previous studies.
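As a rough illustration of the zonal density feature named in this abstract, the sketch below splits a binarized word image into a grid of zones and returns the fraction of ink pixels in each. The grid size, the 0/1 pixel encoding, and the function name `zonal_density` are assumptions; the abstract does not give the authors' exact zoning scheme.

```python
def zonal_density(image, zones_per_side=2):
    """Compute the ink density of each zone in a binarized image.

    image: list of equal-length rows holding 0 (background) or 1 (ink).
    Returns zones_per_side ** 2 densities in row-major zone order.
    """
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones_per_side):
        r0 = zr * height // zones_per_side
        r1 = (zr + 1) * height // zones_per_side
        for zc in range(zones_per_side):
            c0 = zc * width // zones_per_side
            c1 = (zc + 1) * width // zones_per_side
            area = (r1 - r0) * (c1 - c0)
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            # Guard against empty zones when the image is smaller than the grid.
            features.append(ink / area if area else 0.0)
    return features


# Toy 4x4 "word image" split into a 2x2 grid of zones.
toy_image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
toy_features = zonal_density(toy_image)
```

In a pipeline like the one described, such a density vector would be concatenated with the HOG descriptor before being passed to the SVM.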
ARTICLE | doi:10.20944/preprints201901.0068.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: disease classification; read mapping; feature selection; machine learning
Online: 8 January 2019 (11:46:34 CET)
Disease classification based on biological data is an important area in bioinformatics and biomedical research. It helps doctors and medical practitioners with the early detection of disease and supports them as a computer-aided diagnostic tool for accurate diagnosis, prognosis, and treatment of disease. Previously, microarray gene expression data were widely applied to the classification of disease, but next-generation sequencing (NGS) has now replaced microarray technology. Over the last few years, RNA sequencing (RNA-Seq) data have been widely used for transcriptomic analysis. However, RNA-Seq based classification of disease is still in its infancy. In this article, we present a general framework for the classification of disease based on RNA-Seq data. This framework will guide researchers in processing RNA-Seq data, extracting relevant features, and applying an appropriate classifier to classify any kind of disease.
COMMUNICATION | doi:10.20944/preprints201803.0054.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: data feature selection; data clustering; travel time prediction
Online: 7 March 2018 (13:30:06 CET)
In recent years, governments have applied intelligent transportation system (ITS) techniques to provide convenience services (e.g., garbage truck apps) for residents. This study proposes a garbage truck fleet management system (GTFMS) together with data feature selection and data clustering methods for travel time prediction. The GTFMS includes mobile devices (MDs), on-board units, a fleet management server, and a data analysis server (DAS). When a user uses an MD to request the arrival time of a garbage truck, the DAS performs the data feature selection and data clustering methods to analyse the travel time of the garbage truck. The proposed methods cluster the travel time records and reduce their variation to improve travel time prediction. After the travel time and arrival time are predicted, the predicted information is sent to the user’s MD. In the experimental environment, the accuracies of the previous method and the proposed method were 16.73% and 85.97%, respectively. Therefore, the proposed data feature selection and data clustering methods can be used to predict the stop-to-stop travel time of garbage trucks.
ARTICLE | doi:10.20944/preprints201610.0075.v1
Subject: Computer Science And Mathematics, Signal Processing Keywords: BCI; recognition; feature extraction; ACCLN network; RBF network
Online: 19 October 2016 (10:09:19 CEST)
The electroencephalogram (EEG) is a record of brain activity. Brain-computer interface (BCI) technology based on the EEG signal has become a research hotspot. How to extract feature signals from the EEG is the most basic research question in BCI technology. In this paper, a new method for recognizing the fatigued, conscious, and concentrated states of the human brain is proposed, combining the discrete wavelet transform and neural networks based on the EEG signal. First, because the raw EEG signal contains a large amount of high-frequency noise, the raw signal is preprocessed by wavelet denoising, in which it is decomposed into multi-layer high-frequency and low-frequency signals. The δ, θ, α, and β waves are thus obtained by the wavelet transform. Then, the frequency band energy of each wave is regarded as the feature signal of the EEG. In the experiment, the feature signal is classified by a radial basis function (RBF) network and an annealed chaotic competitive learning network (ACCLN). The RBF and ACCLN networks are trained with 500 sets of sample data and tested with 100 sets of samples in different mental states. The experimental results show that the average accuracy of the RBF network under the three conditions is 88.75%, 88.25%, and 88.5%, respectively, and the correct rate of the ACCLN network is 97%, 98%, and 98%, respectively.
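The idea of using per-band energy from a multi-level wavelet decomposition as a feature vector can be sketched as follows. The Haar wavelet, the function names, and the toy signal are assumptions chosen for brevity; the abstract does not state which wavelet or how many levels were used, and mapping decomposition levels to the δ, θ, α, and β bands depends on the sampling rate, which is also not given.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def band_energies(signal, levels):
    """Energy of the detail coefficients at each level, then of the final approximation.

    Assumption: len(signal) is divisible by 2 ** levels.
    """
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


# Toy 8-sample signal decomposed over 3 levels gives a 4-element feature vector.
toy_energy_features = band_energies([4, 2, 5, 5, 1, 3, 0, 2], levels=3)
```

Because the Haar transform used here is orthonormal, the band energies sum to the energy of the original signal, which is a convenient sanity check on any implementation.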
ARTICLE | doi:10.20944/preprints202111.0202.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: solar energy; solar radiation prediction; hybrid machine learning; feature selection; feature extraction; classification algorithms; regression analysis; weather research and forecasting (WRF)
Online: 10 November 2021 (10:48:15 CET)
Solar radiation prediction is an important process in ensuring optimal exploitation of solar energy power. Numerous models have been applied to this problem, such as numerical weather prediction models and artificial intelligence models. However, well-designed hybridization approaches that combine numerical models with artificial intelligence models to yield a more powerful model can provide a significant improvement in prediction accuracy. In this paper, we propose novel hybrid machine learning approaches that exploit auxiliary numerical data. The proposed hybrid methods invoke different machine learning paradigms, including feature selection, classification, and regression. Additionally, numerical weather prediction (NWP) models are used in the proposed hybrid models. Feature selection is used for feature space dimension reduction to reduce the large number of recorded parameters that affect estimation and prediction processes. The rough set theory is applied for attribute reduction and the dependency degree is used as a fitness function. We investigate the effect of the attribute reduction process with thirty different classification and prediction models in addition to the proposed hybrid model. Then, different machine learning models are constructed based on classification and regression techniques to predict solar radiation. Moreover, other hybrid prediction models are formulated to use the output of the numerical model of Weather Research and Forecasting (WRF) as learning elements in order to improve the prediction accuracy. The proposed methodologies are evaluated using a data set that is collected from different regions in Saudi Arabia.
ARTICLE | doi:10.20944/preprints202308.1334.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: PM2.5 concentration; feature selection; clustering algorithm; Adaboost integration model
Online: 18 August 2023 (09:49:34 CEST)
Determining accurate PM2.5 pollution concentrations and understanding their dynamic patterns is crucial for scientifically informed air pollution control strategies. Traditional reliance on linear correlation coefficients for ascertaining PM2.5 related factors only uncovers superficial relationships. Moreover, the invariance of conventional prediction models restricts their accuracy. To enhance the precision of PM2.5 concentration prediction, this study introduces a novel integrated model that leverages feature selection and a clustering algorithm. Comprising three components - feature selection, clustering, and integrated prediction, the model first employs the non-dominated sorting Genetic Algorithm (NSGA-III) to identify the most impactful features affecting PM2.5 concentration within air pollutants and meteorological factors. This step offers more valuable feature data for subsequent modules. The model then adopts a two-layer clustering method (SOM+K-means) to analyze the multifaceted irregularity within the dataset. Finally, the model establishes the Extreme Learning Machine (ELM) weak learner for each classification, integrating multiple weak learners using the Adaboost algorithm to obtain a comprehensive prediction model. Through feature correlation enhancement, data irregularity exploration, and model adaptability improvement, the proposed model significantly enhances the overall prediction performance. Data sourced from 12 Beijing-based monitoring sites in 2016 were utilized for an empirical study, and the model's results compared with five other predictive models. The outcomes demonstrate that the proposed model significantly heightens prediction accuracy, offering useful insights and potential for broadened application to multifactor correlation concentration prediction methodologies for other pollutants.
ARTICLE | doi:10.20944/preprints202305.0489.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Covid-19; KNN; SVM; Fractional Fourier transform; Feature Extraction
Online: 8 May 2023 (09:12:58 CEST)
Covid-19 is a lung disease caused by a virus of the Coronavirus family. Due to its extraordinary prevalence and death rates, it has spread quickly to every country in the world. Thus, achieving peaks and outlines and curing different types of relapses is extremely important. Given the worldwide prevalence of Coronavirus and the participation of physicians in all countries, information has been gathered regarding the properties of the virus, its diverse types, and the means of analyzing it. Numerous approaches have been used to identify this evolving virus. Examining the patient's lungs and chest through a CT scan is generally considered the most accurate and acceptable method. As part of the feature extraction process, the fractional Fourier transform (FrFT), one of the time-frequency domain transformations, has been applied. The proposed method was applied to a database consisting of 2481 CT images. Following the transformation of all images into equal sizes and the removal of non-lung areas, multiple combination windows are used to reduce the number of features extracted from the images. In this paper, KNN and SVM classification achieved accuracy values of 99.84% and 99.90%, respectively.
ARTICLE | doi:10.20944/preprints202303.0505.v1
Subject: Social Sciences, Psychology Keywords: emotional prosody; multi-feature oddball; mismatch negativity (MMN); P3a
Online: 29 March 2023 (10:55:44 CEST)
Purpose: Emotional voice conveys important social cues that demand listeners’ attention and timely processing. This event-related potential study investigated the feasibility of a multi-feature oddball paradigm to examine adult listeners’ neural responses to detecting emotional prosody changes in non-repeating naturally spoken words. Method: Thirty-three adult listeners completed the experiment by passively listening to the words in neutral and three alternating emotions while watching a silent movie. Previous research documented pre-attentive change-detection electrophysiological responses (e.g., MMN, P3a) to emotions carried by fixed syllables or words. Given that the MMN and P3a have also been shown to reflect extraction of abstract regularities over repetitive acoustic patterns, the current study employed a multi-feature oddball paradigm to compare listeners’ MMN and P3a to emotional prosody change from neutral to angry, happy, and sad emotions delivered with hundreds of non-repeating words in a single recording session. Results: Both MMN and P3a were successfully elicited by the emotional prosodic change over the varying linguistic context. Angry prosody elicited the strongest MMN compared to happy and sad prosodies. Happy prosody elicited the strongest P3a in the centro-frontal electrodes, and angry prosody elicited the smallest P3a. Conclusions: The results demonstrated that listeners were able to extract the acoustic patterns for each emotional prosody category over constantly changing spoken words. The findings confirm the feasibility of the multi-feature oddball paradigm in investigating emotional speech processing beyond simple acoustic change detection, which may potentially be applied to pediatric and clinical populations.
ARTICLE | doi:10.20944/preprints202203.0399.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: microbiome; genetic algorithm; feature selection; human health; machine learning
Online: 31 March 2022 (08:00:03 CEST)
The relationship between the host and the microbiome, or the assemblage of microorganisms (including bacteria, archaea, fungi, and viruses), has been proven crucial for its health and disease development. The high dimensionality of microbiome datasets has often been addressed as a major difficulty for data analysis, such as the use of Machine Learning (ML) and Deep Learning (DL) models. Here we present BiGAMi, a bi-objective genetic algorithm fitness function for feature selection in microbial datasets to train high-performing phenotype classifiers. The proposed fitness function allowed us to build classifiers that outperformed the baseline performance estimated by the original studies by using as few as 0.04% to 2.32% features of the original dataset. In 19 out of 21 classification exercises, BiGAMi achieved its results by selecting 6-68% fewer features than the highest performance of a Sequential Forward Feature Selection algorithm. This study showed that the application of a bi-objective GA fitness function against microbiome datasets succeeded in selecting small subsets of bacteria whose contribution to understood diseases and the host state was already experimentally proven. Applying this feature selection approach to novel diseases is expected to quickly reveal the microbes most relevant to a specific condition.
ARTICLE | doi:10.20944/preprints202008.0113.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Scene classification; Deep Learning; Convolutional Neural Networks; Feature learning
Online: 5 August 2020 (06:19:27 CEST)
State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures for achieving very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground-truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant gains across a wide set of benchmark datasets in terms of classification, while providing additional evidence related to class membership and classification confidence.
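A loss of the kind described here, cross-entropy extended with a distance term toward a class representative, might look like the sketch below. The squared Euclidean distance, the weighting factor `weight`, and all function names are assumptions; the abstract does not specify the distance function or how the two terms are balanced.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Categorical cross-entropy for a single example with an integer label."""
    return -math.log(softmax(logits)[label])


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def combined_loss(logits, embedding, label, class_centers, weight=0.1):
    """Cross-entropy on the classification branch plus a weighted distance
    between the low-dimensional embedding and its class representative."""
    ce = cross_entropy(logits, label)
    dist = squared_distance(embedding, class_centers[label])
    return ce + weight * dist


# Toy example: two classes, 2-D embeddings, hypothetical class representatives.
toy_loss = combined_loss(
    logits=[0.0, 0.0],
    embedding=[1.0, 0.0],
    label=0,
    class_centers=[[0.0, 0.0], [1.0, 1.0]],
    weight=0.5,
)
```

At inference time, the distance from an embedding to each class representative can serve as the kind of additional class-membership evidence the abstract mentions.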
ARTICLE | doi:10.20944/preprints202008.0051.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Variational autoencoder; Adversarial learning; Deep feature consistent; Data generation
Online: 2 August 2020 (18:11:27 CEST)
We present a method to improve the reconstruction and generation performance of the variational autoencoder (VAE) by injecting adversarial learning. In addition, instead of comparing the reconstructed data with the original data to calculate the reconstruction loss, we use a consistency principle for deep features. The training process of the VAE is then divided into two steps: training the encoder and then training the decoder. By using this two-step learning process, our method can be more widely used in applications other than image processing. While training the encoder, the label information is integrated to better structure the latent space in a supervised way. The adversarial constraints allow the decoder to generate data that are more authentic and realistic than those of the conventional VAE. We present experimental results to show that our method gives better performance than the original VAE.
ARTICLE | doi:10.20944/preprints202003.0081.v1
Subject: Biology And Life Sciences, Biophysics Keywords: COVID-19; electrostatic feature; salt bridging network; structural update
Online: 5 March 2020 (03:37:44 CET)
Since the Coronavirus disease (COVID-19) outbreak at the end of 2019, the past two months have seen an acceleration, both inside and outside China, in the R&D of diagnostics, vaccines and therapeutics for this novel coronavirus. As one of the molecular forces that determine protein structure, electrostatic effects dominate many aspects of protein behaviour and biological function. Thus, incorporating currently available experimental structures related to COVID-19, this article reports a simple Python-based analysis tool and a LaTeX-based editing tool to extract and summarize the electrostatic features from experimentally determined structures, to strengthen our understanding of COVID-19's structure and function and to facilitate machine-learning and structure-based computational design of its neutralizing antibodies and/or small molecule(s) as potential therapeutic candidates. Finally, this article puts forward a brief update of the structurally observed electrostatic features of the COVID-19 coronavirus.
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: wheat; UAV image; color index; texture feature index; biomass
Online: 26 December 2019 (12:27:49 CET)
To realize rapid and nondestructive monitoring of wheat biomass in the field, field experiments with different density, nitrogen fertilizer, and variety treatments were conducted. RGB images of wheat in the main growth stages were obtained by UAV, wheat color and texture feature indices were obtained by image processing, and wheat biomass was obtained by field sampling in the same period. The relationship between the different color and texture feature indices and wheat biomass was then analyzed to select the color and texture feature indices suitable for wheat biomass estimation. The results showed that there was a high correlation between the image color indices and wheat biomass in different stages, and most of them reached a very significant correlation level. However, the correlation between the image texture feature indices and wheat biomass was poor; only a few indices reached a significant or extremely significant correlation level. Based on these results, the color indices with the highest correlation to wheat biomass, or the combined color and texture feature indices, in the different growth stages were used to construct estimation models of wheat biomass. The models were validated using independently measured biomass data; the correlation between simulated and measured values reached the significant level, and the RMSE values were relatively small. This indicated that the results estimated by the models were reliable and accurate. It also showed that the wheat biomass estimation models combining color and texture feature indices of UAV images were better than the single color index models. The results provide a new method for real-time monitoring of wheat field growth and biomass estimation.
ARTICLE | doi:10.20944/preprints201705.0142.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: edge detection; hyperspectral image; gravitation; remote sensing; feature space
Online: 19 May 2017 (06:00:18 CEST)
Edge detection is one of the key issues in the field of computer vision and remote sensing image analysis. Although many different edge-detection methods have been proposed for gray-scale, color, and multispectral images, they still face difficulties when extracting edge features from hyperspectral images (HSIs) that contain a large number of bands with very narrow gaps in the spectral domain. Inspired by the clustering characteristic of gravitation, a novel edge-detection algorithm for HSIs is presented in this paper. In the proposed method, we first construct a joint feature space by combining the spatial and spectral features. Each pixel of the HSI is assumed to be a celestial object in the joint feature space, which exerts a gravitational force on each of its neighboring pixels. Accordingly, each object travels in the joint feature space until it reaches a stable equilibrium. At the equilibrium, the image is smoothed and the edges are enhanced, and the edge pixels can be easily distinguished by calculating the gravitational potential energy. The proposed edge-detection method is tested on several benchmark HSIs, and the obtained results are compared with those of three state-of-the-art approaches. The experimental results confirm the efficacy of the proposed method.
ARTICLE | doi:10.20944/preprints202003.0299.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Data Mining; Alzheimer’s Dementia; Composite Hybrid Feature Selection; Machine learning; stack Hybrid Classification; AI; MRI; Neuroimaging; MPEG7 edge histogram feature extraction; CNN
Online: 19 March 2020 (11:25:01 CET)
Alzheimer's disease (AD) detection plays an essential role in global health care because of misdiagnosis, the many clinical features AD shares with other types of dementia, and the cost of monitoring disease progression over time by magnetic resonance imaging (MRI), given the human error involved in manual reading. This paper presents a comparative study of the performance of data mining techniques on two AD datasets, one of clinical tests and one of neuroimaging tests. In the first stage of our proposed model, we apply the clinical medical dataset to a composite hybrid feature selection (CHFS) method, which extracts new features and selects the best ones by eliminating obscure features. In parallel, we apply a novel hybrid feature extraction, combining three batch edge detection algorithms and texture, to the MRI image dataset, optimized with a fuzzy 64-bin histogram. In the second stage, we apply the clinical dataset to a stacked hybrid classification (SHC) model that combines JRip and random forest classifiers, with six models each evaluated individually as meta-classifier, to improve the prediction of clinical diagnosis. In the same stage, we aim to improve the classification accuracy on the neuroimaging (MRI) image dataset by applying a convolutional neural network (CNN), in comparison with traditional classifiers run on features extracted from the images. The authors collected the clinical dataset of 426 subjects (1229 potential patient samples) from oasis.org and the MRI dataset from a benchmark on kaggle.com, with a total of around 5000 images, each labeled by Alzheimer's severity. The datasets were evaluated using the Explorer interface of the Weka data mining software.
The experiments show that the proposed CHFS feature extraction model effectively reduced the false-negative rate with a relatively high overall accuracy: the stacked hybrid classification with a support vector machine (SVM) as meta-classifier reached 96.50%, compared to 68.83% for the previous result on the clinical dataset, while the compared CNN classification model reached 80.21% on the MRI image dataset. The results showed the superiority of our CHFS model in predicting Alzheimer's disease more accurately at an early stage with the clinical medical dataset than with the neuroimaging (MRI) dataset. The proposed model was able to accurately classify Alzheimer's clinical samples at low cost in comparison with the MRI-CNN image model at the early stage, and it is a good indicator of a high classification rate for MRI images when our proposed SHC model is applied.
ARTICLE | doi:10.20944/preprints202309.0156.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; feature attribution; gaussian noise; LSTM; precipitation prediction; RMSE
Online: 5 September 2023 (03:03:04 CEST)
This paper explores the use of different deep learning models for predicting precipitation in 56 meteorological stations in Jilin Province, China. The models used include Stacked-LSTM, Transformer, and SVR, and Gaussian noise is added to the data to improve their robustness. Results show that the Stacked-LSTM model performs the best, achieving high prediction accuracy and stability. The study also conducts variable attribution analysis using LightGBM and finds that temperature, dew point, precipitation in previous days, and air pressure are the most important factors affecting precipitation prediction, which is consistent with traditional meteorological theory. The paper provides detailed information on the data processing, model training, and parameter settings, which can serve as a reference for future precipitation prediction tasks. The findings suggest that adding Gaussian noise to the dataset can improve the model's generalization ability, especially for predicting days with zero precipitation. Overall, this study provides useful insights into the application of deep learning models in precipitation prediction and can contribute to the development of meteorological forecasting and applications.
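Adding Gaussian noise to a training series, as described in this abstract, can be sketched in a few lines. The function name `add_gaussian_noise`, the zero mean, the value of `sigma`, and the toy series are assumptions; the abstract does not report the noise parameters or which variables were perturbed.

```python
import random


def add_gaussian_noise(series, sigma, seed=None):
    """Return a copy of the series with zero-mean Gaussian noise added.

    sigma: standard deviation of the noise (an assumed hyperparameter).
    seed: optional seed so that the augmentation is reproducible.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in series]


# Toy daily series; the original values are left untouched.
toy_series = [0.0, 0.0, 3.2, 12.5, 0.0]
noisy_series = add_gaussian_noise(toy_series, sigma=0.1, seed=42)
```

Such noise is normally applied to the training inputs only, so that evaluation still runs on unmodified observations.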
ARTICLE | doi:10.20944/preprints202308.1885.v1
Subject: Chemistry And Materials Science, Food Chemistry Keywords: Cheminformatics; Taste prediction; Machine learning; Deep learning; Molecular feature representation
Online: 29 August 2023 (03:06:30 CEST)
Taste determination in small molecules is critical in food chemistry, but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2,601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
ARTICLE | doi:10.20944/preprints202308.0373.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: ReID; Pyramid Vision Transformer; local feature clustering; side information embeddings
Online: 4 August 2023 (07:26:19 CEST)
Due to the influence of background conditions, lighting conditions, occlusion, and image resolution, extracting robust person features is one of the difficulties in ReID research. The Vision Transformer (ViT) has achieved significant results in the field of computer vision. However, its application in ReID is still limited by the slow extraction of person features and the difficulty of utilizing local features of people. To solve these problems, we utilize the Pyramid Vision Transformer (PVT) as the feature extraction backbone and propose a PVT-based ReID method in conjunction with other studies. Firstly, some improvements suitable for ReID are applied to the PVT backbone, and we establish a basic model by using powerful methods verified on CNN-based ReID. Secondly, to further improve the robustness of the person features extracted by the PVT backbone, two new modules are designed. (1) Local feature clustering (LFC) is proposed to enhance the robustness of person features by calculating the distance between the local features and the global feature to select the most discrete local features and clustering them. (2) Side information embeddings (SIE) are used to encode non-visual information and send it into the network for training to reduce its impact on person features. Finally, the experiments show that PVTReID achieves excellent results on ReID datasets and is 20% faster on average than CNN-based ReID methods.
ARTICLE | doi:10.20944/preprints202306.0161.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: biometrics; deep learning; time series; feature selection; classification; accelerometer; Sustainability
Online: 2 June 2023 (09:00:51 CEST)
With the growing popularity of smartphones, user identification has become an essential component of maintaining security and privacy. This study investigates how smartphone accelerometer data can be used to identify users, and it makes recommendations for the ideal application parts. Accelerometer data from the HMOG public dataset was used to train deep learning, conventional classifiers, and voting classifiers, which were then utilized to identify users. To enhance performance, feature selection and pre-processing techniques were researched. The results show that RFE feature selection outperforms other approaches and that LSTM followed by XGBoost has the best identification performance as indicated by a relatively large number of machine learning performance measures. The proposed identification system nevertheless performed well and outperformed existing methods, which were principally created and tested on the same HMOG public smartphone dataset, even with a larger number of users. Further work would be necessary for such an application to reach its full potential, though.
ARTICLE | doi:10.20944/preprints202305.1866.v1
Subject: Engineering, Mechanical Engineering Keywords: Computational heat transfer; Coating; Feature combination; Machine learning; Heat-exchangers
Online: 26 May 2023 (05:38:40 CEST)
Cross flow heat exchangers are commonly used in the thermal industry to transfer heat from hot tubes to cooling fluid. To protect the heat exchanger tubes from corrosion and dust accumulation, microscale coatings are often applied. In this study, we present machine-learning models for predicting heat transfer from hot tubes with different micro-sized coatings to cooling fluid in a turbulent flow using computational fluid dynamics simulations. A dataset of approximately 1000 cases was generated by varying the coating coverage thickness of each tube, the inlet Reynolds number, fluid flow inlet temperature, and wall temperature of tubes. The machine-learning models were generated to predict the overall heat flow rate in the heat exchanger, and it was found that combining the features based on their importance preserved the accuracy of the models while maintaining all the relevant information. The simulation results demonstrate that the proposed method increases the coefficient of determination (R2) for the models. The R2 values for unseen data for Random Forest, K-Nearest Neighbors, and Support Vector Regression were 0.9810, 0.9037, and 0.9754, respectively, indicating the usefulness of the proposed model for predicting heat transfer in various types of heat exchangers.
ARTICLE | doi:10.20944/preprints202305.1519.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: crop prediction; machine learning; feature selection; artificial intelligent; smart farming
Online: 22 May 2023 (11:24:05 CEST)
This research investigates the potential benefits of integrating machine learning algorithms and IoT sensors in modern agriculture. The focus is on optimizing crop production and reducing waste through informed decisions about planting, watering, and harvesting crops. The paper discusses the current state of machine learning and IoT in agriculture, highlighting key challenges and opportunities. It also presents experimental results that demonstrate the impact of changing labels on the accuracy of data analysis algorithms. The findings suggest that by analyzing wide-ranging data collected from farms, including real-time data from IoT sensors, farmers can make more informed decisions about factors that affect crop growth. Ultimately, the integration of these technologies can transform modern agriculture by increasing crop yields while minimizing waste. In our studies, we achieve a classification accuracy of 99.59% using the Bayes Net algorithm and 99.46% using the Naïve Bayes classifier and Hoeffding Tree algorithms. These results indicate that high classification accuracy is achievable in experiments aimed at increasing crop growth.
ARTICLE | doi:10.20944/preprints202305.0319.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: hyperspectral images; convolutional neural networks; graph convolutional networks; feature fusion
Online: 5 May 2023 (07:40:07 CEST)
Convolutional neural networks (CNNs) have attracted much attention in recent years as a commonly used method for hyperspectral image (HSI) classification. However, CNNs can only be applied to Euclidean data and, because of the limitations of local feature extraction, are limited in modeling relationships. Each pixel of a hyperspectral image contains a set of spectral bands that are correlated and interact with each other, and methods designed for Euclidean data cannot effectively capture these correlations. In contrast, the graph convolutional network (GCN) can be used on non-Euclidean data, but usually leads to oversmoothing and ignores local detail features due to the need for superpixel segmentation to reduce computational effort. To overcome the above problems, we constructed a fusion network based on GCN and CNN, which contains two branches: a graph convolutional network based on superpixel segmentation and a convolutional network with an added attention mechanism. The graph convolutional branch can extract the structural features and capture the relationships between the nodes, and the convolutional branch can extract the detailed features in the local fine region. Because the features extracted from the two branches are different, the classification performance can be improved by fusing the complementary features extracted from the two branches. To validate the proposed algorithm, experiments were conducted on three widely used datasets, namely Indian Pines, Pavia University, and Salinas. An overall accuracy of 98.78% was obtained on the Indian Pines dataset, and overall accuracies of 98.99% and 98.69% were obtained on the other two datasets. The results showed that the proposed fusion network can obtain richer features and achieve high classification accuracy.
ARTICLE | doi:10.20944/preprints202303.0209.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: chili sauce; Lactiplantibacillus plantarum; feature-based molecular network; metabolomics; taste
Online: 13 March 2023 (03:09:30 CET)
Lactobacillus plantarum has been observed to play a crucial role in shaping the sensory properties of chili sauce. However, the specific taste-active metabolites responsible for the desirable flavor profile of chili sauce remain inadequately characterized. This study employed a combination of metabolomics and web-based computational tools analysis to investigate the dynamic changes in taste-active metabolites during chili sauce fermentation. Initially, metabolites were rapidly annotated using a feature-based molecular network, leading to the tentative annotation of 206 metabolites, of which a significant proportion had not been previously reported. Subsequently, the VirtualTaste tool identified dihydrosphingosine, lactic acid, isoleucine, phytosphingosine, and gluconic acid as potential taste markers for quality control. Finally, pathway enrichment analysis revealed that these components were primarily associated with amino acid tRNA, phenylalanine, tyrosine, and tryptophan biosynthesis, as well as sphingolipid metabolism. This study provides valuable insights into the mechanisms underlying the formation of the distinctive flavor of chili sauce.
REVIEW | doi:10.20944/preprints202302.0478.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: data preprocessing; feature extraction; deep learning; illumination estimation; color constancy
Online: 28 February 2023 (01:24:14 CET)
Deep learning (DL) models have recently been widely used to extract task-oriented patterns from large-scale datasets and to improve image understanding and analysis accuracy in many different decision-making processes for tasks such as image classification, segmentation, and detection. However, in practice, the performance of DL models is easily affected by environmental illumination conditions. Conversely, DL models can also be utilized for extracting illumination hints from images, and these hints are critically useful for improving model robustness, classifying environmental scenes, estimating scene depth information, and rendering 3D objects. In this study, an extensive review is carried out of DL-based color constancy, indoor and outdoor illumination estimation, and image depth estimation, with consideration of the strengths and weaknesses of DL models. This study also explores the different network designs and the paradoxes in parameter optimization during model training. Current technology barriers involved in implementing these models, and recommendations to overcome these barriers, are also discussed in the review.
ARTICLE | doi:10.20944/preprints202212.0490.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Kubernetes; Cluster federation; Feature-oriented configuration management; Vendor lock-in
Online: 26 December 2022 (12:06:42 CET)
Kubernetes (K8s) defines standardized APIs for container-based cluster orchestration so it becomes possible for application managers to deploy their applications in a unified manner across different cloud providers. A practical problem is however feature incompatibility between different K8s vendors, who offer commercial K8s products based on the open-source K8s distribution. A large number of documented features in this open-source distribution are optional features that are turned off by default, but can be activated by setting specific combinations of parameters and plug-in components in configuration manifests for the K8s control plane and worker node agents. However, none of these configuration manifests are standardized, giving K8s vendors the freedom to hide the manifests behind a single, more restricted, and proprietary customization interface. Therefore some optional K8s features cannot be activated consistently across K8s vendors and applications that require these features cannot be run on those vendors. In this paper we present a unified, vendor-agnostic feature management approach that bypasses the proprietary customization interface of K8s vendors in order to consistently activate optional K8s features across a federation of clusters hosted by different Kubernetes vendors. We describe vendor-agnostic reconfiguration tactics that are already applied in industry and cover a wide range of optional K8s features. Based on these tactics, we design and implement an autonomic controller for declarative feature compatibility management across a cluster federation. We found that the features configured through our vendor-agnostic approach have no impact on application performance when compared with a cluster where the features are configured using the configuration manifests of the open-source K8s distribution. 
Moreover, the maximum time to complete reconfiguration of a single feature is within 100 seconds, which is 6 times faster than using proprietary customization interfaces of mainstream K8s vendors such as Google Kubernetes Engine. However, there is a non-negligible disruption to running applications when performing the reconfiguration; this disruption impact does not appear using the proprietary customization methods of the K8s vendors. Therefore, our approach is best applied in the following three use cases: (i) when starting up new K8s clusters, (ii) when optional K8s features of existing clusters must be activated as quickly as possible and temporary disruption to running applications can be tolerated, or (iii) when proprietary customization interfaces do not allow the desired optional feature to be activated.
ARTICLE | doi:10.20944/preprints202205.0379.v1
Subject: Engineering, Control And Systems Engineering Keywords: Battery autonomy; battery size; feature selection; Machine Learning; Optimization algorithms
Online: 27 May 2022 (10:12:42 CEST)
Microgrids are becoming popular nowadays because they provide clean, efficient, and low-cost energy. To use the stored energy in times of emergency or peak loads, microgrids require bulk storage capacity. Since microgrids are the future of renewable energy, the energy storage technology employed should be optimized to generate electricity. Batteries play a variety of essential roles in daily life and are used at peak hours and during times of emergency. There are different types of batteries, e.g., Li-ion batteries and lead-acid batteries. Optimal battery sizing of microgrids is a challenging problem that limits modern technologies such as electric vehicles. It is important to know different battery features such as battery life, battery throughput, and battery autonomy to get optimal battery sizing for microgrids. Mixed-integer linear programming (MILP) is an established technique for the integration and optimization of different energy sources and parameters for optimal battery sizing. A new MILP-based dataset is introduced in this work. A support vector machine (SVM) is the machine learning method used to estimate the optimum battery size. The impact of feature selection algorithms on the proposed machine learning-based model is evaluated. The performance of the six best-performing feature selection algorithms is analyzed. The experimental results show that the feature selection algorithms improve the performance of the proposed methodology. Ranker search shows the best performance with a Spearman’s rank-ordered correlation constant of 0.9756, linear correlation constant of 0.9452, Kendall correlation constant of 0.8488 and root mean squared error of 0.0525.
ARTICLE | doi:10.20944/preprints202201.0259.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: image classifier; image part; quick learning; feature overlap; positional context
Online: 11 April 2022 (10:17:57 CEST)
This paper describes an image processing method that makes use of image parts instead of neural parts. Neural networks excel at image or pattern recognition and they do this by constructing complex networks of weighted values that can cover the complexity of the pattern data. These features however are integrated holistically into the network, which means that they can be difficult to use in an individual sense. A different method might scan individual images and use a more local method to try to recognise the features in it. This paper suggests such a method, where a trick during the scan process can not only recognise separate image parts, as features, but it can also produce an overlap between the parts. It is therefore able to produce image parts with real meaning and also place them into a positional context. Tests show that it can be quite accurate, on some handwritten digit datasets, but not as accurate as a neural network, for example. The fact that it offers an explainable interface could make it interesting however. It also fits well with an earlier cognitive model, and an ensemble-hierarchy structure in particular.
TECHNICAL NOTE | doi:10.20944/preprints202009.0168.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: lung function; epigenetic aging; machine learning; feature selection; hyperparameter tuning
Online: 8 September 2020 (03:15:28 CEST)
Epigenetic aging has been found to be associated with a number of phenotypes and diseases. A few studies have investigated its effect on lung function in relatively older people. However, this effect has not been explored in younger populations. This study examines whether lung function in adolescence can be predicted with epigenetic age accelerations (AAs) using machine learning techniques. DNA methylation-based AAs were estimated in 326 matched samples at two time points (at 10 years and 18 years) from the Isle of Wight Birth Cohort. Five machine learning regression models (linear, lasso, ridge, elastic net, and Bayesian ridge) were used to predict FEV1 (Forced Expiratory Volume in one second) and FVC (Forced Vital Capacity) at 18 years from feature-selected predictor variables (based on mutual information) and AA changes between the two time points. The best models were ridge regression (R2 = 75.21% ± 7.42%; RMSE = 0.3768 ± 0.0653) and elastic net regression (R2 = 75.38% ± 6.98%; RMSE = 0.445 ± 0.069) for FEV1 and FVC, respectively. This study suggests that the application of machine learning in conjunction with tracking changes in AA over the life span can be beneficial for assessing lung health in adolescence.
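The two regression metrics reported above can be illustrated with a minimal sketch. It uses the standard textbook definitions of R2 and RMSE; the function names and sample values are illustrative assumptions and are not taken from the study.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative values only (not data from the cohort).
observed = [3.1, 3.8, 4.2, 4.9]
predicted = [3.0, 4.0, 4.1, 5.0]
print(r2_score(observed, predicted), rmse(observed, predicted))
```

An R2 reported as a percentage, as in the abstract, would correspond to this value multiplied by 100.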
REVIEW | doi:10.20944/preprints202009.0013.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: MicroRNA Expression; Feature Selection; Cancer Diagnosis; Fuzzy Logic; Co-Learning
Online: 1 September 2020 (11:42:35 CEST)
MicroRNAs are used as biomarkers for classification of cancer subtypes since certain miRNAs are differentially expressed in normal and patient samples. Moreover, miRNAs target mRNAs and can heavily influence gene expression, so deregulation of miRNAs is linked to various disorders. miRNAs can therefore be used for prognosis and for developing personalized health solutions for patients. Given the importance of miRNAs, there has been substantial work done in the field. In this paper, recent works that use the miRNA expression of patients were considered. A total of 20 papers were surveyed which utilized feature selection ensembles, fuzzy logic as well as deep learning. 10 papers have been reported which offer insight into how miRNAs can be utilized for subtype-specific or generalized cancer diagnosis.
ARTICLE | doi:10.20944/preprints202003.0036.v1
Subject: Medicine And Pharmacology, Other Keywords: ECG feature selection; heartbeat classification; arrhythmia detection; random forest classifier
Online: 3 March 2020 (11:12:20 CET)
Finding an optimal combination of features and classifier is still an open problem in the development of automatic heartbeat classification systems, especially when applications that involve resource-constrained devices are considered. In this paper, a novel study of the selection of informative features and the use of a random forest classifier while following the recommendations of the Association for the Advancement of Medical Instrumentation (AAMI) and an inter-patient division of datasets is presented. Features were selected using a filter method based on the mutual information ranking criterion on the training set. Results showed that normalized R-R intervals and features relative to the width of the QRS complex are the most discriminative among those considered. The best results achieved on the MIT-BIH Arrhythmia Database were an overall accuracy of 96.14% and F1-scores of 97.97%, 73.06%, and 90.85% in the classification of normal beats, supraventricular ectopic beats, and ventricular ectopic beats respectively. In comparison with other state of the art approaches tested under similar constraints, this work represents one of the highest performances reported to date while relying on a very small feature vector.
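The filter step described above, ranking features by mutual information with the class label, can be sketched as follows. This is a minimal illustration under stated assumptions: features are assumed to be already discretized, the estimator is the plain plug-in estimate from counts, and the feature names `rr_bin` and `qrs_bin` are hypothetical placeholders, not the paper's feature set.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def rank_features(features, labels):
    """Return feature names sorted by decreasing mutual information with labels."""
    scores = {name: mutual_information(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical, already-discretized features for four beats.
labels = [0, 0, 1, 1]
features = {"rr_bin": [0, 0, 1, 1], "qrs_bin": [0, 1, 0, 1]}
order, scores = rank_features(features, labels)
print(order, scores)
```

In keeping with the inter-patient protocol mentioned in the abstract, the ranking would be computed on the training set only.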
ARTICLE | doi:10.20944/preprints202002.0059.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: EEG; Transition; 2D to 3D; Anaglyph; Feature extraction; Classification; Hybrid
Online: 5 February 2020 (10:48:51 CET)
Despite the long and extensive history of 3D technology, it has recently attracted the attention of researchers. This technology has become the center of interest of young people because of the real feelings and sensations it creates. People see their environment as 3D because of their eye structure. In this study, it is hypothesized that people lose their perception of depth during sleepy moments and that there is a sudden transition from 3D vision to 2D vision. Regarding these transitions, the EEG signal analysis method was used for deep and comprehensive analysis of 2D and 3D brain signals. In this study, a single-stream anaglyph video of random 2D and 3D segments was prepared. After watching this single video, the obtained EEG recordings were considered for two different analyses: the part involving the critical transition (transition-state) and the state analysis of only the 2D versus 3D or 3D versus 2D parts (steady-state). The main objective of this study is to see the behavioral changes of brain signals in 2D and 3D transitions. To clarify the impacts of the human brain’s power spectral density (PSD) in 2D-to-3D (2D_3D) and 3D-to-2D (3D_2D) transitions of anaglyph video, 9 visually healthy individuals were tested in this pioneering study. Spectrogram graphs based on the Short Time Fourier transform (STFT) were considered to evaluate the power spectrum analysis in each EEG channel of transition or steady-state. Thus, in 2D and 3D transition scenarios, important channels representing EEG frequency bands and brain lobes will be identified. To classify the 2D and 3D transitions, the dominant bands and time intervals representing the maximum difference of PSD were selected. Afterward, effective features were selected by applying statistical methods such as standard deviation (SD), maximum (max), and Hjorth parameters to epochs indicating transition intervals. 
Ultimately, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) algorithms were applied to classify 2D_3D and 3D_2D transitions. The frontal, temporal, and partially parietal lobes show 2D_3D and 3D_2D transitions with a good classification success rate. Overall, it was found that Hjorth parameters and LDA algorithms have 71.11% and 77.78% classification success rates for transition and steady-state, respectively.
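The Hjorth parameters named above have standard definitions (activity, mobility, complexity) that can be sketched in a few lines. This is a minimal illustration using first differences as the derivative estimate; the function names and the synthetic sinusoid are assumptions for demonstration, not the study's EEG pipeline.

```python
import math

def _variance(xs):
    """Population variance of a sequence."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def _diff(xs):
    """First difference, used here as a discrete derivative."""
    return [b - a for a, b in zip(xs, xs[1:])]

def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for one epoch."""
    d1 = _diff(signal)
    d2 = _diff(d1)
    activity = _variance(signal)                      # signal power
    mobility = math.sqrt(_variance(d1) / activity)    # mean-frequency proxy
    # Complexity compares the mobility of the derivative to that of the signal.
    complexity = math.sqrt(_variance(d2) / _variance(d1)) / mobility
    return activity, mobility, complexity

# Synthetic epoch: a pure sinusoid, for which complexity is close to 1.
epoch = [math.sin(2 * math.pi * k / 50) for k in range(1000)]
print(hjorth_parameters(epoch))
```

Each epoch would yield one such triple per channel, which could then be fed to a classifier such as LDA.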
ARTICLE | doi:10.20944/preprints201911.0261.v1
Subject: Engineering, Control And Systems Engineering Keywords: feature selection; locally linear embedding; regularization technology; bearing fault diagnosis
Online: 22 November 2019 (10:05:03 CET)
The purpose of feature selection is to find important features from the original high-dimensional space. As a typical feature selection algorithm, the locally linear embedding (LLE)-based feature selection algorithm, which applies the idea of LLE to the graph-preserving feature selection framework, has received wide attention. However, the LLE-based feature selection framework is sensitive to noise and to the choice of K-nearest neighbors. To address these problems, an improved LLE-based feature selection algorithm, robust LLE (RLLE) vote, is proposed. In this algorithm, $l_1$ and $l_2$ regularization are introduced into the high-dimensional reconstruction model of LLE. Furthermore, RLLE vote also proposes a criterion to measure the difference between the reconstructed features and the original features, and the important features can then be selected by this criterion. Extensive experiments are carried out on a benchmark fault data set and the bearing data set collected from our own laboratory, and the experimental results demonstrate that RLLE vote achieves the most significant performance compared with existing state-of-the-art methods.
ARTICLE | doi:10.20944/preprints201908.0228.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: EEG; luminance; brightness; IAPS; STFT; feature extraction; visual processing; emotion
Online: 22 August 2019 (03:43:25 CEST)
The aim of this study was to examine the effect of brightness, a perceptual property of visual stimuli, on brain responses obtained during visual processing of these stimuli. For this purpose, brain responses to changes in brightness were explored comparatively using different emotional images (pleasant, unpleasant and neutral) with different luminance levels. Moreover, electroencephalography recordings from 12 different electrode sites of 31 healthy participants were used. The power spectra obtained from the analysis of the recordings using short time Fourier transform were analyzed, and a statistical analysis was performed on features extracted from these power spectra. Statistical findings obtained from electrophysiological data were compared with those obtained from behavioral data. The results showed that the brightness of visual stimuli affected the power of brain responses depending on frequency, time and location. According to the statistically verified findings, the distinctive effect of brightness occurred in the parietal and occipital regions for all the three types of stimuli. Accordingly, the increase in the brightness of pleasant and neutral images increased the average power of responses in the parietal and occipital regions whereas the increase in the brightness of unpleasant images decreased the average power of responses in these regions. However, the increase in brightness for all the three types of stimuli reduced the average power of frontal and central region responses (except for 100-300 ms time window for unpleasant stimuli). The statistical results obtained for unpleasant images were found to be in accordance with the behavioral data. The results also revealed that the brightness of visual stimuli could be represented by changing the activity power of the brain cortex. 
The main contribution of this research was to comprehensively examine brightness effect on brain activity for images with different emotional content and different frequency bands at different time windows of visual processing for different brain regions. The findings emphasized that the brightness of visual stimuli should be viewed as an important parameter in studies using emotional image techniques such as image classification, emotion evaluation and neuro-marketing.
REVIEW | doi:10.20944/preprints201810.0087.v1
Subject: Computer Science And Mathematics, Geometry And Topology Keywords: Conflation, feature matching, deconfliction, road network, complexity of spatial conflation.
Online: 4 October 2018 (15:51:44 CEST)
Spatial data conflation plays a fundamental role in many aspects of modern Geographic Information Systems (GIS) research and development such as geospatial data visualization, incremental updating of databases and disaster evaluation. The primary objective of conflation is to derive valuable information based on the comparison of multiple spatial data sources of homogeneous or heterogeneous nature. This paper reviews the state of the art of the concept of conflation, feature matching, the most important progress in conflation between image and road network data, and the complexity of spatial conflation.
ARTICLE | doi:10.20944/preprints201802.0102.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: line array cameras; pavement crack detection; feature analysis; adaptive lifting
Online: 15 February 2018 (16:41:25 CET)
This paper proposes a crack recognition method based on high-resolution line array cameras and an adaptive lifting algorithm. By defining the crack rate, this algorithm calculates the ratio of the crack area to the area of the entire collected image to characterize the damage extent of the current section. The algorithm first uses image preprocessing to reduce the image noise, then uses histogram equalization to enhance the features of the crack region, divides the whole image into multiple sub-blocks, and extracts region features in each sub-block. At the same time, this algorithm defines related feature descriptors, constructs weak classifiers according to each feature descriptor, and converts the weak classifiers into strong classifiers by using an adaptive lifting algorithm. Finally, this algorithm realizes the division of the crack regions. Experimental results show that the proposed algorithm can meet the actual needs and is better than other classical algorithms.
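The crack rate defined above, the ratio of crack area to total image area, can be sketched directly from a binary segmentation mask. This is a minimal illustration; the mask representation (a nested list of 0/1 values) and the function name `crack_rate` are assumptions, not the paper's implementation.

```python
def crack_rate(mask):
    """Fraction of pixels marked as crack in a binary mask (rows of 0/1 values)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    cracked = sum(1 for row in mask for value in row if value)
    return cracked / total

# Illustrative 3x4 mask with three crack pixels.
mask = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
print(crack_rate(mask))  # 3 crack pixels out of 12
```

In the described pipeline the mask would come from the final division of crack regions, so the rate summarizes the damage extent of one collected image.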
ARTICLE | doi:10.20944/preprints201801.0160.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: target detection; dynamic background; information theory; feature matrix; computing resources
Online: 17 January 2018 (12:45:29 CET)
In recent years, many algorithms based on end-to-end deep networks have been proposed to deal with the target detection problem in videos. However, deep network models usually consume a lot of computing resources when analyzing videos with complex dynamic backgrounds. In this paper, a new method of object detection based on information theory is presented. Firstly, each frame in a video is converted into an effective information map by using the Harris corner detection method. Secondly, the sensitive areas in the frame are extracted by using the context information and the effective information maps of the consecutive video frames. The sensitive areas in the video frame are the candidate areas where the target objects are likely to appear. Thirdly, the information entropy features of each sensitive area are extracted to form the feature matrix, based on which an SVM model is trained for selecting the target areas from the sensitive areas. Finally, the locations of the objects are detected based on the target areas in the video with a complex dynamic background. As a lightweight video detection framework, the method presented in this paper can save a lot of computing resources. Experimental results show that this method can achieve good results on the CDnet 2014 benchmark.
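One plausible reading of the "information entropy features" above is the Shannon entropy of the values inside each sensitive area. The sketch below computes that quantity for a flattened patch; the abstract does not specify the exact feature, so treating it as plain Shannon entropy over quantized values is an assumption, as is the function name.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Illustrative flattened patches of quantized intensities.
flat_patch = [5, 5, 5, 5]       # uniform patch: no uncertainty
mixed_patch = [0, 0, 255, 255]  # two equally likely values
print(shannon_entropy(flat_patch), shannon_entropy(mixed_patch))
```

Entropy values computed per area could be stacked into rows of a feature matrix for the SVM stage.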
ARTICLE | doi:10.20944/preprints201608.0055.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: seismic damage building; watershed segmentation; SAR; texture feature; change detection
Online: 5 August 2016 (12:19:24 CEST)
The information on seismic damage to buildings in SAR images of different time phases, especially in SAR images acquired after an earthquake, is easily disturbed by other factors, which affects the accuracy of information discrimination. To identify and evaluate the distribution of seismic damage accurately, the abundant texture features in SAR images should be fully exploited. The conventional method of change detection based on texture features usually takes the pixel as the calculating unit. In this paper, a method of texture feature change detection of SAR images based on the watershed segmentation algorithm is proposed. Based on the optimization of texture feature parameters, the feature parameters are segmented by the watershed segmentation algorithm, and the feature object image is obtained. This method introduces an object-oriented approach and carries out the calculation of the difference map at the object level. Finally, the classification thresholds for the different types of seismic damage are selected, and the recognition of building damage is achieved. Taking the ALOS data before and after the earthquake in Yushu as an example to verify the effectiveness of the method, the overall accuracy of the building extraction is 88.9%. Comparison with pixel-based methods shows that the proposed method is effective.
ARTICLE | doi:10.20944/preprints202004.0032.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: indoor positioning system; image-based positioning system; computer vision; SIFT; feature detection; feature description; cell phone camera; PnP problem; projection matrix; epipolar geometry; OpenCV
Online: 3 April 2020 (11:59:48 CEST)
As people grow accustomed to effortless outdoor navigation, there is a rising demand for a similar capability indoors. Unfortunately, indoor localization, one of the necessary requirements for navigation, continues to be a problem without a clear solution. In this article we propose a method for an indoor positioning system using a single image. This is made possible by a small preprocessed database of images with known control points, which is the only preprocessing needed. Using feature detection with the SIFT algorithm, we can search the database and find the image that is most similar to the image taken by the user. The pair of images is then used to find the coordinates of the database image by solving the PnP problem. Furthermore, projection and essential matrices are determined, allowing for the localization of the user image, that is, determining the position of the user in the indoor environment. The benefits of this approach lie in the single image being the only input from the user and in the absence of requirements for new onsite infrastructure, which enables a simpler realization for the building management.
ARTICLE | doi:10.20944/preprints202309.0234.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Multi-Task Learning; Convolutional Neural Network; Automatic modulation recognition; Feature Fusion
Online: 6 September 2023 (15:08:47 CEST)
Recently, deep learning models have been widely applied to modulation recognition and have become a hot topic due to their excellent end-to-end learning capabilities. However, current methods are mostly based on uni-modal inputs, which suffer from incomplete information and local optimization. To combine the advantages of different modalities, we focus on multi-modal fusion and introduce an iterative dual-scale attentional fusion (iDAF) method to integrate multimodal data. First, two feature maps with different receptive field sizes are constructed using local and global embedding layers. Second, the features are fed iteratively into the Iterative Dual Scale Attention Module (iDCAM), where the two branches capture the details of high-level features and the global weights of each modal channel, respectively. The iDAF not only extracts the recognition characteristics of each specific domain but also combines the strengths of different modalities to obtain a richer view. Our iDAF achieves a recognition accuracy of 93.5% at 10 dB and 0.6232 at full SNR. Comparative experiments and ablation studies demonstrate the effectiveness and superiority of the iDAF.
ARTICLE | doi:10.20944/preprints202309.0227.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: metastasis marker; gene expression; machine learning; XGBoost; breast cancer; feature importance.
Online: 5 September 2023 (05:21:26 CEST)
Cancer metastasis accounts for approximately 90% of cancer deaths, and elucidating markers of metastasis is the first step in its prevention. To characterize metastasis marker genes of breast cancer (MGs), XGBoost models that classify metastasis status were trained with gene expression profiles from TCGA. Then, a metastasis score (MS) was assigned to each gene by calculating the inner product between the feature importance and AUC performance of the models. As a result, the 54, 202, and 357 genes with the highest MS were characterized as MGs by empirical P-value (EP) cutoffs of 0.001, 0.005, and 0.01, respectively. The three sets of MGs were compared with those from existing metastasis marker databases, which provided significant results in most comparisons. We noticed that the set of MGs with the median EP cutoff showed better performance than the other two sets, suggesting the importance of the cutoff used in determining MGs. They were also significantly enriched in biological processes associated with breast cancer metastasis. The MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1) as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40) were verified via the literature. Additionally, we checked how close the MGs are located to each other in the protein–protein interaction networks. We expect that the characterized markers will help understand and prevent breast cancer metastasis.
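One plausible reading of the metastasis score is sketched below: for each gene, the feature importances it receives across the trained models are multiplied by the corresponding model AUCs and summed. This is an interpretation of the abstract's "inner product", not the authors' code. The gene names, importance values, AUCs and function names are hypothetical, and the empirical P-value step is omitted because the abstract does not describe how it is computed.

```python
def metastasis_scores(importances, aucs):
    """Per-gene inner product of feature importances and model AUCs.

    importances: one dict per model, mapping gene -> feature importance.
    aucs: one AUC per model, in the same order as `importances`.
    """
    if len(importances) != len(aucs):
        raise ValueError("need exactly one AUC per model")
    genes = set()
    for model_importance in importances:
        genes.update(model_importance)
    return {
        gene: sum(imp.get(gene, 0.0) * auc for imp, auc in zip(importances, aucs))
        for gene in genes
    }


def rank_genes(scores):
    """Genes sorted from highest to lowest score (ties broken by name)."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Two hypothetical models with made-up importances and AUCs.
importances = [
    {"GENE_A": 0.5, "GENE_B": 0.25, "GENE_C": 0.25},
    {"GENE_A": 0.0, "GENE_B": 0.75, "GENE_C": 0.25},
]
aucs = [0.75, 0.5]

scores = metastasis_scores(importances, aucs)
ranking = rank_genes(scores)
print(ranking, scores)
```

Weighting by AUC means that a gene which is important only in a poorly performing model contributes less to the final ranking than one that is important in an accurate model.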