ARTICLE | doi:10.20944/preprints201905.0342.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: cadastral boundaries; automation; feature extraction; object based image analysis
Online: 29 May 2019 (04:37:50 CEST)
The objective of fast-tracking the mapping and registration of the large number of unrecorded land rights worldwide has led to experimental applications of Artificial Intelligence (AI) in the domain of land administration, and specifically to automated visual cognition techniques for cadastral mapping tasks. In this research, we applied rule-based systems within Object-Based Image Analysis (OBIA) and compared their ability with that of human analysts to extract visible cadastral boundaries from very high resolution (VHR) WorldView-2 imagery, in both rural and urban settings. In our experiments, machine-based techniques automatically delineated a good proportion of rural parcels with explicit polygons: the correctness of the automatically extracted boundaries was 47.4%, against 74.24% for humans, and their completeness was 45%, against 70.4% for humans. In the urban area, by contrast, the automatic results were counterintuitive: even though urban plots and buildings are clearly marked by visible features such as fences and roads and are readily perceptible to the eye, automation produced geometrically and topologically poorly structured data that could be compared neither with the human-digitised boundaries nor with actual cadastral data from the field. These results provide an updated snapshot of the performance of contemporary machine-driven feature extraction techniques compared to conventional manual digitising.
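The buffer-based correctness and completeness scores used in this comparison can be sketched in a few lines (Python with NumPy; the tolerance and toy coordinates below are invented for illustration, not the paper's data):

```python
import numpy as np

def buffer_scores(extracted, reference, tol=1.0):
    """Correctness/completeness of extracted boundary points against a
    reference boundary, both given as (N, 2) coordinate arrays."""
    def frac_within(a, b):
        # for each point in a, distance to the nearest point in b
        d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)).min(axis=1)
        return float((d <= tol).mean())
    correctness = frac_within(extracted, reference)   # how much of the extraction is right
    completeness = frac_within(reference, extracted)  # how much of the truth was found
    return correctness, completeness

# toy example: reference is a straight boundary; the extraction covers only
# half of it, with a small constant offset
ref = np.column_stack([np.linspace(0, 10, 50), np.zeros(50)])
ext = np.column_stack([np.linspace(0, 5, 25), np.full(25, 0.3)])
c, comp = buffer_scores(ext, ref, tol=0.5)
print(c, comp)   # high correctness, roughly half completeness
```

A partially correct extraction thus scores high on correctness but low on completeness, mirroring the asymmetry reported for the rural results.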
ARTICLE | doi:10.20944/preprints201812.0067.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: built-up area; classification; Landsat 8 OLI; feature engineering; feature learning; CNN; accuracy evaluation
Online: 5 December 2018 (12:06:34 CET)
Detailed built-up area information is valuable for mapping complex urban environments. Although many classification algorithms for built-up areas have been developed, they are rarely tested from the perspectives of feature engineering and feature learning. We therefore conducted a thorough test of OLI imagery for 15-m resolution built-up area classification in 2015 in Beijing, China. Training a classifier requires many sample points, so we propose a method based on ESA's 38-m global built-up area data of 2014, OpenStreetMap, and MOD13Q1-NDVI to generate a large number of sample points rapidly and automatically. Our aim is to examine the influence of single pixels versus image patches under traditional feature engineering and modern feature learning strategies. For feature engineering, we consider spectra, shape, and texture as input features, with SVM, random forest (RF), and AdaBoost as classification algorithms. For feature learning, a convolutional neural network (CNN) is used as the classifier. In total, 26 built-up land cover maps were produced. Experimental results show that: (1) approaches based on feature learning are generally better than those based on feature engineering in terms of classification accuracy, although the performance of ensemble classifiers such as RF is comparable to that of CNNs; the two-dimensional CNN and the 7-neighborhood RF achieve the highest classification accuracy of nearly 91%. (2) Overall, classification based on image patches is more accurate than classification based on single pixels, and features that highlight the target category (for example, PanTex and EMBI) help improve classification accuracy.
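The pixel-versus-patch effect the abstract reports can be sketched with scikit-learn on synthetic spectra (assuming NumPy and scikit-learn are installed; the data are invented, not the paper's Landsat imagery):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, n)                     # 1 = built-up, 0 = not

# synthetic 6-band spectra: built-up pixels shift the band means, but
# per-pixel noise is large
pixel = rng.normal(labels[:, None] * 0.5, 1.0, (n, 6))
# 8 neighbouring pixels averaged per band: same signal, much less noise
context = rng.normal(labels[:, None, None] * 0.5, 1.0, (n, 6, 8)).mean(axis=2)
patch = np.hstack([pixel, context])                # pixel + neighbourhood features

def rf_accuracy(X, y):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    return RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr).score(Xte, yte)

print(rf_accuracy(pixel, labels), rf_accuracy(patch, labels))
```

With this construction the patch-based RF clearly beats the single-pixel RF, which is the qualitative pattern the study observed.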
ARTICLE | doi:10.20944/preprints202306.0755.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Convolutional neural network; Chest CT images; Classification; Adaptive Feature Extraction
Online: 12 June 2023 (04:29:17 CEST)
Deep convolutional neural networks (CNNs) are widely used in medical image processing due to their demonstrated performance. Recently, the emergence of new lung diseases and the possibility of detecting their symptoms early have led many researchers to classify diseases by training deep CNNs on lung CT images. The trained networks are expected to distinguish between the lung indications of different diseases, especially at their early stages. With this purpose in mind, we propose an efficient deep CNN called AFEX-Net, with adaptive feature extraction layers that successfully extract distinguishing features and classify chest CT images. The efficiency of the proposed network has two aspects: it is a lightweight network with few parameters and fast training, and it has adaptive pooling layers and adaptive activation functions that increase its compatibility with the input data. The proposed network was evaluated on a dataset of more than 10K chest CT slices, using an efficient pre-processing method developed to remove bias from the images. Additionally, we evaluated the model on the public COVID-CTset dataset to demonstrate its generalisability. The obtained results confirm the competence of the proposed network for medical images, where prompt and accurate learning is required.
ARTICLE | doi:10.20944/preprints202108.0067.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Feature extraction; independent component analysis; 3D inversion; physical properties
Online: 3 August 2021 (09:45:30 CEST)
A major problem in post-inversion geophysical interpretation is the extraction of geological information from inverted physical property models, which do not necessarily represent all underlying geological features. No matter how accurate the inversions are, each inverted physical property model is sensitive to only some aspects of subsurface geology and is insensitive to other geological features that are otherwise detectable with complementary physical property models. Therefore, specific parts of the geological model can be reconstructed from different physical property models. To show how this reconstruction works, we simulated a complex geological system comprising an original layered earth model subjected to several geological deformations and alteration overprints. Linear combinations of the complex geological features formed three physical property distributions: electrical resistivity, induced polarization chargeability, and magnetic susceptibility. This study proposes a multivariate feature extraction approach to recover the underlying geological features that make up the bulk physical properties. We evaluated our method in numerical simulations and compared three feature extraction algorithms to assess the tolerance of each to geological artifacts and noise. We show that the fast independent component analysis (FastICA) algorithm with negentropy maximization is a robust method for geological feature extraction that can handle added unknown geological noise. The post-inversion physical properties are also used to reconstruct the underlying geological sources, and we show that the sharpness of the inverted images is an important constraint on the feature extraction process. Our method successfully separates geological features in multiple 3D physical property models. The methodology is reproducible for any number of lithologies and physical property combinations and can recover latent geological features, including background geological patterns beneath chemical alteration overprints.
ARTICLE | doi:10.20944/preprints202308.1580.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Face Recognition; image classification; Feature Extraction
Online: 22 August 2023 (13:22:53 CEST)
Selecting the right machine learning classifier is crucial for image classification and face recognition. This study examines the effectiveness of four face recognition classifiers: Support Vector Machines (SVM), Random Forest, K-Nearest Neighbors (KNN), and Neural Networks. The Labeled Faces in the Wild (LFW) dataset was analysed using Principal Component Analysis (PCA), and the classifiers were rigorously trained and evaluated on the extracted features. Comparing classifier performance is an insightful way to identify their strengths and weaknesses, and a visual representation of each classifier's performance gives a complete picture of its capabilities. By selecting the most appropriate classifier, the study's results contribute to advances in image classification, recognition, and biometric identification. The comparative analysis demonstrated that the Neural Network classifier was exceptionally accurate and proficient at recognising faces from the LFW dataset when combined with PCA for feature extraction.
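A PCA-then-classifier comparison of this kind can be sketched with scikit-learn (synthetic high-dimensional data stands in for the flattened face images, since LFW itself requires a download; the dimensions and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# stand-in for flattened face images: 5 identities, 200-dimensional vectors
X, y = make_classification(n_samples=600, n_features=200, n_informative=30,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(PCA(n_components=40), clf)  # eigen-feature extraction, then classify
    print(f"{name}: {pipe.fit(Xtr, ytr).score(Xte, yte):.2f}")
```

The same pipeline applied to real face data (e.g. via `sklearn.datasets.fetch_lfw_people`) reproduces the PCA "eigenface" workflow the study describes.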
ARTICLE | doi:10.20944/preprints201906.0245.v1
Subject: Engineering, Automotive Engineering Keywords: feature extraction; corner detection; FAST algorithm; Harris detector; UAV
Online: 25 June 2019 (08:27:29 CEST)
Many corner detection techniques have been used to extract information from UAV images for various photogrammetric and mapping activities, among them the Features from Accelerated Segment Test (FAST) and the Harris corner detector. It is widely agreed that evaluating detectors is important because it assesses and improves the accuracy of the detected features. This research evaluates the performance of FAST-9, FAST-12, and the Harris detector in terms of repeatability rate, completeness, and correctness under different threshold values. Each method is evaluated on its ability to detect objects (crowd and car features) in UAV imagery, and the features detected in common by each FAST version and the Harris detector are extracted, to determine which method performs best under different image conditions (e.g., illumination variations, camera position and orientation, and image noise). The results show that the threshold plays a crucial role in determining the number of detected feature points: increasing the threshold decreases the number of detected points, and vice versa. Consequently, correctness decreases while completeness increases as a function of the threshold. Furthermore, the agreement between FAST-9 and the Harris detector is slightly better than that between FAST-12 and the Harris detector, because the number of features detected in common by FAST-9 and the Harris detector is relatively higher.
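The threshold effect described here, fewer detections as the threshold rises, can be demonstrated with a minimal Harris response computed in plain NumPy (a simplified sketch with box-filter smoothing, not the evaluation code used in the study):

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response from image gradients (structure tensor)."""
    Iy, Ix = np.gradient(img.astype(float))
    def box3(a):
        # 3x3 box sum as a crude smoothing window
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

# white square on black background: four true corners
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)

# higher relative thresholds keep fewer candidate corner points
counts = [int((R > t * R.max()).sum()) for t in (0.01, 0.1, 0.5)]
print(counts)
```

The count list is non-increasing by construction, which is exactly the threshold-versus-detections trade-off the evaluation quantifies via correctness and completeness.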
ARTICLE | doi:10.20944/preprints202311.1147.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: ransomware detection; machine learning; dynamic analysis; n-grams; Class Feature Weighting (CFW)
Online: 20 November 2023 (11:06:27 CET)
Ransomware attacks have risen alarmingly, with encryption techniques becoming more complex. This paper introduces a novel detection model tailored for ransomware's distinctive characteristics. The Intel PIN tool extracts Windows API invocation sequences related to file operations. These sequences are used to construct n-grams, forming feature vectors enhanced by a new Class Feature Weighting (CFW) metric to improve malware detection. Preliminary results demonstrate elevated accuracy and precision versus existing methods. The major contributions are: (1) Introducing an innovative deep learning model for few-shot ransomware classification using entropy features and transfer learning. (2) Achieving high weighted F1-score in classifying ransomware variants into families with limited training data. (3) Demonstrating the potential of entropy-based features to capture intricacies lost in image-based approaches, improving detection of new strains.
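The n-gram feature construction over API-call sequences can be sketched in plain Python. The call names and the simple frequency-difference weight below are illustrative stand-ins; the paper's exact CFW formula is not given here:

```python
from collections import Counter

def ngrams(seq, n=2):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

# toy API-call traces; the names are illustrative, not real captured traces
ransomware = [["FindFirstFile", "ReadFile", "CryptEncrypt", "WriteFile", "DeleteFile"],
              ["FindNextFile", "ReadFile", "CryptEncrypt", "WriteFile", "MoveFile"]]
benign = [["CreateFile", "ReadFile", "CloseHandle"],
          ["CreateFile", "WriteFile", "CloseHandle"]]

def doc_freq(traces):
    c = Counter()
    for t in traces:
        c.update(set(ngrams(t)))   # document frequency of each bigram per class
    return c

pos, neg = doc_freq(ransomware), doc_freq(benign)

# simple class weight: in-class document-frequency difference, a crude
# stand-in for the paper's Class Feature Weighting (CFW) metric
cfw = {g: pos[g] / len(ransomware) - neg[g] / len(benign) for g in set(pos) | set(neg)}
ranked = sorted(cfw.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked[0])
```

Bigrams involving the encryption call rank highest, which is the behaviour a class-discriminative weighting is designed to surface.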
Subject: Computer Science And Mathematics, Information Systems Keywords: local feature extraction; scale-space representation; laplacian of gaussian; convolution template
Online: 8 October 2019 (10:33:37 CEST)
This paper presents a novel method for extracting local features which, instead of calculating local extrema, computes global maxima in a discretized scale-space representation. To avoid obtaining precise scales by interpolation and to achieve perfect rotation invariance, the method adopts two essential techniques: increasing the kernel width in whole pixels and using disk-shaped convolution templates. Since a convolution template is finite in size and finite templates introduce computational error into the convolution, we discuss this problem thoroughly and derive an upper bound on the computational error. The upper bound is used in the method to ensure that all extracted features are computed within a given tolerance. In addition, a relative threshold for determining features is adopted to reinforce robustness in scenes with changing illumination. Simulations show that the new method attains highly repeatable performance in various situations, including scale change, rotation, blur, JPEG compression, illumination change, and even viewpoint change.
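The scale-space machinery behind such detectors can be illustrated with a scale-normalised Laplacian-of-Gaussian response computed in NumPy (a standard blob-detection sketch under circular FFT convolution, not the paper's disk-template method):

```python
import numpy as np

def log_kernel(sigma, size):
    """Laplacian-of-Gaussian kernel sampled on a size x size grid."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x ** 2 + y ** 2
    return (r2 - 2 * sigma ** 2) / (2 * np.pi * sigma ** 6) * np.exp(-r2 / (2 * sigma ** 2))

def scale_norm_response(img, sigma):
    """Scale-normalised LoG response via circular FFT convolution."""
    k = np.fft.ifftshift(log_kernel(sigma, img.shape[0]))
    return sigma ** 2 * np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k)))

# bright disk of radius ~5.7: the response magnitude across scales should
# peak near sigma = radius / sqrt(2) ≈ 4
img = np.zeros((64, 64))
yy, xx = np.mgrid[:64, :64]
img[(yy - 32) ** 2 + (xx - 32) ** 2 <= 32] = 1.0

best_sigma = max((np.abs(scale_norm_response(img, s)).max(), s)
                 for s in (2, 3, 4, 6, 8))[1]
print(best_sigma)
```

Searching the maximum of the normalised response over a discrete set of scales mirrors the paper's idea of locating features as maxima in a discretized scale space.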
ARTICLE | doi:10.20944/preprints201804.0192.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: abnormal ECG; ECG processing; feature extraction; heart beat classification; abnormality detection
Online: 16 April 2018 (06:28:25 CEST)
Automated electrocardiogram (ECG) processing is an important technique that helps identify abnormalities in the heart before any formal diagnosis. This research presents a real-time, lightweight R-assisted feature extraction algorithm and a heartbeat classification scheme that achieve highly accurate abnormality detection. In the proposed algorithm, we extract fifteen features from each heartbeat taken from raw Lead-II ECG signals. The features carry medically valuable information, such as the locations, amplitudes, and energies of the ECG waves (P, Q, R, S, and T), which are then used to detect any abnormality present in the heartbeat using various classification algorithms. We used four popular databases from PhysioNet, extracting ten thousand ECG signals from each for training the models and benchmarking the results. Four classification models (Naïve Bayes, k-Nearest Neighbors, Neural Network, and Decision Tree) were used for abnormality detection, validating the efficiency of the system.
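The idea of R-peak-anchored per-beat features can be sketched on a synthetic beat (a simplified illustration in NumPy; the feature set and window sizes are invented, not the paper's fifteen features):

```python
import numpy as np

def heartbeat_features(beat, fs=360):
    """A few illustrative per-beat features: R location/amplitude, energy,
    and crude P/T amplitudes from fixed windows around the R peak."""
    r_idx = int(np.argmax(beat))
    return {
        "r_loc": r_idx / fs,                               # R-peak time (s)
        "r_amp": float(beat[r_idx]),                       # R amplitude
        "energy": float(np.sum(beat ** 2)),                # beat energy
        "p_amp": float(beat[:max(r_idx - 20, 1)].max()),   # max before R window
        "t_amp": float(beat[r_idx + 20:].max()),           # max after R window
    }

# synthetic beat at 360 Hz: small P wave, sharp R spike, broad T wave
t = np.linspace(0, 0.8, 288)
beat = (0.15 * np.exp(-((t - 0.2) / 0.02) ** 2)     # P
        + 1.0 * np.exp(-((t - 0.4) / 0.008) ** 2)   # R
        + 0.3 * np.exp(-((t - 0.6) / 0.04) ** 2))   # T
print(heartbeat_features(beat))
```

Vectors of such features, one row per beat, are what the classifiers (Naïve Bayes, k-NN, etc.) consume.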
ARTICLE | doi:10.20944/preprints202309.1397.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Relation extraction; Subject feature; Attention mechanism; Railway traffic in Tibet
Online: 21 September 2023 (03:30:34 CEST)
To address the deficiency of existing relation extraction models in extracting relational triplets pertaining to railway traffic knowledge in Tibet, this paper constructs a Tibet Railway Traffic text dataset and presents an enhanced relation extraction model. The proposed model incorporates subject feature enhancement and a relational attention mechanism. It uses a pre-trained model as the embedding layer to obtain vector representations of the text. The subject is then extracted, and its semantic information is augmented with an LSTM neural network. Furthermore, during object extraction, a multi-head attention mechanism enables the model to prioritise relations associated with the subject features. Finally, objects are extracted based on the subjects and relations. The proposed method has been comprehensively evaluated on the Tibet Railway Traffic text dataset and two public datasets. On the Tibet dataset it achieves an F1-score of 93.3%, surpassing the baseline model CasRel by 0.8%, indicating superior applicability. It also achieves F1-scores of 91.1% and 92.6% on the public NYT and WebNLG datasets, respectively, outperforming CasRel by 1.5% and 0.8%, which highlights the good generalisation ability of the proposed model.
ARTICLE | doi:10.20944/preprints202310.0974.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: feature extraction network; self-attention mechanism; blurred image processing; underwater visual application
Online: 16 October 2023 (14:38:48 CEST)
Underwater image processing faces significant challenges due to the absorption and scattering of light as it travels through water. This paper proposes a self-supervised learning network based on the self-attention mechanism for underwater visual applications, with the goal of improving the effectiveness and stability of feature extraction from underwater images. Incorporating the self-attention mechanism increases the sensitivity of the original network architecture to the degraded features of blurred underwater images. The proposed network is trained using transfer learning and evaluated on various underwater image datasets. Experimental results demonstrate distributive, quantitative, and qualitative advantages over other methods: the proposed algorithm shows a smaller decline in feature extraction ability for blurred images as the turbidity of the water increases.
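The self-attention building block referred to here is, at its core, scaled dot-product attention, which can be written in a few lines of NumPy (a generic sketch of the mechanism, not the paper's network):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of feature vectors."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                  # 5 "patch" features of dimension 8
W = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(3)]
out, attn = self_attention(x, *W)
print(out.shape, attn.sum(axis=1))           # attention rows sum to 1
```

Each output feature is a weighted mixture of all input features, which is why attention can re-weight degraded regions of a blurred image instead of treating every location equally.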
ARTICLE | doi:10.20944/preprints202309.2035.v1
Subject: Engineering, Mechanical Engineering Keywords: Feature extraction; Prognostics; Self-attention transfer network; High-dimensional data
Online: 29 September 2023 (08:21:32 CEST)
Machinery degradation assessment can offer meaningful prognosis and health management information. Although numerous machine prediction models based on artificial intelligence have emerged in recent years, they still face a series of challenges: (1) many models continue to rely on manual feature extraction; (2) deep learning models still struggle with long-sequence prediction tasks; (3) health indicators are inefficient for remaining useful life (RUL) prediction across operating environments when high-dimensional datasets are used as inputs. This research proposes a health indicator construction methodology based on a transformer self-attention transfer network (TSTN), which can process the high-dimensional raw dataset directly and retain all the information in the signals when they are fed to the diagnosis and prognosis model. First, we design an encoder with long- and short-term self-attention mechanisms to capture crucial time-varying information from the high-dimensional dataset. Second, we propose an estimator that maps the encoder output embedding to estimated degradation trends. Then, we present a domain discriminator to extract invariant features across different machine operating conditions. Case studies with the FEMTO-ST bearing dataset, using the Monte Carlo method for RUL prediction during the degradation process, fully exhibit the significant advantages of the proposed method over other state-of-the-art techniques.
ARTICLE | doi:10.20944/preprints202208.0201.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: auto-encoder; high sparse binary data; feature extraction; SNV integration
Online: 10 August 2022 (10:27:32 CEST)
The genome, involving tens of thousands of genes, is a complex system that determines phenotype. An interesting and vital question is how to integrate highly sparse genomic data carrying a mass of minor effects into a prediction model to improve predictive power. We find that deep learning methods can extract features effectively by transforming highly sparse dichotomous data into lower-dimensional continuous data in a non-linear way. This idea can benefit risk prediction from genome-wide data, e.g., by integrating most of the information in the genotype data. We therefore developed a multi-stage strategy to extract information from highly sparse binary genotype data and applied it to risk prediction. Specifically, we first reduced the number of biomarkers to a moderate size with a univariable regression model. A trainable auto-encoder was then used to extract compact representations from the reduced data. Next, we solved a LASSO problem over a grid of tuning parameter values to select the optimal combination of extracted features. Finally, we applied this feature combination in two prognostic models and evaluated their predictive performance. The results of simulation studies and real data applications indicate that these highly compressed features improve predictive performance without easily leading to over-fitting.
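The central step, compressing sparse binary data into continuous codes with an auto-encoder, can be sketched with a minimal single-hidden-layer network in NumPy (dimensions, learning rate, and the simulated "genotype" matrix are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# sparse binary "genotype" matrix: 300 samples x 40 markers, two latent groups
group = rng.integers(0, 2, 300)
X = (rng.random((300, 40)) < np.where(group[:, None], 0.15, 0.02)).astype(float)

# minimal single-hidden-layer auto-encoder trained by full-batch gradient descent
d, h, lr = X.shape[1], 4, 0.05
W1 = rng.normal(0, 0.1, (d, h))
W2 = rng.normal(0, 0.1, (h, d))
losses = []
for _ in range(1000):
    H = np.tanh(X @ W1)                 # encoder: compact continuous codes
    err = H @ W2 - X                    # linear decoder, reconstruction error
    losses.append(float((err ** 2).mean()))
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)

codes = np.tanh(X @ W1)                 # low-dimensional continuous features
print(codes.shape, round(losses[0], 3), round(losses[-1], 3))
```

The 4-dimensional continuous codes are what a downstream LASSO or prognostic model would consume in place of the original 40 sparse binary markers.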
ARTICLE | doi:10.20944/preprints202310.1345.v1
Subject: Environmental And Earth Sciences, Geography Keywords: area studied; BLFR model; BI-LSTM-CRF; improved heuristic disambiguation method; feature template; random forest
Online: 23 October 2023 (05:43:30 CEST)
Geospatial knowledge in the vast body of academic papers can support knowledge services such as location-based research hotspot analysis, spatio-temporal data aggregation, and recommendation of research results. However, geospatial knowledge often exists implicitly and in unstructured form in literature resources, making it difficult to access, mine, and use directly for the rapid production of large numbers of thematic maps. In this paper, we take the geospatial knowledge of the area studied as an example and describe its extraction method in detail. An algorithm integrating feature template matching and random forest classification is proposed to identify research areas accurately from the abstracts of academic papers and to produce thematic maps. First, geographical names are recognised precisely, step by step, using the BiLSTM-CRF algorithm and an improved heuristic disambiguation method; then the area studied is extracted with the designed integrated feature recognition template using a random forest classifier, and a thematic map is rapidly generated from the knowledge of area studied, topic, and literature. Experimental results show that area-studied recognition reaches 97% accuracy, a 96% F-value, and 96% recall, achieving highly accurate and efficient extraction of studied areas from text. Based on this geospatial knowledge, thematic maps can be generated quickly and expressed accurately.
ARTICLE | doi:10.20944/preprints202211.0094.v1
Subject: Engineering, Mechanical Engineering Keywords: Bearing fault feature extraction; Blind deconvolution (BD); Multi-task optimization; Convolutional neural network
Online: 4 November 2022 (13:41:46 CET)
Blind deconvolution (BD) is an effective pre-processing method for vibration signals that assists in bearing fault diagnosis. Currently, most BD methods design an optimization criterion that uses frequency- or time-domain information independently to optimize a deconvolution filter, which recovers the weak periodic impulses related to incipient faults. However, random noise interference may cause the optimizer to overfit: time-domain BD methods tend to extract a fault-unrelated single-peak impulse, while frequency-domain BD methods tend to retain the maximum-energy frequency component and thereby lose the fault-related harmonic frequency components. To address this, we propose a hybrid criterion that combines kurtosis for time-domain optimization with the $G-l_1/l_2$ norm for the frequency domain. Because one criterion is monotonically increasing and the other monotonically decreasing, they constrain each other and avoid overfitting. We then design a multi-task one-dimensional convolutional neural network with time and frequency branches to find an optimal solution for this hybrid criterion, realizing simultaneous optimization in both domains. Experimental results show that our proposed method outperforms other state-of-the-art methods.
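Why a single time-domain criterion can overfit is easy to demonstrate numerically. Below, kurtosis and a plain spectral $l_1/l_2$ ratio (a simplified stand-in for the paper's $G-l_1/l_2$ norm) are computed for three synthetic signals; the lone spike maximises kurtosis even though it carries no fault periodicity, while only the periodic impulse train also sparsifies the spectrum:

```python
import numpy as np

def kurtosis(x):
    x = x - x.mean()
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)

def spectral_l1_l2(x):
    """l1/l2 ratio of the amplitude spectrum: lower means a sparser spectrum."""
    s = np.abs(np.fft.rfft(x))
    return float(s.sum() / np.sqrt((s ** 2).sum()))

rng = np.random.default_rng(0)
n = 2048
noise = rng.normal(0, 0.3, n)
periodic = noise.copy(); periodic[::128] += 5.0   # fault-like repetitive impulses
single = noise.copy();   single[1000] += 20.0     # lone spike that fools kurtosis

for name, x in [("noise", noise), ("periodic", periodic), ("single spike", single)]:
    print(f"{name:12s} kurtosis={kurtosis(x):7.1f}  l1/l2={spectral_l1_l2(x):5.1f}")
```

Requiring both criteria to improve at once, as the hybrid criterion does, rules out the single-spike solution that kurtosis alone would prefer.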
ARTICLE | doi:10.20944/preprints201803.0266.v1
Subject: Engineering, Mechanical Engineering Keywords: variational mode decomposition; random decrement technique; crankshaft bearing; engine; feature extraction
Online: 30 March 2018 (10:01:18 CEST)
The vibration signal of an engine contains strong background noise and many kinds of modulating components, making diagnosis difficult. Variational mode decomposition (VMD) is a recently introduced adaptive signal decomposition algorithm with a solid theoretical foundation and better noise robustness than empirical mode decomposition (EMD); it effectively avoids endpoint effects and mode aliasing. However, VMD cannot effectively eliminate the random noise in the signal, so the random decrement technique is introduced to solve this problem. Based on a crankshaft bearing fault simulation experiment, vibration signals of four wear states are decomposed by VMD, and the modal components with smaller permutation entropy are selected as fault components. Each fault component is then processed with the random decrement technique, and its Hilbert envelope spectrum is obtained. Compared with fault feature extraction based on EMD and EEMD, the proposed method yields better results. Simulation analysis and the crankshaft bearing fault test verify the effectiveness of the proposed method.
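The selection criterion used here, permutation entropy, is low for regular (fault-related) components and high for noise-dominated ones, and is short to implement (a generic NumPy sketch; the order and delay parameters are illustrative):

```python
import math
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Normalised permutation entropy: near 1 for noise, low for regular signals."""
    n = len(x) - (order - 1) * delay
    patterns = np.array([tuple(np.argsort(x[i:i + order * delay:delay]))
                         for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(order)))

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
pe_sine = permutation_entropy(np.sin(2 * np.pi * t))    # regular component
pe_noise = permutation_entropy(rng.normal(size=1000))   # noise-like component
print(round(pe_sine, 2), round(pe_noise, 2))
```

Ranking VMD modes by this value and keeping the low-entropy ones is the mode-selection step the abstract describes.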
ARTICLE | doi:10.20944/preprints202309.0667.v1
Subject: Engineering, Architecture, Building And Construction Keywords: traditional village; roof feature line; slope segmentation; cloth simulation filter; UAV
Online: 11 September 2023 (10:12:44 CEST)
The extraction of roof feature lines is an important foundation for large-scale, batch 3D modeling. However, current point cloud segmentation algorithms do not yield satisfactory results when extracting roof feature lines of traditional Chinese residential buildings. In this paper, taking Jingping Village in Western Hunan as an example, we propose a method that combines multiple algorithms, based on slope segmentation of roof patches, to extract feature lines. First, the VDVI and CSF algorithms are used to extract the building and roof point clouds from the MVS point cloud. Second, village buildings are classified according to roof features, and the 3D roof point cloud is projected onto a 2D regular grid. Finally, the roof is segmented by slope direction, and internal and external feature lines are obtained after refinement with Canny edge detection and Hough line detection. Results show that this method effectively extracts the feature lines of low-rise building roofs in traditional villages, with slope-based roof surface segmentation accuracy surpassing 99.6%, significantly outperforming the RANSAC algorithm and the region segmentation algorithm.
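The slope-direction segmentation step can be sketched on a gridded synthetic roof (a toy gabled roof in NumPy; the grid size and quadrant binning are illustrative choices, not the paper's parameters):

```python
import numpy as np

# synthetic gabled roof on a regular 2D grid: two planes meeting at a ridge
x = np.linspace(-1, 1, 50)
X, Y = np.meshgrid(x, x)
height = 1.0 - np.abs(X)                # ridge runs along the y-axis

gy, gx = np.gradient(height)            # per-cell slope components
aspect = np.degrees(np.arctan2(gy, gx)) % 360

# segment roof patches by binning the slope direction into quadrants
segment = (aspect // 90).astype(int)
labels, counts = np.unique(segment, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))   # two planes -> two segments
```

The boundary between the two aspect segments is the ridge, i.e. an internal feature line; in practice it would then be refined with edge and line detection as the abstract describes.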
ARTICLE | doi:10.20944/preprints202311.0350.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: 3D segmentation; feature extraction; regression machine learning; weight estimation
Online: 6 November 2023 (11:20:30 CET)
Accurate weight measurement is pivotal for monitoring the growth and well-being of cattle. However, the conventional weighing process, which involves physically placing cattle on scales, is labor-intensive and distressing for the animals, so developing automated cattle weight prediction techniques is critically important. This study proposes a weight prediction approach for Korean cattle using 3D segmentation-based feature extraction and regression machine learning, applied to incomplete 3D shapes acquired in real farm environments. In the initial phase, we generated mesh data of 3D Korean cattle shapes using a multiple-camera system. Deep learning-based 3D segmentation with the PointNet network model was then employed to segment two dominant parts of the cattle, from which three crucial body dimensions were extracted. Finally, we implemented five regression machine learning models (CatBoost regression, LightGBM, polynomial regression, random forest regression, and XGBoost regression) for weight prediction. To validate the approach, we captured 270 Korean cattle in various poses, for a total of 1190 scans. The best result, a mean absolute error (MAE) of 25.2 kg and a mean absolute percentage error (MAPE) of 5.81%, was achieved with the random forest regression model.
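The final regression step, body dimensions in, weight out, can be sketched with scikit-learn (synthetic dimensions and a toy allometric ground-truth formula stand in for the paper's extracted measurements):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 800

# hypothetical body dimensions (m): body length, chest girth, withers height
dims = rng.uniform([1.4, 1.6, 1.1], [2.0, 2.4, 1.5], (n, 3))

# toy allometric ground truth: weight grows with girth^2 * length, plus noise
weight = 90.0 * dims[:, 1] ** 2 * dims[:, 0] + rng.normal(0, 15.0, n)

Xtr, Xte, ytr, yte = train_test_split(dims, weight, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)
mae = float(np.abs(model.predict(Xte) - yte).mean())
print(round(mae, 1))
```

Swapping in CatBoost, LightGBM, polynomial, or XGBoost regressors over the same feature matrix reproduces the five-model comparison structure of the study.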
ARTICLE | doi:10.20944/preprints202311.1647.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: spiking neural network; logistic functions; fixed weights; random weights; classification; feature extraction
Online: 27 November 2023 (04:52:18 CET)
In this paper, we demonstrate that fixed-weight layers generated from random distributions or logistic functions can effectively extract significant features from input data, yielding high accuracy on a variety of tasks, including the Fisher iris, Wisconsin Breast Cancer, and MNIST datasets. We observe that logistic functions yield high accuracy with less dispersion in the results. We also assess the precision of our approach while minimizing the number of spikes generated in the network, which is practically useful for reducing energy consumption in spiking neural networks. Our findings reveal that the proposed method achieves its highest accuracy on the Fisher iris and MNIST datasets with logistic regression, and surpasses the accuracy of the conventional (non-spiking) approach using logistic regression in the case of Wisconsin Breast Cancer. We have also investigated the impact of non-stochastic spike generation on accuracy.
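The core claim, that an untrained fixed-weight layer can serve as a feature extractor, can be sketched with a rectified random projection followed by logistic regression (a crude non-spiking analogue of the paper's setup; real spiking layers would use spike counts rather than a ReLU):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

W = rng.normal(size=(4, 32))                    # fixed, never-trained projection layer
features = np.maximum(X @ W, 0)                 # rectified responses as "layer activity"

Xtr, Xte, ytr, yte = train_test_split(features, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=2000).fit(Xtr, ytr)
print(round(clf.score(Xte, yte), 2))
```

Only the readout is trained; the random layer's weights never change, yet the expanded representation supports accurate classification on Fisher's iris.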
ARTICLE | doi:10.20944/preprints202004.0524.v2
Subject: Biology And Life Sciences, Virology Keywords: unsupervised learning; tensor decomposition; feature selection; COVID-19; drug discovery; gene expression
Online: 3 June 2020 (05:29:09 CEST)
Background: COVID-19 is a critical pandemic that has affected human communities worldwide, and there is an urgent need to develop effective drugs. Although there are a large number of candidate drug compounds that may be useful for treating COVID-19, evaluating these drugs is time-consuming and costly. Thus, screening to identify potentially effective drugs prior to experimental validation is necessary. Method: In this study, we applied the recently proposed method of tensor decomposition (TD)-based unsupervised feature extraction (FE) to gene expression profiles of multiple lung cancer cell lines infected with severe acute respiratory syndrome coronavirus 2. We identified drug candidate compounds that significantly altered the expression of the 163 genes selected by TD-based unsupervised FE. Results: Numerous drugs were successfully screened, including many known antiviral compounds such as C646, chelerythrine chloride, canertinib, BX-795, sorafenib, QL-X-138, radicicol, A-443654, CGP-60474, alvocidib, mitoxantrone, QL-XII-47, geldanamycin, fluticasone, atorvastatin, quercetin, motexafin gadolinium, trovafloxacin, doxycycline, meloxicam, gentamicin, and dibromochloromethane. The screen also identified ivermectin, which was first identified as an anti-parasite drug and has recently been included in clinical trials for SARS-CoV-2. Conclusions: The drugs screened using our strategy may be effective candidates for treating patients with COVID-19.
ARTICLE | doi:10.20944/preprints201812.0237.v1
Subject: Engineering, Mechanical Engineering Keywords: signal processing; sparse regression; system identification; impulse response; optimization; feature generation; structural dynamics; time series classification
Online: 19 December 2018 (16:21:41 CET)
Time recordings of impulse-type oscillation responses are short and highly transient. These characteristics may complicate the usage of classical spectral signal processing techniques for a) describing the dynamics and b) deriving discriminative features from the data. However, common model identification and validation techniques mostly rely on steady-state recordings, characteristic spectral properties and non-transient behavior. In this work, a recent method, which allows reconstructing differential equations from time series data, is extended for higher degrees of automation. With special focus on short and strongly damped oscillations, an optimization procedure is proposed that fine-tunes the reconstructed dynamical models with respect to model simplicity and error reduction. This framework is analyzed with particular focus on the amount of information available to the reconstruction, noise contamination and non-linearities contained in the time series input. Using the example of a mechanical oscillator, we illustrate how the optimized reconstruction method can be used to identify a suitable model and to extract features from univariate and multivariate time series recordings in an engineering-compliant environment. Moreover, the determined minimal models allow for identifying the qualitative nature of the underlying dynamical systems as well as testing for the degree and strength of non-linearity. The reconstructed differential equations would then be potentially available for classical numerical studies, such as bifurcation analysis. These results represent a physically interpretable enhancement of data-driven modeling approaches in structural dynamics.
ARTICLE | doi:10.20944/preprints201611.0052.v1
Subject: Physical Sciences, Acoustics Keywords: empirical mode decomposition; intrinsic mode function; permutation entropy; multi-scale permutation entropy; feature extraction
Online: 9 November 2016 (10:24:35 CET)
To solve the problem of feature extraction from underwater acoustic signals in complex ocean environments, a new method for feature extraction from ship radiated noise is presented based on empirical mode decomposition and permutation entropy. It analyzes the separability of the permutation entropies of the intrinsic mode functions of three types of ship radiated noise signals, and focuses on the permutation entropy of the intrinsic mode function with the highest energy. In this study, ship radiated noise signals measured from three types of ships are decomposed into sets of intrinsic mode functions with the empirical mode decomposition method. Then, the permutation entropies of all intrinsic mode functions are calculated with appropriate parameters. The permutation entropies differ markedly across the intrinsic mode functions with the highest energy; thus, the permutation entropy of the intrinsic mode function with the highest energy is adopted as a new characteristic parameter for extracting features of ship radiated noise. The established characteristic parameters, namely the energy difference between high and low frequency, permutation entropy, and multi-scale permutation entropy, are then compared with this new parameter. The four characteristic parameters are at the same level for similar ships but differ for different types of ships. Comparing the fluctuation ranges and average values of the four parameters demonstrates that the permutation entropy of the intrinsic mode function with the highest energy offers the best separability as a characteristic parameter. Hence, the features of ship radiated noise can be extracted efficiently with this method.
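The central quantity in this pipeline, permutation entropy, admits a compact Bandt-Pompe implementation. The sketch below is generic (the order, delay, and test signals are illustrative, not the paper's parameters or data):

```python
import math
import random

def permutation_entropy(signal, order=3, delay=1):
    # Count ordinal patterns (Bandt-Pompe) over sliding windows and
    # normalise the Shannon entropy of their distribution to [0, 1].
    counts = {}
    n = len(signal) - (order - 1) * delay
    for i in range(n):
        window = signal[i:i + order * delay:delay]
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] = counts.get(pattern, 0) + 1
    h = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(math.factorial(order))

ramp = list(range(100))                          # a single ordinal pattern
random.seed(1)
noise = [random.random() for _ in range(2000)]   # many ordinal patterns
pe_ramp = permutation_entropy(ramp)              # -> 0.0
pe_noise = permutation_entropy(noise)            # close to 1.0
```

A regular signal concentrates on few ordinal patterns (low entropy), while a noise-like signal spreads over all of them (entropy near 1), which is what makes the measure useful as a separability feature.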
ARTICLE | doi:10.20944/preprints202306.2037.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Automatic Feature Extraction; Cadastral mapping; Fit-for-purpose; Interactive delineation; Mean-shift segmentation; Random Forest classification; Land administration
Online: 29 June 2023 (03:03:19 CEST)
Fit-for-purpose land administration (FFPLA) seeks to simplify cadastral mapping by lowering the costs and time associated with conventional surveying methods. The approach can be applied to both the initial establishment and the ongoing maintenance of the system. In Ethiopia, cadastral maintenance remains an ongoing challenge, especially in rapidly urbanizing peri-urban areas, where farmers' land rights and tenure security are often jeopardized. Automatic Feature Extraction (AFE) is an emerging FFPLA approach, proposed as an alternative for mapping and updating cadastral boundaries. This study explores the role of the AFE approach in updating cadastral boundaries in the vibrant peri-urban areas of Addis Ababa. Open-source software solutions are utilized to assess the (semi-)automatic extraction of cadastral boundaries from orthophotos (segmentation), the designation of 'boundary' and 'non-boundary' outlines (classification), and the delimitation of cadastral boundaries (interactive delineation). Both qualitative and quantitative assessments of the achieved results (validation) are undertaken. A high-resolution orthophoto of the study area and a reference cadastral boundary shapefile are used, respectively, for extracting the parcel boundaries and validating the interactive delineation results. Qualitative (visual) assessment verified the complete extraction of newly constructed cadastral boundaries in the study area, although non-boundary outlines such as footpaths and artefacts were also retrieved. For the buffer overlay analysis, the interactively delineated boundary lines and the reference cadastre were buffered within the spatial accuracy limits for urban and rural cadastres. As a result, the quantitative assessment delivered 52% correctness and 32% completeness for buffer widths of 0.4 m and 0.6 m, respectively, for the interactively delineated and reference boundaries.
The study further demonstrated the potentially significant role AFE could play in delivering fast, affordable, and reliable cadastral mapping. Further investigation, based on user input and expert evaluation, could help to improve the approach and apply it in real-world settings.
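The buffer-overlay validation described above can be sketched with densified boundary points. This is a toy illustration: the coordinates, buffer width, and point densification are invented, not the study's data.

```python
import math

def within_buffer(p, pts, width):
    # True if point p lies inside the buffer of the densified line `pts`.
    return any(math.hypot(p[0] - q[0], p[1] - q[1]) <= width for q in pts)

def correctness(extracted, reference, width):
    # Share of extracted boundary points inside the reference buffer.
    return sum(within_buffer(p, reference, width) for p in extracted) / len(extracted)

def completeness(extracted, reference, width):
    # Share of reference boundary points inside the extracted buffer.
    return sum(within_buffer(q, extracted, width) for q in reference) / len(reference)

# Toy data: a reference boundary along y = 0 m, an extracted boundary offset
# by 0.3 m that covers only part of it, plus a spurious (artefact) segment.
reference = [(i / 10, 0.0) for i in range(101)]
extracted = [(i / 10, 0.3) for i in range(61)] + [(i / 10, 2.0) for i in range(41)]
corr = correctness(extracted, reference, 0.4)
comp = completeness(extracted, reference, 0.4)
```

Spurious extractions lower correctness, while unextracted reference stretches lower completeness, mirroring how the 52%/32% figures in the study should be read.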
ARTICLE | doi:10.20944/preprints201703.0134.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: spatial-spectral feature; very high spatial resolution image; classification; Tobler’s First Law of Geography
Online: 17 March 2017 (05:06:12 CET)
Aerial image classification has become popular and has attracted extensive research efforts in recent decades. The main challenge lies in the very high spatial resolution but relatively insufficient spectral information of such imagery. To this end, spatial-spectral feature extraction is a popular classification strategy. However, parameter determination for such feature extraction is usually time-consuming and depends excessively on experience. In this paper, an automatic spatial feature extraction approach based on cross-analysis of image raster and segmented vector data is proposed for the classification of very high spatial resolution (VHSR) aerial imagery. First, multi-resolution segmentation is used to generate strongly homogeneous image objects and extract the corresponding vectors. Then, to automatically explore the region of a ground target, two rules, derived from Tobler's First Law of Geography (TFL) and the topological relationships of the vector data, are integrated to constrain the extension of a region around a central object. Third, the shape and size of the extended region are described. A final classification map is produced by a supervised classifier using shape, size, and spectral features. Experiments on three real VHSR aerial images (0.1 to 0.32 m) are conducted to evaluate the effectiveness and robustness of the proposed approach. Comparisons with state-of-the-art methods demonstrate the superiority of the proposed method in VHSR image classification.
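The TFL-constrained region extension can be caricatured as region growing over an adjacency graph of segmented objects. The adjacency graph, spectral values, and similarity threshold below are invented for illustration and are not the paper's actual rules:

```python
def extend_region(center, adjacency, spectra, max_diff=10.0):
    # Grow a region from a central object: a neighbour joins only if it is
    # topologically adjacent (vector topology rule) and spectrally similar
    # to the centre (Tobler's First Law: near things are more related).
    region, frontier = {center}, [center]
    while frontier:
        obj = frontier.pop()
        for nb in adjacency.get(obj, []):
            if nb not in region and abs(spectra[nb] - spectra[center]) <= max_diff:
                region.add(nb)
                frontier.append(nb)
    return region

# Hypothetical image objects: A-B-D form one ground target, C is spectrally
# distinct (e.g. a different land cover) and must not be absorbed.
adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
spectra = {"A": 100.0, "B": 104.0, "C": 160.0, "D": 97.0}
region = extend_region("A", adjacency, spectra)
```

The shape and size descriptors of the paper would then be computed on the grown region rather than on the single central object.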
ARTICLE | doi:10.20944/preprints202308.0528.v1
Subject: Engineering, Bioengineering Keywords: Speech Imagery; Mental Task; Machine Leaning; Feature Extraction; Common spatial pattern (CSP); Filter bank Common Spatial Pattern (FBCSP); Brain – Computer Interface (BCI); Principal Components Analysis (PCA); Feature Selection; Channel Selection; Mutual Information; Lagrange Formula; Deep Learning; SVM Classifier
Online: 7 August 2023 (10:23:13 CEST)
Nowadays, brain signal processing is performed rapidly in various brain-computer interface (BCI) applications. Most researchers focus on developing new methods or on improving existing models to identify an optimal standalone feature set. Our research focuses on four ideas: one introduces a future communication model, and the others improve existing models or methods. These are: 1) A new communication imagery model using a mental task instead of speech imagery: because speech imagery is very difficult and it is impossible to imagine the sound of every character in every language, we introduce a new mental-task model, called lip-sync imagery, that applies to all characters in all languages. This paper implements lip-sync imagery for two sounds, characters, or letters. 2) New combination signals: selecting an inopportune frequency domain can lead to inefficient feature extraction, so domain selection is important for processing. Combining limited frequency ranges is proposed as a preliminary step toward creating fragmentary continuous frequencies. For the first model, intervals of 4 Hz were examined and tested as filter banks. The primary purpose is to identify combinations of 4 Hz filter banks (the scale of each filter bank) from the 4-40 Hz frequency domain as new combination signals (8 Hz) that yield efficient features by increasing distinctive patterns and decreasing similar patterns of brain activity. 3) A new supplementary bond-graph classifier for the SVM classifier: when a linear SVM is used on very noisy data, its performance decreases, so we introduce a new linear bond-graph classifier to supplement it. 4) A deep formula-recognition model: it converts the data of the first layer into a formula model (a formula-extraction model). The main goal is to reduce the noise in the subsequent layers for the coefficients of the formulas.
The output of the last layer consists of the coefficients selected by different functions in different layers. Finally, the classifier extracts the root interval of the formulas, and the diagnosis is based on this interval. Results were obtained for all of the implemented methods and range from 55% to 98%: the lowest, 55%, for the deep formula-recognition model, and the highest, 98%, for the new combination signals.
ARTICLE | doi:10.20944/preprints202312.0066.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: face swapping; feature disentanglement; semantic hierarchy-based feature fusion; controllable identity feature transfer
Online: 1 December 2023 (10:47:32 CET)
Face swapping is an intriguing and intricate task in the field of computer vision. Currently, most mainstream face swapping methods employ face recognition models to extract identity features and inject them into the generation process. Nonetheless, such methods often struggle to transfer identity information effectively, resulting in generated faces that fail to achieve high identity similarity with the source face. Furthermore, if identity information can be accurately disentangled, controllable face swapping becomes possible, providing more choices to users. In pursuit of this goal, we propose a new face swapping framework (ControlFace) based on the disentanglement of identity information. We disentangle the structure and texture of the source face, encoding and characterizing them separately as feature embeddings. According to the semantic level of each feature representation, we inject them into the corresponding feature mapper and fuse them in the latent space of StyleGAN. Owing to this disentanglement of structure and texture, we are able to controllably transfer parts of the identity features. Extensive experiments and comparisons with state-of-the-art face swapping methods demonstrate the superiority of our framework in transferring identity information, producing high-quality face images, and enabling controllable face swapping.
REVIEW | doi:10.20944/preprints202012.0377.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: feature selection; feature ranking; grouping; clustering; biological knowledge
Online: 15 December 2020 (12:10:44 CET)
In the last two decades, there have been massive advancements in high-throughput technologies, which have resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. Biomarkers can be unraveled by comparing gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, or drug A vs. drug B. This corresponds to a well-studied problem in the machine learning domain: the feature selection problem. In biological data analysis, most computational feature selection methodologies were taken from other fields without considering the nature of biological data. For gene expression data analysis, most existing feature selection methods rely on expression values alone to select the genes, and biological knowledge is integrated at the end of the analysis to gain biological insights or to support the initial findings. Thus, integrative approaches that utilize biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics applied to the gene expression data and the biological background information provided as external datasets. Since the integrative approach attracted attention in the gene expression domain, the gene selection process has lately shifted from being purely data-centric to a more integrative analysis incorporating additional biological knowledge.
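The integrative ranking idea, blending a data-driven statistic with an external knowledge score, can be sketched as a weighted combination. The gene names, scores, and weight below are purely illustrative:

```python
def integrative_rank(stat_scores, bio_scores, alpha=0.5):
    # Blend a statistic computed on expression data (e.g. a normalised
    # |t|-score) with a biological-knowledge score (e.g. pathway support);
    # alpha weights the data-driven term.
    combined = {g: alpha * stat_scores[g] + (1 - alpha) * bio_scores.get(g, 0.0)
                for g in stat_scores}
    return sorted(combined, key=combined.get, reverse=True)

stat = {"TP53": 0.9, "GAPDH": 0.8, "BRCA1": 0.7}   # expression-based scores
bio = {"TP53": 1.0, "BRCA1": 0.9, "GAPDH": 0.1}    # knowledge-based support
data_only = integrative_rank(stat, bio, alpha=1.0)  # purely data-centric
integrated = integrative_rank(stat, bio, alpha=0.5)
```

With the invented scores above, the purely data-centric ranking promotes the housekeeping-like gene "GAPDH", while the integrative ranking demotes it in favour of the biologically supported "BRCA1", which is exactly the behaviour integrative approaches aim for.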
ARTICLE | doi:10.20944/preprints202311.0111.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Glove-wearing detection; YOLOv8; Feature Pyramid Network; Feature Layer
Online: 2 November 2023 (04:07:54 CET)
Wearing gloves while operating machinery in workshops is an essential precaution against mechanical injuries, burns from high temperatures, and other potential hazards. Ensuring that workers are properly equipped with gloves is therefore a crucial accident-prevention measure. Glove images often occupy a minimal proportion of the frame and are easily obscured by cluttered backgrounds, especially with limited edge computing resources. Consequently, this study proposes a glove detection algorithm called YOLOv8-AFPN-M-C2f based on YOLOv8, offering faster detection speeds, lower computational demands, and enhanced accuracy for workshop scenarios. This research innovates by substituting the head of YOLOv8 with the AFPN-M-C2f network, amplifying the pathways for feature vector propagation and mitigating semantic discrepancies between non-adjacent feature layers. Additionally, the introduction of a superficial feature layer enriches surface feature information, augmenting the model's sensitivity to smaller objects. To assess the performance of the YOLOv8-AFPN-M-C2f model, we conducted multiple experiments using a factory glove detection dataset compiled for this study. The results indicate that the enhanced YOLOv8 model surpasses other network models. Compared to the baseline YOLOv8 model, the refined version shows a 2.6% increase in mAP@50, a 90.1% rise in FPS, and a 13% reduction in the number of parameters. This research contributes an effective solution for detecting glove adherence.
ARTICLE | doi:10.20944/preprints202310.1526.v1
Subject: Physical Sciences, Theoretical Physics Keywords: feature losses; feature gains; quantum physics; and general relativity theory
Online: 24 October 2023 (10:06:30 CEST)
The theory of differences between elements explains the reasons for the differences in physical concepts between different elements, such as quantum physics and the theory of general relativity, and is explained through observations. From these observations it can be concluded that the differences are driven by the reactions between the elements, by the composition of the elements, and by the influence of the spaces that promote losses and gains of characteristics; therefore, there is no general similarity between quantum physics and the theory of general relativity. The study aims to understand the elements, and their physical concepts, that make up the Universe.
ARTICLE | doi:10.20944/preprints202304.1204.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: dimensionality reduction; autoencoder; feature extraction; feature selection; guiding layer; regularization
Online: 29 April 2023 (04:14:20 CEST)
In the era of big data, feature engineering has proved its efficiency and importance for dimensionality reduction and for extracting useful information from original features. Feature engineering can be expressed as dimensionality reduction and is divided into two types of methods: feature selection and feature extraction. Each method has its pros and cons, and many studies have combined them. The sparse autoencoder (SAE) is a representative deep feature learning method that combines feature selection with feature extraction. However, existing SAEs do not consider feature importance during training, which causes irrelevant information to be extracted. In this paper, we propose a parallel guiding sparse autoencoder (PGSAE) that guides the information through two parallel guiding layers and sparsity constraints. The parallel guiding layers preserve the main distribution using the Wasserstein distance, a metric of distribution difference, and suppress the leverage of guiding features to prevent overfitting. We perform experiments on four datasets with different dimensionality and numbers of samples. The proposed PGSAE method produces better classification performance than other dimensionality reduction methods.
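The Wasserstein distance used by the guiding layers has a particularly simple form in one dimension: for two equal-size empirical samples it is the mean absolute difference of their order statistics. This is a simplified sketch of the metric only, not the PGSAE training code:

```python
def wasserstein_1d(a, b):
    # 1-D earth mover's distance between two equal-size empirical samples:
    # the mean absolute difference of their sorted values.
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# Identical distributions (in any order) have zero distance;
# shifting every value by 1 moves "one unit of mass per sample".
same = wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])     # 0.0
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])  # 1.0
```

Because it compares whole distributions rather than pointwise values, such a penalty lets a guiding layer keep the overall feature distribution without forcing the reconstruction to copy individual samples.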
ARTICLE | doi:10.20944/preprints202303.0391.v1
Subject: Medicine And Pharmacology, Veterinary Medicine Keywords: prognosis and health management; preprocessing data; feature extraction; feature selection
Online: 22 March 2023 (04:31:53 CET)
In the chemical processing industries, pumps are among the most commonly used machinery. Condition-based maintenance (CBM) and prognosis and health management (PHM) determine the most cost-effective time to overhaul pumps. To determine the status of a pump, a signal-emitting accelerometer is employed. The signal is processed using stationarity-based feature extraction from the amplitude signals, and multiple statistical features were produced using time-domain functions. Eight fault codes were classified using the support vector machine (SVM) method. The enormous number of data points necessitated feature selection. In terms of accuracy, precision, recall, and F1 score, the chi-square feature selection method outperforms the other approaches.
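The chi-square criterion scores each feature by the dependence between its (discretised) values and the fault class. The sketch below is generic, with made-up categorical data rather than the study's pump signals:

```python
def chi_square_score(feature, labels):
    # Chi-square statistic of the contingency table between one categorical
    # feature and the class label; a higher score means stronger dependence,
    # so the feature is more useful for classification.
    n = len(feature)
    score = 0.0
    for c in set(feature):
        for k in set(labels):
            observed = sum(f == c and l == k for f, l in zip(feature, labels))
            expected = feature.count(c) * labels.count(k) / n
            score += (observed - expected) ** 2 / expected
    return score

# A feature that perfectly predicts the class vs. one independent of it.
predictive = chi_square_score([0, 0, 1, 1], ["ok", "ok", "fault", "fault"])
independent = chi_square_score([0, 1, 0, 1], ["ok", "ok", "fault", "fault"])
```

Ranking features by this score and keeping the top ones is the selection step the abstract refers to.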
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: COVID-19; machine learning; feature significance; feature correlation; risk factors
Online: 2 June 2021 (14:54:10 CEST)
The COVID-19 pandemic affected the whole world, but not all countries were impacted equally. This opens the question of what factors can explain the initial faster spread in some countries compared to others. Many such factors are overshadowed by the effect of the countermeasures, so we studied the early phases of the infection when countermeasures have not yet taken place. We collected the most diverse dataset of potentially relevant factors and infection metrics to date for this task. Using it, we show the importance of different factors and factor categories as determined by both statistical methods and machine learning (ML) feature selection (FS) approaches. Factors related to culture (e.g., individualism, openness), development, and travel proved the most important. A more thorough factor analysis was then made using a novel rule discovery algorithm. We also show how interconnected these factors are and caution against relying on ML analysis in isolation. Importantly, we explore potential pitfalls found in the methodology of similar work and demonstrate their impact on COVID-19 data analysis. Our best models using the decision tree classifier can predict the infection class with roughly 80% accuracy.
ARTICLE | doi:10.20944/preprints202309.0133.v1
Subject: Engineering, Aerospace Engineering Keywords: Star image registration; Radial module feature; Rotation angle feature; Robustness; Real-time
Online: 4 September 2023 (07:16:38 CEST)
Star image registration is the most important step in astronomical image differencing, stacking, and mosaicking, which require algorithms with high robustness, accuracy, and real-time performance; however, no high-performance registration algorithm exists for this field. In this paper, we propose a star image registration algorithm that relies only on radial module features (RMF) and rotation angle features (RAF) and offers excellent robustness, high accuracy, and good real-time performance. Test results on a large amount of simulated and real data show that the overall performance of the proposed algorithm is significantly better than that of four classical baseline algorithms in the presence of rotation, insufficient overlapping area, false stars, position deviation, magnitude deviation, and complex sky backgrounds, making it a more suitable star image registration algorithm.
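The rotation invariance that radial module features rely on can be checked in a few lines. The star coordinates and the centroid-based construction below are illustrative, not the paper's exact algorithm:

```python
import math

def radial_module_features(stars):
    # Sorted distances of each star to the constellation centroid;
    # this multiset is unchanged by any rotation of the image.
    cx = sum(x for x, _ in stars) / len(stars)
    cy = sum(y for _, y in stars) / len(stars)
    return sorted(math.hypot(x - cx, y - cy) for x, y in stars)

def rotate(stars, theta):
    # Rotate all star positions by theta radians about the origin.
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in stars]

stars = [(1.0, 2.0), (4.0, 0.5), (-2.0, 3.0), (0.0, -1.0)]
rmf = radial_module_features(stars)
rmf_rotated = radial_module_features(rotate(stars, 0.7))
```

Matching such rotation-invariant descriptors between two frames is what lets a registration algorithm tolerate arbitrary rotation between exposures.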
ARTICLE | doi:10.20944/preprints202306.0180.v1
Subject: Engineering, Control And Systems Engineering Keywords: Condition monitoring; Induction motor; Inter-turn short-circuit; Feature calculation; Feature reduction
Online: 2 June 2023 (10:22:01 CEST)
Electrical rotating machines such as induction motors (IMs) are widely used in industrial applications, since their robust elements provide high efficiency and versatility. Nevertheless, the occurrence of faults in IMs is inherent to their operating conditions; the inter-turn short-circuit (ITSC) is one of the most common failures affecting IMs, and its appearance is due to electrical stresses that lead to the degradation of the stator winding insulation. In this regard, this work proposes a diagnosis methodology for the assessment and detection of incipient ITSC in IMs based on the processing of vibration, stator current, and magnetic stray-flux signals. The novelty and contribution include the characterization of different physical magnitudes by estimating a set of statistical time-domain features, as well as their fusion and reduction through the linear discriminant analysis (LDA) technique within a feature-level fusion approach. Furthermore, the fusion and reduction of information from different physical magnitudes allow automatic fault detection and identification with a simple neural network (NN) structure. The proposed method is evaluated on a complete set of experimental data, and the obtained results demonstrate that fusing information from different sources (physical magnitudes) improves the accuracy of ITSC detection in IMs, making this proposal feasible to incorporate into condition-based maintenance programs in industry.
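The statistical time-domain characterization and feature-level fusion can be sketched as follows. The feature list and synthetic signals are generic examples; the paper's exact feature set and its LDA reduction step are not reproduced here:

```python
import math

def time_domain_features(x):
    # A few statistical time-domain features common in condition monitoring.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * var ** 2)
    crest = max(abs(v) for v in x) / rms
    return [mean, math.sqrt(var), rms, kurtosis, crest]

def fuse(*signals):
    # Feature-level fusion: concatenate the features of every physical
    # magnitude (vibration, stator current, stray flux, ...) into one vector.
    return [f for s in signals for f in time_domain_features(s)]

# Synthetic stand-ins for two sensor channels sampled over full periods.
vibration = [math.sin(2 * math.pi * i / 100) for i in range(100)]
current = [math.sin(2 * math.pi * i / 100 + 0.5) for i in range(100)]
fused = fuse(vibration, current)
```

In the paper, a vector of this kind would then be projected to a lower-dimensional space with LDA before being fed to the neural network classifier.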
ARTICLE | doi:10.20944/preprints202201.0258.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Skin cancer; Deep learning; Hybrid feature extractor; Local binary pattern; Feature extraction
Online: 18 January 2022 (12:43:50 CET)
Skin cancer is a common disease worldwide. Automatic identification of skin cancer is complicated by the poor contrast and apparent resemblance between skin and lesions. The rate of human death can be massively reduced if melanoma skin cancer is detected quickly from dermoscopy images. In this research, an anisotropic diffusion filtering method is applied to dermoscopy images to remove multiplicative speckle noise, and the fast bounding box (FBB) method is applied to segment the skin cancer region. The proposed pipeline then consists of two feature extractors: a hybrid feature extractor (HFE) and a VGG19-based convolutional neural network (CNN) feature extractor. The HFE combines three feature extraction approaches into a single fused feature vector: Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and Speeded-Up Robust Features (SURF). The CNN is used to extract additional features from the test and training datasets. The two feature vectors are fused to build the classification model, which classifies dermoscopy images as melanoma or non-melanoma skin cancer. The proposed methodology was evaluated on two standard datasets and achieved an accuracy of 99.85%, a sensitivity of 91.65%, and a specificity of 95.70%, outperforming previous machine learning algorithms.
ARTICLE | doi:10.20944/preprints202301.0304.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Disease Dental Caries; Gradient Boosting Decision Tree; Feature Selection; Machine Learning; Feature importance
Online: 17 January 2023 (09:25:45 CET)
Caries is a prevalent oral disease that primarily affects children and teenagers. Advances in machine learning have caught the attention of scientists working on decision support systems to predict early tooth decay. Current research has developed machine learning algorithms for caries classification that reach high accuracy, especially on image data. Unfortunately, most studies on dental caries focus only on classification and prediction tasks, whereas dental caries prevention is more important. Therefore, this study aims to design efficient features for a machine-learning-based decision support system that can identify the various risk factors that cause dental caries and support its prevention. The data used in this work were obtained from the 2018 Korean Children's Oral Health Survey, which comprises nine datasets. The experimental results show that combining the mRMR and Gini feature importance methods when training the GBDT model achieved optimum performance of 95%, 93%, 99%, and 88% for accuracy, F1 score, precision, and recall, respectively. The proposed method thus provides an effective predictive model for dental caries prediction.
ARTICLE | doi:10.20944/preprints202206.0390.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Object detection; Feature fusion network; Multiple feature selection; Angle prediction; Pixel Attention Mechanism
Online: 29 June 2022 (03:09:52 CEST)
The object detection task is usually affected by complex backgrounds. In this paper, a new image object detection method is proposed, which can perform multi-feature selection on multi-scale feature maps. By this method, a bidirectional multi-scale feature fusion network is designed to fuse semantic features and shallow features to improve the detection effect of small objects in complex backgrounds. When the shallow features are transferred to the top layer, a bottom-up path is added to reduce the number of network layers experienced by the feature fusion network, reducing the loss of shallow features. In addition, a multi-feature selection module based on the attention mechanism is used to minimize the interference of useless information on subsequent classification and regression, allowing the network to adaptively focus on appropriate information for classification or regression to improve detection accuracy. Because the traditional five-parameter regression method has severe boundary problems when predicting objects with large aspect ratios, the proposed network treats angle prediction as a classification task. The experimental results on the DOTA dataset, the self-made DOTA-GF dataset and the HRSC 2016 dataset show that, compared with several popular object detection algorithms, the proposed method has certain advantages in detection accuracy.
ARTICLE | doi:10.20944/preprints202302.0196.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Recommendation; GNN; Information; Feature; Structure
Online: 13 February 2023 (03:28:02 CET)
With the rapid development of the Internet industry, the problem of information overload has arisen due to the abundance of information available online. Recommendation systems have been developed to provide users with personalized and relevant information or products, and recommendation algorithms, as the core of these systems, have attracted much attention from experts and scholars. The classical recommendation algorithms fall into three major categories: collaborative filtering, content-based, and hybrid approaches. Although these algorithms are widely used in various fields, with the proliferation of information they are no longer able to meet the needs of the times. This paper proposes a new recommendation algorithm, SFRRG, that fuses structure and feature information in graph neural networks to improve the rating-prediction performance of the recommendation system. The effectiveness of the proposed algorithm is demonstrated through experiments on several datasets and comparisons with existing recommendation algorithms.
ARTICLE | doi:10.20944/preprints202211.0534.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Recommendation; GNN; Feature; Path; Embedding
Online: 29 November 2022 (04:19:09 CET)
Recommender systems, as effective information-filtering systems, can exploit users' explicit or implicit behavior to, on the one hand, find items that may interest the user and, on the other hand, facilitate interaction between users and items to increase revenue. Recommender systems have been widely used in various fields, such as e-commerce, travel recommendation, online books and movies, and social networks, satisfying users' intrinsic implicit needs through personalized services. In recent years, the development of deep learning has further improved the performance of recommendation systems. Although these methods improve performance, when the number of users and products increases, recommendation systems may face sparsity and cold-start problems and thus fail to achieve personalized recommendations. Knowledge graphs, as structured data, have become the choice of many algorithms owing to their high quality and wide scale, so many recommendation algorithms combined with knowledge graphs have emerged as a popular new direction in recommendation systems. These algorithms preserve the rich connections between different entities; moreover, when constructing the features of an entity, entities far from the central entity can also be utilized, so entities are no longer only directly connected to each other. To address the shortcomings of existing recommendation algorithms, this paper designs the recommendation algorithm GPRE using graph neural networks. GPRE focuses on expressing the user's features. The graph neural network gives GPRE a strong generalization capability for modeling, providing long-range semantics between users and entities, as well as selective choice of entities in the auxiliary graph neural network.
Explicit semantic links are established between remote and central nodes to reduce the introduction of noise. In this paper, experiments are conducted on real-world datasets and the results are compared with baselines. The experimental results show that GPRE performs well on the experimental datasets.
ARTICLE | doi:10.20944/preprints202002.0415.v1
Subject: Medicine And Pharmacology, Other Keywords: electromyography; EMG; feature extraction; feature selection; myoelectric control; classification; pattern recognition; prosthetics; wearables; amputee
Online: 28 February 2020 (02:09:05 CET)
Myoelectric control is the cornerstone of many assistive technologies used in clinical practice, such as prosthetics and orthoses, and of human-computer interaction, such as virtual reality control. Although the performance of such devices exceeds 90% in controlled environments, myoelectric devices still face challenges in robustness to the variability of daily living conditions. This survey explores the intrinsic physiological mechanisms limiting practical implementations of myoelectric devices: the limb position effect and the contraction intensity effect. The degradation of electromyography (EMG) pattern recognition in the presence of these factors was demonstrated on six datasets, where performance was 13% and 20% lower in realistic environments than in controlled environments for the limb position and contraction intensity effects, respectively. The experimental designs of the limb position and contraction intensity literature were surveyed, and current state-of-the-art training strategies and robust algorithms for both effects were compiled and presented. Recommendations for future limb position effect studies include: collection protocols providing exemplars of six positions (four limb positions and three forearm orientations), three-dimensional experimental designs, transfer learning approaches, and multi-modal sensor configurations. Recommendations for future contraction intensity effect studies include: the collection of dynamic contractions, nonlinear complexity features, and proportional control.
ARTICLE | doi:10.20944/preprints201712.0057.v1
Subject: Environmental And Earth Sciences, Other Keywords: dimension reduction; feature extraction; hyperspectral image; weighted feature space; low rank representation; spectral clustering
Online: 11 December 2017 (06:55:22 CET)
Containing hundreds of spectral bands (features), hyperspectral images (HSIs) are highly capable of discriminating land cover classes. Traditional HSI processing methods assign the same importance to all bands in the original feature space (OFS), although different spectral bands play different roles in identifying samples of different classes. To explore the relative importance of each feature, we learn a weighting matrix and obtain a relative weighted feature space (RWFS) as an enriched feature space for HSI data analysis. To overcome the difficulty of limited labeled samples, a common situation in HSI analysis, we extend our method to a semi-supervised framework. To transfer available knowledge to unlabeled samples, we employ graph-based clustering, where low-rank representation (LRR) defines the similarity function for the graph. After constructing the RWFS, any dimension reduction method and classification algorithm can be employed in it. Experimental results on two well-known HSI datasets show that some dimension reduction algorithms perform better in the new weighted feature space.
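The core operation, scaling each spectral band by a learned relative weight before any dimension reduction, can be sketched in a few lines. This is a minimal illustration, not the paper's method: the weights here come from a simple between/within-class variance ratio, a hypothetical stand-in for the learned weighting matrix described above.

```python
import numpy as np

def band_weights(X, y):
    """Score each spectral band by a between/within-class variance
    ratio (a hypothetical stand-in for the learned weights)."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - overall) ** 2 * (y == c).sum()
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    return between / (within + 1e-12)

def to_rwfs(X, w):
    """Map the original feature space (OFS) into the relative weighted
    feature space (RWFS) by scaling each band with its weight."""
    return X * w

rng = np.random.default_rng(0)
# Band 0 separates the two classes; band 1 is pure noise.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal([3, 0], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w = band_weights(X, y)
X_rwfs = to_rwfs(X, w)
```

Any downstream dimension reduction or classifier then operates on `X_rwfs` instead of `X`, which is the point of the enriched feature space.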
ARTICLE | doi:10.20944/preprints202108.0433.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Speech emotion recognition; Feature extraction; Heterogeneous parallel network; Spectral features; Prosodic features; Multi-feature fusion
Online: 23 August 2021 (12:16:40 CEST)
Speech emotion recognition remains a challenging task in natural language processing, placing strict requirements on both feature extraction and the acoustic model. With that in mind, a Heterogeneous Parallel Convolution Bi-LSTM model is proposed to address these challenges. It consists of two heterogeneous branches: the left one contains two dense layers and a Bi-LSTM layer, while the right one contains a dense layer, a convolution layer, and a Bi-LSTM layer. The model exploits spatiotemporal information more effectively and achieves 84.65%, 79.67%, and 56.50% unweighted average recall on the benchmark databases EMODB, CASIA, and SAVEE, respectively. Compared with previous research results, the proposed model achieves consistently better performance.
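The metric reported here, unweighted average recall (UAR), averages per-class recall so that small emotion classes count as much as frequent ones, unlike plain accuracy. A minimal sketch (the class labels are illustrative):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls, so a rare emotion class
    weighs as much as a frequent one (unlike plain accuracy)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# 'anger' recall 1.0, 'joy' recall 0.5 -> UAR 0.75
y_true = ["anger", "anger", "joy", "joy"]
y_pred = ["anger", "anger", "joy", "anger"]
print(unweighted_average_recall(y_true, y_pred))  # -> 0.75
```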
ARTICLE | doi:10.20944/preprints202111.0243.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Feature Selection; Malaria Diagnosis; Supervised learning
Online: 15 November 2021 (10:36:16 CET)
Malaria remains an important cause of death, especially in sub-Saharan Africa, with about 228 million malaria cases worldwide and an estimated 405,000 deaths in 2019. Currently, malaria is diagnosed in health facilities using microscopy (BS) or rapid malaria diagnostic tests (MRDT); in areas where these tools are inadequate, presumptive treatment is performed. In addition, self-diagnosis and self-treatment are practiced in some households. Given the high rate of self-medication with malaria drugs, this study aimed to compute the most significant features, using feature selection methods, for the best prediction of malaria in Tanzania, to be used in developing a machine learning model for malaria diagnosis. A dataset of malaria symptoms and clinical diagnoses was extracted from patients' files at four identified health facilities in the Kilimanjaro and Morogoro regions. These regions were selected to represent high-endemic (Morogoro) and low-endemic (Kilimanjaro) areas of the country. The dataset contained 2556 instances and 36 variables. A random forest classifier, a tree-based method, was used to select the most important features for malaria prediction, and region-based features were obtained to facilitate accurate prediction. The feature ranking indicated that fever is universally the most influential feature for predicting malaria, followed by general body malaise, vomiting, and headache; however, these features are ranked differently across the regional datasets. Subsequently, six predictive models using the important features selected by the feature selection method were used to evaluate the features' performance. The identified features comply with the malaria diagnosis and treatment guidelines provided by the WHO and Tanzania Mainland; this compliance is observed so as to produce a prediction model that fits the current health care provision system in Tanzania.
ARTICLE | doi:10.20944/preprints202110.0042.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: classification; ensemble; subspace; sparsity; feature ranking
Online: 4 October 2021 (10:36:37 CEST)
We propose a new ensemble classification algorithm, named Super Random Subspace Ensemble (Super RaSE), to tackle the sparse classification problem. The proposed algorithm is motivated by the Random Subspace Ensemble (RaSE) algorithm. RaSE was shown to be a flexible framework that can be coupled with any existing base classifier; however, its success largely depends on the proper choice of base classifier, which is unfortunately unknown in advance. In this work, we show that Super RaSE avoids the need to choose a base classifier by randomly sampling a collection of classifiers together with the subspace. As a result, Super RaSE is more flexible and robust than RaSE. In addition to the vanilla Super RaSE, we also develop an iterative Super RaSE, which adaptively changes the base classifier distribution as well as the subspace distribution. The Super RaSE algorithm and its iterative version perform competitively across a wide range of simulated datasets and two real data examples. Both algorithms are implemented in a new version of the R package RaSEn.
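The key move, pairing each random subspace with a randomly drawn base classifier before majority voting, can be sketched in plain numpy. This is an illustrative toy, not the RaSEn implementation: the two base learners (nearest centroid and 1-NN), the ensemble size, and the subspace size are all stand-ins.

```python
import numpy as np

def nearest_centroid(Xtr, ytr, Xte):
    """Label each test point by the closer class centroid."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return (((Xte - c1) ** 2).sum(1) < ((Xte - c0) ** 2).sum(1)).astype(int)

def one_nn(Xtr, ytr, Xte):
    """Label each test point by its single nearest training point."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[d.argmin(1)]

def super_rase(Xtr, ytr, Xte, B=25, d=2, seed=0):
    """Each ensemble member draws a random subspace AND a random base
    classifier; the final label is the majority vote."""
    rng = np.random.default_rng(seed)
    bases = [nearest_centroid, one_nn]
    votes = np.zeros(len(Xte))
    for _ in range(B):
        S = rng.choice(Xtr.shape[1], size=d, replace=False)  # random subspace
        clf = bases[rng.integers(len(bases))]                # random base learner
        votes += clf(Xtr[:, S], ytr, Xte[:, S])
    return (votes > B / 2).astype(int)

# Sparse toy problem: features 0 and 1 carry the signal, 2 and 3 are noise.
rng = np.random.default_rng(1)
Xtr = rng.normal(0, 1, (80, 4))
ytr = np.array([0] * 40 + [1] * 40)
Xtr[ytr == 1, :2] += 4.0
Xte = rng.normal(0, 1, (20, 4))
Xte[10:, :2] += 4.0
yhat = super_rase(Xtr, ytr, Xte)
```

The iterative version described above would additionally re-weight which base learners and which features get sampled based on how often they appear in well-performing members.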
ARTICLE | doi:10.20944/preprints202308.2105.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: convolutional neural networks; feature selection; transfer learning; feature fusion; gray wolf optimization; deep learning; skin lesion
Online: 31 August 2023 (10:24:52 CEST)
Melanoma is widely recognized as one of the most lethal forms of skin cancer, with its incidence showing an upward trend in recent years. Nonetheless, the timely detection of this malignancy substantially enhances the likelihood of patients’ long-term survival. Several computer-based methods have recently been proposed in the pursuit of diagnosing skin lesions at their early stages. Despite achieving some level of success, there still remains a margin of error that the machine learning community considers to be an unresolved research challenge. This study presents a novel framework for the classification of skin lesions. The framework incorporates deep features to generate a highly discriminant feature vector, while also maintaining the integrity of the original feature space. Recent deep models including Darknet53, DenseNet201, InceptionV3, and InceptionResNetV2 are employed in our study for the purpose of feature extraction. Additionally, transfer learning is leveraged to enhance the performance of our approach. In the subsequent phase, the extracted feature information from the chosen pre-existing models is combined, with the aim of preserving maximum information, prior to undergoing the process of feature selection using a novel entropy-controlled grey wolf optimization (ECGWO) algorithm. The integration of fusion and selection techniques is employed to initially incorporate the feature vector with a high level of information and subsequently eliminate redundant and irrelevant feature information. The efficacy of our design is substantiated through the evaluation on three benchmark dermoscopic datasets, namely PH2, ISIC-MSK, and ISIC-UDA. In order to validate the proposed methodology, a comprehensive evaluation is conducted, including a rigorous comparison with established techniques in the field.
ARTICLE | doi:10.20944/preprints202305.2209.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: breast cancer; Convolutional Neural Network (CNN); computer aided diagnosis (CAD); feature selection; feature classification; mammography images
Online: 31 May 2023 (09:02:30 CEST)
The prompt and accurate diagnosis of breast lesions, including the distinction between cancerous, non-cancerous, and suspicious lesions, plays a crucial role in the prognosis of breast cancer. In this paper, we introduce a novel method based on feature extraction and reduction for the detection of breast cancer in mammography images. First, we extract features from multiple pre-trained convolutional neural network (CNN) models and concatenate them. The most informative features are then selected based on their mutual information with the target variable, and the selected features are classified with a machine learning algorithm. We evaluate our approach using four different machine learning algorithms, and our results demonstrate that the neural network-based classifier yields an accuracy as high as 92% on the RSNA dataset, a new dataset that provides two views and additional features such as age. We compare our proposed algorithm with state-of-the-art methods and demonstrate its superiority, particularly in terms of accuracy and sensitivity. For the MIAS dataset, we achieve an accuracy as high as 94.5%, and for the DDSM dataset, an accuracy of 96% is attained. These results highlight the effectiveness of our method in accurately diagnosing breast lesions and surpassing existing approaches.
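The selection step, ranking concatenated deep features by mutual information with the label, can be sketched with a histogram-based MI estimate. This is a minimal sketch under simplifying assumptions: equal-width binning, and toy columns standing in for CNN activations.

```python
import numpy as np

def mutual_information(x_bins, y):
    """MI between a discretised feature and the class label, from the
    empirical joint distribution: sum p(x,y) log(p(x,y)/(p(x)p(y)))."""
    mi = 0.0
    for xv in np.unique(x_bins):
        for yv in np.unique(y):
            pxy = np.mean((x_bins == xv) & (y == yv))
            px, py = np.mean(x_bins == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def select_top_k(X, y, k, bins=8):
    """Rank each (concatenated) feature column by MI with the target
    and keep the k most informative ones."""
    scores = []
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        xb = np.digitize(X[:, j], edges[1:-1])
        scores.append(mutual_information(xb, y))
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
X = np.column_stack([y + 0.1 * rng.normal(size=300),   # informative column
                     rng.normal(size=300)])            # noise column
```

Calling `select_top_k(X, y, 1)` on this toy data keeps the informative column and discards the noise one.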
ARTICLE | doi:10.20944/preprints202007.0688.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: Computer aided diagnosis (CAD); brain magnetic resonance imaging (MRI) scans; feature extraction; feature reduction; classifiers; classification rule
Online: 29 July 2020 (10:14:43 CEST)
Manual interpretation of huge image volumes is susceptible to inter-reader variability and human error, so an accurate automated CAD scheme is highly desirable in clinical pathological diagnosis. In this research, a plethora of machine learning paradigms (feature extraction, dimensionality reduction, and supervised classification methods) were explored, evaluated, compared, and analyzed to identify the optimal pipeline for the binary classification of brain MR images (normal vs. neoplastic). An external validation dataset was used to test the generalizability of the optimal predictive models. Relevant and informative features were selected to construct a cross-validated decision tree, from which a simple rule set was eventually built. The experimental results show that almost all pattern recognition paradigms achieve high accuracy with careful selection of the number of attributes. LDA+ELM with 55 features is the optimal pipeline, achieving perfect classification when training and test data come from the same source, with accuracy = 97.5%, AUC = 0.989, sensitivity = 95%, and specificity = 100% under a balanced test dataset, and accuracy = 99.5%, AUC = 0.988, sensitivity = 95%, and specificity = 100%. The cross-validated decision tree model shows comparable performance: accuracy = 98.8%, AUC = 99.1%, sensitivity = 99.6%, and specificity = 98.2%. Three highly relevant and robust attributes were visualized and selected for construction of the decision tree models, and finally a rule set was read directly off the decision tree. This rule set can potentially serve as a fast and accurate classification algorithm.
Subject: Engineering, Marine Engineering Keywords: Internal wave recognition; automation; CNN; feature extraction
Online: 14 August 2023 (04:38:20 CEST)
The internal wave recognition algorithm in an ocean data buoy system can be used to realize real-time and flexible observation of internal waves, but no accurate automatic recognition method currently exists. To meet the need for automatic, real-time, and reliable internal wave recognition, an automatic internal wave recognition algorithm is proposed for a tightly profiled intelligent buoy system. Sea profile temperature data collected by the Bailong buoy system in the Andaman Sea in 2018 were used to train and test the internal wave recognition neural network model, which consists of two parts: feature extraction and feature classification. The experiments compare a long short-term memory network (LSTM), convolutional neural networks (CNN) with different numbers of layers, and a deep neural network (DNN) without a feature extraction network, and adjust the number of convolutional kernels and the convolutional stride to improve feature extraction efficiency. The experiments show that the best results are obtained when a single CNN layer is used as the feature extraction network with a convolutional stride of 4 and 5 convolutional kernels: recall reaches 95.31% and precision 97.53%. The internal wave identification delay of the algorithm is 5.0862 minutes, with 1593 parameters and 3024 computations. The algorithm can be deployed directly on the ocean data buoy system to meet the demand for automatic, real-time, and reliable internal wave identification at the buoy end.
REVIEW | doi:10.20944/preprints202305.0663.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: depression detection; fusion; feature extraction; deep learning
Online: 9 May 2023 (13:20:58 CEST)
This study reviews the performance of existing work on multimodal emotion recognition and proposes a model that takes the speaker's text and voice signals as input, fuses the two modalities, and detects depression. Based on the DAIC-WOZ dataset, voice features were extracted using a CNN, text features were extracted using Transformers, and the two modalities were fused through a tensor fusion network. We also build a model that detects whether the speaker is depressed using an LSTM in the final layer. This study suggests the possibility of increasing access to mental illness diagnosis by enabling patients to detect depression on their own in daily conversations. If the proposed model is developed further and connected to a voice conversation system, patients who cannot visit the hospital periodically, or who are reluctant to, will find it easier to check their condition and seek recovery. Furthermore, it can be extended to multi-label classification of various mental diseases and used as a simple self-diagnosis tool.
ARTICLE | doi:10.20944/preprints202102.0260.v3
Subject: Computer Science And Mathematics, Discrete Mathematics And Combinatorics Keywords: Feature Selection; Discrete Data; Heuristics; Running average
Online: 7 December 2021 (11:28:35 CET)
By applying a running average (with a window size d), we can transform discrete data into broad-range, continuous values. When we have more than two columns, one of which contains the classification tags (the class column), we can compare and sort the features (non-class columns) based on the R2 coefficient of the regression over the running averages. Parameter tuning helps select the best features, i.e., the non-class columns that correlate best with the class column; both the window size and the ordering can be tuned to this end. This optimization problem is hard, so an algorithm (or heuristic) is needed to simplify the tuning. We present a novel heuristic, called Simulated Distillation (SimulaD), which achieves good results on this optimization problem.
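Read literally, the procedure is: order the rows (for instance by the class column), smooth each candidate feature with a window-d running average, fit a line, and rank features by the R2 of that fit. A minimal sketch of that reading follows; the ordering rule and window size here are assumptions, not the paper's tuned values.

```python
import numpy as np

def running_average(x, d):
    """Window-size-d moving average: turns discrete values into
    broad-range continuous ones."""
    return np.convolve(x, np.ones(d) / d, mode="valid")

def r2_of_linear_fit(y):
    """R^2 of a straight-line fit of a series against its position."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    ss_res = ((y - (slope * t + intercept)) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

def rank_features(X, y, d=5):
    """Order rows by the class column, smooth each feature with the
    running average, then score features by the R^2 of the fit."""
    order = np.argsort(y, kind="stable")
    return {j: r2_of_linear_fit(running_average(X[order, j], d))
            for j in range(X.shape[1])}

rng = np.random.default_rng(0)
y = np.array([0] * 100 + [1] * 100)
X = np.column_stack([y + 0.05 * rng.normal(size=200),  # tracks the class
                     rng.normal(size=200)])            # pure noise
scores = rank_features(X, y)
```

On this toy data the feature that tracks the class column scores a clearly higher R2 than the noise feature, which is the ranking signal SimulaD tunes.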
ARTICLE | doi:10.20944/preprints202111.0024.v1
Subject: Computer Science And Mathematics, Analysis Keywords: Fake news detection; Deep learning; Feature Engineering
Online: 1 November 2021 (15:34:46 CET)
The rapid infiltration of fake news is a flaw in the otherwise valuable internet, a virtually global network that allows the simultaneous exchange of information. While a common, and normally effective, approach to such classification tasks is designing a deep learning-based model, the subjectivity behind the writing and production of misleading news undermines this technique. Deep learning models are unexplainable in nature, making the contextualization of results impossible because they lack the explicit features used in traditional machine learning. This paper emphasizes the need for feature engineering to effectively address this problem: containing the spread of fake news at the source, not after it has become globally prevalent. Insights from extracted features were used to manipulate the text, which was then tested on deep learning models. The study successfully demonstrates the previously unknown yet substantial impact these features have on deep learning models.
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: feature subset selection; disease classification; subtype detection
Online: 14 June 2021 (10:30:21 CEST)
Biologists seek to identify a small number of significant features that are important, non-redundant, and relevant from diverse omics data. For example, statistical methods like LIMMA and DESeq distinguish differentially expressed genes between case and control groups from transcript profiles, and researchers apply various column subset selection algorithms to genomics datasets for a similar purpose. Unfortunately, genes selected by such statistical or machine learning methods are often highly co-regulated, making their performance inconsistent. Here, we introduce a novel feature selection algorithm that selects highly disease-related and non-redundant features from a diverse set of omics datasets. We successfully applied this algorithm to three different biological problems: a) disease versus normal sample classification, b) multiclass classification of different disease samples, and c) disease subtype detection. Considering classification ROC-AUC and false-positive and false-negative rates, our algorithm outperformed other gene selection and differential expression (DE) methods on all six types of cancer datasets from TCGA considered here, for both binary and multiclass classification problems. Moreover, genes picked by our algorithm improved disease subtyping accuracy for four different cancer types over state-of-the-art methods. Hence, we posit that our proposed feature reduction method can help the community solve various problems, including the selection of disease-specific biomarkers, precision medicine design, and disease subtype detection.
ARTICLE | doi:10.20944/preprints202011.0412.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Process; ontological category; life concept; essential feature
Online: 16 November 2020 (10:49:11 CET)
Although knowledge about biological systems has advanced exponentially in recent decades, it is surprising that the very definition of Life still presents theoretical challenges. Even though several lines of reasoning seek to identify the essence of the phenomenon of life, most of them contain a fundamental problem in their basic conceptual structure: most concepts fail to identify features that are necessary and sufficient to define life. Here, we analyze the main conceptual frameworks supporting life concepts: (i) the physical, (ii) the cellular, and (iii) the molecular approaches. Based on ontological analysis, we propose that Life should not be positioned under the ontological category of Matter but is better understood under the top-level ontology of "Process". Taking an epistemological approach, we propose that the essential characteristic pervading every living being is the presence of organic codes, and we explore theories in biosemiotics in order to propose a clear concept of life as a macrocode composed of multiple interrelated coding layers. We therefore suggest a clear distinction between the concept of life and living beings, a distinction that is not evident in theoretical terms. From the proposed concept, we suggest that the evolutionary process is a fundamental characteristic for life's maintenance but not for its definition. This proposition opens a fertile field of debate in astrobiology, biosemiotics, and robotics.
ARTICLE | doi:10.20944/preprints202009.0521.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: electroencephalographic; feature selection; machine learning; prediction model
Online: 22 September 2020 (11:27:03 CEST)
In recent years, research has focused on mechanisms to assess subjects' cognitive workload when performing activities that demand high concentration, such as driving a vehicle. These mechanisms have used several tools to analyze cognitive workload, among which electroencephalographic (EEG) signals are the most common due to their high precision. However, one of the main challenges in working with EEG signals is finding the appropriate information to identify cognitive states. Here we present GALoRIS, a new feature selection model for pattern recognition using information from EEG signals, based on machine learning techniques. GALoRIS combines genetic algorithms and logistic regression to create a new fitness function that identifies and selects the critical EEG features contributing to the recognition of high and low cognitive workload, and structures a new dataset capable of optimizing the model's predictive process. We found that GALoRIS identifies data related to subjects' high and low cognitive workload while driving, using information extracted from multiple EEG signals, reducing the original dataset by more than 50% and maximizing the model's predictive capacity, achieving a precision rate greater than 90%.
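The combination described, a genetic algorithm whose fitness wraps a logistic regression, can be sketched compactly. This is a toy reconstruction, not GALoRIS itself: the fitness (training accuracy minus a small subset-size penalty), the population sizes, and the operators are all assumptions.

```python
import numpy as np

def logreg_accuracy(X, y, iters=200, lr=0.1):
    """Plain-numpy logistic regression; training accuracy serves as
    the fitness signal (a simplified stand-in for GALoRIS's fitness)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return np.mean((Xb @ w > 0) == y)

def ga_select(X, y, pop=12, gens=15, seed=0):
    """Genetic algorithm over feature bitmasks: fitness rewards
    accuracy and penalises large subsets."""
    rng = np.random.default_rng(seed)
    P = rng.integers(0, 2, (pop, X.shape[1]))
    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return logreg_accuracy(X[:, mask == 1], y) - 0.01 * mask.sum()
    for _ in range(gens):
        f = np.array([fitness(m) for m in P])
        parents = P[np.argsort(f)[-pop // 2:]]          # truncation selection
        kids = []
        while len(kids) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, X.shape[1])           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(X.shape[1]) < 0.1         # mutation
            kids.append(np.where(flip, 1 - child, child))
        P = np.vstack([parents, kids])
    f = np.array([fitness(m) for m in P])
    return P[f.argmax()]

# Toy EEG-like data: features 0 and 1 separate the classes, 2-5 are noise.
rng = np.random.default_rng(2)
X = rng.normal(0, 1, (120, 6))
y = np.array([0] * 60 + [1] * 60)
X[y == 1, :2] += 3.0
best = ga_select(X, y)
```

The size penalty is what drives the reported dataset reduction: a mask that keeps accuracy while dropping channels beats an all-features mask.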
ARTICLE | doi:10.20944/preprints202001.0318.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: efficient binary symbiotic; feature selection; classification; optimization
Online: 26 January 2020 (08:30:17 CET)
Feature selection is one of the main data preprocessing steps in machine learning. Its goal is to reduce the number of features by removing redundant and noisy ones, while preserving the accuracy of classification algorithms. Meta-heuristic algorithms are among the most successful and promising methods for solving this problem. The symbiotic organisms search (SOS) algorithm is a successful meta-heuristic inspired by the interactions of organisms in nature: parasitism, commensalism, and mutualism. In this paper, three binary methods based on the SOS algorithm are presented for solving the feature selection problem. In the first and second methods, several S-shaped and V-shaped transfer functions, respectively, are used to binarize the SOS algorithm; these methods are called BSOSS and BSOSV. In the third method, called EBSOS, two new operators, BMP and BCP, are introduced to binarize the SOS algorithm. The proposed methods were run on 18 standard UCI datasets and compared with baseline and important meta-heuristic algorithms. The test results show that EBSOS performs best among the three proposed binarization approaches. EBSOS was then compared with other meta-heuristic methods, including the genetic algorithm, binary bat algorithm, binary particle swarm optimization, binary flower pollination algorithm, binary grey wolf optimizer, binary dragonfly algorithm, and binary chaotic crow search algorithm.
The results of these experiments show that EBSOS outperforms the other methods in terms of both feature count and accuracy. Furthermore, EBSOS was practically evaluated on spam email detection, in combination with SVM, KNN, NB, and MLP classifiers; the results verified its performance and showed that it significantly improved the accuracy and speed of all the classifiers in this task.
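The S-shaped versus V-shaped binarization that the first two methods rely on can be sketched directly: an S-shaped transfer maps the continuous step to the probability that a bit is set, while a V-shaped transfer gives the probability of flipping the current bit. The specific functions shown, sigmoid and |tanh|, are common choices and assumptions here; the BMP/BCP operators of EBSOS are not reproduced.

```python
import numpy as np

def s_shaped(v):
    """S-shaped transfer: sigmoid of the continuous SOS step."""
    return 1 / (1 + np.exp(-v))

def v_shaped(v):
    """V-shaped transfer: |tanh| of the continuous SOS step."""
    return np.abs(np.tanh(v))

def binarize_s(position, rng):
    """Bit j is SET with probability S(position_j)."""
    return (rng.random(position.shape) < s_shaped(position)).astype(int)

def binarize_v(position, bits, rng):
    """Bit j is FLIPPED with probability V(position_j)."""
    flip = rng.random(position.shape) < v_shaped(position)
    return np.where(flip, 1 - bits, bits)

rng = np.random.default_rng(0)
step = np.array([-6.0, 0.0, 6.0])          # continuous update values
mask_s = binarize_s(step, rng)             # feature mask via S-shape
mask_v = binarize_v(step, np.ones(3, dtype=int), rng)
```

Note the asymmetry: at step 0 the S-shape sets a bit with probability 0.5, while the V-shape never flips, which is why the two families explore the feature space differently.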
ARTICLE | doi:10.20944/preprints201908.0011.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: rain cell; tracking; PIV; feature-based verification
Online: 1 August 2019 (10:16:12 CEST)
This study proposes a new algorithm, termed rain cell identification and tracking (RCIT), to identify and track rain cells in high-resolution weather radar data. Previous algorithms have limitations when tracking non-consecutive rain cells owing to their use of maximum-correlation-coefficient methods and their lack of an alternative way to handle the stages of rain cells across their life cycles. To address these deficiencies, several methods are implemented in the new algorithm, including particle image velocimetry (PIV) for motion estimation and a rain cell matching rule to capture the stage changes of rain cells. High-resolution (5-min, 1-km) radar reflectivity data from three rainy days over the German federal state of North Rhine-Westphalia (NRW) are used to evaluate the proposed algorithm. The performance of the new algorithm is compared with a radar reflectivity map and verified with two object-oriented methods: structure-amplitude-location (SAL) and a geometric index. The verification results suggest that the new algorithm performs well. Application of the RCIT algorithm to the selected cases shows that the inner structure of rainfall events in the experimental region follows extreme-value distributions, with most rainfall events having short duration and low intensity. The new algorithm effectively captures the stage changes of rain cells during their life cycles and can serve as the basis for further hydro-meteorological applications such as spatial and temporal analysis of rainfall events and short-term flood forecasting.
ARTICLE | doi:10.20944/preprints201704.0174.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Hierarchical search; Image retrieval; Multi-feature fusion
Online: 26 April 2017 (18:51:42 CEST)
To address the poor generalization performance, low retrieval accuracy, and large time consumption of existing content-based image retrieval systems, this paper proposes a hierarchical image retrieval method based on multi-feature fusion. The retrieval accuracies on Corel5K, UKBench, and Holidays are 68.23 (Top-1), 3.73 (N-S), and 88.20 (mAP), respectively. The experimental results show that the proposed method effectively compensates for the deficiencies of single-feature retrieval and saves significant time at the cost of a small loss of accuracy.
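The two-stage idea, a cheap coarse feature to shortlist candidates and a fused fine feature to re-rank them, can be sketched as follows. The feature dimensions and Euclidean matching are illustrative assumptions, not the paper's actual descriptors.

```python
import numpy as np

def hierarchical_retrieval(query_coarse, query_fine, coarse_db, fine_db,
                           shortlist=10, top=3):
    """Stage 1: rank the whole database with the cheap coarse feature
    and keep a shortlist. Stage 2: re-rank the shortlist with the
    fused fine feature, paying its higher cost on few images only."""
    d_coarse = np.linalg.norm(coarse_db - query_coarse, axis=1)
    cand = np.argsort(d_coarse)[:shortlist]
    d_fine = np.linalg.norm(fine_db[cand] - query_fine, axis=1)
    return cand[np.argsort(d_fine)[:top]]

rng = np.random.default_rng(0)
coarse_db = rng.normal(size=(100, 8))      # e.g. a colour histogram
fine_db = rng.normal(size=(100, 32))       # e.g. fused texture + shape
# Make image 7 an exact match for the query in both feature spaces.
q_coarse, q_fine = coarse_db[7], fine_db[7]
hits = hierarchical_retrieval(q_coarse, q_fine, coarse_db, fine_db)
```

The time saving comes from the fine (fused) distance only ever being computed on `shortlist` images instead of the whole database.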
ARTICLE | doi:10.20944/preprints202003.0284.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: deep learning; composite hybrid feature selection; machine learning; stack hybrid classification; CT-image; MPEG7 edge histogram feature extraction; CNN
Online: 18 March 2020 (08:32:46 CET)
This paper analyzes Corona Virus Disease using a probabilistic model. It involves a technique for classification and prediction that recognizes the typical and diagnostically most important CT image features related to the virus. The main contribution of the research is predicting the probability of recurrence in no-recurrence (first-time detection) cases by applying the proposed approach to feature extraction. A combination of conventional statistical and machine learning tools is applied for feature extraction from CT images through four image filters, combined with the proposed composite hybrid feature extraction (CHFS). The selected features are classified by a stacked hybrid classification system (SHC). An experimental study with real data demonstrates the feasibility and potential of the proposed approach.
ARTICLE | doi:10.20944/preprints201703.0206.v1
Subject: Engineering, Control And Systems Engineering Keywords: signal processing; feature selection; feature fusion; data fusion; gender recognition; sensor fusion; heart rate variability (HRV), electromyography (EMG); stepper
Online: 28 March 2017 (02:38:01 CEST)
Gender recognition is trivial for a physiotherapist but is considered a challenge for computers. In this work, electromyography (EMG) and heart rate variability (HRV) were used for gender recognition during a stepping exercise on a stepper. Relevant features were extracted and selected, and the selected features were then fused to automatically predict gender. Feature selection for gender classification, however, remains a challenge for ensuring good accuracy. Thus, this paper employs a feature selection approach based on both the performance of and the diversity between two features, derived from the rank-score characteristic (RSC) function in a combinatorial fusion approach (CFA). The features from the selected feature sets were then fused using the CFA. The results were compared with other fusion techniques, such as Naive Bayes (NB), decision tree (J48), k-nearest neighbor (KNN), and support vector machine (SMO), as well as with previous research on gender recognition. The experimental results showed that the CFA is efficient for feature selection, and the fusion method improved the gender recognition rate: the CFA achieves a much better gender classification result of 94.51%, compared with 90.34% in Nazarloo's work and with the other classifiers.
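The rank-score characteristic (RSC) function underlying the selection criterion is simply the score a scoring system assigns at each rank position, and the diversity of two features is the gap between their RSC curves. A minimal sketch, in which min-max normalisation and the mean absolute gap are illustrative choices rather than the paper's exact definitions:

```python
import numpy as np

def normalise(s):
    """Min-max normalise scores to [0, 1]."""
    return (s - s.min()) / (s.max() - s.min())

def rank_score_function(s):
    """RSC function: normalised scores sorted high-to-low, i.e. the
    score the system assigns at each rank position."""
    return np.sort(normalise(s))[::-1]

def rsc_diversity(sa, sb):
    """Diversity of two scoring features = mean gap between their RSC
    curves; CFA keeps pairs with high performance AND high diversity."""
    return np.abs(rank_score_function(sa) - rank_score_function(sb)).mean()

def score_fusion(sa, sb):
    """Score combination: average the normalised scores per sample."""
    return (normalise(sa) + normalise(sb)) / 2

peaked = np.array([0.0, 0.0, 0.0, 1.0])   # one confident hit
flat = np.array([0.0, 1.0, 2.0, 3.0])     # evenly spread scores
div = rsc_diversity(peaked, flat)          # -> 0.25
```

Note that the RSC curve depends only on the score distribution, not on which sample earned which score, which is what makes it usable as a cheap diversity measure between candidate features.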
ARTICLE | doi:10.20944/preprints202312.0581.v1
Subject: Engineering, Other Keywords: Artificial Intelligence; Load Forecasting; Feature Selection; Outlier Rejection
Online: 8 December 2023 (12:42:36 CET)
Recently, the application of Artificial Intelligence (AI) in many areas of life has raised the efficiency of systems and converted them into smart ones, especially in the field of energy. Integrating AI with power systems allows electrical grids to be smart enough to predict future load, which is known as Intelligent Load Forecasting (ILF). Hence, suitable decisions for power system planning and operation procedures can be taken accordingly. Moreover, ILF can play a vital role in electrical demand response, which guarantees a reliable transitioning of power systems. This paper introduces a Perfect Load Forecasting Strategy (PLFS) for predicting future load in smart electrical grids based on AI techniques. The proposed PLFS consists of two sequential phases: a Data Preprocessing Phase (DPP) and a Load Forecasting Phase (LFP). In the former phase, the input electrical load dataset is prepared before the actual forecasting takes place through two essential tasks, namely feature selection and outlier rejection. Feature selection is done using Advanced Leopard Seal Optimization (ALSO), a new nature-inspired optimization technique, while outlier rejection is accomplished through the Interquartile Range (IQR) as a measure of statistical dispersion. Actual load forecasting then takes place in the LFP using a new predictor called the Weighted K-Nearest Neighbor (WKNN) algorithm. The proposed PLFS has been tested through extensive experiments.
Results have shown that PLFS outperforms recent load forecasting techniques, as it delivers the maximum prediction accuracy with the minimum root mean square error.
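The two generic building blocks named above, IQR-based outlier rejection and distance-weighted KNN prediction, can be sketched as follows; this is a minimal illustration with synthetic data, not the paper's PLFS implementation:

```python
import numpy as np

def iqr_filter(y, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; return a boolean mask."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    return (y >= q1 - k * iqr) & (y <= q3 + k * iqr)

def weighted_knn_predict(X_train, y_train, x, k=3):
    """Distance-weighted KNN regression: nearer neighbours get larger weights."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)          # inverse-distance weights
    return float(np.sum(w * y_train[idx]) / np.sum(w))

# Toy hourly-load series with one obvious outlier at index 4
loads = np.array([100., 102., 98., 101., 500., 99., 103.])
mask = iqr_filter(loads)
X = np.arange(len(loads), dtype=float).reshape(-1, 1)[mask]
y = loads[mask]
pred = weighted_knn_predict(X, y, np.array([3.5]), k=3)
```

The 500-unit spike falls outside the IQR fence and is discarded before the forecaster is queried, so the prediction stays near the surrounding load level.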
ARTICLE | doi:10.20944/preprints202311.1004.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: ransomware; detection; dynamic analysis; feature selection; machine learning
Online: 16 November 2023 (02:13:52 CET)
Ransomware constitutes a distinctive category of pernicious software that sequesters a user's digital assets by encryption, holding them hostage until a sum is extorted from the victim. These incursions have escalated to become among the most prevalent and significant threats confronting both individuals and corporate entities. In combatting this virulent program, dynamic analysis has been established as the favored detection modality. Such analyses typically hinge on Windows API calls, the conduits through which programs requisition services from the operating system. Yet, the superfluous and unrelated Windows API calls interjected by adversaries into the execution stream of suspect binaries precipitate an excessively noisy behavioral sequence, which impairs the performance of counter-ransomware mechanisms. The research outlined herein introduces a novel non-signature-based detection paradigm that harnesses efficacious Windows API call sequences through supervised machine learning strategies. An innovative Enhanced Min Max (EmRmR) filter technique is proposed, aiming to purge noisy features and isolate the most indicative feature subset that encapsulates the ransomware's true behavior. The EmRmR method, diverging from the traditional Min Max approach, circumvents the superfluous calculations that are a hallmark of the conventional algorithms, thereby necessitating a reduced number of evaluations. Additionally, a refinement procedure has been integrated to contract the program's call trace volume by discarding those Windows API calls lacking a robust correlation with ransomware's pivotal behavior. Subsequent to rigorous experimental analyses and juxtaposition with extant behavior-based detection methodologies, the proposed strategy has demonstrated its efficacy in differentiating ransomware behavior, delivering high detection precision alongside a diminution in false-positive occurrences.
ARTICLE | doi:10.20944/preprints202310.1084.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: rotary table; eccentricity error; moiré signal; phase feature
Online: 17 October 2023 (11:25:50 CEST)
In view of the limitations of existing eccentricity error separation methods for rotary tables, an eccentricity error separation method based on the phase feature of the moiré signal of a single reading head is proposed herein. A grating pair transmission model is established based on an analysis of the working principle of the rotary table, thereby clarifying the influence of the eccentricity error on the phase feature of the moiré signal during rotation, and a corresponding model between the phase feature spectral components and the eccentricity error is established. Verification experiments of the proposed method are carried out on a laboratory-made circuit system. After verifying the accuracy of the data acquisition of the laboratory-made circuit board, experiments verifying the eccentricity error separation effect of the proposed method are carried out. The experimental data are compared with those of the traditional method, and the results show that the error between the two methods is 2.34 μm and the relative error is 2.3%.
ARTICLE | doi:10.20944/preprints202308.1933.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Pancreatic cancer; NIH PLCO dataset; feature selection; classification
Online: 29 August 2023 (10:11:39 CEST)
Background: Pancreatic cancer (PC) is a disease with a poor prognosis and survival rate, so there is a pertinent need to identify its risk factors. The purpose of this study is to identify a subset of factors (a.k.a. features) as predictors of PC from the Prostate, Lung, Colorectal and Ovarian (PLCO) cancer dataset, which consists of responses to 65 questions about demographics, cancer and health history, medication usage, and smoking habits from 154,897 participants. Method: There are two challenges to selecting the subset of features that predict PC with the highest probability: the problem is computationally intractable, and the PLCO dataset is highly imbalanced. We use an innovative method to use the dataset in a balanced way, without up- or down-sampling, and apply nine feature selection methods to select the optimal subset of features from the preprocessed and balanced dataset. Results: Our preprocessed dataset consists of 32 risk factors (8 demographics, 5 cancer history, 13 health history, 2 medication usage, 4 smoking habits). Risk factors belonging to cancer and health history, followed by smoking habits, were consistently chosen by the feature selection methods. We also discuss findings in the medical sciences literature that corroborate our results. Conclusions: The study found that risk factors belonging to cancer and health history are the most prominent ones for PC. In particular, a previous PC diagnosis is chosen as the most prominent risk factor by the majority of the methods. While most of our findings are consistent with the literature, some shed light on novel factors that may not have received their due attention from the research community.
ARTICLE | doi:10.20944/preprints202307.1609.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: surface scanner; photogrammetry; close-range photogrammetry; feature tracks
Online: 25 July 2023 (03:05:48 CEST)
A close-range photogrammetric approach, implemented using a CNC device and a macro camera, is proposed, together with a tailored image acquisition scheme for this device. To increase reconstruction robustness and accuracy, the key point features detected in the acquired images are tracked across multiple views from multiple viewpoints at multiple distances. This approach reduces spurious correspondences; as a result, the estimation accuracy of the calibration parameters is increased and reconstruction errors are reduced. Qualitative and quantitative evaluations demonstrate the efficacy and accuracy of the proposed approach, which exhibits micrometre resolution and low implementation cost.
ARTICLE | doi:10.20944/preprints202307.0581.v1
Subject: Computer Science And Mathematics, Computer Networks And Communications Keywords: feature selection; taguchi-method; weighted average; classification; ensemble
Online: 10 July 2023 (09:39:48 CEST)
Feature selection is a crucial step in machine learning, aiming to identify the most relevant features in high-dimensional data, in order to reduce the computational complexity of model development and improve its generalization performance. Ensemble feature ranking methods combine the results of several feature selection techniques to identify a subset of the most relevant features for a given task. In many cases, they produce a more comprehensive ranking of features than the individual methods used in them. This paper presents a novel approach to ensemble feature ranking, which uses a weighted average of the individual ranking scores calculated by the individual methods. The optimal weights are determined using a Taguchi-type design of experiments. The proposed methodology significantly improves classification performance on the CSE-CIC-IDS2018 dataset, particularly for attack types where traditional average-based feature ranking score combinations resulted in low classification metrics.
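The weighted-average combination at the heart of this approach can be sketched in a few lines. The Taguchi-optimized weights are specific to the paper; the weights, scores, and method names below are purely illustrative:

```python
import numpy as np

def ensemble_rank(score_matrix, weights):
    """Combine per-method feature scores (rows = methods, cols = features)
    into one ranking via a weighted average of min-max normalised scores."""
    s = np.asarray(score_matrix, dtype=float)
    # Normalise each method's scores to [0, 1] so they are comparable
    s = (s - s.min(axis=1, keepdims=True)) / np.ptp(s, axis=1, keepdims=True)
    w = np.asarray(weights, dtype=float)
    combined = w @ s / w.sum()
    return np.argsort(combined)[::-1], combined  # best feature first

# Three hypothetical rankers scoring four features
scores = [[0.9, 0.1, 0.5, 0.3],   # e.g. mutual information
          [0.8, 0.2, 0.6, 0.4],   # e.g. chi-squared
          [0.7, 0.3, 0.9, 0.1]]   # e.g. RF importance
order, combined = ensemble_rank(scores, weights=[0.5, 0.3, 0.2])
```

The paper's contribution is choosing the `weights` vector via a Taguchi-type design of experiments rather than the plain average (all weights equal) used in traditional ensemble ranking.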
ARTICLE | doi:10.20944/preprints202305.1277.v1
Subject: Engineering, Architecture, Building And Construction Keywords: Occupant behavior; Machine learning; Feature selection; Parameter tuning
Online: 18 May 2023 (05:27:29 CEST)
In this study, machine learning was used to predict and analyze the behavior of occupants in Gifu City residences during winter. Global warming is currently progressing worldwide, and it is important to control greenhouse gas emissions from the perspective of adaptation and mitigation. Occupant behavior is highly individualized and must be analyzed to accurately determine a building's energy consumption. The accuracy of heating behavior prediction was studied using three different methods: logistic regression, support vector machine (SVM), and deep neural network (DNN). The generalization ability of the support vector machine and the deep neural network was improved by parameter tuning. Parameter tuning of the SVM showed that the values of C and gamma affected the prediction accuracy; the prediction accuracy improved by approximately 11.9%, confirming the effectiveness of parameter tuning for the SVM. Parameter tuning of the DNN showed that the numbers of layers and neurons affected the prediction accuracy. Parameter tuning also improved the prediction accuracy of the DNN, although the rate of increase was lower than that of the SVM.
ARTICLE | doi:10.20944/preprints202304.0124.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: YOLOv8; small size targets; target detection; feature fusion
Online: 7 April 2023 (11:35:23 CEST)
Traditional camera sensors rely on human eyes for observation. However, the human eye is prone to fatigue when observing targets of different sizes for a long time in complex scenes, and human cognition is limited, which often leads to judgment errors and greatly reduces efficiency. Target recognition technology is an important technology for judging the target category in a camera sensor. In order to solve this problem, a small-size target detection algorithm for special scenarios was proposed in this paper. Its advantage is that the algorithm not only has higher precision for small-size target detection, but also ensures that the detection accuracy at each size is not lower than that of existing algorithms. In this paper, a new down-sampling method was proposed, which better preserves context feature information. The feature fusion network was improved to effectively combine shallow information and deep information. A new network structure was proposed to effectively improve the detection accuracy of the model. In terms of accuracy, it is better than YOLOX, YOLOXR, YOLOv3, scaled YOLOv5, YOLOv7-Tiny and YOLOv8. Three authoritative public data sets were used in this experiment: a) on the Visdron data set (small-size targets), DC-YOLOv8 is 2.5% more accurate than YOLOv8; b) on the Tinyperson data set (minimal-size targets), DC-YOLOv8 is 1% more accurate than YOLOv8; c) on the PASCAL VOC2007 data set (normal-size targets), DC-YOLOv8 is 0.5% more accurate than YOLOv8.
ARTICLE | doi:10.20944/preprints202204.0254.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: multi-target tracking; DeepSORT; feature extraction; target detection
Online: 27 April 2022 (09:01:45 CEST)
Pedestrian multi-target tracking technology plays an important role in artificial intelligence, autonomous driving, virtual reality and other fields. The detection-based pedestrian multi-target tracking algorithm DeepSORT is widely used in industry. It mainly tracks multiple pedestrian targets continuously while keeping their IDs unchanged. In order to improve the applicability and tracking accuracy of the DeepSORT algorithm, this paper improves the IOU distance measurement in the matching process. At the same time, ResNet50 is used as the feature extraction backbone network and combined with a Feature Pyramid Network (FPN), so that multi-layer pedestrian appearance features are fused to improve the tracking accuracy of the DeepSORT algorithm. The proposed algorithm is verified on the public data set MOT-16, and its tracking accuracy is improved by 4.1%.
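The baseline IOU distance used in DeepSORT-style track-detection matching can be sketched as follows; this is the generic formulation, not the paper's improved measure:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def iou_cost_matrix(tracks, detections):
    """Cost matrix for track-detection assignment: cost = 1 - IoU."""
    return np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])

# One predicted track box against two candidate detections
tracks = [(0, 0, 10, 10)]
dets = [(0, 0, 10, 10), (20, 20, 30, 30)]
cost = iou_cost_matrix(tracks, dets)
```

In DeepSORT the resulting cost matrix is fed to the Hungarian assignment step, typically combined with the appearance-feature distance from the embedding network.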
ARTICLE | doi:10.20944/preprints202109.0236.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Spam Detection; Feature extraction; N-grams; Machine Learning
Online: 14 September 2021 (11:36:36 CEST)
Recently, spam emails have become a significant problem with the expanding usage of the Internet, and it has become necessary to filter emails. A spam filter is a system that detects undesired and malicious emails and blocks them from getting into users' inboxes. Spam filters check emails for something "suspicious" in terms of text, email address, header, attachments, and language. In our proposed approach, we use different features such as word2vec, word n-grams, character n-grams, and a combination of variable-length n-grams for comparative analysis. Different machine learning models such as support vector machine (SVM), decision tree (DT), logistic regression (LR), and multinomial Naïve Bayes (MNB) are trained on the extracted features. We use different evaluation metrics such as precision, recall, f1-score, and accuracy to evaluate the experimental results. Among them, SVM provides 97.6% accuracy, 98.8% precision, and a 94.9% f1-score using a combination of n-gram features.
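A self-contained sketch of the character n-gram feature idea, paired with a from-scratch multinomial Naïve Bayes classifier as a stand-in for the library models the paper evaluates; the toy messages below are invented:

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Extract overlapping character n-grams from a message."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]

class NaiveBayesNgram:
    """Minimal multinomial Naive Bayes over character n-gram counts."""
    def fit(self, texts, labels, n=3):
        self.n = n
        self.counts = {c: Counter() for c in set(labels)}
        self.priors = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, n))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        best, best_lp = None, -math.inf
        for c, cnt in self.counts.items():
            total = sum(cnt.values())
            lp = math.log(self.priors[c])
            for g in grams:  # Laplace-smoothed log-likelihood
                lp += math.log((cnt[g] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

clf = NaiveBayesNgram().fit(
    ["win a free prize now", "claim your free money",
     "meeting at noon tomorrow", "see you at the office"],
    ["spam", "spam", "ham", "ham"])
label = clf.predict("free prize money")
```

Character n-grams are robust to the deliberate misspellings common in spam ("fr3e", "pr1ze"), since most of the surrounding trigrams still match.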
ARTICLE | doi:10.20944/preprints202103.0447.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: COVID-19; ICU; feature selection; classification; ARIMA model
Online: 17 March 2021 (14:56:27 CET)
Since December 2019, the world has been fighting against coronavirus disease (COVID-19), which is caused by a novel coronavirus termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This work focuses on applications of machine learning algorithms in the context of COVID-19. Firstly, regression analysis is performed to model the number of confirmed cases and death cases. Our experiments show that the autoregressive integrated moving average (ARIMA) model can reliably model the increase in the number of confirmed cases and can predict future cases. Secondly, a number of classifiers are used to predict whether a COVID-19 patient needs to be admitted to an intensive care unit (ICU) or semi-ICU. For this, classification algorithms are applied to a dataset of 5644 samples. Using this dataset, the most significant attributes are selected using feature selection with the ExtraTrees classifier, and Proteina C reativa (mg/dL) is found to be the highest-ranked feature. In our experiments, random forest, logistic regression, support vector machine, XGBoost, stacking and voting classifiers are applied to the top 10 selected attributes of the dataset. Results show that the random forest and hard voting classifiers achieve the highest classification accuracy, near 98%, and the highest recall value of 98% in predicting the need for admission into ICU/semi-ICU units.
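The hard-voting ensemble mentioned above reduces to a per-sample majority vote over the base classifiers' predictions. A minimal sketch, with hypothetical classifier outputs rather than the paper's models:

```python
from collections import Counter

def hard_vote(predictions):
    """Hard-voting ensemble: majority label across classifiers per sample.
    `predictions` is a list of per-classifier label lists of equal length."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

# Hypothetical ICU/non-ICU predictions from three classifiers on four patients
rf_pred  = ["ICU", "no", "ICU", "no"]
lr_pred  = ["ICU", "no", "no",  "no"]
svm_pred = ["no",  "no", "ICU", "no"]
ensemble = hard_vote([rf_pred, lr_pred, svm_pred])
```

With three voters, any two agreeing classifiers decide the label, which is why hard voting tends to smooth over the occasional error of a single base model.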
ARTICLE | doi:10.20944/preprints202011.0541.v1
Subject: Engineering, Automotive Engineering Keywords: QC denoise automation; feature transformation techniques; classification methods
Online: 20 November 2020 (12:12:03 CET)
Seismic imaging is the main technology used for subsurface hydrocarbon prospection. It provides an image of the subsurface using the same principles as ultrasound medical imaging. As for any data acquired through hydrophones (pressure sensors) and/or geophones (velocity/acceleration sensors), the raw seismic data are heavily contaminated with noise and unwanted reflections that need to be removed before further processing. Therefore, the noise attenuation is done at an early stage and often while acquiring the data. Quality control (QC) is mandatory to give confidence in the denoising process and to ensure that a costly data re-acquisition is not needed. QC is done manually by humans and comprises a major portion of the cost of a typical seismic processing project. It is therefore advantageous to automate this process to improve cost and efficiency. Here, we propose a supervised learning approach to build an automatic QC system. The QC system is an attribute-based classifier that is trained to classify three types of filtering (mild = under filtering, noise remaining in the data; optimal = good filtering; harsh = over filtering, the signal is distorted). The attributes are computed from the data and represent geophysical and statistical measures of the quality of the filtering. The system is tested on a full-scale survey (9000 km2) to QC the results of the swell noise attenuation process in marine seismic data.
ARTICLE | doi:10.20944/preprints202010.0048.v1
Subject: Engineering, Automotive Engineering Keywords: QC denoise automation; feature transformation techniques; classification methods
Online: 2 October 2020 (15:32:31 CEST)
Seismic imaging is the main technology used for subsurface hydrocarbon prospection. It provides an image of the subsurface using the same principles as ultrasound medical imaging. It is based on emitting a sound (pressure) wave through the subsurface and recording the reflected echoes using hydrophones (pressure sensors) and/or geophones (velocity/acceleration sensors). Contrary to medical imaging, which is done in real time, subsurface seismic imaging is an offline process that involves a huge volume of data and needs considerable computing power. The raw seismic data are heavily contaminated with noise and unwanted reflections that need to be removed before further processing. Therefore, the noise attenuation is done at an early stage and often while acquiring the data. Quality control (QC) is mandatory to give confidence in the denoising process and to ensure that a costly data re-acquisition is not needed. QC is done manually by humans and comprises a major portion of the cost of a typical seismic processing project. It is therefore advantageous to automate this process to improve cost and efficiency. Here, we propose a supervised learning approach to build an automatic QC system. The QC system is an attribute-based classifier that is trained to classify three types of filtering (mild = underfiltering, noise remaining in the data; optimal = good filtering; harsh = overfiltering, the signal is distorted). The attributes are computed from the data and represent geophysical and statistical measures of the quality of the filtering. The system is tested on a full-scale survey (9000 km2) to QC the results of the swell noise attenuation process in marine seismic data. The results are encouraging and helped identify localized issues that were difficult for a human to spot.
ARTICLE | doi:10.20944/preprints202006.0048.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Pattern Recognition; Feature extraction; SVM; HOG; Zonal density
Online: 5 June 2020 (14:03:45 CEST)
Significant progress has been made in pattern recognition technology. However, one obstacle that has not yet been overcome is the recognition of words in the Brahmi script, specifically the recognition of characters, compound characters, and words, because of their complex structure. For this kind of complex pattern recognition problem, it is always difficult to decide which feature extraction method and classifier would be the best choice. Moreover, different feature extraction methods and classifiers offer complementary information about the patterns to be classified. Therefore, combining feature extraction methods and classifiers in an intelligent way can be beneficial compared to using any single feature extraction method. This study proposes the combination of HOG + zonal density features with an SVM classifier to recognize Brahmi words. Keeping these facts in mind, in this paper, information provided by structural and statistical features is combined using an SVM classifier for word-level script recognition from Brahmi word images. The Brahmi word dataset contains 6,475 training and 536 testing images of Brahmi words from 170 classes, and the database is made freely available. The word samples from this database are classified based on the confidence scores provided by the support vector machine (SVM) classifier, while HOG and zonal density are used to extract the features of the Brahmi words. The maximum accuracy achieved by the system is 95.17%, which is better than previously reported studies.
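The zonal density feature can be sketched in a few lines: the binarised character image is divided into a grid of zones, and the ink-pixel density of each zone becomes one feature. The toy image below is illustrative, not from the Brahmi dataset:

```python
import numpy as np

def zonal_density(img, grid=(4, 4)):
    """Zonal density feature vector: fraction of foreground (ink) pixels
    in each zone of a grid laid over a binarised 0/1 image."""
    h, w = img.shape
    gy, gx = grid
    feats = []
    for i in range(gy):
        for j in range(gx):
            zone = img[i * h // gy:(i + 1) * h // gy,
                       j * w // gx:(j + 1) * w // gx]
            feats.append(zone.mean())   # density = mean of 0/1 pixels
    return np.array(feats)

# Toy 8x8 binary "image": top half ink, bottom half blank
img = np.zeros((8, 8))
img[:4, :] = 1
f = zonal_density(img, grid=(2, 2))
```

In the paper's setup such statistical features would be concatenated with HOG descriptors before being fed to the SVM.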
ARTICLE | doi:10.20944/preprints201901.0068.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: disease classification; read mapping; feature selection; machine learning
Online: 8 January 2019 (11:46:34 CET)
Disease classification based on biological data is an important area in bioinformatics and biomedical research. It helps doctors and medical practitioners with the early detection of disease and supports them as a computer-aided diagnostic tool for accurate diagnosis, prognosis, and treatment. Earlier, microarray gene expression data were widely used for disease classification, but next-generation sequencing (NGS) has now replaced microarray technology. In the last few years, RNA sequencing (RNA-Seq) data have been widely used for transcriptomic analysis; hence, RNA-Seq-based disease classification is still in its infancy. In this article, we present a general framework for disease classification built on RNA-Seq data. This framework will guide researchers in processing RNA-Seq data, extracting relevant features, and applying an appropriate classifier to classify any kind of disease.
COMMUNICATION | doi:10.20944/preprints201803.0054.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: data feature selection; data clustering; travel time prediction
Online: 7 March 2018 (13:30:06 CET)
In recent years, governments have applied intelligent transportation system (ITS) techniques to provide several convenient services (e.g., a garbage truck app) for residents. This study proposes a garbage truck fleet management system (GTFMS) together with data feature selection and data clustering methods for travel time prediction. A GTFMS includes mobile devices (MDs), on-board units, a fleet management server, and a data analysis server (DAS). When a user uses an MD to request the arrival time of a garbage truck, the DAS performs the data feature selection and data clustering procedures to analyze the travel time of the garbage truck. The proposed methods can cluster the records of travel time and reduce variation, improving travel time prediction. After predicting the travel time and arrival time, the predicted information is sent to the user's MD. In the experimental environment, the results showed that the accuracies of the previous method and the proposed method are 16.73% and 85.97%, respectively. Therefore, the proposed data feature selection and data clustering methods can be used to predict the stop-to-stop travel time of a garbage truck.
ARTICLE | doi:10.20944/preprints201610.0075.v1
Subject: Computer Science And Mathematics, Signal Processing Keywords: BCI; recognition; feature extraction; ACCLN network; RBF network
Online: 19 October 2016 (10:09:19 CEST)
The electroencephalogram (EEG) is a record of brain activity. Brain Computer Interface (BCI) technology based on the EEG signal has become a research hotspot, and how to extract feature signals from the EEG is the most basic research question in BCI technology. In this paper, a new method of recognizing the fatigued, conscious, and concentrated states of the human brain is proposed, combining the discrete wavelet transform and neural networks based on the EEG signal. First of all, the raw signal is preprocessed by wavelet denoising, because the raw EEG signal contains a large amount of high-frequency noise; it is decomposed into multi-layer high-frequency and low-frequency signals, and thus the δ, θ, α, and β waves are obtained by the wavelet transform. Then, the frequency band energy of each wave is regarded as the feature signal of the EEG. In the experiment, the feature signal is classified by a radial basis function (RBF) network and an annealed chaotic competitive learning network (ACCLN). The RBF and ACCLN networks are trained with 500 sets of sample data and tested on 100 sets of samples in different mental states. The experimental results show that the average accuracies of the RBF network under the three conditions are 88.75%, 88.25%, and 88.5%, respectively, while the correct rates of the ACCLN network are 97%, 98%, and 98%, respectively.
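A minimal sketch of the band-energy feature idea using a hand-rolled Haar wavelet; this is a simplification (the paper does not specify its mother wavelet), and the "EEG" trace below is a synthetic sine, not real data:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar discrete wavelet transform."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def band_energies(signal, levels=4):
    """Decompose the signal and return the energy (sum of squares) of the
    detail band at each level plus the final approximation band."""
    energies = []
    a = signal
    for _ in range(levels):
        a, d = haar_dwt(a)
        energies.append(float(np.sum(d ** 2)))
    energies.append(float(np.sum(a ** 2)))
    return energies

# Synthetic 1-second trace sampled at 128 Hz: a 10 Hz alpha-like sine
t = np.arange(128) / 128.0
eeg = np.sin(2 * np.pi * 10 * t)
feats = band_energies(eeg, levels=4)
```

Because the Haar transform is orthonormal, the band energies sum exactly to the signal's total energy, which makes them a well-behaved, scale-separated feature vector for the downstream classifier.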
ARTICLE | doi:10.20944/preprints202111.0202.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: solar energy; solar radiation prediction; hybrid machine learning; feature selection; feature extraction; classification algorithms; regression analysis; weather research and forecasting (WRF)
Online: 10 November 2021 (10:48:15 CET)
Solar radiation prediction is an important process in ensuring optimal exploitation of solar energy power. Numerous models have been applied to this problem, such as numerical weather prediction models and artificial intelligence models. However, well-designed hybridization approaches that combine numerical models with artificial intelligence models to yield a more powerful model can provide a significant improvement in prediction accuracy. In this paper, we propose novel hybrid machine learning approaches that exploit auxiliary numerical data. The proposed hybrid methods invoke different machine learning paradigms, including feature selection, classification, and regression. Additionally, numerical weather prediction (NWP) models are used in the proposed hybrid models. Feature selection is used for feature space dimension reduction to reduce the large number of recorded parameters that affect estimation and prediction processes. The rough set theory is applied for attribute reduction and the dependency degree is used as a fitness function. We investigate the effect of the attribute reduction process with thirty different classification and prediction models in addition to the proposed hybrid model. Then, different machine learning models are constructed based on classification and regression techniques to predict solar radiation. Moreover, other hybrid prediction models are formulated to use the output of the numerical model of Weather Research and Forecasting (WRF) as learning elements in order to improve the prediction accuracy. The proposed methodologies are evaluated using a data set that is collected from different regions in Saudi Arabia.
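The rough-set dependency degree used as the fitness function above can be sketched directly from its definition: gamma(C, D) is the fraction of objects whose condition-attribute equivalence class falls entirely within one decision class. The toy weather records below are invented, not from the Saudi Arabia dataset:

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency_degree(rows, cond_attrs, dec_attr):
    """Rough-set dependency degree gamma(C, D): share of objects whose
    condition class is contained in a single decision class (positive region)."""
    dec_blocks = [set(b) for b in partition(rows, [dec_attr])]
    pos = 0
    for block in partition(rows, cond_attrs):
        if any(set(block) <= d for d in dec_blocks):
            pos += len(block)
    return pos / len(rows)

# Toy records: (humidity, wind, radiation-level); attr 2 is the decision
rows = [("high", "calm", "low"), ("high", "calm", "low"),
        ("low", "windy", "high"), ("low", "calm", "high"),
        ("high", "windy", "high")]
gamma = dependency_degree(rows, cond_attrs=[0, 1], dec_attr=2)
```

Attribute reduction searches for the smallest `cond_attrs` subset whose dependency degree matches that of the full attribute set, which is what the paper's fitness function rewards.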
ARTICLE | doi:10.20944/preprints202309.1845.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: cognitive functions; machine learning; feature selection; violence risk assessment
Online: 28 September 2023 (03:08:34 CEST)
Machine learning techniques can be used to identify whether deficits in cognitive functions contribute to antisocial and aggressive behavior. This paper first presents the results of tests conducted on delinquent and non-delinquent youths to assess their cognitive functions. The dataset extracted from these assessments, consisting of 37 predictor variables and one target, was used to train three algorithms that aim to predict whether the data correspond to a young offender or a non-offending youth. Prior to this, statistical tests were conducted on the data to identify characteristics that exhibited significant differences, in order to select the most relevant features and optimize the prediction results. Additionally, other feature selection methods, such as Boruta, RFE, and filter methods, were applied, and their effects on the accuracy of each of the three machine learning models used (SVM, RF, and KNN) were compared. 80% of the data were utilized for training, while the remaining 20% were used for validation. The best result was achieved by the KNN model trained with 19 features selected by the Boruta method, followed by the SVM model trained with 24 features selected by the filter method.
ARTICLE | doi:10.20944/preprints202308.1334.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: PM2.5 concentration; feature selection; clustering algorithm; Adaboost integration model
Online: 18 August 2023 (09:49:34 CEST)
Determining accurate PM2.5 pollution concentrations and understanding their dynamic patterns is crucial for scientifically informed air pollution control strategies. Traditional reliance on linear correlation coefficients for ascertaining PM2.5-related factors only uncovers superficial relationships, and the invariance of conventional prediction models restricts their accuracy. To enhance the precision of PM2.5 concentration prediction, this study introduces a novel integrated model that leverages feature selection and a clustering algorithm. Comprising three components, namely feature selection, clustering, and integrated prediction, the model first employs the non-dominated sorting genetic algorithm (NSGA-III) to identify the features most strongly affecting PM2.5 concentration among air pollutants and meteorological factors. This step offers more valuable feature data for the subsequent modules. The model then adopts a two-layer clustering method (SOM + k-means) to analyze the multifaceted irregularity within the dataset. Finally, the model establishes an Extreme Learning Machine (ELM) weak learner for each cluster, integrating the multiple weak learners with the AdaBoost algorithm to obtain a comprehensive prediction model. Through feature correlation enhancement, data irregularity exploration, and model adaptability improvement, the proposed model significantly enhances overall prediction performance. Data sourced from 12 Beijing-based monitoring sites in 2016 were utilized for an empirical study, and the model's results were compared with those of five other predictive models. The outcomes demonstrate that the proposed model significantly improves prediction accuracy, offering useful insights and potential for broader application to multifactor concentration prediction for other pollutants.
ARTICLE | doi:10.20944/preprints202305.0489.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: Covid-19; KNN; SVM; Fractional Fourier transform; Feature Extraction
Online: 8 May 2023 (09:12:58 CEST)
Covid-19 is a lung disease caused by a virus of the coronavirus family. Owing to its extraordinary prevalence and death rate, it has spread quickly to every country in the world, so tracking outbreak peaks and treating different types of relapse are extremely important. Given the worldwide prevalence of the coronavirus and the involvement of physicians in all countries, information has been gathered on the properties of the virus, its variants, and the means of diagnosing it, and numerous approaches have been used to identify this evolving virus. Examining the patient's lungs and chest through a CT scan is generally considered the most accurate and accepted diagnostic method. As part of the feature extraction process, the fractional Fourier transform (FrFT), one of the time-frequency domain transformations, was applied. The proposed method was applied to a database of 2481 CT images. After resizing all images to equal dimensions and removing non-lung areas, multiple combination windows were used to reduce the number of features extracted from the images. In this paper, KNN and SVM classification achieved accuracy values of 99.84% and 99.90%, respectively.
ARTICLE | doi:10.20944/preprints202303.0505.v1
Subject: Social Sciences, Psychology Keywords: emotional prosody; multi-feature oddball; mismatch negativity (MMN); P3a
Online: 29 March 2023 (10:55:44 CEST)
Purpose: Emotional voice conveys important social cues that demand listeners’ attention and timely processing. This event-related potential study investigated the feasibility of a multi-feature oddball paradigm to examine adult listeners’ neural responses to detecting emotional prosody changes in non-repeating naturally spoken words. Method: Thirty-three adult listeners completed the experiment by passively listening to the words in neutral and three alternating emotions while watching a silent movie. Previous research documented pre-attentive change-detection electrophysiological responses (e.g., MMN, P3a) to emotions carried by fixed syllables or words. Given that the MMN and P3a have also been shown to reflect extraction of abstract regularities over repetitive acoustic patterns, the current study employed a multi-feature oddball paradigm to compare listeners’ MMN and P3a to emotional prosody change from neutral to angry, happy, and sad emotions delivered with hundreds of non-repeating words in a single recording session. Results: Both MMN and P3a were successfully elicited by the emotional prosodic change over the varying linguistic context. Angry prosody elicited the strongest MMN compared to happy and sad prosodies. Happy prosody elicited the strongest P3a in the centro-frontal electrodes, and angry prosody elicited the smallest P3a. Conclusions: The results demonstrated that listeners were able to extract the acoustic patterns for each emotional prosody category over constantly changing spoken words. The findings confirm the feasibility of the multi-feature oddball paradigm in investigating emotional speech processing beyond simple acoustic change detection, which may potentially be applied to pediatric and clinical populations.
ARTICLE | doi:10.20944/preprints202203.0399.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: microbiome; genetic algorithm; feature selection; human health; machine learning
Online: 31 March 2022 (08:00:03 CEST)
The relationship between the host and the microbiome, the assemblage of microorganisms (including bacteria, archaea, fungi, and viruses), has been proven crucial for host health and disease development. The high dimensionality of microbiome datasets is often cited as a major difficulty for data analysis, including the use of Machine Learning (ML) and Deep Learning (DL) models. Here we present BiGAMi, a bi-objective genetic algorithm fitness function for feature selection in microbial datasets used to train high-performing phenotype classifiers. The proposed fitness function allowed us to build classifiers that outperformed the baseline performance estimated by the original studies while using as few as 0.04% to 2.32% of the original dataset's features. In 19 out of 21 classification exercises, BiGAMi achieved its results by selecting 6-68% fewer features than the best-performing Sequential Forward Feature Selection run. This study showed that applying a bi-objective GA fitness function to microbiome datasets succeeds in selecting small subsets of bacteria whose contribution to well-studied diseases and host states has already been experimentally proven. Applying this feature selection approach to novel diseases is expected to quickly reveal the microbes most relevant to a specific condition.
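A bi-objective fitness of this kind can be illustrated with a toy scalarised version: classification accuracy minus a penalty on the fraction of features kept. The nearest-centroid classifier, the weighting `alpha`, the mutation scheme, and the synthetic data below are all stand-ins, not BiGAMi's actual components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy microbiome-like data: 40 samples x 30 features; only features 0-2 matter.
X = rng.normal(size=(40, 30))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

def accuracy(mask):
    """Nearest-centroid accuracy on the selected features (stand-in classifier)."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()

def fitness(mask, alpha=0.5):
    """Scalarised bi-objective: maximise accuracy, minimise features kept."""
    return accuracy(mask) - alpha * mask.mean()

# Tiny GA: random boolean population, elitism, bit-flip mutation.
pop = rng.random((20, 30)) < 0.2
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    best = pop[np.argmax(scores)]
    pop = best ^ (rng.random((20, 30)) < 0.05)
    pop[0] = best  # keep the elite unchanged
best = pop[0]
```

The penalty term drives the search toward small feature subsets, mirroring the paper's goal of sparse, high-performing selections.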
ARTICLE | doi:10.20944/preprints202008.0113.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Scene classification; Deep Learning; Convolutional Neural Networks; Feature learning
Online: 5 August 2020 (06:19:27 CEST)
State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures to achieve very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low-dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant classification gains across a wide set of benchmark datasets, while providing additional evidence related to class membership and classification confidence.
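The extended objective can be written as standard cross-entropy plus a distance term pulling each embedding toward its class representative. The weighting `lam` and the toy numbers below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def combined_loss(logits, embeddings, labels, class_reps, lam=0.1):
    """Categorical cross-entropy plus a squared-distance pull of each
    embedding towards its class representative (hypothetical weight lam)."""
    p = softmax(logits)
    n = len(labels)
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    dist = ((embeddings - class_reps[labels]) ** 2).sum(-1).mean()
    return ce + lam * dist

# Two samples, two classes: logits, low-dim embeddings, class representatives.
logits = np.array([[2.0, 0.1], [0.2, 1.5]])
emb = np.array([[0.9, 0.0], [0.1, 1.1]])
reps = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
loss = combined_loss(logits, emb, labels, reps)
```

Setting `lam=0` recovers plain cross-entropy; the extra term is what supplies the class-membership evidence the abstract mentions.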
ARTICLE | doi:10.20944/preprints202008.0051.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Variational autoencoder; Adversarial learning; Deep feature consistent; Data generation
Online: 2 August 2020 (18:11:27 CEST)
We present a method to improve the reconstruction and generation performance of the variational autoencoder (VAE) by injecting adversarial learning. In addition, instead of comparing the reconstructed data with the original data to calculate the reconstruction loss, we use a consistency principle for deep features. The training process of the VAE is divided into two steps: training the encoder and then training the decoder. This two-step learning process allows our method to be used more widely, in applications beyond image processing. While training the encoder, label information is integrated to better structure the latent space in a supervised way. The adversarial constraints allow the decoder to generate data that is more authentic and realistic than that of the conventional VAE. We present experimental results showing that our method outperforms the original VAE.
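The deep-feature-consistency idea, comparing features of the input and the reconstruction rather than raw pixels, can be sketched with a frozen random projection standing in for the pretrained feature network the principle normally assumes:

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed random projection stands in for a frozen, pretrained feature network.
W = rng.normal(size=(64, 16))

def features(x):
    """Hypothetical frozen feature extractor (one nonlinear layer)."""
    return np.tanh(x @ W)

def feature_consistency_loss(x, x_rec):
    """MSE between deep features of the input and its reconstruction,
    used in place of a pixel-space reconstruction loss."""
    return ((features(x) - features(x_rec)) ** 2).mean()

x = rng.normal(size=(8, 64))          # batch of inputs
x_rec = x + 0.01 * rng.normal(size=(8, 64))  # imperfect reconstructions
```

In a real VAE this loss would replace the usual pixel-wise term when training the decoder.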
ARTICLE | doi:10.20944/preprints202003.0081.v1
Subject: Biology And Life Sciences, Biophysics Keywords: COVID-19; electrostatic feature; salt bridging network; structural update
Online: 5 March 2020 (03:37:44 CET)
Since the Coronavirus disease (COVID-19) outbreak at the end of 2019, the past two months have seen an acceleration, both inside and outside China, in the R&D of diagnostics, vaccines and therapeutics for this novel coronavirus. As one of the molecular forces that determine protein structure, electrostatic effects dominate many aspects of protein behaviour and biological function. Thus, incorporating the currently available experimental structures related to COVID-19, this article reports a simple Python-based analysis tool and a LaTeX-based editing tool to extract and summarize the electrostatic features of experimentally determined structures, to strengthen our understanding of COVID-19's structure and function, and to facilitate machine-learning and structure-based computational design of neutralizing antibodies and/or small molecules as potential therapeutic candidates. Finally, the article puts forward a brief update of the structurally observed electrostatic features of the COVID-19 coronavirus.
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: wheat; UAV image; color index; texture feature index; biomass
Online: 26 December 2019 (12:27:49 CET)
To enable rapid and nondestructive monitoring of wheat biomass in the field, experiments covering different planting densities, nitrogen fertilizer treatments and varieties were conducted. RGB images of wheat at the main growth stages were acquired by UAV, colour and texture feature indices were derived by image processing, and wheat biomass was measured by field sampling in the same period. The relationships between the different indices and wheat biomass were then analyzed to select the colour and texture feature indices suitable for biomass estimation. The results showed a high correlation between image colour indices and wheat biomass at different stages, with most reaching a highly significant level, whereas the correlation between image texture feature indices and biomass was poor, with only a few indices reaching a significant or highly significant level. On this basis, the colour indices most correlated with wheat biomass, or combinations of colour and texture feature indices, were used at each growth stage to construct biomass estimation models. The models were validated with independently measured biomass data; the correlation between simulated and measured values reached the significant level and the RMSE was small, indicating that the estimates were reliable and accurate. Models combining the colour and texture feature indices of UAV images also outperformed single-colour-index models. These results provide a new method for real-time monitoring of wheat field growth and biomass estimation.
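The abstract does not name its colour indices. As an illustration, one widely used canopy colour index, excess green (ExG = 2g − r − b on chromatic coordinates), can be computed from a UAV RGB image as follows; the choice of index and the toy pixel values are assumptions:

```python
import numpy as np

def excess_green(rgb):
    """Excess-green index ExG = 2g - r - b on chromatic (sum-normalised)
    coordinates; a common vegetation colour index, shown here as an example
    (the paper does not specify which indices it used)."""
    rgb = rgb.astype(float)
    total = rgb.sum(-1, keepdims=True) + 1e-9   # avoid division by zero
    r, g, b = np.moveaxis(rgb / total, -1, 0)
    return 2 * g - r - b

# A 2x2 toy "UAV image": green canopy pixels vs grey soil-like pixels.
img = np.array([[[10, 200, 10], [100, 100, 100]],
                [[30, 180, 20], [90, 80, 85]]], dtype=np.uint8)
exg = excess_green(img)
```

Canopy pixels score high, soil pixels near zero, so the per-image mean of such an index is the kind of scalar that can be regressed against sampled biomass.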
ARTICLE | doi:10.20944/preprints201705.0142.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: edge detection; hyperspectral image; gravitation; remote sensing; feature space
Online: 19 May 2017 (06:00:18 CEST)
Edge detection is one of the key issues in computer vision and remote sensing image analysis. Although many edge-detection methods have been proposed for gray-scale, color, and multispectral images, they still face difficulties when extracting edge features from hyperspectral images (HSIs), which contain a large number of bands separated by very narrow gaps in the spectral domain. Inspired by the clustering characteristic of gravitation, a novel edge-detection algorithm for HSIs is presented in this paper. In the proposed method, we first construct a joint feature space by combining the spatial and spectral features. Each pixel of the HSI is treated as a celestial object in this joint feature space that exerts a gravitational force on each of its neighboring pixels. Accordingly, each object travels through the joint feature space until it reaches a stable equilibrium, at which point the image is smoothed and the edges are enhanced, so that edge pixels can easily be distinguished by calculating the gravitational potential energy. The proposed edge-detection method is tested on several benchmark HSIs, and the results are compared with those of three state-of-the-art approaches. The experimental results confirm the efficacy of the proposed method.
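A much-simplified, mean-shift-like version of the gravitational smoothing step can be sketched as follows. The 4-neighbourhood, the particular weight function, and the periodic boundary that `np.roll` implies are all simplifying assumptions, not the paper's formulation:

```python
import numpy as np

def gravity_smooth(F, iters=5, step=0.5):
    """Pixels move in the joint spatial-spectral feature space, pulled toward
    their 4-neighbours with weights that fall off with feature distance, so
    flat regions are smoothed while pixels across an edge barely attract
    each other (a simplified sketch of the gravitational model)."""
    F = F.astype(float).copy()
    for _ in range(iters):
        num = np.zeros_like(F)
        den = np.zeros(F.shape[:2])[..., None]
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nb = np.roll(F, (dy, dx), axis=(0, 1))  # periodic boundary, for brevity
            w = 1.0 / (1.0 + ((nb - F) ** 2).sum(-1, keepdims=True))
            num += w * (nb - F)
            den += w
        F += step * num / den   # step toward the weighted mean of neighbours
    return F

# Toy 2-band "hyperspectral" image with a vertical edge plus noise.
rng = np.random.default_rng(0)
F = np.zeros((6, 6, 2))
F[:, 3:] = 5.0
F += 0.1 * rng.normal(size=F.shape)
S = gravity_smooth(F)
```

After a few iterations the two sides of the edge stay well separated, which is what makes the subsequent potential-energy edge map easy to threshold.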
ARTICLE | doi:10.20944/preprints202003.0299.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Data Mining; Alzheimer’s Dementia; Composite Hybrid Feature Selection; Machine learning; stack Hybrid Classification; AI; MRI; Neuroimaging; MPEG7 edge histogram feature extraction; CNN
Online: 19 March 2020 (11:25:01 CET)
Alzheimer's disease (AD) detection plays an essential role in global health care: AD is frequently misdiagnosed because it shares many clinical signs with other types of dementia, and monitoring disease progression over time with magnetic resonance imaging (MRI) is costly and subject to human error in manual reading. This paper presents a comparative study of the performance of data mining techniques on two AD datasets, one clinical and one neuroimaging. In the first stage of the proposed model, the clinical dataset is passed through a composite hybrid feature selection (CHFS) step that extracts new features and selects the best ones, eliminating obscure features. In parallel, a novel hybrid feature extraction combining three batch edge-detection algorithms with texture features is applied to the MRI image dataset and optimized with a fuzzy 64-bin histogram. In the second stage, the clinical dataset is fed to a stacked hybrid classification (SHC) model that combines JRip and random forest classifiers, with six models evaluated individually as meta-classifiers, to improve the prediction of clinical diagnosis. At the same stage, classification of the neuroimaging (MRI) dataset is performed with a convolutional neural network (CNN) run on the extracted image features and compared with traditional classifiers. The clinical dataset of 426 subjects (1229 potential patient samples) was collected from oasis.org, and the MRI dataset, roughly 5,000 images segregated by Alzheimer's severity, from a kaggle.com benchmark. The datasets were analyzed using the Explorer interface of the Weka data mining software.
The experiments show that the proposed CHFS feature extraction effectively reduced the false-negative rate and achieved a relatively high overall accuracy: stacked hybrid classification with a support vector machine (SVM) meta-classifier reached 96.50%, compared with 68.83% in previous work on the clinical dataset and with 80.21% for the CNN model on the MRI image dataset. These results demonstrate the superiority of the CHFS model in predicting early-stage Alzheimer's disease more accurately from the clinical medical dataset than from the neuroimaging (MRI) dataset: the proposed model classifies clinical samples accurately and at low cost compared with the MRI-CNN model, while the SHC model still yields a good classification rate when applied to the MRI images.
ARTICLE | doi:10.20944/preprints202312.0577.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Forest Fire Prediction; CRISP-DM Methodology; Feature Selection; Decision Tree
Online: 8 December 2023 (10:26:24 CET)
Although considered natural components of many ecosystems, forest fires pose significant threats to the environment and human health. In order to ensure public safety and effective fire suppression planning, it is necessary to develop reliable prediction models to mitigate forest fire danger. These models should account for specific environmental conditions. The advent of big data in recent years has opened new avenues for improving forest fire predictions. Machine learning techniques, surpassing traditional forecasting methods, have shown significant promise in this area. By applying the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to a real-world dataset, this paper explores the application of machine learning approaches to understand forest fire patterns and predict fire danger. We consider six distinct time stages and incorporate feature selection to refine our predictions. It is important to note that forest fire behavior models are not universally effective due to geographical variations in data. Nevertheless, advanced decision-making techniques are vital in forest fire management. Our research presents a systematic exploration of the topic, comparing various machine learning models and thereby providing a comprehensive baseline for future investigations in this crucial environmental arena.
ARTICLE | doi:10.20944/preprints202311.1681.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: information extraction; entity relation extraction; graph neural networks; dependency feature
Online: 27 November 2023 (07:55:12 CET)
In text analysis, syntactic dependency trees provide a structural roadmap for identifying the relations among entities embedded in text. The challenge lies in sifting through this intricate structure to extract the relevant information. Traditional approaches rely on rule-based pruning to simplify dependency structures, focusing on certain parts while discarding others; this can overlook critical nuances and connections that are vital for a comprehensive understanding of the text. Addressing this gap, our research introduces Syntactic Dependency-Aware Neural Networks (SDANNs), a model designed to harness the full power of the entire dependency tree. Instead of rigid rule-based pruning, SDANNs implement a more flexible and dynamic 'soft-pruning' technique, which allows the model to adaptively focus on the sub-structures of the dependency tree that are most relevant for understanding the relationships between entities, ensuring that no vital information is overlooked and all potential connections are considered.
The efficacy of SDANNs has been empirically validated across a wide range of tasks, including the extraction of relations spanning multiple sentences as well as sentence-level analyses. In all of these scenarios, SDANNs leverage the full structural complexity of dependency trees and consistently exceed the performance of prior models, handling the multifaceted and often subtle interactions within text that conventional methods miss. By fully embracing the complexity of syntactic dependency trees through soft pruning, SDANNs open new avenues for more accurate, nuanced, and comprehensive analyses of the relationships that exist within written language.
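The 'soft-pruning' idea can be illustrated with a single attention-weighted graph layer over a toy dependency structure: the tree's edges are all kept, but each receives a learned weight instead of being kept or discarded by rule. The parameterisation below is a hypothetical sketch, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def soft_pruned_layer(H, A, Wq, Wk, Wv):
    """One graph layer with soft pruning: attention assigns every dependency
    edge a learned weight, so no subtree is hard-discarded (illustrative
    parameterisation; the paper's exact layer may differ)."""
    scores = (H @ Wq) @ (H @ Wk).T / np.sqrt(H.shape[1])
    scores = np.where(A > 0, scores, -1e9)   # restrict to edges of the tree
    att = softmax(scores)                     # soft weights over kept edges
    return np.tanh(att @ (H @ Wv))

rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.normal(size=(n, d))                            # token embeddings
A = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # toy chain "tree"
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H2 = soft_pruned_layer(H, A, Wq, Wk, Wv)
```

Rule-based pruning would zero out rows of `A`; here every edge survives and the attention scores decide how much each contributes.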
ARTICLE | doi:10.20944/preprints202310.1324.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: feature selection; milk mid-infrared spectra; fatty acids concentration; regression
Online: 20 October 2023 (10:15:41 CEST)
Milk MIR spectra have been shown to provide valuable information on a wide range of traits for use in dairy cattle breeding programs. Selecting the most informative variables from such complex data can improve prediction accuracy and model robustness and, consequently, the interpretability of MIR spectra. We therefore aimed to investigate the prediction performance of feature selection methods based on MIR spectra data, using the milk fatty acid (FA) profile as an example to illustrate the evaluated procedure. MIR spectra, milk test-day records, and reference FA concentrations of 155 first-parity Holstein cows were used in the analyses. Four models comprising different explanatory variables and five feature selection methods were evaluated. The results indicated that the Competitive Adaptive Reweighted Sampling (CARS) method can effectively select the most informative variables from the MIR spectra, yielding higher prediction accuracies than the other variable selection approaches. The model including the selected MIR spectra and cow information variables [days in milk at the test day, age at the test day, pregnancy stage (in days), number of days open, number of inseminations, and somatic cell count] gave the best FA profile predictions based on Partial Least Squares regression. In particular, ten FAs (C8:0, C10:0, C14:1, C17:0 isomers, C18:1, C18:1 isomer, medium-chain FA, unsaturated FA, monounsaturated FA, and polyunsaturated FA) showed accuracies, based on the determination coefficient (R2cv), ranging from 0.66 to 0.85 in internal validation and from 0.65 to 0.84 in external validation. By running CARS 1,000 times in internal validations, we obtained the selection frequency of milk MIR wavenumbers for 35 FAs. The wavenumbers most related to FAs were found within 1,003 to 1,145 cm-1, while other discrete areas lay between 1,651 to 1,797 and 2,834 to 2,954 cm-1. These biomarkers may give insights into the relationship between MIR spectra and FA phenotypes.
In conclusion, using CARS and cow information improved predictions of FAs based on MIR spectra in Chinese Holstein dairy cows. Additional validation studies should be conducted as larger datasets become available.
ARTICLE | doi:10.20944/preprints202310.1230.v1
Subject: Engineering, Aerospace Engineering Keywords: fatigue recognition; Air traffic controller; Feature fusion; Multi-mode; scheduling
Online: 19 October 2023 (08:42:59 CEST)
An algorithm is proposed for discriminating the fatigue state of air traffic controllers by applying multi-speech-feature fusion with an FSVM to voice data, and for extracting eye-fatigue-state discrimination features from PERCLOS eye data. Building on the speech algorithm and the eye-fatigue index, a new controller fatigue-state evaluation index based on the entropy weight method is proposed, formed by decision-level fusion of the fatigue discrimination results for speech and the eyes. Experimental results show a fatigue-state recognition accuracy of 84.81% for the combined evaluation index, which was 3.36% and 1.86% higher than the speech and eye assessments alone, respectively. The comprehensive fatigue evaluation index provides an important reference for controller scheduling and mental-state evaluation.
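The entropy weight method used for the decision-level fusion has a standard closed form: indicators whose values vary more across samples carry more information and receive larger weights. The sketch below applies that formula to invented speech/eye scores; only the formula itself is standard, the data and the two-column setup are assumptions:

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method: column-normalise the indicator matrix, compute
    each indicator's entropy, and weight by its degree of diversification."""
    P = X / X.sum(axis=0, keepdims=True)            # p_ij per indicator column
    n = X.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    e = -plogp.sum(axis=0) / np.log(n)              # entropy e_j in [0, 1]
    d = 1.0 - e                                     # degree of diversification
    return d / d.sum()                              # weights summing to 1

# Toy matrix: rows = observation windows, columns = (speech score, eye score).
X = np.array([[0.2, 0.50],
              [0.8, 0.52],
              [0.5, 0.48],
              [0.9, 0.51]])
w = entropy_weights(X)
fused = X @ w   # fused fatigue index per observation window
```

Here the speech column varies much more than the eye column, so it receives the larger weight in the fused index.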
REVIEW | doi:10.20944/preprints202309.1939.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: HAR, Human Activity Recognition, Feature extraction, Machine Learning, Computer Vision
Online: 28 September 2023 (11:13:40 CEST)
Human Action Recognition (HAR) is widely used in multiple fields to recognize activities and to extract spatial and temporal information. This paper analyzes various HAR methods and provides extensive coverage of the field's foundational concepts. Since a dataset is crucial to any research study, we also discuss popular datasets and their features.
ARTICLE | doi:10.20944/preprints202309.1699.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Image retrieval; Deep learning; Multi-scale feature; Deep supervised hashing
Online: 26 September 2023 (05:13:58 CEST)
Deep networks-based hashing has gained significant popularity in recent years, particularly in the field of image retrieval. However, most existing methods only focus on extracting semantic information from the final layer, disregarding valuable structural information that contains important semantic details crucial for effective hash learning. To address this limitation and improve image retrieval accuracy, we propose a novel deep hashing method called Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF). Our approach involves extracting multiscale features from multiple convolutional layers and fusing them to generate more robust representations for efficient image retrieval. Experimental results on CIFAR10 and NUS-WIDE datasets demonstrate that our method surpasses the performance of state-of-the-art hashing techniques.
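Multiscale fusion for hashing can be sketched as pooling activations from several network stages, concatenating them, and binarising a projection into hash bits. The stage sizes, the random projection, and the tanh relaxation of sign() below are illustrative assumptions, not DSHFMDF's learned components:

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(feature_maps, proj):
    """Global-average-pool features from several (hypothetical) conv stages,
    concatenate them into one multiscale descriptor, project, and binarise
    into hash bits via a tanh-relaxed sign()."""
    pooled = [f.mean(axis=(2, 3)) for f in feature_maps]   # (N, C_i) each
    fused = np.concatenate(pooled, axis=1)                 # multiscale fusion
    return (np.tanh(fused @ proj) > 0).astype(np.uint8)    # {0, 1} hash bits

# Fake activations from three stages of a CNN for a batch of 4 images.
maps = [rng.normal(size=(4, c, 8, 8)) for c in (16, 32, 64)]
proj = rng.normal(size=(16 + 32 + 64, 16))                 # 16-bit codes
codes = hash_codes(maps, proj)
```

Retrieval then reduces to comparing Hamming distances between such codes, with the fused descriptor carrying both structural and semantic information.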
ARTICLE | doi:10.20944/preprints202309.0156.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning; feature attribution; gaussian noise; LSTM; precipitation prediction; RMSE
Online: 5 September 2023 (03:03:04 CEST)
This paper explores the use of different deep learning models for predicting precipitation in 56 meteorological stations in Jilin Province, China. The models used include Stacked-LSTM, Transformer, and SVR, and Gaussian noise is added to the data to improve their robustness. Results show that the Stacked-LSTM model performs the best, achieving high prediction accuracy and stability. The study also conducts variable attribution analysis using LightGBM and finds that temperature, dew point, precipitation in previous days, and air pressure are the most important factors affecting precipitation prediction, which is consistent with traditional meteorological theory. The paper provides detailed information on the data processing, model training, and parameter settings, which can serve as a reference for future precipitation prediction tasks. The findings suggest that adding Gaussian noise to the dataset can improve the model's generalization ability, especially for predicting days with zero precipitation. Overall, this study provides useful insights into the application of deep learning models in precipitation prediction and can contribute to the development of meteorological forecasting and applications.
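The Gaussian-noise augmentation the study credits for better generalization can be sketched in a few lines; the noise scale, the number of copies, and the window shape are guesses rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_with_noise(X, y, sigma=0.05, copies=2):
    """Append noisy copies of the training windows: a common robustness trick
    matching the paper's Gaussian-noise idea (sigma is an assumed scale)."""
    Xs = [X] + [X + rng.normal(scale=sigma, size=X.shape) for _ in range(copies)]
    ys = [y] * (copies + 1)
    return np.concatenate(Xs), np.concatenate(ys)

X = rng.normal(size=(100, 7, 5))   # 100 windows of 7 days x 5 weather features
y = rng.random(100)                # next-day precipitation target
Xa, ya = augment_with_noise(X, y)
```

Training an LSTM on `Xa, ya` instead of `X, y` exposes it to perturbed inputs with unchanged targets, which is what improves robustness on near-zero-precipitation days.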
ARTICLE | doi:10.20944/preprints202308.1885.v1
Subject: Chemistry And Materials Science, Food Chemistry Keywords: Cheminformatics; Taste prediction; Machine learning; Deep learning; Molecular feature representation
Online: 29 August 2023 (03:06:30 CEST)
Taste determination in small molecules is critical in food chemistry, but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2,601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
ARTICLE | doi:10.20944/preprints202308.0373.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: ReID; Pyramid Vision Transformer; local feature clustering; side information embeddings
Online: 4 August 2023 (07:26:19 CEST)
Due to the influence of background conditions, lighting conditions, occlusion and image resolution, extracting robust person features is one of the difficulties in ReID research. Vision Transformers (ViT) have achieved significant results in the field of computer vision, but slow extraction of person features and difficulty in exploiting local features of people still limit their application to ReID. To solve these problems, we use the Pyramid Vision Transformer (PVT) as the feature-extraction backbone and propose a PVT-based ReID method in conjunction with other studies. First, we apply several ReID-specific improvements to the PVT backbone and establish a baseline model using powerful methods already verified on CNN-based ReID. Second, to further promote the robustness of the person features extracted by the PVT backbone, two new modules are designed. (1) Local feature clustering (LFC) enhances the robustness of person features by calculating the distance between local features and the global feature, selecting the most discrete local features and clustering them. (2) Side information embeddings (SIE) encode non-visual information and feed it into the network during training to reduce its impact on person features. Finally, experiments show that PVTReID achieves excellent results on ReID datasets and is on average 20% faster than CNN-based ReID methods.
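The distance-based selection inside the LFC module can be sketched as ranking patch features by their distance from the global feature and keeping the most discrete ones. Treating the global feature as the mean of the local tokens and averaging the selected features are simplifying assumptions, not the module's actual definition:

```python
import numpy as np

def most_discrete_locals(local_feats, global_feat, k=4):
    """Rank local (patch) features by distance from the global feature and
    keep the k most discrete ones, then average them into one extra
    descriptor (a simplified sketch of the LFC selection step)."""
    d = np.linalg.norm(local_feats - global_feat, axis=1)
    idx = np.argsort(d)[-k:]           # farthest from global = most discrete
    return local_feats[idx].mean(axis=0), idx

rng = np.random.default_rng(0)
locals_ = rng.normal(size=(12, 32))    # 12 patch tokens, dimension 32
global_ = locals_.mean(axis=0)         # global token as the mean (assumption)
desc, idx = most_discrete_locals(locals_, global_, k=4)
```

The extra descriptor summarises the parts of the person image that deviate most from the global appearance, which is what the clustering step is meant to make robust.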
ARTICLE | doi:10.20944/preprints202306.0161.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: biometrics; deep learning; time series; feature selection; classification; accelerometer; Sustainability
Online: 2 June 2023 (09:00:51 CEST)
With the growing popularity of smartphones, user identification has become an essential component of maintaining security and privacy. This study investigates how smartphone accelerometer data can be used to identify users and makes recommendations for the ideal application settings. Accelerometer data from the public HMOG dataset were used to train deep learning, conventional, and voting classifiers, which were then used to identify users. To enhance performance, feature selection and pre-processing techniques were investigated. The results show that RFE feature selection outperforms the other approaches, and that LSTM followed by XGBoost gives the best identification performance across a wide range of machine learning performance measures. Even with a larger number of users, the proposed identification system performed well and outperformed existing methods that were principally created and tested on the same HMOG public smartphone dataset. Further work would nevertheless be necessary for such an application to reach its full potential.
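Recursive feature elimination (RFE) itself is simple to sketch: repeatedly fit a model and drop the weakest feature. The least-squares model below stands in for the classifiers used in the study, and the synthetic data is invented:

```python
import numpy as np

def rfe_least_squares(X, y, n_keep):
    """Recursive feature elimination: repeatedly fit a linear model and drop
    the feature with the smallest |coefficient| (least squares stands in for
    the classifiers used in the paper; features assumed comparably scaled)."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
        keep.pop(int(np.argmin(np.abs(coef))))
    return keep

# Synthetic "accelerometer features": only columns 2 and 7 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + 0.1 * rng.normal(size=200)
selected = rfe_least_squares(X, y, n_keep=2)
```

Because elimination is recursive, coefficients are re-estimated after every drop, which is what lets RFE outperform one-shot rankings.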
ARTICLE | doi:10.20944/preprints202305.1866.v1
Subject: Engineering, Mechanical Engineering Keywords: Computational heat transfer; Coating; Feature combination; Machine learning; Heat-exchangers
Online: 26 May 2023 (05:38:40 CEST)
Cross flow heat exchangers are commonly used in the thermal industry to transfer heat from hot tubes to cooling fluid. To protect the heat exchanger tubes from corrosion and dust accumulation, microscale coatings are often applied. In this study, we present machine-learning models for predicting heat transfer from hot tubes with different micro-sized coatings to cooling fluid in a turbulent flow using computational fluid dynamics simulations. A dataset of approximately 1000 cases was generated by varying the coating coverage thickness of each tube, the inlet Reynolds number, fluid flow inlet temperature, and wall temperature of tubes. The machine-learning models were generated to predict the overall heat flow rate in the heat exchanger, and it was found that combining the features based on their importance preserved the accuracy of the models while maintaining all the relevant information. The simulation results demonstrate that the proposed method increases the coefficient of determination (R2) for the models. The R2 values for unseen data for Random Forest, K-Nearest Neighbors, and Support Vector Regression were 0.9810, 0.9037, and 0.9754, respectively, indicating the usefulness of the proposed model for predicting heat transfer in various types of heat exchangers.
ARTICLE | doi:10.20944/preprints202305.1519.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: crop prediction; machine learning; feature selection; artificial intelligent; smart farming
Online: 22 May 2023 (11:24:05 CEST)
This research investigates the potential benefits of integrating machine learning algorithms and IoT sensors in modern agriculture. The focus is on optimizing crop production and reducing waste through informed decisions about planting, watering, and harvesting crops. The paper discusses the current state of machine learning and IoT in agriculture, highlighting key challenges and opportunities. It also presents experimental results that demonstrate the impact of changing labels on the accuracy of data analysis algorithms. The findings suggest that by analyzing wide-ranging data collected from farms, including real-time data from IoT sensors, farmers can make more informed decisions about the factors that affect crop growth. Ultimately, the integration of these technologies can transform modern agriculture by increasing crop yields while minimizing waste. In our experiments, we achieve a classification accuracy of 99.59% using the Bayes Net algorithm and 99.46% using the Naïve Bayes and Hoeffding Tree algorithms. These high-accuracy results indicate the approach's potential for supporting improved crop growth.