ARTICLE | doi:10.20944/preprints201904.0095.v2
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: EMR; SVM; Classification; Clustering
Online: 12 April 2019 (20:53:16 CEST)
Recently, the Critical Pathway (CP) of the Electronic Medical Record (EMR) has been used as a treatment guideline in public hospitals. We propose a healthcare promotion service that uses disease patterns together with lifestyle risk factors. We classify historical patient data by disease code and lifestyle risk factors (hypertension, diabetes, smoking, overweight, excessive alcohol intake, and low physical activity) to derive the lifestyle risk factors associated with each disease. We then cluster disease codes with lifestyle risk factors using historical data from the EMR's electronic discharge summaries. Based on the resulting disease patterns, we provide a healthcare recommendation service. This makes it possible to build a medical help desk for a public hospital that supports people as they check in: it explains the treatment procedure, the preferred clinical treatment method for each disease code, and how to improve their health. We evaluate the proposed system experimentally on datasets collected at the medical center and report the results.
ARTICLE | doi:10.20944/preprints202105.0441.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: emotion recognition; MLP; SVM; RAVDESS
Online: 19 May 2021 (12:53:55 CEST)
Herein, we compare the performance of SVM and MLP in emotion recognition using the speech and song channels of the RAVDESS dataset. We extracted various audio features and identified the optimal scaling strategy and hyperparameters for our models. To increase the sample size, we performed audio data augmentation and addressed data imbalance using SMOTE. Our data indicate that the optimised SVM outperforms the MLP, with an accuracy of 82% compared to 75%. Following data augmentation, the performance of both algorithms was identical at ~79%; however, overfitting was evident for the SVM. Our final exploration indicated that SVM and MLP performed similarly, both yielding lower accuracy for the speech channel than for the song channel. Our findings suggest that both SVM and MLP are powerful classifiers for emotion recognition in a vocal-dependent manner.
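The abstract names SMOTE for handling class imbalance but gives no detail. As a rough illustration only, the core idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function names, the value of `k`, and the toy data below are assumptions made for this sketch and are not taken from the paper.

```python
import random


def nearest_neighbors(x, samples, k):
    """Return the k samples closest to x (squared Euclidean distance), excluding x itself."""
    others = [s for s in samples if s is not x]
    others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)))
    return others[:k]


def smote_sample(x, neighbors, rng):
    """Create one synthetic point on the line segment between x and a random neighbour."""
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


def oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples (illustrative, not the authors' pipeline)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        synthetic.append(smote_sample(x, nearest_neighbors(x, minority, k), rng))
    return synthetic


# Toy minority class: four 2-D feature vectors (hypothetical values).
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = oversample(minority_class, n_new=5)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region already covered by the minority class.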
ARTICLE | doi:10.20944/preprints201703.0156.v2
Online: 21 March 2017 (03:49:41 CET)
One of the challenges in Content-Based Image Retrieval (CBIR) is to reduce the semantic gaps between low-level features and high-level semantic concepts. In CBIR, the images are represented in the feature space and the performance of CBIR depends on the type of selected feature representation. Late fusion also known as visual words integration is applied to enhance the performance of image retrieval. The recent advances in image retrieval diverted the focus of research towards the use of binary descriptors as they are reported computationally efficient. In this paper, we aim to investigate the late fusion of Fast Retina Keypoint (FREAK) and Scale Invariant Feature Transform (SIFT). The late fusion of binary and local descriptor is selected because among binary descriptors, FREAK has shown good results in classification-based problems while SIFT is robust to translation, scaling, rotation and small distortions. The late fusion of FREAK and SIFT integrates the performance of both feature descriptors for an effective image retrieval. Experimental results and comparisons show that the proposed late fusion enhances the performances of image retrieval.
ARTICLE | doi:10.20944/preprints202201.0415.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Drought tolerance index; Stress tolerance index; MLP; SVM; MLP-GA; SVM-GA; Genetic Algorithm
Online: 27 January 2022 (11:21:14 CET)
Maize (Zea mays subsp. mays) is a staple food crop worldwide. In this study, multi-layer perceptron (MLP), support vector machine (SVM), genetic algorithm-based multi-layer perceptron (MLP-GA), and genetic algorithm-based support vector machine (SVM-GA) hybrid artificial intelligence algorithms were used to predict drought tolerance and stress tolerance indices in teosinte maize lines. The gamma test technique was applied to determine efficient input and output vectors. The potential of the developed models was evaluated based on statistical indices and graphical representation. Results of the gamma test, based on the lowest gamma and standard error values, show that the combination of day of anthesis (DOA), day of silking (DOS), yield index (YI), and gross yield per plant (GYP) was the most efficient input vector for both the drought-tolerant index (DTI) and the stress-tolerant index (STI). The results of the MLP, SVM, MLP-GA, and SVM-GA algorithms were compared based on statistical indices and visual interpretation, and all were satisfactory for predicting the DTI and STI in maize. The genetic algorithm-based hybrid models (MLP-GA and SVM-GA) gave better predictions of the DTI and STI. Overall, the SVM-GA model showed the highest potential to forecast the DTI and STI in maize compared with the MLP, SVM, and MLP-GA models.
Subject: Engineering, Electrical & Electronic Engineering Keywords: coronavirus; COVID-19; diagnosis; deep features; SVM
Online: 22 April 2020 (05:58:22 CEST)
The detection of coronavirus (COVID-19) is now a critical task for medical practitioners. The coronavirus spreads quickly between people and has affected close to 100,000 people worldwide. It is therefore essential to identify infected people so that measures can be taken to prevent further spread. In this paper, a methodology based on deep features plus a support vector machine (SVM) is proposed for detecting coronavirus-infected patients from X-ray images. An SVM is used for classification instead of a deep learning based classifier, as the latter needs a large dataset for training and validation. The deep features from the fully connected layer of a CNN model are extracted and fed to the SVM for classification. The SVM separates the corona-affected X-ray images from the others. The methodology covers three categories of X-ray images: COVID-19, pneumonia, and normal. The method helps medical practitioners distinguish between COVID-19 patients, pneumonia patients, and healthy people. The SVM is evaluated for detection of COVID-19 using the deep features of 13 different CNN models. The SVM produced the best results using the deep features of ResNet50. The classification model, i.e., ResNet50 plus SVM, achieved accuracy, sensitivity, FPR, and F1 score of 95.33%, 95.33%, 2.33%, and 95.34%, respectively, for detection of COVID-19 (ignoring SARS, MERS, and ARDS). The highest accuracy achieved by ResNet50 plus SVM is 98.66%. The results are based on the X-ray images available in repositories on GitHub and Kaggle. As the dataset contains only hundreds of images, classification based on SVM is more robust than the transfer learning approach. A comparison with traditional classification methods is also carried out. The traditional methods are local binary patterns (LBP) plus SVM, histogram of oriented gradients (HOG) plus SVM, and Gray Level Co-occurrence Matrix (GLCM) plus SVM. Among the traditional image classification methods, LBP plus SVM achieved an accuracy of 93.4%.
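The accuracy, sensitivity, FPR, and F1 score quoted above all derive from confusion-matrix counts. The sketch below shows the standard binary definitions; for a three-class problem such as this one they would typically be computed one-vs-rest per class. The counts are invented for illustration and are not the paper's data, and the function assumes all denominators are non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty class, at least one
    positive prediction).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # also called recall or true positive rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "fpr": false_positive_rate,
        "f1": f1,
    }


# Hypothetical counts for one class treated as "positive" against the rest.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Reporting FPR alongside sensitivity matters in a screening setting, because a classifier can reach high sensitivity simply by labelling most images as positive.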
ARTICLE | doi:10.20944/preprints201909.0082.v1
Online: 7 September 2019 (01:19:04 CEST)
This study presents an analysis of subsidence rates and their effects on Mexico City. Mexico City is well known for subsidence caused by many years of excess water withdrawal. This study addresses the problem by integrating Interferometric Synthetic Aperture Radar (InSAR), Continuous Global Positioning System (CGPS), and optical remote sensing data. Fifty-two ENVISAT-ASAR images, data from nine GPS stations, and one Landsat ETM+ image of the Mexico City area were analyzed to better understand the subsidence rates and their effects on Mexico City's communities. The InSAR methods used include differential interferometry and Persistent Scatterer Interferometry (PSI) to monitor the existing subsidence in the Mexico City area. The InSAR data cover the period from 2002 to June 2010, and the GPS data cover the period from 1998 to 2012. A maximum change of 352 mm per year in the Line Of Sight (LOS) direction agrees with previous geodetic studies. The InSAR data were compared with CGPS data over the same time interval. The findings reveal a high correlation (up to 0.98) between the two independent geodetic methods. We also applied Support Vector Machine (SVM) analysis to the Landsat ETM+ image to classify Mexico City's population density areas. This classification was used to compare the subsidence rates with built-up populated areas. The integrated study shows that the fastest subsidence zone (i.e., areas with rates greater than 100 mm/yr) in the above-mentioned period occurs in highly and sparsely populated areas.
ARTICLE | doi:10.20944/preprints201803.0128.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: moment invariants; ZM; PZM; OFMM; SVM; PSO
Online: 16 March 2018 (05:26:31 CET)
This paper applies orthogonal moments (OMs), namely Zernike Moments (ZM), Pseudo Zernike Moments (PZM), and Orthogonal Fourier Mellin Moments (OFMM), to the analysis of melanoma images. The moment invariants may vary with geometric variations. For the analysis, one hundred random melanoma images and one hundred non-melanoma images were taken from a database of 570 melanoma images and 250 non-melanoma images, respectively. The orthogonal moments were computed by varying the phase angle from 10° to 40° in equal steps of 10° for the orders 2, 4, 8, 16, 32, 64, 128, and 256. Particle Swarm Optimization (PSO) was used to select the optimal OMs. The extracted optimal OMs were then used to classify melanoma images. A Support Vector Machine (SVM) was used for classification and achieved a sensitivity of 88.78%.
ARTICLE | doi:10.20944/preprints202206.0163.v1
Subject: Engineering, Civil Engineering Keywords: MARS; SVM; RF; rainfall; runoff; rainfall-runoff modelling
Online: 13 June 2022 (03:29:36 CEST)
Great attention is now being paid to the study of runoff and its fluctuation over space and time. There is a crucial need for a good soil and water management system to overcome the challenges of water scarcity and natural adverse events such as floods and landslides. Rainfall-runoff modeling is an appropriate approach for runoff prediction, making it possible to take preventive measures against damage caused by natural hazards such as floods. In the present study, several data-driven models, namely multiple linear regression (MLR), multivariate adaptive regression splines (MARS), support vector machine (SVM), and random forest (RF), were used for rainfall-runoff prediction in the Gola watershed, located in the south-eastern part of Uttarakhand. The performance of the models was evaluated using the coefficient of determination (R2), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and percent bias (PBIAS). In addition to the numerical comparison, the models were evaluated graphically using line diagrams, scatter plots, violin plots, relative error plots, and Taylor diagrams (TD). The comparison revealed that the machine learning methods gave higher accuracy than the MLR model. Among the machine learning models, the RF (RMSE (m3/s), R2, NSE, and PBIAS (%) = 6.31, 0.96, 0.94, and -0.20 during the training period, respectively, and 5.53, 0.95, 0.92, and -0.20 during the testing period, respectively) surpassed the MARS, SVM, and MLR models in forecasting daily runoff in all cases studied. Among all four models, the RF model performed best in both the training and testing periods. In summary, the RF model is the best of the models tested and shows strong potential for runoff prediction in the Gola watershed.
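The four indices used above have widely used textbook definitions, sketched here for reference. Two conventions are assumed because the abstract does not specify them: R2 is taken as the squared Pearson correlation between observed and simulated values, and PBIAS is computed as 100 × Σ(obs − sim) / Σ(obs), so that negative values indicate overestimation. The sample values are invented.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Squared Pearson correlation (assumes neither series is constant)."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mo = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - residual / sum((o - mo) ** 2 for o in obs)


def pbias(obs, sim):
    """Percent bias under the assumed convention (negative = overestimation)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical daily runoff values (m3/s); sim overestimates by 1 everywhere.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(observed, simulated), r_squared(observed, simulated),
          nse(observed, simulated), pbias(observed, simulated))
```

The toy series shows why several indices are reported together: a constant offset leaves R2 at 1 while NSE drops sharply and PBIAS reveals the systematic bias.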
ARTICLE | doi:10.20944/preprints202007.0628.v2
Subject: Engineering, Energy & Fuel Technology Keywords: DFIG; SVM; VC; Wind Turbine (WT); parameters uncertainly
Online: 16 June 2021 (12:04:39 CEST)
This paper presents a super-twisting algorithm (STA) direct power control (DPC) scheme for controlling the active and reactive powers of a grid-connected DFIG. Simulations of a 5 kW DFIG are presented to validate the effectiveness and robustness of the proposed approach, relative to vector control (VC), in the presence of uncertainties. The proposed controller schemes with fixed gains effectively reduce the ripple of the active and reactive powers and suppress sliding-mode chattering, and parametric uncertainties do not affect system performance.
This paper presents a comparative study of two approaches for the direct power control (DPC) of a doubly-fed induction generator (DFIG) based wind energy conversion system (WECS): Vector Control (VC) and Sliding Mode Control (SMC). Simulations of a 5 kW DFIG in the presence of various uncertainties were carried out to evaluate the capability and robustness of the proposed control scheme. The SMC strategy is the most appropriate scheme, offering the best combination of reduced power ripple and diminished steady-state error, while machine parameter variations do not change the system performance.
Online: 13 January 2021 (11:09:03 CET)
Sound signals from the respiratory system are important indicators of human health. Early diagnosis of respiratory tract diseases is of great importance, as delayed diagnosis can have irreversible effects on health. Machine learning and signal processing have made such diagnosis possible. The coronavirus epidemic, which has deeply shaken the whole world, has made the importance of this issue even clearer. During the pandemic, researchers have focused on differentiating its symptoms from those of similar diseases such as the common flu or influenza. Among these symptoms, the difference in cough sound plays a distinctive role in the proposed study. Several pioneering studies have shown that almost two-thirds of people who contract the coronavirus have a dry cough, and cough-based studies form the main framework of our work, which relies on machine learning algorithms. Clinical data collected under the supervision of doctors in a reliable environment were used as the dataset. It consists of 16 subjects suspected of having the coronavirus, with a specific patient demographic. Using the polymerase chain reaction (PCR) test, the suspected subjects were divided into negative and positive groups, with the labels representing patients with a non-COVID cough and a COVID-19 cough, respectively. A 3D plot, or waterfall representation, of the signal frequency spectrum reveals the salient features of the cough data. COVID-19 coughs can thus be differentiated from other coughs by applying effective feature extraction and classification techniques. Power Spectral Density (PSD) based on the Short Time Fourier Transform (STFT) and Mel Frequency Cepstral Coefficients (MFCC) were chosen as the feature extraction methods. Finally, the Support Vector Machine (SVM) algorithm was applied to the processed signals to identify and classify COVID-19 coughs. The coughs of subjects with COVID-19 were classified with 95.86% accuracy using the RBF kernel of the SVM and the MFCC method. The corresponding sensitivity and specificity were 98.6% and 91.7%, respectively.
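MFCC extraction rests on the mel scale, which spaces filter-bank centres more densely at low frequencies. The sketch below uses one common formula for the Hz-to-mel mapping (2595 · log10(1 + f/700)); other variants exist, and the abstract does not say which one the authors used. The filter count and frequency range are illustrative assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels using a common log-based formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical setup: 5 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(5, 0.0, 8000.0)
```

In a full MFCC pipeline these centres would define triangular filters applied to the STFT power spectrum, followed by a logarithm and a discrete cosine transform.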
ARTICLE | doi:10.20944/preprints202009.0216.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Bitcoin; SVM; linear mixed models; word embedding; ELMo
Online: 10 September 2020 (04:02:53 CEST)
Introduced in 2009, Bitcoin has demonstrated a huge potential as the world’s first digital currency and has been widely used as a financial investment. Our research aims to uncover the relationship between Bitcoin prices and people’s sentiments about Bitcoin on social media. Among various social media platforms, micro-blogging is one of the most popular. Millions of people use micro-blogging platforms to exchange ideas, broadcast views, and to provide opinions on different topics related to politics, culture, science, and technology. This makes them a potentially rich source of data for sentiment analysis. Therefore we chose one of the busiest micro-blogging platforms, Twitter, to perform sentiment analysis on Bitcoin. We used ELMo embedding model to convert Bitcoin-related tweets into a vector form and SVM classifier to divide the tweets into three sentiment categories - positive, negative, and neutral. We then used the sentiment data to find its relation with Bitcoin price fluctuation using the linear mixed model.
ARTICLE | doi:10.20944/preprints202006.0048.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Pattern Recognition; Feature extraction; SVM; HOG; Zonal density
Online: 5 June 2020 (14:03:45 CEST)
Significant progress has been made in pattern recognition technology. However, one obstacle that has not yet been overcome is the recognition of words in the Brahmi script, specifically of characters, compound characters, and words, because of their complex structure. For this kind of complex pattern recognition problem, it is difficult to decide which feature extraction method and classifier would be the best choice. Moreover, different feature extraction methods and classifiers offer complementary information about the patterns to be classified. Combining them intelligently can therefore be beneficial compared to using any single feature extraction method. This study proposes combining HOG and zonal density features with an SVM to recognize Brahmi words: information from structural and statistical features is combined using an SVM classifier for word-level script recognition from Brahmi word images. The Brahmi word dataset contains 6,475 training and 536 test images of Brahmi words in 170 classes, and the database is made freely available. HOG and zonal density are used to extract the features of the Brahmi words, and the word samples are classified based on the confidence scores provided by the SVM classifier. The maximum accuracy achieved by the system is 95.17%, which is better than that of previously reported studies.
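Zonal density is usually computed by dividing a binarised image into a grid of zones and taking the fraction of foreground pixels in each zone. The sketch below illustrates that general idea; the grid size, the 0/1 pixel encoding, and the requirement that the image dimensions divide evenly are assumptions made here, not details from the paper.

```python
def zonal_density(image, zones_per_side):
    """Fraction of foreground (1) pixels in each zone of a binary image.

    image is a list of rows of 0/1 values; rows and columns are assumed to be
    divisible by zones_per_side. Zones are visited row by row.
    """
    rows, cols = len(image), len(image[0])
    zone_h, zone_w = rows // zones_per_side, cols // zones_per_side
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            foreground = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(foreground / (zone_h * zone_w))
    return features


# Hypothetical 4x4 binary glyph split into a 2x2 grid of zones.
glyph = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
densities = zonal_density(glyph, zones_per_side=2)
```

The resulting vector could then be concatenated with a HOG descriptor before being passed to the classifier, which is one straightforward way to combine statistical and structural features.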
ARTICLE | doi:10.20944/preprints202104.0146.v1
Subject: Earth Sciences, Oceanography Keywords: Salt Marshes, Google Earth Engine, SVM, Distribution, China’s coast
Online: 5 April 2021 (14:28:19 CEST)
Using the Google Earth Engine (GEE) cloud platform, this study selected Landsat 5/8 and Sentinel-2 remote sensing images and applied the Support Vector Machine (SVM) classification method to classify intertidal salt marshes in China over 35 years, verifying the classification results with field surveys. The causes and patterns of change in salt marsh species and area were then analyzed in light of various driving factors. The main results of the study are as follows. The main types of salt marsh plants in China include Phragmites australis, Spartina alterniflora, Suaeda salsa, Scirpus mariqueter, Tamarix chinensis, Cyperus malaccensis, and Sesuvium portulacastrum. The classification results indicated a salt marsh area of 166999.32 ha in 1985, 172893.87 ha in 1990, 174952.29 ha in 1995, 125567.51 ha in 2000, 93257.97 ha in 2005, 102539.04 ha in 2010, 96302.92 ha in 2015, and 115722.75 ha in 2019. The main driving factors of salt marsh change from 1985 to 2015 are reclamation, mudflat aquaculture, climate change, coastal zone erosion, invasion of alien species, and natural competition and succession among salt marsh species. The results can be used to quantitatively analyze salt marsh carbon storage in space and time, and provide data support for the protection of salt marsh wetlands, the restoration of ecological functions, and the implementation of carbon neutrality.
ARTICLE | doi:10.20944/preprints201909.0308.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: Alzheimer’s disease; emphasis learning; multi-modal classification; svm; pca
Online: 27 September 2019 (10:26:34 CEST)
This article introduces a classification method and tests it on the ADNI database for diagnosing Alzheimer's disease (AD). Tuning a classifier to obtain better results is a complicated problem, and it becomes more complicated when accuracy or other performance measures above 90% are required. In this study, we set out to find a method that addresses this problem. The final feature set can also be used for clustering, because the feature set produced by the proposed method is strengthened. In recent years, much work has been done to develop computer-aided diagnosis (CAD) systems for Alzheimer's disease. Most of these recently developed systems concentrate on extracting and combining features from MRI, PET, CSF, and …; in this article, we do the same and apply one additional technique to increase classification performance. The purpose of this article is to find and produce the best features for three binary classification problems: AD vs. Normal Control (NC), Mild Cognitive Impairment (MCI) vs. NC, and MCI vs. AD. Experiments show accuracies of 98.81%, 81.61%, and 81.40% for the AD vs. NC, MCI vs. NC, and AD vs. MCI classification problems, respectively. Using this method increased the performance on all three binary problems considerably.
ARTICLE | doi:10.20944/preprints202208.0109.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: speech emotion recognition; affective computing; data augmentations; wav2vec 2.0; SVM
Online: 4 August 2022 (14:09:21 CEST)
Data augmentation techniques recently gained more adoption in speech processing, including speech emotion recognition. Although more data tends to be more effective, there may be a trade-off in which more data will not provide a better model. This paper reports experiments on investigating the effects of data augmentation in speech emotion recognition. The investigation aims at finding the most useful type of data augmentation and the number of data augmentations for speech emotion recognition. The experiments are conducted on the Japanese Twitter-based emotional speech corpus. The results show that for speaker-independent data, two data augmentations with glottal source extraction and silence removal exhibited the best performance among others, even with more data augmentation techniques. For the text-independent data (including speaker and text-independent), more data augmentations tend to improve speech emotion recognition performances. The results highlight the trade-off between the number of data augmentation and the performance of speech emotion recognition showing the necessity to choose a proper data augmentation technique for a specific application.
ARTICLE | doi:10.20944/preprints202106.0137.v1
Online: 4 June 2021 (10:59:51 CEST)
Multiple sclerosis (MS) is a debilitating disease of the brain and spinal cord (central nervous system). In MS, the immune system attacks the protective sheath (myelin) that covers the nerve fibers, causing communication problems between the brain and the rest of the body. Eventually the disease can cause permanent nerve damage. The signs and symptoms of MS vary widely and depend on the extent of the nerve damage and which nerves are affected. Some people with severe MS may lose the ability to walk independently or at all, while others may experience long periods of remission without any new symptoms. Most people with MS have a relapsing-remitting course. They experience periods of new symptoms or relapses that develop over days or weeks and usually improve partially or completely. These relapses are followed by periods of remission that can last for months or even years. In this project, we used several machine learning methods to evaluate the precision and accuracy of predicting and classifying the different stages of multiple sclerosis. To calculate accuracy, precision, recall, and F-score, we used methods such as Art Fuzzy, SVM, and decision trees to compare the classes pairwise. To improve the results, we used adaptive fuzzy optimization with two options: a genetic algorithm and particle swarm optimization.
ARTICLE | doi:10.20944/preprints202009.0699.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: SVM; MRMR; Bootstrap; Genes; Gene Expression; Biological Relevance; Subject Classification
Online: 29 September 2020 (09:09:52 CEST)
Selection of biologically relevant genes from high dimensional expression data is a key research problem in gene expression genomics. Most of the available gene selection methods are either based on relevancy or redundancy measure, which are usually adjudged through post selection classification accuracy. Through these methods the ranking of genes was done on a single high-dimensional expression data, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach through combining Support Vector Machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, the genes are selected through statistical significance values computed using a non-parametric test statistic under a bootstrap based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach with nine existing competitive methods was carried on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e. subject classification, biological relevant criteria based on quantitative trait loci, and gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant as compared to the existing methods. Moreover, the proposed approach was also found to be better with respect to the competitive existing methods. The proposed statistical approach provides a framework for combining filter, and wrapper methods of gene selection.
ARTICLE | doi:10.20944/preprints201908.0289.v1
Subject: Earth Sciences, Geoinformatics Keywords: drone video; human action recognition; CNN; Support vector machine (SVM)
Online: 28 August 2019 (03:52:22 CEST)
Recognition of the human interaction on the unconstrained videos taken from cameras and remote sensing platforms like a drone is a challenging problem. This study presents a method to resolve issues of motion blur, poor quality of videos, occlusions, the difference in body structure or size, and high computation or memory requirement. This study contributes to the improvement of recognition of human interaction during disasters such as an earthquake and flood utilizing drone videos for rescue and emergency management. We used Support Vector Machine (SVM) to classify the high-level and stationary features obtained from Convolutional Neural Network (CNN) in key-frames from videos. We extracted conceptual features by employing CNN to recognize objects from first and last images from a video. The proposed method demonstrated the context of a scene, which is significant in determining the behaviour of human in the videos. In this method, we do not require person detection, tracking, and many instances of images. The proposed method was tested for the University of Central Florida (UCF Sports Action), Olympic Sports videos. These videos were taken from the ground platform. Besides, camera drone video was captured from Southwest Jiaotong University (SWJTU) Sports Centre and incorporated to test the developed method in this study. This study accomplished an acceptable performance with an accuracy of 90.42%, which has indicated improvement of more than 4.92% as compared to the existing methods.
ARTICLE | doi:10.20944/preprints201906.0195.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: adaptive bilateral; marker watershed; PSO; fuzzy C-mean; GLCM; SVM
Online: 20 June 2019 (09:22:05 CEST)
Medical image processing is now used extensively in several areas. Early detection and treatment of disease is greatly helped by finding abnormalities in the image. A number of segmentation methods are available to detect lung nodules in computed tomography (CT) images. This paper addresses the early detection of lung nodules. Pre-processing uses the top-hat transform together with median and adaptive bilateral filtering; comparing the two filters showed that the adaptive bilateral filter is the more suitable for CT images. The proposed segmentation technique uses a novel strip method in which the image is split into 3, 4, 5, or 6 strips. The proposed method is a marker-watershed method based on PSO and fuzzy C-mean clustering. First, noise reduction and smoothing are applied to the input image, the filtered image is split using the strip method, and the image is then segmented by the marker-watershed method. Second, an enhanced PSO technique is used to locate more accurate values for the cluster centers of fuzzy C-mean clustering. Finally, using these center values and the enhanced objective function, the small regions of the segmented object are clustered by fuzzy C-mean. The segmentation algorithm presented in this paper achieves a 95% accuracy rate in detecting lung nodules when the strip count is 5.
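For orientation, the sketch below shows the textbook fuzzy C-means membership and centre updates on one-dimensional intensity values. It is not the paper's method: the abstract describes locating the cluster centres with an enhanced PSO, whereas this example uses the standard weighted-mean update, and the fuzzifier `m = 2` and the sample intensities are assumptions.

```python
def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster under standard fuzzy C-means."""
    dists = [abs(point - c) for c in centers]
    if 0.0 in dists:
        # Point coincides with one or more centres: share membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


def fcm_update_centers(points, centers, m=2.0):
    """One standard centre update: mean of points weighted by membership ** m."""
    memberships = [fcm_memberships(p, centers, m) for p in points]
    new_centers = []
    for j in range(len(centers)):
        weights = [u[j] ** m for u in memberships]
        new_centers.append(sum(w * p for w, p in zip(weights, points)) / sum(weights))
    return new_centers


# Hypothetical pixel intensities forming a dark group and a bright group.
intensities = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
updated = fcm_update_centers(intensities, centers=[0.0, 255.0])
```

Unlike hard clustering, every pixel contributes to every centre, with a weight that falls off as its distance to that centre grows relative to the others.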
ARTICLE | doi:10.20944/preprints202004.0316.v2
Subject: Earth Sciences, Environmental Sciences Keywords: Precision farming; Early crop-type mapping; Sentinel-2; Random Forest; SVM
Online: 17 January 2022 (10:54:10 CET)
Crop-type mapping is an important intermediate step for cost-effective crop management at the field level, as an overview of all fields with a particular crop type can be used for monitoring or yield forecasting, for instance. Our study used a data set with 2400 fields and corresponding satellite observations from the federal state of Bavaria, Germany. The study classified corn, winter wheat, winter barley, sugar beet, potato, and winter rapeseed as the main crops grown in Upper Bavaria. We additionally experimented with a rejection class "Other", which summarised further crop types. Corresponding Sentinel-2 data included the normalised difference vegetation index (NDVI) and raw bands from 2016 to 2018 for each selected field. The influence of raw bands compared to NDVI was analysed and the classification algorithms, i.e. support vector machine (SVM) and random forest (RF), were compared. The study showed that the use of an index should be critically questioned and that raw bands provided a wider spectral bandwidth, which significantly improved the mapping of crop types. The results underline the use of RF with raw bands and achieved overall accuracies (OA) of up to 92%. We also predicted crop types in an unknown year with significantly different weather conditions and several months before the end of the growing season. Thus, the influence of climate anomalies and the accuracy depending on the time of prediction were assessed. The crop types of a test site and year without labels could be determined with an OA of up to 86%. The results demonstrate the usefulness of the proof-of-concept and its readiness for use in real applications.
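The NDVI that the study compares against raw bands is defined as (NIR − Red) / (NIR + Red); for Sentinel-2 it is conventionally computed from bands B8 (near infrared) and B4 (red). The sketch below shows how a per-field NDVI time series could be derived. The dictionary layout, the reflectance values, and the zero-denominator guard are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed fallback for pixels with no signal in either band
    return (nir - red) / total


def ndvi_series(observations):
    """NDVI per observation, each given as a dict of Sentinel-2 band reflectances."""
    return [ndvi(obs["B8"], obs["B4"]) for obs in observations]


# Hypothetical reflectances for one field at three dates in a growing season.
field_observations = [
    {"B4": 0.20, "B8": 0.25},  # bare soil or early growth
    {"B4": 0.10, "B8": 0.50},  # dense green canopy
    {"B4": 0.25, "B8": 0.30},  # senescence or harvest
]
series = ndvi_series(field_observations)
```

The index collapses two bands into one number, which is convenient but discards information; that is consistent with the study's finding that raw bands improved the mapping of crop types.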
ARTICLE | doi:10.20944/preprints202112.0307.v1
Subject: Engineering, Civil Engineering Keywords: Road safety; Safety management; Road transportation; GMDH; GOA-SVM; Machine learning
Online: 20 December 2021 (10:37:05 CET)
Evaluating road safety is critical to successful safety management in road transport systems, yet safety management is a challenging task because of the dynamic nature of the problem and the large number of parameters that affect road safety. Therefore, evaluating and analysing the important contributing factors affecting the number of crashes plays a key role in improving road safety. For this purpose, this work employs two machine learning algorithms, the group method of data handling (GMDH)-type neural network and a combination of support vector machine (SVM) and the grasshopper optimization algorithm (GOA), to evaluate the number of vehicles involved in an accident based on seven factors affecting transport safety: Daylight (DL), Weekday (W), Type of accident (TA), Location (L), Speed limit (SL), Average speed (AS) and Annual average daily traffic (AADT) of rural roads of Cosenza in southern Italy. In this study, 564 data sets of rural areas were investigated and the relevant effective parameters were measured. In the next stage, several models were developed to investigate the parameters affecting the safety management of road transportation for rural areas. The results demonstrated that "Average speed" has the highest and "Weekday" the lowest level of importance in the investigated rural area. Finally, although the results of both algorithms were the same, the GOA-SVM model showed a better degree of accuracy and robustness than the GMDH model.
ARTICLE | doi:10.20944/preprints201705.0098.v1
Subject: Earth Sciences, Environmental Sciences Keywords: rule-based classification model; wetland remote sensing; SVM; TC-Wetness; China
Online: 11 May 2017 (08:03:34 CEST)
Wetlands are among the most bio-diverse and highest productivity ecosystems on earth, making their monitoring a high priority to conservation, protection and management interests. Although visual interpretation of satellite images is generally precise for monitoring wetlands, recent works have emphasized computerized classification methods because of the reduction in analyst time. However, it is difficult to automatically identify wetland solely based on spectral characteristics due to the complexity of wetland ecosystems. The ability to extract wetland information rapidly and accurately is the basis and the key to wetland mapping at a large scale. Here we propose an operational method to map China wetlands based on Landsat TM data and ancillary data. On the basis of theoretical analysis of wetland automatic classification, we developed a revised multi-layer wetland classification scheme and a rule-based classification model. In the latter, supervised classification (SVM and decision tree) and unsupervised classification (ISODATA) methods were tested. Four Landsat TM images, representing various wetland eco-regions in China (i.e. the Sanjiang Plain in the northeast China, the North China Plain, the Zoige Plateau in the southwest China and the Pearl River Estuary in southeast China), were automatically classified. The overall classification accuracies were 86.57%, 96.00%, 84.51% and 88.30%, respectively, which we considered to be satisfactory accuracy. Our results indicate that issues such as the resolution of geographic data and the understanding of wetland samples should be carefully addressed in the future.
ARTICLE | doi:10.20944/preprints201908.0225.v1
Subject: Earth Sciences, Geoinformatics Keywords: water bodies; satellite images; vector data; SVM; positive and negative buffering; polygons
Online: 21 August 2019 (10:30:16 CEST)
The technique of obtaining information or data about any feature or object from afar, called in technical parlance as remote sensing, has proven extremely useful in diverse fields. In the ecological sphere, especially, remote sensing has enabled collection of data or information about large swaths of areas or landscapes. Even then, in remote sensing the task of identifying and monitoring of different water reservoirs has proved a tough one. This is mainly because getting correct appraisals about the spread and boundaries of the area under study and the contours of any water surfaces lodged therein becomes a factor of utmost importance. Identification of water reservoirs is rendered even tougher because of presence of cloud in satellite images, which becomes the largest source of error in identification of water surfaces. To overcome this glitch, the method of the shape matching approach for analysis of cloudy images in reference to cloud-free images of water surfaces with the help of vector data processing, is recommended. It includes the database of water bodies in vector format, which is a complex polygon structure. This analysis highlights three steps: First, the creation of vector database for the analysis; second, simplification of multi-scale vector polygon features; and third, the matching of reference and target water bodies database within defined distance tolerance. This feature matching approach provides matching of one to many and many to many features. It also gives the corrected images that are free of clouds.
ARTICLE | doi:10.20944/preprints202103.0434.v1
Subject: Engineering, Automotive Engineering Keywords: WiFi sounder; CSI; MIMO; indoor location estimation; array signal processing; machine learning; SVM
Online: 17 March 2021 (10:57:38 CET)
In recent years, propagation channel characteristics have been used effectively for applications such as motion sensing and position detection, and a great deal of attention has been drawn to channel sounding methods that are easy to use with low-cost devices. This paper presents a device-free indoor location estimation method using spatio-temporal features of radio propagation channels, measured with a 2.4-GHz band 3-by-3 MIMO channel sounder developed using commodity wireless LANs. The measurement results demonstrated reasonable performance of the proposed method with a small number of antennas.
ARTICLE | doi:10.20944/preprints202210.0238.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: NLP; NLU; Twitter; Sentiment Analysis; Opinion Mining; Nigeria; Election; Machine Learning; BERT; LSTM; SVM
Online: 17 October 2022 (12:01:42 CEST)
Introduction: Social media platforms such as Facebook, LinkedIn and Twitter, among others, have been used as tools for staging protests, opinion polls and campaign strategy, as a medium of agitation, and as a place to express interests, especially during elections. Past studies have established people's opinions on elections using social media posts. The advent of state-of-the-art algorithms for unstructured text processing implies tremendous progress in natural language processing and understanding. Aim: In this work, a Natural Language framework is designed to understand the Nigeria 2023 presidential election based on public opinion using a Twitter dataset. Methods: Raw datasets concerning discourse around the Nigeria 2023 elections were collected from Twitter, comprising 2,059,113 records with 18 dimensions. Sentiment analysis was performed on the preprocessed dataset using three different machine learning models, namely Long Short-Term Memory (LSTM) Recurrent Neural Network, Bidirectional Encoder Representations from Transformers (BERT) and Linear Support Vector Classifier (LSVC) models. Personal tweet analysis of the three candidates provided insight into their campaign strategies and personalities, while public tweet analysis established the public's opinion about them. The performance of the models was also compared using accuracy, recall, false positive rate, precision and F-measure. Results: The LSTM model gave an accuracy, precision, recall, AUC and F-measure of 88%, 82.7%, 87.2%, 87.6% and 82.9% respectively; the BERT model gave an accuracy, precision, recall, AUC and F-measure of 94%, 88.5%, 92.5%, 94.7% and 91.7% respectively; and the LSVC model gave an accuracy, precision, recall, AUC and F-measure of 73%, 81.4%, 76.4%, 81.2% and 79.2% respectively. Conclusion: The experimental results show that sentiment analysis and other Natural Language Processing tasks can aid in the understanding of the social media space. Results also revealed the leverage of each aspirant towards winning the election. We conclude that sentiment analysis can form a general basis for generating insights for elections and modeling election outcomes.
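The comparison metrics named in the abstract above (accuracy, precision, recall, false positive rate and F-measure) can be sketched for a binary sentiment label as follows. This is a minimal illustration under stated assumptions: the function name, the binary positive/negative encoding and the toy labels are hypothetical and do not reproduce the study's data or its multi-model evaluation pipeline.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "f_measure": f_measure,
    }


# Toy labels (1 = positive sentiment, 0 = negative sentiment); purely illustrative.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
print(metrics)
```

Reporting several of these values together, as the abstract does, guards against a model that scores well on accuracy alone because one class dominates.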
ARTICLE | doi:10.20944/preprints202108.0366.v2
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: Lung adenocarcinoma; PD-1 inhibitor; LASSO analysis and SVM-RFE; Immune cell infiltration; TCGA
Online: 25 August 2021 (09:22:41 CEST)
PD-1 inhibitors have recently been widely used in clinical trials and shown to improve outcomes in various cancers. However, PD-1/PD-L1 inhibitors show a low response rate and are effective for only a small number of cancer patients. Thus, it is important to identify key genes which can enhance the PD-1/PD-L1 response for promoting immunotherapy. Here, we used ssGSEA and unsupervised clustering analysis to identify three clusters with different immune cell infiltration status, prognosis, and biological action. Cluster C showed a better survival rate, high immune cell infiltration, and immunotherapy effect, and was enriched in a variety of immune-active pathways, including T and B cell signal receptors. In addition, it contained more of the immune subtypes C2 and C3. Further, we used WGCNA analysis to confirm the genes correlated with cluster C. The red module was highly correlated with cluster C and contained 111 genes, which were enriched in a variety of immune-related pathways. To pick candidate genes in SD/PD and CR/PR patients, we used the Least Absolute Shrinkage and Selection Operator (LASSO) and SVM-RFE algorithms. In conclusion, our LASSO analysis and SVM-RFE based research identified targets with better prognosis, activated immune-related pathways, and better immunotherapy. KLRC3 was identified as the key gene which can efficiently respond to immunotherapy with greater efficacy and better prognosis.
ARTICLE | doi:10.20944/preprints202106.0633.v1
Subject: Engineering, Automotive Engineering Keywords: Conditional temporal moments; Optimizable support vector machine (SVM); Gearbox fault diagnosis; Vibration analysis.
Online: 28 June 2021 (09:57:04 CEST)
Fault diagnosis of the gearbox is a decisive part of the modern industry to find the many gearbox defects like gear tooth crack, chipped or broken, etc. But sometimes, the nonstationary properties of vibration signal and low energy of minimal faults make this procedure very challenging. Previously, many types of techniques have been developed for gearbox condition monitoring. But most of the methods are dealing with conventional techniques of the gearbox condition monitoring, such as time-domain analysis or frequency domain analysis. Most of the conventional methods are not suitable for the nonstationary vibration signal. Thus, this paper presents a novel gearbox fault diagnosis technique using conditional temporal moments and an optimizable support vector machine (SVM). This work also presents an integrated features extraction technique based on the standard features, i.e., statistical and spectral features with the combinations of moment features. The impact of the four conditional temporal moments of each gearbox condition is also presented. This work shows that the proposed method successfully classifies and categorizes the gearbox faults at an early stage.
Subject: Earth Sciences, Geology Keywords: gold deposit; alteration information; ASTER image; support vector machine (SVM); principal component analysis (PCA)
Online: 22 October 2019 (04:26:18 CEST)
Dayaoshan, an important metal ore producing area in China, faces the dilemma of resource depletion due to long-term exploitation. In this paper, a remote sensing method is used to delineate the favorable metallogenic areas and find new ore points for Gulong. Firstly, vegetation interference was removed using a mixed pixel decomposition method with hyperplane and genetic algorithm (GA) optimization; secondly, altered mineral distribution information was extracted based on principal component analysis (PCA) and the support vector machine (SVM) method; thirdly, the favorable areas for gold mining in Gulong were delineated using an SVM model optimized by the ant colony algorithm (ACA) to remove false altered minerals; lastly, a field survey verified that the extracted alteration mineralization information is correct and effective. The results show that the mineral alteration extraction method proposed in this paper has certain guiding significance for metallogenic prediction by remote sensing.
ARTICLE | doi:10.20944/preprints201903.0122.v1
Subject: Earth Sciences, Geoinformatics Keywords: Classification, SVM Classifier, ML Classifier, Supervised and Unsupervised Classification, Object-based Classification, Multispectral Data
Online: 11 March 2019 (09:01:44 CET)
This paper focuses on the crucial role that remote sensing plays in divining land features. Data that is collected distantly provides information in spectral, spatial, temporal and radiometric domains, with each domain having the specific resolution to information collected. Diverse sectors such as hydrology, geology, agriculture, land cover mapping, forestry, urban development and planning, oceanography and others are known to use and rely on information that is gathered remotely from different sensors. In the present study, IRS LISS IV Multi-spectral data is used for land cover mapping. It is known, however, that the task of classifying high-resolution imagery of land cover through manual digitizing consumes time and is way too costly. Therefore, this paper proposes accomplishing classifications by way of enforcing algorithms in computers. These classifications fall in three classes: supervised, unsupervised, and object-based classification. In the case of supervised classification, two approaches are relied upon for land cover classification of high-resolution LISS-IV multispectral image. These approaches are Maximum Likelihood and Support Vector Machine (SVM). Finally, the paper proposes a step-by-step procedure for optical image classification methodology. This paper concludes that in optical data classification, SVM classification gives a better result than the ML classification technique.
ARTICLE | doi:10.20944/preprints202104.0183.v1
Subject: Keywords: Intrusion detection systems; machine learning; NSL-KDD; feature selection; classification model; SBDS, ABDS, Snort, SVM
Online: 6 April 2021 (17:59:47 CEST)
Cloud computing is an emerging area which provides on-demand computing resources and services through the internet. It is a faster and more efficient technique but is prone to severe security attacks. In this paper, the authors propose a Network Intrusion Detection System (NIDS) to detect attacks at the front end and back end when bulky flows of data packets pass through a cloud environment. In our framework we used a signature-based detection system for identifying the intruder and an anomaly-based detection system for detecting network attacks. The NIDS sensors were placed in a collaborative manner to prevent attacks and to update the knowledge bases. The authors used a supervised learning model to detect abnormal behavior of packets in network traffic. The models were trained on the dataset and evaluated in terms of precision, recall, accuracy and model build time to select the best machine-learning model for intruder detection and to improve computational time and performance.
ARTICLE | doi:10.20944/preprints202210.0426.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: novelty-class; one online-Class SVM (OCSVM); memory dump; Malware; Principal Component Analysis (PCA); dimensionality reduction
Online: 27 October 2022 (08:17:43 CEST)
Malware complexity is rapidly increasing, causing catastrophic impacts on computer systems. Memory dump malware is gaining increased attention due to its ability to expose plaintext passwords or key encryption files. This paper presents an enhanced classification model based on One class SVM (OCSVM) classifier that can identify any deviation from the normal memory dump file patterns and detect it as malware. The proposed model integrates OCSVM and Principal Component Analysis (PCA) for increased model sensitivity and efficiency. An up-to-date dataset known as “MALMEMANALYSIS-2022” was utilized during the evaluation phase of this study. The accuracy achieved by the traditional one-class classification (TOCC) model was 55%, compared to 99.4% in the one-class classification with PCA (OCC-PCA) model. Such results have confirmed the increased performance achieved by the proposed model.
ARTICLE | doi:10.20944/preprints202005.0451.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: Bilateral Line Local Binary Patterns; Facial matrix; Statistical subspace; Face recognition; Calibrated SVM model; Ensemble learning
Online: 27 May 2020 (12:07:19 CEST)
The local binary pattern is a visual descriptor that can be used as a powerful feature extractor for texture classification. In this paper, a novel representation for face recognition, called Bilateral Line Local Binary Patterns (BL-LBP), is proposed. This scheme is an extension of Line Local Binary Patterns descriptors in the statistical learning subspace. The bilateral descriptors are fused with an ensemble of calibrated SVM models. The performance of this scheme is evaluated using 5 standard face databases. It is found to be robust against illumination variation, diverse facial expressions and head pose variations, and its recognition accuracy reaches 98 percent, running on a mobile device with a processing speed of 63 ms per face. Results suggest that our proposed method can be very useful for vision systems that have limited resources and where the computational cost is critical.
ARTICLE | doi:10.20944/preprints202012.0054.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Patterns recognition; Machine learning; Hereditary Ataxia diseases; K-Nearest Neighbors; Multi Layer Perceptron; Ensemble Classification Trees; SVM.
Online: 2 December 2020 (09:33:15 CET)
Progressive gait impairment in patients with neurological diseases such as Hereditary Ataxias (HA) has been analysed using gait data collected with movement sensors. This research focuses on finding the minimum number of gait features required to recognize HA patients efficiently and less intrusively, based on data collected with iPhone movement sensors placed on the ankles of 14 HA patients and 14 healthy people. A twofold proposal is made: first, a local minimum prominent peak criterion to find the starting point of each stride and obtain a 10-stride window from which 56 spatial-temporal features are derived; second, a search strategy based on the Hill Climbing algorithm to reduce the number of gait features and sensors. The main finding was that with two gait patterns a classification accuracy of 96% was achieved using the K-Nearest Neighbors (KNN) and Multi-Layer Perceptron (MLP) algorithms; in addition, MLP required only the right ankle sensor patterns, which also reduces intrusion.
ARTICLE | doi:10.20944/preprints201910.0349.v2
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: hybrid machine learning; extreme learning machine (ELM); radial basis function (RBF); breast cancer; support vector machine (SVM)
Online: 24 February 2020 (04:10:49 CET)
Mammography is often used as the most common laboratory method for the detection of breast cancer, yet associated with the high cost and many side effects. Machine learning prediction as an alternative method has shown promising results. This paper presents a method based on a multilayer fuzzy expert system for the detection of breast cancer using an extreme learning machine (ELM) classification model integrated with radial basis function (RBF) kernel called ELM-RBF, considering the Wisconsin dataset. The performance of the proposed model is further compared with a linear-SVM model. The proposed model outperforms the linear-SVM model with RMSE, R2, MAPE equal to 0.1719, 0.9374 and 0.0539, respectively. Furthermore, both models are studied in terms of criteria of accuracy, precision, sensitivity, specificity, validation, true positive rate (TPR), and false-negative rate (FNR). The ELM-RBF model for these criteria presents better performance compared to the SVM model.
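The three error measures reported in the abstract above (RMSE, R2 and MAPE) follow standard definitions, which the sketch below spells out. The function name and the sample values are hypothetical, MAPE is returned as a fraction rather than a percentage, and nothing here reproduces the study's models or data.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (RMSE, R^2, MAPE) for paired true and predicted values.

    MAPE is returned as a fraction and requires every true value to be non-zero.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    rmse = math.sqrt(ss_res / n)
    r2 = 1 - ss_res / ss_tot
    mape = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return rmse, r2, mape


# Hypothetical values: every prediction is off by exactly one unit.
rmse, r2, mape = regression_errors([2.0, 4.0, 6.0, 8.0], [3.0, 5.0, 7.0, 9.0])
print(rmse, r2, round(mape, 4))
```

Lower RMSE and MAPE and an R2 closer to 1 indicate a closer fit between predictions and targets.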
ARTICLE | doi:10.20944/preprints202107.0638.v1
Subject: Keywords: Image Processing; Automated Plant Diseases Detection; Histogram Oriented Gradient (HOG); Local Binary Pattern (LBP); Support Vector Machine (SVM)
Online: 28 July 2021 (17:18:04 CEST)
On earth, plants play the most important part. Every organ of a plant plays a vital role in the ecological field as well as the medicinal field. However, there are many species of plants on earth, and different plants have different diseases. It is therefore necessary to identify plants and their diseases to prevent loss. Identifying plants and their diseases manually is very time consuming. In this research, an automatic plant and disease detection system is proposed. For experimental purposes, high-quality leaf images are used for training and testing. To detect the healthy and diseased areas of a leaf, region-based and color-based thresholding techniques were used. For feature selection, the Histogram Oriented Gradient (HOG) and Local Binary Pattern (LBP) methods were applied. Finally, two-class and multi-class Support Vector Machines (SVM) were used for classification. Both feature selection processes with SVM were observed to give 99% accuracy. Finally, a graphical user interface was created so that all users can work with the automated system.
ARTICLE | doi:10.20944/preprints202111.0345.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: brain-computer interface (BCI); electroencephalography (EEG); stress state recognition; feature selection; particle swarm optimization (PSO); mRMR; SVM; DEAP; SEED
Online: 19 November 2021 (11:01:19 CET)
Mental stress state recognition using electroencephalogram (EEG) signals for real-life applications needs a conventional wearable device. This requires an efficient number of EEG channels and an optimal feature set. The main objective of the study is to identify an optimal feature subset that can best discriminate mental stress states while enhancing the overall performance. Thus, multi-domain feature extraction methods were employed, namely, time domain, frequency domain, time-frequency domain, and network connectivity features, to form a large feature vector space. To avoid the computational complexity of high dimensional space, a hybrid feature selection (FS) method of minimum Redundancy Maximum Relevance with Particle Swarm Optimization and Support Vector Machine (mRMR-PSO-SVM) is proposed to remove noise, redundant, and irrelevant features and keep the optimal feature subset. The performance of the proposed method is evaluated and verified using four datasets, namely EDMSS, DEAP, SEED, and EDPMSC. To further consolidate, the effectiveness of the proposed method is compared with that of the state-of-the-art heuristic methods. The proposed model has significantly reduced the features vector space by an average of 70% in comparison to the state-of-the-art methods while significantly increasing overall detection performance.
ARTICLE | doi:10.20944/preprints202004.0503.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: UWB; NLOS identification; multi-path detection; NLOS and MP discrimination; machine learning; SVM; random forest; multilayer perceptron; LOS; DWM1000; indoor localization
Online: 29 April 2020 (10:29:54 CEST)
In Ultra-wideband (UWB)-based wireless ranging or distance measurement, differentiating between line-of-sight (LOS), non-line-of-sight (NLOS), and multi-path (MP) conditions is important for precise indoor localization. This is because the accuracy of the reported measured distance in UWB ranging systems is directly affected by the measurement conditions (LOS, NLOS or MP). However, the major contributions in the literature address only the binary classification between LOS and NLOS in UWB ranging systems, and the MP condition is usually ignored. In fact, the MP condition also has a significant impact on UWB ranging errors compared to direct LOS measurements, although the magnitude of the error in MP conditions is generally lower than in completely blocked NLOS scenarios. This paper addresses machine learning techniques for identifying the three classes (LOS, NLOS, and MP) in a UWB indoor localization system using an experimental data-set. The data-set was collected under different conditions and in different scenarios in indoor environments. Using the collected real measurement data, we compare the performance of three machine learning (ML) classifiers: support vector machine (SVM), random forest (RF) based on an ensemble learning method, and multilayer perceptron (MLP) based on a deep artificial neural network. The results show that applying ML methods in UWB ranging systems is effective for identifying the three classes. Specifically, the overall accuracy reaches 91.9% in the best-case scenario and 72.9% in the worst-case scenario. The F1-score is 0.92 in the best case and 0.69 in the worst case. For reproducible results and further exploration, we (will) provide the publicly accessible experimental research data discussed in this paper at PUB - Publications at Bielefeld University. The evaluations of the three classifiers are conducted using the open-source Python machine learning library scikit-learn.
ARTICLE | doi:10.20944/preprints202007.0634.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: CVD rehabilitation; Local muscular endurance exercises; Exercise-based rehabilitation; Deep Learning; AlexNet; CNN; SVM; kNN; RF; MLP; PCA; multi-class classification; INSIGHT-LME dataset
Online: 26 July 2020 (15:21:08 CEST)
Exercise-based cardiac rehabilitation requires patients to perform a set of prescribed exercises a specific number of times. Local muscular endurance (LME) exercises are an important part of the rehabilitation program. Automatic exercise recognition and repetition counting from wearable sensor data is an important technology that enables patients to perform exercises independently in remote settings, e.g. their own home. In this paper we first report on a comparison of traditional approaches to exercise recognition and repetition counting, corresponding to supervised machine learning and peak detection from inertial sensing signals respectively, with more recent machine learning approaches, specifically Convolutional Neural Networks (CNNs). We investigated two different types of CNN: one using the AlexNet architecture, the other using a time-series array. We found that the CNN-based approaches performed better than the traditional approaches. For the exercise recognition task, the AlexNet-based single CNN model outperformed other methods with an overall F1-score of 97.18%. For exercise repetition counting, the AlexNet-based single CNN model again outperformed other methods, correctly counting repetitions in 90% of the performed exercise sets within an error of ±1. To the best of our knowledge, our approach of using a single CNN method for both recognition and repetition counting is novel. In addition to reporting our findings, we also make the dataset we created, the INSIGHT-LME dataset, publicly available to encourage further research.
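The traditional baseline mentioned in the abstract above, repetition counting by peak detection on an inertial signal, can be sketched as below. This is a minimal illustration rather than the authors' method: the function name, the height and spacing thresholds, and the synthetic sine signal standing in for an accelerometer axis are all assumptions.

```python
import math


def count_repetitions(signal, min_height, min_distance=1):
    """Count local maxima at or above min_height that are at least
    min_distance samples after the previously accepted peak."""
    count = 0
    last_peak = None
    for i in range(1, len(signal) - 1):
        # A sample is a peak if it rises above its left neighbour
        # and does not fall below its right neighbour.
        is_peak = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if not is_peak or signal[i] < min_height:
            continue
        if last_peak is not None and i - last_peak < min_distance:
            continue  # too close to the previous peak; treat as noise
        count += 1
        last_peak = i
    return count


# Synthetic signal: five clean repetitions, one every 20 samples (hypothetical data).
synthetic = [math.sin(2 * math.pi * i / 20) for i in range(100)]
print(count_repetitions(synthetic, min_height=0.5, min_distance=10))
```

Real sensor data is noisier than this sine wave, so the thresholds would need tuning per exercise, which is one plausible reason a learned model can outperform this kind of baseline.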
ARTICLE | doi:10.20944/preprints201811.0293.v1
Subject: Engineering, Energy & Fuel Technology Keywords: machine learning (ML); artificial neural network (ANN); bagging decision tree (BDT); Support Vector Machines (SVM); no free lunch theorem (NFLT); hyperparameter optimisation; model comparison; heat meter
Online: 13 November 2018 (04:41:07 CET)
Heat meters are used to calculate the consumed energy in central heating systems. The subject of this article is a method of predicting the failure of a heat meter in the next settlement period. Predicting failures is essential to coordinate the process of exchanging heat meters and to avoid inaccurate readings, incorrect billing and additional costs. The reliability analysis of heat meters was based on historical data collected over many years. Three independent machine learning models were proposed and applied to predict meter failures. The efficiency of the models was confirmed and compared using selected metrics. Hyperparameter optimisation was successfully applied to each of the models. The article shows that the diagnostics of devices does not have to rely only on newly collected information; existing big data sets can also be used.
ARTICLE | doi:10.20944/preprints201907.0319.v1
Subject: Engineering, Energy & Fuel Technology Keywords: heat meter; district heating; fault detection; predictive maintenance; Machine Learning (ML); Artificial Neural Network (ANN); Bagging Decision Tree (BDT); Support Vector Machines (SVM); hyperparameter optimisation; ensemble model
Online: 28 July 2019 (16:26:47 CEST)
The need to increase the energy efficiency of buildings, together with the use of local renewable heat sources, means that heat meters are used not only to calculate the consumed energy but also for the active management of central heating systems. Increasing the reading frequency and using measurement data to control the heating system raise the requirements for the reliability of heat meters. The aim of the research is to analyse a large set of meters in a real network and predict their faults to avoid inaccurate readings, incorrect billing, heating system disruption and unnecessary maintenance. The reliability analysis of heat meters, based on historical data collected over several years, shows some regularities which cannot be easily described by physics-based models. The failure rate is almost constant and does depend on the past but is a non-linear combination of state variables. To predict meters' failures in the next settlement period, three independent machine learning models are implemented and compared using selected metrics, because even the high performance of a single model (87% True Positive for Neural Network) may be insufficient to make a maintenance decision. Additionally, hyperparameter optimisation boosts the models' performance by a few percent. Finally, the three improved models are used to build an ensemble classifier which outperforms the individual models. The proposed procedure ensures high fault-detection efficiency (>95%) while keeping overfitting to a minimum. The methodology is universal and can be used to study the reliability and predict faults of other types of meters and other objects with a constant failure rate.
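One simple way to combine three classifiers into an ensemble, as the abstract above describes at a high level, is majority voting over their per-sample predictions. The abstract does not state which combination rule was used, so majority voting is an assumption here, as are the function name and the toy prediction lists standing in for the ANN, BDT and SVM outputs.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-sample labels from several models by majority vote.

    Each argument is one model's list of predicted labels, all of equal length.
    """
    combined = []
    for votes in zip(*model_predictions):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical fault predictions (1 = meter expected to fail, 0 = healthy).
ann_pred = [1, 0, 1, 1, 0]
bdt_pred = [1, 1, 0, 1, 0]
svm_pred = [0, 0, 1, 1, 1]
print(majority_vote(ann_pred, bdt_pred, svm_pred))
```

With an odd number of binary classifiers there are no ties, and a single model's isolated error is outvoted whenever the other two agree.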
ARTICLE | doi:10.20944/preprints201810.0073.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: Classification; F-score; Gray-Level Co-occurrence Matrix (GLCM); Gray-Level Run-Length Matrix (GLRLM); Hepatocellular Carcinoma (HCC); Liver Cancer; Liver Abscess; Image Texture, Sequential Backward Selection (SBS); Sequential Forward Selection (SFS); Support Vector Machine (SVM); Ultrasound Image.
Online: 4 October 2018 (14:01:42 CEST)
This paper discusses computer-aided diagnosis (CAD) for distinguishing Hepatocellular Carcinoma (HCC), the most common type of liver cancer, from Liver Abscess, based on ultrasound image texture features and a Support Vector Machine (SVM) classifier. From 79 cases of liver disease (44 cases of HCC and 35 cases of liver abscess), this research extracts 96 Gray-Level Co-occurrence Matrix (GLCM) and Gray-Level Run-Length Matrix (GLRLM) features from the regions of interest (ROIs) in ultrasound images. Three feature selection methods, i) Sequential Forward Selection, ii) Sequential Backward Selection, and iii) F-score, are adopted to select the features used to identify these liver diseases. Finally, the developed system classifies HCC and liver abscess by SVM with an accuracy of 88.875%. The proposed methods can provide diagnostic assistance in distinguishing the two kinds of liver disease using a CAD system.
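The abstract above does not define its F-score. A minimal sketch follows, assuming the Fisher-type F-score commonly paired with SVMs for two-class problems: between-class separation of a feature divided by the sum of its within-class sample variances. The function names and the feature values are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def f_score(positive: Sequence[float], negative: Sequence[float]) -> float:
    """F-score of one feature, given its values in the two classes."""
    if len(positive) < 2 or len(negative) < 2:
        raise ValueError("each class needs at least two samples")
    mean_pos = mean(positive)
    mean_neg = mean(negative)
    mean_all = mean(list(positive) + list(negative))

    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2

    # Denominator: sum of the two within-class sample variances.
    var_pos = sum((x - mean_pos) ** 2 for x in positive) / (len(positive) - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in negative) / (len(negative) - 1)
    within = var_pos + var_neg
    if within == 0:
        raise ValueError("feature is constant within both classes")
    return between / within


def rank_features(
    features: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[str]:
    """Return feature names sorted from most to least discriminative."""
    return sorted(features, key=lambda name: f_score(*features[name]), reverse=True)


# Hypothetical texture-feature values: (class A samples, class B samples).
features = {
    "separable_feature": ([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]),
    "noisy_feature": ([1.0, 5.0, 3.0], [2.0, 4.0, 6.0]),
}
ranking = rank_features(features)
print(ranking)
```

A larger score suggests a more discriminative feature, so a filter of this kind would keep the top-ranked features before training the SVM. Unlike sequential forward or backward selection, it scores each feature independently and does not account for interactions between features.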
REVIEW | doi:10.20944/preprints201905.0175.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: demand prediction; energy systems; machine learning; artificial neural network (ANN); support vector machines (SVM); neuro-fuzzy; ANFIS; wavelet neural network (WNN); big data; decision tree (DT); ensemble learning; hybrid models; data science; deep learning; renewable energies; energy informatics; prediction; forecasting; energy demand
Online: 14 May 2019 (14:00:40 CEST)
Electricity demand prediction is vital for energy production management and proper exploitation of available resources. Recently, several novel machine learning (ML) models have been employed for electricity demand prediction to estimate future energy requirements. The main objective of this study is to review the various ML models applied to electricity demand prediction. Through a novel search and taxonomy, the most relevant original research articles in the field are identified and further classified according to the ML modeling technique, prediction type, and application area. A comprehensive review of the literature identifies the major ML models and their applications, and discusses the evaluation of their performance. The paper further discusses the trend and performance of the ML models. As a result, this research reports an outstanding rise in the accuracy, robustness, precision and generalization ability of prediction models that use hybrid and ensemble ML algorithms.