REVIEW | doi:10.20944/preprints202010.0649.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: text mining; natural language processing; electronic health records; clinical text; machine learning
Online: 30 October 2020 (15:01:24 CET)
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g. physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, it describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation at health systems and in industry.
ARTICLE | doi:10.20944/preprints202009.0728.v2
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Agile Software Development; Agile Methods; Software Team Productivity; Normality; Statistical Model
Online: 30 October 2020 (12:28:01 CET)
Agile methods promise high productivity and high-quality software, and agile software development has become the most influential approach to spread through the world of software development over the past decade. Measuring a software team's productivity is essential in agile teams for improving development performance. With the increasing competition among software development companies, team productivity has become a crucial challenge for companies and teams: awareness of the level of team productivity can help them achieve better estimates of project time and cost. However, there is no definitive solution or approach to measuring software productivity in either traditional or agile software development teams, which leads to many problems in reaching a reliable definition of software productivity. Hence, this study proposes a statistical model to assess team productivity in agile teams. A survey was conducted with forty software companies to measure the impact of six team factors on productivity. The results show that team effectiveness factors, including the inter-team relationship, quality conformance by the team, team vision, the team leader, and requirements handled by the team, had a significant impact on team productivity, and that inter-team relations affect software teams' productivity the most. Finally, the model fit test showed that 80% of productivity depends on team effectiveness factors.
ARTICLE | doi:10.20944/preprints202010.0626.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: function space; function space integral; partial derivative approach; change of scale formula
Online: 30 October 2020 (08:08:24 CET)
We investigate the behavior of the partial derivative approach to the change of scale formula and prove relationships between the analytic Wiener integral and the analytic Feynman integral of the partial derivative for the function space integral.
Thu, 29 October 2020
ARTICLE | doi:10.20944/preprints202010.0617.v1
Subject: Mathematics & Computer Science, Computational Mathematics Keywords: differential algebraic equations; index reduction; block triangular forms
Online: 29 October 2020 (14:34:38 CET)
A new generation of universal tools and languages for modeling and simulating multi-physical-domain applications has emerged and become widely accepted; these tools automatically generate large-scale systems of differential algebraic equations (DAEs). Motivated by the characteristics of DAE systems with large dimension, high index, or block structure, we first propose a modified Pantelides' algorithm (MPA) for arbitrary high-order DAEs based on their Σ matrix, in a manner similar to Pryce's Σ method. By introducing a vital parameter vector, we then present a modified Pantelides' algorithm with parameters, which leads naturally to a block Pantelides' algorithm (BPA) that can directly compute the crucial canonical offsets for whole (coupled) systems in block-triangular form. We illustrate these algorithms with examples, and numerical experiments show that the time complexity of the BPA is reduced by at least O(ℓ) compared to the MPA, which is consistent with the results of our analysis.
ARTICLE | doi:10.20944/preprints202010.0616.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Spike-and-wave; Generalized Gaussian distribution; EEG; Morlet wavelet; k-nearest neighbors classifier; Epilepsy
Online: 29 October 2020 (14:05:54 CET)
Spike-and-wave discharge (SWD) pattern detection in electroencephalography (EEG) signals is a key signal processing problem. It is particularly important for overcoming the time-consuming, difficult, and error-prone manual analysis of long-term EEG recordings. This paper presents a new SWD detection method with low computational complexity that can be easily trained with data from standard medical protocols. Specifically, EEG signals are divided into time segments to which the Morlet 1-D wavelet decomposition is applied. The generalized Gaussian distribution (GGD) statistical model is fitted to the resulting wavelet coefficients. A k-nearest neighbors (k-NN) self-supervised classifier is trained on the GGD parameters to detect the spike-and-wave pattern. Experiments were conducted using 106 spike-and-wave signals and 106 non-spike-and-wave signals for training, and another 96 annotated EEG segments from six human subjects for testing. The proposed SWD classification methodology achieved 95% sensitivity (true positive rate), 87% specificity (true negative rate), and 92% accuracy. These results open a path toward new research on the causes underlying so-called absence epilepsy in long-term EEG recordings.
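As a toy illustration of the final classification stage of such a pipeline, a plain k-NN majority vote over per-segment feature vectors might look like the sketch below. The two-dimensional "GGD parameter" features and their values are invented for illustration and are not taken from the paper:

```python
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points (Euclidean distance in feature space)."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy feature vectors standing in for GGD parameters (e.g. scale, shape)
# fitted to the wavelet coefficients of each EEG segment.
train = [
    ((0.9, 2.1), "SWD"), ((1.0, 2.0), "SWD"), ((1.1, 1.9), "SWD"),
    ((3.0, 0.5), "non-SWD"), ((2.9, 0.6), "non-SWD"), ((3.1, 0.4), "non-SWD"),
]
print(knn_predict(train, (1.0, 2.05)))  # -> SWD
```

The actual method fits a GGD per segment first; only the nearest-neighbor vote is sketched here.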
ARTICLE | doi:10.20944/preprints202010.0611.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: polynion; quaternion; octonion; sedenions; ring; algebra
Online: 29 October 2020 (12:43:22 CET)
In this note we introduce the notion of polynions and discuss their mathematical relevance. We note that well-known mathematical structures such as quaternions, octonions, and sedenions are special cases of polynions.
ARTICLE | doi:10.20944/preprints202010.0605.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Smart cities; Meta-heuristics; Travelling Salesman Problem; TLBO; Parallelism; GPU
Online: 29 October 2020 (09:34:23 CET)
The development of the smart city concept and the inhabitants’ need to reduce travel time, as well as society’s awareness of the reduction of fuel consumption and respect for the environment, lead to a new approach to the classic problem of the Travelling Salesman Problem (TSP) applied to urban environments. This problem can be formulated as “Given a list of geographic points and the distances between each pair of points, what is the shortest possible route that visits each point and returns to the departure point?” Nowadays, with the development of IoT devices and the high sensoring capabilities, a large amount of data and measurements are available, allowing researchers to model accurately the routes to choose. In this work, the purpose is to give solution to the TSP in smart city environments using a modified version of the metaheuristic optimization algorithm TLBO (Teacher Learner Based Optimization). In addition, to improve performance, the solution is implemented using a parallel GPU architecture, specifically a CUDA implementation.
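The canonical TLBO iteration (a teacher phase pulling the population toward the current best solution, then a learner phase of pairwise learning) can be sketched as follows. This is a generic serial sketch of standard TLBO for continuous minimization, not the authors' modified GPU version, and the sphere test function is purely illustrative:

```python
import random

def tlbo(f, dim, bounds, pop=20, iters=100, seed=0):
    """Minimal Teacher-Learner Based Optimization sketch (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    clip = lambda v: min(max(v, lo), hi)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    for _ in range(iters):
        # Teacher phase: move each learner toward the teacher (best solution),
        # away from the population mean scaled by a teaching factor.
        teacher = min(X, key=f)
        mean = [sum(x[d] for x in X) / pop for d in range(dim)]
        for i, x in enumerate(X):
            Tf = rng.choice([1, 2])  # teaching factor
            cand = [clip(x[d] + rng.random() * (teacher[d] - Tf * mean[d]))
                    for d in range(dim)]
            if f(cand) < f(x):
                X[i] = cand
        # Learner phase: learn from a random peer, moving toward better
        # peers and away from worse ones.
        for i, x in enumerate(X):
            j = rng.randrange(pop)
            if j == i:
                continue
            sign = 1 if f(X[j]) < f(x) else -1
            cand = [clip(x[d] + sign * rng.random() * (X[j][d] - x[d]))
                    for d in range(dim)]
            if f(cand) < f(x):
                X[i] = cand
    return min(X, key=f)

sphere = lambda x: sum(v * v for v in x)
best = tlbo(sphere, dim=5, bounds=(-10, 10))
print(sphere(best))  # close to 0
```

A GPU (CUDA) implementation would evaluate the population's objective values and candidate moves in parallel; only the serial logic is shown.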
ARTICLE | doi:10.20944/preprints202010.0604.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: eHealth; end-user; chronic disease; participatory design; socio-technical; diversity in care
Online: 29 October 2020 (09:30:26 CET)
Critically, the paper argues that a truly people-centered, technology-supported chronic care system can only be designed by understanding and responding to the needs, attributes, and capabilities of the most vulnerable in society. The paper suggests innovative ways of supporting interactions with these 'end-users' and highlights how reflection on these approaches can help move the health system towards more socially inclusive e-health solutions.
ARTICLE | doi:10.20944/preprints202010.0598.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: circulation; informal settlements; COVID-19; coronavirus; physical distancing; social distancing; graph theory; oriented graph; cluster graph; urban planning; architecture; Königsberg; Dharavi; Christopher Alexander; slum; favela
Online: 29 October 2020 (08:44:21 CET)
The COVID-19 pandemic has resulted in a wide range of spatial interventions to slow down the spread of the virus. The spatial limitations of narrow public circulation spaces within informal settlements, which house over one billion people around the world, make it impossible for pedestrians to practice physical distancing (or social distancing). In this paper, we propose a flexible mathematical method, named the Cluster Lane Method, for turning a planar circulation network of any size or complexity into a network of unidirectional lanes, making physical distancing possible in narrow circulation spaces by limiting face-to-face interactions. New notions and theorems about oriented graphs in graph theory are introduced. The paper ends with a discussion of the potential implementation of this cost-efficient, low-tech, sustainable solution, and with the introduction of a novel unidirectional tactile paving for the visually impaired.
Wed, 28 October 2020
ARTICLE | doi:10.20944/preprints202010.0550.v2
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: expectation maximization (EM) algorithm; finite mixture model; conditional mixture model; regression model; adaptive regressive model (ARM)
Online: 28 October 2020 (11:18:04 CET)
The expectation maximization (EM) algorithm is a powerful mathematical tool for estimating statistical parameters when a data sample contains a hidden part and an observed part. EM is applied to learn finite mixture models, in which the overall distribution of the observed variable is a weighted sum of partial distributions, and the coverage ratio of each partial distribution is specified by the probability of the hidden variable. One application of the mixture model is soft clustering, in which each cluster is modeled by the hidden variable: a data point can be assigned to more than one cluster, and the degree of such assignment is represented by the probability of the hidden variable. However, this probability is simplified as a parameter in the traditional mixture model, which can cause a loss of valuable information. Therefore, in this research I propose a so-called conditional mixture model (CMM) in which the probability of the hidden variable is modeled as a full probability density function (PDF) with its own parameters. CMM aims to extend the mixture model. I also propose an application of CMM called the adaptive regressive model (ARM). The traditional regression model is effective when the data sample is scattered evenly; if the data points are grouped into clusters, the regression model tries to learn a single unified regression function that passes through all data points. Obviously, such a unified function is not effective for evaluating the response variable on grouped data points. The "adaptive" aspect of ARM means that ARM solves this ineffectiveness by first selecting the best cluster of data points and then evaluating the response variable within that cluster. In other words, ARM reduces the estimation space of the regression model so as to gain high accuracy in calculation.
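The E- and M-steps that this abstract builds on can be illustrated with a standard two-component 1-D Gaussian mixture. Note this sketches classical EM for a plain mixture model, not the proposed CMM; the initialisation scheme and synthetic data are illustrative choices:

```python
import math, random

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, iters=100):
    """EM for a two-component 1-D Gaussian mixture model."""
    mu = [min(data), max(data)]          # crude initialisation
    var = [1.0, 1.0]
    pi = [0.5, 0.5]                      # mixing (coverage) ratios
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            w = [pi[k] * norm_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate parameters from responsibility-weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)   # guard against variance collapse
            pi[k] = nk / len(data)
    return mu, var, pi

rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
mu, var, pi = em_gmm_1d(data)
print(sorted(mu))  # roughly [0, 5]
```

In the soft-clustering reading, `resp` holds exactly the hidden-variable probabilities the abstract refers to; CMM would replace the scalar `pi[k]` with a full PDF.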
ARTICLE | doi:10.20944/preprints202010.0584.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: algorithmic probability; universal constructors; self-replication; universal Turing machines; algorithmic information theory; deterministic finite automaton
Online: 28 October 2020 (11:17:03 CET)
In this article we explore the limiting behavior of the universal prior distribution obtained when it is applied over multiple meta-levels of a hierarchy of programs and output data of a computational automata model. We are motivated to alleviate the effect of Solomonoff's assumption that all computable functions or hypotheses of the same length are equally likely, by weighting each program in turn by the algorithmic probability of its description-number encoding. In the limiting case, the set of all possible program strings of a fixed length converges to a distribution of self-replicating quines and quine relays, having the structure of a constructor. We discuss how experimental algorithmic information theory provides insights towards understanding the fundamental metrics proposed in this work, and reflect on the significance of these results for digital physics and the constructor theory of life.
ARTICLE | doi:10.20944/preprints202010.0577.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Cloud Computing; Health Systems; Security; Privacy; Data Protection; GDPR
Online: 28 October 2020 (10:00:55 CET)
Currently, Cloud-based healthcare systems around the world face several challenges. The most important is to ensure security and privacy, in other words the confidentiality, integrity, and availability of the data. Although the main provisions for data security and privacy were present in the former legal framework for the protection of personal data, the General Data Protection Regulation (GDPR) introduces new concepts and new requirements. In this paper, we present the main changes and key challenges of the GDPR, show how our previously proposed Cloud-based Security Policy methodology can be modified to comply with the GDPR, and discuss how Cloud environments can assist developers in building secure, GDPR-compliant Cloud-based health systems. The primary aim of this paper is to help Cloud providers understand the framework of the new regulation; the secondary aim is to identify security measures and security policy rules for protecting sensitive data in a Cloud-based health system, following our risk-based Security Policy Methodology, which assesses the associated security risks and takes into account the different requirements of patients, hospitals, and various other professional and organizational actors.
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Data Visualization; Visual Analytics; Natural Language Processing; Dark Data; Pattern Recognition
Online: 28 October 2020 (07:47:26 CET)
Over the years, there has been a significant rise in the world's scientific knowledge. However, most of it lacks structure and is often termed Dark Data. Both humans and expert systems have continually faced difficulty in analyzing and comprehending such overwhelming amounts of information, which is crucial for solving several real-world problems. Information and data visualization techniques offer a promising way to explore such data by allowing quick comprehension of information, discovery of emerging trends, and identification of relationships and patterns. In this tutorial, we utilize the rich corpus of PubMed, comprising more than 30 million citations from the biomedical literature, to visually explore and understand the underlying key insights using various information visualization techniques. With this study, we aim to lessen the limitations of human cognition and perception in handling and examining such large volumes of data by speeding up decision making and pattern recognition, enabling decision-makers to fully understand data insights and make informed decisions.
Tue, 27 October 2020
ARTICLE | doi:10.20944/preprints202010.0560.v1
Subject: Mathematics & Computer Science, Logic Keywords: satisfiability; SAT; fractal dimension; complex networks
Online: 27 October 2020 (16:12:41 CET)
In recent years, we have witnessed remarkable progress in algorithms for solving Boolean satisfiability (SAT). The success of these algorithms has been especially relevant in a large number of industrial or real-world applications, for which SAT solvers are nowadays an essential core part of the solving process. Interestingly, these applications span a very diverse and heterogeneous range of domains, such as hardware verification, planning, and cryptography, among others. Unfortunately, the reasons for the good performance of these solvers on this variety of industrial benchmarks are not yet completely understood. Since SAT solvers' efficiency is fundamental in various domains, obtaining a better understanding of these algorithms and the reasons for their good performance is crucial. To shed light on this question, SAT solvers are often viewed as complex systems with many interconnected components (e.g., conflict analysis and learning mechanisms, database management, search restarts) interacting in many unpredictable ways. There is a common belief that the resulting emergent behavior of these complex systems takes advantage of a certain underlying structure of the SAT formula, which is shared by the majority of these industrial problems regardless of the domain they come from. Recently, there have been some attempts to characterize this structure through the lens of complex networks, with the purpose of better understanding the success of the solvers, and potentially improving them. In this paper, we analyze the structure of industrial SAT instances through the lens of self-similarity, and study how the execution of SAT solvers affects that structure. Many real-world graphs exhibit self-similar structure (with small fractal dimension), which means that after rescaling (replacing groups of nodes by a single node), the same kind of structure can be observed.
In our analysis, in which we represent SAT instances as graphs, we observe that many industrial SAT formulas exhibit this same kind of structure. Moreover, we analyze how this structure evolves as new clauses are learned during the search. In particular, we observe that learned clauses usually contain variables that are close in the graph representation of the formula. That is, the learning mechanism tends to work locally. In contrast, on random SAT formulas (which exhibit no structure at all) the learning mechanism is unable to generate such local clauses. This difference helps explain the success of modern SAT solvers on industrial problems.
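The notion of a learned clause being "local" can be made concrete with a small sketch: build the variable incidence graph of a CNF formula (variables adjacent when they co-occur in a clause) and measure graph distances between variables. The chain-shaped formula below is a toy example, not an industrial instance:

```python
from collections import defaultdict, deque

def variable_graph(clauses):
    """Variable incidence graph: two variables are adjacent if they
    co-occur in some clause (literal signs ignored)."""
    adj = defaultdict(set)
    for clause in clauses:
        vs = {abs(l) for l in clause}
        for u in vs:
            adj[u] |= vs - {u}
    return adj

def dist(adj, u, v):
    """Shortest-path distance between variables u and v (BFS)."""
    seen, frontier, d = {u}, deque([u]), {u: 0}
    while frontier:
        w = frontier.popleft()
        if w == v:
            return d[w]
        for n in adj[w]:
            if n not in seen:
                seen.add(n)
                d[n] = d[w] + 1
                frontier.append(n)
    return float("inf")

# Chain-structured formula over x1..x5 (DIMACS-style signed integers).
cnf = [[1, -2], [2, -3], [3, -4], [4, -5]]
adj = variable_graph(cnf)
print(dist(adj, 1, 5))  # -> 4
```

A "local" learned clause in this sense is one whose variables lie at small pairwise distances in `adj`, e.g. [1, -2, 3] here, as opposed to a clause mixing variables from far-apart regions of the graph.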
CONCEPT PAPER | doi:10.20944/preprints202005.0331.v2
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: optimization; multi-objective optimization; decision making; Time
Online: 27 October 2020 (11:42:31 CET)
Multi-objective optimization (MOO) involves the simultaneous minimization of several objective functions, in contrast to conventional single-objective optimization, and has useful applications in engineering. Many current methodologies address challenges and solutions for the multi-objective optimization problem, which attempts to solve several objectives simultaneously, with multiple constraints subjoined to each objective. Most challenges in MOO arise from linear inequality constraints that prevent all objectives from being optimized simultaneously. This paper offers a short survey and a deeper analysis of random and uniform entry-exit times of objectives. It breaks the process down into sub-processes and then presents some new concepts, introducing methods for solving MOO problems that arise from periodic objectives, which do not persist for the entire process lifetime, unlike permanent objectives, which are optimized once for the whole lifetime. A methodology based on partial optimization, which optimizes each objective iteratively, and a weight convergence method, which optimizes sub-groups of objectives, are given. Furthermore, another method is introduced involving objective classification, ranking, estimation, and prediction: objectives are classified based on their properties, ranked using given criteria, estimated for an optimal weight point (Pareto optimal point) if they certify a desired optimal weight point, and finally predicted to find how much they deviate from the estimated optimal weight point. This paper presents concepts only, and their practical application is beyond its scope; however, based on the analysis presented, the concepts are worthy of igniting further research and application.
ARTICLE | doi:10.20944/preprints202010.0550.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: expectation maximization (EM) algorithm; finite mixture model; conditional mixture model; regression model; adaptive regressive model (ARM)
Online: 27 October 2020 (11:41:42 CET)
The expectation maximization (EM) algorithm is a powerful mathematical tool for estimating statistical parameters when a data sample contains a hidden part and an observed part. EM is applied to learn finite mixture models, in which the overall distribution of the observed variable is a finite sum of partial distributions, and the coverage ratio of each partial distribution is specified by the probability of the hidden variable. One application of the mixture model is soft clustering, in which each cluster is modeled by the hidden variable: a data point can be assigned to more than one cluster, and the degree of such assignment is represented by the probability of the hidden variable. However, this probability is simplified as a parameter in the traditional mixture model, which can cause a loss of valuable information. Therefore, in this research I propose a so-called conditional mixture model (CMM) in which the probability of the hidden variable is modeled as a full probability density function (PDF) with its own parameters. CMM aims to improve the power of the mixture model. I also propose an application of CMM called the adaptive regressive model (ARM). The traditional regression model is effective when the data sample is scattered evenly; if the data points are grouped into clusters, the regression model tries to learn a single unified regression function that passes through all data points. Obviously, such a unified function is not effective for evaluating the response variable on grouped data points. The "adaptive" aspect of ARM means that ARM solves this ineffectiveness by first selecting the best cluster of data points and then evaluating the response variable within that cluster. In other words, ARM reduces the estimation space of the regression model so as to gain high accuracy in calculation.
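The "adaptive" selection step described here can be sketched as follows, assuming (as an illustration only) a simple 1-D least-squares fit per cluster and nearest-centroid cluster selection; this is a hypothetical simplification, not the authors' exact ARM formulation:

```python
def fit_line(pts):
    """Ordinary least squares for y = a*x + b on (x, y) pairs."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

def arm_predict(clusters, x):
    """Adaptive step: pick the cluster whose x-centroid is nearest to x,
    then evaluate that cluster's own regression line."""
    models = [(sum(px for px, _ in c) / len(c), fit_line(c)) for c in clusters]
    _, (a, b) = min(models, key=lambda m: abs(m[0] - x))
    return a * x + b

# Two clusters with very different local trends; a single global line
# through all six points would fit neither group well.
c1 = [(0, 0), (1, 1), (2, 2)]        # local slope 1
c2 = [(10, 30), (11, 33), (12, 36)]  # local slope 3
print(arm_predict([c1, c2], 1.5))    # -> 1.5 (uses cluster c1's line)
```

The point of the sketch is the reduced estimation space: each prediction consults only the regression model of the selected cluster.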
ARTICLE | doi:10.20944/preprints202010.0547.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Remote sensing; Multisensor systems; Information theory; Sea Ice
Online: 27 October 2020 (11:27:40 CET)
Automatic ice charting cannot be achieved using SAR modalities alone; it is fundamental to combine information from other remote sensors with different characteristics for more reliable sea ice characterization. In this paper, we employ principal feature analysis (PFA) to select significant information from multimodal remote sensing data. PFA is a simple yet very effective approach that can be applied to several types of data without losing physical interpretability. Since different homogeneous regions require different types of information, we perform the selection patch-wise; by exploiting spatial information in this way, we increase the robustness and accuracy of PFA.
Mon, 26 October 2020
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: fuzzy set; comparison measure; representation; disjoint
Online: 26 October 2020 (14:16:55 CET)
This paper analyzes the representation behaviors of a comparison measure between two compared fuzzy sets. Three types of restrictions on the two fuzzy sets are considered: two disjoint-union fuzzy sets, two disjoint fuzzy sets, and two general fuzzy sets. The number of possible representations of a comparison measure differs among these three types of restriction. The value of the comparison measure is constant for two disjoint-union fuzzy sets. There are 42 candidate representations of a comparison measure for two disjoint fuzzy sets, of which 13 candidate representations with one or two terms can be used to calculate and compare a comparison measure easily.
ARTICLE | doi:10.20944/preprints202010.0528.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Robust Optimization; Optimization Under Uncertainty; Robustness; Stochastic
Online: 26 October 2020 (14:04:54 CET)
A Robust Optimization framework with original concepts and fundamentals is disclosed, admitting a fusion of ideas from relative regret models and static robust optimization and incorporating notions of conservatism. The algorithm uses a fine-tuning strategy to adjust the model so that robustness and a target ideality can be achieved together at a specified risk. The framework comprises original concepts, a mathematical approach, and an algorithm. The statistical treatment of the data, together with the framework's original concepts, makes it suitable for short-, medium-, or long-term decision-making settings. The framework is highly tractable, since the algorithm enforces a setting that yields a robust optimization at the specified risk. It can be applied to linear and nonlinear mathematical models, provided that the objective function is monotonic over the domain of the active convex region. Several examples are solved to illustrate the framework, and all results demonstrate high tractability and performance across a wide range of applications. Throughout the text, there is a thorough discussion of the framework's philosophy, objectives, original concepts, fields of application, and statistical and probabilistic fundamentals.
ARTICLE | doi:10.20944/preprints202010.0527.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: non-linear programming 1; Hadi-Vencheh model 2; multiple criteria ABC inventory classification 3
Online: 26 October 2020 (14:01:43 CET)
In this paper, we present an extended version of the Hadi-Vencheh model for multiple criteria ABC inventory classification. The proposed model is a nonlinear weighted product model (WPM) that determines a common set of weights for all items. Our nonlinear WPM incorporates multiple criteria with different units of measure without converting the performance of each inventory item on each criterion into a normalized attribute value, thereby improving on the model proposed by Hadi-Vencheh. Our study covers various criteria for ABC classification and demonstrates an efficient algorithm for solving nonlinear programming problems in which the feasible solution set need not be convex. The algorithm presented here substantially improves the solution efficiency of the Canonical Coordinates Method (CCM) when applied to large-scale nonlinear programming problems. The modified algorithm was tested to compare our model's results with those derived using the Hadi-Vencheh model and to demonstrate the algorithm's efficacy. The practical implication of the study is an efficient nonlinear optimization solver that optimizes the quality of existing solutions, improving time and space efficiency.
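A weighted product model score is simply the product of an item's criterion values raised to the criterion weights. The sketch below ranks three hypothetical inventory items; the item names, criterion values, and weights are invented for illustration, and the paper's contribution of determining the common weights by nonlinear programming is not shown:

```python
def wpm_score(values, weights):
    """Weighted product model: product of criterion values raised to weights."""
    s = 1.0
    for v, w in zip(values, weights):
        s *= v ** w
    return s

# Hypothetical items scored on (annual usage, criticality, lead time),
# each criterion pre-scaled to the range [1, 10]; weights sum to 1.
weights = [0.5, 0.3, 0.2]
items = {"item1": [9, 8, 7], "item2": [3, 4, 2], "item3": [6, 5, 9]}
ranked = sorted(items, key=lambda i: wpm_score(items[i], weights), reverse=True)
print(ranked)  # highest-scoring items are the candidates for class A
```

Because the WPM is a product of powers, it is dimensionless once weights are fixed, which is what lets the model mix criteria with different measurement units without prior normalization.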
ARTICLE | doi:10.20944/preprints202010.0526.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: audio classification; dissimilarity space; siamese network; ensemble of classifiers; pattern recognition; animal audio
Online: 26 October 2020 (13:57:01 CET)
The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) built on four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat vocalizations and one of bird vocalizations. Different clustering methods reduce the spectrograms in each dataset to a set of centroids that generate the dissimilarity space through the Siamese networks, in both supervised and unsupervised fashion. In addition to feeding the SNNs raw spectrograms, additional experiments process the spectrograms using Heterogeneous Auto-Similarities of Characteristics. Once the dissimilarity spaces are computed, a vector-space representation of each pattern is generated and used to train a Support Vector Machine (SVM) that classifies a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best stand-alone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC50). The MATLAB code used in this study is available at https://github.com/LorisNanni.
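The dissimilarity-space representation itself is straightforward: each pattern becomes the vector of its distances to the fixed set of centroids, and that vector is what the SVM consumes. A minimal sketch, with invented 2-D descriptors and plain Euclidean distance standing in for the learned Siamese-network dissimilarity:

```python
import math

def dissimilarity_vector(sample, centroids, dist=math.dist):
    """Represent a sample by its distances to a fixed set of centroids;
    this vector is the input on which the SVM is trained."""
    return [dist(sample, c) for c in centroids]

# Hypothetical 2-D "spectrogram descriptors" and three cluster centroids.
centroids = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
print(dissimilarity_vector((3.0, 4.0), centroids))
```

In the paper, `dist` would be the distance learned by a Siamese network, so the representation (and hence the SVM's decision surface) changes with the chosen backbone and clustering method.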
ARTICLE | doi:10.20944/preprints202010.0522.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: malaria model; transition matrix; Markov chain; malaria statistics
Online: 26 October 2020 (12:22:20 CET)
The purpose of this study is to estimate the mean transition probabilities from a healthy state to either a malaria-positive uncomplicated state or a malaria-positive severe state, and to classify the various transition probabilities of moving through the states based on baseline characteristics. Malaria test results for 2019, covering a 12-month period, were collected from the University of Ghana school clinic. An H-U model was developed, and the transition rates were estimated from the cross-sectional data. With the two states Healthy (H) and Uncomplicated (U) forming the state space, there are four possible transitions. The results show that the probability of transitioning from a healthy state to a malaria-positive state is 0.03%, while the probability that an individual remains in the healthy state (H) after the test is 99.73%. If an individual is already positive and has taken medication, the probability that the second test comes out negative is 6.45%, while the chance of remaining positive but uncomplicated is 93.55%. The study also showed that, in the long run, about 95.98% of persons who visited the student clinic with malaria symptoms tested negative for the malaria parasite, while about 4% tested positive. Disaggregated by gender, the proportion of negative test results was higher for females (97.08%) than for males (96.13%); correspondingly, the infection rate was higher for males (3.87%) than for females (2.92%). Since the University of Ghana has two health centers (a clinic and a hospital), it is recommended that a centralized system be established to track students' health so that research based on these records is not biased.
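The long-run figures quoted in the abstract follow from the stationary distribution of a two-state Markov chain. Taking the complement of the reported 99.73% stay-healthy probability (0.27%) as the H-to-U rate and the reported 6.45% as the U-to-H rate (an interpretive assumption on our part), a quick check reproduces the roughly 95.98% / 4% split:

```python
def stationary_two_state(p_hu, p_uh):
    """Long-run (stationary) distribution of a two-state Markov chain
    with transition probabilities H->U = p_hu and U->H = p_uh."""
    pi_u = p_hu / (p_hu + p_uh)
    return 1 - pi_u, pi_u  # (pi_H, pi_U)

# 0.27% leave H per step (100% - 99.73%); 6.45% of uncomplicated
# cases transition back to H (second test negative).
pi_h, pi_u = stationary_two_state(0.0027, 0.0645)
print(round(pi_h * 100, 2), round(pi_u * 100, 2))  # -> 95.98 4.02
```

This is the standard closed form pi_U = p / (p + q) for a two-state chain; its agreement with the reported long-run percentages is a consistency check, not part of the study itself.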