REVIEW | doi:10.20944/preprints202105.0663.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data, Internet Data Sources (IDS), Internet of Things (IoT), Sustainable Development Goals (SDGs), Big data Technologies, Big data Challenges
Online: 27 May 2021 (10:31:03 CEST)
It is widely held that a technology delivers its full benefit only when all stakeholders can master it. Big data technology is no exception: even a decade after its emergence, it remains a herculean task for many people and is still at a nascent stage in terms of applicability. Having identified the gaps in the adoption of big data technology in the contemporary world, this exploratory research highlights the possible prospects of big data technologies. It also argues that the challenges of various fields can be converted into opportunities through a shift in perspective towards this evolving concept. Examples of apex organizations such as the IMF and ITU, and their big data initiatives with respect to the Sustainable Development Goals (SDGs), are cited for a broader outlook. Intervention by the responsible organizations, along with the respective governments, is also much needed to encourage technology adoption across all sections of market players.
Subject: Computer Science And Mathematics, Computer Science Keywords: big data; data integration; EVMS; construction management
Online: 30 October 2020 (15:35:00 CET)
In today's information age, data are becoming more and more important. While other industries achieve tangible improvements by applying cutting-edge information technology, the construction industry still lags far behind. Cost, schedule, and performance control are three major functions in the project execution phase. Beyond their individual importance, cost-schedule integration has been a significant challenge in the construction industry over the past five decades. Although much effort has been put into this development, no method has been adopted in construction practice. The purpose of this study is to propose a new method to integrate cost and schedule data using big data technology. The proposed algorithm is designed to provide data integrity and flexibility in the integration process, a considerable reduction in the time needed to build and change the database, and practical use on a construction site. It is expected that the proposed method can transform, in a data-friendly way, the current situation in which field engineers regard information management as a troublesome task.
ARTICLE | doi:10.20944/preprints202009.0747.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Big Data; Business Plan; Budgeting; Budget; Business Strategy.
Online: 30 September 2020 (13:07:58 CEST)
The business planning process can be considered as a strategic phase of any business. Given that the business plan is a management accounting tool, there are countless approaches that can be adopted to prepare it since there is no legal requirement, as opposed to obligations relating to financial accounting. However, in general, every business plan consists of a numerical part (budget) and a narrative part. In this research, the author highlights, on the basis of experiences and commonly used theories, a standard process that can be adaptable to the business plan of any type of activity. The use of big data is highlighted as an essential part of feeding the data of almost all the steps of the budget. The author then manages to determine a generally applicable standard process, indicating all the data necessary to prepare an accurate and reliable business plan. A case study will provide adequate support to the demonstration of the immediate applicability of the proposed model.
ARTICLE | doi:10.20944/preprints202304.0644.v1
Subject: Business, Economics And Management, Business And Management Keywords: Big data predictive analytics; big data culture; competitive strategies; Strategic alliance performance; Pakistani Companies
Online: 20 April 2023 (10:07:08 CEST)
The study is based on the notion that big data predictive analytics is important for developing the strategic alliance performance of companies. This study investigates the relationship between big data predictive analytics, big data culture, and competitive strategies. Techniques such as descriptive statistics, correlation, and regression were adopted, using SPSS and SmartPLS statistical software. Hypotheses were tested with bootstrapped analysis using SEM (through SmartPLS). The study developed a structural equation model by using the SEM analysis. The results of the SEM analysis suggested that the hypothesized model of the study was valid, and they supported all the hypotheses of the study. The empirical analysis leads to the conclusion that big data predictive analytics has a positive and significant relationship with strategic alliance performance.
ARTICLE | doi:10.20944/preprints202205.0334.v1
Subject: Engineering, Control And Systems Engineering Keywords: Backpressure; Big Data; Spark Streaming; Stream Processing
Online: 24 May 2022 (11:47:39 CEST)
In the past decades, a significant rise in the adoption of streaming applications has changed the decision-making process in industry and academia. This movement led to the emergence of a plurality of Big Data technologies such as Apache Storm, Spark, Heron, Samza, Flink, and other systems that provide in-memory processing for real-time Big Data analysis at high throughput. Spark Streaming is one of the most popular open-source implementations; it handles ever-increasing data ingestion and processing by using the Unified Memory Manager to dynamically manage memory occupancy between storage and processing regions, which is the focus of this study. The problem behind memory management for data-intensive stream processing pipelines is that incoming data arrives faster than the downstream operators can consume it. Consequently, the backpressure of Spark acts in the opposite direction of downstream operators. In such a case, the incoming data overwhelms the memory manager and provokes memory leak issues. As a result, application performance suffers, e.g., through high latency, low throughput, or even data loss. The initial intuition motivating our work is that memory management has become the critical factor in keeping Spark processing at scale and the system stable. This work provides a deep dive into Spark backpressure, evaluates its structure, presents the main characteristics to support data-intensive streaming pipelines, and investigates the current in-memory-based performance issues.
ARTICLE | doi:10.20944/preprints201804.0144.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: big data; SIEM; correlation analysis; cyber crime profiling
Online: 11 April 2018 (08:39:02 CEST)
SIEM deployments are increasing in order to detect threat patterns quickly in large amounts of structured and unstructured data, to precisely diagnose crises arising from threats, and to provide accurate alarms to an administrator by correlating collected information. However, it is difficult to quickly recognize and handle various attack situations during security monitoring using a solution equipped with complicated functions. To overcome this situation, a new detection analysis process is required, and there is an effort to increase response speed during security monitoring and to expand accurate linkage analysis technology. In this paper, reflecting these requirements, we design and propose a profiling auto-generation model that can improve the efficiency and speed of attack detection for potential threats.
REVIEW | doi:10.20944/preprints202003.0141.v1
Subject: Medicine And Pharmacology, Other Keywords: data sharing; data management; data science; big data; healthcare
Online: 8 March 2020 (16:46:20 CET)
In recent years, more and more health data are being generated. These data come not only from professional health systems, but also from wearable devices. All these data combined form ‘big data’ that can be utilized to optimize treatments for each unique patient (‘precision medicine’). To achieve this precision medicine, it is necessary that hospitals, academia and industry work together to bridge the ‘valley of death’ of translational medicine. However, hospitals and academia often have problems with sharing their data, even though the patient is actually the owner of his/her own health data, and the sharing of data is associated with increased citation rate. Academic hospitals usually invest a lot of time in setting up clinical trials and collecting data, and want to be the first ones to publish papers on this data. The idea that society benefits the most if the patient’s data are shared as soon as possible so that other researchers can work with it, has not taken root yet. There are some publicly available datasets, but these are usually only shared after studies are finished and/or publications have been written based on the data, which means a severe delay of months or even years before others can use the data for analysis. One solution is to incentivize the hospitals to share their data with (other) academic institutes and the industry. Here we discuss several aspects of data sharing in the medical domain: publisher requirements, data ownership, support for data sharing, data sharing initiatives and how the use of federated data might be a solution. We also discuss some potential future developments around data sharing.
ARTICLE | doi:10.20944/preprints201806.0219.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Big data technology; Business intelligence; Data integration; System virtualization.
Online: 13 June 2018 (16:19:48 CEST)
Big Data warehouses are a new class of databases that largely use unstructured and volatile data for analytical purposes. Examples of such data sources are those coming from the Web, such as social networks and blogs, or from sensor networks, where huge amounts of data may be available only for short intervals of time. In order to manage massive data sources, a strategy must be adopted to define multidimensional schemas in the presence of fast-changing situations or even undefined business requirements. In this paper, we propose a design methodology that adopts agile and automatic approaches in order to reduce the time necessary to integrate new data sources and to include new business requirements on the fly. The data are immediately available for analysis, since the underlying architecture is based on a virtual data warehouse that does not require an importing phase. Examples of application of the methodology are presented throughout the paper in order to show the validity of this approach compared to a traditional one.
ARTICLE | doi:10.20944/preprints202012.0529.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: e-commerce; big data; bibliometric analysis; knowledge mapping
Online: 21 December 2020 (14:24:06 CET)
The e-commerce platform in the digital economy era has evolved into a data platform ecosystem built around data resources and data mining technology systems. The most typical applications of big data are also concentrated in the field of e-commerce. E-commerce companies should first grasp the interactive relationship among the three major factors of data, technology and innovation. E-commerce platform operation is a multidisciplinary research field, and it is not easy for researchers to obtain a panoramic view of its knowledge structure. A knowledge graph is a kind of graph that shows the development process and structural relationships of knowledge, taking a field of knowledge as its object. It is not only a visual knowledge map but also a serialized knowledge pedigree, which provides researchers with a quantitative research method for statistics on development trends and academic status. The purpose of this research is to help researchers understand the key knowledge, evolutionary trends and research frontiers of current research. This study uses CiteSpace bibliometric analysis to analyze data from the Science Net database and finds that: 1) the development of the research field has gone through three stages, and some representative key scholars and key documents have been recognized; 2) the knowledge mapping of literature co-citation and keyword co-occurrence shows research hotspots; 3) the results of burst detection and central node analysis reveal research frontiers and development trends. Today, the visualization of big data brings different challenges. The abstraction between the world and today's data visualization occurs when the data is captured. Every user sees his own visualization data generated by standardized calculations. At the same time, there are still many controversies regarding the theoretical model, structure and structural dimensions. This is the direction that future researchers need to study further.
ARTICLE | doi:10.20944/preprints201811.0074.v1
Subject: Medicine And Pharmacology, Psychiatry And Mental Health Keywords: system dynamics modeling; big data; mental distress; diet
Online: 5 November 2018 (02:34:30 CET)
Dietary factors are among the risk factors that can affect brain chemistry, which can lead to mental distress. Based on our data mining approach, we found that mental distress in men is associated with eating unhealthy food. Our aim in this paper is to apply results from our big data analytics approach to inform system dynamics (SD) modeling in order to investigate the causal relationships between brain structures, nutrients from food and dietary supplements, and mental health. We perform descriptive analysis based on a large data set to estimate the SD modeling parameters. Finally, we calibrate the model against time series data collected from individuals on their dietary and distress patterns. The results reveal that bridging these different methodologies leads to further insights from the SD model and decreases the error of calibrated parameter values. Future research is needed to validate our initial results on the relationship between mental distress and dietary intake.
REVIEW | doi:10.20944/preprints202203.0407.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: big data analytics; healthcare; data technologies; decision making; information management; EHR
Online: 31 March 2022 (12:24:19 CEST)
Big data analytics tools apply advanced analytic techniques to large and diverse volumes of data, including structured, semi-structured, and unstructured data from different sources and in sizes ranging from terabytes to zettabytes. The health sector is faced with the need to generate and manage large data sets from various health systems, such as electronic health records and clinical decision support systems. This data can be used by providers, clinicians, and policymakers to plan and implement interventions, detect disease more quickly, predict outcomes, and personalize care delivery. However, little attention is paid to the connection between big data analytics tools and the health sector. Thus, a systematic review of the bibliometric literature (LRSB) was developed to study how the adoption of big data analytics tools and infrastructures will revolutionize the healthcare industry. The review integrated 77 scientific and/or academic documents indexed in SCOPUS, presenting up-to-date insights on how big data analytics technologies influence the healthcare sector and on the different big data analytical tools used. The LRSB provides findings related to the impact of Big Data analytics on the health sector by introducing opportunities and technologies that provide practical solutions to various challenges.
ARTICLE | doi:10.20944/preprints202005.0274.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: big data; deep learning; intelligent systems; medical imaging; multi-data processing
Online: 16 May 2020 (17:43:42 CEST)
Big Data in medicine involves processing large data sets, both current and historical, as quickly as possible in order to support the diagnosis and therapy of patients' diseases. Support systems for these activities may include pre-programmed rules based on data obtained from the medical interview, while automatic analysis of diagnostic test results leads to the classification of observations into a specific disease entity. The current revolution using Big Data significantly expands the role of computer science in achieving these goals, which is why we propose a Big Data computer data processing system using artificial intelligence to analyze and process medical images.
ARTICLE | doi:10.20944/preprints202105.0601.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Mobile RPG; Big Data; Text Mining; Topic Modeling
Online: 25 May 2021 (10:21:36 CEST)
Because RPGs generate high sales and profits, many developers have brought various RPGs to market, but the genre has shifted towards mass production with sensational advertising, low quality, excessive charging, and similar content, which affects the game market and users’ gameplay experience. This paper studies ways to improve mobile RPGs by crawling and analyzing user reviews from the Google Play Store. Topic modeling, based on text mining and LDA (Latent Dirichlet Allocation), was used to extract meaningful information from the collected big data and to visualize it. Interpreting users’ reviews, identifying their opinions objectively, and seeking ways to improve games help to create mobile RPGs that can be played over the long term.
ARTICLE | doi:10.20944/preprints202305.0722.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Anomaly detection; Malaria data; Machine learning; big data; epidemic
Online: 10 May 2023 (09:34:36 CEST)
Disease surveillance is critical to monitor ongoing control activities, detect early outbreaks and to inform intervention priorities and policies. Unfortunately, most data from disease surveillance remain under-utilised to support decision-making in real-time. Using the Brazilian Amazon malaria surveillance data as a case study, we explore unsupervised anomaly detection machine learning techniques to analyse and discover potential anomalies. We found that our models are able to detect early outbreaks, peak of outbreaks as well as change points in the proportion of positive malaria cases. Specifically, the sustained rise in malaria in the Brazilian Amazon in 2016 was flagged by several models. We also found that no single model detects all the anomalies across all health regions. The approaches using Clustering-based local outlier algorithm ranked first before Principal component analysis and Stochastic outlier selection in maximising the number of anomalies detected in local health regions. Because of this, we also provide the minimum number of machine learning models (top-k models) to maximise the number of anomalies detected across different health regions. We discovered that the top-3 models that maximise the coverage of the number and types of anomalies detected across the 13 health regions are: Principal component analysis, Stochastic outlier selection and Multi-covariance determinant. Anomaly detection approaches provide interesting solutions to discover patterns of epidemiological importance when confronted with a large volume of data across space and time. Our exploratory approach can be replicated for other diseases and locations to inform timely interventions and actions toward endemic disease control.
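The idea of choosing a small set of top-k models that together cover as many anomalies as possible can be illustrated with a greedy coverage heuristic. This is only a sketch under stated assumptions: the abstract does not say how the top-k models were selected, so the greedy strategy is a stand-in, and the model names and anomaly identifiers below are hypothetical.

```python
def greedy_top_k(detections, k):
    """Pick up to k models that together cover the most anomalies.

    detections maps a model name to the set of anomaly ids it flagged.
    Greedy selection is an approximation and is not claimed to be the
    selection procedure used in the study.
    """
    remaining = dict(detections)
    chosen = []
    covered = set()
    for _ in range(k):
        # Model that adds the largest number of not-yet-covered anomalies.
        best = max(
            remaining,
            key=lambda name: len(remaining[name] - covered),
            default=None,
        )
        if best is None or not (remaining[best] - covered):
            break  # no model adds anything new
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical detections: anomaly ids flagged by each model.
detections = {
    "model_a": {1, 2, 3, 4},
    "model_b": {3, 4, 5},
    "model_c": {6},
    "model_d": {1, 2},
}
chosen, covered = greedy_top_k(detections, k=3)
```

With these made-up inputs, three models are enough to cover every anomaly, while `model_d` adds nothing that `model_a` has not already flagged.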
ARTICLE | doi:10.20944/preprints202206.0320.v4
Subject: Biology And Life Sciences, Other Keywords: data; reproducibility; FAIR; data reuse; public data; big data; analysis
Online: 2 November 2022 (02:55:49 CET)
With an increasing amount of biological data available publicly, there is a need for a guide on how to successfully download and use this data. The Ten simple rules for using public biological data are: 1) use public data purposefully in your research, 2) evaluate data for your use case, 3) check data reuse requirements and embargoes, 4) be aware of ethics for data reuse, 5) plan for data storage and compute requirements, 6) know what you are downloading, 7) download programmatically and verify integrity, 8) properly cite data, 9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share, and 10) make pipelines and code FAIR and share. These rules are intended as a guide for researchers wanting to make use of available data and to increase data reuse and reproducibility.
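Rule 7 (download programmatically and verify integrity) can be sketched with the Python standard library alone. This is a minimal illustration, assuming the data provider publishes a SHA-256 checksum next to the file; the function names and any URL passed in are placeholders, not part of the rules themselves.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify(url, dest, expected_sha256):
    """Download url to dest and raise ValueError on a checksum mismatch.

    expected_sha256 is assumed to come from the data provider.
    """
    urllib.request.urlretrieve(url, dest)
    actual = sha256_of(dest)
    if actual != expected_sha256.lower():
        Path(dest).unlink()  # do not keep a file that failed verification
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
    return Path(dest)
```

Reading the file in chunks keeps memory use flat for large downloads, and recording the checksum alongside the download script makes the retrieval step repeatable.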
ARTICLE | doi:10.20944/preprints201808.0335.v1
Subject: Business, Economics And Management, Business And Management Keywords: big data; maturity model; temporal analytics; advanced business analytics
Online: 18 August 2018 (11:05:24 CEST)
The main aim of this paper is to explore the issue of big data and to propose a conceptual framework for big data based on the temporal dimension. The Temporal Big Data Maturity Model (TBDMM) is a means of assessing an organization’s readiness to fully profit from big data analysis. It allows the current state of the organization’s big data assets and analytical tools to be measured and their future development to be planned. The framework explicitly incorporates a time dimension, providing a complete means of also assessing the readiness to process temporal data and/or knowledge that can be found in modern sources, such as big data ones. Temporality in the proposed framework extends and enhances the already existing maturity models for big data. This research paper is based on a critical analysis of literature, as well as creative thinking, and on a case-study approach involving multiple cases. The literature-based research has shown that the existing maturity models for big data do not treat the temporal dimension as a basic one. At the same time, dynamic analytics is crucial for a sustainable competitive advantage. The conceptual framework was well received by practitioners, to whom it was presented during interviews. The participants in the consultations often expressed their need for temporal big data analytics, and hence the temporal approach of the maturity model was widely welcomed.
ARTICLE | doi:10.20944/preprints201810.0253.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: adaptive filtering; set-membership filtering; affine projection; data censoring; big data; outliers
Online: 12 October 2018 (04:57:08 CEST)
In this paper, the set-membership affine projection (SM-AP) algorithm is utilized to censor non-informative data in big data applications. To this end, the probability distribution of the additive noise signal and the excess of mean-squared error (EMSE) in steady-state are employed in order to estimate the threshold parameter of the single threshold SM-AP (ST-SM-AP) algorithm aiming at attaining the desired update rate. Furthermore, by defining an acceptable range for the error signal, the double threshold SM-AP (DT-SM-AP) algorithm is proposed to detect very large errors due to the irrelevant data such as outliers. The DT-SM-AP algorithm can censor non-informative and irrelevant data in big data applications, and it can improve misalignment and convergence rate of the learning process with high computational efficiency. The simulation and numerical results corroborate the superiority of the proposed algorithms over traditional algorithms.
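The double-threshold censoring idea can be sketched for the simplest case, projection order one, where set-membership affine projection reduces to set-membership NLMS. This is an illustrative simplification, not the DT-SM-AP algorithm of the paper: the function name, the threshold values, and the synthetic data are assumptions, and the thresholds are fixed by hand rather than estimated from the noise distribution and EMSE.

```python
import random


def dt_sm_nlms(inputs, desired, lower, upper, eps=1e-8):
    """Set-membership NLMS with a second (upper) censoring threshold.

    A sample updates the weights only if lower < |error| <= upper:
    small errors are treated as non-informative, very large errors as
    outliers. In this sketch, upper must exceed the start-up error.
    """
    weights = [0.0] * len(inputs[0])
    updates = 0
    for x, d in zip(inputs, desired):
        error = d - sum(w * xi for w, xi in zip(weights, x))
        magnitude = abs(error)
        if magnitude <= lower or magnitude > upper:
            continue  # censor this sample
        step = 1.0 - lower / magnitude  # data-dependent step size
        norm = eps + sum(xi * xi for xi in x)
        weights = [w + step * error * xi / norm for w, xi in zip(weights, x)]
        updates += 1
    return weights, updates


# Synthetic system identification with occasional outliers (made-up data).
rng = random.Random(0)
true_w = [0.5, -0.3]
inputs, desired = [], []
for k in range(500):
    x = [rng.gauss(0, 1), rng.gauss(0, 1)]
    d = sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0, 0.01)
    if k % 50 == 49:
        d += 50.0  # inject an outlier every 50th sample
    inputs.append(x)
    desired.append(d)

w, updates = dt_sm_nlms(inputs, desired, lower=0.05, upper=5.0)
```

Because censored samples skip the update entirely, the number of weight updates is smaller than the number of samples, which is where the computational saving comes from.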
REVIEW | doi:10.20944/preprints202202.0345.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: big data; machine learning; agriculture; challenges; systematic literature review
Online: 28 February 2022 (03:14:56 CET)
Agricultural Big Data is a set of technologies that allows responding to the challenges of the new data era. In conjunction with machine learning, farmers can use data to address different problems such as farmers' decision-making, crops, weeds, animal research, land, food availability and security, weather, and climate change. The purpose of this paper is to synthesize the evidence regarding the challenges involved in implementing machine learning in Agricultural Big Data. We conducted a Systematic Literature Review applying the PRISMA protocol. This review includes 30 papers, published from 2015 to 2020. We develop a framework that summarizes the main challenges encountered, the use of machine learning techniques, as well as the main technologies used. A major challenge is the design of Agricultural Big Data architectures, due to the need to modify the set of technologies adapting the machine learning techniques, as the volume of data increases.
REVIEW | doi:10.20944/preprints202205.0325.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: big data; architecture; agriculture; climate change; systematic literature review
Online: 24 May 2022 (07:42:55 CEST)
Climate change is currently one of the main problems facing agriculture to achieve sustainability. It causes situations such as drought, increased rainfall, and increased diseases, causing a decrease in food production. In order to combat these problems, Agricultural Big Data contributes with tools that allow improving the understanding of complex, multivariate, and unpredictable agricultural ecosystems through the collection, storage, processing, and analysis of vast amounts of data from diverse heterogeneous sources. This research aims to discuss the advancement of technologies used in Agricultural Big Data architectures in the context of climate change. The study aims to highlight the tools used to process, analyze, and visualize the data and discuss the use of the architectures in the crop, water, climate, and soil management, especially to analyze the context, whether it is in Resilience Mitigation or Adaptation. The PRISMA protocol guided the study, finding 33 relevant papers. Despite the advances in this line of research, few papers were found that mention the components of the architectures, in addition to the lack of standards and the use of reference architectures, which allow the proper development of Agricultural Big Data in the context of climate change.
ARTICLE | doi:10.20944/preprints202201.0172.v2
Subject: Business, Economics And Management, Business And Management Keywords: blockchain; healthcare supply chain management; logistics cooperation; big data
Online: 19 January 2022 (12:09:00 CET)
This study emphasizes the necessity of introducing a blockchain-based joint logistics system to strengthen the competency of medical supply chain management (SCM) and tries to develop a healthcare supply chain management (HSCM) competency measurement item through an analytic hierarchy process. The variables needed for using blockchain-based joint logistics are the performance expectations, effort expectations, promotion conditions, and social impact of the UTAUT model, and the HSCM competency results in increased reliability and transparency, enhanced SCM, and enhanced scalability. Word cloud results, analyzing the most important considerations for realizing work efficiency among medical industry-related agencies, featured numerous words, including sudden situations, delivery, technology trust, information sharing, effectiveness, and urgency. This might imply the need to establish a system that can respond immediately to emergency situations during holidays. It could also suggest the importance of real-time information sharing to increase the efficiency of inventory management. Therefore, there is a need for a business model that can increase the visibility of real-time medical SCM through big data analysis. By analyzing the importance of securing reliability based on blockchain technology in the establishment of a supply chain network for HSCM competency, we reveal that joint logistics can be achieved and synergistic effects can be created by implementing an integrated database to secure HSCM competency. Strengthening partnerships, such as joint logistics, will eventually lead to HSCM competency. In particular, HSCM should seek ways to upgrade its competitive capabilities through big data analysis based on the establishment of a joint logistics system.
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: reproductive health; infertility; big data; Machine Learning; AI; Systems Biology
Online: 18 November 2020 (13:51:46 CET)
Advances in machine learning (ML) and artificial intelligence (AI) are transforming the way we treat patients in ways not even imagined a few years ago. Cancer research is at the forefront of this movement. Infertility, though not a life-threatening condition, affects around 15% of couples trying for a pregnancy. The increasing availability of large datasets from various sources creates an opportunity to introduce ML and AI into infertility prevention and treatment. At present in the field of assisted reproduction, very little is done to prevent infertility from arising, with the main focus put on treatment at a point when advanced maternal age and low ovarian reserve often make it very difficult to conceive. A shift from this disease-centric model to a health-centric model in infertility is already taking place, with more emphasis on the patient as an active participant in the process. Poor-quality and incomplete data, as well as biological variability, remain the main limitations to the widespread and reliable implementation of AI in reproductive medicine. That said, one area where this technology has found a foothold is the identification of developmentally competent embryos. More work is required, however, to learn how to improve natural conception, the detection and diagnosis of infertility, and assisted reproduction treatments (ART), and ultimately to develop clinically useful algorithms able to adjust treatment regimens in order to assure a successful outcome of either fertility preservation or infertility treatment. Progress in genomics and digital technologies and advances in integrative biology have had a tremendous impact on research and clinical medicine. With the rise of ‘big data’, artificial intelligence, and the advances in molecular profiling, there is an enormous potential to transform not only scientific research progress, but also clinical decision making towards predictive, preventive, and personalized medicine.
In the field of reproductive health, there is now an exciting opportunity to leverage these technologies and develop more sophisticated approaches to diagnose and treat infertility disorders. In this review, we present a comprehensive analysis and interpretation of different innovation forces that are driving the emergence of a system approach to the infertility sector. Here we discuss recent influential work and explore the limitations of the use of Machine Learning models in this rapidly developing area.
ARTICLE | doi:10.20944/preprints202002.0294.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: bitmap indexing; processing in memory; memory wall; Big Data; Internet Of Things
Online: 20 February 2020 (08:24:48 CET)
To live in the information society means to be surrounded by billions of electronic devices full of sensors that constantly acquire data. This enormous amount of data must be processed and classified. A commonly adopted solution is to send these data to server farms to be processed remotely. The drawback is a huge battery drain due to the large amount of information that must be exchanged. To overcome this problem, data must be processed locally, near the sensor itself, but this solution requires huge computational capabilities. While microprocessors, even mobile ones, nowadays have enough computational power, their performance is severely limited by the Memory Wall problem: memories are too slow, so microprocessors cannot fetch enough data from them. A solution is the Processing-In-Memory (PIM) approach, in which new memories are designed that can process data internally, eliminating the Memory Wall problem. In this work we present an example of such a system, using the Bitmap Indexing algorithm as a case study. This algorithm is used to classify data coming from many sources in parallel. We propose a hardware accelerator designed around the Processing-In-Memory approach that is capable of implementing this algorithm and that can also be reconfigured to perform other tasks or to work as standard memory. The architecture has been synthesized using CMOS technology. The results we have obtained show not only that it is possible to process and classify huge amounts of data locally, but also that this can be achieved with very low power consumption.
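For readers unfamiliar with bitmap indexing, the core idea can be shown in software: each distinct value of an attribute gets one bit vector, and a multi-attribute query becomes a bitwise AND or OR over those vectors. The sketch below uses Python integers as bit vectors; the column names and values are made up, and it says nothing about how the proposed PIM hardware implements the operation.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical sensor records, one entry per row.
sensor_type = ["temp", "humidity", "temp", "pressure", "temp"]
status = ["ok", "ok", "fault", "ok", "ok"]

type_index = build_bitmap_index(sensor_type)
status_index = build_bitmap_index(status)

# Rows where sensor_type == "temp" AND status == "ok": a single bitwise AND.
matches = type_index["temp"] & status_index["ok"]
matching_rows = rows_of(matches)
```

The query touches only two bit vectors and one logical operation, which is the kind of simple, highly parallel work that suits execution inside the memory array.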
ARTICLE | doi:10.20944/preprints201808.0350.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: big data; clustering; data mining; educational data mining; e-learning; profile learning
Online: 19 October 2018 (05:58:05 CEST)
Educational data mining is an evolving discipline that focuses on the improvement of self-learning and adaptive methods. It is used for finding hidden patterns or intrinsic structures in educational data. In education, the data involved are heterogeneous and continuously growing, in line with the big data paradigm. To extract meaningful information adaptively from big educational data, specific data mining techniques are needed. This paper presents a clustering approach to partition students into different groups or clusters based on their learning behavior. Furthermore, a personalized e-learning system architecture is presented which detects students’ learning capabilities and adapts the teaching content accordingly. The primary objective includes the discovery of optimal settings in which learners can improve their learning capabilities. Moreover, the administration can find essential hidden patterns to bring effective reforms to the existing system. The clustering methods K-Means, K-Medoids, Density-based Spatial Clustering of Applications with Noise, Agglomerative Hierarchical Cluster Tree and Clustering by Fast Search and Finding of Density Peaks via Heat Diffusion (CFSFDP-HD) are analyzed using educational data mining. It is observed that more robust results can be achieved by replacing the existing methods with CFSFDP-HD. The data mining techniques are equally effective for analyzing big data to make education systems more robust.
CONCEPT PAPER | doi:10.20944/preprints202111.0117.v1
Subject: Business, Economics And Management, Business And Management Keywords: Big data predictive analytics; competitive strategies; strategic alliance performance; Telecom sector
Online: 5 November 2021 (11:29:12 CET)
Based on the resource-based theory, the current study examines the relationship between competitive strategies and strategic alliance performance. Furthermore, big data predictive analytics is treated as a boundary condition between competitive strategies and strategic alliance performance. Big data predictive analytics in operations and industrial management has been a focal point in the current era. However, little attention has been paid to the influence of big data predictive analytics on competitive strategies and strategic alliance performance, especially in developing countries like Pakistan. A survey instrument was used to record the responses of 331 employees of telecom sector companies operating in Pakistan. The findings show that competitive strategies have a positive and significant relationship with strategic alliance performance. It was also found that big data predictive analytics moderates the relationship between competitive strategies and strategic alliance performance. The study adds a new perspective and contribution to the literature on big data predictive analytics, strategic alliance performance, and competitive strategies in Pakistan's telecom sector companies. Further, the results suggest that big data analytics is like the lifeblood of companies in the current era. Through the efficient and effective use of big data analytics, companies can raise their standards in a competitive environment.
ARTICLE | doi:10.20944/preprints202301.0415.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Psychological Health; Drugs; Twitter; Machine Learning; Big Data; Drug Abuse; Toxicology; Social Factors; Economic Factors; Environmental Factors
Online: 27 February 2023 (13:31:40 CET)
Mental health issues can have significant impacts on individuals and communities and hence on social sustainability. Mental health treatment faces several challenges; more important, however, is removing the root causes of mental illnesses, because doing so can help prevent mental health problems from occurring or recurring. This requires a holistic approach to understanding mental health issues, which is missing from the existing research. Mental health should be understood in the context of social and environmental factors. More research and awareness are needed, as well as interventions to address root causes. The effectiveness and risks of medications should also be studied. This paper proposes a big data and machine learning-based approach for the automatic discovery of parameters related to mental health from Twitter data. The parameters are discovered from three different perspectives: Drugs & Treatments, Causes & Effects, and Drug Abuse. We used Twitter to gather 1,048,575 tweets in Arabic about psychological health in Saudi Arabia. We built a big data machine learning software tool for this work. A total of 52 parameters were discovered across the three perspectives. We defined 6 macro-parameters (Diseases & Disorders, Individual Factors, Social & Economic Factors, Treatment Options, Treatment Limitations, and Drug Abuse) to aggregate related parameters. We provide a comprehensive account of mental health, causes, medicines and treatments, mental health and drug effects, and drug abuse, as seen on Twitter and discussed by the public and health professionals. Moreover, we identify their associations with different drugs. The work will open new directions for social media-based identification of drug use and abuse for mental health, as well as other micro and macro factors related to mental health. The methodology can be extended to other diseases and offers potential for discovering evidence for forensic toxicology from social and digital media.
CONCEPT PAPER | doi:10.20944/preprints202102.0203.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Bigdata; IoT; Big Data Analytics; Covid-19; healthcare
Online: 8 February 2021 (12:19:28 CET)
Big Data analytics has come a long way since its inception, and the field is growing day by day. With the advent of the large computational capacity of modern computing systems, as well as the Internet of Things (IoT), this field has revolutionized the way we think about data. It has influenced major domains such as healthcare, automobiles, computing, climatology, and space communications. Of late, the healthcare sector has been strongly influenced by it. This communication deals with the areas of healthcare where big data analytics has been most influential. Covering the basics of Big Data Analytics (BDA) driven by IoT, it outlines the applications of BDA in the healthcare sector, together with future expectations. Additionally, it presents a comprehensive analysis of recent applications in this sector, with special reference to Covid-19.
ARTICLE | doi:10.20944/preprints201811.0339.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: eHealth; big data; deep learning; watson; spark; decision support system; prevention pathways
Online: 15 November 2018 (04:14:36 CET)
Data collection and analysis are becoming more and more important in a variety of application domains as novel technologies advance. At the same time, we are experiencing a growing need for human-machine interaction with expert systems, which pushes research towards new knowledge representation models and interaction paradigms. In particular, in recent years eHealth, which denotes all health-care practices supported by electronic processing and remote communications, has called for the availability of smart environments and large computational resources. The aim of this paper is to introduce the HOLMeS (Health On-Line Medical Suggestions) framework. The introduced system proposes to change the eHealth paradigm: a trained machine learning algorithm, deployed on a cluster-computing environment, provides medical suggestions via both chat-bot and web-app modules. The chat-bot, based on deep learning approaches, is able to overcome the limitation of biased interaction between users and software, exhibiting human-like behavior. Results demonstrate the effectiveness of the machine learning algorithms, showing an Area Under the ROC Curve (AUC) of 74.65% when first-level features are used to assess the occurrence of different prevention pathways. When disease-specific features are added, HOLMeS reaches an AUC of 86.78%, achieving a more specific prevention pathway evaluation.
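Since the abstract above reports its results as Area Under the ROC Curve, the following minimal sketch shows how that metric can be computed from labels and scores. It uses the standard pairwise-ranking definition of AUC (the probability that a randomly chosen positive case is scored above a randomly chosen negative one); the function name and the toy labels and scores are hypothetical and are not taken from HOLMeS.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (illustrative only): 1 = pathway applies, 0 = it does not.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The quadratic loop is chosen for readability; for large datasets a rank-based formulation gives the same value in O(n log n).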
ARTICLE | doi:10.20944/preprints202208.0083.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Ratios; Financial Crisis; Covid-19; Big Data; Accounting Data
Online: 3 August 2022 (10:42:06 CEST)
The effects of the 2008 financial crisis undoubtedly caused problems not only for the banking sector but also for the real economy of developed and developing countries almost all around the globe. Besides, as is widely known, every banking crisis entails a corresponding cost to the economy of each country affected by it, which results from the shakeout and restructuring of its financial system. The purpose of this research is to investigate the consequences of the financial crisis and the COVID-19 health crisis and how these affected the course of the four systemic banks (Eurobank, Alpha Bank, National Bank, Piraeus Bank) through the analysis of ratios for the period 2015-2020.
ARTICLE | doi:10.20944/preprints201810.0601.v1
Subject: Engineering, Civil Engineering Keywords: support vector machine; travelling time; intelligent transportation system; artificial fish swarm algorithm; big data
Online: 25 October 2018 (10:48:45 CEST)
Freeway travelling time is affected by many factors including traffic volume, adverse weather, accident, traffic control and so on. We employ the multiple source data-mining method to analyze freeway travelling time. We collected toll data, weather data, traffic accident disposal logs and other historical data of freeway G5513 in Hunan province, China. Using Support Vector Machine (SVM), we proposed the travelling time model based on these databases. The new SVM model can simulate the nonlinear relationship between travelling time and those factors. In order to improve the precision of the SVM model, we applied Artificial Fish Swarm algorithm to optimize the SVM model parameters, which include the kernel parameter σ, non-sensitive loss function parameter ε, and penalty parameter C. We compared the new optimized SVM model with Back Propagation (BP) neural network and common SVM model, using the historical data collected from freeway G5513. The results show that the accuracy of the optimized SVM model is 17.27% and 16.44% higher than those of the BP neural network model and the common SVM model respectively.
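To make the three tuned parameters named in the abstract above concrete, here is a minimal sketch of where σ, ε and C enter a standard support vector regression formulation. It assumes the common Gaussian (RBF) kernel and the usual ε-insensitive loss; the abstract does not give the authors' exact formulation, and the function names and numbers below are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)); sigma sets the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost their excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def penalised_error(y_true, y_pred, epsilon, C):
    """Empirical term of the SVR objective: C times the summed epsilon-insensitive losses."""
    return C * sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred)
    )


# Illustrative travel times in minutes (made-up values).
observed = [10.0, 10.0]
predicted = [10.3, 12.0]
error_term = penalised_error(observed, predicted, epsilon=0.5, C=2.0)
print(error_term)
```

A metaheuristic such as the Artificial Fish Swarm algorithm would search over (σ, ε, C) to minimise a validation error; that search loop is not shown here.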
ARTICLE | doi:10.20944/preprints201710.0076.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: big data; machine learning; regularization; data quality; robust learning framework
Online: 17 October 2017 (03:47:41 CEST)
The concept of ‘big data’ has been widely discussed, and its value has been illuminated throughout a variety of domains. To quickly mine potential values and alleviate the ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies exist regarding how to modify machine learning techniques to accommodate big data environments, we provide a comprehensive overview of the history of the evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, based on learning principles, we discuss regularization to enhance generalization. The challenges of quality in big data are reduced to the curse of dimensionality, class imbalances, concept drift and label noise, and the underlying reasons and mainstream methodologies to address these challenges are introduced. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm by aggregating the aforementioned factors. Over the next few decades, we believe that these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that involve both data quality considerations and human interactions.
REVIEW | doi:10.20944/preprints201904.0027.v2
Subject: Computer Science And Mathematics, Analysis Keywords: neuroscience; big data; functional Magnetic Resonance (fMRI); pipeline; one platform system
Online: 8 April 2019 (05:46:55 CEST)
In neuroscience research, and specifically in medical imaging analysis, mining more latent medical information from big medical data is significant for finding solutions to diseases. In this review, we focus on one kind of neuroimaging data, functional Magnetic Resonance Imaging (fMRI), a non-invasive technique that has already become a popular tool in clinical neuroscience and functional cognitive science research. Once fMRI data have been acquired, a wide range of software and programming tools, both open source and commercial, is available, and it is very hard to choose the best software to analyze the data. What is worse, combining more than one software package can make the final results unbalanced and unstable, which is why we want to build a pipeline to analyze the data. Meanwhile, with the growth of machine learning, Python has become one of the most popular programming languages. In addition, it is an open-source, dynamic programming language whose communities, libraries and contributors have grown quickly in recent years. Through this review, we hope to make neuroimaging data analysis easier, more stable and more uniform on the basis of a one-platform system.
REVIEW | doi:10.20944/preprints201805.0418.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: big data training and learning; company and business requirements; ethics; impact; decision support; data engineering; open data; smart homes; smart cities; IoT
Online: 29 May 2018 (08:45:52 CEST)
In Data Science we are concerned with the integration of relevant sciences in observed and empirical contexts. This results in the unification of analytical methodologies, and of observed and empirical data contexts. Given the dynamic nature of this convergence, the origins and many evolutions of the Data Science theme are described. The following are covered in this article: the rapidly growing provision of post-graduate university courses in Data Science; a preliminary study of employability requirements; and how past eminent work in the social sciences and other areas, certainly mathematics, can be of immediate and direct relevance and benefit for innovative methodology, and for facing and addressing the ethical aspects of Big Data analytics relating to data aggregation and scale effects. Also associated with Data Science are its direct and indirect outcomes and consequences, which include decision support and policy making, and both qualitative and quantitative outcomes. For such reasons, we note the importance of how Data Science builds collaboratively on other domains, potentially with innovative methodologies and practice. Further sections point towards some of the major current research issues.
ARTICLE | doi:10.20944/preprints202007.0078.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: personalization; decision making; medical data; artificial intelligence; Data-driving; Big Data; Data Mining; Machine Learning
Online: 5 July 2020 (15:04:17 CEST)
This study applies machine learning and data mining methods to the personalization of treatment, which makes it possible to investigate individual patient characteristics. Personalization is built on clustering and association rules. It is suggested to determine the average distance between instances in order to find optimal performance metrics. A formalization of the medical data pre-processing stage is proposed for finding personalized solutions based on current standards and pharmaceutical protocols. A model of patient data is built. The paper presents a novel approach to clustering built on an ensemble of clustering algorithms, with better Hopkins metrics than the k-means algorithm. Personalized treatment is usually based on decision trees. Such an approach requires a lot of computation time and cannot be parallelized. Therefore, it is proposed to classify persons by conditions and to determine the deviations of their parameters from the normative parameters of the group, as well as from the average parameters. This made it possible to create a personalized approach to treatment for each patient based on long-term monitoring. According to the results of the analysis, it becomes possible to predict the optimal conditions for a particular patient and to find the medication treatment that suits their personal characteristics.
ARTICLE | doi:10.20944/preprints201609.0027.v1
Subject: Business, Economics And Management, Business And Management Keywords: customer complaint process improvement; customer complaint service; big data analysis
Online: 7 September 2016 (11:38:33 CEST)
With the advances in industry and commerce, passengers have become more accepting of environmental sustainability issues; thus, more people now choose to travel by bus. Government administration constitutes an important part of bus transportation services, as the government gives the right-of-way to transportation companies, allowing them to provide services. When these services are of poor quality, passengers may lodge complaints. The increase in consumer awareness and developments in wireless communication technologies have made it possible for passengers to easily and immediately submit complaints about transportation companies to government institutions, which has brought drastic changes to the supply-demand chain comprising the public sector, transportation companies, and passengers. This study proposes the use of big data analysis technology, including systematized case assignment and data visualization, to improve management processes in the public sector and optimize customer complaint services. Taichung City, Taiwan was selected as the research area. There, the customer complaint management process in the public sector was improved, effectively solving such issues as station-skipping, allowing the public sector to fully grasp the service level of transportation companies, improving the sustainability of bus operations, and supporting the sustainable development of the public sector-transportation company-passenger supply chain.
ARTICLE | doi:10.20944/preprints202209.0413.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Consortium Blockchain; Ring signature; Blockchain privacy; Blockchain security; Access Control; Blockchain big data
Online: 27 September 2022 (07:35:53 CEST)
Banking sectors are adopting modern working frameworks and models for smooth, decentralization-based development as banking moves into new areas and diverse activities. Consortium blockchain privacy has become a major concern and a challenge for most banking sectors. Development without being hampered is a major concern, since it can store confirmed data. Data privacy includes assuring protection against both insider and outsider threats; therefore, ring signature access control could help to secure privacy against inside and outside threats through a secure RSBAC process built on the CIA triad of confidentiality, integrity, and availability. This paper proposes a ring-signature-based access control mechanism for determining who a user is and then regulating that person's access to and use of a system's resources. In a nutshell, access control restricts who has access to a system. It also restricts access to system resources to users who have been identified as having the necessary privileges and permissions. The proposed paradigm satisfies the needs of both workflow and non-workflow systems in an enterprise setting. It is founded on the traits of conditional purposes, roles, responsibilities, and policies. It ensures protection against internal risks such as database administrators. Finally, it provides the necessary protection in the event that the data is published.
ARTICLE | doi:10.20944/preprints202106.0654.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: COVID-19; Mental Health; Depression; Big data; Social media.
Online: 28 June 2021 (13:50:49 CEST)
The novel coronavirus disease (COVID-19) pandemic is having a widespread effect on mental health because of reduced interaction among people, economic collapse, negativity, fear of losing jobs, and the death of near and dear ones. People often use social media as one of their preferred means of expressing their mental state. Due to reduced outdoor activities, people are spending more time on social media than usual and expressing emotions of anxiety, fear, and depression. About 2.5 quintillion bytes of data are generated on social media every day, and analyzing this big data can be an excellent means of evaluating the effect of COVID-19 on mental health. In this work, we analyze data from the Twitter microblog (tweets) to determine the effect of COVID-19 on people's mental health, with a special focus on depression. We propose a novel pipeline, based on a recurrent neural network (in the form of long short-term memory, or LSTM) and a convolutional neural network, capable of identifying depressive tweets with an accuracy of 99.42%. The tweets were preprocessed using various natural language processing techniques, with the aim of detecting depressive emotion in them. In an analysis of over 571 thousand tweets posted between October 2019 and May 2020 by 482 users, a significant rise in depressive tweets was observed between February and May 2020, which points to the impact of the prolonged COVID-19 pandemic.
ARTICLE | doi:10.20944/preprints202103.0623.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: SARS-CoV-2; Big Data; Data Analytics; Predictive Models; Schools
Online: 25 March 2021 (14:35:53 CET)
Background: CoronaVirus Disease 2019 (COVID-19) was the most discussed topic worldwide in 2020, and at the beginning of the Italian epidemic scientists tried to understand the diffusion of the virus and the epidemic curve of positive cases, with controversial findings and numbers. Objectives: In this paper, a data analytics study on the diffusion of COVID-19 in the Lombardy Region and the Campania Region is developed in order to identify the driver that sparked the second wave in Italy. Methods: Starting from all the available official data collected about the diffusion of COVID-19, we analyzed Google mobility data, school data and infection data for two big regions in Italy, the Lombardy Region and the Campania Region, which adopted two different approaches to opening and closing schools. To reinforce our findings, we also extended the analysis to the Emilia Romagna Region. Results: The paper aims at showing how different policies adopted for school opening and closing may have an impact on the spread of COVID-19. Conclusions: The paper shows that a clear correlation exists between contagion in schools and the subsequent overall contagion in a geographical area.
ARTICLE | doi:10.20944/preprints201812.0058.v1
Subject: Engineering, Mechanical Engineering Keywords: big data; parameter estimation; model updating; system identification; sequential Monte Carlo sampler
Online: 4 December 2018 (11:17:24 CET)
In this paper the authors present a method which facilitates computationally efficient parameter estimation of dynamical systems from a continuously growing set of measurement data. It is shown that the proposed method, which utilises Sequential Monte Carlo samplers, is guaranteed to be fully parallelisable (in contrast to Markov chain Monte Carlo methods) and can be applied to a wide variety of scenarios within structural dynamics. Its ability to allow convergence of one's parameter estimates, as more data is analysed, sets it apart from other sequential methods (such as the particle filter).
ARTICLE | doi:10.20944/preprints201810.0273.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: astroparticle physics, cosmic rays, data life cycle management, data curation, meta data, big data, deep learning, open data
Online: 12 October 2018 (14:48:32 CEST)
Modern experimental astroparticle physics features large-scale setups measuring different messengers, namely high-energy particles generated by cosmic accelerators (e.g. supernova remnants, active galactic nuclei, etc.): cosmic and gamma rays, neutrinos and the recently discovered gravitational waves. Ongoing and future experiments are distributed over the Earth, including ground and underground/underwater setups as well as balloon payloads and spacecraft. The data acquired by these experiments have different formats, storage concepts and publication policies. Such differences are a crucial issue in the era of big data and of multi-messenger analysis strategies in astroparticle physics. We propose a service, ASTROPARTICLE.ONLINE, within which we develop an open science system that enables users to publish, store, search, select and analyse astroparticle physics data. The cosmic-ray experiments KASCADE-Grande and TAIGA were chosen as pilot experiments to be included in this framework. In the first step of our initiative we will develop and test the following components of the full data life cycle concept: (i) describing, storing and reusing astroparticle data; (ii) software for performing multi-experiment and multi-messenger analyses, such as deep-learning methods; (iii) outreach, including example applications and tutorials for students and scientists outside the specific research field. In the present paper we describe the concepts of our initiative, and in particular the plans toward a common, federated astroparticle data storage.
SHORT NOTE | doi:10.20944/preprints202211.0056.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: Precision Livestock Farming; Digital Agriculture; Smart Farming; In Ovo Sexing; Big Data; Artificial Intelligence
Online: 2 November 2022 (11:03:44 CET)
Current commercial, pre-commercial, and experimental in ovo techniques for the sex determination of fertilised eggs employ minimally invasive biomolecular assays (extracting fluid via a small laser-drilled window in the eggshell for the detection of genetic or hormonal biomarkers), analysis of volatile compounds emitted from the eggshell, visible imaging, and reflectance or transmission spectroscopic analysis exploiting molecular optical fluorescence, polarisation, and scattering phenomena, including various combinations of these modalities. To date, no endeavour employing NIR- and FTIR-based spectroscopic techniques has resulted in a commercially sustainable solution to the egg sexing problem. Besides achieving only subpar performance in overall accuracy, specificity, and sensitivity, the least invasive of the current state-of-the-art optical methods still requires creating a transmission window (fenestration) of 12–15 mm diameter through to the mammillae layer of the shell, proximal to the external shell membrane, which can affect the incubation or post-hatch development viability of up to 10% of incubated eggs. A multimodal solution combining Raman spectroscopy and hyperspectral imaging has strong prospects of overcoming the hard barriers that stand in the way of a non-invasive, in-line process for highly reliable, rapid-throughput sex determination of eggs within 3 days of incubation. A method for sexing chicken embryos needs to take a multipronged approach to collecting and analyzing spectral data that point to biomarkers, using machine learning approaches to look for nanomolar to picomolar concentrations of these biomarkers in the fluid.
REVIEW | doi:10.20944/preprints202103.0402.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: anesthesia; anesthesiology; big data; registries; database research; acute pain; pain management; postoperative pain; regional anesthesia; regional analgesia.
Online: 15 March 2021 (17:45:39 CET)
The digital transformation of healthcare is advancing, leading to an increasing availability of clinical data for research. Perioperative big data initiatives were established to monitor treatment quality and benchmark outcomes. However, big data analyses have long exceeded the status of pure quality surveillance instruments. Large retrospective studies nowadays often represent the first approach to new questions in clinical research and pave the way for more expensive and resource-intensive prospective trials. As a consequence, the utilization of big data in acute pain and regional anesthesia research has increased considerably over the last decade. Multicentric clinical registries and administrative databases (e.g., healthcare claims databases) have collected millions of cases to date, on the basis of which several important research questions have been approached. In acute pain research, big data was used to assess postoperative pain outcomes, opioid utilization, and the efficiency of multimodal pain management strategies. In regional anesthesia, adverse events and potential benefits of regional anesthesia on postoperative morbidity and mortality were evaluated. This article provides a narrative review of the growing importance of big data for research in acute postoperative pain and regional anesthesia.
ARTICLE | doi:10.20944/preprints201906.0174.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Business excellence; information technology; implementation challenge; ISO 20000; big data management.
Online: 18 June 2019 (10:56:19 CEST)
This study contributes to the literature by exploring challenges to implementing ISO 20000-1 in an emerging economy context, and suggests ways to overcome these challenges. A survey-based methodology was adopted. The data were analyzed using principal component analysis. The results indicated that senior management support was the most significant challenge for the successful implementation of IT Service Management (ITSM) systems. Other significant challenges were the justification of significant investment, premium customer support, co-operation and co-ordination among IT support teams, proper documentation, and effective process design. The findings help managers introduce an IT service management system (ISO 20000-1:2011) and improve the IT service delivery system in IT support organizations for managing big data in an emerging economy. In the future, cross-firm and cross-country studies on challenges to ISO 20000 can be conducted. Also, an interpretive structural model (ISM) can be formulated to examine the interrelationships among the identified challenges to ISO 20000.
ARTICLE | doi:10.20944/preprints202111.0029.v1
Subject: Social Sciences, Decision Sciences Keywords: Real-world fuel consumption rate; machine learning; big data; light-duty vehicle; China
Online: 2 November 2021 (09:40:05 CET)
Private vehicle travel is the most basic mode of transportation, and the effective control of the real-world fuel consumption rate of light-duty vehicles plays a vital role in promoting sustainable economic development as well as achieving a green low-carbon society. Therefore, the factors that influence individual carbon emissions must be elucidated. This study builds five different models to estimate the real-world fuel consumption rate of light-duty vehicles in China. The results reveal that the Light Gradient Boosting Machine (LightGBM) model performs better than the linear regression, Naïve Bayes regression, Neural Network regression, and Decision Tree regression models, with a mean absolute error of 0.911 L/100 km, a mean absolute percentage error of 10.4%, a mean square error of 1.536, and an R squared (R2) of 0.642. This study also assesses a large number of factors, from which the three most important are extracted, namely the reference fuel consumption rate value, engine power and light-duty vehicle brand. Furthermore, a comparative analysis reveals that the vehicle factors with the greatest impact on the real-world fuel consumption rate are vehicle brand, engine power, and engine displacement. Average air pressure, average temperature, and sunshine time are the three most important climate factors.
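The four figures quoted in the abstract above (MAE, MAPE, MSE and R2) follow standard definitions, which the minimal sketch below spells out. The function name and the three sample fuel consumption values are hypothetical and are not taken from the study's data.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (in percent), MSE and R squared for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mse = sum(e * e for e in errors) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "mape": mape, "mse": mse, "r2": r2}


# Illustrative fuel consumption rates in L/100 km (made-up values).
measured = [8.0, 10.0, 12.0]
estimated = [9.0, 10.0, 11.0]
metrics = regression_metrics(measured, estimated)
print(metrics)
```

MAPE is undefined when a measured value is zero, and R2 is undefined when all measured values are equal; a production implementation would need to guard against both cases.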
ARTICLE | doi:10.20944/preprints202106.0187.v3
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: SARS-CoV2; Biomathematics; Benford law; trials; Epidemiology; Fibonacci; data analysis; big data
Online: 11 June 2021 (15:47:44 CEST)
The Benford method can be used to detect manipulation of epidemiological or trial data during the validation of new drugs. We extend here the Benford method after having detected particular properties for the Fibonacci values 1, 2, 3, 5 and 8 of the first decimal of 10 runs of official epidemiological data published in France and Italy (positive cases, intensive care, and deaths) for the periods of March 1 to May 30, 2020 and 2021, each with 91 raw data. This new method – called “BFP” for Benford-Fibonacci-Perez - is positive in all 10 cases (i.e. 910 values) with an average of favorable cases close to 80%, which, in our opinion, would validate the reliability of these basic data.
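As background for the classical Benford baseline only (this is not the authors' BFP procedure), the expected frequency of a leading digit d under Benford's law is log10(1 + 1/d). The sketch below assumes that "first decimal" refers to the leading significant digit; the helper names and the sample counts are hypothetical.

```python
import math

FIBONACCI_DIGITS = (1, 2, 3, 5, 8)


def benford_probability(digit):
    """Expected frequency of a leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


def fibonacci_digit_share(values):
    """Fraction of non-zero values whose leading digit is 1, 2, 3, 5, or 8."""
    # Scientific notation puts the leading significant digit first.
    leading = [int(f"{abs(v):e}"[0]) for v in values if v != 0]
    return sum(d in FIBONACCI_DIGITS for d in leading) / len(leading)


# Share of leading digits expected to fall on 1, 2, 3, 5, or 8 under Benford's law.
expected_share = sum(benford_probability(d) for d in FIBONACCI_DIGITS)

# Hypothetical daily counts (illustrative only, not the French or Italian data).
observed_share = fibonacci_digit_share([112, 25, 310, 47, 58, 860, 9, 13])
```

Under the classical law, these five digits together account for log10(5.4), or roughly 73%, of leading digits, which gives a reference point when reading an observed share.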
Subject: Computer Science And Mathematics, Information Systems Keywords: Academic Analytics; data storage; education and big data; analysis of data; learning analytics
Online: 19 July 2020 (20:37:39 CEST)
Business Intelligence, defined as "the ability to understand the interrelations of the facts that are presented in such a way that it can guide the action towards achieving a desired goal", has been used since 1958 for the transformation of data into information, and of information into knowledge, to be used when making decisions in a business environment. But what would happen if we took the same principles of business intelligence and applied them to the academic environment? The answer would be the creation of Academic Analytics, a term defined as the process of evaluating and analyzing organizational information from university systems for reporting and making decisions, whose characteristics allow it to be used more and more in institutions, since the information they accumulate about their students and teachers gathers data such as academic performance, student success, persistence, and retention. Academic Analytics enables an analysis of data that is very important for making decisions in the educational institutional environment, aggregating valuable information in the academic research activity and providing easy-to-use business intelligence tools. This article presents a proposal for creating an information system based on Academic Analytics, using ASP.Net technology and relying on the Microsoft SQL Server database engine for storage, and designs a model supported by Academic Analytics for the collection and analysis of data from the information systems of educational institutions. The proposed system is capable of displaying statistics on the historical data of students and teachers taken over academic periods, without having direct access to institutional databases, with the purpose of gathering the information that the director, the teacher, and finally the student need for making decisions. 
The model was validated with information taken from students and teachers during the last five years, and the data were exported as PDF, CSV, and XLS files. The findings allow us to state that it is extremely important to analyze the data held in the information systems of educational institutions for making decisions. After the validation of the model, it was established that students must know the reports of their academic performance in order to carry out a process of self-evaluation, that teachers must be able to see the results of the data obtained in order to carry out self-evaluation and adapt content and dynamics in the classrooms, and that the head of the program needs them to make decisions.
ARTICLE | doi:10.20944/preprints201810.0469.v1
Subject: Engineering, Control And Systems Engineering Keywords: energy efficiency; big data analytics; QoS-IoT; internet of things; smart city; WSN; green computing
Online: 22 October 2018 (05:27:42 CEST)
Various heterogeneous devices or objects shall be integrated for transparent and seamless communication under the umbrella of the internet of things (IoT). This would facilitate open access to data for the growth of a glut of digital services. Building a general framework for IoT is a very complex task because of the heterogeneity of devices, technologies, platforms, and services operating in the same system. In this paper, we mainly focus on the framework for big data analytics in smart city applications, which, being a broad category, specifies the different domains for each application. IoT is intended to support the vision of the Smart City, where advanced communication technologies will be used to improve the quality of life of citizens. The novel approach used in this paper enhances energy conservation and reduces the delay in big data gathering at the tiny sensor nodes used in the IoT framework. To implement the smart city scenario in terms of big data in IoT, an efficient (optimized in quality of service) WSN is required in which communication between nodes is energy efficient. Therefore, a new protocol, QoS-IoT, is proposed on the top layer of the architecture and is validated against traditional protocols.
ARTICLE | doi:10.20944/preprints201705.0116.v1
Subject: Engineering, Mechanical Engineering Keywords: thermal runaway; big-data platform; battery systems; electric vehicles; National Service and Management Center for Electric Vehicles
Online: 16 May 2017 (03:18:57 CEST)
This paper presents a thermal runaway prognosis scheme based on a big-data platform and the entropy method for battery systems in electric vehicles. It can simultaneously realize the diagnosis and prognosis of thermal runaway caused by a temperature fault by monitoring battery temperature during vehicular operations. A vast quantity of real-time voltage monitoring data was collected in the National Service and Management Center for Electric Vehicles (NSMC-EV) in Beijing to verify the effectiveness of the presented method. The results show that the proposed method can accurately forecast both the time and location of the temperature fault within battery packs. Furthermore, a temperature security management strategy for thermal runaway is proposed on the basis of the Z-score approach, and an abnormity coefficient is set to provide real-time warning of temperature abnormalities.
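The Z-score idea mentioned in this abstract can be illustrated with a minimal sketch: each reading is standardized against the mean and standard deviation of all readings, and readings beyond a threshold are flagged. The function name, the `threshold` parameter (standing in here for the abstract's abnormity coefficient), and the probe values are assumptions for illustration, not the authors' implementation.

```python
from statistics import fmean, pstdev


def flag_abnormal_readings(temperatures, threshold=3.0):
    """Return indices of readings whose absolute z-score exceeds the threshold."""
    mean = fmean(temperatures)
    spread = pstdev(temperatures)  # population standard deviation
    if spread == 0:
        return []  # all readings identical, nothing stands out
    return [
        index
        for index, temp in enumerate(temperatures)
        if abs(temp - mean) / spread > threshold
    ]


# Hypothetical probe temperatures in degrees Celsius; the last probe runs hot.
probe_temps = [25.1, 25.3, 24.9, 25.0, 25.2, 25.1, 24.8, 25.0, 25.2, 38.0]
abnormal = flag_abnormal_readings(probe_temps, threshold=2.5)
```

With only ten probes a single outlier inflates the standard deviation, so its z-score stays just under 3; this is why the example passes a lower threshold, and why the choice of coefficient matters in practice.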
REVIEW | doi:10.20944/preprints202004.0383.v1
Subject: Medicine And Pharmacology, Other Keywords: COVID-19; coronavirus pandemic; big data; epidemic outbreak; artificial intelligence (AI); deep learning
Online: 21 April 2020 (09:01:45 CEST)
The very first infected novel coronavirus case (COVID-19) was found in Hubei, China in Dec. 2019. The COVID-19 pandemic has spread over 215 countries and areas in the world, and has significantly affected every aspect of our daily lives. At the time of writing this article, the numbers of infected cases and deaths still increase significantly and have no sign of a well-controlled situation, e.g., as of 14 April 2020, a cumulative total of 1,853,265 (118,854) infected (dead) COVID-19 cases were reported in the world. Motivated by recent advances and applications of artificial intelligence (AI) and big data in various areas, this paper aims at emphasizing their importance in responding to the COVID-19 outbreak and preventing the severe effects of the COVID-19 pandemic. We firstly present an overview of AI and big data, then identify their applications in fighting against COVID-19, next highlight challenges and issues associated with state-of-the-art solutions, and finally come up with recommendations for the communications to effectively control the COVID-19 situation. It is expected that this paper provides researchers and communities with new insights into the ways AI and big data improve the COVID-19 situation, and drives further studies in stopping the COVID-19 outbreak.
ARTICLE | doi:10.20944/preprints202002.0143.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: ocean; big-data; cite-space; co-authorship analysis; co-citation analysis; keywords co-occurrence analysis; visualization
Online: 11 February 2020 (09:41:17 CET)
Ocean big data is the scientific practice of using big data technology in the marine field. Data from satellites, manned spacecraft, space stations, airships, unmanned aerial vehicles, shore-based radar and observation stations, exploration platforms, buoys, underwater gliders, submersibles, and submarine observation networks are seamlessly combined into the ocean’s big data. Increasing numbers of scholars have tried to fully analyze the ocean’s big data. To explore the key research technology knowledge graphs related to ocean big data, articles published between 1990 and 2020 were collected from the “Web of Science”. By comparing bibliometric software and using the visualization software Cite-Space, the pivotal literature related to ocean big data, as well as countries, institutions, categories, and keywords, were visualized and recognized. Journal co-citation analysis networks can help determine the national distribution of core journals. Co-citation analysis networks for documents show authors who are influential at key technical levels. Keyword co-occurrence analysis networks can determine research hot spots and research frontiers. The three supporting elements of marine big data research, namely author, institution, and country, are shown in the co-citation network. By examining the co-occurrence of keywords, the key technology research directions for future marine big data were determined.
ARTICLE | doi:10.20944/preprints201905.0263.v1
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: natural gas; gas compressibility factor; group method of data handling (GMDH); big data; equation of state; correlation
Online: 22 May 2019 (08:29:32 CEST)
Natural gas is increasingly being sought after as a vital source of energy, given that its production is very cheap and does not cause the same environmental harms that other resources, such as coal combustion, do. Understanding and characterizing the behavior of natural gas is essential in hydrocarbon reservoir engineering, natural gas transport, and processing. The natural gas compressibility factor, as a critical parameter, defines the compression and expansion characteristics of natural gas under different conditions. In this study, a simple second-order polynomial model based on the group method of data handling (GMDH) is presented to determine the compressibility factor of different natural gases at different conditions, using corresponding state principles. The accuracy of the model is evaluated through graphical and statistical analyses. The results show that the model is capable of predicting natural gas compressibility with an average absolute error of only 2.88%, a root mean square of 0.03, and a regression coefficient of 0.92. The performance of the developed model is compared to widely known, previously published equations of state (EOSs) and correlations, and the precision of the results demonstrates its superiority over all other correlations and EOSs.
ARTICLE | doi:10.20944/preprints202104.0482.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Smart Scenic; environmental disasters management; organization transformation; system design; Big Data; Internet of Things
Online: 19 April 2021 (13:19:35 CEST)
The intensity of natural and man-made disasters is increasing day by day. Disasters are one of the major threats to the sustainable development of tourist attractions. Big Data and the Internet of Things (IoT) can greatly improve disaster management. Based on Big Data and IoT, a disaster management system for tourist attractions is designed and divided into several stages, namely pre-disaster early warning and prevention, disaster mitigation, post-disaster recovery and reconstruction, and updating of disaster planning. Then, the system flow is analysed and the system structure is constructed. Additionally, the system functions and their operation flow are introduced, including disaster warning, disaster relief, disaster assessment, real-time monitoring, and support for disaster planning. Finally, an application case is introduced. The research is intended to improve disaster management in tourism areas.
ARTICLE | doi:10.20944/preprints202211.0034.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Blockchain; Smart Contract; Point Cloud; Security; Privacy Preservation; Software-Defined Network (SDN); Big Data; Assurance; Resilience.
Online: 2 November 2022 (02:18:50 CET)
The rapid development of three-dimensional (3D) acquisition technology based on 3D sensors provides a large volume of data, which is often represented in the form of point clouds. Point cloud representation can preserve the original geometric information along with associated attributes in a 3D space. Therefore, it has been widely adopted in many scene-understanding-related applications such as virtual reality (VR) and autonomous driving. However, the massive amount of point cloud data aggregated from distributed 3D sensors also poses challenges for secure data collection, management, storage, and sharing. Thanks to its decentralized and secure nature, blockchain has great potential to improve point cloud services and enhance security and privacy preservation. Inspired by the rationales behind Software Defined Network (SDN) technology, this paper envisions SAUSA, a blockchain-based authentication network that is capable of recording, tracking, and auditing the access, usage, and storage of 3D point cloud data sets throughout their life-cycle in a decentralized manner. SAUSA adopts an SDN-enabled point cloud service architecture which allows for efficient data processing and delivery to satisfy diverse Quality-of-Service (QoS) requirements. A blockchain-based authentication framework is proposed to ensure security and privacy preservation in point cloud data acquisition, storage, and analytics. By leveraging smart contracts to digitize access control policies and point cloud data on the blockchain, data owners have full control of their 3D sensors and point clouds. In addition, anyone can verify the authenticity and integrity of point clouds in use without relying on a third party. Moreover, SAUSA integrates a decentralized storage platform to store encrypted point clouds while recording references of raw data on the distributed ledger. 
Such a hybrid on-chain and off-chain storage strategy not only improves robustness and availability but also ensures privacy preservation for sensitive information in point cloud applications. A proof-of-concept prototype is implemented and tested on a physical network. The experimental evaluation validates the feasibility and effectiveness of the proposed SAUSA solution.
ARTICLE | doi:10.20944/preprints202305.0856.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: quality evaluation of school management; compulsory education stage; big data technology; visualization techniques; evaluation models
Online: 11 May 2023 (13:26:38 CEST)
With the spread of compulsory education, school management problems have continued to emerge, and the quality of school management in compulsory education has attracted a great deal of attention in China. However, the application of information technology in this field is not yet detailed or widespread, which makes the whole evaluation process laborious and difficult. Accordingly, we use big data technologies such as Apache Spark, Apache Hive, and SPSS to carry out data cleaning, correlation analysis, dynamic factor analysis, principal component analysis, and visual display on 1760 sample data from 40 primary and secondary schools in Q Province in China, and construct a quality evaluation model of school management in the compulsory education stage, which reduces the 22 management tasks required for previous evaluation to 5, greatly reducing the workload and difficulty of evaluation. The model has improved the efficiency and accuracy of evaluation, and has further promoted the simultaneous development of five-domain education and education equity in the compulsory education stage.
REVIEW | doi:10.20944/preprints202211.0161.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: High Performance Computing (HPC); big data; High Performance Data Analytics (HPDA); convergence; data locality; spark; Hadoop; design patterns; process mapping; in-situ data analysis
Online: 9 November 2022 (01:38:34 CET)
Big data has revolutionised science and technology leading to the transformation of our societies. High Performance Computing (HPC) provides the necessary computational power for big data analysis using artificial intelligence and methods. Traditionally, HPC and big data focused on different problem domains and grew into two different ecosystems. Efforts have been underway for the last few years to bring the best of both paradigms into HPC and big data converged architectures. Designing HPC and big data converged systems is a hard task requiring careful placement of data, analytics, and other computational tasks such that the desired performance is achieved with the least amount of resources. Energy efficiency has become the biggest hurdle in the realisation of HPC, big data, and converged systems capable of delivering exascale and beyond performance. Data locality is a key parameter of High Performance Data Analytics (HPDA) system design, as moving even a byte costs heavily in both time and energy as the size of the system increases. Time and energy performance are the most important factors for users, particularly energy, because it is the major hurdle in high performance system design and because environmental sustainability has increased the focus on green energy systems. Data locality is a broad term that encapsulates different aspects including bringing computations to data, minimizing data movement by efficient exploitation of cache hierarchies, reducing intra- and inter-node communications, locality-aware process and thread mapping, and in-situ and in-transit data analysis. This paper provides an extensive review of cutting-edge work on data locality in HPC, big data, and converged systems. We review the literature on data locality in HPC, big data, and converged environments and discuss challenges, opportunities, and future directions. 
Subsequently, using the knowledge gained from this extensive review, we propose a system architecture for future HPC and big data converged systems. To the best of our knowledge, there is no such review on data locality in converged HPC and big data systems.
ARTICLE | doi:10.20944/preprints202110.0260.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: big data; data acquisition; data visualization; data exchange; dashboard; frequency stability; Grafana lab; Power Quality; GPS reference; frequency measurement.
Online: 18 October 2021 (18:07:43 CEST)
This article proposes a measurement solution designed to monitor instantaneous frequency in power systems. It uses a data acquisition module and a GPS receiver for time stamping. A program in Python takes care of receiving the data, calculating the frequency, and finally transferring the measurement results to a database. The frequency is calculated with two different methods, which are compared in the article. The stored data is visualized using the Grafana platform, thus demonstrating its potential for comparing scientific data. The system as a whole constitutes an efficient low cost solution as a data acquisition system.
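The abstract does not name the two frequency calculation methods it compares. As an illustration of the general task only, the sketch below estimates the frequency of a sampled waveform from rising zero crossings with linear interpolation, a common textbook approach that is an assumption here rather than one of the authors' methods. The sampling rate, test frequency, and function name are hypothetical.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Estimate signal frequency in Hz from rising zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0 <= cur:
            # Linear interpolation gives the fractional sample index of the crossing.
            crossings.append(i - 1 + (-prev) / (cur - prev))
    if len(crossings) < 2:
        raise ValueError("need at least two rising zero crossings")
    periods = len(crossings) - 1
    duration_s = (crossings[-1] - crossings[0]) / sample_rate
    return periods / duration_s


SAMPLE_RATE = 10_000  # Hz, hypothetical acquisition rate
TRUE_FREQ = 50.02     # Hz, hypothetical grid frequency for the test signal
signal = [
    math.sin(2 * math.pi * TRUE_FREQ * n / SAMPLE_RATE + 1.0)
    for n in range(5000)
]
estimate = estimate_frequency(signal, SAMPLE_RATE)
```

Averaging over all full periods in the window reduces the effect of noise on any single crossing, at the cost of time resolution; a real measurement chain would also need filtering, which is omitted here.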
ARTICLE | doi:10.20944/preprints201805.0353.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: big data; big data system; energy; district heating; reinforcement learning
Online: 24 May 2018 (16:05:27 CEST)
This paper presents a study on improving the thermal efficiency of the user equipment room in a district heating system based on reinforcement learning, and suggests a general method of constructing a learning network (DQN) using deep Q learning, a reinforcement learning algorithm that does not specify a model. In addition, we introduce the big data platform system and the integrated heat management system for the energy field, which handle the massive data processing from the IoT sensors installed in a large number of thermal energy control facilities.
ARTICLE | doi:10.20944/preprints201901.0130.v1
Subject: Business, Economics And Management, Business And Management Keywords: internationalisation of SMEs; big data; market-oriented information; relational database; supply chain network; optimized database; trade condition; data visualization
Online: 14 January 2019 (10:04:03 CET)
There have been many discussions on the globalisation of SMEs (Small and Medium Enterprises), but academic progress since the study of born global (BG) ventures has been limited. The internationalisation of SMEs is not easy because they lack resources or capabilities compared to multinational corporations. This study investigated the role of government in assisting the internationalisation of SMEs. In particular, because SMEs lack the ability to acquire market-oriented information, we established a scheme for an efficient information support system for the internationalisation of SMEs. In other words, we proposed an information analysis system built on a relational database constructed for market-oriented information support. KISTI (Korea Institute of Science and Technology Information), which is one of the government-funded research institutes in the Republic of Korea, provided information support to SMEs dealing with hydrazine-related products. This study presents this case as an example of government market-oriented information support in the internationalisation of SMEs. The research on government information support is meaningful in that it suggests a way to support SMEs at a practical level.
ARTICLE | doi:10.20944/preprints202210.0472.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: media; journalism; deep journalism; labor markets; Great Resignation; Quiet Quitting; Millennials; Generation Z; Big Data Analytics; Natural Language Processing (NLP)
Online: 31 October 2022 (08:33:34 CET)
We live in the information age and, ironically, meeting the core function of journalism – i.e., to provide people access to unbiased information – has never been more difficult. This paper explores deep journalism, our data-driven Artificial Intelligence (AI) based journalism approach to study how the LinkedIn media could be useful for journalism. Specifically, we apply our deep journalism approach to LinkedIn to automatically extract and analyse big data to provide the public with information about labour markets, people’s skills and education, and businesses and industries from multi-generational perspectives. The Great Resignation and Quiet Quitting phenomena coupled with rapidly changing generational attitudes are bringing unprecedented and uncertain changes to labour markets and our economies and societies, and hence the need for journalistic investigations into these topics is highly significant. We combine big data and machine learning to create a whole machine learning pipeline and a software tool for journalism that allows discovering parameters for age dynamics in labour markets using LinkedIn data. We collect a total of 57,000 posts from LinkedIn and use it to discover 15 parameters by Latent Dirichlet Allocation algorithm (LDA) and group them into five macro-parameters, namely Generations-Specific Issues, Skills & Qualifications, Employment Sectors, Consumer Industries, and Employment Issues. The journalism approach used in this paper can automatically discover and make objective, cross-sectional, and multi-perspective information available to all. It can bring rigour to journalism by making it easy to generate information using machine learning and can make tools and information available so that anyone can uncover information about matters of public importance. 
This work is novel since none of the earlier works have reported such an approach and tool and leveraged it to use LinkedIn media for journalism and to discover multigenerational perspectives (parameters) for age dynamics in labour markets. The approach could be extended with additional AI tools and other media.
ARTICLE | doi:10.20944/preprints202008.0254.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: feature selection; k-means; silhouette measure; clustering; big data; fault classification; sensor data; time-series data
Online: 11 August 2020 (06:26:43 CEST)
Feature selection is a crucial step to overcome the curse of dimensionality problem in data mining. This work proposes Recursive k-means Silhouette Elimination (RkSE) as a new unsupervised feature selection algorithm to reduce dimensionality in univariate and multivariate time-series datasets. In RkSE, k-means clustering is applied recursively to select the cluster representative features, following a unique application of the silhouette measure for each cluster and a user-defined threshold as the feature selection or elimination criterion. The proposed method is evaluated on multi-sensor readings from a hydraulic test rig in two different fashions: (1) reducing the dimensionality in a multivariate classification problem using various classifiers of different functionalities; (2) classifying univariate data in a sliding window scenario, where RkSE is used as a window compression method to reduce the window dimensionality by selecting the best time points in a sliding window. Moreover, the results are validated using the 10-fold cross validation technique and compared to the results obtained when the classification is performed directly with no feature selection applied. Additionally, a new taxonomy for k-means based feature selection methods is proposed. The experimental results and observations in the two comprehensive experiments demonstrated in this work reveal the capabilities and accuracy of the proposed method.
COMMUNICATION | doi:10.20944/preprints202206.0383.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: Exoskeleton; Twitter; Tweets; Big Data; social media; Data Mining; dataset; Data Science; Natural Language Processing; Information Retrieval
Online: 21 July 2022 (04:06:53 CEST)
The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and diverse use-cases in assisted living, military, healthcare, firefighting, and industry 4.0. The exoskeleton market is projected to increase by multiple times of its current value within the next two years. Therefore, it is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction, towards exoskeletons, for which the availability of Big Data of conversations about exoskeletons is necessary. The Internet of Everything style of today's living, characterized by people spending more time on the internet than ever before, with a specific focus on social media platforms, holds the potential for the development of such a dataset, by the mining of relevant social media conversations. Twitter, one such social media platform, is highly popular amongst all age groups, where the topics found in the conversation paradigms include emerging technologies such as exoskeletons. To address this research challenge, this work makes two scientific contributions to this field. First, it presents an open-access dataset of about 140,000 tweets about exoskeletons that were posted in a 5-year period from May 21, 2017, to May 21, 2022. Second, based on a comprehensive review of the recent works in the fields of Big Data, Natural Language Processing, Information Retrieval, Data Mining, Pattern Recognition, and Artificial Intelligence that may be applied to relevant Twitter data for advancing research, innovation, and discovery in the field of exoskeleton research, a total of 100 Research Questions are presented for researchers to study, analyze, evaluate, ideate, and investigate based on this dataset.
ARTICLE | doi:10.20944/preprints201812.0016.v1
Subject: Social Sciences, Library And Information Sciences Keywords: corpus linguistics; language modeling; big data; language data; databases; monitor corpora; documentary analysis; nuclear power; government regulation; tobacco documents
Online: 3 December 2018 (09:16:14 CET)
With the influence of Big Data culture on qualitative data collection, acquisition, and processing, it is becoming increasingly important that social scientists understand the complexity underlying data collection and the resulting models and analyses. Systematic approaches for creating computationally tractable models need to be employed in order to create representative, specialized reference corpora subsampled from Big Language Data sources. Even more importantly, any such method must be tested and vetted for its reproducibility and consistency in generating a representative model of a particular population in question. This article considers and tests one such method for downsampling digitally-accessible Big Language Data to determine both how to operationalize this form of corpus model creation and whether the method is reproducible. Using the U.S. Nuclear Regulatory Commission's public documentation database as a test source, the sampling method's procedure was evaluated to assess variation in the rate at which documents were deemed fit for inclusion or exclusion from the corpus across four iterations. The findings of this study indicate that such a principled sampling method is viable, underscoring the need for an approach to creating language-based models that accounts for extralinguistic factors and linguistic characteristics of documents.
ARTICLE | doi:10.20944/preprints202008.0053.v1
Subject: Physical Sciences, Atomic And Molecular Physics Keywords: Google Trend; Particulate Matter; National Ambient Air Quality Monitoring Information System; Chronic obstructive pulmonary disease; Big Data
Online: 2 August 2020 (18:29:51 CEST)
Depending on the characteristics of the industrial area, toxicity evaluation of human body, risk assessment and health impact assessment may directly cause cancer due to air pollution. Environmental data were collected from August 2018 to January 31, 2019, covering the average, minimum, and maximum values of the air pollution data. According to global trend data obtained using Big Data, high blood pressure ranks 33rd in the world, and myocardial infarction, among the environmental diseases, is confirmed to be lower than in Korea. Diseases occurring in the Jeolla province industrial complex were identified as representative, considering the characteristics of Korea. Air pollutants are considered to be the causes of allergic diseases in Korea. PM10 was found to be higher than in the control area (28.8804348 ㎍/㎥, 31.7065217 ㎍/㎥, and 32.8532609 ㎍/㎥). The mean concentrations of PM2.5 in the middle and high exposure areas were lower than those of the control areas, but the highest in the intermediate exposure areas was 16.5978261 ㎍/㎥, 16.1086957 ㎍/㎥, and 17.1847826 ㎍/㎥, respectively. The major variables of environmental exposure in Yeosu were confirmed to be correlated with high blood pressure, chronic obstructive pulmonary disease (COPD), bronchitis, cerebrovascular disease, diabetes, thyroid disease, sinus infection, anemia, and pneumonia.
ARTICLE | doi:10.20944/preprints202207.0121.v5
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: The Cyclic Universe; Big Bang and Big Crunch; Cosmology; Gravitational force; Dark Energy; Dark Matter
Online: 6 March 2023 (16:14:30 CET)
The cyclic universe theory is a model of cosmic evolution according to which the universe undergoes endless cycles of expansion and cooling, each beginning with a “big bang” and ending in a “big crunch”. In this paper we propose a unique property of space-time: space can stretch, expand, and shrink. This property causes the size of the Universe to change over time, growing or shrinking. The observed accelerated expansion, which the new theory relates to the stretching of shrunk space, is derived. This theory is based on three underlying notions. First, the big bang is not the beginning of space or time; rather, in the very first fraction of a second there was an infinite pressure of infinite shrunk space in the cosmic singularity. That pressure gave rise to the big bang and caused the rapid growth of space; all other forms of energy are transformed into new matter and radiation, and a new period of expansion and cooling begins. Second, there was a previous phase leading up to it, with multiple cycles of contraction and expansion that repeat indefinitely. Third, the two principal long-range forces are the gravitational force and the pressure of shrunk space. They are the two most fundamental quantities in the universe that govern cosmic evolution, and they may provide the clockwork mechanism that operates our eternal cyclic universe. The universe will not continue to expand forever, and there is no need for dark energy or dark matter. This new model of space-time and its unique properties enables us to describe a sequence of events from the Big Bang to the Big Crunch.
ARTICLE | doi:10.20944/preprints202105.0226.v2
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: energy efficiency; electric drive; electric motor control; frequency converter; Industrial Internet of Things; edge computing; Big Data; Key Performance Indicators; KPI; dashboard
Online: 8 September 2021 (13:15:18 CEST)
The article presents a method of generating Key Performance Indicators related to electric motor energy efficiency on the basis of Big Data gathered and processed in a frequency converter. The authors show that the proposed solution makes it possible to specify the relation between the control mode of an electric drive and the ratio of control quality to energy consumption, both in the start-up phase and in steady operation with various mechanical loads. The tests were carried out on a stand equipped with two electric motors (one driving, the other used to apply the load by adjusting the parameters of the built-in brake). The measurements were made in two load cases, for the motor control modes available in industrially applied frequency converters (scalar V/f, vector Voltage Flux Control without encoder, vector Voltage Flux Control with encoder, vector Current Flux Control and vector Current Flux Control with torque control). During the experiments, the current intensities (active and output), the actual frequency, the IxT utilization factor, the relative torque and the current rotational speed were measured and processed. Based on these data, the level of energy efficiency was determined for the various control modes.
ARTICLE | doi:10.20944/preprints202302.0066.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Smart Tourism; Sustainable Tourism; Natural language Processing (NLP); Big Data Analytics; Deep Learning; Machine Learning; Unsupervised Learning; Bidirectional Encoder Representations from Transformers (BERT); Literature Review; Smart Societies
Online: 3 February 2023 (09:47:55 CET)
Global natural and manmade events are exposing the fragility of the tourism industry and its impact on the global economy. Prior to the COVID-19 pandemic, tourism contributed 10.3% to the global GDP and employed 333 million people but saw a significant decline due to the pandemic. Sustainable and smart tourism requires collaboration from all stakeholders and a comprehensive understanding of global and local issues to drive responsible and innovative growth in the sector. This paper presents an approach for leveraging big data and deep learning to discover holistic, multi-perspective (e.g., local, cultural, national, and international) and objective information on a subject. Specifically, we develop a machine learning pipeline to extract parameters from academic literature and public opinions on Twitter, providing a unique and comprehensive view of the industry from both academic and public perspectives. The academic-view dataset was created from the Scopus database and contains 156,759 research articles from 2000 to 2022, which were modelled to identify 33 distinct parameters in 4 categories: Tourism Types, Planning, Challenges, and Media & Technologies. A Twitter dataset of 485,813 tweets was collected over 18 months from March 2021 to August 2022 to showcase public perception of tourism in Saudi Arabia, which was modelled to reveal 13 parameters categorized into two broader sets: Tourist Attractions and Tourism Services. Discovering system parameters is required to embed autonomous capabilities in systems and for decision-making and problem-solving during system design and operations. The proposed approach improves AI-based information discovery by extending the use of scientific literature, Twitter, and other sources for autonomous, dynamic optimizations of systems, promoting novel research in the tourism sector and contributing to the development of smart and sustainable societies.
The paper also presents a comprehensive knowledge structure and literature review of the tourism sector based on over 250 research articles.
ARTICLE | doi:10.20944/preprints202012.0507.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: HIV; big data; Africa; epidemiology
Online: 21 December 2020 (11:14:08 CET)
Background. Predisposition to HIV+ is influenced by a wide range of correlated economic, environmental, demographic, social, and behavioral factors. While a handful of candidate factors have strong evidence, there is a lack of consensus among the vast array of variables measured in large surveys. Methods. We performed a comprehensive data-driven search for correlates of HIV positivity in >600,000 participants of the Demographic and Health Survey (DHS) across 29 sub-Saharan African countries from 2003 to 2017. We associated a total of 7,251 and 6,288 unique variables with HIV+ in females and males, respectively, in each of the 50 surveys. We performed a meta-analysis within countries to attain 29 country-specific associations. Results. We identified 344 (5.4% of those possible) and 373 (5.1%) associations with HIV+ in males and females, respectively, with robust statistical support. The identified associations are consistent in directionality across countries and sexes. The association sizes among individual correlates and their predictive capability were low to modest, but comparable to established factors. Among the identified associations, being head of household among females was identified in 17 countries with a mean odds ratio (OR) of 2.5 (OR range: 1.1-3.5, R2 = 0.01). Other common associations were identified with marital status, education, age, and ownership of land or livestock. Conclusions. Our continent-wide search has identified under-recognized variables associated with HIV+ that are consistent across the continent and sex. Many of the association sizes are as high as those of established risk factors for HIV+, including male circumcision.
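The within-country meta-analysis step mentioned above can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance pooling of log odds ratios, which is a common choice but is not confirmed by the abstract as the authors' method; the function names and the 2x2 counts are hypothetical and made up for illustration.

```python
import math


def odds_ratio_with_se(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table.

    a: exposed and HIV+, b: exposed and HIV-,
    c: unexposed and HIV+, d: unexposed and HIV-.
    All four counts must be positive.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def pooled_odds_ratio(estimates):
    """Fixed-effect inverse-variance pooling of (log_or, se) pairs.

    Assumption: fixed-effect pooling; a random-effects model would
    also be plausible for survey data and is not shown here.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled_log = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return math.exp(pooled_log), pooled_se


# Two hypothetical surveys from one country (made-up counts).
surveys = [odds_ratio_with_se(20, 80, 10, 90), odds_ratio_with_se(30, 70, 15, 85)]
country_or, country_se = pooled_odds_ratio(surveys)
```

Larger surveys get smaller standard errors and therefore more weight, which is the main design point of inverse-variance pooling.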
ARTICLE | doi:10.20944/preprints201706.0115.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: big data; Hadoop; visualization; model
Online: 26 June 2017 (06:07:51 CEST)
In an era of ever-expanding data and knowledge, we lack a centralized system that maps all faculty members to their research works. This problem has not been addressed in the past, and it makes it challenging for students to connect with the right faculty in their domain. Because there are so many colleges and faculties, this falls into the category of big data problems. In this paper, we present a model that works in a distributed computing environment to tackle big data. The proposed model uses Apache Spark as an execution engine and Apache Hive as the database. The results are visualized with the help of Tableau, which is connected to Apache Hive to achieve distributed computing.
TECHNICAL NOTE | doi:10.20944/preprints202206.0252.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: SAR; InSAR; Sentinel-1; Big data
Online: 17 June 2022 (09:00:40 CEST)
We describe an efficient and cost effective data access mechanism for Sentinel-1 TOPS mode bursts. Our data access mechanism enables burst-based data access and processing, thereby eliminating ESA’s Sentinel-1 SLC data packaging conventions as a bottleneck to large scale processing. Pipeline throughput is now determined by available compute resources and efficiency of the analysis algorithms. For targeted infrastructure monitoring studies, we are able to generate coregistered, geocoded stacks of SLCs for any AOI in the world in a few minutes. In addition, we describe our global scale radar backscatter and interferometric products and associated pipeline design decisions that ensure geolocation consistency across the suite of derived products from Sentinel-1 data. Finally, we discuss the benefits and limitations of working with geocoded SAR SLC data.
BRIEF REPORT | doi:10.20944/preprints202007.0198.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: gravitation; dark matter; redshift; big bang
Online: 9 July 2020 (17:25:47 CEST)
A close inspection of Zwicky's seminal papers on the dynamics of galaxy clusters reveals that the discrepancy discovered between the dynamical mass and the luminous mass of clusters was widely overestimated in 1933 as a consequence of several factors, among which is the excessive value of the Hubble constant $H_0$, then believed to be about seven times higher than today's average estimate. Taking account, in addition, of our present knowledge of classical dark matter inside galaxies, the contradiction can be reduced by a large factor. To explain the rather small remaining discrepancy, of the order of 5, instead of appealing to a hypothetical exotic dark matter, the possibility of an inhomogeneous gravity is suggested. This is consistent with the “cosmic tapestry” found in the eighties by De Lapparent and her co-authors, showing that the cosmos is highly inhomogeneous at large scale. A possible foundation for inhomogeneous gravitation is the universally discredited ancient theory of Fatio de Duillier and Lesage on pushing gravity, possibly revised to avoid the main criticisms which led to its oblivion. This model incidentally opens the window towards a completely non-standard representation of the cosmos and, more basically, calls for fundamental investigation into the origin of the large scale inhomogeneity in the distribution of luminous matter.
ARTICLE | doi:10.20944/preprints201904.0281.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Cluster computing, Big Data, Spark, Hadoop.
Online: 25 April 2019 (11:22:27 CEST)
The article provides detailed information about the cluster computing technologies Hadoop and Apache Spark. An experimental logistic regression task processed with these technologies is considered. Findings comparing the cluster computing performance of Hadoop and Apache Spark are presented and substantiated.
Subject: Physical Sciences, Acoustics Keywords: Big Bounce Model, Closed Universe, Cosmological Curvature, Big Crunch, Cyclic Universe, Heat Engine Model for Universe
Online: 16 February 2021 (13:41:49 CET)
Assuming a geometrically closed universe, we predict a value for the cosmic curvature, , a value within current observational bounds. We also propose a thermodynamic heat engine model for the universe, which bypasses the need for an inflaton field. Our model is based on a Carnot Cycle where we have isothermal expansion, followed by adiabatic expansion, followed by isothermal contraction, followed by adiabatic contraction, bringing us back to our original starting point. For the working substance, we focus specifically on the CMB radiation filling the collective voids in the universe. Using this construct, we identify cosmic inflation as the isothermal expansion phase, which lasts just under, . The collective CMB volume we see today only increases by a factor of 5.65 times during this process, and homogeneity and perturbations in the CMB are explained. The singularity problem is avoided and we have a clear mechanism for the work done by the cosmos in causing expansion, and later contraction. For scaling laws with respect to the density parameters in Friedmann’s equations, we will assume a susceptibility model for space, where, , the smeared cosmic susceptibility, decreases with increasing cosmic scale parameter, . Within this framework, we can predict a maximum Hubble volume with minimum CMB temperature for the voids before contraction begins, as well as a minimum volume with maximum CMB temperature when expansion starts. The thermodynamic heat cycle deviates from efficiency in converting heat energy into mechanical energy (expansion) by a minuscule amount, namely, . The significance of this number is not known.
ARTICLE | doi:10.20944/preprints202107.0024.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: high-density lipoprotein cholesterol; hypertension; blood pressure; low high-density lipoprotein cholesterol; extremely high high-density lipoprotein cholesterol; body mass index; big data
Online: 1 July 2021 (11:53:04 CEST)
Background: Although high-density lipoprotein has cardioprotective effects, the association between serum high-density lipoprotein cholesterol (HDL-C) and hypertension is poorly understood. Objective: We investigated whether low and high concentrations of HDL-C are associated with hypertension using a large healthcare dataset. Methods: In a community-based cross-sectional study of 1,493,152 Japanese people aged 40–74 years who underwent a health checkup, blood pressures and clinical parameters, including nine HDL-C concentrations (20–110 mg/dL or over) were investigated. Results: A crude U-shaped relationship was observed between the nine HDL-C concentrations and blood pressure in males (n = 830,669), while a left-to-right inverted J-shaped relationship was observed in females (n = 662,483). An age-adjusted logistic regression analysis showed J-shaped relationships (left-to-right inversion in females) between HDL-C and odds ratios for hypertension (≥140/90 mmHg), with lower limits of 60–79 mg/dL in males and 90–99 mg/dL in females, which were unchanged after adjusting for smoking, habitual exercise, alcohol consumption, and pharmacotherapy for hypertension, dyslipidemia, and diabetes. However, further adjustment for body mass index and serum triglyceride concentration revealed latent positive linear associations between HDL-C and hypertension, although the association between extremely high HDL-C (≥100 mg/dL) and hypertension was attenuated in non-alcohol drinkers. Conclusion: Both low and extremely high HDL-C concentrations are associated with hypertension. The former association may be dependent on excess fat mass, which is often concomitant with low HDL-C, whereas the latter association may be dependent on frequent alcohol consumption.
ARTICLE | doi:10.20944/preprints202201.0106.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: Cosmology, Cosmogenesis, Relativity, Spacetime, Hypergeometrical Universe Theory, Dark Matter, Dark Energy, L-CDM, Big Bang, Big Pop
Online: 10 January 2022 (12:14:01 CET)
HU is the Hypergeometrical Universe Theory (HU)[1-8], proposed in 2006, where the Universe is a Lightspeed Expanding Hyperspherical Hypersurface and Gravitation is an absolute-velocity-dependent, epoch-dependent force. Here we introduce the Big Pop Cosmogenesis and show our calculations associated with the Equation of State of the Universe. This article is the first in a series of articles[9-22] supporting the paradigm shift.
ARTICLE | doi:10.20944/preprints201810.0560.v1
Subject: Arts And Humanities, Philosophy Keywords: natural philosophy; cosmology; emptiness; vacuum; void; dark energy; space flight; exoplanet; big freeze; big crunch; everyday lifeworld
Online: 24 October 2018 (09:27:57 CEST)
The cosmological relevance of emptiness, that is, space without bodies, is not yet sufficiently appreciated in natural philosophy. This paper addresses two aspects of cosmic emptiness from the perspective of natural philosophy: the distances to the stars in the closer cosmic environment and the expansion of space as a result of the accelerated expansion of the universe. Both aspects will be discussed from both a historical and a systematic perspective. Emptiness can be interpreted as “coming” in a two-fold sense: Whereas in the past knowledge of emptiness, as it were, came to human beings, in the future it is coming insofar as its relevance in the cosmos will increase. The longer and more closely emptiness was studied since the beginning of modernity, the larger became the spaces over which it was found to extend. From a systematic perspective, I will show with regard to the closer cosmic environment that the earth may be separated from the perhaps habitable planets of other stars by an emptiness that is inimical to life and cannot be traversed by humans. This assumption is a result of the discussion of the constraints and possibilities of interstellar space travel as defined by the known natural laws and technical means. With the accelerated expansion of the universe, the distances to other galaxies (outside of the so-called local group) are increasing. According to the current standard model of cosmology and assuming that the acceleration will remain constant, in the distant future this expansion will lead first to a substantial change in the epistemic conditions of cosmological knowledge and finally to the completion of the cosmic emptiness and of its relevance, respectively. Imagining the postulated completely empty last state leads human thought to the very limits of what is conceivable.
ARTICLE | doi:10.20944/preprints202204.0295.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Forecasting; SARIMA; Holt-Winters; Climate; Big Data
Online: 29 April 2022 (08:44:28 CEST)
As the capital, Jakarta plays a critical role in boosting Indonesia’s economic growth and setting the precedent for broader change outside of the city. One crucial avenue of inquiry to better understand, and prepare for, the future of a country so heavily impacted by disastrous weather events is understanding the effects of climate change through data. This study investigates meteorological data collected from 1996 to 2021 and compares the application of the SARIMA and the Holt-Winters methods to predict the future influence of climatic parameters on Jakarta’s weather. The SARIMA method is shown to provide better results than the Holt-Winters models, and both methods performed best when forecasting the humidity data. The forecast results demonstrate the characteristics of the climate in Jakarta, with a dry season ranging from May to October and a wet season ranging from November to April.
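To make the forecasting comparison concrete, the following is a minimal sketch of additive Holt-Winters (triple exponential smoothing) scored by RMSE on a hold-out year. It is not the authors' code: the function names, the smoothing parameters, and the synthetic monthly "humidity" series are assumptions for illustration, and SARIMA itself is not implemented here (a seasonal-naive baseline stands in as the comparison model).

```python
import math


def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps past `series`."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    # Initial level: mean of the first season.
    level = sum(first) / season_len
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(second) - sum(first)) / (season_len ** 2)
    # Initial seasonal offsets: deviation of each first-season point from the level.
    seasonal = [x - level for x in first]
    for t in range(season_len, len(series)):
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (series[t] - level) + (1 - gamma) * s
    n = len(series)
    return [level + (h + 1) * trend + seasonal[(n + h) % season_len]
            for h in range(horizon)]


def seasonal_naive(series, season_len, horizon):
    """Baseline: repeat the last observed season."""
    n = len(series)
    return [series[n - season_len + (h % season_len)] for h in range(horizon)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic monthly series (made up): six years with a 12-month cycle.
series = [75 + 8 * math.cos(2 * math.pi * m / 12) for m in range(72)]
train, test = series[:-12], series[-12:]
hw_error = rmse(test, holt_winters_additive(train, 12, 0.3, 0.1, 0.2, 12))
naive_error = rmse(test, seasonal_naive(train, 12, 12))
```

Comparing models on the same hold-out window with the same error metric is what allows a statement such as "method A gives better results than method B"; with real data, a library implementation of both models would normally be preferred over hand-written code.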
REVIEW | doi:10.20944/preprints202106.0597.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: spacetime; relativistic cosmology; big bang model; inflation
Online: 24 June 2021 (09:33:15 CEST)
In this review article the study of the development of relativistic cosmology and introduction of inflation in it is carried out. We study the properties of standard cosmological model developed in the framework of relativistic cosmology and the geometric structure of spacetime connected coherently with it. We examine the geometric properties of space and spacetime ingrained into the standard model of cosmology. The big bang model of the beginning of the universe is based on the standard model which succumbed to failure in explaining the flatness and the large-scale homogeneity of the universe as demonstrated by observational evidence. These cosmological problems were resolved by introducing a brief acceleratedly expanding phase in the very early universe known as inflation. Cosmic inflation by setting the initial conditions of the standard big bang model resolves these problems of the theory. We discuss how the inflationary paradigm solves these problems by proposing the fast expansion period in the early universe.
Subject: Social Sciences, Geography, Planning And Development Keywords: 'Big Things'; Starchitecture; Agritecture; Parkitecture; Urban Prairies
Online: 5 April 2021 (16:02:43 CEST)
This article analyses three recent shifts in what  called the geography of ‘Big Things’, meaning the contemporary functions and adaptability of modern city centre architecture. We periodise the three styles conventionally into the fashionable ‘Starchitecture’ of the 1990s, the repurposed ‘Agritecture’ of the 2000s and the parodising ‘Parkitecture’ of the 2010s. Starchitecture was the form of new architecture coinciding with the rise of neo-liberalism in its brief era of global urban competitiveness prevalent in the 1990s. After the Great Financial Crash of 2007-8, the market for high-rise emblems of iconic, thrusting skyscrapers and giant downtown and suburban shopping malls waned, and online shopping and working from home destroyed the main rental values of the CBD. In some illustrious cases, ‘Agritecture’ saw office blocks and other CBD accompaniments re-purposed as settings for high-rise urban farming, especially aquaponics and hydroponic horticulture. Now, Covid-19 has further undermined traditional CBD property markets, causing some administrations to decide to bulldoze their ‘deadmalls’ and replace them with urban prairie landscapes, inviting the designation ‘Parkitecture’ for the bucolic results. The paper presents an account of these transitions by reference to questions raised by urban cultural scholars like Jane M. Jacobs and Jean Gottmann, to figure out answers in time and space to the questions their work poses.
ARTICLE | doi:10.20944/preprints202011.0010.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Big Data; Clustering; Distributed system; Machine learning
Online: 2 November 2020 (10:00:29 CET)
In the field of machine learning, cluster analysis has always been a very important technology for determining useful or implicit characteristics in the data. However, the current mainstream cluster analysis algorithms require comprehensive analysis of the overall data to obtain the best parameters for the algorithm. As a result, handling large-scale datasets is difficult. This research proposes a distributed related clustering mechanism for unsupervised learning, which assumes that if adjacent data are similar, a group can be formed by relating them to more data points. Therefore, when processing data, large-scale datasets can be distributed to multiple computers, and the correlation of any two datasets in each computer can be calculated simultaneously. Later, results are processed through aggregation and filtering before being assembled into groups. This method greatly reduces the pre-processing and execution time for the dataset; in practical application, one only needs to focus on how the relevance of the data is designed. In addition, the experimental results show the accuracy, applicability, and ease of use of this method.
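The mechanism described above (partition the data, compute pairwise relations per partition independently, then aggregate into groups) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' algorithm: the Euclidean distance threshold as the "relevance" measure, the union-find aggregation, and all function names are assumptions, the filtering step is omitted, and the partitions are processed in a plain loop that stands in for separate machines.

```python
from itertools import combinations_with_replacement


def related_pairs(block_a, block_b, threshold):
    """Index pairs (i, j), i < j, whose points lie within `threshold`.

    Each block is a list of (index, point) tuples. A call depends only on
    its two blocks, so calls could run on different workers.
    """
    pairs = []
    for i, p in block_a:
        for j, q in block_b:
            if i < j and sum((a - b) ** 2 for a, b in zip(p, q)) <= threshold ** 2:
                pairs.append((i, j))
    return pairs


def cluster(points, threshold, n_blocks):
    """Group points by chaining related pairs found block by block."""
    indexed = list(enumerate(points))
    size = max(1, -(-len(indexed) // n_blocks))  # ceiling division
    blocks = [indexed[k:k + size] for k in range(0, len(indexed), size)]

    parent = list(range(len(points)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Aggregation: merge the groups of every related pair.
    for a, b in combinations_with_replacement(blocks, 2):
        for i, j in related_pairs(a, b, threshold):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up 2-D points: two tight groups and one isolated point.
points = [(0, 0), (0.5, 0), (1.0, 0), (10, 10), (10.4, 10), (50, 50)]
groups = cluster(points, threshold=0.6, n_blocks=3)
```

Because grouping depends only on the set of related pairs, the result does not depend on how the data are partitioned, which is the property that makes this style of clustering easy to distribute.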
Subject: Engineering, Energy And Fuel Technology Keywords: Deep learning; Big data; Machine learning; Biofuels
Online: 30 September 2020 (11:19:52 CEST)
The importance of energy systems and their role in economics and politics is evident to everyone. This issue is important not only for the advanced industrialized countries, which are major energy consumers, but also for oil-rich countries. In addition to the polluting substances that fossil fuels contain, the prospect of their depletion has aggravated the growing concern. Biofuels can be used in different fields of energy production, such as electricity production, power production or transportation. Various scenarios have been written about the estimated contribution of biofuels from different sources to the future energy system. The availability of biofuels for the electricity market, heating and liquid fuels is very important. Accordingly, handling, modelling, decision making and forecasting for biofuels can be one of the main challenges for scientists. Recently, machine learning (ML) and deep learning (DL) techniques have become popular in modeling, optimizing and handling biodiesel production, consumption and its environmental impacts. The main aim of this study is to evaluate the ML and DL techniques developed for handling biofuel production, consumption and environmental impacts, both for modeling and optimization purposes. This will help sustainable biofuel production for future generations.
ARTICLE | doi:10.20944/preprints201806.0175.v2
Subject: Physical Sciences, Particle And Field Physics Keywords: cosmology; big bang; dark energy; neutrinos; gravitation
Online: 28 October 2019 (06:52:16 CET)
The ΛCDM model successfully models the expansion of matter in the universe with an expansion of the underlying metric. However, it does not address the physical origin of the big bang and dark energy. A model of cosmology is proposed, where the state of high energy density of the big bang is created by the collapse of an antineutrino star that has exceeded its Chandrasekhar limit. To allow the first neutrino stars and antineutrino stars to form naturally from an initial quantum vacuum state, it helps to assume that antimatter has negative gravitational mass. While it may prove incorrect, this assumption may also help identify dark energy. The degenerate remnant of an antineutrino star can today have an average mass density that is similar to the dark energy density of the ΛCDM model. When in hydrostatic equilibrium, this antineutrino star remnant can emit isothermal cosmic microwave background radiation and accelerate matter radially. This model and the ΛCDM model are in similar quantitative agreement with supernova distance measurements. Other observational tests of the above model are also discussed.
ARTICLE | doi:10.20944/preprints201901.0277.v1
Subject: Public Health And Healthcare, Nursing Keywords: personality; burnout; engagement; Big Five; healthcare personnel
Online: 28 January 2019 (12:00:59 CET)
The burnout syndrome, which affects so many healthcare workers, has recently awakened wide interest due to the severe repercussions related to its appearance. Even though job factors are determinant to its development, not all individuals exposed to the same work conditions show burnout, which demonstrates the importance of individual variables such as personality. The purpose of this study was to determine personality characteristics of a sample of nursing professionals based on the Big Five model, and then, having determined the personality profiles, analyze the differences in burnout and engagement based on those profiles. The sample was made up of 1236 nurses. An ad hoc questionnaire was prepared to collect the sociodemographic data, and the Brief Burnout Questionnaire, the Utrecht Work Engagement Scale and the Big Five Inventory-10 were used. The results showed that the existence of burnout in this group of workers is associated negatively with extraversion, agreeableness, conscientiousness and openness to experience, and positively with the neuroticism personality trait. These personality factors showed the opposite pattern with regard to engagement. Three different personality profiles were also found in nursing personnel, in which professionals who had a profile marked by strong neuroticism and low scores on the rest of the personality traits were those who were most affected by burnout.
ARTICLE | doi:10.20944/preprints202101.0017.v2
Subject: Physical Sciences, Acoustics Keywords: Oscillating universe; big bang; big bounce; Hubble constant; dark energy; dark matter; inflation; vacuum energy density; Casimir effect
Online: 15 January 2021 (09:47:00 CET)
In cosmology, dark energy and dark matter are included in the ΛCDM model, but their nature is still completely unknown. On the other hand, the trans-Planckian problem leads to implausibly high photon energies for black holes. We introduce a model with quantized black hole matter. This greatly reduces the trans-Planckian problem and leads to a scalar field in the oscillating universe model. We show that the scalar field has the same characteristics as a vacuum energy field and leads to the same Casimir effect. Shortly after the beginning of the big bounce, this field decays locally and leads to the production of dark matter. In this model no inflation theory is needed. We emphasize that this model is mainly a phenomenological approach that aims to give new impetus to the discussion.
ARTICLE | doi:10.20944/preprints202211.0254.v2
Subject: Physical Sciences, Mathematical Physics Keywords: singularity; infinite; Big Bang; universe evolution; scientific theory.
Online: 30 December 2022 (09:54:37 CET)
It is advisable to avoid and, even better, demystify such grandiose terms as "infinity" or "singularity" in the description of the cosmos. Their proliferation does not contribute positively to the understanding of key concepts that are essential for an updated account of its origin and evolutionary history. It will be argued here that, as a matter of fact, there are no infinities in physics, in the real world: all those which appear, in any given formulation of nature by means of mathematical equations, actually arise from extrapolations made beyond the bounds of validity of the equations themselves. This crucial point is rather well known, but too often forgotten, as discussed in this paper with several examples; namely, the famous Big Bang singularity and others, which appeared earlier in classical mechanics and electrodynamics, and notably in the quantization of field theories. A brief description of the Universe’s history and evolution follows. Special emphasis is put on what is presently known, from detailed observations of the cosmos and, complementarily, from advanced experiments of very-high-energy physics. To conclude, a future perspective on how this knowledge might soon improve is given.
ARTICLE | doi:10.20944/preprints202108.0471.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Big data; Health prevention; Machine learning; Medical data
Online: 24 August 2021 (14:00:12 CEST)
Cardiovascular diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. Since effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be effectively and reliably used to discern patients suffering from a CVD from those who do not suffer from any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics (accuracy, precision, recall, F1-score, and ROC-AUC) using the train-test split technique and k-fold cross-validation. Our study identifies the top two and four attributes from each CVD diagnosis/prediction dataset. As our main findings, the ten MLAs exhibited appropriate diagnosis and predictive performance; hence, they can be successfully implemented for improving current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.
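As an illustration of the evaluation protocol named above, the sketch below computes four of the five listed metrics for a binary classifier and generates k-fold train/test index splits. It is a plain-Python sketch under stated assumptions, not the study's pipeline: the function names and example labels are hypothetical, ROC-AUC is omitted because it needs predicted scores rather than hard labels, and the folds are contiguous and unshuffled.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = has CVD)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Made-up labels and predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(10, 3))
```

In k-fold cross-validation each sample is used for testing exactly once, so averaging the per-fold metrics gives a less split-dependent estimate than a single train-test split.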
REVIEW | doi:10.20944/preprints202001.0378.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: workflows; containers; cloud computing; Kubernetes; big data; reproducibility
Online: 31 January 2020 (05:15:01 CET)
Containers are gaining popularity in life science research as they provide a solution for encompassing dependencies of provisioned tools, simplify software installations for end users and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to aid in creating reproducible analyses. In this manuscript we review a number of approaches to using containers as implemented in the workflow tools Nextflow, Galaxy, Pachyderm, Argo, Kubeflow, Luigi and SciPipe, when deployed in cloud environments. A particular focus is placed on the workflow tools’ interaction with the Kubernetes container orchestration framework.
ARTICLE | doi:10.20944/preprints201810.0711.v1
Subject: Biology And Life Sciences, Biophysics Keywords: order; entropy; chaos; evolution; cosmic mind; big bang
Online: 30 October 2018 (07:50:20 CET)
We discuss the role of the opposing principles of order and disorder in physical and biological systems in determining stability, growth and evolution and bring forth the potential role of a cosmic ordering agency. We analyze its role in decreasing entropy by coarse-graining and hence in determining the initial low entropy state of the big bang universe. Since all physical and biological systems have either cycles of order and disorder alternating, or may have chaotic evolution with non-linear laws, the same is expected of the dynamics of the whole universe as well. The entropy of the initial state of the universe could be low because of the reduction of degrees of freedom (DoF) as one moves from physical encoding to neural encoding and then on to psychic encoding of information in a nested manner by coarse-graining. It is by such encoding that this cosmic agency enables the universe to pass through the big crunch phase and then rolls it out as the big bang universe from the initial state of low entropy.
ARTICLE | doi:10.20944/preprints202304.0052.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Bibliometric analysis; Biofilm; Big data; Machine learning; Artificial intelligence
Online: 4 April 2023 (16:07:54 CEST)
Biofilm is a complex community of microorganisms that are attached to surfaces and encased in a self-produced extracellular matrix. Machine learning (ML) techniques have been applied to various aspects of biofilm research, such as predicting biofilm formation, identifying key genes, and designing new therapeutic strategies. In this study, we conducted a bibliometric analysis of machine learning in biofilm research to provide a comprehensive overview of the current state of the field. We searched the Web of Science database for published articles that included "machine learning biofilm". A total of 126 articles were identified and analysed. Our results showed that the number of publications on machine learning in biofilm research has been increasing rapidly over the past decade, indicating a growing interest in the application of ML techniques to biofilm research. The analysis also revealed that the most common research topics in this area were related to biofilm formation, prediction, and control. Furthermore, the most frequently used ML techniques in biofilm research were artificial neural networks and support vector machines. Overall, our study provides valuable insights into the current trends and future directions of machine learning in biofilm research. It also highlights the importance of interdisciplinary collaboration between biofilm researchers and ML experts to drive innovation in this field.
CASE REPORT | doi:10.20944/preprints202110.0006.v1
Subject: Business, Economics And Management, Econometrics And Statistics Keywords: Credit scoring; Credit risk model; Big data; Digital footprints
Online: 1 October 2021 (11:32:58 CEST)
This study is the first to examine whether the performance of credit rating, one of the most important data-based decisions made by banks, can be improved by using banking system log data that are extensively accumulated inside the bank for system operation. This study uses the log data recorded for the mobile app system of Kakaobank, a leading internet bank used by more than 14 million people in Korea. After generating candidate variables from Kakaobank's vast log data, we develop a credit scoring model by utilizing variables with high information values. Consequently, the discriminatory power of the new model improved significantly over the credit bureau grades, by 1.84 percentage points based on the Kolmogorov–Smirnov statistic. Therefore, the results of this study imply that if a bank utilizes the log data it has already extensively accumulated, decision-making systems, including credit scoring, can be efficiently improved at a low cost.
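The Kolmogorov–Smirnov (KS) statistic mentioned in this abstract measures how well a score separates two groups: it is the largest gap between the cumulative score distributions of defaulting ("bad") and non-defaulting ("good") borrowers. The sketch below shows one common way to compute it in plain Python. The function name `ks_statistic`, the 0/1 encoding of the outcome, and the toy inputs are assumptions made for illustration; none of this is taken from the study's own implementation.

```python
from typing import List


def ks_statistic(scores: List[float], is_bad: List[int]) -> float:
    """Two-sample KS statistic between scores of bad (1) and good (0) cases.

    Returns a value in [0, 1]; larger means better separation.
    """
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad observation")

    pairs = sorted(zip(scores, is_bad))
    cum_bad = 0
    cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume all observations tied at the same score before comparing
        # the two empirical cumulative distributions.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks
```

With perfectly separated toy data, `ks_statistic([0.1, 0.2, 0.3, 0.4], [1, 1, 0, 0])` gives 1.0; with interleaved outcomes, `ks_statistic([1, 2, 3, 4], [1, 0, 1, 0])` gives 0.5. An improvement "in percentage points" then refers to the difference between two such values expressed on a 0 to 100 scale.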
ARTICLE | doi:10.20944/preprints202109.0199.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Big Five; Natural Language Processing; Personality Detection; Artificial Intelligence
Online: 13 September 2021 (09:59:25 CEST)
Personality is the most critical feature that tells us about an individual. It is the collection of the individual’s thoughts, opinions, emotions and more. Personality detection is an emerging field of research, and deep learning models for it have only recently started to be developed. There is a need for a larger, unbiased dataset, as the data currently used take the form of questionnaires that individuals answer themselves, which increases the chance of unconscious bias. We used the well-known stream-of-consciousness essays collated by James Pennebaker and Laura King, together with the Big Five model, often known as the five-factor model or OCEAN model. Document-level feature extraction was performed using Google’s word2vec embeddings and Mairesse features. The processed data were fed into a deep convolutional network, and a binary classifier was used to classify the presence or absence of each personality trait. The hold-out method was used to evaluate the model, with the F1 score as the performance metric.
ARTICLE | doi:10.20944/preprints202009.0218.v1
Subject: Physical Sciences, Particle And Field Physics Keywords: mirror matter theory; supersymmetry; big bang; unification theory; spacetime
Online: 10 September 2020 (04:40:06 CEST)
A dynamic view is conjectured for not only the universe but also the underlying theories, in contrast to the conventional pursuit of a single unification theory. As the 4-d spacetime evolves dimension by dimension via the spontaneous symmetry breaking mechanism, supersymmetric mirror models consistently emerge one by one at different energy scales and scenarios involving different sets of particle species and interactions. Starting from random Planck fluctuations, the time dimension and its arrow are born in the time inflation process as the gravitational strength is weakened under a 1-d model of a “timeron” scalar field. The “timeron” decay then starts the hot big bang and generates Majorana fermions and U(1) gauge bosons in 2-d spacetime. The next spontaneous symmetry breaking results in two space inflaton fields, leading to a double space inflation process and the emergence of two decoupled sectors of ordinary and mirror particles. In fully extended 4-d spacetime, the supersymmetric standard model with mirror matter before the electroweak phase transition and the subsequent pseudo-supersymmetric model due to staged quark condensation, as previously proposed, are justified. A set of principles is postulated under this new framework. In particular, a new understanding of the evolving supersymmetry and Z2 or generalized mirror symmetry is presented.
REVIEW | doi:10.20944/preprints202007.0153.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Open-science; big data; fMRI; data sharing; data management
Online: 8 July 2020 (11:53:33 CEST)
Large datasets that enable researchers to perform investigations with unprecedented rigor are growing increasingly common in neuroimaging. Due to the simultaneous increasing popularity of open science, these state-of-the-art datasets are more accessible than ever to researchers around the world. While analysis of these samples has pushed the field forward, they pose a new set of challenges that might cause difficulties for novice users. Here, we offer practical tips for working with large datasets from the end-user’s perspective. We cover all aspects of the data life cycle: from what to consider when downloading and storing the data, to tips on how to become acquainted with a dataset one did not collect, to what to share when communicating results. This manuscript serves as a practical guide one can use when working with large neuroimaging datasets, thus dissolving barriers to scientific discovery.
Subject: Computer Science And Mathematics, Mathematics Keywords: classification; management; big data; computing; statistics; trophic state; zonation
Online: 27 October 2019 (15:56:58 CET)
Limnologists often adhere to a discretized view of waterbodies—they classify them, divide them into zones, promote discrete management targets, and use research tools, experimental designs, and statistical analyses focused on discretization. This approach to limnology has profoundly benefited the way we understand, manage, and communicate about waterbodies. But the research questions and the research tools in limnology are changing rapidly with consequences for the relevance of our current discretization schemes. Here, I examine how and why we discretize and argue that selectively rethinking the extent to which we must discretize gives us an exceptional chance to advance limnology in new ways.
REVIEW | doi:10.20944/preprints201908.0179.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: biofuels; deep learning; big data; machine learning models; biodiesel
Online: 17 August 2019 (03:48:28 CEST)
Biofuels constitute an essential pillar of energy systems. They are considered a popular resource for electricity production, heating, household and industrial usage, liquid fuels, and mobility around the world. Thus, handling, modeling, decision-making, and demand forecasting for biofuels are of utmost importance. Recently, machine learning (ML) and deep learning (DL) techniques have become accessible for modeling, optimizing, and handling biofuel production, consumption, and environmental impacts. The main aim of this study is to review and evaluate ML and DL techniques and their applications in handling biofuel production, consumption, and environmental impacts, for both modeling and optimization purposes. Hybrid and ensemble ML methods, as well as DL methods, have been found to provide higher performance and accuracy in modeling biofuels.
ARTICLE | doi:10.20944/preprints202107.0238.v1
Subject: Physical Sciences, Acoustics Keywords: Negative cosmic time; cosmological solutions; variable deceleration parameter; big rip
Online: 12 July 2021 (09:46:04 CEST)
In this article, we assume that the beginning of the universe was before the Big Bang. In the beginning, all matter in the universe was combined in an infinitesimal spherical shape. This sphere was compressed to an incomprehensible value for a period, and then exploded and expanded time and space. We are referring to the negative time before the Big Bang. The evolution of the universe before the Big Bang, passing through the moment of the explosion to the end of the universe at the Big Rip, has been studied. In this article, we try to answer the following questions: Did the universe exist before the Big Bang? What is the origin of the universe and how did it arise? What are the stages of the evolution of the universe until the moment of the Big Rip? How long does each of these stages last?
ARTICLE | doi:10.20944/preprints202106.0330.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Chemotherapy; Radiotherapy; Cognitive dysfunction; Big data; Cohort studies; Survival analysis
Online: 14 June 2021 (07:51:57 CEST)
Background: We aimed to assess the risk of chemotherapy- and radiotherapy-related cognitive impairment in colorectal cancer patients. Methods: We randomly selected 40% of colorectal cancer patients from the Korean National Health Insurance Database (NHID), 2004-2018 (N=148,848). Patients with one or more ICD-10 diagnostic codes for dementia or mild cognitive impairment were defined as cognitive impairment cases. Patients who were aged 18 or younger, were diagnosed with cognitive impairment before colorectal cancer (N=8,225), or did not receive primary resection (N=45,320) were excluded. The effects of each chemotherapy agent on cognitive impairment were estimated. We additionally estimated the effect of radiotherapy in rectal cancer patients. Time-dependent competing risk Cox regression was conducted to estimate overall and age-specific hazard ratios (HR) separately for colon and rectal cancer. Results: In colon cancer, capecitabine and irinotecan were associated with a higher risk of cognitive impairment, while 5-fluorouracil was not. In rectal cancer, no chemotherapy agents increased the risk of cognitive impairment, nor did radiotherapy. The hazardous association of irinotecan was estimated to be larger in elderly patients than in their younger counterparts. Conclusion: Heterogeneous associations between various chemotherapy agents and cognitive impairment were observed. Elderly patients were more vulnerable to possible adverse cognitive effects. Radiotherapy did not increase the risk of cognitive impairment.
ARTICLE | doi:10.20944/preprints202102.0027.v1
Subject: Computer Science And Mathematics, Other Keywords: Ship Recycling, Predictive Analytics, Big Data, Shipbreaking, Leakage Effect
Online: 1 February 2021 (12:43:52 CET)
Global ship demolition has been concentrated mostly in south Asian countries, namely Bangladesh, India, Pakistan and China, since the 1990s, as these countries have a competitive advantage owing to their high natural tides and low environmental and social costs. Because of the high social and environmental externalities, stakeholders are increasing their monitoring of these externalities and continue to prescribe improvements towards sustainability, which puts pressure on profitability and competitiveness. As a consequence, as seen in the past, a leakage effect may emerge, shifting this activity to regions that are less closely monitored and less strict about social and environmental impacts. Unfortunately, the leakage effect has never been predicted for shipbreaking in order to understand what level of pressure is compatible with the given socio-economic contexts. In this study, we have attempted to predict the future ship demolition landscape by applying a machine learning technique to 34,531 in-service vessels worldwide larger than 500 gross tonnage (GT), which are run against a learning model based on 3500 demolished vessels from 2014. This study shows that redistribution may occur among the top recycling nations: India may emerge as a dominant player in shipbreaking, surpassing Bangladesh by a factor of two, while Pakistan and China show a decreasing trend. In addition, a leakage effect is observed, in that Vietnam is predicted to be the fourth largest ship demolition country, while China and Pakistan recede from third and fourth place to sixth and eighth. Turkey is predicted to advance from fifth to third position by vessel count but stays the same in terms of total GT dismantled. Although it is not clear whether any leakage will be observed in the near future, this study may serve as a model for future predictive analytics and help stakeholders take evidence-based business decisions.
ARTICLE | doi:10.20944/preprints201904.0283.v1
Subject: Social Sciences, Urban Studies And Planning Keywords: Head/tail breaks, natural cities, Zipf’s law, geospatial big data
Online: 25 April 2019 (12:06:45 CEST)
Authorities define cities – or human settlements in general – through imposing top-down rules in terms of whether buildings belong to cities. Emerging geospatial big data makes it possible to define cities from the bottom up, i.e., buildings determine themselves whether they belong to a city based on the notion of natural cities that is defined based on head/tail breaks, a classification and visualization tool for data with a heavy-tailed distribution. In this paper, we used 125 million building locations – all building footprints of America (mainland) or their centroids more precisely – to derive 2.1 million natural cities in the country (http://lifegis.hig.se/uscities/). These natural cities – in contrast to government defined city boundaries – constitute a valuable data source for city-related research.
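Head/tail breaks, the classification rule this abstract relies on, can be sketched in a few lines: split the data at its mean, keep the values above the mean (the "head") as long as they form a minority, and repeat on the head. The version below is a minimal illustration in plain Python, not the authors' implementation. The function name `head_tail_breaks` and the 40% minority threshold (`head_limit`) are assumptions; the threshold is a commonly used convention rather than something stated in the abstract.

```python
from typing import List


def head_tail_breaks(values: List[float], head_limit: float = 0.4) -> List[float]:
    """Return the class break points found by recursive mean splitting.

    At each step the mean of the current values is computed; values above
    the mean form the head. The mean is recorded as a break only while the
    head remains a minority (at most `head_limit` of the current values).
    """
    breaks: List[float] = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        # Stop when nothing lies above the mean or the head is no longer
        # a minority, i.e. the remaining data are not heavy-tailed.
        if not head or len(head) / len(current) > head_limit:
            break
        breaks.append(mean)
        current = head
    return breaks
```

For the heavy-tailed toy input `[1, 2, 4, 8, 16, 32, 64, 128]` the function returns two breaks (the means 31.875 and about 74.67), giving three classes, whereas the evenly spread input `[1, 2, 3, 4]` yields no breaks because half of the values lie above the mean.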
ARTICLE | doi:10.20944/preprints201810.0115.v2
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: radio astronomy; interferometry; square kilometre array; big data; faraday tomography
Online: 21 November 2018 (07:19:33 CET)
The Square Kilometre Array (SKA) will be both the largest radio telescope ever constructed and the largest Big Data project in the known Universe. The first phase of the project will generate on the order of 5 zettabytes of data per year. A critical task for the SKA will be its ability to process data for science, which will need to be conducted by science pipelines. Together with polarization data from the LOFAR Multifrequency Snapshot Sky Survey (MSSS), we have been developing a realistic SKA-like science pipeline that can handle the large data volumes generated by LOFAR at 150 MHz. The pipeline uses task-based parallelism to image, detect sources, and perform Faraday Tomography across the entire LOFAR sky. The project thereby provides a unique opportunity to contribute to the technological development of the SKA telescope, while simultaneously enabling cutting-edge scientific results. In this paper, we provide an update on current efforts to develop a science pipeline that can enable tight constraints on the magnetised large-scale structure of the Universe.
ARTICLE | doi:10.20944/preprints202111.0019.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Industry 4.0; Database; Data models; Big Data & Analytics; Asset Administration Shell
Online: 1 November 2021 (13:01:51 CET)
The data-oriented paradigm has proven to be fundamental for the technological transformation process that characterizes Industry 4.0 (I4.0) so that Big Data & Analytics is considered a technological pillar of this process. The literature reports a series of system architecture proposals that seek to implement the so-called Smart Factory, which is primarily data-driven. Many of these proposals treat data storage solutions as mere entities that support the architecture's functionalities. However, choosing which logical data model to use can significantly affect the performance of the architecture. This work identifies the advantages and disadvantages of relational (SQL) and non-relational (NoSQL) data models for I4.0, taking into account the nature of the data in this process. The characterization of data in the context of I4.0 is based on the five dimensions of Big Data and a standardized format for representing information of assets in the virtual world, the Asset Administration Shell. This work allows identifying appropriate transactional properties and logical data models according to the volume, variety, velocity, veracity, and value of the data. In this way, it is possible to describe the suitability of SQL and NoSQL databases for different scenarios within I4.0.
HYPOTHESIS | doi:10.20944/preprints201808.0127.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: Big Data, Systems Models, Cancer metabolism, Cancer personalized treatment, Drug Discovery.
Online: 6 August 2018 (15:09:15 CEST)
Coordinated sets of extremely numerous digital data on a given social or economic event are processed by Artificial Intelligence tools to obtain reasonably accurate, valuable predictions. The same approach, applied to biomedical issues such as how to choose the right drug to completely cure a given cancer patient, does not reach satisfactory results. It is the “organized biological complexity” that requires a different systems approach, one that integrates, in an Augmented Intelligence strategy, statistical computations of digital data, network construction of “omics” findings, well-designed mathematical models and new experiments in an iterative pathway to reconstruct the “logic” beneath the “organized complexity”, as shown here for Systems Metabolomics of cancer. On this basis, new diagnostic approaches able to identify precision drug treatments, as well as new discovery strategies for more effective anti-cancer drugs, are described.