REVIEW | doi:10.20944/preprints202105.0663.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data, Internet Data Sources (IDS), Internet of Things (IoT), Sustainable Development Goals (SDGs), Big data Technologies, Big data Challenges
Online: 27 May 2021 (10:31:03 CEST)
It is widely believed that technology delivers its full benefits only when all stakeholders can master it. Big data technology is no exception: even a decade after its emergence, it remains in a nascent stage, and applying it is still a herculean task for many. Having identified these gaps in the adoption of big data technology in the contemporary world, the present exploratory work highlights the possible prospects of big data technologies. It also argues how the challenges of various fields can be converted into opportunities through a shift in perspective towards this evolving concept. Initiatives by apex organizations such as the IMF and ITU that apply big data technologies to the Sustainable Development Goals (SDGs) are cited for a broader outlook. The intervention of responsible organizations, along with the respective governments, is also called for to encourage technology adoption across all sections of the market players.
ARTICLE | doi:10.20944/preprints202307.1199.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: smart manufacturing; big data; manufacturing process; big data analytics; decision-making; uncertainty
Online: 18 July 2023 (09:38:31 CEST)
This paper presents a systematic approach to developing big data analytics for manufacturing process-relevant decision-making activities from the perspective of smart manufacturing. The proposed analytics consists of five integrated system components: 1) data preparation system, 2) data exploration system, 3) data visualization system, 4) data analysis system, and 5) knowledge extraction system. The functional requirements of the integrated systems are elucidated. In addition, Java- and spreadsheet-based systems are developed to realize the proposed integrated system components. Finally, the efficacy of the analytics is demonstrated using a case study where the goal is to determine the optimal material removal conditions of a dry electrical discharge machining operation. The analytics identified the variables (among voltage, current, pulse-off time, gas pressure, and rotational speed) that effectively maximize the material removal rate, as well as the variables that do not contribute to the optimization process, and quantified the underlying uncertainty. In summary, the proposed approach results in transparent, big-data-inequality-free, and less resource-dependent data analytics, which is desirable for small and medium enterprises, the actual sites where machining is carried out.
Subject: Computer Science And Mathematics, Computer Science Keywords: big data; data integration; EVMS; construction management
Online: 30 October 2020 (15:35:00 CET)
In today's information age, data are becoming ever more important. While other industries achieve tangible improvements by applying cutting-edge information technology, the construction industry still lags far behind. Cost, schedule, and performance control are the three major functions in the project execution phase. Beyond their individual importance, cost-schedule integration has been a significant challenge in the construction industry for the past five decades. Despite considerable effort, no integration method has found its way into construction practice. The purpose of this study is to propose a new method for integrating cost and schedule data using big data technology. The proposed algorithm is designed to provide data integrity and flexibility in the integration process, considerable time savings in building and changing the database, and practical usability on a construction site. The proposed method is expected to transform, in a data-friendly way, the current situation in which field engineers regard information management as a troublesome task.
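The abstract does not publish the integration algorithm, but the core idea of cost-schedule integration can be sketched as a join of cost and schedule records on a shared activity code. All field names (`activity_code`, `planned_cost`, `pct_complete`) and figures below are assumptions for illustration, not the paper's schema:

```python
# Hypothetical records: the paper does not publish its data model, so
# these field names and values are invented for illustration only.
cost_records = [
    {"activity_code": "A100", "planned_cost": 5000.0, "actual_cost": 4200.0},
    {"activity_code": "A200", "planned_cost": 8000.0, "actual_cost": 9100.0},
]
schedule_records = [
    {"activity_code": "A100", "pct_complete": 0.9},
    {"activity_code": "A200", "pct_complete": 0.5},
]

def integrate(cost_rows, sched_rows):
    """Join cost and schedule rows on a shared activity code and compute
    a basic earned-value figure (EV = planned cost x % complete)."""
    sched = {r["activity_code"]: r for r in sched_rows}
    merged = []
    for c in cost_rows:
        s = sched.get(c["activity_code"])
        if s is None:
            continue  # activity has no schedule entry yet
        ev = c["planned_cost"] * s["pct_complete"]
        merged.append({**c, **s, "earned_value": ev})
    return merged

rows = integrate(cost_records, schedule_records)
for r in rows:
    print(r["activity_code"], r["earned_value"])
```

Keying both data sets on one activity code is what gives the claimed flexibility: a schedule update changes only the schedule rows, and the join recomputes earned value without rebuilding the database.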
ARTICLE | doi:10.20944/preprints202306.1378.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Data Generation; Anomaly Data; User Behavior Generation; Big Data
Online: 19 June 2023 (16:31:37 CEST)
The rising importance of Big Data in modern information analysis rests on vast quantities of user data, yet sufficient data can only be collected in certain data-gathering contexts. In many cases a domain is too novel, too niche, or too sparsely sampled to adequately support Big Data tasks. To remedy this, we have created the ADG Engine, which generates additional data that follows the trends and patterns of the data already collected. Using a database structure that tracks users across different activity types, the ADG Engine can exploit all available information to maximize the authenticity of the generated data. Our efforts are particularly geared towards data analytics: the engine identifies abnormalities in the data and allows the user to generate normal and abnormal data at custom ratios. In situations where it would be impractical or impossible to expand the available dataset by collecting more data, it is still possible to move forward with algorithmically expanded datasets.
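As a rough illustration of generating normal and abnormal records at a custom ratio, consider the sketch below. The event shape and field names are invented; the actual ADG Engine schema is not given in the abstract:

```python
import random

def generate_user_events(n, anomaly_ratio, seed=0):
    """Generate synthetic user-activity records with a chosen share of
    anomalies. The event fields here are illustrative assumptions."""
    rng = random.Random(seed)
    events = []
    for i in range(n):
        if rng.random() < anomaly_ratio:
            # anomalous behaviour: bursty logins at odd hours
            event = {"user": f"u{i % 10}", "hour": rng.choice([2, 3, 4]),
                     "logins": rng.randint(50, 200), "label": "abnormal"}
        else:
            # baseline behaviour: a few logins during working hours
            event = {"user": f"u{i % 10}", "hour": rng.randint(8, 18),
                     "logins": rng.randint(1, 5), "label": "normal"}
        events.append(event)
    return events

data = generate_user_events(1000, anomaly_ratio=0.1)
share = sum(e["label"] == "abnormal" for e in data) / len(data)
```

A real generator would fit its distributions to the collected data per activity type rather than hard-coding them, which is the "authenticity" aspect the abstract emphasizes.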
ARTICLE | doi:10.20944/preprints202009.0747.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Big Data; Business Plan; Budgeting; Budget; Business Strategy.
Online: 30 September 2020 (13:07:58 CEST)
The business planning process can be considered a strategic phase of any business. Given that the business plan is a management accounting tool, countless approaches can be adopted to prepare it, since there is no legal requirement, as opposed to the obligations relating to financial accounting. In general, however, every business plan consists of a numerical part (the budget) and a narrative part. In this research, the author highlights, on the basis of experience and commonly used theories, a standard process that can be adapted to the business plan of any type of activity. The use of big data is highlighted as an essential source feeding the data of almost all steps of the budget. The author then derives a generally applicable standard process, indicating all the data necessary to prepare an accurate and reliable business plan. A case study provides adequate support by demonstrating the immediate applicability of the proposed model.
ARTICLE | doi:10.20944/preprints202304.0644.v1
Subject: Business, Economics And Management, Business And Management Keywords: Big data predictive analytics; big data culture; competitive strategies; Strategic alliance performance; Pakistani Companies
Online: 20 April 2023 (10:07:08 CEST)
This study is based on the notion that big data predictive analytics is important for developing the strategic alliance performance of companies. It investigates the relationships among big data predictive analytics, big data culture, and competitive strategies. Analytical techniques such as descriptive statistics, correlation, and regression were applied using the SPSS and SmartPLS statistical software, and hypotheses were tested with bootstrapped structural equation modeling (SEM) through SmartPLS. The results of the SEM analysis suggested that the hypothesized model was valid and supported all the hypotheses of the study. The empirical analysis leads to the conclusion that big data predictive analytics has a positive and significant relationship with strategic alliance performance.
ARTICLE | doi:10.20944/preprints202205.0334.v1
Subject: Engineering, Control And Systems Engineering Keywords: Backpressure; Big Data; Spark Streaming; Stream Processing
Online: 24 May 2022 (11:47:39 CEST)
In the past decades, a significant rise in the adoption of streaming applications has changed the decision-making process in industry and academia. This movement led to the emergence of a plurality of Big Data technologies, such as Apache Storm, Spark, Heron, Samza, and Flink, that provide in-memory processing for real-time Big Data analysis at high throughput. Spark Streaming is one of the most popular open-source implementations; it handles ever-increasing data ingestion and processing by using the Unified Memory Manager to dynamically manage memory occupancy between the storage and processing regions, which is the focus of this study. The problem with memory management in data-intensive stream processing pipelines is that data arrives faster than the downstream operators can consume it; Spark's backpressure then acts in the opposite direction, throttling the upstream ingestion. When this fails, the incoming data overwhelms the memory manager and provokes memory leaks, which degrade application performance through, e.g., high latency, low throughput, or even data loss. The initial intuition motivating our work is therefore that memory management has become the critical factor for keeping processing at scale and for the system stability of Spark. This work provides a deep dive into Spark backpressure: it evaluates its structure, presents its main characteristics for supporting data-intensive streaming pipelines, and investigates the current in-memory performance issues.
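Spark Streaming's backpressure throttles ingestion with a PID-based rate estimator. The following is a deliberately simplified, proportional-only sketch of that idea; the formula and constants are illustrative, not Spark's actual implementation:

```python
def next_ingestion_rate(processed, batch_seconds, scheduling_delay,
                        min_rate=100.0):
    """Simplified sketch of backpressure rate control in the spirit of
    Spark Streaming's PID rate estimator (proportional term only).
    The real estimator also applies integral/derivative corrections."""
    # how fast the pipeline actually consumed the last batch
    processing_rate = processed / batch_seconds
    # records that piled up while the batch waited to be scheduled
    backlog = scheduling_delay * processing_rate
    # drain the backlog over the next batch interval
    new_rate = processing_rate - backlog / batch_seconds
    return max(min_rate, new_rate)

# No scheduling delay: ingest at the measured processing rate.
print(next_ingestion_rate(800, 1.0, 0.0))   # 800 records/s
# Half a second of delay: cut the rate to work off the backlog.
print(next_ingestion_rate(800, 1.0, 0.5))   # 400 records/s
```

The point of the sketch is the feedback loop the abstract describes: when batches queue up, the next ingestion rate drops toward what the downstream operators can actually sustain, protecting the memory manager.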
ARTICLE | doi:10.20944/preprints201804.0144.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: big data; SIEM; correlation analysis; cyber crime profiling
Online: 11 April 2018 (08:39:02 CEST)
SIEM adoption is increasing in order to detect threat patterns within a short period from large amounts of structured/unstructured data, to diagnose threats precisely, and to provide accurate alarms to administrators by correlating the collected information. However, it is difficult to quickly recognize and handle the various attack situations during security monitoring using a solution equipped with such complicated functions. To overcome this, a new detection-analysis process is required, together with efforts to increase response speed during security monitoring and to expand accurate linkage-analysis technology. In this paper, reflecting these requirements, we design and propose a profiling auto-generation model that can improve the efficiency and speed of attack detection for potential threats.
REVIEW | doi:10.20944/preprints202003.0141.v1
Subject: Medicine And Pharmacology, Other Keywords: data sharing; data management; data science; big data; healthcare
Online: 8 March 2020 (16:46:20 CET)
In recent years, more and more health data are being generated. These data come not only from professional health systems, but also from wearable devices. All these data combined form ‘big data’ that can be utilized to optimize treatments for each unique patient (‘precision medicine’). To achieve this precision medicine, it is necessary that hospitals, academia and industry work together to bridge the ‘valley of death’ of translational medicine. However, hospitals and academia often have problems with sharing their data, even though the patient is actually the owner of his/her own health data, and the sharing of data is associated with an increased citation rate. Academic hospitals usually invest a lot of time in setting up clinical trials and collecting data, and want to be the first to publish papers on these data. The idea that society benefits most if the patient’s data are shared as soon as possible, so that other researchers can work with them, has not yet taken root. There are some publicly available datasets, but these are usually shared only after studies are finished and/or publications have been written based on the data, which means a severe delay of months or even years before others can use the data for analysis. One solution is to incentivize hospitals to share their data with (other) academic institutes and the industry. Here we discuss several aspects of data sharing in the medical domain: publisher requirements, data ownership, support for data sharing, data sharing initiatives, and how the use of federated data might be a solution. We also discuss some potential future developments around data sharing.
ARTICLE | doi:10.20944/preprints201806.0219.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Big data technology; Business intelligence; Data integration; System virtualization.
Online: 13 June 2018 (16:19:48 CEST)
Big Data warehouses are a new class of databases that largely use unstructured and volatile data for analytical purposes. Examples of this kind of data source are those coming from the Web, such as social networks and blogs, or from sensor networks, where huge amounts of data may be available only for short intervals of time. In order to manage massive data sources, a strategy must be adopted for defining multidimensional schemas in the presence of fast-changing situations or even undefined business requirements. In this paper, we propose a design methodology that adopts agile and automatic approaches in order to reduce the time necessary to integrate new data sources and to include new business requirements on the fly. The data are immediately available for analysis, since the underlying architecture is based on a virtual data warehouse that does not require an importing phase. Examples of the application of the methodology are presented throughout the paper to show the validity of this approach compared to a traditional one.
ARTICLE | doi:10.20944/preprints202012.0529.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: e-commerce; big data; bibliometric analysis; knowledge mapping
Online: 21 December 2020 (14:24:06 CET)
The e-commerce platform of the digital economy era has evolved into a data-platform ecosystem built around data resources and data mining technology, and the most typical applications of big data are concentrated in the field of e-commerce. E-commerce companies must first grasp the interactive relationship among the three major factors of data, technology, and innovation; moreover, e-commerce platform operation is a multidisciplinary research field, and it is not easy for researchers to obtain a panoramic view of its knowledge structure. A knowledge graph shows the development process and structural relationships of knowledge, taking a field of knowledge as its object. It is not only a visual knowledge map but also a serialized knowledge pedigree, providing researchers with a quantitative method for studying development trends and academic standing. The purpose of this research is to help researchers understand the key knowledge, evolutionary trends, and research frontiers of current research. Applying CiteSpace bibliometric analysis to data from the Science Net database, this study finds that: 1) the development of the research field has gone through three stages, and representative key scholars and key documents can be identified; 2) co-citation and keyword co-occurrence mapping of the literature reveals the research hotspots; 3) burst detection and central-node analysis reveal the research frontiers and development trends. Today, the visualization of big data brings its own challenges: abstraction between the world and the visualized data occurs the moment the data are captured, and every user sees a visualization generated by standardized calculations. At the same time, many controversies remain regarding the theoretical model, its structure, and its structural dimensions; these are directions that future researchers need to study further.
ARTICLE | doi:10.20944/preprints201811.0074.v1
Subject: Medicine And Pharmacology, Psychiatry And Mental Health Keywords: system dynamics modeling; big data; mental distress; diet
Online: 5 November 2018 (02:34:30 CET)
Dietary factors are among the risk factors that can affect brain chemistry and lead to mental distress. Based on our data mining approach, we found that mental distress in men is associated with eating unhealthy food. Our aim in this paper is to use results from our big data analytics approach to inform system dynamics (SD) modeling and investigate the causal relationships between brain structures, nutrients from food and dietary supplements, and mental health. We perform a descriptive analysis of a large data set to estimate the SD model parameters. Finally, we calibrate the model against time-series data collected from individuals on their dietary and distress patterns. The results reveal that bridging these methodologies yields further insights from the SD model and decreases the error of the calibrated parameter values. Future research is needed to validate our initial results on the relationship between mental distress and dietary intake.
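Calibrating an SD model against a time series amounts to choosing parameter values that minimize the error between simulated and observed trajectories. Below is a toy sketch: the one-stock distress model, its parameters, and the grid search are all invented stand-ins for the paper's actual model and calibration procedure:

```python
def simulate_distress(decay, weeks, d0=10.0, diet_effect=0.5):
    """Toy stock-and-flow model: a distress level that decays each week
    but is replenished by a constant dietary stressor. Illustrative only."""
    level, out = d0, []
    for _ in range(weeks):
        level = level * (1 - decay) + diet_effect
        out.append(level)
    return out

def calibrate(observed, candidates):
    """Pick the decay parameter minimizing the sum of squared errors
    against the observed series (grid search as a simple stand-in for
    formal SD calibration)."""
    best, best_err = None, float("inf")
    for decay in candidates:
        sim = simulate_distress(decay, len(observed))
        err = sum((s - o) ** 2 for s, o in zip(sim, observed))
        if err < best_err:
            best, best_err = decay, err
    return best

observed = simulate_distress(0.3, 12)  # pretend this is field data
best = calibrate(observed, [i / 10 for i in range(1, 10)])
```

Because the "observed" series was produced with decay 0.3, the grid search recovers that value; with real survey data the residual error quantifies how well the model structure fits.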
ARTICLE | doi:10.20944/preprints202309.0148.v1
Subject: Computer Science And Mathematics, Discrete Mathematics And Combinatorics Keywords: Internet of Things; Big Data Ecosystem; Hadoop Ecosystem; Storage Computing
Online: 5 September 2023 (10:33:33 CEST)
Big Data processing tools make it easier to handle the huge amounts of data generated by IoT devices. This paper discusses the Big Data concept and its main V's characteristics. It further describes IoT-enabling technologies, namely cloud computing service models such as SaaS and PaaS, the centralization and infrastructure of Big Data systems, and how cloud computing provides platform access to the data from anywhere. The paper then explores IoT-with-big-data architectural solutions for various use cases across the healthcare and transportation sectors.
REVIEW | doi:10.20944/preprints202203.0407.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: big data analytics; healthcare; data technologies; decision making; information management; EHR
Online: 31 March 2022 (12:24:19 CEST)
Big data analytics is the use of advanced analytic techniques on large and diverse volumes of data, including structured, semi-structured, and unstructured data, from different sources and in sizes ranging from terabytes to zettabytes. The health sector is faced with the need to generate and manage large data sets from various health systems, such as electronic health records and clinical decision support systems. These data can be used by providers, clinicians, and policymakers to plan and implement interventions, detect disease more quickly, predict outcomes, and personalize care delivery. However, little attention has been paid to the connection between big data analytics tools and the health sector. Thus, a systematic bibliometric literature review (LRSB) was developed to study how the adoption of big data analytics tools and infrastructures will revolutionize the healthcare industry. The review integrated 77 scientific and/or academic documents indexed in SCOPUS, presenting up-to-date knowledge on how big data analytics technologies influence the healthcare sector and on the different big data analytical tools used. The LRSB provides findings on the impact of big data analytics on the health sector, introducing opportunities and technologies that provide practical solutions to various challenges.
ARTICLE | doi:10.20944/preprints202005.0274.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: big data; deep learning; intelligent systems; medical imaging; multi-data processing
Online: 16 May 2020 (17:43:42 CEST)
Big Data in medicine involves the fast processing of large data sets, both current and historical, to support the diagnosis and therapy of patients' diseases. Support systems for these activities may include pre-programmed rules based on data obtained from medical interviews, while automatic analysis of diagnostic test results can lead to the classification of observations into a specific disease entity. The current Big Data revolution significantly expands the role of computer science in achieving these goals, which is why we propose a Big Data processing system that uses artificial intelligence to analyze and process medical images.
ARTICLE | doi:10.20944/preprints202309.0560.v1
Subject: Computer Science And Mathematics, Analysis Keywords: Big Data Analytics; Revenue Generation; Customer Relationship Management (CRM)
Online: 8 September 2023 (13:33:43 CEST)
This study explores the potential of data science software solutions such as Customer Relationship Management (CRM) software for increasing the revenue generation of businesses. We focused on businesses in the Accommodation and Food Service sector across the European Union (EU). The investigation is contextualized within the rising trend of data-driven decision-making, examining the potential correlation between data science application and business revenue. Employing a comprehensive evaluation of Eurostat datasets from 2014 to 2021 with both univariate and multivariate analyses, we assessed e-commerce sales data across countries, focusing on the usage of big data and CRM tools. Big data utilization showed a clear, positive relationship with enhanced e-commerce sales. However, CRM tools exhibited a dualistic impact: while their use in marketing showed no significant effect on sales, their application in non-marketing functions had a negative correlation. These findings underscore the potential role of CRM and data science solutions in enhancing business performance in the EU's Accommodation and Food Service industry.
ARTICLE | doi:10.20944/preprints202309.0922.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: distributed photovoltaic; big data; planning; reliability; multi-scenario
Online: 13 September 2023 (16:54:48 CEST)
With a high proportion of distributed photovoltaics and less fossil energy integrated into the distribution network, it is very difficult to ensure the reliability of the power supply. A distributed photovoltaic planning model based on big data is therefore proposed. According to the impact of stochastic photovoltaics and loads on reliability planning, static and dynamic capacity-load ratios are proposed, and a big data analysis model for distributed photovoltaic planning is established. A big data multi-scenario generation and reduction algorithm for stochastic distributed photovoltaic and load planning is studied, and a source-load big data scenario matching model is proposed. According to the source-load big data scenario, the dynamic capacity-load ratio of the distribution network is obtained, and a static capacity-load ratio calculation method for distribution network planning is studied to ensure the reliability of the power supply. Finally, the IEEE 33-bus system is used as an example. The results show that distributed photovoltaic planning based on big data and multi-scenario methods can improve photovoltaic utilization and power supply reliability.
ARTICLE | doi:10.20944/preprints202105.0601.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Mobile RPG; Big Data; Text Mining; Topic Modeling
Online: 25 May 2021 (10:21:36 CEST)
Because RPGs generate high sales and profits, many developers have brought various RPGs to market; however, the genre has shifted towards mass production with sensational advertising, low quality, excessive charging, and near-identical content, which harms both the game market and users' play experience. The author studied ways to improve mobile RPGs by collecting users' reviews from the Google Play Store via crawling and analyzing them. Topic modeling based on text mining techniques and LDA (Latent Dirichlet Allocation) was used to extract meaningful information from the collected big data and visualize it. Inferring users' opinions objectively from their reviews and seeking ways to improve games can help produce mobile RPGs that players keep playing.
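LDA can be sketched compactly with a collapsed Gibbs sampler. The toy review tokens below are invented, and a real study would use the crawled corpus, a larger K, and tuned hyperparameters:

```python
import random

def lda_gibbs(docs, K, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns the top 3 words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * K for _ in docs]        # doc-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # tokens per topic
    z = []                               # topic assignment per token
    for d, doc in enumerate(docs):       # random initialization
        zs = []
        for w in doc:
            t = rng.randrange(K)
            zs.append(t)
            ndk[d][t] += 1; nkw[t][wid[w]] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]              # remove token, resample its topic
                ndk[d][t] -= 1; nkw[t][wid[w]] -= 1; nk[t] -= 1
                probs = [(ndk[d][k] + alpha) * (nkw[k][wid[w]] + beta)
                         / (nk[k] + V * beta) for k in range(K)]
                r = rng.random() * sum(probs)
                for k, p in enumerate(probs):
                    r -= p
                    if r <= 0:
                        t = k
                        break
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][wid[w]] += 1; nk[t] += 1
    return [[vocab[i] for i in
             sorted(range(V), key=lambda i: -nkw[k][i])[:3]]
            for k in range(K)]

# Toy review tokens (invented, not the paper's corpus).
reviews = [
    ["ads", "paywall", "gacha", "ads"],
    ["gacha", "paywall", "ads", "expensive"],
    ["story", "graphics", "fun", "story"],
    ["fun", "graphics", "story", "combat"],
]
topics = lda_gibbs(reviews, K=2)
```

With enough reviews, the top words per topic surface recurring complaint and praise themes, which is the information the study visualizes.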
ARTICLE | doi:10.20944/preprints202305.0722.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Anomaly detection; Malaria data; Machine learning; big data; epidemic
Online: 10 May 2023 (09:34:36 CEST)
Disease surveillance is critical for monitoring ongoing control activities, detecting outbreaks early, and informing intervention priorities and policies. Unfortunately, most disease surveillance data remain under-utilised for supporting decision-making in real time. Using Brazilian Amazon malaria surveillance data as a case study, we explore unsupervised anomaly detection machine learning techniques to analyse the data and discover potential anomalies. We found that our models are able to detect early outbreaks, outbreak peaks, and change points in the proportion of positive malaria cases. Specifically, the sustained rise in malaria in the Brazilian Amazon in 2016 was flagged by several models. We also found that no single model detects all the anomalies across all health regions. The clustering-based local outlier algorithm ranked first, ahead of principal component analysis and stochastic outlier selection, in maximising the number of anomalies detected in local health regions. Because of this, we also provide the minimum number of machine learning models (top-k models) needed to maximise the number of anomalies detected across different health regions. We discovered that the top-3 models that maximise the coverage of the number and types of anomalies detected across the 13 health regions are principal component analysis, stochastic outlier selection, and minimum covariance determinant. Anomaly detection approaches offer interesting solutions for discovering patterns of epidemiological importance in large volumes of data across space and time. Our exploratory approach can be replicated for other diseases and locations to inform timely interventions and actions toward endemic disease control.
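As a deliberately simple stand-in for the paper's unsupervised detectors (clustering-based local outliers, PCA, stochastic outlier selection), the sketch below flags weeks whose positivity deviates strongly from a trailing window; the series and thresholds are synthetic:

```python
from statistics import mean, stdev

def flag_anomalies(series, window=8, z_thresh=2.5):
    """Flag points whose z-score against a trailing window exceeds a
    threshold. A toy detector, not one of the paper's models."""
    flags = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu, sd = mean(hist), stdev(hist)
        z = (series[i] - mu) / sd if sd > 0 else 0.0
        flags.append((i, z > z_thresh))
    return flags

# Synthetic weekly malaria positivity with one outbreak-like spike.
weekly_positivity = [0.10, 0.11, 0.09, 0.10, 0.12, 0.11, 0.10, 0.11,
                     0.10, 0.11, 0.30, 0.12]
anomalous = [i for i, hit in flag_anomalies(weekly_positivity) if hit]
print(anomalous)
```

Even this crude rule catches the spike at index 10; the paper's contribution is comparing more robust detectors and choosing the smallest ensemble that covers all 13 health regions.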
ARTICLE | doi:10.20944/preprints202206.0320.v4
Subject: Biology And Life Sciences, Other Keywords: data; reproducibility; FAIR; data reuse; public data; big data; analysis
Online: 2 November 2022 (02:55:49 CET)
With an increasing amount of biological data available publicly, there is a need for a guide on how to successfully download and use this data. The Ten simple rules for using public biological data are: 1) use public data purposefully in your research, 2) evaluate data for your use case, 3) check data reuse requirements and embargoes, 4) be aware of ethics for data reuse, 5) plan for data storage and compute requirements, 6) know what you are downloading, 7) download programmatically and verify integrity, 8) properly cite data, 9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share, and 10) make pipelines and code FAIR and share. These rules are intended as a guide for researchers wanting to make use of available data and to increase data reuse and reproducibility.
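Rule 7 (download programmatically and verify integrity) can be illustrated with standard checksum verification, assuming the data provider publishes a SHA-256 digest alongside the file; the file here is a temporary stand-in for a downloaded dataset:

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Stream a downloaded file through SHA-256 so large files never
    need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_digest):
    """Compare against the checksum published with the dataset."""
    return sha256sum(path) == expected_digest.lower()

# A temporary file standing in for a downloaded sequence dataset.
p = Path("example_dataset.bin")
p.write_bytes(b"ACGT" * 1000)
digest = sha256sum(p)
assert verify(p, digest)
p.unlink()
```

Verifying the digest after every download catches truncated transfers and silently updated files, both common causes of irreproducible analyses.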
ARTICLE | doi:10.20944/preprints201808.0335.v1
Subject: Business, Economics And Management, Business And Management Keywords: big data; maturity model; temporal analytics; advanced business analytics
Online: 18 August 2018 (11:05:24 CEST)
The main aim of this paper is to explore the issue of big data and to propose a conceptual framework for big data based on the temporal dimension. The Temporal Big Data Maturity Model (TBDMM) is a means for assessing an organization's readiness to fully profit from big data analysis. It allows measuring the current state of the organization's big data assets and analytical tools and planning their future development. The framework explicitly incorporates a time dimension, also providing a complete means for assessing the readiness to process the temporal data and/or knowledge found in modern sources such as big data. The temporality of the proposed framework extends and enhances the existing maturity models for big data. This research is based on a critical analysis of the literature, on creative thinking, and on a case-study approach involving multiple cases. The literature-based research has shown that the existing maturity models for big data do not treat the temporal dimension as fundamental, even though dynamic analytics is crucial for a sustainable competitive advantage. The conceptual framework was well received by the practitioners to whom it was presented during interviews. The participants in the consultations often expressed their need for temporal big data analytics, and hence the temporal approach of the maturity model was widely welcomed.
ARTICLE | doi:10.20944/preprints201810.0253.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: adaptive filtering; set-membership filtering; affine projection; data censoring; big data; outliers
Online: 12 October 2018 (04:57:08 CEST)
In this paper, the set-membership affine projection (SM-AP) algorithm is utilized to censor non-informative data in big data applications. To this end, the probability distribution of the additive noise signal and the steady-state excess mean-squared error (EMSE) are employed to estimate the threshold parameter of the single-threshold SM-AP (ST-SM-AP) algorithm, aiming at attaining the desired update rate. Furthermore, by defining an acceptable range for the error signal, the double-threshold SM-AP (DT-SM-AP) algorithm is proposed to detect very large errors due to irrelevant data such as outliers. The DT-SM-AP algorithm can censor both non-informative and irrelevant data in big data applications, and it can reduce the misalignment and improve the convergence rate of the learning process with high computational efficiency. The simulation and numerical results corroborate the superiority of the proposed algorithms over traditional algorithms.
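The double-threshold censoring idea can be sketched with a set-membership NLMS update, a single-projection analogue of SM-AP; the thresholds below are illustrative, not the estimated values from the paper:

```python
def sm_nlms_update(w, x, d, gamma_low, gamma_high, eps=1e-8):
    """One iteration of a set-membership NLMS filter with censoring:
    |e| <= gamma_low  -> non-informative sample, no update (censored);
    |e| >  gamma_high -> treated as an outlier and discarded;
    otherwise a normalized update shrinks |e| onto the bound gamma_low.
    A simplified analogue of the ST-/DT-SM-AP algorithms."""
    e = d - sum(wi * xi for wi, xi in zip(w, x))
    ae = abs(e)
    if ae <= gamma_low or ae > gamma_high:
        return w, False                      # censored or rejected
    mu = 1.0 - gamma_low / ae                # step size placing e on the bound
    norm = sum(xi * xi for xi in x) + eps    # input energy (regularized)
    w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return w, True
```

Only samples whose error lies between the two thresholds trigger computation, which is exactly the source of the claimed computational efficiency: small errors carry no new information and huge errors are presumed corrupted.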
REVIEW | doi:10.20944/preprints202202.0345.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: big data; machine learning; agriculture; challenges; systematic literature review
Online: 28 February 2022 (03:14:56 CET)
Agricultural Big Data is a set of technologies that allows responding to the challenges of the new data era. In conjunction with machine learning, farmers can use data to address problems such as decision-making, crops, weeds, animal research, land, food availability and security, weather, and climate change. The purpose of this paper is to synthesize the evidence regarding the challenges involved in implementing machine learning in Agricultural Big Data. We conducted a Systematic Literature Review applying the PRISMA protocol. The review includes 30 papers published from 2015 to 2020. We develop a framework that summarizes the main challenges encountered, the machine learning techniques used, and the main technologies employed. A major challenge is the design of Agricultural Big Data architectures, due to the need to adapt the technology stack and the machine learning techniques as the volume of data increases.
REVIEW | doi:10.20944/preprints202205.0325.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: big data; architecture; agriculture; climate change; systematic literature review
Online: 24 May 2022 (07:42:55 CEST)
Climate change is currently one of the main problems agriculture faces in achieving sustainability. It causes drought, increased rainfall, and increased disease, leading to a decrease in food production. To combat these problems, Agricultural Big Data contributes tools that improve the understanding of complex, multivariate, and unpredictable agricultural ecosystems through the collection, storage, processing, and analysis of vast amounts of data from diverse heterogeneous sources. This research discusses the advancement of the technologies used in Agricultural Big Data architectures in the context of climate change. The study highlights the tools used to process, analyze, and visualize the data and discusses the use of the architectures in crop, water, climate, and soil management, especially in the contexts of Resilience Mitigation and Adaptation. The PRISMA protocol guided the study, which found 33 relevant papers. Despite the advances in this line of research, few papers mention the components of the architectures, and there is a lack of standards and of reference architectures that would allow the proper development of Agricultural Big Data in the context of climate change.
ARTICLE | doi:10.20944/preprints202201.0172.v2
Subject: Business, Economics And Management, Business And Management Keywords: blockchain; healthcare supply chain management; logistics cooperation; big data
Online: 19 January 2022 (12:09:00 CET)
This study emphasizes the necessity of introducing a blockchain-based joint logistics system to strengthen the competency of medical supply chain management (SCM) and develops a healthcare supply chain management (HSCM) competency measurement instrument through an analytic hierarchy process. The variables needed for using blockchain-based joint logistics are the performance expectations, effort expectations, facilitating conditions, and social influence of the UTAUT model, and HSCM competency results in increased reliability and transparency, enhanced SCM, and enhanced scalability. Word-cloud analysis of the most important considerations for achieving work efficiency among medical-industry agencies surfaced numerous terms, including sudden situations, delivery, technology trust, information sharing, effectiveness, and urgency. This may imply the need for a system that can respond immediately to emergencies during holidays, and it suggests the importance of real-time information sharing for efficient inventory management. Therefore, a business model is needed that can increase the visibility of real-time medical SCM through big data analysis. By analyzing the importance of securing reliability based on blockchain technology in the establishment of a supply chain network for HSCM competency, we show that joint logistics can be achieved and synergies created by implementing an integrated database. Strengthening partnerships such as joint logistics will ultimately lead to HSCM competency. In particular, HSCM should seek ways to upgrade its competitive capabilities through big data analysis based on the establishment of a joint logistics system.
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: reproductive health; infertility; big data; Machine Learning; AI; Systems Biology
Online: 18 November 2020 (13:51:46 CET)
Advances in machine learning (ML) and artificial intelligence (AI) are transforming the way we treat patients in ways not even imagined a few years ago, with cancer research at the forefront of this movement. Infertility, though not a life-threatening condition, affects around 15% of couples trying for a pregnancy. The increasing availability of large datasets from various sources creates an opportunity to introduce ML and AI into infertility prevention and treatment. At present, very little is done in the field of assisted reproduction to prevent infertility from arising; the main focus is put on treatment, when often advanced maternal age and low ovarian reserve make it very difficult to conceive. A shift from this disease-centric model to a health-centric model of infertility is already taking place, with more emphasis on the patient as an active participant in the process. Poor-quality and incomplete data, as well as biological variability, remain the main limitations to the widespread and reliable implementation of AI in reproductive medicine. That said, one area where this technology has found a foothold is the identification of developmentally competent embryos. More work is required, however, to learn how to improve natural conception, the detection and diagnosis of infertility, and assisted reproduction treatments (ART), and ultimately to develop clinically useful algorithms able to adjust treatment regimens so as to assure a successful outcome of either fertility preservation or infertility treatment. Progress in genomics, digital technologies, and integrative biology has had a tremendous impact on research and clinical medicine. With the rise of ‘big data’, artificial intelligence, and advances in molecular profiling, there is enormous potential to transform not only scientific research but also clinical decision making towards predictive, preventive, and personalized medicine.
In the field of reproductive health, there is now an exciting opportunity to leverage these technologies and develop more sophisticated approaches to diagnose and treat infertility disorders. In this review, we present a comprehensive analysis and interpretation of different innovation forces that are driving the emergence of a system approach to the infertility sector. Here we discuss recent influential work and explore the limitations of the use of Machine Learning models in this rapidly developing area.
ARTICLE | doi:10.20944/preprints202002.0294.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: bitmap indexing; processing in memory; memory wall; Big Data; Internet Of Things
Online: 20 February 2020 (08:24:48 CET)
To live in the information society means to be surrounded by billions of electronic devices full of sensors that constantly acquire data. This enormous amount of data must be processed and classified. A commonly adopted solution is to send these data to server farms to be processed remotely. The drawback is a huge battery drain due to the high amount of information that must be exchanged. To compensate for this problem, data must be processed locally, near the sensor itself, but this solution requires huge computational capabilities. While microprocessors, even mobile ones, nowadays have enough computational power, their performance is severely limited by the Memory Wall problem: memories are too slow, so microprocessors cannot fetch enough data from them, greatly limiting their performance. A solution is the Processing-In-Memory (PIM) approach, in which new memories are designed that can process data internally, eliminating the Memory Wall problem. In this work we present an example of such a system, using the Bitmap Indexing algorithm as a case study. This algorithm is used to classify data coming from many sources in parallel. We propose a hardware accelerator designed around the Processing-In-Memory approach that is capable of implementing this algorithm and can also be reconfigured to perform other tasks or to work as a standard memory. The architecture has been synthesized using CMOS technology. The results we obtained highlight that it is not only possible to process and classify huge amounts of data locally, but also to do so with very low power consumption.
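The classification queries that bitmap indexing enables can be sketched in a few lines of software (a hypothetical pure-Python analogue of what the PIM accelerator computes in hardware; the column names and data are illustrative): each distinct value gets one bitmap with a bit per record, and multi-attribute queries reduce to the bitwise AND/OR operations that an in-memory architecture can execute in place.

```python
def build_bitmap_index(column):
    """One bitmap per distinct value; bit i is set when record i holds it.
    A Python int serves as the bit vector."""
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index

def matching_records(bitmap, n_records):
    """Decode a bitmap back into record positions."""
    return [i for i in range(n_records) if (bitmap >> i) & 1]

# Illustrative sensor readings: two attributes, four records.
temp = ["hot", "cold", "hot", "hot"]
hum = ["dry", "dry", "wet", "dry"]
t_idx = build_bitmap_index(temp)
h_idx = build_bitmap_index(hum)
# Records that are both hot and dry: a single bitwise AND.
hits = matching_records(t_idx["hot"] & h_idx["dry"], 4)
```

The point of the sketch is that the query itself is a word-wide logical operation over the bitmaps, which is exactly the kind of work a PIM memory can perform without moving data to the CPU.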
ARTICLE | doi:10.20944/preprints201808.0350.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: big data; clustering; data mining; educational data mining; e-learning; profile learning
Online: 19 October 2018 (05:58:05 CEST)
Educational data mining is an evolving discipline that focuses on the improvement of self-learning and adaptive methods. It is used for finding hidden patterns or intrinsic structures in educational data. Education involves heterogeneous data that is continuously growing within the big-data paradigm, so specific data mining techniques are needed to extract meaningful information from big educational data adaptively. This paper presents a clustering approach to partition students into different groups, or clusters, based on their learning behavior. Furthermore, a personalized e-learning system architecture is presented which selects and delivers teaching content according to the students' learning capabilities. The primary objective is the discovery of optimal settings in which learners can improve their learning capabilities. Moreover, the administration can find essential hidden patterns to bring effective reforms to the existing system. The clustering methods K-Means, K-Medoids, Density-based Spatial Clustering of Applications with Noise, Agglomerative Hierarchical Cluster Tree, and Clustering by Fast Search and Finding of Density Peaks via Heat Diffusion (CFSFDP-HD) are analyzed on educational data. It is observed that more robust results can be achieved by replacing the existing methods with CFSFDP-HD. The data mining techniques are equally effective for analyzing big data to make education systems vigorous.
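The grouping of students by learning behavior described above can be illustrated with a minimal k-means sketch (pure Python on synthetic two-feature data; the features, the deterministic farthest-point initialisation, and the data are illustrative assumptions, not the authors' setup):

```python
def kmeans(points, k=2, iters=20):
    """Plain k-means over 2-D feature tuples, e.g. (quiz score, time on task).
    Deterministic farthest-point initialisation keeps the sketch reproducible."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    # init: first point, then repeatedly the point farthest from chosen centroids
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: attach each student to the nearest centroid
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: d2(p, centroids[c]))
        # update step: move each centroid to the mean of its cluster
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels
```

Run on two well-separated behavioral profiles, the labels split the students into the two groups the paper's clustering stage would operate on; CFSFDP-HD replaces this assignment/update loop with a density-peak search.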
CONCEPT PAPER | doi:10.20944/preprints202111.0117.v1
Subject: Business, Economics And Management, Business And Management Keywords: Big data predictive analytics; competitive strategies; strategic alliance performance; Telecom sector
Online: 5 November 2021 (11:29:12 CET)
Based on resource-based theory, the current study examines the relationship between competitive strategies and strategic alliance performance. Furthermore, big data predictive analytics is treated as a boundary condition between competitive strategies and strategic alliance performance. Big data predictive analytics in operations and industrial management has been a focal point in the current era, yet little attention has been paid to its influence on competitive strategies and strategic alliance performance, especially in developing countries like Pakistan. A survey instrument was used to record the responses of 331 employees of telecom-sector companies working in Pakistan. The findings show that competitive strategies have a positive and significant relationship with strategic alliance performance, and that big data predictive analytics moderates this relationship. The study adds a new perspective and contribution to the literature on big data predictive analytics, strategic alliance performance, and competitive strategies in Pakistan's telecom-sector companies. Further, the results suggest that big data analytics is like a company's lifeblood in the current era: through its efficient and effective use, companies can raise their standards in a competitive environment.
ARTICLE | doi:10.20944/preprints202301.0415.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Psychological Health; Drugs; Twitter; Machine Learning; Big Data; Drug Abuse; Toxicology; Social Factors; Economic Factors; Environmental Factors
Online: 27 February 2023 (13:31:40 CET)
Mental health issues can have significant impacts on individuals and communities, and hence on social sustainability. Several challenges face mental health treatment; more important, however, is to address the root causes of mental illness, because doing so can help prevent mental health problems from occurring or recurring. This requires a holistic approach to understanding mental health issues, which is missing from existing research. Mental health should be understood in the context of social and environmental factors. More research and awareness are needed, as well as interventions to address root causes. The effectiveness and risks of medications should also be studied. This paper proposes a big data and machine learning-based approach for the automatic discovery of parameters related to mental health from Twitter data. The parameters are discovered from three different perspectives: Drugs & Treatments, Causes & Effects, and Drug Abuse. We used Twitter to gather 1,048,575 tweets in Arabic about psychological health in Saudi Arabia and built a big data machine learning software tool for this work. A total of 52 parameters were discovered across the three perspectives. We defined 6 macro-parameters (Diseases & Disorders, Individual Factors, Social & Economic Factors, Treatment Options, Treatment Limitations, and Drug Abuse) to aggregate related parameters. We provide a comprehensive account of mental health, causes, medicines and treatments, mental health and drug effects, and drug abuse, as seen on Twitter and discussed by the public and health professionals. Moreover, we identify their associations with different drugs. The work opens new directions for social media-based identification of drug use and abuse for mental health, as well as other micro and macro factors related to mental health. The methodology can be extended to other diseases and offers the potential to discover evidence for forensic toxicology from social and digital media.
CONCEPT PAPER | doi:10.20944/preprints202102.0203.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Bigdata; IoT; Big Data Analytics; Covid-19; healthcare
Online: 8 February 2021 (12:19:28 CET)
Big Data analytics has come a long way since its inception, and the field is growing day by day. With the large computational capacity of modern computing systems and the advent of the Internet of Things (IoT), this field has revolutionized the way we think about data. It has influenced major domains such as healthcare, automobile, computing, climatology, and space communications; of late, the healthcare sector has been especially influenced by it. This communication deals with the areas of healthcare where big data analytics has been most influential. Starting from the basics of IoT-driven Big Data Analytics (BDA), its applications in the healthcare sector are outlined, accompanied by future expectations. It also presents a comprehensive analysis of recent applications, with special reference to Covid-19 in this sector.
ARTICLE | doi:10.20944/preprints201811.0339.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: eHealth; big data; deep learning; watson; spark; decision support system; prevention pathways
Online: 15 November 2018 (04:14:36 CET)
Data collection and analysis are becoming more and more important in a variety of application domains as novel technologies advance. At the same time, we are experiencing a growing need for human-machine interaction with expert systems, pushing research towards new knowledge representation models and interaction paradigms. In particular, in recent years eHealth - the set of health-care practices supported by electronic processing and remote communications - has called for smart environments and big computational resources. The aim of this paper is to introduce the HOLMeS (Health On-Line Medical Suggestions) framework. The introduced system proposes to change the eHealth paradigm: a trained machine learning algorithm, deployed on a cluster-computing environment, provides medical suggestions via both chat-bot and web-app modules. The chat-bot, based on deep learning approaches, is able to overcome the limitation of biased interaction between users and software, exhibiting human-like behavior. Results demonstrate the effectiveness of the machine learning algorithms, showing an Area Under the ROC Curve (AUC) of 74.65% when first-level features are used to assess the occurrence of different prevention pathways. When disease-specific features are added, HOLMeS shows an AUC of 86.78%, achieving a more specific prevention-pathway evaluation.
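The AUC figures quoted above have a simple probabilistic reading: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based computation (illustrative, not the HOLMeS code; labels and scores are invented):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic:
    the fraction of positive/negative pairs where the positive is scored
    higher, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Example: two positives scored 0.35 and 0.8, two negatives 0.1 and 0.4.
score = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random scoring and 1.0 to perfect separation, which is why the jump from 74.65% to 86.78% with disease-specific features marks a substantially more discriminative model.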
ARTICLE | doi:10.20944/preprints202208.0083.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Ratios; Financial Crisis; Covid-19; Big Data; Accounting Data
Online: 3 August 2022 (10:42:06 CEST)
The effects of the 2008 financial crisis undoubtedly caused problems not only for the banking sector but also for the real economy of developed and developing countries almost all around the globe. Besides, as is widely known, every banking crisis entails a corresponding cost to the economy of each country affected, resulting from the shakeout and restructuring of its financial system. The purpose of this research is to investigate the consequences of the financial crisis and the COVID-19 health crisis, and how these affected the course of the four systemic banks (Eurobank, Alpha Bank, National Bank, Piraeus Bank), through ratio analysis for the period 2015-2020.
ARTICLE | doi:10.20944/preprints201810.0601.v1
Subject: Engineering, Civil Engineering Keywords: support vector machine; travelling time; intelligent transportation system; artificial fish swarm algorithm; big data
Online: 25 October 2018 (10:48:45 CEST)
Freeway travelling time is affected by many factors, including traffic volume, adverse weather, accidents, and traffic control. We employ a multiple-source data-mining method to analyze freeway travelling time. We collected toll data, weather data, traffic accident disposal logs, and other historical data for freeway G5513 in Hunan province, China. Using a Support Vector Machine (SVM), we propose a travelling-time model based on these databases. The new SVM model can capture the nonlinear relationship between travelling time and those factors. To improve the precision of the SVM model, we applied the Artificial Fish Swarm algorithm to optimize the SVM model parameters: the kernel parameter σ, the insensitive loss function parameter ε, and the penalty parameter C. We compared the optimized SVM model with a Back Propagation (BP) neural network and a common SVM model, using the historical data collected from freeway G5513. The results show that the accuracy of the optimized SVM model is 17.27% and 16.44% higher than that of the BP neural network model and the common SVM model, respectively.
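The tuning idea above - choosing kernel and penalty settings that minimise prediction error on held-out data - can be illustrated with a much simpler stand-in: a one-dimensional RBF kernel smoother whose width sigma is chosen by exhaustive search over candidates. The smoother, the toy travel-time data, and the naive search (where the paper uses the Artificial Fish Swarm algorithm) are all illustrative assumptions:

```python
import math

def rbf_predict(train, sigma, x):
    """Nadaraya-Watson kernel smoother: a weighted average of training
    travel times, with RBF weights of width sigma (sigma plays the same
    role here as the SVM kernel parameter)."""
    ws = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi, _ in train]
    return sum(w * yi for w, (_, yi) in zip(ws, train)) / sum(ws)

def tune_sigma(train, val, candidates):
    """Pick the kernel width with the lowest validation RMSE - the same
    objective a swarm-based search would optimize over (sigma, epsilon, C)."""
    def rmse(sigma):
        errs = [(rbf_predict(train, sigma, x) - y) ** 2 for x, y in val]
        return (sum(errs) / len(errs)) ** 0.5
    return min(candidates, key=rmse)
```

On toy data where travel time grows linearly with the input, the search correctly prefers a narrow kernel over one so wide that it averages the whole dataset; the swarm algorithm simply explores this same error surface more efficiently in three dimensions.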
ARTICLE | doi:10.20944/preprints201710.0076.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: big data; machine learning; regularization; data quality; robust learning framework
Online: 17 October 2017 (03:47:41 CEST)
The concept of ‘big data’ has been widely discussed, and its value has been illuminated throughout a variety of domains. To quickly mine potential value and cope with the ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies exist on how to modify machine learning techniques to accommodate big data environments, we provide a comprehensive overview of the history of the evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, based on learning principles, we discuss regularization to enhance generalization. The quality challenges of big data are reduced to the curse of dimensionality, class imbalance, concept drift, and label noise, and the underlying reasons and mainstream methodologies to address these challenges are introduced. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm by aggregating the aforementioned factors. Over the next few decades, we believe that these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that involve both data quality considerations and human interactions.
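The role of regularization in enhancing generalization can be made concrete with the smallest possible example: one-dimensional ridge regression without an intercept, where a penalty weight lam shrinks the fitted coefficient towards zero (a textbook sketch, not taken from the paper):

```python
def ridge_weight(xs, ys, lam):
    """Closed-form 1-D ridge regression (no intercept): minimizes
    sum((y - w*x)^2) + lam * w^2, whose solution is
    w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

With lam = 0 the estimate is the ordinary least-squares weight; as lam grows, the weight shrinks monotonically, trading a little bias for lower variance - the mechanism by which regularization improves generalization on noisy, high-dimensional big data.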
REVIEW | doi:10.20944/preprints201904.0027.v2
Subject: Computer Science And Mathematics, Analysis Keywords: neuroscience; big data; functional Magnetic Resonance (fMRI); pipeline; one platform system
Online: 8 April 2019 (05:46:55 CEST)
In neuroscience research, and specifically in medical imaging analysis, mining more latent medical information from big medical data is significant for finding solutions to diseases. In this review, we focus on neuroimaging data from functional Magnetic Resonance Imaging (fMRI), a non-invasive technique that has become a popular tool in clinical neuroscience and functional cognitive science research. Once fMRI data are acquired, various software packages and programming environments are available, both open source and commercial, and it is very hard to choose the best software with which to analyze the data. Worse, combining more than one software package can make the final results unbalanced and unstable, which is why we want to build a pipeline for data analysis. On the other hand, with the growth of machine learning, Python has become one of the most popular programming languages. It is open source and dynamic, and its communities, libraries, and contributors have grown rapidly in recent years. Through this review, we hope to make neuroimaging data analysis easier, more stable, and more uniform on the basis of a one-platform system.
REVIEW | doi:10.20944/preprints201805.0418.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: big data training and learning; company and business requirements; ethics; impact; decision support; data engineering; open data; smart homes; smart cities; IoT
Online: 29 May 2018 (08:45:52 CEST)
In Data Science we are concerned with the integration of relevant sciences in observed and empirical contexts. This results in the unification of analytical methodologies and of observed and empirical data contexts. Given the dynamic nature of this convergence, we describe the origins and many evolutions of the Data Science theme. The following are covered in this article: the rapidly growing post-graduate university course provision for Data Science; a preliminary study of employability requirements; and how past eminent work in the social sciences and other areas, certainly mathematics, can be of immediate and direct relevance and benefit for innovative methodology, and for facing and addressing the ethical aspects of Big Data analytics relating to data aggregation and scale effects. Also associated with Data Science is how its direct and indirect outcomes and consequences include decision support and policy making, with both qualitative and quantitative outcomes. For such reasons, we note the importance of how Data Science builds collaboratively on other domains, potentially with innovative methodologies and practice. Further sections point towards some of the major current research issues.
ARTICLE | doi:10.20944/preprints202007.0078.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: personalization; decision making; medical data; artificial intelligence; Data-driving; Big Data; Data Mining; Machine Learning
Online: 5 July 2020 (15:04:17 CEST)
The study applied machine learning and data mining methods to personalizing treatment, which makes it possible to investigate individual patient characteristics. Personalization is built on clustering and association rules. It is suggested to determine the average distance between instances in order to find optimal performance metrics. A formalization of the medical-data pre-processing stage for finding personalized solutions based on current standards and pharmaceutical protocols is proposed, and a model of patient data is built. The paper presents a novel clustering approach built on an ensemble of clustering algorithms with better Hopkins metrics than the k-means algorithm. Personalized treatment is usually based on decision trees, but such an approach requires a lot of computation time and cannot be parallelized. Therefore, it is proposed to classify persons by condition and to determine deviations of parameters from the normative parameters of the group, as well as from the average parameters. This makes it possible to create a personalized approach to treatment for each patient based on long-term monitoring. According to the results of the analysis, it becomes possible to predict the optimal conditions for a particular patient and to select medication treatment according to personal characteristics.
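The Hopkins metric mentioned above measures how far a dataset departs from spatial uniformity, and hence how worthwhile clustering is at all. A minimal sketch under one common convention (uniform probes over the data's bounding box, H = sum(u) / (sum(u) + sum(w)), values near 1 for clusterable data and near 0.5 for uniform data; the sample size m and the data are illustrative):

```python
import random

def hopkins(points, m, seed=0):
    """Hopkins statistic sketch: compare nearest-neighbour distances of m
    uniform random probes (u) against those of m sampled data points (w)."""
    rng = random.Random(seed)
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    dims = len(points[0])
    lo = [min(p[i] for p in points) for i in range(dims)]
    hi = [max(p[i] for p in points) for i in range(dims)]
    # uniform probes over the bounding box of the data
    probes = [tuple(rng.uniform(lo[i], hi[i]) for i in range(dims)) for _ in range(m)]
    sample = rng.sample(points, m)
    u = sum(min(dist(p, q) for q in points) for p in probes)
    w = sum(min(dist(p, q) for q in points if q != p) for p in sample)
    return u / (u + w)
```

For two tight, well-separated patient groups the probes land mostly in empty space (large u) while the sampled points have very close neighbours (small w), so H approaches 1, signalling clusterable data.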
ARTICLE | doi:10.20944/preprints201609.0027.v1
Subject: Business, Economics And Management, Business And Management Keywords: customer complaint process improvement; customer complaint service; big data analysis
Online: 7 September 2016 (11:38:33 CEST)
With the advances in industry and commerce, passengers have become more accepting of environmental sustainability issues; thus, more people now choose to travel by bus. Government administration constitutes an important part of bus transportation services, as the government grants the right-of-way to transportation companies, allowing them to provide services. When these services are of poor quality, passengers may lodge complaints. The increase in consumer awareness and developments in wireless communication technologies have made it possible for passengers to easily and immediately submit complaints about transportation companies to government institutions, which has brought drastic changes to the supply-demand chain comprising the public sector, transportation companies, and passengers. This study proposed the use of big data analysis technology, including systematized case assignment and data visualization, to improve management processes in the public sector and optimize customer complaint services. Taichung City, Taiwan was selected as the research area. There, the customer complaint management process in the public sector was improved, effectively solving issues such as station-skipping, allowing the public sector to fully grasp the service level of transportation companies, improving the sustainability of bus operations, and supporting the sustainable development of the public sector, transportation company, and passenger supply chain.
ARTICLE | doi:10.20944/preprints202209.0413.v2
Subject: Engineering, Chemical Engineering Keywords: Consortium Blockchain; Ring signature; Blockchain privacy; Blockchain security; Access Control; Blockchain big data
Online: 25 June 2023 (04:01:48 CEST)
Banking sectors are committing to modern operating frameworks and models that develop smoothly on the basis of decentralization, while money confronts new areas and diverse activities. Privacy has become a major concern and challenge for most consortium-blockchain deployments in banking: development must proceed without being hampered, yet stored data must remain confirmed and confidential. Data privacy includes assuring protection against both insider and outsider threats; ring-signature-based access control (RSBAC) can help secure privacy against both, upholding the CIA triad of Confidentiality, Integrity, and Availability. This paper proposes a ring-signature-based access control mechanism for determining who a user is and then regulating that person's access to and use of a system's resources. In a nutshell, access control restricts who has access to a system and limits system resources to users who have been identified as having the necessary privileges and permissions. The proposed paradigm satisfies the needs of both workflow and non-workflow systems in an enterprise setting. It is founded on the traits of conditional purposes, roles, responsibilities, and policies. It ensures protection against internal risks such as database administrators and, finally, provides the necessary protection in the event that the data is published.
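Stripped of the cryptographic ring-signature layer, the core access-control decision described above can be sketched as a purpose-aware role check: a request succeeds only when the role carries the permission and the stated purpose is allowed for the resource. The policy table, role names, and purposes below are invented for illustration and are not from the paper:

```python
# Hypothetical policy store: per resource, the actions each role may
# perform and the purposes for which access is permitted at all.
POLICIES = {
    "patient_records": {
        "roles": {"clerk": {"read"}, "admin": {"read", "write"}},
        "purposes": {"billing", "audit"},
    },
}

def check_access(user_role, action, resource, purpose):
    """Grant only if role, action, AND purpose all satisfy the policy."""
    policy = POLICIES.get(resource)
    if policy is None:
        return False  # unknown resources are denied by default
    allowed_actions = policy["roles"].get(user_role, set())
    return action in allowed_actions and purpose in policy["purposes"]
```

In the paper's scheme the caller's identity claim would additionally be vouched for by a ring signature, so the verifier learns that some authorized group member signed the request without learning which one.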
ARTICLE | doi:10.20944/preprints202106.0654.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: COVID-19; Mental Health; Depression; Big data; Social media.
Online: 28 June 2021 (13:50:49 CEST)
The novel coronavirus disease (COVID-19) pandemic is having a pervasive effect on mental health because of reduced interaction among people, economic collapse, negativity, fear of losing jobs, and the death of near and dear ones. To express their mental state, people often use social media as one of their preferred means. Due to reduced outdoor activities, people are spending more time on social media than usual and expressing emotions of anxiety, fear, and depression. About 2.5 quintillion bytes of data are generated on social media daily, and analyzing this big data can be an excellent means of evaluating the effect of COVID-19 on mental health. In this work, we analyzed data from the Twitter microblog (tweets) to find out the effect of COVID-19 on people's mental health, with a special focus on depression. We propose a novel pipeline, based on a recurrent neural network (in the form of long short-term memory, or LSTM) and a convolutional neural network, capable of identifying depressive tweets with an accuracy of 99.42%. The tweets were preprocessed using various natural language processing techniques, the aim being to detect depressive emotion. Analyzing over 571 thousand tweets posted between October 2019 and May 2020 by 482 users, a significant rise in depressive tweets was observed between February and May of 2020, indicating an impact of the long ongoing COVID-19 pandemic.
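The preprocessing step mentioned above typically involves cleanup such as the following sketch (a generic English-language example with stdlib regular expressions; it is not the authors' exact pipeline, and Arabic-script tweets would need a different character class than a-z):

```python
import re

def preprocess(tweet):
    """Common tweet cleanup: lowercase, strip URLs and @mentions, keep
    hashtag words but drop the '#', remove non-letter characters, tokenize."""
    t = tweet.lower()
    t = re.sub(r"https?://\S+", " ", t)   # remove URLs
    t = re.sub(r"@\w+", " ", t)           # remove @mentions
    t = t.replace("#", " ")               # keep the hashtag word itself
    t = re.sub(r"[^a-z\s]", " ", t)       # keep letters only (English sketch)
    return t.split()
```

The resulting token lists are what a pipeline like the one described would then embed and feed to the LSTM/CNN classifier.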
ARTICLE | doi:10.20944/preprints202103.0623.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: SARS-CoV-2; Big Data; Data Analytics; Predictive Models; Schools
Online: 25 March 2021 (14:35:53 CET)
Background: CoronaVirus Disease 2019 (COVID-19) was the most discussed topic worldwide in 2020, and at the beginning of the Italian epidemic scientists tried to understand the virus diffusion and the epidemic curve of positive cases, with controversial findings and numbers. Objectives: In this paper, a data analytics study on the diffusion of COVID-19 in the Lombardy and Campania Regions is developed in order to identify the driver that sparked the second wave in Italy. Methods: Starting from all the available official data on the diffusion of COVID-19, we analyzed Google mobility data, school data, and infection data for two large Italian regions, Lombardy and Campania, which adopted two different approaches to opening and closing schools. To reinforce our findings, we also extended the analysis to the Emilia-Romagna Region. Results: The paper shows the impact that different school opening/closing policies may have had on the spread of COVID-19. Conclusions: A clear correlation exists between school contagion and the subsequent overall contagion in a geographical area.
ARTICLE | doi:10.20944/preprints201812.0058.v1
Subject: Engineering, Mechanical Engineering Keywords: big data; parameter estimation; model updating; system identification; sequential Monte Carlo sampler
Online: 4 December 2018 (11:17:24 CET)
In this paper the authors present a method which facilitates computationally efficient parameter estimation of dynamical systems from a continuously growing set of measurement data. It is shown that the proposed method, which utilises Sequential Monte Carlo samplers, is guaranteed to be fully parallelisable (in contrast to Markov chain Monte Carlo methods) and can be applied to a wide variety of scenarios within structural dynamics. Its ability to allow convergence of one's parameter estimates, as more data is analysed, sets it apart from other sequential methods (such as the particle filter).
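A tempered Sequential Monte Carlo sampler of the kind described can be sketched minimally as follows (a generic illustration assuming a flat prior over the explored region, not the authors' exact scheme; note that the per-particle reweighting and move steps are independent across particles, which is what makes SMC parallelisable in contrast to a single Markov chain):

```python
import math
import random

def smc_sampler(log_lik, prior_sample, n=500, steps=10, seed=0):
    """Minimal tempered SMC: particles drawn from the prior are reweighted
    by the likelihood raised to a growing temperature, resampled, and
    jittered with a Metropolis random-walk move."""
    rng = random.Random(seed)
    particles = [prior_sample(rng) for _ in range(n)]
    dt = 1.0 / steps  # temperature increment per stage
    for s in range(1, steps + 1):
        # incremental importance weights: likelihood^dt (parallel per particle)
        w = [math.exp(dt * log_lik(p)) for p in particles]
        # multinomial resampling proportional to the weights
        particles = rng.choices(particles, weights=w, k=n)
        # random-walk Metropolis move targeting the current tempered posterior
        t = s * dt
        moved = []
        for p in particles:
            q = p + rng.gauss(0.0, 0.2)
            if math.log(1.0 - rng.random()) < t * (log_lik(q) - log_lik(p)):
                p = q
            moved.append(p)
        particles = moved
    return particles
```

Applied to a toy parameter-estimation problem (a Gaussian log-likelihood around some measurements), the particle cloud concentrates on the data-supported parameter value, and the estimate tightens as more data enter the log-likelihood - the convergence behaviour the paper highlights.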
ARTICLE | doi:10.20944/preprints201810.0273.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: astroparticle physics, cosmic rays, data life cycle management, data curation, meta data, big data, deep learning, open data
Online: 12 October 2018 (14:48:32 CEST)
Modern experimental astroparticle physics features large-scale setups measuring different messengers, namely high-energy particles generated by cosmic accelerators (e.g. supernova remnants, active galactic nuclei): cosmic and gamma rays, neutrinos, and the recently discovered gravitational waves. Ongoing and future experiments are distributed over the Earth, including ground, underground, and underwater setups as well as balloon payloads and spacecraft. The data acquired by these experiments have different formats, storage concepts, and publication policies. Such differences are a crucial issue in the era of big data and of multi-messenger analysis strategies in astroparticle physics. We propose a service, ASTROPARTICLE.ONLINE, in the frame of which we are developing an open-science system that enables publishing, storing, searching, selecting, and analysing astroparticle physics data. The cosmic-ray experiments KASCADE-Grande and TAIGA were chosen as pilot experiments to be included in this framework. In the first step of our initiative we will develop and test the following components of the full data life cycle concept: (i) describing, storing, and reusing astroparticle data; (ii) software for performing multi-experiment and multi-messenger analyses, such as deep-learning methods; (iii) outreach, including example applications and tutorials for students and scientists outside the specific research field. In the present paper we describe the concepts of our initiative and, in particular, the plans toward a common, federated astroparticle data storage.
SHORT NOTE | doi:10.20944/preprints202211.0056.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: Precision Livestock Farming; Digital Agriculture; Smart Farming; In Ovo Sexing; Big Data; Artificial Intelligence
Online: 2 November 2022 (11:03:44 CET)
Current commercial, pre-commercial, and experimental in ovo techniques for the sex determination of fertilised eggs employ minimally invasive biomolecular assays (extracting fluid via a small laser-drilled window in the eggshell for detection of genetic or hormonal biomarkers), analysis of volatile compounds emitted from the eggshell, visible imaging, or reflectance and transmission spectroscopic analysis exploiting molecular optical fluorescence, polarisation, and scattering phenomena, including various combinations of these modalities. However, to date no endeavour employing NIR- and FTIR-based spectroscopic techniques has resulted in a commercially sustainable solution to the egg-sexing problem. Besides achieving only subpar performance in overall accuracy, specificity, and sensitivity, the least invasive of the current state-of-the-art optical methods still requires creating a transmission window (fenestration) of 12–15 mm diameter through to the mammillary layer of the shell, proximal to the external shell membrane, which can affect the incubation or post-hatch development viability of up to 10% of incubated eggs. A multimodal solution combining Raman spectroscopy and hyperspectral imaging has strong prospects of overcoming the remaining barriers to a non-invasive in-line process with high reliability and rapid throughput for sex determination of eggs within 3 days of incubation. Such a method needs to take a multipronged approach, collecting and analysing spectral data that point to biomarkers and using machine learning approaches to detect nanomolar to picomolar concentrations of these biomarkers in the egg fluid.
ARTICLE | doi:10.20944/preprints202308.0208.v1
Subject: Computer Science And Mathematics, Analysis Keywords: I Ching; Coin toss method; 64 hexagram changes; Big data analysis; Hexagram changes topographic map
Online: 3 August 2023 (08:16:04 CEST)
Chinese divination with the I Ching has a history of thousands of years. The six-line changes in the 64 hexagrams yield over one billion scenarios, and the inherent laws among them have not been well revealed to this day. Using big data technology and the coin toss method, this paper simulates the changes of the 64 hexagrams and explores, from a quantitative perspective, the probability and proportion of each hexagram during change, as well as the maximum and minimum conversion rates of hexagram changes. The paper summarizes the basic characteristics and laws of hexagram change, constructs the spatial form of hexagram change accordingly, and reveals hidden regularities of the ancient Book of Changes (I Ching). To achieve this goal, we randomly toss three coins 600 million times to generate 100 million hexagrams. According to the basic rules of hexagram divination, we calculate the hexagram changes. The results show that: (1) Changes of things are mostly simple changes: the probability that a randomly generated hexagram has one to three dynamic lines is close to 80%. (2) About 17% of the generated hexagrams have no dynamic lines, which means that a significant proportion of hexagrams are invariant. (3) Small-probability hexagram changes show certainty, while large-probability hexagram changes show uncertainty. (4) The topographic map of the probability of changing hexagrams has axial symmetry and fractal geometric characteristics; the fractal characteristics are mainly manifested in that the changes on both sides of the symmetry axis are presented against a triangular background. These results reflect clear characteristics and internal regularity in the changes of the 64 hexagrams of the I Ching. This article provides new ideas for scientific exploration of the internal laws of the 64 hexagrams in the I Ching.
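The three-coin mechanics behind these percentages are easy to reproduce. The sketch below (our illustration, not the authors' code) simulates hexagrams, where each of the six lines comes from tossing three coins and "three alike" makes a dynamic line (probability 1/4 per line), and checks the simulated shares against the exact binomial values: (3/4)^6 ≈ 17.8% invariant hexagrams and ≈ 78.4% with one to three dynamic lines.

```python
import math
import random

def coin_toss_hexagram(rng):
    """One hexagram by the three-coin method: a line is dynamic when all
    three coins land alike (all heads or all tails), probability 2/8 = 1/4."""
    lines = []
    for _ in range(6):
        heads = sum(rng.randint(0, 1) for _ in range(3))
        lines.append(heads in (0, 3))  # True = dynamic line
    return lines

rng = random.Random(42)
n = 200_000
counts = [0] * 7  # counts[k] = hexagrams with exactly k dynamic lines
for _ in range(n):
    counts[sum(coin_toss_hexagram(rng))] += 1

p_none = counts[0] / n                 # invariant hexagrams
p_one_to_three = sum(counts[1:4]) / n  # "simple changes"

# exact binomial probabilities for comparison
exact_none = (3 / 4) ** 6
exact_123 = sum(math.comb(6, k) * (1 / 4) ** k * (3 / 4) ** (6 - k)
                for k in range(1, 4))
```

The exact values match the abstract's figures: about 17.8% of hexagrams are invariant and about 78.4% ("close to 80%") have one to three dynamic lines.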
REVIEW | doi:10.20944/preprints202103.0402.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: anesthesia; anesthesiology; big data; registries; database research; acute pain; pain management; postoperative pain; regional anesthesia; regional analgesia.
Online: 15 March 2021 (17:45:39 CET)
The digital transformation of healthcare is advancing, leading to an increasing availability of clinical data for research. Perioperative big data initiatives were established to monitor treatment quality and benchmark outcomes. However, big data analyses have long exceeded the status of pure quality-surveillance instruments. Large retrospective studies nowadays often represent the first approach to new questions in clinical research and pave the way for more expensive and resource-intensive prospective trials. As a consequence, the utilization of big data in acute pain and regional anesthesia research has increased considerably over the last decade. Multicentric clinical registries and administrative databases (e.g., healthcare claims databases) have collected millions of cases to date, on the basis of which several important research questions have been approached. In acute pain research, big data has been used to assess postoperative pain outcomes, opioid utilization, and the efficiency of multimodal pain management strategies. In regional anesthesia, adverse events and the potential benefits of regional anesthesia for postoperative morbidity and mortality have been evaluated. This article provides a narrative review of the growing importance of big data for research in acute postoperative pain and regional anesthesia.
ARTICLE | doi:10.20944/preprints202311.1570.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: cancer research; cancer; natural language processing; data mining; data warehouse; big data
Online: 26 November 2023 (05:13:14 CET)
Background: Real-world data (RWD) related to the health status and care of cancer patients reflect ongoing medical practice, and their analysis yields essential real-world evidence. Advanced information technologies are vital for their collection, qualification, and reuse in research projects. Methods: UNICANCER, the French federation of comprehensive cancer centres, has created a unique research network: Consore. This potent federated tool enables the analysis of data from millions of cancer patients across eleven French hospitals. Results: Currently operational within eleven French cancer centres, Consore employs natural language processing to structure the therapeutic management data of approximately 1.3 million cancer patients. This data originates from their electronic medical records, encompassing about 65 million medical records. Thanks to the structured data, which is harmonized within a common data model, and its federated search tool, Consore can create patient cohorts based on patient or tumor characteristics and treatment modalities. This ability to derive larger cohorts is particularly attractive when studying rare cancers. Conclusions: Consore serves as a tremendous data mining instrument that propels French cancer centres into the big data era. With its federated technical architecture and unique shared data model, Consore facilitates compliance with regulations and accelerates cancer research projects.
ARTICLE | doi:10.20944/preprints201906.0174.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: Business excellence; information technology; implementation challenge; ISO 20000; big data management.
Online: 18 June 2019 (10:56:19 CEST)
This study contributes to the literature by exploring challenges to implementing ISO 20000-1 in an emerging economy context, and suggests ways to overcome these challenges. A survey-based methodology was adopted. The data were analyzed using principal component analysis. The results indicated that senior management support was the most significant challenge for the successful implementation of IT Service Management (ITSM) systems. Other significant challenges were the justification of significant investment, premium customer support, co-operation and co-ordination among IT support teams, proper documentation, and effective process design. The findings help managers introduce an IT service management system (ISO 20000-1:2011) as well as improve IT service delivery in IT support organizations managing big data in an emerging economy. In the future, cross-firm and cross-country studies on challenges to ISO 20000 can be conducted. Also, an interpretive structural model (ISM) can be formulated to examine the interrelationships among the identified challenges to ISO 20000.
ARTICLE | doi:10.20944/preprints202111.0029.v1
Subject: Social Sciences, Decision Sciences Keywords: Real-world fuel consumption rate; machine learning; big data; light-duty vehicle; China
Online: 2 November 2021 (09:40:05 CET)
Private vehicle travel is the most basic mode of transportation, and effective control of the real-world fuel consumption rate of light-duty vehicles plays a vital role in promoting sustainable economic development as well as achieving a green, low-carbon society. Therefore, the impact factors of individual carbon emissions must be elucidated. This study builds five different models to estimate the real-world fuel consumption rate of light-duty vehicles in China. The results reveal that the Light Gradient Boosting Machine (LightGBM) model performs better than the linear regression, Naïve Bayes regression, Neural Network regression, and Decision Tree regression models, with a mean absolute error of 0.911 L/100 km, a mean absolute percentage error of 10.4%, a mean square error of 1.536, and an R squared (R2) of 0.642. This study also assesses a large number of factors, from which the three most important are extracted, namely the reference fuel consumption rate, engine power, and light-duty vehicle brand. Furthermore, a comparative analysis reveals that the vehicle factors with the greatest impact on real-world fuel consumption rate are vehicle brand, engine power, and engine displacement. Average air pressure, average temperature, and sunshine time are the three most important climate factors.
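The four scores used to compare the models (MAE, MAPE, MSE, and R²) can be computed directly from observed and predicted values; a small self-contained illustration with hypothetical fuel-consumption rates (not the study's data):

```python
def regression_metrics(y_true, y_pred):
    """MAE, MAPE (%), MSE and R-squared for a set of predictions."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / ss_tot
    return mae, mape, mse, r2

# hypothetical fuel-consumption rates (L/100 km): observed vs. model output
obs = [7.2, 8.5, 6.9, 9.1, 7.8, 8.0]
pred = [7.0, 8.9, 7.1, 8.6, 7.9, 8.3]
mae, mape, mse, r2 = regression_metrics(obs, pred)
```

The same four functions, applied to a held-out test set, reproduce the kind of comparison table the abstract summarises.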
ARTICLE | doi:10.20944/preprints202106.0187.v3
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: SARS-CoV2; Biomathematics; Benford law; trials; Epidemiology; Fibonacci; data analysis; big data
Online: 11 June 2021 (15:47:44 CEST)
The Benford method can be used to detect manipulation of epidemiological or trial data during the validation of new drugs. We extend the Benford method here, after having detected particular properties for the Fibonacci values 1, 2, 3, 5, and 8 of the first decimal of 10 runs of official epidemiological data published in France and Italy (positive cases, intensive care, and deaths) for the periods of March 1 to May 30, 2020 and 2021, each with 91 raw data points. This new method – called "BFP" for Benford-Fibonacci-Perez – is positive in all 10 cases (i.e., 910 values), with an average of favorable cases close to 80%, which, in our opinion, would validate the reliability of these basic data.
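For readers unfamiliar with the mechanics, the sketch below (our illustration, not the BFP method itself) computes the share of digits in the Fibonacci set {1, 2, 3, 5, 8} expected under the classical Benford law for first significant digits, and compares it with the empirical share in a synthetic dataset; note that Benford's law concerns the first significant digit, while the BFP variant inspects the first decimal, so this is an analogous rather than identical computation, and the geometric series merely stands in for a real data run.

```python
import math

FIB_DIGITS = {1, 2, 3, 5, 8}

def leading_digit(x):
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

# Benford's expected probability that the leading digit is d
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
expected_fib_share = sum(benford[d] for d in FIB_DIGITS)  # about 0.732

# synthetic stand-in for a run of case counts: geometric growth is a
# classic example of approximately Benford-distributed data
series = [1.05 ** n for n in range(500)]
hits = sum(leading_digit(x) in FIB_DIGITS for x in series)
empirical_fib_share = hits / len(series)
```

Under Benford's law, the five Fibonacci digits together account for roughly 73% of leading digits, which gives a baseline against which an observed "favorable cases" rate can be judged.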
Subject: Computer Science And Mathematics, Information Systems Keywords: Academic Analytics; data storage; education and big data; analysis of data; learning analytics
Online: 19 July 2020 (20:37:39 CEST)
Business Intelligence, defined by  as "the ability to understand the interrelations of the facts that are presented in such a way that it can guide the action towards achieving a desired goal", has been used since 1958 for the transformation of data into information, and of information into knowledge, to be used when making decisions in a business environment. But what would happen if we took the same principles of business intelligence and applied them to the academic environment? The answer would be the creation of Academic Analytics, a term defined by  as the process of evaluating and analyzing organizational information from university systems for reporting and making decisions, whose characteristics allow it to be used more and more in institutions, since the information they accumulate about their students and teachers gathers data such as academic performance, student success, persistence, and retention. Academic Analytics enables an analysis of data that is very important for making decisions in the educational institutional environment, aggregating valuable information in academic research activity and providing easy-to-use business intelligence tools. This article presents a proposal for creating an information system based on Academic Analytics, using ASP.Net technology and relying on the Microsoft SQL Server database engine for storage, designing a model supported by Academic Analytics for the collection and analysis of data from the information systems of educational institutions. The proposed system is capable of displaying statistics on the historical data of students and teachers taken over academic periods, without direct access to institutional databases, with the purpose of gathering the information that the director, the teacher, and finally the student need for making decisions.
The model was validated with information from students and teachers covering the last five years, with data exported as PDF, CSV, and XLS files. The findings allow us to state that analysing the data held in the information systems of educational institutions is extremely important for decision-making. After the validation of the model, it was established that students need to know the reports of their academic performance in order to carry out a process of self-evaluation; that teachers should be able to see the results of the data obtained in order to carry out self-evaluation and adapt content and dynamics in the classroom; and that, finally, the head of the program needs this information for making decisions.
ARTICLE | doi:10.20944/preprints201810.0469.v1
Subject: Engineering, Control And Systems Engineering Keywords: energy efficiency; big data analytics; QoS-IoT; internet of things; smart city; WSN; green computing
Online: 22 October 2018 (05:27:42 CEST)
Various heterogeneous devices or objects must be integrated for transparent and seamless communication under the umbrella of the Internet of Things (IoT). This would facilitate open access to data for the growth of a wealth of digital services. Building a general framework for IoT is a very complex task because of the heterogeneity of devices, technologies, platforms, and services operating in the same system. In this paper, we mainly focus on a framework for big data analytics in smart city applications, which, being a broad category, specifies the different domains for each application. IoT is intended to support the vision of the Smart City, where advanced communication technologies are used to improve the quality of life of citizens. The novel approach of this paper enhances energy conservation and reduces the delay in big data gathering at the tiny sensor nodes used in the IoT framework. To implement the smart city scenario in terms of big data in IoT, an efficient (QoS-optimized) WSN is required in which node communication is energy-efficient. Therefore, a new protocol, QoS-IoT, is proposed on the top layer of the architecture and validated against traditional protocols.
ARTICLE | doi:10.20944/preprints201705.0116.v1
Subject: Engineering, Mechanical Engineering Keywords: thermal runaway; big-data platform; battery systems; electric vehicles; National Service and Management Center for Electric Vehicles
Online: 16 May 2017 (03:18:57 CEST)
This paper presents a thermal runaway prognosis scheme based on the big-data platform and entropy method for battery systems in electric vehicles. It can simultaneously realize the diagnosis and prognosis of thermal runaway caused by temperature faults through monitoring battery temperature during vehicular operation. A vast quantity of real-time voltage monitoring data was collected in the National Service and Management Center for Electric Vehicles (NSMC-EV) in Beijing to verify the effectiveness of the presented method. The results show that the proposed method can accurately forecast both the time and location of the temperature fault within battery packs. Furthermore, a temperature security management strategy for thermal runaway is proposed on the basis of the Z-score approach, and an abnormity coefficient is set to provide real-time warning of temperature abnormalities.
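A Z-score screen of the kind described can be sketched in a few lines; the pack snapshot and the threshold of 3 below are hypothetical illustrations, not the paper's parameters.

```python
import statistics

def zscore_flags(temps, threshold=3.0):
    """Flag cells whose temperature deviates from the pack mean by more
    than `threshold` standard deviations (Z-score abnormity screening)."""
    mu = statistics.fmean(temps)
    sigma = statistics.pstdev(temps)
    if sigma == 0:
        return [False] * len(temps)
    return [abs((t - mu) / sigma) > threshold for t in temps]

# hypothetical pack snapshot: eleven cells near 30 degrees C, one outlier
pack = [30.1, 29.8, 30.3, 30.0, 29.9, 30.2, 30.1, 29.7, 30.0, 30.2, 29.9, 45.0]
flags = zscore_flags(pack, threshold=3.0)
```

Applied to a rolling window of temperature readings per cell, the same screen yields the real-time precaution signal the abstract refers to.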
REVIEW | doi:10.20944/preprints202004.0383.v1
Subject: Medicine And Pharmacology, Other Keywords: COVID-19; coronavirus pandemic; big data; epidemic outbreak; artificial intelligence (AI); deep learning
Online: 21 April 2020 (09:01:45 CEST)
The first case of the novel coronavirus disease (COVID-19) was found in Hubei, China, in December 2019. The COVID-19 pandemic has since spread to over 215 countries and areas of the world and has significantly affected every aspect of our daily lives. At the time of writing this article, the numbers of infected cases and deaths are still increasing significantly and show no sign of a well-controlled situation; e.g., as of 14 April 2020, a cumulative total of 1,853,265 infected cases and 118,854 deaths had been reported worldwide. Motivated by recent advances and applications of artificial intelligence (AI) and big data in various areas, this paper aims at emphasizing their importance in responding to the COVID-19 outbreak and preventing the severe effects of the COVID-19 pandemic. We first present an overview of AI and big data, then identify their applications in fighting COVID-19, next highlight challenges and issues associated with state-of-the-art solutions, and finally offer recommendations for the community to effectively control the COVID-19 situation. It is expected that this paper provides researchers and communities with new insights into the ways AI and big data can improve the COVID-19 situation, and drives further studies on stopping the COVID-19 outbreak.
ARTICLE | doi:10.20944/preprints202002.0143.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: ocean; big-data; cite-space; co-authorship analysis; co-citation analysis; keywords co-occurrence analysis; visualization
Online: 11 February 2020 (09:41:17 CET)
Ocean big data is the scientific practice of applying big data technology in the marine field. Data from satellites, manned spacecraft, space stations, airships, unmanned aerial vehicles, shore-based radar and observation stations, exploration platforms, buoys, underwater gliders, submersibles, and submarine observation networks are seamlessly combined into ocean big data. Increasing numbers of scholars have tried to fully analyze ocean big data. To explore the key research-technology knowledge graphs related to ocean big data, articles published between 1990 and 2020 were collected from the Web of Science. After comparing bibliometric software packages, the visualization software CiteSpace was used to visualize and identify the pivotal literature related to ocean big data, as well as countries, institutions, categories, and keywords. Journal co-citation analysis networks help determine the national distribution of core journals. Document co-citation analysis networks show the authors who are influential at key technical levels. Keyword co-occurrence analysis networks reveal research hot spots and research frontiers. The three supporting elements of marine big data research shown in the co-citation networks are author, institution, and country. By examining keyword co-occurrence, the key technology research directions for future marine big data were determined.
ARTICLE | doi:10.20944/preprints201905.0263.v1
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: natural gas; gas compressibility factor; group method of data handling (GMDH); big data; equation of state; correlation
Online: 22 May 2019 (08:29:32 CEST)
Natural gas is increasingly sought after as a vital source of energy, given that its production is very cheap and does not cause the same environmental harm as other resources, such as coal combustion. Understanding and characterizing the behavior of natural gas is essential in hydrocarbon reservoir engineering, natural gas transport, and processing. The natural gas compressibility factor, a critical parameter, defines the compression and expansion characteristics of natural gas under different conditions. In this study, a simple second-order polynomial model based on the group method of data handling (GMDH) is presented to determine the compressibility factor of different natural gases under different conditions, using corresponding-states principles. The accuracy of the model was evaluated through graphical and statistical analyses. The results show that the model is capable of predicting natural gas compressibility with an average absolute error of only 2.88%, a root mean square error of 0.03, and a regression coefficient of 0.92. The performance of the developed model was compared to widely known, previously published equations of state (EOSs) and correlations, and the precision of the results demonstrates its superiority over all other correlations and EOSs.
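The GMDH building block is a second-order polynomial of two inputs fitted by least squares. The self-contained sketch below fits such a polynomial via the normal equations; the coefficient values and input ranges (loosely suggestive of reduced pressure and temperature) are illustrative assumptions, not the paper's model.

```python
import random

def fit_quadratic_neuron(x1, x2, y):
    """Least-squares fit of y = a0 + a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2
    + a5*x1*x2, the second-order polynomial unit used in GMDH networks."""
    feats = [[1.0, u, v, u * u, v * v, u * v] for u, v in zip(x1, x2)]
    m = 6
    # normal equations: (F^T F) a = F^T y
    A = [[sum(f[i] * f[j] for f in feats) for j in range(m)] for i in range(m)]
    b = [sum(f[i] * t for f, t in zip(feats, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0] * m
    for r in range(m - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, m))) / A[r][r]
    return a

# synthetic demo: recover known (hypothetical) coefficients exactly
rng = random.Random(0)
true = [0.9, -0.2, 0.05, 0.3, -0.1, 0.15]
x1 = [rng.uniform(0.5, 2.0) for _ in range(50)]  # e.g. reduced pressure
x2 = [rng.uniform(1.0, 2.5) for _ in range(50)]  # e.g. reduced temperature
y = [true[0] + true[1] * u + true[2] * v + true[3] * u * u
     + true[4] * v * v + true[5] * u * v for u, v in zip(x1, x2)]
coef = fit_quadratic_neuron(x1, x2, y)
```

In a full GMDH network, many such pairwise units are fitted and the best-performing ones are retained layer by layer; the sketch shows only the elementary fit.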
ARTICLE | doi:10.20944/preprints202308.0937.v1
Subject: Public Health And Healthcare, Public, Environmental And Occupational Health Keywords: Google Trends; disease prediction; Lyme disease; Lyme; Big Data; One Health; negative binomial; mixed models; zoonotic disease; tick-borne disease
Online: 11 August 2023 (11:01:40 CEST)
Google Trends data can be informative for infectious disease incidence, including Lyme disease. However, the use of Google Trends for predictive purposes is underutilized. In this study, we tested the ability of Google Trends search data to predict monthly state-level Lyme disease case counts in the United States. We requested Lyme disease case data for the years 2010-2021. We downloaded Google Trends search data on terms for Lyme disease, symptoms of Lyme disease, and diseases with similar symptoms to Lyme disease. We built mixed negative binomial models based on a training dataset (2010-2016) and tested the models on a test dataset (2017-2021). A model was built for each search term, and monthly lags of search terms were included as predictors. The highest-performing models had high predictive ability, indicated by low Root Mean Squared Errors (RMSEs) and a close association between observed and predicted case counts. The highest-performing model was for the search term “Summer Flu”, which indicates low specificity of some of the terms. We outline challenges of using Google Trends data, including data availability and a mismatch between geographic units. We discuss opportunities for Google Trends data, including prediction of additional zoonotic diseases and incorporation of environmental and companion animal data.
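The two mechanical steps described, turning monthly lags of a search term into predictors and scoring predictions by RMSE, can be sketched as follows; the search-volume numbers are hypothetical, and the modelling step itself (the mixed negative binomial fit) is omitted.

```python
import math

def lagged_features(series, lags):
    """Rows [x_{t-1}, ..., x_{t-lags}] aligned with target month t,
    mimicking how monthly lags of a search term become predictors."""
    X, idx = [], []
    for t in range(lags, len(series)):
        X.append([series[t - k] for k in range(1, lags + 1)])
        idx.append(t)
    return X, idx

def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and predicted counts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
                     / len(y_true))

# hypothetical monthly search volumes for a Lyme-related term
volume = [12, 15, 22, 40, 65, 80, 70, 45, 30, 20, 14, 11]
X, months = lagged_features(volume, lags=3)
```

Each row of `X` would be paired with the case count of its target month; fitting on 2010-2016 rows and computing `rmse` on 2017-2021 rows reproduces the train/test protocol in the abstract.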
ARTICLE | doi:10.20944/preprints202104.0482.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Smart Scenic; environmental disasters management; organization transformation; system design; Big Data; Internet of Things
Online: 19 April 2021 (13:19:35 CEST)
The intensity of natural and man-made disasters is increasing day by day. Disaster is one of the major threats affecting the sustainable development of tourist attractions. Big data and the Internet of Things (IoT) can greatly improve disaster management. Based on big data and IoT, a tourism-attraction disaster management system is designed, divided into several stages, namely pre-disaster early warning and prevention, disaster mitigation, post-disaster recovery and reconstruction, and updating of disaster planning. The system flow is then analysed and the system structure constructed. Additionally, system functions and their operation flows are introduced, including disaster warning, disaster relief, disaster assessment, real-time monitoring, and support for disaster planning. Finally, an application case is introduced. This research intends to improve disaster management in tourism areas.
ARTICLE | doi:10.20944/preprints202306.0733.v1
Subject: Environmental And Earth Sciences, Sustainable Science And Technology Keywords: clean cooking fuel and utilization technology; population ratio; earth big data
Online: 9 June 2023 (16:38:43 CEST)
Cooking is a highly decentralized and private form of energy consumption in human activities. Existing surveys and statistical analyses cannot effectively calculate the proportion of the population relying on clean cooking fuel and utilization technology in a region. Therefore, based on Earth big data, this paper combines spatial analysis and statistical analysis to determine the survey sample area; then, according to the economic conditions, topographic characteristics, national policies for new energy construction, and living habits of the provinces and cities under investigation, a questionnaire survey is conducted in the research area to calculate the proportion of the regional population relying on clean cooking fuels and utilization technologies. Taking the area south of the Yangtze River in China as an example, the paper calculates that 88.25% of the population in this region depends on clean cooking fuel and technology, of which 89.81% are in urban areas and 79.87% in rural areas. Analysis of the survey data shows that the proportion of the population using clean cooking fuels and technologies is related to factors such as economic development, income, and resource endowment. There is a large urban–rural gap in energy consumption tendency and structure south of the Yangtze River in China, and cooking energy consumption in rural households also varies from region to region. The technical ideas and conclusions of the paper have high reference value, which can help promote the upgrading of clean energy utilization and provide a data basis for relevant decision- and policy-making.
ARTICLE | doi:10.20944/preprints202308.1347.v1
Subject: Computer Science And Mathematics, Analysis Keywords: Yijing; 64 hexagram changes; number in the great expansion method of divination; yin‐yang asymmetry; big data analysis
Online: 18 August 2023 (11:29:41 CEST)
The divination function of China's Yijing has kept it in circulation for thousands of years. In our exploration of the Yijing's characteristics using big data, we have discovered variations in results between the coin toss method and the ancient yarrow-stalk method of divination, known as "the number of the great expansion method of divination (大衍之数)". The yarrow-stalk method serves as the fundamental method of divination in the Yijing and continues to hold significance for studying its essential characteristics. Despite the complexity of yarrow calculations, advancements in computer technology and big data have simplified their application. By employing the yarrow-stalk method, we simulated changes in the 64 hexagrams, calculated the probabilities and proportions of hexagram alterations, and derived fundamental characteristics and patterns of the hexagrams. Additionally, we constructed the spatial representation of lines and hexagrams. Through a binary-system rearrangement, we created a 64x64 matrix illustrating hexagram transformations. Subsequently, we generated 100 million random hexagrams and analyzed line and hexagram changes accordingly. Our findings indicate the following: (1) Big data analysis reveals evident asymmetry in the hexagrams obtained through the yarrow-stalk method, with a triangular fractal characteristic forming the background. (2) Each of the 64 hexagrams exhibits a distinct probability distribution when transforming into other hexagrams, which can be categorized into five types. (3) The occurrence probabilities of Laoyang, Laoyin, Shaoyang, and Shaoyin are 18.61%, 6.387%, 31.38%, and 43.62%, respectively. The probabilities of Yin and Yang occurrences are nearly equal, each close to 50%; however, the probability of Laoyang is approximately three times that of Laoyin. (4) The characteristics of hexagram changes occurring more than 100,000 times were visualized and analyzed using 3D statistical maps and Sankey diagrams. These results demonstrate that the yarrow-stalk method effectively unveils the characteristics and underlying patterns of the 64 hexagrams. This study provides a novel approach for scientifically exploring the internal laws governing the 64 hexagrams in the Yijing.
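The yarrow-stalk procedure itself is straightforward to simulate. The sketch below (our simplified model of the random split, not the authors' code) reproduces the qualitative pattern reported: Shaoyin (8) is most frequent and Laoyin (6) least, with frequencies near the classical values 7/16, 5/16, 3/16, and 1/16; the uniform-split assumption makes the simulated frequencies only approximate.

```python
import random

def yarrow_line(rng):
    """One line by a simplified yarrow-stalk procedure: 49 stalks, three
    counting rounds; the remaining stalks divided by 4 give 6 (Laoyin),
    7 (Shaoyang), 8 (Shaoyin) or 9 (Laoyang)."""
    n = 49
    for _ in range(3):
        # set one stalk aside, split the rest into two random piles,
        # count each pile off by fours and remove the remainders
        # (a remainder of 0 counts as a full group of 4)
        left = rng.randint(1, n - 2)
        right = n - 1 - left
        r_left = left % 4 or 4
        r_right = right % 4 or 4
        n -= 1 + r_left + r_right
    return n // 4

rng = random.Random(7)
n_lines = 100_000
freq = {6: 0, 7: 0, 8: 0, 9: 0}
for _ in range(n_lines):
    freq[yarrow_line(rng)] += 1
shares = {k: v / n_lines for k, v in freq.items()}
```

The marked asymmetry between Laoyang and Laoyin, absent from the symmetric three-coin method, is exactly the yin-yang asymmetry the abstract highlights.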
ARTICLE | doi:10.20944/preprints202211.0034.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Blockchain; Smart Contract; Point Cloud; Security; Privacy Preservation; Software-Defined Network (SDN); Big Data; Assurance; Resilience.
Online: 2 November 2022 (02:18:50 CET)
The rapid development of three-dimensional (3D) acquisition technology based on 3D sensors provides a large volume of data, which is often represented in the form of point clouds. Point cloud representation can preserve the original geometric information along with associated attributes in a 3D space. Therefore, it has been widely adopted in many scene-understanding-related applications such as virtual reality (VR) and autonomous driving. However, the massive amount of point cloud data aggregated from distributed 3D sensors also poses challenges for secure data collection, management, storage, and sharing. Thanks to its decentralized and secure nature, blockchain has great potential to improve point cloud services and enhance security and privacy preservation. Inspired by the rationales behind Software Defined Network (SDN) technology, this paper envisions SAUSA, a blockchain-based authentication network that is capable of recording, tracking, and auditing the access, usage, and storage of 3D point cloud data sets in their life-cycle in a decentralized manner. SAUSA adopts an SDN-enabled point cloud service architecture which allows for efficient data processing and delivery to satisfy diverse Quality-of-Service (QoS) requirements. A blockchain-based authentication framework is proposed to ensure security and privacy preservation in point cloud data acquisition, storage, and analytics. Leveraging smart contracts for digitizing access control policies and point cloud data on the blockchain, data owners have full control of their 3D sensors and point clouds. In addition, anyone can verify the authenticity and integrity of point clouds in use without relying on a third party. Moreover, SAUSA integrates a decentralized storage platform to store encrypted point clouds while recording references of raw data on the distributed ledger.
Such a hybrid on-chain and off-chain storage strategy not only improves robustness and availability but also ensures privacy preservation for sensitive information in point cloud applications. A proof-of-concept prototype is implemented and tested on a physical network. The experimental evaluation validates the feasibility and effectiveness of the proposed SAUSA solution.
ARTICLE | doi:10.20944/preprints202305.0856.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: quality evaluation of school management; compulsory education stage; big data technology; visualization techniques; evaluation models
Online: 11 May 2023 (13:26:38 CEST)
As compulsory education has spread, school management problems have emerged and persisted, and the quality of school management in compulsory education has attracted a great deal of attention in China. However, the application of information technology in this field is not yet detailed or widespread, resulting in a heavy workload and high difficulty across the whole evaluation process. Accordingly, we use big data technologies such as Apache Spark, Apache Hive, and SPSS to carry out data cleaning, correlation analysis, dynamic factor analysis, principal component analysis, and visual display on 1,760 sample data points from 40 primary and secondary schools in Q Province, China, and construct a quality evaluation model for school management in the compulsory education stage, which reduces the 22 management tasks required for previous evaluations to 5, greatly reducing the workload and difficulty of evaluation. The model has improved the efficiency and accuracy of evaluation, and has further promoted the simultaneous development of five-domain education and education equity in the compulsory education stage.
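Dimensionality reduction of the kind used to compress many management indicators into a few components can be sketched without external libraries via power iteration on the covariance matrix; the tiny indicator matrix below is illustrative, not the study's data, and only the first component is extracted.

```python
def top_principal_component(rows, iters=200):
    """First principal component of a small data matrix via power
    iteration on its covariance matrix (pure Python, no libraries)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# toy indicators: the three columns are exact multiples of one factor,
# so a single component captures all of the variance
rows = [[t, 2 * t, -t] for t in [1.0, 2.0, 3.0, 4.0, 5.0]]
v = top_principal_component(rows)
```

Repeating the extraction on the residual after removing each found component yields the handful of composite dimensions that replace the original indicator set.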
REVIEW | doi:10.20944/preprints202211.0161.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: High Performance Computing (HPC); big data; High Performance Data Analytics (HPDA); convergence; data locality; spark; Hadoop; design patterns; process mapping; in-situ data analysis
Online: 9 November 2022 (01:38:34 CET)
Big data has revolutionised science and technology, leading to the transformation of our societies. High Performance Computing (HPC) provides the necessary computational power for big data analysis using artificial intelligence and related methods. Traditionally, HPC and big data have focused on different problem domains and have grown into two different ecosystems. Efforts have been underway for the last few years to bring the best of both paradigms into converged HPC and big data architectures. Designing HPC and big data converged systems is a hard task requiring careful placement of data, analytics, and other computational tasks such that the desired performance is achieved with the least amount of resources. Energy efficiency has become the biggest hurdle in the realisation of HPC, big data, and converged systems capable of delivering exascale and beyond performance. Data locality is a key parameter of High Performance Data Analytics (HPDA) system design, as moving even a byte carries a heavy time and energy cost that grows with the size of the system. Performance in terms of time and energy is the most important factor for users, particularly energy, as it is the major hurdle in high performance system design and the focus on green systems is increasing due to environmental sustainability. Data locality is a broad term that encapsulates different aspects including bringing computations to data, minimizing data movement by efficient exploitation of cache hierarchies, reducing intra- and inter-node communications, locality-aware process and thread mapping, and in-situ and in-transit data analysis. This paper provides an extensive review of the state of the art on data locality in HPC, big data, and converged systems. We review the literature on data locality in HPC, big data, and converged environments and discuss challenges, opportunities, and future directions.
Subsequently, using the knowledge gained from this extensive review, we propose a system architecture for future HPC and big data converged systems. To the best of our knowledge, there is no such review on data locality in converged HPC and big data systems.
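The cache-hierarchy aspect of data locality mentioned above can be illustrated with a toy traversal-order experiment: walking a 2-D structure in its memory layout order touches contiguous addresses, while the transposed order strides across rows. This is a generic illustration (in pure Python the effect is far weaker than in compiled HPC codes).

```python
# Toy data-locality demo: row-major vs column-major traversal of a 2-D array.
import time

n = 500
matrix = [[i * n + j for j in range(n)] for i in range(n)]

t0 = time.perf_counter()
row_major = sum(matrix[i][j] for i in range(n) for j in range(n))
t1 = time.perf_counter()
col_major = sum(matrix[i][j] for j in range(n) for i in range(n))
t2 = time.perf_counter()

# Both orders compute the same sum; the row-major walk typically exploits
# spatial locality better, which is the point the review makes at scale.
print(row_major == col_major)   # True
print(f"row-major {t1 - t0:.4f}s  column-major {t2 - t1:.4f}s")
```

In C, Fortran, or CUDA the same reordering routinely changes runtime by an order of magnitude, which is why locality-aware placement matters for converged system design.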
ARTICLE | doi:10.20944/preprints202110.0260.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: big data; data acquisition; data visualization; data exchange; dashboard; frequency stability; Grafana lab; Power Quality; GPS reference; frequency measurement.
Online: 18 October 2021 (18:07:43 CEST)
This article proposes a measurement solution designed to monitor instantaneous frequency in power systems. It uses a data acquisition module and a GPS receiver for time stamping. A program in Python takes care of receiving the data, calculating the frequency, and finally transferring the measurement results to a database. The frequency is calculated with two different methods, which are compared in the article. The stored data is visualized using the Grafana platform, demonstrating its potential for presenting and comparing scientific data. The system as a whole constitutes an efficient, low-cost data acquisition solution.
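The article does not state which two frequency-calculation methods it compares; a common choice for grid monitoring is zero-crossing counting with linear interpolation, sketched here as an assumption on simulated samples.

```python
# Assumed method sketch: estimate grid frequency from upward zero crossings
# of a sampled voltage waveform (signal, rates, and frequency are simulated).
import numpy as np

fs = 10_000                      # sampling rate in Hz (illustrative)
f_true = 50.02                   # simulated grid frequency
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * f_true * t)

# Indices where the signal crosses zero going upward.
idx = np.where((signal[:-1] < 0) & (signal[1:] >= 0))[0]
# Linear interpolation refines each crossing time between two samples.
frac = signal[idx] / (signal[idx] - signal[idx + 1])
crossings = (idx + frac) / fs

# The mean period between successive upward crossings gives the frequency.
f_est = 1.0 / np.mean(np.diff(crossings))
print(round(f_est, 2))
```

A production system would additionally discipline the sample clock against the GPS pulse-per-second signal so that timestamps, and hence the estimated frequency, are traceable to a common reference.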
ARTICLE | doi:10.20944/preprints201805.0353.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: big data; big data system; energy; district heating; reinforcement learning
Online: 24 May 2018 (16:05:27 CEST)
This paper presents a study on improving the thermal efficiency of user equipment rooms in a district heating system using reinforcement learning, and suggests a general method of constructing a learning network (deep Q-network, DQN) using deep Q-learning, a model-free reinforcement learning algorithm. In addition, we introduce the big data platform system and the integrated heat management system for the energy field, covering the massive data processing from the IoT sensors installed in a large number of thermal energy control facilities.
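The abstract describes deep Q-learning as model-free; the tabular Q-learning update below is the core rule that a DQN approximates with a neural network. The toy environment, rewards, and hyperparameters are illustrative assumptions, not the paper's heating system.

```python
# Tabular Q-learning on a toy chain environment: action 1 moves toward a goal
# state paying reward 1. A DQN replaces the table Q with a neural network.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.9, 0.2
rng = random.Random(0)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(2000):
    s = rng.randrange(n_states - 1)
    for _ in range(20):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Model-free temporal-difference update (what a DQN learns to fit).
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, moving toward the goal should dominate in every state.
print(all(Q[s][1] > Q[s][0] for s in range(n_states - 1)))
```

In the paper's setting the "state" would be sensor readings from the equipment room and the "reward" a thermal-efficiency signal; the update rule itself is unchanged.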
ARTICLE | doi:10.20944/preprints201901.0130.v1
Subject: Business, Economics And Management, Business And Management Keywords: internationalisation of SMEs; big data; market-oriented information; relational database; supply chain network; optimized database; trade condition; data visualization
Online: 14 January 2019 (10:04:03 CET)
There have been many discussions on the globalisation of SMEs, but academic work since the study of born-global (BG) ventures remains limited. The internationalisation of SMEs (Small and Medium Enterprises) is not easy because they lack the resources and capabilities of multinational corporations. This study investigated the role of government in assisting the internationalisation of SMEs. In particular, because SMEs lack the ability to acquire market-oriented information, we established an efficient information-support scheme for their internationalisation. In other words, we proposed an information analysis system built on a relational database constructed for market-oriented information support. KISTI (Korea Institute of Science and Technology Information), one of the government-funded research institutes in the Republic of Korea, provided information support to SMEs dealing with hydrazine-related products. This study presents this case as an example of government market-oriented information support for the internationalisation of SMEs. The research is meaningful in that it suggests a practical way for governments to support SMEs.
ARTICLE | doi:10.20944/preprints202210.0472.v1
Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: media; journalism; deep journalism; labor markets; Great Resignation; Quiet Quitting; Millennials; Generation Z; Big Data Analytics; Natural Language Processing (NLP)
Online: 31 October 2022 (08:33:34 CET)
We live in the information age and, ironically, meeting the core function of journalism – i.e., to provide people access to unbiased information – has never been more difficult. This paper explores deep journalism, our data-driven Artificial Intelligence (AI) based journalism approach, to study how LinkedIn could be useful for journalism. Specifically, we apply our deep journalism approach to LinkedIn to automatically extract and analyse big data to provide the public with information about labour markets, people's skills and education, and businesses and industries from multi-generational perspectives. The Great Resignation and Quiet Quitting phenomena, coupled with rapidly changing generational attitudes, are bringing unprecedented and uncertain changes to labour markets and our economies and societies, and hence the need for journalistic investigations into these topics is highly significant. We combine big data and machine learning to create a whole machine learning pipeline and a software tool for journalism that allows discovering parameters for age dynamics in labour markets using LinkedIn data. We collect a total of 57,000 posts from LinkedIn and use the Latent Dirichlet Allocation (LDA) algorithm to discover 15 parameters from them, which we group into five macro-parameters, namely Generations-Specific Issues, Skills & Qualifications, Employment Sectors, Consumer Industries, and Employment Issues. The journalism approach used in this paper can automatically discover and make objective, cross-sectional, and multi-perspective information available to all. It can bring rigour to journalism by making it easy to generate information using machine learning and can make tools and information available so that anyone can uncover information about matters of public importance.
This work is novel since none of the earlier works have reported such an approach and tool and leveraged it to use LinkedIn media for journalism and to discover multigenerational perspectives (parameters) for age dynamics in labour markets. The approach could be extended with additional AI tools and other media.
ARTICLE | doi:10.20944/preprints202008.0254.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: feature selection; k-means; silhouette measure; clustering; big data; fault classification; sensor data; time-series data
Online: 11 August 2020 (06:26:43 CEST)
Feature selection is a crucial step to overcome the curse of dimensionality in data mining. This work proposes Recursive k-means Silhouette Elimination (RkSE), a new unsupervised feature selection algorithm to reduce dimensionality in univariate and multivariate time-series datasets, in which k-means clustering is applied recursively to select cluster-representative features, with a silhouette measure applied to each cluster and a user-defined threshold as the feature selection or elimination criterion. The proposed method is evaluated on hydraulic test rig multi-sensor readings in two different fashions: (1) reducing the dimensionality in a multivariate classification problem using various classifiers of different functionalities; (2) classifying univariate data in a sliding-window scenario, where RkSE is used as a window compression method to reduce window dimensionality by selecting the best time points in each window. The results are validated using the 10-fold cross-validation technique and compared to the results when classification is performed directly with no feature selection applied. Additionally, a new taxonomy for k-means-based feature selection methods is proposed. The experimental results and observations in the two comprehensive experiments demonstrated in this work reveal the capabilities and accuracy of the proposed method.
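One pass of the clustering-plus-silhouette selection idea can be sketched as below. The recursion, threshold handling, and data are assumptions for illustration, not the authors' exact RkSE procedure.

```python
# One illustrative pass: cluster features with k-means, score each feature's
# silhouette, and keep one well-separated representative per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(0)
# Rows are features (e.g. sensor channels), columns are time points.
features = rng.normal(size=(20, 50))
features[:5] += 3.0          # a correlated, largely redundant feature group

k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
sil = silhouette_samples(features, labels)

# Representative per cluster: the feature with the highest silhouette score.
# In RkSE, a user-defined silhouette threshold then drives elimination and the
# procedure recurses on the surviving features (not shown here).
selected = [int(np.where(labels == c)[0][np.argmax(sil[labels == c])])
            for c in range(k)]
print(sorted(selected))
```

The effect is that each group of mutually similar features contributes a single representative, shrinking the dimensionality while preserving the cluster structure.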
COMMUNICATION | doi:10.20944/preprints202206.0383.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: Exoskeleton; Twitter; Tweets; Big Data; social media; Data Mining; dataset; Data Science; Natural Language Processing; Information Retrieval
Online: 21 July 2022 (04:06:53 CEST)
Exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and diverse use-cases in assisted living, military, healthcare, firefighting, and Industry 4.0. The exoskeleton market is projected to grow to several times its current value within the next two years. Therefore, it is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction towards exoskeletons, for which the availability of big data of conversations about exoskeletons is necessary. The Internet of Everything style of today's living, characterized by people spending more time on the internet than ever before with a specific focus on social media platforms, holds the potential for the development of such a dataset through the mining of relevant social media conversations. Twitter, one such social media platform, is highly popular amongst all age groups, and the topics found in its conversation paradigms include emerging technologies such as exoskeletons. To address this research challenge, this work makes two scientific contributions to the field. First, it presents an open-access dataset of about 140,000 tweets about exoskeletons that were posted in a 5-year period from May 21, 2017, to May 21, 2022. Second, based on a comprehensive review of recent works in the fields of Big Data, Natural Language Processing, Information Retrieval, Data Mining, Pattern Recognition, and Artificial Intelligence that may be applied to relevant Twitter data for advancing research, innovation, and discovery in the field of exoskeleton research, a total of 100 Research Questions are presented for researchers to study, analyze, evaluate, ideate, and investigate based on this dataset.
ARTICLE | doi:10.20944/preprints201812.0016.v1
Subject: Social Sciences, Library And Information Sciences Keywords: corpus linguistics; language modeling; big data; language data; databases; monitor corpora; documentary analysis; nuclear power; government regulation; tobacco documents
Online: 3 December 2018 (09:16:14 CET)
With the influence of Big Data culture on qualitative data collection, acquisition, and processing, it is becoming increasingly important that social scientists understand the complexity underlying data collection and the resulting models and analyses. Systematic approaches for creating computationally tractable models need to be employed in order to create representative, specialized reference corpora subsampled from Big Language Data sources. Even more importantly, any such method must be tested and vetted for its reproducibility and consistency in generating a representative model of a particular population in question. This article considers and tests one such method for Big Language Data downsampling of digitally-accessible language data, both to determine how to operationalize this form of corpus model creation and to test whether the method is reproducible. Using the U.S. Nuclear Regulatory Commission's public documentation database as a test source, the sampling method's procedure was evaluated to assess variation in the rate at which documents were deemed fit for inclusion in or exclusion from the corpus across four iterations. The findings of this study indicate that such a principled sampling method is viable, motivating an approach for creating language-based models that accounts for extralinguistic factors and linguistic characteristics of documents.
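One way to make such downsampling reproducible across iterations is to seed the sampler so that repeated draws can be compared for consistency. This sketch is a generic illustration of that idea, not the article's documented procedure; the document index and sample size are invented.

```python
# Reproducible corpus downsampling: a seeded RNG makes the subsample
# deterministic, so iterations of the method can be compared directly.
import random

documents = [f"doc-{i:04d}" for i in range(10_000)]   # placeholder doc ids

def downsample(docs, size, seed):
    rng = random.Random(seed)      # fixed seed -> identical draw every run
    return sorted(rng.sample(docs, size))

a = downsample(documents, 500, seed=42)
b = downsample(documents, 500, seed=42)
print(a == b)    # True: same seed yields the same subsample
```

Inclusion/exclusion criteria (the substantive part of the article's method) would then be applied to each sampled document, and the inclusion rate compared across iterations.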
ARTICLE | doi:10.20944/preprints202008.0053.v1
Subject: Physical Sciences, Atomic And Molecular Physics Keywords: Google Trend; Particulate Matter; National Ambient Air Quality Monitoring Information System; Chronic obstructive pulmonary disease; Big Data
Online: 2 August 2020 (18:29:51 CEST)
Depending on the characteristics of an industrial area, air pollution can directly contribute to cancer, which motivates toxicity evaluation for the human body, risk assessment, and health impact assessment. Environmental data were collected from August 2018 to January 31, 2019, and the average, minimum, and maximum values of the air pollution data were computed. According to global trend data obtained using big data, high blood pressure was confirmed in 33rd place in the world, and myocardial infarction among the environmental diseases was confirmed to be lower than in Korea. Diseases that occurred in the Jeolla Province industrial complex, considering the characteristics of the country, were identified as representative. Air pollutants are considered to be among the causes of allergic diseases in Korea. PM10 was found to be higher than in the control area (28.8804348 ㎍/㎥, 31.7065217 ㎍/㎥, and 32.8532609 ㎍/㎥, respectively). The mean concentrations of PM2.5 in the middle- and high-exposure areas were lower than those of the control areas, but the highest was in the intermediate-exposure area: 16.5978261 ㎍/㎥, 16.1086957 ㎍/㎥, and 17.1847826 ㎍/㎥, respectively. The major variables of environmental exposure in Yeosu were confirmed to be correlated with high blood pressure, chronic obstructive pulmonary disease (COPD), bronchitis, cerebrovascular disease, diabetes, thyroid disease, sinus infection, anemia, and pneumonia.
REVIEW | doi:10.20944/preprints202311.1257.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: Climate Change; Net Zero Emissions; Dairy Farming; Big Data; Artificial Intelligence (AI); Greenhouse Gas Emissions; Sustainable Agriculture; Technological Innovation; Policy Framework; Environmental Sustainability
Online: 20 November 2023 (16:16:51 CET)
This paper provides an in-depth exploration of the role of Big Data and Artificial Intelligence (AI) in advancing dairy farming towards net zero emissions, a critical goal in the face of the global climate crisis. The study emphasizes how these technologies significantly enhance the management of greenhouse gas (GHG) emissions and optimize resource use, thereby contributing to environmental sustainability in agriculture. A key aspect of this transition is the alignment with international climate commitments, such as the Paris Agreement, which are instrumental in steering global efforts toward emission reduction and mitigating climate change. The integration of Big Data and AI in dairy farming emerges as a powerful tool to reduce the sector's environmental impact while sustaining economic growth. The paper delves into the specific applications of these technologies in emission management, including predictive analytics for feed optimization, manure management, and energy efficiency enhancements. It also addresses the broader implications of technological integration in dairy farming, considering aspects like benchmarking standards, data privacy, and the role of policy in fostering sustainable practices. The study underscores the challenges inherent in adopting these advanced technologies, including the need for improved farmer training, data quality, and compatibility with existing systems. It also advocates for enhanced policy frameworks that support sustainable practices, encourage technological adoption, and balance economic viability with environmental responsibility. This comprehensive analysis offers valuable insights into harnessing digital technologies for climate change mitigation and delineates a path for the dairy industry towards achieving net zero emissions, thereby contributing significantly to global environmental sustainability efforts.
ARTICLE | doi:10.20944/preprints202207.0121.v5
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: The Cyclic Universe; Big Bang and Big Crunch; Cosmology; Gravitational force; Dark Energy; Dark Matter
Online: 6 March 2023 (16:14:30 CET)
The cyclic universe theory is a model of cosmic evolution according to which the universe undergoes endless cycles of expansion and cooling, each beginning with a “big bang” and ending in a “big crunch”. In this paper we propose a unique property of space-time: this particular and marvelous nature of space shows that space can stretch, expand, and shrink. This property of space causes the size of the universe to change over time, growing or shrinking. The observed accelerated expansion, which in the new theory corresponds to the stretching of shrunk space, is derived. The theory is based on three underlying notions. First, the big bang is not the beginning of space or time; rather, in the very first fraction of a second there was an infinite pressure of infinitely shrunk space in the cosmic singularity. That pressure gave rise to the big bang and caused the rapid growth of space; all other forms of energy were transformed into new matter and radiation, and a new period of expansion and cooling began. Second, there was a previous phase leading up to it, with multiple cycles of contraction and expansion that repeat indefinitely. Third, the two principal long-range forces are the gravitational force and the pressure of shrunk space. They are the two most fundamental quantities in the universe that govern cosmic evolution, and they may provide the clockwork mechanism that operates our eternal cyclic universe. The universe will not continue to expand forever; there is no need, however, for dark energy and dark matter. This new model of space-time and its unique properties enables us to describe a sequence of events from the Big Bang to the Big Crunch.
ARTICLE | doi:10.20944/preprints202302.0066.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Smart Tourism; Sustainable Tourism; Natural language Processing (NLP); Big Data Analytics; Deep Learning; Machine Learning; Unsupervised Learning; Bidirectional Encoder Representations from Transformers (BERT); Literature Review; Smart Societies
Online: 3 February 2023 (09:47:55 CET)
Global natural and man-made events are exposing the fragility of the tourism industry and its impact on the global economy. Prior to the COVID-19 pandemic, tourism contributed 10.3% to the global GDP and employed 333 million people but saw a significant decline due to the pandemic. Sustainable and smart tourism requires collaboration from all stakeholders and a comprehensive understanding of global and local issues to drive responsible and innovative growth in the sector. This paper presents an approach for leveraging big data and deep learning to discover holistic, multi-perspective (e.g., local, cultural, national, and international) and objective information on a subject. Specifically, we develop a machine learning pipeline to extract parameters from academic literature and public opinions on Twitter, providing a unique and comprehensive view of the industry from both academic and public perspectives. The academic-view dataset was created from the Scopus database and contains 156,759 research articles from 2000 to 2022, which were modelled to identify 33 distinct parameters in 4 categories: Tourism Types, Planning, Challenges, and Media & Technologies. A Twitter dataset of 485,813 tweets was collected over the 18 months from March 2021 to August 2022 to showcase public perception of tourism in Saudi Arabia, which was modelled to reveal 13 parameters categorized into two broader sets: Tourist Attractions and Tourism Services. Discovering system parameters is required to embed autonomous capabilities in systems and for decision-making and problem-solving during system design and operations. The proposed approach improves AI-based information discovery by extending the use of scientific literature, Twitter, and other sources for autonomous, dynamic optimizations of systems, promoting novel research in the tourism sector and contributing to the development of smart and sustainable societies.
The paper also presents a comprehensive knowledge structure and literature review of the tourism sector based on over 250 research articles.
ARTICLE | doi:10.20944/preprints202105.0226.v2
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: energy efficiency; electric drive; electric motor control; frequency converter; Industrial Internet of Things; edge computing; Big Data; Key Performance Indicators; KPI; dashboard
Online: 8 September 2021 (13:15:18 CEST)
The article presents a method of generating Key Performance Indicators related to electric motor energy efficiency on the basis of big data gathered and processed in a frequency converter. The authors proved that, using the proposed solution, it is possible to specify the relation between the control mode of an electric drive and the control quality-to-energy consumption ratio in the start-up phase as well as in steady operation with various mechanical loads. The tests were carried out on a stand equipped with two electric motors (one driving, the other used to apply the load by adjusting the parameters of the built-in brake). The measurements were made in two load cases, for the motor control modes available in industrially applied frequency converters (scalar V/f, vector Voltage Flux Control without encoder, vector Voltage Flux Control with encoder, vector Current Flux Control, and vector Current Flux Control with torque control). During the experiments, the values of current intensities (active and output), the actual frequency value, the IxT utilization factor, the relative torque, and the current rotational speed were measured and processed. Based on these data, the energy efficiency level was determined for the various control modes.
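As a hedged illustration of one KPI of this kind, the ratio of delivered mechanical energy to consumed electrical energy over a sampled interval can be computed from converter telemetry. The formula, loss model, and all values below are simulated assumptions, not the article's definitions.

```python
# Toy energy-efficiency KPI from simulated drive telemetry: a start-up ramp
# followed by steady operation. Mechanical power = torque * angular speed.
import math

dt = 0.01                # sample period in seconds (assumed)
p_in, p_out = [], []
for i in range(500):
    speed_rpm = min(1500.0, 1500.0 * i / 200)     # ramp, then steady state
    torque_nm = 30.0 if i < 200 else 20.0         # higher torque at start-up
    omega = speed_rpm * 2 * math.pi / 60          # rad/s
    p_out.append(torque_nm * omega)               # mechanical output power, W
    # Assumed loss model: 85% conversion efficiency plus constant overhead.
    p_in.append(torque_nm * omega / 0.85 + 150.0)

e_in = sum(p_in) * dt / 3600      # electrical energy consumed, Wh
e_out = sum(p_out) * dt / 3600    # mechanical energy delivered, Wh
kpi = e_out / e_in                # dimensionless efficiency KPI
print(round(kpi, 3))
```

Computing such a KPI separately per control mode and load case is what lets the two be compared, which is the comparison the article reports.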
ARTICLE | doi:10.20944/preprints202012.0507.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: HIV; big data; Africa; epidemiology
Online: 21 December 2020 (11:14:08 CET)
Background. Predisposition to HIV infection is influenced by a wide range of correlated economic, environmental, demographic, social, and behavioral factors. While a handful of candidate factors have strong evidence, there is a lack of consensus among the vast array of variables measured in large surveys. Methods. We performed a comprehensive data-driven search for correlates of HIV positivity in >600,000 participants of the Demographic and Health Survey (DHS) across 29 sub-Saharan African countries from 2003 to 2017. We associated a total of 7,251 and 6,288 unique variables with HIV positivity in females and males, respectively, in each of the 50 surveys. We performed a meta-analysis within countries to attain 29 country-specific associations. Results. We identified 344 (5.4% of possible) and 373 (5.1%) associations with HIV positivity in males and females, respectively, with robust statistical support. The identified associations are consistent in directionality across countries and sexes. The association sizes among individual correlates and their predictive capability were low to modest, but comparable to established factors. Among the identified associations, being head of household among females was identified in 17 countries with a mean odds ratio (OR) of 2.5 (OR range: 1.1-3.5, R2 = 0.01). Other common associations were identified with marital status, education, age, and ownership of land or livestock. Conclusions. Our continent-wide search has identified under-recognized variables associated with HIV positivity that are consistent across the continent and sexes. Many of the association sizes are as high as those of established risk factors for HIV, including male circumcision.
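The reported association sizes are odds ratios; as a worked example, the OR and a Wald 95% confidence interval for a binary correlate versus HIV status can be computed from a 2×2 table. The counts below are invented for illustration, not DHS data.

```python
# Odds ratio with Wald 95% CI from a 2x2 contingency table (toy counts).
import math

#                 HIV+  HIV-
a, b = 180, 820   # exposed (e.g. female head of household)
c, d = 80, 920    # unexposed

odds_ratio = (a * d) / (b * c)
# Standard error of log(OR) for the Wald interval.
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(round(odds_ratio, 2), (round(lo, 2), round(hi, 2)))
```

A survey-weighted logistic regression would be used in practice (DHS data carry sampling weights), but the OR retains the same interpretation.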
ARTICLE | doi:10.20944/preprints201706.0115.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: big data; Hadoop; visualization; model
Online: 26 June 2017 (06:07:51 CEST)
In an era of ever-expanding data and knowledge, we lack a centralized system that maps faculty members to their research works. This problem has not been addressed in the past, and it is challenging for students to connect with the right faculty in their domain. Since there are so many colleges and faculty members, this falls into the category of big data problems. In this paper, we present a model that works in a distributed computing environment to tackle big data. The proposed model uses Apache Spark as the execution engine and Apache Hive as the database. The results are visualized with the help of Tableau, which is connected to Apache Hive to achieve distributed computing.
TECHNICAL NOTE | doi:10.20944/preprints202206.0252.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: SAR; InSAR; Sentinel-1; Big data
Online: 17 June 2022 (09:00:40 CEST)
We describe an efficient and cost-effective data access mechanism for Sentinel-1 TOPS mode bursts. Our data access mechanism enables burst-based data access and processing, thereby eliminating ESA's Sentinel-1 SLC data packaging conventions as a bottleneck to large-scale processing. Pipeline throughput is now determined by available compute resources and the efficiency of the analysis algorithms. For targeted infrastructure monitoring studies, we are able to generate coregistered, geocoded stacks of SLCs for any AOI in the world in a few minutes. In addition, we describe our global-scale radar backscatter and interferometric products and associated pipeline design decisions that ensure geolocation consistency across the suite of derived products from Sentinel-1 data. Finally, we discuss the benefits and limitations of working with geocoded SAR SLC data.
BRIEF REPORT | doi:10.20944/preprints202007.0198.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: gravitation; dark matter; redshift; big bang
Online: 9 July 2020 (17:25:47 CEST)
A close inspection of Zwicky's seminal papers on the dynamics of galaxy clusters reveals that the discrepancy discovered between the dynamical mass and the luminous mass of clusters was widely overestimated in 1933 as a consequence of several factors, among which the excessive value of the Hubble constant $H_0$, then believed to be about seven times higher than today's average estimate. Taking account, in addition, of our present knowledge of classical dark matter inside galaxies, the contradiction can be reduced by a large factor. To explain the rather small remaining discrepancy of the order of 5, instead of appealing to a hypothetical exotic dark matter, the possibility of an inhomogeneous gravity is suggested. This is consistent with the ``cosmic tapestry" found in the eighties by De Lapparent and her co-authors, showing that the cosmos is highly inhomogeneous at large scale. A possible foundation for inhomogeneous gravitation is the universally discredited ancient theory of Fatio de Duillier and Lesage on pushing gravity, possibly revised to avoid the main criticisms which led to its oblivion. This model incidentally opens the window towards a completely non-standard representation of the cosmos, and more basically calls for fundamental investigation to find the origin of the large-scale inhomogeneity in the distribution of luminous matter.
ARTICLE | doi:10.20944/preprints201904.0281.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Cluster computing, Big Data, Spark, Hadoop.
Online: 25 April 2019 (11:22:27 CEST)
The article provides detailed information about the new cluster computing technologies Hadoop and Apache Spark. An experimental task, running logistic regression with these technologies, is considered. Findings comparing the cluster computing performance of Hadoop and Apache Spark are presented and substantiated.
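The experimental task is logistic regression; the sketch below is a single-node NumPy version of the computation that Hadoop and Spark distribute (synthetic data; the cluster-side implementation, e.g. Spark MLlib, is not shown and its details are assumptions).

```python
# Minimal batch-gradient-descent logistic regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
# Sample labels from the true Bernoulli model.
y = (1 / (1 + np.exp(-X @ true_w)) > rng.random(n)).astype(float)

w = np.zeros(d)
lr = 0.1
for _ in range(300):
    p = 1 / (1 + np.exp(-X @ w))       # predicted probabilities
    w -= lr * X.T @ (p - y) / n        # gradient of the mean log-loss

accuracy = float(np.mean(((1 / (1 + np.exp(-X @ w))) > 0.5) == (y == 1)))
print(round(accuracy, 2))
```

In a distributed setting the gradient sum `X.T @ (p - y)` is what gets computed per partition and aggregated, which is why Spark's in-memory caching of `X` across iterations outperforms Hadoop MapReduce's per-iteration disk I/O for this workload.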
Subject: Physical Sciences, Acoustics Keywords: Big Bounce Model, Closed Universe, Cosmological Curvature, Big Crunch, Cyclic Universe, Heat Engine Model for Universe
Online: 16 February 2021 (13:41:49 CET)
Assuming a geometrically closed universe, we predict a value for the cosmic curvature, , a value within current observational bounds. We also propose a thermodynamic heat engine model for the universe, which bypasses the need for an inflaton field. Our model is based on a Carnot Cycle where we have isothermal expansion, followed by adiabatic expansion, followed by isothermal contraction, followed by adiabatic contraction, bringing us back to our original starting point. For the working substance, we focus specifically on the CMB radiation filling the collective voids in the universe. Using this construct, we identify cosmic inflation as the isothermal expansion phase, which lasts just under, . The collective CMB volume we see today only increases by a factor of 5.65 times during this process, and homogeneity and perturbations in the CMB are explained. The singularity problem is avoided and we have a clear mechanism for the work done by the cosmos in causing expansion, and later contraction. For scaling laws with respect to the density parameters in Friedmann’s equations, we will assume a susceptibility model for space, where, , the smeared cosmic susceptibility, decreases with increasing cosmic scale parameter, . Within this framework, we can predict a maximum Hubble volume with minimum CMB temperature for the voids before contraction begins, as well as a minimum volume with maximum CMB temperature when expansion starts. The thermodynamic heat cycle deviates from efficiency in converting heat energy into mechanical energy (expansion) by a minuscule amount, namely, . The significance of this number is not known.
ARTICLE | doi:10.20944/preprints202107.0024.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: high-density lipoprotein cholesterol; hypertension; blood pressure; low high-density lipoprotein cholesterol; extremely high high-density lipoprotein cholesterol; body mass index; big data
Online: 1 July 2021 (11:53:04 CEST)
Background. Although high-density lipoprotein has cardioprotective effects, the association between serum high-density lipoprotein cholesterol (HDL-C) and hypertension is poorly understood. Objective. We investigated whether low and high concentrations of HDL-C are associated with hypertension using a large healthcare dataset. Methods. In a community-based cross-sectional study of 1,493,152 Japanese people aged 40–74 years who underwent a health checkup, blood pressures and clinical parameters, including nine HDL-C concentration categories (20–110 mg/dL or over), were investigated. Results. A crude U-shaped relationship was observed between the nine HDL-C categories and blood pressure in males (n = 830,669), while a left-to-right inverted J-shaped relationship was observed in females (n = 662,483). An age-adjusted logistic regression analysis showed J-shaped relationships (left-to-right inverted in females) between HDL-C and odds ratios for hypertension (≥140/90 mmHg), with lower limits of 60–79 mg/dL in males and 90–99 mg/dL in females, which were unchanged after adjusting for smoking, habitual exercise, alcohol consumption, and pharmacotherapy for hypertension, dyslipidemia, and diabetes. However, further adjustment for body mass index and serum triglyceride concentration revealed latent positive linear associations between HDL-C and hypertension, although the association between extremely high HDL-C (≥100 mg/dL) and hypertension was attenuated in non-drinkers. Conclusion. Both low and extremely high HDL-C concentrations are associated with hypertension. The former association may be dependent on excess fat mass, which is often concomitant with low HDL-C, whereas the latter may be dependent on frequent alcohol consumption.
ARTICLE | doi:10.20944/preprints202201.0106.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: Cosmology, Cosmogenesis, Relativity, Spacetime, Hypergeometrical Universe Theory, Dark Matter, Dark Energy, L-CDM, Big Bang, Big Pop
Online: 10 January 2022 (12:14:01 CET)
The Hypergeometrical Universe Theory (HU)[1-8], proposed in 2006, holds that the Universe is a Lightspeed Expanding Hyperspherical Hypersurface and that Gravitation is an absolute-velocity-dependent, epoch-dependent force. Here we introduce Big Pop Cosmogenesis and present our calculations for the Equation of State of the Universe. This article is the first in a series[9-22] supporting the paradigm shift.
ARTICLE | doi:10.20944/preprints201810.0560.v1
Subject: Arts And Humanities, Philosophy Keywords: natural philosophy; cosmology; emptiness; vacuum; void; dark energy; space flight; exoplanet; big freeze; big crunch; everyday lifeworld
Online: 24 October 2018 (09:27:57 CEST)
The cosmological relevance of emptiness—that is, space without bodies—is not yet sufficiently appreciated in natural philosophy. This paper addresses two aspects of cosmic emptiness from the perspective of natural philosophy: the distances to the stars in the closer cosmic environment and the expansion of space as a result of the accelerated expansion of the universe. Both aspects will be discussed from both a historical and a systematic perspective. Emptiness can be interpreted as “coming” in a two-fold sense: Whereas in the past knowledge of emptiness as it were came to human beings, in the future it is coming insofar as its relevance in the cosmos will increase. The longer and more closely emptiness was studied since the beginning of modernity, the larger became the spaces over which it was found to extend. From a systematic perspective, I will show with regard to the closer cosmic environment that the earth may be separated from the perhaps habitable planets of other stars by an emptiness that is inimical to life and cannot be traversed by humans. This assumption is a result of the discussion of the constraints and possibilities of interstellar space travel as defined by the known natural laws and technical means. With the accelerated expansion of the universe, the distances to other galaxies (outside of the so-called local group) are increasing. According to the current standard model of cosmology and assuming that the acceleration will remain constant, in the distant future this expansion will lead first to a substantial change in the epistemic conditions of cosmological knowledge and finally to the completion of the cosmic emptiness and of its relevance, respectively. Imagining the postulated completely empty last state leads human thought to the very limits of what is conceivable.
ARTICLE | doi:10.20944/preprints202204.0295.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Forecasting; SARIMA; Holt-Winters; Climate; Big Data
Online: 29 April 2022 (08:44:28 CEST)
As Indonesia’s capital, Jakarta plays a critical role in boosting the country’s economic growth and setting the precedent for broader change beyond the city. One crucial avenue of inquiry for understanding, and preparing for, the future of a country so heavily impacted by disastrous weather events is studying the effects of climate change through data. This study investigates meteorological data collected from 1996 to 2021 and compares the SARIMA and Holt-Winters methods for predicting the future influence of climatic parameters on Jakarta’s weather. The SARIMA method provides better results than the Holt-Winters models, and both methods performed best when forecasting the humidity data. The forecasts capture the characteristics of Jakarta’s climate, with a dry season from May to October and a wet season from November to April.
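A minimal additive Holt-Winters smoother, one of the two methods compared here, can be sketched in pure Python. The smoothing parameters and the toy seasonal series in the test are arbitrary stand-ins, not the paper's Jakarta data or tuning:

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: level + trend + seasonal component of period m.

    Requires at least two full seasons of data for initialization.
    """
    # Initialize level from the first season, trend from the season-over-season
    # change, and seasonal offsets from deviations within the first season.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    season = [series[i] - level for i in range(m)]

    for t, x in enumerate(series):
        last_level = level
        s = season[t % m]
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (x - level) + (1 - gamma) * s

    # h-step-ahead forecast: extrapolate the trend, reuse the seasonal offsets.
    return [level + (h + 1) * trend + season[(len(series) + h) % m]
            for h in range(horizon)]
```

On a perfectly periodic series the forecast reproduces the seasonal pattern exactly, which mirrors the abstract's observation that both methods do best on the strongly seasonal humidity data. SARIMA, the better-performing method, would instead be fit with a library such as statsmodels.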
REVIEW | doi:10.20944/preprints202106.0597.v1
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: spacetime; relativistic cosmology; big bang model; inflation
Online: 24 June 2021 (09:33:15 CEST)
In this review article we trace the development of relativistic cosmology and the introduction of inflation into it. We study the properties of the standard cosmological model developed in the framework of relativistic cosmology and the geometric structure of spacetime coherently connected with it, examining the geometric properties of space and spacetime ingrained in the standard model of cosmology. The big bang model of the beginning of the universe is based on the standard model, which fails to explain the flatness and the large-scale homogeneity of the universe demonstrated by observational evidence. These cosmological problems were resolved by introducing a brief phase of accelerated expansion in the very early universe known as inflation. By setting the initial conditions of the standard big bang model, cosmic inflation resolves these problems of the theory. We discuss how the inflationary paradigm solves them by proposing a period of fast expansion in the early universe.
Subject: Social Sciences, Geography, Planning And Development Keywords: 'Big Things'; Starchitecture; Agritecture; Parkitecture; Urban Prairies
Online: 5 April 2021 (16:02:43 CEST)
This article analyses three recent shifts in what has been called the geography of ‘Big Things’, meaning the contemporary functions and adaptability of modern city-centre architecture. We periodise the three styles conventionally into the fashionable ‘Starchitecture’ of the 1990s, the repurposed ‘Agritecture’ of the 2000s and the parodising ‘Parkitecture’ of the 2010s. Starchitecture was the form of new architecture that coincided with the rise of neo-liberalism in its brief era of global urban competitiveness prevalent in the 1990s. After the Great Financial Crash of 2007-8, the market for iconic, thrusting high-rise skyscrapers and giant downtown and suburban shopping malls waned, and online shopping and working from home destroyed the main rental values of the CBD. In some illustrious cases, ‘Agritecture’ saw office blocks and other CBD accompaniments re-purposed as settings for high-rise urban farming, especially aquaponics and hydroponic horticulture. Now, Covid-19 has further undermined traditional CBD property markets, causing some administrations to decide to bulldoze their ‘deadmalls’ and replace them with urban prairie landscapes, inviting the designation ‘Parkitecture’ for the bucolic results. The paper presents an account of these transitions with reference to questions raised by urban cultural scholars such as Jane M. Jacobs and Jean Gottmann, seeking answers in time and space to the questions their work poses.
ARTICLE | doi:10.20944/preprints202011.0010.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Big Data; Clustering; Distributed system; Machine learning
Online: 2 November 2020 (10:00:29 CET)
In the field of machine learning, cluster analysis has long been an important technique for discovering useful or implicit characteristics in data. However, the current mainstream cluster-analysis algorithms require comprehensive analysis of the overall data to obtain the best parameters, which makes handling large-scale datasets difficult. This research proposes a distributed related clustering mechanism for unsupervised learning, based on the assumption that if adjacent data points are similar, a group can be formed by relating them to further data points. When processing data, a large-scale dataset can therefore be distributed to multiple computers, and the correlation between any two data partitions can be calculated simultaneously on each computer. The results are then aggregated and filtered before being assembled into groups. This method greatly reduces the pre-processing and execution time for the dataset; in practical application, one only needs to focus on how the relevance of the data is designed. In addition, the experimental results demonstrate the accuracy, applicability, and ease of use of this method.
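The distribute-link-aggregate idea can be sketched as follows. The 1-D points, the similarity threshold, and the round-robin partitioning of pairwise checks are illustrative assumptions, not the paper's actual mechanism; the aggregation step uses union-find to assemble linked points into groups:

```python
from itertools import combinations

def find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster(points, eps, n_workers=3):
    """Link similar (adjacent) points in parallel-style chunks, then merge."""
    pairs = list(combinations(range(len(points)), 2))
    links = []
    # Distribute the pairwise similarity checks across workers (round-robin);
    # in a real deployment each chunk would run on a separate machine.
    for w in range(n_workers):
        chunk = pairs[w::n_workers]
        links += [(i, j) for i, j in chunk
                  if abs(points[i] - points[j]) <= eps]  # "adjacent data are similar"
    # Aggregate: union-find merges the links into connected groups.
    parent = list(range(len(points)))
    for i, j in links:
        parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

Because each worker only needs its own chunk of comparisons, no global parameter search over the whole dataset is required before grouping begins.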
Subject: Engineering, Energy And Fuel Technology Keywords: Deep learning; Big data; Machine learning; Biofuels
Online: 30 September 2020 (11:19:52 CEST)
The importance of energy systems and their role in economics and politics is evident to everyone. This issue matters not only for the advanced industrialized countries, which are major energy consumers, but also for oil-rich countries. Beyond the polluting substances these fuels contain, the prospect of their depletion has aggravated the growing concern. Biofuels can be used for energy production in different fields, such as electricity generation, power production, and transportation. Various scenarios have been written about the contribution of biofuels from different sources to the future energy system, and the availability of biofuels for the electricity, heating, and liquid-fuel markets is very important. Accordingly, the handling, modelling, decision-making, and forecasting of biofuels is one of the main challenges for scientists. Recently, machine learning (ML) and deep learning (DL) techniques have become popular for modeling, optimizing, and handling biodiesel production, consumption, and environmental impacts. The main aim of this study is to evaluate the ML and DL techniques developed for handling biofuel production, consumption, and environmental impacts, for both modeling and optimization purposes. This will help ensure sustainable biofuel production for future generations.
ARTICLE | doi:10.20944/preprints201806.0175.v2
Subject: Physical Sciences, Particle And Field Physics Keywords: cosmology; big bang; dark energy; neutrinos; gravitation
Online: 28 October 2019 (06:52:16 CET)
The ΛCDM model successfully models the expansion of matter in the universe with an expansion of the underlying metric. However, it does not address the physical origin of the big bang and dark energy. A model of cosmology is proposed, where the state of high energy density of the big bang is created by the collapse of an antineutrino star that has exceeded its Chandrasekhar limit. To allow the first neutrino stars and antineutrino stars to form naturally from an initial quantum vacuum state, it helps to assume that antimatter has negative gravitational mass. While it may prove incorrect, this assumption may also help identify dark energy. The degenerate remnant of an antineutrino star can today have an average mass density that is similar to the dark energy density of the ΛCDM model. When in hydrostatic equilibrium, this antineutrino star remnant can emit isothermal cosmic microwave background radiation and accelerate matter radially. This model and the ΛCDM model are in similar quantitative agreement with supernova distance measurements. Other observational tests of the above model are also discussed.
ARTICLE | doi:10.20944/preprints201901.0277.v1
Subject: Public Health And Healthcare, Nursing Keywords: personality; burnout; engagement; Big Five; healthcare personnel
Online: 28 January 2019 (12:00:59 CET)
The burnout syndrome, which affects so many healthcare workers, has recently attracted wide interest due to the severe repercussions related to its appearance. Even though job factors are determinant to its development, not all individuals exposed to the same work conditions show burnout, which demonstrates the importance of individual variables such as personality. The purpose of this study was to determine the personality characteristics of a sample of nursing professionals based on the Big Five model and then, having determined the personality profiles, analyze the differences in burnout and engagement based on those profiles. The sample was made up of 1236 nurses. An ad hoc questionnaire was prepared to collect the sociodemographic data, and the Brief Burnout Questionnaire, the Utrecht Work Engagement Scale and the Big Five Inventory-10 were used. The results showed that burnout in this group of workers is associated negatively with extraversion, agreeableness, conscientiousness and openness to experience, and positively with the neuroticism personality trait. These personality factors showed the opposite pattern with regard to engagement. Three different personality profiles were also found among the nursing personnel; professionals whose profile was marked by strong neuroticism and low scores on the rest of the personality traits were those most affected by burnout.
ARTICLE | doi:10.20944/preprints202101.0017.v3
Subject: Physical Sciences, Astronomy And Astrophysics Keywords: Oscillating universe; big bang; big bounce; Hubble constant; dark energy; dark matter; inflation; vacuum energy density; Casimir effect
Online: 15 November 2023 (09:05:38 CET)
In cosmology, dark energy and dark matter are included in the ΛCDM model, but they remain completely unknown. Because Lorentz invariance seems not to be applicable to the curved spacetime of black holes, we introduce a model in which the speed of light is reduced in black holes due to quantum-gravity effects and the Heisenberg uncertainty relation. Black holes are then a source for a scalar field with dark-energy characteristics. This model has no information paradox for black holes, because particles and radiation entering the black hole are redshifted so far that their wavelength reaches the size of the Schwarzschild radius, leaving them in some sense "frozen" in the black hole. We show that the scalar field also has characteristics of dark matter shortly after the Planck time when we use a Big Bounce model. This model also presents an alternative to cosmological inflation, with the potential to solve the flatness and horizon problems and the problem of density fluctuations.
ARTICLE | doi:10.20944/preprints202311.0736.v1
Subject: Business, Economics And Management, Business And Management Keywords: benefit; risk; IT outsourcing; SEM; big Polish organizations
Online: 13 November 2023 (14:21:18 CET)
(1) The main purpose of this article is to analyse the relationship between risk factors and the benefits of Information Technology Outsourcing (ITO). For this purpose, a structural model was created to determine how the benefits depend on the importance of the risk factors associated with this service. (2) The research methodology included a literature review, descriptive analysis, and empirical and formal methods. A survey questionnaire was used to collect data, and structural equation modeling was used to build and verify hypotheses. The structural model was tested on data from business practice: 200 large organizations, mostly enterprises, operating in Poland and using IT outsourcing. (3) The purpose was accomplished by determining the relationship between the IT outsourcing benefits and the risk factors' importance. The strongest relationship exists between supplier-related risk factors and technological benefits; slightly weaker relationships exist between economic risk and economic benefits and between customer-related risk and organizational benefits. The weakest relationship exists between security risks and strategic benefits. (4) The article's originality and value consist in establishing the existence of a relationship between risk factors and the benefits of IT outsourcing, and in identifying which groups of risks have the strongest impact on the benefits of ITO, which can be exploited by service suppliers.
ARTICLE | doi:10.20944/preprints202309.0053.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: artificial intelligence; social media; big data; financial modeling
Online: 1 September 2023 (11:42:15 CEST)
Annually, approximately 500,000 Merger and Acquisition (M&A) transactions are disclosed globally, each inciting substantial perturbations to the associated companies' equity prices. The probability of an M&A transaction's closure, as perceived by the public, inherently influences the stock price of the target company leading up to the proposed date of the deal. Given the recent advancements in Natural Language Processing (NLP), we propose an empirical investigation into the correlation between digital dialogue surrounding M&A transactions and consequent movements in the stock prices of the involved companies. Utilizing transformer-based encoder-only architectures, we fine-tune a stance detection model on an extensive dataset, amassed from digital communication platforms, featuring public discourse related to five historical M&A transactions, achieving 70% accuracy on deal-completion stance detection with the RoBERTa-base model. We subsequently employ the aggregated public sentiment towards the completion or termination of a proposed M&A transaction to model stock price movement. Utilizing a multitude of time-series-based approaches, we achieve a mean absolute error of 2.29 USD for next-day price prediction and 3.40 USD for next-week price prediction. Ultimately, we find an existing but tenuous relationship between online discourse and the price trajectory of target companies, highlighting the complex social and economic phenomena behind M&A deals.
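The next-day error metric can be illustrated with a deliberately simple baseline: today's price nudged by an aggregated stance score. The prices, stance values, and sensitivity constant below are invented, and this naive predictor stands in for (and is far simpler than) the paper's actual time-series models:

```python
def mae(pred, actual):
    """Mean absolute error between predicted and realized prices."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

# Hypothetical daily closes of a takeover target and a daily aggregated
# deal-completion stance score in [-1, 1] (positive = deal likely to close).
prices    = [50.0, 51.0, 49.5, 52.0, 53.0, 52.5]
sentiment = [0.2,  0.4, -0.1,  0.5,  0.3]
k = 1.0  # assumed sensitivity, in price units per unit of stance

# Next-day forecast: carry today's price forward, adjusted by stance.
preds = [p + k * s for p, s in zip(prices[:-1], sentiment)]
error = mae(preds, prices[1:])
```

Reporting MAE in USD, as the abstract does, keeps the error in the same units as the price series being predicted.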
ARTICLE | doi:10.20944/preprints202211.0254.v2
Subject: Physical Sciences, Mathematical Physics Keywords: singularity; infinite; Big Bang; universe evolution; scientific theory.
Online: 30 December 2022 (09:54:37 CET)
It is advisable to avoid and, even better, demystify grandiose terms such as "infinity" or "singularity" in descriptions of the cosmos. Their proliferation does not contribute positively to the understanding of key concepts that are essential for an updated account of its origin and evolutionary history. It will be argued here that, as a matter of fact, there are no infinities in physics, in the real world: all that appear, in any given formulation of nature by means of mathematical equations, actually arise from extrapolations made beyond the bounds of validity of the equations themselves. This crucial point is rather well known but too often forgotten, as discussed in this paper with several examples; namely, the famous Big Bang singularity and others, which appeared earlier in classical mechanics and electrodynamics, and notably in the quantization of field theories. A brief description of the Universe's history and evolution follows, with special emphasis on what is presently known from detailed observations of the cosmos and, complementarily, from advanced experiments in very-high-energy physics. To conclude, a future perspective on how this knowledge might soon improve is given.
ARTICLE | doi:10.20944/preprints202108.0471.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Big data; Health prevention; Machine learning; Medical data
Online: 24 August 2021 (14:00:12 CEST)
Cardiovascular diseases (CVDs) are a leading cause of death globally. In CVDs, the heart is unable to deliver enough blood to other body regions. Since effective and accurate diagnosis of CVDs is essential for CVD prevention and treatment, machine learning (ML) techniques can be used effectively and reliably to discern patients suffering from a CVD from those without any heart condition. Namely, machine learning algorithms (MLAs) play a key role in the diagnosis of CVDs through predictive models that allow us to identify the main risk factors influencing CVD development. In this study, we analyze the performance of ten MLAs on two datasets for CVD prediction and two for CVD diagnosis. Algorithm performance is analyzed on the top-two and top-four dataset attributes/features with respect to five performance metrics – accuracy, precision, recall, f1-score, and roc-auc – using the train-test split technique and k-fold cross-validation. Our study identifies the top two and four attributes from each CVD diagnosis/prediction dataset. As our main finding, the ten MLAs exhibited appropriate diagnostic and predictive performance; hence, they can be successfully implemented to improve current CVD diagnosis efforts and help patients around the world, especially in regions where medical staff is lacking.
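The k-fold evaluation protocol mentioned here can be sketched generically. The threshold "classifier" and the toy single-attribute data in the usage below are placeholders, not the ten MLAs or the CVD datasets of the study; the pattern is the same for any model and metric:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, nearly equal folds."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(X, y, k, fit, predict):
    """Mean held-out accuracy over k folds: fit on k-1 folds, score on the rest."""
    folds = k_fold_indices(len(X), k)
    accs = []
    for test_idx in folds:
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        accs.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(accs) / k
```

Usage with a hypothetical top-ranked attribute and a trivial mean-threshold rule:

```python
X = [1, 2, 3, 4, 9, 10, 11, 12]          # one selected attribute
y = [0, 0, 0, 0, 1, 1, 1, 1]             # CVD present / absent
fit = lambda Xs, ys: sum(Xs) / len(Xs)   # "model" = mean as decision threshold
predict = lambda thr, x: int(x > thr)
score = cross_val_accuracy(X, y, 4, fit, predict)
```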
REVIEW | doi:10.20944/preprints202001.0378.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: workflows; containers; cloud computing; Kubernetes; big data; reproducibility
Online: 31 January 2020 (05:15:01 CET)
Containers are gaining popularity in life science research as they provide a solution for encompassing the dependencies of provisioned tools, simplify software installations for end users and offer a form of isolation between processes. Scientific workflows are ideal for chaining containers into data analysis pipelines to aid in creating reproducible analyses. In this manuscript we review a number of approaches to using containers as implemented in the workflow tools Nextflow, Galaxy, Pachyderm, Argo, Kubeflow, Luigi and SciPipe, when deployed in cloud environments. A particular focus is placed on each workflow tool's interaction with the Kubernetes container orchestration framework.
ARTICLE | doi:10.20944/preprints201810.0711.v1
Subject: Biology And Life Sciences, Biophysics Keywords: order; entropy; chaos; evolution; cosmic mind; big bang
Online: 30 October 2018 (07:50:20 CET)
We discuss the role of the opposing principles of order and disorder in physical and biological systems in determining stability, growth and evolution, and bring forth the potential role of a cosmic ordering agency. We analyze its role in decreasing entropy by coarse-graining and hence in determining the initial low-entropy state of the big bang universe. Since all physical and biological systems have either alternating cycles of order and disorder or chaotic evolution governed by non-linear laws, the same is expected of the dynamics of the whole universe. The entropy of the initial state of the universe could be low because of the reduction of degrees of freedom (DoF) as one moves from physical encoding to neural encoding and then on to psychic encoding of information in a nested manner by coarse-graining. It is by such encoding that this cosmic agency enables the universe to pass through the big crunch phase and then rolls it out as the big bang universe from the initial state of low entropy.
ARTICLE | doi:10.20944/preprints202311.0412.v1
Subject: Engineering, Marine Engineering Keywords: big data; artificial intelligence; maritime surveillance; maritime security; sustainability
Online: 7 November 2023 (06:46:46 CET)
In today's world, in which more than 80% of world trade is carried by maritime routes, the safety and security of the seas where this trade takes place is of vast importance for nations and the international community. For this reason, ensuring the sustainable safety and security of the seas has become an integral part of the mission of all maritime-related entities. Transforming the big data extracted from the seas and maritime activities into meaningful information with artificial intelligence applications, and developing applications that can be used in maritime surveillance, will be of great importance for augmenting maritime safety and security. In this article, the data sources that can be used by a maritime surveillance system based on big data and artificial intelligence technologies, established around sensitive sea areas and critical coastal facilities, are identified, and a model proposal using this maritime big data is put forward.