Search | Preprints.org

Preprint REVIEW | doi:10.20944/preprints202304.0051.v2

Research Integrity and Publish or Perish: Definitions and Relations

Subject: Social Sciences, Library And Information Sciences Keywords: Research integrity; Publish or Perish; Misconduct in Science; Data fabrication; Data falsification; Plagiarism

Online: 2 April 2024 (12:46:37 CEST)

Preprint REVIEW | doi:10.20944/preprints202003.0141.v1

Sharing Is Caring – Data Sharing Initiatives in Healthcare

Tim Hulsen

Subject: Medicine And Pharmacology, Other Keywords: data sharing; data management; data science; big data; healthcare

Online: 8 March 2020 (16:46:20 CET)

Preprint ARTICLE | doi:10.20944/preprints202405.0508.v2

HCI and Data: Interacting in a New Era of Virtualization

Iván Durango, José A. Gallud, Victor M. R. Penichet

Subject: Computer Science And Mathematics, Information Systems Keywords: Human-Data Interaction; Human-Computer Interaction; Big Data; Data virtualization; Data Accessibility; Data Management; Data Privacy; Data Ethics; Data-Driven Decision-Making

Online: 1 July 2024 (08:12:01 CEST)

Rapid technological progress has ushered in a new era of human-computer interaction in which the distinction between the physical and virtual realms is increasingly blurred. This research paper explores the multifaceted intersection of Human-Data Interaction (HDI) and Data Virtualization (DV), examining how emerging technologies can enhance the exploration, comprehension, and use of complex, multidimensional datasets. Informed by prior research in this domain, the study examines the potential of DV techniques to improve HDI, focusing on three experimental investigations in education, healthcare, and retail. The findings reveal the benefits and potential challenges of implementing DV in these diverse contexts and offer guidance for the design and development of future HDI systems. Drawing on a diverse array of authoritative sources, the paper presents a holistic, forward-looking perspective on the future of HDI, underscoring the critical role that DV will play in shaping the next generation of human-computer interfaces and in enabling a deeper, more intuitive understanding of the digital world. Furthermore, the paper presents a preliminary framework for integrating HDI principles into standard design practices. The framework outlines key considerations and guidelines to help designers and developers incorporate HDI techniques into data-driven applications and interfaces: enhancing data accessibility and comprehension, giving users greater control over their data, and fostering transparent dialogue between data providers and end users.
By establishing this conceptual foundation, the paper aims to facilitate the integration of HDI principles into standard design practices, ultimately leading to more intuitive, user-centric, and ethically grounded approaches to data interaction and use.

Preprint ARTICLE | doi:10.20944/preprints202404.1018.v1

Discovering Data Domains and Products in Data Meshes Using Semantic Blueprints

Michalis Pingos, Andreas S. Andreou

Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data; Data Lakes; Data Meshes; Data Products; Data Blueprints; Metadata Semantic Enrichment

Online: 16 April 2024 (16:26:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202206.0320.v4

Ten Simple Rules for Using Public Biological Data for Your Research

Vishal Oza, Jordan Whitlock, Elizabeth Wilk, Angelina Uno-Antonison, Brandon Wilk, Manavalan Gajapathy, Timothy Howton, Austyn Trull, Lara Ianov, Elizabeth Worthey, Brittany Lasseigne

Subject: Biology And Life Sciences, Other Keywords: data; reproducibility; FAIR; data reuse; public data; big data; analysis

Online: 2 November 2022 (02:55:49 CET)

Preprint ARTICLE | doi:10.20944/preprints202003.0268.v1

TEEDA: An Interactive Platform for Matching Data Providers and Users in Data Marketplace

Teruaki Hayashi, Yukio Ohsawa

Subject: Social Sciences, Library And Information Sciences Keywords: matching; data marketplace; data platform; data visualization; call for data

Online: 17 March 2020 (04:10:28 CET)

Preprint ARTICLE | doi:10.20944/preprints202402.0602.v1

The Improvement of the Use of Open Data in Public Institutions

Besart Hyseni, Lejla Abazi Bexheti

Subject: Computer Science And Mathematics, Information Systems Keywords: Improving use of open data; data utilization; data optimization; enhancing data access; open data impact; open data government; data transparency; data-driven decision making

Online: 12 February 2024 (09:34:51 CET)

Preprint REVIEW | doi:10.20944/preprints202309.2113.v1

Navigating the Data Architecture Landscape: A Comparative Analysis of Data Warehouse, Data Lake, Data Lakehouse, and Data Mesh

Benjamin Wong

Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: Data, DWH, Data Warehouse, Architecture, Data Lake, Storage, Analysis, Data Mesh, Analytical, Architectural, Data Vault

Online: 3 October 2023 (03:28:55 CEST)

Preprint ARTICLE | doi:10.20944/preprints202403.0265.v1

Security and Ownership in User Defined Data Meshes

Michalis Pingos, Panayiotis Christodoulou, Andreas S. Andreou

Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data; Smart Data Processing; Systems of Deep Insight; Data Meshes; Data Lakes; Data Products; Blockchain; NFT; Data Blueprints

Online: 5 March 2024 (15:04:49 CET)

Preprint ARTICLE | doi:10.20944/preprints202406.1319.v1

Proposing Machine Learning Models Suitable for Predicting Open Data Utilization

Junyoung Jeong, Keuntae Cho

Subject: Business, Economics And Management, Business And Management Keywords: open data; open government data; open data utilization

Online: 19 June 2024 (07:36:26 CEST)

Preprint ARTICLE | doi:10.20944/preprints202405.1988.v1

Privacy Preserving Human Mobility Generation using Grid based Data and Graph Autoencoders

Fabian Netzler, Markus Lienkamp

Subject: Social Sciences, Transportation Keywords: Mobility Data; Synthetic Data Generation; Mobility Data Analytics

Online: 30 May 2024 (12:02:38 CEST)

Preprint ARTICLE | doi:10.20944/preprints202304.0130.v1

Data Cooperatives as Catalysts for Collaboration, Data Sharing, and the (Trans)Formation of the Digital Commons

Michael Max Bühler, Igor Calzada, Isabel Cane, Thorsten Jelinek, Astha Kapoor, Morshed Mannan, Sameer Mehta, Marina Micheli, Vijay Mookerje, Konrad Nübel, Alex Pentland, Trebor Scholz, Divya Siddarth, Julian Tait, Bapu Vaitla, Jianguo Zhu

Subject: Computer Science And Mathematics, Other Keywords: data; cooperatives; open data; data stewardship; data governance; digital commons; data sovereignty; open digital federation platform

Online: 7 April 2023 (14:14:02 CEST)

Network effects, economies of scale, and lock-in effects increasingly lead to a concentration of digital resources and capabilities, hindering the free and equitable development of digital entrepreneurship (SDG9), new skills, and jobs (SDG8), especially in small communities (SDG11) and their small and medium-sized enterprises (“SMEs”). To ensure the affordability and accessibility of technologies, promote digital entrepreneurship and community well-being (SDG3), and protect digital rights, we propose data cooperatives [1,2] as a vehicle for secure, trusted, and sovereign data exchange [3,4]. In post-pandemic times, community/SME-led cooperatives can play a vital role by ensuring that the supply chains supporting digital commons are uninterrupted, resilient, and decentralized [5]. Digital commons and data sovereignty give communities affordable and easy access to information and the ability to negotiate data-related decisions collectively. Moreover, cooperative commons (a) provide access to the infrastructure that underpins the modern economy, (b) preserve property rights, and (c) ensure that privatization and monopolization do not further erode self-determination, especially in a world increasingly mediated by AI. Thus, governance plays a significant role in accelerating the digital transformation of communities and SMEs and in addressing their challenges. Cooperatives thrive on digital governance and standards, such as open trusted Application Programming Interfaces (APIs), that increase the efficiency, technological capabilities, and capacities of participants and, most importantly, integrate, enable, and accelerate the digital transformation of SMEs in the overall process. This policy paper presents and discusses several transformative use cases for cooperative data governance.
The use cases demonstrate how platform and data cooperatives and their novel value creation can be leveraged to take digital commons and value chains to a new level of collaboration while addressing the most pressing community issues. The proposed framework for a digital, federated, and sovereign reference architecture will create a blueprint for sustainable development in both the Global South and the Global North.

Preprint COMMUNICATION | doi:10.20944/preprints202401.0780.v1

Data Reuse in Agricultural Genomics Research: Present Challenges and Future Solutions

Alenka Hafner, Victoria DeLeo, Cecilia Deng, Christine G. Elsik, Damarius Fleming, Peter W. Harrison, Theodore S. Kalbfleisch, Bruna Petry, Boas Pucker, Elsa H. Quezada-Rodríguez, Christopher K. Tuggle, James Koltes

Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: data reuse; agriculture; open data; metadata; data standards; equity

Online: 10 January 2024 (10:07:03 CET)

Preprint ARTICLE | doi:10.20944/preprints202311.0104.v1

Conceptual Design of a Generic Data Harmonization Process for OMOP CDM

Elisa Henke, Michele Zoch, Yuan Peng, Ines Reinecke, Martin Sedlmayr, Franziska Bathelt

Subject: Public Health And Healthcare, Other Keywords: OMOP; OHDSI; interoperability; data harmonization; clinical data; claims data

Online: 2 November 2023 (07:45:02 CET)

Preprint ARTICLE | doi:10.20944/preprints202308.1237.v1

A Method to Enable Automatic Extraction of Cost and Quantity Data from Hierarchical Construction Information Documents to Enable Rapid Digital Comparison and Analysis

Daniel Adanza Dopazo, Lamine Mahdjoubi, Bill Gething

Subject: Engineering, Transportation Science And Technology Keywords: data mining; data extraction; data science; cost infrastructure projects

Online: 17 August 2023 (09:25:22 CEST)

Preprint ARTICLE | doi:10.20944/preprints202306.1378.v1

Algorithm-based Data Generation (ADG) Engine for Data Analytics

Iman I. M. Abu Sulayman, Peter Voege, Abdelkader Ouda

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Data Generation; Anomaly Data; User Behavior Generation; Big Data

Online: 19 June 2023 (16:31:37 CEST)

Preprint REVIEW | doi:10.20944/preprints202007.0153.v1

A Hitchhiker’s Guide to Working with Large, Open-Source Neuroimaging Datasets

Corey Horien, Stephanie Noble, Abigail Greene, Kangjoo Lee, Daniel Barron, Siyuan Gao, Dave O'Connor, Mehraveh Salehi, Javid Dadashkarimi, Xilin Shen, Evelyn Lake, R. Todd Constable, Dustin Scheinost

Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Open-science; big data; fMRI; data sharing; data management

Online: 8 July 2020 (11:53:33 CEST)

Preprint ARTICLE | doi:10.20944/preprints202407.2088.v1

The Case of Clean Customer Master Data for Customer Analytics: A Neglected Element for Data Monetization

Jasmin Singh, Heiko Gebauer

Subject: Business, Economics And Management, Business And Management Keywords: Customer analytics; data cleanliness; data harmonization; data integration; data monetization; digitization; digitalization; digital transformation; customer master data

Online: 25 July 2024 (16:53:25 CEST)

Preprint ARTICLE | doi:10.20944/preprints201810.0273.v1

Russian-German Astroparticle Data Life Cycle Initiative

Igor Bychkov, Andrey Demichev, Julia Dubenskaya, Oleg Fedorov, Andreas Haungs, Andreas Heiss, Yulia Kazarina, Elena Korosteleva, Dmitriy Kostunin, Alexander Kryukov, Andrey Mikhailov, Minh-Duc Nguyen, Stanislav Polyakov, Evgeny Postnikov, Alexey Shigarov, Dmitry Shipilov, Achim Streit, Viktoria Tokareva, Doris Wochele, Jürgen Wochele, Dmitry Zhurov

Subject: Physical Sciences, Astronomy And Astrophysics Keywords: astroparticle physics, cosmic rays, data life cycle management, data curation, metadata, big data, deep learning, open data

Online: 12 October 2018 (14:48:32 CEST)

Preprint ARTICLE | doi:10.20944/preprints202105.0589.v1

A Study on Ways to Extend Public Data for Game Ratings from Korea

HoSeong Kang, JungYoon Kim

Subject: Engineering, Automotive Engineering Keywords: Game Ratings; Public Data; Game Data; Data analysis; GRAC(Korea)

Online: 25 May 2021 (08:32:32 CEST)

Preprint ARTICLE | doi:10.20944/preprints202007.0078.v1

Data Driven Analytics for Personalized Medical Decision Making

Nataliia Melnykova, Nataliya Shakhovska, Michal Gregus, Volodymyr Melnykov, Mariana Zakharchuk, Olena Vovk

Subject: Computer Science And Mathematics, Information Systems Keywords: personalization; decision making; medical data; artificial intelligence; Data-driven; Big Data; Data Mining; Machine Learning

Online: 5 July 2020 (15:04:17 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0849.v1

Intellecta Cognitiva: A Comprehensive Dataset for Advancing Academic Knowledge and Machine Reasoning

Ditto PS, Ajmal PS, Jithin VG

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Synthetic data; pretrain data; llm training

Online: 12 April 2024 (12:46:27 CEST)

Preprint ARTICLE | doi:10.20944/preprints202103.0593.v1

Creating a Business and Supporting Digital Transformation

Miguel Ayala, Jorge Portella, Sergio Martinez, Maria Rojas, Luis Jimenez

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Business Intelligence; Data Mining; Data Warehouse

Online: 24 March 2021 (13:47:31 CET)

Preprint ARTICLE | doi:10.20944/preprints202012.0468.v1

Developing High-Resolution Gridded Rainfall and Temperature Data for Bangladesh: The ENACTS-BMD Dataset

Nachiketa Acharya, Rija Faniriantsoa, Bazlur Rashid, Razia Sultana, Carlo Montes, Tufa Dinku, S.M.Q. Hassan

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: climate data; gridded product; data merging

Online: 18 December 2020 (13:29:38 CET)

Preprint CASE REPORT | doi:10.20944/preprints201801.0066.v1

Data Visualization of European Regional Operational Programmes: Unleashing the Informative Potential of Open Data for Performance Assessment

Emanuele Frontoni, Roberto Palloni

Subject: Engineering, Control And Systems Engineering Keywords: cohesion policy; data visualization; open data

Online: 8 January 2018 (11:11:47 CET)

Preprint ARTICLE | doi:10.20944/preprints202403.0012.v1

Flexible Techniques to Detect Typical Hidden Errors in Large Longitudinal Datasets

Renato Bruni, Cinzia Daraio, Simone Di Leo

Subject: Computer Science And Mathematics, Computer Science Keywords: big data; information processing; information reconstruction; data quality; longitudinal data sequences

Online: 1 March 2024 (10:33:16 CET)

Preprint COMMUNICATION | doi:10.20944/preprints202309.0047.v1

Analyzing Public Reactions during the MPox Outbreak: Findings from Topic Modeling of Tweets

Nirmalya Thakur, Yuvraj Nihal Duggal, Zihui Liu

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: MPox; big data; data analysis; data science; Twitter; natural language processing

Online: 1 September 2023 (10:23:41 CEST)

In the last decade and a half, the world has experienced outbreaks of a range of viruses, such as COVID-19, H1N1, flu, Ebola, Zika virus, Middle East Respiratory Syndrome (MERS), measles, and West Nile virus. During these outbreaks, the use and effectiveness of social media platforms increased significantly, as these platforms served as virtual communities in which users shared and exchanged information, news, perspectives, opinions, ideas, and comments related to the outbreaks. Analysis of this Big Data of outbreak-related conversations using Natural Language Processing techniques such as Topic Modeling has attracted researchers from disciplines such as Healthcare, Epidemiology, Data Science, Medicine, and Computer Science. The recent MPox outbreak has resulted in a tremendous increase in the usage of Twitter. Prior works in this field have primarily focused on sentiment analysis and content analysis of these Tweets, and the few works that have focused on topic modeling have multiple limitations. This paper aims to address this research gap and makes two scientific contributions to this field. First, it presents the results of Topic Modeling on 601,432 Tweets about the 2022 MPox outbreak, posted on Twitter between May 7, 2022, and March 3, 2023. The results indicate that Twitter conversations about MPox during this period can be broadly categorized into four distinct themes: Views and Perspectives about MPox, Updates on Cases and Investigations about MPox, MPox and the LGBTQIA+ Community, and MPox and COVID-19. Second, the paper presents the findings from the analysis of these Tweets. The most popular theme on Twitter (by number of Tweets posted) during this period was Views and Perspectives about MPox.
It was followed by MPox and the LGBTQIA+ Community, then by MPox and COVID-19, and finally by Updates on Cases and Investigations about MPox. Finally, a comparison with prior works in this field is presented to highlight the novelty and significance of this research.

Preprint ARTICLE | doi:10.20944/preprints202205.0344.v1

Transforming Points of Single Contact Data into Linked Data

Pavlina Fragkou, Leandros Maglaras

Subject: Computer Science And Mathematics, Information Systems Keywords: Linked (open) Data; Semantic Interoperability; Data Mapping; Governmental Data; SPARQL; Ontologies

Online: 25 May 2022 (08:18:46 CEST)

Preprint ARTICLE | doi:10.20944/preprints202111.0073.v1

Using the Data Quality Dashboard to Improve the EHDEN Network

Clair Blacketer, Erica A Voss, Frank DeFalco, Nigel Hughes, Martijn J Schuemie, Maxim Moinat, Peter Rijnbeek

Subject: Medicine And Pharmacology, Other Keywords: data quality; OMOP CDM; EHDEN; healthcare data; real world data; RWD

Online: 3 November 2021 (09:12:54 CET)

Preprint ARTICLE | doi:10.20944/preprints202110.0103.v1

Usage of Data Analytics in Improving Sourcing of Supply Chain Inputs

S M Nazmuz Sakib

Subject: Computer Science And Mathematics, Information Systems Keywords: Data Analytics; Analytics; Supply Chain Input; Supply Chain; Data Science; Data

Online: 6 October 2021 (10:38:42 CEST)

Preprint ARTICLE | doi:10.20944/preprints202310.1998.v1

Marburg Virus Outbreak and a New Conspiracy Theory: Findings from a Comprehensive Analysis of Web Behavior

Nirmalya Thakur, Shuqi Cui, Kesha A. Patel, Nazif Azizi, Victoria Knieling, Changhee Han, Audrey Poon, Rishika Shah

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Marburg virus; big data; data mining; data analysis; google trends; web behavior; data science; conspiracy theory

Online: 31 October 2023 (07:02:07 CET)

During recent virus outbreaks, web behavior mining, modeling, and analysis have served as means to examine, explore, interpret, assess, and forecast worldwide perceptions, readiness, reactions, and responses to these outbreaks. The recent outbreak of Marburg Virus disease (MVD), its high fatality rate, and a conspiracy theory linking the FEMA alert signal in the United States on October 4, 2023, with MVD and a zombie outbreak produced a diverse range of public reactions, which manifested as a surge in related web behavior. As a result, “Marburg Virus” featured among the top trending topics on Twitter on October 3, 2023, and “Emergency Alert System” and “Zombie” featured among the top trending topics on Twitter on October 4, 2023. No prior work in this field has mined and analyzed the emerging trends in web behavior in this context. The work presented in this paper aims to address this research gap and makes multiple scientific contributions to this field. First, it presents the results of time series forecasting of MVD-related search interests from 216 regions worldwide using ARIMA, LSTM, and Autocorrelation; this analysis identifies the optimal model for forecasting MVD-related web behavior in each region. Second, it investigates the correlation between MVD-related search interests and zombie-related search interests (in the context of this conspiracy theory). The findings show that in several regions there was a statistically significant correlation between MVD-related and zombie-related searches on Google on October 4, 2023. Finally, it investigates the correlation between zombie-related searches (in the context of this conspiracy theory) in the United States and in other regions.
This analysis identified the regions where this correlation was statistically significant.
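The abstract reports “statistically significant” correlations between search-interest series but does not name the test. A common choice for two aligned series is Pearson's r with a t-test on n - 2 degrees of freedom; the sketch below assumes that test, and the function names and series values are hypothetical illustrations, not the study's code or data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length with n >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))  # sum of co-deviations
    ss_x = sum(a * a for a in dx)             # sum of squared deviations of x
    ss_y = sum(b * b for b in dy)             # sum of squared deviations of y
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation: unbounded t
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Hypothetical search-interest values for one region (not study data).
    marburg = [1, 2, 3, 4, 5]
    zombie = [1, 3, 2, 5, 4]
    r = pearson_r(marburg, zombie)
    t = t_statistic(r, len(marburg))
    print(f"r = {r:.2f}, t = {t:.3f}")
```

Here r = 0.80 and t ≈ 2.309; with 3 degrees of freedom the two-sided 5% critical value is about 3.182, so this toy pair would not count as significant. Real search-interest series are longer and usually autocorrelated, which a plain t-test does not account for.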

Preprint ARTICLE | doi:10.20944/preprints202308.0442.v1

Instrumental and Observational Problems of the Earliest Temperature Records in Italy: A Methodology for Data Recovery and Correction

Dario Camuffo, Antonio Della Valle, Francesca Becherini

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Thermometers; Temperature records; Early instrumental meteorological series; Data rescue; Data recovery; Data correction; Climate data analysis

Online: 7 August 2023 (03:01:24 CEST)

A distinction is made between data rescue (i.e., copying, digitizing, and archiving) and data recovery, which implies deciphering, interpreting, and transforming early instrumental readings and their metadata to obtain high-quality datasets in modern units. This requires a multidisciplinary approach that includes: palaeography and knowledge of Latin and other languages to read the handwritten logs and additional documents; history of science to interpret the original text, data, and metadata within the cultural frame of the 17th, 18th, and early 19th centuries; physics and technology to recognize the bias of early instruments or calibrations, or to correct for observational bias; and astronomy to calculate and convert the original time, counted in canonical hours starting from twilight, into modern time. The liquid-in-glass thermometer was invented in 1641, and the earliest temperature records started in 1654. Since then, different types of thermometers have been invented, based on the thermal expansion of air or of selected thermometric liquids whose expansion deviates from linearity. Reference points, thermometric scales, and calibration methodologies were not comparable and not always adequately described. Thermometers had various locations and exposures, e.g., indoors, outdoors, on windows, in gardens, or on roofs, facing different directions. Readings were made only once or a few times a day, not necessarily on a precise schedule; this bias is analysed for the most common combinations of reading times. Time was based on sundials and the local Sun, but the hours were counted starting from twilight. In 1789-90, Italy changed its system and all cities counted hours from the Sun's lower culmination (i.e., local midnight), so that every city had its own local time; in 1866, all Italian cities adopted the local time of Rome; and in 1893, all of Italy adopted the present-day system, based on Coordinated Universal Time and time zones.
In 1873, when the International Meteorological Committee (IMO), later transformed into the World Meteorological Organization (WMO), was founded, instruments and observational protocols were standardized, and all data became fully comparable. In the early instrumental period, from 1654 to 1873, the comparison, correction, and homogenization of records are quite difficult, mainly because metadata are scarce or even absent. This paper deals with this confused situation, discussing the main problems as well as methodologies to recognize missing metadata; distinguish indoor from outdoor readings; correct early datasets in unknown or arbitrary units and transform them into modern units; and, finally, determine in which cases it is possible to reach the quality level required by the WMO. The focus is on explaining the methodology needed to recover early instrumental records, i.e., the operations that should be performed to interpret, correct, and transform the original raw data into a high-quality temperature dataset usable for climate studies.
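Two of the recovery steps described above reduce to simple arithmetic once the needed metadata are known: converting a reading on an arbitrary but linear scale to Celsius from two reference points, and converting hours counted from twilight into local clock time. The sketch below is a hypothetical illustration, not the authors' method; it assumes linearity between the reference points (which the abstract notes did not hold for some thermometric liquids) and assumes the local time of twilight has already been computed for the date and site.

```python
from typing import Tuple


def to_celsius(reading: float,
               ref_a: Tuple[float, float],
               ref_b: Tuple[float, float]) -> float:
    """Linearly map a reading on an old scale to degrees Celsius.

    ref_a and ref_b are (value_on_old_scale, value_in_celsius) pairs for two
    reference points; the mapping is only valid if the scale is linear between them.
    """
    (s_a, c_a), (s_b, c_b) = ref_a, ref_b
    if s_a == s_b:
        raise ValueError("the two reference points must differ on the old scale")
    return c_a + (reading - s_a) * (c_b - c_a) / (s_b - s_a)


def italian_hour_to_local(hours_since_twilight: float, twilight_local: float) -> float:
    """Convert a time counted from evening twilight into local solar clock hours.

    twilight_local is the local solar time (in decimal hours) of the twilight that
    started the count; it depends on date and latitude and must be computed separately.
    """
    return (twilight_local + hours_since_twilight) % 24.0


if __name__ == "__main__":
    # Réaumur scale: 0 at the ice point, 80 at the boiling point of water.
    print(to_celsius(20.0, (0.0, 0.0), (80.0, 100.0)))  # 25.0
    # Hypothetical: twilight at 18:30 local solar time, reading at Italian hour 14.
    print(italian_hour_to_local(14.0, 18.5))            # 8.5, i.e. 08:30
```

Correcting for non-linear liquids, instrument exposure, or reading-time bias requires further, instrument-specific models that this sketch does not attempt.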

Preprint DATA DESCRIPTOR | doi:10.20944/preprints202308.1701.v1

A Dataset of Search Interests Related to Disease X Originating from Different Geographic Regions

Nirmalya Thakur, Kesha A. Patel, Isabella Hall, Yuvraj Nihal Duggal, Shuqi Cui

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: disease X; big data; data science; data analysis; dataset development; database; google trends; data mining; healthcare; epidemiology

Online: 24 August 2023 (05:48:54 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202303.0453.v1

Analysis of Public Discourse on Twitter involving COVID-19 and MPox: Findings from Sentiment Analysis and Text Analysis

Nirmalya Thakur

Subject: Social Sciences, Media Studies Keywords: COVID-19; MPox; Twitter; Big Data; Data Mining; Data Analysis; Sentiment Analysis; Data Science; Social Media; Monkeypox

Online: 27 March 2023 (08:39:28 CEST)

Mining and analysis of the Big Data of Twitter conversations have been of significant interest to the scientific community in healthcare, epidemiology, big data, data science, computer science, and related areas, as shown by several works in the last few years that focused on sentiment analysis and other forms of text analysis of Tweets related to Ebola, E. coli, Dengue, Human papillomavirus (HPV), Middle East Respiratory Syndrome (MERS), Measles, Zika virus, H1N1, influenza-like illness, swine flu, flu, Cholera, Listeriosis, cancer, Liver Disease, Inflammatory Bowel Disease, kidney disease, lupus, Parkinson's, Diphtheria, and West Nile virus. The recent outbreaks of COVID-19 and MPox have served as "catalysts" for Twitter usage related to seeking and sharing information, views, opinions, and sentiments about both viruses. While a few works published in recent months have performed sentiment analysis of Tweets about either COVID-19 or MPox, no prior work in this field has analyzed Tweets focusing on both COVID-19 and MPox at the same time. To address this research gap, a total of 61,862 Tweets that focused on MPox and COVID-19 simultaneously, posted from May 7, 2022, to March 3, 2023, were studied using sentiment analysis and text analysis. The findings of this study are manifold. First, sentiment analysis shows that almost half of the Tweets (46.88%) had a negative sentiment, followed by Tweets with a positive sentiment (31.97%) and Tweets with a neutral sentiment (21.14%). Second, this paper presents the top 50 hashtags used in these Tweets. Third, it presents the 100 most frequently used words in these Tweets. The text analysis shows that some of the commonly used words directly referred to either or both viruses.
In addition, the presence of words such as "Polio", "Biden", "Ukraine", "HIV", "climate", and "Ebola" among the 100 most frequent words indicates that Twitter conversations about COVID-19 and MPox also reflected a high level of interest in other viruses, President Biden, and Ukraine. Finally, a comprehensive comparison of this work with 49 prior works in this field is presented to highlight its scientific contributions and relevance.
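The reported sentiment figures are per-label shares of the classified Tweets. Assuming each Tweet has already been assigned exactly one label by some sentiment classifier (the abstract does not name one), the tally itself reduces to the following sketch; `label_shares` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of items per label, ordered from largest to smallest share."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() sorts by count; ties keep first-seen order.
    return {label: 100.0 * n / total for label, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical classifier output for eight Tweets (not study data).
    labels = ["negative", "positive", "negative", "neutral",
              "negative", "positive", "neutral", "negative"]
    for label, pct in label_shares(labels).items():
        print(f"{label}: {pct:.2f}%")
```

Because each share is rounded to two decimals, published shares need not sum to exactly 100%; the three values reported above sum to 99.99%.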

Working Paper ARTICLE

Business Intelligence and Its Big Evolution

Andres Velosa, Gustavo Pabon

Subject: Engineering, Automotive Engineering Keywords: Business Intelligence; Data warehouse; Data Marts; Architecture; Data; Information; cloud; Data Mining; evolution; technology companies; tools; software

Online: 24 March 2021 (13:06:53 CET)

Preprint ARTICLE | doi:10.20944/preprints202311.1570.v1

Consore: A Powerful Federated Data Mining Tool Driving a French Research Network to Accelerate Cancer Research

Julien Guérin, Amine Nahid, Louis Tassy, Marc Deloger, François Bocquet, Simon Thézenas, Emmanuel Desandes, Marie-Cécile Le Deley, Xavier Durando, Anne Jaffré, Ikram Es Saad, Hugo Crochet, Marie Le Morvan, François Lion, Judith Raimbourg, Oussama Khay, Franck Craynest, Alexia Giro, Yec'han Laizet, Aurélie Bertaut, Frédérik Joly, Alain Livartowski, Pierre Etienne Heudel

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: cancer research; cancer; natural language processing; data mining; data warehouse; big data

Online: 26 November 2023 (05:13:14 CET)

Preprint ARTICLE | doi:10.20944/preprints202111.0410.v1

Design and Implementation of Efficient Transmission of Cloud Data in Wireless Media

Virendra Pandharipant Nikam, Sheetal S Dhande

Subject: Engineering, Control And Systems Engineering Keywords: Data compression; data hiding; psnr; mse; virtual data; public cloud; quantization error

Online: 22 November 2021 (15:17:12 CET)

Preprint ARTICLE | doi:10.20944/preprints201808.0350.v2

Integration of Data Mining Clustering Approach with the Personalized E-Learning System

Samina Kausar, Huahu Xu, Iftikhar Hussain, Wenhau Zhu, Misha Zahid

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: big data; clustering; data mining; educational data mining; e-learning; profile learning

Online: 19 October 2018 (05:58:05 CEST)

Preprint REVIEW | doi:10.20944/preprints201807.0059.v1

Data Normalization in NMR-based Metabolomics

Helena Zacharias, Michael Altenbuchinger, Wolfram Gronwald

Subject: Biology And Life Sciences, Biophysics Keywords: data normalization; data scaling; zero-sum; metabolic fingerprinting; NMR; statistical data analysis

Online: 3 July 2018 (16:22:31 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0357.v3

The Path to Data Protection Governance in China Mainland

Bing Chen, Yongji Liu

Subject: Social Sciences, Law Keywords: data protection; personal privacy; cybersecurity; data security

Online: 24 April 2024 (14:20:16 CEST)

Preprint ARTICLE | doi:10.20944/preprints202402.1372.v1

Leveraging Visualization and Machine Learning Techniques in Education: A Case Study of K-12 State Assessment Data

Loni Taylor, Vibhuti Gupta, Kwanghee Jung

Subject: Computer Science And Mathematics, Analysis Keywords: Data Visualization; Big Data; AI; Machine Learning

Online: 23 February 2024 (10:39:04 CET)

Working Paper ARTICLE

The Analysis and the Measurement of Poverty: An Interval Based Composite Indicator Approach

Carlo Drago

Subject: Business, Economics And Management, Econometrics And Statistics Keywords: poverty; composite indicators; interval data; symbolic data

Online: 24 August 2021 (15:46:09 CEST)

Working Paper ARTICLE

Development of Cost and Schedule Data Integration Algorithm based on Big Data Technology

Daegu Cho, Myungdo Lee, Jihye Shin

Subject: Computer Science And Mathematics, Computer Science Keywords: big data; data integration; EVMS; construction management

Online: 30 October 2020 (15:35:00 CET)

Preprint ARTICLE | doi:10.20944/preprints201701.0090.v1

An Automatic Matcher and Linker for Transportation Datasets

Ali Masri, Karine Zeitouni, Zoubida Kedad, Bertrand Leroy

Subject: Computer Science And Mathematics, Information Systems Keywords: transportation data; data interlinking; automatic schema matching

Online: 20 January 2017 (03:38:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202407.1459.v1

Optimising Clinical Epidemiology in Disease Outbreaks: Analysis of ISARIC-WHO COVID-19 Case Report Form Utilisation

Laura Merson, Sara Duque, Esteban Garcia-Gallo, Trokon Omarley Yeabah, Jamie Rylance, Janet Diaz, Antoine Flahault, ISARIC Clinical Characterisation Group

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: clinical epidemiology; infectious disease outbreaks; data collection; data management; common data elements; ISARIC

Online: 18 July 2024 (09:53:41 CEST)

Preprint ARTICLE | doi:10.20944/preprints202308.1391.v1

An Automated Method for Extracting and Analyzing Railway Infrastructure Cost Data

Daniel Adanza Dopazo, Lamine Mahdjoubi, Bill Gething

Subject: Engineering, Transportation Science And Technology Keywords: data extraction; data mining; railway infrastructure costs; infrastructure costs data analysis; cost analysis

Online: 18 August 2023 (16:03:08 CEST)

Working Paper ARTICLE

Model for the Collection and Analysis of Data from Teachers and Students, Supported by Academic Analytics

Fredys A. Simanca H., Isabel Hernández Arteaga, María Elsa Unriza Puin, Fabian Blanco Garrido, Jaime Paez Paez, Jairo Cortes Méndez

Subject: Computer Science And Mathematics, Information Systems Keywords: Academic Analytics; data storage; education and big data; analysis of data; learning analytics

Online: 19 July 2020 (20:37:39 CEST)

Business Intelligence, defined by [1] as "the ability to understand the interrelations of the facts that are presented in such a way that it can guide the action towards achieving a desired goal", has been used since 1958 to transform data into information, and information into knowledge, for decision-making in business environments. But what would happen if the same principles of business intelligence were applied to the academic environment? The answer is Academic Analytics, a term defined by [2] as the process of evaluating and analyzing organizational information from university systems for reporting and decision-making. Institutions increasingly use it because the information they accumulate about their students and teachers includes data such as academic performance, student success, persistence, and retention [5]. Academic Analytics supports data analysis that is central to institutional decision-making in education, adds valuable information to academic research activity, and provides easy-to-use business intelligence tools. This article proposes an information system based on Academic Analytics, built with ASP.Net and using the Microsoft SQL Server database engine for storage, and designs a model supported by Academic Analytics for collecting and analyzing data from the information systems of educational institutions. The proposed system displays statistics on the historical data of students and teachers across academic periods without direct access to institutional databases, with the purpose of gathering the information that directors, teachers, and students need for making decisions.
The model was validated with information from students and teachers collected over the last five years, and the data were exported as PDF, CSV, and XLS files. The findings indicate that analyzing the data held in the information systems of educational institutions is extremely important for decision-making. Validation of the model established that students need the reports of their academic performance in order to self-evaluate, that teachers need to see the results of the collected data in order to self-evaluate and adapt classroom content and dynamics, and that program heads need this information to make decisions.
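
The described system was built with ASP.Net and SQL Server. As a language-neutral illustration only, the Python sketch below shows the kind of per-period aggregation and CSV export the abstract mentions. The record layout (student_id, period, course, grade) and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical historical records: (student_id, academic_period, course, grade).
RECORDS = [
    ("s1", "2019-1", "Calculus", 3.8),
    ("s1", "2019-1", "Physics", 4.2),
    ("s1", "2019-2", "Databases", 4.5),
    ("s2", "2019-1", "Calculus", 2.9),
]

def average_by_period(records):
    """Group grades by (student, period) and return each group's mean grade."""
    groups = defaultdict(list)
    for student, period, _course, grade in records:
        groups[(student, period)].append(grade)
    return {key: round(mean(grades), 2) for key, grades in sorted(groups.items())}

def to_csv(summary):
    """Serialise the summary as CSV text, one row per (student, period)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["student_id", "period", "average_grade"])
    for (student, period), avg in summary.items():
        writer.writerow([student, period, avg])
    return buf.getvalue()

summary = average_by_period(RECORDS)
print(to_csv(summary))
```

In a database-backed system, the same grouping could equally be written as a SQL GROUP BY query. The sketch keeps it in memory so that it is self-contained.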

Preprint ARTICLE | doi:10.20944/preprints201812.0071.v1

Data Governance and Sovereignty in Urban Data Spaces Based on Standardized ICT Reference Architectures

Silke Cuno, Lina Bruns, Nikolay Tcholtchev, Philipp Lämmel, Ina Schieferdecker

Subject: Engineering, Electrical And Electronic Engineering Keywords: data governance; data sovereignty; urban data spaces; ICT reference architecture; open urban platform

Online: 6 December 2018 (05:09:54 CET)

Preprint ARTICLE | doi:10.20944/preprints202110.0260.v1

Online System for Power Quality Operational Data Management in Frequency Monitoring using Python and Grafana

Jose-María Sierra-Fernández, Olivia Florencias-Oliveros, Manuel-Jesús Espinosa-Gavira, Juan-José González-de-la-Rosa, Agustín Agüera-Pérez, José-Carlos Palomares-Salas

Subject: Engineering, Electrical And Electronic Engineering Keywords: big data; data acquisition; data visualization; data exchange; dashboard; frequency stability; Grafana lab; Power Quality; GPS reference; frequency measurement.

Online: 18 October 2021 (18:07:43 CEST)

Preprint DATA DESCRIPTOR | doi:10.20944/preprints202109.0370.v1

The SERL Observatory Dataset: Longitudinal Smart Meter Electricity and Gas Data, Survey, EPC and Climate Data for Over 13,000 GB Households

Ellen Webborn, Jessica Few, Eoghan McKenna, Simon Elam, Martin Pullinger, Ben Anderson, David Shipworth, Tadj Oreszczyn

Subject: Engineering, Energy And Fuel Technology Keywords: smart meter data; household survey; EPC; energy data; energy demand; energy consumption; longitudinal; energy modelling; electricity data; gas data

Online: 22 September 2021 (10:16:05 CEST)

Preprint ARTICLE | doi:10.20944/preprints201807.0038.v1

Towards the Provision of Accurate Atomic Data for Neutral Iron

Andrew Conroy, Catherine Ramsbottom, Connor Ballance, Francis Keenan

Subject: Physical Sciences, Atomic And Molecular Physics Keywords: atomic data

Online: 3 July 2018 (11:25:13 CEST)

Preprint ARTICLE | doi:10.20944/preprints202406.1070.v1

The Mental Health Index across the Italian Regions in the Esg Context

Emanuela Resta, Giancarlo Logroscino, Silvio Tafuri, Preethymol Peter, Noviello Chiara, Alberto Costantiello, Angelo Leogrande

Subject: Business, Economics And Management, Econometrics And Statistics Keywords: ESG; Mental Health Index; Panel Data; Data Analysis

Online: 17 June 2024 (08:33:43 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0740.v1

Functional Process Control (FPC): A Methodology to Reduce Variability

Joaquín Sancho, Javier Martínez, Jorge Pastor, Carlos Cajal

Subject: Computer Science And Mathematics, Applied Mathematics Keywords: functional data; quality; non-normal data; variability; outlier

Online: 10 April 2024 (15:52:41 CEST)

Preprint ARTICLE | doi:10.20944/preprints202309.1016.v1

Three-Stage Sampling Algorithm for Highly Imbalanced Multi-Classification Time Series Data Sets

Haoming Wang

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Imbalanced data; Data preprocessing; Sampling; Tomek Links; DTW

Online: 14 September 2023 (14:00:42 CEST)

Purpose: To alleviate the data imbalance problem, which arises from both subjective and objective factors, scholars have developed various data preprocessing algorithms. Among these, undersampling algorithms are widely used because they are fast and efficient. However, traditional undersampling algorithms become less effective when some categories in a multi-class dataset have too few samples to be processed by sampling, or when a minority class has only 1 to 2 samples.

Methods: This study selects 9 multi-class time series datasets with extremely few samples and fully takes the characteristics of time series data into account. It uses a three-stage algorithm to alleviate the data imbalance problem. Stage one: random oversampling with added perturbations increases the number of sample points. Stage two: SMOTE (Synthetic Minority Oversampling Technique) oversampling is applied to the result. Stage three: dynamic time warping (DTW) distance is used to compute the distances between sample points, identify the Tomek Link sample points at class boundaries, and remove boundary noise.

Results: This study proposes a new sampling algorithm. On the 9 multi-class time series datasets with extremely few samples, the new algorithm is compared with four classic undersampling algorithms: ENN (Edited Nearest Neighbours), NCR (Neighborhood Cleaning Rule), OSS (One Side Selection) and RENN (Repeated Edited Nearest Neighbours). The comparison uses macro accuracy, recall, and F1-score. FiftyWords is the dataset with the most categories and the fewest minority class samples. On FiftyWords, the new algorithm reaches an accuracy of 0.7156, far above ENN, RENN, OSS and NCR. Its recall of 0.7261 also exceeds that of all four undersampling algorithms. Its F1-score is 200.71%, 188.74%, 155.29% and 85.61% higher than that of ENN, RENN, OSS, and NCR, respectively. On the other 8 datasets, the new algorithm also achieves good scores on these metrics.

Conclusion: The proposed algorithm can effectively alleviate the data imbalance problem in multi-class time series datasets with many categories and few minority class samples. It also removes boundary noise between classes.
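
The three stages in the Methods can be sketched in plain Python as follows. This is an illustrative reconstruction under stated assumptions, not the author's implementation:

- Stage 1 runs only for classes with fewer than k + 1 samples and adds Gaussian noise with a hypothetical sigma.
- SMOTE neighbours use Euclidean distance.
- Every class is oversampled to the size of the largest class.
- Both members of each DTW-based Tomek link are removed.

```python
import math
import random
from collections import Counter

def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jitter_oversample(series, target, sigma, rng):
    """Stage 1: copy randomly chosen series and add Gaussian noise until `target` samples exist."""
    out = list(series)
    while len(out) < target:
        base = rng.choice(series)
        out.append([v + rng.gauss(0.0, sigma) for v in base])
    return out

def smote(series, target, k, rng):
    """Stage 2: SMOTE-style interpolation between a sample and one of its k nearest neighbours."""
    out = list(series)
    while len(out) < target:
        i = rng.randrange(len(series))
        x = series[i]
        others = [s for j, s in enumerate(series) if j != i]
        nn = rng.choice(sorted(others, key=lambda s: euclid(x, s))[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        out.append([xv + u * (nv - xv) for xv, nv in zip(x, nn)])
    return out

def tomek_clean(X, y, dist=dtw):
    """Stage 3: drop both members of every Tomek link, i.e. mutual nearest
    neighbours (under `dist`) that carry different labels."""
    n = len(X)
    if n < 2:
        return list(X), list(y)
    nearest = [min((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
               for i in range(n)]
    drop = {i for i in range(n) if y[i] != y[nearest[i]] and nearest[nearest[i]] == i}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

def three_stage_resample(X, y, k=2, sigma=0.05, seed=0):
    """Oversample every class to the size of the largest one, then clean class boundaries."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [], []
    for label in counts:
        cls = [x for x, lab in zip(X, y) if lab == label]
        # Assumption: stage 1 runs only when a class is too small for k-NN interpolation.
        if len(cls) < k + 1:
            cls = jitter_oversample(cls, min(k + 1, target), sigma, rng)
        if len(cls) < target:
            cls = smote(cls, target, k, rng)
        X_out.extend(cls)
        y_out.extend([label] * len(cls))
    return tomek_clean(X_out, y_out)

X = [
    [0.0, 0.1, 0.2, 0.1], [0.0, 0.2, 0.1, 0.0], [0.1, 0.1, 0.2, 0.2],
    [0.1, 0.0, 0.1, 0.2], [0.2, 0.1, 0.0, 0.1],   # class "a": 5 series
    [1.0, 1.2, 1.1, 0.9], [0.9, 1.1, 1.0, 1.0],   # class "b": 2 series
    [2.0, 2.1, 1.9, 2.0],                         # class "c": 1 series
]
y = ["a"] * 5 + ["b"] * 2 + ["c"]
X_res, y_res = three_stage_resample(X, y, k=2)
print(Counter(y_res))
```

Removing both members of a link is one common Tomek-link variant. Removing only the member from the larger class is the other usual choice.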

Preprint ARTICLE | doi:10.20944/preprints202307.1117.v1

Design and Analysis of Query Models Database Preservation Information Systems Digitization of History and Endowments; Case Study of History and Waqf of Sumedang Larang Kingdom Indonesia

R. Sudrajat, Budi Nurani Ruchjana, Atje Setiawan Abdullah, Rahmat Budiarto

Subject: Computer Science And Mathematics, Information Systems Keywords: history; endowments; query model; digital data; physical data

Online: 17 July 2023 (15:11:18 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202305.1694.v1

Synthetic Data & the Future of Women's Health: A Synergistic Relationship

Gayathri Delanerolle, Peter Phiri, Heitor Cavalini, David Benfield, Ashish Shetty, Yassine Bouchareb, Jian Shi, Alain Zemkoho

Subject: Medicine And Pharmacology, Clinical Medicine Keywords: Women's Health; Data Science; Data Methods; Artificial Intelligence

Online: 24 May 2023 (04:48:58 CEST)

Preprint ARTICLE | doi:10.20944/preprints202206.0335.v1

The Dataharmonizer: a Tool for Faster Data Harmonization, Validation, Aggregation, and Analysis of Pathogen Genomics Contextual Information

Ivan Gill, Emma Griffiths, Damion Dooley, Rhiannon Cameron, Sarah Savić Kallesøe, Nithu Sara John, Anoosha Sehar, Gurinder Gosal, David Alexander, Madison Chapel, Matthew Croxen, Benjamin Delisle, Rachelle Di Tullio, Daniel Gaston, Ana Duggan, Jennifer Guthrie, Mark Horsman, Esha Joshi, Levon Kearney, Natalie Knox, Lynette Lau, Jason LeBlanc, Vincent Li, Pierre Lyons, Keith MacKenzie, Andrew McArthur, Emilie Panousis, John Palmer, Natalie Prystajecky, Kerri Smith, Jennifer Tanner, Christopher Townend, Andrea Tyler, Gary Van Domselaar, William Hsiao

Subject: Computer Science And Mathematics, Information Systems Keywords: metadata; contextual data; harmonization; genomic surveillance; data management

Online: 24 June 2022 (08:46:04 CEST)

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations, and research. To make use of pathogen genomics data, the data must be interpreted using contextual data (metadata), which include sample metadata, laboratory methods, patient demographics, clinical outcomes, and epidemiological information. However, variability in how different authorities capture contextual information, and in how it is encoded in different databases, poses challenges for data interpretation, integration, and use and reuse. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating, and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool runs in a web browser-based JavaScript environment that enables validation, while its offline functionality and local installation increase data security. The DataHarmonizer was developed to address data sharing needs that arose during the COVID-19 pandemic. Members of the Canadian COVID Genomics Network (CanCOGeN) used it to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. To support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer improves the effectiveness and fidelity of contextual data capture as well as the subsequent usability of the data. Harmonizing contextual information across authorities, platforms, and systems globally improves the interoperability and reusability of data for concerted public health and research initiatives against the current pandemic and future public health emergencies.
Although the tool was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
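
The Python sketch below illustrates what template-driven validation means in general. A template lists each field with a required flag, an optional regular expression, and an optional picklist. The FieldSpec structure, field names, and patterns are hypothetical. The actual DataHarmonizer is a JavaScript application, and this sketch does not reproduce its template format.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FieldSpec:
    name: str
    required: bool = False
    pattern: Optional[str] = None  # regex the whole value must match
    allowed: frozenset = field(default_factory=frozenset)  # picklist; empty means any value

# Hypothetical template for illustration only.
TEMPLATE = [
    FieldSpec("sample_id", required=True, pattern=r"[A-Z]{2}-\d{4}"),
    FieldSpec("collection_date", required=True, pattern=r"\d{4}-\d{2}-\d{2}"),
    FieldSpec("host", allowed=frozenset({"Human", "Animal", "Environment"})),
]

def validate_row(row, template=TEMPLATE):
    """Return a list of human-readable problems for one spreadsheet row (a dict)."""
    problems = []
    for spec in template:
        value = (row.get(spec.name) or "").strip()
        if not value:
            if spec.required:
                problems.append(f"{spec.name}: missing required value")
            continue
        if spec.pattern and not re.fullmatch(spec.pattern, value):
            problems.append(f"{spec.name}: '{value}' does not match {spec.pattern}")
        if spec.allowed and value not in spec.allowed:
            problems.append(f"{spec.name}: '{value}' is not an allowed value")
    return problems

rows = [
    {"sample_id": "CA-0001", "collection_date": "2021-03-14", "host": "Human"},
    {"sample_id": "ca-1", "collection_date": "", "host": "human"},
]
for i, row in enumerate(rows):
    print(i, validate_row(row))
```

Keeping the rules in data rather than in code lets one validator serve several templates. That is the general benefit of a template-driven design.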

Preprint ARTICLE | doi:10.20944/preprints202108.0471.v1

Identifying the Main Risk Factors for CVD Prediction Using Machine Learning Algorithms

Luis Rolando Guarneros-Nolasco, Nancy Aracely Cruz-Ramos, Giner Alor-Hernández, Lisbeth Rodríguez-Mazahua, José Luis Sánchez-Cervantes

Subject: Computer Science And Mathematics, Information Systems Keywords: Big data; Health prevention; Machine learning; Medical data

Online: 24 August 2021 (14:00:12 CEST)

Preprint ARTICLE | doi:10.20944/preprints202106.0738.v1

Combination of Using Pairwise Comparisons and Composite Reference Series: A New Approach in the Homogenization of Climatic Time Series

Peter Domonkos

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: time series; homogenization; ACMANT; observed data; data accuracy

Online: 30 June 2021 (13:08:39 CEST)

Working Paper ARTICLE

Generating Fake ECGs using GANs for Anonymizing Healthcare Data

Esteban Piacentino, Alvaro Guarner, Cecilio Angulo

Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: GAN; ECG; anonymization; healthcare data; sensors; data transformation

Online: 3 September 2020 (05:26:01 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0419.v1

Modeling Analytical Streams for Social Business Intelligence

Indira Lanza-Cruz, Rafael Berlanga, María José Aramburu

Subject: Computer Science And Mathematics, Information Systems Keywords: social business intelligence; data streaming models; linked data

Online: 26 June 2018 (12:48:17 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints201803.0054.v1

Travel Time Prediction Based on Data Feature Selection and Data Clustering Methods

Chi-Hua Chen

Subject: Computer Science And Mathematics, Information Systems Keywords: data feature selection; data clustering; travel time prediction

Online: 7 March 2018 (13:30:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202406.0091.v1

Open Data and Sustainable Mobility in Slovenia

Klara Žnideršič, Vid Klopčič, Andraž Juvan, Matija Marolt, Matevž Pesek

Subject: Computer Science And Mathematics, Computer Science Keywords: open data; mobility applications; sustainable services; socio-economic impact; big data; real-time data

Online: 4 June 2024 (03:54:58 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0429.v1

Analysis of Missingness Scenarios for Observational Health Data

Alireza Zamanian, Henrik von Kleist, Octavia Andreea Ciora, Marta Piperno, Gino Lancho, Narges Ahmidi

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Missing Data Analysis; Observational Health Data; Missingness Scenarios; Missing Data Assumptions; Missingness distribution shift

Online: 5 April 2024 (10:45:36 CEST)

Preprint ARTICLE | doi:10.20944/preprints202312.0496.v1

Nonparametric Partial Linear Estimation for Spatial Functional Data with Missing At-Random

Tawfik Benchikh, Ibrahim M. Almanjahie, Omar Fetitah, Mohammed kadi Attouch

Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Missing at random data; Functional data analysis; Asymptotic normality; spatial data; Kernel regression method

Online: 7 December 2023 (09:14:16 CET)

Preprint COMMUNICATION | doi:10.20944/preprints202206.0172.v3

MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak

Nirmalya Thakur

Subject: Computer Science And Mathematics, Information Systems Keywords: Monkeypox; monkey pox; Twitter; Dataset; Tweets; Social Media; Big Data; Data Mining; Data Science

Online: 25 July 2022 (09:41:19 CEST)

Preprint ARTICLE | doi:10.20944/preprints202109.0518.v1

Data Fusion and Visualization of a Multi-Sensor Personal Exposure Campaign

Rok Novak, Ioannis Petridis, David Kocman, Johanna Amalia Robinson, Tjaša Kanduč, Dimitris Chapizanis, Spyros Karakitsios, Benjamin Flückiger, Danielle Vienneau, Ondřej Mikeš, Celine Degrendele, Ondřej Sáňka, Saul García Dos Santos-Alves, Thomas Maggos, Dementra Pardali, Asimina Stamatelopoulou, Dikaia Saraga, Marco Giovanni Persico, Jaideep Visave, Alberto Gotti, Dimosthenis Sarigiannis

Subject: Environmental And Earth Sciences, Environmental Science Keywords: data fusion; multi-sensor; data visualization; data treatment; participant reports; air quality; exposure assessment

Online: 30 September 2021 (14:13:52 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0185.v1

Data Quality: A Negotiator between Paper-based and Digital Records in the Pakistan’s TB Control Program

Syed Mustafa Ali, Farah Naureen, Arif Noor, Maged Kamel N. Boulos, Javariya Aamir, Muhammad Ishaq, Naveed Anjum, John Ainsworth, Aamna Rashid, Arman Majidulla, Irum Fatima

Subject: Medicine And Pharmacology, Other Keywords: mHealth; mobile data collection; data quality; data quality assessment framework; Tuberculosis control; developing countries

Online: 12 June 2018 (10:34:33 CEST)

Preprint REVIEW | doi:10.20944/preprints202105.0663.v1

MANAGEMENT OF BIG DATA IN THE CONTEMPORARY WORLD

Anjaneyulu Jinugu, Sreechandana Kodimela, Madhavi Laitha V V

Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data, Internet Data Sources (IDS), Internet of Things (IoT), Sustainable Development Goals (SDGs), Big data Technologies, Big data Challenges

Online: 27 May 2021 (10:31:03 CEST)

Preprint ARTICLE | doi:10.20944/preprints202003.0073.v1

FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units

Koenraad De Smedt, Dimitris Koureas, Peter Wittenburg

Subject: Computer Science And Mathematics, Information Systems Keywords: digital object; data infrastructure; research infrastructure; data management; data science; FAIR data; open science; European Open Science Cloud; EOSC; persistent identifier

Online: 5 March 2020 (02:30:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202403.1611.v1

A Formal Model for Reliable Data Acquisition and Control in Legacy Critical Infrastructures

Jose Miguel Blanco, Jose M. Del Alamo, Juan C. Dueñas, Felix Cuadrado

Subject: Computer Science And Mathematics, Information Systems Keywords: Critical Infrastructure; Water Distribution Network; Formal Model; Digital Transformation; Data Management; Data Security; Data Acquisition

Online: 26 March 2024 (14:35:30 CET)

Preprint ARTICLE | doi:10.20944/preprints202404.2008.v1

Univariate Outlier Detection: Precision-Driven Algorithm for Single-Cluster Scenarios

El hairach Mohamed Limam, Insaf Bellamine, Amal Tmiri

Subject: Computer Science And Mathematics, Computer Science Keywords: outlier detection, machine learning, univariate data analysis, data mining

Online: 30 April 2024 (12:34:20 CEST)

Preprint ARTICLE | doi:10.20944/preprints202307.0244.v1

Estimation of Reference Evapotranspiration in a Semi-Arid Region of Mexico

Gerardo Delgado-Ramírez, Martín Alejandro Bolaños-González, Abel Quevedo-Nolasco, Adolfo López-Pérez, Juan Estrada-Ávalos

Subject: Environmental And Earth Sciences, Water Science And Technology Keywords: NASA-POWER platform; empirical equations; reanalysis data; meteorological data

Online: 4 July 2023 (13:59:00 CEST)

Preprint ARTICLE | doi:10.20944/preprints202305.0722.v1

Anomaly Detection in Endemic Disease Surveillance Data Using Machine Learning Techniques

Peter U. Eze, Nicholas Geard, Ivo Mueller, Iadine Chades

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Anomaly detection; Malaria data; Machine learning; big data; epidemic

Online: 10 May 2023 (09:34:36 CEST)

Preprint REVIEW | doi:10.20944/preprints202208.0420.v1

Siri 2.0 - Conversational Commerce of Social Bots and its Legal Implications

Dagmar Gesmann-Nuissl, Stefanie Meyer

Subject: Social Sciences, Law Keywords: conversational commerce; data protection; law of obligations of data

Online: 24 August 2022 (10:55:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202208.0224.v1

Recognition of Vehicles Entering Expressway Service Areas and Estimation of Dwell Time Using ETC Data

Qiqin Cai, Dingrong Yi, Fumin Zou, Zhaoyi Zhou, Nan Li, Feng Guo

Subject: Engineering, Automotive Engineering Keywords: VR-XGBoost; K-VDTE; ETC data; ESAs; data mining

Online: 12 August 2022 (03:53:23 CEST)

Preprint ARTICLE | doi:10.20944/preprints202208.0083.v1

A Big Data Analysis with Machine Learning Techniques in Accounting Dataset from the Greek Banking System

Leonidas Theodorakopoulos, Georgios Thanasas, Spyridon Lampropoulos

Subject: Business, Economics And Management, Accounting And Taxation Keywords: Ratios; Financial Crisis; Covid-19; Big Data; Accounting Data

Online: 3 August 2022 (10:42:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202103.0331.v1

Social Media Data Misuse

Tariq Soussan, Marcello Trovati

Subject: Social Sciences, Media Studies Keywords: Social media ethics; Social media; data misuse; data integrity

Online: 12 March 2021 (08:05:09 CET)

Preprint ARTICLE | doi:10.20944/preprints202006.0258.v2

Data-Driven Solutions and Discoveries in Mechanics Using Physics Informed Neural Network

Qi Zhang, Yilin Chen, Ziyi Yang

Subject: Engineering, Civil Engineering Keywords: Conservation laws; Data inference; Data discovery; Dimensionless form; PINN

Online: 30 September 2020 (03:51:25 CEST)

Preprint ARTICLE | doi:10.20944/preprints202007.0051.v2

World Health Organization (WHO) COVID-19 Database: Who Needs It?

Ivan Kodvanj, Jan Homolak, Davor Virag, Vladimir Trkulja

Subject: Social Sciences, Library And Information Sciences Keywords: COVID-19; WHO; database; systematic review; data quality

Online: 2 August 2020 (17:43:38 CEST)

Preprint ARTICLE | doi:10.20944/preprints201905.0158.v1

An Adaptive Biomedical Data Managing Scheme Based on Blockchain Technique

Ahmed Faeq Hussein, Abbas K. AlZubaidi, Qais Ahmed Habash, Mustafa Musa Jaber

Subject: Medicine And Pharmacology, Other Keywords: blockchain; biomedical data managing; DWT; keyword search; data sharing.

Online: 13 May 2019 (13:30:37 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0219.v1

A Proposal of Methodology for Designing Big Data Warehouses

Francesco Di Tria, Ezio Lefons, Filippo Tangorra

Subject: Computer Science And Mathematics, Information Systems Keywords: Big data technology; Business intelligence; Data integration; System virtualization.

Online: 13 June 2018 (16:19:48 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202401.2023.v1

Investigating the Global Fear associated with COVID-19 using Subjectivity Analysis and Deep Learning

Nirmalya Thakur, Kesha A. Patel, Audrey Poon, Rishika Shah, Nazif Azizi, Changhee Han

Subject: Computer Science And Mathematics, Computer Science Keywords: COVID-19; Big Data; Data Analysis; Machine Learning; Subjectivity Analysis; Data Science; Deep Learning; Mental Health

Online: 29 January 2024 (15:42:52 CET)

The work presented in this paper makes multiple scientific contributions to the investigation of the global fear associated with COVID-19. It does so through a comprehensive analysis of a dataset of survey responses from participants in 40 countries. First, subjectivity analysis was applied to the responses in which participants indicated their biggest concern related to COVID-19. It showed that the average subjectivity of responses from the 41-50 age group decreased from April 2020 to June 2020, and that the average subjectivity of responses from the 71-80 age group increased sharply from May 2020. It also showed that the 11-20 age group had the lowest level of subjectivity in its responses between June 2020 and August 2020. Second, subjectivity analysis also revealed the percentages of highly opinionated, neutrally opinionated, and least opinionated responses for each age group analyzed: 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, and 81-90. For instance, the percentages of highly opinionated, neutrally opinionated, and least opinionated responses from the 11-20 age group were 17.92%, 16.24%, and 65.84%, respectively. Third, data analysis of responses from different age groups showed that the highest percentage of responses indicating that respondents were very worried about COVID-19 came from the 21-30 age group. Fourth, data analysis of the survey responses also revealed that a higher percentage of individuals in the 31-40 age group took precautions to avoid contracting COVID-19 than in the 41-50, 51-60, 61-70, 71-80, and 81-90 age groups. Finally, a deep learning model was developed to detect whether survey respondents were seeing or planning to see a psychologist or psychiatrist for mental health issues related to COVID-19.
The deep learning model used the responses to multiple questions about fear, preparedness, and response related to COVID-19 and achieved an overall accuracy of 91.62% after 500 epochs.
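
The abstract does not state how responses were assigned to the three opinion categories. Assume that each response already has a subjectivity score in [0, 1], for example from a lexicon-based tool. The Python sketch below then buckets the scores with hypothetical cut-offs (HIGH = 0.6, LOW = 0.4) and reports category percentages per age group. All names and thresholds are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical cut-offs; the paper's actual thresholds are not given in the abstract.
HIGH, LOW = 0.6, 0.4
CATEGORIES = ("highly", "neutral", "least")

def opinion_bucket(subjectivity):
    """Map a subjectivity score in [0, 1] to one of three opinion categories."""
    if subjectivity >= HIGH:
        return "highly"
    if subjectivity > LOW:
        return "neutral"
    return "least"

def bucket_percentages(responses):
    """responses: iterable of (age_group, subjectivity). Returns % per category per group."""
    per_group = defaultdict(Counter)
    for age_group, score in responses:
        per_group[age_group][opinion_bucket(score)] += 1
    result = {}
    for group, counts in per_group.items():
        total = sum(counts.values())
        # Counter returns 0 for categories that never occurred in this group.
        result[group] = {cat: round(100 * counts[cat] / total, 2) for cat in CATEGORIES}
    return result

responses = [("11-20", 0.0), ("11-20", 0.1), ("11-20", 0.7), ("11-20", 0.5), ("21-30", 0.9)]
print(bucket_percentages(responses))
```

The three percentages for each group sum to 100 up to rounding.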

Preprint ARTICLE | doi:10.20944/preprints202102.0326.v1

A Comparative Study on Supervised Machine Learning Algorithms for Copper Recovery Quality Prediction in a Leaching Process

Victor Flores, Claudio Leiva

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Data analysis; Artificial Intelligence; Machine Learning; Knowledge Engineering; Computers and information processing, Data analysis; Data Processing.

Online: 16 February 2021 (13:33:53 CET)

Preprint ARTICLE | doi:10.20944/preprints202008.0254.v1

Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data

Ahlam Mallak, Madjid Fathi

Subject: Computer Science And Mathematics, Information Systems Keywords: feature selection; k-means; silhouette measure; clustering; big data; fault classification; sensor data; time-series data

Online: 11 August 2020 (06:26:43 CEST)

Preprint ARTICLE | doi:10.20944/preprints202108.0303.v2

Ten Simple Rules for FAIR Sharing of Experimental and Clinical Data with the Modeling Community

Matthias König, Jan Grzegorzewski, Martin Golebiewski, Henning Hermjakob, Mike Hucka, Brett Olivier, Sarah Keating, David Nickerson, Falk Schreiber, Rahuman Sheriff, Dagmar Waltemath

Subject: Biology And Life Sciences, Other Keywords: data sharing; FAIR

Online: 19 November 2021 (08:38:42 CET)

Preprint REVIEW | doi:10.20944/preprints202403.0161.v1

4IR Applications in the Transport Industry: Systematic Review of the State of the Art with Respect to Data Collection and Processing Mechanisms

O.O. Ajayi, A.M. Kurien, K. Djouani, L. Dieng

Subject: Engineering, Transportation Science And Technology Keywords: transportation systems; systematic review; industrial revolution; data collection; data processing

Online: 6 March 2024 (04:30:45 CET)

Preprint ARTICLE | doi:10.20944/preprints202307.0466.v1

PlantMetSuite: A User-Friendly Web-Based Tool for Metabolomics Analysis and Visualisation

Yu Liu, Hao-Zhuo Liu, Ding-Kang Chen, Hong-Yun Zeng, Yi-Li Chen, Nan Yao

Subject: Biology And Life Sciences, Plant Sciences Keywords: plant metabolomics; metabolite identification; data visualisation; omics data; bioinformatics tools

Online: 10 July 2023 (13:49:20 CEST)

Preprint ARTICLE | doi:10.20944/preprints202112.0068.v1

Big Data Handling Approach for Unauthorized Access of Cloud Computing

Abdul Razaque, Shaldanbayeva Nazerke, Bandar Alotaibi, Munif Alotaibi, Akhmetov Murat, Aziz Alotaibi

Subject: Engineering, Electrical And Electronic Engineering Keywords: Data security; data handling; access control; unauthorized access; cloud computing

Online: 6 December 2021 (12:15:56 CET)

Preprint ARTICLE | doi:10.20944/preprints202108.0256.v1

Effect of Non-Academic Parameters on Student’s Performance

Shantanu Lokhande, Vedant Bahel

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Learning Analytics, Education, Educational Data Mining, Pattern Recognition, Data Visualization.

Online: 11 August 2021 (11:23:48 CEST)

Preprint ARTICLE | doi:10.20944/preprints202102.0593.v2

Hospital Admissions From Care Homes in England During the COVID-19 Pandemic: A Retrospective, Cross-Sectional Analysis Using Linked Administrative Data

Fiona Grimm, Karen Hodgson, Richard Brine, Sarah R Deeny

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Hospital admissions; care homes; COVID-19; linked data; administrative data

Online: 25 May 2021 (10:33:46 CEST)

Preprint ARTICLE | doi:10.20944/preprints202103.0623.v1

How Schools affected COVID-19 Pandemic in Italy: Data Analysis for Lombardy Region, Campania Region, and Emilia Region

Davide Tosi, Alessandro Siro Campi

Subject: Computer Science And Mathematics, Information Systems Keywords: SARS-CoV-2; Big Data; Data Analytics; Predictive Models; Schools

Online: 25 March 2021 (14:35:53 CET)

Preprint ARTICLE | doi:10.20944/preprints202010.0618.v1

Ultra-High Bandwidth Optical Data Transmission Based on a 49GHz Kerr Soliton Crystal Microcomb

Mengxi Tan, Bill Corcoran, Xingyuan Xu, Jiayang Wu, Andreas Boes, Thach Nguyen, Sai Chu, Brent Little, Roberto Morandotti, Arnan Mitchell, David Moss

Subject: Engineering, Automotive Engineering Keywords: optical data communications; fiber optics; microcombs; ultrahigh bandwidth data transmission

Online: 29 October 2020 (14:34:21 CET)

Preprint SHORT NOTE | doi:10.20944/preprints202001.0196.v1

A Guide and Toolbox to Replicability and Open Science in Entomology

Jacob Wittman, Brian Aukema

Subject: Biology And Life Sciences, Insect Science Keywords: reproducibility; open access; data curation; data management; pre-print servers

Online: 18 January 2020 (09:05:49 CET)

Preprint ARTICLE | doi:10.20944/preprints201807.0534.v1

A Dispersion Test for the Modified Borel-Tanner Distribution

John Best, John Rayner

Subject: Computer Science And Mathematics, Probability And Statistics Keywords: covariates; crab data; foetal lamb data; orthonormal polynomials; Poisson distribution

Online: 27 July 2018 (05:20:44 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0440.v1

An Efficient Grid-based K-prototypes Algorithm for Sustainable Decision Making Using Spatial Objects

Hong-Jun Jang, Byoungwook Kim, Jongwan Kim, Soon-Young Jung

Subject: Computer Science And Mathematics, Computational Mathematics Keywords: clustering; spatial data; grid-based k-prototypes; data mining; sustainability

Online: 27 June 2018 (10:21:22 CEST)

Preprint ARTICLE | doi:10.20944/preprints201805.0353.v1

A Study on the Improvement of Thermal Energy Efficiency for District Thermal Energy Consumer Facility based on Reinforcement Learning

Young-gon Kim, Keol Heo, Ga-Eun You, Hyun-Seo Lim, Jung-In Choi, Jae-Sik Eom

Subject: Computer Science And Mathematics, Computer Science Keywords: big data; big data system; energy; district heating; reinforcement learning

Online: 24 May 2018 (16:05:27 CEST)

Preprint ARTICLE | doi:10.20944/preprints201804.0054.v1

Metadata Life Cycles, Use Cases and Hierarchies

Ted Habermann

Subject: Computer Science And Mathematics, Other Keywords: metadata; documentation; data life-cycle; metadata life-cycle; hierarchical data

Online: 4 April 2018 (08:16:15 CEST)

Preprint ARTICLE | doi:10.20944/preprints201710.0076.v2

Addressing Complexities of Machine Learning in Big Data: Principles, Trends and Challenges from Systematical Perspectives

Qi Wang, Xia Zhao, Jincai Huang, Yanghe Feng, Jiahao Su, Zhihao Luo

Subject: Computer Science And Mathematics, Information Systems Keywords: big data; machine learning; regularization; data quality; robust learning framework

Online: 17 October 2017 (03:47:41 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202309.1969.v1

A Comprehensive Analysis of the Public Discourse on Twitter about Exoskeletons from 2017 to 2023

Nirmalya Thakur, Kesha A. Patel, Audrey Poon, Rishika Shah, Nazif Azizi, Changhee Han

Subject: Computer Science And Mathematics, Computer Science Keywords: Twitter; Data Analysis; Big Data; Exoskeletons; Data Science; Text Analysis; Sentiment Analysis; Content Analysis; Natural Language Processing

Online: 28 September 2023 (13:25:30 CEST)

The work of this paper presents multiple novel findings from a comprehensive analysis of about 150,000 tweets about exoskeletons posted between May 2017 and May 2023. First, content analysis and temporal analysis of these tweets reveal the specific months of each year in which a significantly higher volume of tweets was posted, as well as the time windows in which the highest number of tweets, the lowest number of tweets, the tweets with the most hashtags, and the tweets with the most user mentions were posted. Second, the paper shows that there are statistically significant correlations between the number of tweets posted per hour and different characteristics of these tweets. Third, the paper presents a multiple linear regression model that predicts the number of tweets posted per hour from these characteristics; the model achieved an R² score of 0.9540. Fourth, the paper reports that the 10 most popular hashtags were #exoskeleton, #robotics, #iot, #technology, #tech, #innovation, #ai, #sci, #construction, and #news. Fifth, sentiment analysis of these tweets was performed using VADER and the DistilRoBERTa-base model. The results show that the percentages of positive, neutral, and negative tweets were 46.8%, 33.1%, and 20.1%, respectively. Among the tweets that did not express a neutral sentiment, surprise was the most common sentiment, followed by joy, disgust, sadness, fear, and anger. Furthermore, analysis of hashtag-specific sentiments revealed several novel insights; for instance, in almost all months of 2022, the use of #ai in tweets about exoskeletons was mainly associated with a positive sentiment. Sixth, text processing-based approaches were used to detect possibly sarcastic tweets and tweets that contained news. 
Finally, the paper presents a comparison of positive tweets, negative tweets, neutral tweets, possibly sarcastic tweets, and tweets that contained news in terms of different characteristic properties of these tweets. The findings reveal multiple novel insights; for instance, the average number of hashtags used in tweets that contained news has increased considerably since January 2022.
