Search | Preprints.org

Preprint ARTICLE | doi:10.20944/preprints202402.1296.v1

Causal Meta-Reinforcement Learning for Multimodal Remote Sensing Data Classification

Wei Zhang, Xuesong Wang, Haoyu Wang, Yuhu Cheng

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Multimodal data; remote sensing; reinforcement learning; meta-learning; causal learning

Online: 22 February 2024 (15:30:22 CET)

Preprint ARTICLE | doi:10.20944/preprints201808.0350.v2

Integration of Data Mining Clustering Approach with the Personalized E-Learning System

Samina Kausar, Huahu Xu, Iftikhar Hussain, Wenhau Zhu, Misha Zahid

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: big data; clustering; data mining; educational data mining; e-learning; profile learning

Online: 19 October 2018 (05:58:05 CEST)

Preprint ARTICLE | doi:10.20944/preprints201801.0231.v1

Analysis of Occupational Accidents in Underground and Surface Mining in Spain Using Data Mining Techniques

Lluís Sanmiquel, Marc Bascompta, Josep Ma. Rossell, Hernán Anticoi, Eduard Guash

Subject: Engineering, Control And Systems Engineering Keywords: Data mining; Association rules; Previous Cause; Type of Accident; Overexertion

Online: 24 January 2018 (19:40:52 CET)

Preprint ARTICLE | doi:10.20944/preprints201906.0144.v1

Applications of Data Mining Algorithms for Network Security

Kai Chain

Subject: Computer Science And Mathematics, Security Systems Keywords: data mining; network security; association rules; DDoS

Online: 16 June 2019 (02:42:59 CEST)

Preprint ARTICLE | doi:10.20944/preprints202311.1570.v1

Consore: A Powerful Federated Data Mining Tool Driving a French Research Network to Accelerate Cancer Research

Julien Guérin, Amine Nahid, Louis Tassy, Marc Deloger, François Bocquet, Simon Thézenas, Emmanuel Desandes, Marie-Cécile Le Deley, Xavier DURANDO, Anne Jaffré, Ikram Es Saad, Hugo Crochet, Marie Le Morvan, François Lion, Judith Raimbourg, Oussama Khay, Franck Craynest, Alexia Giro, Yec'han Laizet, Aurélie Bertaut, Frédérik Joly, Alain Livartowski, Pierre Etienne Heudel

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: cancer research; cancer; natural language processing; data mining; data warehouse; big data

Online: 26 November 2023 (05:13:14 CET)

Preprint ARTICLE | doi:10.20944/preprints201909.0040.v1

Application of Data Mining on Web Usage Data for Security: WebSecuDMiner

Muhammad Zia Aftab Khan, Jihyun Park

Subject: Business, Economics And Management, Business And Management Keywords: data mining; security; association rule; ECLAT

Online: 4 September 2019 (03:48:58 CEST)

Preprint ARTICLE | doi:10.20944/preprints202105.0102.v1

Implying Association Rule Mining and Market Basket Analysis for Knowing Consumer Behavior and Buying Pattern in Lockdown - A Data Mining Approach

Anurag Sinha

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Market basket analysis; association rule mining; buying pattern; data mining

Online: 6 May 2021 (15:14:25 CEST)

Preprint ARTICLE | doi:10.20944/preprints202012.0529.v1

Bibliometric Knowledge Mapping of E-Commerce Platform Operation on Data Mining

Min Ye, Hongxia Li

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: e-commerce; big data; bibliometric analysis; knowledge mapping

Online: 21 December 2020 (14:24:06 CET)

In the digital economy era, the e-commerce platform has evolved into a data platform ecosystem built around data resources and data mining technology systems. The most typical applications of big data are also concentrated in the field of e-commerce. E-commerce companies should first grasp the interactive relationship among the three major factors of data, technology, and innovation. E-commerce platform operation is a multidisciplinary research field, and it is not easy for researchers to obtain a panoramic view of its knowledge structure. A knowledge graph is a kind of graph that shows the development process and structural relationships of knowledge, taking a field of knowledge as its object. It is not only a visual knowledge map but also a serialized knowledge pedigree, which provides researchers with a quantitative, statistical method for studying development trends and academic status. The purpose of this research is to help researchers understand the key knowledge, evolutionary trends, and research frontiers of current research. This study uses CiteSpace bibliometric analysis to analyze data from the Science Net database and finds that: 1) the development of the research field has gone through three stages, and some representative key scholars and key documents have been recognized; 2) knowledge mapping of the literature, through the co-occurrence of citations and keywords, shows research hotspots; 3) the results of burst detection and central node analysis reveal research frontiers and development trends. Today, the visualization of big data brings different challenges. The abstraction between the world and today's data visualization occurs when the data is captured. Every user sees his own visualization data generated by standardized calculations. At the same time, there are still many controversies regarding the theoretical model, structure, and structural dimensions. These are directions that future researchers need to study further.

Preprint ARTICLE | doi:10.20944/preprints202111.0440.v1

Hybrid Algorithm for Anomaly Removal in Time Series Data Mining

Abdul Razaque, Marzhan Abenova, Munif Alotaibi, Bandar Alotaibi, Hamoud Alshammari, Salim Hariri, Aziz Alotaibi

Subject: Engineering, Control And Systems Engineering Keywords: time series; NMP algorithm; anomalies; data mining; similarities in time series; clustering

Online: 23 November 2021 (17:51:42 CET)

Preprint ARTICLE | doi:10.20944/preprints202105.0601.v1

A Study on Ways to Improve Mobile RPG Using Big Data Text Mining

DongHyun Youm, JungYoon Kim

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Mobile RPG; Big Data; Text Mining; Topic Modeling

Online: 25 May 2021 (10:21:36 CEST)

Preprint ARTICLE | doi:10.20944/preprints202309.1930.v1

Data Mining and Fusion Framework for In-Home Monitoring Applications

Idongesit Ekerete, Matias Garcia-Constantino, Paul McCullagh, Christopher Nugent, James McLaughlin

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: sensing solution; thermal sensor; Radar sensor; sensor fusion; data mining; in-home; machine learning

Online: 28 September 2023 (10:06:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints201809.0466.v1

Topological Signature of 19th Century Novelists: Persistence Homology in Context-Free Text Mining

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Subject: Computer Science And Mathematics, Information Systems Keywords: topological data analysis; text mining; computational topology; style; persistent homology

Online: 24 September 2018 (15:33:02 CEST)

Preprint REVIEW | doi:10.20944/preprints201911.0338.v1

Sentiment Analysis on Indian Indigenous Languages: A Review on Multilingual Opinion Mining

Sonali Rajesh Shah, Abhishek Kaushik

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Indian; Sentiment Analysis; Indigenous Languages; Machine Learning; Deep learning; Data; Opinion Mining; Languages.

Online: 27 November 2019 (09:30:07 CET)

Preprint ARTICLE | doi:10.20944/preprints201708.0055.v1

A Survey of Data Processing of EMR (Electronic Medical Record) Based on Data Mining

Wencheng Sun, Fang Liu, Zhiping Cai, Shengqun Fang, Guoyan Wang

Subject: Computer Science And Mathematics, Information Systems Keywords: EMR; data preprocessing; text mining; information extraction; medical decision support system

Online: 15 August 2017 (05:46:43 CEST)

Preprint ARTICLE | doi:10.20944/preprints202008.0074.v1

Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids

Edy Irwansyah, Ebiet Salim Pratama, Margaretha Ohyver

Subject: Computer Science And Mathematics, Probability And Statistics Keywords: data mining; cardiovascular diseases; cluster analysis; principle component analysis

Online: 4 August 2020 (03:56:19 CEST)

Preprint ARTICLE | doi:10.20944/preprints202304.0048.v1

Evolutionary Multi-Objective Optimization of Extrusion Barrier Screws: Data Mining and Decision Making

António Gaspar-Cunha, Paulo Costa, Alexandre Cláudio Botazzo Delbem, Francisco Monaco, M. J. Ferreira, José A. Covas

Subject: Engineering, Other Keywords: Polymer Extrusion; Barrier Screws; Multi-Objective Optimization; Data Mining, Decision Making; Number of Objectives reduction

Online: 4 April 2023 (14:33:09 CEST)

Preprint REVIEW | doi:10.20944/preprints202108.0345.v1

Educational Data Mining, Student Academic Performance Prediction, Prediction Methods, Algorithms and Tools: An Overview of Reviews

Chaka Chaka

Subject: Social Sciences, Education Keywords: student academic performance; educational data mining; methods; algorithms; tools; higher education; overview

Online: 16 August 2021 (14:04:57 CEST)

Preprint ARTICLE | doi:10.20944/preprints202401.0571.v1

Hybrid-electric Vehicle Powertrain Mounting System Optimization Based on Cross-industry Standard Process for Data Mining

Yudong Wu, Dandan Zhao, Jingyuan Peng, Xingyu Xiang, Haibo Huang

Subject: Engineering, Automotive Engineering Keywords: Hybrid-electric vehicle powertrain mounting; Data-mining; Mounting stiffness; Multi-SVR; MRTs; MLPR

Online: 8 January 2024 (07:10:00 CET)

Preprint ARTICLE | doi:10.20944/preprints201704.0117.v1

Mining Productive-associated Periodic Frequent Pattern in Body Sensor Data for Smart Home Care

Wala Ismail, Mohammad Mehedi Hassan

Subject: Computer Science And Mathematics, Information Systems Keywords: Body sensor network; Smart home, knowledge discovery in BSN data; frequent patterns; periodic patterns and productive pattern.

Online: 18 April 2017 (18:15:50 CEST)

Preprint ARTICLE | doi:10.20944/preprints202311.0905.v1

Mining Negative Associations From Medical Databases Considering Frequent, Regular, Closed and Maximal Patterns

Sastry Kodanda Rama Jammalamadaka, Raja Rao Budaraju

Subject: Computer Science And Mathematics, Computer Science Keywords: data mining; databases; closed item sets; maximal item sets; regular patterns; frequent patterns; negative associations; maximal patterns; frequent patterns; static

Online: 14 November 2023 (10:12:27 CET)

Preprint ARTICLE | doi:10.20944/preprints202201.0445.v1

Internet of Things-Driven Data Mining for Smart Crop Production Prediction in the Peasant Farming Domain

Luis Omar Colombo-Mendoza, Mario Andrés Paredes-Valverde, María del Pilar Salas-Zárate, Rafael Valencia-García

Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: data mining; predictive analytics; Internet of Things; peasant farming; smart farming system; crop production prediction

Online: 31 January 2022 (10:58:30 CET)

Preprint ARTICLE | doi:10.20944/preprints201906.0202.v1

Developing a Data Mining Based Model to Extract Predictor Factors in Energy Systems: Application of Global Natural Gas Demand

Reza Hafezi, Amir Naser Akhavan, Mazdak Zamani, Saeed Pakseresht, Shahab Shamshirband

Subject: Engineering, Mechanical Engineering Keywords: Natural gas demands; Prediction; Energy market; Genetic algorithm; Artificial neural network; Data mining.

Online: 20 June 2019 (15:58:25 CEST)

Preprint ARTICLE | doi:10.20944/preprints201803.0021.v2

Map Archive Mining: Visual-analytical Approaches to Explore Large Historical Map Collections

Johannes Uhl, Stefan Leyk, Yao-Yi Chiang, Weiwei Duan, Craig A. Knoblock

Subject: Environmental And Earth Sciences, Remote Sensing Keywords: map processing; retrospective landscape analysis; visual data mining, image retrieval, low-level image descriptors, color moments, t-distributed stochastic neighborhood embedding, USGS topographic maps, Sanborn fire insurance maps

Online: 17 April 2018 (09:23:37 CEST)

Preprint ARTICLE | doi:10.20944/preprints201608.0202.v2

Remote Sensing and Data Mining Techniques for Assessing the Urban Fabric Vulnerability to Heat Waves and UHI

Flavio Borfecchia, Vittorio Rosato, Emanuela Caiaffa, Maurizio Pollino, Luigi De Cecco, Luigi La Porta, Simone Ombuen, Lorenzo Barbieri, Federica Benelli, Flavio Camerata, Valeria Pellegrini, Andrea Filpa

Subject: Environmental And Earth Sciences, Environmental Science Keywords: HR satellite remote sensing; urban fabric vulnerability; UHI & heat waves; landsat & MODIS sensors; LST & urban heating; segmentation & objects classification; data mining; feature extraction & selection; stepwise regression & model calibration

Online: 26 October 2021 (13:11:23 CEST)

Densely urbanized areas with a low percentage of green vegetation are highly exposed to Heat Waves (HW), which are increasing in frequency and intensity, including in the middle-latitude regions, due to ongoing Climate Change (CC). Their negative effects may combine with those of the Urban Heat Island (UHI), a local phenomenon in which air temperatures in the compact built-up cores of towns rise more than those in the surrounding rural areas, with significant impact on the quality of the urban environment, on citizens' health, and on energy consumption and transport, as occurred in the summer of 2003 in France and in the central-northern areas of Italy. In this context, this work aims at designing and developing a methodology based on aero-spatial remote sensing (EO) at medium-high resolution and the most recent GIS techniques for the extensive characterization of the urban fabric response to these temperature-related climatic impacts, within the general framework of supporting local and national strategies and policies of adaptation to CC. Due to its extent and variety of built-up typologies, the municipality of Rome was selected as the test area for methodology development and validation. We started with photointerpretation of detailed-scale cartography (CTR 1:5000) over a reference area consisting of a transect of about 5x20 km, extending from the downtown to the suburbs and including all the built-up classes of interest. The reference built-up vulnerability classes found inside the transect were then used as training areas to classify the entire territory of the Rome municipality. To this end, satellite EO HR (High Resolution) multispectral data provided by the Landsat sensors were used within a purpose-built "supervised" classification procedure based on data mining and "object-classification" techniques. The classification results were then used to implement a calibration method, based on a typical UHI temperature distribution derived from MODIS satellite sensor LST (Land Surface Temperature) data for the summer of 2003, to obtain an analytical expression of the vulnerability model, previously introduced on a semi-empirical basis.

Preprint ARTICLE | doi:10.20944/preprints202404.0169.v1

Visualising Daily PM10 Pollution in an Open-Cut Mining Valley of New South Wales, Australia - Part II: Classification of Synoptic Circulation Types and Local Meteorological Patterns and Their Relation to Elevated Air Pollution in Spring and Summer

Ningbo Jiang, Matthew Riley, Merched Azzi, Giovanni Di Virgilio, Hiep Nguyen Duc, Praveen Puppala

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: PM10 pollution; local meteorological pattern; synoptic circulation type; self-organising map (SOM); air pollution conduciveness; data clustering; data visualisation; open-cut mining valley

Online: 2 April 2024 (07:42:50 CEST)

Preprint REVIEW | doi:10.20944/preprints202003.0141.v1

Sharing Is Caring – Data Sharing Initiatives in Healthcare

Tim Hulsen

Subject: Medicine And Pharmacology, Other Keywords: data sharing; data management; data science; big data; healthcare

Online: 8 March 2020 (16:46:20 CET)

Preprint ARTICLE | doi:10.20944/preprints202404.1018.v1

Discovering Data Domains and Products in Data Meshes Using Semantic Blueprints

Michalis Pingos, Andreas S. Andreou

Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data; Data Lakes; Data Meshes; Data Products; Data Blueprints; Metadata Semantic Enrichment

Online: 16 April 2024 (16:26:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202206.0320.v4

Ten Simple Rules for Using Public Biological Data for Your Research

Vishal Oza, Jordan Whitlock, Elizabeth Wilk, Angelina Uno-Antonison, Brandon Wilk, Manavalan Gajapathy, Timothy Howton, Austyn Trull, Lara Ianov, Elizabeth Worthey, Brittany Lasseigne

Subject: Biology And Life Sciences, Other Keywords: data; reproducibility; FAIR; data reuse; public data; big data; analysis

Online: 2 November 2022 (02:55:49 CET)

Preprint ARTICLE | doi:10.20944/preprints202003.0268.v1

TEEDA: An Interactive Platform for Matching Data Providers and Users in Data Marketplace

Teruaki Hayashi, Yukio Ohsawa

Subject: Social Sciences, Library And Information Sciences Keywords: matching; data marketplace; data platform; data visualization; call for data

Online: 17 March 2020 (04:10:28 CET)

Preprint ARTICLE | doi:10.20944/preprints202402.0602.v1

The Improvement of the Use of Open Data in Public Institutions

Besart Hyseni, Lejla Abazi Bexheti

Subject: Computer Science And Mathematics, Information Systems Keywords: Improving use of open data; data utilization; data optimization; enhancing data access; open data impact; open data government; data transparency; data-driven decision making

Online: 12 February 2024 (09:34:51 CET)

Preprint REVIEW | doi:10.20944/preprints202309.2113.v1

Navigating the Data Architecture Landscape: A Comparative Analysis of Data Warehouse, Data Lake, Data Lakehouse, and Data Mesh

Benjamin Wong

Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: Data, DWH, Data Warehouse, Architecture, Data Lake, Storage, Analysis, Data Mesh, Analytical, Architectural, Data Vault

Online: 3 October 2023 (03:28:55 CEST)

Preprint ARTICLE | doi:10.20944/preprints202403.0265.v1

Security and Ownership in User Defined Data Meshes

Michalis Pingos, Panayiotis Christodoulou, Andreas S. Andreou

Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data; Smart Data Processing; Systems of Deep Insight; Data Meshes; Data Lakes; Data Products; Blockchain; NFT; Data Blueprints

Online: 5 March 2024 (15:04:49 CET)

Preprint ARTICLE | doi:10.20944/preprints202304.0130.v1

Data Cooperatives as Catalysts for Collaboration, Data Sharing, and the (Trans)Formation of the Digital Commons

Michael Max Bühler, Igor Calzada, Isabel Cane, Thorsten Jelinek, Astha Kapoor, Morshed Mannan, Sameer Mehta, Marina Micheli, Vijay Mookerje, Konrad Nübel, Alex Pentland, Trebor Scholz, Divya Siddarth, Julian Tait, Bapu Vaitla, Jianguo Zhu

Subject: Computer Science And Mathematics, Other Keywords: data; cooperatives; open data; data stewardship; data governance; digital commons; data sovereignty; open digital federation platform

Online: 7 April 2023 (14:14:02 CEST)

Network effects, economies of scale, and lock-in effects increasingly lead to a concentration of digital resources and capabilities, hindering the free and equitable development of digital entrepreneurship (SDG9), new skills, and jobs (SDG8), especially in small communities (SDG11) and their small and medium-sized enterprises (“SMEs”). To ensure the affordability and accessibility of technologies, promote digital entrepreneurship and community well-being (SDG3), and protect digital rights, we propose data cooperatives [1,2] as a vehicle for secure, trusted, and sovereign data exchange [3,4]. In post-pandemic times, community/SME-led cooperatives can play a vital role by ensuring that supply chains to support digital commons are uninterrupted, resilient, and decentralized [5]. Digital commons and data sovereignty provide communities with affordable and easy access to information and the ability to collectively negotiate data-related decisions. Moreover, cooperative commons (a) provide access to the infrastructure that underpins the modern economy, (b) preserve property rights, and (c) ensure that privatization and monopolization do not further erode self-determination, especially in a world increasingly mediated by AI. Thus, governance plays a significant role in accelerating communities’/SMEs’ digital transformation and addressing their challenges. Cooperatives thrive on digital governance and standards such as open trusted Application Programming Interfaces (APIs) that increase the efficiency, technological capabilities, and capacities of participants and, most importantly, integrate, enable, and accelerate the digital transformation of SMEs in the overall process. This policy paper presents and discusses several transformative use cases for cooperative data governance. The use cases demonstrate how platform/data cooperatives and their novel value creation can be leveraged to take digital commons and value chains to a new level of collaboration while addressing the most pressing community issues. The proposed framework for a digital federated and sovereign reference architecture will create a blueprint for sustainable development in both the Global South and North.

Preprint COMMUNICATION | doi:10.20944/preprints202401.0780.v1

Data Reuse in Agricultural Genomics Research: Present Challenges and Future Solutions

Alenka Hafner, Victoria DeLeo, Cecilia Deng, Christine G. Elsik, Damarius Fleming, Peter W. Harrison, Theodore S. Kalbfleisch, Bruna Petry, Boas Pucker, Elsa H. Quezada-Rodríguez, Christopher K. Tuggle, James Koltes

Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: data reuse; agriculture; open data; metadata; data standards; equity

Online: 10 January 2024 (10:07:03 CET)

Preprint ARTICLE | doi:10.20944/preprints202311.0104.v1

Conceptual Design of a Generic Data Harmonization Process for OMOP CDM

Elisa Henke, Michele Zoch, Yuan Peng, Ines Reinecke, Martin Sedlmayr, Franziska Bathelt

Subject: Public Health And Healthcare, Other Keywords: OMOP; OHDSI; interoperability; data harmonization; clinical data; claims data

Online: 2 November 2023 (07:45:02 CET)

Preprint ARTICLE | doi:10.20944/preprints202308.1237.v1

A Method to Enable Automatic Extraction of Cost and Quantity Data from Hierarchical Construction Information Documents to Enable Rapid Digital Comparison and Analysis

Daniel Adanza Dopazo, Lamine Mahdjoubi, Bill Gething

Subject: Engineering, Transportation Science And Technology Keywords: data mining; data extraction; data science; cost infrastructure projects

Online: 17 August 2023 (09:25:22 CEST)

Preprint ARTICLE | doi:10.20944/preprints202306.1378.v1

Algorithm-based Data Generation (ADG) Engine for Data Analytics

Iman I. M. Abu Sulayman, Peter Voege, Abdelkader Ouda

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Data Generation; Anomaly Data; User Behavior Generation; Big Data

Online: 19 June 2023 (16:31:37 CEST)

Preprint REVIEW | doi:10.20944/preprints202007.0153.v1

A Hitchhiker’s Guide to Working with Large, Open-Source Neuroimaging Datasets

Corey Horien, Stephanie Noble, Abigail Greene, Kangjoo Lee, Daniel Barron, Siyuan Gao, Dave O'Connor, Mehraveh Salehi, Javid Dadashkarimi, Xilin Shen, Evelyn Lake, R. Todd Constable, Dustin Scheinost

Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: Open-science; big data; fMRI; data sharing; data management

Online: 8 July 2020 (11:53:33 CEST)

Preprint ARTICLE | doi:10.20944/preprints201810.0273.v1

Russian-German Astroparticle Data Life Cycle Initiative

Igor Bychkov, Andrey Demichev, Julia Dubenskaya, Oleg Fedorov, Andreas Haungs, Andreas Heiss, Yulia Kazarina, Elena Korosteleva, Dmitriy Kostunin, Alexander Kryukov, Andrey Mikhailov, Minh-Duc Nguyen, Stanislav Polyakov, Evgeny Postnikov, Alexey Shigarov, Dmitry Shipilov, Achim Streit, Viktoria Tokareva, Doris Wochele, Jürgen Wochele, Dmitry Zhurov

Subject: Physical Sciences, Astronomy And Astrophysics Keywords: astroparticle physics, cosmic rays, data life cycle management, data curation, meta data, big data, deep learning, open data

Online: 12 October 2018 (14:48:32 CEST)

Preprint ARTICLE | doi:10.20944/preprints202105.0589.v1

A Study on Ways to Extend Public Data for Game Ratings from Korea

HoSeong Kang, JungYoon Kim

Subject: Engineering, Automotive Engineering Keywords: Game Ratings; Public Data; Game Data; Data analysis; GRAC(Korea)

Online: 25 May 2021 (08:32:32 CEST)

Preprint ARTICLE | doi:10.20944/preprints202007.0078.v1

Data Driven Analytics for Personalized Medical Decision Making

Nataliia Melnykova, Nataliya Shakhovska, Michal Gregus, Volodymyr Melnykov, Mariana Zakharchuk, Olena Vovk

Subject: Computer Science And Mathematics, Information Systems Keywords: personalization; decision making; medical data; artificial intelligence; Data-driving; Big Data; Data Mining; Machine Learning

Online: 5 July 2020 (15:04:17 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0849.v1

Intellecta Cognitiva: A Comprehensive Dataset for Advancing Academic Knowledge and Machine Reasoning

Ditto PS, Ajmal PS, Jithin VG

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Synthetic data; pretrain data; llm training

Online: 12 April 2024 (12:46:27 CEST)

Preprint ARTICLE | doi:10.20944/preprints202103.0593.v1

Creating a Business and Supporting Digital Transformation

Miguel Ayala, Jorge Portella, Sergio Martinez, Maria Rojas, Luis Jimenez

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Business Intelligence; Data Mining; Data Warehouse.

Online: 24 March 2021 (13:47:31 CET)

Preprint ARTICLE | doi:10.20944/preprints202012.0468.v1

Developing High-Resolution Gridded Rainfall and Temperature Data for Bangladesh: The ENACTS-BMD Dataset

Nachiketa Acharya, Rija Faniriantsoa, Bazlur Rashid, Razia Sultana, Carlo Montes, Tufa Dinku, S.M.Q. Hassan

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: climate data; gridded product; data merging

Online: 18 December 2020 (13:29:38 CET)

Preprint CASE REPORT | doi:10.20944/preprints201801.0066.v1

Data Visualization of European Regional Operational Programmes: Unleashing the Informative Potential of Open Data for Performance Assessment

Emanuele Frontoni, Roberto Palloni

Subject: Engineering, Control And Systems Engineering Keywords: cohesion policy; data visualization; open data

Online: 8 January 2018 (11:11:47 CET)

Preprint ARTICLE | doi:10.20944/preprints202403.0012.v1

Flexible Techniques to Detect Typical Hidden Errors in Large Longitudinal Datasets

Renato Bruni, Cinzia Daraio, Simone Di Leo

Subject: Computer Science And Mathematics, Computer Science Keywords: big data; information processing; information reconstruction; data quality: longitudinal data sequences

Online: 1 March 2024 (10:33:16 CET)

Preprint COMMUNICATION | doi:10.20944/preprints202309.0047.v1

Analyzing Public Reactions during the MPox Outbreak: Findings from Topic Modeling of Tweets

Nirmalya Thakur, Yuvraj Nihal Duggal, Zihui Liu

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: MPox; big data; data analysis; data science; Twitter; natural language processing

Online: 1 September 2023 (10:23:41 CEST)

In the last decade and a half, the world has experienced outbreaks of a range of viruses such as COVID-19, H1N1, flu, Ebola, Zika Virus, Middle East Respiratory Syndrome (MERS), Measles, and West Nile Virus, to name just a few. During these virus outbreaks, the usage and effectiveness of social media platforms increased significantly, as such platforms served as virtual communities, enabling their users to share and exchange information, news, perspectives, opinions, ideas, and comments related to the outbreaks. Analysis of this Big Data of conversations related to virus outbreaks using concepts of Natural Language Processing such as Topic Modeling has attracted the attention of researchers from different disciplines such as Healthcare, Epidemiology, Data Science, Medicine, and Computer Science. The recent outbreak of the MPox virus has resulted in a tremendous increase in the usage of Twitter. Prior works in this field have primarily focused on the sentiment analysis and content analysis of these Tweets, and the few works that have focused on topic modeling have multiple limitations. This paper aims to address this research gap and makes two scientific contributions to this field. First, it presents the results of performing Topic Modeling on 601,432 Tweets about the 2022 MPox outbreak, which were posted on Twitter between May 7, 2022, and March 3, 2023. The results indicate that the conversations on Twitter related to MPox during this time range may be broadly categorized into four distinct themes: Views and Perspectives about MPox, Updates on Cases and Investigations about MPox, MPox and the LGBTQIA+ Community, and MPox and COVID-19. Second, the paper presents the findings from the analysis of these Tweets. The results show that the most popular theme on Twitter (in terms of the number of Tweets posted) during this time range was Views and Perspectives about MPox. It is followed by the theme of MPox and the LGBTQIA+ Community, which is followed by the themes of MPox and COVID-19 and Updates on Cases and Investigations about MPox, respectively. Finally, a comparison with prior works in this field is also presented to highlight the novelty and significance of this research work.

Preprint ARTICLE | doi:10.20944/preprints202205.0344.v1

Transforming Points of Single Contact Data into Linked Data

Pavlina Fragkou, Leandros Maglaras

Subject: Computer Science And Mathematics, Information Systems Keywords: Linked (open) Data; Semantic Interoperability; Data Mapping; Governmental Data; SPARQL; Ontologies

Online: 25 May 2022 (08:18:46 CEST)

Preprint ARTICLE | doi:10.20944/preprints202111.0073.v1

Using the Data Quality Dashboard to Improve the EHDEN Network

Clair Blacketer, Erica A Voss, Frank DeFalco, Nigel Hughes, Martijn J Schuemie, Maxim Moinat, Peter Rijnbeek

Subject: Medicine And Pharmacology, Other Keywords: data quality; OMOP CDM; EHDEN; healthcare data; real world data; RWD

Online: 3 November 2021 (09:12:54 CET)

Preprint ARTICLE | doi:10.20944/preprints202110.0103.v1

Usage of Data Analytics in Improving Sourcing of Supply Chain Inputs

S M Nazmuz Sakib

Subject: Computer Science And Mathematics, Information Systems Keywords: Data Analytics; Analytics; Supply Chain Input; Supply Chain; Data Science; Data

Online: 6 October 2021 (10:38:42 CEST)

Preprint ARTICLE | doi:10.20944/preprints202310.1998.v1

Marburg Virus Outbreak and a New Conspiracy Theory: Findings from a Comprehensive Analysis of Web Behavior

Nirmalya Thakur, Shuqi Cui, Kesha A. Patel, Nazif Azizi, Victoria Knieling, Changhee Han, Audrey Poon, Rishika Shah

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Marburg virus; big data; data mining; data analysis; google trends; web behavior; data science; conspiracy theory

Online: 31 October 2023 (07:02:07 CET)

During virus outbreaks in the recent past, web behavior mining, modeling, and analysis have served as means to examine, explore, interpret, assess, and forecast the worldwide perception, readiness, reactions, and response linked to these outbreaks. The recent outbreak of Marburg virus disease (MVD), the high fatality rate of MVD, and the conspiracy theory linking the FEMA alert signal in the United States on October 4, 2023, with MVD and a zombie outbreak resulted in a diverse range of reactions among the general public, which led to a surge in web behavior in this context. As a result, “Marburg Virus” featured in the list of the top trending topics on Twitter on October 3, 2023, and “Emergency Alert System” and “Zombie” featured in the list of top trending topics on Twitter on October 4, 2023. No prior work in this field has mined and analyzed the emerging trends in web behavior in this context. The work presented in this paper aims to address this research gap and makes multiple scientific contributions to this field. First, it presents the results of performing time series forecasting of the search interests related to MVD emerging from 216 different regions on a global scale using ARIMA, LSTM, and Autocorrelation. The results of this analysis present the optimal model for forecasting web behavior related to MVD in each of these regions. Second, the correlation between search interests related to MVD and search interests related to zombies (in the context of this conspiracy theory) was investigated. The findings show that there were several regions where there was a statistically significant correlation between MVD-related searches and zombie-related searches (in the context of this conspiracy theory) on Google on October 4, 2023. Finally, the correlation between zombie-related searches (in the context of this conspiracy theory) in the United States and other regions was investigated. This analysis helped to identify those regions where this correlation was statistically significant.
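The correlation step described in this abstract can be illustrated with a small sketch. The abstract does not say which correlation coefficient or significance test was used, so the Pearson coefficient and the associated t statistic below are assumptions, and the function names and the sample series are hypothetical values chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("series must have equal length of at least 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical hourly search-interest values (0-100 scale) for one region.
mvd_interest = [12, 18, 35, 60, 100, 74, 41, 22]
zombie_interest = [5, 9, 30, 55, 96, 80, 33, 15]

r = pearson_r(mvd_interest, zombie_interest)
t = t_statistic(r, len(mvd_interest))
print(f"r = {r:.3f}, t = {t:.2f}")
```

To decide significance, the t statistic would be compared with the critical value of Student's t distribution for n - 2 degrees of freedom at the chosen level.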

Preprint ARTICLE | doi:10.20944/preprints202308.0442.v1

Instrumental and Observational Problems of the Earliest Temperature Records in Italy: A Methodology for Data Recovery and Correction

Dario Camuffo, Antonio Della Valle, Francesca Becherini

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Thermometers; Temperature records; Early instrumental meteorological series; Data rescue; Data recovery; Data correction; Climate data analysis

Online: 7 August 2023 (03:01:24 CEST)

A distinction is made between data rescue (i.e., copying, digitizing and archiving) and data recovery, which implies deciphering, interpreting and transforming early instrumental readings and their metadata to obtain high-quality datasets in modern units. This requires a multidisciplinary approach that includes: palaeography and knowledge of Latin and other languages to read the handwritten logs and additional documents; history of science to interpret the original text, data and metadata within the cultural frame of the 17th, 18th and early 19th century; physics and technology to recognize the bias of early instruments or calibrations, or to correct for observational bias; astronomy to calculate and transform the original time in canonical hours that started from twilight. The liquid-in-glass thermometer was invented in 1641 and the earliest temperature records started in 1654. Since then, different types of thermometers were invented, based on the thermal expansion of air or selected thermometric liquids with deviation from linearity. Reference points, thermometric scales, and calibration methodologies were not comparable, and not always adequately described. Thermometers had various locations and exposures, e.g., indoor, outdoor, on windows, gardens or roofs, facing different directions. Readings were made only one or a few times a day, not necessarily respecting a precise time schedule: this bias is analysed for the most popular combinations of reading times. The time was based on sundials and the local Sun, but the hours were counted starting from twilight. In 1789-90 Italy changed its system and all cities counted hours from the Sun's lower culmination (i.e., local midnight), so that every city had its local time; in 1866, all the Italian cities followed the local time of Rome; in 1893, the whole of Italy adopted the present-day system, based on the Coordinated Universal Time and the time zones. In 1873, when the International Meteorological Organization (IMO) was founded, later transformed into the World Meteorological Organization (WMO), a standardization of instruments and observational protocols was established, and all data became fully comparable. In the early instrumental period, from 1654 to 1873, the comparison, correction and homogenization of records is quite difficult, mainly because of the scarcity or even absence of metadata. This paper deals with this confused situation, discussing the main problems, but also the methodologies to recognize missing metadata; distinguish indoor from outdoor readings; correct and transform early datasets in unknown or arbitrary units into modern units; and, finally, determine in which cases it is possible to reach the quality level required by WMO. The focus is to explain the methodology needed to recover early instrumental records, i.e., the operations that should be performed to interpret, correct, and transform the original raw data into a high-quality dataset of temperature, usable for climate studies.
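The time transformation mentioned in this abstract (hours counted from twilight converted to hours counted from local midnight) can be sketched as simple modular arithmetic. This is only an illustration under stated assumptions: the local time of twilight on the day of the reading is taken as a known input, the small day-to-day drift of twilight is ignored, and the function names and example values are hypothetical, not taken from the paper.

```python
def hours_from_twilight_to_local(hours_since_twilight, twilight_local_hour):
    """Convert a reading time counted from twilight into local time.

    hours_since_twilight: hours elapsed since the preceding twilight (0-24).
    twilight_local_hour: assumed local time of that twilight, in decimal
        hours counted from local midnight (for example 18.5 for 18:30).
    Returns decimal hours counted from local midnight, in [0, 24).
    """
    if not 0 <= hours_since_twilight <= 24:
        raise ValueError("hours_since_twilight must be within 0-24")
    if not 0 <= twilight_local_hour < 24:
        raise ValueError("twilight_local_hour must be within 0-24")
    return (twilight_local_hour + hours_since_twilight) % 24.0

def format_hhmm(decimal_hour):
    """Format decimal hours as HH:MM, rounding to the nearest minute."""
    total_minutes = round(decimal_hour * 60) % (24 * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

# Hypothetical log entry: reading taken at "hour 15" after a twilight at 18:30.
local = hours_from_twilight_to_local(15, 18.5)
print(format_hhmm(local))
```

Because twilight shifts through the year, the same counted hour maps to different local times in different seasons, which is why the twilight time has to be computed astronomically for each date.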

Preprint DATA DESCRIPTOR | doi:10.20944/preprints202308.1701.v1

A Dataset of Search Interests Related to Disease X Originating from Different Geographic Regions

Nirmalya Thakur, Kesha A. Patel, Isabella Hall, Yuvraj Nihal Duggal, Shuqi Cui

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: disease X; big data; data science; data analysis; dataset development; database; google trends; data mining; healthcare; epidemiology

Online: 24 August 2023 (05:48:54 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202303.0453.v1

Analysis of Public Discourse on Twitter involving COVID-19 and MPox: Findings from Sentiment Analysis and Text Analysis

Nirmalya Thakur

Subject: Social Sciences, Media Studies Keywords: COVID-19; MPox; Twitter; Big Data; Data Mining; Data Analysis; Sentiment Analysis; Data Science; Social Media; Monkeypox

Online: 27 March 2023 (08:39:28 CEST)

Mining and analysis of the Big Data of Twitter conversations have been of significant interest to the scientific community in the fields of healthcare, epidemiology, big data, data science, computer science, and their related areas, as can be seen from several works in the last few years that focused on sentiment analysis and other forms of text analysis of Tweets related to Ebola, E-Coli, Dengue, Human papillomavirus (HPV), Middle East Respiratory Syndrome (MERS), Measles, Zika virus, H1N1, influenza-like illness, swine flu, flu, Cholera, Listeriosis, cancer, Liver Disease, Inflammatory Bowel Disease, kidney disease, lupus, Parkinson's, Diphtheria, and West Nile virus. The recent outbreaks of COVID-19 and MPox have served as "catalysts" for Twitter usage related to seeking and sharing information, views, opinions, and sentiments involving both these viruses. While a few works published in the last few months focused on performing sentiment analysis of Tweets related to either COVID-19 or MPox, none of the prior works in this field thus far involved analysis of Tweets focusing on both COVID-19 and MPox at the same time. To address this research gap, a total of 61,862 Tweets that focused on MPox and COVID-19 simultaneously, posted between May 7, 2022, and March 3, 2023, were studied to perform sentiment analysis and text analysis. The findings of this study are manifold. First, the results of sentiment analysis show that almost half the Tweets (46.88%) had a negative sentiment, followed by Tweets that had a positive sentiment (31.97%) and Tweets that had a neutral sentiment (21.14%). Second, this paper presents the top 50 hashtags that were used in these Tweets. Third, it presents the top 100 most frequently used words that are featured in these Tweets. The findings of text analysis show that some of the commonly used words directly referred to either or both viruses. In addition, the presence of words such as "Polio", "Biden", "Ukraine", "HIV", "climate", and "Ebola" in the list of the top 100 most frequent words indicates that topics of conversations on Twitter in the context of COVID-19 and MPox also included a high level of interest related to other viruses, President Biden, and Ukraine. Finally, a comprehensive comparative study that involves a comparison of this work with 49 prior works in this field is presented to uphold the scientific contributions and relevance of the same.

Working Paper ARTICLE

Business Intelligence and Its Big Evolution

Andres Velosa, Gustavo Pabon

Subject: Engineering, Automotive Engineering Keywords: Business Intelligence; Data warehouse; Data Marts; Architecture; Data; Information; cloud; Data Mining; evolution; technologic companies; tools; software

Online: 24 March 2021 (13:06:53 CET)

Preprint ARTICLE | doi:10.20944/preprints202111.0410.v1

Design and Implementation of Efficient Transmission of Cloud Data in Wireless Media

Virendra Pandharipant Nikam, Sheetal S Dhande

Subject: Engineering, Control And Systems Engineering Keywords: Data compression; data hiding; psnr; mse; virtual data; public cloud; quantization error

Online: 22 November 2021 (15:17:12 CET)

Preprint REVIEW | doi:10.20944/preprints201807.0059.v1

Data Normalization in NMR-based Metabolomics

Helena Zacharias, Michael Altenbuchinger, Wolfram Gronwald

Subject: Biology And Life Sciences, Biophysics Keywords: data normalization; data scaling; zero-sum; metabolic fingerprinting; NMR; statistical data analysis

Online: 3 July 2018 (16:22:31 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0357.v2

The Path to Data Protection Governance in China Mainland

Bing Chen, Yongji Liu

Subject: Social Sciences, Law Keywords: data protection; personal privacy; cybersecurity; data security

Online: 9 April 2024 (12:02:20 CEST)

Preprint ARTICLE | doi:10.20944/preprints202402.1372.v1

Leveraging Visualization and Machine Learning Techniques in Education: A Case Study of K-12 State Assessment Data

Loni Taylor, Vibhuti Gupta, Kwanghee Jung

Subject: Computer Science And Mathematics, Analysis Keywords: Data Visualization; Big Data; AI; Machine Learning

Online: 23 February 2024 (10:39:04 CET)

Working Paper ARTICLE

The Analysis and the Measurement of Poverty: An Interval Based Composite Indicator Approach

Carlo Drago

Subject: Business, Economics And Management, Econometrics And Statistics Keywords: poverty; composite indicators; interval data; symbolic data

Online: 24 August 2021 (15:46:09 CEST)

Working Paper ARTICLE

Development of Cost and Schedule Data Integration Algorithm based on Big Data Technology

Daegu Cho, Myungdo Lee, Jihye Shin

Subject: Computer Science And Mathematics, Computer Science Keywords: big data; data integration; EVMS; construction management

Online: 30 October 2020 (15:35:00 CET)

Preprint ARTICLE | doi:10.20944/preprints201701.0090.v1

An Automatic Matcher and Linker for Transportation Datasets

Ali Masri, Karine Zeitouni, Zoubida Kedad, Bertrand Leroy

Subject: Computer Science And Mathematics, Information Systems Keywords: transportation data; data interlinking; automatic schema matching

Online: 20 January 2017 (03:38:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202308.1391.v1

An Automated Method for Extracting and Analyzing Railway Infrastructure Cost Data

Daniel Adanza Dopazo, Lamine Mahdjoubi, Bill Gething

Subject: Engineering, Transportation Science And Technology Keywords: data extraction; data mining; railway infrastructure costs; infrastructure costs data analysis; cost analysis

Online: 18 August 2023 (16:03:08 CEST)

Working Paper ARTICLE

Model for the Collection and Analysis of Data from Teachers and Students, Supported by Academic Analytics

Fredys A. Simanca H., Isabel Hernández Arteaga, María Elsa Unriza Puin, Fabian Blanco Garrido, Jaime Paez Paez, Jairo Cortes Méndez

Subject: Computer Science And Mathematics, Information Systems Keywords: Academic Analytics; data storage; education and big data; analysis of data; learning analytics

Online: 19 July 2020 (20:37:39 CEST)

Business Intelligence, defined by [1] as "the ability to understand the interrelations of the facts that are presented in such a way that it can guide the action towards achieving a desired goal", has been used since 1958 for the transformation of data into information, and of information into knowledge, to be used when making decisions in a business environment. But what would happen if we took the same principles of business intelligence and applied them to the academic environment? The answer would be the creation of Academic Analytics, a term defined by [2] as the process of evaluating and analyzing organizational information from university systems for reporting and making decisions, whose characteristics allow it to be used more and more in institutions, since the information they accumulate about their students and teachers gathers data such as academic performance, student success, persistence, and retention [5]. Academic Analytics enables an analysis of data that is very important for making decisions in the educational institutional environment, aggregating valuable information in the academic research activity and providing easy-to-use business intelligence tools. This article presents a proposal for creating an information system based on Academic Analytics, using ASP.Net technology and relying on the Microsoft SQL Server database engine for storage, and designs a model supported by Academic Analytics for the collection and analysis of data from the information systems of educational institutions. The proposed system is capable of displaying statistics on the historical data of students and teachers taken over academic periods, without having direct access to institutional databases, with the purpose of gathering the information that the director, the teacher, and finally the student need for making decisions. The model was validated with information taken from students and teachers during the last five years, and the data were exported as PDF, CSV, and XLS files. The findings allow us to state that it is extremely important to analyze the data held in the information systems of educational institutions for making decisions. After the validation of the model, it was established that students must know the reports of their academic performance in order to carry out a process of self-evaluation, that teachers must be able to see the results of the data obtained in order to carry out processes of self-evaluation and adaptation of content and dynamics in the classrooms, and finally that the head of the program needs them to make decisions.

Preprint ARTICLE | doi:10.20944/preprints201812.0071.v1

Data Governance and Sovereignty in Urban Data Spaces Based on Standardized ICT Reference Architectures

Silke Cuno, Lina Bruns, Nikolay Tcholtchev, Philipp Lämmel, Ina Schieferdecker

Subject: Engineering, Electrical And Electronic Engineering Keywords: data governance; data sovereignty; urban data spaces; ICT reference architecture; open urban platform

Online: 6 December 2018 (05:09:54 CET)

Preprint ARTICLE | doi:10.20944/preprints202110.0260.v1

Online System for Power Quality Operational Data Management in Frequency Monitoring using Python and Grafana

Jose-María Sierra-Fernández, Olivia Florencias-Oliveros, Manuel-Jesús Espinosa-Gavira, Juan-José González-de-la-Rosa, Agustín Agüera-Pérez, José-Carlos Palomares-Salas

Subject: Engineering, Electrical And Electronic Engineering Keywords: big data; data acquisition; data visualization; data exchange; dashboard; frequency stability; Grafana lab; Power Quality; GPS reference; frequency measurement.

Online: 18 October 2021 (18:07:43 CEST)

Preprint DATA DESCRIPTOR | doi:10.20944/preprints202109.0370.v1

The SERL Observatory Dataset: Longitudinal Smart Meter Electricity and Gas Data, Survey, EPC and Climate Data for Over 13,000 GB Households

Ellen Webborn, Jessica Few, Eoghan McKenna, Simon Elam, Martin Pullinger, Ben Anderson, David Shipworth, Tadj Oreszczyn

Subject: Engineering, Energy And Fuel Technology Keywords: smart meter data; household survey; EPC; energy data; energy demand; energy consumption; longitudinal; energy modelling; electricity data; gas data

Online: 22 September 2021 (10:16:05 CEST)

Preprint ARTICLE | doi:10.20944/preprints201807.0038.v1

Towards the Provision of Accurate Atomic Data for Neutral Iron

Andrew Conroy, Catherine Ramsbottom, Connor Ballance, Francis Keenan

Subject: Physical Sciences, Atomic And Molecular Physics Keywords: atomic data

Online: 3 July 2018 (11:25:13 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0740.v1

Functional Process Control (FPC): A Methodology to Reduce Variability

Joaquín Sancho, Javier Martínez, Jorge Pastor, Carlos Cajal

Subject: Computer Science And Mathematics, Applied Mathematics Keywords: functional data; quality; non-normal data; variability; outlier

Online: 10 April 2024 (15:52:41 CEST)

Preprint ARTICLE | doi:10.20944/preprints202309.1016.v1

Three-Stage Sampling Algorithm for Highly Imbalanced Multi-Classification Time Series Data Sets

Haoming Wang

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Imbalanced data; Data preprocessing; Sampling; Tomek Links; DTW

Online: 14 September 2023 (14:00:42 CEST)

Purpose: To alleviate the data imbalance problem caused by subjective and objective reasons, scholars have developed different data preprocessing algorithms, among which undersampling algorithms are widely used because of their fast and efficient performance. However, when the number of samples of some categories in a multi-classification dataset is too small to be processed by sampling, or the number of minority class samples is only 1 to 2, traditional undersampling algorithms become less effective. Methods: This study selects 9 multi-classification time series datasets with extremely few samples as the objects, fully considers the characteristics of time series data, and uses a three-stage algorithm to alleviate the data imbalance problem. Stage one: random oversampling with disturbance items increases the number of sample points. Stage two: on this basis, SMOTE (Synthetic Minority Oversampling Technique) oversampling is applied. Stage three: the dynamic time warping (DTW) distance is used to calculate the distance between sample points, identify the sample points forming Tomek Links at the boundary, and clean up the boundary noise. Results: This study proposes a new sampling algorithm. On the 9 multi-classification time series datasets with extremely few samples, the new sampling algorithm is compared with four classic undersampling algorithms, ENN (Edited Nearest Neighbours), NCR (Neighborhood Cleaning Rule), OSS (One Side Selection) and RENN (Repeated Edited Nearest Neighbours), based on macro accuracy, recall rate and F1-score evaluation indicators. The results show that on FiftyWords, the dataset with the most categories and the fewest minority class samples among the 9 selected, the accuracy of the new sampling algorithm is 0.7156, far beyond ENN, RENN, OSS and NCR; its recall rate, at 0.7261, is also better than that of the four undersampling algorithms used for comparison; and its F1-score is increased by 200.71%, 188.74%, 155.29% and 85.61% relative to ENN, RENN, OSS, and NCR, respectively. On the other 8 datasets, the new sampling algorithm also shows good indicator scores. Conclusion: The new algorithm proposed in this study can effectively alleviate the data imbalance problem of multi-classification time series datasets with many categories and few minority class samples, and at the same time clean up the boundary noise data between classes.
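Stage three of the method described above (Tomek Links identified with a dynamic time warping distance) can be sketched as follows. This is a minimal illustration under assumptions, not the author's implementation: it uses the textbook DTW recurrence with absolute difference as the local cost and no warping window, a brute-force nearest-neighbour search, and hypothetical function names and toy data.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]

def tomek_links(series, labels, dist=dtw_distance):
    """Return index pairs (i, j) that are mutual nearest neighbours
    under `dist` and carry different class labels."""
    n = len(series)
    nearest = []
    for i in range(n):
        others = (j for j in range(n) if j != i)
        nearest.append(min(others, key=lambda j: dist(series[i], series[j])))
    links = []
    for i in range(n):
        j = nearest[i]
        if i < j and nearest[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links

# Toy data: samples 0 and 1 are close but belong to different classes.
series = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [5, 6, 5]]
labels = [0, 1, 0, 0]
print(tomek_links(series, labels))
```

Which member of each link is removed (one sample or both) is a design choice that the abstract does not specify; the sketch only identifies the pairs.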

Preprint ARTICLE | doi:10.20944/preprints202307.1117.v1

Design and Analysis of Query Models Database Preservation Information Systems Digitization of History and Endowments; Case Study of History and Waqf of Sumedang Larang Kingdom Indonesia

R. Sudrajat, Budi Nurani Ruchjana, Atje Setiawan Abdullah, Rahmat Budiarto

Subject: Computer Science And Mathematics, Information Systems Keywords: history; endowments; query model; digital data; physical data

Online: 17 July 2023 (15:11:18 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202305.1694.v1

Synthetic Data & the Future of Women's Health: A Synergistic Relationship

Gayathri Delanerolle, Peter Phiri, Heitor Cavalini, David Benfield, Ashish Shetty, Yassine Bouchareb, Jian Shi, Alain Zemkoho

Subject: Medicine And Pharmacology, Clinical Medicine Keywords: Womens Health; Data Science; Data Methods; Artificial Intelligence

Online: 24 May 2023 (04:48:58 CEST)

Preprint ARTICLE | doi:10.20944/preprints202206.0335.v1

The Dataharmonizer: a Tool for Faster Data Harmonization, Validation, Aggregation, and Analysis of Pathogen Genomics Contextual Information

Ivan Gill, Emma Griffiths, Damion Dooley, Rhiannon Cameron, Sarah Savić Kallesøe, Nithu Sara John, Anoosha Sehar, Gurinder Gosal, David Alexander, Madison Chapel, Matthew Croxen, Benjamin Delisle, Rachelle Di Tullio, Daniel Gaston, Ana Duggan, Jennifer Guthrie, Mark Horsman, Esha Joshi, Levon Kearney, Natalie Knox, Lynette Lau, Jason LeBlanc, Vincent Li, Pierre Lyons, Keith MacKenzie, Andrew McArthur, Emilie Panousis, John Palmer, Natalie Prystajecky, Kerri Smith, Jennifer Tanner, Christopher Townend, Andrea Tyler, Gary Van Domselaar, William Hsiao

Subject: Computer Science And Mathematics, Information Systems Keywords: metadata; contextual data; harmonization; genomic surveillance; data management

Online: 24 June 2022 (08:46:04 CEST)

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations, as well as research. In order to make use of pathogen genomics data, it must be interpreted using contextual data (metadata). Contextual data includes sample metadata, laboratory methods, patient demographics, clinical outcomes, and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration, and its use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating, and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool’s web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. While the tool was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.

Preprint ARTICLE | doi:10.20944/preprints202108.0471.v1

Identifying the Main Risk Factors for CVD Prediction Using Machine Learning Algorithms

Luis Rolando Guarneros-Nolasco, Nancy Aracely Cruz-Ramos, Giner Alor-Hernández, Lisbeth Rodríguez-Mazahua, José Luis Sánchez-Cervantes

Subject: Computer Science And Mathematics, Information Systems Keywords: Big data; Health prevention; Machine learning; Medical data

Online: 24 August 2021 (14:00:12 CEST)

Preprint ARTICLE | doi:10.20944/preprints202106.0738.v1

Combination of Using Pairwise Comparisons and Composite Reference Series: A New Approach in the Homogenization of Climatic Time Series

Peter Domonkos

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: time series; homogenization; ACMANT; observed data; data accuracy

Online: 30 June 2021 (13:08:39 CEST)

Working Paper ARTICLE

Generating Fake ECGs using GANs for Anonymizing Healthcare Data

Esteban Piacentino, Alvaro Guarner, Cecilio Angulo

Subject: Computer Science And Mathematics, Data Structures, Algorithms And Complexity Keywords: GAN; ECG; anonymization; healthcare data; sensors; data transformation

Online: 3 September 2020 (05:26:01 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0419.v1

Modeling Analytical Streams for Social Business Intelligence

Indira Lanza-Cruz, Rafael Berlanga, María José Aramburu

Subject: Computer Science And Mathematics, Information Systems Keywords: social business intelligence; data streaming models; linked data

Online: 26 June 2018 (12:48:17 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints201803.0054.v1

Travel Time Prediction Based on Data Feature Selection and Data Clustering Methods

Chi-Hua Chen

Subject: Computer Science And Mathematics, Information Systems Keywords: data feature selection; data clustering; travel time prediction

Online: 7 March 2018 (13:30:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202404.0429.v1

Analysis of Missingness Scenarios for Observational Health Data

Alireza Zamanian, Henrik von Kleist, Octavia Andreea Ciora, Marta Piperno, Gino Lancho, Narges Ahmidi

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Missing Data Analysis; Observational Health Data; Missingness Scenarios; Missing Data Assumptions; Missingness distribution shift

Online: 5 April 2024 (10:45:36 CEST)

Preprint ARTICLE | doi:10.20944/preprints202312.0496.v1

Nonparametric Partial Linear Estimation for Spatial Functional Data with Missing At-Random

Tawfik Benchikh, Ibrahim M. Almanjahie, Omar Fetitah, Mohammed kadi Attouch

Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Missing at random data; Functional data analysis; Asymptotic normality; spatial data; Kernel regression method

Online: 7 December 2023 (09:14:16 CET)

Preprint COMMUNICATION | doi:10.20944/preprints202206.0172.v3

MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak

Nirmalya Thakur

Subject: Computer Science And Mathematics, Information Systems Keywords: Monkeypox; monkey pox; Twitter; Dataset; Tweets; Social Media; Big Data; Data Mining; Data Science

Online: 25 July 2022 (09:41:19 CEST)

Preprint ARTICLE | doi:10.20944/preprints202109.0518.v1

Data Fusion and Visualization of a Multi-Sensor Personal Exposure Campaign

Rok Novak, Ioannis Petridis, David Kocman, Johanna Amalia Robinson, Tjaša Kanduč, Dimitris Chapizanis, Spyros Karakitsios, Benjamin Flückiger, Danielle Vienneau, Ondřej Mikeš, Celine Degrendele, Ondřej Sáňka, Saul García Dos Santos-Alves, Thomas Maggos, Dementra Pardali, Asimina Stamatelopoulou, Dikaia Saraga, Marco Giovanni Persico, Jaideep Visave, Alberto Gotti, Dimosthenis Sarigiannis

Subject: Environmental And Earth Sciences, Environmental Science Keywords: data fusion; multi-sensor; data visualization; data treatment; participant reports; air quality; exposure assessment

Online: 30 September 2021 (14:13:52 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0185.v1

Data Quality: A Negotiator between Paper-based and Digital Records in the Pakistan’s TB Control Program

Syed Mustafa Ali, Farah Naureen, Arif Noor, Maged Kamel N. Boulos, Javariya Aamir, Muhammad Ishaq, Naveed Anjum, John Ainsworth, Aamna Rashid, Arman Majidulla, Irum Fatima

Subject: Medicine And Pharmacology, Other Keywords: mHealth; mobile data collection; data quality; data quality assessment framework; Tuberculosis control; developing countries

Online: 12 June 2018 (10:34:33 CEST)

Preprint REVIEW | doi:10.20944/preprints202105.0663.v1

MANAGEMENT OF BIG DATA IN THE CONTEMPORARY WORLD

Anjaneyulu Jinugu, Sreechandana Kodimela, Madhavi Laitha V V

Subject: Computer Science And Mathematics, Computer Science Keywords: Big Data, Internet Data Sources (IDS), Internet of Things (IoT), Sustainable Development Goals (SDGs), Big data Technologies, Big data Challenges

Online: 27 May 2021 (10:31:03 CEST)

Preprint ARTICLE | doi:10.20944/preprints202003.0073.v1

FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units

Koenraad De Smedt, Dimitris Koureas, Peter Wittenburg

Subject: Computer Science And Mathematics, Information Systems Keywords: digital object; data infrastructure; research infrastructure; data management; data science; FAIR data; open science; European Open Science Cloud; EOSC; persistent identifier

Online: 5 March 2020 (02:30:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202403.1611.v1

A Formal Model for Reliable Data Acquisition and Control in Legacy Critical Infrastructures

Jose Miguel Blanco, Jose M. Del Alamo, Juan C. Dueñas, Felix Cuadrado

Subject: Computer Science And Mathematics, Information Systems Keywords: Critical Infrastructure; Water Distribution Network; Formal Model; Digital Transformation; Data Management; Data Security; Data Acquisition

Online: 26 March 2024 (14:35:30 CET)

Preprint ARTICLE | doi:10.20944/preprints202307.0244.v1

Estimation of Reference Evapotranspiration in a Semi-Arid Region of Mexico

Gerardo Delgado-Ramírez, Martín Alejandro Bolaños-González, Abel Quevedo-Nolasco, Adolfo López-Pérez, Juan Estrada-Ávalos

Subject: Environmental And Earth Sciences, Water Science And Technology Keywords: NASA-POWER platform; empirical equations; reanalysis data; meteorological data

Online: 4 July 2023 (13:59:00 CEST)

Preprint ARTICLE | doi:10.20944/preprints202305.0722.v1

Anomaly Detection in Endemic Disease Surveillance Data Using Machine Learning Techniques

Peter U. Eze, Nicholas Geard, Ivo Mueller, Iadine Chades

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Anomaly detection; Malaria data; Machine learning; big data; epidemic

Online: 10 May 2023 (09:34:36 CEST)

Preprint REVIEW | doi:10.20944/preprints202208.0420.v1

Siri 2.0 - Conversational Commerce of Social Bots and its Legal Implications

Dagmar Gesmann-Nuissl, Stefanie Meyer

Subject: Social Sciences, Law Keywords: conversational commerce; data protection; law of obligations of data

Online: 24 August 2022 (10:55:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202208.0224.v1

Recognition of Vehicles Entering Expressway Service Areas and Estimation of Dwell Time Using ETC Data

Qiqin Cai, Dingrong Yi, Fumin Zou, Zhaoyi Zhou, Nan Li, Feng Guo

Subject: Engineering, Automotive Engineering Keywords: VR-XGBoost; K-VDTE; ETC data; ESAs; data mining

Online: 12 August 2022 (03:53:23 CEST)

Preprint ARTICLE | doi:10.20944/preprints202208.0083.v1

A Big Data Analysis with Machine Learning Techniques in Accounting Dataset from the Greek Banking System

Leonidas Theodorakopoulos, Georgios Thanasas, Spyridon Lampropoulos

Subject: Business, Economics And Management, Accounting And Taxation Keywords: Ratios; Financial Crisis; Covid-19; Big Data; Accounting Data

Online: 3 August 2022 (10:42:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202103.0331.v1

Social Media Data Misuse

Tariq Soussan, Marcello Trovati

Subject: Social Sciences, Media Studies Keywords: Social media ethics; Social media; data misuse; data integrity

Online: 12 March 2021 (08:05:09 CET)

Preprint ARTICLE | doi:10.20944/preprints202006.0258.v2

Data-Driven Solutions and Discoveries in Mechanics Using Physics Informed Neural Network

Qi Zhang, Yilin Chen, Ziyi Yang

Subject: Engineering, Civil Engineering Keywords: Conservation laws; Data inference; Data discovery; Dimensionless form; PINN

Online: 30 September 2020 (03:51:25 CEST)

Preprint ARTICLE | doi:10.20944/preprints202007.0051.v2

World Health Organization (WHO) COVID-19 Database: Who Needs It?

Ivan Kodvanj, Jan Homolak, Davor Virag, Vladimir Trkulja

Subject: Social Sciences, Library And Information Sciences Keywords: COVID-19; WHO; database; systematic review; data quality

Online: 2 August 2020 (17:43:38 CEST)

Preprint ARTICLE | doi:10.20944/preprints201905.0158.v1

An Adaptive Biomedical Data Managing Scheme Based on Blockchain Technique

Ahmed Faeq Hussein, Abbas K. AlZubaidi, Qais Ahmed Habash, Mustafa Musa Jaber

Subject: Medicine And Pharmacology, Other Keywords: blockchain; biomedical data managing; DWT; keyword search; data sharing.

Online: 13 May 2019 (13:30:37 CEST)

Preprint ARTICLE | doi:10.20944/preprints201806.0219.v1

A Proposal of Methodology for Designing Big Data Warehouses

Francesco Di Tria, Ezio Lefons, Filippo Tangorra

Subject: Computer Science And Mathematics, Information Systems Keywords: Big data technology; Business intelligence; Data integration; System virtualization.

Online: 13 June 2018 (16:19:48 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202401.2023.v1

Investigating the Global Fear associated with COVID-19 using Subjectivity Analysis and Deep Learning

Nirmalya Thakur, Kesha A. Patel, Audrey Poon, Rishika Shah, Nazif Azizi, Changhee Han

Subject: Computer Science And Mathematics, Computer Science Keywords: COVID-19; Big Data; Data Analysis; Machine Learning; Subjectivity Analysis; Data Science; Deep Learning; Mental Health

Online: 29 January 2024 (15:42:52 CET)

The work presented in this paper makes multiple scientific contributions related to the investigation of the global fear associated with COVID-19 by performing a comprehensive analysis of a dataset comprising survey responses of participants from 40 countries. First, the results of subjectivity analysis of responses where participants indicated their biggest concern related to COVID-19 showed that the average subjectivity in responses by the age group of 41-50 decreased from April 2020 to June 2020, the average subjectivity in responses by the age group of 71-80 drastically increased from May 2020, and the age group of 11-20 showed the lowest level of subjectivity in their responses between June 2020 and August 2020. Second, subjectivity analysis also revealed the percentage of highly opinionated, neutral opinionated, and least opinionated responses per age group, where the analyzed age groups were 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, and 81-90. For instance, the percentages of highly opinionated, neutral opinionated, and least opinionated responses by the age group of 11-20 were 17.92%, 16.24%, and 65.84%, respectively. Third, data analysis of responses from different age groups showed that the highest percentage of responses indicating that they were very worried about COVID-19 came from individuals in the age group of 21-30. Fourth, data analysis of the survey responses also revealed that, in the context of taking precautions to prevent contracting COVID-19, the percentage of individuals in the age group of 31-40 taking precautions was higher than the percentages of individuals from the age groups of 41-50, 51-60, 61-70, 71-80, and 81-90. Finally, a deep learning model was developed to detect if the survey respondents were seeing or planning to see a psychologist or psychiatrist for any mental health issues related to COVID-19. The deep learning model used the responses to multiple questions in the context of fear, preparedness, and response related to COVID-19 from the dataset and achieved an overall performance accuracy of 91.62% after 500 epochs.

Preprint ARTICLE | doi:10.20944/preprints202102.0326.v1

A Comparative Study on Supervised Machine Learning Algorithms for Copper Recovery Quality Prediction in a Leaching Process

Victor Flores, Claudio Leiva

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Data analysis; Artificial Intelligence; Machine Learning; Knowledge Engineering; Computers and information processing, Data analysis; Data Processing.

Online: 16 February 2021 (13:33:53 CET)

Preprint ARTICLE | doi:10.20944/preprints202008.0254.v1

Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data

Ahlam Mallak, Madjid Fathi

Subject: Computer Science And Mathematics, Information Systems Keywords: feature selection; k-means; silhouette measure; clustering; big data; fault classification; sensor data; time-series data

Online: 11 August 2020 (06:26:43 CEST)

Preprint ARTICLE | doi:10.20944/preprints202108.0303.v2

Ten Simple Rules for FAIR Sharing of Experimental and Clinical Data with the Modeling Community

Matthias König, Jan Grzegorzewski, Martin Golebiewski, Henning Hermjakob, Mike Hucka, Brett Olivier, Sarah Keating, David Nickerson, Falk Schreiber, Rahuman Sheriff, Dagmar Waltemath

Subject: Biology And Life Sciences, Other Keywords: data sharing; FAIR

Online: 19 November 2021 (08:38:42 CET)

Search Results

1400 articles found