Preprint; Data Descriptor; Version 1; Preserved in Portico; This version is not peer-reviewed

Data Synthesis Technique for Categorical Pestes Des Petits Ruminants (PPR) Data Using CTGAN Model

Version 1 : Received: 10 May 2023 / Approved: 11 May 2023 / Online: 11 May 2023 (03:56:47 CEST)

How to cite: Nyambo, D.G.; Ngulumbi, N.; Mduma, N.; Sinde, R.; Lyimo, T. Data Synthesis Technique for Categorical Pestes Des Petits Ruminants (PPR) Data Using CTGAN Model. Preprints 2023, 2023050777. https://doi.org/10.20944/preprints202305.0777.v1

Abstract

Data scarcity is a significant challenge in Machine Learning (ML) because data collection can be expensive, time-consuming, and difficult, particularly in developing countries. The challenge is exacerbated when datasets are needed for livestock disease prediction to support early intervention and surveillance. To address it, this paper presents a data synthesis method that has been used to generate new data samples from a small amount of real-world data. With more data available to train ML models, overfitting is reduced. We use a Generative Adversarial Network, specifically the Conditional Tabular Generative Adversarial Network (CTGAN), to synthesize categorical data for training ML models that predict Pestes des Petits Ruminants (PPR) disease. With a Random Forest algorithm trained on the synthesized data, the training score was 0.89 and the cross-validation score was 0.87. The resulting dataset can be used to support the prediction and surveillance of PPR. The proposed method can also be applied to other domains with categorical data, and it has the potential to improve the performance of ML models by increasing data availability.
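
The workflow summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the open-source `ctgan` Python package for the synthesis step and scikit-learn for the Random Forest evaluation. The function names `synthesize` and `evaluate`, the one-hot encoding step, and all hyperparameters are hypothetical choices made for this sketch.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder


def synthesize(real, n_rows, epochs=300):
    """Fit CTGAN on an all-categorical table and sample n_rows synthetic rows.

    Assumption: the `ctgan` package is installed (pip install ctgan).
    """
    from ctgan import CTGAN  # imported lazily so evaluate() works without it

    model = CTGAN(epochs=epochs)
    # Every column is declared discrete because the table is fully categorical.
    model.fit(real, discrete_columns=list(real.columns))
    return model.sample(n_rows)


def evaluate(data, target, folds=5):
    """Return (training accuracy, mean cross-validation accuracy)."""
    X, y = data.drop(columns=[target]), data[target]
    clf = make_pipeline(
        # One-hot encode the categorical features (an assumed encoding).
        OneHotEncoder(handle_unknown="ignore"),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    # Cross-validation refits a fresh clone of the pipeline on each fold.
    cv_score = cross_val_score(clf, X, y, cv=folds).mean()
    # Training score: fit on all rows, then score on those same rows.
    train_score = clf.fit(X, y).score(X, y)
    return train_score, cv_score
```

Under these assumptions, a call such as `evaluate(synthesize(real_df, 5000), target="ppr_status")` would return the two kinds of scores reported above, where `real_df` and `ppr_status` are placeholder names for the real dataset and its label column. A training score close to the cross-validation score, as in the reported 0.89 and 0.87, suggests limited overfitting.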

Keywords

Data synthesis; Livestock health; PPR disease; Machine Learning; Prediction

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
