Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data

Version 1: Received: 11 December 2023 / Approved: 13 December 2023 / Online: 13 December 2023 (08:00:49 CET)

A peer-reviewed article of this Preprint also exists.

Kampezidou, S.I.; Tikayat Ray, A.; Bhat, A.P.; Pinon Fischer, O.J.; Mavris, D.N. Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data. Eng 2024, 5, 384-416.

Abstract

This paper offers a comprehensive examination of the process of developing and automating supervised end-to-end machine learning workflows for forecasting and classification. It provides an overview of the components (e.g., feature engineering, model selection), principles (e.g., bias-variance decomposition, model complexity, overfitting, model sensitivity to feature assumptions and scaling, output interpretability), models (e.g., neural networks, regression models), methods (e.g., cross-validation, data augmentation), metrics (e.g., Mean Squared Error, F1-score), and tools that govern most supervised learning applications with numerical and categorical data, as well as their integration, automation, and deployment. The goal and contribution of this paper is to educate and guide the non-AI-expert academic community on complete and rigorous machine learning pipelines and data science practices, from problem scoping to design and state-of-the-art automation tools, including the basic principles and reasoning behind the choice of methods. The paper delves into the critical stages of supervised machine learning workflow development, many of which researchers often omit for brevity, and covers foundational concepts essential for understanding and optimizing a functional machine learning workflow, thereby offering a holistic view of task-specific application development for applied researchers who are not AI experts.

Keywords

Machine learning workflow; Supervised learning; Numerical data; Categorical data; Data engineering; Extraction, loading, transformation; Feature engineering; Automated feature extraction; Machine learning engineering; Training, validation, evaluation; Test-driven development; Automated machine learning; Model deployment

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
