Submitted: 28 May 2025
Posted: 29 May 2025
Abstract
Keywords:
1. Introduction
2. Methodology – Data Models for Processes
3. Steady State Processes – Data Strategy
3.1. Basic Development Plan
- Data Strategy: A comprehensive data strategy begins with the structured collection of data from experiments, simulations, and validated literature sources. This includes capturing spatial variations within the modeling domain. For sensor-based and live data, a robust data acquisition system should be employed, incorporating filtering and translation tools to record both the spatial location of data sources and the corresponding filtered inputs. Prior to semantic data storage—ideally aligned with a predefined ontological framework—essential preprocessing steps must be performed. These include data cleaning, normalization, anomaly detection, and mapping to ensure consistency and quality.
- Model Development: In steady-state modeling, transient effects are either disregarded or represented through averaged or uniformly distributed approximations across the modeling domain. The choice of model type depends on the complexity and nature of the process and may include:
  ◦ Pure data-driven models
  ◦ Physics-informed or hybrid models (analytical, numerical, and data-driven)
  ◦ Generative models for synthetic data generation
  For complex manufacturing processes with high-dimensional data spaces, specialized combinations of solvers, interpolators, and machine learning algorithms are often required to achieve accurate predictions of product quality.
- Real-Time Data Modeling: Real-time predictive modeling can leverage various solver technologies, including:
  ◦ Eigenvalue-based solvers
  ◦ Regression and clustering algorithms
  ◦ Support Vector Machines (SVMs)
  To enhance prediction accuracy for new process parameter sets, advanced interpolation techniques such as Kriging, Radial Basis Functions (RBF), and Inverse Distance Weighting (IDW) can be employed [19,20]. In high-dimensional data scenarios, additional machine learning routines may be necessary to further train the model, reduce prediction error, and improve data fitting.
- Model Training and Validation: Training and validating data models for complex material processes, such as casting, extrusion, and additive manufacturing (AM), require careful consideration because these processes are multi-physical and multi-phase. In some cases, split databases are used to bridge different length scales and to train parameter-specific datasets. Various training techniques can be applied, including:
  ◦ Neural networks
  ◦ Regression models
  ◦ Genetic Algorithm Symbolic Regression (GASR)
  ◦ Tree-based models
  Model validation is conducted across the entire modeling domain, including normal, near-boundary, and extreme conditions. Performance metrics and accuracy indices are calculated, and normalized errors are assessed using predefined Design of Experiments (DOE) scenarios to ensure robustness and reliability.
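As an illustration of the interpolation and validation steps listed above, the following minimal sketch applies Inverse Distance Weighting to a small snapshot database and computes a normalized error at a few DOE-style validation points. It is not the authors' implementation: the function names, the two normalized process parameters, the `toy_response` function standing in for simulated data, and the choice of a range-normalized RMSE as the "normalized error" are all assumptions made for this example.

```python
import math

def idw_predict(samples, responses, query, power=2.0):
    """Inverse Distance Weighting: distance-weighted mean of known responses."""
    weighted_sum = 0.0
    weight_total = 0.0
    for point, response in zip(samples, responses):
        distance = math.dist(point, query)
        if distance == 0.0:
            return response  # query coincides with a stored snapshot
        weight = distance ** -power
        weighted_sum += weight * response
        weight_total += weight
    return weighted_sum / weight_total

def normalized_rmse(predicted, reference):
    """Root-mean-square error divided by the range of the reference values."""
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    return math.sqrt(mse) / (max(reference) - min(reference))

def toy_response(speed, temperature):
    """Hypothetical stand-in for a simulated response; not taken from the paper."""
    return 600.0 + 40.0 * speed - 25.0 * temperature ** 2

# Snapshot matrix: 5 x 5 grid over two normalized process parameters (assumed).
grid = [i / 4 for i in range(5)]
snapshots = [(s, t) for s in grid for t in grid]
snapshot_responses = [toy_response(s, t) for s, t in snapshots]

# DOE-style validation points: normal, near-boundary, and extreme conditions.
doe_points = [(0.4, 0.6), (0.05, 0.95), (1.0, 0.0)]
doe_reference = [toy_response(s, t) for s, t in doe_points]
doe_predicted = [idw_predict(snapshots, snapshot_responses, q) for q in doe_points]

print("normalized RMSE:", normalized_rmse(doe_predicted, doe_reference))
```

IDW is used here only because it needs no linear solve; Kriging or RBF interpolation could be substituted behind the same `idw_predict`-style interface, at the cost of fitting a covariance model or solving for basis weights.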
3.2. Process Case Studies
- Simulation Model Calibration and Validation: The CFD simulation model was calibrated and validated using experimental measurements.
- Definition of Prediction Objectives: Key prediction goals—such as thermal and microstructural events—were defined and implemented for a laboratory-scale semi-continuous casting process.
- Scenario Development: Variations in process parameters were considered to define realistic operational scenarios, forming a snapshot matrix.
- Simulation and Data Generation: Using the calibrated CFD model, snapshot scenarios were simulated with open-source software to generate a small database for model training. Additional DOE scenarios were simulated for validation purposes.
- Data Model Construction: Real-time data models were iteratively developed using a suitable combination of solvers and interpolators. Machine learning techniques were also applied to enhance model performance.
- Performance Evaluation: The accuracy and reliability of the data models were assessed using DOE-based validation scenarios.
- Integration into Advisory Framework: Finally, the data models were customized for integration into an existing web-based casting process advisory system.
- Definition of Prediction Objectives: Macro- and micro-scale prediction targets were established across different length scales.
- Parameter Variation Analysis: Critical process parameters were identified, and their effects were analyzed through a range of process scenarios.
- Balanced Sampling Strategy: Techniques such as Latin Hypercube Sampling and Sobol sequences were used to evenly distribute parameter variations within the multi-dimensional design space, forming the final snapshot matrix.
- Simulation and Data Generation: A calibrated FE model was used to simulate the snapshot scenarios, and the resulting data were processed.
- Multi-Scale Database Construction: A split structured database was created to store macro- and micro-scale responses, organized using semantic rules.
- Data Model Development: Predictive models were generated for macro-scale thermal fields and micro-scale microstructural evolution.
- Machine Learning Enhancement: Additional training using machine learning techniques was conducted to improve model accuracy and robustness.
- Model Validation: Extensive validation studies were performed under normal, near-boundary, and extreme conditions using DOE scenarios.
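The balanced sampling step above can be sketched with a small Latin Hypercube sampler. This is a minimal standard-library illustration, not the sampling code used in the case studies; the function name `latin_hypercube`, the two parameters, and their bounds are hypothetical.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random position inside each of n equal-width strata.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    return [tuple(column[i] for column in columns) for i in range(n_samples)]

# Hypothetical two-parameter design space: (speed, temperature).
parameter_bounds = [(2.0, 6.0), (450.0, 520.0)]
snapshot_matrix = latin_hypercube(10, parameter_bounds, seed=42)

for speed, temperature in snapshot_matrix:
    print(f"speed={speed:.2f}  temperature={temperature:.1f}")
```

Each row of `snapshot_matrix` is one scenario to simulate. Sobol sequences, the other technique named above, are low-discrepancy rather than stratified random designs; SciPy's `scipy.stats.qmc` module provides both `Sobol` and `LatinHypercube` samplers if a library implementation is preferred.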
3.3. Analyses and Performance
4. Transient Processes – Modelling Strategy
4.1. Basic Modelling Concepts
4.2. Case Study
4.3. Analyses and Performance
5. Cyclic and Generative Processes
5.1. Generative Data Models – Basic Concepts
- Data Preprocessing: Available experimental data or validated literature results are preprocessed and filtered to establish a data correlation framework.
- Simulation-Based Calibration: Numerical simulations are conducted to replicate the experimental or literature scenarios for model calibration and validation.
- Correlation Analysis: A correlation study is performed using verified simulation data to isolate and relate the effects of key process parameters through data sampling techniques.
- Initial Response Estimation: Based on the correlation framework and additional data training, initial system responses are estimated and incorporated into the database.
- First-Generation Model Development: A preliminary data model is created to predict process parameters across different cycles or layers.
- Validation and Feedback Loop: Verified simulations are used to assess prediction accuracy. A feedback loop is established to refine the correlation framework.
- Database and Model Update: The process response database is updated based on the refined correlations, and a new generation of the data model is developed.
- Iterative Refinement: This cycle of error checking, correlation refinement, and database updating continues until the desired accuracy and reliability are achieved.
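The validation, feedback, and update cycle described above can be sketched as a simple loop. The example below is a strongly simplified, one-parameter illustration under stated assumptions: `verified_simulation` is a toy function standing in for a calibrated simulation, the data model is a piecewise-linear interpolant, and "refining the correlation framework" is reduced to adding the worst-predicted scenario to the response database. None of these choices come from the paper.

```python
import math

def verified_simulation(x):
    """Toy stand-in for a verified simulation response (hypothetical)."""
    return math.sin(3.0 * x) + 0.5 * x

def predict(database, x):
    """Piecewise-linear data model built from the sorted response database."""
    points = sorted(database.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("query outside the database range")

def refine(database, check_points, tolerance, max_generations=50):
    """Update the database generation by generation until the error target is met."""
    for generation in range(1, max_generations + 1):
        # Validation: compare the current model generation with the simulation.
        errors = {x: abs(predict(database, x) - verified_simulation(x))
                  for x in check_points}
        worst = max(errors, key=errors.get)
        if errors[worst] <= tolerance:
            return generation, errors[worst]
        # Feedback: store the worst-predicted scenario; the next call to
        # predict() then acts as the next model generation.
        database[worst] = verified_simulation(worst)
    raise RuntimeError("accuracy target not reached")

# Initial database holds only the two ends of the parameter range.
database = {0.0: verified_simulation(0.0), 1.0: verified_simulation(1.0)}
check_points = [i / 20 for i in range(21)]
generations, final_error = refine(database, check_points, tolerance=0.01)
print(generations, len(database), final_error)
```

The loop stops as soon as the largest error over the check points falls within the tolerance, which mirrors the stopping criterion of the iterative refinement step; a real workflow would replace the toy function with verified simulation runs and the interpolant with the chosen data model.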
5.2. Case Study – AM Power Prediction
5.3. Analyses and Performance
6. Results and Discussions
6.1. Performance and Reliability
6.2. Challenges and Shortcomings
1. Data availability: As highlighted in Section 4.3, data acquisition and the construction of comprehensive process databases remain significant challenges in the development of reliable data-driven models. Sporadic, inconsistent, or incomplete process data often fail to cover the full parameter space, leading to prediction inaccuracies, data gaps, and potential biases. These limitations directly undermine the robustness and generalizability of the resulting models. Furthermore, acquiring high-fidelity data, particularly for multi-scale modeling, generative process prediction, and transient or high-speed operations, is both technically demanding and costly. These constraints limit the volume, resolution, and diversity of data available for model training and validation.
2. Data fitting: Generalizing process data remains a persistent challenge in the development of robust data models. Many models struggle to fit initial process boundary conditions accurately, to capture rapid changes in process variables (such as high heating or cooling rates), and to handle abrupt nonlinear transitions in high-dimensional data spaces. These difficulties are particularly pronounced when training data are sparse, imbalanced, or insufficiently diverse. Overfitting to specific boundary conditions or high-gradient trends can significantly impair a model's ability to generalize to new or unseen scenarios, thereby limiting its practical application in dynamic environments such as digital twin or digital shadow frameworks. Compounding this issue is the data gradient, that is, the variation in data quality, resolution, and contextual relevance across different process parameters and stages. For example, early-stage offline data (e.g., from validated simulations or literature) may be rich and well-structured, while real-time or online data streams are often noisy, incomplete, or difficult to interpret. This inconsistency complicates the integration of data models into process advisory systems or digital twins, where both historical and live data must be harmonized to support real-time decision-making.
3. A significant challenge in developing data models for manufacturing processes lies in managing the vast temporal and spatial scales involved, particularly in the multi-scale modeling of material evolution. For instance, microstructural transformations in materials (such as those occurring during casting, extrusion, or AM) can unfold over milliseconds and micrometers, while the overall process may span hours and meters. Accurately modeling phenomena across these scales requires advanced multi-scale modeling frameworks, partitioned or hierarchical databases, data mapping and translation mechanisms, and sophisticated filtering techniques. These approaches are often computationally intensive and difficult to validate experimentally, which limits their use in real-time or large-scale deployment. Moreover, as manufacturing technologies evolve and new production methods emerge, static models and fixed-size databases quickly become obsolete. To remain relevant and effective, data models must be designed with adaptability in mind. This includes the ability to integrate dynamic, structured data streams, support regular self-updating mechanisms, and expand process databases over time. Such adaptability is essential for capturing evolving process conditions and maintaining model accuracy in rapidly changing industrial environments.
6.3. Applications and Recommendations
7. Concluding Remarks
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gooneie A., Schuschnigg S., Holzer C., A Review of Multiscale Computational Methods in Polymeric Materials, Polymers, 2017, 9, 16.
- Horr A. M., Notes on New Physical & Hybrid Modelling Trends for Material Process Simulations, Journal of Physics: Conference Series, 2020, 1603, 012008.
- Liu Cm., Gao Hb., Li Ly., et al., A review on metal additive manufacturing: modeling and application of numerical simulation for heat and mass transfer and microstructure evolution, China Foundry, 2021, 18, 317–334.
- Bordas A., Le Masson P., Weil B., Model design in data science: engineering design to uncover design processes and anomalies, Research in Engineering Design, 2025, 36, 1.
- Vlah D., Kastrin A., Povh J., Vukašinović N., Data-driven engineering design: A systematic review using scientometric approach, Advanced Engineering Informatics, Vol. 54, Part C, 2022.
- Bishnu S. K., Alnouri S. Y., Al-Mohannadi D. M., Computational applications using data driven modeling in process Systems: A review, Digital Chemical Engineering, Vol. 8, 2023, 100111.
- Van de Berg D., Savage T., Petsagkourakis P., Zhang D., Shah N., Del Rio-Chanona E. A., Data-driven optimization for process systems engineering applications, Chemical Engineering Science, Vol. 248, Part B, 2022, 117135.
- Wilking F., Horber D., Goetz S., Wartzack S., Utilization of system models in model-based systems engineering: definition, classes and research directions based on a systematic literature review, Design Science, 2024, 10, e6.
- Tao F., Li Y., Wei Y., Zhang C., Zuo Y., Data–model Fusion Methods and Applications toward Smart Manufacturing and Digital Engineering, Journal of Engineering, 2025.
- Dogan A., Birant D., Machine learning and data mining in manufacturing, Expert Systems with Applications, Vol. 166, 2021, 114060.
- Wang J., Xu C., Zhang J., Zhong R., Big data analytics for intelligent manufacturing systems: A review, Journal of Manufacturing Systems, Vol. 62, 2022.
- Ghahramani M. H., Qiao Y., Zhou M., O'Hagan A., Sweeney J., AI-Based Modeling and Data-Driven Evaluation for Smart Manufacturing Processes, IEEE/CAA Journal of Automatica Sinica, Vol. 7, No. 4, 2020, 1026–1037.
- Sofianidis G., Rožanec J. M., Mladenić D., Kyriazis D., A Review of Explainable Artificial Intelligence in Manufacturing, arXiv, 2021.
- Horr A., Gómez Vázquez R., Blacher D., Data Models for Casting Processes – Performances, Validations and Challenges, IOP Conference Series: Materials Science and Engineering, Vol. 1315, 2024.
- Horr A., Blacher D., Gómez Vázquez R., On Performance of Data Models and Machine Learning Routines for Simulations of Casting Processes, BHM Berg- und Hüttenmännische Monatshefte, 2025.
- Horr A. M., Real-Time Modeling for Design and Control of Material Additive Manufacturing Processes, Metals, 2024, 14, 1273.
- Horr A. M., Drexler H., Real-Time Models for Manufacturing Processes: How to Build Predictive Reduced Models, Processes, 2025, 13, 252.
- Wang J., Li Y., Gao R. X., Zhang F., Hybrid physics-based and data-driven models for smart manufacturing: Modelling, simulation, and explainability, Journal of Manufacturing Systems, Vol. 63, 2022, 381–391.
- Brunton S. L., Kutz J. N., Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, Cambridge University Press, Cambridge, England, 2019.
- Qin J., Hu F., Liu Y., Witherell P., Wang C. L., Rosen D. W., Simpson T. W., Lu Y., Tang Q., Research and application of machine learning for additive manufacturing, Additive Manufacturing, 2022, 52, 102691.
- Lebon B., directChillFoam: an OpenFOAM application for direct-chill casting, Journal of Open Source Software, 2023, 8(82), 4871.
- The OpenFOAM Foundation, https://openfoam.org.
- Bennon W. D., Incropera F. P., A continuum model for momentum, heat and species transport in binary solid-liquid phase change systems - I. Model formulation, International Journal of Heat and Mass Transfer, Vol. 30, Issue 10, 1987, 2161–2170, ISSN 0017-9310.
- Greenshields C., Weller H., Notes on Computational Fluid Dynamics: General Principles, CFD Direct Ltd, 2022, ISBN 978-1-3999-2078-0, 291 pages, https://doc.cfd.direct/notes/cfd-general-principles/.
- Vreeman C. J., Schloz J. D., Krane M. J. M., Direct Chill Casting of Aluminium Alloys: Modelling and Experiments on Industrial Scale Ingots, Journal of Heat Transfer, 2002, 124, 947–953.
- directChillFoam Documentation, https://blebon.com/directChillFoam/.
- Weckman D. C., Niessen P., A numerical simulation of the D.C. continuous casting process including nucleate boiling heat transfer, Metallurgical Transactions B, 1982, 13, 593–602.
- Hoque S. E., Hovden S., Culic S., Nietsch J. A., Kronsteiner J., Horwatitsch D., Modeling friction in HyperXtrude for hot forward extrusion simulation of AA6060 and AA6082 alloys, 25th International Conference on Material Forming (ESAFORM 2022), Key Engineering Materials, 2022, 926, 416–425.
- Rizkya I., Syahputri K., Sari R. M., Siregar I., Utaminingrum J., Autoregressive Integrated Moving Average (ARIMA) Model of Forecast Demand in Distribution Centre, IOP Conference Series: Materials Science and Engineering, Vol. 598, 012071.
- Brötz S., Horr A. M., Framework for progressive adaption of FE mesh to simulate generative manufacturing processes, Manufacturing Letters.
- NovaFlow&Solid, NovaCast Systems AB, 2022, https://www.novacast.se/product/novaflowsolid/.
- Horr A. M., Computational Evolving Technique for Casting Process of Alloys, Mathematical Problems in Engineering, Vol. 2019, Article ID 6164092.
- Horr A. M., Real-time Modelling and ML Data Training for Digital Twinning of Additive Manufacturing Processes, BHM Berg- und Hüttenmännische Monatshefte, 2024, 169, 48–56.

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).