Digital Transformation in the Water Distribution System based on the Digital Twins Concept

MohammadHossein Homaei; Agustín Javie Di Bartolo; Mar Ávila; Óscar Mogollón Gutiérrez; Andrés Caro

doi:10.20944/preprints202412.0756.v1

Submitted: 07 December 2024

Posted: 10 December 2024


Abstract

Digital Twins (DTs) have emerged as a disruptive technology with great potential: they can enhance Water Distribution Systems (WDS) through real-time monitoring, predictive maintenance, and optimization. This paper describes the development of a state-of-the-art DT platform for WDS that integrates advanced technologies such as the Internet of Things (IoT), Artificial Intelligence (AI), and Machine Learning (ML). It presents the architecture of the proposed platform, CAUCCES, which uses historical and meteorological data to deploy AI/ML models, including Long Short-Term Memory (LSTM) networks, Prophet, LightGBM, and XGBoost, to predict water consumption patterns. Furthermore, we show how WDS maintenance can be optimized by formulating a Constraint Programming (CP) scheduling problem that minimizes operational costs while reducing environmental impacts. The paper also addresses the cybersecurity measures that protect the integrity and reliability of the DT platform. Overall, the system is expected to improve decision-making, operational efficiency, and system reliability, and it can play an important role in the sustainable management of water resources.

Keywords: Digital transformation; Digital twins; Artificial intelligence; Water consumption prediction; Decision system

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Inefficient energy use and high costs are only two of the challenges facing water supply systems, an essential infrastructure that delivers water to end consumers such as businesses, households, and agriculture [1]. Facilities for transporting water are not a recent achievement of civilization; over the course of history, advances in science and engineering have made it possible to transport water over long distances through pipelines and valves [2]. Today, aging infrastructure, urbanization, population growth, climate change, and other factors place additional strain on already stressed water distribution systems (WDS) [3]. Breaking the vicious cycle between mounting supply pressures and outdated technology requires rebuilding water networks to improve efficiency and reliability while reducing risk [4].

Digital transformation can help address these systemic issues. With IoT technology, AI, and DTs, water utilities can monitor and control their systems remotely. These technologies enable predictive maintenance, real-time supervision of systems and devices, and system simulation, improving efficiency while reducing operational costs [5,6]. Despite these advances in the digitalization of WDS, significant challenges remain, including data structuring, model accuracy, and interfacing with legacy systems [7]. Ensuring long-term viability and operational efficiency in WDS requires both technological innovation and the adoption of sustainability-driven business models. Just as manufacturing firms are restructuring their business models to achieve sustainable value creation and delivery [8], the water sector must integrate environmental, social, and governance principles into its digital transformation strategies.

DT technology is currently one of the most prominent emerging technologies, and its adoption is expected to change how organizations operate. DTs are virtual models of specific physical systems that run in real time, using data and simulations to improve system performance [9]. In water distribution, DTs can improve system design, anticipate maintenance needs, and simulate the system’s operational capacity, among other uses [10]. However, deploying DTs in this industry faces specific bottlenecks, including security, data protection, and the need for a skilled workforce [11].

In this article, we build on the application of DTs in the water industry, extend it to urban and rural water distribution and transmission networks, and propose a novel platform model.

1.1. Objective and Main Contributions of the Paper

This paper presents a new DT-based system for water supply systems in detail. The proposed platform combines DTs with IoT technology to forecast and visualize water use, improving the decision-making capabilities of water utility managers. This integration requires the platform to address key WDS requirements: improved operational efficiency, regulatory compliance, and sound infrastructure. The paper also discusses the characteristics of the platform, its expected advantages, and ways to overcome the bottlenecks in the digitalization of WDS.

The main contributions of the paper are the following:

The paper proposes an innovative platform based on DTs applied to a WDS, which is to be transferred to and tested in an SME from the sector operating in a rural environment. Although the platform focuses on the water sector and WDS, it can be readily replicated in other sectors or for other commercial purposes.
The platform combines AI, ML, and IoT infrastructures to transform traditional business models into digital ones. Incorporating these components into a DT, together with IoT devices that acquire the data used for predictions, is of considerable interest because the approach can be applied to prediction tasks in similar contexts.
The platform integrates advanced AI and ML models for water consumption prediction, providing more accurate demand forecasting and better resource management. This predictive capability supports better decision-making in water distribution systems.
The platform supports maintenance scheduling and operator management of water distribution networks by combining DTs with CP-based scheduling to optimize maintenance activities. This improves operational efficiency, reduces downtime, and raises service quality through reliable management of the water distribution system.
The cybersecurity threats arising from the paradigm shift of connecting DT models to IoT devices have been studied and addressed. A further contribution of the paper is the identification of cybersecurity strategies implemented in compliance with the ISO/IEC 27001 standard; these strategies should be of considerable interest for similar projects.
The functionalities and capabilities envisaged for the platform have been tested in a real operational environment. The proposed DT model, the use of an AI tool, and the identification of IoT devices, including cybersecurity strategies that are easily replicable in similar settings and projects, constitute one of the core contributions of this work.
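Forecast quality for the consumption-prediction contribution is usually reported with error metrics such as those listed in Table 1 (MAE, MSE, MAPE). The following Python sketch shows how these metrics can be computed, here against a seasonal-naive baseline that any learned model (LSTM, Prophet, LightGBM, XGBoost) would be expected to beat. The hourly resolution, the 24-step season, the function names, and the data are illustrative assumptions, not the CAUCCES implementation.

```python
# Sketch only: forecast-error metrics and a seasonal-naive baseline.
# The hourly data, the 24-step season and all numbers are hypothetical.
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent (assumes no zero observations)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 24) -> List[float]:
    """Predict each future step with the value observed one season earlier."""
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


if __name__ == "__main__":
    # Hypothetical hourly consumption (m3/h) for one day, repeated as history.
    day = [2.0] * 6 + [5.0] * 6 + [4.0] * 6 + [6.0] * 6
    history = day * 2
    actual = [1.1 * v for v in day]  # the next day is 10% higher
    forecast = seasonal_naive(history, horizon=24)
    print(f"MAE={mae(actual, forecast):.3f}  MSE={mse(actual, forecast):.3f}  "
          f"MAPE={mape(actual, forecast):.1f}%")
```

In practice, a learned model would typically be considered useful only if it achieves lower errors than such a baseline on held-out data.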

Table 1. List of Abbreviations

Abbreviation	Definition
AEMET	Agencia Estatal de Meteorología
AES	Advanced Encryption Standard
AI	Artificial Intelligence
AMI	Advanced Metering Infrastructure
AMR	Automated Meter Reading
API	Application Programming Interface
CADF	Combined Anomaly Detection Framework
CAUCCES	Control de Agua Urbana, Cantidad y Calidad Excelentes y Sostenibles
CP	Constraint Programming
CPS	Cyber-Physical Systems
DL	Deep Learning
DMA	District Metered Area
DSS	Decision Support System
DT	Digital Twin
DWSs	Digital Water Services
ERD	Energy Recovery Device
EUI	Extended Unique Identifier
FDT	Functional Design Technology
GCN	Graph Convolutional Network
GDPR	General Data Protection Regulation
GIS	Geographic Information Systems
GNN	Graph Neural Network
HIP	Hydrologic Information Portal
HMI	Human-Machine Interface
HPP	High Pressure Pump
ICT	Information and Communication Technology
ISO	International Organization for Standardization
IoT	Internet of Things
LoRa	Long Range
LOS	Levels of Service
LSTM	Long Short-Term Memory
ML	Machine Learning
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MSE	Mean Squared Error
NASA	National Aeronautics and Space Administration
PAT	Pumps as Turbines
PCC	Pearson Correlation Coefficients
PLC	Programmable Logic Controller
PRV	Pressure Reduction Valves
QGIS	Quantum Geographic Information System
RO	Reverse Osmosis
SCADA	Supervisory Control And Data Acquisition
SME	Small and Medium-sized Enterprises
SSL	Secure Sockets Layer
SWaT	Secure Water Treatment
SWG	Smart Water Grid
TLS	Transport Layer Security
VPN	Virtual Private Network
WDN	Water Distribution Network
WDS	Water Distribution System
WRRF	Water Resource Recovery Facility

1.2. Article Structure

The rest of the paper is organized as follows. Section 2, Background and Motivation, reviews the current status of and challenges within WDS and the need to adopt digital transformation technologies such as DTs. Section 3, Literature Review, discusses the existing literature on DTs and identifies gaps in current research, setting the stage for our contributions. Section 4, Proposed DTs Platform, outlines the proposed platform and gives an overview of the system, its functional capabilities, and methods for integrating it with existing legacy systems. Section 5, Integrating AI/ML in Water Consumption Prediction, describes the use case of AI/ML methods applied to water consumption prediction and leak detection on the DT platform. In Section 6, we solve a single-machine CP scheduling problem that addresses task completion time, penalty reduction, CO₂ emission and fuel consumption minimization, and efficiency maximization. Section 7, Security Layer in DTs Platform, presents cybersecurity approaches that strengthen the resilience and sustainability of the DT platform. Finally, Section 8, Conclusion and Future Directions, summarizes the contributions of this paper and outlines possible directions for future research in this domain.
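To make the single-machine setting concrete, the sketch below scores every ordering of a few maintenance tasks by total completion time plus weighted lateness penalties and keeps the cheapest. Exhaustive enumeration is used only as a transparent stand-in for a CP solver and is practical for a handful of tasks. The `Task` fields, the objective, and the omission of CO₂ and fuel terms are simplifying assumptions, not the formulation used in Section 6.

```python
# Sketch only: single-machine maintenance scheduling by exhaustive search.
# Exhaustive search stands in for a CP solver; task data are hypothetical.
from dataclasses import dataclass
from itertools import permutations
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    duration: int   # hours of work for the single crew ("machine")
    due: int        # due time, hours from the start of the shift
    penalty: float  # cost per hour of lateness


def schedule_cost(order: Sequence[Task]) -> float:
    """Sum of completion times plus weighted tardiness for a given order."""
    t = 0
    total_completion = 0
    total_penalty = 0.0
    for task in order:
        t += task.duration
        total_completion += t
        total_penalty += task.penalty * max(0, t - task.due)
    return total_completion + total_penalty


def best_schedule(tasks: Sequence[Task]) -> Tuple[float, Tuple[Task, ...]]:
    """Return (cost, order) of the cheapest sequence among all permutations."""
    return min(((schedule_cost(p), p) for p in permutations(tasks)),
               key=lambda pair: pair[0])


if __name__ == "__main__":
    tasks = [
        Task("valve_check", duration=2, due=4, penalty=1.0),
        Task("pump_service", duration=3, due=3, penalty=5.0),
        Task("meter_swap", duration=1, due=8, penalty=0.5),
    ]
    cost, order = best_schedule(tasks)
    print(cost, [t.name for t in order])
```

With these hypothetical tasks, the urgent, high-penalty pump service is scheduled first. Because the number of orderings grows factorially, realistic instances would be passed to a CP or MILP solver rather than enumerated.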

2. Background and Motivation

2.1. Digital Transformation

Digital transformation in the water industry uses technologies like IoT, AI, ML, DL, and DT to make water supply and distribution more efficient, sustainable, and reliable [12,13,14]. These tools can improve maintenance, optimize resources, and support better decision-making [2,15,16]. Yet, as in other sectors experiencing similar changes, successfully adopting these advanced solutions in critical infrastructure can be challenging. National research and development initiatives have shown that strategic frameworks help integrate emerging technologies sustainably—and the same approach applies to DTs, AI, and IoT in water utilities [17]. When digital transformation aligns with such targeted R&D efforts, it can promote long-term resilience, environmental responsibility, and more efficient resource management in the water sector.

Real-time monitoring: Digital transformation enhances real-time monitoring of WDS, supporting early detection of, and timely response to, system failures and leaks. For example, sensors can detect a leak and send real-time alerts to maintenance personnel, who can act immediately to reduce water loss [18].
Predictive maintenance: Digital transformation facilitates predictive maintenance, which helps avert system failures and reduce maintenance costs. For instance, machine learning algorithms trained on system data can anticipate maintenance requirements, enabling personnel to schedule repairs before malfunctions occur [19].
Simulation capabilities: Digital transformation provides simulation capabilities that help optimize system design and operation. For example, DTs can simulate system behavior under various operating conditions, allowing companies to optimize system design and identify potential problems before they occur [20].
Data-driven decision making: Digital transformation supports better decision-making by providing data-driven insights [21]. For instance, system data can be analyzed for patterns and trends that inform decisions about system upgrades or improvements.
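As a minimal illustration of the real-time monitoring item, the sketch below flags a flow reading that deviates strongly from a rolling window of recent readings, the kind of check that could trigger a leak alert. The rolling z-score rule, the window length, the threshold, and the readings are assumptions chosen for clarity; they stand in for, and are not, the detection models cited above.

```python
# Sketch only: rolling z-score alert on a stream of flow readings.
# Window length, threshold and readings are hypothetical.
from collections import deque
from statistics import mean, pstdev


class FlowMonitor:
    """Flags a reading that deviates strongly from the recent window of readings."""

    def __init__(self, window: int = 12, threshold: float = 3.0) -> None:
        self.readings: deque = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if `value` should raise an alert, then store it."""
        alert = False
        if len(self.readings) == self.readings.maxlen:
            mu = mean(self.readings)
            sigma = max(pstdev(self.readings), 1e-6)  # avoid zero spread on flat data
            alert = abs(value - mu) > self.threshold * sigma
        self.readings.append(value)
        return alert


if __name__ == "__main__":
    monitor = FlowMonitor(window=4)
    stream = [10.0, 10.2, 10.0, 10.2, 10.1, 14.0]  # m3/h, sudden jump at the end
    for t, q in enumerate(stream):
        if monitor.update(q):
            print(f"t={t}: flow {q} m3/h deviates from recent behaviour, notify maintenance")
```

Because each reading, anomalous or not, is stored in the window, this sketch only flags the onset of a change; a production detector would usually treat sustained shifts, such as a persistent burst, separately.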

2.2. Current Challenges

However, digital transformation in WDS and other sectors also faces major challenges:

Integration with legacy systems: Integrating modern technology with existing infrastructure is cumbersome and costly. Older WDS are often incompatible with current IoT devices and sensors and may require extensive upgrades to support the digital transition [22].
Data privacy and cybersecurity: As IoT devices and sensors proliferate, attackers gain more weaknesses to exploit, which could disrupt systems and expose data. For instance, a poorly secured sensor can give attackers access to other systems or allow them to steal confidential information [23].
Skilled workforce: Managing and maintaining digital infrastructure requires specialized knowledge. In regions lacking such expertise, utilities may face considerable challenges and need substantial investment in training programs to equip their personnel with the necessary digital skills [24].
Implementation costs: The initial costs of implementing digital transformation are often high, including investments in new technologies, infrastructure upgrades, and workforce training. These costs can be prohibitive for SMEs, limiting their ability to adopt advanced digital solutions and compete with larger entities [25].
Escalating cybersecurity threats: As a consequence of digital transformation, more devices are connected to the Internet, increasing the potential entry points for cyberattacks. Attacks on critical infrastructure may lead to significant financial losses, reputational harm, and even physical injury [26,27].
Widening the digital divide: Digital transformation can widen the digital gap by leaving behind people and communities who lack digital literacy or have restricted access to technology. This disparity impedes social and economic advancement and deepens existing inequalities [28].
Labor displacement: The digital transformation of WDS should align with corporate social responsibility and address challenges such as labor displacement caused by automation and AI. Through stakeholder engagement, organizations can mitigate job losses by fostering trust, equitable resource distribution, and social value, and by introducing training and re-skilling programs that help workers transition to new roles. Integrating DT frameworks in water management can thus advance technical efficiency, ethical stewardship, and workforce adaptability [29].
Privacy issues: The extensive collection and use of data can raise privacy concerns. Apprehension about data security incidents or the handling of personal information may hinder the adoption of digital technologies and diminish public trust in them [26,30].

While digital transformation offers substantial gains in productivity and innovative capacity across industries, within WDS it must be pursued strategically and with awareness of its risks. Service companies face critical challenges and risks that threaten the success of digital transformation. It is therefore essential to develop a sound strategy that recognizes the particular complexity, criticality, and regulatory requirements of WDS. By managing these obstacles effectively, digital tools can support the successful execution of digital transformation and ensure that its benefits are shared equitably among all stakeholders [31]. Such a holistic strategy lays the groundwork for realizing the full potential of digital transformation to improve operational efficiency and foster innovation throughout the industry.

2.3. DTs in the Water Industry

The DT concept originated in the 1960s, when organizations such as NASA used physical twins of systems to control systems at remote locations, as in the rescue of Apollo 13 [32,33,34]. Twins have since evolved into highly advanced virtual models that simulate various “what-if” scenarios. In cities, DTs support planning and decision-making by revealing how multiple components interact at the macro level, which helps these systems operate sustainably and with maximum efficiency.

DTs are increasingly being adopted across industries, bringing benefits that range from better product development to improved operations and decision-making. In manufacturing, for example, DTs are used to increase output, predict equipment failures, and improve product quality. The automotive industry uses them to simulate assembly lines, identify bottlenecks, and test new product designs [35,36,37]. In healthcare, DTs model human organs or body parts to better understand patients’ conditions and plan surgeries. In construction, DTs are used to fine-tune building designs and reduce energy consumption by simulating the waste and energy use associated with different materials, lighting, and ventilation systems. In transportation, DTs support performance monitoring and predictive maintenance to improve safety and operational efficiency [26].

Henriksen et al. (2022) present the development of a novel type of DT, the HIP DT, for climate change adaptation, water management, and disaster risk reduction. The paper describes how the national DK-model HIP was developed and implemented to enable real-time updating of simulations through the HIP portal. The approach aims to improve Denmark’s ability to cope with extreme weather events by providing better hydrological information. By applying state-of-the-art hydrological modeling and machine learning techniques, the system supports adaptive planning, ensures water security, reduces disaster vulnerability, and builds climate resilience. Other important applications include real-time forecasting to help mitigate flood and drought threats, safeguarding water supplies, and supporting sustainable development projects [38].

Ibrahim Yousif’s contribution [39] leverages digital transformation in the water desalination process to improve smart facilities. The objective was to develop DT models for two major components: a three-cylinder high-pressure pump with an energy recovery device (HPP-ERD) and a three-stage RO membrane. These DTs use real-time data and IoT technologies to monitor the systems comprehensively and to detect and predict faults. Applying machine learning to higher-order signal processing considerably increases the efficiency and reliability of the whole system. The work shows how digital transformation can enhance the desalination process, reduce maintenance costs, and increase the sustainability of freshwater production.

Savic [40] envisions a transformative future for the water sector, similar to the changes seen in the automotive and aerospace industries. Remote sensing, AI-based anomaly detection, and digital twins are identified as key enablers of more efficient water management. The work places strong emphasis on keeping human capital involved in the process and on using appropriate cybersecurity to manage automation-related risks. Lessons from failures in other industries, such as those involving Tesla Autopilot and the Boeing 737 MAX, can be applied to advance digital transformation in the water industry and increase its resilience and operational effectiveness.

2.3.1. DTs in the Water Field: Enhancing Efficiency and Sustainability

DTs have a wide range of applications in the energy industries, where they increase system efficiency, reliability, and sustainability through real-time monitoring, predictive maintenance, and data-driven decisions. Similarly, DTs hold great promise for transforming the management of drinking water distribution networks (WDNs). The performance of drinking water supply systems can be optimized by testing various scenarios on virtual replicas of the networks [10,41]. This section discusses several of the most promising applications of DTs in drinking WDNs.

One main use of DTs in WDNs is enhancing hydraulic performance. By replicating how the network behaves under various scenarios, operators can pinpoint potential problems such as pressure drops or surges and make informed choices about network layout and operation. Moreover, DTs can identify areas at risk of pollution, enabling proactive measures to manage potential hazards [32,38,42].
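A DT would normally evaluate such what-if scenarios with a full hydraulic solver. As a much simpler sketch of the underlying reasoning, the code below uses the SI form of the Hazen-Williams equation, h_f = 10.67 L Q^1.852 / (C^1.852 D^4.87), to estimate how an increase in demand lowers the pressure head at the end of a single pipe. The pipe data, the roughness coefficient C = 130, and the minimum service head are hypothetical values chosen for illustration.

```python
# Sketch only: SI Hazen-Williams head loss for one pipe; all values hypothetical.


def hazen_williams_headloss(flow_m3s: float, length_m: float,
                            diameter_m: float, c_factor: float) -> float:
    """Friction head loss in metres (SI form of the Hazen-Williams equation)."""
    return (10.67 * length_m * flow_m3s ** 1.852
            / (c_factor ** 1.852 * diameter_m ** 4.87))


def outlet_pressure_head(upstream_head_m: float, outlet_elevation_m: float,
                         flow_m3s: float, length_m: float,
                         diameter_m: float, c_factor: float = 130.0) -> float:
    """Pressure head at the outlet: upstream total head minus friction loss minus elevation."""
    loss = hazen_williams_headloss(flow_m3s, length_m, diameter_m, c_factor)
    return upstream_head_m - loss - outlet_elevation_m


if __name__ == "__main__":
    MIN_SERVICE_HEAD_M = 25.0  # hypothetical minimum service pressure head
    for label, q in [("normal demand", 0.02), ("peak demand", 0.04)]:
        p = outlet_pressure_head(80.0, 40.0, q, length_m=2000.0, diameter_m=0.2)
        status = "OK" if p >= MIN_SERVICE_HEAD_M else "LOW PRESSURE"
        print(f"{label}: {p:.1f} m ({status})")
```

With these hypothetical values, doubling the demand lowers the outlet head from roughly 35 m to roughly 23 m, below the assumed 25 m threshold, because friction loss grows with flow to the power 1.852.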

DTs also improve asset management in drinking WDNs. By modeling network assets virtually, operators can test performance and predict faults before they occur. For example, simulating the operational behavior of a pump or valve allows operators to identify vulnerabilities, plan measures to mitigate potential damage, and model how the network would respond to power failures or natural disasters.

DTs further support water quality management: simulations of water flow through the network enable operators to identify likely contamination locations and maintain good water quality [43,44]. DTs also contribute to monitoring water consumption and detecting losses, reducing water waste and improving the overall efficiency of the network.
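Loss detection at the network or district level often starts from a water balance. The sketch below splits the metered system input of a district metered area into authorized consumption and water losses, following the terminology of the International Water Association (IWA) water balance, and reports non-revenue water (NRW) as a share of input. The function name and the example volumes are illustrative assumptions.

```python
# Sketch only: a DMA-level water balance; names and volumes are hypothetical.
from typing import Tuple


def dma_water_balance(system_input_m3: float, billed_m3: float,
                      unbilled_authorized_m3: float = 0.0) -> Tuple[float, float, float]:
    """Return (water losses, non-revenue water, NRW as % of system input)."""
    authorized = billed_m3 + unbilled_authorized_m3
    losses = system_input_m3 - authorized        # real (physical) + apparent losses
    non_revenue = system_input_m3 - billed_m3    # losses + unbilled authorized use
    nrw_pct = 100.0 * non_revenue / system_input_m3
    return losses, non_revenue, nrw_pct


if __name__ == "__main__":
    # Hypothetical monthly volumes for one district metered area.
    losses, nrw, nrw_pct = dma_water_balance(1000.0, billed_m3=780.0,
                                             unbilled_authorized_m3=20.0)
    print(f"losses={losses:.0f} m3, NRW={nrw:.0f} m3 ({nrw_pct:.1f}% of input)")
```

A DT can recompute such a balance continuously from inflow and customer meter data, so that a rising share of losses in one area can prompt a targeted leak search.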

DTs are also beneficial for emergency response planning. Operators can simulate various contingency scenarios to identify potential problems and develop effective response strategies. For example, simulating network behavior in the event of a power outage or natural disaster allows operators to identify weak links and put in place strategies that reduce potential damage. In summary, DTs applied to WDNs support hydraulic performance optimization, asset management, better water quality monitoring, and emergency response planning. In this way, these technologies enable operators to make data-driven decisions that improve the efficiency, reliability, and sustainability of WDNs.
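The weak-link analysis described above can be sketched, at the level of network topology only, as a connectivity check: each pipe is removed in turn, and a breadth-first search from the source identifies the nodes that would lose supply. This ignores pressures and flow directions, which a hydraulic DT would account for; the network and all identifiers are hypothetical.

```python
# Sketch only: topological what-if analysis of single-pipe failures.
# Hydraulics are ignored; the network below is hypothetical.
from collections import defaultdict, deque
from typing import Dict, Optional, Set, Tuple

Pipes = Dict[str, Tuple[str, str]]  # pipe id -> (node, node)


def reachable(pipes: Pipes, source: str, failed: Optional[str] = None) -> Set[str]:
    """Nodes connected to `source` when pipe `failed` is out of service."""
    graph = defaultdict(set)
    for pid, (a, b) in pipes.items():
        if pid != failed:
            graph[a].add(b)
            graph[b].add(a)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def critical_pipes(pipes: Pipes, source: str) -> Dict[str, Set[str]]:
    """Map each pipe whose failure cuts off supply to the nodes it cuts off."""
    all_nodes = {n for ends in pipes.values() for n in ends}
    result = {}
    for pid in pipes:
        cut_off = all_nodes - reachable(pipes, source, failed=pid)
        if cut_off:
            result[pid] = cut_off
    return result


if __name__ == "__main__":
    # Reservoir R feeds a loop A-B-C and a branch C-D.
    network = {"P1": ("R", "A"), "P2": ("A", "B"), "P3": ("B", "C"),
               "P4": ("A", "C"), "P5": ("C", "D")}
    for pipe, nodes in critical_pipes(network, "R").items():
        print(pipe, sorted(nodes))
```

Pipes inside the loop do not appear in the result because their failure leaves an alternative path; this is the topological counterpart of the redundancy a DT would assess hydraulically.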

3. Literature Review

3.1. Background Research

DTs are improving the efficiency, reliability, and sustainability of water systems. In WDS in particular, they enable real-time monitoring and simulation of network behavior, which in turn supports predictive maintenance, better decision-making, and resource optimization. DTs create a dynamic virtual model of the WDN that brings together real-time data, AI, and machine learning. This model helps identify potential issues, test scenarios, and understand the impacts of various decisions, reducing risks and improving service quality. This section reviews significant contributions to the development and application of DTs in WDS, highlighting advances and identifying ongoing challenges. A thorough and systematic literature review is essential for identifying emerging trends, challenges, and solutions in applying DTs and AI/ML within WDS. Recent studies have emphasized the importance of refining literature review strategies to analyze big data trends and ensure research quality across different journal tiers [45]. Such methodological rigor helps build a robust theoretical foundation for understanding digital transformation in the water industry.

3.1.1. Early Conceptual Foundations

Early conceptual work on DTs addressed efficiency and sustainability problems in water treatment and distribution systems by combining sophisticated modeling with real-time data. Curl et al. (2019) focus on how DTs can provide detailed, real-time representations of water treatment processes for predictive optimization and control. The approach optimizes chemical dosing, energy use, and resource utilization, reducing operating costs and environmental impact. For example, a case study at a North Carolina water treatment facility reported 10% chemical savings and a 2% improvement in water quality due to optimized coagulant dosing [46].

Conejos et al. (2020) further elaborate on developing and implementing DTs for managing drinking WDNs. They describe the main functionalities of DTs: precise modeling of network behavior and continuous integration of data from systems such as GIS, AMR, and SCADA, together with advanced capabilities including optimal network design, asset management, leak detection, and emergency response simulation. The case study of the Valencia WDN, which supplies 1.6 million inhabitants, demonstrates remarkable added value in real-time monitoring, predictive maintenance, and operational efficiency. This exemplifies the scalability and efficacy of DTs in overseeing complex water management systems and highlights their potential as essential decision-making tools in contemporary water utility operations [32].

Giudicianni et al. (2020) build on this foundation by discussing the integration of advanced energy management and leakage control technologies within smart water grids. The study examines the role of DTs, CPS, and blockchain in optimizing WDSs. Its core focus is energy recovery, achieved by installing micro-turbines and PATs in place of traditional PRVs, which improves energy efficiency while also controlling leakage. The study also finds that segmenting water networks into DMAs is effective in improving monitoring and management. Giudicianni et al. argue that integrating the concept of digital water will contribute significantly to the sustainability and resilience of urban water systems and enable future smart city development [47].
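The energy-recovery argument can be quantified with the standard hydraulic power relation P = η ρ g Q H, where H is the head that a PRV would otherwise dissipate. The sketch below applies it to a single site. The PAT efficiency of 0.6, the flow, and the head values are assumptions for illustration and are not taken from [47].

```python
# Sketch only: hydraulic power a pump-as-turbine (PAT) might recover at a PRV site.
# Efficiency, flow and head values are hypothetical.
RHO_KG_M3 = 1000.0  # density of water
G_M_S2 = 9.81       # gravitational acceleration


def recoverable_power_kw(flow_m3s: float, head_drop_m: float,
                         efficiency: float = 0.6) -> float:
    """P = eta * rho * g * Q * H, converted from W to kW."""
    return efficiency * RHO_KG_M3 * G_M_S2 * flow_m3s * head_drop_m / 1000.0


def daily_energy_kwh(flow_m3s: float, head_drop_m: float,
                     efficiency: float = 0.6, hours: float = 24.0) -> float:
    """Energy over `hours` of operation at constant flow and head."""
    return recoverable_power_kw(flow_m3s, head_drop_m, efficiency) * hours


if __name__ == "__main__":
    # A PRV currently dissipating 30 m of head at 50 L/s (hypothetical).
    print(f"{recoverable_power_kw(0.05, 30.0):.2f} kW, "
          f"{daily_energy_kwh(0.05, 30.0):.0f} kWh/day")
```

Assuming constant flow and head over the day is optimistic, since demand varies; a DT would integrate the relation over a simulated or measured demand profile instead.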

3.1.2. Expansion and Application in Various Water Systems

The application of DTs in water systems has expanded considerably, demonstrating their flexibility and effectiveness across various environments. Valverde et al. (2021) explain how DTs improve the performance and efficiency of water infrastructure by supporting operational decision-making for sewer networks and water resource recovery facilities (WRRFs). Integrating real-time data and advanced models within DTs enables operational optimization and smarter data handling in complex water systems. Case studies from Global Omnium in Valencia, Spain, Aarhus Vand in Denmark, and DC Water in the United States illustrate that DTs enable real-time monitoring, integration, and holistic system management, underlining their flexibility across a variety of water management contexts [48].

Garrido-Baserba et al. (2020) relate the fourth industrial revolution to the digitization driven by big data and artificial intelligence in the water sector. When integrated with urban water infrastructure, these technologies enhance operation, maintenance, and sustainability. AI and big data analytics enable real-time insights and predictive capabilities, changing how decisions are made, resources are recovered, and assets are managed. This digital transition is expected to bring socio-economic changes, shape new business models, and require new research and workforce development to meet the future demands of the water sector [49].

Pedersen et al. (2021) investigate living and prototyping DTs for an urban water system, focusing on multi-purpose value creation through models and sensors. A living DT performs real-time operation and control functions, whereas a prototyping DT is used for design and planning without real-time data coupling. The implementation at VCS Denmark showed how data links, simulation models, and enhanced analytics can strengthen system management. The authors emphasize open data standards and cross-sector collaboration to maximize the impact of DTs on efficiency, resilience, and sustainability in urban water management [50].

Hietala et al. (2021) present different forms of collaboration in the digital transformation of municipal wastewater management, focusing on inter-organizational cooperation between Finnish water utilities. They identify four main modes: autonomous, limited company, central service organization, and standardization. These collaboration modes enhance ICT development and deployment, providing benefits such as predictive maintenance, efficient resource allocation, and improved data integration. While autonomous development is prevalent, collaboration through limited companies and standardization offers significant advantages in managing digital transformation challenges and improving overall operational efficiency and sustainability [51].

Van Rooij et al. (2021) propose a DSS based on DTs to recover membranes in RO desalination plants. The study addresses bio-fouling caused by algal blooms, which significantly reduces membrane efficiency. The DSS creates a DT of an RO vessel to evaluate maintenance strategies, including membrane cleaning, swapping, and replacement. Applied to the Carlsbad Desalination Plant, this approach optimizes maintenance schedules, reduces operational costs, and enhances the reliability and longevity of membrane systems, setting a new standard for membrane management in the desalination industry [52].

Udugama et al. (2021) explore the challenges and opportunities of applying DTs to biomanufacturing, highlighting potential gains in process efficiency and resource optimization. The work details a five-step methodology for developing a full DT, starting with a basic steady-state process model and culminating in an advanced, validated model with bidirectional communication. A bench-scale ethanol fermentation serves as a test bed to demonstrate the improved monitoring and control functionalities of the framework. Key challenges include the need for high-fidelity models, real-time data integration, and operator interaction, and the authors suggest future research to optimize biomanufacturing operations [53].

Botin et al. (2021) study DT applications representing urban spaces and vehicles to create a Living Laboratory and demonstrate how DTs in urban environments can support progress toward the United Nations Sustainable Development Goals. The work describes a method for developing DTs from a network of vehicle-mounted sensing devices whose data are processed in real time using edge computing, modeling software, and machine learning algorithms. The resulting DTs are used to analyze the evolution of urban space, mobility, and vehicle interactions, providing insights for urban planning and infrastructure improvements toward sustainable and resilient urban environments [54].

3.1.3. Developing Advanced Methodologies and Frameworks

New methodologies and frameworks for WDN management have advanced considerably through the combination of DTs with established algorithms and innovative technologies. Ciliberti et al. (2021) propose a new digital transformation paradigm in the WDNetXL/WDNetGIS platform, focusing on life-cycle management and operational efficiency. Its core services include the Digital Water DMA Analyzer, which optimizes district metered area design for leakage reduction through pressure control, and Digital Water Rehabilitation, which provides optimal pipe replacement plans. By exploiting real-time data and advanced modeling techniques, these digital services significantly improve the management and sustainability of WDNs [7].
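The link between pressure control and leakage reduction in DMA design is commonly expressed with the power-law (FAVAD-type) relation L₁ = L₀ (P₁/P₀)^N₁. The sketch below applies it to estimate leakage after a pressure reduction. The default exponent N₁ = 1 and all volumes are assumptions, since real exponents depend on pipe material and leak type, and the code is not part of WDNetXL/WDNetGIS.

```python
# Sketch only: power-law pressure-leakage relation L1 = L0 * (P1 / P0) ** N1.
# The default exponent N1 = 1.0 and all volumes are assumptions.


def leakage_after_pressure_change(leak_m3h: float, p0_m: float, p1_m: float,
                                  n1: float = 1.0) -> float:
    """Estimated leakage flow after average zone pressure changes from p0 to p1."""
    return leak_m3h * (p1_m / p0_m) ** n1


def leakage_reduction_pct(p0_m: float, p1_m: float, n1: float = 1.0) -> float:
    """Percentage reduction in leakage implied by the pressure change."""
    return 100.0 * (1.0 - (p1_m / p0_m) ** n1)


if __name__ == "__main__":
    # A DMA leaking 10 m3/h at 60 m average pressure, reduced to 45 m by a PRV.
    for n1 in (0.5, 1.0, 1.5):
        print(f"N1={n1}: {leakage_after_pressure_change(10.0, 60.0, 45.0, n1):.2f} m3/h "
              f"({leakage_reduction_pct(60.0, 45.0, n1):.1f}% less)")
```

The loop over several exponents shows why the assumed N₁ matters: the same 25% pressure reduction yields a smaller leakage reduction for rigid leaks (N₁ = 0.5) than for pressure-sensitive ones (N₁ = 1.5).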

Ramos et al. (2022) discuss the integration of DT technologies to enhance WDN efficiency. They report substantial water savings and improved system operation through optimization algorithms and real-time data collection using GIS. A case study in Lisbon shows that DTs can achieve about 28% water savings through rapid leak detection and optimal settings of pressure control valves. The study highlights the transformative effect of DTs on smart water management, in particular reduced water and energy losses and more sustainable operations [41].

Pesantez et al. (2022) assess the impact of the COVID-19 pandemic on water infrastructure using a DT framework that fuses AMI data with hydraulic modeling. The study shows marked changes in residential and nonresidential water use patterns caused by social distancing measures, which led to higher pressures and increased water age within the distribution system. The approach illustrates how DTs can provide real-time insight and support operational decision-making to improve infrastructure resilience during unforeseen events [55].

Bonilla et al. (2022) present a state estimation methodology for WDSs that couples GCNs with hydraulic models to form a DT. The framework estimates pump speeds from available pressure and flow-rate measurements and accurately predicts the hydraulic state of the system. Validated on benchmark networks, the methodology achieves high predictive accuracy. This indicates that DTs can improve WDS monitoring, management, and anomaly detection, and thereby overall operational efficiency and reliability [42].

Zekri et al. propose a smart water management framework that combines intelligent DTs with multi-agent systems to improve the efficiency of water resource distribution. The five-layer framework uses intelligent agents for data analytics, simulation of asset operation, and real-time user feedback. It places strong emphasis on autonomy and intelligence and uses a reward-based mechanism to incentivize optimal water consumption. This illustrates how DTs can improve asset management, leak detection, and system operation for the sustainable use of water resources [10].

Matheri et al. (2022) review the integration of DTs, AI, and data-driven optimization in wastewater treatment plants for sustainable circularity and intelligent operation. The study highlights how CPS can be deployed for real-time monitoring, predictive maintenance, and improved decision-making. Adopting circular bio-economy approaches can turn wastewater treatment plants into resource recovery facilities that support sustainable development goals. Reported implementations of these technologies have achieved considerable operational cost savings, significant improvements in system reliability, and better compliance with environmental standards. These results illustrate the transformative potential of DTs in wastewater management [56].

3.1.4. Recent Developments and Future Directions

Recent progress in digital technologies for water systems has focused on improving real-time monitoring, operational efficiency, and sustainability through new applications and advanced frameworks. Dodanwala et al. (2023) present a digital technology framework for assessing the level of service (LOS) of potable water infrastructure systems. It combines real-time data acquisition with established service-standard benchmarks so that the performance evaluation and management of water infrastructure can be automated. The framework creates a cyber counterpart of physical infrastructure that faces aging assets and variable service conditions, and in doing so improves both strategic decisions and operational efficiency. It follows the ISO 55000 guidelines and relies on dynamic data integration and automated LOS assessment to ensure sustainable and resilient water delivery [57].

Ramos et al. (2023) demonstrate the combined application of SWG and DT technologies in the Gaula Water Distribution Network, Madeira, Portugal. Real-time monitoring, scenario analysis, and optimized pressure control were found to significantly reduce water losses and improve system performance. The DT model indicated a potential 80% reduction in water losses, saving about EUR 165,000 per year. The study underlines the value of DT and SWG for modernizing water management practices, promoting sustainable water use, and increasing overall network efficiency [58].

Grievson et al. (2022) give an overview of the integration of digital solutions into water management, including real-time optimization, full transparency, and predictive maintenance. Several case studies underline the transformative potential of digitalization for reducing non-revenue water, improving water quality, and managing infrastructure. Alongside these opportunities, challenges related to cybersecurity, data quality, and legacy systems call for multi-stakeholder collaboration to make digital transformation succeed. The work advocates future-proof frameworks that accommodate innovative technologies and drive sustainable development and equitable water management practices [59].

Fu et al. (2022) address the development of DTs for biomanufacturing. They propose a general framework for building a full-fledged DT, starting from a basic steady-state process model. In a case study on second-generation ethanol fermentation, the framework yielded productivity improvements of 20-33%. The critical success factors identified are modeling accuracy, human operator actions, and the economic value proposition of the models. Despite these benefits, the study acknowledges that effective interaction requires advanced digital infrastructure. It also requires carefully designed HMIs to enhance the accuracy and robustness of the DT system [60].

Pedersen et al. (2022) introduce a diagnostic framework for addressing uncertainties in integrated urban drainage models used in living DTs. The framework supports iterative model improvement by classifying errors across urban drainage system components. It does so using hydrologic and hydraulic signatures derived from water level sensors. Applied in Odense, Denmark, the approach revealed discrepancies in model inputs, structural attributes, and temporal variability. This improves model accuracy and transparency and enables more reliable and efficient urban water management [61].

Ciliberti et al. (2023) present a transformative approach to the digitization of WDNs by developing standardized methods for creating DTs of WDNs. Their conceptual framework for WDN digital transformation combines advanced hydraulic modeling with AI, machine learning, and deep learning techniques. It integrates complete DTs with advanced network analysis and implements DWSs as plugins for the QGIS software. This enhances WDN planning, management, and design, continuously improves the digital representation of the network, and supports better technical decision-making and overall efficiency [62].

Ramos et al. (2023) also show how DTs transform system operation and maintenance. By detecting patterns in historical and real-time sensor data, DTs inform predictive maintenance strategies that prevent sudden failures and minimize costs. DTs improve water management, pressure management, and resource conservation, and thereby reduce environmental footprint, so these technologies could be among the primary enablers of the SDGs. Barriers remain, however, to scaling up DTs on existing infrastructure, including data management, regulatory, and cybersecurity concerns. Further research is needed on water consumption forecasting and leak detection to fully leverage DT technology in WDSs [63].

Torfs et al. (2024) discuss the transition of WRRF models into DT applications, focusing on how to overcome the lack of consensus on how DTs are defined and applied. The main features that distinguish DTs from traditional simulation models are continuous, automated data links and dynamic model updating. Combining mechanistic and data-driven models into hybrid frameworks enhances predictive power and hence operational efficiency. Success stories range from the Changi Water Reclamation Plant to the Kolding WRRF. They show that implementing DTs has improved operational decisions and enabled real-time optimization. Successful DT deployment in WRRFs requires a holistic approach that includes stakeholder buy-in and adequate data management [64].

Menapace et al. (2024) present a proof of concept for optimal sensor placement using GNNs to support the development of DTs for WDSs. The proposed methodology uses GNNs to estimate pressure at consumption nodes and guides the choice of sensor configuration so as to minimize the estimation error. Applied to a synthetic case study, the approach achieved high accuracy in pressure estimation across various sensor configurations. This highlights the potential of GNNs to enhance the accuracy and reliability of DTs in WDNs. The method supports more effective monitoring and management of water systems, improving operational efficiency and resource utilization [65].

Table 2. Comparison of DT Implementations in the Water Industry

Ref. (Year)	Focus Area	Contributions/Goals	Methods	Case Studies/Applications	Outcomes
[46] 2019	DTs in water treatment	Real-time optimization, sustainability	DTs, real-time data	North Carolina water treatment facility	10% reduction in chemical usage, 2% water quality improvement
[32] 2020	WDNs	Real-time monitoring, predictive maintenance	DTs, GIS, SCADA	Valencia, Spain	Improved operational efficiency, support for 1.6 million inhabitants
[47] 2020	Smart water grids	Energy efficiency, leakage control	DTs, CPS, blockchain	Various implementations	Enhanced sustainability, resilience of urban water systems
[66] 2020	Urban water cycles	Real-time control, subsystem interoperability	Cyber-physical systems, model predictive control	Barcelona, Badalona	Reduced combined sewer overflows, optimized water usage
[48] 2021	Water infrastructure	Operational decision support	DTs, real-time data	Global Omnium, Aarhus Vand, DC water	Versatility and effectiveness in different contexts
[50] 2021	Urban water systems	Real-time operational and control functionalities	Living and prototyping DTs	VCS Denmark	Improved system management, efficiency, resilience
[51] 2021	Wastewater Management	Inter-organisational collaboration	Autonomous, limited company, central service organisation, standardisation	Finnish water utilities	Enhanced ICT development, predictive maintenance
[52] 2021	Desalination plants	Membrane maintenance optimisation	DTs, decision support systems	Carlsbad Desalination Plant	Reduced operational costs, enhanced membrane efficiency
[53] 2021	Bio-manufacturing	Process efficiency, resource utilisation	DTs, real-time data integration	Ethanol fermentation process	Enhanced monitoring and control capabilities
[54] 2021	Urban spaces and vehicles	Sustainable urban environments	DTs, sensing devices, edge computing, machine learning	Various urban applications	Improved urban planning and infrastructure
[7] 2022	WDNs	Asset management, pressure control	DTs, established algorithms	Lisbon, Portugal	28% water savings, reduced water and energy losses
[55] 2022	Water infrastructure	Impact assessment of COVID-19	DTs, AMI data, hydraulic modelling	Mid-sized utility serving 60,000 people	Altered water demand patterns, improved operational decision-making
[42] 2022	WDSs	State estimation methodology	DTs, graph convolutional networks	Two benchmark networks	High predictive accuracy, improved WDS monitoring
[10] 2022	Smart water management	Water resource efficiency	DTs, multi-agent systems	Various implementations	Improved asset management, leak detection, system operation
[56] 2022	Wastewater treatment plants	Process efficiency, resource recovery	DTs, AI, CPS	Various implementations	Reduced operational costs, enhanced system reliability
[57] 2023	Potable water infrastructure	Real-time data collection, benchmarking	DTs, Levels of Service framework	Various implementations	Enhanced decision-making, operational efficiency
[58] 2023	WDNs	Water loss reduction, system performance	Smart water grids, DTs	Gaula WDN, Madeira, Portugal	80% water loss reduction, EUR 165,000 annual savings
[59] 2023	Digital transformation	Digital solutions integration	DTs, real-time optimisation	Various case studies	Enhanced operational efficiency, resource management
[60] 2023	Biomanufacturing	DT framework	DTs, human-machine interfaces	Ethanol fermentation	20-33% productivity improvement
[63] 2023	System operation and maintenance	Predictive maintenance, sustainability	DTs, historical and real-time data	Various implementations	Minimized maintenance costs, improved sustainability
[62] 2023	WDNs	Digital transformation, AI integration	DTs, AI, machine learning, deep learning	QGIS software plugins	Improved planning, management, and design of WDNs
[64] 2024	Water resource recovery	Real-time optimization, stakeholder involvement	DTs, hybrid frameworks	Changi Water Reclamation Plant, Kolding WRRF	Significant operational efficiency improvements
[61] 2024	Urban drainage models	Diagnosing uncertainties	DTs, hydrologic and hydraulic signatures	Odense, Denmark	Improved model accuracy, enhanced urban water management
[65] 2024	WDSs	Optimal sensor placement	DTs, graph neural networks	Synthetic case study	High accuracy in pressure estimation, improved monitoring

3.2. Gaps in Research

Despite these advances in applying DTs within the water industry, major research gaps remain. One is the integration and effective use of state-of-the-art AI and ML models within DTs for WDSs. AI and ML promise better predictive analytics and operational efficiency. Real-world applications, however, are often limited by a shortage of the high-quality, granular time-series data needed to train robust models. Water consumption patterns are complex and require extensive feature engineering, which makes models such as LSTM, Prophet, LightGBM, and XGBoost difficult to implement. Moreover, such models must be both accurate and generalizable across the different contexts of the water industry.

Another critical gap is the optimization of maintenance operations in WDSs using DTs and advanced optimization techniques such as CP. Little work has examined how to integrate DTs with sophisticated scheduling models that handle complex maintenance tasks with dynamic priorities and dependencies. New research is needed on scalable optimization methods that adapt to the dynamic nature of WDSs while minimizing operational costs and environmental impacts.

In addition, scalable DT architectures are needed to handle the complexity of water distribution networks. Such architectures must combine diverse data sources, such as IoT sensor data, historical maintenance records, and environmental information. Current DT applications often focus on specific aspects, such as pipeline health monitoring or water quality management. They rarely provide an integrated view of all the subsystems and external factors that influence water distribution. Integrated DT platforms would enable seamless interoperability among components such as water quality assessment, leak detection, consumption prediction, and maintenance planning. This in turn requires research into data standardization and interoperability methods.

Successful DT adoption also requires a skilled workforce capable of managing these advanced systems. Research into education programs and training tools is therefore essential to close the digital skills gap in the water industry. This will enable smoother transitions to digital platforms and maximize the benefits of DT implementations.

Furthermore, regulatory compliance and cybersecurity in the digitalization of WDSs remain under-explored. As digital systems become more prevalent, there is an acute need to make interconnected water distribution infrastructure secure and resilient against cyber threats. Considerable research is required to ensure that DT platforms comply with relevant regulations, including data privacy rules such as the GDPR, and with ethical guidelines. Frameworks that properly balance operational efficiency against legal and ethical imperatives will be critical for an effective and responsible digital transformation of WDSs. Building a DT should therefore be a priority, since many of these shortcomings only become apparent once a DT is actually built. Accordingly, this paper proposes a DT platform for better and more precise water resource management and for efficient support and maintenance of the water network.

4. Proposed DTs Platform

This manuscript introduces the CAUCCES platform, a DT-based system that integrates AI models for water consumption forecasting, IoT infrastructure for real-time data collection, and cybersecurity measures that comply with the applicable regulatory context. The platform has been successfully deployed at Ambling [67], an SME that manages rural water supply services. It addresses challenges such as water scarcity and demographic decline by showing how advanced sensing, communication, and information technologies can improve water resource governance.

The platform provides an integrated framework for real-time monitoring, decision-making, and digitization of the maintenance process for water distribution networks. It optimizes maintenance by combining CP-based scheduling with DT technology, improving operational efficiency, service quality, and system reliability. Inspired by solutions developed for urban areas, CAUCCES adapts these technologies to rural needs, enabling functionalities such as remote meter reading, water quality monitoring, demand forecasting, and scenario-based system simulation.

DTs, at the heart of the platform, are dynamic digital models of physical systems that are continuously updated with real-time sensor data to improve a system's efficiency and sustainability. This makes CAUCCES a durable solution to the challenges of modern water distribution.

Figure 1 classifies DT applications as operational or strategic. At the operational level, DTs support monitoring and control; they can detect anomalies and optimize processes such as the automation of a water production facility. At the strategic level, "predictive twins" use simulated data to model potential scenarios. These scenarios can drive system design changes, support personnel training, and allow incidents to be simulated proactively.

Water utilities usually use models to predict system behavior under normal conditions. In practice, however, actual performance deviates from these predictions because of real demand, maintenance activities, and aging infrastructure. By incorporating these factors, a DT provides a dynamic, real-time model that continuously adapts to changes, improving both daily operations and long-term planning. In this way, DTs can move water management from static models to dynamic, scenario-based tools that help utilities plan for the future with confidence.

4.1. CAUCCES Architecture

Figure 2 presents the architecture of the CAUCCES Smart Water Management System, which is designed to integrate diverse data sources and technologies to improve the efficiency of water management. The architecture consists of several interconnected layers. Each layer performs specific functions that contribute to the overall effectiveness and operational intelligence of the system.

4.1.1. Cyber Physical System

The CPS represents the interconnected physical assets and digital system components that facilitate data generation, collection, and transmission. It comprises two key elements that work together to ensure the continuous flow of information from physical sensors to computational platforms:

IoT Networks: These networks comprise IoT devices deployed throughout the WDS, such as PLCs, meters, and gauges, together with instrumented pumps and pipes. They collect real-time data at the treatment, transmission, and distribution stages. These data are used to monitor system performance and to identify inefficiencies or faults in the network. The IoT networks are deployed to provide extensive, high-accuracy coverage of the entire water infrastructure.
Communication: The communication layer ensures the lossless and timely delivery of the data gathered by IoT devices to downstream systems. In our architecture, data from IoT devices connected via the LoRaWAN protocol are aggregated and then securely transmitted to cloud platforms for further processing. This provides robust and scalable data handling while preserving the integrity and timeliness of critical information across the network.
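The lossless-delivery requirement can be made concrete with a small check on the uplinks received from the LoRaWAN gateways. The following is only an illustrative sketch, not the CAUCCES implementation. The `Uplink` record and its field names are hypothetical, and the sketch assumes that each uplink carries the device identifier (DevEUI) and the LoRaWAN uplink frame counter (FCnt). A frame heard by several gateways is kept once, and gaps in a device's frame counter reveal readings that never arrived. In practice, a LoRaWAN network server usually performs the deduplication itself. Frame-counter roll-over is ignored here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Uplink:
    # Hypothetical record; field names are assumptions for illustration.
    dev_eui: str      # identifier of the end device (e.g. a water meter)
    f_cnt: int        # LoRaWAN uplink frame counter
    gateway_id: str   # gateway that received this copy of the frame
    payload: bytes


def deduplicate(uplinks: Iterable[Uplink]) -> list[Uplink]:
    """Keep the first copy of each (dev_eui, f_cnt) frame; the same frame
    is often received by several gateways."""
    seen: set[tuple[str, int]] = set()
    unique: list[Uplink] = []
    for up in uplinks:
        key = (up.dev_eui, up.f_cnt)
        if key not in seen:
            seen.add(key)
            unique.append(up)
    return unique


def missing_frames(uplinks: Iterable[Uplink]) -> dict[str, list[int]]:
    """Per device, list the frame counters between the first and last
    received frame that never arrived (counter roll-over is ignored)."""
    counters: dict[str, set[int]] = {}
    for up in uplinks:
        counters.setdefault(up.dev_eui, set()).add(up.f_cnt)
    gaps: dict[str, list[int]] = {}
    for dev, received in counters.items():
        lost = [c for c in range(min(received), max(received) + 1) if c not in received]
        if lost:
            gaps[dev] = lost
    return gaps


if __name__ == "__main__":
    received = [
        Uplink("meter-01", 1, "gw-a", b"\x01"),
        Uplink("meter-01", 1, "gw-b", b"\x01"),  # same frame via a second gateway
        Uplink("meter-01", 2, "gw-a", b"\x02"),
        Uplink("meter-01", 4, "gw-a", b"\x04"),  # frame 3 was lost
    ]
    print(len(deduplicate(received)))  # 3
    print(missing_frames(received))    # {'meter-01': [3]}
```

A detected gap only shows that a reading is missing. Whether it is re-requested or imputed downstream is a separate design decision.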

4.1.2. DTs and System Integration Layer

At the heart of this architecture lies the DT component, which creates accurate virtual replicas of physical water systems. This core element encompasses various models and a computational engine that transforms real-world data into digital entities. It also enables seamless communication between the physical and virtual systems, allowing for higher-level analysis, optimization, and decision-making:

API and Data Management Layer: This layer is essential for data integration and management. It controls the access, processing, and exchange of data within the system, providing a safe and orderly environment for data sharing through APIs. The platform exposes controlled data and functionality while allowing integration with external systems. This makes collaboration between platforms in day-to-day operations considerably more effective. The layer also enables the system to handle large volumes of data efficiently, supporting scalability and compliance with regulations such as the GDPR.
Visual Representation in GIS: Although not a self-contained layer, Geographic Information Systems (GIS) are an integral part of the system architecture. GIS provides visual insight by displaying data and analysis results on geographic maps, which significantly enhances spatial analysis. The platform supports both two-dimensional and three-dimensional mapping, as illustrated in Figure 3 and Figure 4, respectively. These views are central to visualizing and interpreting geographic information, which is fundamental to both operational and strategic decision-making in water management.

4.1.3. Processing and Analytics Layer

The processing and analytics layer transforms raw data into actionable insights during the data analysis phase. It handles multiple data sources, grouped into real-time and historical data streams. The data are pre-processed, cleaned, and transformed on servers within this layer so that they are ready for the optimization and predictive models. The critical components of this layer include:

Input Data: This layer gathers diverse types of data essential for analysis:

− Meteorological data are sourced via APIs from regional weather stations, providing crucial environmental context.
− Historical data consist of daily and even hourly water consumption records for the village, stored in dedicated databases for trend analysis and forecasting.
− Real-time data from meters, pumps, and PLC devices are continuously captured and stored in relevant databases, feeding into real-time monitoring and analysis models.
Artificial Intelligence Models and Forecasting: A key component of this layer is the AI/ML processing stage, in which artificial intelligence and machine learning models analyze historical and real-time data. To forecast water consumption, we employed a combination of models, including LSTM, Prophet, LightGBM, and XGBoost. These models allow the system to carry out optimization and computational tasks, predict future water demand, detect network leaks, and estimate energy usage and carbon dioxide emissions. Integrating time-series data supports accurate forecasting and efficient resource management [68].
Business Intelligence and Dashboard Tools: Processed data is presented visually through intuitive dashboards and detailed reports, providing users with real-time insights and historical data analysis. These tools empower decision-makers to make informed, data-driven choices, ensuring operational efficiency and long-term sustainability in water management.

5. Integrating AI/ML in Water Consumption Prediction

5.1. Data Aggregation and Pre-Processing

Water consumption data, originally recorded at 15-minute intervals, is aggregated into daily, weekly, and monthly totals to align with the temporal granularity of meteorological data. This alignment allows for practical correlation analysis at different timescales.
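A minimal sketch of this aggregation step in plain Python follows. It assumes that each reading is a `(timestamp, volume)` pair. Daily totals are keyed by calendar date, weekly totals by ISO (year, week), and monthly totals by (year, month). The function names and the reading format are assumptions for illustration and do not represent the platform's actual code.

```python
from collections import defaultdict
from datetime import datetime, timedelta


# Hypothetical bucketing keys: each maps a reading's timestamp to its period.
def daily(ts: datetime):
    return ts.date()


def weekly(ts: datetime):
    return tuple(ts.isocalendar())[:2]  # (ISO year, ISO week number)


def monthly(ts: datetime):
    return (ts.year, ts.month)


def aggregate(readings, key):
    """Sum (timestamp, volume) readings, e.g. 15-minute meter volumes,
    into one total per period defined by `key`, sorted by period."""
    totals = defaultdict(float)
    for ts, volume in readings:
        totals[key(ts)] += volume
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Synthetic example: two days of 15-minute readings of 0.5 volume units.
    start = datetime(2024, 1, 1)
    readings = [(start + timedelta(minutes=15 * k), 0.5) for k in range(192)]
    print(aggregate(readings, weekly))  # {(2024, 1): 96.0}
```

Because the daily keys are `datetime.date` objects, the daily totals can be matched on date with daily meteorological observations before the correlation analysis.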

5.2. Meteorological Variables

Data from AEMET, the Spanish State Meteorological Agency [69], provide essential information on climatic conditions. They complement the water consumption data and enable the extraction of consumption patterns. These variables are listed in Table 3 and shown in Figure 5.

The following analysis evaluates the correlation between water consumption and meteorological parameters using Pearson's correlation coefficient (R). R ranges from −1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). This analysis helps identify the most influential meteorological factors for inclusion in forecasting models.

Definition: Pearson’s Correlation Coefficient

R = \frac{n \sum (x_{i} y_{i}) - \sum x_{i} \sum y_{i}}{\sqrt{[n \sum x_{i}^{2} - {(\sum x_{i})}^{2}] [n \sum y_{i}^{2} - {(\sum y_{i})}^{2}]}}

(1)

where n is the number of observations, x_{i} and y_{i} are the individual data points of variables X and Y, and all sums run over i = 1, …, n.
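Eq. (1) translates directly into code. The function below is a sketch that computes R from the five running sums in the formula. For long series with large values, the centred form (subtracting the means first) is numerically safer, and Python 3.10+ also provides `statistics.correlation` for the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient computed term by term as in Eq. (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, with n >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("R is undefined when either variable is constant")
    return numerator / denominator


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

In the setting above, x would be a meteorological series such as daily maximum temperature and y the daily consumption totals for the same dates.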

Figure 6 and Table 4 present the Pearson correlation coefficients (R) between the climatic variables recorded by the meteorological station, with maximum daily temperature used as a proxy for water consumption. In this dataset, several climatic factors correlate significantly with maximum daily temperature. This suggests that they may also influence water consumption if temperature is taken as a surrogate for it. A correlation coefficient greater than 0.4 is considered significant, indicating a meaningful linear relationship between two variables. Most notably, the temperature variables, namely maximum (tmax), mean (tmed), and minimum (tmin) temperature, are strongly and positively correlated with one another, indicating that changes in these factors are likely to affect water consumption patterns.

However, some variables, such as wind speed (velmedia) and wind direction (dir), have weaker correlations, suggesting a smaller impact on water consumption when temperature is used as a stand-in. Moreover, the correlations among the variables point to potential multicollinearity. This must be addressed when selecting independent predictor variables for any forecasting model to avoid unstable or misleading results. Based on these correlations, the average temperature (tmed), which has the highest correlation with the maximum temperature (tmax), could be a significant input variable for a water consumption forecasting model based on methods such as LSTM networks.

5.3. LSTM Model

The first model implemented on the platform for predicting water consumption is the LSTM network, which combines climate variables with real-time and historical water consumption data. Trained on synthetic datasets derived from daily consumption records, the LSTM captures the long-term dependencies inherent in time-series data. This makes it particularly effective for forecasting future water usage. The model combines historical water use patterns with real-time meteorological observations to produce accurate and reliable predictions. Given its strength in analyzing sequential data, the LSTM model is a valuable tool for strategic water management, supporting more informed decision-making and resource planning. The key operations of an LSTM network are as follows:

Forget Gate: The forget gate decides what information should be discarded from the cell state. It is computed as follows:

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(2)

Input Gate: The input gate determines how much of the new candidate information is added to the cell state, and is calculated by:

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(3)

Candidate Layer: This layer creates a vector of new candidate values that could be added to the state:

{\tilde{C}}_{t} = tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})

(4)

Cell State Update: The cell state is updated by combining the old state and the new candidate values, influenced by the forget and input gates:

C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot {\tilde{C}}_{t}

(5)

Output Gate: The output gate determines which parts of the cell state are passed on to the next hidden state, which carries information about previous inputs:

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(6)

Hidden State Output: The hidden state contains the output information to be passed onto the next timestep:

h_{t} = o_{t} \cdot tanh (C_{t})

(7)
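To make Eqs. (2) to (7) concrete, the following NumPy sketch performs a single forward step of an LSTM cell. In Eqs. (5) and (7), the products between gate activations and states are element-wise, and they are implemented that way here. The dictionary layout of the weights and the gate keys `"f"`, `"i"`, `"C"`, `"o"` are assumptions chosen to mirror the notation. In practice, a deep learning framework's LSTM layer would normally be used instead of hand-written gates.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (2)-(7).

    W maps a gate key ("f", "i", "C", "o") to a weight matrix of shape
    (hidden, hidden + inputs); b maps the same keys to bias vectors (hidden,).
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # Eq. (2): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # Eq. (3): input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])  # Eq. (4): candidate values
    c_t = f_t * c_prev + i_t * c_tilde      # Eq. (5): element-wise products
    o_t = sigmoid(W["o"] @ z + b["o"])      # Eq. (6): output gate
    h_t = o_t * np.tanh(c_t)                # Eq. (7): new hidden state
    return h_t, c_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, features = 4, 3  # e.g. consumption, tmax, tmed (illustrative)
    W = {g: rng.normal(scale=0.1, size=(hidden, hidden + features)) for g in "fiCo"}
    b = {g: np.zeros(hidden) for g in "fiCo"}
    h, c = np.zeros(hidden), np.zeros(hidden)  # h_0 and C_0 initialised to zeros
    for x_t in rng.random((7, features)):      # a 7-step input sequence
        h, c = lstm_step(x_t, h, c, W, b)
    print(h.shape)  # (4,)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step. This makes a convenient sanity check of the gate wiring.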

Data preparation for LSTM involves applying Min-Max scaling for data normalization:

x^{'} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(8)
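A sketch of Eq. (8) applied column-wise with NumPy is shown below. The text does not state it explicitly, but this sketch assumes a common precaution: the minima and maxima are computed on the training split only and then reused for the test data and for mapping predictions back to original units. Constant training columns are guarded against division by zero. The function names are hypothetical.

```python
import numpy as np


def fit_min_max(train: np.ndarray):
    """Per-column x_min and x_max, estimated on the training split only."""
    return train.min(axis=0), train.max(axis=0)


def min_max_scale(x, x_min, x_max):
    """Eq. (8): x' = (x - x_min) / (x_max - x_min), avoiding division by
    zero for columns that are constant in the training data."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (np.asarray(x, float) - x_min) / span


def min_max_inverse(x_scaled, x_min, x_max):
    """Invert Eq. (8), e.g. to express predicted consumption in original units."""
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return np.asarray(x_scaled, float) * span + x_min


if __name__ == "__main__":
    # Illustrative columns: daily consumption and maximum temperature.
    train = np.array([[10.0, 20.0], [20.0, 30.0], [30.0, 40.0]])
    lo, hi = fit_min_max(train)
    scaled = min_max_scale(train, lo, hi)
    restored = min_max_inverse(scaled, lo, hi)
    print(np.allclose(restored, train))  # True
```

Because the scaling parameters come from the training period only, test values outside that range map outside [0, 1]. This is expected and does not leak test information into training.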

The dataset is partitioned into two segments: 80% for training and 20% for testing. The model trained on the first segment is then used to predict water consumption over the held-out period. Following this structured approach, the LSTM network uses both historical and meteorological data to forecast water consumption patterns. Algorithm 1 outlines the step-by-step procedure for training the LSTM model for water consumption prediction:

Algorithm 1 LSTM for water consumption forecasting

1:: Initialize parameters:
2:: Define the number of LSTM units (neurons), learning rate, and epochs
3:: Initialize weight matrices $W_{f}$ , $W_{i}$ , $W_{C}$ , $W_{o}$ , and bias vectors $b_{f}$ , $b_{i}$ , $b_{C}$ , $b_{o}$
4:: Preprocess input data:
5:: Normalize water consumption and meteorological data using Min-Max scaling
6:: Divide the dataset into training, validation, and testing sets
7:: Create sequences of input data X and target values Y
8:: Reshape input data X to (num_samples, sequence_length, num_features)
9:: Model Training:
10:: Initialize cell state $C_{0}$ and hidden state $h_{0}$ to zeros
11:: for each epoch do
12:: for each batch in the training data do
13:: for each time step t in the input sequence do
14:: Compute Forget Gate: $f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})$
15:: Compute Input Gate: $i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})$
16:: Compute Candidate Cell State: ${\tilde{C}}_{t} = tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})$
17:: Update Cell State: $C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot {\tilde{C}}_{t}$
18:: Compute Output Gate: $o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})$
19:: Update Hidden State: $h_{t} = o_{t} \cdot tanh (C_{t})$
20:: end for
21:: Compute output (predicted water consumption): $y_{{pred}_{t}} = Dense (h_{t})$
22:: Calculate batch loss: $MSE (y_{{pred}_{t}}, y_{{true}_{t}})$
23:: Backpropagation through time (BPTT):
24:: Calculate gradients of Loss w.r.t weights and biases
25:: Update $W_{f}$ , $W_{i}$ , $W_{C}$ , $W_{o}$ , $b_{f}$ , $b_{i}$ , $b_{C}$ , $b_{o}$ and the Dense output-layer parameters using an optimizer
26:: end for
27:: Evaluate model on the validation set after each epoch
28:: end for
29:: Model Evaluation:
30:: Test the model on the testing set
31:: Calculate and report performance metrics: RMSE and MAPE
32:: Model Deployment:
33:: Save the trained model for future use
34:: Deploy the model for real-time or batch water consumption prediction
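To make the gate equations in steps 14 to 19 concrete, the following NumPy sketch runs one forward pass of a single LSTM layer over an input window and applies a linear Dense output, together with a hypothetical make_sequences helper for steps 7 and 8. It assumes randomly initialized weights and synthetic data and omits training (BPTT and the optimizer update of steps 23 to 25, normally delegated to a deep learning framework). The products $f_{t} \cdot C_{t-1}$, $i_{t} \cdot \tilde{C}_{t}$ and $o_{t} \cdot \tanh(C_{t})$ are read as elementwise products, which is the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_sequences(series, seq_len):
    """Steps 7-8: windows X of shape (num_samples, seq_len, num_features) and
    next-day targets Y; column 0 is assumed to hold water consumption."""
    X = np.stack([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    Y = series[seq_len:, 0]
    return X, Y

def lstm_forward(x_seq, params, hidden_size):
    """Steps 10-19 for one sequence of shape (seq_len, num_features)."""
    W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o = params
    h = np.zeros(hidden_size)                # h_0
    C = np.zeros(hidden_size)                # C_0
    for x_t in x_seq:
        z = np.concatenate([h, x_t])         # [h_{t-1}, x_t]
        f = sigmoid(W_f @ z + b_f)           # forget gate
        i = sigmoid(W_i @ z + b_i)           # input gate
        C_tilde = np.tanh(W_C @ z + b_C)     # candidate cell state
        C = f * C + i * C_tilde              # elementwise cell-state update
        o = sigmoid(W_o @ z + b_o)           # output gate
        h = o * np.tanh(C)                   # hidden state
    return h, C

rng = np.random.default_rng(0)
hidden_size, num_features, seq_len = 8, 2, 7
series = rng.random((30, num_features))      # hypothetical, already Min-Max scaled
X, Y = make_sequences(series, seq_len)
weights = [rng.normal(0.0, 0.1, (hidden_size, hidden_size + num_features)) for _ in range(4)]
biases = [np.zeros(hidden_size) for _ in range(4)]
w_dense, b_dense = rng.normal(0.0, 0.1, hidden_size), 0.0
h_T, C_T = lstm_forward(X[0], weights + biases, hidden_size)
y_pred = float(w_dense @ h_T + b_dense)      # step 21: Dense(h_t)
```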

5.4. Prophet Model

We employed the Prophet model to forecast daily water consumption from 2020 to 2024, integrating historical water usage data with maximum daily temperature as an external regressor. A prior analysis using the Pearson correlation coefficient indicated a significant positive correlation between water consumption and maximum temperature, which is consistent with higher temperatures increasing water use through activities such as irrigation and cooling. Missing consumption values were retained in the dataset, since Prophet handles missing target values internally; this preserves the variability of the observed series (the temperature regressor, by contrast, must have no missing values). Maximum temperature was included as an external regressor to account for temperature-related variations in water demand. Prophet is well suited to this application because it handles time series with multiple seasonal patterns and additional regressors. The model is formulated as:

y (t) = g (t) + s (t) + h (t) + β \cdot T_{\max} (t) + ϵ_{t},

(9)

where:

$y (t)$ is the forecasted water consumption at time t,
$g (t)$ represents the growth trend component (linear or logistic),
$s (t)$ denotes the seasonal component modeled using Fourier series to capture annual and weekly patterns,
$h (t)$ accounts for holiday effects using indicator functions,
$T_{\max} (t)$ is the maximum temperature on day t,
$β$ quantifies the impact of temperature on water consumption,
$ϵ_{t}$ is the error term, assumed to be normally distributed with a mean of zero.
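In Prophet, the seasonal term $s(t)$ in Eq. (9) is a truncated Fourier series: for a period $P$ and order $N$, $s(t) = \sum_{n=1}^{N} \left(a_{n}\cos\frac{2\pi n t}{P} + b_{n}\sin\frac{2\pi n t}{P}\right)$. The NumPy sketch below builds the corresponding design matrix for annual ($P = 365.25$ days) and weekly ($P = 7$ days) cycles, using orders 10 and 3, which are Prophet's documented defaults. The function name and the random coefficients are hypothetical; in practice the coefficients $a_{n}, b_{n}$ are estimated during fitting.

```python
import numpy as np

def fourier_features(t_days, period, order):
    """Columns cos(2*pi*n*t/P) and sin(2*pi*n*t/P) for n = 1, ..., order."""
    n = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t_days, n) / period   # shape (len(t_days), order)
    return np.hstack([np.cos(angles), np.sin(angles)])    # shape (len(t_days), 2 * order)

t = np.arange(730, dtype=float)                  # two years of daily time stamps, in days
X_year = fourier_features(t, period=365.25, order=10)
X_week = fourier_features(t, period=7.0, order=3)
X_season = np.hstack([X_year, X_week])

# s(t) is linear in the Fourier coefficients; these coefficients are hypothetical
coef = np.random.default_rng(1).normal(0.0, 1.0, X_season.shape[1])
s_t = X_season @ coef
```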

The inclusion of $T_{\max} (t)$ as an external regressor was implemented using Prophet’s add_regressor function, which lets the model estimate the coefficient $β$ linking temperature to water demand.

During model fitting, Prophet was trained on the historical dataset, with its parameters chosen to minimize the discrepancy between predicted and observed water consumption. By default, Prophet estimates its parameters by maximum a posteriori (MAP) optimization in Stan; under its Gaussian noise assumption and ignoring the prior terms, this is equivalent to minimizing the sum of squared residuals:

min_{θ} \sum_{t = 1}^{n} {(y (t) - \hat{y} (t; θ))}^{2},

(10)

where:

$θ$ represents all model parameters, including $β$ , trend, and seasonal components,
$y (t)$ is the actual water consumption,
$\hat{y} (t; θ)$ is the predicted water consumption based on parameters $θ$ .
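Because $g(t)$ (for a linear trend), $s(t)$, $h(t)$ and $β \cdot T_{\max}(t)$ are all linear in their parameters, Eq. (10) reduces, once Prophet's priors and trend changepoints are ignored, to an ordinary least-squares problem. The sketch below, which uses synthetic, noise-free data and no holiday term, fits such a simplified model with NumPy and recovers the temperature coefficient $β$. It illustrates the objective only and is not Prophet's Stan-based fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(365, dtype=float)
# Hypothetical maximum temperatures with an annual cycle plus day-to-day noise
T_max = 20.0 + 8.0 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0.0, 2.0, t.size)

# Synthetic noise-free consumption: linear trend g(t), weekly cycle s(t), beta * T_max(t)
beta_true = 1.5
y = 100.0 + 0.05 * t + 4.0 * np.sin(2 * np.pi * t / 7) + beta_true * T_max

# Design matrix: intercept, trend, first-order weekly Fourier pair, temperature regressor
X = np.column_stack([
    np.ones_like(t), t,
    np.cos(2 * np.pi * t / 7), np.sin(2 * np.pi * t / 7),
    T_max,
])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes sum_t (y(t) - y_hat(t; theta))^2
beta_hat = theta[-1]
y_hat = X @ theta                                # fitted values, analogous to Eq. (11)
```

With real, noisy data, $\hat{β}$ would only approximate the temperature effect, and its value would depend on which trend and seasonal terms are included alongside the regressor.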

After completing the training and validation phases, the model generated forecasts for future water consumption over two periods: from January 1, 2023, to July 1, 2024 (18 months), and from January 1, 2024, to July 1, 2024 (6 months). These forecasts utilized patterns identified from historical data, including trends, seasonal variations, holiday effects, and the influence of maximum temperatures. The anticipated water consumption values were calculated using:

\hat{y} (t) = \hat{g} (t) + \hat{s} (t) + \hat{h} (t) + \hat{β} \cdot T_{\max} (t),

(11)

where $\hat{g} (t)$, $\hat{s} (t)$, $\hat{h} (t)$, and $\hat{β}$ are the estimated components from the fitting stage, and $T_{\max} (t)$ represents the projected maximum temperature values, either derived from historical trends or provided as external forecasts.

Algorithm 2 outlines the steps in applying the Prophet model for forecasting water consumption.

Algorithm 2 Prophet for water consumption forecasting

1:: Initialize Parameters:
2:: Define forecasting horizon n days
3:: Set growth model $g (t)$ as ’linear’ or ’logistic’
4:: Specify relevant seasonalities (e.g., daily, weekly, yearly)
5:: Preprocess Input Data:
6:: Collect historical water consumption data $y (t)$ and maximum temperature data $T_{\max} (t)$
7:: Prepare dataset with columns:
8:: Date (t): ’ds’
9:: Water consumption ( $y (t)$ ): ’y’
10:: Maximum temperature ( $T_{\max} (t)$ ): ’T_max’
11:: Handle missing values appropriately (Prophet can handle them internally)
12:: Model Configuration:
13:: Initialize Prophet model with specified growth and seasonalities:
14:: model = Prophet(growth=’linear’, daily_seasonality=True,
15:: weekly_seasonality=True, yearly_seasonality=True)  // growth=’logistic’ additionally requires a capacity column ’cap’
16:: Add maximum temperature as an external regressor:
17:: model.add_regressor(’T_max’)
18:: Model Fitting:
19:: Fit the Prophet model to the historical data:
20:: model.fit(data)
21:: Forecasting:
22:: Create a future dataframe for n days ahead:
23:: future = model.make_future_dataframe(periods $= n$ )
24:: Obtain future maximum temperature $T_{\max} (t)$ for future dates
25:: Add $T_{\max} (t)$ to the ’future’ dataframe under ’T_max’
26:: Generate forecast:
27:: forecast = model.predict(future)
28:: Extract predicted water consumption $\hat{y} (t)$ from ’forecast’ dataframe
29:: Model Deployment:
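30:: Save the trained model for future use (Prophet provides JSON serialization via model_to_json in prophet.serialize):
31:: with open(’prophet_model.json’, ’w’) as f: f.write(model_to_json(model))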
32:: Deploy the model for real-time or batch water consumption prediction

5.5. LightGBM Model with Feature Engineering

To enhance the forecasting accuracy of daily water consumption, we employed the LightGBM model, a gradient-boosting framework renowned for its efficiency and performance in handling large-scale data and complex features. This model integrates feature engineering techniques to capture temporal patterns and dependencies inherent in water consumption data.

LightGBM operates by constructing an ensemble of decision trees, where each subsequent tree focuses on correcting the errors of the previous ones. This iterative boosting process allows the model to capture nonlinear relationships and interactions between features, making it well-suited for time-series forecasting tasks. Key components of the LightGBM model are:

Objective Function: The model minimizes an objective function that combines a loss function with a regularization term to prevent overfitting:

$L = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}) + Ω (T),$

(12)

where:

−

$y_{i}$ is the actual water consumption at time i,

−

${\hat{y}}_{i}$ is the predicted water consumption,

−

$l (y_{i}, {\hat{y}}_{i})$ is the loss function, typically MSE for regression tasks,

−

$Ω (T)$ is the regularization term for the complexity of the trees T.
Feature Engineering: Various features were created to enhance the model’s predictive performance:

−

Lag Features: Previous water consumption and temperature values were included to capture temporal dependencies:

$y_{{lag}_{k}} = y_{t - k}, T_{\max, {lag}_{k}} = T_{\max, t - k},$

(13)

where k is the lag period (e.g., 1 day, 7 days).

−

Rolling Means: Moving averages were computed to smooth out short-term fluctuations:

${\bar{y}}_{t} = \frac{1}{w} \sum_{i = t - w + 1}^{t} y_{i}, {\bar{T}}_{\max, t} = \frac{1}{w} \sum_{i = t - w + 1}^{t} T_{\max, i},$

(14)

where w is the window size (e.g., 7 days).

−

Temporal Indicators: Features such as day of the week and weekend indicators were added to capture weekly seasonal patterns, with days indexed from 0 (Monday) to 6 (Sunday):

${IsWeekend}_{t} = \begin{cases} 1 & \text{if day of week} \geq 5, \\ 0 & \text{otherwise}. \end{cases}$

(15)
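Data Normalization: Feature scaling was performed using standardization (z-score normalization). Note that tree-based learners such as LightGBM are insensitive to such monotonic rescaling of individual features, so this step does not change which splits the model can learn: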

$x^{'} = \frac{x - μ}{σ},$

(16)

where x represents the original feature value, $μ$ is the mean, and $σ$ is the standard deviation of the feature.
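The feature definitions in Eqs. (13) to (16) can be sketched in NumPy as follows. The function names, the toy data, and the convention that weekdays are numbered 0 (Monday) to 6 (Sunday), as returned by Python's date.weekday(), are assumptions for illustration. One deliberate deviation is labelled in the code: Eq. (14) includes $y_{t}$ in its own rolling mean, which would leak the target if $y_{t}$ is what the model predicts, so by default the window ends at day $t-1$; setting include_current=True reproduces Eq. (14) literally.

```python
import numpy as np
from datetime import date, timedelta

def engineer_features(y, t_max, dates, lags=(1, 7), window=7, include_current=False):
    """Lag (Eq. 13), rolling-mean (Eq. 14) and weekday/weekend (Eq. 15) features.
    With include_current=False the rolling window ends at t-1 (a deviation from
    Eq. 14, assumed here to avoid target leakage); True follows Eq. 14 literally."""
    y = np.asarray(y, dtype=float)
    t_max = np.asarray(t_max, dtype=float)
    first = max(max(lags), window if not include_current else window - 1)
    rows = []
    for t in range(first, len(y)):
        end = t + 1 if include_current else t
        dow = dates[t].weekday()                     # Monday = 0, ..., Sunday = 6
        rows.append(
            [y[t - k] for k in lags] + [t_max[t - k] for k in lags]
            + [y[end - window:end].mean(), t_max[end - window:end].mean()]
            + [float(dow), 1.0 if dow >= 5 else 0.0]
        )
    return np.array(rows), y[first:]                 # features and aligned targets

def standardize(X_train, X_test):
    """Eq. (16), with mean and standard deviation estimated on the training rows."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)          # leave constant columns unscaled
    return (X_train - mu) / sigma, (X_test - mu) / sigma

start = date(2024, 1, 1)                             # a Monday
dates = [start + timedelta(days=d) for d in range(20)]
X, target = engineer_features(range(20), [10 + d for d in range(20)], dates)
X_train_s, X_test_s = standardize(X[:10], X[10:])
```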

By integrating these engineered features, the LightGBM model captures complex temporal and seasonal patterns in the water consumption data. This approach improves the model’s ability to provide accurate and reliable forecasts, which are crucial for efficient water resource management. The pseudocode of the LightGBM model is shown in Algorithm 3.

Algorithm 3 LightGBM with Feature Engineering for Water Consumption Prediction

1:: Initialize Parameters:
2:: Set LightGBM hyperparameters: learning rate $η$ , number of leaves num_leaves, feature fraction ff, bagging fraction bf, bagging frequency bfreq, number of boosting rounds num_boost_round
3:: Data Preparation:
4:: Load dataset with dates ds, water consumption y, and maximum temperature $T_{\max}$
5:: Convert date strings to datetime objects
6:: Remove missing values in y and forward-fill missing values in $T_{\max}$
7:: Feature Engineering:
8:: Create lag features:
9:: $y_{lag 1} = y_{t - 1}$ , $y_{lag 7} = y_{t - 7}$
10:: $T_{\max, lag 1} = T_{\max, t - 1}$ , $T_{\max, lag 7} = T_{\max, t - 7}$
11:: Create rolling mean features with window size $w = 7$ :
12:: ${\bar{y}}_{t} = \frac{1}{w} \sum_{i = t - w + 1}^{t} y_{i}$
13:: ${\bar{T}}_{\max, t} = \frac{1}{w} \sum_{i = t - w + 1}^{t} T_{\max, i}$
14:: Create temporal indicators:
15:: Day of week: $\mathit{day\_of\_week}_{t} = DayOfWeek(t)$
16:: Is weekend indicator:
17:: is_weekend_t = $\begin{cases} 1 & \text{if } \mathit{day\_of\_week}_{t} \geq 5 \\ 0 & \text{otherwise} \end{cases}$
18:: Remove any rows with missing values resulting from feature creation
19:: Feature Scaling:
20:: Standardize features using z-score normalization:
21:: For each feature X:
22:: $X^{'} = \frac{X - μ_{X}}{σ_{X}}$
23:: Split Data:
24:: Define split date (e.g., January 1, 2024)
25:: Split data into training set (dates before split date) and testing set (dates on or after split date)
26:: Extract feature matrix X and target vector y for both sets
27:: Prepare Data for LightGBM:
28:: Create LightGBM datasets:
29:: Training data: train_data = lgb.Dataset($X_{train}$, $y_{train}$)
30:: Validation data: valid_data = lgb.Dataset($X_{test}$, $y_{test}$, reference=train_data)
31:: Model Training:
32:: Train the model using the training data:
33:: lgbm_model = lgb.train(params, train_data, num_boost_round,
34:: valid_sets=[valid_data], callbacks=[lgb.early_stopping(100)])  // stop after 100 rounds without improvement
35:: Model Prediction:
36:: Predict on the testing set:
37:: $\hat{y}$ = lgbm_model.predict($X_{test}$, num_iteration=lgbm_model.best_iteration)
38:: Model Deployment:
39:: Save the trained model for future use
40:: Deploy the model for real-time or batch water consumption prediction
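The boosting principle described at the start of this section, in which each new tree corrects the errors of the current ensemble, can be illustrated with a deliberately simplified sketch: least-squares gradient boosting with depth-1 trees (stumps) on a single feature. This is not LightGBM. It omits histogram binning, leaf-wise growth, the regularization term $Ω(T)$ and multiple features, and the data are hypothetical; it only shows why fitting each tree to the current residuals, which are the negative gradient of the squared-error loss, progressively reduces the training error.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares depth-1 tree on a single feature x for residuals r.
    Returns (threshold, left_value, right_value), or None if x is constant."""
    best, best_sse = None, np.inf
    for thr in np.unique(x)[:-1]:
        left, right = r[x <= thr], r[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (thr, left.mean(), right.mean())
    return best

def boost(x, y, n_rounds=50, lr=0.1):
    """Each new stump is fitted to the residuals of the current ensemble, i.e. the
    negative gradient of the squared-error loss 0.5 * (y - pred)^2."""
    base = float(y.mean())
    pred = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        if stump is None:
            break
        thr, left_val, right_val = stump
        pred = pred + lr * np.where(x <= thr, left_val, right_val)
        stumps.append(stump)
    return base, stumps

def predict(base, stumps, x, lr=0.1):
    """Sum the shrunken stump outputs; lr must match the value used in boost()."""
    out = np.full(x.shape, base)
    for thr, left_val, right_val in stumps:
        out = out + lr * np.where(x <= thr, left_val, right_val)
    return out

# Hypothetical data: consumption jumps once the maximum temperature exceeds 25 degrees C
x = np.arange(15.0, 35.0)                       # daily maximum temperature
y = np.where(x <= 25.0, 100.0, 150.0)           # daily consumption
base, stumps = boost(x, y)
fitted = predict(base, stumps, x)
```

Here the learning rate of 0.1 plays the role of $η$ in Algorithm 3: each stump contributes only a fraction of its fit, so more rounds are needed, but such shrinkage generally makes the ensemble less prone to overfitting than taking full steps.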

5.6. XGBoost Model with Feature Engineering

We employed the XGBoost model to enhance the accuracy of daily water consumption forecasting. XGBoost, known for its efficiency in regression tasks and regularization techniques to prevent overfitting, is well-suited for time-series forecasting. The dataset consists of daily water consumption and maximum temperature records from 2020 to 2024. Preprocessing steps included converting date strings to datetime objects, renaming columns for consistency, removing missing values in water consumption, and forward-filling missing temperature values to ensure data continuity.
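The regularization that distinguishes XGBoost can be made concrete. In the XGBoost formulation, the penalty for a tree $f$ with $T$ leaves and leaf weights $w_{j}$ is $Ω(f) = γT + \frac{1}{2}λ\sum_{j} w_{j}^{2}$ (the L1 term controlled by $α$ is omitted here). With gradients $g_{i}$ and Hessians $h_{i}$ of the loss, the optimal weight of a leaf is $w^{*} = -G/(H + λ)$, where $G$ and $H$ are the sums over the samples in that leaf, and a split is worthwhile only if its gain exceeds $γ$. The NumPy sketch below evaluates these quantities for squared-error loss on hypothetical values; it reproduces the textbook formulas, not XGBoost's internal code.

```python
import numpy as np

def leaf_weight(g, h, lam):
    """Optimal value of a leaf for a fixed tree structure: w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, left_mask, lam, gamma):
    """Loss reduction from splitting one leaf into left/right children:
    0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    gain = score(g[left_mask], h[left_mask]) + score(g[~left_mask], h[~left_mask]) - score(g, h)
    return 0.5 * gain - gamma

# Squared-error loss 0.5 * (y - y_hat)^2: gradient g = y_hat - y, Hessian h = 1
y = np.array([100.0, 104.0, 150.0, 154.0])     # hypothetical daily consumption
y_hat = np.full_like(y, 127.0)                 # current ensemble prediction
g, h = y_hat - y, np.ones_like(y)
left = np.array([True, True, False, False])    # hypothetical candidate split

w_left_unregularized = leaf_weight(g[left], h[left], lam=0.0)   # mean residual of the leaf
w_left_regularized = leaf_weight(g[left], h[left], lam=2.0)     # shrunk towards zero
gain = split_gain(g, h, left, lam=1.0, gamma=10.0)
```

With $λ = 0$ the leaf value equals the mean residual of its samples; increasing $λ$ shrinks it towards zero, and $γ$ raises the bar a split must clear, which is how these parameters in Algorithm 4 restrain overfitting.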

Algorithm 4 XGBoost with Hyperparameter Tuning and Feature Engineering

1:: Initialize Parameters:
2:: Define hyperparameter search space for:
3:: Number of estimators $n_{estimators}$
4:: Maximum depth max_depth
5:: Learning rate $η$
6:: Subsample ratio subsample
7:: Column subsample ratio colsample_bytree
8:: Regularization parameters $γ$ , $α$ , $λ$
9:: Data Preparation:
10:: Load dataset with dates ds, water consumption y, and maximum temperature $T_{\max}$
11:: Convert date strings to datetime objects
12:: Remove missing values in y and forward-fill missing values in $T_{\max}$
13:: Feature Engineering:
14:: Create lag features: // Use previous time steps as features
15:: $y_{lag 1} = y_{t - 1}$ , $y_{lag 7} = y_{t - 7}$
16:: $T_{\max, lag 1} = T_{\max, t - 1}$ , $T_{\max, lag 7} = T_{\max, t - 7}$
17:: Create rolling mean features with window size $w = 7$ : // Capture trends over time
18:: ${\bar{y}}_{t} = \frac{1}{w} \sum_{i = t - w + 1}^{t} y_{i}$
19:: ${\bar{T}}_{\max, t} = \frac{1}{w} \sum_{i = t - w + 1}^{t} T_{\max, i}$
20:: Create temporal indicators: // Add time-related patterns
21:: Day of week: $\mathit{day\_of\_week}_{t} = DayOfWeek(t)$
22:: Is weekend indicator:
23:: ${IsWeekend}_{t} = \begin{cases} 1 & \text{if } \mathit{day\_of\_week}_{t} \geq 5 \\ 0 & \text{otherwise} \end{cases}$
24:: Remove any rows with missing values resulting from feature creation
25:: Feature Scaling:
26:: Standardize features using z-score normalization:
27:: $X^{'} = \frac{X - μ_{X}}{σ_{X}}$ // $μ_{X}$ is the mean and $σ_{X}$ is the standard deviation
28:: Split Data:
29:: Define split date (e.g., January 1, 2024)
30:: Split data into training set (dates before split date) and testing set (dates on or after split date)
31:: Extract feature matrix X and target vector y for both sets
32:: Hyperparameter Tuning:
33:: Define time series cross-validation strategy with k folds
34:: Initialize XGBoost regressor model
35:: Perform randomized search over hyperparameter space using cross-validation to find optimal hyperparameters $Θ^{*}$
36:: Model Training:
37:: Train the XGBoost model with optimal hyperparameters $Θ^{*}$ on the training data
38:: Model Prediction:
39:: Predict on the testing set:
40:: $\hat{y}_{t}$ = XGBoostModel.predict($X_{test}$)
41:: Objective Function:
42:: Minimize the objective function:

$L (Θ) = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})$

(17)
43:: Where:
44:: $l (y_{i}, {\hat{y}}_{i})$ is the loss function (e.g., MAE or MSE)
45:: $Ω (f_{k})$ is the regularization term for tree $f_{k}$
46:: K is the number of trees in the ensemble
47:: Model Deployment:
48:: Save the trained model for future use
49:: Deploy the model for real-time or batch water consumption prediction
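Steps 28 to 33 rely on splits that respect time order. The plain-Python sketch below shows a date-based train/test split and an expanding-window cross-validation scheme for the hyperparameter search; the helper names and fold sizes are hypothetical. Libraries such as scikit-learn offer TimeSeriesSplit for the same purpose, and the randomized search of step 35 would then evaluate each candidate $Θ$ on these folds and keep the one with the lowest average validation error.

```python
from datetime import date

def split_by_date(dates, split_date):
    """Steps 29-30: indices before split_date go to training, the rest to testing."""
    train = [i for i, d in enumerate(dates) if d < split_date]
    test = [i for i, d in enumerate(dates) if d >= split_date]
    return train, test

def expanding_window_splits(n_samples, n_folds, test_size):
    """Step 33: each fold trains on all earlier samples and validates on the next
    block of test_size samples, so validation data always follow training data in time."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * test_size
        yield list(range(start)), list(range(start, start + test_size))

dates = [date(2023, 12, 30), date(2023, 12, 31), date(2024, 1, 1), date(2024, 1, 2)]
train_idx, test_idx = split_by_date(dates, date(2024, 1, 1))
folds = list(expanding_window_splits(n_samples=10, n_folds=3, test_size=2))
```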

5.7. Model Evaluation

The forecasting models were evaluated using three essential statistical metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Squared Error (RMSE). MAE measures the average magnitude of the errors in a set of predictions, without considering their direction, providing a straightforward assessment of prediction accuracy in the same unit as the original data. MAPE quantifies the error as a percentage, making it especially useful for comparing performance across different scales or datasets. RMSE, on the other hand, penalizes larger errors more heavily by squaring them before averaging, offering insight into the variability of the errors and emphasizing significant deviations in the predictions.

MAE represents the average of the absolute differences between the predicted and actual values, providing a straightforward measure of the model’s prediction accuracy:

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|,

(18)

MAPE measures the average absolute percentage difference between the predicted and actual values, offering insight into the model’s accuracy relative to the actual consumption:

MAPE = \frac{100\%}{n} \sum_{i = 1}^{n} \left|\frac{y_{i} - \hat{y}_{i}}{y_{i}}\right|,

(19)

RMSE is the square root of the Mean Squared Error. It equals the standard deviation of the prediction errors only when those errors have zero mean; in general, it measures the typical magnitude of the errors while weighting large errors more heavily:

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}},

(20)

where $n$ is the number of observations, $y_{i}$ is the actual water consumption, and $\hat{y}_{i}$ is the predicted water consumption.
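Equations (18) to (20) translate directly into code. The plain-Python sketch below (the function names are ours) also makes two properties explicit: MAPE is undefined when an actual value is zero, and RMSE exceeds MAE whenever the absolute errors differ in magnitude, because squaring weights large errors more heavily.

```python
import math

def mae(y_true, y_pred):
    """Eq. (18): mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Eq. (19): mean absolute percentage error, in percent."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Eq. (20): root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

# Two hypothetical days with equal absolute errors: MAE and RMSE coincide
print(mae([100, 200], [110, 190]), mape([100, 200], [110, 190]), rmse([100, 200], [110, 190]))
# One large error: RMSE exceeds MAE because errors are squared before averaging
print(mae([100, 100], [100, 120]), rmse([100, 100], [100, 120]))
```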

By utilizing these evaluation metrics, we identified each model’s strengths and areas for improvement, guiding further refinements to enhance forecasting accuracy. This comprehensive evaluation is crucial for developing reliable water demand forecasts essential for effective water resource management.

The water consumption forecasts of the Prophet model with six regressors are presented in Figure 7, and they reveal clear shortcomings at both forecast horizons. In the 6-month forecast, the model picks up broad seasonal trends and cyclical changes, but it deviates at sharp peaks and troughs, indicating limited robustness to transient phenomena or unmodeled noise. These discrepancies suggest that relying on predetermined regressors may not adequately capture the complex dynamics of water consumption, especially in the short term. Although the overall agreement with the observed data appears acceptable, these problems raise doubts about the model’s reliability for medium-term operational planning in environments with frequent sudden changes. Over the 18-month horizon, the model’s shortcomings become even more apparent. Some deeper seasonal patterns are still captured, but cumulative error rises visibly as the model fails to reproduce more complex variations. This deterioration may stem from the static relationship assumed between the regressors and water use, the diminishing relevance of the chosen exogenous variables over time, and the model’s inability to adapt to changing dynamics. Overall, the Prophet model with regressors performed reasonably well for short-term forecasting, but its errors grew substantially as the horizon lengthened, which makes its adequacy for longer-term operational planning and strategic decision-making highly questionable.

The Prophet model with regressors and custom seasonalities (Figure 8) captures broad trends correctly but shows several critical deficiencies. In the 6-month forecast, the model follows general consumption patterns; however, large deviations from the actual data, particularly at sudden changes and high peaks, indicate that it does not adequately represent short-term fluctuations. The domain-specific seasonalities are intended to capture periodic fluctuations in demand and operational cycles, yet this customization does little to correct the discrepancies during rapid changes or special events in water use. This implies that tailored seasonalities and regressors alone may not reach the accuracy needed for reliable short-term forecasts. Over the 18-month forecast, the model’s limitations become more pronounced. Although it reproduces the general seasonal patterns over the long term, its results diverge substantially from the actual data at finer temporal resolutions. This growing divergence shows that the model’s reliance on fixed seasonalities and rigid associations with the regressors cannot handle compounding uncertainties and shifting dynamics over time. The tailored seasonalities slightly improve the representation of general trends compared with generic seasonalities, but this improvement is marginal and does not close the large gaps in capturing real-world variability. In summary, although the Prophet model with tailored seasonalities is useful for recognizing general trends, its predictive precision over both short and long horizons remains far from satisfactory, which strongly limits its use in highly dynamic and variable settings.

The advanced variant of the Prophet model, which includes lag features, rolling means, and custom seasonalities (Figure 9), improves prediction performance by accounting for historical dependencies and smoothly varying trends. In the 6-month forecast, the model fits the actual consumption data well and captures short-term fluctuations and sudden changes in trend. The lag features allow the model to capture temporal dependencies, and the rolling means provide a smoothed representation of periodic variability, which can improve the accuracy of operational planning. The customized seasonalities further improve the model’s ability to follow domain-specific periodicities, which helps it track dynamic shifts in water demand. In the 18-month projection, the model still captures seasonal patterns but is less successful with finer-scale fluctuations and anomalies. These differences can be explained by the reduced temporal relevance of lagged variables over longer forecasting periods and the inherently greater difficulty of modeling unexpected changes in consumption behavior.

The Prophet model with advanced feature engineering, shown in Figure 10, uses an extensive set of lagged variables, rolling statistics, and domain-specific customizations to improve forecasting precision. Its 6-month forecast agrees closely with the observed water consumption. The lag features embed temporal dependencies in the model, while the rolling mean features reduce short-term variability so that the underlying patterns are captured better. Day-of-week and weekend indicators add contextual information that lets the model account for cyclical shifts in demand. Multiplicative seasonality, together with Fourier-based seasonal components for weekly and monthly cycles, allows the model to respond effectively to the intricate seasonal patterns of water consumption. In the 18-month forecast, the model still identifies long-term seasonal trends well, but it shows some discrepancies when forecasting abrupt transitions and anomalies. This can be attributed to the weakening predictive power of lagged variables and their rolling means as the horizon extends, as well as to external influences not represented by the engineered features. Nevertheless, the inclusion of country-specific holidays and domain-specific seasonality partly mitigates these weaknesses and keeps the model robust over the longer forecast period.

The predictions of the XGBoost model with feature engineering, plotted in Figure 11, show strong performance in forecasting water consumption; the model uses the same set of engineered features as the advanced Prophet model. In the 6-month forecast, it effectively captures short-term fluctuations and seasonality, closely aligning with the actual water consumption data. The lag features, rolling statistics, and temporal variables (day of the week and weekend indicators) allow XGBoost to exploit both temporal dependencies and cyclical demand patterns. Minor deviations during extreme consumption peaks suggest possible overfitting to specific patterns or the need for additional external variables. In the 18-month forecast, the model continues to capture overarching trends and seasonal patterns robustly, although its accuracy in predicting sudden anomalies and small variations declines. The diminishing predictive power of lag and rolling features over longer horizons, combined with the inherent limitations of tree-based models in extrapolation, likely contributes to this reduction in performance. Despite these challenges, the model benefits from its ability to prioritize relevant features during training, so that the most informative features dominate the predictions.

The LightGBM model with feature engineering, shown in Figure 12, offers a nuanced picture of water consumption dynamics, particularly over short and medium-term horizons. In the 6-month forecast, the model tracks the actual water consumption closely, capturing both abrupt shifts and recurring seasonal patterns. This performance can be attributed to its well-designed features, including lag variables and rolling means, which encode historical consumption behavior and smooth out irregularities. Moreover, LightGBM’s learned feature importance lets it prioritize critical variables such as temperature and temporal markers like day-of-week and weekend indicators. However, minor deviations during peak consumption periods suggest room for further optimization, perhaps by including additional external drivers. Over the 18-month horizon, the model consistently maintains the overall trend and recognizes seasonal patterns, although it is less accurate at capturing localized anomalies. This decline can be linked to the fading influence of lagged variables over extended periods and the inherent difficulty of modeling long-term variability with a gradient-boosting framework. Despite slight mismatches in predicting extreme events, LightGBM’s efficiency on large datasets and its ability to generalize across temporal scales make it a compelling choice for forecasting tasks where both accuracy and computational efficiency are crucial.

We also investigated water consumption forecasting with a stacking ensemble, a hybrid approach that combines multiple prediction techniques; the results are displayed in Figure 13. In the 6-month forecasts, combining different approaches produced robust predictions that identified both day-to-day changes and seasonal trends in water usage. Most predictions closely matched the actual consumption data, although some extreme usage peaks showed minor discrepancies, leaving room for future refinement. Over the 18-month horizon, the ensemble remained effective at capturing both short-term fluctuations and long-term consumption patterns, and it was particularly adept at identifying recurring seasonal changes in water usage. Empirical testing showed improved accuracy compared with traditional forecasting methods, especially over extended time periods. Although the model occasionally struggled to predict rapid changes in consumption, it consistently provided reliable forecasts across the testing scenarios. These findings demonstrate the practical value of combining multiple forecasting techniques for water consumption prediction.

The LSTM neural network performs well in forecasting water consumption (Figure 14), exploiting its ability to represent temporal dependencies in sequential data. The model stays close to the actual values over the 6-month forecast, capturing short-term fluctuations and seasonality well. The recurrent structure of the LSTM stores previously computed information and reuses it at later time steps, which lets the network represent the dynamic, nonlinear relations characteristic of water consumption. Small deviations at pronounced peaks suggest influences from external factors not represented in the input features; overall, however, the predictions are highly reliable. In the 18-month forecast, the LSTM continues to maintain long-term seasonal trends and general consumption patterns, because it captures dependencies over long sequences. However, like most long-horizon forecasts, it has difficulty with abrupt changes and localized anomalies, since temporal dependencies attenuate over such periods.

The predictions of the hybrid LSTM and GRU model, plotted in Figure 15, capture both short- and long-term dependencies in the water consumption dataset. In the 6-month forecast, the combination of LSTM and GRU layers models the sequential dependencies well and produces predictions close to the real consumption patterns. The LSTM layers are intended to learn long-term trends, while the GRU layers add flexibility for more immediate temporal changes. The dense layers help model interactions between features, and dropout reduces overfitting, especially during periods of high variance. The hybrid structure thus balances short-term fluctuations against periodic trends. Over the 18-month forecasting period, the hybrid model maintains and extrapolates the learned patterns well. Although long-term predictions naturally show some variance because of sudden changes in water demand, the model remains consistent with both the seasonal and trend components. The additional dense layers are intended to improve generalization over long sequences, while dropout regularization counters the overfitting common in high-capacity neural networks.

The LSTM neural network with rolling mean features takes a simpler approach than the hybrid LSTM and GRU model with additional layers; its predictions are presented in Figure 16. The rolling mean features smooth out short-term fluctuations and give a clearer representation of underlying trends, but the model’s lack of structural complexity, specifically the absence of GRU layers and additional dense layers, limits its ability to adapt to sudden changes or to model complicated temporal patterns. In the 6-month forecast, the model generalizes trends and cyclical behavior well, with the rolling mean features helping to stabilize the predictions. However, without GRU layers the model is less responsive to sharp, localized fluctuations, which leads to slightly larger errors during abrupt changes in consumption than the hybrid model. The 18-month forecast exposes further weaknesses of the rolling mean approach: although the model preserves long-term seasonal patterns, it struggles more with smaller-scale oscillations and changes in water usage patterns. By contrast, the hybrid model handles temporal dependencies over extended periods more finely thanks to the combined strengths of its LSTM and GRU layers.

The MV-LSTM model proposed by Niknam [70] captures multivariate interdependencies by integrating external meteorological parameters with water consumption data; its predictions are plotted in Figure 17. This lets the model learn complex relationships between climatic variables and water demand, which improves its forecasts of short-term consumption variability. Combining these different input features makes the model better at learning temporal dependencies and complex interactions, which supports stable predictive performance over shorter time frames. Over longer horizons, MV-LSTM continues to produce accurate predictions by preserving seasonal patterns and general consumption trends. Its responsiveness to new changes decreases, yet it still reacts to dynamic external conditions thanks to the lagged and meteorological input variables. By combining multivariate inputs with a recurrent architecture, MV-LSTM is well suited to applications in which both internal and external variables affecting water consumption must be evaluated.

This study evaluates a variety of forecasting models for water demand prediction, focusing on their performance over both short and long horizons. The models range from several Prophet variants, through gradient boosting models based on LightGBM and XGBoost, to neural network architectures, including LSTM-based models and the recently proposed multivariate LSTM (MV-LSTM) [70]. The results in Table 5 show that the performance of these models differs substantially, and each method has its strengths and weaknesses. The Prophet models behave consistently across horizons and improve considerably when advanced feature engineering is applied. The Prophet with Advanced Feature Engineering variant performs best among the Prophet models, with an MAE of 5.76 and RMSE of 8.31 for the 6-month horizon, and an MAE of 10.07 and RMSE of 15.02 for the 18-month horizon. This improvement highlights the value of domain-specific features, such as lagged variables and custom seasonalities, which strengthen the model’s ability to recognize temporal patterns and seasonal trends. Although Prophet models are computationally efficient and interpretable, they adapt poorly to sudden changes in consumption at longer forecast horizons.

Among the gradient boosting models, LightGBM outperforms XGBoost for both short- and long-term horizons, with an MAE of 5.90 and an RMSE of 8.25 for the 6-month horizon, and an MAE of 11.77 and an RMSE of 18.31 for the 18-month horizon. This advantage may be attributed to LightGBM's efficient handling of large datasets and its ability to capture complex interactions among features. XGBoost is competitive but has slightly higher errors, particularly for the longer horizon. Ensemble methods, such as stacking LightGBM and XGBoost, are more robust but do not outperform LightGBM alone in overall accuracy, which underlines LightGBM's strength as a stand-alone model for this forecasting task.

Neural network approaches, especially those based on LSTM, can capture complex temporal dependencies. The LSTM model performs well over 6 months, with an MAE of 5.96 and an RMSE of 9.38, showing its strength in short-term forecasting. However, its performance deteriorates over the 18-month horizon (MAE of 12.63, RMSE of 20.66), which shows how difficult it is to maintain high accuracy over longer time frames. The hybrid LSTM + GRU model, despite its greater architectural complexity, does not outperform the simpler LSTM configurations, suggesting that additional layers and components may introduce overfitting or a higher sensitivity to noise. The MV-LSTM proposed by Niknam, which models multivariate dependencies, is the most accurate model in terms of relative percentage error: it achieves the lowest MAPE (15.48%) for the 6-month horizon and performs well over the long term, with a MAPE of 19.30% for the 18-month horizon. These results highlight the potential of multivariate approaches that use external regressors to improve predictive precision.
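For reference, the three error measures used in this comparison (MAE, RMSE, and MAPE) can be computed as in the following sketch; the sample values are made up for illustration. Because MAPE divides by the observed value, it is undefined for zero consumption and weights errors in low-demand periods more heavily, which is one reason a model can rank differently on MAPE than on MAE or RMSE.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error (%); undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


actual = [100.0, 50.0, 80.0, 20.0]
forecast = [90.0, 55.0, 80.0, 30.0]
print(mae(actual, forecast))   # 6.25
print(rmse(actual, forecast))  # 7.5
print(mape(actual, forecast))  # 17.5
```

In this toy example the last point contributes the same absolute error (10) as the first, but five times the percentage error, because its actual value is five times smaller.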

6. Optimizing WDS Maintenance

In many rural water distribution networks, operators struggle to determine the most effective routing and scheduling of maintenance activities. Such decisions are usually based on subjective judgment, which can lead to inefficiencies, especially when many activities with different priorities and dependencies must be coordinated at once. Without a systematic approach, operators must decide on their own how to prioritize tasks so as to minimize operational costs (travel time, CO₂ emissions, and distance traveled), and the result is often longer task completion times and suboptimal resource utilization. The problem becomes even more complicated when high-priority emergency tasks arrive at times that conflict with the operator's existing schedule and when tasks depend on one another. These conditions call for a more structured and dynamic scheduling approach that accounts for uncertainties and task dependencies.

We are dealing with an NP-hard problem known as Single Machine Scheduling with Preemptive Jobs, Variable Release Times, and Task Dependencies. A single resource must process multiple tasks: tasks may be interrupted (preempted), new tasks may arrive at any moment, and some tasks can start only after others are completed. For each day, dependencies between tasks are assigned at random to reflect real-world operational complexity, in which not all tasks are independent. The objective is to minimize overall operational costs, including the total completion time, fuel consumption in liters, CO₂ emissions, and task delays, while managing tasks of different priority levels, including high-priority emergency response tasks, and satisfying all task dependencies.

Beyond reducing completion times and delays, the model accounts for fuel consumption and CO₂ emissions by considering task characteristics such as location, processing time, and road gradients that affect fuel efficiency. With this scheduling model, the system can cope with dynamic task arrivals, task dependencies, and preemptions while ensuring that prioritized tasks are executed properly. Because the problem is NP-hard, we adopt CP to search for optimized schedules. The following sections describe the mathematical formulation of the problem.

Mathematical Model

We have adapted the model to be compatible with CP, focusing on deterministic parameters. Uncertainties in processing and travel times are handled using expected or conservative estimates. The model aims to optimize the objectives while satisfying all constraints.

The objectives are to:

−

Minimize the completion time of the last task segment, i.e., the makespan ( $C_{\max}$ ).

−

Minimize the total fuel consumption ( $F_{total}$ ).

−

Minimize the total CO₂ emissions ( $C_{total}$ ).

−

Minimize the total delays and penalties ( $D_{total}$ ).
Sets and Indices

−

Let T be the set of all tasks indexed by i.

−

Let $k \in \{1, \dots, K_{i}\}$ index the processing segments into which task i is divided by preemption.

−

Let D be the set of task dependencies, where $(i, j) \in D$ means task j depends on task i.

An example of task dependencies is provided in Table 6.
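Because dependencies are assigned at random for each day, it is worth confirming that D contains no cycle before the model is built: a cycle such as (i, j), (j, i) would make the precedence constraints unsatisfiable. A minimal check based on Kahn's topological sort is sketched below; the function name and task labels are hypothetical and do not correspond to Table 6.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple


def topological_order(tasks: Iterable[Hashable],
                      deps: Iterable[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Kahn's algorithm: return an order in which every (i, j) in deps has i before j.

    Raises ValueError if the dependencies contain a cycle, in which case no
    schedule can satisfy the precedence constraints.
    """
    tasks = list(tasks)
    successors: Dict[Hashable, List[Hashable]] = {t: [] for t in tasks}
    indegree: Dict[Hashable, int] = {t: 0 for t in tasks}
    for i, j in deps:  # (i, j) means task j depends on task i
        successors[i].append(j)
        indegree[j] += 1
    ready = deque(t for t in tasks if indegree[t] == 0)
    order: List[Hashable] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected; precedence constraints are infeasible")
    return order


# Hypothetical tasks: T3 depends on T1, and T4 depends on T3.
print(topological_order(["T1", "T2", "T3", "T4"], [("T1", "T3"), ("T3", "T4")]))
# ['T1', 'T2', 'T3', 'T4']
```

A valid topological order also gives a natural starting sequence that respects all precedence constraints.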

Parameters

For each task $i \in T$ :

−

$p_{i}$ : Deterministic processing time required for task i (in hours), based on expected or conservative estimates.

−

$d_{i j}$ : Deterministic travel time from task i to task j (in hours), based on expected or conservative estimates.

−

$f_{i}$ : Fuel consumption for processing task i (in liters).

−

$c_{i}$ : CO₂ emissions for processing task i (in kg).

−

${loc}_{i}$ : Location coordinates $({lat}_{i}, {lon}_{i})$ of task i.

−

${priority}_{i}$ : Priority level of task i (higher value indicates higher priority).

−

$r_{i}$ : Task’s release time (arrival time). For regular tasks, $r_{i} = 0$ ; for emergency tasks, $r_{i} \geq 0$ .

−

$l_{i j}$ : Distance from task i to task j (in kilometers).

−

${VE}_{v}$ : Fuel efficiency of vehicle type v (in km per liter).

−

${EF}_{v}$ : Emission factor of vehicle type v (in kg CO₂ per liter of fuel).

−

S: Start of the working day (e.g., 8:00 AM).

−

E: End of the working day (e.g., 3:00 PM).

−

${Vehicle}_{i}$ : Vehicle type required for task i (e.g., Van, Small Truck).

−

MaxPreemptions: Maximum allowed number of preemptions per task.

Table 7 provides an example of tasks and associated parameters.

The travel times and distances between tasks are provided in Table 8.

The fuel efficiency and emission factors for the vehicle types used are given in Table 9.

Decision Variables

−

Task Scheduling Variables: Variables that define when each task or task segment starts and ends.

*

$s_{i k}$ : Scheduled start time of segment k of task i.

*

$C_{i k}$ : Scheduled completion time of segment k of task i.

*

$p_{i k}$ : Scheduled processing time of segment k of task i.

*

$K_{i}$ : Number of segments into which task i is divided due to preemption, where $K_{i} \leq MaxPreemptions$ .

−

Sequencing Variables: Binary variables that determine the order in which tasks are performed relative to each other.

*

$y_{i j}$ : Binary variable; $y_{i j} = 1$ if task i is scheduled immediately before task j, 0 otherwise.

−

Auxiliary Variables:

*

$δ_{i}$ : Binary variable; $δ_{i} = 1$ if task i is preempted, 0 otherwise.

*

$θ_{i j}$ : Binary variable; $θ_{i j} = 1$ if task j depends on task i, 0 otherwise.

*

$x_{i j k l}$ : Binary variable indicating if segment k of task i is scheduled before segment l of task j.
Objective Function

$\min Z = w_{t} \times (C_{\max} - S) + w_{f} \times F_{total} + w_{c} \times C_{total} + w_{d} \times D_{total}$

(21)

Where:

−

$w_{t}, w_{f}, w_{c}, w_{d}$ are the weights for time, fuel consumption, CO₂ emissions, and delays and penalties, respectively.

−

$C_{\max} = \max_{i, k} C_{i k}$ : Completion time of the last task segment.

−

$F_{total} = \sum_{i \in T} f_{i} + \sum_{i \in T} \sum_{j \in T} y_{i j} \cdot f_{i j}$ : Total fuel consumption for processing and traveling.

−

$C_{total} = \sum_{i \in T} c_{i} + \sum_{i \in T} \sum_{j \in T} y_{i j} \cdot c_{i j}$ : Total CO₂ emissions for processing and traveling.

−

$D_{total} = \sum_{i \in T:\, i \text{ emergency}} (C_{i K_{i}} - r_{i})$ : Total delays for emergency tasks beyond their release times.
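To make Eq. (21) concrete, the sketch below evaluates Z for a schedule that has already been fixed, for example one proposed by an operator; it does not optimize anything. It assumes times in clock hours, a single visiting order from which the $y_{i j}$ follow (consecutive tasks have $y_{i j} = 1$), and unit weights in the usage line; the function name and all numbers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def objective_value(
    segments: Dict[str, List[Tuple[float, float]]],  # task -> [(s_ik, C_ik), ...] in time order
    sequence: List[str],                              # visiting order; consecutive pairs have y_ij = 1
    proc_fuel: Dict[str, float],                      # f_i, liters
    proc_co2: Dict[str, float],                       # c_i, kg
    travel_fuel: Dict[Pair, float],                   # f_ij, liters (Eq. 22)
    travel_co2: Dict[Pair, float],                    # c_ij, kg (Eq. 23)
    release: Dict[str, float],                        # r_i, hours
    emergency: Set[str],                              # hypothetical set of emergency task ids
    S: float,
    weights: Tuple[float, float, float, float],       # (w_t, w_f, w_c, w_d)
) -> float:
    """Evaluate Z from Eq. (21) for a given (not optimized) schedule."""
    c_max = max(end for segs in segments.values() for _, end in segs)
    legs = list(zip(sequence, sequence[1:]))
    f_total = sum(proc_fuel.values()) + sum(travel_fuel[leg] for leg in legs)
    c_total = sum(proc_co2.values()) + sum(travel_co2[leg] for leg in legs)
    # D_total: completion of the last segment minus release time, emergency tasks only.
    d_total = sum(segments[i][-1][1] - release[i] for i in emergency)
    w_t, w_f, w_c, w_d = weights
    return w_t * (c_max - S) + w_f * f_total + w_c * c_total + w_d * d_total


z = objective_value(
    segments={"A": [(8.0, 9.5)], "B": [(10.0, 11.0)]},
    sequence=["A", "B"],
    proc_fuel={"A": 1.0, "B": 2.0}, proc_co2={"A": 2.0, "B": 4.0},
    travel_fuel={("A", "B"): 0.5}, travel_co2={("A", "B"): 1.25},
    release={"A": 0.0, "B": 9.0}, emergency={"B"},
    S=8.0, weights=(1.0, 1.0, 1.0, 1.0),
)
print(z)  # 15.75
```

Since the four terms have different units (hours, liters, kilograms, hours), the weights also act as unit conversions, and their relative sizes determine which trade-offs the optimizer prefers.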
Fuel Consumption and CO₂ Emissions for Traveling

The fuel consumption and CO₂ emissions for traveling between tasks are calculated using the following equations:

$f_{i j} = \frac{l_{i j}}{{VE}_{{Vehicle}_{i}}}$

(22)

$c_{i j} = f_{i j} \times {EF}_{{Vehicle}_{i}}$

(23)

Using the data from Table 7, Table 9, and Table 8, the calculated fuel consumption and CO₂ emissions for traveling are presented in Table 10.
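Equations (22) and (23) translate directly into code. The sketch below assumes, as the equations do, that the leg from task i to task j is driven with the vehicle required by task i; the function name and the numeric values are placeholders, not the values of Tables 8 and 9.

```python
from typing import Dict, Tuple


def travel_fuel_and_co2(distance_km: float, vehicle: str,
                        km_per_liter: Dict[str, float],
                        kg_co2_per_liter: Dict[str, float]) -> Tuple[float, float]:
    """Eq. (22): f_ij = l_ij / VE_v; Eq. (23): c_ij = f_ij * EF_v, with v = Vehicle_i."""
    fuel = distance_km / km_per_liter[vehicle]
    return fuel, fuel * kg_co2_per_liter[vehicle]


# Placeholder values (NOT those of Table 9): a van doing 10 km/l and 2.5 kg CO2 per liter.
fuel, co2 = travel_fuel_and_co2(25.0, "Van", {"Van": 10.0}, {"Van": 2.5})
print(fuel, co2)  # 2.5 6.25
```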

Constraints

−

Processing Time Constraints: Ensure that each task’s total scheduled processing time, including preempted segments, matches the required time.

$\sum_{k = 1}^{K_{i}} p_{i k} = p_{i}$

(24)

−

Segment Completion Constraints:

For all segments k of task i:

$C_{i k} = s_{i k} + p_{i k}$

(25)

−

Precedence Constraints for Task Dependencies:

If task j depends on task i (i.e., $(i, j) \in D$ ):

$s_{j 1} \geq C_{i K_{i}}$

(26)

This ensures that task j cannot start before task i is completed.

−

Travel Time Constraints:

When task i is scheduled immediately before task j (i.e., $y_{i j} = 1$):

$s_{j 1} \geq C_{i K_{i}} + d_{i j}$

(27)

−

Non-Overlap Constraints (Single-Machine Constraint):

To prevent overlapping of processing times on the single machine, we include the following constraints for all tasks $i \neq j$ and their segments k and l:

$C_{i k} \leq s_{j l} + M (1 - x_{i j k l})$

(28)

$C_{j l} \leq s_{i k} + M x_{i j k l}$

(29)

Where:

*

$C_{i k}$ is the completion time of segment k of task i.

*

$s_{i k}$ is the start time of segment k of task i.

*

$x_{i j k l}$ is a binary variable defined as:

$x_{i j k l} = \begin{cases} 1, & \text{if segment } k \text{ of task } i \text{ is scheduled before segment } l \text{ of task } j; \\ 0, & \text{otherwise.} \end{cases}$

*

M is a sufficiently large positive constant.

−

Work Hours Constraints:

Ensure tasks are scheduled within working hours:

$S \leq s_{i k} \leq C_{i k} \leq E$

(30)

−

Emergency Task Constraints:

*

Release Time Constraint:

$s_{i 1} \geq r_{i}$

(31)

*

Delay Penalties: Delays for emergency tasks are included in the objective function through $D_{total}$ as shown in Equation (21).

−

Limit on Preemptions:

$K_{i} \leq MaxPreemptions$

(32)

−

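Sequencing Constraints:

Ensure that each task is immediately preceded by at most one other task and immediately followed by at most one other task, and that exactly $|T| - 1$ immediate-succession links chain all tasks into a single sequence. The degree constraints are inequalities because the schedule is an open sequence: with equalities, the $y_{i j}$ would have to form a closed cycle, which (36) excludes.

$\sum_{j \in T, j \neq i} y_{i j} \leq 1 \quad \forall i \in T$

(33)

$\sum_{i \in T, i \neq j} y_{i j} \leq 1 \quad \forall j \in T, \qquad \sum_{i \in T} \sum_{j \in T, j \neq i} y_{i j} = |T| - 1$

(34)

−

Subtour Elimination Constraints (Miller-Tucker-Zemlin constraints):

Introduce variables $u_{i}$ (the position of task i in the sequence) for each task i:

$1 \leq u_{i} \leq |T| \quad \forall i \in T$

(35)

$u_{i} - u_{j} + |T| \, y_{i j} \leq |T| - 1 \quad \forall i, j \in T, \ i \neq j$

(36)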
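Before handing the model to a solver, it can help to test a hand-built or solver-produced schedule against the time-based constraints. The following is a minimal sketch under stated assumptions: each task's segments are listed in chronological order, times are clock hours, the visiting order (the pairs with $y_{i j} = 1$) is supplied explicitly, and the segment bound is checked as Eq. (32) is written. The function `violations` and its argument names are hypothetical, and the MTZ constraints are not checked.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Segment = Tuple[float, float]  # (s_ik, C_ik) in clock hours, listed in time order


def violations(segs: Dict[str, List[Segment]], p: Dict[str, float], r: Dict[str, float],
               deps: Set[Tuple[str, str]], order: List[str],
               d: Dict[Tuple[str, str], float], S: float, E: float,
               max_preemptions: int, tol: float = 1e-9) -> List[str]:
    """Return labels of violated constraints (an empty list means feasible)."""
    errs: List[str] = []
    for i, ss in segs.items():
        if abs(sum(c - s for s, c in ss) - p[i]) > tol:              # Eqs. (24)-(25)
            errs.append(f"(24) processing time of {i}")
        if any(s < S - tol or c > E + tol or c < s for s, c in ss):  # Eq. (30)
            errs.append(f"(30) {i} outside working hours")
        if ss[0][0] < r[i] - tol:                                    # Eq. (31)
            errs.append(f"(31) {i} starts before release")
        if len(ss) > max_preemptions:                                # Eq. (32) as written
            errs.append(f"(32) too many segments for {i}")
    for i, j in deps:                                                # Eq. (26)
        if segs[j][0][0] < segs[i][-1][1] - tol:
            errs.append(f"(26) {j} starts before {i} completes")
    for i, j in zip(order, order[1:]):                               # Eq. (27), y_ij = 1
        if segs[j][0][0] < segs[i][-1][1] + d[(i, j)] - tol:
            errs.append(f"(27) travel time {i}->{j} not respected")
    flat = [(i, s, c) for i, ss in segs.items() for s, c in ss]
    for (i, s1, c1), (j, s2, c2) in combinations(flat, 2):          # Eqs. (28)-(29)
        if s1 < c2 - tol and s2 < c1 - tol:
            errs.append(f"(28)-(29) overlap between {i} and {j}")
    return errs


common = dict(p={"A": 1.0, "B": 1.0}, r={"A": 0.0, "B": 9.0}, deps={("A", "B")},
              order=["A", "B"], d={("A", "B"): 0.5}, S=8.0, E=15.0, max_preemptions=2)
print(violations({"A": [(8.0, 9.0)], "B": [(9.5, 10.5)]}, **common))    # []
print(violations({"A": [(8.0, 9.0)], "B": [(9.25, 10.25)]}, **common))  # ['(27) travel time A->B not respected']
```

A checker like this is also useful for comparing the operator's manual schedules with solver output on equal terms.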
Performance Metrics

In addition to the objective function, we define the following performance metrics to evaluate the scheduling model:

−

Total Delays and Penalties ( $D_{total}$ ):

$D_{total} = \sum_{i \in T} \max (0, C_{i K_{i}} - r_{i})$

(37)

This represents the total delay beyond the release times $r_{i}$ for all tasks.

−

Efficiency and Utilization ( $E_{eff}$ ):

$E_{eff} = \left( \frac{\sum_{i \in T} p_{i}}{C_{\max} - S} \right) \times 100\%$

(38)

This represents the percentage of time spent on processing tasks relative to the total time from the start to the completion of all tasks.
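Both metrics are easy to compute from a finished schedule. The sketch below is a minimal illustration under an assumed convention: times are measured in hours from the start of the working day (so S = 0 and the release time $r_{i} = 0$ of regular tasks coincides with S). The function name and data are hypothetical.

```python
from typing import Dict, List, Tuple


def performance_metrics(segs: Dict[str, List[Tuple[float, float]]],
                        release: Dict[str, float], S: float) -> Tuple[float, float]:
    """Return (D_total, E_eff) following Eqs. (37) and (38)."""
    # Eq. (37): completion time of each task's last segment minus its release time.
    d_total = sum(max(0.0, ss[-1][1] - release[i]) for i, ss in segs.items())
    # Eq. (38): processing time (equal to the sum of p_i under Eq. (24)) over C_max - S.
    busy = sum(c - s for ss in segs.values() for s, c in ss)
    c_max = max(c for ss in segs.values() for _, c in ss)
    return d_total, 100.0 * busy / (c_max - S)


# Times in hours from the start of the working day.
print(performance_metrics({"A": [(0.0, 1.0)], "B": [(1.5, 2.5)]}, {"A": 0.0, "B": 1.0}, S=0.0))
# (2.5, 80.0)
```

Here the half hour between A and B (for example, travel) is the only idle time, so 2 of the 2.5 elapsed hours are spent processing.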

Algorithm 5 CP for Scheduling Problem

Require: Set of tasks T, release times $r_{i}$ , processing times $p_{i}$ , dependencies D, preemption limit MaxPreemptions, objective weights $w_{t}, w_{f}, w_{c}, w_{d}$
Ensure: Optimized schedule with minimized completion time, fuel consumption, CO₂ emissions, and task delays
Initialize constraint model CPModel
for each task $i \in T$ do
Define start time variables $s_{i k}$ and completion time variables $C_{i k}$ for each segment k of task i
Define preemption segments $K_{i} \leq MaxPreemptions$ , with $\sum_{k = 1}^{K_{i}} p_{i k} = p_{i}$
Set release time constraint: $s_{i 1} \geq r_{i}$
if task i has dependencies then
for each $(i, j) \in D$ do
Add precedence constraint: $s_{j 1} \geq C_{i K_{i}}$
end for
end if
end for
for each task pair $(i, j) \in T \times T$ with $i \neq j$, and each segment pair $(k, l)$ do
Add non-overlap constraints:
$C_{i k} \leq s_{j l} + M (1 - x_{i j k l})$
$C_{j l} \leq s_{i k} + M x_{i j k l}$
end for
for each task $i \in T$ do
Set working-hours constraints for every segment k of task i: $S \leq s_{i k} \leq C_{i k} \leq E$
end for
Define objective function to minimize:
$Z = w_{t} \times (C_{\max} - S) + w_{f} \times F_{total} + w_{c} \times C_{total} + w_{d} \times D_{total}$
Solve CPModel using a CP solver (e.g., Google OR-Tools)
if solution found then
Extract optimized schedule with task start and completion times
else
Report that no feasible solution was found
end if

The model above handles uncertainty in processing and travel times through deterministic estimates, such as expected or conservative values. Using CP, we search for schedules that minimize the weighted objective while satisfying all constraints, including task dependencies, variable release times, and preemption limits, and that balance the multiple objectives through the weights in Equation (21) (see Algorithm 5 for the pseudocode of our CP-based approach).

Note: The deterministic approach simplifies the model for CP solvers while still practically accounting for variability. This allows for efficient computation and implementation using tools like Google OR-Tools, which are well-suited for handling complex scheduling problems with the defined constraints.
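Algorithm 5 relies on a CP solver such as Google OR-Tools. For intuition only, the following sketch replaces the solver with exhaustive enumeration of task orders on a toy instance. It is not CP, ignores preemption, and optimizes only the completion-time term of Equation (21). Enumeration grows as |T|!, which is why a solver that combines constraint propagation with search is needed for realistic instances. All names and numbers are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple


def best_order(tasks: List[str], p: Dict[str, float], r: Dict[str, float],
               deps: Set[Tuple[str, str]], travel: Dict[Tuple[str, str], float],
               S: float, E: float) -> Optional[Tuple[float, List[str]]]:
    """Exhaustively search non-preemptive task orders for the smallest makespan C_max."""
    best: Optional[Tuple[float, List[str]]] = None
    for order in permutations(tasks):
        pos = {t: k for k, t in enumerate(order)}
        if any(pos[i] > pos[j] for i, j in deps):  # precedence, Eq. (26)
            continue
        now, prev = S, None
        for t in order:
            if prev is not None:
                now += travel[(prev, t)]           # travel time, Eq. (27)
            now = max(now, r[t])                   # release time, Eq. (31)
            now += p[t]                            # process task t without interruption
            prev = t
        if now <= E and (best is None or now < best[0]):  # working hours, Eq. (30)
            best = (now, list(order))
    return best


travel = {("A", "B"): 0.5, ("B", "A"): 0.5, ("A", "C"): 1.0,
          ("C", "A"): 1.0, ("B", "C"): 0.25, ("C", "B"): 0.25}
# C is an emergency task released at 10:00; B depends on A; day runs 8:00-15:00.
print(best_order(["A", "B", "C"], p={"A": 1.0, "B": 2.0, "C": 1.0},
                 r={"A": 0.0, "B": 0.0, "C": 10.0}, deps={("A", "B")},
                 travel=travel, S=8.0, E=15.0))
# (12.75, ['A', 'B', 'C'])
```

Serving the emergency task C first would mean idling until 10:00 and then finishing at 15:30, after the end of the working day, so that order is rejected.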

The comparison and improvement of the proposed scheduling model against conventional operator methods are illustrated in Table 11 and Figure 18. These results represent the average performance obtained from 20 independent executions of the algorithm to ensure statistical robustness and reliability.

7. Security Layer in DTs Platform

In the digital transformation era, DTs have become a cornerstone for enhancing operational efficiency and decision-making across industries [26,71]. These virtual replicas, integral to systems from manufacturing to smart cities, leverage real-time data to mirror and predict the physical world’s behavior. However, the complexity of DTs introduces multiple layers of cybersecurity risks that must be meticulously managed. Cybersecurity for DTs is not just an add-on but a foundational component that ensures the safe functioning of these systems. Each aspect, from hardware and software to models and algorithms, requires comprehensive protection to defend against cyber threats. Without robust cybersecurity measures, DTs could become liabilities, offering cyber criminals potential backdoors to critical infrastructure [30,72].

To mitigate these vulnerabilities, predefined security layers are essential. As detailed in Table 12, CAUCCES integrates a range of cybersecurity strategies tailored to smart water management. These strategies include encryption of data at rest and in transit, strict access controls that limit interaction with DT systems to authorized personnel, and continuous monitoring to detect and respond to threats in real time. The table also outlines the specific cybersecurity approaches applied in the CAUCCES project to protect the DTs used to manage WDSs, including securing the communication channels that transmit sensitive data, employing anomaly detection techniques to identify potential threats quickly, and enforcing data flow integrity measures.

The integration of these cybersecurity measures is critical to maintaining the integrity and reliability of DTs. As these systems increasingly support essential infrastructure—from healthcare to public utilities—the stakes for cybersecurity can hardly be overstated. Ensuring the resilience of DTs against cyber threats is not merely about protecting information but about safeguarding public welfare and the environments these systems serve.

By proactively addressing cybersecurity and embedding it within DT technologies, organizations can prevent potential vulnerabilities and ensure that DTs enhance, rather than compromise, the digital future. The CAUCCES project underscores the importance of incorporating robust cybersecurity strategies within the DT framework for water management. By addressing key areas, including smart meter security, communication network protection, employee training, database security, anomaly detection, data privacy, real-time monitoring, and collaborative approaches, the project aims to set a benchmark for strengthening the resilience of water infrastructure against evolving cyber threats. This comprehensive approach strengthens infrastructure while fostering a culture of security awareness and preparedness, and it can serve as a model for future initiatives that integrate DTs and cybersecurity.

8. Conclusions

This paper has explored the transformative potential of DTs in the water distribution sector, underscoring their pivotal role in enhancing system efficiency, reliability, and sustainability. By integrating IoT devices, advanced AI, and ML algorithms, DTs offer a dynamic and robust platform for simulating real-world water systems, enabling predictive maintenance, real-time monitoring, and strategic decision-making.

We presented a novel DT platform within the CAUCCES project that integrates AI/ML models, including LSTM networks, Prophet, LightGBM, and XGBoost, for accurate forecasting of water consumption based on historical and meteorological data. The model evaluations showed that advanced feature engineering and hyperparameter tuning substantially improve forecasting accuracy, which is essential for effective water resource management. For instance, the Prophet model with advanced feature engineering achieved an MAE of 5.76 and a MAPE of 18.61% over a 6-month forecasting horizon, outperforming basic models.

Additionally, we addressed the optimization of WDS maintenance by formulating a scheduling problem using CP. The proposed model effectively minimizes total completion time, fuel consumption, CO₂ emissions, and delays, enhancing operational efficiency and reducing environmental impact. The results indicated a 14% reduction in total completion time and a 17% decrease in CO₂ emissions compared to conventional methods, highlighting the efficacy of the CP-based approach in handling complex scheduling problems in WDS maintenance.

Moreover, the platform's emphasis on cybersecurity helps ensure the integrity and confidentiality of data, a critical aspect given the growing threats in the digital landscape. Implementing robust cybersecurity measures aligned with standards such as ISO/IEC 27001 helps protect the system against cyber-attacks and supports continuous, reliable service delivery.

Future research could integrate more advanced artificial intelligence and machine learning methods, including deep learning frameworks and ensemble techniques, with a focus on improving the accuracy of water consumption prediction and anomaly detection. Combining real-time data analytics with optimization techniques may enable more dynamic and adaptive management of water distribution systems, one that reacts rapidly to change and unexpected events. Likewise, extending the platform to larger and more complex water networks and integrating it with other utility systems could support a more holistic approach to urban resource management. In addition, an improved user interface and extensive training programs for utility operators and stakeholders could encourage wider adoption of the platform. Continued compliance with evolving cybersecurity standards and regulatory requirements will also be needed to maintain data security and system integrity. Lastly, future releases of the platform could include features for assessing and reducing environmental impacts, such as water loss and energy consumption, and thereby contribute to the sustainable development goals.

Declarations

Funding: This project is carried out within the framework of the funds of the Recovery, Transformation and Resilience Plan, financed by the European Union (Next Generation). The publication is part of the Spanish Strategic Cybersecurity Project “Artificial Intelligence applied to Cybersecurity in Critical Water and Sanitation Infrastructures (***/**)” funded by Instituto Nacional de Ciberseguridad de España (INCIBE).
Conflict of interest: The authors declare no conflicts of interest relevant to the content of this article.
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Data and code availability: All code and datasets used in this study are publicly available at https://github.com/Homaei/DigitalTwin-Water-ML.
Materials availability: Not applicable.
Author contributions: All authors contributed equally to the conception, design, drafting, and revision of the manuscript.

Appendix A Lemma and Proof of Pearson’s Correlation Coefficient

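Lemma: The Pearson correlation coefficient R is a measure of the linear relationship between two variables and is invariant under changes in the location and scale of the variables, provided that the two scale factors have the same sign; if they have opposite signs, only the sign of R changes.

Proof: Let $X^{'}$ and $Y^{'}$ be two new variables derived from X and Y by linear transformations:

$X^{'} = \alpha X + \beta \quad \text{and} \quad Y^{'} = \gamma Y + \delta$

(A1)

where $\alpha$ and $\gamma$ are non-zero constants, and $\beta$ and $\delta$ are constants.

The covariance of $X^{'}$ and $Y^{'}$ is given by:

$\mathrm{Cov}(X^{'}, Y^{'}) = \mathrm{Cov}(\alpha X + \beta, \gamma Y + \delta) = \alpha \gamma \cdot \mathrm{Cov}(X, Y)$

(A2)

The standard deviations of $X^{'}$ and $Y^{'}$ are:

$\sigma_{X^{'}} = |\alpha| \sigma_{X} \quad \text{and} \quad \sigma_{Y^{'}} = |\gamma| \sigma_{Y}$

(A3)

The Pearson correlation coefficient for $X^{'}$ and $Y^{'}$ is:

$R^{'} = \frac{\mathrm{Cov}(X^{'}, Y^{'})}{\sigma_{X^{'}} \sigma_{Y^{'}}} = \frac{\alpha \gamma \cdot \mathrm{Cov}(X, Y)}{|\alpha| \sigma_{X} |\gamma| \sigma_{Y}} = \mathrm{sgn}(\alpha \gamma) \frac{\mathrm{Cov}(X, Y)}{\sigma_{X} \sigma_{Y}} = \mathrm{sgn}(\alpha \gamma) R$

(A4)

Thus, R is unchanged by any change of location ($\beta$, $\delta$) and by any rescaling with $\alpha \gamma > 0$, which proves its invariance under changes in location and scale; when $\alpha \gamma < 0$, R keeps its magnitude but changes sign.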
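The lemma, including the role of the sign of $\alpha \gamma$, can be checked numerically. The sketch below uses made-up data and a plain implementation of the sample correlation; the common normalizing factor (1/n or 1/(n-1)) cancels between numerator and denominator, so it is omitted.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; normalizing factors cancel and are omitted."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r = pearson(x, y)
r_same_sign = pearson([3 * a + 7 for a in x], [0.5 * b - 2 for b in y])  # alpha*gamma > 0
r_flipped = pearson([-3 * a + 7 for a in x], [0.5 * b - 2 for b in y])   # alpha*gamma < 0
print(round(r, 6), round(r_same_sign, 6), round(r_flipped, 6))
```

The first two values agree, while the third has the same magnitude with the opposite sign, as Equation (A4) predicts.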

Appendix B Explanation of Non-Overlap Constraints

Explanation of Constraints (28) and (29):

When $x_{i j k l} = 1$ :

−

Constraint (28) simplifies to:

$C_{i k} \leq s_{j l}$

(A5)

This ensures that segment k of task i finishes before segment l of task j starts.

−

Constraint (29) becomes:

$C_{j l} \leq s_{i k} + M \times 1$

(A6)

Since M is a large number, this constraint does not restrict the scheduling and is effectively redundant.
When $x_{i j k l} = 0$ :

−

Constraint (28) becomes:

$C_{i k} \leq s_{j l} + M \times 1$

(A7)

Again, this constraint is redundant due to the large M.

−

Constraint (29) simplifies to:

$C_{j l} \leq s_{i k}$

(A8)

This ensures that segment l of task j finishes before segment k of task i starts.

These constraints guarantee that for any two segments, either one must finish before the other starts, thereby preventing any overlap on the single machine.

Setting the Value of M:

The constant M must be chosen carefully. It must be at least as large as the largest possible value of $C_{i k} - s_{j l}$, which, under the working-hours constraint (30), is at most $E - S$. This ensures that when a constraint is intended to be inactive (due to the value of $x_{i j k l}$), it does not inadvertently impose any restrictions on the scheduling variables.
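A quick way to see why M = E - S suffices, and why a too-small M breaks the model, is to enumerate segment placements on a coarse grid. The sketch below assumes integer clock hours, equal-length segments, and the horizon S = 8, E = 15 from the parameter list; the helper names are hypothetical, and the check is a sanity test rather than part of the solution method.

```python
from itertools import product


def big_m_admits(s_a: int, c_a: int, s_b: int, c_b: int, M: int) -> bool:
    """True if some x in {0, 1} satisfies Eqs. (28)-(29) for segments a and b."""
    return any(c_a <= s_b + M * (1 - x) and c_b <= s_a + M * x for x in (0, 1))


def disjoint(s_a: int, c_a: int, s_b: int, c_b: int) -> bool:
    """True if one segment finishes before the other starts."""
    return c_a <= s_b or c_b <= s_a


def m_is_exact(S: int, E: int, M: int, length: int = 1) -> bool:
    """Check on an integer grid that (28)-(29) admit exactly the disjoint pairs."""
    starts = range(S, E - length + 1)
    return all(
        big_m_admits(a, a + length, b, b + length, M) == disjoint(a, a + length, b, b + length)
        for a, b in product(starts, repeat=2)
    )


print(m_is_exact(8, 15, M=15 - 8))  # True: M = E - S is large enough
print(m_is_exact(8, 15, M=1))       # False: a too-small M also rejects disjoint pairs
```

With M = 1, for example, a segment from 8 to 9 and another from 12 to 13 do not overlap, yet neither value of x satisfies both constraints, so the model would wrongly declare such a schedule infeasible.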

References

Hu, Z.; Chen, B.; Chen, W.; Tan, D.; Shen, D. Review of model-based and data-driven approaches for leak detection and location in water distribution systems. Water Supply 2021, 21, 3282–3306. [Google Scholar] [CrossRef]
DAYIOĞLU, M.A.; TURKER, U. Digital Transformation for Sustainable Future - Agriculture 4.0 : A review. Tarım Bilimleri Dergisi 2021. [CrossRef]
Bauer, P.; Stevens, B.; Hazeleger, W. A digital twin of Earth for the green transition. Nature Climate Change 2021, 11, 80–83. [Google Scholar] [CrossRef]
Beji, H.; Lade, M. Impact of Digital Transformation on Carbon Emissions Reductions in the Water Industry. In Lecture Notes in Energy; Springer International Publishing, 2022; pp. 117–127. [CrossRef]
Khanna, M. Digital Transformation of the Agricultural Sector: Pathways, Drivers and Policy Implications. Applied Economic Perspectives and Policy 2020, 43, 1221–1242. [Google Scholar] [CrossRef]
Agostinelli, S.; Cumo, F.; Guidi, G.; Tomazzoli, C. Cyber-Physical Systems Improving Building Energy Management: Digital Twin and Artificial Intelligence. Energies 2021, 14, 2338. [Google Scholar] [CrossRef]
Ciliberti, F.G.; Berardi, L.; Laucelli, D.B.; Giustolisi, O. Digital Transformation Paradigm for Asset Management in Water Distribution Networks. 2021 10th International Conference on ENERGY and ENVIRONMENT (CIEM). IEEE, 2021, pp. 760–765. [CrossRef]
Jagani, S.; Deng, X.; Hong, P.C.; Mashhadi Nejad, N. Adopting sustainability business models for value creation and delivery: An empirical investigation of manufacturing firms. Journal of Manufacturing Technology Management 2023, 35, 360–382. [Google Scholar] [CrossRef]
Jiang, Y.; Yin, S.; Li, K.; Luo, H.; Kaynak, O. Industrial applications of digital twins. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2021, 379, 20200360. [Google Scholar] [CrossRef] [PubMed]
Zekri, S.; Jabeur, N.; Gharrad, H. Smart Water Management Using Intelligent Digital Twins. Computing and Informatics 2022, 41, 135–153. [Google Scholar] [CrossRef]
Pylianidis, C.; Osinga, S.; Athanasiadis, I.N. Introducing digital twins to agriculture. Computers and Electronics in Agriculture 2021, 184, 105942. [Google Scholar] [CrossRef]
Boyle, C.; Ryan, G.; Bhandari, P.; Law, K.M.Y.; Gong, J.; Creighton, D. Digital Transformation in Water Organizations. Journal of Water Resources Planning and Management 2022, 148. [Google Scholar] [CrossRef]
Haaker, T.; Ly, P.T.M.; Nguyen-Thanh, N.; Nguyen, H.T.H. Business model innovation through the application of the Internet-of-Things: A comparative analysis. Journal of Business Research 2021, 126, 126–136. [Google Scholar] [CrossRef]
Pivoto, D.G.; de Almeida, L.F.; da Rosa Righi, R.; Rodrigues, J.J.; Lugli, A.B.; Alberti, A.M. Cyber-physical systems architectures for industrial internet of things applications in Industry 4.0: A literature review. Journal of Manufacturing Systems 2021, 58, 176–192. [Google Scholar] [CrossRef]
Feroz, A.K.; Zo, H.; Chiravuri, A. Digital Transformation and Environmental Sustainability: A Review and Research Agenda. Sustainability 2021, 13, 1530. [Google Scholar] [CrossRef]
Wójcicki, K.; Biegańska, M.; Paliwoda, B.; Górna, J. Internet of Things in Industry: Research Profiling, Application, Challenges and Opportunities—A Review. Energies 2022, 15, 1806. [Google Scholar] [CrossRef]
Jalali Sepehr, M.; Mashhadi Nejad, N. Exploring Strategic National Research and Development Factors for Sustainable Adoption of Cellular Agriculture Technology. Proceedings of the 2024 Midwest Decision Sciences Institute Conference. Decision Sciences Institute, 2024. [CrossRef]
Botín-Sanabria, D.M.; Mihaita, A.S.; Peimbert-García, R.E.; Ramírez-Moreno, M.A.; Ramírez-Mendoza, R.A.; de, J. Lozoya-Santos, J. Digital Twin Technology Challenges and Applications: A Comprehensive Review. Remote Sensing 2022, 14, 1335. [Google Scholar] [CrossRef]
Futai, M.M.; Bittencourt, T.N.; Carvalho, H.; Ribeiro, D.M. Challenges in the application of digital transformation to inspection and maintenance of bridges. Structure and Infrastructure Engineering 2022, 18, 1581–1600. [Google Scholar] [CrossRef]
Singh, M.; Fuenmayor, E.; Hinchy, E.; Qiao, Y.; Murray, N.; Devine, D. Digital Twin: Origin to Future. Applied System Innovation 2021, 4, 36. [Google Scholar] [CrossRef]
Arena, S.; Florian, E.; Zennaro, I.; Orrù, P.; Sgarbossa, F. A novel decision support system for managing predictive maintenance strategies based on machine learning approaches. Safety science 2022, 146, 105529. [Google Scholar] [CrossRef]
Nadkarni, S.; Prügl, R. Digital transformation: A review, synthesis and opportunities for future research. Management Review Quarterly 2020, 71, 233–341. [Google Scholar] [CrossRef]
Mendhurwar, S.; Mishra, R. Integration of social and IoT technologies: Architectural framework for digital transformation and cyber security challenges. Enterprise Information Systems 2019, 15, 565–584. [Google Scholar] [CrossRef]
Broo, D.G.; Schooling, J. Digital twins in infrastructure: Definitions, current practices, challenges and strategies. International Journal of Construction Management 2021, 23, 1254–1263. [Google Scholar] [CrossRef]
Ko, A.; Fehér, P.; Kovacs, T.; Mitev, A.; Szabó, Z. Influencing factors of digital transformation: Management or IT is the driving force? International Journal of Innovation Science 2021, 14, 1–20. [Google Scholar] [CrossRef]
Homaei, M.; Óscar, Mogollón-Gutiérrez.; Sancho, J.C.; Ávila, M.; Caro, A. A review of digital twins and their application in cybersecurity based on artificial intelligence. Artificial Intelligence Review 2024, 57, 201. [Google Scholar] [CrossRef]
Li, F. Leading digital transformation: Three emerging approaches for managing the transition. International Journal of Operations and Production Management 2020, 40, 809–817. [Google Scholar] [CrossRef]
Aguilera Castillo, A. Digital Transformation and the Public Sector Workforce:: An exploration and research agenda. 14th International Conference on Theory and Practice of Electronic Governance. ACM, 2021, ICEGOV 2021, pp. 471–475. [CrossRef]
Callaway, S.; Mashhadi Nejad, N. Innovating Toward CSR: Creating Value by Empowering Employees, Customers, and Stockholders. SSRN Electronic Journal 2024. [Google Scholar] [CrossRef]
Sadornil Renedo, D., Ed. The role of Artificial Intelligence in Digital Twin’s Cybersecurity. Editorial Universidad de Cantabria, 2022. [CrossRef]
Shahi, C.; Sinha, M. Digital transformation: Challenges faced by organizations and their potential solutions. International Journal of Innovation Science 2020, 13, 17–33. [Google Scholar] [CrossRef]
Conejos Fuertes, P.; Martínez Alzamora, F.; Hervás Carot, M.; Alonso Campos, J. Building and exploiting a Digital Twin for the management of drinking water distribution networks. Urban Water Journal 2020, 17, 704–713. [Google Scholar] [CrossRef]
Haag, S.; Anderl, R. Digital twin – Proof of concept. Manufacturing Letters 2018, 15, 64–66. [Google Scholar] [CrossRef]
Ketzler, B.; Naserentin, V.; Latino, F.; Zangelidis, C.; Thuvander, L.; Logg, A. Digital Twins for Cities: A State of the Art Review. Built Environment 2020, 46, 547–573. [Google Scholar] [CrossRef]
Rice, L. Digital Twins of Smart Cities: Spatial Data Visualization Tools, Monitoring and Sensing Technologies, and Virtual Simulation Modeling. Geopolitics, History, and International Relations 2022, 14, 26. [Google Scholar] [CrossRef]
Ravid, B.Y.; Aharon-Gutman, M. The Social Digital Twin:The Social Turn in the Field of Smart Cities. Environment and Planning B: Urban Analytics and City Science 2022, p. 239980832211370. [CrossRef]
Bariah, L.; Sari, H.; Debbah, M. Digital Twin-Empowered Smart Cities: A New Frontier of Wireless Networks. TechRxiv 2022.
Henriksen, H.J.; Schneider, R.; Koch, J.; Ondracek, M.; Troldborg, L.; Seidenfaden, I.K.; Kragh, S.J.; Bøgh, E.; Stisen, S. A New Digital Twin for Climate Change Adaptation, Water Management, and Disaster Risk Reduction (HIP Digital Twin). Water 2022, 15, 25.
Yousif, I. Application of Digital Transformation in the Water Desalination Industry to Develop Smart Desalination Plants. Master’s thesis, College of Engineering and Computing, University of South Carolina, 2021.
Savić, D. Digital water developments and lessons learned from automation in the car and aircraft industries. Engineering 2022, 9, 35–41.
Ramos, H.M.; Morani, M.C.; Carravetta, A.; Fecarrotta, O.; Adeyeye, K.; López-Jiménez, P.A.; Pérez-Sánchez, M. New Challenges towards Smart Systems’ Efficiency by Digital Twin in Water Distribution Networks. Water 2022, 14, 1304.
Bonilla, C.A.; Zanfei, A.; Brentan, B.; Montalvo, I.; Izquierdo, J. A Digital Twin of a Water Distribution System by Using Graph Convolutional Networks for Pump Speed-Based State Estimation. Water 2022, 14, 514.
Wei, Y.; Law, A.W.K.; Yang, C.; Tang, D. Combined Anomaly Detection Framework for Digital Twins of Water Treatment Facilities. Water 2022, 14, 1001.
Albarrán, J.C.; Ramírez, E.C.; Salazar, L.A.C.; Astudillo, Y.A.P. Digital Twin in Water Supply Systems to Industry 4.0: The Holonic Production Unit. In Service Oriented, Holonic and Multi-Agent Manufacturing Systems for Industry of the Future; Springer International Publishing, 2021; pp. 42–54.
Mashhadi Nejad, N.; Alvarado-Vargas, M.J.; Jalali Sepehr, M. Refining Literature Review Strategies: Analyzing Big Data Trends Across Journal Tiers. Academy of Management Proceedings 2024, 2024.
Curl, J.M.; Nading, T.; Hegger, K.; Barhoumi, A.; Smoczynski, M. Digital Twins: The Next Generation of Water Treatment Technology. Journal AWWA 2019, 111, 44–50.
Giudicianni, C.; Herrera, M.; Nardo, A.d.; Adeyeye, K.; Ramos, H.M. Overview of Energy Management and Leakage Control Systems for Smart Water Grids and Digital Water. Modelling 2020, 1, 134–155.
Valverde-Pérez, B.; Johnson, B.; Wärff, C.; Lumley, D.; Torfs, E.; Nopens, I.; Townley, L. Digital Water-Operational digital twins in the urban water sector: case studies. International Water Association, London, UK, White paper 2021.
Garrido-Baserba, M.; Corominas, L.; Cortés, U.; Rosso, D.; Poch, M. The Fourth-Revolution in the Water Sector Encounters the Digital Revolution. Environmental Science & Technology 2020, 54, 4698–4705.
Pedersen, A.N.; Borup, M.; Brink-Kjær, A.; Christiansen, L.E.; Mikkelsen, P.S. Living and Prototyping Digital Twins for Urban Water Systems: Towards Multi-Purpose Value Creation Using Models and Sensors. Water 2021, 13, 592.
Hietala, H.; Rossi, P.M.; Annanperä, E.; Päivärinta, T. Modes of collaboration in digital transformation of municipal wastewater management. 29th European Conference on Information Systems (ECIS 2021), Marrakech, Morocco (Virtual), June 14-16, 2021. Association for Information Systems, 2021, pp. 1470–1486.
van Rooij, F.; Scarf, P.; Do, P. Planning the restoration of membranes in RO desalination using a digital twin. Desalination 2021, 519, 115214.
Udugama, I.A.; Lopez, P.C.; Gargalo, C.L.; Li, X.; Bayer, C.; Gernaey, K.V. Digital Twin in biomanufacturing: Challenges and opportunities towards its implementation. Systems Microbiology and Biomanufacturing 2021, 1, 257–274.
Botín-Sanabria, D.M.; Lozoya-Reyes, J.G.; Vargas-Maldonado, R.C.; Rodríguez-Hernández, K.L.; Ramírez-Mendoza, R.A.; Ramírez-Moreno, M.A.; Lozoya-Santos, J.d.J. Digital Twin for Urban Spaces: An Application. Proceedings of the International Conference on Industrial Engineering and Operations Management, 2021, pp. 2880–2891.
Pesantez, J.E.; Alghamdi, F.; Sabu, S.; Mahinthakumar, G.; Berglund, E.Z. Using a digital twin to explore water infrastructure impacts during the COVID-19 pandemic. Sustainable Cities and Society 2022, 77, 103520.
Matheri, A.N.; Mohamed, B.; Ntuli, F.; Nabadda, E.; Ngila, J.C. Sustainable circularity and intelligent data-driven operations and control of the wastewater treatment plant. Physics and Chemistry of the Earth, Parts A/B/C 2022, 126, 103152.
Dodanwala, T.C.; Ruparathna, R. A Levels of Service (LOS) Digital Twin for Potable Water Infrastructure Systems. Proceedings of the Canadian Society for Civil Engineering Annual Conference 2023. Springer Nature Switzerland AG, 2023, pp. 15–37.
Ramos, H.M.; Kuriqi, A.; Besharat, M.; Creaco, E.; Tasca, E.; Coronado-Hernández, O.E.; Pienika, R.; Iglesias-Rey, P. Smart Water Grids and Digital Twin for the Management of System Efficiency in Water Distribution Networks. Water 2023, 15, 1129.
Grievson, O.; Holloway, T.; Johnson, B. A Strategic Digital Transformation for the Water Industry; IWA Publishing, 2022.
Fu, G.; Jin, Y.; Sun, S.; Yuan, Z.; Butler, D. The role of deep learning in urban water management: A critical review. Water Research 2022, 223, 118973.
Pedersen, A.N.; Pedersen, J.W.; Borup, M.; Brink-Kjær, A.; Christiansen, L.E.; Mikkelsen, P.S. Using multi-event hydrologic and hydraulic signatures from water level sensors to diagnose locations of uncertainty in integrated urban drainage models used in living digital twins. Water Science and Technology 2022, 85, 1981–1998.
Gino Ciliberti, F.; Berardi, L.; Laucelli, D.B.; David Ariza, A.; Vanessa Enriquez, L.; Giustolisi, O. From digital twin paradigm to digital water services. Journal of Hydroinformatics 2023.
Ramos, H.M.; Kuriqi, A.; Coronado-Hernández, O.E.; López-Jiménez, P.A.; Pérez-Sánchez, M. Are digital twins improving urban-water systems efficiency and sustainable development goals? Urban Water Journal 2023, p. 1–13.
Torfs, E.; Nicolaï, N.; Daneshgar, S.; Copp, J.B.; Haimi, H.; Ikumi, D.; Johnson, B.; Plosz, B.B.; Snowling, S.; Townley, L.R.; Valverde-Pérez, B.; Vanrolleghem, P.A.; Vezzaro, L.; Nopens, I., The transition of WRRF models to digital twin applications. In Modelling for Water Resource Recovery; IWA Publishing, 2024; chapter 6, pp. 2840–2853.
Menapace, A.; Zanfei, A.; Herrera, M.; Brentan, B. Graph Neural Networks for Sensor Placement: A Proof of Concept towards a Digital Twin of Water Distribution Systems. Water 2024, 16, 1835.
Sun, C.; Puig, V.; Cembrano, G. Real-Time Control of Urban Water Cycle under Cyber-Physical Systems Framework. Water 2020, 12, 406.
Homaei, M. Ambling, 2024. 04 Apr 2024.
Khazrak, I. A Study on Corporate Carbon Footprint Using Panel Data Analysis. Master’s thesis, Bowling Green State University, OhioLINK Electronic Theses and Dissertations Center, 2023. Committee: Yuhang Xu, Ph.D. (Committee Chair); Shuchismita Sarkar, Ph.D. (Committee Member); Sophie Song, Ph.D. (Committee Member).
The State Meteorological Agency (AEMET). AEMET OpenData, 2024. Accessed: 04 Apr 2024.
Niknam, A.; Zare, H.K.; Hosseininasab, H.; Mostafaeipour, A. Developing an LSTM model to forecast the monthly water consumption according to the effects of the climatic factors in Yazd, Iran. Journal of Engineering Research 2023, 11, 100028.
Guikema, S.; Flage, R. Digital twins as a security risk? Risk Analysis 2024.
Wei, Y.; Law, A.W.K.; Yang, C. Real-Time Data-Processing Framework with Model Updating for Digital Twins of Water Treatment Facilities. Water 2022, 14, 3591.

Figure 1. DTs in the WDS.

Figure 2. DT Platform for Water Distribution Networks.

Figure 3. 2D Map of the village and communication signals

Figure 4. 3D Map of the village

Figure 5. Meteorology and Water Consumption Data

Figure 6. Correlation Matrix based on the parameters

Figure 7. Prophet Model with Regressors (six- and eighteen-month forecasts). Subfigure (a) shows the model’s short-term performance, while subfigure (b) highlights long-term trends and limitations.

Figure 8. Prophet Model with Regressors and Custom Seasonalities (six- and eighteen-month forecasts). Subfigure (a) shows the short-term predictions, and subfigure (b) shows the detected seasonal patterns.

Figure 9. Advanced Prophet Model with Lag Features, Rolling Means, and Custom Seasonalities (six- and eighteen-month forecasts). Subfigure (a) demonstrates short-term improvements, while subfigure (b) focuses on long-term trends.

Figure 10. Prophet Model with Advanced Feature Engineering (six- and eighteen-month forecasts). Subfigure (a) captures detailed seasonal trends, while subfigure (b) shows long-term prediction limitations.

Figure 11. XGBoost Model with Hyperparameter Tuning and Feature Engineering (six- and eighteen-month forecasts). Subfigure (a) shows precise short-term forecasts, while subfigure (b) captures long-term seasonal trends.

Figure 12. LightGBM Model with Feature Engineering (six- and eighteen-month forecasts). Subfigure (a) highlights short-term precision, while subfigure (b) illustrates long-term trend consistency.

Figure 13. Stacking Ensemble of XGBoost and LightGBM (six- and eighteen-month forecasts). Subfigure (a) shows short-term accuracy, while subfigure (b) highlights trend stability in long-term forecasts.

Figure 14. LSTM Neural Network (six- and eighteen-month forecasts). Subfigure (a) focuses on short-term accuracy, while subfigure (b) shows consistent long-term trends.

Figure 15. LSTM-GRU Hybrid Model with Additional Layers (six- and eighteen-month forecasts). Subfigure (a) shows short-term precision, while subfigure (b) focuses on long-term seasonal trends.

Figure 16. LSTM Neural Network with Rolling Mean Features (six- and eighteen-month forecasts). Subfigure (a) shows short-term performance, while subfigure (b) focuses on seasonal patterns.

Figure 17. MV-LSTM Model of Niknam et al. [70] (six- and eighteen-month forecasts). Subfigure (a) demonstrates short-term robustness, while subfigure (b) shows long-term trend consistency.

Figure 18. Overview of key operational metrics in the system performance analysis. Panel A shows the completion time, reflecting how efficiently tasks are completed across varying conditions. Panel B illustrates delays and penalties, i.e., the penalties incurred by task delays and their impact on overall costs. Panel C presents CO₂ emissions, highlighting the environmental impact of system operations. Panel D tracks fuel consumption as an indicator of resource-usage efficiency. Panel E summarizes efficiency and utilization, i.e., how effectively system resources are used.
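Figures 9 and 16 refer to lag features and rolling means. The exact feature set is not listed in this section, so the following is only a minimal sketch, assuming a single consumption series in time order; the function name `lag_and_rolling_features`, the lag set and the window size are hypothetical defaults, not the authors' settings. Each rolling mean uses only values strictly before the current step, so the target never leaks into its own features.

```python
from typing import Optional


def lag_and_rolling_features(
    series: list[float], lags: tuple[int, ...] = (1, 2, 3), window: int = 3
) -> list[dict[str, Optional[float]]]:
    """Build one feature row per time step from a univariate series.

    `lags` and `window` are illustrative defaults (assumptions), not values
    taken from the paper. Steps without enough history get None.
    """
    rows: list[dict[str, Optional[float]]] = []
    for t, y in enumerate(series):
        row: dict[str, Optional[float]] = {"y": y}
        for lag in lags:
            # Value observed `lag` steps earlier, if it exists.
            row[f"lag_{lag}"] = series[t - lag] if t >= lag else None
        # Trailing mean of the `window` values strictly before t (no leakage of y).
        row[f"roll_mean_{window}"] = (
            sum(series[t - window:t]) / window if t >= window else None
        )
        rows.append(row)
    return rows


rows = lag_and_rolling_features([float(v) for v in range(1, 11)])
print(rows[5])  # {'y': 6.0, 'lag_1': 5.0, 'lag_2': 4.0, 'lag_3': 3.0, 'roll_mean_3': 4.0}
```

Rows that still contain None would typically be dropped or imputed before fitting a tree-based or recurrent model.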

Table 3. Meteorological Variables from AEMET for Water Consumption Analysis

Variable	Description
Date	The date when the data was recorded (year-month-day format).
Tmed	Average air temperature in °C, calculated from the daily maximum and minimum temperatures.
Prec	Total precipitation in millimeters accumulated during the day.
Tmin	Minimum air temperature in degrees Celsius recorded during the day.
Hourtmin	The time (hh:mm format) when the minimum air temperature was recorded.
Tmax	Maximum air temperature in degrees Celsius recorded during the day.
Houratmax	The time (hh:mm format) when the maximum air temperature was recorded.
Dir	Average wind direction, derived from 10-minute instantaneous recordings, in degrees.
Velmedia	Average wind speed, derived from 10-minute instantaneous recordings, in m/s.
Maxvel	Maximum wind speed (gust, labelled racha in Table 4) in meters per second recorded during the day.
Hourracha	The time (hh:mm format) when the maximum wind speed was recorded.
Sun	Duration of sunshine in hours recorded during the day.
PresMax	Maximum atmospheric pressure in hectopascals recorded during the day.
HouraPresMax	The time (hh:mm format) when the maximum atmospheric pressure was recorded.
PresMin	Minimum atmospheric pressure in hectopascals recorded during the day.
HourPresMin	The time (hh:mm format) when the minimum atmospheric pressure was recorded.
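Several Table 3 fields are times of day in hh:mm format, which have to become numbers before they can feed a forecasting model. The sketch below is a minimal illustration under stated assumptions: each daily record is a Python dict keyed by the Table 3 labels, numeric fields are already parsed, and Tmed is recomputed as the midpoint of Tmax and Tmin, which is one reading of the Tmed description. The helper names and the example values are hypothetical, and raw AEMET OpenData responses may need additional cleaning that is not covered here.

```python
from typing import Any


def hhmm_to_hours(value: str) -> float:
    """Convert an 'hh:mm' time string (e.g. Hourtmin) to fractional hours."""
    hours, minutes = value.strip().split(":")
    return int(hours) + int(minutes) / 60


def daily_features(record: dict[str, Any]) -> dict[str, float]:
    """Turn one Table 3 style daily record into numeric model inputs.

    Assumptions: keys follow the Table 3 labels, numeric fields are already
    numbers, and Tmed is recomputed as the midpoint of Tmax and Tmin.
    """
    tmax = float(record["Tmax"])
    tmin = float(record["Tmin"])
    return {
        "tmax": tmax,
        "tmin": tmin,
        "tmed": (tmax + tmin) / 2,  # assumed midpoint definition of Tmed
        "prec": float(record["Prec"]),
        "hour_tmin": hhmm_to_hours(str(record["Hourtmin"])),
        "hour_tmax": hhmm_to_hours(str(record["Houratmax"])),
    }


# Hypothetical record, not real AEMET data.
example = {"Date": "2023-07-15", "Tmax": 34.0, "Tmin": 18.0, "Prec": 0.0,
           "Hourtmin": "06:30", "Houratmax": "15:45"}
features = daily_features(example)
print(features)
```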

Table 4. Pearson Correlation Coefficients (r) for the Effects of Climatic Factors on Water Consumption

Climatic Factor	tmax	tmed	tmin	prec	dir	velmedia	racha	Water Cons
tmax	1.000	0.980	0.884	-0.217	-0.013	-0.083	-0.057	0.683
tmed	0.980	1.000	0.959	-0.144	-0.024	-0.017	0.008	0.669
tmin	0.884	0.959	1.000	-0.030	-0.039	0.079	0.099	0.604
prec	-0.217	-0.144	-0.030	1.000	-0.077	0.330	0.395	-0.160
dir	-0.013	-0.024	-0.039	-0.077	1.000	-0.020	-0.064	0.034
velmedia	-0.083	-0.017	0.079	0.330	-0.020	1.000	0.800	-0.021
racha	-0.057	0.008	0.099	0.395	-0.064	0.800	1.000	0.009
Water Cons	0.683	0.669	0.604	-0.160	0.034	-0.021	0.009	1.000
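Table 4 is a symmetric matrix of Pearson coefficients, with 1.000 on the diagonal. As a minimal sketch of how such a matrix is obtained (pure Python, no assumption about the authors' tooling), each entry is the covariance of two series divided by the product of their standard deviations. The column names and toy values below are hypothetical, not the CAUCCES data.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (sx * sy)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise r for every pair of named columns, laid out like Table 4."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy values, not the CAUCCES data.
toy = {
    "tmax": [20.0, 25.0, 30.0, 35.0],
    "prec": [5.0, 2.0, 1.0, 0.0],
    "water_cons": [100.0, 120.0, 150.0, 170.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Note that r only captures linear association, so a coefficient near zero (for example, wind direction in Table 4) does not rule out a nonlinear effect.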

Table 5. Comprehensive Comparison of Forecasting Models Across Different Time Frames

Model	MAE (6 months)	RMSE (6 months)	MAPE (6 months)	MAE (18 months)	RMSE (18 months)	MAPE (18 months)
Prophet Basic	10.37	13.66	22.45%	19.70	28.68	22.45%
Prophet + Seasonality	12.39	14.91	22.45%	24.25	35.02	22.45%
Advanced Prophet	6.24	8.78	20.77%	11.14	18.02	22.34%
Prophet Adv. Engineering	5.76	8.31	18.61%	10.07	15.02	20.12%
XGBoost	7.02	8.74	24.93%	12.34	18.50	27.49%
LightGBM	5.90	8.25	19.64%	11.77	18.31	24.98%
Stacking XGBoost + LightGBM	6.57	8.70	22.45%	12.48	18.94	27.62%
LSTM	5.96	9.38	18.64%	12.63	20.66	25.61%
LSTM + GRU Hybrid	8.18	10.10	30.10%	14.64	22.32	34.06%
LSTM Rolling Mean Features	7.94	10.82	27.59%	12.33	20.57	24.67%
MV-LSTM [70]	5.91	10.03	15.48%	12.30	22.53	19.30%
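Table 5 reports MAE, RMSE and MAPE without restating their formulas here. The sketch below uses the standard textbook definitions, which is an assumption about how the authors computed them; the observed and forecast values are hypothetical. MAE and RMSE are in the units of the consumption series, while MAPE is a percentage, and RMSE is always at least as large as MAE because it weights large errors more heavily.

```python
from math import sqrt


def _check(y_true: list[float], y_pred: list[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error."""
    _check(y_true, y_pred)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute percentage error in %; undefined if any y_true is 0."""
    _check(y_true, y_pred)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


observed = [100.0, 200.0, 400.0]  # hypothetical values
forecast = [110.0, 180.0, 400.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```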

Table 6. Task Dependencies

Dependency $(i, j)$	Description
(1, 3)	Task 3 depends on Task 1
(2, 4)	Task 4 depends on Task 2
(5, 3)	Task 3 depends on Task 5
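The pairs in Table 6 form a small precedence graph. As a minimal sketch (not the paper's Constraint Programming model, which also handles timing, vehicles and preemption), Python's standard `graphlib.TopologicalSorter` can produce one task order that satisfies every dependency and detect cyclic dependencies; the helper names are hypothetical.

```python
from graphlib import TopologicalSorter

TASKS = [1, 2, 3, 4, 5]
# Table 6 pairs (i, j): task j may only start after task i.
DEPENDENCIES = [(1, 3), (2, 4), (5, 3)]


def precedence_order(tasks: list[int], deps: list[tuple[int, int]]) -> list[int]:
    """One task order compatible with all dependencies (raises CycleError on cycles)."""
    sorter = TopologicalSorter({t: set() for t in tasks})
    for i, j in deps:
        sorter.add(j, i)  # register i as a predecessor of j
    return list(sorter.static_order())


def respects(sequence: list[int], deps: list[tuple[int, int]]) -> bool:
    """True if every predecessor appears before its dependent task."""
    position = {task: k for k, task in enumerate(sequence)}
    return all(position[i] < position[j] for i, j in deps)


order = precedence_order(TASKS, DEPENDENCIES)
print(order, respects(order, DEPENDENCIES))
```

A check such as `respects` is also useful for validating any schedule produced by a solver before dispatching crews.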

Table 7. Example Tasks and Parameters

Task i	$p_{i}$ (hrs)	$f_{i}$ (L)	$c_{i}$ (kg)	${loc}_{i}$ (lat, lon)	${priority}_{i}$	$r_{i}$ (hrs)	${Vehicle}_{i}$	MaxPreemptions
1	2.0	5.0	13.2	(51.5074, -0.1278)	2	0	Van	1
2	1.5	3.5	9.24	(51.5155, -0.1410)	3	0	Van	2
3	2.5	6.0	15.84	(51.5237, -0.1585)	1	2	Small Truck	1
4	1.0	2.5	6.6	(51.5308, -0.1208)	4	0	Van	1
5	3.0	7.5	19.8	(51.4975, -0.1357)	5	1	Small Truck	2

Table 8. Travel Times and Distances Between Tasks

From Task i	To Task j	$d_{i j}$ (hrs)	$l_{i j}$ (km)
1	2	0.5	10
1	3	0.7	14
1	4	0.4	8
1	5	0.6	12
2	3	0.6	12
2	4	0.3	6
2	5	0.7	14
3	4	0.8	16
3	5	0.5	10
4	5	0.6	12
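To see how Tables 7 and 8 combine, the sketch below evaluates one hand-picked route for a single crew: it adds the travel time d_ij between consecutive tasks, waits for each release time r_i, then adds the processing time p_i. This is a toy evaluation, not the paper's scheduling model, and it rests on assumptions not stated in the tables: one crew, symmetric travel (so 5 to 3 reuses the 3 to 5 entry), and no preemption. The route 1, 2, 4, 5, 3 was chosen to respect Table 6, and its result is unrelated to the figures in Table 11.

```python
# Table 8: (d_ij hours, l_ij km), stored once per unordered pair.
TRAVEL = {
    (1, 2): (0.5, 10), (1, 3): (0.7, 14), (1, 4): (0.4, 8), (1, 5): (0.6, 12),
    (2, 3): (0.6, 12), (2, 4): (0.3, 6), (2, 5): (0.7, 14),
    (3, 4): (0.8, 16), (3, 5): (0.5, 10), (4, 5): (0.6, 12),
}
# Table 7: processing time p_i and release time r_i, both in hours.
P = {1: 2.0, 2: 1.5, 3: 2.5, 4: 1.0, 5: 3.0}
R = {1: 0.0, 2: 0.0, 3: 2.0, 4: 0.0, 5: 1.0}


def leg(i: int, j: int) -> tuple[float, int]:
    """Travel (hours, km) from task i to task j; assumes travel is symmetric."""
    return TRAVEL[(i, j)] if (i, j) in TRAVEL else TRAVEL[(j, i)]


def single_crew_plan(route: list[int]) -> tuple[float, float, int]:
    """Finish time, travel hours and travel km for one crew visiting `route` in order."""
    clock, travel_h, travel_km = 0.0, 0.0, 0
    for k, task in enumerate(route):
        if k > 0:
            hours, km = leg(route[k - 1], task)
            clock += hours
            travel_h += hours
            travel_km += km
        clock = max(clock, R[task])  # wait until the task is released
        clock += P[task]             # then process it without preemption
    return clock, travel_h, travel_km


plan = single_crew_plan([1, 2, 4, 5, 3])
print(plan)
```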

Table 9. Vehicle Types and Specifications

Vehicle Type v	${VE}_{v}$ (km/L)	${EF}_{v}$ (kg CO₂/L)
Van	12	2.64
Small Truck	8	2.68

Table 10. Calculated Fuel Consumption and CO₂ Emissions for Traveling

From Task i	To Task j	Vehicle	$l_{i j}$ (km)	$f_{i j}$ (L)	$c_{i j}$ (kg)
1	2	Van	10	$\frac{10}{12} \approx 0.83$	$0.83 \times 2.64 \approx 2.19$
1	3	Small Truck	14	$\frac{14}{8} = 1.75$	$1.75 \times 2.68 \approx 4.69$
1	4	Van	8	$\frac{8}{12} \approx 0.67$	$0.67 \times 2.64 \approx 1.77$
1	5	Small Truck	12	$\frac{12}{8} = 1.5$	$1.5 \times 2.68 = 4.02$
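Table 10 applies two relations to the Table 9 parameters: fuel f_ij = l_ij / VE_v and emissions c_ij = f_ij × EF_v. In Table 10, the vehicle on each leg matches the vehicle assigned to the destination task in Table 7. A minimal sketch of the same arithmetic follows (helper name hypothetical). Because Table 10 rounds the fuel to two decimals before multiplying, the unrounded results here differ in the last digit for two legs (2.20 instead of 2.19 kg, 1.76 instead of 1.77 kg).

```python
# Table 9: vehicle efficiency VE_v (km/L) and emission factor EF_v (kg CO2/L).
VEHICLES = {"Van": (12.0, 2.64), "Small Truck": (8.0, 2.68)}


def travel_fuel_co2(distance_km: float, vehicle: str) -> tuple[float, float]:
    """Return (f_ij in litres, c_ij in kg CO2) for one trip of l_ij km."""
    efficiency, emission_factor = VEHICLES[vehicle]
    fuel = distance_km / efficiency      # f_ij = l_ij / VE_v
    return fuel, fuel * emission_factor  # c_ij = f_ij * EF_v


# Legs from Table 10: (from, to, vehicle, l_ij in km).
for i, j, vehicle, km in [(1, 2, "Van", 10), (1, 3, "Small Truck", 14),
                          (1, 4, "Van", 8), (1, 5, "Small Truck", 12)]:
    fuel, co2 = travel_fuel_co2(km, vehicle)
    print(f"{i}->{j} {vehicle}: {fuel:.2f} L, {co2:.2f} kg CO2")
```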

Table 11. Comparison of Proposed Scheduling Model Against Conventional Operator Methods

Metric	Conventional Method	Proposed Model	Improvement (%)
Total Completion Time ($E[C_{\max}]$)	180.58 hours	155.24 hours	14%
Delays and Penalties ($E[D_{total}]$)	17.5 hours	13.15 hours	25%
CO₂ Emissions ($E[C_{total}]$)	660.8 kg	545.7 kg	17%
Fuel Consumption ($E[F_{total}]$)	85.58 Litres	71.98 Litres	16%
Efficiency and Utilization ($E[E_{eff}]$)	86.17%	92.23%	7% (relative; +6.06 percentage points)
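The Improvement column of Table 11 can be reproduced from the two value columns: a relative reduction for the lower-is-better metrics and a relative gain for efficiency and utilization. The rounded results match the table, which suggests (but does not confirm) that these are the formulas used; the function names are hypothetical.

```python
def reduction_pct(conventional: float, proposed: float) -> float:
    """Relative reduction in % for lower-is-better metrics."""
    return 100.0 * (conventional - proposed) / conventional


def gain_pct(conventional: float, proposed: float) -> float:
    """Relative increase in % for higher-is-better metrics."""
    return 100.0 * (proposed - conventional) / conventional


table_11 = {
    "completion time (h)": reduction_pct(180.58, 155.24),
    "delays and penalties (h)": reduction_pct(17.5, 13.15),
    "CO2 emissions (kg)": reduction_pct(660.8, 545.7),
    "fuel consumption (L)": reduction_pct(85.58, 71.98),
    "efficiency and utilization (%)": gain_pct(86.17, 92.23),
}
for metric, pct in table_11.items():
    print(f"{metric}: {pct:.1f}%")
```

For efficiency, the absolute difference is 92.23 - 86.17 = 6.06 percentage points; the 7% figure is the gain relative to the conventional value.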

Table 12. Implemented Cybersecurity Strategies for CAUCCES Project based on ISO 27001 Compliance

Component	Cybersecurity Measure	Security Standard/Protocol
Cybersecurity for Smart Water Meters
Private Key Encryption	AES-128 encryption for data transmission, with private keys ensuring authorized decryption.	AES-128, LoRaWAN, NB-IoT
LoRaWAN and NB-IoT Compatibility	Supports secure, long-range data transmission suitable for rural areas.	LoRaWAN, NB-IoT
Data Logging and IP-68 Protection	IP-68 protection and internal datalogger for secure, resilient data storage.	IP-68, Data Integrity
Secure Data Transmission via Gateway and ChirpStack Platform
LoRaWAN Gateway	IP-67 gateway securely transmits encrypted data via MQTT to the cloud.	IP-67, MQTT, Data Integrity
ChirpStack Platform Security	ChirpStack manages devices with unique keys, MAC-based authentication, and data validation.	Device Authentication, MAC Address Validation
SSL/TLS Protocols	SSL/TLS secures all data transmission between services, ensuring encryption in transit.	SSL/TLS
Data Flow, Decryption, and Secure Transmission
Encrypted Data Transmission	Transmits data in AES-128 encrypted format via MQTT to cloud storage.	AES-128, MQTT, Data Encryption
Data Decryption	Data are decrypted and stored in main and backup databases for integrity and redundancy.	Data Integrity, Redundancy
Secure Access, API Integration, and Web Application Security
API Data Loading	Secure APIs (FastAPI, Django) with HTTPS prevent data interception.	HTTPS, API Security
Secure Login Platform	Two-step authentication secures access to platform data.	Two-Step Authentication
Password Hashing with Salting	Salting and hashing for secure storage of passwords and sensitive data.	Salting, Hashing, Data Encryption
Database Security (PostgreSQL)	Access controls, encryption at rest, and regular audits protect database data.	Data Encryption, Access Control
Backup, Data Integrity, and ISO 27001 Compliance
Backup Protocols	Regular backups ensure data availability and integrity.	Data Backup, ISO 27001
ISO 27001 Compliance	Aligns cybersecurity measures with ISO 27001 standards; regular audits verify compliance.	ISO 27001
Data Privacy Compliance	GDPR-compliant protocols and audits for data privacy.	GDPR, Data Privacy
Real-Time Monitoring and Threat Detection
Zabbix Monitoring	Monitors network and infrastructure performance, detecting anomalies.	Real-Time Monitoring
Wazuh for Threat Detection	Detects intrusions and abnormal activity through log analysis.	Threat Detection, Log Analysis
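Table 12 lists salting and hashing for password storage but does not name the algorithm. The sketch below assumes PBKDF2-HMAC-SHA256 from Python's standard library (Django's default password hasher is also PBKDF2-SHA256 based), a random 16-byte salt per password, and a constant-time comparison on login; the iteration count and helper names are illustrative assumptions, not the CAUCCES configuration. In practice the salt, iteration count and digest are stored together so the hash can be verified and later upgraded.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor (assumption); tune it to the deployment hardware.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random 16-byte salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = DEFAULT_ITERATIONS) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, expected)


# A low iteration count keeps this demo fast; use the default in production.
salt, stored = hash_password("correct horse battery staple", iterations=10_000)
print(verify_password("correct horse battery staple", salt, stored, iterations=10_000))
```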

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
