Submitted: 19 May 2025
Posted: 20 May 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data and the Case Study

| Sub-catchment | Terminates at Joint | Population | Wastewater production (L/s) | Infiltration (L/s) |
|---|---|---|---|---|
| S1 | J1 | 1200 | 1.94 | 0.49 |
| S2 | J9 | 700 | 1.13 | 0.28 |
| S3 | J2 | 700 | 1.13 | 0.28 |
| S4 | J3 | 1200 | 1.94 | 0.49 |
| S5 | J12 | 1300 | 2.11 | 0.53 |
| S6 | J10 | 1200 | 1.94 | 0.49 |
| S7 | J4 | 900 | 1.46 | 0.36 |
| S8 | J7 | 1200 | 1.94 | 0.49 |
| S9 | J5 | 1000 | 1.62 | 0.41 |
| S10 | J8 | 1000 | 1.62 | 0.41 |
| S11 | J6 | 900 | 1.46 | 0.36 |
| Total | WWTP | 11300 | 18.31 | 4.58 |
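The totals in the table above can be cross-checked with a few lines of arithmetic. The sketch below back-calculates the per-capita wastewater production and the infiltration-to-wastewater ratio implied by the Total row. The resulting figures (about 140 L per capita per day and about 25%) are inferred from the table values only, not quoted from the text, and all variable and function names are illustrative.

```python
# Values copied from the sub-catchment table (Total row).
SECONDS_PER_DAY = 86_400

total_population = 11_300
total_wastewater_lps = 18.31    # L/s
total_infiltration_lps = 4.58   # L/s

# Per-capita production implied by the totals, in litres per capita per day (LPCD).
implied_lpcd = total_wastewater_lps * SECONDS_PER_DAY / total_population

# Infiltration expressed as a fraction of wastewater production.
infiltration_ratio = total_infiltration_lps / total_wastewater_lps


def wastewater_lps(population, lpcd):
    """Convert a population and a per-capita rate (L/cap/day) to a flow in L/s."""
    return population * lpcd / SECONDS_PER_DAY


# Cross-check against sub-catchment S1 (1200 people, 1.94 L/s in the table).
s1_flow = wastewater_lps(1200, implied_lpcd)

print(f"implied per-capita production: {implied_lpcd:.1f} LPCD")
print(f"infiltration / wastewater:     {infiltration_ratio:.3f}")
print(f"S1 wastewater flow:            {s1_flow:.2f} L/s")
```

Summing the rounded sub-catchment rows gives slightly different totals (18.29 and 4.59 L/s) than the Total row, which is consistent with the totals having been computed from the total population rather than from the rounded rows.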
2.2. Gaussian Process Regression
2.3. Constructing Forecasting Model
2.4. Kernel Design
2.5. Anomaly Detection
2.6. Model Checking
2.6.1. Root Mean Square Error
2.6.2. Coverage
2.6.3. Entropy
3. Results and Discussion
3.1. Forecasting the WWTP Influent



3.2. Forecasting Node Surcharges and Overflows



| Surcharge event time | Node depth (cm) | Probability of surcharge (%) |
|---|---|---|
| 2007-07-01 04:03 | 69.87 | 18.57 |
| 2007-07-04 16:03 | 200.96 | 100.0 |
| 2007-07-04 17:03 | 53.41 | 18.87 |
| 2007-07-04 18:03 | 57.83 | 29.83 |
| 2007-07-04 20:03 | 57.31 | 26.03 |
| 2007-07-05 19:03 | 74.39 | 46.10 |
| 2007-07-13 13:03 | 73.24 | 67.287 |
| 2007-07-13 14:03 | 56.46 | 11.23 |
| 2007-07-13 15:03 | 128.68 | 41.43 |
| 2007-07-13 16:03 | 74.78 | 45.86 |
| 2007-07-13 17:03 | 76.77 | 46.95 |
| 2007-07-13 21:03 | 71.39 | 23.23 |
| 2007-07-13 22:03 | 61.70 | 9.03 |
| 2007-07-15 12:03 | 95.31 | 100 |
| 2007-07-15 14:03 | 79.16 | 55.31 |
| 2007-07-15 15:03 | 75.72 | 100 |
| 2007-07-20 18:03 | 249.72 | 99.97 |
| 2007-07-20 19:03 | 59.88 | 47.38 |
| 2007-07-25 18:03 | 68.676 | 99.97 |
| 2007-07-26 14:03 | 250 | 99.99 |
| 2007-07-26 15:03 | 53.65 | 34.08 |
| 2007-07-26 19:03 | 169.6 | 100 |
| 2007-08-14 05:03 | 249.65 | 99.99 |
| 2007-08-15 10:03 | 110.05 | 100 |
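One common way to turn a probabilistic depth forecast into a probability like those in the table above is to evaluate the chance that the predicted depth exceeds a surcharge level. The sketch below assumes the forecast at a node is Gaussian with a given predictive mean and standard deviation; the function name, the threshold and the numbers are hypothetical and are not taken from the study, whose exact procedure may differ.

```python
import math


def exceedance_probability(mean, std, threshold):
    """P(X > threshold) for X ~ Normal(mean, std**2), via the complementary error function."""
    if std <= 0:
        raise ValueError("std must be positive")
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


# Hypothetical numbers for illustration only (not from the study):
# predicted depth 70 cm, predictive standard deviation 10 cm, surcharge depth 75 cm.
p = exceedance_probability(mean=70.0, std=10.0, threshold=75.0)
print(f"probability of surcharge: {100 * p:.2f} %")
```

Under this assumption, the same predicted depth can carry very different probabilities depending on the predictive standard deviation at that time step.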
3.3. Anomaly Detection

3.4. Prediction with Limited Data

3.5. Further Discussion
4. Conclusions
- I. A probabilistic forecasting model accounts for different sources of uncertainty and reports the likelihood of an event, which is more realistic than presenting a single value as a definite answer.
- II. In sewer flow prediction, using time and precipitation as the main model inputs makes the model reliable, whereas a single-input model cannot effectively reflect flow changes during wet-weather periods. Adding further inputs only increases the dimension of the covariance matrix, which is directly related to the computational cost of GPs.
- III. Designing a proper kernel is crucial for a good forecast. The model can tune the kernel by maximising the log marginal likelihood; however, metrics such as RMSE, coverage and differential entropy help in finding the best kernel setting.
- IV. GPR responds well to minimal training datasets and still makes reliable predictions. On the other hand, a large number of data points reduces the model’s speed and makes it computationally inefficient. In such cases, sparsification methods such as SVGP can help.
- V. Outliers may appear in any prediction, often because the training dataset does not cover a wide enough range of conditions, especially for precipitation data. In such cases, lengthening the training period or defining physical constraints helps.
- VI. Real-time control has traditionally relied on deterministic models. The probabilistic method presented here provides reliable real-time predictions that water companies can apply in many real-world cases subject to natural uncertainties.
- VII. Forecasting surcharges, overflows and CSOs contributes significantly to protecting the environment and public health, key objectives of all water-related research.
- VIII. This probabilistic approach equips decision-makers to act on sewer risks with greater confidence, supporting resilient infrastructure planning and reducing threats to public health and the environment.
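The three model-checking metrics named in conclusion III (RMSE, coverage and differential entropy) can be illustrated with a short sketch. It assumes Gaussian predictive distributions and a central 95% interval (z = 1.96) for coverage; the definitions used in the study may differ in detail, and the toy data and function names are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictive means."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def coverage(y_true, mean, std, z=1.96):
    """Fraction of observations falling inside mean +/- z * std."""
    inside = sum(1 for t, m, s in zip(y_true, mean, std) if abs(t - m) <= z * s)
    return inside / len(y_true)


def gaussian_entropy(std):
    """Differential entropy (in nats) of a Normal distribution: 0.5 * ln(2*pi*e*std^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


# Toy observations and predictions (not data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
mean = [1.1, 1.9, 3.2, 5.0]
std = [0.2, 0.2, 0.2, 0.2]

model_rmse = rmse(y_true, mean)
model_coverage = coverage(y_true, mean, std)
mean_entropy = sum(gaussian_entropy(s) for s in std) / len(std)

print(f"RMSE:         {model_rmse:.3f}")
print(f"coverage:     {model_coverage:.2f}")
print(f"mean entropy: {mean_entropy:.3f} nats")
```

Read together, the three numbers separate accuracy (RMSE), calibration (coverage close to the nominal 95%) and sharpness (lower entropy means narrower predictive distributions), which is why a kernel setting is usually judged on all of them rather than on one alone.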
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Acknowledgment
Abbreviations
| Abbreviation | Definition |
|---|---|
| CSO | Combined Sewer Overflow |
| OFWAT | Water Services Regulation Authority |
| ANN | Artificial Neural Network |
| SVM | Support Vector Machine |
| GP | Gaussian Process |
| GPR | Gaussian Process Regression |
| LPCD | Litre per Capita per Day |
| SWMM | Storm Water Management Model |
| DWF | Dry-weather Flow |
| WWF | Wet-weather Flow |
| CCTV | Closed-circuit Television |
| RMSE | Root Mean Square Error |
| SVGP | Sparse Variational Gaussian Process |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).