Submitted: 20 May 2025
Posted: 21 May 2025
Abstract
Keywords:
1. Introduction
2. Material and Methods
2.1. Dataset Description
2.1.1. Data Cleaning and Preprocessing
2.2. Weighted Arithmetic Water Quality Index
- $Q_i$ is the sub-index quality rating of the $i$-th variable (scaled between 0 and 100).
- $W_i$ is the weight assigned to the $i$-th variable.
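For reference, the weighted arithmetic index is conventionally written as the weight-normalized sum of the sub-index ratings. The form below is the standard textbook expression and is given as an assumption about equation 1; $Q_i$ denotes the quality rating of variable $i$, $W_i$ its weight, and $n$ the number of variables (three here: pH, TDS, and temperature).

```latex
\mathrm{WQI} = \frac{\sum_{i=1}^{n} Q_i \, W_i}{\sum_{i=1}^{n} W_i}
```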
2.2.1. Quality Rating Calculation
pH quality rating ($Q_{\mathrm{pH}}$), equation 2
- $\mathrm{pH}_{\mathrm{obs}}$ = observed pH value
- Ideal pH = 7.0 (neutral)
- Standard maximum = 8.5
Inverse scaled TDS quality rating ($Q_{\mathrm{TDS}}$), equation 3
- $\mathrm{TDS}_{\mathrm{obs}}$ = observed TDS value
- Ideal TDS = 0 mg/L
- Standard maximum = 500 mg/L
- Higher TDS indicates worse quality (inverse scaling)
Temperature quality rating ($Q_{T}$), equation 4
- $T_{\mathrm{obs}}$ = observed water temperature
- Ideal temperature = 26°C
- Acceptable range = 24–27°C
- Deviation from 26°C results in a penalty
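The sketch below illustrates how ratings of this kind can be computed. It is not the authors' equations 2 to 4: it assumes the conventional weighted-arithmetic form, in which the rating grows linearly from 0 at the ideal value to 100 at the standard limit, plus a hypothetical "inverse" variant that flips the scale. The function names, the clipping to the 0 to 100 range, and the 28°C temperature limit used in the example are all assumptions made for illustration.

```python
def deviation_rating(observed, ideal, limit):
    """Assumed conventional rating: 0 at the ideal value, 100 at the limit."""
    raw = 100.0 * abs(observed - ideal) / abs(limit - ideal)
    return min(max(raw, 0.0), 100.0)  # clip to the 0-100 scale


def inverse_rating(observed, ideal, limit):
    """Assumed inverse scaling: 100 at the ideal value, 0 at or beyond the limit."""
    return 100.0 - deviation_rating(observed, ideal, limit)


# Ideal values and standard maxima for pH and TDS are taken from the text.
q_ph = deviation_rating(7.75, ideal=7.0, limit=8.5)    # halfway to the limit
q_tds = inverse_rating(125.0, ideal=0.0, limit=500.0)  # a quarter of the limit

# Temperature: the ideal of 26 °C is from the text; the 28 °C limit is hypothetical.
q_temp = deviation_rating(27.0, ideal=26.0, limit=28.0)

print(q_ph, q_tds, q_temp)
```

Whether a high rating should mean good or poor quality depends on the convention chosen for the index; the two helper functions make that choice explicit per variable.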
2.2.2. Weight Calculation
- $S_i$ = standard limit for each variable
- k = proportionality constant (set to 1 for simplicity)
- (deviation range from 26°C)
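A minimal sketch of the weighting and aggregation step follows. It assumes the usual weighted-arithmetic convention in which each weight is inversely proportional to the variable's standard limit, $W_i = k / S_i$, and the index is the weight-normalized sum of the ratings. The function names and the example ratings are hypothetical; only the pH and TDS limits (8.5 and 500 mg/L) come from the text, so temperature is left out of the example.

```python
def compute_weights(standard_limits, k=1.0):
    """Assumed form W_i = k / S_i for each variable's standard limit S_i."""
    return {name: k / limit for name, limit in standard_limits.items()}


def weighted_arithmetic_wqi(ratings, weights):
    """Weight-normalized sum of the sub-index ratings."""
    total_weight = sum(weights.values())
    return sum(ratings[name] * weights[name] for name in weights) / total_weight


# Standard limits from the text; k = 1 as stated there.
limits = {"pH": 8.5, "TDS": 500.0}
weights = compute_weights(limits, k=1.0)

# Hypothetical ratings on the 0-100 scale, used only to exercise the function.
example_ratings = {"pH": 50.0, "TDS": 75.0}
wqi = weighted_arithmetic_wqi(example_ratings, weights)
print(weights, wqi)
```

Because the weight is the reciprocal of the limit, a variable with a small limit (pH) dominates one with a large limit (TDS) under this scheme, which is worth keeping in mind when interpreting the resulting index.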
2.3. Kolmogorov–Arnold Networks: Theoretical Foundations
- $f(x_1, \dots, x_n)$ - the target multivariate function to approximate.
- $x_p$ - the p-th input variable.
- $\phi_{q,p}$ - continuous univariate inner functions, applied to each input variable.
- $\Phi_q$ - continuous univariate outer functions, combining the intermediate results.
- n - the dimensionality of the input space.
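The quantities listed above fit together in the standard statement of the Kolmogorov–Arnold representation theorem, reproduced here in the notation common in the KAN literature: $f$ is the target function, $\phi_{q,p}$ are the inner functions, $\Phi_q$ the outer functions, and $n$ the input dimension.

```latex
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```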
2.4. KAN Implementation Details
- Original model (KAN): Two layers configured as , , using all three input variables (pH, TDS, temperature).
- Pruned model (KAN′): Two layers configured as , , using only pH and temperature after feature sparsification.
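To make the layer structure concrete, the sketch below shows a single KAN-style layer in plain Python. It is an illustration under simplifying assumptions, not the implementation used in the study: each edge function is a piecewise-linear interpolant on a fixed grid rather than a trainable B-spline, and the grid, the node values, and the layer shape are hypothetical.

```python
from bisect import bisect_right


def piecewise_linear(grid, values):
    """Return a univariate function interpolating (grid[i], values[i]) linearly."""
    def phi(x):
        x = min(max(x, grid[0]), grid[-1])  # clamp to the grid range
        i = min(bisect_right(grid, x) - 1, len(grid) - 2)
        t = (x - grid[i]) / (grid[i + 1] - grid[i])
        return values[i] + t * (values[i + 1] - values[i])
    return phi


def kan_layer(edge_functions, inputs):
    """edge_functions[q][p] plays the role of phi_{q,p}; output q sums over p."""
    return [sum(phi(x) for phi, x in zip(row, inputs)) for row in edge_functions]


grid = [0.0, 1.0, 2.0]
identity = piecewise_linear(grid, [0.0, 1.0, 2.0])
convex = piecewise_linear(grid, [0.0, 1.0, 4.0])  # coarse stand-in for x**2

# Three inputs (e.g. scaled pH, TDS, temperature) mapped to two hidden nodes.
layer = [[identity, identity, identity],
         [convex, identity, convex]]
hidden = kan_layer(layer, [0.5, 1.0, 1.5])
print(hidden)
```

In this picture, pruning an input such as TDS amounts to deleting the column of edge functions attached to it, which is why the pruned model needs only the remaining sensors at inference time.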
2.4.1. Data Split
- Training set: 6 complete days of data ( samples).
- Validation set: Approximately 1.5 days ( samples).
- Test set: Approximately 1.5 days ( samples).
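A split of this kind can be sketched as follows, assuming the samples are ordered by timestamp and the partition is chronological (no shuffling), which avoids leaking future readings into training. The function name and the use of day counts as proportions are assumptions; the 6 / 1.5 / 1.5 day figures are taken from the list above.

```python
def chronological_split(samples, train_days=6.0, val_days=1.5, test_days=1.5):
    """Split time-ordered samples into contiguous train/validation/test blocks."""
    total_days = train_days + val_days + test_days
    n = len(samples)
    n_train = round(n * train_days / total_days)
    n_val = round(n * val_days / total_days)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # whatever remains goes to the test block
    return train, val, test


# Hypothetical series of 90 time-ordered readings (10 per day over 9 days).
readings = list(range(90))
train, val, test = chronological_split(readings)
print(len(train), len(val), len(test))
```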
2.5. Overview of the Proposed Methodology
3. Results and Discussion
3.1. Architectures of the Original and Pruned KAN Models
3.1.1. Exploratory Analysis of Variable Relationships
3.2. Model Inference and Symbolic Expression Evaluation
3.2.1. Symbolic Formulas
3.2.2. Performance Evaluation
- The pruned model, trained with reduced input variables, is more accurate than the original model at inference ($R^2$ of 0.987 versus 0.970, MAE of 0.520 versus 1.236, RMSE of 1.471 versus 2.203).
- The pruned model shows slightly better generalization on the test set, as evidenced by a higher $R^2$ and lower MAE and RMSE values. This highlights the capability of KAN models to effectively eliminate less relevant variables while maintaining or improving predictive performance.
- The symbolic expressions obtained from both models (original and pruned) offer improved interpretability and enable the optimization of sensor deployment by reducing the number of measurements required. This practical advantage translates into lower equipment, installation, and maintenance costs, while achieving better predictive accuracy in the case study analyzed in this work.
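The three metrics used in this comparison can be computed as in the sketch below. It uses the standard definitions of $R^2$, MAE, and RMSE; the function name and the sample values are hypothetical and unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical index values and predictions.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so a gap between the two (as in the results table) suggests a small number of comparatively large residuals.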
4. Conclusions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest





| Description | Units |
|---|---|
| Timestamp recorded | DateTime |
| Water pH | pH units |
| TDS concentration | mg/L (ppm) |
| Water temperature | °C (Celsius) |

| Model | R² Score | MAE | RMSE |
|---|---|---|---|
| Original Model (inference) | 0.970 | 1.236 | 2.203 |
| Symbolic Prediction - Original Model | 0.982 | 0.806 | 1.722 |
| Pruned Model (inference) | 0.987 | 0.520 | 1.471 |
| Symbolic Prediction - Pruned Model | 0.987 | 0.520 | 1.471 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).