2.1. Study Area
The study area comprises Tiruppur District, situated in the western part of Tamil Nadu, India, between approximately 11.0°–11.5° N latitude and 77.0°–77.6° E longitude. The district lies within a semi-arid climatic zone, characterized by high inter-annual rainfall variability, frequent dry spells, and increasing temperature trends. Average annual rainfall is primarily governed by the southwest and northeast monsoons, with recharge largely concentrated during short, intense precipitation events.
Hydro-geologically, Tiruppur is dominated by hard-rock formations, mainly consisting of weathered and fractured crystalline rocks. Groundwater occurrence in such terrains is highly heterogeneous and is largely confined to shallow weathered zones and deeper fracture networks. The limited storage capacity of these aquifers, coupled with low natural recharge rates, makes groundwater availability extremely sensitive to variations in rainfall intensity, duration, and seasonal distribution. Consequently, groundwater levels exhibit pronounced temporal fluctuations and spatial variability across the district.
Tiruppur is internationally recognized as the “Knitwear Capital of India”, hosting a dense concentration of textile dyeing and processing industries. Rapid industrialization, combined with urban expansion and agricultural water demand, has led to persistent over-extraction of groundwater for industrial, domestic, and irrigation purposes. In the absence of perennial surface water sources, groundwater remains the primary source of water supply, further intensifying stress on aquifer systems.
In addition to quantitative depletion, groundwater quality deterioration has emerged as a critical concern. Discharge of untreated or partially treated textile effluents, rich in dissolved salts and chemical residues, has resulted in elevated levels of Total Dissolved Solids (TDS), electrical conductivity, and salinity in several parts of the district. These impacts are exacerbated by reduced dilution during low-recharge periods, particularly under prolonged drought conditions.
Climate change further compounds groundwater vulnerability in the region by altering monsoon patterns, increasing evapotranspiration, and amplifying the frequency of extreme hydrological events. These combined hydrogeological, industrial, and climatic factors make Tiruppur District a representative and high-risk case study for evaluating advanced, data-driven groundwater forecasting methodologies. The complex and non-stationary nature of groundwater dynamics in this region provides an ideal testbed for assessing the effectiveness of hybrid deep learning models under climate-induced stress conditions.
Figure 1.
Spatial distribution of the comprehensive groundwater monitoring network in Tiruppur District.
The map illustrates the locations of groundwater monitoring wells operated by the Central Ground Water Board (CGWB) and State agencies, including both manual and telemetry-based observation wells. The red boundary indicates the district extent.
Figure 2.
Block-wise distribution of groundwater monitoring wells in Tiruppur District.
The figure shows the spatial density of monitoring wells across administrative blocks, highlighting variations in monitoring coverage within the district boundary.
Figure 3.
Distribution of telemetry-based groundwater monitoring wells in Tiruppur District.
The figure highlights the spatial locations of real-time groundwater observation wells operated by monitoring agencies, supporting high-resolution temporal analysis.
2.2. Data Collection and Sources
The increasing deployment of real-time groundwater monitoring systems has enabled the collection of high-frequency groundwater level data suitable for data-driven modeling approaches. In recent years, IoT-based and telemetry-enabled groundwater observation networks have been successfully utilized to support machine learning–based groundwater level prediction [2]. In this study, groundwater level and quality data from both manual and telemetry-based monitoring wells operated by central and state agencies were integrated with meteorological observations to ensure comprehensive spatial and temporal coverage.
2.2.1. Groundwater Level Data
Groundwater level data were obtained from the Water Resources Department (WRD), Government of Tamil Nadu, covering a continuous period from 1994 to 2024. The dataset consists of monthly observations of depth to groundwater level (meters below ground level) collected from a network of observation wells distributed across Tiruppur District. Each record is accompanied by spatial attributes, including well identification number, geographic coordinates (latitude and longitude), administrative boundaries (taluk and village), and well type (dug wells and bore wells).
This long-term dataset captures seasonal recharge patterns, inter-annual variability, and long-term depletion trends driven by industrial abstraction and land-use change. The extended temporal coverage makes the dataset particularly suitable for deep learning–based time-series modeling and for assessing non-stationary groundwater behavior under evolving climatic conditions.
2.2.2. Groundwater Quality Data
Groundwater quality data were collected from the Tamil Nadu Water Supply and Drainage (TWAD) Board and the Central Ground Water Board (CGWB) for the period 2017–2023. The dataset includes key physicochemical parameters that indicate groundwater suitability for domestic and agricultural use, such as pH, electrical conductivity (EC), and total dissolved solids (TDS). In addition, major ionic constituents were considered, including calcium (Ca), magnesium (Mg), sodium (Na), chloride (Cl), sulfate (SO₄), and bicarbonate (HCO₃).
Derived indices such as Sodium Adsorption Ratio (SAR) and Residual Sodium Carbonate (RSC) were computed to assess salinity hazards and irrigation suitability. These parameters provide critical insight into groundwater quality degradation resulting from industrial effluent discharge, excessive abstraction, and reduced natural dilution during low-recharge periods. The inclusion of groundwater quality data allows the proposed framework to evaluate both quantitative and qualitative aspects of groundwater sustainability.
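As an illustration of how these indices are commonly computed, the sketch below uses the standard definitions SAR = Na / sqrt((Ca + Mg) / 2) and RSC = (CO₃ + HCO₃) − (Ca + Mg), with all concentrations in meq/L. The function names and the sample values are hypothetical, and carbonate is defaulted to zero because it is not among the ions listed above; the study's exact computation may differ.

```python
import math

# Equivalent weights (mg per meq) used to convert mg/L to meq/L.
EQ_WEIGHT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "HCO3": 61.02, "CO3": 30.00}


def to_meq(conc_mg_l, ion):
    """Convert a concentration in mg/L to meq/L for the given ion."""
    return conc_mg_l / EQ_WEIGHT[ion]


def sar(na, ca, mg):
    """Sodium Adsorption Ratio; all inputs in meq/L."""
    return na / math.sqrt((ca + mg) / 2.0)


def rsc(hco3, ca, mg, co3=0.0):
    """Residual Sodium Carbonate in meq/L; carbonate assumed 0 when not reported."""
    return (co3 + hco3) - (ca + mg)


# Hypothetical sample in mg/L (not taken from the study's dataset).
sample = {"Na": 230.0, "Ca": 80.0, "Mg": 36.0, "HCO3": 305.0}
meq = {ion: to_meq(value, ion) for ion, value in sample.items()}
sar_value = sar(meq["Na"], meq["Ca"], meq["Mg"])
rsc_value = rsc(meq["HCO3"], meq["Ca"], meq["Mg"])
```

A negative RSC, as in this hypothetical sample, indicates that calcium and magnesium exceed the carbonate species.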
2.2.3. Meteorological and Climate Data
Meteorological data were obtained from the India Meteorological Department (IMD) and include daily observations of rainfall (mm), maximum and minimum temperature (°C), and relative humidity (%) for the period 1994–2024. These variables are key climatic drivers influencing groundwater recharge, evapotranspiration, and seasonal water-level fluctuations. Daily records were aggregated to a monthly temporal resolution to ensure consistency with groundwater observations.
To assess future groundwater behavior under climate change, climate projection data from the Coupled Model Intercomparison Project Phase 6 (CMIP6) were incorporated. Projections were considered under multiple Shared Socioeconomic Pathways (SSPs), including SSP2-4.5 (moderate emissions) and SSP5-8.5 (high emissions) scenarios. These datasets provide downscaled estimates of future precipitation and temperature patterns and enable long-term groundwater forecasting under alternative climate trajectories.
2.2.4. Data Integration and Temporal Alignment
All datasets were harmonized by resampling to a common monthly temporal resolution and aligned spatially using geographic coordinates. This integration facilitates seamless fusion of groundwater, quality, and climatic variables within the modeling framework.
A summary of the datasets, sources, temporal coverage, and parameters is presented in Table 1.
Figure 4.
Overall architecture of the proposed hybrid groundwater forecasting framework.
Schematic representation of the integrated framework illustrating data acquisition, preprocessing, signal decomposition, deep learning model development, adaptive ensemble learning, and forecast visualization.
2.3. Data Preprocessing and Cleaning
Data preprocessing is a critical step in developing reliable machine learning and deep learning models, particularly when working with heterogeneous hydroclimatic datasets characterized by missing values, measurement noise, and temporal inconsistencies. In this study, a comprehensive preprocessing pipeline was implemented to ensure data quality, consistency, and suitability for advanced time-series modeling.
2.3.1. Handling Missing and Invalid Data
Groundwater level and quality datasets contained missing observations due to intermittent monitoring, inaccessible wells, or instrument malfunction. Missing values in continuous time-series records were addressed using linear interpolation and forward-filling techniques, ensuring continuity while preserving temporal trends. Records containing non-numeric or invalid entries (e.g., “dry well” or “blocked”) were identified and either encoded appropriately or removed based on data completeness criteria. This approach minimized information loss while preventing the introduction of artificial bias.
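A minimal sketch of the gap-filling step described above, assuming missing or non-numeric entries are first mapped to `None`. The helper names `parse_level` and `fill_gaps` are hypothetical, and treating a text flag such as "dry well" as missing is only one of the possible encodings mentioned in the text.

```python
def parse_level(raw):
    """Map a raw field to a float, or None for text flags such as 'dry well'."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def fill_gaps(series):
    """Linearly interpolate interior gaps, then forward-fill any trailing gap.

    Leading gaps (before the first observation) are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Linear interpolation between each pair of consecutive known points.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Forward-fill values after the last known observation.
    for i in range(1, len(filled)):
        if filled[i] is None and filled[i - 1] is not None:
            filled[i] = filled[i - 1]
    return filled


raw_records = ["12.5", "dry well", None, "14.0", "blocked"]
levels = fill_gaps([parse_level(r) for r in raw_records])
```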
2.3.2. Temporal Resampling and Alignment
To enable seamless integration of groundwater, quality, and meteorological datasets, all variables were resampled to a monthly temporal resolution. Daily meteorological observations were aggregated using monthly totals for rainfall and monthly averages for temperature and humidity. Temporal alignment ensured that each time step contained a consistent set of explanatory and target variables, facilitating effective learning of temporal dependencies by recurrent neural networks.
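The aggregation rule above (monthly totals for rainfall, monthly means for temperature) could be sketched as follows. The record layout `(date, rain_mm, tmax_c)` and the function name `to_monthly` are assumptions made for illustration; humidity and minimum temperature would be averaged in the same way as `tmax_c`.

```python
from collections import defaultdict
from datetime import date


def to_monthly(daily):
    """Aggregate daily (date, rain_mm, tmax_c) records by calendar month.

    Rainfall is summed; temperature is averaged.
    """
    buckets = defaultdict(lambda: {"rain": 0.0, "tmax_sum": 0.0, "n": 0})
    for day, rain, tmax in daily:
        bucket = buckets[(day.year, day.month)]
        bucket["rain"] += rain
        bucket["tmax_sum"] += tmax
        bucket["n"] += 1
    return {
        key: {"rain_mm": b["rain"], "tmax_mean_c": b["tmax_sum"] / b["n"]}
        for key, b in sorted(buckets.items())
    }


# Hypothetical daily records, for illustration only.
daily = [
    (date(2020, 6, 1), 5.0, 34.0),
    (date(2020, 6, 2), 0.0, 36.0),
    (date(2020, 7, 1), 12.0, 33.0),
]
monthly = to_monthly(daily)
```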
2.3.3. Unit Standardization and Data Transformation
Data collected from multiple agencies often exhibit variations in measurement units and reporting formats. To ensure uniformity, all parameters were converted to consistent units prior to modeling. For example, electrical conductivity values were standardized to µS/cm, and groundwater levels were expressed in meters below ground level. Logarithmic transformation was applied to selected skewed variables, such as total dissolved solids, to stabilize variance and improve model learning.
2.3.4. Normalization and Scaling
Feature scaling was applied to mitigate the influence of differing variable magnitudes and to enhance model convergence during training. All continuous variables were normalized using min–max scaling, mapping feature values into the range [0, 1]. This normalization strategy is particularly suitable for deep learning architectures, such as LSTM and CNN–LSTM models, which are sensitive to input scale variations.
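Min–max scaling maps a value x to (x − min) / (max − min). The sketch below assumes the minimum and maximum are estimated on the training period only and then reused for later data, which is a common precaution against information leakage; the text does not state which subset was used, so this is an assumption, as are the function names.

```python
def fit_min_max(train_values):
    """Return the (min, max) pair estimated from the training data."""
    return min(train_values), max(train_values)


def scale(values, lo, hi):
    """Map values to [0, 1] relative to the fitted range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: no information to scale
    return [(v - lo) / span for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling, e.g. to report forecasts in metres below ground level."""
    return [s * (hi - lo) + lo for s in scaled]


lo, hi = fit_min_max([2.0, 4.0, 6.0])
train_scaled = scale([2.0, 4.0, 6.0], lo, hi)
later_scaled = scale([8.0], lo, hi)  # values outside the training range exceed 1
```

With this choice, observations outside the training range fall outside [0, 1], which is worth checking for before training.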
2.3.5. Outlier Detection and Treatment
Outliers arising from measurement errors, extreme climatic events, or anomalous anthropogenic activities can adversely affect model performance. In this study, outliers were identified using a combination of Z-score analysis and interquartile range (IQR) methods. Extreme values exceeding predefined statistical thresholds were carefully examined and, where appropriate, capped or removed to prevent distortion of learning patterns while retaining genuine hydrological extremes relevant to climate variability.
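One way to combine the two screening rules is sketched below. The thresholds (|z| > 3 and 1.5 × IQR) are conventional defaults chosen for illustration, not the "predefined statistical thresholds" of the study, which are not reported; the function names are likewise hypothetical.

```python
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile on pre-sorted data."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    return sorted_vals[lower] + (pos - lower) * (sorted_vals[upper] - sorted_vals[lower])


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    s = sorted(values)
    q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def zscore_flags(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def cap(values, lo, hi):
    """Clip values to the [lo, hi] interval instead of removing them."""
    return [min(max(v, lo), hi) for v in values]


levels = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
lower, upper = iqr_bounds(levels)
```

Flagged values would still need manual review before capping, since genuine hydrological extremes are meant to be retained.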
2.3.6. Data Consistency and Quality Assurance
Following preprocessing, the cleaned datasets were subjected to consistency checks to ensure completeness and reliability. Cross-validation of groundwater level trends with meteorological patterns was performed to verify logical coherence. The final preprocessed dataset was then structured into input–output sequences suitable for time-series forecasting models, ensuring that the data accurately represent both seasonal and long-term groundwater dynamics.
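The structuring into input–output sequences can be illustrated with a sliding window, in which each sample pairs the previous `n_in` values with the following `n_out` values. The window lengths here are hypothetical; the study does not report the values used.

```python
def make_sequences(series, n_in, n_out=1):
    """Return (X, y): each X row holds n_in past values, each y row the next n_out."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y


# Toy series: three past months are used to predict the next month.
X, y = make_sequences([1, 2, 3, 4, 5, 6], n_in=3)
```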
2.4. Feature Engineering
Feature engineering plays a crucial role in enhancing the predictive capability of machine learning and deep learning models by transforming raw hydroclimatic data into informative representations that capture underlying physical processes. In groundwater forecasting, appropriately designed features enable models to learn seasonal behavior, long-term trends, and delayed system responses driven by climatic and anthropogenic factors. In this study, a systematic feature engineering strategy was adopted to improve model interpretability and forecasting accuracy.
Figure 5.
Feature engineering module for groundwater forecasting.
The figure shows the generation of lagged features, rolling statistical features, and time-based features from decomposed intrinsic mode functions and preprocessed meteorological data to construct the final feature set used for model training.
2.4.1. Time-Based Features
Groundwater systems exhibit pronounced seasonal and inter-annual variability influenced by monsoon cycles, evapotranspiration, and land-use changes. To represent these temporal characteristics, several time-based features were incorporated, including month, season, and year indicators. Seasonal encoding was used to distinguish between pre-monsoon, monsoon, and post-monsoon periods, allowing the model to recognize recurring recharge and depletion patterns. Long-term trend features were included to capture gradual groundwater decline associated with sustained abstraction and climate-induced changes.
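A sketch of how such time-based features might be derived from a year and month is given below. The month-to-season mapping (March–May pre-monsoon, June–December monsoon, January–February post-monsoon) is an assumption for illustration, not the study's definition, and the sine/cosine encoding of the month is an optional addition that keeps December adjacent to January.

```python
import math


def season_of(month):
    """Assumed season boundaries; hypothetical, adjust to the study's definition."""
    if 3 <= month <= 5:
        return "pre_monsoon"
    if 6 <= month <= 12:
        return "monsoon"
    return "post_monsoon"


def time_features(year, month, base_year=1994):
    """Build month, season, and linear-trend features for one time step."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    season = season_of(month)
    return {
        "year": year,
        "month": month,
        "month_sin": math.sin(angle),  # cyclical encoding of the month
        "month_cos": math.cos(angle),
        "trend": (year - base_year) * 12 + (month - 1),  # months since the base year
        "is_pre_monsoon": int(season == "pre_monsoon"),
        "is_monsoon": int(season == "monsoon"),
        "is_post_monsoon": int(season == "post_monsoon"),
    }


features = time_features(1995, 3)
```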
2.4.2. Lagged Features
Groundwater response to climatic inputs is often delayed due to infiltration processes, subsurface storage, and aquifer characteristics. To account for these delayed effects, lagged groundwater level and quality variables were introduced as model inputs. Observations from previous time steps (e.g., one-month, three-month, and multi-season lags) were incorporated to enable the model to learn temporal dependencies and memory effects. These lagged features are particularly effective for recurrent neural networks, such as LSTM models, which are designed to capture long-term temporal relationships.
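Lagged inputs of this kind can be built by pairing each observation with its earlier values, as in the sketch below. The lag set (1, 3, and 12 months) is an illustrative assumption; the text gives one-month and three-month lags as examples but does not list the full set.

```python
def add_lags(series, lags=(1, 3, 12)):
    """Return one row per time step with the current value and its lagged copies.

    Rows start at the largest lag so that every lagged value exists.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"y": series[t]}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag]
        rows.append(row)
    return rows


rows = add_lags(list(range(15)))
```

The first `max(lags)` time steps are dropped, so long lags shorten the usable record.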
2.4.3. Rolling Statistical Features
Rolling statistical features were generated to summarize local temporal behavior and smooth short-term fluctuations. Moving averages, rolling standard deviations, and rolling minimum and maximum values were computed over multiple window sizes to highlight both short-term variability and long-term trends. These features assist the model in distinguishing persistent groundwater depletion signals from transient noise caused by isolated climatic events.
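The sketch below computes these statistics over a trailing window, so that each feature uses only current and past observations. The trailing alignment and the window size are assumptions; the study does not specify either.

```python
import statistics


def rolling_features(series, window):
    """Trailing-window mean, std, min, and max; None until the window is full."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
            continue
        w = series[t + 1 - window:t + 1]
        out.append({
            "mean": statistics.fmean(w),
            "std": statistics.pstdev(w),  # population standard deviation
            "min": min(w),
            "max": max(w),
        })
    return out


rolled = rolling_features([1.0, 2.0, 3.0, 4.0], window=3)
```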
2.4.4. Climate Interaction Features
To strengthen the coupling between groundwater and climatic drivers, interaction features combining rainfall, temperature, and humidity variables were generated. These composite features help the model learn nonlinear relationships between recharge potential and atmospheric conditions, improving sensitivity to climate variability and extreme events. Such interactions are particularly important in semi-arid regions where groundwater recharge is episodic and highly dependent on rainfall intensity rather than cumulative precipitation.
2.4.5. Feature Selection and Dimensionality Control
To prevent overfitting and reduce computational complexity, feature relevance was evaluated using correlation analysis and model-based importance measures. Redundant and weakly informative features were removed, retaining only those that contributed meaningfully to prediction accuracy. This feature selection process ensured that the final input set balanced model complexity with generalization capability.
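A correlation-based filter of the kind described could look like the sketch below, which keeps features that correlate with the target and drops near-duplicates of features already kept. The two thresholds and the function names are hypothetical, and the model-based importance step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(features, target, min_target_corr=0.1, max_pair_corr=0.95):
    """Keep relevant features, skipping those redundant with ones already kept."""
    ranked = sorted(
        features, key=lambda name: abs(pearson(features[name], target)), reverse=True
    )
    kept = []
    for name in ranked:
        if abs(pearson(features[name], target)) < min_target_corr:
            continue  # weakly informative
        if any(abs(pearson(features[name], features[k])) > max_pair_corr for k in kept):
            continue  # redundant with a stronger feature
        kept.append(name)
    return kept


# Toy example: "a_copy" duplicates "a", and "noise" is uncorrelated with the target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_copy": [2.0, 4.0, 6.0, 8.0, 10.0],
    "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
}
kept = select_features(features, target)
```

Pearson correlation only captures linear association, which is one reason to pair it with model-based importance measures.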
2.5. Signal Decomposition
Groundwater time series are inherently non-stationary due to the combined influence of seasonal recharge, long-term abstraction trends, and climate variability. Signal decomposition techniques provide an effective means of isolating these overlapping temporal components prior to model training. Previous studies have demonstrated that decomposition-assisted deep learning frameworks outperform standalone models in groundwater forecasting tasks [6].
In this study, Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) was applied to decompose groundwater time-series data into intrinsic mode functions representing oscillations at different temporal scales. To further refine high-frequency components and suppress residual noise, Variational Mode Decomposition (VMD) was subsequently applied. This two-stage decomposition strategy is conceptually aligned with recent ensemble-based modal decomposition approaches that have shown improved robustness in groundwater level prediction using multi-data inputs [1].
Figure 6.
Signal decomposition framework for groundwater and meteorological time-series data.
The figure illustrates the application of Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) followed by Variational Mode Decomposition (VMD) to decompose historical groundwater time-series data into refined intrinsic mode functions, while meteorological variables are preprocessed to generate scale-consistent inputs for subsequent deep learning–based modeling.
2.5.1. Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN)
Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) is an enhanced variant of empirical mode decomposition designed to overcome mode mixing and residual noise commonly observed in conventional EMD approaches. ICEEMDAN decomposes a complex time series into a finite set of Intrinsic Mode Functions (IMFs) and a residual trend component, each representing oscillations at distinct temporal scales.
In this study, ICEEMDAN was applied to groundwater level and quality time series to extract meaningful components corresponding to high-frequency fluctuations, seasonal variations, and long-term trends. By introducing adaptive noise during decomposition, ICEEMDAN ensures improved stability and consistency across ensembles, resulting in clearer separation of hydrological signals influenced by climatic and anthropogenic factors.
2.5.2. Variational Mode Decomposition
While ICEEMDAN effectively separates intrinsic components, certain high-frequency IMFs may still contain residual noise. To further refine these components, Variational Mode Decomposition (VMD) was employed as a secondary decomposition step. VMD formulates signal decomposition as a constrained variational problem, extracting modes with specific bandwidths and center frequencies.
The application of VMD to selected ICEEMDAN-derived IMFs enhances frequency resolution and suppresses spurious oscillations. This two-stage decomposition strategy improves the clarity of temporal features supplied to deep learning models, enabling more accurate learning of groundwater dynamics across multiple time scales.
2.5.3. Integration with Deep Learning Models
Each decomposed component obtained through ICEEMDAN and VMD was used as an independent input to deep learning models, particularly LSTM-based architectures. By training models on simplified and scale-specific signals rather than raw time series, the forecasting framework reduces model complexity and mitigates overfitting. The final groundwater prediction is reconstructed by aggregating outputs from individual decomposed components.
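The decompose–predict–reconstruct logic can be written compactly, assuming the aggregation is a simple sum of component forecasts (the usual choice for additive decompositions, though not stated explicitly here). The symbols are introduced for illustration only: x(t) is the observed series, IMF_k(t) the k-th of K components, r(t) the residual trend, h the forecast horizon, and hats denote component-wise model outputs.

```latex
x(t) = \sum_{k=1}^{K} \mathrm{IMF}_k(t) + r(t),
\qquad
\hat{x}(t+h) = \sum_{k=1}^{K} \widehat{\mathrm{IMF}}_k(t+h) + \hat{r}(t+h)
```

Where a high-frequency IMF is further split by VMD, its forecast would in turn be the sum of the forecasts of its VMD modes.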
This integrated decomposition–learning approach enables the model to capture both short-term variability and long-term groundwater trends more effectively than conventional single-stage models. The decomposition process thus plays a critical role in enhancing prediction accuracy, stability, and interpretability under climate-induced non-stationary conditions.
2.6. Model Development
Advanced deep learning architectures have been increasingly adopted for groundwater level prediction due to their ability to capture complex temporal and spatial dependencies. Recent studies employing hybrid architectures, such as TCN–LSTM networks with attention mechanisms, have reported improved predictive performance compared to conventional recurrent models [4].
Building on these developments, the present study employs two complementary deep learning models: an ICEEMDAN–VMD–SMA–LSTM model optimized using a metaheuristic algorithm and a CNN–LSTM model designed to capture spatial–temporal relationships between meteorological variables and groundwater dynamics. To enhance robustness and generalization capability, an Adaptive Weighting Model ensembles the outputs of individual models, dynamically adjusting their contributions based on predictive performance.
Figure 7.
Development of the SMA–LSTM and CNN–LSTM models.
The schematic illustrates the model development process, including SMA-optimized LSTM training using decomposed intrinsic mode functions and the CNN–LSTM architecture for extracting spatial–temporal features from meteorological data.
2.6.1. Long Short-Term Memory (LSTM) Model
Long Short-Term Memory (LSTM) networks are a class of recurrent neural networks specifically designed to learn long-term dependencies in sequential data. Groundwater level and quality time series often exhibit delayed responses to recharge, extraction, and climatic drivers, making LSTM architectures particularly suitable for this application.
The LSTM model employed in this study consists of memory cells with input, forget, and output gates that regulate information flow across time steps. This gated structure enables the model to retain relevant historical information while discarding noise and irrelevant fluctuations. The LSTM model serves as a baseline temporal predictor and provides a benchmark for evaluating the effectiveness of more advanced hybrid architectures.
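For reference, the gating mechanism of a standard LSTM cell is commonly written as below. The notation is the conventional one rather than the study's own: x_t is the input at time t, h_t the hidden state, c_t the cell state, σ the logistic sigmoid, ⊙ element-wise multiplication, and W, U, b learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```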
2.6.2. ICEEMDAN-VMD-SMA-LSTM Hybrid Model
To improve forecasting performance under non-stationary conditions, a hybrid model combining signal decomposition, deep learning, and metaheuristic optimization was developed. Following ICEEMDAN and VMD decomposition, each intrinsic component was modeled independently using LSTM networks. This approach allows the model to focus on simplified temporal patterns at different frequency scales.
Hyperparameters of the LSTM networks, including learning rate, number of hidden units, and dropout ratio, were optimized using the Slime Mould Algorithm (SMA). SMA is a population-based optimization technique inspired by the foraging behavior of slime moulds and is well-suited for navigating complex, nonlinear search spaces. The optimized configuration enhances convergence speed and predictive accuracy while reducing the risk of overfitting.
Figure 8.
Algorithmic workflow of the ICEEMDAN–VMD–SMA–LSTM model.
The algorithm outlines the step-by-step procedure for groundwater time-series decomposition, model training with optimized hyperparameters, component-wise prediction, reconstruction of final forecasts, and performance evaluation.
2.6.3. Convolutional Neural Network-LSTM (CNN-LSTM) Model
To incorporate spatial variability and climatic influence, a hybrid CNN–LSTM model was developed. Convolutional Neural Networks are effective in extracting spatial features from gridded meteorological data, such as rainfall and temperature distributions. In the proposed framework, CNN layers first process meteorological inputs to learn spatial patterns associated with recharge potential.
The extracted spatial features are then fed into an LSTM network to model temporal dependencies and long-term groundwater responses. This architecture enables simultaneous learning of spatial and temporal relationships, improving the model’s ability to capture regional rainfall–groundwater interactions.
2.6.4. Adaptive Weighting Model (AWM)
Given that no single model consistently outperforms others under all hydrological conditions, an Adaptive Weighting Model (AWM) was implemented to ensemble predictions from multiple base models, including LSTM, CNN–LSTM, and ICEEMDAN–VMD–SMA–LSTM. The AWM assigns dynamic weights to individual model outputs based on their recent predictive performance, subject to a convexity constraint.
Weights are updated iteratively using error-based feedback, ensuring that models with lower prediction error contribute more significantly to the final forecast. This ensemble strategy enhances robustness, reduces model bias, and improves generalization across varying climatic and hydrological regimes.
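The exact update rule is not given, so the sketch below shows one simple possibility: weights proportional to the inverse of each model's mean squared error over a recent window, normalized so that they are non-negative and sum to one (the convexity constraint). The window length, the inverse-error rule, and the function names are assumptions made for illustration.

```python
def inverse_error_weights(recent_errors, eps=1e-8):
    """Weights proportional to 1 / recent MSE; non-negative and summing to 1."""
    inverse = []
    for errors in recent_errors:
        mse = sum(e * e for e in errors) / len(errors)
        inverse.append(1.0 / (mse + eps))  # eps avoids division by zero
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_forecast(y_true, model_preds, window=3):
    """Combine model predictions step by step with error-driven weights."""
    n_models = len(model_preds)
    weights = [1.0 / n_models] * n_models  # equal weights until errors are observed
    combined, history = [], []
    for t in range(len(y_true)):
        combined.append(sum(w * p[t] for w, p in zip(weights, model_preds)))
        history.append(list(weights))
        # Update weights from errors already observed, so there is no look-ahead.
        start = max(0, t + 1 - window)
        recent = [[p[i] - y_true[i] for i in range(start, t + 1)] for p in model_preds]
        weights = inverse_error_weights(recent)
    return combined, history


# Toy example: the first model is exact, the second is biased by +1.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
combined, history = ensemble_forecast(observed, predictions)
```

In this toy case the weight shifts almost entirely to the exact model after the first step.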
Figure 9.
Adaptive Weighting Model (AWM) for ensemble learning.
The figure depicts the ensemble framework combining SMA–LSTM and CNN–LSTM model outputs using adaptive weighting parameters to minimize prediction loss and generate the final ensemble model.
2.6.5. Model Integration and Forecast Reconstruction
The final groundwater forecast is obtained by integrating outputs from all component models through the Adaptive Weighting Model. For decomposed signals, predictions from individual intrinsic components are aggregated to reconstruct the original groundwater time series. This hierarchical integration strategy ensures that both short-term variability and long-term trends are accurately represented in the final forecast.
The hybrid model development approach adopted in this study enables effective handling of non-stationary groundwater dynamics and provides a flexible framework for incorporating additional data sources or modeling techniques in future extensions.
2.7. Model Training and Evaluation
Model training and evaluation were conducted to ensure that the proposed forecasting framework achieves high predictive accuracy, robustness, and generalization capability under varying hydrological and climatic conditions. A structured training strategy, combined with rigorous evaluation metrics, was adopted to objectively assess model performance and reliability.
2.7.1. Training-Testing Strategy
The complete dataset was divided into training and testing subsets following a chronological split to preserve the temporal structure of groundwater time series. Historical data from 1994 to 2014 were used for model training, while data from 2015 to 2024 were reserved for testing and validation. This approach prevents information leakage from future observations and ensures realistic evaluation of forecasting performance.
For deep learning models, a walk-forward (rolling-origin) validation strategy was employed during training to assess sequential prediction capability. This method allows the models to be evaluated incrementally as new observations become available, closely mimicking real-world operational forecasting scenarios.
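The split and validation scheme can be sketched as follows. Records are assumed to be `(year, month, value)` tuples, and the walk-forward folds use an expanding training window; both are illustrative assumptions, as are the fold sizes.

```python
def chronological_split(records, last_train_year=2014):
    """Split (year, month, value) records by year without shuffling."""
    train = [r for r in records if r[0] <= last_train_year]
    test = [r for r in records if r[0] > last_train_year]
    return train, test


def walk_forward_folds(n_samples, initial_train, horizon):
    """Yield (train_indices, validation_indices) with an expanding training window."""
    end = initial_train
    while end + horizon <= n_samples:
        yield list(range(end)), list(range(end, end + horizon))
        end += horizon


# Monthly placeholders for 1994-2024; the values are dummies.
records = [(year, month, 0.0) for year in range(1994, 2025) for month in range(1, 13)]
train, test = chronological_split(records)

# Within the training period: start with 10 years, validate one year at a time.
folds = list(walk_forward_folds(len(train), initial_train=120, horizon=12))
```

Every validation block lies strictly after its training block, which is what prevents leakage from future observations.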
2.7.2. Model Training Configuration
All deep learning models were trained under a common configuration to enable fair comparison. Training used the Adam optimizer, with learning rates for the hybrid models tuned by the Slime Mould Algorithm. Typical settings were a batch size of 32, 100 to 150 training epochs, and dropout regularization to mitigate overfitting.
Activation functions were selected based on model architecture, with rectified linear units used in hidden layers and linear activation in output layers. Early stopping criteria were implemented based on validation loss to prevent excessive training and improve model generalization.
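The early-stopping rule can be illustrated independently of any deep learning library: training halts once the validation loss has failed to improve for a set number of epochs, and the best epoch is retained. The patience value and the function name are hypothetical; the study does not report the setting used.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch  # improvement: remember this epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses, for illustration only.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
best, stopped = early_stopping_epoch(losses, patience=3)
```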
2.7.3. Evaluation Metrics
Model performance was evaluated using multiple statistical metrics that collectively assess accuracy, stability, and predictive reliability. These metrics include Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Coefficient of Determination (R²), and Nash–Sutcliffe Efficiency (NSE). MAE and RMSE quantify prediction error magnitude, while R² measures variance explanation and NSE evaluates hydrological model efficiency.
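The four metrics can be computed as in the sketch below. R² is taken here as the squared Pearson correlation between observed and simulated values; this is an assumption, since an R² defined as 1 − SS_res/SS_tot would coincide with NSE and the two would not be reported separately.

```python
import math


def mae(obs, sim):
    """Mean Absolute Error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root Mean Square Error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r2(obs, sim):
    """Squared Pearson correlation between observations and simulations."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    if vo == 0 or vs == 0:
        return 0.0
    return cov * cov / (vo * vs)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean."""
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


# Toy example: a simulation with a constant bias of +1.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
scores = {"MAE": mae(obs, sim), "RMSE": rmse(obs, sim), "R2": r2(obs, sim), "NSE": nse(obs, sim)}
```

The toy case shows why the metrics are complementary: a constant bias leaves the correlation-based R² at 1 while NSE drops sharply.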
2.7.4. Comparative Performance Assessment
To assess the effectiveness of the proposed hybrid framework, the performance of individual models—LSTM, CNN–LSTM, and ICEEMDAN–VMD–SMA–LSTM—was compared against the ensemble Adaptive Weighting Model. All models were evaluated on identical testing datasets using uniform metrics to ensure comparability.
The ensemble model consistently outperformed standalone architectures, demonstrating lower prediction errors and higher efficiency scores. This improvement highlights the benefits of combining signal decomposition, optimized deep learning, and adaptive ensemble learning for groundwater forecasting.
2.7.5. Robustness and Generalization Analysis
Model robustness was evaluated by examining performance across different hydrological conditions, including periods of high recharge and prolonged depletion. Sensitivity to extreme climatic events was assessed to ensure stability under non-stationary conditions. The results indicate that the ensemble framework maintains consistent performance and reduces error variance compared to individual models.
Overall, the adopted training and evaluation strategy ensures that the proposed framework provides reliable and generalizable groundwater forecasts suitable for decision-support applications under climate variability.