Preprint
Article

This version is not peer-reviewed.

Comparative Analysis of Statistical, Deep Learning, and Hybrid Approaches for Daily Rainfall Forecasting in the Ecuadorian Andean Region

Submitted:

03 June 2026

Posted:

03 June 2026

You are already at the latest version

Abstract
Accurate daily rainfall forecasting is essential for water resource management, flood risk mitigation, and early warning systems in tropical mountain regions. This study systematically compared five univariate forecasting models drawn from three methodological families—one statistical model (ARIMA), two deep learning architectures (Long Short-Term Memory, LSTM; Gated Recurrent Unit, GRU), and two decomposition-based hybrid models (Discrete Wavelet Transform–LSTM, DWT-LSTM; Variational Mode Decomposition–GRU, VMD-GRU)—for one-step-ahead daily rainfall forecasting across 18 meteorological stations in Tungurahua Province, Ecuador, covering the period 2013–2025. Model performance was evaluated using a comprehensive multi-metric framework that combined continuous accuracy metrics (Mean Absolute Error, MAE; Root Mean Square Error, RMSE; Nash-Sutcliffe Efficiency, NSE; Kling-Gupta Efficiency, KGE) and categorical event-detection metrics (Probability of Detection, POD; False Alarm Ratio, FAR; Critical Success Index, CSI) applied at three precipitation thresholds (1 mm, P90, and P95). LSTM achieved the best overall performance for continuous prediction and rainfall-occurrence detection at the 1 mm threshold; GRU delivered comparable skill at lower computational cost, making it a practical alternative for operational applications. Station-level performance rankings varied considerably across the network, reflecting strong hydroclimatic heterogeneity and confirming that spatially uniform single-model forecasting strategies are suboptimal. All evaluated models failed to reliably detect high-intensity (P90) and extreme (P95) rainfall events, producing very low POD and high FAR values. Although ARIMA ranked first in relative terms for extreme-event detection, its absolute skill remained insufficient for operational purposes. Hybrid decomposition models (DWT-LSTM, VMD-GRU) did not outperform the simpler recurrent architectures and frequently produced negative NSE values. These results demonstrate that univariate approaches are adequate for routine daily rainfall forecasting but inadequate for reliable extreme-event prediction, underscoring the critical need for multivariate predictor frameworks, specialized loss functions, and probabilistic approaches in future research.
Keywords: 
;  ;  ;  ;  ;  ;  ;  ;  ;  
Subject: 
Engineering  -   Other

1. Introduction

Precipitation is a critical component of the hydrological cycle given its pervasive influence on ecosystems and human livelihoods, including agriculture, drinking water supply, and hydroelectric energy generation [1,2,3]. Globally, extreme hydrometeorological events such as droughts and floods are occurring with increasing frequency and intensity under ongoing climate change [4,5,6]. This trend is particularly pronounced in the Ecuadorian Andes, where complex topography and high spatiotemporal rainfall variability drive flash floods, landslides, and localized droughts. In this context, accurate daily rainfall forecasts are essential, particularly for developing countries seeking to mitigate extreme climate impacts through early warning and adaptive water management [7,8,9].
Precipitation is inherently variable in both space and time, rendering its reliable forecasting a persistent challenge. The highly nonlinear, intermittent, and multi-scale nature of precipitation series challenges classical linear approaches, driving the progressive adoption of advanced time-series techniques, deep learning architectures, and hybrid models [1,2,10,11]. Historically, rainfall forecasting has relied on univariate statistical models such as the Autoregressive Integrated Moving Average (ARIMA) and its seasonal variants, valued for their mathematical rigor, interpretability, and low computational requirements. However, these approaches assume linear relationships and relatively simple dependence structures, which often fail to capture the complex rainfall dynamics found in mountain regions characterized by strong topographic heterogeneity [4,12,13,14].
The emergence of deep learning architectures, particularly Long Short-Term Memory (LSTM) networks [15] and Gated Recurrent Units (GRU) [16], has substantially transformed time-series forecasting by enabling the capture of long-range temporal dependencies and nonlinear patterns in meteorological data [10,17,18,19]. Applications across diverse climatic zones and spatial scales—from the northwestern Himalayas [20] to urban environments in the United Kingdom [21] to national-scale studies [22]—have repeatedly demonstrated that LSTM and GRU models frequently outperform traditional statistical methods for precipitation forecasting, while maintaining acceptable computational efficiency for operational use [17,23,24].
In parallel, hybrid models that decompose precipitation time series into multiple frequency components using signal-processing techniques—notably the Discrete Wavelet Transform (DWT) and Variational Mode Decomposition (VMD)—followed by component-wise prediction through neural networks have emerged as promising solutions. These frameworks aim to overcome the limitations of standalone LSTM and GRU models when handling highly volatile, intermittent precipitation signals characterized by extreme events [25,26,27,28]. Signal decomposition can be particularly advantageous in hydrological systems with strong non-stationarity, by isolating trend, seasonal, and stochastic sub-series to enable more precise modeling of each component [28,29].
Despite these advances, a significant literature gap persists for the Ecuadorian Andean context. Previous studies have applied LSTM-based Bayesian frameworks for hourly rainfall nowcasting in a tropical Andean city [30] and Random Forest models for 30-day accumulated precipitation forecasting over an Ecuadorian Andean basin [31]. However, no systematic, methodologically homogeneous, and reproducible comparison exists among ARIMA, LSTM, GRU, DWT-LSTM, and VMD-GRU focused specifically on univariate daily precipitation series from a meteorological network in the Ecuadorian Andean region. This gap prevents the establishment of a consistent evidence base for determining which model family offers greater robustness and operational utility in a mountain environment characterized by high pluviometric variability [30,31].
This study therefore systematically evaluates the predictive capacity of statistical, deep learning, and decomposition-based hybrid models—including ARIMA, LSTM, GRU, DWT-LSTM, and VMD-GRU—for one-step-ahead daily rainfall forecasting at 18 meteorological stations in Tungurahua Province, Ecuador (2013–2025). Model performance is assessed using a comprehensive dual framework of continuous accuracy metrics (MAE, RMSE, NSE, and KGE) and categorical event-detection metrics (POD, FAR, and CSI) evaluated at three operationally relevant precipitation thresholds (1 mm/day, P90, and P95). By jointly analyzing mean error behavior and extreme-event prediction skill, this study aims to identify the limitations of average-based metrics under highly intermittent rainfall regimes, to evaluate whether hybrid decomposition approaches add predictive value, and to provide actionable guidance for operational rainfall forecasting systems in hydroclimate-complex Andean environments.

2. Materials and Methods

2.1. Area of Study and Meteorological Network

The area of study is located in Tungurahua Province, within the inter-Andean region of Ecuador, and falls within the hydrological influence zone of the Pastaza River basin, a major tributary of the Amazon basin [32,33]. The province is characterized by irregular topography, strong altitudinal gradients (ranging from approximately 1529 to 3875 m a.s.l. across the network), and pronounced hydroclimatic variability driven by orographic processes, local convective dynamics, and large-scale atmospheric circulation patterns (Figure 1).
The provincial capital, Ambato, is classified as Cfb (temperate oceanic) under the Köppen–Geiger classification [34,35,36], though the broader study domain encompasses distinct precipitation regimes shaped by slope orientation, valley exposure, and exposure to moisture-laden air masses of Amazonian origin.
Eighteen meteorological stations were selected from the Tungurahua Provincial Government’s monitoring network to represent the spatial variability of daily precipitation across the study area. Although the network comprises 20 stations, two were excluded due to insufficient record length to support a robust comparative evaluation. The spatial distribution of stations is shown in Figure 1, and their main characteristics are summarized in Table 1.

2.2. Data Description

The dataset consists of daily precipitation records from the 18 selected meteorological stations, provided by the Tungurahua Provincial Government. Daily temporal resolution was adopted for its direct relevance to operational hydrometeorological applications, including water resource management, irrigation scheduling, and flood risk analysis [37]. Available records span the period 2013–2025, although the effective observation period varies by station; stations AS and QP have records available only through May 2023 and August 2024, respectively, and were therefore analyzed over their own reliable observation windows. Each time series exhibits the strongly intermittent behavior typical of Andean precipitation regimes, dominated by zero or near-zero values and punctuated by occasional high-intensity events, as illustrated in Figure 2.

2.3. Data Preprocessing

2.3.1. Quality Control and Data Cleaning

Prior to model training, daily precipitation series were subjected to a structured preprocessing workflow to ensure data consistency and cross-station comparability. Quality control examined records for values exceeding the maximum measurement range of the tipping-bucket rain gauges and for positive values below the instrument resolution threshold of 0.1 mm [38]. No observations outside these limits were detected across the 18 stations; consequently, no data removal or correction was required at this stage.

2.3.2. Handling of Missing Values

Precipitation records from 15 stations were complete. At stations PL, BA, and HB, the proportion of missing data was minimal (≤0.07%). Missing values were filled using daily precipitation estimates from the NASA POWER reanalysis product [39], matching the corresponding dates and station locations. This satellite-based gap-filling strategy is an accepted approach when missing values are sparse and do not compromise the overall dataset structure [40,41].

2.3.3. Temporal Partitioning and Dataset Construction

Each station’s precipitation series was sorted chronologically and partitioned into training (70%), validation (15%), and test (15%) subsets using a fixed temporal ratio without randomization, preserving chronological order and preventing information leakage. The prediction task was formulated as a one-step-ahead (t + 1) forecasting problem under a univariate framework, using a sliding-window input of 30 consecutive daily observations to predict the next-day precipitation amount. Supervised input-output datasets for all deep learning and hybrid models were constructed using the same window length, chronological alignment, and test dates to ensure that differences in predictive performance reflect model structure rather than data preparation inconsistencies.

2.3.4. Data Scaling and Shared Preprocessing Artifacts

Min-Max normalization (Equation (1)) was applied exclusively to models incorporating deep learning components, fitted solely on the training subset and subsequently applied—without parameter recalculation—to validation and test subsets. This procedure preserves temporal causality and prevents data leakage [27].
x * = x x m i n x m a x x m i n
where xmax and xmin are the maximum and minimum values of the precipitation series estimated from the training subset, respectively.
ARIMA was fitted directly on precipitation data in original units (mm/day), as scaling is unnecessary for its standard formulation and retaining original units facilitates direct meteorological interpretation of model residuals and confidence intervals [42]. To guarantee strict comparability across all model scripts, a shared set of preprocessing artifacts—including chronological split definitions, event thresholds, fitted scaling objects, and sliding-window datasets—was created and reused for all experiments.

2.4. Experimental Forecasting Framework

2.4.1. Common Experimental Protocol

All forecasting models were developed under a unified experimental protocol based on strict temporal ordering to ensure comparability, reproducibility, and experimental fairness. The forecast horizon was fixed at one day ahead (t + 1), and the event analysis framework was standardized across all models. Rainfall occurrence was evaluated at a threshold of p ≥ 1.0 mm/day, while high- and extreme-rainfall events were assessed using P90 and P95 thresholds computed exclusively from rainy-day observations in the training subset. These thresholds were held constant for validation and test period analyses.
Reproducibility was reinforced by setting a global random seed and enabling deterministic configurations in all main computational libraries where the software and runtime permitted. All deep learning and hybrid models were trained from scratch with no weight transfer between architectures. To maintain a comparable training budget, all models were optimized under the same constraints: a maximum of 100 epochs, early stopping based on validation loss with a patience of 10 epochs, best-weight restoration, and a default batch size of 32.

2.4.2. Hyperparameter Optimization and Model Selection

Hyperparameter optimization was conducted exclusively on training and validation subsets. For deep learning and hybrid models, the search was performed using the Optuna framework [43] over a constrained search space encompassing learning rate, number of layers, hidden units, dropout rate, and batch size, with the objective of minimizing validation RMSE and pruning enabled to terminate unpromising trials efficiently. ARIMA model selection followed a reproducible constrained search over (p, d, q) combinations evaluated first by the Akaike Information Criterion (AIC) on the training subset, with final order selected by minimizing validation RMSE. When two or more configurations produced similar RMSE values, Kling-Gupta Efficiency (KGE) served as a tiebreaker, with preference given to the higher KGE configuration.
Following hyperparameter selection, the best-performing configuration for each model was retrained on the combined training and validation data, and final performance was evaluated once on the unmodified test subset. This train-validate-retrain-test workflow avoids optimistic bias while maintaining a uniform model selection procedure across all evaluated approaches. A fundamental principle of the framework was explicit leakage control: all fitting, parameter estimation, and threshold definition operations were restricted to training data during model development, and validation and test subsets were exposed only to transformations learned from training data.

2.5. Forecasting Models

Five univariate daily rainfall forecasting models from three methodological families were evaluated: ARIMA (statistical), LSTM and GRU (deep learning), and DWT-LSTM and VMD-GRU (decomposition-based hybrid). These models represent a spectrum of temporal complexity and predictive capacity, from conventional linear approaches to nonlinear sequence learning and multi-scale signal decomposition, and were selected to reflect both established benchmarks and current state-of-the-art approaches in hydrological forecasting [10,17,25,26,44,45].

2.5.1. Autoregressive Integrated Moving Average (ARIMA)

ARIMA, introduced by Box and Jenkins [42], is expressed as ARIMA(p,d,q), where p is the order of the autoregressive (AR) component, d is the degree of differencing required to achieve stationarity, and q is the order of the moving-average (MA) component. The mathematical formulation of the model is presented in Equation (2).
ϕ p ( B ) ( 1 B ) d Z t = θ q ( B ) a t
Where:
ϕ p ( B ) = 1 ϕ 1 B ϕ 2 B 2 ϕ p B p Backshift operator (B) AR process
θ q ( B ) = 1 θ 1 B θ 2 B 2 θ q B q Backshift operator (B) MA process
B Backshift operator
( 1 B ) d Differentiating operator
d Order of differencing

2.5.2. Long Short-Term Memory Neural Network (LSTM)

LSTM networks, introduced by Hochreiter and Schmidhuber [44], address the vanishing-gradient problem inherent in standard recurrent neural networks through three gating mechanisms, input (it), forget (ft), and output (ot) gates, that regulate information flow and memory retention.
Input gate (it): This gate determines what portion of information from the current input is stored in the cell state. It considers both the current input and the previous cell state. The corresponding mathematical expression is:
i t = σ { W i ( h t 1 x t ) + b i }
Forget gate (ft): This gate determines what information from the previous cell state is retained and what is discarded. It considers the previous cell state and the current input. The forget gate equation is:
f t = σ { W f ( h t 1 x t ) + b f }
Output gate (ot): This gate determines what information from the current cell state is used to produce the output. It considers the current input, the previous cell state, and the information accumulated therein. The output gate expression is:
o t = σ { W o ( h t 1 x t ) + b o }
Where:
it, ft y ot: vectors of the input, forget, and output gates, respectively, at time step t. f t o t
σ: sigmoid activation function, mapping inputs to the range [0, 1].
Wi, Wf y Wo: weight matrices of the respective gates.
h(t−1): the concatenated previous hidden state at t − 1.
xt: the current input vector at time t.
bi, bf, bo: bias terms.
Cell state: the internal memory component of the model that retains prior information and is updated by the following equation [17]
C t = f t C t 1 + i t C t ~
Where:
Ct is the cell state at time t, ft is the forget gate, it is the input gate, Ct−1 is the previous cell state, and C t ~ is a candidate cell state computed as:
h t = o t t a n h ( C t )
where ht is the hidden state at time step t, ot is the output gate, and Ct is the cell state.

2.5.3. Gated Recurrent Unit (GRU)

GRU is a simplified type of recurrent neural network (RNN) inspired by LSTM, but with fewer parameters, making it highly efficient for capturing and retaining both short- and long-range dependencies in time-series data [21].
The key difference from LSTM is that GRU merges the forget and input gates into a single update gate, enabling dynamic memory control. This design mitigates the overfitting problem that can arise in LSTM when data are limited, while still addressing the long-range dependency problem. The GRU computation can be summarized through Equations (8) to (11), [17,22].
Reset gate (rt): controls how much of the previous hidden state, ht−1, is discarded. r t h t 1
r t = σ { W r [ h t 1 , x t ] + b r }
Update gate (zt): controls how much of the previous hidden state is carried forward to the current hidden state (ht) z t ) h t
z t = σ { W z ( h t 1 , x t ) + b z }
Candidate activation h ~ t   generates new information for the current hidden state.
h ~ t = t a n h { W h [ r t h t , x t ] + b h }
Where:
r t , zt: vectors of the reset and update gates, respectively.
h ~ t : candidate activation.
σ: sigmoid activation function mapping inputs to [0, 1].
W r , Wz and Wh: weight matrices applied to the input; xt: input vector at time t.
h t 1 h: hidden state from the previous time step.
x t : current input at time t.
b r , bz, bh: bias terms. b z b h
Cell state update:
h t = ( 1 z t ) × h t 1 + z t × h ~ t
where:
h t : Current hidden state, which is a combination of the previous hidden state, h t 1 , and the candidate activation, which is controlled by the update gate, z t .

2.5.4. Discrete Wavelet Transform (DWT)

The Discrete Wavelet Transform (DWT) is a powerful signal-processing technique that decomposes a time series into multiple resolution levels, separating components with different frequency characteristics. The continuous wavelet transform can be expressed through Equations (12) to (29), [46,47]
w t ( a , b ) = 1 a x ( t ) φ ( a , b ) d t
where φ (a, b) is the complex conjugate of the base signal, φ
φ ( a , b ) = 1 a φ ( t b a ) w
However, CWT’s scale parameter’s dyadic discretization, which is the DWT, has a linear connection to the shift parameter’s step size.
{ a = 2 j b = k 2 j
This transforms Equations (12) and (13) into the following expressions:
w t ( j , k ) = 1 2 j x ( t ) φ ( j , k ) d t
φ ( j , k ) = 1 2 j φ ( t k 2 j 2 j )
Equations (15) and (16) are simplified, using the Mallat algorithm, into:
φ j , k [ t ] = 2 j 2 k d j , k φ ( 2 j t k )
j , k [ t ] = 2 j 2 k a j , k ( 2 j t k )
Where:
d j , k = k g ( n ) a j 1 , k
a j , k = k h ( n ) a j 1 , k
Where d j , k are the high-frequency coefficients, a j , k are the low-frequency coefficients, g ( n ) and h ( n ) are the high- and low-pass filters, respectively, expressed as:
{ h ( n ) = φ , φ 1 , n g ( n ) =   φ , φ 1 , n
Where,
n h ( n ) = 2
n g ( n ) = 0
For DWT with Daubechies wavelet four as the scaling function, the following matrix applies [46]:
[ a 0 a 1 a 2 a 3 0 000 a 1 a 0 000 0 a 3 a 2 ] [ y 1 y n ] = [ s j 1 d j 1 ]
Where a are the low-frequency coefficients, y is the raw signal, s and d are the transformed signal, which can also be calculated as:
s j 1 ( 1 ) [ n ] = s j [ 2 n ] + 3 s j [ 2 n + 1 ]
d j 1 [ n ] ( 1 ) = s j [ 2 n + 1 ] + 1 4 3 s j 1 [ n ] 1 4 ( 3 2 ) s j 1               ( 1 ) [ n 1 ]
s j 1 [ n ] ( 2 ) = s j 1               ( 1 ) [ n ] + d j 1               ( 1 ) [ n + 1 ]
s j 1 [ n ] = 3 1 2 s j 1               ( 2 ) [ n ]
d j 1 [ n ] = 3 + 1 2 d j 1               ( 1 ) [ n ]
Further details on other wavelet types can be found in [48].

2.5.5. Variational Mode Decomposition (VMD)

Variational Mode Decomposition (VMD) is a technique that decomposes a complex time series into a discrete number of intrinsic mode functions (IMFs), each characterized by a specific center frequency and limited bandwidth. VMD formulates the decomposition problem as a constrained variational optimization task, resulting in greater noise robustness, reduced mode mixing, and a stronger mathematical foundation [27]
The constrained variational problem can be expressed through Equations (30) to (35) [49],
{ k = 1 K t [ ( δ ( t ) + j π t ) · u k ( t ) ] e j ω k t 2 2 }
s . t . k = 1 K u k = f ( t )
Where { u k ( t ) } = { u 1 ( t ) ,   u 2 ( t ) , ,   u K ( t )   } and { ω k } = { ω 1 ,   ω 2 ,     ω K } are shorthand notations for decomposed sub-sequences and their center frequencies, respectively; δ ( t ) is the Dirac distribution, the symbol denotes convolution, and e j ω k t is a phasor describing the rotation of the complex signal in time, with j 2 = 1 . The variational problem is addressed efficiently using the alternate direction method of multipliers (ADMMs). The modes u K ( t ) are updated by Wiener filtering in the Fourier domain with a filter tuned to the current center frequency; see Equation (32). Then the center frequencies ω k are updated as the center of gravity of the corresponding mode’s power spectrum, expressed as Equation (33), and finally the Lagrangian multiplier λ enforcing exact constraints is updated as the dual ascent by Equation (34). The updating procedure is repeated until the convergence condition is satisfied, as in Equation (35).
u ^ k n + 1 ( ω ) f ^ ( ω ) i < k u ^ i n + 1 ( ω ) i > k u ^ i n ( ω ) + λ ^ n ( ω ) 2 1 + 2 θ ( ω ω k n ) 2
ω k n + 1 0 ω | u ^ i n + 1 ( ω ) | 2 d ω 0 | u ^ i n + 1 ( ω ) | 2 d ω
λ ^ n + 1 ( ω ) λ ^ n ( ω ) + τ ( f ^ ( ω ) k u ^ k n + 1 ( ω ) )
k u ^ k n + 1 ( ω ) 2 2 u ^ k n 2 2 < ε
Here, u ^ k n + 1 ( ω ) , f ^ ( ω ) , and λ ^ n + 1 ( ω ) represent the Fourier transforms of u k n + 1 ( t ) , f ( t ) and respectively; n is the iteration, θ is a quadratic penalty term, τ is the iterative factor that indicates VMD’s noise tolerance, and ε denotes the convergence tolerance.

2.6. Model Training and Leakage Control

All experiments were implemented in Python within a Google Colab environment using TensorFlow/Keras for deep learning components and the pmdarima library for ARIMA. A global random seed was set for Python, NumPy, and TensorFlow to minimize stochastic variability and improve reproducibility. The full computational environment—including Python version, library versions, and package list—was documented for each experiment. Leakage prevention safeguards included: (i) strictly temporal, non-randomized data splits; (ii) all fitting operations (scaling, ARIMA order selection, decomposition parameterization) restricted to the training set; (iii) percentile-based event thresholds defined exclusively from training-period rainy days; and (iv) sliding-window alignment ensuring each prediction at t+1 depended solely on observations available at time t. Consistency checks verified correct date alignment, absence of missing values in model inputs, and proper separation among subsets.

2.7. Performance Evaluation

A multi-metric evaluation framework was adopted to capture different dimensions of forecast quality, recognizing that a single metric cannot adequately characterize model behavior under intermittent, heavy-tailed precipitation regimes [17,23].

2.7.1. Continuous Metrics

Model performance for precipitation amounts was quantified using four complementary metrics: Mean Absolute Error (MAE, Equation (36)), Root Mean Square Error (RMSE, Equation (37)), Nash-Sutcliffe Efficiency (NSE, Equation (38)), and Kling-Gupta Efficiency (KGE, Equations (39)–(42)).
MAE = i = 1 n | y i y ^ i | n
RMSE = i = 1 n ( y i y ^ i ) 2 n
Nash-Sutcliffe Efficiency (NSE) quantifies how well simulated and observed data fit a 1:1 line, and is evaluated using Equation (38), [31].
N S E = i = 1 n ( y i y ^ i ) 2 i = 1 n ( y i μ y ^ ) 2
Where:
n: total number of observations in the test set.
yᵢ: observed value.
y ^ i : model-predicted value.
Kling-Gupta Efficiency (KGE) is calculated using Equations (39) to (42):
K G E = 1 ( r 1 ) 2 + ( β 1 ) 2 + ( γ 1 ) 2
Donde:
Correlation (r):
r = c o v ( y , y ^ ) σ y σ y ^
Bias ratio (β):
β = μ y ^ μ y
Variability ratio (γ):
γ = C V y ^ C V y = σ y ^ / μ y ^ σ y / μ y
Where:
y: real observed value.
y ^ i : model-predicted value.
μ y : mean of the real variable.
μ y ^ : mean of the simulated variable.
σ y : standard deviation of observed values.
σ y ^ : standard deviation of simulated or predicted values.

2.7.2. Event-Based Categorical Metrics and Rainfall Thresholds

To assess the models’ ability to correctly detect rainfall occurrence and high-intensity events, continuous predictions were transformed into binary categorical forecasts using three precipitation thresholds: p ≥ 1.0 mm/day (rainfall occurrence), P90 (high-rainfall events), and P95 (extreme-rainfall events). The P90 and P95 thresholds were computed from rainy-day observations in the training subset and held constant for validation and test periods (Equations (43)–(45)). Three categorical metrics were computed from the resulting contingency table (Hits H, False Alarms F, Misses M):
Probability (POD):
P O D = H H + M
False Alarm Ratio (FAR):
F A R = F H + F
Critical Success Index (CSI):
C S I = H H + M + F
POD (Probability of Detection) ranges from 0 to 1, with 1 indicating perfect event detection. FAR (False Alarm Ratio) ranges from 0 to 1, with 0 indicating no false alarms. CSI (Critical Success Index) ranges from 0 to 1 and provides an integrated measure of detection skill that penalizes both misses and false alarms [51].

3. Results and Discussion

3.1. Hydroclimatic Characterization of the Station Network

To establish a hydroclimatic baseline for the subsequent model evaluation, the rainfall regime was first characterized at each station. Table 2 presents key descriptors of the daily precipitation distributions across the 18 stations.
Precipitation regimes across the Tungurahua station network exhibited high spatiotemporal variability. Coefficients of variation (CV) exceeded 147% at all stations, confirming that all series are highly variable relative to their means—a threshold classically associated with high variability in precipitation [51,52]. CV values exceeded 200% at stations BA, CB, AP, FC, HB, JA, and PL (located between 2270 and 3314 m a.s.l.), reflecting a predominance of dry days interrupted by intense rainfall episodes.
Skewness values ranged from 2.79 (MC) to 11.61 (BA), confirming strongly positive (right-skewed) distributions at all stations. Fisher kurtosis values exceeded 10 at all stations, reaching extreme values above 100 at PL (141.08), HB (96.71), BA (250.28), and RV (80.61), indicating strongly leptokurtic distributions with heavy right tails—a well-documented feature of Andean precipitation regimes [53,54]. These distributional characteristics have critical implications for model performance, as the dominance of low and zero precipitation values creates severe class imbalance that challenges all evaluated forecasting approaches, particularly at high-intensity thresholds (Figure 3).
Daily P90 thresholds ranged from 3.20 mm (CB) to 19.83 mm (RV), while P95 thresholds ranged from 5.90 mm (CB) to 28.12 mm (RV). The highest P90/P95 values were observed at RV (82.12% rainy days), PF, and PY (83.68% rainy days), indicating that these stations experienced more frequent and severe high-intensity rainfall episodes. In contrast, CB, AP, FC, and BA showed rainy-day frequencies at or below 55%, reflecting more intermittent regimes with longer dry spells punctuated by moderate-to-intense events.
The Walsh-Lawler Seasonality Index (SI) ranged from 0.16 (QP) to 0.33 (PF), classifying most stations as ‘equable with a short dry season’ and MC, QP, and HB as ‘very equable.’ The Precipitation Concentration Index (PCI) ranged narrowly from 8.64 to 9.54, with all stations classified as having uniform monthly distribution, indicating that precipitation is not strongly concentrated in a few months of the year. Nonetheless, seasonal patterns exist: the wettest month was most frequently June, while September was most commonly the driest month. Stations exhibited unimodal (8 stations: CH, PY, ET, AS, BA, GL, PF, and—marginally—others), bimodal (9 stations), or relatively uniform (1 station: QP) annual precipitation regimes (Figure 4), consistent with the diversity of hydroclimatic typologies reported for the Ecuadorian and Colombian Andes [51,55].
The diversity of seasonal regimes has direct implications for model design and calibration. Stations with clearly defined unimodal seasonality (e.g., CH, PY, AS) may benefit from explicit seasonal components in statistical models or recurrent architectures capable of capturing annual cycles. In contrast, bimodal stations (MC, PS, CM, PL, FC, CB, HB, JA, AP, RV) require models capable of capturing multiple periodicities, potentially justifying multi-scale decomposition approaches such as DWT or VMD [26,27]. The pronounced hydroclimatic heterogeneity across the station network—with stations separated by as little as 10–30 km exhibiting fundamentally different precipitation dynamics due to orographic effects and slope orientation [55], argues strongly against spatially uniform forecasting strategies and in favor of adaptive, station-specific model selection.

3.2. Overall Continuous Predictive Performance

LSTM and GRU achieved the best overall continuous predictive performance across the station network (Figure 5), producing the lowest cross-station mean MAE and RMSE values. Among all evaluated models, LSTM emerged as the most consistently robust, combining low error magnitudes with the highest mean NSE, a mean bias close to zero, and comparatively lower cross-station performance dispersion (Figure 5e). These results indicate that LSTM most reliably reproduced the temporal behavior of the observed daily precipitation series.
This pattern is consistent with the broader hydrological deep learning literature: LSTM and GRU gating mechanisms enable the retention of relevant temporal information across varying time lags, facilitating the representation of dependencies that are structurally difficult to capture with linear models [17,45]. Several studies have highlighted this advantage, particularly when the target process is sequential, non-stationary, and exhibits complex autocorrelation structures [21,22]. For routine operational water resource management in Tungurahua Province, LSTM is therefore the most robust option for continuous daily rainfall prediction, while GRU emerges as a well-balanced alternative offering comparable performance at lower computational cost—an important consideration for resource-constrained operational contexts.
ARIMA showed the weakest overall performance among the five models, with the highest mean RMSE and MAE and the lowest NSE (Figure 5), reflecting the inability of its linear structure to capture the complex, nonlinear daily precipitation dynamics under Andean mountain regimes, particularly at stations with bimodal seasonality [10,17]. This weaker continuous prediction skill is consistent with prior comparative studies in complex rainfall environments [20,24].
The hybrid models DWT-LSTM and VMD-GRU produced intermediate error levels but negative NSE and KGE values, indicating weaker overall agreement with observations than the standalone recurrent architectures. The failure of these hybrid models to outperform simpler deep learning architectures suggests that, for daily precipitation series of approximately 12 years with high intermittency, the additional complexity of multi-scale decomposition does not translate into predictive improvement. This may be attributed to error accumulation during component reconstruction, fragmentation of temporal information into sub-series with insufficient training data for individual neural networks, and the tendency of VMD to generate spurious modes under limited data conditions [31,45,56].

3.3. Station-Wise Variability in Model Performance

The network-scale performance hierarchy described above was not spatially homogeneous, as revealed by the station-wise ranking analysis (Figure 6). LSTM most frequently ranked first or second in MAE and RMSE across the 18 stations, consistent with its superior network-level performance. However, GRU and DWT-LSTM outperformed LSTM at several individual stations, confirming that no single universally optimal model exists and that the best-performing architecture depends on the local hydroclimatic context [10,17].
Stations where GRU outperformed LSTM likely feature precipitation patterns with shorter effective temporal dependencies, a context in which GRU’s simplified architecture with fewer parameters provides better generalization [21,56]. The occasional superiority of DWT-LSTM in NSE and KGE at certain stations suggests that wavelet decomposition can improve representation of multi-scale precipitation variability, particularly at stations with complex patterns combining long-term seasonal trends and high-frequency fluctuations [26,28].
Bias behavior also varied across stations and models (Figure 6e). LSTM showed nearly balanced bias, GRU showed slight underestimation (attributable to its single update-gate structure, which can dampen extreme event magnitudes [21]), DWT-LSTM showed overestimation (likely arising from reconstruction errors when high-frequency sub-series predictions are recombined [26,28]), and ARIMA showed moderate overestimation. The integrated ranking (Figure 6f) confirmed LSTM as the most consistently competitive model, followed by GRU, with ARIMA generally remaining among the weaker models and VMD-GRU showing limited cross-site stability.
These findings have direct implications for operational provincial forecasting networks: spatially homogeneous strategies applying a single model to all stations are suboptimal and likely to underperform at individual stations. An adaptive multi-model system—applying LSTM at bimodal stations with complex long-range dependencies (e.g., MC, PS, RV), GRU at stations with simpler unimodal regimes where parsimony improves generalization (e.g., CH, QP), and DWT-LSTM at stations with pronounced multi-scale variability (e.g., PL, BA)—can improve aggregate network accuracy while maintaining computational efficiency.

3.4. Rainfall Event Detection at the 1 mm/Day Threshold

Accurate estimation of rainfall magnitude does not necessarily translate into reliable detection of rainy-day occurrence. At the 1 mm/day threshold, LSTM, GRU, DWT-LSTM, and VMD-GRU achieved high POD values (Figure 7a), reflecting the capacity of deep learning models to capture the nonlinear atmospheric patterns preceding rainfall onset [17,25]. In contrast, ARIMA showed lower and more variable POD, consistent with its linear structure’s reduced sensitivity to the abrupt transitions characteristic of precipitation intermittency [12,18].
Despite their high sensitivity, neural network models—particularly VMD-GRU—produced elevated FAR values (Figure 7b), while ARIMA exhibited the lowest FAR. Neural networks trained by minimizing mean squared error (MSE) tend to generate small positive predictions under uncertain conditions, which, when compared against a 1 mm binary threshold, results in false alarms when observed precipitation is zero [23,31]. ARIMA’s autoregressive structure relies on the persistence of dry conditions and is therefore less prone to predicting spurious wet-to-dry transitions, though at the cost of missing real events.
When the POD-FAR trade-off was integrated through CSI (Figure 7c), ARIMA showed better median overall detection skill at the 1 mm threshold, while GRU and LSTM remained moderately competitive. This result—apparently contradictory to the superior continuous-metric performance of deep learning models—illustrates an important methodological limitation: optimizing for magnitude accuracy (MSE minimization) does not guarantee skill in binary occurrence classification. The trade-off between sensitivity and specificity has critical implications that depend on the operational context: in flood early warning and reservoir management, high POD is prioritized to avoid missing precipitation events that could trigger hazardous conditions, and a moderate FAR is accepted as an unavoidable operational cost [8,32].

3.5. Detection of High- and Extreme-Rainfall Events

Model skill deteriorated dramatically when the detection threshold was elevated from 1 mm/day to the P90 and P95 percentiles (Figure 8). At the P90 threshold, POD values were close to zero for most models across the 18 stations, while FAR values remained high (medians at or above 0.8–1.0 for GRU, DWT-LSTM, ARIMA, and VMD-GRU), representing the worst possible scenario: models fail to detect real high-intensity events while simultaneously generating abundant false alarms when predicted magnitudes exceed the P90 threshold without corresponding to observed extreme precipitation.
CSI values were very low across the station network for all models at P90. ARIMA showed slightly higher POD and CSI than the deep learning architectures—a result that apparently contradicts the superior continuous-metric performance of LSTM and GRU, and challenges the assumption that greater architectural complexity necessarily yields better extreme-event predictive capacity. ARIMA’s relative advantage at high thresholds can be explained by its autoregressive structure: under conditions of temporal persistence or clustering in the upper tail of the precipitation distribution, the linear autoregressive components can partially capture serial correlation among consecutive intense events [1,13]. Furthermore, by operating directly on the original precipitation series without the intermediate decomposition and reconstruction steps required by hybrid models, ARIMA avoids the cumulative errors associated with those processes [26,27].
The overall collapse in predictive skill was even more severe at the P95 threshold, where POD (Figure 8d) and CSI (Figure 8f) were practically zero for all models across the station network, and FAR values approached 1.0. ARIMA showed the only marginally detectable deviation from zero POD. This catastrophic failure at P95 reflects multiple inherent challenges that univariate approaches cannot adequately resolve. The 95th percentile defines the upper 5% of the daily precipitation distribution, corresponding to approximately 18 extreme events per station per year—yielding roughly 216 extreme event examples over the 12-year training period. This severe class imbalance between normal and extreme days fundamentally limits the ability of any supervised learning algorithm trained with standard loss functions to learn robust representations of extreme-event precursor conditions [29,53]. Neural network models optimized by MSE minimization naturally prioritize accurate prediction of frequent low-to-moderate precipitation events at the expense of rare extreme episodes [17,23].
Furthermore, extreme precipitation typically results from complex atmospheric configurations—involving interactions among synoptic-scale dynamics, mesoscale convective systems, and local orographic forcing—that cannot be inferred from precipitation history alone. Incorporating additional meteorological predictors (atmospheric humidity profiles, wind fields, sea surface temperature anomalies, large-scale circulation indices) has been shown to significantly improve extreme-event detection in complex terrain [20,57]. These results underscore the fundamental inadequacy of purely univariate approaches for operationally reliable extreme-event rainfall forecasting in the Ecuadorian Andes.

3.6. Integrated Cross-Domain Synthesis of Model Performance

Table 3 presents an integrated synthesis of model performance across all evaluation domains. No single model dominated across all forecasting objectives, and the optimal operational approach depends on the specific application context and station characteristics.
LSTM’s leadership in continuous prediction is attributable to its sophisticated memory cell architecture, with gating mechanisms that capture long-range nonlinear temporal dependencies—a fundamental characteristic for modeling seasonal persistence and autocorrelation in precipitation series [32,45]. GRU ranked second and demonstrated the most balanced behavior across all evaluation domains, making it particularly attractive for provincial-scale operational networks where a single model must perform acceptably across multiple forecasting objectives simultaneously [21].
DWT-LSTM showed an intermediate profile: weaker baseline performance for continuous prediction (3rd, integrated rank 3.06 ± 1.04) and 1 mm detection (4th), but comparatively better rankings for P90 and P95 detection (2nd in both). This differential behavior suggests that wavelet decomposition enables the LSTM component associated with high-frequency detail sub-series to capture abrupt intensity peaks characteristic of extreme events, even when the overall continuous metric performance is penalized by reconstruction errors [26,28,31].
ARIMA’s paradoxical first-place ranking for extreme-event detection (P90 and P95) relative to the deep learning architectures should be interpreted cautiously: as demonstrated by near-zero POD and FAR ≥ 0.8 at both thresholds across the station network (Figure 8), this relative superiority reflects not genuine predictive skill but rather the conservative prediction behavior of the autoregressive structure, which tends to revert toward historical means and thereby reduces the probability of spurious extreme-threshold crossings at the cost of near-zero detection rates. ARIMA’s ranking advantage at extreme thresholds therefore represents relative rather than absolute superiority [13,18]. VMD-GRU showed the lowest overall robustness consistently across all evaluation domains, occupying last or penultimate positions, and failed to show local compensating advantages at any individual station. VMD’s requirement for a priori specification of the number of intrinsic modes K may have led to spurious mode generation under limited and highly intermittent data conditions, fragmenting the precipitation signal into components without hydroclimatic significance [27,49].

3.7. Limitations and Future Research Directions

This study acknowledges several important limitations that constrain the generalizability of the findings and motivate future research. First, the approximately 12-year record length provides an insufficient number of P95 extreme-event examples for robust neural network training under standard loss functions, contributing to the near-zero detection skill observed at extreme thresholds. Second, the strictly univariate framework—employing daily precipitation as the sole predictive variable—excludes complementary meteorological predictors (atmospheric humidity, wind fields, temperature, large-scale circulation indices) that have been demonstrated to substantially improve extreme-event prediction [20,57]. Third, evaluation was confined to the one-step-ahead (t + 1) horizon; performance degradation at operationally relevant extended horizons (t + 3 to t + 14 days) was not assessed.
Future research should prioritize: (i) multivariate integration of atmospheric precursor variables to improve P90/P95 extreme-event detection; (ii) implementation of event-sensitive loss functions—including weighted MSE, quantile regression loss, and focal loss—that explicitly up-weight rare extreme events during training; (iii) probabilistic and ensemble forecasting frameworks that provide calibrated uncertainty estimates; (iv) multi-horizon evaluation (1–14 days) to characterize temporal performance degradation; and (v) transfer learning or data-augmentation strategies to overcome the limited extreme-event sample size. The application of physics-informed neural networks that incorporate hydrological process knowledge as structural constraints represents a particularly promising research direction for complex mountain environments.

4. Conclusions

This study provided a systematic and methodologically rigorous comparison of five univariate daily rainfall forecasting models—ARIMA, LSTM, GRU, DWT-LSTM, and VMD-GRU—across 18 meteorological stations in Tungurahua Province, Ecuador (2013–2025), using a comprehensive evaluation framework integrating continuous accuracy metrics (MAE, RMSE, NSE, KGE) and categorical event-detection metrics (POD, FAR, CSI) at three precipitation thresholds (1 mm/day, P90, P95). The following principal conclusions are drawn:
LSTM achieved the best overall performance for continuous daily rainfall prediction and rainfall-occurrence detection at the 1 mm/day threshold, while GRU delivered comparable skill at substantially lower computational cost. Both deep learning architectures significantly outperformed ARIMA in continuous accuracy metrics, demonstrating the superiority of nonlinear recurrent architectures over classical linear statistical models for daily rainfall prediction in topographically complex Andean environments.
Model performance was not spatially uniform: station-wise rankings varied considerably across the 18-station network, reflecting the strong hydroclimatic heterogeneity of the Ecuadorian Andean region. No single model ranked first at all stations, confirming that spatially uniform single-model forecasting strategies are suboptimal and that adaptive multi-model systems tailored to local hydroclimatic typology are warranted.
All evaluated models failed to provide operationally reliable detection of high-intensity (P90) and extreme (P95) rainfall events, producing near-zero POD and FAR values at or above 0.8 across the station network. Although ARIMA achieved the best relative ranking for extreme-event detection, its absolute predictive capacity was insufficient for operational early warning or reservoir management. This universal failure at high-intensity thresholds underscores the fundamental inadequacy of univariate approaches—regardless of architectural complexity—for reliably forecasting rare extreme precipitation events from precipitation history alone.
Hybrid decomposition models (DWT-LSTM, VMD-GRU) did not outperform the simpler deep learning architectures in continuous prediction; VMD-GRU showed the lowest overall robustness across all evaluation domains. These results suggest that, for daily precipitation series of approximately 12 years with high intermittency, multi-scale decomposition does not provide a net predictive benefit and may exacerbate error accumulation during reconstruction.
Future operational rainfall forecasting systems in the Ecuadorian Andes should prioritize multivariate predictor frameworks incorporating atmospheric precursor variables, event-sensitive loss functions, and probabilistic uncertainty quantification to address the critical limitations identified in this study, particularly for extreme-event prediction at operationally relevant thresholds.

Author Contributions

Conceptualization, L.C. and M.V.; methodology, M.V. and S.V.; software, M.V., S.V. and L.C.; validation, L.C. and M.V.; formal analysis, M.V.; investigation, S.V. and L.P.; resources, S.V., L.C., L.P. and M.V.; data curation, L.C., L.P. and M.V.; writing—original draft preparation, M.V. and L.C.; writing—review and editing, S.V. and L.P.; visualization, L.P.; supervision, L.C.; project administration, L.C.; funding acquisition, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Universidad Técnica de Ambato through the Research and Development Direction (DIDE) research project PFICM31 “Pronóstico de precipitaciones en la provincia de Tungurahua usando modelación estocástica”.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors greatly acknowledge the support of the Gobierno Provincial de Tungurahua and the Instituto Nacional de Metereología e Hidrología (INHAMI) for the weather stations data provided for this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Salih, S.Q.; et al. Integrative stochastic model standardization with genetic algorithm for rainfall pattern forecasting in tropical and semi-arid environments. Hydrol. Sci. J. 2020, 65, 1145–1157. [Google Scholar] [CrossRef]
  2. Xu, L.; Chen, N.; Zhang, X.; Chen, Z. A data-driven multi-model ensemble for deterministic and probabilistic precipitation forecasting at seasonal scale. Clim. Dyn. 2020, 54, 3355–3374. [Google Scholar] [CrossRef]
  3. Buhl, M.; Markolf, S. A review of emerging strategies for incorporating climate change considerations into infrastructure planning, design, and decision making. Sustain. Resilient Infrastruct. 2023, 8, 157–169. [Google Scholar] [CrossRef]
  4. Saha, A.; Singh, K.N.; Ray, M.; Rathod, S. A hybrid spatio-temporal modelling: An application to space-time rainfall forecasting. Theor. Appl. Climatol. 2020, 142, 1271–1282. [Google Scholar] [CrossRef]
  5. Li, Y.; Niu, Z.; Liu, C.; Yan, C. Credible Bayesian reliability model for structures with interval uncertain parameters. Structures 2022, 45, 2151–2161. [Google Scholar] [CrossRef]
  6. Chen, L.; et al. Global increase in the occurrence and impact of multiyear droughts. Science 2025, 387, 278–284. [Google Scholar] [CrossRef]
  7. Yaseen, Z.M.; Ali, M.; Sharafati, A.; Al-Ansari, N.; Shahid, S. Forecasting standardized precipitation index using data intelligence models: Regional investigation of Bangladesh. Sci. Rep. 2021, 11, 3435. [Google Scholar] [CrossRef]
  8. Ashok, S.P.; Pekkat, S. A systematic quantitative review on the performance of some of the recent short-term rainfall forecasting techniques. J. Water Clim. Chang. 2022, 13, 3004–3029. [Google Scholar] [CrossRef]
  9. Ebtehaj, I.; Bonakdari, H.; Zeynoddin, M.; Gharabaghi, B.; Azari, A. Evaluation of preprocessing techniques for improving the accuracy of stochastic rainfall forecast models. Int. J. Environ. Sci. Technol. 2020, 17, 505–524. [Google Scholar] [CrossRef]
  10. Abdaki, M.; Al-Ozeer, A.Z.; Alobaydy, O.; Al-Tayawi, A.N. Predicting rainfall in Nineveh Governorate in northern Iraq using machine learning time-series forecasting algorithm. Arab. J. Geosci. 2023, 16, 655. [Google Scholar] [CrossRef]
  11. Sheikh, M.R.; Coulibaly, P. Review of recent developments in hydrologic forecast merging techniques. Water 2024, 16, 301. [Google Scholar] [CrossRef]
  12. Duan, X.; Ma, R.; Zhang, X. Forecasting occurrence and quantity of monthly precipitation simultaneously while accounting for complex serial correlation. Int. J. Climatol. 2022, 42, 9494–9509. [Google Scholar] [CrossRef]
  13. Lana, X.; et al. Autoregressive process of monthly rainfall amounts in Catalonia (NE Spain) and improvements on predictability of length and intensity of drought episodes. Int. J. Climatol. 2021, 41, e2418–e2430. [Google Scholar] [CrossRef]
  14. Dahiya, P.; et al. Time series study of climate variables utilising a seasonal ARIMA technique for the Indian states of Punjab and Haryana. Discov. Appl. Sci. 2024, 6, 650. [Google Scholar] [CrossRef]
  15. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  16. Cho, K.; et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  17. Wani, O.A.; et al. Predicting rainfall using machine learning, deep learning, and time series models across an altitudinal gradient in the North-Western Himalayas. Sci. Rep. 2024, 14, 27876. [Google Scholar] [CrossRef]
  18. Pérez-Alarcón, A.; Garcia-Cortes, D.; Fernández-Alvarez, J.C.; Martínez-González, Y. Improving monthly rainfall forecast in a watershed by combining neural networks and autoregressive models. Environ. Process. 2022, 9, 53. [Google Scholar] [CrossRef]
  19. Rogers, J.K.B.; Mercado, T.C.R.; Galleto, F.A., Jr. Comparison of ARIMA boost, Prophet boost, and TSLM models in forecasting Davao City weather data. Indones. J. Electr. Eng. Comput. Sci. 2024, 34, 1092–1101. [Google Scholar] [CrossRef]
  20. Fahad, S.; Su, F.; Khan, S.U.; Naeem, M.R.; Wei, K. Implementing a novel deep learning technique for rainfall forecasting via climatic variables: An approach via hierarchical clustering analysis. Sci. Total Environ. 2023, 854, 158760. [Google Scholar] [CrossRef]
  21. Xu, H.; Guo, Z.; Cao, Y.; Cheng, X.; Zhang, Q.; Chen, D. Research on short-term precipitation forecasting method based on CEEMDAN-GRU algorithm. Sci. Rep. 2024, 14, 31885. [Google Scholar] [CrossRef]
  22. Widiasari, I.R.; Efendi, R. Utilizing LSTM-GRU for IoT-based water level prediction using multi-variable rainfall time series data. Informatics 2024, 11, 73. [Google Scholar] [CrossRef]
  23. Ridwan, W.M.; Sapitang, M.; Aziz, A.; Kushiar, K.F.; Ahmed, A.N.; El-Shafie, A. Rainfall forecasting model using machine learning methods: Case study Terengganu, Malaysia. Ain Shams Eng. J. 2021, 12, 1651–1663. [Google Scholar] [CrossRef]
  24. Barrera-Animas, A.Y.; et al. Rainfall prediction: A comparative analysis of modern machine learning algorithms for time-series forecasting. Mach. Learn. Appl. 2022, 7, 100204. [Google Scholar] [CrossRef]
  25. Singh, S.; Mukherjee, A.; Panda, J.; Choudhury, A.; Bhattacharyya, S. Analysis and forecasting of temporal rainfall variability over hundred Indian cities using deep learning approaches. Earth Syst. Environ. 2024, 8, 599–625. [Google Scholar] [CrossRef]
  26. Wu, X.; et al. The development of a hybrid wavelet-ARIMA-LSTM model for precipitation amounts and drought analysis. Atmosphere 2021, 12, 74. [Google Scholar] [CrossRef]
  27. Cui, X.; Wang, Z.; Pei, R. A VMD-MSMA-LSTM-ARIMA model for precipitation prediction. Hydrol. Sci. J. 2023, 68, 810–839. [Google Scholar] [CrossRef]
  28. Li, D.; et al. Prediction of rainfall time series using the hybrid DWT-SVR-Prophet model. Water 2023, 15, 1935. [Google Scholar] [CrossRef]
  29. Guo, T. Extreme precipitation strongly impacts the interaction of skewness and kurtosis of annual precipitation distribution on the Qinghai-Tibetan Plateau. Atmosphere 2022, 13, 1857. [Google Scholar] [CrossRef]
  30. Cabrera, D.; et al. Rainfall nowcasting using a Bayesian framework and long short-term memory multi-model estimation: Case study, Andean Ecuadorian tropical city. Preprint 2022. [Google Scholar] [CrossRef]
  31. Necesito, I.V.; Kim, D.; Bae, Y.H.; Kim, K.; Kim, S.; Kim, H.S. Deep learning-based univariate prediction of daily rainfall: Application to a flood-prone, data-deficient country. Atmosphere 2023, 14, 632. [Google Scholar] [CrossRef]
  32. Chidichimo, F.; Catelan, P.; Lupiano, V.; Straface, S.; Di Gregorio, S. A first simulation of the impact upon the Hidroagoyán Dam due to lahars triggered by an 1877-type Cotopaxi eruption in Ecuador. Geosciences 2022, 12, 376. [Google Scholar] [CrossRef]
  33. García, V.J.; Márquez, C.O.; Isenhart, T.M.; Rodríguez, M.; Crespo, S.D.; Cifuentes, A.G. Evaluating the conservation state of the páramo ecosystem: An object-based image analysis and CART algorithm approach for central Ecuador. Heliyon 2019, 5, e02701. [Google Scholar] [CrossRef]
  34. Fonseca, K.; et al. Assessing the potential of nature-based solutions as sustainable land and water management strategies in the high tropical Andean páramo ecosystem. J. Environ. Manag. 2024, 372, 123350. [Google Scholar] [CrossRef]
  35. Delgado-Gutierrez, E.; Canivell, J.; Bienvenido-Huertas, D.; Hidalgo-Sánchez, F.M. Adaptive comfort potential in different climate zones of Ecuador considering global warming. Energies 2024, 17, 2017. [Google Scholar] [CrossRef]
  36. Andrade, C.; Fonseca, A.; Santos, J.A.; Bois, B.; Jones, G.V. Historic changes and future projections in Köppen-Geiger climate classifications in major wine regions worldwide. Climate 2024, 12, 94. [Google Scholar] [CrossRef]
  37. Valipour, M.; Khoshkam, H.; Bateni, S.M.; Jun, C. Machine-learning-based short-term forecasting of daily precipitation in different climate regions across the contiguous United States. Expert Syst. Appl. 2024, 238, 121907. [Google Scholar] [CrossRef]
  38. Liu, Y.; Liu, S.; Chen, J. RLNformer: A rainfall levels nowcasting model based on Conv1D-Transformer for the northern Xinjiang area of China. Water 2023, 15, 3650. [Google Scholar] [CrossRef]
  39. NASA POWER. Data Access Viewer (DAV). Available online: https://power.larc.nasa.gov/data-access-viewer/ (accessed on 11 November 2025).
  40. Saleh, A.; Tan, M.L.; Yaseen, Z.M.; Zhang, F. Integrated machine learning models for enhancing tropical rainfall prediction using NASA POWER meteorological data. J. Water Clim. Chang. 2024, 15, 6022–6042. [Google Scholar] [CrossRef]
  41. Faccin, C.; Specht, L.P.; de Almeida Junior, P.O.B.; Schuster, S.L.; Pacheco, L.C. Comparison of climate data from NASA POWER reanalysis and data measured at surface weather stations for application in Brazilian paving projects. An. Inst. Geociênc. 2024, 47. [Google Scholar] [CrossRef]
  42. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1976. [Google Scholar]
  43. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631. [Google Scholar]
  44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  45. Waqas, M.; Humphries, U.W. A critical review of RNN and LSTM variants in hydrological time series predictions. MethodsX 2024, 13, 102946. [Google Scholar] [CrossRef] [PubMed]
  46. Jensen, A.; La Cour-Harbo, A. Ripples in Mathematics: The Discrete Wavelet Transform; Springer: Berlin, Germany, 2001. [Google Scholar] [CrossRef]
  47. Strömbergsson, D.; Marklund, P.; Berglund, K.; Saari, J.; Thomson, A. Mother wavelet selection in the discrete wavelet transform for condition monitoring of wind turbine drivetrain bearings. Wind Energy 2019, 22, 1581–1592. [Google Scholar] [CrossRef]
  48. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
  49. Zhou, H.; Schertzer, D.; Tchiguirinskaia, I. Combining recurrent neural networks with variational mode decomposition and multifractals to predict rainfall time series. Hydrol. Earth Syst. Sci. 2025, 29, 4437–4455. [Google Scholar] [CrossRef]
  50. Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 2007, 50, 885–900. [Google Scholar] [CrossRef]
  51. Morales-Acuña, E.; Linero-Cueto, J.R.; Canales, F.A. Assessment of precipitation variability and trends based on satellite estimations for a heterogeneous Colombian region. Hydrology 2021, 8, 128. [Google Scholar] [CrossRef]
  52. Zakwan, M.; Ara, Z. Statistical analysis of rainfall in Bihar. Sustain. Water Resour. Manag. 2019, 5, 1781–1789. [Google Scholar] [CrossRef]
  53. Haseeb, F.; Ali, S.; Ahmed, N.; Alarifi, N.; Youssef, Y.M. Comprehensive probabilistic analysis and practical implications of rainfall distribution in Pakistan. Atmosphere 2025, 16, 122. [Google Scholar] [CrossRef]
  54. Arias, P.A.; et al. Hydroclimate of the Andes Part II: Hydroclimate variability and sub-continental patterns. Front. Earth Sci. 2021, 8, 505467. [Google Scholar] [CrossRef]
  55. Rollenbeck, R.; Bendix, J. Rainfall distribution in the Andes of southern Ecuador derived from blending weather radar data and meteorological field observations. Atmos. Res. 2011, 99, 277–289. [Google Scholar] [CrossRef]
  56. Farfán-Durán, J.F.; Cea, L. Streamflow forecasting with deep learning models: A side-by-side comparison in Northwest Spain. Earth Sci. Inform. 2024, 17, 5289–5315. [Google Scholar] [CrossRef]
  57. Khaniani, A.S.; Motieyan, H.; Mohammadi, A. Rainfall forecast based on GPS PWV together with meteorological parameters using neural network models. J. Atmos. Sol. Terr. Phys. 2021, 214, 105533. [Google Scholar] [CrossRef]
  58. Montenegro, R.; Célleri, R.; Orellana-Alvear, J.; Muñoz, P.; Córdova, M. Precipitation forecasting using random forest over an Ecuadorian Andes basin. Meteorol. Atmos. Phys. 2025, 137, 5. [Google Scholar] [CrossRef]
  59. Gao, R.X.; Yan, R. Wavelets: Theory and Applications for Manufacturing; Springer: Berlin, Germany, 2011. [Google Scholar]
  60. Gomaa, E.; et al. Assessment of hybrid machine learning algorithms using TRMM rainfall data for daily inflow forecasting in Três Marias Reservoir, eastern Brazil. Heliyon 2023, 9, e18819. [Google Scholar] [CrossRef] [PubMed]
  61. Latif, S.D.; et al. Assessing rainfall prediction models: Exploring the advantages of machine learning and remote sensing approaches. Alex. Eng. J. 2023, 82, 16–25. [Google Scholar] [CrossRef]
Figure 1. Location of the study area and spatial distribution of the 18 meteorological stations in Tungurahua Province, Ecuador.
Figure 1. Location of the study area and spatial distribution of the 18 meteorological stations in Tungurahua Province, Ecuador.
Preprints 216724 g001
Figure 2. Station-wise temporal evolution of daily precipitation (mm/day) during the full observation period. Each panel corresponds to one station, with the station code indicated. Grey shading denotes the training subset, and the dashed vertical lines delimit the validation and test subsets.
Figure 2. Station-wise temporal evolution of daily precipitation (mm/day) during the full observation period. Each panel corresponds to one station, with the station code indicated. Grey shading denotes the training subset, and the dashed vertical lines delimit the validation and test subsets.
Preprints 216724 g002
Figure 3. Boxplot distribution of daily precipitation across meteorological stations: (a) typical precipitation regime (y-axis capped at 30 mm) and (b) full range including extreme values. Stations are ordered by increasing elevation.
Figure 3. Boxplot distribution of daily precipitation across meteorological stations: (a) typical precipitation regime (y-axis capped at 30 mm) and (b) full range including extreme values. Stations are ordered by increasing elevation.
Preprints 216724 g003
Figure 4. Mean intra-annual precipitation cycle across stations based on monthly rainfall averages. Stations are grouped by rainfall regime type (unimodal, bimodal, relatively uniform). The monthly mean ± 1 standard deviation across all stations is shown as the reference envelope.
Figure 4. Mean intra-annual precipitation cycle across stations based on monthly rainfall averages. Stations are grouped by rainfall regime type (unimodal, bimodal, relatively uniform). The monthly mean ± 1 standard deviation across all stations is shown as the reference envelope.
Preprints 216724 g004
Figure 5. Cross-station mean ± standard deviation of continuous performance metrics for the five evaluated models: (a) MAE, (b) RMSE, (c) NSE, (d) KGE, and (e) mean bias. Points represent the cross-station mean for each model; horizontal error bars represent ±1 standard deviation. Lower values indicate better performance for MAE, RMSE, and mean bias; higher values indicate better performance for NSE and KGE.
Figure 5. Cross-station mean ± standard deviation of continuous performance metrics for the five evaluated models: (a) MAE, (b) RMSE, (c) NSE, (d) KGE, and (e) mean bias. Points represent the cross-station mean for each model; horizontal error bars represent ±1 standard deviation. Lower values indicate better performance for MAE, RMSE, and mean bias; higher values indicate better performance for NSE and KGE.
Preprints 216724 g005
Figure 6. Station-wise ranking of the five evaluated models according to continuous performance metrics: (a) MAE, (b) RMSE, (c) NSE, (d) KGE, (e) absolute mean bias, and (f) integrated continuous metric ranking. Rankings are assigned within each station, where rank 1 indicates the best relative performance and rank 5 the worst.
Figure 6. Station-wise ranking of the five evaluated models according to continuous performance metrics: (a) MAE, (b) RMSE, (c) NSE, (d) KGE, (e) absolute mean bias, and (f) integrated continuous metric ranking. Rankings are assigned within each station, where rank 1 indicates the best relative performance and rank 5 the worst.
Preprints 216724 g006
Figure 7. Station-wise distribution of categorical metrics for rainfall event detection at the 1 mm/day threshold: (a) Probability of Detection (POD), (b) False Alarm Ratio (FAR), and (c) Critical Success Index (CSI). Boxplots show the median, interquartile range, and full distribution across the 18 stations.
Figure 7. Station-wise distribution of categorical metrics for rainfall event detection at the 1 mm/day threshold: (a) Probability of Detection (POD), (b) False Alarm Ratio (FAR), and (c) Critical Success Index (CSI). Boxplots show the median, interquartile range, and full distribution across the 18 stations.
Preprints 216724 g007
Figure 8. Station-wise distribution of categorical metrics for high- and extreme-rainfall event detection: (a) POD at P90, (b) FAR at P90, (c) CSI at P90, (d) POD at P95, (e) FAR at P95, and (f) CSI at P95. All metrics are computed from the test subset.
Figure 8. Station-wise distribution of categorical metrics for high- and extreme-rainfall event detection: (a) POD at P90, (b) FAR at P90, (c) CSI at P90, (d) POD at P95, (e) FAR at P95, and (f) CSI at P95. All metrics are computed from the test subset.
Preprints 216724 g008
Table 1. Geographic and record characteristics of the 18 meteorological stations included in this study.
Table 1. Geographic and record characteristics of the 18 meteorological stations included in this study.
No. Station ID Canton UTM Easting (m) UTM Northing (m) Elevation (m a.s.l.) Record Start Record End Records (n) Missing Data (%)
1 Chiquiurcu CH Ambato 743,787 9,866,064 3875 2013-02-16 2025-04-20 4447 0.00
2 Mula Corral MC Ambato 741,602 9,867,738 3875 2013-03-13 2025-04-28 4430 0.00
3 Pampas de Salasaca PS Mocha 757,194 9,844,510 3760 2013-03-13 2025-04-28 4430 0.00
4 Quisapincha QP Ambato 753,559 9,865,921 3670 2013-02-16 2024-08-23 4207 0.00
5 Embalse Pisayambo PY Píllaro 790,071 9,881,472 3604 2013-02-08 2025-04-25 4460 0.00
6 Calamaca CM Ambato 742,705 9,858,860 3437 2013-03-02 2025-03-07 4389 0.00
7 Pilahuin PL Ambato 752,358 9,856,011 3314 2013-01-19 2025-02-16 4412 0.05
8 Escuela Tasinteo ET Píllaro 777,991 9,870,930 3300 2013-03-20 2025-04-15 4410 0.00
9 U. E. Pedro Fermín Cevallos FC Cevallos 765,641 9,849,972 2910 2013-03-13 2025-04-28 4430 0.00
10 Hacienda Cunchibamba CB Ambato 767,300 9,874,583 2861 2013-08-28 2025-02-20 4195 0.00
11 Huambaló HB Pelileo 774,743 9,846,179 2800 2013-02-23 2025-05-06 4456 0.02
12 U. E. Jorge Álvarez JA Píllaro 772,342 9,870,622 2770 2013-03-05 2025-04-25 4435 0.00
13 U. E. Antonio José de Sucre AS Patate 778,876 9,860,556 2700 2013-03-14 2023-05-08 3708 0.00
14 Aeropuerto AP Ambato 769,929 9,865,679 2590 2013-02-08 2025-05-06 4472 0.00
15 U. E. Benjamín Araujo BA Patate 778,205 9,856,142 2270 2013-02-23 2025-02-19 4380 0.07
16 Hacienda Guadalupe GL Patate 778,853 9,849,321 2013 2013-01-25 2025-03-23 4441 0.00
17 Parque de la Familia PF Baños 791,471 9,845,439 1695 2013-03-15 2025-03-22 4391 0.00
18 Río Verde RV Baños 800,465 9,845,046 1529 2013-02-06 2025-04-21 4458 0.00
Notes: U. E. = Unidad Educativa (Educational Unit); coordinates are reported in UTM (m), WGS84, Zone 17S.
Table 2. Rainfall regime descriptors derived from daily precipitation series for the 18 stations in Tungurahua Province, Ecuador.
Table 2. Rainfall regime descriptors derived from daily precipitation series for the 18 stations in Tungurahua Province, Ecuador.
Station Mean (mm) Median (mm) Std Dev (mm) Max (mm) Skewness Kurtosis CV (%) P90 (mm) P95 (mm) Rainy Days (%) SI (WL) PCI Wettest Month Driest Month Regime
CH 2.94 1.10 4.52 45.4 2.87 11.78 154.0 8.34 12.10 74.21 0.21 8.98 June Sep Unimodal
MC 2.57 0.90 4.01 35.6 2.79 10.40 155.9 7.40 10.70 73.39 0.19 8.88 June Sep Bimodal
PS 2.52 0.80 4.06 45.8 3.18 15.49 160.9 7.30 10.00 71.53 0.21 8.78 Apr Sep Bimodal
QP 2.66 1.10 4.06 39.3 3.09 14.04 152.4 7.20 10.70 78.13 0.16 8.64 Apr Sep Uniform
PY 3.58 1.60 5.26 53.8 2.94 12.60 147.1 9.70 13.61 83.68 0.26 9.14 June Oct Unimodal
CM 1.67 0.30 3.02 29.7 3.28 14.79 180.4 5.00 7.65 60.99 0.19 8.71 June Sep Bimodal
PL 1.98 0.60 4.23 107.0 8.56 141.08 213.4 5.20 8.10 71.12 0.22 8.88 Mar Sep Bimodal
ET 2.19 0.60 3.92 46.7 3.56 18.47 179.4 6.40 9.90 66.83 0.20 8.86 May Sep Unimodal
FC 1.27 0.10 2.82 38.6 4.54 30.23 221.3 3.80 6.46 54.38 0.19 8.74 June Feb Bimodal
CB 1.14 0.10 2.93 41.0 5.28 39.90 256.8 3.20 5.90 51.66 0.26 9.08 Nov Sep Bimodal
HB 2.10 0.30 4.63 108.0 6.96 96.71 220.8 6.10 9.30 60.48 0.19 8.75 June Oct Bimodal
JA 1.46 0.20 3.12 41.5 4.23 26.00 214.3 4.30 7.30 57.18 0.28 9.17 Apr Sep Bimodal
AS 2.71 0.90 4.60 43.8 3.45 16.31 169.4 7.40 11.40 75.11 0.28 9.49 June Feb Unimodal
AP 1.33 0.10 3.04 35.1 4.46 27.57 229.1 4.00 6.90 52.96 0.24 9.12 Apr Sep Bimodal
BA 1.89 0.10 5.29 153.0 11.61 250.28 279.5 5.40 9.51 55.02 0.30 9.44 May Sep Unimodal
GL 1.92 0.30 3.81 52.0 3.78 20.92 198.0 5.90 9.50 63.21 0.31 9.34 June Jan Unimodal
PF 3.50 0.70 6.05 63.5 2.98 12.08 173.0 10.90 15.60 72.72 0.33 9.54 July Oct Unimodal
RV 7.84 3.70 13.73 255.0 6.69 80.61 175.0 19.83 28.12 82.12 0.25 8.97 June Nov Bimodal
Notes: Minimum recorded precipitation at all stations was 0 mm. CV: Coefficient of Variation; SI (WL): Walsh-Lawler Seasonality Index; PCI: Precipitation Concentration Index.
Table 3. Integrated cross-domain synthesis of model performance across continuous prediction, station-wise continuous ranking, and categorical rainfall event detection at 1 mm/day, P90, and P95 thresholds. Ordinal labels in parentheses indicate relative rank within each evaluation domain (1st = best, 5th = worst).
Table 3. Integrated cross-domain synthesis of model performance across continuous prediction, station-wise continuous ranking, and categorical rainfall event detection at 1 mm/day, P90, and P95 thresholds. Ordinal labels in parentheses indicate relative rank within each evaluation domain (1st = best, 5th = worst).
Model Continuous Skill Stations Ranked Best Detection at 1 mm P90 Detection P95 Detection Main Performance Profile
LSTM 2.26 ± 0.72 (1st) 8 stations 2.70 ± 0.59 (1st) 3.40 ± 0.48 (4th) 3.17 ± 0.24 (t-4th) Best overall for continuous prediction and baseline detection; weaker at high-intensity thresholds
GRU 2.51 ± 0.82 (2nd) 7 stations 2.73 ± 0.47 (2nd) 2.95 ± 0.62 (3rd) 3.09 ± 0.31 (3rd) Most balanced model; competitive continuous performance and moderate categorical robustness across all thresholds
DWT-LSTM 3.06 ± 1.04 (3rd) 5 stations 3.13 ± 0.82 (4th) 2.86 ± 0.59 (2nd) 2.87 ± 0.54 (2nd) Intermediate; comparatively stronger for high- and extreme-rainfall detection due to wavelet-based multi-scale decomposition
ARIMA 3.49 ± 0.85 (4th) 0 stations 3.07 ± 0.76 (3rd) 1.81 ± 0.73 (1st) 2.05 ± 0.69 (1st) Weakest for continuous prediction, but best relative performance for P90/P95 detection; absolute skill insufficient for operations
VMD-GRU 3.69 ± 0.64 (5th) 0 stations 3.36 ± 0.64 (5th) 3.51 ± 0.61 (5th) 3.17 ± 0.24 (t-4th) Lowest overall robustness; poor performance across all evaluation domains with no compensating local advantages
Note: Continuous skill is expressed as the mean integrated rank across MAE, RMSE, NSE, and KGE (lower values indicate better overall performance). t-4th: tied for 4th place.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated