Advancements and Challenges in Photovoltaic Power Forecasting: A Comprehensive Review

Paolo Di Leo; Alessandro Ciocia; Gabriele Malgaroli; Filippo Spertino

doi:10.20944/preprints202502.2234.v1

Submitted: 27 February 2025

Posted: 27 February 2025


Abstract

The fast growth of photovoltaic (PV) power generation requires dependable forecasting methods to support the efficient integration of solar energy into power systems. This study presents an up-to-date, systematic analysis of the models and methods used for PV power prediction. It begins with a new taxonomy that classifies PV forecasting models according to time horizon, architecture, and selection criteria matched to specific application areas. An overview of the most popular forecasting techniques, including physical models, statistical methodologies, machine learning algorithms, and hybrid approaches, is provided; their respective advantages and disadvantages are put into perspective for different forecasting tasks. The paper also explores advanced model optimization methodologies, including hyperparameter tuning, feature selection, and the use of evolutionary and swarm intelligence algorithms, which have shown promise in enhancing the accuracy and efficiency of PV power forecasting models. The review includes a detailed examination of performance metrics and evaluation frameworks, the consequences of different weather conditions for renewable energy generation, and the operational and economic implications of forecasting performance. The paper also highlights recent advancements in the field, including the use of deep learning architectures, the incorporation of diverse data sources, and the development of real-time and on-demand forecasting solutions. Finally, the paper identifies key challenges and future research directions, emphasizing the need for improved model adaptability, data quality, and computational efficiency to support the large-scale integration of PV power into future energy systems.
By providing a holistic and critical assessment of the PV power forecasting landscape, this review aims to serve as a valuable resource for researchers, practitioners, and decision-makers working towards the sustainable and reliable deployment of solar energy worldwide.

Keywords: photovoltaic power forecasting; solar energy; forecasting models; machine learning; hybrid approaches; optimization strategies; performance evaluation; future directions

Subject: Engineering - Electrical and Electronic Engineering

1. Introduction

Photovoltaic (PV) generation plays a dominant role in the transition towards clean energy, thanks to the high reliability of PV plants [1,2] and the abundance of the solar resource. As the global installed PV capacity keeps expanding, PV power generation forecasting becomes increasingly crucial for efficient grid management and for the integration of renewables into the power supply system. The global PV market has experienced remarkable growth in recent years, with cumulative capacity reaching 1,186 GW by the end of 2022 [3]. This rapid expansion is expected to continue as countries worldwide strive to reduce their reliance on fossil fuels and meet their ambitious climate targets. Figure 1 illustrates the historical growth and future projections of global cumulative installed PV capacity based on data from the IEA PVPS report. The growth of solar PV electricity generation is particularly evident in Brazil, where installed capacity rose from 2,455 MW in 2018 to 47,033 MW in August 2024 [4].

However, the large-scale integration of PV power into electrical grids poses severe challenges, stemming from the inherent variability and uncertainty of solar radiation. The power output of PV plants is sensitive primarily to weather parameters such as cloud cover, temperature, and humidity, which can change rapidly and sometimes unpredictably [5]. Such fluctuations disturb the supply-demand equilibrium, cause variations in voltage and frequency, and increase the demand for ancillary services, leading to grid instability and unreliability [6]. Accurate PV power forecasting has therefore become one of the critical solutions to these challenges, allowing the efficient integration of solar energy into power systems. By providing reliable predictions of PV power generation at various temporal and spatial scales, forecasting models enable better planning, scheduling, and operation of the grid [7]. Grid operators can leverage PV power forecasts to optimize dispatch decisions, reduce reserve requirements, and minimize the costs associated with balancing supply and demand [8]. Accurate forecasts are also crucial for PV plant owners and investors participating in electricity markets, supporting financial risk management and revenue maximization [9]. Moreover, accurate forecasts are a prerequisite for investment planning, allowing the financial viability of PV projects to be evaluated. Over the past decade, PV power forecasting techniques have evolved from simple statistical methods to more complex machine learning algorithms and, at their most sophisticated, hybrid physical-statistical models. Despite this progress, challenges remain: the variability and intermittency of solar power under changing weather conditions still complicate accurate forecasting.
Traditional forecasting models often fail to capture the complex, nonlinear relationships between atmospheric parameters and PV output [10]. In addition, a persistent shortage of high-quality, large-scale training data has hampered the development of robust models. The purpose of this review is not only to provide an overview of current cutting-edge work on PV power forecasting models, but also to identify the strengths and weaknesses of the different approaches and thereby suggest strategies for improving PV forecasting. The study focuses on the development of advanced optimization techniques and the application of hybrid models to predict PV power output accurately. Through a systematic analysis of various forecasting approaches, this work aims to contribute to the development of more accurate and reliable PV power forecasting. The paper is structured as follows. Section 2 offers a taxonomy of PV power forecasting models based on temporal horizons, architectural features, and selection criteria tied to specific application contexts. Section 3 discusses the state-of-the-art techniques in PV power forecasting, with special emphasis on artificial neural networks (ANNs), support vector machines (SVMs), and ensemble and hybrid approaches. Section 4 highlights model optimization techniques, including hyperparameter tuning, feature selection and engineering, and integration with evolutionary and swarm intelligence algorithms. Section 5 provides insights into the evaluation of PV forecasting models under a range of weather conditions, benchmarking, and model comparisons, along with their economic and operational implications.
Section 6 discusses recent innovations in PV power forecasting, such as the integration of advanced deep learning architectures, the incorporation of diverse data sources, the application of metaheuristic optimization algorithms, and the development of real-time and on-demand forecasting solutions. Section 7 discusses the primary challenges and future research directions in PV power forecasting with respect to forecast accuracy, improved model adaptability, and practical issues concerning data quality, computational efficiency, and grid integration. Finally, Section 8 summarizes the paper with conclusions based on its findings and recommendations for future work. By providing a holistic and current view of the rapidly advancing field of PV power forecasting, this review aims to serve as a valuable reference point for researchers, practitioners, and decision-makers striving toward large-scale solar energy integration into modern power systems. Such knowledge and recommendations are expected to spur further research and innovation and to contribute to the development of more accurate, reliable, and practical PV power forecasting solutions.

2. Taxonomy

To organize the wide variety of PV power forecasting models, a taxonomy is first established that differentiates models by prediction horizon, architecture, and selection criteria suited to the respective applications.

The proposed taxonomy, depicted in Figure 2, gives an organized overview of the major dimensions along which PV power forecasting models can be categorized. The temporal horizon dimension classifies models according to the time frame of the forecast: intra-hour, intra-day, day-ahead, medium-term, and long-term horizons. The architecture dimension distinguishes physical models, data-driven models (which can be further divided into statistical and machine learning methods), and hybrid models that combine several approaches. Finally, the selection criteria dimension covers aspects such as application context, accuracy requirements, interpretability, computational efficiency, and data availability, which guide the choice of a forecasting model for a particular use case. This taxonomy forms the basis for the following sections on state-of-the-art techniques, model optimization strategies, and evaluation frameworks for PV power forecasting.

2.1. Temporal Horizon Classification

One of the primary dimensions for classifying PV power forecasting models is the temporal horizon, which refers to the time scale of the predictions. The right temporal horizon depends on the demands of the specific application, such as grid operation, energy trading, or maintenance scheduling [6]. The main categories and their corresponding applications are presented in Table 1.

2.2. Model Architecture Classification

PV power forecasting models can also be classified by their underlying architecture or mathematical structure. The architecture of a model has substantial consequences for its computational cost, interpretability, and performance.

2.3. Physical Models

Physical models, also known as white-box models, are derived from the fundamental mechanisms of PV power generation [10]. They require detailed information about the PV system, such as module specifications, orientation, and shading effects, and they take solar irradiance, temperature, and wind speed data as inputs [5]. According to [11], accurate weather and irradiance measurements make it possible both to predict PV power generation and to detect PV system faults. Some examples of physical models include:

Clear sky models: These models estimate the maximum achievable PV power under ideal cloudless conditions by combining astronomical calculations with atmospheric measurements.
Decomposition models: These models decompose solar irradiance into direct and diffuse fractions for better PV performance forecasting across diverse sky conditions.
Semi-empirical models: These models combine physical equations with empirically obtained coefficients to predict PV power output from selected environmental factors.

Physical models offer insight into the factors that drive performance and remain easy to interpret. However, their major disadvantages are the extensive system-specific information they require and the difficulty of representing the range and complex nonlinear interrelationships of the factors involved [7].
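As an illustration of the semi-empirical category, the following minimal sketch uses a common simplified formulation that is not taken from the reviewed works: cell temperature is approximated from ambient temperature and irradiance through the nominal operating cell temperature (NOCT), and DC power scales linearly with irradiance with a linear temperature correction. The function name and the default values (5 kW rated power, NOCT of 45 °C, temperature coefficient of -0.4 %/°C) are assumptions chosen only for the example.

```python
def pv_power_semi_empirical(irradiance_w_m2, ambient_temp_c,
                            rated_power_kw=5.0, noct_c=45.0,
                            temp_coeff_per_c=-0.004):
    """Estimate PV DC power (kW) from plane-of-array irradiance and ambient temperature.

    All default parameter values are illustrative assumptions.
    """
    if irradiance_w_m2 <= 0:
        return 0.0
    # NOCT approximation: NOCT is defined at 800 W/m2 and 20 C ambient temperature
    cell_temp_c = ambient_temp_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2
    # Linear irradiance scaling relative to standard test conditions (1000 W/m2, 25 C)
    irradiance_ratio = irradiance_w_m2 / 1000.0
    temperature_factor = 1.0 + temp_coeff_per_c * (cell_temp_c - 25.0)
    return max(rated_power_kw * irradiance_ratio * temperature_factor, 0.0)


# Hypothetical operating point: 800 W/m2 at 20 C ambient temperature
print(pv_power_semi_empirical(800.0, 20.0))
```

The empirical coefficients (NOCT and the temperature coefficient) would normally come from the module datasheet or be fitted to measurements, which is what makes such a model system-specific.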

2.3.1. Data-Driven Models

Data-driven models, or black-box models, use historical data on PV power generation and its associated parameters to learn patterns and interrelationships without explicit knowledge of the physical processes [5]. More broadly, forecasting methods are commonly divided into three major approaches [5]:

Physical approaches: These models implement physical equations to convert solar irradiance data into predictions of produced electricity. Typical input sources include numerical weather predictions (NWP), satellite images, and data from meteorological stations [5].
Statistical approaches: These models build correlations between input parameters and output based on concepts such as persistence or time series. They encompass traditional statistical methods (time series and regression) and artificial intelligence models, such as neural networks, LSTM, and SVM [5].
Hybrid approaches: These models are an amalgamation of physical correlations with statistical techniques to improve forecasting accuracy. They generally use technical parameters of PV panels estimated from historical data [5].

The performance of these models depends on several factors, including weather conditions, forecasting horizon, geographical location, and data quality. For very short-term forecasts (a few minutes), statistical approaches based on historical data are simple and effective. Satellite images allow horizons of a few hours, while NWP data are well suited to horizons beyond 6 hours [5]. Statistical models include a variety of techniques:

Regression models: Regression models are used to find the linear or nonlinear relations between PV power output and explanatory variables such as solar irradiance, temperature, and time of day [12].
Time series models: These models establish temporal dependencies in PV power generation data, using methods such as autoregressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), and exponential smoothing [6].
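To make the statistical baselines concrete, the sketch below implements two of the simplest techniques named above: a persistence forecast and simple exponential smoothing, both producing one-step-ahead predictions. The function names, the smoothing factor `alpha`, and the sample series are assumptions made for illustration; ARIMA-type models would normally be fitted with a dedicated statistics library.

```python
def persistence_forecast(series):
    """One-step-ahead persistence: the forecast for step t+1 is the value at step t.

    Returns forecasts for steps 1..n-1 of the input series.
    """
    return list(series[:-1])


def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead simple exponential smoothing forecasts for steps 1..n-1.

    alpha in (0, 1] weights the most recent observation; alpha = 1 reduces
    the method to persistence.
    """
    level = series[0]
    forecasts = []
    for value in series[:-1]:
        # Update the smoothed level with the newest observation
        level = alpha * value + (1.0 - alpha) * level
        forecasts.append(level)
    return forecasts


# Hypothetical PV power samples in kW, for illustration only
power_kw = [0.0, 1.0, 2.0, 3.0]
print(persistence_forecast(power_kw))
print(exponential_smoothing_forecast(power_kw, alpha=0.5))
```

Persistence is often used as the reference against which more elaborate models are judged, since any useful forecasting model should at least beat it.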

Machine learning models, in contrast, use advanced algorithms to uncover complex patterns and relationships in data [13]. Commonly used machine learning techniques for PV power forecasting comprise [14]:

Artificial neural networks (ANNs): These consist of interconnected nodes capable of learning nonlinear relationships between input variables and PV power [15].
Support vector machines (SVMs): SVMs seek the optimal hyperplane that either separates PV power output classes or, in the regression setting, predicts continuous output values [6].
Ensemble models: These combine several distinct forecasting models, such as decision trees or ANNs, to boost predictive accuracy and robustness [7].
Deep learning models: These extend ANNs to deep architectures, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, to capture latent hierarchical and temporal patterns in PV power data [5].

Data-driven models have performed well on many PV power forecasting tasks and, overall, tend to be more accurate than physical models [5]. However, they require large amounts of historical data and may be less interpretable than physical models.

2.3.2. Hybrid Models

Hybrid models combine the strengths of physical and data-driven approaches to improve the accuracy and robustness of PV power forecasting. These models draw on domain expertise and the interpretability of physical models while exploiting the flexibility and learning capabilities of data-driven techniques. By incorporating alternative modeling paradigms, hybrid models can effectively capture the complex dynamics and uncertainties associated with PV power generation [5,7,12]. An in-depth discussion of hybrid model architectures, advantages, and applications can be found in Section 3.4.

2.4. Selection Framework Based on Application Context

Although classification by temporal horizon and model architecture provides a useful framework for appraising PV power forecasting models, the actual choice of model needs to be based on the specific application context [6]. Different applications may prioritize different aspects, such as accuracy, interpretability, computational efficiency, or data requirements [5]. Table 2 provides a selection framework that links useful model characteristics to common application contexts, guiding practitioners and researchers to the approaches appropriate for their requirements.

Grid operation applications require both high accuracy and short computation times to support real-time decision-making [12]; for these, data-driven models such as machine learning (ML) and deep learning (DL) approaches, as well as hybrid physical-statistical models, are often preferred [6]. Energy trading applications, on the other hand, may give priority to probabilistic forecasts and uncertainty quantification in order to manage financial risks. These requirements are well met by ensemble models and Bayesian approaches, which can directly provide prediction intervals and capture the inherent uncertainty in PV power generation [7]. Applications related to PV plant monitoring and performance evaluation may prefer interpretability and system-specific insights over predictive accuracy [10]. In such situations, physical models and semi-empirical approaches are often preferred because they provide clear explanations of the factors that influence PV performance [5]. Regional forecasting applications, where predictions are aggregated over multiple PV systems, need modeling approaches that can handle spatial dependencies and scale efficiently [6]. Spatio-temporal models and hierarchical approaches are promising candidates for these tasks because they can capture spatial correlations and deal with large-scale data [8]. It is important to note that the selection framework in Table 2 serves as a general guide: depending on the specific characteristics of the PV system, data availability, and computational resources, the optimal model may differ from case to case [5]. In most cases it is useful in practice to compare several models against one another and evaluate their performance with the relevant evaluation metrics and validation strategies, which are examined in detail in Section 5.

3. State-of-the-Art Forecasting Techniques

This section reviews a range of the most recent and state-of-the-art techniques in PV power forecasting, with special emphasis on three main categories: artificial neural networks (ANNs), support vector machines (SVMs), and ensemble and hybrid approaches. Before each category is discussed in detail, statistical measures that assess PV power forecasting models in terms of accuracy and performance are first introduced.

3.1. Statistical Measures for Forecast Accuracy

Statistical measures allow a quantitative evaluation of the accuracy and reliability of PV power forecasting models. Forecast quality is assessed by comparing predicted values with actual measurements [6]. The following statistical indicators are widely employed for PV power forecasting:

Mean Absolute Error (MAE): This is the average absolute difference between the predicted and the observed values, and it indicates the typical magnitude of the forecast error [5,10]. It is given as:

$MAE = \frac{1}{n} \sum | P_{pred} - P_{obs} |$

(1)

where $P_{pred}$ is the predicted power, $P_{obs}$ is the observed power, and n is the number of data points.
Root Mean Square Error (RMSE): This is a quadratic scoring rule that provides an average magnitude for the forecast errors and weighs larger errors more heavily than smaller [5,6,10,12]. It can be expressed as:

$RMSE = \sqrt{\frac{1}{n} \sum {(P_{pred} - P_{obs})}^{2}}$

(2)

Because RMSE is well known to be very sensitive to outliers, efforts to reduce a model's RMSE should focus on outlier analysis.
Mean Absolute Percentage Error (MAPE): This expresses the forecast error as a percentage of the observed values and is considered a dimension-free measure of accuracy [5,6,12]. It is given as:

$MAPE = \frac{100}{n} \sum |\frac{P_{pred} - P_{obs}}{P_{obs}}|$

(3)

Because it is expressed as a percentage, MAPE makes it possible to compare model performance across PV systems of different power ratings.

While these statistical measures provide valuable insights into the overall accuracy of the forecasting models, they should be interpreted in the context of the specific application requirements and the characteristics of the PV power data [6]. For instance, RMSE may be more relevant for applications that are sensitive to large forecast errors, while MAPE may be more appropriate for comparing models across different PV systems [5].
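The three metrics in Equations (1)-(3) can be computed directly from paired lists of predicted and observed power, as in the minimal sketch below. One point not covered by the formulas is that MAPE is undefined whenever the observed power is zero, which happens every night for a PV plant; the sketch simply skips such samples, which is an assumption of this example rather than a convention taken from the reviewed literature. The function names, the `min_obs` threshold, and the sample values are likewise illustrative.

```python
import math


def mae(pred, obs):
    """Mean absolute error, Eq. (1)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error, Eq. (2)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mape(pred, obs, min_obs=1e-6):
    """Mean absolute percentage error in percent, Eq. (3).

    Assumption: samples with |P_obs| <= min_obs (e.g. night-time zeros)
    are skipped, because the ratio is undefined for them.
    """
    pairs = [(p, o) for p, o in zip(pred, obs) if abs(o) > min_obs]
    if not pairs:
        raise ValueError("MAPE is undefined: no non-zero observations")
    return 100.0 / len(pairs) * sum(abs((p - o) / o) for p, o in pairs)


# Hypothetical power values in kW, for illustration only
predicted = [1.0, 2.0, 4.0]
observed = [1.0, 3.0, 2.0]
print(mae(predicted, observed), rmse(predicted, observed), mape(predicted, observed))
```

In this example the single large error (4.0 predicted against 2.0 observed) raises RMSE above MAE, which reflects the heavier weighting of large errors described above.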

3.2. Artificial Neural Networks (ANNs)

Owing to their ability to learn complex, nonlinear relationships between input variables and output power [15], ANNs have emerged as one of the most popular and powerful methods for PV power forecasting. ANNs comprise interconnected nodes organized in layers, loosely modeled on the structure and function of biological neural networks [7]. Important advancements in ANN-based PV power forecasting include:

Deep learning architectures: Deep neural networks (DNNs) with multiple hidden layers have demonstrated better performance than shallow architectures. Notably, convolutional neural networks (CNNs) and LSTM networks have efficiently captured spatial and temporal dependencies, respectively [5].
Hybrid ANN models: Hybridizing ANNs with other techniques, such as wavelet transforms or evolutionary algorithms, enhances accuracy and robustness [12,16]. For example, wavelet-based feature extraction combined with ANNs has outperformed other methods in handling non-stationary PV power data [7].
Bayesian neural networks: Integrating Bayesian inference into ANNs allows these networks to produce uncertainty estimates together with probabilistic forecasts, which are useful for risk evaluation in energy trading [9].

An analysis of ANN-based PV power forecasting models appears in Table 3 along with their main characteristics and performance metrics as well as application limits.

Table 3 shows that ANN models achieve excellent predictive results, with RMSE and MAE values remaining under 10% [15]. The choice of ANN architecture and of any hybrid combination depends on the data characteristics and the specific application setting [5].

3.3. Support Vector Machines (SVMs)

Among machine learning methods, SVMs have shown a number of advantages over physical methods for short-term PV power forecasting. According to [8], support vector regression (SVR) models perform best on relatively short horizons (< 30 min); they can learn nonlinear relationships from data and do not depend on plant-specific knowledge, which makes them suitable for capturing systematic errors and achieving low bias. On the other hand, they may be less accurate for regional forecasting, they are less interpretable, they require a complex parameter optimization phase, and partially cloudy sky conditions can be problematic.

For heterogeneous data integration, [8] proposes pattern-label training for each source (measurements, NWP, satellite imagery), followed by merging all sources in one integrated SVR. The time of day is included as an additional feature in the models to capture any deterministic trends. The results indicate that this hybrid method outperforms the individual models for all time horizons.

Notable advancements in SVM-based PV power forecasting include:

Kernel selection: The choice of kernel function strongly affects SVM performance. PV power forecasting generally uses Gaussian radial basis function (RBF) and polynomial kernels [12]. More recently, improved results have been obtained with custom kernels built for specific data characteristics [10].
Feature selection: Choosing the most relevant input features is important for SVM performance. Techniques such as recursive feature elimination (RFE) and genetic algorithms (GA) have been applied to optimize feature subsets for SVM-based PV power forecasting.
Ensemble SVMs: The integration of several SVM models using bagging, boosting, or stacking techniques has shown improved accuracy and robustness over individual SVM models [12].
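To show what kernel selection means in practice, the sketch below evaluates the two kernel families mentioned above on a pair of feature vectors. The parameter names `gamma`, `degree`, and `coef0` follow common machine learning conventions, and the feature vectors are hypothetical; a full SVR would use such kernel values inside its optimization routine, which is not reproduced here.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot_product = sum(xi * yi for xi, yi in zip(x, y))
    return (gamma * dot_product + coef0) ** degree


# Hypothetical normalized features: [irradiance, temperature, hour of day]
sample_a = [0.80, 0.55, 0.50]
sample_b = [0.75, 0.60, 0.54]
print(rbf_kernel(sample_a, sample_b, gamma=2.0))
print(polynomial_kernel(sample_a, sample_b, degree=2))
```

The RBF kernel depends only on the distance between samples, so similar weather situations receive values close to 1, whereas the polynomial kernel grows with the inner product of the feature vectors. This difference in similarity measure is one reason the kernel choice affects forecasting performance.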

Table 4 gives a comparative assessment of SVM-based models used in PV power forecasting with key features, performance parameters, and possible applications.

The performance metrics in Table 4 confirm the competitive accuracy of SVM-based models, with RMSE and MAE values often falling below 10% [12]. The choice of kernel function and of feature selection technique largely determines whether the model performs well for a given application [10].

3.4. Ensemble and Hybrid Approaches

The ensemble and hybrid approaches are receiving much attention in PV power forecasting for their ability to take advantage of the many individual models working together and to capture different patterns in the data [7]. These approaches can be broadly classified into two main types:

Homogeneous ensembles: These ensembles combine multiple models of the same type, for example through bagging, boosting, or stacking of decision trees or ANNs. Popular examples include random forest (RF) and gradient boosting machines (GBM) [5].
Heterogeneous ensembles: These ensembles include varied models, such as by merging physical models with data-driven approaches or mixing statistical and machine learning techniques [5]. The diversity of the individual models helps capture complementary information and improve overall forecasting performance [6].

Hybrid approaches that combine different forecasting techniques sequentially or in parallel have also presented encouraging results in PV power forecasting. Some very well-regarded hybrid approaches are:

Physical-statistical hybrid models: These models employ physical equations to model the deterministic components of PV power output and statistical techniques to take into account stochastic variations [5]. The combination of domain knowledge and data-driven learning often leads to improved accuracy and interpretability [10].
Wavelet-based hybrid models: In this approach, wavelet transforms are used to decompose the PV power time series into different frequency components [7]. Separate models are then used to forecast each frequency component, and finally, the predictions are aggregated to obtain the final prediction [18]. This approach helps to capture multiscale patterns and enhances forecasting performance [6].
Evolutionary-neural hybrid models: Evolutionary algorithms, like genetic algorithm (GA) or particle swarm optimization (PSO), are used to optimize the hyperparameters or structure of neural net models [12]. This hybrid strategy combines the comprehensive search feature of evolutionary algorithms with the learning capability of neural networks [5].
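The physical-statistical idea listed above can be sketched in a few lines: a physical estimate provides the deterministic part, and a statistical model fitted to historical residuals corrects its systematic error. In this minimal example the physical part is a plain linear irradiance scaling and the statistical part is an ordinary least-squares line; both choices, the function names, and the 5 kW rated power are assumptions for illustration and do not reproduce any specific model from the reviewed literature.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (x must not be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def physical_estimate(irradiance_w_m2, rated_power_kw=5.0):
    """Deterministic component: power proportional to irradiance (assumed model)."""
    return rated_power_kw * irradiance_w_m2 / 1000.0


def fit_hybrid(irradiance_history, observed_history, rated_power_kw=5.0):
    """Fit the residual correction and return a hybrid prediction function."""
    physical = [physical_estimate(g, rated_power_kw) for g in irradiance_history]
    residuals = [obs - phys for obs, phys in zip(observed_history, physical)]
    intercept, slope = fit_linear(physical, residuals)

    def predict(irradiance_w_m2):
        phys = physical_estimate(irradiance_w_m2, rated_power_kw)
        # Hybrid forecast = physical estimate + learned residual correction
        return phys + intercept + slope * phys

    return predict


# Hypothetical history in which the plant underperforms the physical model
irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]
observed_kw = [0.8, 1.7, 2.6, 3.5, 4.4]
predict = fit_hybrid(irradiance, observed_kw)
print(predict(500.0))
```

Replacing the least-squares line with a more flexible learner, such as an ANN or SVR trained on the residuals, follows the same pattern while keeping the physical part interpretable.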

A comparison of ensemble and hybrid models for PV power forecasting, summarizing their main characteristics, performance indicators, and appropriate use scenarios, is shown in Table 5.

The performance metrics in Table 5 demonstrate the superior accuracy of ensemble and hybrid models compared to individual techniques, with RMSE and MAE values often falling below 5% [6]. The choice of the specific ensemble or hybrid approach depends on factors such as data characteristics, domain knowledge availability, and computational resources [5].

3.5. Comparative Analysis of Model Performance

To cover the state-of-the-art forecasting techniques in a holistic manner, a comparison of performance ranges for various model categories has been provided in Figure 3.

The comparison of the three model categories shows that PV forecasting performance is generally strong, with RMSE and MAE below 10%. The lowest errors are generally obtained by ANNs and ensemble/hybrid models, while SVMs show competitive performance with slightly higher variability. The right choice of model ultimately depends on the specific application requirements and data attributes. The relative performance of the model categories presented in Figure 3 is derived from the reviewed literature and may vary with the particular dataset, forecasting horizon, and evaluation metrics considered. Nevertheless, the comparative analysis provides a valuable overview of the relative strengths and weaknesses of the different model categories.

4. Model Optimization Strategies

While selecting a suitable modeling approach is fundamental for accurate PV power forecasts, model performance can be further enhanced through advanced optimization strategies. This section examines methods for model optimization, comprising hyperparameter tuning, feature selection and engineering, and the integration of evolutionary and swarm intelligence algorithms.

4.1. Hyperparameter Tuning and Feature Selection

The performance of forecasting models depends heavily on the proper selection of their hyperparameters. These are model settings that are not adjusted by the learning process and must be fixed before training begins [10]. Common examples of hyperparameters include the number of hidden layers and neurons in an artificial neural network (ANN), the kernel function and regularization parameter of an SVM, and the number of trees (and their maximum depth) in a random forest [12]. Both the ability of a model to fit the training data and its ability to generalize to new samples rely heavily on these choices. Poorly chosen hyperparameters degrade model performance through overfitting or underfitting [7].

The following methods exist to tune PV power forecasting models:

Grid search: This exhaustive search method evaluates the model performance for all possible combinations of hyperparameter values within a predefined range [6]. While grid search is straightforward to implement, it can be computationally expensive, especially for models with a large number of hyperparameters [5].
Random search: Instead of evaluating every combination, this method samples hyperparameter values at random from a specified range or distribution and evaluates the model for a fixed number of trials [6]. It is as simple to implement as grid search but avoids the combinatorial cost that makes grid search slow for models with many hyperparameters [5]. Research has also shown random search to be more efficient than grid search in medium to high-dimensional hyperparameter search spaces [10].
Bayesian optimization: This sequential model-based optimization approach starts by building a probabilistic model of the objective function (e.g., forecast accuracy) and then uses this model to select the hyperparameter values to evaluate next [7]. This probabilistic treatment of the parameter search has been shown to be more efficient and effective than grid search and random search.
Genetic algorithms (GA): The GA-based hyperparameter tuning treats the optimization problem as a process of evolutionary optimization, whereby a population of candidate hyperparameter settings evolves through generations via selection, crossover, and mutation operations [12]. GA has been shown to be effective in finding near-optimal hyperparameter configurations for PV power forecasting models [5].
LSTM-WGAN: Although not a tuning method in itself, a data imputation technique combining a Wasserstein Generative Adversarial Network (WGAN) with Long Short-Term Memory (LSTM) has been developed to address poor prediction results caused by missing data in PV power records. The method introduces a data-driven GAN framework with quasi-convex characteristics to ensure that the imputed data are smooth with respect to the existing data, and it employs a gradient penalty mechanism and a single-batch multi-iteration strategy for stable training [19].
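The difference between grid search and random search described above can be sketched in a few lines. The example below is a minimal illustration, not an implementation from the cited works: `validation_error` is a hypothetical stand-in for a cross-validated forecast error, and the hyperparameter names and ranges are assumptions chosen for the example.

```python
import itertools
import random


def validation_error(n_trees, max_depth):
    """Hypothetical stand-in for a cross-validated forecast error (lower is better)."""
    return (n_trees - 300) ** 2 / 1e5 + (max_depth - 8) ** 2 / 10.0


def grid_search(grid):
    """Evaluate every combination in the grid; return (best_error, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = validation_error(**params)
        if best is None or err < best[0]:
            best = (err, params)
    return best


def random_search(bounds, n_trials, seed=0):
    """Evaluate n_trials configurations sampled uniformly from integer ranges."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.randint(lo, hi) for name, (lo, hi) in bounds.items()}
        err = validation_error(**params)
        if best is None or err < best[0]:
            best = (err, params)
    return best


# Assumed search space for a random-forest-like model.
grid = {"n_trees": [100, 200, 300, 400, 500], "max_depth": [2, 4, 8, 16]}
bounds = {"n_trees": (100, 500), "max_depth": (2, 16)}

grid_best = grid_search(grid)            # 5 * 4 = 20 evaluations
random_best = random_search(bounds, 20)  # same budget, randomly placed
print("grid search:  ", grid_best)
print("random search:", random_best)
```

The grid budget grows multiplicatively with every added hyperparameter, whereas the random-search budget is set directly by `n_trials`, which is the practical reason it scales better to larger search spaces.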

Feature selection and feature engineering are important for accurate PV power prediction [10]. Feature selection refers to choosing the most informative input variables, while feature engineering is a general term for transforming the raw inputs to improve model learning [5]. For PV power forecasting, the input features generally consist of meteorological variables (e.g., solar irradiance, temperature, humidity), time-related factors (e.g., hour of the day, day of the week), and system-specific parameters (e.g., panel tilt angle, inverter efficiency) [6]. However, not all features are relevant or informative for the forecasting task, and some may introduce noise or redundancy that degrades model performance [12]. Feature selection methods seek the subset of features that gives the highest predictive accuracy while keeping the computational cost as low as possible [7]. Feature selection techniques widely used in PV power forecasting include:

Filter methods: These methods assess the relevance of each feature independently of the learning algorithm. The relevance is usually checked using statistical measures such as correlation, mutual information, or chi-squared tests. Features are ranked based on their individual relevance scores, and the top-ranked features are selected for model training [5].
Wrapper methods: These methods assess the quality of candidate feature subsets using the learning algorithm itself as part of the selection process [12]. Examples include recursive feature elimination (RFE) and genetic algorithms (GA) for feature subset search. Wrapper methods usually outperform filter methods but are computationally expensive [7].
Embedded methods: These methods perform feature selection during model fitting by exploiting the feature importance measures inherent in the learning algorithm [10]. Regularization is a common example: L1 (Lasso) regularization encourages sparse models by driving the coefficients of uninformative features to zero, whereas L2 (Ridge) regularization only shrinks coefficients toward zero; the resulting coefficients help identify the most relevant features [5].
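As a minimal sketch of the filter approach described above, the following example ranks candidate features by the absolute Pearson correlation with the target and keeps the top-ranked ones. The feature names and the toy data are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def rank_features(features, target, top_k):
    """Score each feature independently of any model and keep the top_k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Toy data (assumed values): eight daytime samples.
features = {
    "irradiance_w_m2": [0, 150, 420, 700, 850, 640, 300, 50],
    "temperature_c": [12, 14, 18, 22, 25, 24, 20, 15],
    "day_of_week": [1, 1, 1, 1, 1, 1, 1, 1],  # constant, carries no information
}
pv_power_kw = [0.0, 0.7, 2.0, 3.3, 3.9, 3.0, 1.4, 0.2]

selected, scores = rank_features(features, pv_power_kw, top_k=2)
print(selected)
```

Because each feature is scored in isolation, this kind of filter is cheap but cannot detect features that are only informative in combination, which is one reason wrapper and embedded methods are used as well.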

In contrast, feature engineering involves crafting new informative attributes from the existing ones to capture complex patterns and relationships in the data [12]. PV power forecasting employs some feature engineering techniques, including:

Temporal feature extraction: Derived features such as moving averages, lag values, and rolling statistics can capture short-term and long-term temporal dependencies in the PV power time series [6].
Spatial feature extraction: Techniques such as principal component analysis (PCA) and wavelet transforms can be used to extract spatial patterns and multi-resolution information from the input features [7].
Domain-specific feature generation: Incorporating expert knowledge and physical understanding of the PV system can help create meaningful features, such as the clearness index, which captures the ratio of measured solar irradiance to the extraterrestrial (top-of-atmosphere) irradiance, its theoretical maximum [10].
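The temporal and domain-specific features listed above can be derived with very little code. The sketch below assumes equally spaced samples held in plain Python lists; the function names are hypothetical, and the extraterrestrial irradiance is taken as a given input rather than computed from solar geometry.

```python
def lag(series, k):
    """Value observed k steps earlier; None where no history is available."""
    return ([None] * k + list(series))[:len(series)]


def rolling_mean(series, window):
    """Mean of the last `window` samples, including the current one."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


def clearness_index(ghi, extraterrestrial):
    """Ratio of measured GHI to extraterrestrial irradiance (0.0 at night)."""
    return [g / e if e > 0 else 0.0 for g, e in zip(ghi, extraterrestrial)]


pv_power = [0.0, 1.0, 2.0, 3.0]
print(lag(pv_power, 1))           # [None, 0.0, 1.0, 2.0]
print(rolling_mean(pv_power, 2))  # [None, 0.5, 1.5, 2.5]
print(clearness_index([500.0, 0.0], [1000.0, 0.0]))  # [0.5, 0.0]
```

Rows containing `None` would normally be dropped or imputed before training, since most learning algorithms cannot handle missing inputs directly.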

The best combination of feature selection and engineering techniques therefore depends on the characteristics of the forecasting problem at hand, such as the available data sources, the desired level of model interpretability, and the trade-off between computational complexity and predictive accuracy [10]. In general, identifying the most informative and relevant features for a given forecasting task may require iteration and the experience of domain experts [5]. Restricting the scope to residential-level forecasting, advanced feature engineering techniques are proposed in [20] to capture relevant temporal, spatial, and meteorological features. These include the distance from solar noon and from the summer solstice, solar position features (azimuth and zenith), wind speed and direction features, and one-hot features denoting the time of day and the season. Together, they lead to a considerable improvement in the performance of machine learning models for PV generation and household consumption forecasting.
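A few of the calendar-based features mentioned above can be illustrated as follows. This is only a sketch under simplifying assumptions, not the feature set of [20]: solar noon is fixed at 12:00 local time, the summer solstice is taken as 21 June (northern hemisphere), seasons follow the meteorological convention, and leap years are ignored in the wrap-around.

```python
from datetime import date, datetime

SEASONS = ("winter", "spring", "summer", "autumn")


def season_of(month):
    """Meteorological season for the northern hemisphere (an assumption)."""
    return SEASONS[(month % 12) // 3]


def time_features(ts, solar_noon_hour=12.0):
    """Build calendar features for one timestamp."""
    hour = ts.hour + ts.minute / 60.0
    days = abs((ts.date() - date(ts.year, 6, 21)).days)
    days = min(days, 365 - days)  # circular distance, ignoring leap years
    feats = {
        "hours_from_solar_noon": abs(hour - solar_noon_hour),
        "days_from_summer_solstice": days,
    }
    current = season_of(ts.month)
    for name in SEASONS:  # one-hot encoding of the season
        feats["season_" + name] = 1 if name == current else 0
    return feats


print(time_features(datetime(2024, 6, 21, 9, 30)))
```

In practice the true solar noon depends on longitude and the equation of time, so a solar-position library would replace the fixed 12:00 assumption.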

4.2. Evolutionary and Swarm Intelligence Algorithms

Evolutionary and swarm intelligence algorithms have become powerful optimization methods for improving the performance of PV power forecasting models. These biologically inspired algorithms search for optimal solutions in complex, high-dimensional problem spaces using principles of biological evolution and collective intelligence. Evolutionary algorithms, such as genetic algorithms (GA), evolutionary programming (EP), and differential evolution (DE), act on a population of candidate solutions, applying selection, crossover, and mutation operations to generate new solutions with improved fitness [5]. In PV power forecasting, they have been applied to tasks ranging from feature selection to hyperparameter tuning and model structure optimization [7]. Swarm intelligence algorithms are inspired by the collective behavior of ant colonies, bird flocks, and fish schools. Algorithms such as particle swarm optimization (PSO), ant colony optimization (ACO), and artificial bee colony (ABC) rely on the interaction and collaboration of simple agents to explore the search space and find optimal solutions [6]. Table 6 summarizes the evolutionary and swarm intelligence algorithms used in PV power forecasting, highlighting their main attributes and optimization performance.

Integration of evolutionary and swarm intelligence algorithms with PV power forecasting models has shown promise for improvements in accuracy, robustness, and generalization capability [12]. Some of the notable applications are:

Feature selection: GA and PSO have been used successfully to select the most important input features for PV power forecasting models, which reduces computational requirements and makes the results easier to interpret [7].
Hyperparameter tuning: DE, PSO, and ABC have been employed to search for the optimal hyperparameter settings of various forecasting models, such as ANNs, SVMs, and random forests, resulting in improved predictive performance [5].
Model structure optimization: GA and ACO have been used to discover efficient model architectures by optimizing neural network structures and connection patterns [5].
Ensemble model generation: DE and ABC have been utilized to generate diverse and complementary forecasting models, which are combined into a robust ensemble prediction [6].
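To make the hyperparameter-tuning application concrete, the following is a minimal particle swarm optimization sketch. It is illustrative only: `svm_validation_error` is a hypothetical smooth surrogate for a cross-validated SVM error over log10(C) and log10(gamma), and the inertia and acceleration coefficients are common textbook defaults rather than values taken from the cited studies.

```python
import random


def pso(objective, bounds, n_particles=15, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def svm_validation_error(params):
    """Hypothetical surrogate with its minimum at log10(C)=1, log10(gamma)=-2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


pso_bounds = [(-3.0, 3.0), (-5.0, 1.0)]
best, err = pso(svm_validation_error, pso_bounds)
print("best log10(C), log10(gamma):", best, "error:", err)
```

In a real tuning run the surrogate would be replaced by training and validating the forecasting model, so each objective call becomes the dominant cost and the swarm size and iteration count set the total budget.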

Despite these benefits, evolutionary and swarm intelligence algorithms have several acknowledged disadvantages, such as the need for careful parameter setting, the risk of premature convergence, and the computational overhead of the iterative optimization process [12]. Selecting a suitable algorithm for a given forecasting model requires a thorough understanding of the problem characteristics, the available resources, and the required balance between exploration and exploitation [10]. A systematic comparison of different machine learning architectures (Prophet, feed-forward networks, LSTM, GRU) for residential-level forecasting, with parameter optimization specific to generation and consumption, is presented in [20]. The results show the superiority of neural networks, particularly for solar generation forecasting, where the coefficient of determination (R²) reaches up to 0.981. Forecasting household consumption proves more difficult because it is strongly influenced by human behavior; the highest coefficient of determination achieved is only 0.523.

4.3. Hybrid Optimization Frameworks

Hybrid optimization frameworks combine several optimization techniques in order to exploit their collective strengths while overcoming their individual shortcomings [5]. Such frameworks are designed to make model optimization in PV power forecasting more systematic and effective [7]. The most commonly used hybrid optimization strategies include:

GA-ANN hybrid: In this framework, the GA optimizes the structure and hyperparameters of an ANN model, while conventional gradient-based methods train the network. Gradient methods fine-tune the ANN parameters, and the GA searches the large space of model configurations [12].
PSO-SVM hybrid: This framework uses PSO to identify optimal SVM hyperparameters, including the kernel function, the regularization parameter, and the kernel parameters. The global search capability of PSO helps find hyperparameter settings with which the trained SVM model yields accurate PV power predictions [5].
DE-Ensemble hybrid: This framework uses DE to generate diverse base forecasting models, which are then combined through ensemble techniques such as stacking or weighted averaging [7]. DE can also optimize the ensemble weights or the stacking model to minimize the ensemble error [6].
ACO-Fuzzy hybrid: This framework integrates ant colony optimization with fuzzy logic systems for PV power forecasting [12]. ACO optimizes the fuzzy model parameters by finding the most suitable membership functions and rule structures, while the fuzzy system delivers interpretable forecasting results.
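The weight-optimization step of a DE-Ensemble hybrid can be sketched as follows. The example is a simplified illustration under stated assumptions: the three base forecasts are synthetic, the DE variant is the basic rand/1/bin scheme with assumed control parameters, and the weights are restricted to the box [0.01, 1] and normalised to sum to one.

```python
import math
import random


def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def combine(weights, forecasts):
    """Weighted average of base forecasts; weights are normalised to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * f[t] for w, f in zip(norm, forecasts))
            for t in range(len(forecasts[0]))]


def differential_evolution(objective, dim, pop_size=20, n_gen=60, f=0.6, cr=0.9, seed=0):
    """Basic DE/rand/1/bin minimiser on the box [0.01, 1]^dim."""
    rng = random.Random(seed)
    lo, hi = 0.01, 1.0
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(ind) for ind in pop]
    for _ in range(n_gen):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))
                else:
                    trial.append(pop[i][d])
            trial_fit = objective(trial)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]


# Synthetic data (assumed): measured PV power in kW and three biased base models.
actual = [0.0, 1.2, 2.8, 3.9, 3.1, 1.5, 0.2]
forecasts = [
    [a + 0.3 for a in actual],  # constant positive bias
    [a - 0.3 for a in actual],  # constant negative bias
    [a * 1.2 for a in actual],  # over-scaled output
]


def ensemble_error(weights):
    return rmse(combine(weights, forecasts), actual)


raw_weights, ens_err = differential_evolution(ensemble_error, dim=len(forecasts))
weights = [w / sum(raw_weights) for w in raw_weights]
individual = [rmse(fc, actual) for fc in forecasts]
print("weights:", weights)
print("ensemble RMSE:", ens_err, "individual RMSEs:", individual)
```

The weights here are fitted and scored on the same samples for brevity; a realistic setup would fit them on a validation period and report the error on a separate test period.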

Table 7 summarizes the key hybrid optimization frameworks and their main advantages in PV power forecasting.

Hybrid optimization frameworks provide a broader and more flexible approach to model optimization in PV power forecasting. They capitalize on existing synergies while overcoming limitations that individual techniques might have [5]. Such frameworks combine the strengths of various optimization algorithms to navigate complex search spaces, handle multiple objectives, and adapt to the specific characteristics of the forecasting problem [7]. Nonetheless, hybrid optimization frameworks bring challenges of their own, such as high computational complexity, the need for careful design and integration of the different components, and the possibility of over-parameterization [12]. The selection of an appropriate hybrid framework should therefore reflect a sound understanding of the problem domain, the data and computational resources available, and the balance between performance and interpretability [10].

5. Performance Evaluation Metrics and Frameworks

Section 3.1 introduced the statistical measures for accuracy analysis. This section covers the assessment of PV power forecasting models under different weather conditions, benchmarking methods, and the assessment of operational and economic impacts.

5.1. Evaluation under Different Weather Conditions

PV power output depends strongly on weather conditions, in particular solar irradiance and temperature [10]. Model reliability and stability should therefore be established through weather-conditioned assessment [12]. Typical weather conditions considered in PV power forecasting evaluation include:

Clear sky conditions: Clear skies with maximum solar radiation are the ideal conditions for PV power generation [6]. Forecasting models generally perform well in these conditions, producing precise predictions and low error metrics [5].
Partially cloudy conditions: Partially cloudy conditions introduce significant variability in solar irradiance, leading to rapid fluctuations in PV power output [7]. Evaluation under these conditions measures how well forecasting models capture cloud-transit dynamics and the resulting power variability [10].
Overcast conditions: Heavy cloud cover reduces PV power output [12]. Forecasting models should be able to accurately predict the lower power levels and the potential for sudden changes in generation due to cloud movement [6].
Variable sky conditions: Irradiance spikes caused by "broken clouds" can lead to large positive and negative fluctuations in the irradiance measured at the PV site. Assessing how these variable sky conditions affect PV energy generation is important for increasing forecast accuracy and understanding the uncertainties involved [21].
Seasonal variations: PV power generation varies with the seasons, with typically higher output in summer than in winter [5]. Evaluation across seasons assesses how well models follow seasonal trends in solar irradiance and temperature [7].

A standard way to assess PV power forecasting models under different conditions is to partition the data into weather categories based on solar irradiance, cloud cover, and temperature [10]. The statistical measures discussed in Section 3.1 can then be calculated for each weather category, providing a more nuanced assessment of the model’s performance [12]. Weather-dependent evaluation is vital for validating the reliability of forecasting models in practical use [10], and examining performance under different weather conditions helps researchers and practitioners develop improved solar generation models [12]. Recently, the SKIPP’D dataset was introduced [22] as a benchmark for developing and comparing short-term PV power forecasting models. The dataset offers three years of data at 1-minute temporal resolution, processed and raw sky images, and several weather scenarios (clear summers, partially cloudy winters). Together with preprocessing code and baseline implementations, these features make SKIPP’D a valuable resource for evaluating and comparing forecasting methods and for improving research reproducibility. A common platform for assessing model performance helps researchers compare different techniques under varying weather conditions and forecast horizons. Advancing PV power forecasting and identifying superior models depends on diverse, high-quality datasets.
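The per-category evaluation described above can be sketched as follows. The example classifies each sample by its clearness index and computes MAE and RMSE within each class; the thresholds, the category names, and the sample values are illustrative assumptions, not values from the cited works.

```python
import math
from collections import defaultdict


def classify_sky(kt, clear=0.65, overcast=0.30):
    """Map a clearness index to a sky category (thresholds are assumptions)."""
    if kt >= clear:
        return "clear"
    if kt <= overcast:
        return "overcast"
    return "partly_cloudy"


def metrics_by_category(records):
    """records: iterable of (clearness_index, forecast_kw, actual_kw)."""
    groups = defaultdict(list)
    for kt, forecast, actual in records:
        groups[classify_sky(kt)].append(forecast - actual)
    result = {}
    for category, errors in groups.items():
        n = len(errors)
        result[category] = {
            "n": n,
            "mae": sum(abs(e) for e in errors) / n,
            "rmse": math.sqrt(sum(e * e for e in errors) / n),
        }
    return result


# Assumed sample records: (clearness index, forecast in kW, measurement in kW).
records = [
    (0.75, 3.0, 3.1), (0.70, 2.5, 2.4),  # clear
    (0.45, 1.8, 1.2), (0.50, 1.0, 1.6),  # partly cloudy
    (0.20, 0.4, 0.5), (0.15, 0.3, 0.3),  # overcast
]
report = metrics_by_category(records)
for category, m in sorted(report.items()):
    print(category, m)
```

Reporting the sample count alongside each metric matters because rare categories can otherwise appear better or worse than they are.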

5.2. Benchmarking and Model Comparison Strategies

Benchmarking and model comparison strategies are essential for assessing the relative performance of different PV power forecasting models and identifying the state-of-the-art approaches [7]. Standardized datasets, evaluation metrics, and comparison frameworks enable fair and consistent evaluation across models [6].

Some key considerations for benchmarking and model comparison in PV power forecasting include:

Benchmark datasets: The development and distribution of publicly accessible benchmark datasets is essential for progress in PV power forecasting and for model selection [5]. These datasets should cover a variety of PV system configurations, geographical regions, and weather conditions so that the generalizability of models can be evaluated accurately. For instance, [4] assessed the global horizontal irradiance (GHI) of four global reanalysis datasets (MERRA-2, ERA5, ERA5-Land, and CFSv2) against ground-based measurements from 35 observation stations across Brazil to determine how well they represent hourly GHI. Such studies provide valuable insights into the suitability of different data sources for PV power forecasting in regions with limited observational time series measurements.
Evaluation metrics: Standardizing the evaluation metrics used for model comparison is essential to ensure that results are comparable and meaningful [12]. These should include the statistical measures discussed in Section 3.1 (RMSE, MAE, and MAPE), along with domain-specific measures such as forecast skill [7].
Cross-validation: Employing cross-validation techniques, such as k-fold or leave-one-out cross-validation, helps assess the model’s performance on unseen data and reduces the risk of overfitting [6]. Repeatedly partitioning the data into training and testing sets yields better estimates of the model’s generalization ability [5].
Statistical significance tests: Conducting statistical significance tests, such as t-tests or Wilcoxon signed-rank tests, is important to determine whether the performance differences between models are statistically significant or merely due to chance [10]. Such tests provide a rigorous basis on which models can be compared and ranked [12].
Model complexity and interpretability: For practical use, model complexity and interpretability should be evaluated together with predictive performance [7]. Simpler, more interpretable models may be preferable to more complex ones even if they yield slightly less accurate predictions [6].
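Two of the benchmarking ingredients listed above can be sketched briefly. The first function generates expanding-window ("rolling-origin") splits, a variant of cross-validation often preferred for time series because shuffled k-fold splits can leak future information into training. The second computes a forecast skill score, here assumed to be defined as 1 − RMSE_model / RMSE_reference with a persistence forecast as the reference. Both the split scheme and this skill definition are assumptions chosen for illustration.

```python
import math


def rolling_origin_splits(n_samples, n_folds, min_train):
    """Yield (train_range, test_range); training data always precede the test block."""
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield range(0, train_end), range(train_end, train_end + fold_size)


def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def forecast_skill(model_pred, reference_pred, actual):
    """1 means a perfect model, 0 no better than the reference, below 0 worse."""
    return 1.0 - rmse(model_pred, actual) / rmse(reference_pred, actual)


splits = list(rolling_origin_splits(n_samples=10, n_folds=3, min_train=4))
for train, test in splits:
    print("train:", list(train), "test:", list(test))

# Assumed toy series: measurement, persistence (previous value), and a model.
actual = [1.0, 2.0, 3.0, 2.0]
persistence = [1.0, 1.0, 2.0, 3.0]
model = [1.1, 1.9, 2.9, 2.1]
skill = forecast_skill(model, persistence, actual)
print("skill vs persistence:", skill)
```

Averaging the skill score over the folds gives a single benchmark figure while still respecting the temporal order of the data.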

Effective benchmarking and model comparison strategies are fundamental in identifying the most promising approaches in PV power forecasting and guiding future research directions [12]. Developing standardized datasets, evaluation metrics, and comparison frameworks helps advance the state of the art in PV power forecasting and supports researchers and practitioners in building improved and reliable models [7].

5.3. Operational and Economic Impact Assessment

Evaluating the operational and economic impacts of PV power forecasting models is important to demonstrate their practical value and to guide their implementation in real-life applications. This assessment attempts to quantify the benefits of improved forecast accuracy in terms of, for example, system reliability, energy efficiency, and cost savings [6].

The operational impact assessment concerns how PV power forecasts can affect the daily operation of the power system:

Grid stability: Accurate PV power predictions help grid operators maintain the balance between supply and demand, reducing the risk of instability and blackouts [5]. By anticipating variability in PV generation, operators can dispatch other generation resources in time to maintain grid stability [10].
Reserve capacity requirements: Accurate PV power forecasts allow grid operators to size and deploy reserve capacity, which compensates for unexpected changes in renewable generation, more precisely [12]. Better forecasts thereby reduce the system’s operating costs and improve its efficiency [7].
Curtailment reduction: PV generation is curtailed when the grid cannot absorb the full output of PV systems [6]. With accurate forecasts, grid operators can take proactive measures against curtailment, such as adjusting generator dispatch or activating demand response programs [5].
Module lifespan prediction: Long-term reliability tests for high power density PV modules have shown that standard tests, like IEC 61215, may not adequately assess the long-term reliability of these modules. A new combined stress test concept, which includes light-combined damp heat cycles, has been introduced to better predict the rate of degradation and the service life of PV modules based on latent heat analysis [23].

Economic impact assessment, on the other hand, mainly looks at the financial benefits of the different PV power forecasting models, such as:

Energy market participation: Accurate forecasts for PV power enable PV system owners and operators to effectively participate in energy markets (day-ahead and real-time markets) [10]. By providing reliable estimates of their expected power output, PV system owners can optimize their bidding strategies and thus maximize their revenues [12].
Reduced imbalance costs: In many electricity markets, generators are penalized for deviations between their scheduled and actual power output [7]. If their actual generation differs substantially from their forecast values, PV system owners could face substantial imbalance costs [6]. Accurate PV power forecasts help minimize these imbalance costs by reducing the mismatch between predicted and actual generation [5].
Investment planning: PV power forecasting models underpin sound investment decisions in PV project development [10]. Reliable estimates of expected power output over the lifetime of a project enable an investor to assess the financial viability of PV installations and make informed decisions on capacity expansions and technology upgrades [12].
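The link between forecast error and imbalance cost can be illustrated with a simple calculation. The sketch below assumes a hypothetical two-price settlement in which shortfalls and surpluses relative to the scheduled (forecast) energy are charged at different per-MWh penalties; real market rules differ between jurisdictions, and the prices and energy values are invented for the example.

```python
def imbalance_cost(forecast_mwh, actual_mwh, shortfall_price, surplus_price):
    """Total penalty for deviations between scheduled and delivered energy."""
    cost = 0.0
    for scheduled, delivered in zip(forecast_mwh, actual_mwh):
        deviation = delivered - scheduled
        if deviation < 0:
            cost += -deviation * shortfall_price  # delivered less than scheduled
        else:
            cost += deviation * surplus_price     # delivered more than scheduled
    return cost


actual = [8.0, 13.0]          # delivered energy per settlement period (MWh)
poor_forecast = [10.0, 10.0]  # deviations of -2 and +3 MWh
good_forecast = [8.5, 12.5]   # deviations of -0.5 and +0.5 MWh

poor_cost = imbalance_cost(poor_forecast, actual, shortfall_price=50.0, surplus_price=20.0)
good_cost = imbalance_cost(good_forecast, actual, shortfall_price=50.0, surplus_price=20.0)
print(poor_cost, good_cost)  # 160.0 35.0
```

With asymmetric penalties of this kind, the cheapest schedule is not necessarily the most accurate point forecast, which is one motivation for probabilistic forecasts in market bidding.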

Estimating the operational and economic impacts arising from the use of PV power forecasting models is crucial for demonstrating their value proposition and thereby enabling their uptake in the energy sector [10]. Quantifying functional and economic benefits from the perspectives of grid stability, energy efficiency, and cost savings legitimizes the rationale for putting advanced forecasting techniques into practice [12].

All things considered, the performance evaluation framework for PV power forecasting models must cover statistical measures, evaluation under diverse weather conditions, benchmarking and model comparison strategies, and operational and economic impact assessment [10]. By considering these diverse aspects, researchers and practitioners can gain a holistic understanding of the model’s performance, reliability, and practical value, enabling informed decision-making and driving the continuous improvement of PV power forecasting techniques [12].

6. Recent Innovations in Photovoltaic Power Forecasting

Recent years have seen important improvements in photovoltaic (PV) power forecasting, driven by advances in machine learning and deep learning and by the use of many different kinds of data. These innovations have improved the accuracy and reliability of PV power predictions, addressing some of the inherent challenges associated with the variability of solar energy.

6.1. Advanced Machine Learning and Deep Learning Techniques

The application of advanced ML and DL techniques has considerably transformed PV power forecasting. Traditional modeling approaches, such as linear regression and support vector machines (SVM), have been improved or replaced by newer techniques. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), especially Long Short-Term Memory (LSTM) networks, show considerable promise in capturing the spatial and temporal dependencies inherent in photovoltaic power data. CNNs are particularly effective at processing large volumes of satellite and sky images to model the effect of cloud cover and other atmospheric conditions on solar irradiance [24]. LSTMs can learn long-term dependencies, which makes them well suited to forecasting PV power across diverse time horizons [25].

Several hybrid models combining multiple ML and DL techniques have also emerged as powerful tools. Combined CNN-LSTM models are an important approach to analyzing PV power data because they represent both the spatial and the temporal characteristics of the data [26]. By combining the strengths of each technique, these hybrid approaches produce considerably more accurate and robust predictions.

6.2. Integration of Diverse Data Sources

Integrating diverse data sources into PV power forecasting has substantially improved prediction accuracy. Satellite data on cloud cover, solar irradiance, and other atmospheric conditions are invaluable in this regard. Incorporating satellite data into weather forecasting models yields better predictions, especially during inclement weather [27]. Moreover, as shown in [28], irradiance forecasts can be further enhanced by combining ground-based measurements (e.g., irradiance sensors, anemometers, or thermometers), remote sensing data processing (e.g., satellite image analysis), and outputs from dynamic weather models. This multi-source approach enhances the accuracy of the forecasts compared to using a single data source, thereby optimizing the estimation of PV power generation. Figure 4 from [28] illustrates an example of an irradiance forecast (Global Horizontal Irradiance, $GHI$) for a day characterized by variable weather conditions. The black curve represents the measured values, denoted as $G_{meas}$, while the other curves correspond to different forecast scenarios with lead times ranging from 30 hours to 6 hours before the event. A progressive improvement in forecast accuracy can be observed as the forecast horizon shortens. For instance, the mean measured irradiance value is $G_{meas}$ = 0.040 kW/m², whereas the 30-hour ahead forecast significantly overestimates this value, yielding $GHI_{fore}(+30h)$ = 0.159 kW/m², with a normalized Mean Absolute Error ($MAE$) of 12.8% and a normalized Mean Bias Error ($MBE$) of 11.1%. Conversely, the forecast updated 6 hours before the event reduces the prediction error, providing a value closer to the actual measurement, $GHI_{fore}(+6h)$ = 0.088 kW/m², with $MAE$ = 5.2% and $MBE$ = 5.1%.
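For reference, the two normalized error measures quoted above are commonly written as shown below. The normalizing constant $G_{ref}$ is an assumption introduced here for illustration (for example, a reference irradiance or the mean measured value); the text above does not state which normalization was used in [28], and $N$ denotes the number of forecast samples.

```latex
\mathrm{nMAE} = \frac{100\%}{N \, G_{ref}} \sum_{i=1}^{N} \left| GHI_{fore,i} - G_{meas,i} \right|,
\qquad
\mathrm{nMBE} = \frac{100\%}{N \, G_{ref}} \sum_{i=1}^{N} \left( GHI_{fore,i} - G_{meas,i} \right)
```

With this sign convention a positive nMBE indicates over-forecasting, which matches the overestimation reported above, and $|\mathrm{nMBE}| \le \mathrm{nMAE}$ always holds, as it does for both reported pairs (11.1% versus 12.8%, and 5.1% versus 5.2%).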

Total Sky Imagers are used to monitor cloud movement and predict short-term changes in solar irradiance. Combining sky images with machine learning models has markedly increased the accuracy of intra-hour and intra-day PV power forecasts [27].

Numerical Weather Prediction (NWP) models, which forecast meteorological variables such as temperature, humidity, and wind speed, have also been combined with historical PV power data to build more precise PV power forecasting models. This multi-source data integration enables a more comprehensive understanding of the factors influencing PV power generation and hence more reliable forecasts.

6.3. Metaheuristic Optimization Algorithms

Metaheuristic optimization algorithms further enhance the performance of PV power forecasting models by optimizing the internal configuration of the underlying ML model. Particle Swarm Optimization (PSO), which imitates the way bird flocks or fish schools move together, has been used to tune the hyperparameters of many machine learning models, resulting in more accurate predictions. Genetic Algorithms (GA), which mimic natural selection, have been used to optimize the structure and parameters of neural networks, yielding meaningful improvements in PV power forecast accuracy. Differential Evolution (DE), a population-based optimization algorithm, improves the performance of both single and hybrid forecasting models by fine-tuning their parameters [12].

6.4. Real-Time and On-Demand Forecasting Applications

Integrating forecasting models into real-time and on-demand applications has made PV power predictions easier to use and more practical for different stakeholders. User-friendly MATLAB applications have been developed that make real-time PV power prediction possible. These applications allow users to input relevant data and obtain forecasts quickly, streamlining the decision-making of grid operators and energy traders. Scalable PV power forecasting platforms now also run in the cloud, harnessing cloud computing to process large datasets and provide reliable, near-instantaneous forecasts [12].

7. Challenges and Future Opportunities

Despite meaningful progress in predicting solar power output, several challenges continue to limit these technologies. Addressing them through focused research and innovation would enable large improvements in the accuracy, reliability, and applicability of PV power forecasting.

7.1. Variability and Uncertainty in Solar Power Generation

The main challenge in PV power forecasting is the intrinsic variability and uncertainty of solar power output. Solar irradiance is influenced by many factors, including cloud cover, atmospheric conditions, and geographical conditions, some of which change quickly and unpredictably [29]. Precise short-term forecasts are therefore very difficult to obtain. Addressing this requires advanced forecasting models that adapt dynamically to changing conditions and assimilate real-time data [30].

7.2. Integration with Grid Operations

The intermittent nature of solar power presents challenges to the integration of PV systems with the existing power grid. Variability in PV power output can lead to under-voltage events, voltage surges, and higher overall operating costs [31]. Accurate forecasting supports the balancing of supply and demand, minimizing the need for costly reserve capacity and optimizing grid operations. There is a great need to develop robust forecasting models that can produce reliable forecasts under the different conditions found on the network [10].

7.3. Data Quality and Availability

The accuracy of PV power forecasts depends heavily on both the quality and the quantity of the available data. High-resolution, consistent historical data are often scarce because of technological changes and short record periods at newly built sites. Furthermore, missing or erroneous data can degrade the performance of the forecasting models. Preprocessing steps such as normalization, interpolation, and outlier detection should therefore accompany the development of the forecasting model to mitigate these issues [32].
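The three preprocessing steps named above can be chained as in the following sketch. It is a simplified illustration under stated assumptions: outliers are defined only by physical plausibility (negative power or power above an assumed rated capacity), gaps are filled by linear interpolation, and scaling is min-max normalization.

```python
def flag_outliers(series, capacity_kw):
    """Flag physically implausible samples: negative or above rated capacity."""
    return [v is not None and (v < 0 or v > capacity_kw) for v in series]


def interpolate_missing(series):
    """Linearly fill interior None gaps; edge gaps copy the nearest valid value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid samples")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out


def min_max_normalize(series):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in series]


# Assumed raw record in kW with two gaps and one sensor spike.
raw = [0.0, 1.0, None, 3.0, 99.0, 2.0, None]
flags = flag_outliers(raw, capacity_kw=5.0)
cleaned = [None if bad else v for v, bad in zip(raw, flags)]
filled = interpolate_missing(cleaned)
scaled = min_max_normalize(filled)
print(filled)  # [0.0, 1.0, 2.0, 3.0, 2.5, 2.0, 2.0]
```

Linear interpolation is only reasonable for short gaps; longer gaps are where learned imputation methods become relevant.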

To ease prediction challenges arising out of missing data in PV power records, a Wasserstein Generative Adversarial Network (WGAN) and Long Short-Term Memory (LSTM)-based data imputation method has been proposed. This method introduces a data-driven GAN framework with quasi-convex characteristics to ensure the smoothness of the imputed data with the existing data and employs a gradient penalty mechanism and a single-batch multi-iteration strategy for stable training [19].

7.4. Model Complexity and Computational Requirements

With the rise in sophistication of forecasting models, the complexity and computational needs have increased as well. Due to their hybrid architectures and ensemble methods, state-of-the-art ML and DL models usually demand significant computational resources for both training and deployment. In real-time and wide-scale applications, this poses a barrier. There is certainly a need for more effective algorithms and optimization techniques for better performance with fewer computational burdens, while also maintaining high accuracy [5].

7.5. Adaptability to Changing Conditions

PV power forecasting models should remain accurate as weather and operational conditions change. It is therefore crucial to update the models with new data and contexts so that they evolve and maintain prediction accuracy. Context change detection and incremental learning can enhance adaptability, allowing models to keep learning from incoming data and improve their performance over time [10].
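The combination of change detection and incremental learning can be illustrated with a deliberately small sketch. The class below is hypothetical and is not taken from [10]: it assumes a one-input linear model (normalized irradiance to normalized power) updated by one stochastic gradient step per sample, and it flags a change when the recent mean absolute error exceeds a multiple of the best error level seen so far. All parameter values are illustrative assumptions.

```python
from collections import deque


class AdaptiveForecaster:
    """Online linear model, power ~ w * irradiance + b, with a crude drift check."""

    def __init__(self, lr=0.5, window=20, drift_ratio=2.0, floor=0.01):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr                        # step size of the incremental update
        self.recent = deque(maxlen=window)  # most recent absolute errors
        self.baseline = None                # best windowed error seen so far
        self.drift_ratio = drift_ratio      # trigger factor relative to baseline
        self.floor = floor                  # keeps the trigger level above zero
        self.drift_events = 0

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Learn from one (input, measured output) pair; return True on drift."""
        err = self.predict(x) - y  # error before the model sees this sample
        # Incremental learning: one gradient step on the squared error.
        self.w -= self.lr * err * x
        self.b -= self.lr * err
        self.recent.append(abs(err))
        if len(self.recent) < self.recent.maxlen:
            return False
        mean_err = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = mean_err
            return False
        if mean_err > self.drift_ratio * max(self.baseline, self.floor):
            # Context change detected: restart the error statistics.
            self.drift_events += 1
            self.recent.clear()
            self.baseline = None
            return True
        self.baseline = min(self.baseline, mean_err)
        return False
```

In this sketch a detected change only resets the error statistics, because the model keeps learning on every sample anyway. In practice the same flag could trigger a fuller retraining or a temporary increase of the learning rate.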

7.6. Future Research Directions

Future research should prioritize several key areas to overcome the existing challenges:

Enhanced Data Integration: Integrating multiple data sources, such as satellite images, sky images, and NWP data, gives a broader view of the factors that drive PV power generation. Forecasts built on several sources can be more accurate and reliable [27].
Advanced Optimization Techniques: Metaheuristic optimization algorithms such as PSO, GA, and DE can tune the internal configuration of forecasting models and thereby improve their performance [12].
Real-Time and Scalable Solutions: Cloud-based platforms that offer real-time, scalable forecasting tools could make PV power forecasts widely available to many end-users, provided that they are freely accessible [27].
Adaptive Learning Models: Adaptive learning models, which learn dynamically from new data and respond to rapidly changing conditions, are essential for maintaining high prediction accuracy over time [10].
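To make the role of metaheuristics in the list above more concrete, the following is a minimal PSO sketch in Python. It is a textbook global-best variant written for illustration, not the configuration of any reviewed study. The function `toy_validation_error` is a hypothetical stand-in for a forecasting model's validation error as a function of two hyperparameters; the inertia and acceleration coefficients are common default choices assumed here.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over box `bounds` with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_error(params):
    """Hypothetical error surface with its minimum at (0.3, -1.2)."""
    x, y = params
    return (x - 0.3) ** 2 + (y + 1.2) ** 2


best_params, best_error = pso_minimize(toy_validation_error,
                                       [(-5.0, 5.0), (-5.0, 5.0)])
```

In a real tuning task each call to the objective would train and validate a forecasting model, so the number of particles and iterations directly sets the computational cost discussed in Section 7.4.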

Addressing these challenges and pursuing these research directions will advance PV power forecasting and support the efficient integration of solar energy into the power grid during the transition to a sustainable energy future.

8. Conclusions

This review examined in detail recent advances in PV power forecasting techniques, model architectures, and optimization approaches. These developments have been driven in part by the increasing penetration of solar power into power systems worldwide and by the growing emphasis on accurate forecasting to smooth grid integration and operation. The review presented a comprehensive taxonomy of forecasting models based on their temporal horizons, mathematical basis, and application context. Physical models, statistical techniques, and machine learning algorithms were discussed, highlighting their strengths, limitations, and suitability for different forecasting tasks. Model optimization strategies, including hyperparameter tuning, feature selection, and evolutionary and swarm intelligence algorithms, have taken center stage. Technological developments in PV power forecasting include advanced deep learning architectures, the fusion of diverse data sources, the incorporation of metaheuristic optimization algorithms, and predictive models that run in real time and can deliver predictions on demand. These advancements have significantly enhanced the accuracy, reliability, and practicality of PV power predictions. However, several challenges remain and call for further inquiry and innovation: the variability and uncertainty of solar electricity generation, the integration of PV systems with grid operations, the quality and availability of data, the heavy computational requirements of sophisticated models, and the need for forecasting models to adapt to changing conditions.

Future research directions should include improved data integration, the development of scalable forecasting solutions, adaptive learning models, and improved interpretability and transparency. These represent both a challenge and an opportunity for the successful large-scale integration of solar energy into power systems, which will accelerate the transition to sustainable and resilient energy. In conclusion, this review has provided a comprehensive and up-to-date analysis of the state of the art in PV power forecasting, highlighting the progress made, the challenges faced, and the future directions to be pursued. It offers a balanced view of a fast-moving research field to help researchers, practitioners, and decision-makers improve PV power forecasting approaches and unlock the full potential of solar energy.

List of Abbreviations and Notations

Abbreviations:

ANN: Artificial Neural Network
ARIMA: AutoRegressive Integrated Moving Average
CNN: Convolutional Neural Network
DE: Differential Evolution
DL: Deep Learning
EP: Evolutionary Programming
GA: Genetic Algorithm
GRU: Gated Recurrent Unit
LSTM: Long Short-Term Memory
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
ML: Machine Learning
NWP: Numerical Weather Prediction
PSO: Particle Swarm Optimization
PV: Photovoltaic
R²: Coefficient of Determination
RF: Random Forest
RMSE: Root Mean Square Error
RNN: Recurrent Neural Network
SARIMA: Seasonal Autoregressive Integrated Moving Average
SVM: Support Vector Machine
SVR: Support Vector Regression
WGAN: Wasserstein Generative Adversarial Network
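
For reference, the error metrics abbreviated above are conventionally defined as follows. The notation is an assumption introduced here: y_i is the measured power, \hat{y}_i the forecast, \bar{y} the mean of the measurements, and N the number of samples. The comparison tables report RMSE and MAE as percentages, which suggests a normalization (for example by rated capacity), but the normalization is not stated in this section and is therefore not included.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \\
\mathrm{MAPE} &= \frac{100\%}{N}\sum_{i=1}^{N} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \\
R^2 &= 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}
\end{aligned}
```

MAPE is undefined when y_i = 0, which matters for PV data because measured power is zero at night; such samples are usually excluded before the metric is computed.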

References

1. Bizzarri, F.; Nitti, S.; Malgaroli, G. The use of drones in the maintenance of photovoltaic fields. 2019, Vol. 119.
2. Spertino, F.; Chiodo, E.; Ciocia, A.; Malgaroli, G.; Ratclif, A. Maintenance Activity, Reliability Analysis and Related Energy Losses in Five Operating Photovoltaic Plants. 2019.
3. IEA PVPS. Trends in Photovoltaic Applications 2024: Survey Report of Selected IEA Countries between 1992 and 2023; Report IEA-PVPS T1-43: 2024, IEA PVPS Task 1, 2024.
4. Araujo, M.; Aguilar, S.; Souza, R.; Cyrino Oliveira, F. Global Horizontal Irradiance in Brazil: A Comparative Study of Reanalysis Datasets with Ground-Based Data. Energies 2024, 17, 5063.
5. Al-Dahidi, S.; Madhiarasan, M.; Al-Ghussain, L.; Abubaker, A.; Ahmad, A.; Alrbai, M.; Aghaei, M.; Alahmer, H.; Alahmer, A.; Baraldi, P.; et al. Forecasting Solar Photovoltaic Power Production: A Comprehensive Review and Innovative Data-Driven Modeling Framework. Energies 2024, 17, 4145.
6. Iheanetu, K. Solar Photovoltaic Power Forecasting: A Review. Sustainability 2022, 14, 17005.
7. Akhter, M.; Mekhilef, S.; Mokhlis, H.; Shah, N. Review on Forecasting of Photovoltaic Power Generation Based on Machine Learning and Metaheuristic Techniques. IET Renew. Power Gener. 2019, 13, 1009–1023.
8. Wolff, B.; Kuehnert, J.; Lorenz, E.; Kramer, O.; Heinemann, D. Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Sol. Energy 2016, 135, 197–208.
9. Lateko, A.; Yang, H.T.; Huang, C.M. Short-Term PV Power Forecasting Using a Regression-Based Ensemble Method. Energies 2022, 15, 4171.
10. Amiri, A.; Chouder, A.; Oudira, H.; Silvestre, S.; Kichou, S. Improving Photovoltaic Power Prediction: Insights through Computational Modeling and Feature Selection. Energies 2024, 17, 3078.
11. Chicco, G.; Cocina, V.; Di Leo, P.; Spertino, F.; Massi Pavan, A. Error Assessment of Solar Irradiance Forecasts and AC Power from Energy Conversion Model in Grid-Connected Photovoltaic Systems. Energies 2016, 9, 8.
12. Das, U.; Tey, K.; Seyedmahmoudian, M.; Mekhilef, S.; Idris, M.; Van Deventer, W.; et al. Forecasting of photovoltaic power generation and model optimization: A review. Deakin University. Journal contribution, 2018.
13. Saigustia, C.; Pijarski, P. Time Series Analysis and Forecasting of Solar Generation in Spain Using eXtreme Gradient Boosting: A Machine Learning Approach. Energies 2023, 16, 7618.
14. Tsai, W.C.; Tu, C.S.; Hong, C.M.; Lin, W.M. A Review of State-of-the-Art and Short-Term Forecasting Models for Solar PV Power Generation. Energies 2023, 16, 5436.
15. Cantillo-Luna, S.; Moreno-Chuquen, R.; Celeita, D.; Anders, G. Deep and Machine Learning Models to Forecast Photovoltaic Power Generation. Energies 2023, 16, 4097.
16. Oh, J.; So, D.; Jo, J.; Kang, N.; Hwang, E.; Moon, J. Two-Stage Neural Network Optimization for Robust Solar Photovoltaic Forecasting. Electronics 2024, 13, 1659.
17. Shao, D.; et al. Transient Stability Assessment Method for Power System Based on SVM with Adaptive Parameters Adjustment. In Proceedings of the 2021 IEEE 4th International Electrical and Energy Conference (CIEEC); 2021; pp. 1–6.
18. Xu, W.; Li, D.; Dai, W.; Wu, Q. Informer Short-Term PV Power Prediction Based on Sparrow Search Algorithm Optimised Variational Mode Decomposition. Energies 2024, 17, 2984.
19. Liu, Z.; Xuan, L.; Gong, D.; Xie, X.; Zhou, D. A Long Short-Term Memory–Wasserstein Generative Adversarial Network-Based Data Imputation Method for Photovoltaic Power Output Prediction. Energies 2025, 18, 399.
20. Raudys, A.; Gaidukevičius, J. Forecasting Solar Energy Generation and Household Energy Usage for Efficient Utilisation. Energies 2024, 17, 1256.
21. Chicco, G.; Cocina, V.; Di Leo, P.; Spertino, F. Weather forecast-based power predictions and experimental results from photovoltaic systems. In Proceedings of the 2014 International Symposium on Power Electronics, Electrical Drives, Automation and Motion, IEEE, Ischia, Italy; 2014; pp. 342–346.
22. Nie, Y.; Li, X.; Scott, A.; Sun, Y.; Venugopal, V.; Brandt, A. SKIPP’D: A SKy Images and Photovoltaic Power Generation Dataset for short-term solar forecasting. Solar Energy 2023, 255, 171–179.
23. Nam, W.; Choi, J.; Kim, G.; Hyun, J.; Ahn, H.; Park, N. Predicting Photovoltaic Module Lifespan Based on Combined Stress Tests and Latent Heat Analysis. Energies 2025, 18, 304.
24. Gensler, A.; Henze, J.; Sick, B.; Raabe, N. Deep Learning for solar power forecasting: An approach using AutoEncoder and LSTM Neural Networks. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016; pp. 002858–002865.
25. Gallardo, I.; Amor, D.; Gutiérrez. Recent Trends in Real-Time Photovoltaic Prediction Systems. Energies 2023, 16, 5693.
26. Zhu, C.; Wang, M.; Guo, M.; et al. Innovative approaches to solar energy forecasting: unveiling the power of hybrid models and machine learning algorithms for photovoltaic power optimization. J Supercomput 2025, 81, 20.
27. Blanc, P.; Remund, J.; Vallance, L. Short-term solar power forecasting based on satellite images. In Renewable Energy Forecasting from Model to Applications; Woodhead Publishing: Cambridge, UK, 2017; pp. 179–198.
28. Ciocia, A.; Chicco, G.; Gasperoni, A.; Malgaroli, G.; Spertino, F. Photovoltaic Power Prediction from Medium-Range Weather Forecasts: a Real Case Study. In Proceedings of the 2023 IEEE PES Innovative Smart Grid Technologies Europe (ISGT EUROPE); 2023; pp. 1–5.
29. Diagne, M.; David, M.; Lauret, P.; Boland, J.; Schmutz, N. Review of Solar Irradiance Forecasting Methods and a Proposition for Small-Scale Insular Grids. Renew. Sustain. Energy Rev. 2013, 27, 65–76.
30. Ye, H.; Yang, B.; Han, Y.; Chen, N. State-Of-The-Art Solar Energy Forecasting Approaches: Critical Potentials and Challenges. Frontiers in Energy Research 2022, 10, 875790.
31. Woyte, A.; Van Thong, V.; Belmans, R.; Nijs, J. Voltage fluctuations on distribution level introduced by photovoltaic systems. IEEE Trans. Energy Conv. 2006, 21, 202–209.
32. Liu, F.; Ting, K.; Zhou, Z. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 1–39.

Figure 1. Global cumulative installed PV capacity growth and projections (data from IEA PVPS).

Figure 2. Taxonomy of PV power forecasting models, categorized by temporal horizon, architectural approach, and selection criteria.

Figure 3. Comparative analysis of model performance ranges based on reviewed literature.

Figure 4. Example of forecast profiles for a day with variable weather conditions.

Table 1. Temporal horizon categories and applications.

Temporal Horizon	Range	Applications
Intra-hour	< 1 hour	Real-time dispatch, power quality control
Intra-day	1 - 24 hours	Day-ahead scheduling, energy trading
Day-ahead	24 - 48 hours	Maintenance planning, resource allocation
Medium-term	48 hours - 1 month	Asset management, seasonal planning
Long-term	> 1 month	Capacity expansion, policy making

Table 2. Selection framework for PV power forecasting models based on application context and key requirements.

Application Context	Key Requirements	Suitable Model Types
Grid operation	High accuracy, Fast computation	Data-driven models (ML, DL), Hybrid physical-statistical models
Energy trading	Probabilistic forecasts, Uncertainty quantification	Ensemble models, Bayesian approaches
PV plant monitoring	Interpretability, System-specific insights	Physical models, Semi-empirical models
Performance assessment	Robustness to data quality, Handling missing data	Ensemble models, Deep learning models
Regional forecasting	Spatial aggregation, Scalability	Spatio-temporal models, Hierarchical models

Table 3. Comparative analysis of ANN-based models for PV power forecasting.

Model	Key Features	Performance Metrics	Suitable Applications
MLP [15]	Feedforward architecture, Backpropagation learning	RMSE: 3.2-7.6%, MAE: 2.5-5.9%	Short-term forecasting, PV plant monitoring
CNN [5]	Spatial feature extraction, Hierarchical learning	RMSE: 2.8-6.4%, MAPE: 4.2-9.6%	Regional forecasting, Spatio-temporal modeling
LSTM [5]	Temporal dependency capture, Long-term memory	RMSE: 3.5-8.3%, MAE: 2.9-6.8%	Medium-term forecasting, Time series analysis
Wavelet-ANN [7]	Multi-resolution analysis, De-noising and compression	RMSE: 2.3-5.7%, MAPE: 3.6-8.5%	Non-stationary data, Feature extraction

Table 4. Comparative analysis of SVM-based models for PV power forecasting.

Model	Key Features	Performance Metrics	Suitable Applications
SVM-RBF [12]	Non-linear mapping, Gaussian kernel function	RMSE: 4.5-9.7%, MAE: 3.8-8.2%	Short-term forecasting, Regression tasks
SVM-Poly [10]	Polynomial kernel function, Degree optimization	RMSE: 5.2-10.4%, MAPE: 6.3-12.6%	Non-linear relationships, High-dimensional input
GA-SVM [7]	Genetic feature selection, Optimal subset identification	RMSE: 3.9-8.5%, MAE: 3.2-7.1%	Feature optimization, Computationally efficient
RFE-SVM [17]	Recursive feature elimination, Backward feature selection	RMSE: 4.2-9.1%, MAPE: 5.4-11.3%	High-dimensional input, Feature ranking
Ensemble-SVM [12]	Multiple SVM combination, Bagging, boosting, stacking	RMSE: 3.1-7.3%, MAE: 2.6-6.2%	Improved accuracy, Robustness enhancement

Table 5. Comparative analysis of ensemble and hybrid models for PV power forecasting.

Model	Key Features	Performance Metrics	Suitable Applications
Random Forest [6]	Bagging of decision trees, Feature importance ranking	RMSE: 2.9-6.8%, MAE: 2.4-5.7%	Short-term forecasting, Variable selection
Gradient Boosting [5]	Sequential tree boosting, Gradient-based optimization	RMSE: 3.2-7.5%, MAPE: 4.3-9.2%	Nonlinear relationships, Medium-term forecasting
Physical-ANN hybrid [10]	Physical modeling of components, ANN for stochastic variation	RMSE: 2.6-6.1%, MAE: 2.2-5.3%	Improved interpretability, Domain knowledge integration
Wavelet-SVM hybrid [7]	Multi-scale decomposition, SVM for component forecasting	RMSE: 2.5-5.9%, MAPE: 3.8-8.6%	Non-stationary data, Robustness improvement
GA-ANN hybrid [5]	GA for ANN optimization, Evolutionary learning	RMSE: 2.3-5.4%, MAE: 1.9-4.6%	Hyperparameter tuning, Model structure optimization

Table 6. Overview of the key evolutionary and swarm intelligence algorithms.

Algorithm	Category	Main Characteristics	Optimization Capabilities
Genetic Algorithm (GA)	Evolutionary algorithm	Selection, crossover, and mutation-based optimization	Feature selection; hyperparameter-tuning; model structure optimization
Differential Evolution (DE)	Evolutionary algorithm	Mutation and crossover-based optimization on vectors in search space	Hyperparameter-tuning; model structure optimization
Particle Swarm Optimization (PSO)	Swarm intelligence algorithm	Particles move through the search space based on personal and global best positions	Feature selection; hyperparameter-tuning; model structure optimization
Ant Colony Optimization (ACO)	Swarm intelligence algorithm	Pathfinding based on pheromone deposition by ants	Feature selection; hyperparameter-tuning; model structure optimization
Artificial Bee Colony (ABC)	Swarm intelligence algorithm	Mimics foraging behavior of honeybees	Feature selection; hyperparameter-tuning; ensemble model generation

Table 7. Key hybrid optimization frameworks and their main advantages in PV power forecasting.

Framework	Optimization Techniques	Main Advantages
GA-ANN-hybrid	Genetic Algorithm; Artificial Neural Network	Efficient exploration of model configuration space; fine-tuning of model parameters; improved generalization and robustness
PSO-SVM-hybrid	Particle Swarm Optimization; Support Vector Machine	Effective hyperparameter optimization; enhanced model accuracy and robustness; reduced computational complexity
DE-Ensemble-hybrid	Differential Evolution; Ensemble methods	Discovery of diverse and complementary models; improved performance and stability; robustness to individual model weaknesses
ACO-Fuzzy-hybrid	Ant Colony Optimization; Fuzzy Logic Systems	Optimization of fuzzy rule-based systems; incorporation of domain-specific knowledge for transparent reasoning

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.