Quality control of solar radiation data within the South African Weather Service solar radiometric network

This study reports on the performance of the Baseline Surface Radiation Network (BSRN) quality control procedures applied to solar radiation data from the South African Weather Service (SAWS) radiometric network for the period September 2013 to December 2017. The overall percentage performance of the SAWS solar radiation network based on the BSRN quality control methodology was 97.79%, 93.64%, 91.60% and 92.23% for long wave downward irradiance (LWD), global horizontal irradiance (GHI), diffuse horizontal irradiance (DHI) and direct normal irradiance (DNI), respectively, with operational problems largely dominating the percentage of bad data. The overall average performance of the surface solar radiation dataset - Heliosat (SARAH) data records for the GHI estimation for all stations showed a mean bias deviation of 8.28 Wm-2, a mean absolute deviation of 9.06 Wm-2 and a root mean square deviation of 11.02 Wm-2. The correlation, quantified by the square of the correlation coefficient (R2), between the ground-based and Heliosat-derived GHI time series was ~0.98. The established network has the potential to provide high-quality one-minute solar radiation data sets (GHI, DHI, DNI and LWD) and auxiliary hourly meteorological parameters vital for scientific and practical applications in renewable energy technologies.


Introduction
Knowledge of the local solar radiation arriving at the surface of Earth is very important for many different applications, such as crop growth models, architectural designs, planning, designing and sizing of solar energy systems [1,2,3]. To be successful in these applications, solar radiation measurements are required at strategic sites [1]. Historically, solar radiation data has been measured and recorded by the national meteorological services around the world [4]. Until recently, the South African Weather Service (SAWS) has been the primary source of ground-based solar radiation data in South Africa [5,6]. The old solar radiometric network, which was operational from 1957 to 1997, collapsed because of technical difficulties and lack of maintenance [5,7]. Owing to the rapid development of solar-based renewable energy technologies and projects, the demand for reliable and accurate data for site-specific solar resource assessment has increased [4]. Quality control (QC) may be a tedious process and, as a result, most users are keen to use data directly from meteorological services with confidence without performing an additional and fine data check [8]. To this end, in 2013, the SAWS re-established the national solar radiometric network, comprising thirteen new stations within the country's six climatic zones [9]. These stations are equipped with robust and reliable instruments suitable for Baseline Surface Radiation Network (BSRN) solar radiation measurements [10]. Climatic zones are regions with similar climatic conditions [11] and, according to Conradie [9], they were established to classify different areas based on their maximum energy demand and maximum energy consumption. Where each radiometric station is located, there is an automatic weather station (AWS) measuring hourly temperature, rainfall, pressure, humidity, wind speed and wind direction to provide auxiliary meteorological parameters. 
Measurements of solar radiation are more susceptible to errors than other meteorological parameters [12]. According to Urraca et al. [12], there are two major sources of error in ground-based solar radiation measurements: equipment errors and operational errors. Equipment errors are inherent to the type and construction of the sensors used in the measuring campaign [4,8]. Solar radiation sensors produce an electrical signal in response to incident radiation, and this signal is converted into a measurement of solar radiation. Solar radiation measurement instruments are also prone to changes in sensitivity, thermal offsets, spectral effects, and effects of geometry and the environment [4]. Operational errors, on the other hand, are independent of the type of sensor and involve factors such as shading by nearby objects, dust covering the dome of the sensor, incorrect levelling, station shut-downs, electric fields near cables and malfunctions in the data-logger. Careful selection of the site where the station is installed, as well as regular maintenance, can ameliorate most of these operational errors. Applying a QC procedure is therefore an essential step before using ground-based datasets [4,8], in order to identify and quantify the different types of errors in the measurements of solar radiation. According to Huld et al. [13], accurate and reliable solar radiation measurements provide investment-grade, bankable solar radiation data to the solar energy industry, project developers, decision makers in financing and policy-making institutions, and the scientific community. Accurate ground-based solar radiation data is also important for the improvement and validation of satellite-derived solar radiation data and scientific models.
In the present study, the BSRN QC procedures were applied to the solar radiation data within the SAWS radiometric network, with data coverage from September 2013 to December 2017.

Ground-based solar radiation data
The datasets used in this study consist of ground measurements registered at thirteen stations in South Africa, owned and maintained by the SAWS. The stations are evenly distributed across six different climatic regions [11] over an area bounded by latitudes 23° to 34° south and longitudes 18° to 31° east (Figure 1). The elevation of the stations ranges from 80 m to almost 1700 m, as described in Table 1. Table 2 provides information on the manufacturer and type of instruments used at the measurement stations. De Aar, located in the Northern Cape, is a BSRN station utilising two ventilated CMP21 pyranometers by Kipp & Zonen, which are rated in the highest International Organization for Standardization pyranometer performance category. The ventilation units keep the pyranometer domes free of frost and water. Periodic maintenance procedures are applied to the various instruments to satisfy the BSRN quality requirements. The ground-based solar radiation database contains one-minute values of all the measured parameters at each station.

Satellite-derived solar radiation data
The surface solar radiation dataset - Heliosat (SARAH) [18] is one of the climate data records produced by the Satellite Application Facility on Climate Monitoring (CMSAF), whose objective is to produce long, temporally homogeneous data records suitable for climate analysis, i.e., the assessment of anomalies and trends. The SARAH data records are derived using data from the Meteosat visible and infrared imager instruments of the Meteosat First Generation satellites (Meteosat 2-7) up to the [...] SARAH provides GHI and DNI at the Earth's surface from 1983 to date at high temporal (down to 30 minutes, but also daily and monthly averages) and spatial (0.05° x 0.05°) resolutions. Surface solar radiation is obtained using a modified Heliosat method to calculate the effective cloud albedo, together with the Specmagic clear-sky model [14], which is an extension to spectral bands of the Mesoscale Atmospheric Global Irradiance Code (MAGIC) model [15]. Specmagic uses monthly average values of atmospheric water vapour content from the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA-Interim) product and long-term monthly climatologies of aerosol optical depth from the Monitoring Atmospheric Composition and Climate (MACC) project [16,17]. Validations of SARAH using high-quality ground stations from international networks, e.g., BSRN (http://bsrn.awi.de), as well as from national networks, have been published [18][19][20][21]. At present, the SARAH dataset provided by CMSAF exists in two versions. The dataset used in the present work is based on version 1 of SARAH, with one difference: the hourly data used here are calculated from one satellite image per hour. In contrast, the SARAH version 1 data available from CMSAF use a weighted average of three half-hourly satellite images to calculate the hourly solar radiation values.

Methods for quality control and validation

Quality control of solar radiation data
The schematic diagram illustrating the methodology used in the present study is given in Figure 2. According to Urraca et al. [20], several diverse QC methods are applied to solar radiation data by different meteorological services and independent researchers. The SAWS has preferred to use the well-known QC procedures from the BSRN [22]. These QC procedures mark the samples that fall outside the normal test limits and usually leave the decision to remove the marked cases to the user. In this study, the BSRN QC procedures, with three levels of testing, were applied to the archived monthly files of one-minute data stored in a central database. After the tests, each one-minute data point is associated with its own quality code (see Table 3). The tests can be classified into three major categories: physically possible limits, extremely rare limits, and coherence between measurements (across-quantity relationships), which are defined as follows.
• Physically possible limits: check for physically reasonable maximum and minimum values. Values outside these limits are assigned codes 1 and 2, corresponding to less than the minimum and greater than the maximum reasonable value, respectively [12,22]. Data that did not pass this test is flagged and excluded from further analysis.
• Extremely rare limits: check data that lies within the physically possible limits for random errors, which are often associated with unusual weather conditions, such as multiple reflections between broken clouds and a snow surface, or with a tracking (hardware) problem. Typical tracking problems experienced at some of the stations include power failures due to damaged electric cables and overcharging batteries. Moreover, the tracking instrument suffered mechanical damage from severe thunderstorms, wind and hail. Therefore, data beyond the extremely rare limits should at least be visually inspected and, if no physical reason is found, excluded from the analysis.
• Coherence between measurements: comparisons between measurements, or across-quantity relationships, are based on the relationship among the three main solar radiation parameters: GHI, DHI and DNI. In cases where GHI and DHI are almost the same, most of the values do not pass the coherence test. This usually happens when a tracking problem leaves the DHI sensor exposed to the sun, so that it records values similar to GHI. Data that does not pass this test is flagged and excluded from further analysis.
The one-minute ground-based solar radiation data of GHI, DHI, DNI and LWD from all thirteen SAWS solar radiometric stations (Table 1) were subjected to quality check procedures based on the BSRN QC standards [22,23,24] before the validation was performed. Only the data that passed the first two quality check tests (physically possible limits and extremely rare limits) was used in the validation.
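The three test levels described above can be sketched as follows. This is a minimal illustration, not the SAWS implementation: the numeric limits are assumed from the commonly circulated BSRN recommended QC tests, the solar constant of 1361 Wm-2 is an assumption, and the function names and string labels are hypothetical (they are not the quality codes of Table 3).

```python
import math

S0 = 1361.0  # assumed solar constant at mean Earth-Sun distance (Wm-2)

# Assumed lower limits (Wm-2) for the first two test levels.
LOWER_POSSIBLE = {"GHI": -4.0, "DHI": -4.0, "DNI": -4.0, "LWD": 40.0}
LOWER_RARE = {"GHI": -2.0, "DHI": -2.0, "DNI": -2.0, "LWD": 60.0}


def upper_limits(sza_deg, au=1.0):
    """Assumed upper limits (Wm-2) for the first two test levels."""
    mu0 = max(math.cos(math.radians(sza_deg)), 0.0)
    sa = S0 / au ** 2  # solar constant adjusted for Earth-Sun distance
    possible = {
        "GHI": sa * 1.5 * mu0 ** 1.2 + 100.0,
        "DHI": sa * 0.95 * mu0 ** 1.2 + 50.0,
        "DNI": sa,
        "LWD": 700.0,
    }
    rare = {
        "GHI": sa * 1.2 * mu0 ** 1.2 + 50.0,
        "DHI": sa * 0.75 * mu0 ** 1.2 + 30.0,
        "DNI": sa * 0.95 * mu0 ** 0.2 + 10.0,
        "LWD": 500.0,
    }
    return possible, rare


def check_limits(name, value, sza_deg, au=1.0):
    """Return 'possible_fail', 'rare_fail' or 'pass' for one sample."""
    possible, rare = upper_limits(sza_deg, au)
    if not (LOWER_POSSIBLE[name] <= value <= possible[name]):
        return "possible_fail"
    if not (LOWER_RARE[name] <= value <= rare[name]):
        return "rare_fail"
    return "pass"


def check_coherence(ghi, dhi, dni, sza_deg):
    """True if GHI, DHI and DNI agree, False if not, None if untested."""
    mu0 = math.cos(math.radians(sza_deg))
    component_sum = dhi + dni * mu0
    if sza_deg >= 93.0 or component_sum <= 50.0 or ghi <= 50.0:
        return None  # test not applied at night or for weak irradiance
    tight = sza_deg < 75.0
    ratio_ok = abs(ghi / component_sum - 1.0) <= (0.08 if tight else 0.15)
    diffuse_ok = dhi / ghi < (1.05 if tight else 1.10)
    return ratio_ok and diffuse_ok
```

For example, with a solar zenith angle of 30°, `check_coherence(800.0, 790.0, 800.0, 30.0)` fails because the sum of the diffuse and projected direct components is far larger than the measured GHI, which is the pattern expected when an unshaded DHI sensor records values close to GHI.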
Missing values and data that did not pass the first two BSRN QC tests were flagged and replaced by not a number (NaN) before the corresponding timestamps were considered for the validation [22,23,24]. The minute values were then averaged to 15-minute means, and four consecutive 15-minute means were averaged to obtain hourly mean values [23,24]. Furthermore, all night values, i.e., values between sunset (20:00) and sunrise (05:00) South African standard time, when the solar zenith angle is greater than 90°, were replaced by 0. Hourly mean values were then averaged to obtain daily mean values and, subsequently, monthly mean values were calculated from the daily mean values.
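The averaging chain described above can be illustrated with a short sketch. It assumes that NaN entries are simply skipped when a mean is formed, which the text does not state explicitly, and the function names are hypothetical.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if none remain."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


def block_means(values, size):
    """Average consecutive blocks of `size` samples, skipping NaN."""
    return [nan_mean(values[i:i + size]) for i in range(0, len(values), size)]


def hourly_means(minute_values):
    """One-minute values -> 15-minute means -> hourly means (4 x 15 min)."""
    quarter_hours = block_means(minute_values, 15)
    return block_means(quarter_hours, 4)


def zero_night(hourly, sza_deg):
    """Set hours with the sun below the horizon (zenith angle > 90 deg) to 0."""
    return [0.0 if z > 90.0 else v for v, z in zip(hourly, sza_deg)]
```

Daily and monthly means follow the same pattern, for instance `nan_mean` over the 24 hourly values of a day and then over the daily values of a month.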

Validation of the satellite-based solar radiation
The validation of the satellite-based estimates involved a comparison of computed monthly mean satellite-retrieved estimates with monthly averages of the quality-controlled ground-based solar radiation data. The CMSAF-SARAH monthly mean surface incoming shortwave radiation data, with a spatial resolution of 0.05° x 0.05°, from Meteosat Second Generation (MSG) was validated against concurrent quality-checked monthly average GHI values calculated from the minute GHI values measured at the thirteen stations of the SAWS solar radiometric network. According to Schulz et al. [25], the CMSAF-SARAH products are accurate enough to be used for solar energy applications and to support meteorological organisations with diurnal, sub-seasonal and seasonal solar radiation data sets.

Validation metrics
The validation metrics, namely the mean bias deviation (MBD), mean absolute deviation (MAD), root mean square deviation (RMSD) and the square of the correlation coefficient (R2), were calculated from all the months in which 90% or more [19] of the data passed the quality tests. In addition, calculations were made for the diffuse fraction (DHI/GHI) and the clearness index (GHI/top-of-atmosphere irradiance), hereafter referred to as DF and KT, respectively. The DF and KT were calculated for all thirteen stations from the months in which 90% or more [19] of both the GHI and DHI data passed the quality tests. Annual average temperature and humidity levels from 2013 to 2017 were also calculated for each radiometric station, using hourly data from the AWS. Satellite-retrieved and ground-based values of GHI were compared at the different stations for every year and month independently. The MBD, RMSD and MAD in absolute values (Wm-2) were computed according to Equations 1 to 3 [3,26,27]. In addition to these, R2 was calculated using Equation 4 [26,27].
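For reference, these metrics are usually written as below. This is a sketch of the standard definitions rather than a reproduction of the study's Equations 1 to 4. The symbols $S_i$ (satellite-retrieved value), $G_i$ (ground-based value), $\bar{G}$ (mean ground-based value) and $N$ (number of points) are assumed notation, and the $R^2$ expression is one common form that uses only $\bar{G}$; the squared Pearson correlation coefficient is an alternative.

```latex
\begin{align}
\mathrm{MBD}  &= \frac{1}{N}\sum_{i=1}^{N}\left(S_i - G_i\right) \\
\mathrm{RMSD} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(S_i - G_i\right)^{2}} \\
\mathrm{MAD}  &= \frac{1}{N}\sum_{i=1}^{N}\left|S_i - G_i\right| \\
R^{2} &= 1 - \frac{\sum_{i=1}^{N}\left(S_i - G_i\right)^{2}}{\sum_{i=1}^{N}\left(G_i - \bar{G}\right)^{2}}
\end{align}
```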
where $S_i$ is the satellite-retrieved irradiance value at the i-th time point and $G_i$ is the ground-based solar radiation value for that timestamp; $N$ is the total number of points considered in the period analysed (year or month); and $\bar{G}$ is the average ground-based solar radiation value during the considered period.
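A minimal sketch of how the four metrics could be computed for paired monthly means is given below. The function name is hypothetical, deviations are taken as satellite minus ground, and R2 is computed as one minus the ratio of the residual sum of squares to the variance of the ground series, which is an assumption about the form of Equation 4.

```python
import math


def validation_metrics(satellite, ground):
    """Return MBD, MAD, RMSD and R2 for paired satellite and ground values."""
    if len(satellite) != len(ground) or not ground:
        raise ValueError("series must be non-empty and of equal length")
    n = len(ground)
    diffs = [s - g for s, g in zip(satellite, ground)]  # satellite minus ground
    mbd = sum(diffs) / n
    mad = sum(abs(d) for d in diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    g_mean = sum(ground) / n
    ss_res = sum(d * d for d in diffs)
    ss_tot = sum((g - g_mean) ** 2 for g in ground)
    # Assumed form of R2; undefined when the ground series is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan
    return {"MBD": mbd, "MAD": mad, "RMSD": rmsd, "R2": r2}
```

With the illustrative (not measured) series `satellite = [110.0, 210.0, 290.0]` and `ground = [100.0, 200.0, 300.0]`, a positive MBD indicates that the satellite product overestimates on average, while the MAD and RMSD are insensitive to the sign of the deviations.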

The BSRN quality control
The BSRN QC results are presented in Tables 4 and 5, where the overall percentage performance of the SAWS solar radiation network based on the BSRN QC procedures was 97.79%, 93.64%, 91.60% and 92.23% for LWD, GHI, DHI and DNI, respectively. Operational problems dominated the percentage of bad data as follows: LWD 2.21%, GHI 1.6%, DHI 3.57% and DNI 3.57%. Only data represented by code 0 was regarded as having passed all three BSRN QC tests and, thus, as representing the overall percentage performance. Code 5 represents missing data or data that was not recorded, which is indicative of the overall percentage of operational problems and errors. Codes 8 and 10 represent data that failed the first and second BSRN QC tests, respectively. All the irradiance data sets passed the first QC test and only 0.3% of the GHI and DHI data failed the second QC test. Codes 16 and 32 represent data that failed the third BSRN QC test, with the results showing that at least 4% of GHI, 4.3% of DHI and 4.2% of DNI failed the measurement coherence test. Code 40 represents data that failed both the second and third BSRN QC tests and code 42 represents data that failed all three BSRN QC tests. For the validation, the data coded 0 was used. This data was regarded as good quality and usable because it passed all the quality tests. Data bearing the codes 5, 8, 10, 16, 32, 40 and 42, on the other hand, was discarded, replaced by NaN, and not considered for further analysis in the validation because it was missing or failed at least one of the three quality tests. The final volume of monthly data used at every station depended on the QC results of the measured values. Riihelä et al. [19] advocate validating satellite products against in situ measurements with more than 90% of good quality data.
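The percentage performance and the 90% completeness criterion can be sketched as below, assuming the quality codes of one station and month are available as a list of integers. The function names are hypothetical, and the threshold is applied here as "90% or more", following the validation metrics description.

```python
from collections import Counter


def percent_by_code(codes):
    """Percentage of samples carrying each quality code."""
    if not codes:
        return {}
    counts = Counter(codes)
    total = len(codes)
    return {code: 100.0 * n / total for code, n in counts.items()}


def usable_for_validation(codes, min_good_percent=90.0):
    """True when the share of code 0 samples reaches the threshold."""
    return percent_by_code(codes).get(0, 0.0) >= min_good_percent
```

The share of code 0 corresponds to the overall percentage performance reported above, and the share of code 5 to the operational losses.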

Validation of the satellite-derived solar radiation product
Considering the results obtained at all the stations, the overall average performance of the SARAH data record for the GHI estimation showed an MBD of 8.28 Wm-2, an MAD of 9.06 Wm-2 and an RMSD of 11.02 Wm-2. The correlation between the ground-based and satellite-derived GHI time series, quantified by R2, was ~0.98 on average. Table 6 presents the absolute average MBD, RMSD and MAD values obtained from the validation of the complete valid time series of the SARAH GHI estimates at every station. Besides R2, the number of months used at every location is also indicated.
From the validation of the global irradiance estimates, the SARAH product provided accurate estimates of the monthly average GHI values in every location other than Durban and Cape Point, where it also showed the highest overestimation. This overestimation at these locations could indicate either a problem with the ground measurements or a misinterpretation of the input parameters, such as aerosols or albedo, used by the satellite method [29]. In addition, low altitudes, 91 m and 86 m, respectively, may have exacerbated the overestimation.
According to Posselt et al. [28], the threshold accuracy for the MAD between monthly mean ground-based GHI and SARAH monthly mean GHI is 15 Wm-2, the target accuracy is 10 Wm-2 and the optimal accuracy is 8 Wm-2. The comparison of concurrent SARAH and SAWS GHI monthly means showed close agreement, with an MAD of less than 15 Wm-2 at 11 of the 13 stations. Figure 3 shows that only the Cape Point (temperate coastal) and Durban (subtropical coastal) stations had an MAD greater than the threshold accuracy of 15 Wm-2, recording 18.9 Wm-2 and 19.0 Wm-2, respectively.
Prieska, Upington, De Aar, Irene, Mafikeng, Bethlehem and Polokwane (stations located in the arid, cold interior and temperate interior climatic regions) reached the optimal accuracy, i.e., an MAD below the 8 Wm-2 limit. Nelspruit (hot interior) reached the target accuracy, with an MAD of less than 10 Wm-2, while Thohoyandou (hot interior), George (temperate coastal) and Mthatha (subtropical coastal) met the threshold accuracy, with an MAD of less than 15 Wm-2.
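The grouping of stations by accuracy level can be expressed as a simple classification against the three limits quoted from Posselt et al. [28]. The function name and labels below are hypothetical, and the strict "less than" comparisons are an assumption.

```python
def mad_accuracy_class(mad):
    """Classify a monthly-mean GHI MAD (Wm-2) against the quoted limits."""
    if mad < 8.0:
        return "optimal"
    if mad < 10.0:
        return "target"
    if mad < 15.0:
        return "threshold"
    return "not met"
```

For instance, the MAD values of 18.9 Wm-2 (Cape Point) and 19.0 Wm-2 (Durban) fall in the "not met" class, whereas the network-wide average of 9.06 Wm-2 falls in the "target" class.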

Diffuse fraction and clearness index
Figure 4 shows the clearness index and the diffuse fraction. Higher values of the clearness index (near 1) imply a clear sky and a calm atmosphere. For Upington and De Aar, the clearness index averages were >0.6 throughout the year. The diffuse fraction of solar radiation was also calculated for all locations and found to vary from 0 to 1. Higher values indicate more aerosols and clouds. The diffuse fraction for Upington and De Aar was <0.3 in all months, indicating the suitability of these locations for solar energy prospecting.
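The two indices can be computed as sketched below. The top-of-atmosphere irradiance on a horizontal plane is approximated from an assumed solar constant of 1361 Wm-2, the Earth-Sun distance in astronomical units and the solar zenith angle; the function names are hypothetical and the study's exact top-of-atmosphere calculation is not given in the text.

```python
import math


def toa_horizontal(sza_deg, solar_constant=1361.0, au=1.0):
    """Top-of-atmosphere irradiance on a horizontal plane (Wm-2)."""
    mu0 = max(math.cos(math.radians(sza_deg)), 0.0)
    return solar_constant / au ** 2 * mu0


def diffuse_fraction(dhi, ghi):
    """DF = DHI / GHI; NaN when GHI is not positive."""
    return dhi / ghi if ghi > 0 else math.nan


def clearness_index(ghi, sza_deg, solar_constant=1361.0, au=1.0):
    """KT = GHI / top-of-atmosphere irradiance; NaN when the sun is down."""
    toa = toa_horizontal(sza_deg, solar_constant, au)
    return ghi / toa if toa > 0 else math.nan
```

A sample with a DHI of 100 Wm-2 and a GHI of 500 Wm-2 gives a diffuse fraction of 0.2, which is within the range reported above for Upington and De Aar.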

Conclusions
• The Baseline Surface Radiation Network quality control (BSRN QC) tests proved to be effective and efficient in detecting errors at different stations.
• The overall average performance of the surface solar radiation dataset - Heliosat (SARAH) data record for the global horizontal irradiance (GHI) estimation for all the stations exhibited a mean bias deviation of 8.28 Wm-2, a mean absolute deviation of 9.06 Wm-2 and a root mean square deviation of 11.02 Wm-2.
• The SARAH estimates can provide the basis for further analysis, such as the one presented in this study on annual photovoltaic electricity production.
• The overall percentage performance of the SAWS solar radiation network based on the BSRN QC procedures is 97.79%, 93.64%, 91.60% and 92.23% for long wave downward irradiance, GHI, diffuse horizontal irradiance and direct normal irradiance, respectively, demonstrating the potential value of the SAWS solar resource database for practical and scientific applications in South Africa.