Uncertainty in drought identification due to data choices, and the value of triangulation

Droughts are complex and gradually evolving conditions of extreme water deficits which can compromise livelihoods and ecological integrity, especially in fragile arid and semi-arid regions that depend on rainfed farming, such as Kitui West in south-eastern Kenya. Against the background of low ground-station density, 10 gridded rainfall products and four gridded temperature products were used to generate an ensemble of 40 calculations of the Standardized Precipitation Evapotranspiration Index (SPEI) to assess uncertainties in the onset, duration and magnitude of past droughts. These uncertainties were driven more by variations between the rainfall products than variations between the temperature products. Remaining ambiguities in drought occurrence could be resolved by complementing the quantitative analysis with ground-based information from key informants engaged in disaster relief, effectively formulating an ensemble approach to SPEI-based drought identification to aid decision making. The reported trend towards drier conditions in Eastern Africa was confirmed for Kitui West by the majority of data products, whereas the rainfall effect on the increasingly dry conditions was more subtle than annual and seasonal declines and greater annual variation, which warrants further investigation. Nevertheless, the effects of increasing droughts are already felt on the ground and warrant decisive action.


Introduction
Drought is a slow-onset phenomenon characterized by spatiotemporal water deficits restricting water accessibility and availability for social-ecological systems at varying temporal scales [1][2][3][4][5]. Characteristic persistent negative anomalies in precipitation and high temperatures leading to high evapotranspiration from soils and crops eventually have cross-sectoral effects on agriculture, food and livelihoods, particularly in East Africa where rainfed agriculture is the economic mainstay [1,6-11]. Droughts and other environmental changes prevalent in East Africa, such as agricultural expansion and corresponding land degradation, contribute to water crises as they aggravate the competition of water demands [1]. Droughts may be categorized as (i) meteorological (resulting from rainfall deficit) or, depending on duration and additional drivers and impacts, (ii) agricultural (exceptionally low soil moisture), (iii) hydrological (exceptionally low surface and/or subsurface water levels) and (iv) socio-economic (resulting from water supply and demand failure in relation to the previous categories) [1,4].
Droughts have severe, widespread effects on livelihoods, especially in arid and semi-arid regions, contributing inter alia to declining crop quality and quantity and forest productivity [12,13], and deterioration of aquatic life [10]. East Africa, and especially Kenya, is emblematic of the recurring drought regions worldwide [10,14-17]. The agroecosystems of semi-arid eastern Kenya are particularly vulnerable, with an inconsistent rainfall regime and the frequency and intensity of droughts increasing [3,10,12,18,19]. Kitui County in southeastern Kenya is such a vulnerable semi-arid region with inconsistent rainfall and high temperatures, featuring dry spells in the growing season that impede the dominantly rainfed agriculture [10,16,20]. Water demand will likely follow the projected population increase in the area [21]; hence monitoring and understanding of drought dynamics and the development of management interventions are ever more necessary.
Precipitation and temperature are the primary meteorological variables modulating drought duration and severity. However, the impact of prevailing data uncertainties [22] in the identification of past droughts, particularly in data scarce regions like East Africa, has received little attention in the literature. Identification of past drought occurrence is essential to assess responses and mitigate against current and future events. The inherent complexity of the phenomenon due to the interrelation of hydrological and social factors in drought occurrences, impacts and responses has attracted a range of research fields across the natural and social sciences [2,23,24]. It seems apt, therefore, to complement the meteorological data with qualitative ground-based information from disaster response and other sources in order to verify drought identification based on gridded products. This promising approach has to date remained largely unexplored.
The Standardized Precipitation Index (SPI) and the Standardized Precipitation-Evapotranspiration Index (SPEI) are two widely used drought intensity monitoring indices. The SPI is recommended by the World Meteorological Organization (WMO) [1,15,25] and requires rainfall as the only parameter. The SPEI, an extension of the SPI, is a more recent statistical index where the water balance is represented by precipitation and potential evapotranspiration (PET) [26], making it arguably more reliable for the detection and monitoring of drought [26][27][28]. The SPEI identifies meteorological drought at a sub-annual scale but can be a proxy for hydrological, agricultural and socioeconomic drought [29]. The two closely related indices [27,30] have been applied to various ecosystems in East Africa. Studies have typically responded to the unevenly distributed and generally scarce station-based data over East Africa with the use of gridded data products [7,9,31-34]. For instance, [28] demonstrate near similarity of SPEI and SPI using MERRA-2 temperature merged with the CHIRPS rainfall product. By contrast, [35] emphasize the value of PET for drought identification, and hence the superiority of SPEI over SPI. [36] show the value of gridded data for drought assessment in the Ethiopian Upper Blue Nile Basin; in their case the CHIRPS product outperformed TARCAT, PERSIANN and TRMM. [37] also emphasize the usefulness of CHIRPS in the uneven topography of East Africa. They reveal the value of precipitation and of minimum and maximum temperature at monthly resolution for long-term climate variability assessment. [9] use an array of five gridded data products to compute SPI, SPEI and soil moisture anomalies, demonstrating the uncertainty in existing products, with discrepancies particularly in mountainous areas and areas with low ground-station density.
[37] emphasize the need to consider temperature variation alongside rainfall and the need for higher quality data to manage data-related uncertainties in the central Kenyan highlands. [38] provide an account of drought impacts over East African agroecosystems and the importance of temporal assessment using gridded data, further emphasizing uncertainty and spatial variability.
In the present study, we problematize the choice of rainfall and temperature products for the calculation of SPEI in the context of identifying past drought conditions in the semi-arid Kitui West area of Kitui County, south-east Kenya. We thereby complement existing studies with a demonstration of the variation of data products and the resulting SPEI calculations at the sub-national scale, which is relevant for assessing drought impacts on agriculture-based livelihoods [39]. We compare 10 gridded rainfall products with coverage of Kenya. In the absence of ground-stations within the study area, comparison is made with the two nearest in-situ stations as well. In the attempt to resolve the ambiguity in drought identification resulting from the differences in products, we show the value of complementing the SPEI analysis with key informant interviews, effectively demonstrating the value of triangulation. The paper is structured as follows. Section 2 introduces data and methods. Sections 3 and 4 present and discuss the results in light of other studies in Kenya and East Africa. Section 5 concludes with a summary and recommendations for policy and practice.

Study area
Kitui County is a largely semi-arid to arid locality in south-eastern Kenya (Figure 1), with an intermittent river regime. The county has a population of over 1.1 million persons with a density of 37 persons per square kilometer, an average household size of 4.3 and a total area of about 30,430 km² [21]. The county is characterized by relatively high poverty levels, with indicators of food and water insecurity highlighted in the sub-national development blueprint, the Kitui County Integrated Development Plan (2018-2022) [40]. Food poverty is estimated at about 39.4% compared to Kenya's average of 32% [40]. Approximately 50% of inhabitants do not have access to water sources within a walking distance of 5 km [40]. The erratic rainfall regime is considered a principal parameter linked to the viability of the mixed crop agroecosystem against the background of recurrent drought conditions [11]. As in most of East Africa, small-scale mixed crop farming is the primary livelihood in Kitui County, supporting food production among other benefits [11].
Kenya receives rainfall in two seasons, a longer one in March-May (MAM) and a shorter but more reliable season in October-December (OND) [41]. Temperatures range from 14 to 34 °C, with January-February being the warmest months followed by MAM [42]. The ecological profile of the county includes seven agroecological zones that reflect the agricultural development potential as well as varying vegetative cover. Dominant soil groups include Dystric Regosols, Lithosols and Humic Cambisols, the Ferralo category consisting of Acrisols (ferric), Luvisols and Ferralsols, and Chromic Luvisols and Ferralsols [8].

SPEI calculation
The SPEI was calculated using the R package SPEI version 1.7 [30] for a 30-year period (1987-2016) using all combinations of 10 monthly rainfall and four monthly min/max temperature products (Table 2), yielding a total of 40 data blends. These products were chosen because they had proven reliable in the variable terrain of East Africa [28,36,37,43]. A 30-year window of analysis was chosen as all products overlapped during this period. The units of all data sources were harmonized to mm/month and °C (monthly average), respectively. Monthly PET was calculated from Tmin and Tmax using the reduced-data Hargreaves method in the SPEI package. Following previous studies, a 12-month accumulation was used as it yielded a smoother annual drought visualization compared to 3- and 6-month accumulations, while depicting generally similar drought patterns [28,44]. The 12-month SPEI also represented an annual rainfall regime matching the semi-arid agro-ecology of the study area, which often receives minimal rainfall. It also fits with the observed inter-annual distribution of drought instances as learned from interviews in the field. The accumulated differences between rainfall and PET were normalized using the log-logistic distribution, fitted using the unbiased estimator of probability-weighted moments, as implemented in the SPEI package version 1.7.
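The steps described in this section can be illustrated with a minimal Python sketch. It is not the R workflow used in the study: it only shows, under stated assumptions, how monthly PET follows from Tmin and Tmax with the reduced-data Hargreaves equation, how the climatic water balance is accumulated over 12 months, and how 10 rainfall and four temperature products combine into 40 blends. The FAO-56 expression for extraterrestrial radiation, all function names and the placeholder product labels (P1 to P10, T1 to T4) are assumptions introduced for illustration; the SPEI package may differ in detail, and the log-logistic standardization is not reproduced here.

```python
import math
from itertools import product

GSC = 0.0820  # solar constant in MJ m-2 min-1 (FAO-56 value, assumed here)


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation in mm/day of evaporation equivalent."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)            # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)        # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))  # sunset hour angle (non-polar latitudes)
    ra_mj = (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    return 0.408 * ra_mj  # convert MJ m-2 day-1 to mm/day


def hargreaves_pet(tmin, tmax, lat_deg, day_of_year, n_days):
    """Monthly PET (mm/month) from monthly mean Tmin and Tmax (deg C)."""
    tmean = 0.5 * (tmin + tmax)
    ra = extraterrestrial_radiation(lat_deg, day_of_year)
    daily = 0.0023 * ra * (tmean + 17.8) * math.sqrt(max(tmax - tmin, 0.0))
    return max(daily, 0.0) * n_days


def water_balance(precip, pet):
    """Monthly climatic water balance D = P - PET (mm/month)."""
    return [p - e for p, e in zip(precip, pet)]


def rolling_sum(values, window=12):
    """Accumulate over `window` months; None until a full window is available."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running if i >= window - 1 else None)
    return out


# Hypothetical labels standing in for the products listed in Table 2.
RAIN_PRODUCTS = [f"P{i}" for i in range(1, 11)]
TEMP_PRODUCTS = [f"T{i}" for i in range(1, 5)]
BLENDS = list(product(RAIN_PRODUCTS, TEMP_PRODUCTS))  # 10 x 4 combinations
```

For example, `hargreaves_pet(18.0, 30.0, -1.4, 75, 31)` gives a mid-March PET estimate for an assumed latitude of about 1.4° S. The 12-month sums returned by `rolling_sum(water_balance(precip, pet))` are the quantities that would subsequently be standardized to obtain the SPEI.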

Meteorological data products
The nearest four synoptic and agrometeorological stations are located approximately 100-200 km away from the study area, and the nearest, Kitui Agrometeorological Station, has only a 5-year record and too many data gaps to be useful for our analysis. The same applies to adjacent volunteer stations [34]. Hence the gridded data products could only be compared to two ground-stations further away that had reliable records [34,45]. The gridded products are summarized in Table 2. [...] [34,46]. This product is developed through the Enhancing National Climate Services (ENACTS) program [31,46,47], which works with national meteorological services across Africa to improve the quality of climate data and enhance access in essential sectors such as agriculture to counter the problem of scarce ground-based stations [31]. The KMD product combines spatially downscaled reanalysis data and bias-corrected satellite-based rainfall estimates with sparse station-based observations. For Tmax and Tmin, 37 weather stations across Kenya were used and merged with data from the JRA-55 (Japanese 55-year Reanalysis) product (see Table 2 for JRA-55 background) [48]. Rainfall was generated using data from about 700 stations which were merged with satellite data from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) product (see Table 2) [43,46].
The Japanese 55-year Reanalysis (JRA-55) data, produced by the Japan Meteorological Agency, improves on its predecessor, JRA-25, by addressing problems such as a cold bias in the lower atmosphere and a dry bias in the Amazon, and by providing a longer time scale, starting in 1958 [49]. According to [50], the product has demonstrated reliability in Central Equatorial Africa, where a comparison was made with other reanalysis products including MERRA-2, ERA-Interim, 20CR, CFSR, NCEP-1 and NCEP-2. The ERA5 data is a fifth-generation reanalysis product of the European Centre for Medium-Range Weather Forecasts (ECMWF) [51]. It has a longer temporal coverage and higher resolution than the predecessor, ERA-Interim, and provides more parameters at hourly resolution accompanied by uncertainty information. A study by [52] compared the performance of the product to in-situ stations, with [53] revealing the usefulness of ERA5 especially at high elevations. The Modern-Era Retrospective Analysis for Research and Applications (MERRA-2) data is a reanalysis product of the Global Modeling and Assimilation Office of the Goddard Space Flight Center, developed towards the aim of an integrated earth system analysis [54]. The satisfactory performance of the product as compared to the GPCP and JRA-55 products is depicted by [55], and by [50] over Central Equatorial Africa through comparison with the new gauge-based NIC31 product alongside other reanalysis data such as JRA-55 and ERA-Interim.
The data from the Global Precipitation Climatology Centre (GPCC), operated by the German Weather Service, consists of the world's largest database of station-based precipitation data [56]. The primarily monthly data is used to develop gridded products such as the full-data monthly version 6, which draws on the largest number of stations. The GPCC showed reliable performance when compared at the global level to the CRU CL 2.0 and ERA40 products at various locations. The data from the Global Precipitation Climatology Project (GPCP) of the World Data Center for Meteorology is a monthly gridded product built by merging satellite estimates and gauge analysis from the GPCC. Version 2.3 includes adjustments for improved rainfall estimates compared to version 2.2 [57]. A study over the complex terrain of the Ethiopian highlands by [7] showed the applicability of the product under those circumstances compared to the TRMM 3B43 and CMAP data. The Climatic Research Unit gridded Time Series (CRU TS) data is a gridded product based on angular-distance weighting of ground-station data from national meteorological services around the world [58]. The product's performance has been compared to the GPCC.
The Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) data is a merged product including five satellite-based and ground-station products [43]. It has previously proved reliable in the uneven topography of East Africa [32]. Over Kenya, the product has performed well [59] over drier regions [37], where it outperformed ARC2 and CHIRP. The latest version of the Tropical Applications of Meteorology using SATellite (TAMSAT) data (TAMSAT 3.1) merges Meteosat thermal infrared imagery and rain gauge observations covering the whole of the African continent since 1983 [60]. Alongside the TRMM 3B42 and CMORPH products, TAMSAT demonstrated high performance over the complex Ethiopian highlands in a study by [7]. Another largely satellite-based product, the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks - Climate Data Record (PERSIANN-CDR), is developed from GPCP and satellite-based data [61]. The PERSIANN-CDR has proven useful in detecting disasters, as [61] showed in a product verification study of Hurricane Katrina in 2005, which also compared GPCP, TRMM and CPS.
The meteorological data were averaged over the study area using a weighted average, with weights proportional to the contribution of each grid cell to the study area shape (see Figure S1 and Equation S1 of the Supplementary Information). For each data product, the grids differed in their intersection with the study area (see Figure S2 of the Supplementary Information). Correlations [...] Makindu and the gridded rainfall data provided by the KMD were greater than 0.6 (see Figure S3 of the Supplementary Information). Following [63], we used the native resolution of the products ( [...]
[3] recommended the triangulation of SPEI output in order to reinforce the results while also contributing to a broader understanding of the temporal evolution of droughts and ongoing responses. Following [64], we additionally view methodological triangulation (henceforth referred to as triangulation) as an optimal approach for integrating our qualitative and quantitative data to generate a confirmatory picture. Therefore, in addition to the SPEI calculations using the 40 blends of rainfall and temperature products, information on drought occurrence and severity was obtained through interviews with 14 key informants with a track record of working on droughts and related activities, e.g. food security, humanitarian and farm-based interventions, in the study region (see Table S3 in the Supplementary Information). The [...] Table S3. A snowball sampling approach was used, where each key informant was asked to suggest equally active organizations in the study area for further interviews [64]. Some interviews were recorded upon consent of the interviewee; for others, notes were taken.
The inter-annual variability in precipitation across the study area frequently exceeds ±1 mm (in 30% of the cases) and less often ±2 mm (5% of the cases) (Figure 2; for zoomed-in versions see Supplementary Information, Figure S5). Mean absolute deviation is 154 mm for annual precipitation, and negative anomalies are more frequent but less severe than positive anomalies. Further, at the annual level and across all products, the mean is 656 mm, the SD 197 mm and the CV 32%. The data products often, but not always, agree on the direction of the anomaly [...]
[...] 1994, 1996-1997, 1999-2000, 2005-2006, 2009 and 2011. More ambiguous are 1988, 1991-1993, 2001-2004, 2008 [...]
The information from the key informant interviews agreed with all unambiguous droughts in the timespan (2005-2006, 2009, 2011) and the one year which was unambiguously wet (2007). The interviews also pointed to droughts in 2008, 2010, 2012 and 2014-2015, where the SPEI information based on the different data products was ambiguous. In the other ambiguous years, 2013 and 2016, the key informant interviews pointed to no drought. Hence it would seem that key informants engaged in drought relief on the ground in the region can resolve the ambiguity resulting from the disagreement between meteorological data products.
Reliable assessment of the onset, magnitude and duration of drought is vital in agro-pastoral ecosystems, not only to understand impacts on livelihoods but also to signal and assess the reliability of responses [2,65]. In the absence of reliable meteorological data as a result of sparse in-situ station density over Kenya [16,34,37] and other African countries, rainfall and temperature data from gridded [...] already resulted in deterioration of livelihoods and ecosystem integrity [69][70][71].

In the current study, uncertainty manifests itself in differences between the data values of gridded products for rainfall and temperature, with annual minimum and maximum temperature varying less between products than rainfall, as depicted in the results section. The temporal pattern of the Tmax and Tmin input was also more similar across products than that of rainfall. The variation of SPEI across data blends therefore predominantly reflects the variation of the rainfall [...]
By comparing 10 precipitation products, we found no evidence of a statistically significant trend (which does not rule out that a trend exists) in annual rainfall totals, seasonal rainfall totals or annual standard deviations. This finding is in contrast with the declining rainfall trend over East Africa reported by [71], [72], [38], [41] and [11]. It is also in contrast with the key informant information that the March-April-May rain season, being the longer of the two seasons and essential in the farming calendar, has demonstrated unreliability in recent years.
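To make concrete what a test for a statistically significant trend in an annual rainfall series involves, the following Python sketch implements the Mann-Kendall test. The trend test used in this study is not named in the text above, so the choice of Mann-Kendall is an assumption made purely for illustration, as are the function name and the 5% significance level in the usage note. The variance formula omits the correction for tied values.

```python
from math import sqrt
from statistics import NormalDist


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test.

    Assumes a series without many ties: the variance below carries no tie correction.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")

    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal test statistic.
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0

    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p
```

Applied to each product's 30 annual totals, a p-value above 0.05 would correspond to "no evidence of a statistically significant trend" in the sense used above; it would not demonstrate that no trend exists.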

Since rain-fed agriculture is the primary source of livelihoods in the study area and the primary contributor to the economy [38,40], a decrease of rainfall in the long season and a general shortening of the season is a major concern [73]. [...] deviations. Both would propagate to lower SPEI values, which in our case and for most products agree with an increase in drought instances in recent years.

The absence of evidence of a significant trend in the shorter October-December [...] (Table S2) as also reported by [75]. The MAM, especially due to its lower variability, thus remains important for agroecosystem productivity in the region, with a likely atmospheric teleconnection with the OND as shown by [71]. The [...] varied more between products. This agrees with findings over Kenya by [76], who found increasing trends of min/max temperatures, and [39], who similarly report a marked warming in the Horn of Africa. [10] found warm days and summer days to be increasing and cold nights to be decreasing over Kenya, confirming the picture of rising temperatures.
[...] devastating among the households largely dependent on rainfed agriculture.

Essential sectors such as energy, which is largely hydro-based, were negatively impacted across East Africa [2,6]. In Kenya, a total of 3.75 million persons, primarily in the North and parts of the South-East, were affected by the resulting food shortage according to the global record of mass disaster occurrence [78].

The drought period 2005-2006, confirmed by most products, was followed by wetter conditions in 2007, which exacerbated impacts. As [18], [77] and [38] discuss, livelihoods and natural ecosystems across East Africa were severely [...]
[...] and willfully biased responses with the aim to attract funding by exaggerating the severity of the drought situation [81]. On their own, the qualitative data lack information on drought magnitude and timing, which is something that the SPEI analysis can provide, albeit with uncertainty.

Using an ensemble of gridded meteorological data products in the calculation of drought indices, such as the SPEI in this study, facilitated greater understanding of the uncertainties in onset, duration and magnitude of past droughts. These uncertainties were driven more by the variation between rainfall products than temperature products in our case. Understanding past droughts is important to study their social-ecological impacts and assess the adequacy of responses. Our study thus holds an important lesson for studies of past droughts: relying on any single one of the available data products would risk severely misrepresenting drought characteristics. It is equally important to bear in mind that, in the absence of a dense ground-station network, there is no benchmark dataset against which the individual data products can be assessed. Searching for a "best" product is thus not viable, and the value of these products can only be realized in an ensemble.
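One way to picture how an ensemble, rather than a single product, can be read for a given year is sketched below in Python. The sketch labels a year as drought, no drought or ambiguous according to the share of ensemble members whose SPEI falls at or below a drought threshold, and lets ground-based reports decide only the ambiguous cases. The threshold of -1.0, the agreement level of 0.75 and the function names are illustrative assumptions, not values or procedures taken from this study.

```python
def classify_year(spei_values, threshold=-1.0, agreement=0.75):
    """Label one year from the SPEI values of all ensemble members.

    threshold and agreement are hypothetical choices made for illustration.
    Returns the label and the share of members indicating drought.
    """
    if not spei_values:
        raise ValueError("no ensemble members supplied")
    share = sum(v <= threshold for v in spei_values) / len(spei_values)
    if share >= agreement:
        label = "drought"
    elif share <= 1.0 - agreement:
        label = "no drought"
    else:
        label = "ambiguous"
    return label, share


def triangulate(label, informants_report_drought):
    """Use qualitative ground information only where the ensemble is ambiguous."""
    if label != "ambiguous":
        return label
    return "drought" if informants_report_drought else "no drought"
```

With 40 members of which 35 indicate drought, `classify_year` returns the label "drought" with a share of 0.875; with an even split it returns "ambiguous", and `triangulate` then defers to the key informant reports. The share itself remains useful as a measure of how strongly the products agree.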

An ensemble approach to SPEI could not, however, identify all droughts unanimously in our case, using an ensemble of 10 rainfall products times four temperature products over the Kitui West area in south-east Kenya. This ambiguity could only be resolved with the information from key informants engaged in disaster relief on the ground. Our study thus demonstrates the value of triangulating quantitative drought analysis with qualitative data. The qualitative data alone, in turn, would miss information on drought onset, duration and magnitude; this is what the ensemble approach to SPEI provides, albeit with uncertainty. It is thus the juxtaposition of both types of data that is most fruitful.
Effective responses include enhancement of government, private sector and community-based disaster relief systems, targeting, for example, crop diversification with cultivation of drought-resistant varieties as championed by the Kenya Red Cross [82]. An ensemble approach to SPEI will provide the necessary quantitative basis for these policies, while the experience of community, regional and national organisations will help resolve data ambiguities as well as strengthen the implementation of national policies.

Appreciating uncertainties in drought characteristics should in no way distract from decisive action to mitigate the impacts of droughts, improve disaster relief and strengthen adaptive capacity, because extreme events such as droughts have been increasing over East Africa and have already resulted in deterioration of livelihoods and ecosystem integrity. While there is likely spatial variation over the region, we confirmed a statistically significant trend towards increasingly drier conditions also for Kitui West with just over half of the SPEI ensemble members. This trend was partly driven by a significant increase of minimum and maximum temperature over time in all data products, while negative annual and seasonal rainfall trends in some of the products could not be proven statistically significant. Beyond the effect of temperature, and therefore of evapotranspiration, it will be worth investigating next how the timing and sub-annual variation of rainfall propagate into negative SPEI values, i.e. drier conditions. Such an analysis should go beyond trends in annual standard deviations of rainfall, which in our case did not turn out significant either.