1. Introduction
The use of remote sensing and Geographic Information Systems (GIS) applied to epidemiological events brings options for more integrated approaches to diseases and novel possibilities for their prevention and control [1]. Early warning systems that provide temporal and spatial predictions of epidemics might help control and prevent malaria outbreaks [2], since they ensure that health authorities and decision makers become aware of the immediate threat and are prepared to take effective control measures [3]. Early detection and prevention of malaria outbreaks therefore constitute one of the four technical elements of the Global Strategy for Malaria Control [4].
Malaria early warning systems are generally based on the records and monthly diagnoses of patients reported by health authorities, who are responsible for diagnosing cases and providing effective treatment to the population. If the monitoring system works effectively, prevention and control measures can be carried out early, prioritizing resources in the most vulnerable areas [2, 5].
Among the most widely used tools for developing early warning systems are statistical prediction models based on historical case reports and indicators of environmental risk [6, 7]. Quantifying the influence of environmental and climatic variables on the occurrence of malaria cases is a key step towards effective early detection. However, one limiting factor is the lack of, or difficulty in accessing, environmental and meteorological data. This is where remote sensing becomes a key source of information for the development of epidemiological predictive models.
In Argentina, studies using satellite information to predict cases of mosquito-borne diseases are limited and focus primarily on dengue [6, 8, 9, 10, 11]. For malaria, no studies allowing the prediction of epidemic outbreaks have been conducted to date.
The area along the Argentinean-Bolivian border is a region of intense epidemiological dynamics, where pathogenic interactions are related to structural, climatic and social conditions, such as migration in both directions. For many years, malaria transmission has been favored by cases imported from Bolivia [12]. Although the ARBOL (Argentina-Bolivia) campaigns of the 1990s reduced malaria cases on both sides of the border, bad weather conditions hindered access to the terrain and led to the discontinuation of control actions [13]. Nevertheless, the number of indigenous malaria cases decreased substantially and reached zero, and in 2019 Argentina was officially recognized by the World Health Organization (WHO) as free from autochthonous malaria transmission.
In this context, we propose analyzing the degree of influence of environmental factors (obtained from remote sensing or satellite imagery) on the emergence of malaria cases, with the aim of generating models capable of predicting future epidemic outbreaks and of maintaining epidemiological surveillance in San Ramón de la Nueva Orán, Orán Department (Salta province). To achieve this, epidemiological data on people reported with malaria in San Ramón de la Nueva Orán between 1986 and 2005 were analyzed; a local environmental characterization of the study area was performed; environmental variables were obtained through the pre-processing and processing of satellite imagery; and ARIMA (Autoregressive Integrated Moving Average) models based on the fluctuation of malaria cases and on the derived environmental variables were used to assess the goodness of fit of the model for predicting the emergence of new cases and estimating transmission risk.
2. Materials and Methods
2.1. Study Area
The study area corresponds to the city of San Ramón de la Nueva Orán, located in the north of Salta province (Figure 1). With a population of 82,413 inhabitants [14], it is the second most important city in the province and an important economic center [15, 16].
The climate is subtropical with a dry season. Mean annual rainfall ranges between 700 and 1000 mm and is concentrated in the summer months (November-March) [17]. Mean annual temperature is 21.4 °C, with extreme values of 45 °C during the summer. Winters are temperate, and humidity is around 78% [18]. The city lies in the altitudinal belt of the Yungas piedmont forests (between 400 and 700 meters above sea level, m.a.s.l.). Land use in the region is based mainly on the cultivation of sugar cane, citrus fruits, horticulture, tropical fruit production and logging [18].
2.2. Epidemiological Data
Data on malaria cases were extracted from epidemiological record sheets compiled by technicians at the Orán operating base of the National Program of Vector Control (Salta delegation) of the National Health Ministry of Argentina. These sheets contain data provided by patients, including date (day, month, year), name, address, place of infection, trips taken in recent months, occupation, nationality, and classification (indigenous, introduced or imported, according to the place of infection). Cases were registered monthly. Selected data from the original sheets were extracted and digitized in Excel spreadsheets. Only malaria cases reported as indigenous or introduced were used in this study; imported cases were excluded, under the assumption that local fluctuations in climatic variables might have influenced the emergence of the former but not of imported cases.
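The case selection and monthly aggregation can be sketched as follows. This is a minimal illustration, assuming each digitized record is reduced to a (date, classification) pair; the record layout, the function name `monthly_counts` and the exact classification labels are hypothetical, not the study's actual spreadsheet schema.

```python
from collections import Counter
from datetime import date

# Hypothetical record layout: (notification date, classification label).
# The real sheets hold more fields (name, address, place of infection, ...).
Record = tuple[date, str]

KEPT_CLASSES = {"indigenous", "introduced"}  # imported cases are excluded


def monthly_counts(records: list[Record], start=(1986, 1), end=(2005, 12)) -> list[int]:
    """Count non-imported cases per calendar month, keeping zero-case months."""
    counts = Counter(
        (d.year, d.month) for d, cls in records if cls.lower() in KEPT_CLASSES
    )
    series = []
    year, month = start
    while (year, month) <= end:
        series.append(counts.get((year, month), 0))
        month += 1
        if month > 12:  # roll over to January of the next year
            year, month = year + 1, 1
    return series
```

Months without reported cases are kept as zeros, so with the default range the series has 240 entries and stays aligned with the monthly image series.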
2.3. Landsat Images: Selection, Calibration and Georeferencing
Since the epidemiological data used in this study have a monthly resolution, a temporal series of Landsat satellite images with the same frequency was generated (240 monthly images, from January 1986 to December 2005). Because a usable image was not available for every month, an interpolation algorithm was implemented in IDL to estimate the missing data and generate a complete temporal series matching the epidemiological series. Landsat images were obtained from three sources: the United States Geological Survey (USGS); the Global Land Cover Facility (GLCF) of the University of Maryland, United States, with support from NASA; and the Instituto Nacional de Pesquisas Espaciais (INPE, Brazil). Since the study area lies at the boundary between two adjacent satellite paths (paths 230 and 231, row 076 of the Worldwide Reference System, WRS-2), it is covered by both scenes, which considerably increased the number of available images (Figure 2). A total of 160 Landsat 4 and 5 TM images and X Landsat 7 ETM+ images were downloaded. The images are referenced to UTM_WGS84 (UTM projection, WGS84 datum).
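A minimal sketch of gap filling for a monthly series, assuming simple linear interpolation in time. The IDL interpolation algorithm actually used is not described, so `fill_missing_months` is a hypothetical stand-in that only makes the idea concrete.

```python
import numpy as np


def fill_missing_months(values):
    """Linearly interpolate NaN gaps in a monthly series.

    Assumption: linear interpolation in time (the study's IDL algorithm is
    not described). Leading/trailing gaps take the nearest observed value,
    which is np.interp's default edge behaviour.
    """
    y = np.asarray(values, dtype=float)
    t = np.arange(y.size)          # month index 0, 1, 2, ...
    observed = ~np.isnan(y)
    if not observed.any():
        raise ValueError("series has no observed values")
    return np.interp(t, t[observed], y[observed])
```

For example, a three-month gap between an NDVI of 0.4 and 0.7 is filled with 0.5 and 0.6.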
Satellite images were calibrated using the Remote Sensing package. For the present study, two calibration scripts were created, one for Landsat 5 and one for Landsat 7. Once the bands were calibrated, NDVI and NDWI were calculated in RStudio using the “ndvi” and “ndwi” functions of the Remote Sensing package. The NDWI calculated corresponds to the formulation of Gao [19].
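The calibration scripts themselves are not reproduced in the text. As an illustration only, the standard Landsat TM/ETM+ conversions from digital numbers (DN) to at-sensor radiance, top-of-atmosphere reflectance and thermal brightness temperature can be sketched as below. The function names are hypothetical; LMIN/LMAX, ESUN, sun elevation and Earth-Sun distance must be taken from the image metadata and sensor handbooks, and the K1/K2 constants are the commonly published thermal-band values, which should be checked before use. Brightness temperature is only a first step towards LST, since an emissivity correction is still required.

```python
import math

# Thermal-band constants (K1 in W m-2 sr-1 um-1, K2 in kelvin) as commonly
# published for Landsat 5 TM and Landsat 7 ETM+ band 6; verify before use.
THERMAL_K = {"TM5": (607.76, 1260.56), "ETM7": (666.09, 1282.71)}


def dn_to_radiance(qcal, lmin, lmax, qcalmin=1.0, qcalmax=255.0):
    """Rescale a quantized DN to at-sensor spectral radiance (W m-2 sr-1 um-1)."""
    return (lmax - lmin) / (qcalmax - qcalmin) * (qcal - qcalmin) + lmin


def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au):
    """Top-of-atmosphere reflectance of a reflective band (no atmospheric correction)."""
    cos_zenith = math.cos(math.radians(90.0 - sun_elevation_deg))
    return math.pi * radiance * earth_sun_dist_au**2 / (esun * cos_zenith)


def brightness_temperature(radiance, sensor="TM5"):
    """At-sensor brightness temperature (K) from thermal-band radiance."""
    k1, k2 = THERMAL_K[sensor]
    return k2 / math.log(k1 / radiance + 1.0)
```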
Three satellite images from the different sources were selected to assess whether they were correctly georeferenced, taking Google Earth coordinates as the reference (true) coordinates. Overlaying USGS Landsat images on Google Earth imagery confirmed their geographic correspondence, and a USGS Landsat image was then used as the reference for georeferencing in ENVI 4.8. Since the INPE images were not correctly georeferenced, all of them were georeferenced against this USGS template image.
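Image-to-image georeferencing of this kind usually fits a polynomial mapping from ground control points (GCPs). The ENVI settings used are not reported, so the sketch below assumes a first-order (affine) polynomial fitted by least squares; `fit_affine` is a hypothetical helper, and the returned RMSE is the usual measure of registration accuracy.

```python
import numpy as np


def fit_affine(src_xy, dst_xy):
    """Least-squares first-order (affine) mapping src -> dst from GCPs.

    Returns a 2x3 matrix A such that [x', y'] = A @ [x, y, 1], and the RMSE
    of the GCP residuals in destination units (e.g. metres).
    """
    src = np.asarray(src_xy, dtype=float)
    dst = np.asarray(dst_xy, dtype=float)
    if len(src) < 3:
        raise ValueError("need at least 3 non-collinear GCPs")
    design = np.column_stack([src, np.ones(len(src))])      # n x 3
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)    # 3 x 2
    residuals = design @ coeffs - dst
    rmse = float(np.sqrt(np.mean(np.sum(residuals**2, axis=1))))
    return coeffs.T, rmse
```

With more GCPs than the three strictly needed, the RMSE gives an honest check of how well the template and the INPE image agree.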
2.4. Environmental Variables Through Satellite Imagery
Three temporal series of Landsat images were obtained, one each for Land Surface Temperature (LST), the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI), from which the fluctuation of the environmental variables over the study period was derived. The pre-processing steps of the Landsat images are reported, and images calibrated with an R algorithm are compared with images calibrated with the Remote Sensing package in R.
2.5. Normalized Indices
Data collected through satellite images were processed to estimate ecological and meteorological variables that would otherwise be measured in situ [20]. These estimates are obtained by transforming the images with band-based algorithms, and they have been widely used to monitor changes in natural factors such as rainfall and surface temperature.
Normalized Difference Vegetation Index (NDVI): vegetation is identified through ratios or indices based on its specific radiometric behavior. NDVI [21] transforms multispectral data into a single band. As with all normalized indices, NDVI values vary between -1 and +1, with values closer to 1 in a given pixel or area reflecting healthier vegetation [22].
Normalized Difference Water Index (NDWI): different versions of this index exist, and all of them are considered good indicators of the water content and humidity conditions of vegetation. Its values range between -1 and 1, with water bodies exhibiting positive values and vegetation and soil exhibiting zero or negative values [19].
Generation of NDVI, NDWI and LST temporal series: three temporal series were obtained from the original Landsat images, one each for NDVI, NDWI and LST, as described above. ENVI 4.8 and IDL were used for their processing.
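The two indices can be computed per pixel as below. This sketch assumes reflectance arrays for the Landsat TM/ETM+ red (band 3), near-infrared (band 4) and shortwave-infrared (band 5) channels; Gao's original NDWI used a 1.24 µm channel that Landsat lacks, so using band 5 here is an assumption. `area_summary` is a hypothetical helper that reduces an index image to the study-area mean and variance, assuming the "med" and "var" suffixes of the model variables denote these two statistics.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN wherever the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(np.broadcast(a, b).shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi_gao(nir, swir):
    """Gao-type NDWI = (NIR - SWIR) / (NIR + SWIR); SWIR band choice is assumed."""
    return normalized_difference(nir, swir)


def area_summary(index_image, mask=None):
    """Mean and variance of an index over study-area pixels (NaNs ignored)."""
    values = np.asarray(index_image, dtype=float)
    if mask is not None:
        values = values[np.asarray(mask, dtype=bool)]
    return float(np.nanmean(values)), float(np.nanvar(values))
```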
2.6. ARIMA Models (Autoregressive Integrated Moving Average)
Box-Jenkins (ARIMA) time-series analysis models the reported cases, allowing the number of expected cases to be predicted and providing confidence intervals for these predictions. Comparing these predictions with the observed cases facilitates decision making and makes it possible to determine whether a high number of cases corresponds to an outbreak of the disease or to random variation.
The ARIMA model combines two types of terms, autoregressive (AR) and moving average (MA), applied to the series after differencing (the "integrated" component) when differencing is needed to make it stationary. Each type of term has an order, or "degree", indicating the number of lags it includes. The autoregressive term relates the observation at time t to the observation at time t − 1 (first degree), t − 2 (second degree), and so on. The moving average term relates the error (difference between observed and expected values) at time t to the errors at times t − 1, t − 2, etc. Both sets of terms may also include seasonal terms (of lag 12, 13, 52, etc., depending on the interval between observations) and their multiples.
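To make the AR and MA terms concrete, the sketch below computes one-step-ahead predictions for an ARMA(1,1) model with known coefficients, ŷ(t) = c + φ·y(t−1) + θ·e(t−1), where e(t−1) is the previous prediction error. It assumes the series has already been differenced if needed (the I part of ARIMA) and omits seasonal terms; the function name and coefficient values are illustrative only.

```python
def arma11_one_step(y, c, phi, theta):
    """One-step-ahead predictions of an ARMA(1,1) model with known coefficients.

    prediction[t] = c + phi * y[t-1] + theta * error[t-1]
    error[t]      = y[t] - prediction[t]
    The recursion starts with a zero error before the first observation.
    Returns (predictions, errors) for t = 1 .. n-1.
    """
    predictions, errors = [], []
    prev_error = 0.0
    for t in range(1, len(y)):
        pred = c + phi * y[t - 1] + theta * prev_error  # AR term + MA term
        prev_error = y[t] - pred
        predictions.append(pred)
        errors.append(prev_error)
    return predictions, errors
```

A seasonal term would add, for example, y[t-12] or the error at t-12 with its own coefficient.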
2.7. Data Analysis Through ARIMA Model and Adjustment
To establish the relation between malaria prevalence and the environmental variables derived from satellite imagery, a multivariate ARIMA model was estimated. This model captures the influence of the recent past values of the dependent variable, and of its random noise, on its value at each time step, while taking into account the effect of the independent variables and their respective lags [23].
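A simplified version of this idea is an ARX model, in which current cases are regressed on their own lagged values and on a lagged environmental predictor by ordinary least squares. This is a sketch, not the transfer-function ARIMA that SPSS estimates (it has no moving-average or differencing terms), and `fit_arx` is a hypothetical name.

```python
import numpy as np


def fit_arx(y, x, lag_x, ar_order=1):
    """OLS fit of y[t] = c + sum_i phi_i * y[t-i] + beta * x[t-lag_x] + e[t].

    Simplified ARX stand-in for a multivariate ARIMA model (no MA or
    differencing terms). Returns [c, phi_1, ..., phi_p, beta].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    start = max(ar_order, lag_x)  # first t for which every lag exists
    rows = []
    for t in range(start, len(y)):
        ar_terms = [y[t - i] for i in range(1, ar_order + 1)]
        rows.append([1.0, *ar_terms, x[t - lag_x]])
    design = np.array(rows)
    coeffs, *_ = np.linalg.lstsq(design, y[start:], rcond=None)
    return coeffs
```

For example, `fit_arx(cases, predictor, lag_x=5)` (both series hypothetical) would estimate the effect of the predictor observed five months earlier.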
The search for the ARIMA model that best fitted the data was performed using the "Expert Modeler" option of the Time Series module of SPSS 15.0. To do this, the series of malaria cases and environmental variables were divided into two sub-periods: an estimation sub-period, used to determine which model best fits the data, and a prediction sub-period, used to test the forecasting capacity of the model. The estimation period ran from January 1986 to December 1999, and the prediction period covered the year 2000 (January to December).
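The split by calendar month is simple index arithmetic on a series starting in January 1986, as sketched below. The hold-out error measures (MAE, RMSE) are an illustrative, hypothetical choice, since the text does not say which forecast-accuracy statistics were used; the function names are also hypothetical.

```python
import math


def month_index(year, month, origin=(1986, 1)):
    """Zero-based position of (year, month) in a monthly series starting at origin."""
    return (year - origin[0]) * 12 + (month - origin[1])


def split_series(series, last_estimation=(1999, 12), last_prediction=(2000, 12)):
    """Estimation part up to and including last_estimation; prediction part after it."""
    cut = month_index(*last_estimation) + 1   # 168 months: Jan 1986 .. Dec 1999
    stop = month_index(*last_prediction) + 1  # 12 more months: Jan .. Dec 2000
    return series[:cut], series[cut:stop]


def forecast_errors(observed, predicted):
    """Mean absolute error and root mean squared error of hold-out forecasts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    diffs = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```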
The Expert Modeler automatically searches for the model that best fits each dependent series. If independent (predictor) variables are specified, the Expert Modeler selects those with significant associations with the dependent series for inclusion in the ARIMA model. If the independent variables do not contribute any information, a univariate ARIMA model is generated [24].
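SPSS's Expert Modeler search procedure is not reproduced here. As a rough illustration of automatic model selection, the sketch picks the autoregressive order that minimizes Akaike's information criterion (AIC) among least-squares AR fits; `select_ar_order` is hypothetical and handles only pure AR models without predictors.

```python
import numpy as np


def select_ar_order(y, max_order=6):
    """Choose the AR order p (0..max_order) that minimizes AIC of an OLS AR(p) fit.

    All candidates are fitted on the same observations (t >= max_order) so
    their AIC values are comparable. AIC = n * ln(RSS / n) + 2k.
    """
    y = np.asarray(y, dtype=float)
    target = y[max_order:]
    n = target.size
    best_order, best_aic = None, np.inf
    for p in range(max_order + 1):
        # Column i holds y[t - i] for every target time t.
        columns = [np.ones(n)] + [y[max_order - i:len(y) - i] for i in range(1, p + 1)]
        design = np.column_stack(columns)
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        rss = float(np.sum((target - design @ coeffs) ** 2))
        aic = n * np.log(rss / n) + 2 * (p + 1)
        if aic < best_aic:
            best_order, best_aic = p, aic
    return best_order
```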
To assess the goodness of fit of the multivariate ARIMA model proposed by the Expert Modeler, the coefficient of determination (R²) was used; it measures the proportion of the variance of the dependent variable explained by the model and varies between 0 and 1. Model fit was also checked through residual analysis, verifying with the Ljung-Box test that the residuals were not autocorrelated. This test summarizes the residual autocorrelations in the Q statistic; a non-significant Q indicates no evidence of autocorrelation among residuals.
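Both fit statistics are short formulas. The sketch below computes R² = 1 − SS_res/SS_tot and the Ljung-Box statistic Q = n(n+2) Σ_{k=1}^{h} r_k²/(n−k), where r_k is the lag-k autocorrelation of the residuals. Q is then compared with a chi-square distribution with h − m degrees of freedom (m being the number of estimated ARMA parameters); that comparison is left out to keep the example dependency-free, and the function names are hypothetical.

```python
def r_squared(observed, fitted):
    """Proportion of the variance of `observed` explained by `fitted`."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q
```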
2.8. Influence of Environmental Variables over Malaria Cases
The degree of influence of the climatic/environmental variables obtained from remote sensors (NDVI, NDWI and LST) on the emergence of malaria cases in San Ramón de la Nueva Orán between January 1986 and December 2005 was analyzed.
First, a correlation analysis among the remote-sensing-derived environmental variables was performed to avoid multicollinearity (i.e., high correlation among explanatory variables) (Table 1). Variables were then selected, and two candidate models were considered: Model A, which included NDVImed, NDWIvar, LSTmed and LSTvar; and Model B, which included NDWImed, NDVIvar, LSTmed and LSTvar.
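A screening step of this kind can be sketched with a pairwise Pearson correlation matrix. The 0.7 threshold and the function name `collinear_pairs` are hypothetical; the text does not state which cut-off was used to judge multicollinearity.

```python
import numpy as np


def collinear_pairs(columns, names, threshold=0.7):
    """List predictor pairs whose absolute Pearson correlation exceeds `threshold`.

    `columns` holds one sequence per variable (rows = variables).
    The 0.7 cut-off is a hypothetical choice.
    """
    corr = np.corrcoef(np.asarray(columns, dtype=float))
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return flagged
```

Variables in a flagged pair would then be kept in separate candidate models, as in the split between Models A and B.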
Before applying the statistical analyses to Models A and B, an exploratory analysis of the data was carried out to assess correlations between the number of malaria cases and the independent variables. The analysis revealed that the association with the number of infected patients was stronger when the monthly means of the independent variables were lagged by one month.
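The lag exploration can be sketched as a scan of Pearson correlations between monthly cases and a predictor shifted back by 0 to k months. This is an illustrative sketch with a hypothetical function name; it ignores autocorrelation within each series, so the correlations are descriptive only.

```python
import numpy as np


def lagged_correlations(cases, predictor, max_lag=6):
    """Pearson r between cases[t] and predictor[t - lag] for lag = 0..max_lag."""
    cases = np.asarray(cases, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        y = cases[lag:]                          # cases from month `lag` onward
        x = predictor[:len(predictor) - lag]     # predictor `lag` months earlier
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result
```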
In addition, an analysis of the annual mean and variance of the variables considered showed that the mean number of malaria cases varied strongly from year to year, and that the mean and variance of the case counts over the 20 years were very different (overdispersion). This result led to the application of hyper-Poisson regression models for overdispersed data with a rate parameter λ (incidence rate) that varies by year (i.e., λ is not constant but varies over time). In sum, the fluctuation of the climatic/environmental variables in relation to the emergence of malaria cases was analyzed through Poisson regressions.
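Overdispersion is commonly checked with the variance-to-mean ratio of the counts, which is about 1 under a Poisson model. The sketch below computes that ratio and the yearly mean of monthly cases; it only illustrates the diagnostic and does not fit the hyper-Poisson model. The function names are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of counts; values well above 1 suggest overdispersion."""
    return variance(counts) / mean(counts)


def yearly_means(monthly_counts, first_year=1986):
    """Mean monthly case count for each calendar year of a monthly series."""
    return {
        first_year + i // 12: mean(monthly_counts[i:i + 12])
        for i in range(0, len(monthly_counts), 12)
    }
```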
The regression yields an incidence rate ratio (IRR, the exponentiated regression coefficient) for each variable, from which the percentage influence of each environmental/climatic variable on the emergence of malaria cases can be determined. Standard errors (S.E.), p-values and 95% confidence intervals (C.I. 95%) are also obtained.
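In a log-link Poisson regression the IRR is the exponentiated coefficient, IRR = exp(β), so the percentage change in expected cases per unit increase in a variable is (IRR − 1) × 100, and a 95% interval follows from exp(β ± 1.96 · S.E.). A minimal sketch with a hypothetical function name:

```python
import math


def irr_summary(beta, se, z=1.96):
    """IRR, % change per unit increase, and CI from a log-link coefficient and its S.E."""
    irr = math.exp(beta)
    pct_change = (irr - 1.0) * 100.0
    ci = (math.exp(beta - z * se), math.exp(beta + z * se))
    return irr, pct_change, ci
```

For instance, a coefficient of −0.1 gives IRR ≈ 0.905, roughly a 9.5% decrease in expected cases per unit increase of the variable.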
In the present study, data were analyzed with the aforementioned Poisson regressions, applied separately to the two models A and B described above (NDVImed, NDWIvar, LSTmed and LSTvar; and NDWImed, NDVIvar, LSTmed and LSTvar, respectively).
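The basic mechanics of a Poisson regression with a log link can be reproduced by iteratively reweighted least squares (IRLS), as sketched below. This is a simplification under stated assumptions: it omits the year-varying λ (the hyper-Poisson or multilevel structure described above), assumes the first column of X is the intercept, and `poisson_irls` is a hypothetical helper. Exponentiating the slope coefficients gives the IRRs.

```python
import numpy as np


def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood Poisson regression (log link) by IRLS.

    X must contain an intercept column first. Returns the coefficients and
    their standard errors from the inverse Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * mu            # X^T diag(W), with W = mu for the log link
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)
    return beta, np.sqrt(np.diag(cov))
```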
4. Discussion
Epidemiologically, northwestern Argentina was an important region because of the malaria outbreaks registered there since the late 19th century. The development of the disease was influenced by topography, climate and phytogeography [12]. Accordingly, the “Malaric mountainous area” was limited to the provinces of the north and center of the country [25, 26]. Historically, the transmission season began in October or November in Salta and Jujuy; in November or December in Tucumán, Santiago del Estero and Catamarca; and in December or January in La Rioja and Córdoba, but it was delayed if the previous winter was especially cold or the previous spring especially rainy. Transmission could last until May or June throughout the area, and could extend through temperate winters [12, 25, 27].
In the current study, malaria cases in San Ramón de la Nueva Orán during the 1986-2005 period showed a seasonal pattern, with peaks during the summer and autumn months. This agrees with the results of a study in Aguas Blancas (on the border with Bolivia), where malaria cases emerged during summer and autumn, three months after the peak abundance of the mosquito vector An. pseudopunctipennis [12]. Thus, the emergence of malaria cases was related not only to the presence of the mosquito but also to the periods of its highest abundance, which in the study area occur in spring [28, 29] and summer [28].
San Ramón de la Nueva Orán is located in the Yungas piedmont, an area that has undergone severe anthropogenic transformation in recent decades, mainly due to deforestation for agricultural expansion or urbanization. This transformation is linked to the movement of people towards rural areas in search of work, which increases contact with mosquito populations and the risk of infection for rural workers, who in general do not wear body-covering clothing or use repellent during working hours. Contact increases because the “edge effect” of natural vegetation patches adjacent to agricultural land increases, and these edges are areas of high mosquito abundance [30]. In addition, agricultural land provides suitable conditions for breeding sites, such as irrigation channels.
The descriptive analysis showed that the most affected group was males aged 15 to 30 years, followed by the 31-45 age group. This is consistent with the fact that young adults generally carry out rural work, while women and children remain in the households. It also agrees with a study carried out in Peru, whose authors surveyed people who had contracted the disease and found that all of them were male farm workers who had not used protection during working hours [31]. Dantur Juri et al. [12] also found a high proportion of male cases in El Oculto (79%) compared with Aguas Blancas (49%). The authors attributed this difference to the fact that Aguas Blancas lies on the border with Bolivia, where trade, in which women are also involved, is the main economic activity. It is well known that the dynamics of malaria are related to human activities [32, 33, 34].
As previously mentioned, the seasonal pattern of cases should be taken into account by health authorities when implementing control measures, since knowing which seasons have the highest number of reported cases indicates when exposure and infection are likely to be highest. It is also important to warn the target population so that they can take adequate measures to avoid contact with the vector and thus avoid contracting the disease.
Indices derived from satellite images and ARIMA time-series models were used to predict the emergence of new malaria cases and to estimate the risk of disease transmission. In the present study, the automatically generated ARIMA model fitted the changes in the epidemiological behavior of malaria in San Ramón de la Nueva Orán. With the NDVI, NDWI and LST indices used as predictor variables, malaria prevalence was related to the number of cases in the previous month and to the maximum NDVI and mean NDWI observed five months earlier.
The relation between maximum NDVI and malaria prevalence has been reported previously: vegetation cover directly affects the abundance of anopheline mosquitoes [35], since it maintains soil and air humidity and provides them with shelter [12]. A study carried out in Kenya also agrees with these results, finding that the monthly number of malaria cases was strongly correlated with the NDVI of the previous month; the authors also observed that, with a minimum NDVI value, the number of malaria cases registered in the following month exceeded 5% of the total [20]. In Burundi (Africa), a model was developed to predict malaria incidence from the association of climatic variables and NDVI with monthly reported malaria cases; the best model included rainfall, maximum temperature and the NDVI of the previous month. Likewise, using monthly records of malaria cases from 23 provinces of Afghanistan and their relation with environmental variables such as rainfall, temperature and NDVI [36], Adimi et al. [37] found that NDVI was the best predictor of malaria incidence. Similar results were reported by Gaudart et al. [38] in Mali (Africa), where the authors followed a cohort of children up to 12 years old for five years to evaluate parasitaemia in blood samples. Through a time-series analysis associating the incidence of Plasmodium falciparum with NDVI, they found that this index explained the seasonal pattern of the parasite with a lag of 15 days, and that an increase in NDVI produced a significant increase in parasitaemia.
As noted above, NDWI was inversely related to the emergence of malaria cases in San Ramón de la Nueva Orán. NDWI has previously been used as a regressor in temporal and spatial models of different diseases. Estallo et al. [10] demonstrated that NDWI is a good predictor of the Household Infestation Index (HI) in San Ramón de la Nueva Orán. Cohen et al. [39] used NDWI, among other remote-sensing-derived environmental variables, to develop spatial models of malaria transmission in Swaziland (Africa), showing that transmission was related to areas with high NDWI values. Dambach et al. [40] showed a significant relation between NDWI and the presence of breeding sites of the malaria vector in a rural area of Burkina Faso, a result of great importance for generating transmission risk maps. In Italy, Rosa et al. [41] analyzed time series of captures of Culex pipiens, a mosquito vector of West Nile virus, together with environmental variables such as rainfall, temperature and NDWI, and found that an increase in NDWI at the beginning of the year was related to a shortening of the season of highest mosquito abundance. Nevertheless, this study agrees with those reported by Estallo et al. [10] and Machault et al. [42]. NDWI is used for environmental characterization because it measures the foliar water content of vegetation and is also an indirect measure of rainfall and soil humidity, which play an important role in the biology of the disease vector.
Although the relation between NDWI and the drivers of disease emergence was positive in the studies described above, in this study NDWI had a negative effect on the emergence of malaria cases five months later. Since NDWI is related to rainfall and humidity conditions, this inverse relationship might be explained by heavy rains, which might wash out breeding sites of the immature stages of the mosquito vectors, reducing anopheline populations and, indirectly, the number of malaria cases.
A similar result was reported in Colombia, where dengue transmission was associated with the variability of local climatic conditions, with a lag of 20 weeks (approximately 5 months) for rainfall [23]. According to the authors, one hypothesis that might explain this lag is that local vector populations become connected through the dispersal of small groups of migrant females, which colonize newly established habitats (water reservoirs) during periods of higher rainfall. As more vector populations are established, the flow of individuals among the new habitats is possibly favored, which might ultimately lead to the long-term persistence of vector populations. This can generate a critical vector population density, which likely allows the efficient spread of dengue virus within the human population.
The multilevel Poisson regressions for the two formulated models showed that malaria cases might increase with a decrease in mean NDVI together with an increase in mean LST (Model A), or with a decrease in mean NDWI together with an increase in mean LST (Model B). In other words, in both models malaria cases were positively related to mean LST. Temperature is known to be a key factor in malaria transmission [43].
Several authors [33, 35, 44, 45] reported that an increase in temperature produces faster egg hatching and a shorter larval period, generating more adults in less time and thus increasing anopheline populations. Furthermore, as temperature increases, the gonotrophic cycle of the female mosquito shortens, which increases the frequency of blood meals, i.e., the biting rate [32, 33, 45]. Lindblade et al. [32] demonstrated not only that an increase in temperature resulted in more adults of Anopheles (Cellia) gambiae resting in households, but also that the biting rate was higher in localities with higher temperatures. In addition, laboratory tests have demonstrated a shortening of the gonotrophic cycle of An. albimanus at temperatures between 24 and 30 °C [46].
Afrane et al. [47] reported that the gonotrophic cycle of An. gambiae was 0.9 to 1.7 days shorter in localities of Iguhu (Kenya) with higher temperatures. Moreover, the extrinsic incubation period of the parasite is inversely related to temperature: as temperature increases, the incubation period of the Plasmodium parasite in the mosquito becomes shorter [33, 35, 44, 45, 47], which increases the portion of the vector's life during which it can transmit the parasite. For example, the extrinsic incubation period of Plasmodium falciparum in An. gambiae was reduced by 17.3 days (from 55.5 to 38.2 days) when temperature increased from 18 to 18.9 °C [32]. In general, the mosquito's gonotrophic cycle lasts 2-3 days, matching the frequency with which females bite in search of blood [44].
The relationship between the emergence of malaria cases and temperature has been reported previously in Argentina by Dantur Juri et al. [12], who showed that malaria cases in the localities of El Oculto and Aguas Blancas (Orán Department, Salta province) were associated with mean and mean maximum temperature, among other variables. In El Oculto, rises in mean maximum temperature increased the risk of malaria cases emerging, while in Aguas Blancas the risk increased with rises in mean monthly temperature and relative humidity. Sáez-Sáez et al. [45] also analyzed the influence of rainfall and temperature on the incidence of the disease in Sucre (Venezuela). In that study, malaria cases and climatic variables showed a positive correlation, with rainfall and air temperature being the variables that best explained the emergence of malaria.
In Model A of the present study, malaria cases were also influenced by a decrease in mean NDVI, while in Model B they were also affected by a decrease in mean NDWI. This may be because the two indices are related. NDVI measures the status of the vegetation, its vigor and “greenness” in relation to photosynthesis, which in turn is positively related to environmental conditions such as rainfall, humidity and temperature. NDWI measures the water content of vegetation and is thus an indirect indicator of soil and environmental humidity. Consequently, an increase in rainfall tends to increase both NDWI and NDVI. It is well known that rainfall creates suitable conditions for new larval habitats, but heavy rainfall might produce floods that wash out anopheline breeding sites [12].
Some studies differ from the results of the present work, showing a positive relation between NDVI and malaria cases. Others agree with them: in a study carried out in Bangladesh, Haque et al. [48] analyzed malaria cases in relation to meteorological and environmental variables such as rainfall, temperature, humidity and NDVI, and reported that an increase of 0.1 in monthly NDVI was associated with a 30.4% decrease in malaria cases.
Finally, it should be taken into account that the dynamics of malaria are very complex, and that not only environmental but also social, economic and political factors play a role. The quality of housing (which is in turn related to the economic conditions of the population), hygiene conditions, actions carried out by public health agencies, and population dynamics and migration within areas of endemic transmission, among others, directly and indirectly affect the dynamics of the disease.