1. Introduction
By the year 2050, the world population will increase to approximately 9,725 million inhabitants [
1], so, by this same year, food production must also increase by about 70% [
2] cited by [
3]. Due to these projections, the general assembly of the United Nations [
4] adopted the 2030 agenda–sustainable development [
5]. This agenda proposes 17 priority goals [
5], where the topic “food production” resides in five of this 17 goals: 1) end of poverty, 2) zero hunger, 3) decent work and growth economic, 4) sustainable cities and communities and 5) responsible production and consumption. According to [
6,
7,
8,
9,
10,
11,
12,
13,
14,
15], to achieve these five goals, considering the effects of climate change, an efficient tool is the prediction of agricultural crop yield, through multiple linear regressions (MLR) and essential climate variables (ECV). There are 55 ECVs [
16], and they stand out for their ease of obtaining and importance in agriculture: 1) average soil moisture (ASM), 2) cumulative effective precipitation (CEP) and air temperature [
17,
18]. According to [
19,
20], from the air temperature it is possible to calculate: 3) average maximum temperature (AMT), 4) maximum maximorum temperature (MMT), 4) average minimum temperature (AmT), 5) minimum minimorum temperature (mmT), 6) average mean temperature (AMeT), 7) maximorum mean temperature (MMeT), 8) degree days and 9) cumulative potential evapotranspiration (CET), these nine ECV being those which mainly affect crop yields. According to [
21], the crops most sensitive to extreme ECV conditions in Latin America are: corn, wheat and bean. In Mexico, bean is a crop that requires special conditions of ASM and air temperature, both in irrigation (IBY) as in rainfed (RBY) [
17], therefore, strategies must be developed to guarantee a source of proteins, mainly for citizens with extreme poverty [
22]. Specifically in Sinaloa, sulfur bean is the most cultivated [
23], occupying an important place in the annual demand (120,000 tons), obtained with a harvest of 74,000 hectares (average yield of 1.66 T ha– 1) [
24]. Although autumn–winter bean is the most widely cultivated in Sinaloa, there are still no prediction models of IBY–RBY. This condition exacerbates the vulnerability of this crop to extreme ECV events, such as frost [
25], hot extremes [
26] and CEP irregularity, which are common meteorological phenomena in this state [
17,
26].
In this study, at two meteorological stations in Sinaloa (Culiacán–center and Rosario–south) and for the period 1982–2013 of the autumn–winter cycle (October–March), the following was carried out:
- a)
For ECVs, using the National Meteorological Service (SMN)–National Water Commission (CONAGUA) database [
27], daily series of precipitation and maximum-minimum temperature were obtained. To obtain reliable, long-term and good quality results [
28,
29], the SMN–CONAGUA daily series were homogenized using the Standard Normal Homogeneity Test (SNHT) method [
30]. With the homogenized series, the mean daily temperature (meanT) was determined. Finally, the annual series of: AMT, MMT, AmT, mmT, AMeT, MMeT, average bean degree days (ABDD) [
20], CET and CEP were calculated.
- b)
From the European Space Agency (ESA)–experimental break-adjusted COMBINED Product (version 07.1) [
31] – spatial resolution of 0.25° x 0.25°, daily soil moisture was obtained. These data were obtained for two points near the Culiacán and Rosario stations. ASM was calculated annually.
- c)
From the Agrifood and Fisheries Information Service (SIAP) [
32], the annual series of IBY and RBY were obtained.
Standardized Z normalization was applied to the annual series of all variables. Pearson (PC) and Spearman (SC) correlations were applied, depending on the normality of the series. To know the correlations significantly different from zero [
33], a hypothesis test was applied between PC and SC vs. Pearson’s critical correlations (CPC) and Spearman (CSC) [
7]. Using MRL, four predictive models of IBY–RBY (dependent variables) were obtained as a function of ECVs (independent variables). The four models were significantly predictive (PC > CPC). All MRL met the assumptions of no autocorrelation [
34], linearity and severe non-multicollinearity [
35,
36,
37], homogeneity and normality.
The goal was to model IBY–RBY as a function of ECVs, in central and southern Sinaloa.
This study provides models with predictive sensitivity of IBY–RBY [
38], based on extreme ECVs events [
39,
40,
41,
42,
43,
44,
45,
46]. The results contribute to the prevention of negative effects due to possible decreases in IBY–RBY, helping in the production of sustainable foods [
5], in a purely agricultural state, considered to this day as “the breadbasket of Mexico” [
24].
2. Materials and Methods
Study Area
The state of Sinaloa is located to the northwest of Mexico (
Figure 1) and has a predominantly semi-arid aridity index [
7]. This state, called “the breadbasket of Mexico,” has historically generated high rates of agricultural production. Specifically, bean cultivation occupies a relevant position, mainly due to the planted area and the economic income generated. For example, in 2013, 69,727 hectares of irrigated land and 4,723 hectares of rainfed land were harvested [
24].
Essential Climate Variables (ECVs)
Quality control and homogenization of meteorological series
From SMN–CONAGUA:
https://smn.conagua.gob.mx/es/climatologia/informacion-climatologica/informacion-estadistica-climatologica [
27], daily series of maximumT and minimumT and precipitation were obtained from 70 meteorological stations in Sinaloa. initially, it was decided to consider the stations that presented daily missing data < 5%. This condition was only met by five stations: Culiacán, Santa Cruz de Alaya, Las Tortugas, Rosario and La Concha. The data from these five stations were previously obtained by [
7]. Because [
29] argue that, to detect long-term climate change, reliably and with good quality, one must work with homogenized series, in this study, the meteorological series were homogenized with the climatol library (
https://climatol.eu) [
28,
47]. This software is based on the Standard Normal Homogeneity Test method (SNHT) [
30]. This library also calculated the missing data, using expression 1:
Where
is the estimated meteorological data, through the corresponding
nearby data, available at each time step
and with the weight
assigned to each of them. Statistically, the expression
is a linear regression, with a reduced major axis (orthogonal regression), in which the line is adjusted, minimizing the distances of the points perpendicular to it [
28].
Temperatures: average maximum (AMT), maximum maximorum (MMT), average minimum (AmT), minimum minimorum (mmT), average mean (AMeT) and maximorum mean (MMeT)
By means of the semi-sum of the homogenized maximumT and minimumT, meanT was obtained. In accordance with what was stated by [
19,
48,
49], the monthly averages of: AMT, MMT, AmT, mmT, AMeT and MMeT were calculated.
Average bean degree days (ABDD), cumulative potential evapotranspiration (CET) and cumulative effective precipitation (CEP)
ABDD was calculated with a daily scale and monthly average (expression 2) [
20]:
Where ABDD = average bean degree days (°C day–1),
= mean daily temperature and
= base temperature = 8.3 °C (for Mexican bean varieties) [
20].
To obtain monthly CET, the Hargreaves and Samani method was used [
50], expressed by Equation (3). This method was applied due to the absence of relative humidity and wind speed data.
Where CET = cumulative potential evapotranspiration (mm month–1), maximumT = maximum temperature (°C month –1), minimumT = minimum temperature (°C month–1), meanT = mean temperature (°C month–1) and = extraterrestrial solar radiation (mm month–1).
Finally, for the monthly determination of CEP, it was decided to apply expressions 4–5 [
51], since according to [
52], when the study area is mostly flat, such as Culiacán and Rosario (slope < 5 %,), this method is ideal.
Average soil moisture (ASM)
From ESA:
https://catalogue.ceda.ac.uk/uuid/0ae6b18caf8a4aeba7359f11b8ad49ae [
31], daily global soil moisture data with a spatial resolution of 0.25° x 0.25° were obtained. This database is called: experimental break-adjusted COMBINED Product, Version 07.1 (version 2023). It was created by direct fusion of soil moisture products from scatterometer and radiometer, level 2 [
53]. ESA data have been widely used, for example: [
18,
54,
55,
56]. From the global daily data, two sites close to the Culiacán and Rosario stations were selected. The coordinates of these two sites were: Culiacán (24°52′30″ N and 107°22′30″ W) and Rosario (23°07′30″ N and 105°52′30″ W). Finally, ASM was calculated annually for these two sites.
Agricultural variables
Irrigated yield (IBY) and rainfed (RBY) bean for the autumn-winter cycle
From SIAP:
http://infosiap.siap.gob.mx/gobmx/datosAbiertos.php [
32], a detailed review of the availability of annual IBY–RBY data was carried out. Only two municipalities (Culiacán and Rosario) presented 0 % missing data for the autumn-winter cycle. For IBY–RBY, historically the sowing date ranges from October 15 to November 20, and because, according to SAGARPA–SIAP [
57], the sowing–harvest cycle should concentrate ≥ 110 days, the harvest date ranges from February 2 to March 10, approximately.
In this study, the sowing-harvest cycle ranged from October 1 to March 31. Only the period 2003–2013 recorded information divided by municipalities, that is, IBY–RBY for the period 1982–2002, it was at the state level.
The data studied for ECV and IBY–RBY were for the period October–March 1982–2013.
Due to the availability of complete series of ECVs and IBY–RBY, in this study it was decided to work with only two municipalities: Culiacán and Rosario.
Table 1 presents the main statistics for IBY–RBY and ECVs, for the period January–December 1982–2013, at the Culiacán and Rosario stations.
Mathematical Equations that Govern the Statistical Analyses, Applied to Agricultural Variables and Essential Climatic Variables (ECVs)
Normalization
A standardized Z normalization [
58] was applied to all series (Equation (6)):
Where = value of the variable of the treated series, = arithmetic mean of the series and = standard deviation of the series.
Normality tests
Shapiro Wilk Method
Four normality tests were applied to all series (
and
and residuals). The first test was Shapiro-Wilk [
59], which was calculated using expression 7:
Where is the ordered sample of the standardized data with constants.
Anderson–Darling method
The second normality test was Anderson–Darling [
60], which is defined by expressions 8–9:
Where
is the number of observations, and
is defined by expression 9:
Where is the normal cumulative probability distribution function, with mean and variance specified from the sample, and is the data obtained in the sample, ordered from highest to lowest.
Lilliefors Method
Also, the Lilliefors test [
61] was applied, where the average and variance of each series were first estimated (expression 10).
Where is the sampling distribution function, is the theoretical distribution function and .
Jarque–Bera method
Finally, the Jarque–Bera test [
62] (expression 11) was applied, where the skewness and Kurtosis were used (expressions 12–13).
Where
is the sample size, and S and K are respectively the skewness and kurtosis coefficients, which are defined below:
Where
is the third moment, around the mean
, and
is the variance.
Where is the fourth moment, around the mean , and is the variance.
Correlations
Pearson (PC)
When the series presented normality, PC was applied [
63], which is based on expression 14.
Where PC = Pearson correlation coefficient, = covariance of and and = standard deviations of and .
Spearman (SC)
SC was used as a second alternative, that is when the series did not follow a normal distribution [
63] (expression 15).
Where SC is the Spearman correlation coefficient, is the number of data points in the series, and is the rank difference of element .
Hypothesis tests
To find out if PC and SC were significantly different from zero [
7], a hypothesis test was carried out (Equations (16) and (17)). PC and SC were contrasted vs CPC
) and CSC (
). CPC and CSC were obtained from [
33].
Multiple Linear Regressions (MLR)
Four MLR were carried out: two for Culiacán (irrigated and rainfed) and two for Rosario (irrigated and rainfed). IBY–RBY depended on ECVs: ASM, AMT, MMT, AmT, mmT, AMeT, MMeT, ABDD, CET and CEP. MLR were expressed with Equation (18).
The estimated response, was characterized with sampling regression 19:
Where each regression coefficient
was estimated with
, from the sample data, using the least squares method. The least squares estimators of the parameters
, were obtained by fitting the MLR model (Equation (18)), to the data of expression 20:
Where
is the observed response to the values
of the
independent variables
. Each observation (
), satisfied Equation (21):
Or each observation (
), satisfied expression 22:
Where
and
are the random error and the residual, respectively, which were associated with the response
and with the fitted value
[
64].
Validation of mathematical models
In the four MLR, the following was done:
- 1)
In the residuals, the autocorrelation contrasts of [
65,
66] were applied.
- 2)
Goodness-of-fit statistics were calculated: R2, PC, mean error (ME), root mean square error (RMSE), mean error absolute (MEA), percentage of error mean (PEM), percentage of error absolute mean (PEAM) and Theil’s U2 statistic (U2). To comply with the linearity hypothesis, in each MLR, the condition PC ≥ CCP ∴ CP ≠ 0 was met [
7].
- 3)
For the analysis of severe non-multicollinearity, the variance inflation factor (VIF) and tolerance (To) were initially calculated. For severe non-multicollinearity, it was verified that R2 ≤ 0.800, VIF ≤ 5.000 ∴ To > 0.200 [
67] cited by [
68,
69]. In models, the variables that presented severe multicollinearity were eliminated, to subsequently recalculate each MLR.
- 4)
For the homogeneity, it was verified that the average of each residual serie was zero [
70].
- 5)
Finally, a normality analysis was applied to the residuals of each MLR. Normality methods were the same as for PC and SC.
Software used and statistical significance
In this study, the follow softwares were used: XLstat version 2023, RStudio version 4.3.0, DrinC version 1.7, Past version 4.08, Gretl 2023b, Panoply version 5.2.6, Surfer version 10.0 and CorelDRAW version 2019.
All statistical analyzes were carried out with = 0.05.
3. Results
Agricultural variables and essential climate variables ()
In Culiacán (1992, 1995, 1998–1999 and 2010;
Figure 2a,b) and in Rosario (1988, 1995 and 2010–2011;
Figure 2c,d), CEP = 0 mm yr
–1 was recorded. In 1991, the lowest magnitudes of CET were recorded in Culiacán (CET = 624.338 mm yr
–1) and in Rosario (CET = 672.895 mm yr
–1).
IBY–Culiacán, ranged from 1.06 t ha
–1 in 1987 to 1.98 t ha
–1 in 2002 (
Figure 2a,b), and IBY–Rosario, varied from 0.90 in 2005 to 1.98 t ha
–1 in 2002 (
Figure 2c,d). RBY–Culiacán, ranged from 0.42 t ha
–1 in 1996 and 2007 to 1.06 t ha
–1 in 1982, and RBY–Rosario, varied from 0.42 t ha
–1 in 1996 to 1.06 t ha
–1 in 1982.
In 1986, the lowest ASM magnitudes were recorded in Culiacán (ASM = 0.020 m
3 m
–3,
Figure 2a), and Rosario (ASM = 0.040 m
3 m
–3,
Figure 2d). The highest magnitudes of ASM were recorded in: 2004 for Culiacán (ASM = 0.194 m
3 m
–3), in 2005 for Rosario (ASM = 0.232 m
3 m
–3).
In Culiacán (
Figure 2a,b), the highest magnitudes of maximumT were: (AMT = 37.333 °C in 2013, MMT = 36.917 °C in 2005, AMeT = 28.708 °C in 2013, MMeT = 27.417 °C in 2002 and ABDD = 17.643 °C in 2013), and the lowest magnitudes of minimalT were: (AmT = 10.333 °C in 2011 and mmT = 8.583 °C in 1998). In Rosario (
Figure 2c,d), the highest magnitudes of maximumT were: (AMT = 35.250 °C in 2013, MMT = 35.417 °C in 1983, AMeT = 26.833 °C in 2013, AMeT = 27.167 °C in 1994 and ABDD = 16.380 °C in 1994), and the lowest magnitudes of minimumT were: (AmT = 10.833 °C in 2011 and mmT = 9.567 °C in 2007).
Correlation
For IBY (
Table 2), significant correlations were recorded in IBY–Culiacán vs: AMeT (SC = 0.487), ABDD (SC = 0.486) and ASM (SC = 0.443), and in IBY–Rosario vs: AMeT (SC = 0350). For RBY, significant correlations were recorded in RBY–Culiacán vs: ASM (SC = –0.487) and CEP (SC = 0.375). RBY–Rosario did not register any significant correlation.
Only one case with severe multicollinearity was recorded (AMeT vs ABDD; SC = 0.999, R
2 = 0.999). According to [
33] and [
80], the correlations in bold (
Table 2), can be defined as significantly different from zero, since for
= 32 (period 1982–2013), a CSC = |0.350| corresponds, that is, SC > CSC.
Models
No ECVs was repeated in all four models (Equations (23)–(26)). Three ECVs were repeated in three models: CEP (IBY–Culiacán, RBY–Culiacán and RBY–Rosario; Equations (23), (24) and (26)), MMT (RBY–Culiacán, RBY–Culiacán and IBY–Rosario; Equations (23)–(25)) and AMT (RBY–Culiacán, IBY–Rosario and RBY–Rosario; Equations (24)–(26)).
Validation
No autocorrelation
The Breusch–Godfrey contrasts were: IBY–Culiacán (R2 = 0.007, p–value = 0.679 > LMF = 0.175), RBY–Culiacán (R2 = 0.019, p–value = 0.493 > LMF = 0.485), IBY–Rosario (R2 = 0.007, p–value = 0.678 > LMF = 0.177) and RBY–Rosario (R2 = 0.014, p–value = 0.560 > LMF = 0.350). The Ljung–Box contrasts were: IBY–Culiacán (p–value = 0.686 > Q´ = 0.163), RBY–Culiacán (p–value = 0.506 > Q´ = 0.443), IBY–Rosario (p–value = 0.774 > Q´ = 0.083) and RBY–Rosario (p–value = 0.543 > Q´ = 0.369).
Figure 3.
Correlation and confidence of the residuals of the models.
Figure 3.
Correlation and confidence of the residuals of the models.
Linearity and severe non-multicollinearity
According to
Figure 4 and
Table 3, R
2 and PC of the models were: IBY–Culiacán (R
2 = 0.348, PC = 0.590, n = 32;
Figure 4a), RBY–Culiacán (R
2 = 0.539, PC = 0.734, n = 32;
Figure 4b), IBY–Rosario (R
2 = 0.386, PC = 0.621, n = 31;
Figure 4c) and RBY–Rosario (R
2 = 0.283, PC = 0.532, n = 32;
Figure 4c).
Figure 4.
Observed and predicted values of irrigated (IBY) and rainfed (RBY, T ha–1) bean yields, in Culiacán and Rosario. a) IBY–Culiacán, b) RBY–Culiacán, c) IBY–Rosario and d) RBY–Rosario.
Figure 4.
Observed and predicted values of irrigated (IBY) and rainfed (RBY, T ha–1) bean yields, in Culiacán and Rosario. a) IBY–Culiacán, b) RBY–Culiacán, c) IBY–Rosario and d) RBY–Rosario.
The model with the best fit was RBY–Culiacán: R
2 = 0.539, PC = 0.734, ME = 2.255 x 10
–16, RMSE = 0.111, MEA = 0.086, PEM = –2.643, PEAM = 13.763 and U2 = 0.743 (
Table 3).
In IBY–Culiacán and RBY–Culiacán (Equations (23) and (24)), the variables AMeT and ABDD were not considered, because they were the only ones that presented severe multicollinearity (
Table 2).
Homogeneity
The average residue of four models were: IBY–Culiacán (1.250 x 10–7 T ha–1), RBY–Culiacán (–9.400 x 10–4 T ha–1), IBY–Rosario (8.618 x 10–18 T ha–1) and RBY–Rosario (0.000 T ha–1).
Normality
In table 4, the p–values of the four normality tests were: IBY–Culiacán (from 0.077 to 0.860), RBY–Culiacán (from 0.070 to 0.344), IBY–Rosario (from 0.890 to 0.963) and RBY–Rosario (from 0.395 to 0.788).
Table 4.
p–values of the normality tests of the residuals of the models.
Table 4.
p–values of the normality tests of the residuals of the models.
|
P–values of normality tests |
Dependent variable in each model |
Shapiro–Wilk |
Anderson–Darling |
Lilliefors |
Jarque–Bera |
IBY–Culiacán |
0.410 |
0.211 |
0.077 |
0.860 |
RBY–Culiacán |
0.158 |
0.185 |
0.070 |
0.344 |
IBY–Rosario |
0.900 |
0.904 |
0.890 |
0.963 |
RBY–Rosario |
0.395 |
0.534 |
0.788 |
0.500 |
4. Discussion
The results of CEP and CET (
Figure 2a) agree with [
71], who point out that, in northern Mexico and for the period 1961–2000, the decade 1990s, they recorded the lowest CEP (from 100 to 900 mm yr
–1). The results of IBY–RBY can be attributed to high CEP associated with Hurricane Paul (category 2, occurred in 1982), which elevated RBY–Culiacán and RBY–Rosario [
72]. Also, in the periods 1985–1987 and 1996–2003, extreme droughts were recorded in northern Mexico [
73,
74], which significantly altered IBY–Culiacán. Furthermore, the results of this study are similar to those reported by [
75], since these authors point out that, IBY–RBY in Sinaloa and for the autumn-winter cycle of 2021-2022, was 1.93 t ha
–1, being fourth place nationally in sown area (85,657 hectares). The results of ASM can be attributed to the fact that, in 1986 (July–September), minimal anomalies occurred: negative ones from the Atlantic multidecadal oscillation and positive ones from the Pacific decadal oscillation, which were generators of La Niña events [
76]. Furthermore, according to data from [
77] cited by [
74], points out that the period 2003–2005 was the one that presented the greatest extreme wet events in northwest Mexico. The results of temperature coincide with those of [
25], who point out that, in Culiacán, maximumT increases from 31.600 °C to 34.600 °C, in the period 1982–2008.
Figure 2c,d agree with what was stated by [
25], since this author points out that 2011 was the most catastrophic year in Sinaloa, due to the lowest historical of minimumT recorded in the last fifty years. These values of minimumT of the autumn-winter cycle caused economic losses of 2,353 million dollars, corresponding to 95% the loss of vegetables and annual crops [
78] cited by [
79].
According to [
33,
80], the correlations in bold (
Table 2), can be defined as significantly different from zero, since for
= 32 (period 1982–2013), a CSC = |0.350| corresponds, that is, SC > CSC. The significant SC can be attributed to the fact that bean are one of the crops most sensitive to maximumT, minimumT, meanT and wet and dry events (ASM and CEP) [
20,
24,
81].
The simultaneous repetition of ECVs (CEP, MMT and AMT) can be attributed to the fact that in IBY–RBY, CEP is essential, generating good availability of ASM [
82]. Also, [
83] mention that the development of crops in Sinaloa depends largely on the absence of high maximumT, for example, because they could damage the cells and tissues of the bean [
20].
According to [
84,
85], because the condition of p–value > (LMF–Q´) was presented in the four models (Equations (23)–(26)), it can be said that the residuals do not present autocorrelation (
Figure 3a–d).
According to [
7,
33,
84], the previous PCs present linearity, since the four MLRs (
Figure 4a–d) comply with PC > CPC (CPC = |0.349|, n = 32; CPC = |0.355|, n = 31). Also, all MLRs meet the condition R
2 ≤ 0.800, VIF ≤ 5.000 ∴ To > 0.2000 (
Table 3), therefore, all models dispense with severe multicollinearity [
67] cited by [
68].
According to [
70], all four models meet the assumption of homogeneity (average residuals = 0).
According to [
7,
76,
84], the four models presented normality in the residuals, because no p–values < 0.05 (
Table 4).
5. Conclusions
This study is a pioneer in Sinaloa in the prediction of IBY–RBY, using ECVs.
The lower magnitudes of ASM are caused, in large part by the seasonal occurrence (July–September period) of La Niña events (shortage of CEP): minimum negative anomalies of the Atlantic multidecadal oscillation and minimum positive anomalies of the Pacific decadal oscillation. The highest IBY values, are associated with tropical cyclones that have made landfall in Sinaloa, mainly due to their high humidity contributions. In general, ECVs have a greater correlation with IBY than with RBY. IBY–RBY in Culiacán and Rosario, are more sensitive to the extreme values of maximumT indicators, than to minimumT, meanT, ASM, CET and CEP indicators. AMT in Culiacán and Rosario (period 2009–2013), has not stopped increasing, which suggests that IBY–RBY are highly influenced by the occurrence of intense meteorological droughts.
The only event with severe multicollinearity was recorded in Culiacán (AMeT vs ABDD).
The four models met the hypotheses for MLR: no autocorrelation, linearity and severe non-multicollinearity, homogeneity and normality. RBY–Culiacán was the model with the greatest adjustment. IBY–Rosario was the only model to which a delay was applied to its series.
This study provides sensitive tools to prevent damage from a decrease in IBY–RBY, making use of ECVs, which are relatively easy to obtain in Sinaloa, a state that ranks fourth nationally in area planted with beans.