Modeling and Forecasting of Lake Malombe Fish Biomass and Catch Per Unit Effort (CPUE)

Lake Malombe fish stocks have been depleted by chronic overfishing. Various management approaches (co-management, command and control, and the ecosystem-based approach to fisheries management) have been used to manage the fishery. However, the lack of an accurate predictive model has hampered their success. Therefore, we developed and tested a time series model for the Lake Malombe fishery. The seasonal fish biomass and CPUE trends were first inspected, and both were non-stationary. Second-order differencing was applied to transform the non-stationary data into stationary data. The autocorrelation function (AC), partial autocorrelation function (PAC), and Akaike information criterion (AIC) were estimated, which led to the identification and construction of autoregressive integrated moving average (ARIMA) models suitable for explaining and forecasting the time series. The results showed that ARIMA (1,2,1) provided a better prediction than its counterparts. The model satisfactorily predicted that by 2032, fish biomass and CPUE will decrease to 3204.6 tons and 59.672, respectively, signifying a potential threat to the Lake Malombe fishery. The model justified the necessity of taking precautionary measures to avoid the total collapse of the fishery.

the result of overfishing. In Estonian lakes, Kangur et al. [11] on the other hand reported a significant reduction in the fish population as the result of human activities.
The concept of stock production models has been applied in Malawi's inland fisheries to provide information that helps fisheries managers and biologists achieve management goals and adopt appropriate strategies for sustainable exploitation. The concept is based on the assumption that the dynamics of a lake fishery are directly linked to the precautionary approach, a basic concept in fish stock management [12]. The Schaefer and Fox production approaches, also known as surplus production models (models associated with maximum sustainable yield (MSY)), have been commonly applied to depict the status of the fishery in Lake Malawi and other inland lakes. Although these models are accepted in fishery management, their applications are still questionable. For example, the models only depict the current picture of the fishery using the time series trend and lack predictive power. Other models such as bootstrap, state-space [13], and Bayesian [12,13,14,15] have also been applied with little success. The time-series approach, however, has been indispensable in understanding natural resource systems and developing better management policies [16]. It is described as a new approach to fisheries modeling and stock assessment. The approach demands fewer biological assumptions than traditional fisheries models [17], and with simple mathematical techniques and few assumptions, it can significantly reduce modeling costs, including research costs [16]. Time series modeling provides a feasible way to examine time series data and provide predictions [18][19]. Several researchers have recommended the time series approach as the most outstanding technique. Lake Malombe has a long time series of detailed records on fish biomass landings and CPUE. However, no research has been undertaken to forecast the future status of the fishery.
In other words, management approaches (co-management, command and control, and the ecosystem-based approach to fisheries management) have failed in this lake due to a lack of accurate predictive models. In this paper, we developed and tested a time series model as a tool for forecasting the status of Lake Malombe fish biomass landings and CPUE.

MATERIALS AND METHODS
The study area
Lake Malombe is located to the south of Lake Malawi at coordinates 14°40′0″S 35°15′0″E. It is an impoundment of the outflow from Lake Malawi via the Upper Shire River [20] (Figure 1).

Figure 1: Lake Malombe catchment
The Lake is documented as the third-largest in Malawi, with an estimated total area of 162 square miles (420 km²), a water surface area of approximately 390 km², a length of 30 km, a width of 17 km, and a water depth not exceeding 6 m [21]. The communities around the lake are predominantly fishers [22], and the lake has approximately 65 fishing beaches scattered over three major administrative strata: Lake Malombe East (coded 1.1), Lake Malombe West (coded 1.2), and Upper Shire (coded 1.3) [23]. The area surrounding Lake Malombe is densely populated by the Yao tribe, which makes up over 85% [22] of the fishing population. A few other tribes, such as the Chewa, Lhomwe, and Nyanja, are also found around the Lake.

Data collection
The Department of Fisheries, through its statistical office, is mandated under the Malawi Fisheries Management Act (1997) regulations to collect monthly catch, fishing effort, and CPUE data from both large-scale and artisanal fishermen. The data are collected using Malawi Traditional Fishery (MTF), a computerized gear-based sampling technique, and entered into the catch and effort statistics database hosted by the Monkey Bay Fisheries Research Division of the Department of Fisheries. In this study, 41 years of fish biomass and CPUE data were used to develop a forecast model for Lake Malombe fish biomass and CPUE.

Conceptual framework of the ARIMA model
To model the fish biomass and CPUE data, traditional statistical models such as autoregressive (AR), smoothing, moving average (MA), and ARIMA models are applied. The autoregressive model of order p, AR(p), is expressed as

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$

where $y_t$ is the dependent variable at time t; $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$ are the lagged dependent variables; $\phi_1, \phi_2, \ldots, \phi_p$ are the unknown parameters of the model, with $\phi_p \neq 0$; and $\varepsilon_t$ is the value of the disturbance term at time t, i.i.d. $\sim (0, \sigma^2)$. Here p is the number of lagged values of y and represents the order of the process. The moving average model of order q, MA(q), is defined as a function of its present and q past disturbances (lagged errors) and is expressed as

$$y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$

where $y_t$ is the dependent variable at time t; $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ are the lagged disturbances; $\theta_1, \theta_2, \ldots, \theta_q$ are the unknown parameters of the model, with $\theta_q \neq 0$; and $\varepsilon_t \sim (0, \sigma^2)$. Here q is the number of lagged disturbances and represents the order of the process. The two processes, AR and MA, are combined to form an autoregressive moving average model, ARMA(p, q), expressed as

$$y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$

However, the ARMA model works with stationary data, which is not the case with fish biomass and CPUE data. Non-stationary data can be modeled using a log transformation to linearize the series. Differencing to remove a trend in the mean is another procedure for transforming non-stationary data into stationary data. The procedure is advocated in the Box-Jenkins approach. In this paper, the autoregressive integrated moving average model, ARIMA(p, d, q), is introduced to deal with non-stationary data. The general form of the ARIMA model is

$$\phi(B)(1 - B)^d y_t = \theta(B)\varepsilon_t$$

where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and d is the order of differencing. The sample autocorrelation function (ACF) at lag k is

$$r_k = \frac{c_k}{c_0}$$

where $c_k$ is the empirical autocovariance at lag k and $c_0$ is the sample variance. The ACF reveals regular moving average spikes. For example, if the model has an MA(1) component, then there will be only one regular significant spike. If the model has an MA(2) component, then there will be two regular spikes.
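As a minimal sketch of the ACF calculation described above, the following Python function computes each autocorrelation as the ratio of the empirical autocovariance at lag k to the sample variance. Python is used here purely for illustration (the study itself used R); the function name `sample_acf` and the toy series are hypothetical and are not the Lake Malombe records.

```python
def sample_acf(series, max_lag):
    """Return [r_0, ..., r_max_lag], where r_k = c_k / c_0.

    c_k is the empirical autocovariance at lag k and c_0 is the sample
    variance; both use the divisor n (a common convention, assumed here).
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# Illustrative numbers only (not the study data).
toy = [5.0, 7.0, 6.0, 9.0, 8.0, 11.0, 10.0, 13.0]
r = sample_acf(toy, 3)
print([round(v, 3) for v in r])
```

By construction the lag-0 value is always 1, so only lags 1 and above carry information about the order of the model.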
The sample PACF at lag k is simply the correlation between the two sets of residuals obtained from regressing the elements $y_t$ and $y_{t+k}$ on the set of intervening values $y_{t+1}, \ldots, y_{t+k-1}$. The PACF reveals a spike at the lag of the interaction term. Different combinations of multiplicative parameters were estimated to determine whether the identified parameters were statistically significant. The t ratios (parameter estimates divided by their standard errors) were estimated. To evaluate model adequacy, the likelihood ratio and AIC were used. The Ljung-Box Chi-squared test was also used to assess whether the overall correlogram of the residuals displayed any systematic error. The Ljung-Box Chi-squared statistic is expressed as

$$Q = n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{n-k}$$

where $r_k$ (k = 1, ..., m) are the residual autocorrelations and n is the number of observations used to fit the model. The Akaike Information Criterion (AIC) is expressed as

$$\mathrm{AIC} = -2\ln(\hat{L}) + 2k$$

where $-2\ln(\hat{L})$, with $\hat{L}$ the maximized likelihood of the fitted model, measures the goodness of fit, $2k$ measures model complexity, and k is the number of model parameters.
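The two adequacy measures named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R workflow: the Ljung-Box statistic is taken as n(n+2) times the sum of squared residual autocorrelations each divided by (n - k), the AIC as -2 ln(L) + 2k, and the function names, residual autocorrelations, and candidate log-likelihoods are all hypothetical.

```python
def ljung_box_q(residual_acf, n):
    """Ljung-Box statistic for residual autocorrelations [r_1, ..., r_m].

    n is the number of observations used to fit the model.
    """
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(residual_acf, start=1)
    )


def aic(log_likelihood, k):
    """AIC = -2 ln(L) + 2k, where k is the number of model parameters."""
    return -2.0 * log_likelihood + 2.0 * k


# Hypothetical residual autocorrelations at lags 1..3 for n = 40 observations.
q = ljung_box_q([0.1, -0.05, 0.02], 40)

# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates = {
    "model_a": (-120.0, 2),
    "model_b": (-118.5, 3),
    "model_c": (-118.4, 5),
}
best = min(candidates, key=lambda name: aic(*candidates[name]))
print(round(q, 4), best)
```

A small Q relative to the chi-squared reference distribution gives little evidence of leftover autocorrelation, and the candidate with the smallest AIC is preferred because extra parameters are penalized.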
Note that the model with the smallest AIC is chosen as the best. Once the unknown parameters of a suitable time series model have been estimated and the model has been shown to fit well, the next step is to forecast future fish biomass and CPUE values. In this context, the autoregressive model is represented as follows:

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$

The next observation beyond $y_1, \ldots, y_T$ is predicted using the model expressed below:

$$\hat{y}_{T+1} = \hat{\phi}_1 y_T + \hat{\phi}_2 y_{T-1} + \cdots + \hat{\phi}_p y_{T-p+1}$$

where the $\hat{\phi}_i$ are obtained by substituting the estimated parameters for the theoretical ones. The forecast $\hat{y}_{T+1}$ is obtained and used to forecast $\hat{y}_{T+2}$, which in turn is used to generate $\hat{y}_{T+3}$. The process can be repeated to obtain a forecast for any point in the future. The R statistical software version 3.6.3 has all the functions required to fit ARIMA models to the 41-year time series of fish biomass and CPUE data.
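The recursive forecasting step described above, in which each forecast is fed back in to produce the next one, can be sketched as follows. The function name `ar_forecast`, the coefficients, and the short history are hypothetical choices made for illustration; future disturbances are set to their expected value of zero.

```python
def ar_forecast(history, phi, steps):
    """Iterate an AR(p) model forward `steps` periods.

    history: observed values y_1..y_T (at least p of them).
    phi: estimated coefficients [phi_1, ..., phi_p].
    Each new forecast is appended to the working series so that it can
    serve as a lagged value for the following forecast.
    """
    values = list(history)
    p = len(phi)
    forecasts = []
    for _ in range(steps):
        nxt = sum(phi[i] * values[-1 - i] for i in range(p))
        values.append(nxt)
        forecasts.append(nxt)
    return forecasts


# Hypothetical AR(2) example: y_hat = 0.5 * y_T + 0.25 * y_{T-1}
print(ar_forecast([1.0, 2.0], [0.5, 0.25], 2))  # [1.25, 1.125]
```

Because each step reuses earlier forecasts rather than observations, forecast uncertainty accumulates with the horizon, which is why long-range values are best read with their confidence intervals.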

RESULTS
Modeling and forecasting of total annual biomass landings and CPUE were conducted on the raw data. The precision and characteristics of the ARIMA models were studied in detail following the Box-Jenkins approach. The model parameters were identified, estimated, and verified. The model shown in Figure 2 is based on the 41-year time series from 1976 to 2017. Figure 2a depicts the instability of the trend in Lake Malombe's total annual biomass landings, with the highest landings observed from 1980 to 1990. The same pattern appears in the CPUE data (Figure 2b), with the highest values in the period from 1980 to 1994.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 October 2020 doi:10.20944/preprints202010.0565.v1
The first step in time series modeling is data inspection. Time series data showing seasonal spikes are usually described as non-stationary. As observed in Figure 2, the biomass landings and CPUE trends were unstable from 1976 to 2017, indicating non-stationarity.

Model identification
The AR and MA components of an ARIMA model require stationary data. Therefore, first-order differencing was initially applied to transform the non-stationary data into stationary data.

Figure 3: Autocorrelation (AC) and Partial Autocorrelation (PAC) functions used in the ARIMA model
However, first-order differencing did not completely transform the data into stationary data. Hence, second-order differencing was applied. An appropriate model was identified by examining the ACF and PACF. Figure 3 shows the AC and PAC functions for both fish biomass landings and CPUE. The ACF plot (Figure 3) showed a significant spike only at lag 1, meaning that higher-order autocorrelation was explained by the lag 1 AR term [24]. In verifying data stationarity, however, it was noted from the ACF and PACF correlograms (Figure 3) that neither a pure AR nor a pure MA model was indicated. Therefore, several models had to be tested to identify the most suitable one for forecasting fish biomass and CPUE.
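The effect of moving from first-order to second-order differencing can be seen in a small sketch. The function name `difference` and the toy series (a pure quadratic trend, chosen because one round of differencing provably leaves a trend in it) are illustrative assumptions and have nothing to do with the observed landings.

```python
def difference(series, order=1):
    """Apply the differencing operator (y_t - y_{t-1}) `order` times."""
    out = list(series)
    for _ in range(order):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


# Toy series with a quadratic trend (illustrative only).
quad = [1, 4, 9, 16, 25, 36]
print(difference(quad, 1))  # [3, 5, 7, 9, 11]  still trending
print(difference(quad, 2))  # [2, 2, 2, 2]      trend removed
```

Each round of differencing shortens the series by one observation, so a twice-differenced 41-year annual series leaves 39 values for estimation.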

Model selection
The model selection was based on minimization of the Akaike information criterion (AIC) and the Gaussian maximum likelihood estimation algorithm. The ARIMA routine in the R statistical software estimates the coefficients of the previously identified models, given the parameters p, d, and q, using a maximum likelihood estimation algorithm and the Akaike information criterion (AIC). The procedure produces a new time series of values adjusted by model residuals, with confidence intervals for the adjustment at the 0.05 level of significance. The best model displays the lowest AIC and the highest maximum likelihood.
Note: ns indicates not significant, while ** and * indicate significance at the 0.01 and 0.05 probability levels, respectively.
Table 1 further shows that the p-value for the Ljung-Box test was greater than 0.05 in all competing models, suggesting that there was very little evidence of non-zero autocorrelations. The p-values for both the AR and MA models were greater than 0.05, suggesting that the model residuals were independent at the 95% level of confidence, and the ARMA model proved to be the best fit.
Table 2 presents the estimated values of the selected ARIMA (1,2,1) model. Based on the selected ARIMA (1,2,1) model presented in Table 2, the following model was developed.
$$y_t = \mu + \phi_1 y_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t$$

with $y_t$, $y_{t-1}$: fish biomass landings or CPUE in periods t and t-1, respectively; $\varepsilon_t$, $\varepsilon_{t-1}$: residuals of periods t and t-1, which constitute a white noise; $\phi_1$ and $\theta_1$: coefficients of the AR and MA processes, respectively; and $\mu$: a constant term. From Table 2, the coefficients of AR and MA were extracted to develop the following forecasting models for fish biomass and CPUE.
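To make the structure of an ARIMA (1,2,1) recursion concrete, it can be written out in the original units of the series. This is a sketch under stated assumptions, not the authors' fitted model: it assumes that the AR and MA terms act on the twice-differenced series $w_t = y_t - 2y_{t-1} + y_{t-2}$, that the MA coefficient enters with a minus sign, and that a constant $\mu$ may be present. The symbols $w_t$ and $\mu$ are notation introduced here for illustration.

$$w_t = \mu + \phi_1 w_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$

Substituting $w_t$ and $w_{t-1} = y_{t-1} - 2y_{t-2} + y_{t-3}$ and solving for $y_t$ gives

$$y_t = \mu + (2 + \phi_1)\, y_{t-1} - (1 + 2\phi_1)\, y_{t-2} + \phi_1\, y_{t-3} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$

Under these assumptions, each value depends on the three preceding values of the series and on the most recent disturbance, which is why a second-order differenced model tends to extrapolate the recent trend.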

Accuracy of ARIMA (1,2,1) model
Before forecasting, the residuals were checked using the Ljung-Box test, ACF, and PACF to see whether any systematic pattern remained that needed to be eliminated to improve the accuracy and performance of the selected model. The Ljung-Box test (Table 3) gave a p-value of 0.911 for CPUE and 0.103 for fish biomass; neither was significant, suggesting that there was little evidence of non-zero autocorrelations in the in-sample forecast errors at lags 1-20. Therefore, ARIMA (1,2,1) provided an adequate predictive model, which probably could not be improved. In addition, the ACF and PACF plots of the residuals (Figure 4) showed that none of the autocorrelations was significantly different from zero at the 95% confidence level.

Figure 4: Autocorrelation and partial autocorrelation functions of the residuals
This indicated that the selected ARIMA (1,2,1) model was appropriate for forecasting Lake Malombe fish biomass and CPUE values and could not be improved.

Forecast
After the most appropriate model for fish biomass and CPUE had been defined, the forecast was made. Table 4 shows that the noise residuals were a combination of positive and negative errors falling within the 95% confidence intervals, indicating that the model forecast well. Figure 5 presents the fish biomass and CPUE forecasts obtained by applying the ARIMA (1,2,1) model for the 15-year period from 2017 to 2032.

Figure 5: Fish biomass and CPUE Forecasted values
As seen from Table 4 and Figure 5, the model satisfactorily predicted that by 2032, both fish biomass and CPUE will decrease to 3204.6 tons and 59.672 respectively.
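Because the selected model works on twice-differenced data, forecasts of the differenced series have to be converted back to the original units before values such as those for 2032 can be read off. A minimal sketch of that back-transformation follows; the function name `integrate_twice` and the numbers are hypothetical, and the relation used is y(T+1) = w(T+1) + 2 y(T) - y(T-1), where w denotes the second difference.

```python
def integrate_twice(last_two_levels, second_diff_forecasts):
    """Turn forecasts of second differences back into level forecasts.

    last_two_levels: (y_{T-1}, y_T), the last two observed levels.
    second_diff_forecasts: forecasts of w_t = y_t - 2*y_{t-1} + y_{t-2}.
    """
    y_prev, y_last = last_two_levels
    levels = []
    for w in second_diff_forecasts:
        y_next = w + 2 * y_last - y_prev
        levels.append(y_next)
        y_prev, y_last = y_last, y_next
    return levels


# Toy check on a quadratic series 1, 4, 9, 16, 25, 36 whose second
# differences are all 2: the next two levels should be 49 and 64.
print(integrate_twice((25, 36), [2, 2]))  # [49, 64]
```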

DISCUSSION
The abundance of fish species in Lake Malombe is directly linked to CPUE, though a common criticism of CPUE is that the relationship between abundance and CPUE is more complex [25][26]. Both the catch and CPUE appeared to increase from 1980 to 1984, remained roughly constant with slight fluctuations from 1984 to the 1990s, and then collapsed drastically from 1994 to 2000. Several researchers have made similar observations. For example, Weyl et al. [27] reported an increase in biomass and CPUE in 1992/1993 in the southern part of Lake Malawi after the subsequent closure of the fishery. Alexander et al. [28], on the other hand, reported a strong relationship between whole-lake CPUE and relative fish biomass and abundance. Maynou et al. [29] further noted that CPUE series reflect the general abundance of species and catch fluctuations. The decrease in CPUE and fish biomass from the 1990s to 2005 was an indicator of an increase in the diversity of gears, the population of fisherfolk, and overexploitation. According to Weyl et al. [27], CPUE variation is strongly linked to differences in the number of fishers, man-hours, and categories of fishing gear. Low CPUE indicates a relatively low abundance of fish, which results in prolonged man-hours, an increase in the number of fishers, and low catches [30]. The decision to model both catches and CPUE was based on the fact that these two indicators can be used by regulators to monitor potential changes in fish populations related to human exploitation and other anthropogenic factors [31].
In this study, the first step in time series modeling was data inspection using a plot. If time-series data depict seasonal spikes, the data are non-stationary and need differencing. The fish biomass and CPUE series appeared to be stationary after second-order differencing. The method of modeling the time series of fish biomass and CPUE addressed the problem of non-stationarity and gave good predictions. In other words, the conditional variance of the model was found to be volatile over the time series, and the ARIMA model addressed this phenomenon well. The study findings showed that it is reasonable to use history as a basis for prediction because of the steadily decreasing trend in the time series and the slowly diminishing noise. The time series for fish biomass and CPUE reflected a common scenario in Lake Malombe. Evidence has shown that various fish species in the Lake have been targeted by fisherfolk in recent years and that their stocks are overexploited. A similar situation occurs in Lakes Chilwa and Chiuta and in the Shire River [32][33]. We noted that several other factors, such as fishing effort, gear modification, climate variability, an increase in the fisherfolk population, and changes in fishing gears and technology, had a strong influence on the fish biomass and CPUE trends, though further research is required to support this claim. The ARIMA (1,2,1) model predicted that both fish biomass and CPUE will continue declining, suggesting that the fish stocks are continuously being overexploited. The model predictions agree with Kanyere et al. [34] who also observed that the relative biomass index of Lake Malombe fishery in Figure   Kanyere et al. further attempted to calculate the current fishing mortality for Lake Malombe and the Upper Shire River using the average for the last three years and reported it to be 0.49 ±0.01.
Comparing this value to the fishing mortality at maximum sustainable yield (FMSY) (0.28 ±0.04), they concluded that the current mortality in Lake Malombe is over 75% above the optimum long-term fishing mortality, suggesting that the Lake Malombe fishery has experienced a catastrophic decline. In Lake Nasser, Egypt, the time series of size and CPUE also showed a negative trend, indicating a high exploitation rate of the most important commercial fish species in the lake by the fishing gears [35]. Therefore, the findings of this study present a wake-up call to fisheries managers and show that, if the over-exploitation of the Lake Malombe fishery is to be reduced without alternative livelihood options, the most vulnerable riparian population living at the margins of the cash economy will have limited choices for generating income and sustaining their livelihoods [5,2,6].
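The size of the gap between the reported fishing mortality and FMSY can be checked from the two point estimates quoted above. The calculation below ignores the stated uncertainty ranges (±0.01 and ±0.04), which is a simplification made here for illustration:

$$\frac{F_{\text{current}} - F_{\text{MSY}}}{F_{\text{MSY}}} = \frac{0.49 - 0.28}{0.28} = \frac{0.21}{0.28} = 0.75$$

On the point estimates alone, the current fishing mortality is therefore 75% above FMSY, in line with the comparison reported.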

However, if the population of indigenous fish species in Lake Malombe collapses, immigration from elsewhere will do little to help the population recover [36]. This can also be critical if a restocking program is attempted using populations from other localities. Given the failure of previous efforts to manage the Lake Malombe fishery, this research suggests that there is a need to review Malawi's fisheries regulations to balance conservation against the demands of the fisherfolk population; otherwise, as the CPUE and fish biomass models developed in this study indicate, the Lake Malombe fishery remains under threat.