Probability Distribution of Philippine Daily Rainfall Data

Philippines as an archipelago and tropical country, which is situated near the Pacific ocean, faces uncertain rainfall intensities. This makes environmental, agricultural and economic systems affected by precipitation difficult to manage. Time series analysis of Philippine rainfall pattern has been previously done, but there is no study investigating its probability distribution. Modeling the Philippine rainfall using probability distributions is essential, especially in managing risks and designing insurance products. Here, daily and cumulative rainfall data (January 1961 August 2016) from 28 PAGASA weather stations are fitted to probability distributions. Moreover, the fitted distributions are examined for invariance under subsets of the rainfall data set. We observe that the Gamma distribution is a suitable fit for the daily up to the ten-day cumulative rainfall data. Our results can be used in agriculture, especially in forecasting claims in weather index-based insurance.


Introduction
Philippines, as an agricultural country, allocates about 32% of its total land area to crop production [1].The major crops include rice, corn, sugarcane and banana.In crop production, it is essential to consider climate and weather factors that will greatly determine the potential growth and yield of harvests.An example of these factors is rainfall availability.
Erratic rainfall pattern and extreme events could lead to production losses [2,3].Being an archipelagic tropical country near the Pacific ocean, Philippines experiences more of the adverse effects of the fast-changing climate [4].For instance, scarce or excessive rainfall events (e.g.El Niño and La Niña) occur more often and more intense than usual [5].These situations affect many sectors negatively or worse, these can lead to destruction of valuable assets.These circumstances could also threat food security, telecommunication, and coastal communities.Due to these consequences, it is necessary to determine rainfall patterns in different parts of the country.Since rainfall is highly variable, we can study this using probability distributions [6,7,8].
When the correct probability distributions for rainfall data are determined, we can easily simulate or forecast rainfall amounts without losing its accuracy and reliability.The simulation can be used for managing water resources, such as watersheds or in rain-fed farms.It can also be employed in computations for weather index-based insurances for crops [9].Time series analysis of Philippine rainfall data and its seasonality has been previously done [10,5,11,12].However, there is still no existing published results about the probability distribution of the rainfall pattern in the Philippines.This study is the first and intended to initiate more research in fitting distributions to weather indices.Our results can be used in stochastic risk management and in the design of insurance products [13,14,15].This study presents models of Philippine precipitation using daily rainfall data from the Philippine Atmospheric, Geophysical and Astronomical Services Administration (PAGASA).Specifically, we fulfill the following objectives: (i) Fit the daily and cumulative rainfall data into positively-skewed distributions; (ii) Determine the most appropriate probability distribution using goodness-of-fit tests; and (iii) Determine the subsets of the rainfall data where the fitted distributions exhibit invariance.The third objective is intended to justify forecasts that only use subset of the total rainfall data set.That is, whatever subset is used, a similar probability distribution can be invoked up to some degree.This is important especially when only a subset of the rainfall data is available.
In this study, we utilize rainfall data available from 28 synoptic stations of PAGASA, where each region of the Philippines is represented.We consider 55-year daily rainfall data from January 1961 to August 2016.Each station reaches area of 30 km up to 50 km radius.The daily rainfall data is positively-skewed where most data points fall under lower values and is heavily-tailed [7].We use different probability distributions for fitting, e.g., Negative Binomial, Gamma, Exponential, Weibull, Poisson, Geometric and Normal.Our results show that Gamma distribution is a good fit for the daily up to the ten-day cumulative rainfall data.Since the model does not use time series analysis, we can only predict the rainfall amount but not when that event will exactly happen.

Foreign Studies
The following are some of the existing researches regarding the use of probability distributions to model rainfall data.[16] developed a model from the quarterly rainfall data of Zaria, Nigeria using Gamma distribution.Using the Kolmogorov-Smirnov test, they concluded that Gamma model adequately fit the rainfall data.

Monthly Data
In 1999, Sen et al. [17] modeled the monthly rainfall data in Libya.They utilized the latest recorded rainfall data for at least 20 years and confirmed that monthly rainfall is Gamma distributed using chi-square test.
A similar research was studied by Al-Suhili et al. [18] where they modeled the monthly rainfall data of Sulaimania Region in Iraq.They fitted continuous distributions such as, Normal, Lognormal, Weibull, Exponential and Gamma.From their calculations, they found that Gamma distribution best fit the data.

Daily Data
As early as in 1998, probability distributions were used to model rainfall.Duan et al. [19] worked on the Weibull distribution to model the daily precipitation in the US Pacific Northwest.They have seen that the one-parameter Weibull distribution performed as efficient compared with the Gamma model.Li et al. (2012) [7] conducted studies on the 49 daily precipitation records in Texas.They worked on the Kappa, Gamma, Exponential and Generalized Pareto (GP) distributions.They also used hybrid (Gamma and GP) distributions to compensate for low-generating of extreme values of the first four distributions.They have shown that application of hybrid distributions can create an efficient and more realistic rainfall simulations.
In a study conducted by Brissette et al. (2012) [8], rainfall data were collected from 24 weather stations and two watersheds in Quebec, Canada.They evaluated six probability distributions, namely, Exponential, Gamma, Weibull, skewed-Normal, Mixed-Exponential, and a hybrid (exponential and GP).Their results showed that Mixed-Exponential distributions best simulates the daily precipitation in the province of Quebec.
Another study by Neykov et al. (2014) [20] modeled rainfall data using Gamma, Weibull and Lognormal distributions.Similar with Li et al. (2012), they also used hybrid distributions since common distribution models fail to produce sufficiently heavy-tail results.Based from their results, hybrid distributions best fit the daily rainfall data.

Hourly Data
Dan'azumi et al. (2010) examined hourly rainfall data in Peninsular Malaysia.They tried four distributions (GP, Exponential, Gamma and Beta).They evaluated the models using goodness-of-fit tests and concluded that Generalized Pareto distribution is the best model for hourly rainfall data [21].

Local Studies (Philippines)
Studies on rainfall patterns have been made.Pajuelas (2000) [10] researched on rainfall variations in the Philippines in 1950-1996.Long-term trends and variability of the rainfall in different parts of the country were also analyzed using time series analysis as in the research of Villafuerte et al. (2014) [5] and Cinco et al. (2014) [11].June to October is the usual rainy season in the Philippines.However, there is still no existing published literature for modeling the daily rainfall data in the Philippines using probability distributions.
We have done a preliminary study to demonstrate ways to forecast rainfall [22].These forecasts were used to predict the number of claims for the Weather-Index Based Insurance (WIBI) in Malaybalay City and Davao City.The Moving Averages technique was applied to 30-year rainfall data.It was observed that this technique did not produce a satisfying forecast since it did not capture the rare extreme cases valuable [23] for WIBI.Another approach used to forecast rainfall was fitting probability distributions to the rainfall data.We evaluated probability distributions (Exponential, Negative Binomial, Gamma, Poisson and Normal) using Pearson's chi-square goodness-of-fit test.Results show that daily and cumulative rainfall data in Malaybalay City and Davao City follow the Gamma distribution.This can be used to reasonably estimate the number of WIBI claims.

Data Collection
The 55-year daily rainfall data (January 1961 -August 2016) for 28 synoptic stations (as shown in Fig. 1) is requested from PAGASA.These data are examined for sufficiency and removal of missing values.We construct the histograms using intervals of 10mm as bins to represent the classes of rainfall amounts.We then calculated the relative frequencies that will be used for the probability distribution fitting.The probability distributions are fitted for each rainfall data in each station, first by adjusting the 10 mm bins (e.g., if i = 10mm then i adj = (i/10) + 0.5).Then, these distributions are evaluated using goodness-of-fit tests.
We have tested the data set for invariance using two versions.Invariace version 1: The daily rainfall data is examined in exhibiting invariant patterns for different time scales (tenyear, five-year, one-year, half-year, and monthly).Using chi-square test, the most appropriate fit to each of these subsets is chosen.Invariace version 2: We have also tested if a good fit still remains if we use the distribution with parameter estimated using the complete data set as fit to the subsets.
If the rainfall data demonstrates invariance, then researchers may opt to request and use smaller sets of data without sacrificing its integrity of prediction.We can still produce substantial outcome necessary for our study since the data sets still follow the same distribution.It will also be more time and cost-efficient to deal with smaller sets of data.
Cumulative rainfall data (adding rainfall amount of n consecutive days, where n = 2, 3, . . ., 10) are also fitted to the probability distributions.We also apply goodness-of-fit tests to determine the most fit distribution.

Probability Distributions
Daily rainfall data are usually positively skewed and heavy-tailed, that is, most data points fall on the side with the lower values [7].Based from our initial findings using the daily rainfall data available for Malaybalay City and Davao City, a special probability distribution (i.e., Gamma) suitably fit the daily rainfall data.We focus on this distribution but we have also tested Negative Binomial, Exponential, Weibull, Poisson, Geometric and Normal distributions for fitting.Negative Binomial distribution is a good candidate for fitting daily rainfall data, but the Gamma distribution is good both for fitting daily and cumulative rainfall.Gamma Distribution.Let x be the random variable for rainfall amount in mm.The Gamma distribution has two parameters, where α is the shape parameter and β is the rate parameter.It has a probability density function given by: (1)

Goodness-of-fit Test
To assess the consistency of the probability distributions in modeling the daily rainfall data of the Philippines, goodness-of-fit tests are used.We use Pearson's chi-square test, and the maximum log-likelihood to estimate the Akaike Information Criterion (AIC).In this paper, we only present the commonly used Pearson's chi-square statistic.The Pearson's chi-square test will be used to determine if the daily rainfall data are consistent with the probability distribution.The null and alternative hypotheses to be tested are stated as follows: H o : The rainfall data do follow the specified distribution.
H a : The rainfall data do not follow the specified distribution.
To compute for the χ 2 statistic, we employ the following formula: where, χ 2 c is the chi-square statistic, O i is the observed value in bin i and, T i is the expected value in bin i.We consider α = 0.05 as the level of significance and, the degrees of freedom (df ) is the total number of bins (n) less by 1 and less the number of parameters of the distribution, k (i.e., df = n − k − 1).For instance, the df if we fit an exponential distribution is df = n − 2. We reject the null hypothesis . Otherwise, we fail to reject H o .
The rule-of-thumb in using this test is that the expected frequency for all the bins must be ≥ 5 [24].In this study, we are analyzing a large data set where extreme rare events (large rainfall amounts) are important.Majority of the bins have frequencies ≥ 5 but a few have frequencies < 5 to reflect the rare extreme events.

Distribution Fitting
To determine which probability distribution is a good fit, we formulate a nonlinear program (NLP) that minimizes the χ 2 c value.A sample NLP for fitting Gamma distribution is as follows: Minimize where α and β are the shape and rate parameters, respectively; O i is the relative frequency of bin i; and, T i is the theoretical Gamma frequency of bin i.
A similar formulation is done for each station for all considered probability distributions.We solve the NLP above using MS Excel Solver with RGRNonlinear engine.Sample computations are presented in the Supplementary File.

Fitting to 1961-2016 Data
The histogram for the daily rainfall data for each station are constructed and fitted to the probability distributions considered in this study.As an example, the fitted distributions for Casiguran, Aurora station are given in Fig. 2. The chi-square value and parameter estimates for the respective distribution are shown in the figure .Although selected probability distributions fit well on the daily rainfall data using the chisquare test, the Gamma distribution gives the lowest chi-square statistic.In fact, this is also the case for the other 27 stations which can be seen in Table 1.Since all chi-square values are less than 1, then we fail to reject the null hypothesis, that is, the rainfall data is consistent with the Gamma distribution.Some relevant statistics are also available in the said table .One important characterstic of the distribution is the higher variance compared to the mean (σ 2 > µ).This implies heavy-tailed distribution.Moreover, we have observed a limitation of the fitted Gamma distribution.The theoretical mean based on the fitted Gamma distribution overestimates the sample mean (arithmetic mean of the data set).If this limitation will affect our decision-making (e.g., in calculating claims in insurance), the sample mean and sample variance can be used instead.
The chi-square test determines how good a fit is based from the total sum of squares of the deviations between the observed and expected frequencies [24].In real-world scenario, the most valuable result of modeling rainfall is its ability to capture extreme cases such as scarcity and excessive rainfall [23].
Even if it is concluded that the Gamma distribution is the most appropriate fit in all stations, there are many instances where the parameters generated from the distribution do not adequately fit these extreme events.In particular, the 0mm rainfall bin is often underestimated like in Fig. 3.This underestimation of the probability for 0mm rainfall leads to the overestimation of the mean (theoretical vs sample mean).If our concern is the risk of heavy rainfall  and not the risk of drought, the fitted Gamma distribution and its corresponding theoretical mean can be used.The difference between the risk cost in using the theoretical mean and the risk cost in using the sample mean can be part of the insurance safety loading.

Fitting to Subsets of the Data (Invariance Version 1)
We have also employed fitting smaller sets (ten-year, five-year, one-year, half-year, and monthly) of rainfall data into probability distributions.If we take a look at the chi-square values for the different subsets of rainfall data shown in Table 2, these are all less than 1.Meaning, these subsets of data considered are still consistent with the Gamma distribution.
In Fig. 4, we can visually compare the histograms for each data sets.Even if the chisquare test indicates that the Gamma distribution is a good fit for smaller sets of rainfall data, it is possible that the Gamma distribution for the monthly and half-year data would not produce sufficient rainfall amount prediction.This means that decreasing the quantity of data considered would only be suitable up to some extent.

Fitting the Parameters from 1961-2016 Data to Subsets of the Data (Invariance Version 2)
In this section, we fit the obtained Gamma parameters in Table 1 to different subsets of the rainfall data.As we can see in Table 3, most of the chi-square statistics are still within the acceptance level.That is to say that majority of the subsets of rainfall follow the same Gamma distribution as the 55-year daily rainfall data.

Fitting to Cumulative Data
Similar to the daily data, we have also fitted the probability distributions into the cumulative rainfall data.Using the chi-square test, the cumulative n-day rainfall patterns where n = 2, 3, . . ., 10 also follow Gamma distribution.It is evident that the parameters α and β increase as n increases.That is because the probability of a zero rainfall decreases as we increase the number of days considered.In some stations, there is an obvious change in the shape when α < 1 to α > 1 as we can see in Fig. 5.

Conclusions
Considering the 55-year daily rainfall data available from 28 different synoptic stations in the Philippines with each region represented, the rainfall distribution for each station follows the Gamma distribution.With regards to the cumulative rainfall amount, we have confirmed that each stations rainfall pattern still follows the Gamma distribution.Lastly, we considered modeling the rainfall pattern using smaller sets of the data and have shown that still, it is consistent with the Gamma distribution.
The use of relatively simple probability distributions in this research had generated novel results to represent the Philippine rainfall data.However, we recommend the following areas to work on in the future: (i) Since the distributions generated are for areas covered by the 28 weather station, we can consider spatial interpolation to generate a distribution for the uncovered areas; and (ii) As mentioned earlier, there are stations where the resulting distribution for lower bins are underestimated.We may opt to use hybrid distributions to improve the probability estimates.Supplementary files: • Summary of Gamma distribution parameters.xlsx

• Sample computations.xlsx
Acknowledgments: JFR is supported by the UP System Enhanced Creative Work and Research Grant (ECWRG 2016-1-009).Author Contributions: JFR conceived the study; DCNC provided the input information; VABB and ABBA produced the results; all authors contributed in the analysis, interpretation,

Figure 1 :
Figure 1: Philippine map presenting the locations of the 28 PAGASA weather stations

Figure 2 :
Figure 2: Rainfall histogram for Casiguran Station and selected fitted probability distributions with corresponding parameters

Figure 3 :
Figure 3: Gamma Distribution for the Daily Rainfall in Science Garden Quezon City Station

Figure 4 :Figure 5 :
Figure 4: Gamma Distribution for the Daily Rainfall in Dumaguete Station using Different Time Scales

Table 1 :
Chi-Square Statistic, Gamma Parameters, Mean and Variance of the Daily Rainfall

Table 2 :
Chi-Square Statistic for Different Time Scales of Data (Invariance Version 1)