STATISTICAL MODELING OF COVID-19 PANDEMIC STAGES WORLDWIDE

COVID-19 is an infectious disease whose growth depends on the linked stages of the epidemic, the average number of people one person can infect, and the time it takes for those people to become infectious themselves. We study the COVID-19 time series to understand the growth behaviour of the case counts. A structural break occurs in the series at the time of change from one stage to another. We performed a structural break analysis of the data available for 207 countries up to April 20, 2020. Forty-two countries have recorded five breaks in their COVID-19 case series. This means that these countries are in the sixth stage of transmission and show a downward pattern in reported daily cases, whereas countries with two or three breaks record rapid growth in daily cases. From this study, we conclude that the more breaks a series has, the more likely it is that the rate of daily cases is constant or decreasing. The data are well fitted by the lognormal distribution, as this distribution reaches its peak after some period and then declines over a longer time period. This pattern can be seen in countries such as China, Australia and New Zealand.


INTRODUCTION:
COVID-19 is a disease caused by a novel coronavirus that travelled from Wuhan, China, to most of the countries of the world (207) within 100 days. Its growth depends on the stage of the epidemic, the average number of people one person can infect, and the time it takes for those people to become infectious themselves. However, the growth rate of coronavirus cases differs from country to country, depending on health infrastructure, the lifestyle of the population, environmental conditions and many other factors. The stages of disease transmission therefore depend on the changing pattern in the series of COVID-19 cases, which in turn is related to total population, total land area, medical facilities, etc. These changing patterns may be analyzed with a structural break model, in which shifts in the series are identified from changes in the growth of the spread of COVID-19. The structural break model explains the shift in the COVID-19 time series from one breakpoint to the next through a change in the model parameters. Within each break interval, the series follows a well-known growth model, and the growth rate differs between intervals. Significant contributions to the study of structural breaks in time series include the work of Chow (1960), Nelson and Plosser (1982), and Andrews.
In COVID-19 cases, a structural break might occur when most of the population and land area are affected, when the number of patients rises suddenly on a daily basis, when the population does not follow government guidelines, or because of other factors. It also depends on the administrative model, whose main goal is to slow the growth of COVID-19 cases through measures such as social distancing and lockdown. Health services can be delivered in a more structured and effective way when they aim to increase the recovery of COVID-19 patients. Most countries have managed the disease in reasonable time.
However, some countries record changes in their COVID-19 series, i.e., structural break(s) may have occurred. The present study therefore analyzes the changing pattern of COVID-19 cases and identifies suitable breakpoints. We determine the breakpoints using statistical methodology and then examine the changing trend in each break interval. We analyze every stage separately to understand the behaviour of the transmission of infections. Various distribution models are fitted in each break interval based on the number of infected days and the number of total cases. Based on the results, we determine the best-fitting model over all break intervals.
In the analysis of break intervals, the lognormal distribution fitted best among the five discrete and continuous distributions under consideration. Similarly, in the analysis of the number of cases at breaks, the lognormal distribution fitted better than the other distributions, both discrete and continuous, under consideration.

MATERIAL & METHOD
The data were collected from our World COVID-19 databases of daily updates of confirmed cases. The data cover the total number of people infected with the COVID-19 virus from December 31, 2019, to April 20, 2020. As of April 20, 2020, there were 2,350,993 cases of infection worldwide.
On April 20, a total of 205 countries/provinces had reported infections with the virus. These data were processed for further analysis.
In the present paper, we apply the Chow F-test statistic to determine the potential breaks at all change points in the COVID-19 series. This methodology is well documented in the R package "strucchange". We carry out the analysis in the following steps:
Step-1: Determine the number of structural break(s) and their locations.
Step-2: Classify the countries by taking the break interval of World COVID cases.
Step-3: Fit a distribution in each break interval based on total infected days and the total number of cases.
Step-4: Display the histogram and density plot based on the results obtained from the previous steps to draw conclusions about the COVID-19 cases.
Similarly, for the rest of the intervals, most of the countries have about 4 to 9 days before the next breakpoint occurs. The difference in shape between the first interval and the rest is that there are more countries in S1 than in the others, i.e., almost all the countries have observed their first breakpoint based on the total number of cases within the country. Likewise, very few countries have reached their sixth stage in terms of total cases, which leads to a distorted histogram for S6 that is less skewed than those of the other intervals. This also indicates that the situation is under control in the countries that have reached the sixth stage of COVID-19 infection.
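The break-detection idea behind Step-1 can be illustrated with a minimal sketch of the Chow F statistic for a single break. It is not the procedure of the "strucchange" package and not the authors' code: the linear trend in log cumulative cases, the synthetic series, the minimum segment length and all function names are assumptions made for illustration. For a candidate split, the statistic compares the residual sum of squares (RSS) of one pooled regression with the combined RSS of two separate regressions: F = ((RSS_pooled - RSS_split) / k) / (RSS_split / (n - 2k)), where k is the number of regression parameters and n the number of observations.

```python
import math


def ols_rss(t, y):
    """Residual sum of squares of the least-squares line y = a + b*t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))


def chow_f(t, y, split, k=2):
    """Chow F statistic for a break between positions split-1 and split.

    k is the number of parameters per regression (intercept and slope).
    """
    n = len(y)
    rss_pooled = ols_rss(t, y)
    rss_split = ols_rss(t[:split], y[:split]) + ols_rss(t[split:], y[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


def best_single_break(t, y, min_seg=5):
    """Scan all admissible splits and return (split, F) with the largest F."""
    candidates = range(min_seg, len(y) - min_seg + 1)
    return max(((s, chow_f(t, y, s)) for s in candidates), key=lambda p: p[1])


# Synthetic (hypothetical) series: log cumulative cases grow with slope 0.30
# for the first 20 days and slope 0.05 afterwards; a small deterministic
# wiggle stands in for noise so that the residuals are not exactly zero.
days = list(range(40))
log_cases = [
    (0.30 * d if d < 20 else 5.85 + 0.05 * (d - 19.5)) + 0.02 * math.sin(d)
    for d in days
]

split, f_stat = best_single_break(days, log_cases)
print("estimated break before day", days[split], "with F =", round(f_stat, 1))
```

A series with several breaks would need this scan to be repeated within each segment, or a dedicated multiple-breakpoint procedure; the sketch covers only the single-break case.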

Cullen and Frey graphs were plotted for each of the intervals and used to find possible candidates among various discrete and continuous distributions for fitting to the data. The following distributions were considered: Normal, Negative Binomial, Poisson, Lognormal and Gamma (Table 2).
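A Cullen and Frey graph places a sample at the point (squared skewness, kurtosis) and compares it with the points or curves that theoretical distributions occupy in that plane. The sketch below computes these coordinates with simple moment estimators, which is an assumption (software implementations may use bias-corrected estimators), and the interval lengths in it are hypothetical values, not data from this study.

```python
import math


def cullen_frey_coords(x):
    """Return (squared skewness, kurtosis) from plain moment estimators."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew ** 2, kurt


def gamma_kurtosis(sq_skew):
    """Kurtosis on the gamma line for a given squared skewness."""
    return 3.0 + 1.5 * sq_skew


def lognormal_point(sigma):
    """(squared skewness, kurtosis) of a lognormal with log-scale sd sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2) ** 2 * (w - 1), w ** 4 + 2 * w ** 3 + 3 * w ** 2 - 3


# Hypothetical break-interval lengths in days (illustration only).
interval_days = [4, 5, 5, 6, 6, 7, 7, 8, 9, 9, 12, 15, 21, 30]

sq_skew, kurt = cullen_frey_coords(interval_days)
print("sample point:", round(sq_skew, 2), round(kurt, 2))
print("gamma line at same skewness:", round(gamma_kurtosis(sq_skew), 2))
```

The normal distribution sits at the single point (0, 3), while the gamma and lognormal families trace curves; a sample point lying near one of these curves marks that family as a candidate for fitting.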

Analysis of the number of cases at each stage:
In analyzing the different stages of COVID-19 infection, we also need to understand the number of COVID-19 cases, as the infection count is equally crucial for the number of carriers. So far, we have modelled COVID-19 cases across the different stages of spread. First, the data contained distinctly high values contributed by countries with a large number of infected cases, such as the United States, Italy, China, Spain, Germany and Iran. These data points made up about 15-19% of the data and were removed as outliers; they are recorded in Table 3.
The outlier countries are those that are worse affected by the disease than the other countries in the corresponding stages. It is also observed that countries severely affected by COVID-19 (United States, China, etc.) have not reached the last stages, i.e., the fifth and sixth. Table 4 records the descriptive statistics of the break interval based on total cases, and Figure 5 displays its density plot.
The aggregate data, after removing outliers, are positively skewed and mildly peaked in the first half of the data. The average number of cases at B0 is 74, which is reasonable because these are moderately affected countries (after removing outliers) and therefore have fewer cases up to the first break. This average increases rapidly up to B4, reaching approximately 580 cases, where about 75% of the countries have fewer than 802 cases in total. At breakpoint B5, the average falls. A possible reason is that the growth of COVID-19 cases is slowing in these countries. Most of these countries have a high number of infected cases that were pulling the average up at the earlier breaks.
Suitable candidates for distribution fitting were found using Cullen and Frey graphs (Table 5). The lognormal distribution fitted best among the exponential, normal, Gamma and other discrete distributions. The traces of the density and distribution functions and the P-P plots show that the lognormal distribution gives a better fit, as it reaches its peak and then shows a decreasing trend over the extended time period. This can easily be seen in the China series. Departures from the straight line in the Q-Q plot at the higher quantiles of the fitted distribution are due to the highly skewed country data. Hence, distributions other than the lognormal cannot adequately explain the skewness of the total-case data.
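The comparison of candidate distributions can be sketched numerically as follows. The source does not state which goodness-of-fit criterion was used, so the Akaike information criterion (AIC) here is an assumption, as are the function names and the synthetic case counts, which are generated as lognormal quantiles rather than taken from the study. Only distributions with closed-form maximum likelihood estimates (normal, lognormal, exponential) are included to keep the example short.

```python
import math
from statistics import NormalDist


def lognormal_mle(x):
    """Maximum likelihood estimates (mu, sigma) of a lognormal sample."""
    logs = [math.log(v) for v in x]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma


def aic_table(x):
    """AIC = 2k - 2*loglik at the MLE for three candidate distributions."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    mu, sigma = lognormal_mle(x)
    sum_logs = sum(math.log(v) for v in x)

    # Maximised log-likelihoods, using the closed-form MLEs.
    ll_normal = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    ll_lognormal = -0.5 * n * (math.log(2 * math.pi * sigma ** 2) + 1) - sum_logs
    ll_exponential = -n * (math.log(mean) + 1)

    return {
        "normal": 2 * 2 - 2 * ll_normal,
        "lognormal": 2 * 2 - 2 * ll_lognormal,
        "exponential": 2 * 1 - 2 * ll_exponential,
    }


# Synthetic (hypothetical) right-skewed case counts: 50 evenly spaced
# quantiles of a lognormal with log-mean 5.5 and log-sd 0.6.
n_countries = 50
std_normal = NormalDist()
cases = [
    math.exp(5.5 + 0.6 * std_normal.inv_cdf((i + 0.5) / n_countries))
    for i in range(n_countries)
]

aic = aic_table(cases)
best = min(aic, key=aic.get)
print("AIC by distribution:", {name: round(v, 1) for name, v in aic.items()})
print("lowest AIC:", best)
```

A lower AIC indicates a better trade-off between fit and number of parameters. On real data, this numerical comparison would complement rather than replace the P-P and Q-Q plots discussed above.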

DISCUSSION
This paper studies and analyzes the various stages in the growth of COVID-19 cases using the structural break methodology. We considered the cumulative cases of coronavirus for each country as a data series and identified the breakpoints at which the structure of the series shifts suddenly. We then drew inferences for each break period, which provides the duration and the number of cases in each growth stage. For each break interval, we fitted a distribution to explain the growth pattern of COVID-19 cases. Based on the results, we observed that the lognormal distribution gives the best fit to the number of days and the number of cases in each break interval of COVID-19 worldwide. This is because the lognormal distribution has a long right tail with an exponential growth pattern.