A ‘Local-Global’ Model for Seasonal Diseases: Influenza Subtypes Analysis Case Study

Influenza epidemics in temperate regions are characterized by pronounced seasonal peaks during the winter. The general lack of influenza cases during the off-season may result from the virus physically disappearing at the end of each season, in which case it must be imported annually. Alternatively, it may result from persistent asymptomatic carriers or unnoticed local transmission chains that develop into local epidemics as conditions become conducive. Here I attempt to distinguish between these explanations by analyzing the global distribution of the four major influenza subtypes over a period of 18 years, based on FluNet, the surveillance network and database compiled by the WHO, and on the NCBI influenza data resource, a repository of relevant genetic information. Examining the annual proportion of each subtype, I find considerable variation in annual subtype proportions between regions. Moreover, I find that seasonal influenza subtypes can remain confined to specific temperate regions without showing a measurable global presence. These results indicate that, although largely undetected during the off-season, influenza is likely to persist locally, and imply a ‘local-global’ model in which annual influenza epidemics are a mixture of local strains undergoing reactivation and an influx of global variants.


Introduction
Influenza virus infections are among the leading causes of winter morbidity and mortality, resulting in an estimated 40,000 deaths annually in the United States alone and up to ~500,000 globally 1,2 . Influenza epidemics in temperate climates exhibit distinctive seasonal dynamics, characterized by a peak in disease incidence over the winter months, with only sporadic cases detected during the rest of the year [3][4][5][6][7][8][9] . Off-season epidemics have been observed in temperate regions, but they are uncommon and mostly confined to narrow settings, e.g. correctional facilities 10 , military bases 11 , cruise ships 12 and long-term care facilities 13 .
The low disease incidence during the off-season may indicate that influenza in temperate regions physically goes extinct at the end of each winter season and must consequently be imported annually via 'carriers' travelling from source regions. Phylogenetic studies focused on the long-term spatiotemporal dynamics of influenza support this type of model (termed here a 'strictly global model'), either with the semi-tropical Southeast Asia region as a perennial source 14,15 or, as suggested more recently, with influenza lineages dynamically maintained through time and space via complex migration networks, essentially persisting in no single population for extended periods 3,16 .
However, increased influenza surveillance following the 2009 H1N1pdm pandemic revealed higher off-season incidence rates than previously assumed 17,18 . For example, active transmission chains of the H3N2 influenza subtype were detected in rural areas of New York, USA (latitude: 40N) from March to June 2012, at which point 19 .
Moreover, persistent 'local influenza lineages' have been identified in subtropical Africa 20 , suggesting that at least in some regions long-term influenza persistence is possible.
Here I postulate that if annual epidemics in temperate regions absolutely require the introduction of strains from global sources (i.e. a 'strictly global model'), then the annual subtype proportions in temperate (sink) regions will tend to be similar to, and correlate well with, the proportions in the source region(s). Finally, I add perspective by repeating the analysis using data derived from the NCBI influenza database 28 , the primary resource used in corresponding phylogenetic studies. Importantly, I found evidence of neither a global pattern nor local persistence in this dataset. These results suggest that a dynamic mixture of global and local sources shapes the subtype composition of annual influenza epidemics and, specifically, that the influenza virus can persist locally in temperate regions. Furthermore, a comparison of data from the two main influenza resources, FluNet and NCBI, showed that, although it lacks sequence information, the wealth of systematically collected surveillance data in FluNet can shed light on questions usually addressed via phylogenetic analysis.

Results
Expected subtype proportions in strictly global/local models
I consider two hypothetical models that could explain the long-term dynamics of influenza. The first is a strictly global model, in which all annual influenza epidemics (of any subtype) are introduced into temperate regions by external carriers, from either a single source or multiple sources. The second is a strictly local model, in which the source of all influenza epidemics is assumed to be local.
The annual proportions of each influenza subtype are expected to differ considerably between these models. In a strictly local model, the subtype proportions in different regions should be entirely independent and therefore poorly correlated. However, pandemics, as well as numerous molecular studies 2,7,29,30 , show that this is almost certainly not the case in general, and I do not examine this possibility further.
In a strictly global model, influenza dynamics in temperate regions are expected to depend critically on the volume of human transport between sinks and source(s): lower transport volumes could lead to increasingly asynchronous epidemics, or even to entire seasons in which influenza is absent from some or all of the temperate (sink) regions. Conversely, higher transport volumes, and by extension a non-limiting influx of influenza carriers, are expected to ensure that epidemics of any subtype occurring in the source(s) are continuously introduced into temperate regions.
In that case, epidemics in sink regions could begin whenever local conditions become favorable. Epidemics in sink regions that share the same seasonality pattern are therefore expected to begin at roughly the same time and to comprise similar subtype proportions, mirroring the contemporary proportions at the source.

Influenza subtypes annual distribution by region
In a global model, given non-limiting transport rates, subtypes should be represented

Globally homogeneous periods
Within the general lack of a homogeneous pattern in the annual subtype proportions During the period examined, four seasons (00-01, 03-04, 09-10 and 15-16) were identified as HR_G, for subtypes H1N1, H3N2, H1N1pn and FluB, respectively (seasons indicated in figure 4). Two of these seasons (03-04 and 09-10) are known to be associated with the emergence of highly transmissible, antigenically novel influenza variants capable of rapid global spread 14,22,27,32,33 , and the 00-01 season showed unusually high H1N1 activity in many regions, notably in Europe and the US 25,26,34 .
Note that the probability of four HR_G seasons arising by chance (i.e. in the absence of global dynamics) is p < 0.01, strongly suggesting that most, if not all, of these HR_G seasons resulted from a particular strain rapidly reaching global dominance.

Limited local dominance
Intuitively, influenza variants can become highly represented in a region (HR in Eqn.2) by gaining a competitive advantage, e.g. through mutations and genomic reassortment events 1,14,15,[35][36][37][38][39] . In a strictly global model, the bulk of such viral evolution occurs in the source regions, whereas strains in sink regions are close relatives of contemporary strains at the sources 16,35 . Hence, strains that become HR are expected to emerge in a source region i (HR_i) and spread from there to connected sink regions. If the source is sufficiently well connected, these unusually fit strains are expected to spread globally and become HR_G (Eqn.3).
However, in most of the seasons examined, at least one region was classified as HR for a given subtype (figure 4) while the same subtype was not HR in the other regions. This observation is consistent with the low correlation in subtype proportions, but is difficult to explain within a strictly global framework, whether the model assumes a single source region or a network of multiple sources.

Local persistence
Another prediction of a strictly global model is that, since influenza epidemics in temperate regions die out completely each year, any subtype detected in a temperate region must also be detected in at least one source region.
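This prediction reduces to a per-season set check: flag any subtype that appears in a sink region but in none of the source regions. A minimal sketch, assuming a hypothetical nested-dictionary layout (season → region → subtype → positive samples) and a caller-supplied set of source regions; neither the layout nor the helper name `orphan_detections` comes from the original analysis, and the toy counts are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: counts[season][region][subtype] -> number of positive samples.
Counts = Dict[str, Dict[str, Dict[str, int]]]


def orphan_detections(counts: Counts, sources: Set[str]) -> List[Tuple[str, str, List[str]]]:
    """Return (season, subtype, sink_regions) for every subtype detected in at least
    one sink region but in none of the assumed source regions during that season."""
    orphans = []
    for season in sorted(counts):
        by_region = counts[season]
        # All subtypes with at least one positive sample anywhere this season.
        detected = {s for region in by_region.values() for s, n in region.items() if n > 0}
        for subtype in sorted(detected):
            in_source = any(by_region.get(r, {}).get(subtype, 0) > 0 for r in sources)
            sinks = sorted(r for r, c in by_region.items()
                           if r not in sources and c.get(subtype, 0) > 0)
            if sinks and not in_source:
                orphans.append((season, subtype, sinks))
    return orphans


# Toy counts (not real surveillance data); "asia" is treated as the only source purely for illustration.
counts = {
    "S1": {"asia": {"H1N1": 0, "H3N2": 120}, "europe": {"H1N1": 40, "H3N2": 300}},
    "S2": {"asia": {"H1N1": 15, "H3N2": 90}, "europe": {"H1N1": 10, "H3N2": 200}},
}
print(orphan_detections(counts, {"asia"}))
```

Under a strictly global model the returned list should be empty; a subtype that keeps appearing in it over consecutive seasons is the kind of counterexample described next.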
However, a careful analysis of the country incidence data revealed a counterexample in which the H1N1 subtype was detected in temperate regions over several seasons without an apparent source elsewhere. H1N1 was prevalent in Asia (most notably

Subtype distribution in the context of influenza phylogenetic studies
These results point to dynamics in which globally circulating strains can have a dramatic impact on regional epidemics, at times overshadowing any local contributions and even replacing existing subtypes altogether; e.g. the seasonal H1N1 subtype was replaced by H1N1pdm over the 2009-10 season 23 .
However, while phylogenetic studies point to a general lack of local influenza persistence 3,16,18 , the present results suggest that annual epidemics have both local and global sources, i.e. persistent local sources contribute measurably to the subtype proportions of annual epidemics. To test whether differences in data collection methodology could account for this discrepancy, I repeated the analysis using the NCBI influenza database 28 .
The NCBI database contains sampling dates and locations as well as genetic sequences, which are of great utility in phylogenetic analyses. However, while these data are more detailed than those supplied by FluNet (which contains no sequence information), they are limited in the total number of samples available for the period examined and may be less standardized in sample submission methodology. For context, the FluNet data analyzed here comprise >2M data points, compared with ~50K data points in the NCBI data (compare table 1 (FluNet) and table 2 (NCBI)).
As with the FluNet data, annual subtype proportions were overall heterogeneous and Remarkably, however, no season in the NCBI data was HR_G for any subtype (figure 6c). Instead, the results showed a horizontal (regional) pattern and a pronounced paucity of FluB genetic data in most countries, suggesting a potential bias in the sampling methodology; see, e.g., the apparently unusually high proportion of H3N2 in Europe between 2002 and 2005 (compare with figures 3a and 4 for the FluNet data). Furthermore, the data provided no support for the localized subtype persistence pattern found in the FluNet data (figure 5). That is, there were no two consecutive seasons in which any subtype was detected mainly or only in temperate regions, and specifically not H1N1 in Europe.
Taken together, these results indicate that local persistence in temperate regions does occur, which implies a 'local-global' model in which the annual composition of influenza epidemics is a mixture of local strains undergoing reactivation and an influx of globally circulating variants. When the competitive advantage of global variants is sufficiently large, the contribution of local sources may vanish, as indicated by the correspondence between the emergence of antigenically advanced variants and the seasons identified here as 'globally homogeneous' (HR_G). In the absence of such advanced variants, however, local strains have a non-negligible effect, which leads to regional variation in annual influenza subtype composition.

Annual influenza epidemics in temperate regions
Using subtype incidence data from the NCBI database rather than FluNet substantially changed the results: I did not find any 'global seasons' (HR_G; compare figures 4 and 6c), and there was no evidence of the local persistence pattern found in the FluNet data. Notably, FluNet was designed specifically to broaden understanding of global patterns of influenza spread, and it provides a large set of systematically collected case samples for statistical analysis (Table 1), containing > 2 orders of magnitude more samples than the NCBI database.
The contrast between the FluNet- and NCBI-based results may help explain why studies of influenza sequence information from roughly the same period conclude that the virus does not survive the off-season and must be imported annually from source regions. It is surprising that so little consideration has been given to date to the relative proportions of subtypes or lineages in public health decisions such as vaccine strain selection. For example, considerable effort has been directed at linking the genetic composition of variants to their ability to evade neutralizing antibodies [42][43][44] ; these are important but not sufficient parameters for determining the pandemic potential of a variant. In contrast, I am not aware of studies that aim to identify significant deviations in the relative proportions of subtypes, a computationally simpler and intuitively more direct measure of a strain's pandemic potential.
I note that the time-series data from the southern hemisphere were arbitrarily set to lag behind those of the northern hemisphere; reversing this choice, i.e. having the southern hemisphere season precede the northern hemisphere season, had no effect on the results (not shown). Likewise, alternative methods for defining seasons, e.g. detecting peaks of influenza activity in the data per country, per region and globally, did not affect the observed dynamics (not shown).
The type of model I propose, in which global sources promote influenza diversity yet are not the only factor explaining seasonality, is consistent with several general observations. For example, it is well documented that influenza strains can be detected year-round in non-human hosts: not only in avian hosts, where the virus is maintained via complex intra-species and reservoir systems 30,45 , but also in non-migrating equine and swine populations in which no natural reservoir has been identified [46][47][48][49][50][51] . Interestingly, it was recently found 40 that, although difficult to detect through conventional surveillance, swine influenza lineages can persist in surprisingly small host populations (~500 animals).
The results presented here emphasize the importance of using multiple data sources and applying diverse analytic approaches to complex problems, and of considering both the genetic composition and the relative abundance of influenza variants when selecting seasonal influenza vaccine strains.

NCBI data
I retrieved HA sequence data from the NCBI influenza virus database 28 for the H1N1, H3N2 and H1pdm subtypes, and NA sequence data for FluB, for the period from September 1999 to February 2017 (including partial sequences). Using the sequence header information, I divided the data into countries and seasons as with the FluNet data. Note that these data were used only to examine subtype proportions per country and season, without reference to the sequences themselves. The number of samples per country is presented in table 2.

Regions and seasons
Countries were divided into four regions based on general geographic location: Asia (Japan, China, South Korea, Thailand, Singapore and Russia), Europe (Norway, Sweden, Denmark, Netherlands, Poland, Germany, France, UK, Finland, Switzerland, Czech, Romania, Hungary, Austria, Iceland, Ireland, Italy, Israel and Spain), North America (USA, Canada) and 'southern hemisphere' countries (Peru, Argentina, Chile, Australia, New Zealand and South Africa), referred to as 'south' (figure 1). The data (Tables 1 and 2) were summed over the countries in each region.
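The grouping above is a simple lookup followed by a sum. A minimal sketch, assuming per-country counts are held in a dictionary keyed by country name; the country spellings are copied from the text and would have to match whatever names the FluNet or NCBI exports actually use, and `aggregate_by_region` is a hypothetical helper, not part of either resource.

```python
from collections import defaultdict
from typing import Dict

# Country-to-region grouping as listed in the text (spellings are an assumption about the raw data).
REGIONS = {
    "Asia": ["Japan", "China", "South Korea", "Thailand", "Singapore", "Russia"],
    "Europe": ["Norway", "Sweden", "Denmark", "Netherlands", "Poland", "Germany", "France",
               "UK", "Finland", "Switzerland", "Czech", "Romania", "Hungary", "Austria",
               "Iceland", "Ireland", "Italy", "Israel", "Spain"],
    "North America": ["USA", "Canada"],
    "south": ["Peru", "Argentina", "Chile", "Australia", "New Zealand", "South Africa"],
}
# Invert the grouping so each country points to its region.
COUNTRY_TO_REGION = {c: region for region, countries in REGIONS.items() for c in countries}


def aggregate_by_region(country_counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Sum per-country subtype counts into per-region counts.

    Countries outside the four regions are skipped.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for country, subtypes in country_counts.items():
        region = COUNTRY_TO_REGION.get(country)
        if region is None:
            continue
        for subtype, n in subtypes.items():
            totals[region][subtype] += n
    return {region: dict(by_subtype) for region, by_subtype in totals.items()}
```

Skipping unlisted countries (rather than raising) mirrors the fact that only the countries named above enter the analysis.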
The influenza season was defined by calendar week: the winter season in the northern hemisphere was approximated as week 40 to week 20 of the following year, and as weeks 14-47 in the south, representing a 26-week shift forward (or backward; S3).
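The week windows above can be turned into a season label for each dated sample. A minimal sketch, assuming ISO-8601 week numbers (via Python's `date.isocalendar()`) and labelling each southern season with the northern season that ends in the same year, consistent with the southern data lagging the northern data; the week-numbering convention, the label format and the function names are assumptions for illustration.

```python
from datetime import date
from typing import Optional


def season_label(start_year: int) -> str:
    """Label a northern-hemisphere season by its start year, e.g. 2000 -> '00-01'."""
    return f"{start_year % 100:02d}-{(start_year + 1) % 100:02d}"


def assign_season(sample_date: date, hemisphere: str) -> Optional[str]:
    """Map a sample date to a season label, or None if it falls outside the season window.

    North: ISO weeks 40..53 of year Y plus weeks 1..20 of year Y+1 form season Y/Y+1.
    South: ISO weeks 14..47 of year Y, paired with the northern season ending in year Y.
    """
    iso_year, iso_week, _ = sample_date.isocalendar()
    if hemisphere == "north":
        if iso_week >= 40:
            return season_label(iso_year)
        if iso_week <= 20:
            return season_label(iso_year - 1)
        return None  # northern off-season (weeks 21-39)
    if hemisphere == "south":
        if 14 <= iso_week <= 47:
            return season_label(iso_year - 1)
        return None  # outside the southern window
    raise ValueError(f"unknown hemisphere: {hemisphere!r}")


print(assign_season(date(2000, 10, 2), "north"))  # ISO week 40 of 2000 -> 00-01
```

Using the ISO year rather than the calendar year keeps late-December and early-January samples in the correct week; samples outside both windows are simply dropped.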

Subtype proportions
For season A, the proportion $f_{A\alpha i}$ of each subtype α in region i, i.e. the number of positive events of a given subtype α over all influenza events detected, was calculated as

$$ f_{A\alpha i} = \frac{N_{A\alpha i}}{\sum_{\beta} N_{A\beta i}} $$

where $N_{A\alpha i}$ is the number of cases of influenza subtype α (H1N1, H3N2, FluB and H1N1pn) in region i in season A, and the sum of these over all subtypes β is the total number of influenza cases reported per region, per season.
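The proportion formula above can be computed directly from the per-region, per-season counts. A minimal sketch, assuming the counts for one region i and one season A are held in a dictionary keyed by subtype; the function name and the choice to raise an error when no cases were reported (where the proportion is undefined) are assumptions, and the toy counts are not FluNet data.

```python
from typing import Dict


def subtype_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Return f = N_alpha / sum_beta N_beta for every subtype alpha.

    `counts` maps subtype -> number of positive cases for one region and one season.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no influenza cases reported for this region/season")
    return {subtype: n / total for subtype, n in counts.items()}


# Toy counts for a single region/season (illustrative, not FluNet data).
example = {"H1N1": 10, "H3N2": 30, "FluB": 60, "H1N1pn": 0}
print(subtype_proportions(example))  # -> {'H1N1': 0.1, 'H3N2': 0.3, 'FluB': 0.6, 'H1N1pn': 0.0}
```

By construction the proportions for each region and season sum to 1, so annual subtype compositions can be compared across regions regardless of differences in total sample size.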

Correlations
Let ρ(x,y) be the standard Pearson's correlation coefficient between two variables x(t) and y(t). Then the lagged cross-correlation between the variables is defined as ρ(x(t),y(t-τ)), where τ is the time-lag. To quantify the epidemic timing (