Symptom and Age Homophilies in SARS-CoV-2 Transmission Networks during the First and Second Waves of the Pandemic in Japan

*Ali Andalibi, PhD – Co-designed the study, contextualized the results, and analyzed the medical aspects of the data
*Naoru Koizumi, PhD – Accessed and organized the prefecture data, co-designed the study, and performed statistical analysis
Meng-Hao Li – Cleaned and co-organized the data, and performed the network analysis
Abu Bakker Siddique, MPP – Cleaned and co-organized the data, and visualized the networks
*These authors contributed equally to this work


INTRODUCTION
Epidemiological studies of COVID-19 have provided mounting evidence that a significant number of individuals infected with SARS-CoV-2 are asymptomatic (1,2), while demonstrating that the symptomology of the disease largely depends on age, sex, and comorbidities (3)(4)(5). There is limited information, however, on the characteristics of viral transmission networks, especially in relation to the demographic and symptomological homogeneities and heterogeneities in viral transmission (6). To examine the characteristics of SARS-CoV-2 viral transmissions, we analyzed Japanese contact tracing data that record viral transmission chains as well as demographic and symptomological information of the PCR-confirmed cases during the first and the second waves of the pandemic.
Since the index case was confirmed on January 16, 2020, the Japanese government has been publishing demographic, clinical, and epidemiological data on each individual who has tested positive for the virus. One unique feature of the data is the transmission paths revealed through the contact tracing efforts of the public health centers (PHCs) (7)(8)(9). Although contact tracing became unfeasible in many parts of Japan after the resurgence of the disease in summer 2020, such data were fairly complete and reliable for the first 6 months of the pandemic, i.e., February through July. Under this government-led contact tracing effort, known as the "cluster countermeasure", the PHCs retrospectively queried all identifiable individuals who had had in-person contact with a confirmed case during the prior 14 days (8,10,11). Those who were determined to have been in "close contact" were all subjected to a PCR screening test irrespective of the presence of COVID-19 related symptoms (11). The criteria used to determine "close contact" included: i) being a cohabitant of the confirmed case; ii) having spent long hours in an indoor setting (including a car or an airplane) with the confirmed case; iii) having provided (medical, nursing) care to the confirmed case without adequate personal protective equipment; iv) likely exposure to droplets or other body fluids of the confirmed case; or v) having been within a 1 meter (approximately 3 feet) radius of the confirmed case for a total of 15 minutes or more without protection. Those who did not meet any of these criteria were requested to self-quarantine for 14 days and were advised to receive a test if any symptoms appeared during the 14 days (8,10,11).
We utilized the data from two prefectures, Hokkaido and Kanagawa, for the period when the data were most complete, i.e., between mid-February and mid-July for Hokkaido and between mid-January and early August for Kanagawa. We selected Hokkaido as it was one of the first prefectures to experience the COVID-19 pandemic, issuing the Declaration of a New Coronavirus Emergency as early as 28 February 2020 (12). Kanagawa is another prefecture that experienced the pandemic early: a resident returning from Wuhan, China, became the country's index case of COVID-19 (13).
The primary objective of the current study was to construct SARS-CoV-2 transmission networks and to analyze the characteristics of viral transmission both descriptively and statistically. In particular, we examined symptom, age, and sex "homophilies", i.e., whether an infector (the source patient) and the infectee(s) tend to experience similar symptoms, both be asymptomatic, or belong to the same age or sex group. Although the results of such analyses do not provide direct evidence about the variations of the virus, the findings shed light on the heterogeneity of SARS-CoV-2 transmission that may be partly explained by viral variants, as well as on how government intervention strategies and the population's behavior at the time of the pandemic influence the spread of the virus.

Data
We queried the government registry data for Hokkaido and Kanagawa prefectures. The registry data from Hokkaido contained 1,269 cases (including 674 or 53% females and 595 or 47% males) covering the period between February 14 and July 22, 2020, while the data from Kanagawa contained 3,123 cases (including 1,346 or 43% females and 1,777 or 57% males) covering the period between January 15 and August 6, 2020. The final data contained information about 4,392 patients (2,020 or 46% female and 2,372 or 54% male). These cases were originally confirmed by the local PHCs that report to the Ministry of Health, Labor and Welfare. The Ministry standardizes and publishes the data it receives from the PHCs as part of the comprehensive data contained in the national registry (7). Individual prefectures also publish the data through their websites, although the specifics and the format of such information vary by prefecture.
The data collected by the PHCs during the study period included basic demographic, symptomatological, and epidemiologic information, including the transmission paths (likely infectors and infectees) and travel history of the confirmed cases, collected with informed consent (7). We queried both national and local registry data for this study.
The final data included information on: sex; age (<10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, or >100); the city of residence, or the testing site; the dates of PCR and the onset of symptoms; and symptoms experienced (if any). In Kanagawa, 100 patients were non-Japanese citizens who resided on the US military base. Symptomatological data on these patients were not publicly available, thereby reducing the Kanagawa sample size to 3,023 for the analysis of symptomatological data. Similarly, 48 patients did not provide age, reducing the sample size for the analysis involving age to 4,344. Data on viral transmission paths were available for 1,365 patients (371 (29%) and 994 (32%) cases in Hokkaido and Kanagawa, respectively). After excluding those patients whose symptomatological data were missing, 1,310 patients (355 patients in Hokkaido and 955 in Kanagawa) remained in the viral transmission networks. For Kanagawa, the likely settings through which transmission occurred were also available for 457 (15%) patients. These included: i) at a medical facility; ii) through family; iii) through friends; iv) at work; and v) through travel (domestic or international, where the destinations of international travel included the Middle East, South Asia, the EU, the US, and other).
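The sample sizes reported above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the counts stated in the text; the variable names are illustrative and are not part of the registry data.

```python
# Case counts as reported in the text (Hokkaido, Kanagawa).
cases = {"Hokkaido": 1269, "Kanagawa": 3123}
with_paths = {"Hokkaido": 371, "Kanagawa": 994}   # cases with known transmission paths

total = sum(cases.values())                       # all PCR-confirmed cases
symptom_n_kanagawa = cases["Kanagawa"] - 100      # minus cases without public symptom data
age_n = total - 48                                # minus cases with missing age
paths_total = sum(with_paths.values())
path_share = {p: round(100 * with_paths[p] / cases[p]) for p in cases}  # percent, rounded

print(total, symptom_n_kanagawa, age_n, paths_total, path_share)
```

The rounded shares agree with the percentages given for the transmission-path subsample in each prefecture.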

Methods
Patient characteristics observed in Hokkaido and Kanagawa were compared using t-tests for continuous variables and chi-square tests for nominal variables. Depending on the distribution of a continuous variable and the sample size of a nominal variable, Wilcoxon-Mann-Whitney and Fisher's exact tests were used to replace t- and chi-square tests, respectively. To investigate the factors correlated with viral transmission and asymptomatic state, logistic regressions were performed with the binary dependent variables recording the presence of either viral transmission or asymptomatic state. The factors explaining the viral transmission counts were examined using a Poisson regression with the number of infectees per patient as the dependent variable. In order to examine the difference between the two prefectures, the interaction term between Kanagawa and asymptomatic status was included in the regressions. For the age analysis, the patients aged between 50 and 59 were the reference group, as the preliminary analysis indicated that this group had the lowest proportion of asymptomatic patients. For the month fixed effects, July was the reference month, as the month signifies the end of the first wave for both prefectures and the beginning of the second wave for Kanagawa. All statistical analyses were performed in STATA (StataCorp, v14).
Statistical significance was defined by p≤0.05 unless noted otherwise.
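As an illustration of the chi-square comparison for a nominal variable, the sketch below computes a Pearson chi-square statistic (without continuity correction) for a 2x2 table in plain Python. It is not the STATA procedure used in the study; the function name is hypothetical, and the example table simply reuses the sex-by-prefecture counts reported in the Data section.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: Hokkaido, Kanagawa; columns: female, male (counts from the Data section).
chi2, p_value = chi_square_2x2(674, 595, 1346, 1777)
print(f"chi2 = {chi2:.2f}, significant at 0.05: {p_value <= 0.05}")
```

For small expected cell counts, Fisher's exact test would replace this statistic, as described above.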
We defined asymptomatic cases as those that met at least one of the following criteria: i) the note in the registry indicated the case as an "asymptomatic patient"; ii) the note indicated "no symptoms"; or iii) there were no symptoms recorded while all other information (age, sex, dates of PCR, etc.) on the patient was present. While these cases may be pre-symptomatic, the notes in the registry data frequently included updated information, indicating, for instance, "the patient reported a fever of [degree] on [date]" after the initial recording. These updates appeared to have been made during the aforementioned 14-day monitoring period. Our definition conforms to current WHO guidelines for the determination of asymptomatic cases, i.e., PCR-positive COVID-19 patients without overt symptoms at the time of the laboratory-confirmed infection.
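The three criteria can be expressed as a simple rule. The sketch below is hypothetical: the record layout (a free-text note, a list of symptoms, and a dictionary of the remaining fields) and the English search strings are assumptions made for illustration, and do not reproduce the actual registry format or the coding procedure used in the study.

```python
def is_asymptomatic(note, symptoms, other_fields):
    """Return True if a registry record meets any of the three criteria."""
    text = (note or "").lower()
    # Criteria i) and ii): the note labels the case explicitly.
    if "asymptomatic patient" in text or "no symptoms" in text:
        return True
    # Criterion iii): no symptoms recorded although the rest of the record is complete.
    record_complete = all(value is not None for value in other_fields.values())
    return not symptoms and record_complete

complete = {"age": "41-50", "sex": "F", "pcr_date": "2020-04-01"}
print(is_asymptomatic(None, [], complete))
```

A record with no symptoms but with missing age or sex is not counted as asymptomatic under criterion iii), since the absence of symptoms could then reflect incomplete reporting.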
To visually inspect the patterns of viral transmission, we constructed viral transmission networks using the records of the patients whose infectors or infectees were known in the registry data. The network construction and visualization were done using Gephi (v0.9.2). To examine the prevalence of "homophilies" in the viral transmission networks, we applied exponential random graph models (ERGMs), which are well-established models for the statistical analysis of social and other network data. We specifically investigated several types of homophilies in the networks, including: i) sex homophily, which represents the situations where an infector and the infectee(s) belonged to the same sex; ii) age homophily, representing the situations where an infector and the infectee(s) belonged to the same age group; iii) symptom homophily, where an infector and the infectee(s) had the same symptom; and iv) asymptomatic homophily, where both an infector and the infectee(s) were asymptomatic. The first two analyses investigate the demographic homogeneities/heterogeneities in the networks, while the last two examine the symptomological homogeneities/heterogeneities. ERGMs essentially test whether infector-infectee chains with a specific type of homophily are more prevalent than those chains without the homophily, i.e., "heterogeneous" chains, in the networks. The heterogeneous class was the reference group in the analysis. In the homophily analysis of sex, we compared the 2 homophily classes of infector-infectee chains to 1 heterogeneous class. The 2 homophily classes were: a) the (1,1) class, which represented the chains with the sex homophily, while the heterogeneous class contained both (0,1) and (1,0) cases, representing the chains without sex concordance between infectors and the infectees (Table 1). Asymptomatic homophily was structured and analyzed analogously.
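To make the class labels concrete, the sketch below tallies infector-infectee pairs by a binary attribute. It only counts chains per class and does not fit an ERGM; the toy network and all names are hypothetical.

```python
from collections import Counter

def dyad_class(infector_value, infectee_value):
    """Label one infector-infectee pair by a binary attribute coded 0 or 1."""
    if infector_value == infectee_value:
        return (infector_value, infectee_value)   # concordant pair, e.g. (1, 1)
    return "heterogeneous"                        # (0, 1) or (1, 0)

# Hypothetical toy network: case -> asymptomatic indicator, plus directed pairs.
asymptomatic = {"A": 1, "B": 1, "C": 0, "D": 0}
pairs = [("A", "B"), ("A", "C"), ("C", "D")]      # (infector, infectee)

counts = Counter(dyad_class(asymptomatic[u], asymptomatic[v]) for u, v in pairs)
print(counts)
```

An ERGM goes beyond such raw counts by estimating whether concordant chains occur more often than expected relative to the heterogeneous reference class, given the structure of the network.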
In the analysis of age, we combined the age categories into 3 age groups (<30, 30-59, and 60+), and compared 3 homophily classes: a) the (1,1) class representing the transmission between two patients aged <30; b) the (2,2) class representing the transmission between two patients aged 30-59; and c) the (3,3) class representing the transmission between two patients aged ≥60, to d) the heterogeneous class comprised of the (1,2), (2,1), (1,3), (3,1), (2,3), and (3,2) chains. In the homophily analysis of symptoms, 2 classes of homophily chains, a) the (1,1) class representing the presence of the same symptom in the infector and the infectee(s), and b) the (0,0) class representing the absence of the symptom in both, were compared to c) the heterogeneous class, which represents both (0,1) and (1,0) chains where either the infector or the infectee(s) had the symptom. We combined 15 symptoms into 4 distinct clinical symptom groups to ensure that each class had a sufficient sample size to detect any statistically meaningful variations across the classes. Other symptoms that were not grouped included fever, headache, and body ache. All ERGM analyses were run using the programming language R (R Core Team).
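The age classes can be illustrated with a short sketch. It assumes an exact age in years is available for each patient, whereas the registry reports age in bands, so the mapping from bands to the three groups is an assumption here; the function names are hypothetical.

```python
def age_group(age):
    """Map an age in years to the analysis groups: 1 (<30), 2 (30-59), 3 (60+)."""
    if age < 30:
        return 1
    return 2 if age < 60 else 3

def age_chain_class(infector_age, infectee_age):
    """Return (1, 1), (2, 2) or (3, 3) for same-group pairs, else 'heterogeneous'."""
    g_from, g_to = age_group(infector_age), age_group(infectee_age)
    return (g_from, g_to) if g_from == g_to else "heterogeneous"

print(age_chain_class(72, 85), age_chain_class(29, 30))
```

Keeping the three same-group classes separate, rather than pooling them, is what allows homophily to be estimated for each age group individually.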

RESULTS
The Results section is organized into the following subsections: 1) the comparison between Hokkaido and Kanagawa patient profiles; 2) factors correlated with being asymptomatic; 3) factors correlated with viral transmission; 4) viral transmission networks; and 5) demographic and symptomological homophilies in viral transmission networks. [...] June and another after July. The second wave predominantly hit Tokyo and its vicinity, which includes Kanagawa. Table 2 summarizes the demographic and symptomatological profiles of the patients in the two prefectures.

Factors correlated with Asymptomatic State
Figures 2 (i)-(iv) present the proportions of asymptomatic and symptomatic patients by sex and age group in each prefecture. The figures indicate that, for both prefectures, the proportion of asymptomatic patients is higher in both younger (<20) and older (≥70 or 80 depending on the prefecture/sex) generations compared to the middle-aged group, irrespective of sex.
We statistically examined the relationship between age and the likelihood of being asymptomatic, adjusting for patient's sex, using a logistic regression on the data from both prefectures (Table 3). The results were consistent with the observations from Figure 2, demonstrating that, compared to the patients aged between 50 and 59 (the reference age group), the patients aged between 1 and 9 and between 10 and 19 were 4.65 and 1.84 times more likely to be asymptomatic, respectively (p<0.001 for both age groups).
Similarly, the patients aged between 80 and 89 as well as 90 and above were 2.18 (p<0.001) and 2.62 (p<0.001) times more likely to be asymptomatic, respectively, compared to the reference group (i.e., 50-59). To better understand the seasonal effect, we separated the data by prefecture to examine whether the proportion of asymptomatic cases varied by month in each prefecture, adjusting for sex.

Factors correlated with Viral Transmission
To identify the factors correlated with viral transmission, we performed a logistic regression with the binary dependent variable representing the patients who infected at least one individual.

Viral Transmission Networks
Transmission of the virus ranged from one to four levels (primary to quaternary) in both prefectures. Table 5 presents the distribution of the viral transmission levels by symptomatic/asymptomatic status.

[Table panels: (a) Logistic Regression on Viral Transmission; (b) Poisson Regression on the Number of Infectees]
Quaternary transmission was rare, accounting for less than 1% of all cases in the networks in both prefectures. In both prefectures, the incidence of secondary transmission was the highest, accounting for 58% (Hokkaido) to 61% (Kanagawa) of all the cases in the transmission networks. Those patients who contracted the virus through secondary or tertiary transmission were 2.9 (OR=2.9, p<0.001) and 3.2 (OR=3.2, p<0.001) times more likely to be asymptomatic than primary cases, respectively (Table 6). Figure 4 presents the viral transmission networks by the (color-coded) transmission level for both prefectures. In the diagram, each node represents a patient, while the node size is depicted in proportion to the number of his/her infectees. The transmission networks indicate that the majority of the chains consist of two cases, an infector (green) and an infectee (pink). There are also several large transmission networks in which the virus was spread from a primary infection case (green) to a large number of secondary infection cases (pink). A few networks consisted of a number of secondary infection cases who spread the virus to tertiary cases (orange). There were a very small number of tertiary infection cases who spread the virus to quaternary cases (blue).
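The transmission levels described above can be derived from the infector-infectee pairs by a breadth-first traversal. The sketch below is one possible way to do this, not the procedure used in the study (which relied on Gephi for network construction); it assumes that cases with no recorded infector are primary cases, and the case identifiers are hypothetical.

```python
from collections import deque

def transmission_levels(pairs):
    """Assign level 1 (primary) to cases with no known infector, then 2, 3, ... downstream."""
    infectees, has_infector, nodes = {}, set(), set()
    for infector, infectee in pairs:
        infectees.setdefault(infector, []).append(infectee)
        has_infector.add(infectee)
        nodes.update((infector, infectee))
    levels = {case: 1 for case in nodes - has_infector}
    queue = deque(levels)
    while queue:
        case = queue.popleft()
        for nxt in infectees.get(case, []):
            if nxt not in levels:              # keep the first (shortest) path found
                levels[nxt] = levels[case] + 1
                queue.append(nxt)
    return levels

pairs = [("P1", "S1"), ("P1", "S2"), ("S1", "T1"), ("T1", "Q1")]
levels = transmission_levels(pairs)
out_degree = {case: len([p for p in pairs if p[0] == case]) for case in levels}
print(levels, out_degree)
```

Here `out_degree` corresponds to the number of infectees per case, the quantity used to scale node size in the network diagrams.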
The histogram of the network sizes is shown in Figure 5. The histogram demonstrates that more than 90% of the networks [...]. One cluster [...] was predominantly comprised of asymptomatic cases (33 (92%) asymptomatic and 3 (8%) symptomatic cases). Even excluding this particular cluster, there was a general tendency for asymptomatic cases to generate asymptomatic cases in subsequent transmission chains. We statistically tested this by examining whether the proportions of symptomatic and asymptomatic patients differed depending on the symptomatic/asymptomatic status of the infectors. The result revealed that approximately 8% of patients infected by symptomatic patients were asymptomatic, while 29% of patients infected by asymptomatic patients were also asymptomatic in the networks (p<0.001). Separately, we visualized the transmission networks by age and sex, which revealed no discernible patterns and thus are not presented here.
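The network sizes underlying such a histogram are the sizes of the connected components of the transmission graph. The sketch below computes them with a small union-find structure; it is an illustrative assumption about how the sizes could be obtained, and the example pairs are hypothetical.

```python
from collections import Counter

def network_sizes(pairs):
    """Return the sorted number of cases in each connected transmission network."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)              # merge the two networks
    return sorted(Counter(find(x) for x in list(parent)).values())

pairs = [("A", "B"), ("C", "D"), ("D", "E"), ("F", "G")]
sizes = network_sizes(pairs)
share_of_pairs = sizes.count(2) / len(sizes)   # fraction of two-case networks
print(sizes, share_of_pairs)
```

Direction is ignored here, because network size depends only on which cases are linked, not on who infected whom.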
For Kanagawa networks, we also visualized the viral transmission networks by setting (Figure 7).

Demographic and Symptomological Homophilies in Viral Transmission Networks
The ERGM analyses examined the prevalence of sex, age, and symptom homophilies in the viral transmission networks. Table 8 presents the results of the ERGM analysis. In the table, statistically significant homophily classes are shown in bold. As evidenced by the odds ratios of 1 or above, homophily chains were more prevalent than heterogeneous chains in general. The only exception was the gastrointestinal homophily in Kanagawa (OR=0.36, p<0.001), which indicated that the gastrointestinal homophily chains were 64% less likely than the heterogeneous chains. In contrast, the gastrointestinal homophily chains were more likely than the heterogeneous chains in Hokkaido (OR=2.20, p<0.001), suggesting differences in disease manifestation between the two prefectures. For all other homophilies, the results were consistent between the prefectures.
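The percentage interpretation of the odds ratios follows from simple arithmetic. ERGM coefficients, like logistic regression coefficients, are on the log-odds scale, so an odds ratio is the exponential of a coefficient. The sketch below, with hypothetical function names, shows the conversion for the two gastrointestinal odds ratios quoted above.

```python
import math

def odds_ratio_from_coefficient(theta):
    """Exponentiate a log-odds coefficient to obtain an odds ratio."""
    return math.exp(theta)

def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percent change relative to the reference class."""
    return (odds_ratio - 1) * 100

kanagawa_gi = percent_change_in_odds(0.36)   # negative: lower odds than heterogeneous chains
hokkaido_gi = percent_change_in_odds(2.20)   # positive: higher odds than heterogeneous chains
print(round(kanagawa_gi), round(hokkaido_gi))
```

Strictly, these percentages describe changes in the odds of a chain belonging to the homophily class rather than changes in probability.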
In particular, the asymptomatic homophily and the sensory disruption homophily chains were statistically more likely than the heterogeneous chains in both prefectures. With respect to the asymptomatic homophily, the asymptomatic chains were 5.21 times and 3.67 times more likely than the heterogeneous chains in

DISCUSSION
The current study analyzed the data from 4,392 (2,020 females and 2,372 males) individuals who were PCR-positive for SARS-CoV-2. The comparison of the results from the two prefectures has shown similarities as well as differences. In both prefectures, asymptomatic cases accounted for about 20% of all cases and were more likely to be female and to belong to either the younger (<20) or the older (≥80) age group. The rate of asymptomatic infection observed in the current study is comparable to that reported in prior literature (14)(15)(16). The evidence that female patients are more likely to be asymptomatic is also relatively well established (17,18), although these studies also indicate that younger female patients are particularly likely to be asymptomatic. The observation made in the current study that older patients are more likely to be asymptomatic might be unique to Japan. Japan is known as one of the world's top countries for longevity, especially in females (19). Such prolonged life expectancy has been accompanied by concomitant improvement in overall health and physical function in the older population, reducing the mortality rate in Japanese female centenarians even further in the last decade (19,20). Moreover, studies have shown that the Japanese elderly population, as a whole, is lean, with a low body mass index (BMI), which has been shown to be associated with longevity (21,22). Additionally, the susceptibility of overweight individuals, who often suffer from diabetes and hypertension, to severe COVID-19 disease has been established in multiple studies (23). Our analysis also shows that, regardless of symptom status, males transmitted the virus at a higher rate in both prefectures. This is consistent with the results of other studies that have shown slower clearance of viral RNA in males than in females and a more efficient immune response in females (24)(25)(26).
The primary difference observed between the prefectures was the viral transmission rate among asymptomatic patients. In Hokkaido, asymptomatic patients were more likely to transmit the disease while, in Kanagawa, symptomatic patients were more likely to transmit the virus. Other studies have also reported varying results with regard to viral transmission by symptomatic and asymptomatic cases, ranging between 0% and 2.2% for asymptomatic transmission and between 0.8% and 15.4% for symptomatic transmission (27)(28)(29)(30)(31). The most recent meta-analysis reports that the relative risk of asymptomatic transmission was 42% lower than that of symptomatic transmission (16). The higher viral transmission by asymptomatic cases in Hokkaido may reflect the fact that, during the first wave, the presence of asymptomatic infections as well as the risk of subsequent transmission by asymptomatic cases were less well known in the population, and thus in-person social contact by asymptomatic cases was more widespread in Hokkaido during the first wave than in Kanagawa during the second wave.
Another explanation may be the differences in climate and temperature. Hokkaido is farther north and significantly colder than Kanagawa, especially during the winter, and experienced its first COVID-19 cases during the winter months, peaking in April (mean temperature 5 °C). Given that the seasonality of respiratory viral diseases and the impact of temperature and humidity on the body's response to these pathogens are well established (32), it stands to reason that symptomatic respiratory diseases such as COVID-19 may be more prevalent, and associated with more severe symptoms, in the colder clime of Hokkaido than in the warmer temperatures of Kanagawa. As such, Hokkaido patients would have been more easily identified and quarantined, thus resulting in a reduction of transmission from symptomatic patients relative to asymptomatic ones. In Kanagawa, on the other hand, environmental factors such as the warmer temperatures during the latter two COVID-19 peaks in July and [...] (33)(34)(35)(36), especially in the early stage of the pandemic when proper protection of health care workers was not in place. The role of superspreaders in the indoor setting has been well documented (37)(38)(39). Several explanations have been provided regarding the existence of superspreaders, including: i) high viral shedding of the seed case due to low immunocompetence, attributable to underlying medical conditions or co-infection; ii) indoor environmental factors, such as humidity, which are conducive to epithelial innate immune function, resulting in higher levels of viral replication and shedding; and iii) active social behavior of the seed case (32,40-44).
Transmission clustering has also been reported in family settings. These studies have shown that within-family transmissions are often localized and that the risk of transmission in this setting is comparatively high (6). Our study also found clustering within families, although the clusters were small.
Moreover, with the exception of the two medical facility transmission networks, our analysis revealed that the majority (64%) of the networks were comprised of 2 patients (an infector and an infectee) and more than 90% of the networks involved fewer than 5 patients. In recent months, more evidence on the makeup of SARS-CoV-2 transmission lineages has become available (45)(46)(47). These studies report that the proportion of the lineages that go beyond secondary transmission is surprisingly low, in part driven by lockdowns and the implementation of effective interventions to control the pandemic. For instance, consistent with our data, Geoghegan et al. (2020) report that less than 20% of virus introductions into New Zealand generated viral transmission of more than one additional case. Here, it is possible that a shared geographic attribute of the two countries (both being island nations) may have resulted in similar intervention effects.
To our knowledge, no prior studies have examined demographic and symptomological homophilies of the SARS-CoV-2 viral transmission networks. Homophilies, in this case, refer to the similarities between the infector and infectee. Our exponential random graph model (ERGM) analysis revealed the presence of age homophily among older (≥60) patients in both prefectures. This may be at least partially attributable to the age grouping of individuals in nursing homes and care facilities, as well as to the forms of social interaction (e.g., indoor rather than outdoor, duration, etc.) among older adults, which may allow them to transmit the disease to their confreres. In Kanagawa, additional homophilies were detected in the patients aged <30 and 30-59, likely reflecting generational differences in social behavior, especially in an urban setting such as Kanagawa.
In addition to age homophily, we also observed symptomatic and asymptomatic homophilies. Symptomatic infectors were more likely to give rise to symptomatic infectees, while patients who contracted the disease from an asymptomatic infector were likely to also be asymptomatic. Although the reason behind this homophily remains unclear, it could be the result of a lower viral load in patients with mild disease, which would result in fewer shed viral particles and a consequent lower infectious dose delivered to an infectee. Whether asymptomatic patients have a lower viral load, however, is controversial, with some studies showing lower levels and others showing no difference (48,49). Related to this point, we also observed that those patients who contracted the virus through secondary or tertiary transmission were more likely to be asymptomatic than primary cases, potentially suggesting natural viral attenuation. Here, sequencing of viral isolates from primary and higher-level cases would be informative.
Finally, homophily of sensory disruption (i.e., anosmia and ageusia) was observed in the networks of both prefectures. Moreover, we observed that homophily chains were more prevalent than heterogeneous chains in the networks. These findings suggest that genetic variations of SARS-CoV-2 may underlie the variance in symptoms, and that the transmission of virions of a particular genetic lineage from an infector to an infectee may result in a similarity of symptoms between the two. Phylogenetic analyses of SARS-CoV-2 sequences from these cases are warranted to explore this hypothesis.

CONCLUSION
We analyzed the records of 4,392 PCR-confirmed COVID-19 patients in two prefectures, Hokkaido and Kanagawa, during the first two waves of the pandemic in Japan. The network analysis of the viral transmission chains revealed that demographic and symptomological homophilies exist in both prefectures. In particular, age homophily existed in both prefectures, especially between older adults, but was more prevalent in the Tokyo area (Kanagawa). No sex homophily was observed in either prefecture. Most importantly, similar patterns of symptom homophilies were seen in both prefectures, with the most striking being the homophily between asymptomatic infectors and infectees. This result substantiates the logic behind contact tracing and testing of "close contact" cases, even in the absence of symptoms, in order to contain the spread of the virus. Furthermore, as with COVID-19, control of future pandemics will likely also benefit greatly from public education to promote testing in "close contact" cases, as well as from the establishment of an efficient testing system during the early stages of the outbreak.