TMPRSS2 transcriptional inhibition as a therapeutic strategy for COVID-19

There is an urgent need to identify effective therapies for COVID-19. The SARS-CoV-2 host factor protease TMPRSS2 is required for viral entry and thus an attractive target for therapeutic intervention. In mouse, knockout of tmprss2 led to protection against SARS-CoV-1 with no deleterious phenotypes, and in human populations genetic loss of TMPRSS2 does not appear to be selected against. Here, we mined publicly available gene expression data to identify several compounds that down-regulate TMPRSS2. Recognizing the need for immediately available treatment options, we focused on FDA-approved drugs. We found 20 independent studies that implicate estrogenic and androgenic compounds as transcriptional modulators of TMPRSS2, suggesting these classes of drugs may be promising therapeutic candidates for clinical testing and observational studies of COVID-19. We also note that expression of TMPRSS2 is highly variable and skewed in humans, with a minority of individuals having extremely high expression. Combined with literature showing that inhibition of TMPRSS2 protease activity reduces SARS-CoV-2 viral entry in human cells, our results raise the hypothesis that modulation of TMPRSS2 expression is a promising therapeutic avenue for COVID-19.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 28 April 2020 doi:10.20944/preprints202003.0360.v2 © 2020 by the author(s). Distributed under a Creative Commons CC BY license.


Introduction
The rapid international spread of the novel pathogenic severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes the disease known as COVID-19, poses a global health emergency. As of April 5, 2020, there have been over 1,133,000 confirmed cases and 62,500 deaths worldwide 1 . The clinical presentation of COVID-19 ranges from mild respiratory symptoms to severe progressive pneumonia, multiorgan failure, and death 2 . Therapeutic interventions beyond supportive care in the literature have included oseltamivir, remdesivir, ganciclovir, α-interferon, hydroxychloroquine and lopinavir [2][3][4][5][6][7] . Lopinavir, a protease inhibitor, is the only drug with a completed clinical trial but failed to shorten time to improvement or viral shedding. Any effective intervention rapidly mobilized to the frontlines could profoundly impact resource allocation 8 . Effective treatments are therefore vital to handle the surge of COVID-19 infections.
SARS-CoV-2 host factors are attractive targets for therapeutic intervention. The SARS-CoV-2 spike (S) glycoprotein binds the angiotensin-converting enzyme 2 (ACE2), allowing the viral particle to enter host cells 9 . Viral entry into host cells also requires cleavage of the viral S protein by host proteases; this cleavage results in irreversible conformational changes to the S protein that allow the virus and host cell membranes to fuse 9 . S protein cleavage, called priming, can use the host serine protease TMPRSS2 or the cysteine proteases cathepsin B or L (CatB/L) [10][11][12][13][14] .
A recent single-cell RNA-sequencing study of human and non-human primate tissues revealed three major cell types that co-express TMPRSS2 and ACE2: type II pneumocytes in the lung, absorptive enterocytes in the terminal ileum, and nasal goblet secretory cells 15 .
Computational and in vitro screens are useful to identify compounds that either act directly against viral proteins, or that disrupt protein interactions between SARS-CoV-2 and host proteins required for its viral life cycle. Here we propose and develop a complementary approach seeking to identify transcriptional regulators of the host proteins most critical to viral entry and replication within host cells. Given the aggressiveness of this pandemic and the urgency of deploying effective treatments, our first efforts focus on the repurposing of existing drugs as an attractive alternative to novel compound discovery. We note, however, that this screening approach could also be applied to the discovery of new chemical entities with more desirable properties than already available approved medicines.
Several lines of evidence point to TMPRSS2, the host protease that the virus employs to gain entry into lung epithelium, as a promising pharmacologic target. Pharmacologic inhibition of TMPRSS2 prevents SARS-CoV-2 entry into cultured human lung cells 11 , and work in mouse also supports loss of tmprss2 as being protective against SARS-CoV-1 (details in Supplementary Materials) 16 . Furthermore, loss of function of TMPRSS2 is not strongly selected against in human populations (Supplementary Fig. 1) 17 . In contrast, for different reasons neither ACE2 nor CatB/L appears to be a strong candidate for therapeutic intervention based on transcriptional inhibition (details in Supplementary Materials). For these reasons, we focus on transcriptional modulation of TMPRSS2 as the highest priority, but have generated results for other critical host proteins using the same framework, in particular the other entry proteins and a broader set of proteins that interact with viral proteins 18 . The initial identification of these compounds highlights potential therapeutic targets and pathways that could be pursued by drug repurposing for the amelioration of COVID-19 symptoms.

Literature-wide screen for transcriptional inhibitors of SARS-CoV-2 host factors reveals drug repurposing candidates
To identify compounds that transcriptionally inhibit host factors required for SARS-CoV-2 viral entry, we performed a literature-wide screen of RNA-seq datasets in the NCBI Sequence Read Archive (SRA) that incorporated keywords relating to drug treatments. Of 252,877 human RNAseq datasets in the SRA that were uniformly mapped by the Skymap project 19

Identification of TMPRSS2 transcriptional inhibitors
Next, we queried this database to identify drug treatments that led to significant differential expression of SARS-CoV-2 host factor genes. We first focused on TMPRSS2, given its promise Fig. 2, data in Supplementary Table 1). Accordingly, we also replicated the enzalutamide signal in the LAPC4 prostate cancer cell line (Fig. 2). Conversely, TMPRSS2 expression increased between 1.4-fold and 20-fold following treatment with androgens (e.g. testosterone, any duration of treatment, see
To identify more compounds that modulate TMPRSS2 expression, we considered data from the Connectivity Map 21 , which includes an unbiased screen of compounds in multiple cell types followed by expression profiling of 978 landmark genes (L1000 platform) and statistical imputation for the rest of the transcriptome. However, TMPRSS2 was not well-imputed by the Connectivity Map (self-correlation = 0.56 between RNA-seq expression and imputed expression). Consistent with a recent preprint on the lack of reproducibility for L1000-based gene imputation values 22 , we did not observe a significant difference in TMPRSS2 expression following estradiol treatment in the breast or prostate cancer cells used in the RNA-seq studies described above. Interestingly, the prostate cancer cell line used in these studies, PC3, lacks androgen receptor (AR) expression and has only minimal estrogen receptor (ER) expression, so the absence of a TMPRSS2 signal is not unexpected.
We also considered single-cell RNA-sequencing (scRNA-seq) data from a recent study that measured transcriptomic signatures following treatment with 188 compounds in three cancer cell lines (MCF7 breast cancer, K562 leukemia, A549 lung adenocarcinoma) 23 .
TMPRSS2 expression was detected only in MCF7 cells. In MCF7 cells, we identified two compounds that led to a statistically significant increase in TMPRSS2 expression: JQ1, a BET bromodomain inhibitor (q-value = 1.58 × 10⁻²⁴, normalized effect size = 0.19), and fulvestrant, an estrogen receptor antagonist (q-value = 3.96 × 10⁻¹¹, normalized effect size = 0.14). Interestingly, JQ1 is known to inhibit ER expression in MCF7, so the effects of JQ1 on TMPRSS2 expression may be mediated through the ER 24 . Together, these results identify existing drug compounds that can potentially be repurposed to transcriptionally inhibit TMPRSS2 expression, and suggest that the activation of estrogen pathways or inhibition of androgen pathways can be a promising modality for clinical intervention in SARS-CoV-2 infection.
Bars correspond to distinct biological replicates. Difference in control expression of TMPRSS2 corresponds to differences in baseline TMPRSS2 expression in prostate & breast tissue (see Supplementary Fig. 3). p-values calculated using DESeq2, comparing entire transcriptome between treatment and control. Rank represents rank of TMPRSS2 differential expression p-value compared to all other genes in transcriptome. SRP value corresponds to accession number of corresponding study in the NCBI SRA.

Relevance in lung tissue of estradiol-based transcriptional inhibition of TMPRSS2
One key limitation of the drug repurposing analysis presented above is that all experiments were performed in breast cancer or prostate cancer cell lines, leading to concerns about applicability and relevance to lung tissue. We next asked whether treatment with estradiol or androgen receptor antagonists could transcriptionally inhibit TMPRSS2 expression in human lung tissue. First, we note that androgen treatment was shown to significantly increase TMPRSS2 expression in A549 lung adenocarcinoma cells 20 , and also that the androgen receptor, which is known to regulate TMPRSS2, is expressed in human lung 26 . To test whether treatment with anti-androgenic or pro-estrogenic compounds would also down-regulate expression of TMPRSS2 in human lungs, we leveraged TMPRSS2's extremely high variability in gene expression across individuals (Supplementary Fig. 4). Specifically, we hypothesized that if TMPRSS2 expression in human lung tissue were regulated by estrogen and androgens, then its expression in lung would vary in tandem with known estrogen and androgen response genes (Fig. 3A). To test this hypothesis, we quantified the correlation in gene expression between TMPRSS2 and 12,276 other genes expressed in lung across all 427 individuals with RNA-seq data (TPM > 5, Spearman correlation, GTEx v7 data). Notably, using Gene Set Enrichment Analysis (GSEA) on the gene list ranked by correlation with TMPRSS2 expression 27 , we observed that the hallmark early and late estrogen response gene sets, as well as the hallmark androgen response gene set, are three of the top four sets most enriched in genes with high positive correlation with TMPRSS2 expression in lung (Fig. 3B,C).
Finally, we tested whether the transcriptionally repressive direction of effect for TMPRSS2 modulation by estradiol and androgen receptor antagonists might be the same in the human lung as it is in cell lines. To do so, we considered temporal gene expression data from MCF7 breast cancer cells and LNCaP prostate cancer cells in response to estradiol and dihydrotestosterone (DHT, androgen) treatment (SRA accession #: SRP070657 and SRP059762) 28 . Notably, we observed that in human post-mortem lungs, genes with expression patterns that most strongly correlate with TMPRSS2 also behave similarly to TMPRSS2 in response to estradiol and DHT treatment in MCF7 and LNCaP cells, respectively (Fig. 3D, middle panels). Conversely, genes with expression patterns most inversely correlated with TMPRSS2 in human lungs behave inversely in response to estradiol and DHT treatment (Fig. 3D, right panels). These results are consistent with a prior study that demonstrated both androgen-induced transcription of TMPRSS2, and androgen receptor binding to enhancer elements within the TMPRSS2 gene, in human lung adenocarcinoma cells 20 . Collectively, these results indicate that expression of TMPRSS2 in the human lung changes alongside known estrogen and androgen response genes, and with the same direction of effect. This suggests that the expression of TMPRSS2 in the lung can be repressed by treatment with estrogens or with androgen receptor antagonists, and that expression can be activated by androgens.
Furthermore, these data suggest that within human populations, TMPRSS2 expression in the lung is modulated by estrogens and androgens.

High variability in TMPRSS2 expression within human populations
Recent epidemiological data indicate that COVID-19 may cause severe illness in up to 30% of infected individuals, with a case-fatality rate exceeding 20% in high-risk populations 2,29 . Sex-specific infection and severity rates vary, with some data pointing to equal infection rates between men and women, and other data showing higher rates of infection and critical illness in men 2,30-32 . Given that TMPRSS2 expression is modulated by the sex hormones estradiol and androgens, and given evidence that tmprss2 loss of function in mouse could be protective against severity of SARS-CoV-1 16 , we next asked whether demographic differences in expression of TMPRSS2, as well as ACE2, could explain trends observed in COVID-19 epidemiological data.
The most striking feature of TMPRSS2 and ACE2 expression data is how variable their gene expression is amongst individuals. Using post-mortem gene expression data from human lungs (n=427, age range 20-80, GTEx Consortium v7) 26 , we noticed that both TMPRSS2 and ACE2 have highly variable expression amongst individuals (Supplementary Fig. 4). This variability is not due to differences between age groups or sex, but is present within each demographic group. As an example, we considered a set of 156 men aged 40-59. In this group,
In addition, we considered sex- and age-specific gene expression patterns for the four host factor genes. Using the GTEx dataset, we did not observe strong sex-specific differences in the expression of any of the four genes (Supplementary Figs. 4 and 5); however, a recent preprint reported that TMPRSS2 has significantly higher expression in males specifically when considering gene expression data in bronchial epithelial cells from 170 individuals 33 . We also noticed that the skewness of ACE2 expression within demographic groups increased significantly with age (p=0.048 in females, p=0.049 in males, Fig. 4D). A more modest, not statistically significant, increase in skewness was observed for TMPRSS2 expression across age (Fig. 4C).
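To make the notion of skewed expression concrete, the following is a minimal Python sketch of a sample skewness calculation. The analysis described in this work was performed in R with the e1071 package; the estimator below (third central moment divided by the cubed sample standard deviation) is assumed to correspond to that package's default setting, and the function name and expression values are hypothetical.

```python
def skewness(values):
    """Sample skewness b1 = m3 / s^3, where s uses the (n - 1) denominator.

    Assumption: this mirrors the default estimator of e1071::skewness in R.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    # Rescale from the population-variance form to the sample-variance form.
    return g1 * ((n - 1) / n) ** 1.5


# Hypothetical TPM values for one demographic group: most individuals are
# low, one individual is very high, which gives a positive skewness.
group_tpm = [1, 1, 2, 2, 3, 40]
print(round(skewness(group_tpm), 3))
```

A symmetric set of values gives a skewness of zero, while a small number of very high expressers pulls the statistic upward.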

Discussion
Here, we have used a computational approach that leverages publicly available
While it is already known that sex hormones can regulate TMPRSS2 expression 20 , this work shows that estrogen-related compounds and androgen receptor antagonists appear to be the most securely identified down-regulators of TMPRSS2 expression amongst FDA-approved drugs and other widely tested compounds. It is therefore a high priority to evaluate how estrogen-related compounds perform both in in vitro viral entry assays and in symptom amelioration in patients. While lower priority, the regulators of ACE2 and CatB/L should also be evaluated in in vitro viral entry assays. ACE2 down-regulation could in principle play a role in prophylactic prevention of infection, but not in reduction of symptom severity, if such transcriptional inhibition were shown to be safe.

Expression variability of ACE2 and TMPRSS2
Of particular interest is the wide variability of TMPRSS2 expression levels in the human population, suggesting a possible explanation for much of the variability amongst people in the severity of disease. While we do not observe a strong correlation between increased TMPRSS2 or ACE2 expression and age in adults, we do observe a significant increase in skewness of ACE2 expression with age and a non-significant increase in skewness of TMPRSS2, which raises a possible connection between higher expression of these host factors and the marked increase in mortality rate in the older population. Further, the dramatic variation in expression of both TMPRSS2 and ACE2 also suggests a credible hypothesis for variation of vulnerability within age groups. We note that this hypothesis can be readily tested by evaluating TMPRSS2 expression levels in patients with a range of severities, including comparisons of all symptomatic patients with the general population. We consider these assessments a high priority, and suggest consideration of whether these evaluations can be performed in the same RNA samples used for SARS-CoV-2 diagnostics, given that scRNA-seq studies have shown TMPRSS2 and ACE2 expression in nasal goblet cells.

Considerations for hormone-based trials in COVID-19
The computational review of hundreds of FDA-approved compounds for transcriptional inhibitors of TMPRSS2 allows us to select the best starting points focused on maximizing transcriptional inhibition. Our computational analysis led to our hypothesis that robustly reducing AR transcriptional output will suppress expression of TMPRSS2 in target cells, interfere with
The next step towards clinical translation is to select the optimal patient population.
Although data continue to evolve, many patients with COVID-19 exhibit a mild disease course, for which therapeutic intervention may be riskier than supportive measures; a mild form of hormonal modulation could nonetheless be considered. The end stage of COVID-19, characterized by respiratory and multi-organ failure, may be driven by a hyperinflammatory response rather than active viral replication. Thus, we contend that the optimal patient population would be those who manifest symptomatic disease that warrants hospitalization for supportive care, but are not yet exhibiting signs characteristic of acute respiratory distress syndrome (ARDS). Finally, it may be prudent to evaluate androgen suppression first in male COVID-19 patients, as suppression of androgen in females reduces endogenous estrogenic activity and thereby could increase expression of TMPRSS2 and exacerbate COVID-19.
Given that transcriptional suppression of TMPRSS2 operates independently of all other COVID-19 treatments currently being tested or considered, a combination therapy approach using anti-androgens in tandem with antivirals such as remdesivir or competitive inhibitors of viral entry proteins would also be an obvious priority for evaluation. It is worth noting that a small molecule inhibitor of TMPRSS2, camostat mesilate, has recently entered trials for COVID-19 (NCT04321096). In addition, testing combination therapies and comparing different hormone-based treatments in each sex are also high priorities for pre-clinical models.
In summary, we have developed a therapeutic strategy designed to complement existing antiviral strategies focused on inhibition of viral proteins and strategies focused on small molecule disruption of host protein interactions with the virus. Our strategy seeks to transcriptionally inhibit the key proteins that SARS-CoV-2 relies upon, and has identified immediate opportunities for therapeutic intervention using approved estrogen-related compounds or anti-androgens that modulate expression of TMPRSS2, critical to viral entry.
Depending on the degree of TMPRSS2 transcriptional inhibition needed for a protective effect, we hypothesize these transcriptional inhibitors can be used either alone or in combination with other direct inhibitors. Further, this framework can be expanded to identify the most effective down regulators of both viral entry proteins and proteins critical to the life cycle of the virus within cells.
count data, and for the current analysis, we focused specifically on a subset of comparisons with at least one biological replicate. scRNA-seq drug treatment data was downloaded from Srivatsan et al. 23 .

TMPRSS2 gene expression analyses
TMPRSS2 gene expression data across human individuals were downloaded from the GTEx project (version 7). Correlation between TMPRSS2 and other genes expressed in lung (n=427) was computed in R by calculating the Spearman correlation between TMPRSS2 expression and that of each other gene, restricting to the 12,276 genes with median expression greater than 5 transcripts per million. Skewness of all expression patterns was calculated using the "skewness" function in the e1071 package in R.
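The correlation step described above can be illustrated with a short Python sketch. It is not the R code used for the analysis: the function names, the dictionary layout, and the toy expression values are hypothetical, and only the median filter (> 5 TPM) and the use of Spearman correlation are taken from the text.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate_with_target(expr, target="TMPRSS2", min_median_tpm=5.0):
    """Correlate every sufficiently expressed gene with the target gene.

    expr maps gene name -> list of TPM values, one per individual.
    """
    target_values = expr[target]
    return {
        gene: spearman(target_values, values)
        for gene, values in expr.items()
        if gene != target and median(values) > min_median_tpm
    }


# Toy data for five hypothetical individuals.
expr = {
    "TMPRSS2": [10, 20, 30, 40, 50],
    "GENE_A": [11, 19, 35, 41, 80],   # rises with TMPRSS2
    "GENE_B": [90, 70, 50, 30, 10],   # falls as TMPRSS2 rises
    "GENE_LOW": [1, 2, 3, 4, 5],      # removed by the median filter
}
print(correlate_with_target(expr))
```

The resulting gene-to-correlation mapping is the kind of ranked input that a pre-ranked enrichment analysis consumes.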
Gene Set Enrichment Analysis was run using GSEA (v4.0.3) using the pre-ranked function, with values for each gene from -1 to 1 taken from their Spearman correlation with TMPRSS2 expression in human lung samples.
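For readers unfamiliar with the pre-ranked procedure, the sketch below shows the weighted running-sum enrichment score on which GSEA is based. It is an illustration only: the analysis used the GSEA software itself, which additionally derives normalized scores and significance estimates by permutation (omitted here). The weighting exponent of 1 is an assumption, and the gene names are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum enrichment score.

    ranked maps gene -> ranking metric (here, correlation in [-1, 1]).
    gene_set is the set of genes whose enrichment is being tested.
    """
    ordered = sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ordered]
    n_hits = sum(hits)
    n_miss = len(ordered) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, genes")
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ordered, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")

    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ordered, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm  # step up at a hit
        else:
            running -= 1.0 / n_miss                     # step down at a miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Hypothetical correlations with TMPRSS2 and a hypothetical response gene set.
ranked = {"A": 0.9, "B": 0.8, "C": 0.1, "D": -0.2, "E": -0.7, "F": -0.9}
print(enrichment_score(ranked, {"A", "B"}))  # set concentrated at the top
print(enrichment_score(ranked, {"E", "F"}))  # set concentrated at the bottom
```

A gene set concentrated among the most positively correlated genes yields a score near +1, and one concentrated among the most negatively correlated genes yields a score near -1.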
Comparisons of temporal gene expression for genes with high and low correlation were performed by taking the top and bottom 100 genes by Spearman's correlation (corresponding to correlation > 0.627 and < -0.5032). Estradiol data taken from SRP070657, and DHT data taken from SRP059762. As we consider the expression patterns of many genes, all with different baseline expression, we scaled the expression of every gene by dividing the expression at each time point by the expression at t=0. Plots were generated in ggplot2 using geom_smooth (loess).
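The gene selection and baseline scaling described above can be sketched in Python as follows. The function names and toy values are hypothetical; the original plots were produced in R with ggplot2, and the loess smoothing step is not reproduced here.

```python
def top_and_bottom(correlations, n=100):
    """Return the n most positively and n most negatively correlated genes."""
    if n < 1 or len(correlations) < 2 * n:
        raise ValueError("need n >= 1 and at least 2 * n genes")
    ordered = sorted(correlations, key=correlations.get, reverse=True)
    return ordered[:n], ordered[-n:]


def scale_to_baseline(series):
    """Divide each time point by the expression at t=0 (the first element)."""
    baseline = series[0]
    if baseline == 0:
        raise ValueError("cannot scale a gene with zero expression at t=0")
    return [value / baseline for value in series]


# Hypothetical correlations and one hypothetical time course.
correlations = {"G1": 0.9, "G2": 0.1, "G3": -0.8}
top, bottom = top_and_bottom(correlations, n=1)
print(top, bottom)
print(scale_to_baseline([4.0, 2.0, 1.0]))
```

Scaling to t=0 puts genes with very different baseline expression on a common relative scale, so that every trajectory starts at 1.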

Connectivity Map analyses
Connectivity Map L1000 data were downloaded from the NCBI GEO database (GSE70138 and GSE92742). We extracted imputed expression data for TMPRSS2 and ACE2, and for each treatment compound, we performed a non-parametric Mann-Whitney U test between control DMSO-treated samples and drug-treated samples. As the Connectivity Map tests drug treatments across a wide range of concentrations, we performed comparisons using the following dosage groups: (i) any dosage, (ii) below 0.5 µM, (iii) 0.2 µM to 1 µM, (iv) 0.5 µM to 2 µM, (v) below 1 µM. Multiple testing correction was performed using the Bonferroni-Hochberg correction.
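As an illustration of the per-compound comparison, the sketch below groups treated samples by dosage and applies a two-sided Mann-Whitney U test using the normal approximation with a tie correction. It is a simplified stand-in for the actual analysis: whether the range boundaries are inclusive is an assumption, the function and group names are hypothetical, and the multiple-testing correction step is not shown.

```python
import math
from collections import Counter

# Dosage groups in micromolar; boundary inclusiveness is an assumption.
DOSE_GROUPS = {
    "any": lambda d: True,
    "below_0.5uM": lambda d: d < 0.5,
    "0.2_to_1uM": lambda d: 0.2 <= d <= 1.0,
    "0.5_to_2uM": lambda d: 0.5 <= d <= 2.0,
    "below_1uM": lambda d: d < 1.0,
}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def compare_compound(control, treated, group):
    """treated is a list of (dose_uM, expression) pairs for one compound."""
    selected = [expr for dose, expr in treated if DOSE_GROUPS[group](dose)]
    if not selected:
        return None  # no samples fall in this dosage group
    return mann_whitney_u(control, selected)


# Hypothetical imputed expression values.
control = [1, 2, 3]
treated = [(0.1, 4), (0.3, 5), (0.4, 6), (5.0, 0)]
print(compare_compound(control, treated, "below_0.5uM"))
```

With samples this small the normal approximation is rough; an exact test would be preferable in practice.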
To identify broad transcriptional inhibitors of SARS-CoV-2 viral protein interacting partners, we queried the Connectivity Map (clue.io) to identify compounds most likely to down-regulate host proteins required for SARS-CoV-2 pathogenesis as determined via a protein-protein interaction analysis 18 . The Connectivity Map considers up to 150 genes per query. We used three different signatures as input. In the first, we included the 33 out of the 332 host proteins that are directly quantified by the L1000 platform (i.e. "landmark genes"). The second signature included these landmark genes and 33 other well-imputed (self-correlation >0.8) genes. In the third signature, we considered the top 150 genes from the protein-protein interaction map, ranked by fold change. We considered the "summary" Connectivity Score for each compound, which represents the average Connectivity Score achieved in each of the nine core CMap cell lines.
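The construction of the three query signatures and the summary score can be sketched as below. This is a schematic under stated assumptions: the function names and inputs are hypothetical, fold change is assumed to be ranked in descending order, and the summary score is computed as the plain average described in the text rather than through any clue.io interface.

```python
def build_signatures(host_genes, landmark, self_corr, fold_change,
                     corr_cutoff=0.8, max_genes=150):
    """Build the three gene signatures used as Connectivity Map queries.

    host_genes:  host genes from the protein-protein interaction analysis
    landmark:    set of genes measured directly on the L1000 platform
    self_corr:   gene -> self-correlation of imputed vs. measured expression
    fold_change: gene -> fold change in the interaction map
    """
    # Signature 1: host genes that are landmark genes.
    sig1 = [g for g in host_genes if g in landmark]
    # Signature 2: landmark genes plus well-imputed non-landmark genes.
    well_imputed = [
        g for g in host_genes
        if g not in landmark and self_corr.get(g, 0.0) > corr_cutoff
    ]
    sig2 = sig1 + well_imputed
    # Signature 3: top genes by fold change (descending order assumed).
    sig3 = sorted(host_genes, key=lambda g: fold_change[g],
                  reverse=True)[:max_genes]
    return sig1, sig2, sig3


def summary_score(scores_by_cell_line):
    """Average a compound's connectivity scores across cell lines."""
    if not scores_by_cell_line:
        raise ValueError("no cell-line scores supplied")
    return sum(scores_by_cell_line.values()) / len(scores_by_cell_line)


# Hypothetical inputs with four genes and two cell lines.
host = ["G1", "G2", "G3", "G4"]
sigs = build_signatures(
    host,
    landmark={"G1"},
    self_corr={"G2": 0.9, "G3": 0.5, "G4": 0.85},
    fold_change={"G1": 1.0, "G2": 3.0, "G3": 2.0, "G4": 0.5},
    max_genes=2,
)
print(sigs)
print(summary_score({"MCF7": 1.0, "A549": 3.0}))
```

The 150-gene cap reflects the query limit stated above; the toy call lowers it to 2 only to make the truncation visible.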