Novel Method for Detection of Genes With Altered Expression Caused by Coronavirus Infection and Screening of Candidate Drugs for SARS-CoV-2

To better understand the genes with altered expression caused by infection with the novel coronavirus strain SARS-CoV-2 causing COVID-19 infectious disease, a tensor decomposition (TD)-based unsupervised feature extraction (FE) approach was applied to a gene expression profile dataset of the mouse liver and spleen with experimental infection of mouse hepatitis virus, which is regarded as a suitable model of human coronavirus infection. TD-based unsupervised FE selected 134 altered genes, which were enriched in protein-protein interactions with orf1ab, polyprotein, and 3C-like protease that are well known to play critical roles in coronavirus infection, suggesting that these 134 genes can represent the coronavirus infectious process. We then selected compounds targeting the expression of the 134 selected genes based on a public domain database. The identified drug compounds were mainly related to known antiviral drugs, several of which were also included in those previously screened with an in silico method to identify candidate drugs for treating COVID-19.


I. INTRODUCTION
T HE current pandemic of COVID-19 caused by infection of the new coronavirus strain SARS-CoV-2 is a severe public health problem that must be resolved as soon as possible. To achieve this goal, it is essential to understand the mechanism by which SARS-CoV-2 successfully invades human cells. Recently, Pfaender et al. [1] demonstrated that host lymphocyte antigen 6 (LY6E) complex impairs coronavirus fusion and confers immune control of viral disease. The authors also used a transcriptome approach to evaluate the effect of infection of mouse hepatitis virus (MHV), a natural mouse pathogen that causes hepatitis and encephalomyelitis, which is a well-studied model of coronavirus infection. Although they found many pathways that were disturbed after MHV infection, they did not perform a detailed analysis of the genes with altered expression in response to MHV infection.
In this study, we applied tensor decomposition (TD)-based unsupervised feature extraction (FE) [2] to identify genes with altered expression by MHV infection as a model of coronavirus. We further performed functional enrichment on the selected genes to determine their potential associations with coronavirus infection processes, and screened candidate drug compounds targeting these genes. Overall, this work expands TD formalism by exploring the interpretation of sixdimensional tensors in an infectious disease context. Moreover, we demonstrate a novel application of TD to facilitate the drug discovery process, which can offer a valuable resource for researchers to obtain mechanistic insight for identifying effective drugs for infectious diseases such as COVID-19.

A. Gene expression profile dataset
The gene expression profile was downloaded from the Gene Expression Omnibus (GEO) dataset GSE146074. This dataset comprises the gene expression profiles of the liver and spleen from female mice experimentally infected with MHV or injected with phosphate-buffered saline (PBS) as a control group for comparison. This experiment was performed with mice of two genetic backgrounds, including wild-type (WT) mice and an textit Ly6e-knockout (KO) mutant strain. The number of replicates for each group are listed in Table I. Seventy two files whose names start by "GSM" (processed file) were used for analyses.  WT  3  3  3  3  3  3  KO  3  3  3  3 3 3

B. Additional gene expression profile data set
In order to validate the suitability of MHV as model SARS-CoV-2 infectious process, two additional gene expression profiles of mice lung SARS-CoV infectious processes (Table II. For GSE33266 and GSE50000, we download two files GSE33266 series matrix.txt.gz and GSE50000 series matrix.txt.gz, respectively. Although GSE50000 also includes files for SARS-CoV-MA15, they were omitted (the reason why we did not use files for MA15 can be available at Discussion section). For the cases that have less than five biological replicates, we used more than once some of replicates in order to have five biological replicates for invidual cases. C. TD-based unsupervised FE The gene expression profile dataset was formatted as a tensor, x ijkmnp ∈ R N ×2×2×3×3×2 , which represents the expression level of the ith gene of the jth genotype (j = 1:KO, j = 2:WT) of the kth tissue (k = 1:liver, k = 2:spleen) of the mth treatment group (m = 1:PBS day 5, m = 2:MHV day 3, m = 3:MHV day 5) for the nth biological replicate (1 ≤ n ≤ 3) and pth technical replicate (1 ≤ p ≤ 2).
The TD is therefore expressed as where G( 1 2 3 4 5 6 ) ∈ R N ×2×2×3×3×2 is a core tensor, and u 1j ∈ R 2×2 , u 2 k ∈ R 2×2 , u 3m ∈ R 3×3 , u 4n ∈ R 3×3 , u 5 p ∈ R 2×2 , and u 6 i ∈ R N ×N are singular values vectors, which can be obtained via the higher-order singular value decomposition (HOSVD) algorithm [2]. To select u 6 i , attributed to selected genes, we need to select u 1j attributed to the genotype, u 2 k attributed to the tissue, u 3m attributed to the treatment, u 4n attributed to the biological replicate, and u 5p attributed to the technical replicate, associated with desired properties.
For this study, we sought to identify genes whose expression is independent of the mouse genotype, tissue type, and replicate. Thus, u 1j , u 2 k , u 4n , and u 5p should be independent of j, k, n, and p.
By contrast, we require u 3 m to be dependent on u 31 < u 32 < u 3 3 or vice versa. This is because m = 2 (3 days after MHV infection) must be between m = 1 (5 days after PBS injection as the control) and m = 3 (5 days after MHV infection).
After selecting 1 , 2 , 3 , 4 , and 5 based on the above considerations, we selected 6 associated with G( 1 2 3 4 5 6 ) as the largest absolute value, with fixed 1 , 2 , 3 , 4 , and 5 values. Using the selected u 6 i , The P -values, P i s, were attributed to gene expression levels as where P χ 2 [> x] is the cumulative probability distribution of the χ 2 distribution when the argument is larger than x and σ 6 is the standard deviation of u 6i . P i s were adjusted by the Benjamini and Hochberg (BH) criterion [2], and only genes associated with an adjusted P i less than 0.01 were selected for further analysis.
Downstream procedures for gene selection after identifying singular value vectors are the same as those for MHV. Core tensors, G( 1 2 3 4 )s, are investigated in order to see which 4 associated with G( 1 2 3 4 ) having larger absolute values given 1 , 2 , 3 . u 4i s selected are used for attributing Pvalues, P i , to genes and genes associated with adjusted Pvalues less than 0.01 were selected.

D. Enrichment analysis
Gene symbols of genes selected by TD-based unsupervised FE with significantly altered expression due to MHV infection were uploaded to Enricher [3] and Metascape [4], which are popular enrichment analysis servers that evaluate the biological properties of genes based on enrichment analysis.
III. RESULTS Figure 2 shows the overview of analyses performed in this study.

A. Synthetic data sets
In order to demonstrate how effective tensor is, we employ synthetic data sets composed of multiway and multiclass labels, x ijks ∈ R N ×M ×K×S . Here, i represents N variables, among which partial collections having distinct values between js as well as ks must be selected where S replicates are available for individual combinations of j and k. A typical example is that x ijks represents expression of ith genes in kth tissue of patients who belong to jth group that consists of S patients. Then the purpose is to identify which genes are expressed distinctly in the tissue (k) -specific as well as patients group (j) specific ways.
The most popular approach to address this problem is linear regression, where a j and b k are the pre-defined variables that represent some properties of jth patients group and kth tissue, respectively. α i , β i , γ i are regression coefficients that are supposed to be selected such that the discrepancy between both sides of equation is minimized. Then is associated with significant P -values are selected as those whose expression are distinct between distinct js and ks, respectively. Of course, it is not guaranteed that liner regression correctly figure out the dependency of x ijks upon j and k. Here we generate synthetic data that follows where ξ ijk are drawn from N (1, 0) for every combination of i, j, k (this means that x ijks s associated with distinct ss share the same ξ ijk ) and ε ijks are drawn from N (1, 0) for every combination of i, j, k, s. N (µ, σ) represents the normal distribution that has mean of µ and standard deviation of σ.
For i ≥ N 0 , α ijk is taken to be zero; this means that x ijks for i ≥ N 0 is simply random variables. The task is to correctly select N 0 variables associated with the dependence upon j and k.
As an alternative regression analysis of liner regression, we employ which reflects the multiplicative nature when x ijk is generated with eq. (8) although it cannot represent x ijk completely since α ijk is replaced with α i (i.e., dependence upon j or k is not assumed). In addition to the above two regression analysis, we also applied categorical regression that is known to be equivalent to analysis of variance (ANOVA) where δ jj and δ kk are Kronecker's delta and α ijk and γ i are regression coefficients, respectively. One should notice that eq. (10) can fully reproduce x ijk generated by eq. (8), when α ij k is taken to be ξ ij k a j b k excluding randomness introduced by ε ijks , which can be regarded as residuals.  Finally, we also applied TD based unsupervised FE to x ijks . TD is computed as (11) Then 1 and 2 that have largest absolute values of correlation coefficients between u 1j and a j or u 2k and b k are selected. Then top two 1 that have larger absolute G( 1 2 1 4 )s are selected. The reason why 3 is fixed to be 1 is because u 1s always represent u 3s that lacks s dependency, i.e., constant independent of s (Since x ijks should take same values between replicates, u 3 s that do not have any s dependence is selected). Then P -values are attributed to i as where summation is taken over two selected 4 only.
No matter which one of liner regression (eq.(7)), eq. (9), categorical regression (eq.(10)) or TD (eq.(11)) is used to compute P i , P i is corrected by BH criterion [2] and is having corrected P i less than threshold P -values are selected as those associated with dependency upon j and k, respectively.
For simplicity, we employ a j = j and b k = k when generating x ijks using eq. (8) as well as eq. (9) is used for regression analysis. Specifically, we chose N = 1000, N 0 = 100, M = K = 3, S = 5. Generation of x ijks and selection of is are repeated by 100 times with two distinct threshold Pvalues 0.01 or 0.1 for each trial. Table III shows the averaged performance over 100 trials. Although categorical regression, eq. (10), can outperform other three since it is expected to completely reproduce eq. (8) as denoted in the above, it has one week point; it cannot explicitly consider the dependence of a j and b k upon j and k, since ξ ijk a j b k s are estimated as one parameter α ijk . This might prevent us from selecting is that are specifically associated with the dependence upon j and k that a j and b k represent. Even if some is are selected, it might because of dependence upon j and k that a j and b k do not represent. There are no ways for us to check this point. In this sense, TD based unsupervised FE, which is the second best, is better than categorical regression since TD based unsupervised FE can consider a j and b k when selecting u 1j and u 2k correlated with a j and b k , respectively. Another advantages of TD based unsupervised FE is short cpu time required (Table III). Roughly speaking, cpu time required for TD based unsupervised FE is one tench of other three methods that must repeat regression analysis by N times. This difference might matter when we are forced to deal with massive data set.
The reason why TD based unsupervised FE can consider dependence upon j and k that a j and b k have although other methods cannot is as follows; because of random nature of ξ ijk in eq. (8), when individual i is considered, there are no ways to consider dependence upon j and k that a j and b k have. Nevertheless, when x ijks s are averaged over multiple is, there are some possibilities that dependence upon j and k that a j and b k have can appear, since randomness of ξ ijk can be smeared out because of averaging. In actual, since u 1j as well as u 2k can be such variables that can appear only after averaging and can represent dependence upon j and k that a j and b k have. This is the reason why TD based unsupervised FE can outperform eq. (9) that explicitly consider multiplicative nature when x ijks s are generated and can consider dependence upon j and k that a j and b k have, which categorical regression, eq.(10), cannot.

B. Selection of genes
At first, we apply TD based unsupervised FE to gene expression profiles introduced in Materials and Methods. Then other methods applied to synthetic data set than TD based unsupervised FE are considered for the comparisons in the next section, discussions and conclusions.

C. Protein-protein interaction with coronavirus infection
We first evaluated whether the 134 selected genes could reflect the process of coronavirus infection using the Enrichr server for functional enrichment analysis. Several of the genes were enriched in the category "Virus-Host PPI P-HIPSTer 2020", which is related to SARS-CoV (Table VI, see the  supplementary materials for the full list).
These genes were also related to ORF1ab, polyprotein, and 3C-like protease. Interestingly, Woo et al. [5] suggested that ORF1ab, which encodes a replicase polyprotein of CoV-HKU1, is cleaved by papain-like proteases and 3C-like proteinase. Thus, it is reasonable that ORF1ab, polyprotein, and 3C-like protease would be affected during MHV infection VI. Other PPIs detected that are not listed in Table VI (see  supplementary materials) were also mainly associated with ORF1ab and polyproteins, suggesting that our strategy has clear capability to elucidate the basic infectious process at the molecular level that is common among various coronaviruses.

D. Virus perturbation
We next evaluated whether genes with known altered expression by virus perturbation overlapped with the 134 genes selected by our TD-based unsupervised FE approach (Table  VII; see the supplementary material for the full list).
Among these, we detected the overlap of many genes that are pertubated in response to either SARS-CoV or SARS-like bat CoV, which are the genetically closest coronaviruses to the new SARS-CoV-2 strain. This further suggests that our results could have high similarity to the genes perturbated in SARS-CoV-2 infection.

E. TMPRSS2 as a scavenger receptor
For further functional enrichment analysis, we uploaded the 134 selected genes to Metascape to identify non-redundant biological terms (Fig. 4). Among the terms identified, "R-HSA-2173782: Binding and Uptake of Ligands by Scavenger Receptors" was the third most significantly enriched term. Although it was initially surprising that a scavenger receptor might be related to the response to coronavirus infection, a search of the related literature revealed that the scavenger receptor TMPRSS2 plays a critical role in SARS-CoV-2 infection as well as SARS-CoV infection [6]. Isolation of SARS-CoV-2 was also reported to be enhanced by TMPRSS2-expressing cells [7]. Moreover, TMPRSS2 contains a scavenger receptor domain [8], suggesting that TMPRSS2 activity would be related to detection of scavenger receptor activity. This finding further demonstrates the outstanding capability of our strategy to detect factors related to the SARS-CoV-2 infectious process. Moreover, this analysis suggests that research on the SARS-CoV infection process could be informative for understanding the SARS-CoV-2 infection process when it is not possible to directly investigate SARS-CoV-2 infection.

F. Drug discovery
We previously demonstrated that genes selected by TDbased unsupervised FE are useful to screen for drugs that are effective in treating disease or those that may cause adverse effects [9]. Therefore, we used this approach to screen for candidate drugs to treat coronavirus infections based on the individual terms that emerged from the Enrichr analysis.
1) Drug Matrix: In the Enrichr category "DrugMatrix", the top-ranked drug was related to virus infection (Table  VIII; see the supplementary materials for the full list). Most of these viruses are enveloped, single-stranded RNA viruses. Coronaviruses, including SARS-CoV-2, are positive-sense, enveloped, single-stranded RNA viruses, whereas influenza virus is a negative-sense, enveloped, single-stranded RNA virus.
Primaquine is known to inhibit the replication of Newcastle disease virus [10], which is in the family of paramyxoviruses that are enveloped, non-segmented, negative-sense single-stranded RNA viruses. Meloxicam is known to have cytotoxic and antiproliferative activity on virus-transformed tumor cells [11], including myelocytomatosis virus and Rous    [14]. To our knowledge, there are no reports that neomycin is effective against RNA viruses; however, one study showed that it could inhibit infection of fibroblasts with human cytomegalovirus [15], which is a DNA virus.
Although not all viruses identified to be related to the 134 genes selected by TD-based unsupervised FE are enveloped, positive-sense, single-stranded RNA viruses similar to SARS-CoV-2, since drugs shown to be effective against other viruses (e.g., DNA viruses) are also often effective against RNA viruses (including pyrogallol that was screened by our strategy), drugs in Table VIII warrant being tested as potential treatments for SARS-CoV-2 infection.
2) Drug Perturbations from GEO: Several promising drug compound candidates were also screened from the GEO "Drug Perturbations from GEO up" and "Drug Perturbations from GEO down" categories, along with available evidence for possible adverse effects (Table IX; see the supplementary material for the full list).
Drugs associated with upregulated genes that overlapped with the 134 genes selected by TD-based unsupervised FE are considered to be more likely to cause adverse effects, since they will enhance the expression of genes altered by SARS-CoV infection. Captoprilis is an angiotensin-converting enzyme (ACE) inhibitor, which is known to activate ACE2 that is the receptor that SARS-CoV-2 uses to infect human cells [16], suggesting that this drug might have negative effects for COVID-19 therapy. Coenzyme Q10, which frequently emerged in Table IX, has been reported to accelerate virus infection [17], which could therefore also have negative effects for COVID-19 therapy. Fenretinide is known to effectively inhibit HIV infection [18], and therefore might be a promising drug candidate for SARS-CoV-2 even though it was listed in the "Drug Perturbations from GEO up" category.
In contrast to the drugs in the above list, those associated with downregulated genes that overlapped with the 134 genes selected by TD-based unsupervised FE are considered to be able to effectively suppress SARS-CoV-2 infection, since they will inhibit the expression of genes altered by SARS-CoV infection. Pioglitazone was also included in the list of candidate compounds for SARS-CoV-2 screened by an in silico method [19]. Quercetin was reported to inhibit the cell entry of SARS-CoV-2 [20], and was also included in the list of candidate compounds for SARS-CoV-2 screened by an in silico method [21]. Fenretinide was also included in the drugs identified as effective compounds in the "Drug perturbations from GEO up" category as described above. Decitabine is one of the drugs used in HIV combination therapy [22]. Troglitazone impedes the oligomerization of sodium taurocholate co-transporting polypeptide and entry of hepatitis B virus into hepatocytes [23], which is a partially double-stranded DNA virus. Finally, motexafin gadolinium was reported to selectively induce apoptosis in HIV-1-infected CD4+ T helper cells [24].
Based on these observations, our strategy appears to be useful to identify potential drug compounds for SARS-CoV-2.

G. Comparison with in silico drug discovery
Finally, we compared the drugs screened out using our approach from the "Drug perturbations from GEO up/down" lists with those screened from two in silico drug discovery studies [19], [21] 1) Comparison with Wu et al. [19]: We found multiple hits, which are summarized in Table X. The main drugs identified included doxycycline, ascorbic acid, isotretinoin, pioglitazone, cortisone, and tibolone.
Wu et al. [19] identified 29 potential PLpro inhibitors, 27 potential 3CLpro inhibitors, and 20 potential RdRp inhibitors from the ZINC drug database, and identified 13 potential PLpro inhibitors, 26 potential 3Clpro inhibitors, and 20 Potential RdRp inhibitors from their in-house natural product database. Doxycycline was among both the potential PLpro and 3CLpro inhibitors; ascorbic acid and isotretinoin were among the potential PLpro inhibitors; pioglitazone was among the potential 3CLpro inhibitors; and cortisone and tibolone were included in the potential RdRp inhibitors from the ZINC drug database. These multiple hits also further support the suitability of our strategy.
2) Comparison with Ubani et al. [21]: Ubani et al. [21] screened a library of 22 phytochemicals with antiviral activity obtained from the PubChem database for activity against the spike envelope glycoprotein and main protease of SARS-CoV-2. Among these, we found only one hit that overlapped with our screened out drugs, which was quercetin (Table XI).

IV. DISCUSSION AND CONCLUSION
In this paper, we present a novel evaluation method to identify drugs that could be used to effectively treat COVID-19. We applied a TD-based unsupervised FE method to select genes with altered expression caused by MHV infection in mice. Although the dataset analyzed for this study was not based on SARS-CoV-2 infection, the 134 genes selected by TD-based unsupervised FE can still be considered useful for gaining a better understanding of the infectious mechanism of SARS-CoV-2 for several reasons. First, the 134 genes selected were enriched in general RNA virus proteins that play important roles during infectious processes. This suggests that the infectious mechanism represented by the 134 genes in the mouse model is also applicable to SARS-CoV-2 infection. In fact, these genes were also enriched in processes related to scavenger receptor activity, which might reflect the critical role of TMPRSS2 activity in SARS-CoV-2 replication, suggesting a potential therapeutic target.
Following these achievements, we tried to identify potential drug candidate compounds that could influence the 134 selected genes. Among these, we screened out several candidate compounds that are known antiviral drugs, including those that were screened out as drug candidate compounds for SARS-CoV-2 using in silico methods.  One might wonder if MHV is a suitable model to be consider SARS-CoV-2 infection, since there are more data set available for SARS-CoV infection toward mouse lung. In order to evaluate the suitability of MHV as model of SARS-CoV- 2 infectious process, we also performed additional analyses using two SARS-CoV infectious processes toward mouse lung (see Materials and Methods). As can be seen in Figs. 5 and 6, 1 = 2 is selected as singular value vector u 1j that has monotonic dependence upon j, 2 = 2 is selected as singular value vector u 2k that has monotonic dependence upon k, and 3 = 1 is selected as singular value vector u 3n that has constant values regardless n for GSE33266 while 1 = 2 is selected as singular value vector u 1j that has distinct values between Mock (j = 3) and infectious samples (j = 1, 2), 2 = 3 is selected as singular value vector u 2k that has monotonic dependence upon k, and 3 = 1 is selected as singular value vector u 3n that has constant values regardless n for GSE50000. Then 4 = 2, 3 and 4 = 1 are selected as those associated with the larger absolute values of G( 1 2 3 4 ) given 1 , 2 , 3 . P -values, P i , are attributed to gene i using cumulative χ 2 distribution, P χ 2 [> x], as described in Materials and Methods, 569 and 475 gene symbols associated with selected genes are selected for GSE33266 and GSE50000, respectively (see supplementary materials). Figure 7 shows Venn diagram of these two sets of genes and 134 genes selected by TD based unsupervised FE using GSE146074. Although there are some overlaps, majority of genes are not shared among these three gene sets.
These two sets of genes are uploaded to Enrichr and it is checked if they are significantly overlapped with corona virus PPI genes (Table XII). It is obvious that two sets of selected genes are significantly overlapping with genes that are  (Table II). U1:u 2j , U2:u 2k , and U3:u 1n . See Materials and Methods for the meanings of j, k and n.
supposed to interact with SARS-CoV proteins in spite of that these two gene sets are not highly coincident with 134 genes selected by TD based unsupervised FE using GSE146074 (Fig. 7). This means that TD based unsupervised FE generally has ability to predict PPI using gene expression profiles no matter which data sets are used. It also suggests that the achievement in Table VI is not unlikely accidental but really an evidence that genes selected by TD based unsupervised FE are those interacting with SARS-CoV proteins during infectious processes.
A possible another concern is that the studies are not based upon direct investigation of SARS-CoV-2 but based upon on that of closely related virus. This suggests that everything   (Table II). U1:u 2j , U2:u 3k , and U3:u 1n . See Materials and Methods for the meanings of j, k and n.
identified here might not be applicable to SARS-CoV-2 at all. In order to address this point, we compare the 134 genes (Table V) with genes reported to be interacting with SARS-CoV-2 proteins [25] (Table XIII). It is obvious that 134 genes are significantly overlapping with human genes reported to be interacting with SARS-CoV-2 proteins. This means that TD based unsupervised FE has ability to predict PPI even when only gene expression profiles during related virus infection are available. Finally, we also compared identified drugs ("Drug Pert GEO up/down" and "DrugMarix" in supplementary materials) with those reported to be possible drugs toward SARS-CoV-2 [26]. Among 142 drugs identified by Zhou et al [26], as many as 25 drugs were found to significantly affect 134 genes in at  least one experiment within either DrugMatrix, or GEO, in Enrichr with adjusted P -values less than 0.05 (XIV). Thus, our proposal of drug repositioning is also reliable.
Since it is unlikely that this level of agreements mentioned in the above is purely accidental, the drugs identified in the present study can be useful candidates for further evaluation in COVID-19 therapy. This work therefore provides a foundation for further research pertaining to utilizing advanced learning concepts to analyze COVID-19 infectious disease.
Final concerns to be addressed might be the comparison with other methods applied to synthetic data sets than TD based unsupervised FE. When considering synthetic data set, categorical regression, eq. (10), outperformed TD based unsupervised FE. Although eq. (9) cannot be better than TD based unsupervised FE (Table III), it is still comparative. If these two more easily understood methods are better than or comparative to TD based unsupervised FE, TD based unsupervised FE, which is more difficult to interpret, is useless. In order to check this point, we applied categorical regression and eq. (9), which are modified as α ij k m δ jj δ kk δ mm + γ i (13) and where a i = i, b k = k, c m = m, to the present set (GSE146074). Then is associated with adjusted P -values less than 0.01 are selected; it turned out that there are too many genes that pass this screening, 25609 and 20217, respectively. This suggests that these two methods lack the ability to screen limited number of genes that are likely interacting SARS-CoV-2 proteins, since not most of genes but only limited number of humna genes are interacting SARS-CoV-2 proteins (2 × 10 4 are as many as all of human protein coding genes). Although this fact is enough to deny to employ these two methods replaced with TD based unsupervised FE, we still seek other possibility; even if these two methods fail to screen limited number of genes, if only top ranked genes are selected, it might be possible to select limited number of genes that significantly overlap with genes reported to interact with SARS-CoV-2 proteins. In order to see this, we select top ranked 134 gene using P -values that these two methods compute and upload them to Enrichr. Nevertheless, we found that there are no SARS-CoV-2 proteins that significantly interact with these uploaded 134 genes. Since the significant interaction with SARS-CoV-2 proteins is the primary requirement for genes to be used to screen candidate drug compounds, these two methods are definitely useless fo the present purpose. If we employ more complicated and sophisticated methods to select genes, it might be possible for us to have limited number of genes that significantly interact with SARS-CoV-2 protein. Nonetheless, since TD based unsupervised FE is simple and rapid (see cpu time in Table III) enough to achieve the present purpose, we are not willing to test more complicated and advanced methods in the present study.

ACKNOWLEDGMENT
This study was supported by KAKENHI grants 19H05270, 20H04848, and 20K12067. This project was also funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant no. KEP-8-611-38. The authors, therefore, acknowledge DSR with thanks for providing technical and financial support.