The Expression of RN7SL494P (7SL) Predicts Nodal Metastasis and Prognosis in Lung Adenocarcinoma

Lung cancer can metastasize to the lymph nodes around the lungs. Metastasis, rather than the primary cancer, determines patient survival. Therefore, a more detailed study of the transcriptome of metastatic lung adenocarcinoma (LUAD), including primary carcinoma, was carried out. LUAD RNA-seq data and the corresponding clinical information were obtained from The Cancer Genome Atlas (TCGA), which included 522 cases, of which only 515 had transcriptome data. Differential expression analyses between cases and controls, between the primary cancer and metastasis subgroups, and between TNM stages were carried out with the edgeR package. Kruskal-Wallis tests were then used to verify the gradient changes of the differentially expressed genes across metastasis status or stage. Survival analyses used the Kaplan-Meier algorithm and the log-rank test. Functional predictions for the differentially expressed genes were performed with Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (GO/KEGG) analyses. Single gene set enrichment analysis (single GSEA) was run to explore the biological pathways associated with the expression of the RN7SL494P gene, based on the Molecular Signatures Database (MSigDB). In total, 406 and 439 differentially expressed genes were identified for lymph node metastasis and TNM stage, respectively. Of the 296 intersection genes, 112 were associated with nodal metastasis and/or staging; among them, only 25 genes were associated with nodal metastasis and 13 genes were associated with staging with gradient changes. Only one gene (RN7SL494P) was found to be associated with prognosis. However, the GO/KEGG analyses did not link RN7SL494P to any biological function, process, or cellular component. Finally, single GSEA enrichment and pathway analyses showed that RN7SL494P might

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 5 March 2019 doi:10.20944/preprints201903.0064.v1


Introduction
Lung adenocarcinoma (LUAD), a histological subtype of non-small cell lung cancer (NSCLC), arises when healthy cells change and grow uncontrollably in the outer region of the lung. It is the most common lung cancer, accounting for about 40 percent of all lung-derived cancers [1].
Lung adenocarcinoma tends to grow in smaller airways, such as bronchioles, and develops more slowly than other types of lung cancer. As cancerous tissue grows, cancer cells may break away. These cells can be carried away in the blood or float in the lymph fluid that surrounds the lung tissue [2]. The lymph flows through channels called lymphatic vessels, which drain into collecting stations called lymph nodes [3,4]. When a cancer cell travels through the bloodstream or lymph fluid into a lymph node or a distant part of the body, this is called metastasis.
In this study, we performed a comprehensive screening for genes related to nodal metastasis and TNM staging, using the transcriptome and clinical data for lung adenocarcinoma from The Cancer Genome Atlas (TCGA) project. TCGA, which began in 2006 [5], is a joint research project between the National Human Genome Research Institute and the National Cancer Institute.

The Differential Expression Genes in Lung Adenocarcinoma
We conducted a differential gene expression analysis and found a total of 13118 differentially expressed genes, comprising 2800 down-regulated genes and 10318 up-regulated genes. The top 10 significantly down- and up-regulated genes are shown in Table 2. All significantly up- and down-regulated mRNAs were plotted in a heatmap and a volcano plot (Figure 2 A and B).

GO and KEGG Analysis of Differentially Expressed Genes
We conducted a GO analysis of all differentially expressed genes in LUAD and found that RN7SL494P was not annotated to any biological function, process, or cellular component in the DAVID database (Figure 4 A and B). KOBAS was used for functional annotation of the differentially expressed genes with KEGG pathways. After identifying the key KEGG pathways, we likewise found no RN7SL494P-related pathways (Supplement Table 1). Functional annotation of the differentially expressed genes with the clusterProfiler R package also did not find any RN7SL494P-related KEGG pathways (Supplement Table 2).
Therefore, a single-gene functional enrichment method focused on this specific gene was applied in the following step.

Staging
Then, based on the lymph node metastasis features of the subjects in Table 1, a total of 406 differentially expressed genes were obtained, of which 312 genes were significantly up-regulated and 94 genes were significantly down-regulated (Figure 2 C and D). The top 10 significantly down- and up-regulated genes associated with cancer metastasis are shown in Table 3. Similarly, the differentially expressed genes related to TNM staging are shown in Figure 2 E and F, and their top 10 significantly down- and up-regulated genes are shown in Table 3.

The Overlapping Differentially Expressed Genes Associated With Nodal Metastasis and TNM Staging
Venn diagram analysis was carried out with the VennDiagram R package to visualize the overlapping differentially expressed genes between lymph node metastasis and TNM stages. A total of 296 overlapping genes were found (Figure 3 A).
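The overlap reported in a two-set Venn diagram is simply the intersection of the two gene lists. The following is a minimal sketch in Python rather than the VennDiagram R package used in the study; the function name and the gene identifiers are hypothetical placeholders, not genes from the study.

```python
def overlapping_genes(metastasis_genes, staging_genes):
    """Return the sorted genes that appear in both differential-expression lists."""
    return sorted(set(metastasis_genes) & set(staging_genes))


# Hypothetical placeholder identifiers, for illustration only.
metastasis_degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
staging_degs = ["GENE_C", "GENE_D", "GENE_E"]

shared = overlapping_genes(metastasis_degs, staging_degs)
print(len(shared), shared)
```

With the real gene lists, the length of the returned list would correspond to the number shown in the central region of the Venn diagram.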

The Gradient Changes of Differentially Expressed Genes Associated With Nodal Metastasis and TNM Staging
We analyzed the gradient changes of differentially expressed genes across lymph node metastasis (from N0 to N2) and TNM stage (from I to IV) with the Kruskal-Wallis test. Because there were only two N3 samples, this subgroup was not considered in this section. A total of 112 differentially expressed genes were associated with the gradient changes of lymph node metastasis, TNM stage, or both (Table 4). Among them, 25 differentially expressed genes were associated with lymph node metastasis; 13 differentially expressed genes were associated with TNM stage; and only 7 genes (SCARNA7, AC105999.2, RANBP20P, RN7SL151P, SYNPR, AL512638.1, and TMIGD1) were simultaneously associated with lymph node metastasis and TNM stage.

Survival Analysis of Differentially Expressed Genes Associated With Nodal Metastasis and TNM Staging
We performed survival analysis on all 30 differentially expressed genes associated with the gradient changes in lymph node metastasis and/or TNM stage. Only one gene (RN7SL494P) was associated with patient survival time (Table 4 and Figure 3B). This gene was also associated with the gradient changes in lymph node metastasis (P = 0.02587 for N0 vs. N1 vs. N2, Figure 3C; P = 0.006 for N0 vs. N1&N2, Figure 3D), but it was not associated with the gradient changes in TNM stage (P = 0.057; Figure 3E).

Single GSEA Enrichment and Pathway Analysis
The associations between RN7SL494P co-expression and cancer-related pathways were analyzed. Only one enriched pathway, KEGG_RENIN_ANGIOTENSIN_SYSTEM, was associated with higher expression of the RN7SL494P gene (Figure 5A and B), whereas 45 KEGG functional pathways were associated with lower expression of this gene in LUAD (Figure 5D).

Figure 5 (panels C and D). C: An example showing that the genes co-expressed with lower expression of RN7SL494P were associated with KEGG_NUCLEOTIDE_EXCISION_REPAIR. D: The genes co-expressed with lower expression of RN7SL494P were enriched in 45 biological pathways.

Discussion
Many patients already have metastatic disease at diagnosis, which makes treatment very difficult. The 5-year survival rate for metastatic lung cancer was about 1 percent [6].
Once a tumor has spread outside the lungs, it is difficult to cure. Because there is no single best treatment for these patients, the choice of treatment strategy depends on the location, size, stage, and subtype of the tumor and on the lymph nodes involved.
Researchers have developed methods to screen cancer patients for metastasis.
The main goal of screening is to reduce the number of people who die from cancer, especially from cancer metastasis. To study the "driver genes" in metastatic lung adenocarcinoma, we examined differentially expressed genes using the RNA-seq repository data from TCGA. We comprehensively analyzed gene expression in lung adenocarcinoma, especially in the course of tumor metastasis.
We identified the differentially expressed genes associated with lymph node metastasis and TNM stage in lung adenocarcinoma. We found that the RN7SL494P gene not only had these characteristics but also had prognostic significance in metastatic cancer. Subsequently, single GSEA enrichment analysis of RN7SL494P further suggested its roles and functions. RN7SL494P (7SL), located on 15q21.2, is a pseudogene of the long noncoding RNA (lncRNA) class. As a eukaryotic small cytoplasmic RNA, 7SL RNA is a component of the human signal recognition particle (SRP) and is essential for protein translocation: the SRP binds to the ribosome and targets the nascent protein to the endoplasmic reticulum for secretion or membrane insertion [7,8]. A study using RNA sequencing of 11 human tissues showed that 7SL was the most highly expressed ncRNA and that its abundance could be an order of magnitude higher than that of any mRNA [9]. 7SL stimulates the GTPase activities of the SRP and its signal receptor (SR) complex [10,11].
GSEA uses gene sets defined on the basis of previous biological experiments, for example, knowledge about co-expression or biochemical pathways. A recent study showed that the S-structure domain of 7SL RNA is related to cellular activity in mitochondria [12].
Furthermore, in addition to NUCLEOTIDE_EXCISION_REPAIR, the results of single-GSEA showed that RN7SL494P was also associated with CELL_CYCLE, RIBOSOME, DNA_REPLICATION, and UBIQUITIN_MEDIATED_PROTEOLYSIS.
Thus, RN7SL494P (7SL) may play a role in the translation and assembly of peptides, and its dysfunction may contribute to pathological processes.
We found that high expression of RN7SL494P was associated with better survival rates in lung adenocarcinoma (high expression 41.80% vs. low expression 39.70%; Figure 3B). Yang et al. [13] found that over-expression of FOXP3 could inhibit the transcription of 7SL RNA by binding to its promoter, which subsequently strengthened the translation of p53 and contributed to repressing the growth of multiple tumors (not including lung cancer). That study suggested that 7SL (RN7SL494P) RNA may be a direct regulatory target and indicated that many complex regulatory networks are involved in tumor formation. We speculate that the RN7SL494P gene may display "inconsistent functions" in different tumor microenvironments.

Conclusions
In the current study, we used the TCGA database to analyze gene expression in lung adenocarcinoma. We found that the expression of RN7SL494P (7SL) was clearly associated with nodal metastasis, showing gradient changes, and that its prognostic value was better than that of any other differentially expressed gene.

The LUAD Data and Pipeline
The LUAD data were downloaded from the National Cancer Institute's Genomic Data Commons data portal (https://portal.gdc.cancer.gov/repository) on August 5, 2017 using the gdc-client.exe software. This gave us 594 level-3 RNA-seq datasets (515 cases) and 522 clinical XML datasets. The clinical data are shown in Table 1. The pipeline of this study and its details are shown in Figure 1.

Differential Gene Expression Analysis
Differential expression in the RNA-seq data was analyzed using the edgeR package [14].
edgeR uses empirical Bayes estimation and exact tests based on the negative binomial distribution. As edgeR suggests, genes with very low read counts are usually not of interest in differential expression analysis; therefore, the average counts per million (CPM) was used as a criterion for deciding whether a gene is reasonably expressed. The package then reported the log2 (fold change), the log2 (counts per million), and the corresponding statistical significance and false discovery rate for each gene. Up-regulated and down-regulated differentially expressed genes were selected based on these parameters.
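The CPM criterion described above can be sketched as follows. This is a minimal illustration in Python, not the edgeR implementation: the threshold of 1 CPM, the function names, and the gene identifiers are assumptions, since the text does not state the exact cutoff that was applied.

```python
def counts_per_million(counts):
    """Convert one sample's raw read counts {gene: count} to counts per million."""
    library_size = sum(counts.values())
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def filter_by_mean_cpm(samples, min_mean_cpm=1.0):
    """Keep genes whose mean CPM across all samples reaches the threshold.

    min_mean_cpm=1.0 is an assumed, illustrative cutoff.
    """
    cpms = [counts_per_million(s) for s in samples]
    kept = []
    for gene in samples[0]:
        mean_cpm = sum(c[gene] for c in cpms) / len(cpms)
        if mean_cpm >= min_mean_cpm:
            kept.append(gene)
    return kept


# Hypothetical counts for two samples, each with a library size of 1,000,000.
sample_1 = {"GENE_A": 500000, "GENE_B": 499999, "GENE_C": 1}
sample_2 = {"GENE_A": 600000, "GENE_B": 400000, "GENE_C": 0}

expressed = filter_by_mean_cpm([sample_1, sample_2])
print(expressed)
```

Here GENE_C has a mean CPM of 0.5 and is dropped before any differential expression test would be run, which is the purpose of the filter.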

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis
GO provides a framework for classifying genes and their products hierarchically into terms. These terms fall into three categories: molecular function (the molecular activity of a gene product), cellular component (the location where a gene product is active), and biological process (the cellular or physiological effect) [15,16]. DAVID 6.7 was used to perform the functional annotation analysis [17], and the ggplot2 and GOplot R packages were used to visualize the results.
We then used two methods, the KOBAS algorithm [18] and the clusterProfiler R package, to analyze the KEGG pathways [19] of the differentially expressed genes.
The significantly up- and down-regulated differentially expressed genes from the LUAD RNA-seq data were analyzed, and a P value of less than 0.05 was used as the screening criterion.
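Enrichment tools of this kind generally ask whether a GO term or KEGG pathway contains more of the selected genes than expected by chance. The sketch below shows a plain hypergeometric over-representation test as one common form of that calculation; it is an illustrative assumption and not the exact statistic implemented by DAVID, KOBAS, or clusterProfiler, and all numbers in the example call are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_term belong to the term."""
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, i) * comb(n_background - n_in_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20000 background genes, 100 in the pathway,
# 400 differentially expressed genes, 8 of which fall in the pathway.
p = enrichment_p_value(20000, 100, 400, 8)
print(p < 0.05)
```

A term would pass the screening criterion described above when this P value is below 0.05; in practice a multiple-testing correction is usually applied across terms as well.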

Gene Set Variation Analysis (GSVA) of KEGG Pathways
A comprehensive human gene annotation file (c5.all.v5.2.symbols.gmt) for the GO function category was downloaded from the Molecular Signatures Database (MSigDB) [20]. To reduce the mRNA-seq data from gene-level transcript abundance to a transcriptional activity index at the level of gene function, the Gene Set Variation Analysis (GSVA) algorithm [21] was carried out, based on enrichment scores. For the differential expression analysis associated with cancer metastasis or TNM stage, clinical variables such as lymph node metastasis and TNM stage were selected. Kruskal-Wallis tests were used to compare expression among multiple cancer groups (N0, N1, N2, and possibly N3; or stage I, II, III, and IV). The Kruskal-Wallis test by ranks is a nonparametric alternative to one-way ANOVA, and it extends the two-sample Wilcoxon test to more than two groups [22].
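The Kruskal-Wallis statistic pools all observations, ranks them, and compares the rank sums of the groups: H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), where N is the total sample size, n_i the size of group i, and R_i its rank sum. H is referred to a chi-square distribution with k - 1 degrees of freedom for k groups. The following is a minimal standard-library sketch of that calculation with the usual tie correction; the function names and example values are illustrative and not taken from the study.

```python
import math


def midranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution (integer df >= 1)."""
    y = x / 2
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups; fails if all values are identical."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_counts.values())
    h /= 1 - ties / (n ** 3 - n)  # tie correction (equals 1 when no ties)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical expression values of one gene in N0, N1, and N2 samples.
h_stat, p_value = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 3), round(p_value, 4))
```

In practice a library routine such as the `kruskal.test` function in R would normally be used; the sketch only makes the rank-based logic explicit.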

Survival Analyses
To analyze the association between patient prognosis and gene expression, two risk groups were established for each gene, using the median of its expression as the cut-off value. The Kaplan-Meier algorithm and the log-rank test were used to evaluate the survival differences between the two risk groups, and a P value of less than 0.05 was considered statistically significant.
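The three steps described above (median split, Kaplan-Meier estimate, two-group log-rank test) can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions rather than the code used in the study: the function names are hypothetical, samples equal to the median are assigned to the low group by assumption, and the survival times are made-up values.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median expression."""
    cut = median(expression)
    return ["high" if x > cut else "low" for x in expression]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event 1 = death, 0 = censored)."""
    survival, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def log_rank(times, events, labels):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    reference = sorted(set(labels))[0]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, labels) if u >= t and g == reference)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, labels)
                 if u == t and e and g == reference)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: six patients, all with an observed death.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
labels = split_by_median([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p_value = log_rank(times, events, labels)
print(labels, curve, p_value < 0.05)
```

The log-rank statistic sums, over the event times, the difference between the observed and expected deaths in one group; if that group never differs from its expectation, the variance is zero and the sketch would need a guard.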

Gene Set Enrichment Analysis (GSEA) and Single Gene Set Enrichment Analysis (Single-GSEA)
GSEA assesses genome-wide expression data. The 515 lung cancer samples with RNA-seq data were divided into two groups (high and low expression) according to the median expression of the hub gene. GSEA comparing these two groups was used to identify the potential function of the hub gene, with the annotated c5.all.v6.2.symbols.gmt selected as the reference gene sets. A nominal P < 0.05, FDR < 0.05, and an enrichment score (ES) > 0.6 were used as the cutoff criteria.
The gene sets from the Molecular Signatures Database (MSigDB) [23] related to the single gene RN7SL494P (found in this study to be related to metastasis and prognosis) were tested for statistical differences between the low- and high-expression categories using the Java-based GSEA 3.0 software package [24].
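The enrichment score (ES) used as a cutoff above is, in the standard GSEA formulation, the maximum deviation from zero of a running sum that steps up at genes belonging to the gene set and down at all other genes as one walks down the ranked gene list. The sketch below illustrates that running-sum idea only; it is not the GSEA 3.0 implementation, it assumes the ranking scores (for example, a high-versus-low expression contrast) are already computed, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for a ranked gene list.

    ranked_genes: genes sorted by decreasing ranking score.
    scores: the ranking score of each gene, in the same order.
    gene_set: set of genes in the pathway (must overlap the list only partly).
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    miss_step = 1 / (len(ranked_genes) - sum(in_set))
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_norm  # step up at a set member
        else:
            running -= miss_step                    # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: G1 is most associated with high expression.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
rank_scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(genes, rank_scores, {"G1", "G2"})
es_bottom = enrichment_score(genes, rank_scores, {"G5", "G6"})
print(es_top, es_bottom)
```

A positive score indicates a gene set concentrated at the top of the ranking (associated with the high-expression group) and a negative score one concentrated at the bottom; significance in GSEA is then assessed by permutation, which this sketch omits.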

Figure 4. GO analyses of all differentially expressed genes in LUAD.

Figure 3. The overlapping differentially expressed genes associated with nodal

Figure 5A and 5B are examples showing that RN7SL494P expression levels were inversely associated with different pathways. The genes co-expressed with low expression of RN7SL494P were enriched in biological or pathological pathways such as NUCLEOTIDE_EXCISION_REPAIR, MISMATCH_REPAIR, CELL_CYCLE, and OXIDATIVE_PHOSPHORYLATION, among others (Figure 5D). These findings suggest that low expression of RN7SL494P might be associated with cancer development and poor outcome in patients with lung adenocarcinoma.

Figure 1. The pipeline of this study.

Table 2. The top 10 significant down- and up-regulated genes associated with lung adenocarcinoma.

Table 3. The top 10 significant down- and up-regulated genes associated with lymph node metastasis or TNM stages.

Table 4. The gradient changes of differentially expressed genes associated with lymph node metastasis or TNM stages with the Kruskal-Wallis test, and the survival analysis of patients with the differentially expressed genes.

Table 1. Clinical and laboratory features of the subjects included in the study.