Identification of novel gene expression signature in lung adenocarcinoma by using next-generation sequencing data and bioinformatics analysis

Lung adenocarcinoma is one of the leading causes of cancer-related death worldwide. We showed transcriptomic profiles in three pairs of tumors and adjacent non-tumor lung tissues using next-generation sequencing (NGS) to screen protein-coding RNAs and microRNAs. Combined with meta-analysis from the Oncomine and Gene Expression Omnibus (GEO) databases, we identified a representative genetic expression signature in lung adenocarcinoma. There were 9 upregulated genes, and 8 downregulated genes in lung adenocarcinoma. The analysis of the effects from each gene expression on survival outcome indicated that 6 genes (AGR2, SPDEF, CDKN2A, CLDN3, SFN, and PHLDA2) play oncogenic roles, and 7 genes (PDK4, FMO2, CPED1, GNG11, IL33, BTNL9, and FABP4) act as tumor suppressors in lung adenocarcinoma. In addition, we also identified putative genetic interactions, in which there were 5 upregulated microRNAs with specific targets - hsa-miR-183-5p-BTNL9, hsa-miR-33b-5p-CPED1, hsa-miR-429-CPED1, hsa-miR-182-5p-FMO2, and hsa-miR-130b-5p-IL33. These 5 microRNAs have been shown to be associated with tumorigenesis in lung cancer. Our findings suggest that these genetic interactions play important roles in the progression of lung adenocarcinoma. We propose that this molecular change of genetic expression may represent a novel signature in lung adenocarcinoma, which may be developed for diagnostic and therapeutic strategies in the future.


INTRODUCTION
Lung cancer is one of the leading causes of cancerrelated death worldwide [1]. The development and progression of lung cancer has been widely studied. Briefly, the genetic alterations or mutations occurred in a single cell, leading to cellular transformation and thus expansion into a malignant tumor [2]. Non-small cell Research Paper www.impactjournals.com/oncotarget lung carcinoma (NSCLC) accounts for about 80-85% of all lung cancers [3,4], of which lung adenocarcinoma (40%) is the most common subtype of NSCLC [5]. Surgery to remove cancer, with or without radiotherapy/ chemotherapy, is the standard approach for early stage lung cancer. However, the recurrence of distant metastasis [6,7] or resistance to therapy [8,9] often occurs, and such phenomenon is associated with critical genetic alterations involved in various biological mechanisms.
The genetic alterations related to cellular transformation are involved in various biological processes, including transcription [10], DNA repair [11], cell cycle progression [12], apoptosis [13], migration ability [14], and metabolism [15,16]. In lung cancer, many genes have been identified as oncogenes or tumor suppressor genes, which modulate varieties of molecular functions involved in tumor development and progression [17,18]. Recently, small RNAs have been found to play important roles in lung cancer. MicroRNAs are a group of small non-coding RNAs containing 20-26 nucleotides, which can regulate gene expression via binding to the 3' untranslated region (3′UTR) of specific messenger RNAs (mRNAs). This interaction can cause mRNAs' degradation or translation inhibition [19]. The signaling pathways associated with microRNAs targeting oncogenes or tumor suppressor genes have been reported to be involved in lung cancer progression [20]. Alterations of many microRNAs in chromosome regions associated with various cancers have also been implicated [21].
Next-generation sequencing (NGS) is a powerful method to screen the entire transcriptomic profile, including messenger RNAs or small RNAs [22]. In this study, we attempted to identify the differentially expressed genes and genetic interactions of target gene-microRNA in lung adenocarcinoma combined with systematic analysis, by using bioinformatics tools, including the Oncomine [23], Gene Expression Omnibus (GEO) [24], PrognoScan [25], Kaplan-Meier plotter [26], SurvExpress [27], and miRmap databases [28]. We sought to identify novel gene expression signature and/or genetic interactions in lung adenocarcinoma via systematic bioinformatics analysis. Hopefully, the approach and findings from this study will provide new perspectives on the development of diagnostic and therapeutic strategies for lung adenocarcinoma.

Identification of differentially expressed genes as a molecular signature in lung adenocarcinoma
To investigate genetic expression changes in lung adenocarcinoma, we analyzed the transcriptomic profiles in 3 pairs of human specimens from lung adenocarcinoma and its adjacent normal lung tissue using next-generation sequencing ( Figure 1). We focused on protein-coding RNAs and Venn diagram analysis which showed that 9 genes were upregulated ( Figure 1A), whereas 8 genes were downregulated ( Figure 1B) in lung adenocarcinoma tissue compared to adjacent normal lung tissue. The analysis criteria were fold change > 2 and fragments per kilobase million (FPKM) > 0. 3. The hierarchical color clustering showed the expression pattern of each gene with z-scores (log2) in these 3 pairs of specimens ( Figure 1C). The list of 17 differentially expressed genes with FPKM is shown in Table 1. To investigate whether this genetic expression pattern could represent a molecular signature in lung adenocarcinoma, we investigated these genes in the Oncomine database, which contains different data sets of specimens from lung adenocarcinoma and normal lung tissue. We selected 7 datasets from the Oncomine database for comparison of gene expression, including Hou (normal = 65 and lung adenocarcinoma = 45), Landi (normal = 49 and lung adenocarcinoma = 58), Selamat (normal = 58 and lung adenocarcinoma = 58), Okayama (normal = 20 and lung adenocarcinoma = 226), Su (normal = 30 and lung adenocarcinoma = 27), Wei (normal = 25 and lung adenocarcinoma = 25), and Stearman (normal = 19 and lung adenocarcinoma = 20). The heatmap analysis indicated that the genetic expression patterns of 17 genes were similar among these datasets (Figure 2), which suggests that this molecular change is consistent in lung adenocarcinoma, and may represent a novel genetic signature in lung adenocarcinoma.

Analysis of TOX3 and SPDEF in lung adenocarcinoma
The mRNA expression of TOX3 and SPDEF between lung adenocarcinoma and normal lung tissue derived from the Oncomine database is listed in Table 3. The analysis criteria were fold change > 2, p-value < 1E-04, and gene ranking in the top 10%. The results showed that both TOX3 and SPDEF are significantly upregulated in lung adenocarcinoma, compared to normal tissue. In addition, we selected a microarray with the accession number of GSE10072 from the Gene Expression Omnibus (GEO) database for gene expression analysis. This array contains 31 pairs of clinical lung adenocarcinomas and adjacent normal tissue. The results showed that the mRNA expression of either TOX3 ( Figure 3A) or SPDEF ( Figure 3B) is upregulated in lung adenocarcinoma. To further investigate the role of TOX3 and SPDEF expression in cancer progression, we performed a survival curve analysis to evaluate the effects of gene expression in lung cancer patients with lung adenocarcinoma. The results indicated that the population with higher TOX3 expression has better survival rates ( Figure 3C-3G), whereas the population with higher SPDEF expression has poorer survival outcome ( Figure 3H, 3I). The prognostic values of TOX3 and SPDEF expression in lung adenocarcinoma were shown as forest plots ( Figure 3J), which were derived from the PrognoScan database with a Cox p-value < 0.05, and the Kaplan-Meier plotter database with a log-rank p-value < 0.05.

Analysis of PDK4, FMO2, and FABP4 in lung adenocarcinoma
mRNA expression of PDK4, FMO2, and FABP4 between lung adenocarcinoma and normal lung tissue was analyzed by using the Oncomine database, and listed in Table 4. The expressions of PDK4, FMO2, and FABP4 are significantly downregulated in lung adenocarcinoma compared to normal lung tissue in several datasets, and this phenomenon was also observed in GSE10072 array ( Figure 4A-4C). In cancer patients with lung Figure 1: Identification of differentially expressed genes in lung adenocarcinoma compared to adjacent normal tissue using next-generation sequencing. Venn diagram analysis showed 9 upregulated genes (A) and 8 downregulated genes (B) in lung adenocarcinoma, compared to adjacent non-adenocarcinoma tissue from 3 pairs of clinical specimens. The criteria were fold change > 2 (tumor/normal) and fragments per kilobase million (FPKM) > 0.3. (C) The heatmap diagram showed the differentially expressed genes with z-score (log2) values by using color clustering on GENE-E web-tool. Green represents downregulation (minimum = -2.5), and red represents upregulation (maximum = 2.5).
adenocarcinoma, the survival curve analysis indicated that the population with higher expression of PDK4 ( Figure  4D

Analysis of CDKN2A, NDRG4, SFN, and PHLDA2 in lung adenocarcinoma
We observed that the expression of CDKN2A, PHLDA2, and SFN are upregulated and the expression of NDRG4 is downregulated in lung adenocarcinoma when compared to normal lung tissue, which was also confirmed by the Oncomine database, listed in Table 5. In the GSE10072 array, expression levels of CDKN2A, PHLDA2, and SFN are significantly upregulated in lung adenocarcinoma compared to normal lung tissue in several datasets ( Figure 5A-5C), and NDRG4 is downregulated ( Figure 5D). The survival curve analysis showed that lung adenocarcinoma patients with high levels of CDKN2A expression are associated with poorer survival rates ( Figure 5E). The expression level of NDRG4, however, has no significant effects on survival outcomes for patients with lung adenocarcinoma ( Figure 5F). Higher expressions of PHLDA2 ( Figure

Analysis of AGR2, CLDN3, and AQP5 in lung adenocarcinoma
Analysis of mRNA expression from the Oncomine database revealed that AGR2, AQP5, and CLDN3 are upregulated in lung adenocarcinoma, compared to normal lung tissue, and these results are listed in Table 6. In the GSE10072 array, we also observed that expression levels of AGR2 and CLDN3 are significantly upregulated in lung adenocarcinoma when compared to normal lung tissue ( Figure 6A, 6B). However, the expression of AQP5 showed no significant change in the GSET10072 array ( Figure 6C). The survival curve analysis showed that high expression of CLDN3 is correlated with poorer rates of survival in lung adenocarcinoma patients ( Figure 6D). However, the population with a higher expression of AQP5 has better survival outcome ( Figure 6E-6G). Higher expression of AGR2 is also associated with poorer survival rates ( Figure 6H). The prognostic values of AGR2, CLDN3, and AQP5 were shown as forest plots ( Figure 6I).

Analysis of IL33 in lung adenocarcinoma
The mRNA expression of IL33 is significantly downregulated in lung adenocarcinoma compared to normal tissue (Table 7). We also found that IL33 expression is decreased in lung adenocarcinoma identified from the GSE10072 array ( Figure 7A). The survival curve analysis performed using the SurvExpress database showed that the high risk population with lower expression of IL33 has poorer survival outcomes for lung adenocarcinoma patients ( Figure 7B, 7C). This phenomenon was also observed in the PrognoScan ( Figure 7D-7F) and Kaplan-Meier plotter databases ( Figure 7G). The prognostic values of IL33 in lung adenocarcinoma was illustrated as a forest plot ( Figure 7H).

Analysis of ZDHHC9, BTNL9, GNG11, and CPED1 in lung adenocarcinoma
The mRNA expression of BTNL9, GNG11, or CPED1 is downregulated in lung adenocarcinoma when compared to normal tissue, and ZDHHC9 is upregulated (Table 8). In the GSE10072 array, GNG11 ( Figure 8A) Seventeen differentially expressed genes (9 up and 8 down) identified from 3 pairs of clinical lung adenocarcinoma were selected. Raw data were extracted and re-plotted by GENE-E web-tool, and the relative color scheme used for clustering analysis. Yellow represents high expression (maximum = 1) and blue represents low expression (minimum = 0). The gene symbols and corresponding specific probes are displayed on the right side of each diagram. www.impactjournals.com/oncotarget The gene expression of TOX3 (A) and SPDEF (B), comparing 31 pairs of clinical lung adenocarcinoma (red) and adjacent normal tissue (blue), was performed on GSE10072 microarray from the GEO database. (3 probes for TOX3 and SPDEF respectively in GSE10072) The p-value of gene expression was calculated by t-test with Wilcoxon matched-pairs signed rank test. *** represents p < 0.001, ** represents p < 0.01. The survival curves comparing 2 populations with high (red) and low (black) gene expression in lung adenocarcinoma patients were performed on the PrognoScan database -TOX3 (C-F) and SPDEF (H, I), and the Kaplan-Meier plotter database -TOX3 (G). The analysis criteria of the PrognoScan and Kaplan-Meier plotter databases were Cox p-value < 0.05 and log-rank p-value < 0.05 respectively. The raw data of GEO and PrognoScan databases were extracted and re-plotted by GraphPad Prism 5 software. (J) The forest plots showed hazard ratios (95% CI, confidence interval) identified from PrognoScan and Kaplan-Meier plotter databases. and CPED1 ( Figure 8B) expression is downregulated in lung adenocarcinoma. There is no specific probe for ZDHHC9 and BTNL9 in the GSE10072 array. Survival curve analysis revealed that lung adenocarcinoma patients with high expression of ZDHHC9 ( Figure 8C

Identification of genetic regulation in lung adenocarcinoma using next-generation sequencing
We simultaneously performed small RNA-seq in these 3 pairs of specimens using next-generation sequencing ( Figure 9). We focused on microRNAs and found 22 upregulated microRNAs in lung adenocarcinoma using Venn diagram analysis ( Figure 9A), which is listed in Table  9. The analysis criteria were fold change > 2 and reads per million (RPM) > 1. No microRNA with downregulated changes were found in our analysis ( Figure 9B). Heatmap color clustering showed the expression patterns of each upregulated microRNA from these 3 pairs of specimens ( Figure 9C). To further elucidate the genetic interactions in lung adenocarcinoma, we performed miRmap database for target prediction. Among 22 upregulated microRNAs, there were 13 putative targets, shown in the "Targets" Venn diagram ( Figure 9D). The prediction threshold was miRmap score > 90.0. The Venn diagram analysis between 13 targets of microRNAs and 8 downregulated genes, shown in Figure 1B, indicates that there were 10 genetic interactions of microRNA-mRNA in lung adenocarcinoma ( Figure 9D), which is listed in Table 10. Only 6 genes were involved in these 10 genetic interactions, due to the 3 genes have been attributed to different microRNAs.

DISCUSSION
Lung cancer, one of the leading causes of cancerrelated death worldwide [29], still has much that requires  further study. In our project, we hoped to identify novel gene expression signature or genetic interactions of gene-microRNA in lung adenocarcinoma by using next-generation sequencing combined with systematic bioinformatics analysis. We found 17 differentially expressed genes in lung adenocarcinoma compared to its adjacent normal lung tissue, which were classified into 6 functional groups based on a search of the literature. These results indicated that tumor progression is involved in alterations of various biological functions. We then summarized the potential oncogenic and tumor suppressor roles of these genes in lung adenocarcinoma (Supplementary Table 2).
TOX3 contains an HMG-box (high mobility group box) domain. The function of TOX3 remains unclear, but it may be involved in various DNA-dependent processes [30][31][32]. TOX3 polymorphisms and epigenetic regulation have been demonstrated in breast cancer [33] and lung cancer [34] respectively. In our study, TOX3 was significantly upregulated in lung adenocarcinoma, and higher expression of TOX3 is correlated with better survival outcome. We speculated that as more factors may be involved in TOX3-related mechanisms of tumor progression, more studies are needed to clarify the relationship between TOX3 expression and tumor progression.
SPDEF containing ETS domain has been reported to be overexpressed in many cancers [35][36][37]. Our study suggests that SPDEF may play an oncogenic role in lung adenocarcinoma, although some reports have shown that SPDEF can suppress cancer metastasis [38]. However, contrary effects of SPDEF on tumorigenesis require further research.
PDK4 is a mitochondrial protein that can regulate glucose metabolism through inhibition of pyruvate dehydrogenase complex. An aberrant metabolism is one of the characteristics of cancer cells. In liver cancer, PDK4 has been identified as a potential tumor suppressor [39].  Ironically, PDK4 exerts oncogenic effects in colon cancer [40]. In our study, PDK4 played a potential tumor suppressor role in lung adenocarcinoma. FMO2 is an NADPH-dependent enzyme that catalyzes the oxygenation of substrates [41], but the effect of FMO2 on tumorigenesis is unclear. Although genetic polymorphisms of FMO genes may influence drug metabolism [42], we found that FMO2 might have tumor suppressor effects in lung adenocarcinoma.
FABP4 is involved in fatty acids trafficking and metabolism. Fatty acids serve as both an energy source and signaling molecules that can regulate various cellular functions [43]. The dysfunction of FABP proteins has been found to be associated with some metabolic diseases [44], and elevated FABP4 has been observed in many types of cancer [45][46][47]. Our data showed that FABP4 may have tumor suppressor effects in lung adenocarcinoma.
CDKN2A encodes two spliced transcripts, p16 INK4a and p14 ARF , which regulate cell cycle progression through inhibition of CDK4 kinase and p53 respectively [48]. CDKN2A has been shown as a tumor suppressor in cancer progression [49], and its alterations, including epigenetic modifications, deletion, and mutations, frequently occur in cancers [50]. In our study,   CDKN2A may have potential oncogenic effects in lung adenocarcinoma. PHLDA2, located in an imprinted region on chromosome 11p15.5, has primarily been studied for its regulation of placental growth [51]. Although the role of PHLDA2 in cancer is unclear, our data showed that PHLDA2 may potentially exert oncogenic effects in lung adenocarcinoma.
SFN is involved in protein synthesis and epithelial cell growth. Numerous reports have demonstrated the molecular functions of SFN in keratinocytes and fibroblasts [52]. Furthermore, elevated expression of SFN has been reported in lung adenocarcinoma [53], and our study also found that SFN may exert oncogenic effects in lung adenocarcinoma.
NDRG4 is involved in the regulation of cell cycle progression [54] and has been identified as a novel tumor suppressor in colon cancer [55]. We found that NDRG4 levels were significantly decreased in lung adenocarcinoma, but with regard to survival analysis, the expression of NDRG4 has shown no significant influence on survival rates of lung cancer patients.
AGR2 is an endoplasmic reticulum (ER) protein which can catalyze protein folding. Its oncogenic role and increased expression have been reported in different types of cancer [56][57][58]. In our study, we found that AGR2 may serve as a potential prognostic biomarker of lung adenocarcinoma.
AQP5, aquaporin 5, is a water channel protein involved in pulmonary secretions, and elevated expression of AQP5 is associated with poor survival outcome in many types of cancer [59][60][61]. In our study, the expression of AQP5 was upregulated in lung adenocarcinoma, and its high expression correlated with better survival outcome.
CLDN3 regulates tight junctions of cell-cell adhesion in epithelial or endothelial cells and is overexpressed in ovarian [62] and colon cancer [63]. Loss of claudin 3 expression increases the metastatic ability of esophageal cancer [64], whereas claudin 3 is upregulated in lung adenocarcinoma [65]. Our study showed that claudin 3 may have oncogenic effects in lung adenocarcinoma.
IL33 is a cytokine involved in a spectrum of biological processes, and the chronic inflammatory signaling activation is known to be involved in cancer progression. The expression of IL33 in tumor tissues is depressed, but tumor stroma and serum have increased levels of IL33, suggesting the distinct functions of IL33 in cancer cells from the microenvironment [66]. IL-33 is shown to promote tumorigenesis and induce stemness in breast cancer [67]. In Apc Min/+ mice, epithelial-derived IL-33 can promote intestinal tumorigenesis [68]. These reports indicated the function of IL-33 in tumorigenesis is controversial. In our analysis, we found that IL33 may have tumor suppressor functions in lung adenocarcinoma.
ZDHHC9 is a palmitoyltransferase that can regulate palmitoylation of HRAS and NRAS. The function of ZDHHC9 in cancers is unclear, although inactivation of ZDHHC9 can reduce leukemogenic effects through repression of oncogenic NRAS [69]. According to our data, increased ZDHHC9 is observed in lung adenocarcinoma, but its high expression is correlated with better rates of survival.
BTNL9 belongs to the immunoglobulin superfamily, with the butyrophilin family modulating immune homeostasis [70]. Although the function of BTNL9 in tumorigenesis remains unclear, our results suggest that BTNL9 may serve as a tumor suppressor.  GNG11 is a lipid-anchored protein, which has been reported to inhibit cell growth [71] and regulate cellular senescence in lymphoma [72]. We found that GNG11 may exert suppressor functions in the tumorigenesis of lung adenocarcinoma.
CPED1, also known as C7orf58, contains a cadherin-like beta sandwich domain [73], but the molecular function of CPED1 is unclear. Our data showed that CPED1 may play a role as a potential tumor suppressor in lung adenocarcinoma.
The summary of differentially expressed genes in the Oncomine database is shown in Supplementary Table 3. Seven of a total of 11 datasets showed similar patterns of genetic expression, suggesting that this molecular change is constant between lung adenocarcinoma and normal lung tissue. Thus, those genes found in this study may represent a novel gene expression signature in lung adenocarcinoma ( Figure 10). The increased expression of AGR2 and decreased expression of IL33 have also been identified in other reports. We also analyzed the expression of microRNAs in lung adenocarcinoma (Table 9). Twenty-too upregulated microRNAs were identified in lung adenocarcinoma. We focused on microRNAs with predictable putative targets -BTNL9, FMO2, IL33, CPED1, and PDK4. Among these microRNAs, elevated expression of hsa-miR-183-5p [74], hsa-miR-33b-5p [75], hsa-miR-429 [76], hsa-miR-182-5p [77], and hsa-miR-130b-5p [78] have been associated with tumorigenesis in lung cancer. The function of hsa-miR-542-3p is unclear. However, since the genetic interactions of hsa-miR-183-5p-BTNL9, hsa-miR-33b-5p-CPED1, hsa-miR-429-CPED1, hsa-miR-182-5p-FMO2, hsa-miR-130b-5p-IL33, and hsa-miR-542-3p-IL33 have not been identified, these altered genetic regulations may play important roles in the progression of lung adenocarcinoma.   The "Targets" Venn diagram shows the predicted genes of microRNAs from the "microRNAs" Venn diagram using the miRmap web-site database. The selection threshold was miRmap score ≥ 90.0. The intersection Venn diagram between "mRNAs" and "Targets" showed total of 6 potential microRNA-mRNA interactions.

Clinical lung adenocarcinoma specimens
of KMUH, and informed consent was obtained from all patients in accordance with the Declaration of Helsinki.

Next-generation sequencing (NGS)
The expression profile of mRNA and microRNA was performed using NGS [22]. Three pairs of lung adenocarcinomas and adjacent normal specimens were used in this project. Total RNA was extracted by using Trizol® Reagent (Invitrogen, USA), according to the manufacturer's instructions. The cell lysates were applied to Welgene Biotechnology Company (Welgene, Taipei, Taiwan) for RNA-seq and small RNA-seq analysis. The criteria for differentially expressed mRNA analysis were fold change > 2 and fragments per kilobase million (FPKM) > 0.3. The criteria for differentially expressed microRNAs' selection were fold change > 2 and reads per million (RPM) > 1.

Oncomine database analysis
The Oncomine database contains over 18,000 microarray experiments and 35 major cancer types [23]. The raw data of mRNA expression in clinical lung adenocarcinoma and normal specimens (cancer vs. normal) were extracted from the Oncomine database (http://www.oncomine.org, Compendia biosciences, Ann Arbor, MI, USA). The criteria in the analysis were P-value < 1E-4, fold change > 2, and gene rank in top 10%. P-value was calculated using the Oncomine database through two-sided Student's t-test. For the comparison of genes in each dataset, raw data was extracted and replotted using the GENE-E web-tool (https://software. broadinstitute.org/GENE-E/), and the relative color scheme was used for clustering as minimum = 0 (blue) and maximum = 1 (yellow). Eleven datasets were selected

SurvExpress database analysis
SurvExpress integrates the TCGA database (https:// tcga-data.nci.nih.gov) which provides microarray information, including cancer type, survival, and gene expression. The correlation between IL33 mRNA expression and survival rate was analyzed on the SurvExpress web-databse (http://bioinformatica.mty. itesm.mx/SurvExpress). The dataset lung adenocarcinoma TCGA (N = 255) was used in our analysis. Samples of each dataset were split into 2 risk groups (high and low risk) of the same size, of which each group was determined by the ordered Prognostic Index (PI, high value for high risk) [27]. Prognostic Index (PI) is the linear component of the Cox model, computed by gene expression value multiplied with values estimated from the Cox fitting [90].

PrognoScan database analysis
PrognoScan collects information of the GEO (Gene Expression Omnibus) microarray database, including cancer type, survival rates, and gene expression. The correlation between gene expression and overall survival rates was performed on the PrognoScan web-databse (http://www.abren.net/PrognoScan/) [25]. The raw data were extracted and re-plotted by GraphPad Prism 5 software (GraphPad Software, La Jolla, CA, USA). The threshold was determined as Cox p-value < 0.05. Samples of each dataset were divided into 2 expression groups (high and low) at the potential cutpoint. The cutpoint (from < 0.1 or > 0.9 quantile) was estimated by the minimum P-value approach [91], and the P-value correlation was calculated by the formula [92]. The hazard ratios (95% confidence intervals) of each dataset was calculated using the Cox proportional model, and are listed in the related Tables. HR = 0 represents no difference between 2 groups, HR < 1 represents better survival rate in the population with higher levels of expression, and HR > 1 represents better survival rates in the population with lower levels of expression. The specific probe of each dataset is listed in its related Figure.

Kaplan-Meier plotter database analysis
The Kaplan-Meier plotter is a web-database providing the information on 54675 genes' expression and survival rates in 10461 cancer samples, including 5143 breast, 1816 ovarian, 2437 lung, and 1065 gastric cancer patients [26]. The correlation of gene expression and overall survival rates in lung cancer was determined, and lung adenocarcinoma (N = 720) was selected in our analysis. Patients were split into 2 populations with the best cut-off, which was computed with median survival. The hazard ratios (95% confidence intervals) were calculated using the Cox proportional model, and are listed in the related Tables. The specific probe of each dataset was listed in its related Figure.

Gene expression omnibus (GEO) database analysis
GEO is a web-database providing submitted high throughput gene expression data of microarrays, chips, or NGS (https://www.ncbi.nlm.nih.gov/geo/) [24]. We selected the microarray with accession number GSE10072, published in 2008 [80], for this project. This microarray provides gene expression information of 180 clinical lung adenocarcinoma and non-tumor samples. Here, we selected 31 pairs of lung adenocarcinoma with adjacent normal tissue for gene expression analysis. The raw data were analyzed and extracted from GEO2R (https://www. ncbi.nlm.nih.gov/geo/geo2r/), and re-plotted by GraphPad prism 5 software (GraphPad Software, La Jolla, CA, USA). The p-value was calculated by using t-test with Wilcoxon matched-pairs signed rank test.

miRmap database analysis
miRmap is a web-tool database providing analysis of microRNA targets prediction (http://mirmap.ezlab. org/) [28]. It identifies the putative target genes by calculating the complementary ability of microRNA-mRNA interactions. The strength of mRNA repression is estimated for ranking potential candidate targets by employing various features, including thermodynamic, evolutionary, probabilistic or sequence-based features. The prediction results show a list of putative target genes with miRmap scores, which are a predictive reference values. Putative targets with miRmap scores ≥ 90.0 were selected for this project.

Statistical analysis
The raw data extracted from GEO database were statistically analyzed using t-test with Wilcoxon matchedpairs signed rank test by GraphPad Prism 5 software (GraphPad Software, La Jolla, CA, USA).