Identification and validation long non-coding RNAs of oral squamous cell carcinoma by bioinformatics method

Gene markers of oral squamous cell carcinoma (OSCC) are of great significance for the early diagnosis and treatment of oral cancer. In this study, we used RNA-Seq data from OSCC patients to screen for differentially expressed long non-coding RNAs (lncRNAs) and to further clarify the underlying molecular mechanism. First, we downloaded OSCC datasets from the National Center for Biotechnology Information (NCBI) and analyzed them with TopHat and Cufflinks to predict lncRNAs. Then, enrichment analysis of the differentially expressed lncRNAs was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, we verified gene expression with in vitro assays. In total, 52 lncRNAs were significantly differentially expressed relative to normal oral tissues. Three highly expressed lncRNAs among them (XLOC_002599, XLOC_002634 and XLOC_132858) were verified by RT-PCR, and the results were consistent with the prediction. XLOC_002634 (GAS5) transcript levels were reduced in both the in vivo and in vitro assays, confirming that GAS5 expression is comparatively low in OSCC. Over-expression of GAS5 in cancer cells inhibited cell proliferation, and the migration and invasion potential of the cancer cells was also reduced compared to control groups. Overall, the study indicates that decreased GAS5 expression may contribute to OSCC pathogenesis and that GAS5 may serve as a potential target for cancer therapy.


INTRODUCTION
Every year, 1.6 million people worldwide are diagnosed with squamous cell carcinoma of the head and neck (SCCHN) [1]. Among these patients, one in five dies from SCCHN, and half of them are killed by oral squamous cell carcinoma (OSCC). OSCC is a highly aggressive tumor that is prone to local recurrence and metastasis [2]. Its development is a complex, multi-step process involving multiple factors and aberrant genes. A growing body of evidence indicates that various regulators are involved in carcinogenesis, but the pathogenesis of OSCC is still not well understood [3]. Recent studies have associated the etiology of OSCC with DNA deletion, loss of heterozygosity and mutation, histone acetylation, gene promoter methylation, and proto-oncogene activation and over-expression [4,5].
Long non-coding RNAs (lncRNAs) are transcripts of 200 nt to 100 kb in length that are found in the nucleus or cytoplasm and have little or no protein-coding capacity [6]. LncRNAs are important participants in gene expression, and their differential expression may affect the corresponding functions. Research on the role of lncRNAs in biological processes is still in its infancy, but evidence points to a close relationship between lncRNAs and the development of cancer [7]. Genes are expressed differentially between normal cells and cancer cells, and significantly differentially expressed lncRNAs might play an important part in cancer pathogenesis. Some genes promote carcinogenesis while others inhibit it [8]. Next-generation RNA sequencing (RNA-seq) [9], a technology with high precision and reliability, has been applied to screen various tumor-related genes, such as lncRNA activated by transforming growth factor β (ATB) in breast cancer [10] and lncRNA nuclear-enriched abundant transcript 1 (NEAT1) in prostate cancer [11].
Growth arrest specific 5 (GAS5) was one of the earliest lncRNAs to be discovered, and its high expression was first observed in growth-arrested NIH3T3 mouse fibroblasts [12]. Recent studies have found low GAS5 expression in many types of tumors, including breast cancer [13], colorectal cancer [14] and prostate cancer [15]; however, its functional significance still needs to be established. Breast cancers show significantly lower GAS5 expression than normal breast epithelial tissues, and GAS5 itself can induce growth arrest and apoptosis independently of other stimuli in breast cancer cell lines [16]. Mourtada-Maarabouni et al. found that RNA interference against GAS5 protects leukemic and primary human T cells from the anti-proliferative effect of rapamycin [17]. Together, this evidence suggests that down-regulation of GAS5 is closely related to the development and metastasis of cancers, and GAS5 has become a hot spot in cancer research.
In this study, we investigated characteristics of the human genome and used RNA-Seq data from OSCC patients to screen for significantly differentially expressed lncRNAs. The functions of the differentially expressed genes were analyzed with bioinformatic methods. The expression level of GAS5 in OSCC was much lower than in normal oral tissues, which suggests that there may be a relationship between GAS5 and the occurrence of OSCC. GAS5 was then transfected into OSCC cells and its role in tumor progression was investigated. This study suggests that lncRNA GAS5 may serve as a therapeutic target for OSCC treatment.

Prediction of lncRNA
The reads were processed and the transcriptome was reconstructed. The overall TopHat read mapping rates were above 99.1%, the pair alignment rates were above 79.7%, and 379946 transcripts were assembled by Cufflinks. Tablemaker was used to filter the transcripts by length, exon number and coverage, which left 48287 transcripts. After known transcripts were screened out, 1426 lncRNAs were obtained through classification code "j" from Cuffmerge.

Function analysis
To further analyze the differentially expressed lncRNAs, we performed enrichment analysis of these genes in DAVID. GO enrichment analysis showed that, for biological process (BP), the differentially expressed genes were associated with biological adhesion, cell adhesion, etc.; for molecular function (MF), the genes were related to actin binding, polysaccharide binding and pattern binding; and for cellular component (CC), the genes were closely associated with the extracellular matrix and cell junctions. The KEGG pathways of these lncRNAs included small cell lung cancer and ECM-receptor interaction (Figure 3).
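As a rough illustration of what an enrichment test of this kind computes, the sketch below evaluates a one-sided hypergeometric p-value for a single annotation term. It is not DAVID's own procedure (DAVID reports a modified Fisher exact score), and the function name and all gene counts are hypothetical, chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of background genes
    annotated:  background genes carrying the term
    selected:   genes in the differential list
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass from `hits` up to the largest possible overlap.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts (not taken from the study): 20000 background genes,
# 400 annotated with one term, 52 genes in the list, 6 of them annotated.
p_value = hypergeom_enrichment_p(20000, 400, 52, 6)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value here only says that the overlap is larger than expected by chance; in practice the values for many terms would also need multiple-testing correction.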

Gene expression by RT-PCR
RT-PCR was used to verify the expression of the lncRNAs. Three highly expressed lncRNAs were selected, including one up-regulated gene (XLOC_002599) and two down-regulated genes (XLOC_002634 and XLOC_132858). The expression levels of these three genes in the OSCC tissues agreed with the prediction (Figure 4A). XLOC_002634 (GAS5), a gene new to OSCC research, was expressed at significantly lower levels in OSCC samples than in normal tissues (p < 0.05). GAS5 was therefore chosen for the subsequent cell experiments. Three groups were used: the C-GAS5 group (cancer cells transfected with pcDNA3.1-GAS5), the C-pcDNA group (cancer cells transfected with the pcDNA3.1 vector only) and the C group (cancer cells without any treatment). After pcDNA3.1-GAS5 was transfected into the cancer cell lines, GAS5 expression in the C-GAS5 group was significantly higher than in the C-pcDNA group and the C group (each p < 0.05, Figure 4B), indicating successful pcDNA3.1-GAS5 transfection.

Cell proliferation
MTT colorimetry was used to detect cell survival, NK activity and cell proliferation. The growth of cancer cells was sharply inhibited 24 h after transfection with pcDNA3.1-GAS5. Furthermore, cell survival in the C-GAS5 group was much lower than in the C-pcDNA group and the C group after 36 h of incubation, and it continued to decline steadily over the next 12 h (Figure 5). The difference before and after GAS5 transfection was significant. However, proliferation did not differ between the C-pcDNA group and the C group, indicating that the pcDNA3.1 vector itself may have no effect on the growth of OSCC cells.
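For readers who want to see how MTT absorbance readings are commonly turned into a survival figure, a minimal sketch follows. The text does not state the formula used, so the conventional ratio of treated to control absorbance is an assumption here, and the function name, the optional blank correction and the readings are all hypothetical.

```python
def relative_viability(od_treated, od_control, od_blank=0.0):
    """Percent viability of treated wells relative to control wells.

    od_treated, od_control: lists of replicate absorbance readings
    od_blank: optional background absorbance (assumed 0 if not measured)
    """
    def mean(values):
        return sum(values) / len(values)

    control_signal = mean(od_control) - od_blank
    if control_signal <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * (mean(od_treated) - od_blank) / control_signal


# Made-up 570 nm readings, four replicates per group.
c_gas5 = [0.41, 0.39, 0.40, 0.40]
c_group = [0.80, 0.82, 0.78, 0.80]
print(f"C-GAS5 viability: {relative_viability(c_gas5, c_group):.1f}% of C group")
```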

Cell migration
Cell migration ability was assessed with a wound healing assay. In Figure 6, the wound mask is color-coded to show the degree of closure: black represents the initial wound mask, and grey represents the area re-covered by cells. Cells transfected with pcDNA3.1-GAS5 appeared arrested, while non-transfected cancer cells spread extensively and covered a larger grey area. These results indicate that over-expressed GAS5 can inhibit the migration of oral cancer cells in vitro.
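The relation between the initial wound mask and the re-covered area can be summarized as a closure percentage. The sketch below assumes that both areas are available in the same units (for example pixels exported from the imaging software); the function name and the numbers are hypothetical and are not taken from Figure 6.

```python
def wound_closure_percent(initial_area, recovered_area):
    """Share of the initial wound area that cells have re-covered, in percent."""
    if initial_area <= 0:
        raise ValueError("initial wound area must be positive")
    if recovered_area < 0:
        raise ValueError("re-covered area cannot be negative")
    return 100.0 * recovered_area / initial_area


# Made-up areas in pixels for two groups measured at the same time point.
print(wound_closure_percent(2000, 1500))  # untreated cells, larger re-covered area
print(wound_closure_percent(2000, 400))   # transfected cells, smaller re-covered area
```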

Cell invasion
Meanwhile, the invasion ability of the cells was assessed with a Transwell assay. Images were captured at random after 48 h. As shown in Figure 7, the number of invading C-GAS5 cells decreased by about half, indicating a reduced capacity to pass through the membrane after transfection. In other words, the down-regulation of lncRNA GAS5 may

DISCUSSION
Extensive research has revealed that abnormal expression of lncRNAs may be closely related to tumors and can be used as a marker for cancer diagnosis [18]. In the flow of genetic information, lncRNAs play an important role in regulating gene expression and affect the main cellular pathways [19]. Because different lncRNAs act through different regulatory modes, predicting the lncRNAs of OSCC and analyzing their differential expression is of great importance for exploring the potential of gene therapy. In this study, we used several bioinformatics tools to screen lncRNAs from OSCC RNA-Seq data. Differential expression analysis identified 52 lncRNAs, of which 31 were up-regulated and 21 were down-regulated. Functional enrichment analysis and protein-protein interaction network analysis of the lncRNAs were carried out with bioinformatics methods. The differentially expressed genes were analyzed in the DAVID database to determine their regulatory functions, which provided useful information for basic research on OSCC and for its clinical application.
Among these significantly differentially expressed genes, GAS5 is a commonly down-regulated gene. Genes of this type are regarded as anti-oncogenes, which can affect cell invasion and metastasis in tumors. GAS5 was new to OSCC research and was chosen as the target gene for validation. RT-PCR results showed that GAS5 expression in OSCC was clearly lower than in normal tissues. In addition, we transfected GAS5 into OSCC cell lines to increase its expression. The OSCC cells were incubated for 48 h and cell proliferation was analyzed with MTT assays, which showed that tumor cells were clearly inhibited in the C-GAS5 group. The wound healing experiment showed lower cell migration ability, and the number of invading cells decreased significantly in the Transwell assay after GAS5 transfection. Together, these results show that over-expression of GAS5 inhibited tumor cell proliferation, migration and invasion, suggesting that down-regulated expression of GAS5 is correlated with the occurrence and development of OSCC.
In summary, bioinformatics methods were used to select lncRNAs, which were then studied in vitro and in vivo. Three significantly differentially expressed genes were verified by RT-PCR, and the cell experiments showed that restoring GAS5 expression can inhibit the proliferation and metastasis of tumor cells. In other words, over-expressed GAS5 can inhibit tumor growth and induce cell apoptosis, so GAS5 may be regarded as an anti-oncogene for OSCC. It may also be used clinically as a new tumor marker and may provide a new target for the treatment of OSCC. However, further clinical research is still needed to clarify the molecular mechanism by which GAS5 may regulate the biological behavior of OSCC.

RNA-seq datasets of OSCC in this experiment were obtained from NCBI [20] (https://www.ncbi.nlm.nih.gov/sra/ERR519502/). The data comprise two groups, the cancer group (C) and the control group (N), with 10 samples each. These 20 samples were analyzed by Illumina sequencing, and data quality was checked with FASTQC [21] to ensure a reliable analysis in the following steps. The data were first mapped to the reference genome (Homo sapiens hg38) using TopHat [22]. Cufflinks was then used to assemble the aligned sequences, and Cuffmerge was used to merge the assemblies [23]. After that, Tablemaker was used for the statistical computations: single-exon transcripts were screened out, and transcripts with an exon length larger than 200 bp and a coverage of no less than 3 were retained. Known transcripts were filtered out by comparing the data with human reference genome databases such as UCSC and ENSEMBL. Finally, the coding potential of the remaining transcripts was assessed with CPAT [24], and protein domains were analyzed with HMMER3.
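The structural filter described above can be sketched in a few lines. This is a minimal illustration, not the Tablemaker procedure itself: it reads the filter as "drop single-exon transcripts, keep length above 200 bp and coverage of at least 3", and the record type, field names and transcript IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    length: int        # total exonic length in bp
    exon_count: int
    coverage: float    # mean read coverage


def is_candidate(t, min_length=200, min_exons=2, min_coverage=3.0):
    """Keep multi-exon transcripts longer than min_length with enough coverage."""
    return (
        t.length > min_length
        and t.exon_count >= min_exons
        and t.coverage >= min_coverage
    )


# Made-up records: only tx1 satisfies all three conditions.
transcripts = [
    Transcript("tx1", length=1500, exon_count=3, coverage=5.2),
    Transcript("tx2", length=900, exon_count=1, coverage=8.0),   # single exon
    Transcript("tx3", length=180, exon_count=2, coverage=4.0),   # too short
    Transcript("tx4", length=2100, exon_count=4, coverage=1.5),  # low coverage
]
candidates = [t.transcript_id for t in transcripts if is_candidate(t)]
print(candidates)
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the candidate count is to each cut-off.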

Differential expression analysis
FPKM values were calculated in the Cufflinks step, and the Cuffdiff procedure was then used to identify the differentially expressed transcripts. The results were visualized in R with the CummeRbund package [23], producing volcano plots and bar plots of the genes. The DAVID [25] database was used for GO term enrichment [26] and KEGG pathway analysis [27].
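To make the quantities in this step concrete, the sketch below spells out the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) and a log2 fold change between two groups. It is only an illustration: Cufflinks and Cuffdiff estimate abundances and test for differences with their own statistical models, and the pseudocount, function names and numbers here are assumptions.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_cancer, fpkm_normal, pseudocount=1.0):
    """log2 ratio of cancer to normal expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for transcripts that are not detected in one group.
    """
    return math.log2((fpkm_cancer + pseudocount) / (fpkm_normal + pseudocount))


# Made-up numbers: 500 fragments on a 2000 bp transcript, 25 million mapped fragments.
print(fpkm(500, 2000, 25_000_000))
# A negative value means lower expression in the cancer group.
print(log2_fold_change(3.0, 7.0))
```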

RT-PCR validation experiment
OSCC tissues and normal oral epithelium tissues were obtained from the Dental Hospital of Chongqing Medical University. Primers for the target genes were designed using Primer 5 and synthesized by Invitrogen. The sequences of the primers were as follows.

In vitro validation experiment
OSCC cell lines were obtained from Chongqing Manuik company and cultured in RPMI1640 medium supplemented with 10% fetal bovine serum.
Gene over-expression: Lipofectamine RNAiMAX from Invitrogen (Carlsbad, CA) was used as the cell transfection reagent. Cancer cells were plated without antibiotics and transfected with 1 ng pcDNA3.1-GAS5 when they had grown to 75-90% confluence. RNA extraction, reverse transcription and qPCR were then performed.
MTT assay: Cells were collected 48 h after transfection. 0.5% MTT was added to a 1 × 10^6/ml cell suspension, and the cells were cultured for a further 4 h. Dimethyl sulfoxide (DMSO) was then used to dissolve the crystals. The absorbance of each well was read with an ELISA reader (OD 570 nm), and the measurement was repeated four times.
Wound healing assay: Cells were seeded in 6-well plates following the transfection protocol. After that, a line was scratched across the middle of each well using a 200 µL pipette tip. Pictures were taken with an inverted microscope (×50) at 0-48 h post-wounding, and the wound mask was calculated with Image Pro Plus software.
Invasion assay: Matrigel mixed with MEM (1:2) was added to the upper chamber of the Transwell and allowed to gel for 30 min at 37°C, and the lower chamber was filled with 500 µL of serum-containing medium. Cells were prepared as a suspension, and 3 × 10^4 cells were added to the upper chamber. After 48 h the supernatant was discarded, and the cells were fixed with paraformaldehyde for 3 min and stained with crystal violet for 5 min. Different fields of view were captured with an inverted microscope (×50) to quantify the invading cells.
[Figure caption fragment: statistical analysis of the numbers of cancer cells and C-GAS5 cells counted in the same field of view (p < 0.05).]
www.impactjournals.com/oncotarget

Statistical analyses
All experimental data were analyzed with SPSS 18.0 software, using one-way ANOVA and the LSD t-test. Measurement data are expressed as mean ± standard deviation (x ± s), and significance was set at p < 0.05.
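For orientation, the F statistic behind a one-way ANOVA can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is a plain-Python illustration and not the SPSS procedure; the function name and the three groups of readings are made up, and turning F into a p-value would additionally require the F distribution from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up absorbance readings for three groups (C-GAS5, C-pcDNA, C).
readings = [
    [0.40, 0.42, 0.38, 0.41],
    [0.79, 0.81, 0.80, 0.78],
    [0.80, 0.82, 0.79, 0.81],
]
f_stat, df_b, df_w = one_way_anova_f(readings)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```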

CONFLICTS OF INTEREST
No potential conflicts of interest were declared.