Systemic bioinformatics analysis of recurrent aphthous stomatitis gene expression profiles

Recurrent aphthous stomatitis (RAS) represents the most common chronic oral diseases with the prevalence ranges from 5% to 25% for different populations. Its pathogenesis remains poorly understood, which limits the development of effective drugs and treatment methods. In this study, we conducted systemic bioinformatics analysis of gene expression profiles from the Gene Expression Omnibus (GEO) to identify potential drug targets for RAS. We firstly downloaded the gene microarray datasets with the accession number of GSE37265 from GEO and performed robust multi-array (RMA) normalization with affy R programming package. Secondly, differential expression genes (DEGs) in RAS samples compared with control samples were identified based on limma package. Enriched gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of DEGs were obtained through the Database for Annotation, Visualization and Integrated Discovery (DAVID). Finally, protein-protein interaction (PPI) network was constructed based on the combination of HPRD and BioGrid databases. What’s more, we identified modules of PPI network through MCODE plugin of Cytoscape for the purpose of screening of valuable targets. As a result, 915 genes were found to be significantly differential expression in RAS samples and biological processes related to immune and inflammatory response were significantly enriched in those genes. Network and module analysis identified FBXO6, ITGA4, VCAM1 and etc as valuable therapeutic targets for RAS. Finally, FBXO6, ITGA4, and VCAM1 were further confirmed by real time RT-PCR and western blot. This study should be helpful for the research and treatment of RAS.


INTRODUCTION
Despite the rapid development of anticatarrhals and treatment methods, recurrent aphthous stomatitis (RAS) still represents the most common chronic oral disease, which suffered from a long-term painful and characterized by a yellowish ulcer and with an erythematous halo that heals spontaneously [1][2][3]. The prevalence of RAS could variy from 5% to 25% for people with different diet habits and come to the highest for people between 10-40 years of age [4]. Several inducing factors, such as work-rest schedule, stress, smoking, stress, and etc are summaried, and these factors may joint to influence the pathological process. In addition, previous study also showed that RAS could be induced by the abnormal mucosal cytokine cascade, which precipitated by clinical or subclinical trauma and leading to an enhanced cell-mediated immune response directed toward focal areas of the oral mucosa [5,6]. What's more, Wray et al [7] even reported that RAS could induce the initiation of oral squamous cell carcinoma for susceptible population. However, the underlying etiology and pathogenesis remain poorly understood. The progress of molecular has accelerate the emergence of novel drugs and therapeutic methods for the pain relase and reduction of RAS recurrence frequency, while, there is still no effective methods for its healing.
In the past decades, gene microarray has been as a valid method for the parallel quantification of expression values for hundreds, and even thousands of genes at once time [8]. Besides, the application of gene microarray has also promoted the identification of multi disease-related genes, including RAS related genes [9]. In this study, we conducted systemic bioinformatics analysis, including screening of differential expression genes (DEGs), functional enrichment analysis, and network analysis, of gene expression profiles of RAS and adjacent normal tissues which detected through gene microarray for the identification of potential RAS related genes. Besides, several potential RAS related genes identified by bioinformatics were further confirmed by real time RT-PCR. Our study should shed light on the mechanisms of RAS and provide valuable biomarkers for RAS.  Figure 1 illustrated the relative mRNA levels in all of the samples after normalization. Based on the criteria of FDR < 0.05 and fold change > 2, we identified a total of 915 DEGs in RAS samples compared with adjacent nonulcer tissue. Strikingly, more than three quarters (707/915) of DEGs were found to be up-regulated in RAS, while, only 208 genes were down-regulated.

Enriched functions
A total of 215 GO terms and 24 KEGG pathways were found to be significantly enriched in the DEGs. Table  1 shows the top 20 most significantly enriched GO terms which mainly involved in immune and inflammatory response. The full list of KEGG pathways were shown in Table 2. Consistent with the results of GO terms, most of KEGG pathways are closely associated with immune and inflammatory response, such as Cytokine-cytokine receptor interaction, Natural killer cell mediated cytotoxicity. Figure   2 illustrated the significantly enriched KEGG pathways and their hits number, i.e. DEGs contained in them.

Network analysis
The 915 DEGs were found to be involved in 11466 PPI pairs in the union data of HPRD and BioGrid database. FBXO6 has the highest number, i.e. 614, of direct neighborhoods (which referred as degree hereinafter) in the PPI network and there are 14 genes, including BCL6, FOS, IL7R, CEBPB, LCK, ICAM1, STAT1, ISG15, FYN, LYN, UBC, VCAM1, ITGA4 and FBXO6, were found to have degree larger than 100. Besides, modular analysis resulted in 5 network modules with module score > 1.5. Table 3 lists the 5 modules and genes contained in them and Figure 3 illustrated the visualization of modules.

qPCR and Western blot confirm genes that involved in the RAS pathological process
To validate the critical genes participate in the pathogenesis of RAS, ulcerated tissues from RAS patients and normal tissues from healthy donors were subjected to qPCR ( Figure 4A, 4B, 4C) and Western blot ( Figure 4D). The increased expression levels of FBXO6, ITGA4 and VCAM were verified to be consistent with the microarray results.

DISCUSSION
Recurrent aphthous stomatitis (RAS), also known as canker sores and recurrent aphthous ulceration, is a recurrent oral mucosal disease that shows different geographical incidence rates in the world [17]. RAS is known to be extremely aching, and it may even have a negative effect on the quality of life of the affected individual, impairing eating, swallowing and speaking [18]. High-throughput gene expression analysis not only identifies genes that have a potential effect on disease, but also provides an understanding of expression changes during disease progression. In present study, by the comparison of gene expression profile from 14 case samples and 19 control samples, we identified 915 DEGs with adjPval<0.05 and |logFC|>1 including 707 upregulated genes and 208 down-regulated genes.
GO enrichment analysis revealed that DEGs significantly enriched in immune response, inflammatory response and defense response. The healing of an ulcer is a dynamic process usually involving the classic three phases of wound healing known as inflammation, proliferation and remodeling [19]. ITGA4 encodes a member of the integrin alpha chain family of proteins. Integrins are heterodimeric integral membrane proteins composed of an alpha chain and a beta chain that function in cell surface adhesion and signaling [20]. The encoded preproprotein is proteolytically processed to generate light and heavy chains that comprise the alpha 4 subunit. This subunit associates with a beta 1 or beta 7 subunit to form an integrin that may play a role in inflammatory disease [21]. This integrin is a therapeutic target for the treatment of inflammation, including multiple sclerosis, Crohn's disease and inflammatory bowel disease. This gene is a member of the Ig superfamily and encodes a   .154  13  28  CD5, FCGR2A, FYN, IL7R, LAT, LCK, LYN,PIK3R1, PLCG2, SLA,  SYK, TRAT1, WA   2  1.894  66  125   ABL1, ACTB, BRCA1, C1QB, C1QC, CCR5, CD247, CD82, CREBBP,  CXCR4, CYBA, CYBB, EGFR, F2, FLNA, GNB2L1, GRB2, HCK,  HNRNPA2B1, HNRNPM, ICAM1, IFNAR2, ILF3, ITGA4, ITGAL,  ITGB1, ITGB2 cell surface sialoglycoprotein expressed by cytokineactivated endothelium. This type I membrane protein mediates leukocyte-endothelial cell adhesion and signal transduction, and may play a role in the development of artherosclerosis and rheumatoid arthritis [22]. VCAM1 interacts with integrin alpha-4/beta-1 (ITGA4/ITGB1) on leukocytes, and mediates both adhesion and signal transduction. The VCAM1/ITGA4/ITGB1 interaction may play a pathophysiologic role both in immune responses and in leukocyte emigration to sites of inflammation [23]. What's more, proliferation is also very important during the healing of an ulcer. FBXO6 encodes a member of the F-box protein family which is characterized by an approximately 40 amino acid motif, the F-box. The F-box proteins constitute one of the four subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which function in phosphorylation-dependent ubiquitination [24].
The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls containing leucinerich repeats, and Fbxs containing either different protein-protein interaction modules or no recognizable motifs. The protein encoded by this gene belongs to the Fbxs class, and its C-terminal region is highly similar to that of rat NFB42 (neural F Box 42 kDa) which may be involved in the control of the cell cycle [25]. Li and his colleagues report that FBXO6 can significantly promote the growth and proliferation of gastric cancer cells and normal gastric cells and change the cell cycle of them [26]. There were still many deficiencies in our research. For example, only a few genes were confirmed. In future researches, we will further validate the reliable biomarkers for RAS by more functional search. Our final attempts are to find the reliable biomarkers for clinical examination.

Differential expression analysis
Gene microarray analysis was all conducted through R programming software. Briefly, raw CEL data were imported into R through affy version1.54.0 package [10] and performed Robust Multi-Array (RMA) based correction for the comparable of expression values among all samples; the normalized expression profiles were then used for the identification of differential expression genes (DEGs) in RAS samples compared with adjacent nonulcer tissues through linear models for microarray data (limma) version3.32.2 [11]. Genes with fold change > 2 and FDR < 0.05 were considered as significant.

Functional enrichment analysis
We performed functional enrichment analysis for DEGs through the Database for Annotation, Visualization and Integrated Discovery (DAVID) to uncover biological processes involved in RAS. Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with p-value < 0.05 and minimum hits > 10 were considered as significantly enriched [12].

Network analysis
To explore interaction pairs among DEGs, we obtained their protein-protein interaction (PPI) pairs from the union data in HPRD and BioGRID database [13,14]. Cytoscape software [15] was used for the visualization of PPI network. Besides, for the interpretation of whole network, we performed modular analysis for the network based on the MCODE plugin [16] of Cytoscape based on the thresholds of module score > 1.5.

Quantitative PCR
Total RNA was extracted from ulcerated tissues and normal tissues using EasyPure RNA Kit (TransGen Biotech) following the manufacturer's protocol, and then subjected to reverse transcription PCR by EasyScript Reverse Transcriptase kit (TransGen Biotech).

Western blot
20μg proteins were separated on a PAGE-gel and then transferred to nitrocellulose. The filters were blocked with 5% nonfat dry milk powder in TBS. Primary Abs were diluted 1/1000 in TBS containing 3 mg/ml BSA and 0.02% sodium azide, and incubated with the filters overnight at 4°C. After washing with TBST, the filters were incubated 1h with HRP-conjugated goat anti-rabbit IgG. Filters were washed extensively with TBST, and immunoreactive bands were visualized by ECL.