Competitive endogenous RNA networks: integrated analysis of non-coding RNA and mRNA expression profiles in infantile hemangioma

Infantile hemangioma (IH) is the most common vascular tumour in infants. The pathogenesis of IH is complex and poorly understood. Therefore, achieving a deeper understanding of IH pathogenesis is of great importance. Here, we used the Ribo-Zero RNA-Seq and HiSeq methods to examine the global expression profiles of protein-coding transcripts and non-coding RNAs, including miRNAs and lncRNAs, in IH and matched normal skin controls. Bioinformatics assessments including gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) pathway analyses were performed. Of the 16370 identified coding transcripts, only 144 were differentially expressed (fold change ≥ 2, P ≤ 0.05), including 84 up-regulated and 60 down-regulated transcripts in the IH samples compared with the matched normal skin controls. Gene ontology analysis of these differentially expressed transcripts revealed 60 genes involved in immune system processes, 62 genes involved in extracellular region regulation, and 35 genes involved in carbohydrate derivative binding. In addition, 256 lncRNAs and 142 miRNAs were found to be differentially expressed. Of these, 177 lncRNAs and 42 miRNAs were up-regulated in IH, whereas 79 lncRNAs and 100 miRNAs were down-regulated. By analysing the Ribo-Zero RNA-Seq data in combination with the matched miRNA profiles, we identified 1256 sponge modulators that participate in 87 miRNA-mediated, 70 lncRNA-mediated and 58 mRNA-mediated interactions. In conclusion, our study uncovered a competitive endogenous RNA (ceRNA) network that could further the understanding of the mechanisms underlying IH development and supply new targets for investigation.


INTRODUCTION
Infantile hemangioma (IH) is the most common vascular tumour in childhood, affecting 4% to 5% of infants worldwide [1]. Hemangiomas can show severe progression, which leads to tissue and organ damage that in some cases becomes life-threatening. Clinical treatment varies, including steroids, interferon-alfa, and β-blocker propranolol [2,3]. However, no defi nitive therapy is available for IH due to the adverse effects of each drug. The risk factors for IH include preterm birth and placental anomalies [4]. In most cases, IH has a unique clinical course with proliferation and involution phases [5]. Numerous genes involved in IH have been identifi ed. However, the pathogenesis and cause of hemangioma remain largely unknown.
The competitive endogenous RNA (ceRNA) hypothesis proposes that RNA transcripts, both coding and non-coding, compete for post-transcriptional control and coregulate each other using microRNA response elements (MREs) [6,7]. Mounting evidence has shown that long non-coding RNAs and messenger RNAs can function as ceRNAs in diverse physiological and pathophysiological states such as myogenesis, melanoma development and cancer [8][9][10][11]. A recent study profi led the expression of distinct long non-coding RNAs (lncRNAs) in infantile Research Paper www.impactjournals.com/oncotarget hemangioma using microarray analysis and suggested that lncRNAs regulated several genes with important roles in angiogenesis [12]. Endothelial and circulating C19MC microRNAs are biomarkers of infantile hemangioma [13]. Additionally, integrative meta-analysis identifi ed microRNA-regulated networks in infantile hemangioma [14]. However, the role of the ceRNA network in IH has not been elucidated.
In this study, we used Ribo-Zero RNA-Seq and HiSeq to examine the global expression profi les of proteincoding transcripts and non-coding RNAs, including miRNAs and lncRNAs, in IH and matched normal skin controls. Subsequently, gene ontology and pathway analysis displayed that, compared with the matched normal skin controls, many processes over-represented in IH were related to immune system processes, extracellular region regulation, and carbohydrate derivative binding. Further ceRNA network analysis identifi ed 1256 sponge modulators including 87 miRNA-mediated, 70 lncRNAmediated and 58 mRNA-mediated interactions. Our study may help expand understanding of the roles of the transcriptome, particularly non-coding transcripts, in the mechanisms underlying IH development and provide new research directions.

Differential expression profi les and bioinformatics analysis of mRNAs in IH compared with matched normal skin controls
To profi le differentially expressed mRNAs, lncRNAs and miRNAs in IH, we performed RNA-seq on 3 IH samples and matched normal skin controls. We used an Illumina HiseqXTen platform (Illumina, San Diego, CA) for sequencing with (2 × 150 bp) the pairedend module. Fold changes (IH vs. matched normal skin controls) and p values were calculated from the normalized expression levels. Hierarchical clustering showed distinguishable mRNA expression patterns among the samples ( Figure 1A). Up to 144 mRNAs were differentially expressed in the IH samples compared with the matched normal skin controls (fold change ≥2, P ≤ 0.05; for a list of differentially expressed mRNAs, see Table 1). A total of 84 and 60 mRNAs were up-regulated or down-regulated, respectively, by more than two-fold in IH vs. adjacent normal skin tissues (P < 0.05) ( Figure 1B). KEGG Pathway analysis indicated that the chemokine, NF-kappa B and TGF-beta signalling pathways, as well as cell adhesion molecules (CAMs), were mostly found in the IH samples compared with matched normal skin ( Figure 1C). In addition, gene ontology (GO) analysis revealed that numerous biological processes, molecular functions and cellular components were involved. Many of the processes that are deregulated in IH were related to immune system processes, carbohydrate derivative binding and extracellular region regulation ( Figure 1D).

Bioinformatics analysis of lncRNAs in IH compared with matched normal skin controls
Using the Gencode, RefSeq and UCSC Knowngene databases for non-coding transcripts, we identifi ed 256 differentially expressed lncRNAs with greater than twofold changes in IH and p values < 0.05 (Figure 2A). Of these, 177 were overexpressed and 79 were underexpressed in IH relative to the matched normal skin controls (fold change ≥ 2, P ≤ 0.05; for a list of differentially expressed lncRNAs, see Table 2). LncRNAs (long non-coding RNAs), are defi ned as greater than 200 nucleotides in length, transcribed by RNA polymerase II (RNA PII), and lacking an open reading frame [15]. LncRNAs have been found to regulate protein-coding (pc) gene expression at both the transcriptional and post-transcriptional levels [16]. To identify the potential mRNA targets of lncRNAs, we use RNAplex to predict the binding of lncRNAs with the antisense target mRNAs. mRNAs 10 kb upstream or downstream of lncRNAs were considered to be the conceivable lncRNA targets and defi ned as cis target mRNAs. Gene ontology (GO) analysis revealed that cis target mRNAs of differentially expressed lncRNAs were mostly involved in regulatory mechanisms related to transcription, nucleic acid binding transcription factor activity and intracellular components ( Figure 2B). KEGG Pathway analysis indicated that the MAPK signalling pathway, regulation of autophagy and metabolic pathways were implicated for the cis target mRNAs of differentially expressed lncRNAs ( Figure 2C).

Differential expression and bioinformatics analysis of miRNAs in IH compared with matched normal skin controls
We also determines the miRNA expression profi les between IH and matched normal skin controls using HiSeq. One hundred forty-two miRNA candidates were found to be differentially expressed (fold change ≥ 2, P ≤ 0.05; for a list of differentially expressed miRNAs, see Table 3). Of these, 42 miRNAs were up-regulated in IH, whereas 100 miRNAs were down-regulated ( Figure 3A). To examine the potential biological functions of the miRNAs of interest in IH, we use miRanda, targetscan and PITA software to identify the target genes of known miRNAs with differential expression profi les and extracted intersections or unions of target genes as the fi nal prediction result. Gene ontology (GO) analysis revealed that the target mRNAs of differentially expressed miRNAs were mostly involved in cellular processes, cell components and binding ( Figure 3B). www.impactjournals.com/oncotarget

Real-time quantitative PCR validation
To validate the RNA-seq data, we randomly selected 9 differentially expressed RNAs (bold-face type in Tables 1-3). Real-time quantitative PCR (qRT-PCR) and Bulge-Loop TM qRT-PCR analyses were performed on an additional 9 independent IH skin samples ( Table 4). The results revealed that similar up-regulation or downregulation patterns were observed in both the RNAseq and qRT-PCR samples for the 9 RNAs ( Figure 4, The top 20 pathways that are associated with the coding genes are listed. The enrichment Q value or false discovery rate correct the p value for multiple comparisons. P values are calculated using Fisher's exact test. The term/pathway on the vertical axis is drawn according to the fi rst letter of the pathway in descending order. The horizontal axis represents the enrichment factor, i.e., (the number of dysregulated genes in a pathway/the total number of dysregulated gene)/(the number of genes in a pathway in the database/the total number of genes in the database). Top 20 enriched pathways are selected according to the enrichment factor value. The selection standards were the number of genes in a pathway ≥4. The different colours from green to red represent the Q value (False discovery rate value). The different sizes of the round shapes represent the gene count number in a pathway. (D) Most enriched GO terms of the three ontologies that are associated with the differentially expressed coding genes are listed. The horizontal axis represents the gene number. The term/GO on the vertical axis is drawn according to the fi rst letter of the GO in descending order. Red bar represents the biological process, blue bar displays the molecular function, and green bar illustrates the cellular component. Norm or Ctr, matched normal skin tissue; Tum, infantile hemangioma skin tissue. bold in Tables 1-3). Therefore, our RNA-seq data were reliable and stable. Among the 9 RNAs, IFI44L, ISG15, TCONS_00088818, TCONS_000112159, TCONS_000125870, miR-503-5p and miR-524-3p were all expressed to a greater extent in the IH tissues than in the matched normal skin controls.

The ceRNA network construction
Recently, lncRNAs and mRNAs have been demonstrated to be function as ceRNAs in diverse physiological and pathophysiological states. It is known that mRNAs or lncRNAs can bind miRNAs through Ctr, matched normal skin tissue; Tum, infantile hemangioma skin tissue.
microRNA response elements (MREs). Therefore, we use rna22 and targetscan to screen the lncRNAs and mRNAs with MREs. To ascertain the associations of differentially expressed lncRNAs and miRNAs with mRNAs, based on the 142 differentially expressed miRNAs, 144 differentially expressed mRNAs, and 256 differentially expressed lncRNAs, an lncRNA-miRNA-mRNA correlation network was constructed ( Figure 5). The network displayed the associations among 87 miRNAs, 70 lncRNAs and 58 mRNAs mediated interactions.
For example, miR-26a-1-3p could bind to lncRNAs TCONS_00074621 and mRNA CD24, miR-24-3p bind with TCONS_00000006 and LEFTY2, moreover, miR-514a-3p bind to TCONS_00040753 and FLT1. As shown in Figure 5, one miRNA was associated with one to tens of mRNAs and vice versa. One lncRNA was related to one to tens of miRNAs. In total, 1256 sponge modulators participated in 87 miRNA-mediated, 70 lncRNA-mediated and 58 mRNAs' transcripts-mediated interactions. These fi ndings indicated that the expression profi les of miRNAs, mRNAs and lncRNAs were signifi cantly correlated.

DISCUSSION
Currently, with the advent of next-generation sequencing technologies, RNA-Seq is gradually replacing microarrays for the detection of transcript expression profi ling. IH is one of the most common tumors diagnosed    [20]. Genome-wide transcriptional profi ling of vessels from proliferating and involuting hemangiomas has been used to identify differentially expressed genes [19]. A lncRNA microarray study reported that a large number of genes are differentially expressed in IH [12]. Integrative meta-analysis identifi es miRNA-regulated networks in IH [14]. Here, based on RNA-seq technology, using IH tissues and matched normal skin, we presented a new ceRNA network that determined the functions of particular miRNAs, lncRNAs and mRNAs in IH development ( Figure 5). Alterations in one ceRNA may have striking effects on the integrated ceRNA and transcriptional networks. Taken miR-664a-3p as an example, it was down-regulated in IH tissues from the small RNA-seq data (Table 3). Bulge-Loop TM qRT-PCR demonstrated that expression of miR-664a-3p was decreased in another nine IH tissues (Figure 4). The ceRNA network analysis revealed that three lncRNAs TCONS_00020616, TCONS_00058199 and TCONS_00108595 could bind to miR-664a-3p. Moreover, nine mRNAs including ADCYAP1, CSMD3, IL18R1, IKZF3, CD8A, FGL2, FUT9, IGF2BP1, and TMEM38B were predicted to bind with miR-664a-3p ( Figure 5). This ceRNA network indicated that those nine mRNAs' expression could be regulated by miR-664a-3p, and three lncRNAs TCONS_00020616, TCONS_00058199, TCONS_00108595 could compete to bind with miR-664a-3p and then affect those nine mRNAs' expression.
The pathogenesis of IH has been linked to pathways affecting angiogenesis and vasculogenesis [21]. Those microarray analyses concluded that angiogenin may be a useful serum marker for hemangiomas [22], IH endothelial cells (HEMECs) refl ects a pro-proliferative cell type with altered adhesive characteristics [17], proliferating hemangiomas display increased expression of genes involved in endothelial-pericyte interactions, as well as those involved in neural and vascular patterning [19], lncRNAs likely regulate several genes in angiogenesis [12], miRNA-mRNA expression networks display that deregulated genes play roles in cell growth and differentiation, cell signaling, angiogenesis and vasculogenesis [14]. In the present study, using RNA-seq technology, we found that deregulated mRNAs related to immune system processes, carbohydrate derivative binding, extracellular region regulation, chemokine, NF-kappa B and TGF-beta signalling pathways, as well as cell adhesion molecules (CAMs) (Figure 1). Moreover, cis target mRNAs of differentially expressed lncRNAs were mostly involved in regulatory mechanisms related to transcription, nucleic acid binding transcription factor activity, intracellular components, MAPK signalling pathway, regulation of autophagy and metabolic pathways ( Figure 2). Additionally, target mRNAs of differentially expressed miRNAs were Endothelial TGF-β signalling has been implicated in the regulation of angiogenesis [23]. Expression of NFkappa B target genes was demonstrated in proliferating  Figure 4: The differential expression of mRNAs, lncRNAs and miRNAs between additional IH skin (n = 9) and matched normal skin tissues (n = 9). mRNAs and lncRNAs expression were validated by quantitative real-time PCR using 2 (-△△Ct) method. miRNAs expression was validated by Bulge-Loop TM qRT-PCR. * P < 0.05, ** P < 0.01.
IH. Targeting NF-kappa B in infantile hemangiomaderived stem cells reduced VEGF-A expression [24]. The chemokine CXCL-14 has been reported to be involved in the occurrence and development of infantile hemangioma [25]. Our RNA-seq data found that TGF-beta, NF-kappa B signalling and chemokine signalling were involved in the pathogenesis of IH. The results are consistent with those of previous studies, which suggested that the RNA-seq data are reliable.
By carefully comparing our data with other's, IGF2, FOXF1 and EGFL7 were reported to be up-regulated in IH, FOXC1 and EGFR were down-regulated in IH [12]. In this study, we found that IGF2 mRNA-binding proteins (IGF2BPs) including IGF2BP1, IGF2BP2 and IGF2BP3 were all downregualted in IH. Although recent publication reported that results obtained by RNA-seq and microarrays were highly reproducible [26], some discrepancy may be existed in the differentially expressed RNAs. Therefore, further demonstrating the function of particular RNA in IH development is urgently needed. In addition, larger samples are needed to perform receiver operating characteristic (ROC) curve analysis to prove that some of the IH RNAs are promising biomarkers.
Taken together, understanding the functional interactions among lncRNAs, miRNAs and mRNAs could lead to new explanations for IH disease pathogenesis [7]. Further elucidating the underlying mechanisms of the functions of miRNAs, lncRNAs and mRNAs in IH would be helpful in revealing the biological aetiology and potentially provide useful information for IH evaluation and treatment.

Ethics statement
This study was approved by the Medical Ethics Committee of the Obstetrics and Gynaecology Hospital Red color represents the mRNA name, green color displays the lncRNA name, and orange color illustrates the miRNA name. Taken TCONS_00126150 | CD24 and TCONS_00126153 | CD24 as an example, the TCONS_ number before | means the ID number of each transcript, one gene has many transcripts.
affi liated with Nanjing Medical University (No. [2015] 91). Children with IH underwent surgery at out hospital. The IH samples and matched normal skin controls were collected from patients who underwent surgery and whose parents provided written consent.

Tissue samples
Proliferating capillary infantile hemangioma (IH) and matched normal skin tissues were obtained from 12 different patients who were admitted to the Obstetrics and Gynaecology Hospital affi liated with Nanjing Medical University for IH removal. Patient information is listed in Table 4. A diagnosis of proliferative infantile hemangioma was confi rmed by routine pathological examination. The collected skin samples were immediately frozen in liquid nitrogen for total RNA preparation.

Total RNA isolation
Total RNA was extracted from biopsy samples using the Qiagen RNeasy mini kit (Qiagen, Valencia, CA). After ribosomal RNA depletion, RNA-seq libraries were prepared using ScriptSeq complete kits from Epicenter (Madison, WI). RNA purity was assessed using the Nano Photometer ® spectrophotometer (IMPLEN, CA, USA), and RNA concentration was measured using the Qubit® RNA Assay Kit in Qubit ® 2.0 Flurometer (Life Technologies, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit from the Bioanalyser 2100 system (Agilent Technologies, CA, USA).

Library preparation, quality examination and sequencing for mRNAs and lncRNAs
The sequencing libraries were prepared following manufacturer recommendations from the VAHTS TM Total RNA-seq (H/M/R) Library Prep Kit for Illumina ® . The details of library construction were patented by the company (Vazyme, China). After cluster generation, the libraries were sequenced on an Illumina Hiseq X10 platform, and 150-bp paired-end reads were generated.
Raw reads in fastq format were fi rst processed using in-house perl scripts. Clean reads were obtained by removing reads with adapters, reads in which unknown bases were more than 5% and low quality reads (if the percentage of low quality bases was greater than 50% in a given read, we defi ned the low quality base to be the base whose sequencing quality was no more than 10). At the same time, Q20, Q30, and GC contents were calculated for the clean reads. All downstream analyses were based on the clean reads.
The reference genome and gene model annotation fi les were downloaded directly from the genome website. The reference genome index was built using Bowtie (v2.1.0) [27], and the paired-end clean reads were aligned to the reference genome using TopHat (v2.1.1) [28].

Transcriptome assembly, lncRNA prediction and target gene prediction
The mapped reads from each sample were assembled using Cuffl inks (v2.2.1) [29] with a referencebased approach. Cuffl inks uses a probabilistic model to simultaneously assemble and quantify the expression levels of a minimal set of isoforms, which provides a maximum likelihood explanation of the expression data in a given locus. Then, Cuffmerge was used to merge these sample assemblies into a master transcriptome, which was compared to known transcripts via Cuffcompare. The lncRNAs were predicted by several strict steps based on RNA structural characteristics and non-coding properties. The steps were as follows: 1) transcripts, not in any class code of " j, i, o, u, x ", were fi ltered out; 2) transcripts shorter than 200 bp were fi ltered out; 3) transcripts aligned to sequences in the NONCODE database [30] by blastn were identifi ed as known lncRNAs; 4) the retained transcripts (known lncRNAs were not included) were used to predict protein coding potential by Coding Potential Calculator (CPC) [31] and TransDecoder (http:// transdecoder.github.io/), transcripts with coding potential were removed, and those without coding potential were identifi ed as novel lncRNAs. The known lncRNAs and novel lncRNAs were together used for subsequent analyses.
LncRNAs can negatively or positively affect expression of the downstream gene via an upstream noncoding promoter. Genes within 10 kb upstream or downstream of lncRNAs were abstracted by bedtools (http://bedtools.readthedocs.io/en/latest/) as lncRNA target genes. However, antisense lncRNAs can regulate overlapping sense transcripts. Transcripts that overlapped with LncRNAs on the opposite strand were also identifi ed as lncRNA target genes, and the interactions between lncRNAs and transcripts were revealed by RNAplex [32].

Quantifi cation of gene expression levels and differential expression analysis
Cuffdiff (v2.2.1) [33] was used to calculate FPKMs for both lncRNAs and coding genes in each group. Gene FPKMs were computed by summing the FPKMs of the transcripts in each gene group. FPKM stands for "fragments per kilobase of exon per million fragments mapped" and is calculated based on the length of the fragments and the reads count mapped to each fragment.
Cuffdiff (v2.2.1) provides statistical routines for determining differential expression in digital transcripts or gene expression datasets using a model based on a negative binomial distribution. Transcripts or genes with corrected p values less than 0.05 and absolute values of log2 (fold change) <1 were classifi ed as signifi cantly differentially expressed. www.impactjournals.com/oncotarget

Small RNA sequencing and bioinformatics analysis
Total RNA was separated by 15% agarose gels to extract the small RNA (18-30 nt). After ethanol precipitation and centrifugal enrichment of small RNA samples, the library was prepared according to the methods and processes described in the Small RNA Sample Preparation Kit (Illumina, RS-200-0048). Insert size was assessed using the Agilent Bioanalyser 2100 system (Agilent Technologies, CA, USA), and after the insert size was consistent with expectations, qualifi ed insert size was accurately quantitated using a Taqman fl uorescence probe from the AB Step One Plus Real-Time PCR system (Library valid concentration > 2 nM). The qualifi ed libraries were sequenced using an Illumina Hiseq 2500 platform, and 50-bp single-end reads were generated.
First, the tags were mapped to the reference genome by SOAP [34] to analyse their distributions within the genome and were aligned to the miRBase database [35] using blast. The tags were identifi ed as known miRNAs when they satisfi ed the following criteria: 1) there were no mismatches when aligned to the miRNA precursors in the miRBase database; 2) based on the fi rst criteria, the tags were aligned to the mature miRNAs in the miRBase database with at least 16-nt overlap while allowing offsets. The miRNA target genes were predicted using two software programs (targetscan and miRanda) as we previously described [36], and the intersection of target genes (the intersections were the same target genes of the same miRNAs) were the fi nal target genes.
The miRNA expression levels were measured by "Transcripts Per Kilobase Million" (TPM). TPM = 10 C / L 6 where C is the read count of a miRNA and L is the total count of clean reads in sample.
Differentially expressed miRNAs were evaluated using the following statistical tests: 1) Statistical algorithm developed by Audic and Claverie (1997) [37] P y | x = N N x + y !
x!y! 1+ N N where N 1 is the total clean reads from sample 1, N 2 is the total clean reads from sample 2, x is the number of reads from sample 1 mapped to miRNA A and y is the number of reads from sample 2 mapped to miRNA A.

Gene Ontology (GO) and KEGG enrichment analysis
GO enrichment analysis of differentially expressed genes or target genes of differentially expressed lncRNAs was implemented using a perl module (GO::TermFinder) [38]. GO terms with corrected p values less than 0.05 were considered to be signifi cantly enriched among the differentially expressed genes or the target genes of differentially expressed lncRNAs. R functions (phyper and qvalue) were used to test for statistical enrichment of the differentially expressed genes or target genes of the differentially expressed lncRNAs among the KEGG pathways. KEGG pathways with corrected p values less than 0.05 were considered to be signifi cantly enriched among the differentially expressed genes or the target genes of the differentially expressed lncRNAs.

Validation of RNA-seq data
To confi rm the RNA-seq data, the expression profi les of randomly selected mRNAs and lncRNAs were tested in another 9 IH patients using quantitative real-time polymerase chain reactions (qRT-PCR) with the SYBR green method on an Applied Biosystems ViiA™ 7 Dx (Life Technologies, USA). Patient information is listed in Table 4. The sequences of the specifi c PCR primer sets used for qRT-PCR are listed in Table 5. The RNA expression levels were normalized to the internal control gene, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), using the 2 (-△△Ct) method as we previously described [39]. Three selected miRNAs were further examined by Bulge-Loop TM qRT-PCR according to the manufacturer's protocol ( RIBOBIO, Guangzhou, China) with the SYBR green method on an Applied Biosystems ViiATM 7 Dx (Life Technologies, USA). The miRNA expression levels were normalized to u6 (RIBOBIO, Guangzhou, China), using the 2 (-△△Ct) method.