Identification of novel long non-coding RNAs in triple-negative breast cancer

Triple-negative breast carcinomas (TNBC) are characterized by particularly poor outcomes, and there are no established markers significantly associated with prognosis. Long non-coding RNAs (lncRNAs) are subclass of noncoding RNAs that have been recently shown to play critical roles in cancer biology. However, little is known about their mechanistic role in TNBC pathogenesis. In this report, we investigated the expression patterns of lncRNAs from TNBC tissues and matched normal tissues with Agilent Human lncRNA array. We identified 1,758 lncRNAs and 1,254 mRNAs that were differentially expressed (≥ 2-fold change), indicating that many lncRNAs are significantly upregulated or downregulated in TNBC. Among these, XR_250621.1 and NONHSAT125629 were the most upregulated and downregulated lncRNAs respectively. qRT-PCR was employed to validate the microarray analysis findings, and results were consistent with the data from the microarrays. GO and KEGG pathway analysis were applied to explore the potential lncRNAs functions, some pathways including microtubule motor activity and DNA replication were identified in TNBC pathogenesis. Our study revealed that a set of lncRNAs were differentially expressed in TNBC tissues, suggesting that they may play role in TNBC. These results shed light on lncRNAs’ biological functions and provide useful information for exploring potential therapeutic targets for breast cancer.


INTRODUCTION
Breast cancer is a heterogeneous neoplasm that comprises subtypes with substantial differences in biology and diverse clinical outcomes.As more molecularly targeted therapeutic agents are launched, more clinical remission problems are arising [1].Therefore, identification of novel therapeutic targets is essential to combat breast cancers, especially those lacking estrogen receptor/progesterone receptor and ErbB2 receptor (triple negative breast cancer, TNBC).TNBC accounts for approximately 10-25% of all breast cancers and is of particular clinical interest due to its tendency to affect younger women and refractory to currently available targeted therapy.The molecular mechanisms for aggressive clinical behavior of TNBC are not fully understood.Various studies show that TNBC have broad and diverse categories for which additional subclasses are needed.Thus, there is considerable interest in understanding potential biomarkers that are significantly associated with TNBC prognosis.
Long non-coding RNAs (lncRNAs) are a subclass of noncoding RNAs (ncRNAs) and have sequence lengths of 200 bp and above [2,3].It has become increasingly apparent that lncRNAs contribute to tumor development through many different cellular processes, ranging from transcriptional and post-transcriptional regulation to the control of cell cycle distribution, cell differentiation and epigenetic modifications [4,5].LncRNAs modulate gene transcription regulation by rearranging chromatin via chromosomal looping and by affecting the binding of transcription factors.LncRNAs also affect miRNA functions by controlling pre-mRNA splicing or as miRNA sponges.Recently, accumulating evidence indicates that there is aberrant expression of lncRNAs in many cancer types, including glioma, lung, colorectal and hepatocellular cancers, etc [6][7][8].Although prognostic lncRNAs expression signatures have been defined for some invasive breast carcinomas, little is known about lncRNAs expression in TNBC, and the underlying pathways regulating TNBC aggressiveness remain poorly understood [9].
Here, we analyzed the expression patterns of lncRNAs and mRNAs in TNBC samples and compared them with the corresponding patterns in adjacent nontumorous tissue samples.We identified more than 1,200 unique lncRNAs and mRNAs significantly differentially expressed using microarray technology.Several of the differentially expressed lncRNAs were verified by qPCR in other 12 pairs of tissue samples.To determine the biological roles of these differentially expressed lncRNAs and mRNAs, GO and Pathway analyses were used.Coding-non-coding gene co-expression network identified many lncRNAs, such as lncRNA XR_250621.1,that potentially play a key role in TNBC pathogenesis.Our results suggest that lncRNAs expression patterns may provide new molecular biomarkers for the diagnosis of TNBC.

LncRNAs and mRNAs expression profiles in TNBC
LncRNAs profiling detected 1,403 lncRNAs with significant differential expression levels with at least a two-fold change in TNBC tissues compared with paired normal tissues, with 853 up-regulated and 550 down-regulated respectively.The list of the top 30 differentially expressed lncRNAs identified by microarray analysis was shown in Table 1.Among the dysregulated lncRNAs transcripts, XR_250621.1 (humanseq85285) was the most down-regulated, with an FC of 291.3, whereas NONHSAT125629 (humanseq51739) was the most up-regulated, with an FC of 23.9.Using the same criteria as the lncRNAs, we found 574 up-regulated and 445 down-regulated mRNA transcripts.The most upregulated and downregulated mRNA transcripts were CDCA2 (NM_152562) and ANKRD30A (NM_052997), with FCs of 22.8 and 155.3, respectively (shown in Table 2).Hierarchical clustering of the lncRNAs and mRNAs profile was performed using cluster 3.0.2;Hierarchical clustering of the expression of the 1,403 lncRNAs and 1,019 mRNAs based on centered Pearson correlation clearly separated TNBC from normal tissues (Figure 1).

Validation of the microarray data using qPCR
The most upregulated lncRNA XR_250621.1 and downregulated lncRNA NONHSAT125629 were selected for validation using qPCR.In addition, two lncRNAs (ENST00000503938 and NONHSAT012762) were randomly selected to validate the microarray consistency using qPCR.The results demonstrated that lncRNAs NONHSAT125629 and ENST00000503938 were upregulated and that XR_250621.1 and NONHSAT012762 were down-regulated in the tumor samples compared with NT samples (Figure 2).These qPCR results are consistent with the microarray data.

Go and KEGG pathway analysis
To predict the functions of the lncRNAs, we adopted method originally demonstrated in this paper [10].Briefly, we first calculated the co-expressed mRNAs for each of the differentiated lncRNAs, and then we conducted a functional enrichment analysis of this set of co-expressed mRNAs.The enriched functional terms were used as the predicted functional terms for each given lncRNA.To explore potential biological associations, we ran GO and Pathway analysis with the top 500 differentially expressed lncRNAs and mRNAs.GO analysis indicated that several functional pathways were enriched.Among these pathways, protein binding, fibroblast growth factor-activated receptor activity, structural constituent of ribosome, protein kinase binding and poly(A) RNA binding signaling were the most closely associated with TNBC (Figure 3A).Furthermore, using the same criteria as the GO analysis, KEGG Pathway analysis showed that some pathways corresponded, including ribosome, pathways in cancer, NOD-like receptor signaling pathway, cell cycle, DNA replication, etc. (Figure 3B).

Construction of co-expression network
To explore which lncRNAs and mRNAs play a critical role in TNBC progression, we constructed a co-expression network based on the correlation analysis between the differentially expressed lncRNAs and mRNAs.LncRNAs and mRNAs with Pearson's correlation coefficients of no less than 0.99 were used to construct the network.To explore lncRNAs that possibly have trans-regulating functions, we compared the mRNAs that coexpressed with these lncRNAs with the mRNAs that are regulatory targets of certain Transcription factors (TFs).Results show that EP300, NFYB and E2F1 may play central roles in lncRNAs process (Figure 4).The coexpression network indicated that one mRNA or lncRNA might correlate with one to ten lncRNAs (Figure 2S).The co-expression network may suggest that the interregulation of lncRNAs and mRNAs is involved in TNBC.

DISCUSSION
Pathogenesis of breast cancer remains unclear; therefore, further study of breast cancer is of great importance.As lncRNAs constitute an important class of gene expression regulatory factors, their aberrant expression would inevitably lead to abnormal gene expression levels, which might result in tumorigenesis [2,5].To date, there have been few studies studying lncRNAs expression profile in breast cancer or predicating the association of lncRNA expression with clinical pathological features and outcomes in breast cancer [11,7,12,9,13,14].Thus, lncRNAs have opened a new field of breast cancer genomics.Although there are no drugs that act against lncRNAs presently, it will be fascinating to observe whether drugs could be developed that specifically target lncRNAs.Notably, lncRNAs can be detected in human body fluids and hold great promise as biomarkers.
In the present study, we investigated lncRNAs expression signature of TNBC tumor samples from patients.With abundant and varied probes amounting to 78,243 human lncRNAs and 30,215 coding transcripts in the microarray, a large number of lncRNAs could be determined quantitatively and significant differential expression in cancer tissue compared to normal breast tissue was observed.1,758 lncRNAs and 1,254 mRNAs were found to be significantly differentially expressed.In addition, qRT-PCR was employed to validate the microarray analysis findings, and results were consistent with the data from the microarrays.These results revealed that there were unique lncRNAs expression signatures in these tissues.However, the majority of differentially expressed lncRNAs corresponded to novel transcripts of unknown functions [4,15].In order to obtain insight into lncRNA target gene functions, GO analysis and KEGG pathway annotation were applied to the lncRNA target gene pool.GO analysis revealed that the number of genes corresponding to down-regulated mRNAs were larger than that corresponding to up-regulated mRNAs.These pathways may play important roles in the occurrence and development of TNBC.Increased understanding of the role of these potential endogenous lncRNAs in breast cancer cells could provide additional insight on the role these pathways play in mediating breast cancer progression.
An increasing number of studies have demonstrated that a number of lncRNAs are not transcriptional noise, but have important functions, such as regulating gene expression at various molecular levels, including protein, RNA, miRNA and DNA [16,17].Few studies have focused on how lncRNA genes themselves were transcriptionally regulated.Yang et al. developed a system by which users could browse transcription factor binding sites in the regulatory regions of lncRNAs [18].However, lncRNAs are temporally and spatially expressed and regulated, and motif-based sequence analysis cannot capture the dynamic regulation of lncRNAs by transcription factors.In this study, we constructed a transcription factors-lncRNAs-mRNA network based on expressions in the TNBC tissue and binding sites in the regulatory regions of a specific lncRNA.Results showed that EP300, NFYB and E2F1 played central roles in lncRNAs process and TNBC development, which were consist with previous reports [19][20][21].As more data become available, it will facilitate the research on the transcriptional regulation of lncRNAs.
Several limitations should be acknowledged for this study.First, gene expression microarray have limited dynamic range and lack the ability to discover novel features as splice isoforms or fusion transcripts.RNA-seq technology promises to unravel previously inaccessible complexities in the transcriptome, such as allele-specific expression and novel promoters and isoforms.However, datasets produced are large and complex and interpretation is not straight forward.Second, the sample size of each dataset is relatively small, the significance and robustness of the signature requires further confirmation, ideally with large prospective patient cohorts with prognostic date.Last but not least, although the roles of the lncRNAs in TNBC pathogenesis are presently unclear, our findings suggest that lncRNAs deserve further studied.Additional functional investigations of these lncRNAs on cancer cell lines and xenograft models may increase our outstanding of their roles in determining TNBC prognosis.
To summarize, comprehensive in-depth analysis of the expression profiles of lncRNAs was executed in this study.A set of lncRNAs with differential expression were found in TNBC compared with normal breast tissue.Furthermore, potential roles for these lncRNAs in the regulation of protein binding, fibroblast growth factor-activated receptor activity, structural constituent of ribosome, protein kinase binding and poly (A) RNA binding signaling pathways will be identified.Further investigation of the lncRNAs identified in this study will likely focus on their biological functions and their association with TNBC.Our study provides useful information for exploring potential therapeutic targets for breast cancer.

Ethics statement
This study was approved by the human ethics committee of the Zhejiang Taizhou Hospital, People's Republic of China.All patients are informed and have declared written informed consent that their samples can be used for research.
All patients received tumor resection at Zhejiang Taizhou Hospital and were diagnosed with TNBC histopathologically after surgery.Immunochemical staining of estrogen receptor/progesterone receptor and ErbB2 receptor in 3 samples are shown in Figure S1.There was no radiotherapy or chemotherapy prior to surgery. 3 paired samples were used for microarray analysis of lncRNAs and 12 paired samples were used for an extra evaluation by real-time PCR.Demographic and clinical characterizations of the study population are summarized in Table S1.

Tissue collection and RNA extraction
Paired TNBC tissues and adjacent normal breast tissues from every subject were snap-frozen in liquid nitrogen immediately after resection and stored at -80 °C until use.The mirVana TM RNA Isolation Kit (Ambion, Foster City, CA, United States) was used to extract total RNA from frozen samples, according to the manufacturer's protocols, which were then eluted with 100 mL of nuclease-free water.Total RNA was quantified with the NanoDrop ND-2000 (Thermo Scientific) and the RNA integrity was assessed using Agilent Bioanalyzer 2100 (Agilent Technologies).

LncRNA and mRNA microarray expression profiling
The microarray profiling was conducted in the laboratory of the OE Biotechnology Company in Shanghai, People's Republic of China.The sample labeling, microarray hybridization and washing were performed based on the manufacturer's standard protocols.Briefly, mRNA was purified from total RNA after removal of rRNA by using an mRNA-ONLY Eukaryotic mRNA IsolationKit (Epicentre Biotechnologies, USA).Then, each sample was transcribed to double strand cDNA, then synthesized into cRNA and labeled with Cyanine-3-CTP.The labeled cRNAs were hybridized onto the Human lncRNA array V4.0 (4 × 180 K, Agilent), including the global profiling of 78,243 human lncRNAs and 30,215 coding transcripts.After washing, the arrays were scanned with the Agilent Scanner G2505C (Agilent Technologies).Feature Extraction software (version 10.7.1.1,Agilent Technologies) was used to analyze array images and extract the raw data.Genespring (Version 12.5, Agilent Technologies) was employed to finish the basic analysis of the raw data.To begin with, the raw data were normalized with the quantile algorithm.The probes that had at least 1 condition out of 2 conditions flagged as "P" were chosen for further data analysis.Differentially expressed lncRNAs and mRNAs were then identified through fold-change as well as P values calculated with t-test.The threshold set for up-and down-regulated genes was fold change ≥ 2.0 and p value ≤ 0.05.Afterwards, Hierarchical Clustering was performed to display the distinguishable lncRNAs and mRNAs expression patterns among the samples.

Functional group analysis
GO analysis and KEGG analysis were applied to determine the biological roles of these differentially expressed mRNAs, base on the latest KEGG (Kyoto Encyclopedia of Genes and Genomes) database (http:// www.genome.jp/kegg/).The p value (Hypergeometric-P value) denotes the significance of the pathway correlated to the conditions.The recommend p-value cut-off is 0.05.

Construction of the co-expression network
Potentially trans-regulated protein-coding genes were defined as coexpressed and beyond 100 kb in genomic distance from, or on the other allele of, differentially expressed lncRNAs.The lncRNAs-Transcription factors (TFs) network was constructed using hypergeometric cumulative distribution function with the help of MATLAB 2012b (The MathWorks).The graph of the lncRNAs-TFs network was drawn with the help of Cytoscape 3.01 (Agilent and IBS).If the intersection of these two groups is large enough (P < 0.01, calculated by hypergeometric cumulative distribution function and FDR < 0.01, under the control of the Benjamini and Hochberg procedure), then we predict that these lncRNAs possibly participate in pathways regulated by these TFs.The recently released ENCODE data on TFs and their regulatory targets were used in our analysis

Real-time quantitative reverse transcription-PCR
A two-step reaction process was used for quantification reverse transcription (RT) and PCR.Each RT reaction consisted of 0.5 μg RNA, 2 μL of Primer Script Buffer, 0.5 μL of oligo dT, 0.5 μL of random 6 mers and 0.5 μL of Primer Script RT Enzyme Mix I (TaKaRa, Japan), in a total volume of 10 μL.Reactions were performed in a GeneAmp ® PCR System 7500 (Applied Biosystems) for 15 min at 37 °C, followed by heat inactivation of RT for 5 s at 85 °C.The 10 μL RT reaction mix was then diluted 10-fold in nuclease-free water and held at -20 °C.At the end of the PCR cycles, melting curve analysis was performed to validate the specific generation of the expected PCR product.All experiments were done in triplicate.The expression levels of lncRNAs were normalized to glyceraldehyde-3-phosphate dehydrogenase and were calculated using the 2-ΔΔCt method.The primer sequences were designed in the laboratory based on the DNA sequences and is shown in Table S2.

Statistical analysis
The Statistical Program for Social Sciences (SPSS) 18.0 software (SPSS, Chicago, IL, United States) was employed to perform all the statistical analyses.All data were expressed as the mean ± SD or proportions where appropriate.For comparisons, paired t-tests and unpaired t-tests were performed where appropriate.GraphPad Prism 5.0 for Microsoft Windows (GraphPad Software, San Diego, CA, United States) was used to plot all graphs.P values of 0.05 (two-tailed) were considered statistically significant.

Figure 1 :
Figure 1: Heat map and hierarchical clustering of lncRNA profile comparison between the TNBC and normal breast samples.Each row represents one lncRNA, and each column represents one tissue sample.The relative lncRNA expression is depicted according to the color scale.Red indicates up-regulation; green indicates down regulation.2.0, 0 and -2.0 are folds changes in the corresponding spectrum, whereas N represents normal breast samples tissue and C represents TNBC tissue.The differentially expressed lncRNAs clearly self-segregated into N and C clusters.

Figure 2 :
Figure 2: Comparison between microarray data and qPCR results.A. ENST00000503938 , B. NONHSAT012762 C. XR_250621.1 and D. NONHSAT125629 which were determined to be differentially expressed in TNBC samples compared with NT samples in 3 paired patients by microarray was validated by qPCR in 12 paired tissues.The heights of the columns in the chart represent the log-transformed median fold changes in expression across the 12 patients for the lncRNA validation; the bars represent standard errors.The validation results of the lncRNAs indicated that the microarray data correlated well with the qPCR results.

Figure 3 :
Figure 3: GO analysis A. and KEGG Pathway analysis B. of aberrantly expressed lncRNAs in TNBC.