Analysis of differential gene expression profile identifies novel biomarkers for breast cancer

Breast cancer is the most prevalent cancer diagnosis in women. We aimed to identify biomarkers for breast cancer prognosis. mRNA expression profiling was performed using the GeneChip Human Transcriptome Array 2.0. Microarray analysis and series test of cluster (STC) analysis were used to screen differentially expressed mRNAs and the expression trends of genes. Immunohistochemical staining of 100 clinical specimens was used to validate two differentially expressed genes, ITGA11 and Jab1. Significantly enriched Gene Ontology (GO) terms and pathways were identified, and 26 model profiles were used to summarize the expression patterns of differentially expressed genes. Results of immunohistochemistry were consistent with those of the microarray, in that ITGA11 and Jab1 were differentially expressed with the same trend. Survival analyses using the Kaplan–Meier method demonstrated that high levels of either ITGA11 or Jab1 were significantly associated with worse prognosis in breast cancer patients. Our study identified ITGA11 and Jab1 as novel biomarkers for breast cancer.


INTRODUCTION
Breast cancer is the most common cancer [1] and the second leading cause of cancer deaths in women [2]. The lack of better adjuvant therapy remains a major challenge in reducing the burden on breast cancer patients. The tumor size, lymph node involvement, and distant metastasis (TNM) staging system of the American Joint Committee on Cancer (AJCC) is widely recognized, but there is still no globally accepted system or reliable marker for predicting the prognosis of breast cancer patients. In patients receiving neoadjuvant chemotherapy or endocrine therapy, clinicopathological parameters are often unstable, which complicates assessment of the true prognosis. Therefore, there is a pressing need to find biomarkers for breast cancer that can help to develop better treatment strategies.
Intensive research has been focused on understanding the molecular mechanisms of breast cancer [3]. Many genetic changes that lead to abnormal cellular functions have been identified in breast cancer cells [4,5]. Multiple factors in the tumor microenvironment further influence the cancer progression via a wide variety of receptors and the corresponding signal pathways [6], which also involve various oncogenes and anti-oncogenes [7,8].
Microarray data analysis, which features high throughput and high sensitivity, has made it possible to measure expression changes across the whole genome [9]. There have been many reports on gene expression profiling in breast cancer [10,11]. The development of microarray analysis therefore provides new insight into the diagnosis and treatment of breast cancer.
Understanding new developments in the transcriptome and in pathways may identify novel biomarkers for cancer. Integrin α11 (ITGA11), a member of the integrin family, is involved in various processes that influence the cell's biological behavior, such as metastasis, embryogenesis, hemostasis, immune response, tissue repair, cancer growth, tumor angiogenesis, and resistance to therapy [12,13]. Alterations in integrins disturb cancer cell adhesion and extracellular matrix assembly, which may further lead to tumor metastasis [14]. Integrins also interact with tyrosine kinase receptors, which promote cancer cell proliferation and differentiation [15].
In the current study, we identified potential genes associated with breast cancer tumorigenesis by transcriptional network analysis and further validated ITGA11 from the STC analysis and a differentially expressed gene Jab1/COPS5 in clinical patients.

Transcriptome array analysis of mRNA expression in breast cancer
HE staining confirmed the breast cancer tissue and paired adjacent noncancerous breast tissue (Figure 1A). We used the Affymetrix GeneChip Human Transcriptome Array 2.0 to analyze mRNA expression in the breast tissue and applied the RVM t-test to filter the differentially expressed mRNAs. In breast cancer tissue compared with the adjacent noncancerous breast tissue, 509 mRNAs were significantly down-regulated (fold change > 1.2, p < 0.05) and 1277 mRNAs were markedly up-regulated (fold change > 1.2, p < 0.05) (Figure 1B).
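The filtering step described above can be illustrated with a minimal sketch. It assumes that each gene already has a mean expression value per group on a linear scale and a p-value from the chosen test (the RVM t-test itself is not implemented here), and that "fold change > 1.2" for down-regulated genes means a normal-to-tumor ratio above 1.2. The names `GeneResult` and `classify` and the gene labels are hypothetical.

```python
from typing import NamedTuple, Optional


class GeneResult(NamedTuple):
    gene: str           # hypothetical gene label
    mean_tumor: float   # mean expression in tumor tissue (linear scale)
    mean_normal: float  # mean expression in adjacent tissue (linear scale)
    p_value: float      # p-value from the chosen test (assumed precomputed)


def classify(result: GeneResult,
             fc_cutoff: float = 1.2,
             p_cutoff: float = 0.05) -> Optional[str]:
    """Return 'up', 'down', or None (not differentially expressed)."""
    if result.p_value >= p_cutoff:
        return None
    ratio = result.mean_tumor / result.mean_normal
    if ratio > fc_cutoff:
        return "up"
    # Assumption: down-regulation means normal/tumor exceeds the cutoff.
    if ratio < 1.0 / fc_cutoff:
        return "down"
    return None


if __name__ == "__main__":
    demo = [
        GeneResult("GENE_A", 150.0, 100.0, 0.01),
        GeneResult("GENE_B", 100.0, 150.0, 0.01),
        GeneResult("GENE_C", 150.0, 100.0, 0.20),
    ]
    for row in demo:
        print(row.gene, classify(row))
```

The demo values are invented for illustration and are not taken from the array data.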
To identify functional classes in which the dysregulated mRNAs were overrepresented, the mRNAs were subjected to functional enrichment analysis using FunRich software (Figures 2 and 3). By cellular component, 19.3% of genes were categorized as cytoplasm and 11.8% as plasma membrane (Figure 2A). Among molecular functions, genes were enriched in transporter activity (3.9%), ubiquitin-specific protease activity (2.5%), protein serine/threonine kinase activity (2.2%) and extracellular matrix structural constituent (2.1%) (Figure 2B).
In addition, genes involved in metabolism, energy pathways, cell growth and/or maintenance, and protein metabolism were enriched (Figure 3A). Among biological pathways, the terms "cell cycle, mitotic", "polo-like kinase signaling events in the cell cycle", "PLK1 signaling events" and "membrane trafficking" were significantly overrepresented among the dysregulated mRNAs (Figure 3B).

STC analysis
To further narrow the range of target genes, the expression patterns of the significantly differentially expressed genes across tumor grades were investigated. Each profile consists of a cluster of genes that have similar expression patterns with increasing tumor grade. As shown in Figure 4A, 26 model profiles were used to summarize the expression patterns of these genes. Each box represents a model profile. Among the 26 patterns, 10 expression patterns (profiles 23, 4, 22, 7, 14, 25, 26, 20, 17 and 9) showed significant P values (P < 0.001). Six of these clusters contained genes whose expression was stable (profiles 22 and 14) or gradually elevated (profiles 23, 25, 26 and 17), while genes in profile 4 showed the opposite trend to profile 23. Profile 9 contained genes that were suppressed at grade 1 and then gradually increased in expression at higher tumor grades (Figure 4B).

ITGA11 and Jab1 overexpressed in breast cancer patients
To validate the findings from our transcriptome array analysis, immunohistochemical staining of ITGA11 and Jab1/Cops5 was conducted in 80 breast cancer tissue samples and 20 noncancerous breast tissue samples. As expected, both ITGA11 and Jab1/Cops5 were overexpressed in breast cancer (Figure 5A and 5B). In agreement with the immunohistochemistry findings, data from the online database ONCOMINE indicated that ITGA11 and Jab1/Cops5 mRNA expression levels in breast cancer are much higher than those in normal breast tissue (Figure 5C). www.impactjournals.com/oncotarget
Data from ONCOMINE also indicated that ITGA11 and Jab1/Cops5 expression levels tend to be higher in breast cancer of higher grade (Figure 6A). Furthermore, RNA sequencing analysis of mRNA expression from the GEPIA online database revealed that ITGA11 expression was correlated with Jab1/Cops5 expression in breast cancer patients (Figure 6B). In addition, our immunohistochemistry data in breast cancer tissue indicated that the Jab1 level was associated with the ITGA11 level (Figure 6C) and that both ITGA11 and Jab1 levels were correlated with tumor grade (Figure 6D).

Survival analysis
The median follow-up time was 64 months (range 10-120). To evaluate the prognostic influence of ITGA11 and Jab1/COPS5 expression, we carried out Kaplan-Meier analyses to compare grouped patients. The survival curves demonstrated that high levels of ITGA11 or Jab1/COPS5 were significantly associated with worse OS (p = 0.034 and p = 0.007, respectively; Figure 7A and 7B). We also investigated the relationship between ITGA11 and Jab1/COPS5 expression and the survival of breast cancer patients in the GEPIA database. The online data were consistent with the IHC data, suggesting that high levels of either ITGA11 or Jab1/COPS5 were associated with worse survival in breast cancer (Figure 7C and 7D). However, we did not find a significant influence of endocrine therapy on survival (Figure 7E). We also analysed survival in different subtypes of breast cancer and found similar results (Supplementary Figures 1 and 2).

DISCUSSION
Breast cancer is the most frequent cancer among women worldwide. Screening and diagnosis of breast cancer at earlier stages are of great importance to improve patient survival and reduce treatment costs. However, the underlying mechanisms regulating breast cancer aggressiveness remain poorly understood, and biomarkers for the detection of early-stage breast cancer are still lacking.
The advent of high throughput mRNA microarray analysis method makes it possible to detect the expression of thousands of mRNAs, which allows us to have a clearer picture of the global transcriptome in both cancer tissue and normal tissue [18]. In this study, we assessed the mRNA expression profiles in both breast cancer tissue and paired noncancerous breast tissue using microarray technique and explored their possible functions using GO analysis, KEGG pathway analysis and STC analysis. A number of mRNAs were significantly differentially expressed in breast cancer tissue compared with noncancerous breast tissue. In order to validate the microarray results, we further carried out independent measurement of ITGA11 and Jab1 protein levels in breast cancer and noncancerous tissue samples using immunohistochemistry. The immunohistochemistry results showed good consistency with microarray.
We also performed GO analysis, KEGG pathway analysis, and STC analysis to identify the enriched biological functions among the differentially expressed genes. The genes participated in a variety of molecular functions, cellular components, and biological processes. Many pathways related to cancer were identified by the pathway analysis, among which "cell cycle" and "PLK1 signaling events" were two of the most enriched. The cell cycle is a highly organized and regulated process that ensures duplication of genetic material and cell division. Proliferation depends on progression through four distinct phases of the cell cycle, which is regulated by several cyclin-dependent kinases (CDKs) and their cyclin partners [19]. Cancer growth is caused by abrogation of appropriate cell-cycle control, and many cell-cycle kinases are amplified or overexpressed in cancer [19]. Polo-like kinase 1 (PLK1) plays a vital role in cell cycle progression through mitosis via its effects on chromosome segregation, spindle assembly and cytokinesis [20]. Inhibition of PLK1 delays acentriolar spindle formation during mitosis and promotes apoptosis [21]. Further, PLK1 is an important regulator of the DNA damage checkpoint [22]. PLK1 is overexpressed in a variety of malignancies, including breast cancer [23,24]. Additionally, PLK1 overexpression is associated with poor prognosis in cancer patients [25]. The agreement between these reports and our enrichment results supports the reliability of our microarray study.
Integrins are heterodimeric cell surface adhesion receptors composed of α and β subunits. Twenty-four distinct integrin heterodimers are expressed in mammals as a result of combinatorial association of 18 α and 8 β subunits [26]. Extracellular matrix (ECM) ligands can bind to the α subunit and activate intracellular signaling events via the β subunit to integrate extracellular and intracellular events necessary for cell motility and invasion [14]. Many integrins are expressed at low or undetectable levels in adult epithelia, but are up-regulated in tumors [26]. Integrin α11 (ITGA11) is expressed in many tissues in the embryo but disappears with maturation in adult tissues [27]. However, its expression has been shown to be up-regulated in malignant conditions such as non-small-cell lung carcinoma, where it has been suggested to be connected to cancer cell growth [28,29].
We found that high expression of ITGA11 was correlated with poor prognosis in breast cancer patients, which is in agreement with previous studies showing that integrin expression levels are correlated with prognosis in glioblastoma, melanoma, gastric cancer, cervical cancer, and ovarian cancer [30-33].
Aberrant overexpression of Jab1/COPS5 has been shown to play a role in the pathogenesis of several types of human cancer and to correlate with poor cancer prognosis [34,35]. Jab1/COPS5 isopeptidase activity is essential for human and murine mammary epithelial transformation and progression [36]. Jab1/COPS5 expression was low in or absent from normal breast tissue, while it was abnormally expressed in breast tumors [37]. Importantly, breast cancer patients with Jab1/COPS5-negative tumors had neither relapse nor disease progression at a median follow-up time of 70 months [38]. In line with these results, our study found that Jab1 expression was higher in breast cancer tissue than in noncancerous tissue and that Jab1 expression was associated with tumor grade, suggesting a role for Jab1 in tumor progression.
Taken together, we have identified ITGA11 and Jab1 as biomarkers in breast cancer. High throughput microarray data analysis may act as an efficient tool to discover more prognostic markers and therapeutic targets in breast cancer.

Patients and tissue samples
Three breast cancer tissue specimens and three paired adjacent noncancerous breast tissue specimens were obtained from Anyang Tumor Hospital (Anyang, Henan, China). A further 80 breast cancer patients and 20 breast hyperplasia patients treated at Anyang Tumor Hospital from July 2007 to July 2012 were randomly included in this study for immunohistochemical analysis. Breast hyperplasia is a diagnostic category of proliferative disease that includes inflammatory hyperplasia, atypical ductal hyperplasia and atypical lobular hyperplasia. The inclusion criteria for the participants were: age 18 years or above; diagnosis of breast cancer. Exclusion criteria were: preoperative chemotherapy or radiotherapy; incomplete clinical data or lack of follow-up. The diagnosis of breast cancer and hyperplasia was confirmed pathologically. Patients who had a preoperative diagnosis and had not received preoperative chemoradiotherapy were selected based on the availability of archived paraffin-embedded tissue blocks for immunohistochemistry. Ethical approval from Anyang Tumor Hospital and informed consent from patients were obtained. The clinical and pathological characteristics of the 80 breast cancer patients are summarized in Table 1.

RNA isolation and transcriptome array
Total RNA in the samples was extracted using Trizol reagent. The integrity and concentration of all RNA samples were measured using the NanoDrop 1000 spectrophotometer. The total RNA extracted from three breast cancer tissue specimens and three paired adjacent noncancerous breast tissue specimens was hybridized to an Affymetrix GeneChip Human Transcriptome Array 2.0. The arrays were scanned using GeneChip® Command Console® Software, and the acquired array images were analyzed with Affymetrix GeneChip Operating Software.

Cluster analysis and series test of cluster (STC) analysis
Differentially expressed genes from the microarray data were screened by applying the random variance model (RVM) t-test and were considered to be down- or up-regulated at p < 0.05. Cluster analysis of genes was conducted through the GCBI online system (https://www.gcbi.com.cn/gcuser/html/member/home). The STC algorithm was applied to profile gene expression across the malignancy grade series and to identify the most probable set of clusters generating that series. Because STC takes the dynamic nature of gene expression profiles into account, it can identify the number of distinct clusters. Fisher's exact test was used to examine significant profiles, and p < 0.05 was considered the threshold of significance.
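The idea of grouping genes by their trend across an ordered grade series can be sketched as follows. This is not the GCBI implementation: it assumes a simplified scheme in which each gene's series is shifted to start at zero and assigned to the model profile with the highest Pearson correlation. The three model profiles, the four-point series, and the function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assign_profile(expression: Sequence[float],
                   profiles: Dict[str, Sequence[float]]) -> str:
    """Assign one gene to the best-correlated model profile."""
    # Express each point relative to the first point of the series.
    shifted = [value - expression[0] for value in expression]
    return max(profiles, key=lambda name: pearson(shifted, profiles[name]))


# Hypothetical model profiles over four ordered series points.
MODEL_PROFILES = {
    "rising": [0, 1, 2, 3],
    "falling": [0, -1, -2, -3],
    "dip_then_rise": [0, -1, 0, 1],
}

if __name__ == "__main__":
    print(assign_profile([5.0, 5.5, 6.1, 6.8], MODEL_PROFILES))
    print(assign_profile([5.0, 4.0, 5.0, 6.2], MODEL_PROFILES))
```

In a full analysis, the number of genes assigned to each profile would then be compared with the number expected by chance to decide which profiles are significant; that step is omitted from this sketch.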

Functional enrichment analysis
mRNAs identified in breast cancer were subjected to Gene Ontology (GO) and biological pathway enrichment analysis using FunRich tool (http://www.funrich.org) against human FunRich background database.
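Over-representation of a functional term in a gene list is commonly scored with a hypergeometric test. The sketch below shows that calculation under the assumption that this is the form of test applied; the function name and the counts in the demo are hypothetical and do not come from the present dataset.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int,
                           n_in_term: int,
                           n_selected: int,
                           n_overlap: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_background: genes in the background database
    n_in_term:    background genes annotated with the term
    n_selected:   genes in the submitted list
    n_overlap:    submitted genes annotated with the term
    """
    max_overlap = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = hypergeom_enrichment_p(n_background=1000, n_in_term=50,
                               n_selected=100, n_overlap=12)
    print(f"p = {p:.4g}")
```

When many terms are tested, the resulting p-values would normally be adjusted for multiple testing before interpretation.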

Analysis of clinical mRNA microarrays for the detection of correlations between ITGA11 and patient survival
Transcriptome data from breast cancer patient samples were analyzed using the online database ONCOMINE (https://www.oncomine.org/resource/login.html) to investigate whether the expression of the markers is associated with tumor grade. The RNA sequencing analysis and visualization platform GEPIA (http://gepia.cancer-pku.cn/) was used to determine whether the expression levels of ITGA11 and Jab1 were correlated in breast cancer. GEPIA was also used to determine whether the expression of ITGA11 and Jab1 was correlated with the overall survival of breast cancer patients.

Immunohistochemical analysis
The tissues were processed routinely and stained with hematoxylin and eosin (HE) [39]. ITGA11 and Jab1 levels in the formalin-fixed, paraffin-embedded tissue were evaluated using immunohistochemical staining, as described in our previous work [40]. Briefly, the samples were sectioned and mounted on slides, followed by drying at 60°C for 1 hour. Slides were then deparaffinized in 2-xylene. To retrieve antigen, the slides were boiled for 3 minutes in 0.01 mol/L sodium citrate (pH 6.0) and then cooled at room temperature for 30 minutes. To block endogenous peroxidase activity, the slides were further immersed in 0.3% H2O2. The slides were then incubated with the primary antibodies against ITGA11 (Santa Cruz, sc-98740) and Jab1 (Santa Cruz, sc-13157), diluted 1:200, overnight at 4°C and were detected with a secondary antibody kit (Dako Corp). ITGA11 and Jab1 expression was measured by counting no fewer than 400 tumor cells. Tumor cells were considered positive for a marker when nuclear or cytoplasmic staining was present. Positivity was expressed as the estimated fraction of positively stained cells (-, ≤5%; +, 5% to 25%; ++, 26% to 50%; +++, >50%). All experiments were performed in accordance with the approved guidelines and regulations of Anyang Tumor Hospital.
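The semi-quantitative scoring scheme above can be written as a small helper. This is a sketch: the function names are hypothetical, and because the stated ranges leave the 25% to 26% interval unspecified, values in that gap are assigned to "++" here as an explicit assumption.

```python
def percent_positive(positive_cells: int, counted_cells: int) -> float:
    """Percentage of positively stained tumor cells.

    Enforces the minimum of 400 counted tumor cells stated in the protocol.
    """
    if counted_cells < 400:
        raise ValueError("at least 400 tumor cells must be counted")
    if not 0 <= positive_cells <= counted_cells:
        raise ValueError("positive_cells must be between 0 and counted_cells")
    return 100.0 * positive_cells / counted_cells


def ihc_grade(percent: float) -> str:
    """Map a percentage of positive cells to -, +, ++ or +++."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent <= 5:
        return "-"
    if percent <= 25:
        return "+"
    # Assumption: values between 25% and 26% fall into the "++" class.
    if percent <= 50:
        return "++"
    return "+++"


if __name__ == "__main__":
    # Hypothetical count, for illustration only.
    pct = percent_positive(positive_cells=120, counted_cells=400)
    print(pct, ihc_grade(pct))
```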

Follow-up and statistical analysis
Overall survival (OS) was calculated as the period from initial diagnosis to death from any cause. Before closing the research database, the authors updated the follow-up data of patients who had not visited our outpatient department for more than three months. Patient follow-up was censored at the time of death or finalization of the study. The percentages of ITGA11-positive cells and Jab1-positive cells were presented as means ± standard deviation (SD). Categorical variables were presented as numbers and percentages. Comparisons between groups were carried out with the t-test or one-way ANOVA followed by the LSD test for continuous variables. Multivariate survival analyses were performed to identify independent factors for overall survival. The Kaplan-Meier method was applied for stratified overall survival analysis, followed by the log-rank test. P < 0.05 was considered statistically significant.
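For readers unfamiliar with the Kaplan-Meier method, the product-limit estimator can be sketched in a few lines. The function name and the follow-up times below are hypothetical, and the log-rank comparison between groups is not included.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if the patient died at that time, 0 if censored
    Returns (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up data, for illustration only.
    for t, s in kaplan_meier([10, 20, 30, 40], [1, 0, 1, 1]):
        print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up but do not lower the survival estimate themselves, which is why the curve only steps down at death times.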