Identification of key pathways and genes in response to trastuzumab treatment in breast cancer using bioinformatics analysis

Breast cancer (BC) is one of the leading causes of death among women worldwide. The gene expression profile GSE22358 was downloaded from the Gene Expression Omnibus (GEO) database, which included 154 operable early-stage breast cancer samples treated with neoadjuvant capecitabine plus docetaxel, with (34) or without trastuzumab (120), to identify gene signatures during trastuzumab treatment and uncover their potential mechanisms. The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analyses were performed, and a protein–protein interaction (PPI) network of the differentially expressed genes (DEGs) was constructed by Cytoscape software. There were 2284 DEGs, including 1231 up-regulated genes enriched in DNA replication, protein N-linked glycosylation via asparagine, and response to toxic substances, while 1053 down-regulated genes were enriched in axon guidance, protein localization to plasma membrane, protein stabilization, and protein glycosylation. Eight hub genes were identified from the PPI network, including GSK3B, RAC1, PXN, ERBB2, HSP90AA1, FGF2, PIK3R1 and RAC2. Our experimental results showed that GSK3B was also highly expressed in breast cancer tissues and was associated with poor survival, as was β-catenin. In conclusion, the present study indicated that the identified DEGs and hub genes further our understanding of the molecular mechanisms underlying trastuzumab treatment in BC and highlighted GSK3B, which might be used as a molecular target for the treatment of BC.


INTRODUCTION
Breast cancer (BC) is one of the leading causes of death among women worldwide [1]. The increasing incidence of BC is due to various genetic and environmental changes that lead to the disruption of the cellular signaling network [2][3][4][5]. In 2012, there were 14.1 million new cancer cases globally, and BC accounted for 11.8% of them [6]. The introduction of trastuzumab, a humanized monoclonal antibody targeting the extracellular domain of HER2, revolutionized the treatment of HER2positive BC [7][8][9]. The current gold standard in clinical practice is 1 year of adjuvant trastuzumab administration. Several trials have been conducted, looking to further refine the adjuvant treatment of patients with early-stage HER2-positive BC.
Catenin beta-1, also known as β-catenin, is a protein that is encoded by the CTNNB1 gene. β-Catenin is an oncogene that plays a key role in the signaling output of the canonical Wnt cascade [10]. Wnt signaling results in Research Paper www.oncotarget.com β-catenin accumulation and transcriptional activation of specific target genes that regulate a remarkable variety of cellular processes, such as cell proliferation, cell survival and migration [11]. Mutations and overexpression of β-catenin are associated with many cancers, including hepatocellular carcinoma, colorectal carcinoma, lung cancer, breast tumors, ovarian cancer and endometrial cancer [12]. Although mutation of CTNNB1 is rare in BC [13], mounting evidence has revealed that the mutations in CTNNB1 are often associated with an upregulation of β-catenin and the pathogenesis of endometrial cancer and ovarian cancer [14]. Several variants of CTNNB1 were found to be associated with BC risk [15,16], but the mechanism of CTNNB1 in BC is still unknown.
Glycogen synthase kinase 3 beta, also known as GSK3B, is an enzyme that in humans is encoded by the GSK3B gene. GSK3B, a substrate of PI3K/ Akt signaling, plays an important role in fundamental functions, such as the cell cycle, cytoskeletal integrity, apoptosis, transcription factor expression and formation of neurofibrillary tangles through PI3K/Akt signaling [17,18]. GSK3B has also been shown to interact with CTNNB1 [19], as the phosphorylation of GSK3B leads to β-catenin nuclear accumulation [20].
There are no reliable biomarker profiles available to identify key genes and pathways in BC with trastuzumab treatment. Furthermore, numerous clinical studies have been performed with data (GSE22358) from the Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/ geo/) [21][22][23][24][25]. The Glück et al. [26] dataset (GSE22358) includes 154 stage II-III samples from patients with operable early-stage breast cancer prior to neoadjuvant chemotherapy of capecitabine plus docetaxel, with (34) or without (120) trastuzumab. The data include histologic grade, molecular subtype, ER-status, PR-status, HER2status, p53 status and response to treatment. Therefore, we chose GSE22358 to identify a molecular predictor of trastuzumab benefit in BC.
In this study, we chose GSE22358 from GEO and used the GEO2R online tool to detect the differentially expressed genes (DEGs). Subsequently, the DEGs were screened using Gene-E software and hub genes with a high degree of connectivity were selected. Next, we established a PPI network of the DEGs gene ontology (GO) and pathway enrichment analysis. Moreover, analyses of biological process (BP), molecular function (MF), cellular component (CC) and KEGG pathways of the DEGs and three modules were performed. Overall survival (OS) analysis of these hub genes was performed using the Kaplan-Meier plotter online database (http://kmplot. com/analysis/). By analyzing the biological functions and pathways, we may gain further insight into BC treatment at a molecular level and explore the potential candidate biomarkers for diagnosis, prognosis, and drug targets.

Identification of DEGs
A total of 154 operable early-stage BC samples receiving neoadjuvant capecitabine plus docetaxel, with (34) or without trastuzumab (120) were analyzed. The series from each chip was analyzed separately using GENE-E software, which identified the DEG lists. Based on the GENE-E analysis, using P < 0.05 criteria, a total of 2284 genes were identified after the analyses of GSE22358 of which 1231 were up-regulated and 1053 were down-regulated. The DEG expression heat map (top 50 up-regulated and down-regulated genes) is shown in Figure 1. We uploaded all DEGs to the online software DAVID to identify overrepresented GO categories and KEGG pathways. GO analysis results showed that up-regulated DEGs were significantly enriched in biological processes (BP), including DNA replication, protein N-linked glycosylation via asparagine, response to toxic substance, mRNA 3′-end processing, and anaphase-promoting complex-dependent catabolic process (Table 1); the down-regulated DEGs were significantly enriched in biological processes, including axon guidance, protein localization to plasma membrane, protein stabilization, protein glycosylation, and regulation of phosphatidylinositol 3-kinase signaling (Table 1). For molecular function (MF), the upregulated DEGs were enriched in DNA polymerase binding, ATPase activity, protein binding, microtubule binding, and poly(A) RNA binding, and the down-regulated DEGs were enriched in solute:proton symporter activity, metallocarboxypeptidase activity, ErbB-3 class receptor binding, calcium-dependent protein binding, and protein tyrosine phosphatase activity (Table 1). In addition, GO cell component (CC) analysis showed that the up-regulated DEGs were significantly enriched in the nucleoplasm, membrane, melanosome, nucleus, and intracellular membrane-bound organelles, and down-regulated DEGs enriched in the Golgi membrane, endoplasmic reticulum membrane, membrane, Golgi apparatus, and actin cytoskeleton (Table 1). Table 2 contains the most significantly enriched pathways of the up-regulated DEGs and down-regulated DEGs analyzed by KEGG analysis. The up-regulated DEGs were enriched in protein processing in the endoplasmic reticulum, base excision repair, pentose and glucuronate interconversions, proteasome, and ascorbate and aldarate metabolism while the down-regulated DEGs were enriched in proteoglycans in cancer, the ErbB signaling pathway, bacterial invasion of epithelial cells, the insulin signaling pathway, and the neurotrophin signaling pathway. www.oncotarget.com

Module screening from the PPI network
Based on the information in the STRING database, among these genes, GSK3B showed a 42-node degree. Moreover, a total of 1000 nodes and 2079 edges were analyzed using the plug-in MCODE. The top 3 significant modules were selected, and the functional annotation of the genes involved in the modules were analyzed ( Figure  2). Enrichment analysis showed that the genes in module 1 were mainly associated with GSK3B, RAC1, PXN, ERBB2, HSP90AA1, FGF2, PIK3R1, and RAC2.

GSK3B is upregulated in breast cancer
Cross-cancer alteration analysis showed that GSK3B has an amplification pattern in most cancer types, especially in breast cancer ( Figure 3). To further investigate the function of GSK3B in breast cancer, we performed immunohistochemical staining on breast cancer tissues and adjacent normal tissues. The staining results revealed significantly higher positivity for GSK3B expression in BC tissues than in adjacent normal tissues ( Figure 4A). Similarly, GSK3B is also highly expressed in breast cancer patients ( Figure 4B, 4C). Beta-catenin encoded by CTNNB1 is a constituent of adherens junctions and acts as an intracellular signal transducer in the Wnt signaling pathway, a pathway that is closely related with the occurrence, development, invasion and metastasis of breast cancer. We also found that β-catenin is upregulated in breast cancer patients ( Figure 4B, 4C). GSK3B and CTNNB1 are highly expressed in bladder cancer and breast cancer, and they are notably positively correlated ( Figure 4D, 4E). Moreover, overexpression of GSK3B or CTNNB1 increased cell proliferation ( Figure  4F). High levels of GSK3B and CTNNB1 were observed in poor survival curves ( Figure 5). These findings have revealed that the GSK3B signaling pathway may be a potential target for breast cancer therapy.

DISCUSSION
As the leading cause of cancer mortality in women, BC is a serious public health problem worldwide, and the age of onset tends to be younger in recent years [1,5,27]. In addition, BC is lacking effective methods for early screening and diagnosis. Therefore, sensitive and specific biomarkers for BC are urgently needed. Trastuzumab, an anti-HER2 humanized antibody, has shown great clinical benefits in HER2-positive BC treatment [28]. In recent years, there has been great interest in researching the mechanisms of trastuzumab treatment. Studies have  shown that numerous molecules play important roles in how a patient responds to trastuzumab, including ERBB-family SNPs, p53 protein, BAG-1 protein and individual patients' metabolism [29][30][31][32]. Xiong et al. [33] confirmed that CD147 suppression enhances the effects of trastuzumab through MAPK and Akt phosphorylation while HER2 amplification level is not currently a prognostic factor for trastuzumab-based targeted therapy [34]. In addition, studies have shown that AUY922 [35] or taxane [36] plus trastuzumab is an effective regimen for patients with relapsed HER2-positive BC after (neo)adjuvant trastuzumab. However, trastuzumab resistance has emerged as a major problem in its clinical application. Many studies have attempted to elucidate the mechanisms underlying trastuzumab resistance. Heregulin, MEOX1 and lncRNA GAS5 confers resistance to the anti-HER2 agent trastuzumab [37][38][39]. Some studies have demonstrated that inhibition of S100P and depletion of KLK10 results in reversal of trastuzumab-resistance (TzR) [40,41]. In contrast to trastuzumab that inhibits the ErbB2 homodimer, another therapeutic antibody H2-18 binds to domain I of ErbB2, which induces programmed cell death (PCD) and exhibits greater antitumor efficacy than trastuzumab [42].
In the present study, we extracted data from GSE22358 and identify 1231 up-regulated and 1053 down-regulated DEGs with or without trastuzumab using bioinformatics analysis. To gain a more indepth understanding of these DEGs, we performed GO function and KEGG pathway analysis. The GO analysis showed that up-regulated DEGs were mainly involved  in DNA replication, protein N-linked glycosylation via asparagine, and response to toxic substance, and downregulated DEGs were involved in axon guidance, protein localization to plasma membrane, protein stabilization, and protein glycosylation. Furthermore, the KEGG pathways of up-regulated DEGs included proteoglycans in cancer, the ErbB signaling pathway, bacterial invasion of epithelial cells, the insulin signaling pathway and  the neurotrophin signaling pathway while the downregulated DEGs were enriched in protein processing in the endoplasmic reticulum, base excision repair, pentose and glucuronate interconversions, and proteasome ascorbate and aldarate metabolism. This finding is consistent with the fact that glycosylation and metabolic processes play an important role in cancer processes. We also constructed a PPI network with DEGs and listed the top eight hub genes: GSK3B, RAC1, PXN, ERBB2, HSP90AA1, FGF2, PIK3R1 and RAC2. The  hub genes play an important role in cancer cell growth, migration, and invasion, especially in breast cancer [43][44][45][46][47]. Furthermore, these genes were involved in significant pathways, including the Fc receptor signaling pathway and regulation of the immune response pathway.
GSK3B was identified as one of the hub genes exhibiting the highest degree of connectivity. Furthermore, our experimental results showed that GSK3B was also highly expressed in breast cancer tissues and associated with poor survival (Figure 4A-4D). We hypothesize that this gene might contribute to the progression of breast cancer with trastuzumab treatment.
GSK3B associates with the destruction complex through a binding site in AXIN1 and phosphorylates β-catenin, which is subsequently targeted for proteosomal degradation [48]. Wang et al. [43] conducted an association study to determine whether common genetic variations in six genes (APC, AXIN1, AXIN2, CSNK1D, CSNK1E, and GSK3B) that encode the destruction complex of the Wnt/β-catenin signaling pathway account in part for the contribution of the pathway to BC risk. Mole et al. [49] also reported that the truncated somatostatin receptor variant sst5TMD4 is associated with increased invasiveness and aggressiveness in BC. sst5TMD4 overexpression increases vimentin, total β-catenin and phosphorylated GSK3B levels.
GSK3B can interact with β-catenin [19], and β-catenin is also highly expressed in BC. β-Catenin is a marker of poor prognosis in human cancer and has been implicated in human breast cancer, via targeting cyclin D1 or vimentin [50][51][52]. A number of studies have suggested that dysregulation of Wnt/β-catenin signaling occurs in human breast cancer. Blockade of Wnt/β-catenin signaling could suppress breast cancer metastasis [53,54]. Figure 4E shows the results of the correlation analysis between GSK3B and β-catenin. GSK3B and β-catenin are notably positively correlated. Therefore, we speculated that GSK3B has a similar function as β-catenin in breast cancer, but further verification is still needed.
In conclusion, our data provide a comprehensive bioinformatics analysis of DEGs, which may be involved in the progress of trastuzumab treatment. The study provides a set of useful targets for future investigation into the molecular mechanisms and biomarkers. However, further molecular biological experiments are required to confirm the function of the identified genes.

Microarray data
The gene expression profiles of GSE22358 were downloaded from the GEO database. GSE22358, which was based on the Agilent GPL5325 platform (Agilent, CA, USA), was submitted by Gluck et al. The GSE22358 dataset contained 154 operable early-stage BC samples receiving neoadjuvant capecitabine plus docetaxel, with (34) or without trastuzumab (120).

Identification of DEGs
The raw data files used for the analysis included TXT files (Agilent platform). The analysis was performed using GENE-E (version 3.0, Broad Institute, USA). We applied hierarchical clustering analysis to categorize the data into two groups that had similar expression patterns with or without trastuzumab. We used a classical t test to identify DEGs with defined a P value cutoff of ＜0.05 as statistically significant.

Gene ontology and pathway enrichment analysis of DEGs
Gene ontology analysis (GO) is a common useful method for annotating genes and gene products and for identifying characteristic biological attributes of highthroughput genome or transcriptome data [55,56]. KEGG (http://www.genome.jp/) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher-order functional information [57,58]. Comprehensively mapping a user's gene to the relevant biological annotation in the DAVID database (https://david.ncifcrf.gov/) is an essential foundation for the success of any high-throughput functional gene analysis [59]. To analyze the DEGs at the functional level, GO enrichment and KEGG pathway analysis were performed using the DAVID online tool. P < 0.05 was considered statistically significant.

Integration of a protein-protein interaction (PPI) network and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) database is an online tool designed to evaluate protein-protein interaction (PPI) information. STRING (version 10.0) covers 9,643,763 proteins from 2031 organisms. To evaluate the interactive relationships among DEGs, we mapped the DEGs to STRING, and only experimentally validated interactions with a combined score >0.4 were selected as significant. Then, PPI networks were constructed using the Cytoscape software. The plug-in Molecular Complex Detection (MCODE) was used to screen the modules of the PPI network in Cytoscape. The criteria were set as follows: MCODE scores >3 and number of nodes >4. Moreover, the function and pathway enrichment analyses were performed for DEGs in the modules. P < 0.05 was considered to have significant differences.

Integrative analysis of complex cancer genomics with cBioPortal
The cBioPortal for Cancer Genomics (http://www. cbioportal.org) provides visualization, analysis and www.oncotarget.com download of large-scale cancer genomics data sets [60,61]. GSK3B was selected for analysis in different cancers.

Human sample collection
The patients were from Dazhou Central Hospital.

Statistical analysis
All data are expressed as the mean ± s.e.m. Statistical analysis was performed with GraphPad PRISM version 5.01 (GraphPad software, Inc.) and SPSS 18.0 software package (SPSS Inc.).

Author contributions
J.F. and F.H. collected and analyzed the data. Y.T. wrote and revised the manuscript. X.F., F.Z. and Y.C. analyzed the data. F.Z. designed experiments, interpreted the data, and revised the manuscript.