Gene modules associated with breast cancer distant metastasis-free survival in the PAM50 molecular subtypes

To identify PAM50 subtype–specific associations between distant metastasis-free survival (DMFS) in breast cancer (BC) patients and gene modules describing potentially targetable oncogenic pathways, a comprehensive analysis evaluating the prognostic efficacy of published gene signatures in 2027 BC patients from 13 studies was conducted. We calculated 21 gene modules and computed hazard ratios (HRs) for DMFS for one-unit increases in module score, with and without adjustment for clinical characteristics. By comparing gene expression to survival outcomes, we derived four subtype-specific prognostic signatures for BC. Univariate and multivariate analyses showed that in the luminal A subgroup, E2F3, PTEN and GGI gene module scores were associated with clinical outcome. In the luminal B tumors, RAS was associated with DMFS and in the basal-like tumors, ER was associated with DMFS. Our defined gene modules predicted high-risk patients in multivariate analyses for the basal-like (HR: 2.19, p=2.5×10−4), luminal A (HR: 3.03, p=7.2×10−5), luminal B (HR: 3.00, p=2.4×10−10) and HER2+ (HR: 5.49, p=9.7×10−10) subgroups. We found that different modules are associated with DMFS in different BC subtypes. The results of this study could help to identify new therapeutic strategies for specific molecular subgroups of BC, and could enhance efforts to improve patient-specific therapy options.


INTRODUCTION
Breast cancer (BC) is a biologically and clinically heterogeneous disease with diverse morphologies, molecular features, and clinical behaviors. Clinicopathologic characteristics such as clinical TNM stage at diagnosis, histologic grade, lymph node involvement and estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) statuses have been associated with BC prognosis. Gene expression studies have shown that different molecular BC subtypes exhibit different characteristics and prognoses [1,2]. Tumor subtypes based on the PAM50 classifier have distinct prognoses, and respond differently to systemic therapy [3][4][5]. Further, expression signatures for genes such as Gene70 [6], Gene76 [7], Grade Index (GGI) [8] and OncotypeDx [9] have been studied to help identify low-risk patients who are unlikely to benefit from systemic adjuvant therapy, while still correctly identifying high-risk patients [6][7][8][9][10][11].
Several gene signatures have been identified to describe oncogenes such as RAS, E2F3, SRC, MYC and β-catenin [12], as well as activation of insulin-like growth factor 1 (IGF1) [13], and the AKT/mammalian target of rapamycin (mTOR) [14] and mitogen-activated protein kinase (MAPK) [15] pathways. Other signatures have identified PIK3CA mutations [16], and deficiencies in PTEN [17]. Chemotherapy sensitivity in different BC subtypes is altered based on activation of different oncogenic pathways [18], and Desmedt, et al. reported that BC survival depended on ER and HER2 status [4]. However, it is still unknown whether patients within different PAM50 subtypes have different DMFS based on activation of different biological process and oncogenic pathways.
To identify robust, PAM50 subtype-specific associations between DMFS and gene modules describing biologically relevant, potentially targetable oncogenic pathways and prognosis signatures, a comprehensive systematic analysis evaluating the prognostic efficacy of 21 published gene signatures in 2027 BC patients from 13 studies [16,[19][20][21][22][23][24][25][26][27][28][29][30] was conducted. In this pooled insilico study, we also investigated whether modules were associated with DMFS beyond clinical characteristics in each molecular subtype. Furthermore, we wanted to confirm previous findings from small sample studies on the association between DMFS and specific pathways, such as AKT-MTOR and RAS [31], and extend our analysis to PAM50 subtypes. This study expands on the analysis conducted by Desmedt,et al. [4] in four ways. First, ten more oncogenic pathways were evaluated in our study. Second, only datasets generated using the Affymetrix U133 plus 2.0 or U133A platforms were utilized in our study, while the datasets used in Desmedt,et al. [4] were gathered from different platforms, possibly resulting in heterogeneity [32]. The final distinction is that we not only explored the associations between gene modules and BC patient survival within each PAM50 subtype, but we also identified four subtype-specific prognosis modules that could predict high-risk of distant metastasis.

Identification of PAM50 subgroups
The clinicopathological characteristics of the 2027 BC patients included in the study are listed in Table 1. Out of the 2027 samples, we classified 554 as luminal A, 774 as luminal B, 129 as HER2+, 440 as basal-like and 130 as normal-like. There are differences between subtypes based on PAM50 with regard to DMFS. The survival probability for DMFS (Supplementary Figure S1) of the basal-like subtype was lower than the luminal A and luminal B groups (Hazard ratio (HR)=0.37, P < 2.0×10 −16 for luminal A; HR=0.80, P = 1.5×10 −2 for luminal B). However, no survival difference was found between the basal-like and HER2+ subgroups (HR=1.07, P= 0.65).

Pair-wise gene module correlations
Pair-wise correlations between 21 gene modules for the pooled population are depicted in Figure 1 (Pearson's coefficient correlations are listed in Supplementary Table  S3). GGI, a gene signature that mainly measures tumor proliferation, was highly correlated with Oncotype DX, Gene70, modules describing proliferation (AURKA) and PTEN loss, and MYC and IGF1 pathway activation.

Gene modules associated with DMFS
Univariate associations between gene modules and DMFS were analyzed for all patients and PAM50 subtypes (Figure 2). High module scores of AURKA, Gene70, AKTmTOR, HER2 receptor signaling and PTEN loss were associated with poor DMFS in the luminal A and luminal B subtypes, but not in the basal-like and HER2+ subtypes. Additionally, a significant reduction in DMFS was associated with low ER signaling, PIK3CA and Gene76 module scores, and high PLAU, VEGF, E2F3, IGF1, MAPK, MYC, RAS, OncotypeDX and GGI module scores for the entire cohort ( Figure 2A). Luminal A tumors with high STAT1, E2F3 and Gene70 module scores had poor DMFS ( Figure 2C); luminal B tumors with high activation of IGF1, MAPK, MYC, and OncotypeDX gene modules and low activation of the ESR1 module had poor DMFS ( Figure 2D). However, there is no module significantly associated with DMFS within the basal-like and HER2+ subtypes after FDR adjustment ( Figure 2B & 2E). In a multivariate model including age, histologic grade, node status and treatment, we found that node-positivity; grades 2 and 3, no adjuvant treatment and chemotherapy were all associated with poor DMFS for the entire cohort (Table 2). Additionally, older patients (>=50 years) with basal-like tumors who received chemotherapy, had grade 2 or 3 luminal A tumors, or had luminal B tumors with positive nodal status had poor DMFS ( Table 2). There was no covariate associated with DMFS in the HER2+ subtype.
GO biological process analysis showed that the basallike, luminal A and HER2+ specific modules contained genes enriched in the immune response, cell cycle process and response to chemical stimulus, respectively (Table 3). Genes in the luminal B module were not significantly associated with any specific processes.

DISCUSSION
To identify biological processes, oncogenic pathways and prognosis signatures associated with DMFS in PAM50 BC subtypes, an extensive analysis of gene expression data generated from 2027 BC patient samples was conducted. We have shown that molecular subtype-specific modules contain distinct prognostic information. Further, we identified four subtype-specific prognosis signatures that successfully predict high-risk BC patients with poor survival outcomes.
In the entire study population, low ER signaling, PIK3CA and Gene76 module scores, and high AURKA, E2F3, IGF1, MYC, PTEN, RAS, OncotypeDX, GGI and Gene70 module scores were associated with poor DMFS. It was previously reported that low ER signaling and high RAS module scores were associated with poor DMFS [31]. Moreover, it is not surprising that the proliferationrelated modules (AURKA, OncotypeDX, GGI and Gene70) are involved in BC prognosis, since increased expression of proliferation-related genes was associated with poor outcome [33].
In the luminal A subgroup, the E2F3 and PTEN pathways appeared to predict DMFS both at the univariate and multivariate levels. High E2F3 and PTEN loss module scores were associated with poor outcome. Similarly, Hollern, et al. reported that the E2F transcription factors play important roles in regulating tumor development and metastasis in a mouse module of metastatic BC [34]. Schade, et al. found that tumors from PTEN-deficient/NIC mice showed histopathological and molecular features of the luminal subtype of primary human BC, and PTEN deficiency in this type of mouse model leads to dramatic acceleration of mammary tumorigenesis and metastasis. Functional studies still need to be conducted to determine whether E2F3 and PTEN loss actually mediate tumorigenesis and metastasis, and affect prognosis of luminal A BC.
Our study also found that in the luminal B subgroup, high expression of the RAS pathway was significantly associated with poor clinical outcome. Zhang, et al. reported that RAS GTPase-activating protein SH3 domain-binding protein 1 (G3BP1), an essential RAS mediator, participates in the progression of BC via activation of the epithelialto-mesenchymal transition, and that it could be a potential therapeutic target for metastatic human BC [35]. In addition, a recent study suggested that RAS signaling activation was a key determinant for metastatic dissemination and was strongly linked to poor survival of luminal BC patients [36].
These findings emphasize the need for additional prognostic markers for molecular subgroups, specifically for the HER2+ and basal-like subgroups, which are associated with limited therapeutic options and poor prognosis. In our integrated study, four subtype-specific prognosis signatures were identified, with some overlaps between the genes within different modules (Supplementary Figure S3). The luminal A-specific gene module contains genes involved in the cell cycle process, which is correlated with clinical outcomes of BC [33]. The HER2+ specific gene module contains genes involved in response to chemical stimulus. It has been reported that an expression signature enriched in response to chemical stimulus was related to acquired anthracycline resistance in human BC cells [37]. The basal-like specific module contained genes enriched in the immune response. Interestingly, Desmedt,et al. [4] and Teschendorff, et al. [38] reported that immune response might be linked with development of distance metastases subgroup C. and HER2+ subgroup D. P-values were obtained from the log-rank test. www.impactjournals.com/oncotarget of BC. Moreover, tumor-infiltrating lymphocytes, routinely used as immune response markers, are most frequently found in triple-negative BC, and their higher presence at diagnosis is associated with better clinical outcomes after adjuvant chemotherapy in this subtype [39,40].
There are several caveats to our study. First, as a retrospective study, heterogeneous patient cohorts were included. Second, we did not attempt to identify an optimal cutoff, but rather used a continuous value for the various modules based on their associations with DMFS. Because different gene expression-based platforms and protocols were used in each study, standardization of a module cutoff value may be unreliable. Third, the prognostic discriminative power of our defined four subtype-specific signatures should be validated in prospective clinical trials.
Despite these limitations, our observations may have potential implications for the clinical management of BC. First, we provide additional evidence that different biological processes and oncogenic pathways are associated with DMFS in different BC subtypes. Second, our results generate hypotheses that should be tested in BC subtype-focused trials of targeted agents. Specifically, it may be worth combining RAS pathwaytargeted therapeutics, like Mek inhibitors, together with routine therapy for luminal B BC patients. For luminal A subtypes, PTEN loss modules and E2F3 activation are associated with poor DMFS, suggesting that these patients are likely to benefit from poly (ADP-ribose) polymerase (PARP) inhibitors [41]. For the HER2+ subtype, our prognosis signature included genes involved in response to chemical stimulus. As for basal-like subtype tumors, our results emphasize the importance of immune response in relation to DMFS, and suggest that routine assessment and quantification of tumorinfiltrating lymphocytes could provide meaningful prognostic information in a clinical setting.
Normalized gene expression data were downloaded from the GEO data repository with accession numbers: GSE7390 [19], GSE9195 [16], GSE16446 [20], GSE45255 [21], GSE20685 [22], GSE6532 [23], GSE11121 [24], GSE12093 [25], GSE2603 [26], GSE25066 [27], GSE42568 [28], GSE17907 [29] and GSE12276 [30]. The accession numbers were used to name these datasets. Characteristics of the cohorts are summarized in Table 1. We used the probe set 205225_at for ER, 216836_s_at and 234354_x_at for HER2 with Affymetrix U133A and U133Aplus2 platforms, respectively, as previously reported [42]. The cutoffs for ER and HER2 expression were derived from fitting two normal distributions to the observed distribution of expression values for each study separately using the MCLUST function in R. Probes were mapped to gene symbols, and the ones without known gene symbols were filtered. When multiple probes were mapped to the same gene, the average expression of these probes in a particular data set was selected to represent the gene with the collapsedRows function [43] in R.

Breast cancer molecular subtypes
We assessed the molecular subtypes for each tumor based on the PAM50 algorithm [44] with the "genefu" R package (http://www.bioconductor.org/packages/release/ bioc/html/genefu.html). Samples belonging to the basallike, HER2+, luminal A, and luminal B subtypes were included for subtype-based analysis.

Module scores
To compute module scores derived from gene signatures for each sample within different datasets, we calculated module score as follows: Module Score = where n is the gene number in a specific module, x i was the expression level of the gene, and gene-specific weights w i were equal to -1 or +1 depending on the direction of their association with the phenotype in the original publication. Each module score was robustly scaled within a study so that the 2.5%, 50% and 97.5% quintiles equaled −1, 0 and +1, respectively, allowing for comparison between different datasets generated with different microarray technologies and normalization procedures. Compositions and weights of the gene modules are provided in Supplementary Table S1.

Development of molecular subtype-specific prognostic survival modules
For each BC molecular subtype, the following two steps were included in the identification of the subtype-specific prognostic signature: (1) determination of the association between each gene and DMFS and (2) determination of the module with the optimal gene number, with high prognostic prediction ability between gene expression and DMFS. The Cox regression analysis was used to evaluate the association between DMFS and the expression level of each gene after divided by its median; we estimated the HRs of each gene within each dataset separately and combined them using inversevariance weighting within a fixed effect model. As for prognostic prediction, we ranked the DMFS-associated genes according to their p values of HRs, then defined 20 candidate modules with the number of top ranked prognostic genes: where n was 1 to 20, and computed module scores. As described in the Module scores section, if the combined HR of a specific gene was larger than 1 then its weight was 1, or else its weight was -1. Finally, the module that was most closely correlated with DMFS in univariate and multivariate analyses, that is, with the largest HR value for one unit increase of the module score, was identified.

Statistical analysis
Statistical analyses were performed using the R statistical software version 3.2.0 [45] with our customized functions. Modeling strategy is described in Supplementary  Table S5. Within the entire cohort, we performed pairwise correlations between the different modules using the Pearson's correlation. All reported p-values were two sided.

Survival analysis
We considered DMFS as the survival end point. Survival analysis was conducted via the 'survival' R package. Survival curves were based on Kaplan-Meier estimates.
To study the univariate association relationships between gene modules and DMFS across different datasets, HRs and 95% CIs for one unit increase in module score were computed using a Cox regression model with the dataset as stratum indicator, allowing for different baseline HRs between different cohorts. This kind of analysis was also performed within PAM50 subtypes.
For multivariate analysis between gene modules and DMFS, a Cox regression model with the independent variables, age (>=50 or <50), clinical nodal status (negative or positive), histologic grade (1, 2 or 3) and treatment (chemotherapy, endocrine therapy, both endocrine therapy and chemotherapy, or no adjuvant therapy) was used to select the covariates significantly related to DMFS (p<0.05) for adjustment. We calculated adjusted HRs for DMFS for one-unit increase in module score with the data sets as stratum indicator.
For multiple module testing adjustment, the false discovery rate (FDR) values were computed with the p.adjust R function with the fdr option.

Gene ontology and functional analysis
Gene ontology (GO) analyses to test modules for enrichment of genes associated with particular biological processes were done using DAVID (http://david.abcc. ncifcrf.gov/) [46], a web-delivered application that enables the discovery, visualization, and exploration of molecular interaction networks in gene expression data.