A generic cycling hypoxia-derived prognostic gene signature: application to breast cancer profiling.

BACKGROUND
Temporal and local fluctuations in O2 in tumors require adaptive mechanisms to support cancer cell survival and proliferation. The transcriptome associated with cycling hypoxia (CycHyp) could thus represent a prognostic biomarker of cancer progression.


METHOD
We exposed 20 tumor cell lines to repeated periods of hypoxia/reoxygenation to determine a transcriptomic CycHyp signature and used clinical data sets from 2,150 breast cancer patients to estimate a prognostic Cox proportional hazard model to assess its prognostic performance.


RESULTS
The CycHyp prognostic potential was validated in patients independently of the receptor status of the tumors. The discriminating capacity of the CycHyp signature was further increased in the ER+ HER2- patient populations including those with a node negative status under treatment (HR=3.16) or not (HR=5.54). The CycHyp prognostic signature outperformed a signature derived from continuous hypoxia and major prognostic metagenes (P<0.001). The CycHyp signature could also identify ER+HER2 node-negative breast cancer patients at high risk based on clinicopathologic criteria but who could have been spared from chemotherapy and inversely those patients classified at low risk based but who presented a negative outcome.


CONCLUSIONS
The CycHyp signature is prognostic of breast cancer and offers a unique decision making tool to complement anatomopathologic evaluation.


INTRODUCTION
Hypoxia is nowadays described as a hallmark of tumors [1,2]. Tumor angiogenesis and glycolytic metabolism are two extensively studied responses of cancer cells to a deficit in oxygen [1]. The building of new blood vessels to bring O 2 and the respirationindependent metabolism to survive under low O 2 are actually complementary responses of tumors to hypoxia [1,2]. These somehow opposite modes of adaptation account for local and temporal heterogeneities in tumor O 2 distribution. The terms 'intermittent hypoxia' or 'cycling hypoxia' were settled to describe this phenomenon of fluctuating hypoxia in tumors [3,4]. As a corollary, the extent of cycling hypoxia reflects tumor plasticity and thus measures the capacity of tumor cells to survive and proliferate in a hostile environment [3].
Although we and others have contributed to demonstrate the existence of cycles of hypoxia and/or ischemia in mouse, canine and human tumors [see [5,6] for review], technologies aiming to routinely measure tumor O 2 fluctuations in the clinics are not (yet) available despite important progresses in the in vivo imaging of hypoxia [7][8][9][10][11]. In the absence of readily accessible  monitoring strategies, the analysis of the transcriptome associated with this phenomenon could represent a prognostic biomarker of cancer progression. Indeed, although mutations and defects in tumor suppressor genes directly influence the whole genetic profile of a given tumor cell clone, cycling hypoxia could be envisioned as a supra-oncogenic phenomenon influencing gene expression [3]. In other words, independently of the genetic background of tumor cells, cycling hypoxia has the potential to lead to common alterations in the expression of some transcripts, and thus to a possible clinically exploitable signature.
Clinical data sets derived from breast cancer patients could be used to evaluate the performance of such cycling hypoxia-related gene signature. The clinical and genetic heterogeneities of this disease and the very large panel of data sets available represent indeed good opportunities to evaluate new prognostic gene expression signatures [12]. Whole genome analysis already provided several molecular classifications for breast cancer beyond standard clinicopathologic variables [12][13][14][15][16][17][18][19][20][21]. The latter include tumor size, presence of lymph node metastasis and histological grades [22] but also encompass three predictive markers of response, namely expression of oestrogen (ER), progesterone (PR) and HER2 receptors [12]. Treatment guidelines are nowadays still largely based on algorithms integrating these informations such as the Notthingham Prognostic Index [22,23] or Adjuvant! Online [24]. Accordingly, for early-stage breast cancer, adjuvant chemotherapy is recommended for most patients with ER-negative or HER2-positive tumors [13,[25][26][27]. The challenge actually resides in selecting patients with  Oncotarget 6954 www.impactjournals.com/oncotarget ER-positive HER2-negative disease who could benefit from chemotherapy.
In this study, we derived a transcriptomic signature of cycling hypoxia (CycHyp) using 20 cell lines derived from various human tumors and characterized by a large variety of distinct genetic anomalies. We then validated the capacity of the CycHyp signature to optimize patient stratification. In particular, we showed how the CycHyp signature could identify ER-positive node-negative breast cancer patients at high risk based on conventional NPI (and who could have been spared from chemotherapy) and inversely those patients classified at low risk but who could have drawn benefits of chemotherapy.

Identification of the CycHyp signature
Tumor cells covering a large diversity of tissues (Suppl. Table 1) were submitted to cycling hypoxia (CycHyp) for 24 hours, maintained under normoxic conditions or exposed to continuous hypoxia (ContHyp) for the same period of time ( Figure 1A). Corresponding mRNA samples were analysed by hybridization using Human Gene 1.0 ST Affymetrix microarrays. Gene expression profiles of each cell type under normoxia vs. cycling hypoxia (CycHyp) were produced to identify the most differentially expressed probesets. The CycHyp signature was determined as the top 100 probesets with the lowest FDR-corrected p-values averaged over 200 resamplings (Table 1); a ContHyp signature was also determined in parallel ( Table 2). The heatmaps made with the 100 probe sets of the CycHyp signature confirmed its excellent potential of discrimination between cycling hypoxia and either normoxia ( Figure 1B) or continuous hypoxia ( Figure 1C). Moreover, Gene Set Enrichment Analysis (GSEA) [28] indicated that when considering differentially expressed probesets (after FDR correction), only 2 gene sets were significantly enriched in the CycHyp signature (Suppl.  Table 3). Also, when using the MSigDB molecular signature database referring to hypoxia or HIF (www.broadinstitute.org), we found 13 hypoxia gene sets sharing, on average, only 1.4 gene with CycHyp (Suppl. Table 4) whereas 44 hypoxia gene sets showed overlap with ContHyp with an average of 6.6 (1-27) common genes (Suppl. Table 5). We also compared the CycHyp signature to 13 other hypoxia-derived signatures described by Seigneuric et al. [29] and Starmans et al. [30]. The CycHyp signature was again far from those signatures with an average of only 1 gene in common. The overlap was larger between ContHyp and those signatures with an average of 6 genes in common (Suppl. Table   6). Finally, using TFactS [31] to analyse transcription factors regulating expression of genes associated to either signature, HIF-1α was only found as positively associated with the ContHyp signature.

The CycHyp signature predicts clinical outcome in breast cancer patients
To evaluate the prognostic value of the CycHyp signature, we focused on breast cancer because of the very large amounts of well-annotated clinical data sets available and a clearly identified need to discriminate between patients at low and high risks among subgroups determined on the basis of clinicopathologic criteria [12,13]. Publicly available GEO data sets allowed us to collect information on the survival of 2,150 patients with primary breast cancer (see clinical features in Table 3).
In order to exploit these data sets, we first transferred the Gene 1.0ST datasets in the HU133 platform. We then used the VDX dataset (GSE2034 and GSE5327) as a reference because of its large number of node negative untreated patients [17]. This training dataset was used to estimate a prognostic multivariate Cox proportional hazard model built on the CycHyp signature (see Methods for details). The other eight datasets (see references in Table 3) were used according to the methodology described by Haibe-Kains and colleagues [32], to assess the prognostic performance of the CycHyp signature on independent samples. We first chose to evaluate our signature independently of the clinicopathological data. The prognostic potential of the CycHyp signature to discriminate between patients at low or high risk was confirmed with a HR=2.39 and a p-value = 1.13e-18 whathever the treatment and the tumor histology ( Figure  2A). We then focused on the ER+ HER2-population which is known to be heterogeneous and thus difficult to treat [12,13]. The discriminating capacity of the CycHyp signature remained strikingly high in the ER+ HER2patient populations (HR = 2.47, p-value = 3.88e-13, Figure 2B). Finally, among this subpopulation of patients, we considered those with a node negative status ( Figure  2C) and among the latter, those who did not receive any treatment ( Figure 2D). Hazard ratios rose to 3.16 and 5.54 in these conditions (p-values = 2.85e-9 and 6.44e-10, respectively), further supporting the discriminating potential of the CycHyp signature. In particular, the data presented in Figure 2D allowed to exclude any confounding influence of the potential benefit arising from the treatment administered to these patients and thus clearly identified a population of patients who remained inadequately untreated.

The CycHyp signature provides significant additional prognostic information to available multigene assays
To evaluate the performance of the CycHyp signature, we compared it with other well-established prognostic multigene assays for breast cancer, namely Gene70 or Mammaprint [14], Gene76 [17] and Oncotype Dx [15]. Using the same set of ER+ HER2-node negative patients as used in Figure 2D, we could determine the low vs. high risk patient stratification according to these signatures. The superior prognostic potential of the CycHyp signature could be captured from the Kaplan Meier curves obtained with the Gene 70, Gene76 and Oncotype DX signatures (compare Figure 3A with Figure  2D). Hazard ratios confirmed the net advantage of the CycHyp signature with a significantly higher value than the three other metagenes ( Figure 3B). The concordance index, which is the probability of a high risk patient to relapse before a low risk patient, was also higher with the CycHyp signature ( Figure 3B). Finally, the Balanced Classification Rate (BCR), which represents the average between sensitivity and specificity to discriminate between patients with progressing disease vs. diseasefree at 5 years, was significantly higher for the CycHyp signature than the three other multigene assays ( Figure  3B). The sensitivity of the CycHyp was above 80% and the specificity of the CycHyp signature was well above the level of the others ( Figure 3B). Of note, the metrics corresponding to each data set taken separately is depicted in Suppl. Figure 2.
Importantly, to further validate the prognostic significance of the CycHyp signature, a comparison with random gene signatures was performed according to the methodology described by Venet et al. [33] and Beck et al. [34]. Figure 3C shows the distribution of the p-values (logrank test in log 10) for 1000 randomly generated signatures together with the p-values of the CycHyp and ContHyp signatures. The logrank test (or Mantel-Haenszel test) [35] is commonly used to assess whether there is a significant survival difference between risk groups. The discrimination between risk groups was significantly higher (P < 0.001) with the CycHyp signature as compared to each of the random signatures whereas Oncotarget 6957 www.impactjournals.com/oncotarget the ContHyp signature was not significantly better (vs. random ones; P=0.141). The same analysis was carried out for the three other metrics (HR, CI and BCR) to assess the discrimination capability between risk groups and confirmed the significantly higher value of the CycHyp signature (vs. random signatures) (Suppl. Figure 3).

The CycHyp signature in association with NPI offers a powerful prognostic tool
We then aimed to determine whether the CycHyp signature could improve the Nottingham Prognostic Index (NPI) for better predicting the survival of operable breast cancers. The NPI algorithm combines nodal status, tumour size and histological grade and allows to model a continuum of clinical aggressiveness with 3 subsets of patients divided into good, moderate, and poor prognostic groups with 15-year survival [22,23,36]. Since few patients were assigned a poor index, we merged here the moderate and poor indices into a high risk group to facilitate the comparison with the CycHyp signature. We found that by integrating the CycHyp signature, an important proportion of patients could be reclassified to another risk group (Figure 4). 44.1% of patients classified at high risk using the NPI algorithm were identified at low Oncotarget 6958 www.impactjournals.com/oncotarget risk when using the CycHyp signature and were confirmed to be "false positive" since they actually exhibited a profile of survival closer to the low risk NPI patient ( Figure 4A). Inversely, using the CycHyp signature, we also identified in the patients at low risk based on the NPI criteria, 33.1% of patients with a risk profile closer to the patients with a negative outcome ( Figure 4B). This increased discriminating potential remained highly relevant when considering all patients or patients with a ER+ HER2status (and among the latter, those with a node negative status or the untreated ones) (see Suppl. Figure 4).

DISCUSSION
This study demonstrates that a gene signature derived from the transcriptomic adaptation of tumor cells to cycling hypoxia is prognostic of breast cancer. The CycHyp signature that we have identified and validated in this study has not only prognostic value independently of molecular risk factors but also provides significant additional prognostic information to clinicopathologic criteria. Clinical outcome of breast cancer patients is nowadays largely based on histological grade and the status of ER, PR, and HER2 receptors [12,13,22]. In early breast cancer, a lack of expression of ER (and PR) will almost systematically lead to the administration of adjuvant chemotherapy in addition to locoregional treatment [12,25,26]. Also, for patients with a tumor expressing HER2, chemotherapy and/or trastuzumab represents the option the most likely to be beneficial based on current clinical knowledge [12]. The impact of chemotherapy is actually more difficult to anticipate for the rest of early-stage breast cancer patients, i.e. those diagnosed with a ER-positive and HER2-negative disease. These patients represent indeed a wide spectrum of different risk profiles: for women with high-risk disease, if chemotherapy is appropriate, others will derive little benefit from it. Our study therefore represents a significant advance for this population of patients, which consists of two third of all breast cancers. We have indeed demonstrated that the CycHyp signature outperforms the existing major prognostic gene expression signatures and offers a unique decision making tool to complement the discrimination of breast cancer patients based on anatomopathologic evaluation.
More generally, the excellent prognostic value of CycHyp confirms the link between cycling hypoxia and cancer aggressiveness [4,5]. This gives credentials to the phenotypic adaptation of tumors resulting from heterogeneities in blood flow distribution as a trigger of cancer progression [3,4]. Also, with the recent impetus in the understanding of tumor metabolism [37,38], it has become obvious that the capacity of a given tumor cell to survive in both aerobic and anaerobic environments represents a critical advantage [39][40][41]. Interestingly, our study also documents the higher prognostic value of a transcriptomic signature derived from cycling hypoxia vs. continuous hypoxia. This confirms that although hypoxia is a frequent feature of poor-prognosis tumors and was reported to drive gene signature associated with negative outcome [42][43][44][45], prognostic markers integrating fluctuations in the hypoxic status of tumors (this study) introduce an additional layer of complexity that better fits the in vivo situation.
Whether the CycHyp signature encompasses genes that actively drive cancer progression or reflects a context of metabolic and hypoxic stress favorable to increased mutagenesis and genetic instability [3], warrants further studies. A few hints can however be gleaned from the comparison of the different signatures.
First, the comparison of the CycHyp and ContHyp signatures indicates that the cycling nature of hypoxia leads to specific alterations in mRNA expression since only 11 common transcripts were found in the two gene lists (see symbols # in Table 1). Furthermore, among these 11 genes, most encode for proteins involved in housekeeping functions such as chromatin packaging (HIST1H 1C, 2AC, 4A and 4C) and RNA processing (RPS13 and 28). The only gene common to the two signatures with a known function related to hypoxia is RBX1 or E3 ubiquitin ligase which mediates the Oncotarget 6959 www.impactjournals.com/oncotarget ubiquitination and subsequent proteasomal degradation of target proteins [46], including the misfolded proteins known to accumulate under low pO 2 . Besides the RBX1 gene, the CycHyp signature does not actually contain genes known to be consistently regulated in response to chronic hypoxia. By contrast, the ContHyp signature contains 14 genes already reported to be overexpressed under low pO 2 and even directly under the control of the transcription factor HIF-1α, including those coding for glucose metabolism enzymes (ALDOA, PFKB3, PFKB4, PGK1, PGAM1, GPI) and the angiogenic growth factor VEGFA. This HIF-dependent gene expression program of the ContHyp signature was actually confirmed in the GSEA and MSigBD analyses and was consistent with previously reported hypoxia-driven gene signatures [42,44,45]. More generally, these findings position the CycHyp signature far from the conventional hypoxiaderived signatures [29,30] but instead as a biomarker of a distinct tumor biology process involving adaptation to fluctuations in the tumor microenvironment.
Second, a large amount of transcripts of the CycHyp signature encode for proteins themselves involved in the regulation of transcription. Data mining revealed that more than 18 transcripts of the CycHyp signature are transcription factors/regulators and 13 others are directly involved in RNA processing (see symbols * and § in Table 1, respectively). This represents one third of the genes comprising the CycHyp signature and reflects a major difference with the ContHyp signature. While hypoxia is usually associated with cell cycle arrest and mTOR inhibition, cycling hypoxia may be compatible with a maintained proliferation potential. This is further supported by the suppression of geroconversion (ie, the process leading from proliferative arrest to irreversible senescence) observed in response to hypoxia [47,48] that offers tumor cells the opportunity to re-enter cell cycle when O 2 is again available. Further studies are needed to compare the evolution of mTOR activity and mTORdependent genes (including those encoding for ribosomal proteins) during cycling and continuous hypoxia.
Finally, the in vitro conditions at the origin of the establishment of the CycHyp signature may actually have specific bearing on its robustness and applicability. Indeed, we previously documented that fluctuating oxygen levels could also directly impact endothelial cells within a tumor [49,50] indicating that non-tumor cells may also contribute to the same transcriptomic adaptation as tumor cells, thereby reinforcing the relevance of the CycHyp signature. Also, although we have used the CycHyp signature as a prognostic biomarker for early-stage breast cancer, this signature was identified by integrating the information arising from tumor cells of various origins and characterized by various oncogenic alterations; the prognostic value of the CycHyp signature in other cancers is currently under investigation in our laboratory.
Altogether, the above findings indicate that the CycHyp signature represents a new generation of prognostic biomarker reflecting a generic environmental condition in tumors that differs from the conventional view of a static, continuous hypoxia occurring in tumors. When applied to breast cancer, the CycHyp signature has a powerful prognostic value independently of molecular risk factors but also offers a unique decision making tool to complement the discrimination of patients based on anatomopathologic evaluation. The CycHyp signature is distinct from conventional hypoxia-related gene signature but also from existing prognostic metagenes, and the rationale behind its discovery supports a potential broad applicability to evaluate cancer patient outcomes.

Tumor cells
Twenty cell lines derived from cancer patients (see Suppl. Table 1 for details) were submitted to cycling hypoxia (CycHyp), i.e. 24 cycles of 30 min incubation under normoxia and 30 min incubation under hypoxic (1% O 2 ) conditions to reproduce tumor hypoxic fluctuations, as previously reported [5,51]. We also considered control conditions of 24 h continuous exposure of tumor cells to either 21% O 2 (Normoxia) or 1% O 2 (ContHyp). For each culture condition, cells were immediately snap-frozen at the end of the last incubation period.

Identification of the signatures
mRNA extracts from each tumor cell cultured under the three above conditions (normoxia, cycling hypoxia and continuous hypoxia) were analysed by hybridization on Human Gene 1.0 ST Affymetrix microarrays (GEO access number: GSE42416): http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?tok en=probzowmiyseqxm&acc=GSE42416 The extent of the resulting tumor cell datasets (20 samples in each of the three conditions) led us to resort on a resampling mechanism to increase the robustness of the signatures to be identified. For every resampling experiment, a subset of 90 % of the samples was chosen uniformly at random as a training set and the remaining 10% were used as validation set. Differentially expressed probesets (one probeset = a collection of probes designed to interrogate a given sequence) were assessed on each subset according to a t-test and the corresponding FDR corrected p-values were reported. The 100 probesets with the lowest corrected p-values, averaged over 200 resamplings [52][53][54], formed the CycHyp (Table 1) or ContHyp (Table 2) signatures. All such expression differences were highly significant (p<1e-4) after Benjamini-Hochberg FDR correction for the multiplicity www.impactjournals.com/oncotarget of the test [55]. Of note, in each resampling, the 10 % data not used to select probesets allowed one to estimate the discrimination potential between (cycling or continuous) hypoxia versus normoxia conditions. The average classification accuracy over all resamplings amounted to 97.5 % for CycHyp and 94.3% for ContHyp.
The 100 HGU1.0 ST probesets forming the CycHyp signature corresponded to 94 unique Entrez GeneID in the NCBI database, out of which 69 genes were available on the HGU133a platform (i.e., the technology used in most clinical studies considered here). Those 69 genes were represented by 87 HGU133a probesets. The few datasets collected on HGU133plus2 were reduced to the probesets also present on HGU133a.

Patient data sets
All breast cancer expression data were summarized with MAS5 and represented in log2 scale (except for GSE6532 already summarized with RMA). Breast cancer subtypes (ER+/HER2-, ER-/HER2-and HER2+) were identified with the genefu R package [56] (see Supplementary R Package). Disease-free survival at 5 years was used as the survival endpoint. The data from all patients were censored at 10 years to have comparable follow-up times across clinical studies [32].

Prognostic models of the clinical outcome
The VDX dataset (GSE2034 and GSE5327 from the GEO database) was considered as a reference because of its large number of node-negative untreated patients [17]. This dataset formed the training set used to estimate a prognostic model of the clinical outcome. A risk score for each patient was computed from a penalized Cox proportional hazards model [57] implemented in the Penalized R package [58]; the parameters of the elastic net penalty were learned on the training set by crossvalidation. Prediction into a high risk vs. low risk group resulted from a predefined threshold value on this risk score. The decision threshold was chosen on the training set to maximize the specificity and sensitivity of the discrimination between patients with progressing disease versus disease-free patients at 5 years. Following the methodology described by Haibe-Kains et al. [32], all other datasets were used as validations to assess the prognostic performances on independent samples, i.e. balanced classification rate (BCR), concordance index (CI) [59] and hazard ratio (HR) [60]. The survcomp R packages were used to test the significance of the HR and CI values [33] while a Z-test allowed to infer p-values for the BCR relying on an approximation by a normal distribution.
Prognostic performances of a penalized Cox model defined on the CycHyp signature were also compared with well-established prognosis models for breast cancer, namely Gene 70 (Mammaprint) [14], Gene 76 [17] and Oncotype DX [15] signatures. Those existing signatures were associated to specific prognostic models implemented in the genefu R package [56]. Comparison of CycHyp and ContHyp signatures was also carried out with random gene signatures of the same sizes, i.e. 87 and 123 probesets, respectively. One thousand signatures of each size were generated and analysed using the methodology described by Venet et al. [11]. The objective of those experiments was to assess to which extent the CycHyp and ContHyp signatures had a better discrimination power between risk groups than random signatures. Gene Set Enrichment Assay (GSEA) analysis was also performed using the molecular signature database (MSigDB) and the CycHyp and ContHyp signatures expanded to 2118 and 2065 differentially expressed genes, respectively (after FDR correction and averaged over all resamplings.