Prioritization of metabolic genes as novel therapeutic targets in estrogen-receptor negative breast tumors using multi-omics data and text mining

Estrogen-receptor negative (ERneg) breast cancer is an aggressive breast cancer subtype in the need for new therapeutic options. We have analyzed metabolomics, proteomics and transcriptomics data for a cohort of 276 breast tumors (MetaCancer study) and nine public transcriptomics datasets using univariate statistics, meta-analysis, Reactome pathway analysis, biochemical network mapping and text mining of metabolic genes. In the MetaCancer cohort, a total of 29% metabolites, 21% proteins and 33% transcripts were significantly different (raw p <0.05) between ERneg and ERpos breast tumors. In the nine public transcriptomics datasets, on average 23% of all genes were significantly different (raw p <0.05). Specifically, up to 60% of the metabolic genes were significantly different (meta-analysis raw p <0.05) across the transcriptomics datasets. Reactome pathway analysis of all omics showed that energy metabolism, and biosynthesis of nucleotides, amino acids, and lipids were associated with ERneg status. Text mining revealed that several significant metabolic genes and enzymes have been rarely reported to date, including PFKP, GART, PLOD1, ASS1, NUDT12, FAR1, PDE7A, FAHD1, ITPK1, SORD, HACD3, CDS2 and PDSS1. Metabolic processes associated with ERneg tumors were identified by multi-omics integration analysis of metabolomics, proteomics and transcriptomics data. Overall results suggested that TCA anaplerosis, proline biosynthesis, synthesis of complex lipids and mechanisms for recycling substrates were activated in ERneg tumors. Under-reported genes were revealed by text mining which may serve as novel candidates for drug targets in cancer therapies. The workflow presented here can also be used for other tumor types.


INTRODUCTION
Estrogen receptor signaling is one of the main molecular features that determines the aggressiveness and the clinical course of breast cancer. Estrogen receptor negative breast tumors are aggressive and have a poor prognosis due to their high proliferation rate and their resistance to many therapeutic approaches. The estrogenindependent growth of ERneg tumors depends on a range of biological pathways, including central energy and nucleotide metabolism [1,2], motivating to characterize metabolic dysregulations associated with the aggressive Research Paper www.oncotarget.com tumor phenotype. Human metabolic network's operation and regulation is governed by up to 10% genes in the human genome. Many of these genes and associated pathways are dysregulated and fuel a tumor's growth, therefore they are potential drug targets.
How to identify these dysregulations in aggressive breast tumors and rank them? One of the experimental approaches is to analyze the tumors with omics assays including metabolomics, proteomics and transcriptomics. In fact, breast tumors are extensively analyzed using these assays [3][4][5][6][7][8][9][10][11][12][13]. Tumor metabolomics reveals new insights into breast cancer metabolism [14][15][16][17]. MetaCancer consortium has been built to identify and validate new breast cancer biomarkers based on metabolomics [17,18]. We have previously shown that beta-alanine and glutamate to glutamine ratio (GGR) are associated with aggressive phenotype in the MetaCancer cohort [17,18]. Furthermore, a multi-omics approach has been successfully used to characterize metabolic dysregulation and identify potential new therapeutic targets in lung cancer [19], but such investigations are limited for breast tumors. These approaches yield different lists of potential targets, creating a challenge to identify which genes and pathways can be targeted in follow-up experiments. A handful of genes are over-studied in reference to tumor biology, for example IDH1 or FH, indicating a selection bias. However, poorly-studied significant genes and associated metabolic pathways provide an additional repertoire to identify new drug targets beyond the handful of genes.
In the present study, we integrated tissue-based metabolomics with proteomics and transcriptomics in the MetaCancer cohort to identify the metabolites, proteins and metabolic genes that are associated with ERneg phenotype. To identify metabolic genes that were poorly studied in ERneg tumors, we performed Gene Expression Omnibus (GEO) based meta-analysis of nine publicly available datasets, and ranked the metabolic genes based on their significance in meta-analysis and literature count.

Workflow to prioritize tumor metabolic genes
We applied a new bioinformatics workflow ( Figure  1) to characterize metabolic dysregulations between ERpos and ERneg breast tumors. We used metabolite, protein and transcript measurements data from the MetaCancer cohort [17,18], in addition to publicly available gene expression data (Supplementary Table 1). The workflow incorporates significance testing, gene expression meta-analysis, pathway analysis, network mapping and text mining. The output of this workflow yields lists of metabolic pathways and under-studied metabolic genes, here associated with ERneg breast tumor biology. Table 1 summarizes significantly different transcripts, proteins and metabolites between ERneg and ERpos tumors in the MetaCancer and the public datasets. We used raw p-values to interpret the molecular differences across all multi-omics levels of cellular regulation using bioinformatics approaches. Hence, we did not correct p-values for multiple hypothesis testing because our objective was not to find a diagnosis biomarker panel that can distinct the tumor subtypes.

MetaCancer dataset
The metabolomics data of the MetaCancer study consisted of 470 metabolites detected by gas chromatography/time of flight mass spectrometry, including 161 identified metabolites and 309 unidentified metabolites. Breast tumors from 251 women were studied, of which 192 were ERpos and 59 were ERneg. A total of 164 metabolites (63 identified and 101 unidentified metabolites) were significantly different between ERpos an ERneg cancer groups (raw p-value <0.05) with an average effect size of 1.37-fold changes, ranging from 0.59-3.03-fold. The levels of 133 metabolites were higher in ERneg tumors and 31 metabolites were lower in ERneg tumors (Figure 2C, 2D and Supplementary Table 4).
The proteomics MetaCancer data consisted of 125 formalin-fixed paraffin embedded (FFPE) samples using 96 ERpos tumors and 29 ERneg tumors. FFPE samples were not available for all 251 patients. ER status for one sample was not known. We could not use the exact same fresh frozen tumors as used for the metabolomics analysis because of limited sample availability. Conversely, FFPE samples are not useful for metabolomics analysis due to the FFPE fixation process. Out of total 1500 detected proteins, 1231 were found in at least six samples in either ERpos or ERneg groups and were kept in the dataset for statistics analysis (Supplementary Table 5). A total 295 proteins were found to be significantly different regulated (raw p-value <0.05) using the Mann-Whitney U test. The levels of 97 proteins were lower in ERneg tumors and 198 proteins were higher in ERneg tumors ( Figure 2B, 2D, Supplementary Table 5).
The transcriptomics MetaCancer dataset was downloaded from the gene expression omnibus (GEO) database with accession number GEO59198. The data set included 18401 genes determined in 122 ERpos and 32 ERneg breast tumors (Table 1). A total of 6040 genes (33%) were significantly different (raw p-value <0.05) between ERpos and ERneg groups using the GEO2R utility ( Figure 2D) [20]. Among those genes, 3103 were under-expressed and 2937 were over-expressed in the ERneg tumors (Table 1, Supplementary Table 6). We have then utilized the Expasy database to obtain enzyme commission (EC) annotations for all human genes from NCBI. We selected the EC numbers that chemically transform small-molecules by excluding enzymes that use proteins or genes as substrates, yielding a total of 1549 human genes coding for metabolic enzymes (Supplementary Table 7). 540 metabolic genes (35%) were found to be significantly different (raw p-value <0.05) in ERneg tumors compared to ERpos tumors in the MetaCancer cohort ( Figure 2A, Supplementary Table  7 and Supplementary Figure 1).

Breast tumor public gene expression datasets
We collected eight additional gene expression datasets for which the ER status was known. The eight additional GEO studies consisted of total 540 for ERpos and 299 ERneg breast cancer samples, ranging between 32-138 samples for ERpos tumors and 10-88 ERneg tumors (Table 1). On average, we found 23% of all transcripts to be significantly different expressed between ERpos and ERneg, ranging from 5-42% ( Figure 2E). Similar to the MetaCancer study, 47% of all significant genes were over-expressed, 53% were under-expressed. A meta-analysis by comparing the raw p-value distribution of each gene with across all datasets with a null-distribution using the Kolmogrov-Smirnov test (KS) yielded a total of 929 metabolic genes which were significant (raw p-value <0.05) ( Figure 2F, Supplementary Table 7, Supplementary  Figure 1). Estrogen receptor 1 (ESR1) gene expression was lower in ERneg tumors across all gene expression datasets (Supplementary Table 6).

Integrated pathway analysis and visualization
Next, we investigated whether these different omics data can be grouped by functional relationships. Here, we focused on integrated analysis of metabolic pathways. We used statistical over-representation as generic tool to summarize and rank the differential regulation of genes, proteins and metabolites by mapping to the Reactome database [21]. We used Reactome as reference database because it uses GO terms and because it provides more comprehensive mapping of genes to metabolic pathways than KEGG pathways, BioCyc or HMDB. We then confined the Reactome analysis to significantly regulated molecular markers, yielding a joint list of 542 genes, 104 proteins and 56 metabolites Supplementary Table  8). Notably, 20 metabolites, 7 proteins and 45 genes Figure 1: Overview of multi-omics data mining to reveal metabolic dysregulation by integrating raw p-values from metabolite, protein and gene expression analysis. Results of significance testing on individual omics-levels were subsequently analyzed by pathway enrichment analysis, by mapping to biochemical networks and by text mining. The overall outcome of such analysis is a list of key pathways and genes that includes well-studied genes as well as genes that have rarely been reported before in the context of breast cancer. www.oncotarget.com could not be mapped to any pathway set and therefore had to be discarded from further analysis. The lists of remaining molecular markers were summarized by Reactome overrepresentation analysis into 886 pathways (Supplementary Table 9) of which 88 (~10%) were found to be significantly enriched ( Figure 3). Interestingly, for most pathways, the over-representation analysis was based on all three omics levels, while 15 pathways were supported by only protein and gene lists and 8 pathways were based solely on gene set enrichments. This finding is based on the larger size of gene lists compared to proteins or metabolites, in addition to bias in proteomics and metabolomics data acquisitions. For example, the metabolomics platform used here did not cover biosynthesis of complex lipids, while our FFPEbased proteomics method failed to observed low-abundant proteins. We did not use lipidomics analysis here because most lipids are not annotated by specific enzymes in Reactome. Consequently, only gene lists supported the finding of dysregulation of conjugation of amino acids or carboxylic acids (xenobiotic metabolism, Figure 3, branch 5a) and only gene-or gene/protein lists supported the finding of dysregulation of complex lipids ( Figure  3, branch 7a). Importantly, Reactome pathway mapping highlighted various biological pathways involved in amino acids ( Figure 3 Table 9). Key specific pathways within these branches were glycolysis, pentose phosphate pathway, TCA cycle, nucleotide salvage, glutathione conjugation, steroids metabolism, fatty acyl-CoA biosynthesis, serine biosynthesis and metabolism of aromatic amino acids. These pathways were also found to be significantly associated with ERneg phenotype when genes, protein and metabolite lists were analyzed separately (Supplementary Tables  10-13). Several branches indicate the importance of lipids, nucleotides and amino acids for sustaining tumor growth and cell division in the more aggressive ERneg tumors. Other branches such as carbohydrates and TCA metabolism can be summarized in altered utilization of energy sources, amino acid conjugation for neutralizing xenobiotic including anti-tumor drugs, cholesterol biosynthesis regulation by SREBP and coenzyme biosynthesis to support fatty acid production.
Next, we integrated differentially regulated genes, proteins and metabolites into network maps (  Table 6). Similarly, glycolysis and TCA cycle metabolism (Figure 4) was altered by increased levels of 7 metabolites, 33 genes and 13 proteins, including G6PD, PGM1, MAOM, IDHP, CISY and lower levels of F16P2 (Supplementary Table 5). The combined action of these genes and proteins supports metabolic reprogramming towards the pentose phosphate pathway (PPP) and TCA anaplerosis. NADPH metabolism ( Figure  4) was activated via the PPP to provide more reducing potential for fatty acid synthesis and increased levels of  pentose sugars for nucleotide biosynthesis (Supplementary Table 4). Higher levels of GLYM, P5CS, P5CR1, P4HA1 and PRDX4 indicated increased collagen remodeling, a pathway that was missed by Reactome analysis. To sequester xenobiotics (Figure 4), ERneg tumors preferred conjugation of amino acids by cytochrome P450 genes instead of using glutathione transferases. Not all metabolic dysregulations are represented in Figure 4. For example, DHCR7, involved in cholesterol biosynthesis, was higher in the ERneg tumors (Supplementary Table 6).

Identification of metabolic genes and proteins that were under-studied in breast cancer
Many of the differentially regulated proteins or genes in the network maps ( Figure 4) have been studied before. Next, we asked which of these genes and proteins could serve as novel therapeutic targets? To this end, we searched 30 million PubMed abstracts using key-word text-mining to rank all differentially regulated metabolic genes by their publication frequencies in breast tumor biology (see Methods, Supplementary Table 7). The resulting Table 2 gives the differential expression for the top-50 most significant genes from the MetaCancer and the GEO meta-analysis study in relation to the publication record. This comparison yielded a range of metabolic genes that are currently under-studied with less than five papers. Interestingly, over 2/3 of the 50most significant metabolic genes are shown here to be severely understudied. Several of the understudied genes are directly involved in critical metabolic pathways of tumor etiology, including glucose metabolism (SORD), Larger node sizes indicate increased significance metabolite pathway dysregulation using the Reactome database tools. Red colors represent pathways supported by gene, protein and metabolite data, green colors are pathways supported by only proteins and genes, blue pathways are based on gene data only. www.oncotarget.com lipid biosynthesis (FAR1), cAMP signaling pathway (PDE7A, ADCY6), and glutamate anaplerosis (ABAT, GLUD1). We propose that these understudied genes should be further investigated in their role in breast tumor aggressiveness. To check the clinical significance of these genes, we have used KMPlot [23] tool that estimates the relapse free survival (RFS) using a standardized cohort of breast cancer gene expression studies. We have found that 29 genes were positive associated with RFS and 11 genes negatively associated with RFS in a meta-cohort of 1,809 patients (Supplementary Table 15).
In the context of finding novel, drugable targets for ERneg breast tumors, we then analyzed specifically the proteomics dataset with respect to finding the upregulated proteins that are supported by the GEO meta-analysis, irrespective of the protein function or involvement in metabolism. From a total of 1231 proteins, 295 proteins (24%) were differentially regulated in MetaCancer with a raw p-value < 0.05. 39 of these proteins were also found to be significantly different in the GEO metaanalysis, of which 21 were up-regulated in ERneg tumors ( Figure 5A). Importantly, 10 of these proteins were found to be severely understudied using our PubMed text mining approach (Figure 5B, Supplementary Table  14), including CALML5, ACTR3, PADI2 and PFKP. Among these 10 understudied proteins, four proteins were encoded by metabolic genes, including PFKP, GART, PLOD1 and ASS1. Although PFKP is a key regulatory enzyme in glycolysis and gluconeogeneis pathways, only four PubMed abstracts were found targeting this gene in breast cancer research. We propose to remove bias in study designs to not only study heavily published genes such as CD44 and GSTP1 but also relevant new targets such as PFKP, GART, PLOD1 and ASS1.

DISCUSSION
Cancer metabolism is radically different from non-malignant cells. Breast cancers can be grouped into different subtypes by presence of receptor genes, stages or grades. We here focused on comparing ERneg and ERpos classes of breast cancer but did not extend this analysis to further subtypes such as triple negative tumors using HER2 and PR status. When designing the study, we queried the GEO database for finding a larger number of studies that had ER-status reported to ensure to have a good base for meta-analysis of significantly expressed metabolic genes. From the nine studies (plus MetaCancer) we downloaded from GEO, only 5 had also information on HER2 status, reducing the power of gene expression meta-analysis. Secondly, several of those studies had already small sample sizes, and if HER2 status was added,  Similarly, our proteomics data set consisted of only 14 triple-negative tumors, again compromising statistical power for finding differences in metabolic genes. Therefore, we focused on comparing ERneg and ERpos tumors for which we previously showed stark differences between in central metabolism, altered ratios of glutamine/glutamate (glutaminolysis) and beta-alanine accumulation [17]; yet, an integrated analysis with prioritization and a focus on candidate pathways and druggable targets was missing. Despite the established importance of metabolism as hallmark of cancer, few other studies focused on the interplay of genetic mutations and gene expression and actual metabolic phenotypes. Overall, we have found that ERneg tumors show highly active biosynthesis of amino acids, lipids and nucleotides and utilize substrate recycling as well as xenobiotic metabolism to support tumor growth (Figure 3, 6). We consciously constrained our approach to metabolic genes and gene products and here provided the first integrated multi-omic analysis of ERneg versus ERpos tumors. We used solid statistical footing to focus on interpretations of metabolic differences to find novel targets and give the first example how in-depth text mining can be used to prioritize gene targets that have been largely ignored in the literature. Yet, despite clear evidence from the data, not all metabolic enzymes may be suitable as drug targets because they support critical metabolic pathways in cells throughout the body. Hence, strong inhibition of such enzymes might lead to severe therapeutic side-effects. Instead, knowledge about specific pathways might also be useful for targeting regulatory and signaling pathways in metabolism that may have less toxicity in normal cells. Our study results presented here support the idea that specific metabolic genes have gained less attention than others, even if they act in the same pathway. For example, our analysis showed that glutaminase is frequently reported in PubMed abstract in reference to breast cancer, whereas the classic mitochondrial glutamate oxidation enzyme, GLUD1, has only been once reported in this context. Specifically, new research on targeted drug delivery may render high-flux metabolic enzymes equally important for new therapy options as immunotherapy or classic oncology chemotherapies.

Nucleotide salvage pathway
Reactome pathway analysis of our data highlighted the nucleotide salvage pathway as specifically important in ERneg tumors (Figure 3, branch 3, Supplementary Table  9). 5NTC (Cytosolic purine 5'-nucleotidase), a key enzyme involved in nucleotide salvage, was up-regulated in ERnegative tumors. Five purine metabolites were elevated in ER-negative tumors, including adenine, guanosine, guanine, xanthine and hypoxanthine (Supplementary Table 4). Meanwhile, elevated levels of beta-alanine were observed in ER-negative tumors, an intermediate of the pyrimidine salvage pathway, along with the concurrent increases of uracil, pseudo-uridine, UMP and CMP. Concurrently, 5-deoxy-methylthioadenosine (MTA), a further purine salvage metabolite, was also observed at higher levels in ER-negative tumors (Supplementary Table  4). Aggressive tumors bypass autophagy and apoptosis and hence, require more re-use of nucleotides for cell survival and cell division. This process may therefore contribute to the cancer phenotype of cell survival. In addition, cancer cells also activate de novo nucleotide biosynthesis. There are a range of drugs targeting these metabolic pathways in chronic lymphocytic leukaemia, lung cancer and pancreatic cancer [24], but these drugs have not yet been repurposed to be tested against ERneg tumors. Our analysis motivates and supports the initiation of such clinical trials of these drugs for the management of ERneg breast tumors.

Microenvironment remodeling
Metabolites involved in collagen biosynthesis as well as collagen remodeling enzymes were enriched in ER-negative tumors. Breast tumor cells increasingly rely on de novo biosynthesis of proline for collagen metabolism [25]. Accordingly, we found increased levels of proline and trans-4-hydroxyproline as well as two enzymes involved in de novo biosynthesis of proline in ERneg tumors, pyrroline-5-carboxylate reductase (PYCR) and P5C-synthase (ALDH18A1) (Supplementary Table 5). PYCR converts pyrroline-5-carboxylate (P5C) to proline and ALDH18A1 produces P5C from glutamate. Extracellular matrix (ECM) is the most abundant component in the tumor microenvironment and it has been associated with breast cancer progression and metastatic spread [26]. As a scaffold of tumor microenvironment, collagen changes in the microenvironment regulate ECM remodeling, release signals and trigger a cascade of biological events, promote tumor invasion and migration [27]. ECM proteins and ECM mediated signaling pathways may be promising drug targets for breast cancer [28]. Along with the change of collagen genes, the expression levels some chaperone and co-chaperone Top 50 most significant genes in the GEO meta-analysis for ten studies and their significance in the MetaCancer proteomics and transcriptomics datasets (raw p-values). Note: -indicates proteins not found in the proteomics dataset www.oncotarget.com proteins, such as HSP90AA1, HSP90AB1, HSP90B1 and CDC37, were higher in ERneg tumors, as well as YWHAQ, YWHAE (Supplementary Table 5), which involved in PI3K-AKT signaling pathway [29].

Carbohydrate metabolism
The strongest effect supported by three omics levels (metabolites, proteins and genes) was found for an increased PPP activity. PPP intermediates, ribose-5phosphate and ribulose-5-phosphate, were increased in ERneg tumors, caused by an increase in the abundance of the key oxidation enzymes, glucose-6-phosphate dehydrogenase (G6PD) and 6-phosphogluconate dehydrogenase (PGD) and their encoding genes. Levels of transketolase, a key enzyme in the non-oxidative branch of the pentose phosphate pathway, and its encoding gene TKT, were also found increased in ERneg tumors. Further, gene expressions of phosphoglucomutase 1 (PGM1), ribose-5-phosphate isomerase (RPIA) and deoxyribose-phosphate aldolase (DERA) were also upregulated in ERneg tumors (Supplementary Table 5, 7). The pentose phosphate pathway delivers both NADPH and pentose phosphates required for cell division and tumor proliferation [30]. The activation of this pathway supports tumor growth known for ERneg tumors. In comparison to the clear differential regulation of the PPP, glycolytic enzymes showed less consistent regulation of genes, proteins and metabolites when comparing ERneg to ERpos tumors.
Apart from metabolic changes in PPP, higher levels of three uncommon carbohydrates were observed, including a two-fold increase in trehalose, maltose and maltotriose levels in ERneg tumors (Supplementary  Table 4 and Figure 6). These oligosaccharides might be co-imported from blood along with the well-known glucose uptake in tumors. This is the first report on these compounds in relation to breast tumor metabolism; the maltose-degrading enzyme (GAA, LYAG) in the MetaCancer study was found down-regulated (Supplementary Table 5) while the corresponding gene was found significantly down-regulated, consistent with other studies in the GEO meta-analysis (Supplementary Table 7). This significance in differential regulation of oligosaccharide metabolism in ERneg tumors on all three omics levels suggests that further potentially this pathway could also be important for future drug therapies.

Mitochondrial oxidation
We have observed increased levels of the metabolites nicotinamide and citric acid cycle (TCA) intermediates citrate, alpha-ketoglutarate, malate, fumarate  Table 4) along with overexpression of citrate synthase (CISY) and nicotinamide phosphoribosyltransferase (NAMPT) in ERneg breast cancer patients (Supplementary Table 5). Under hypoxic condition, attenuation of electron transport chain in tumors is expected and NADH production in TCA can imbalance the NAD+/NADH ratio. Targeting the NAD+ salvage pathway is a promising therapeutic option for cancer patients [31]. Aggressive breast tumor shows an overexpression of hypoxia inducible factor 1 alpha (HIF-1) [32]. NAMPT also known as visfatin, the rate-limiting enzyme in the NAD salvage pathway has been targeted by inhibitors such as FK866 [33] and CHS-828 [34]. This protein has also been reported to be overexpressed in prostate cancer to promote tumor cell survival [35] and is required for de novo lipogenesis in the tumor cells [36]. Over-expression of NAMPT in breast cancer tissue is associated with poor survival [37]. Inhibiting NAMPT can also make the ERneg breast cancer sensitive to additional chemotherapies as observed in vitro [38].
For example, higher expression levels of gene encoding ATP-dependent 6-phosphofructokinase, platelet type (PFKP) were found to be statistically up-regulated in ERneg tumors in eight out of ten GEO studies and specifically, were also found up-regulated in both the proteomics and transcriptomics dataset of the MetaCancer cohort ( Figure 5B). However, PubMed text mining showed there were only four publications focused on PFKP in reference to breast cancer. PFKP is a critical rate limiting enzyme in glycolysis, phosphorylating fructose-6phosphate to fructose-1,6-bisphosphate and determining the rate of glycolytic flux versus flux into the pentose phosphate pathway. Two further 6-phosphofructokinase isomers are existing in humans, PFKM (muscle type) and PFKL (liver type) [39]. PFKM was not detected in our proteomics dataset and not significantly different in transcriptomics dataset (raw p = 0.08), while the gene expression of PFKL was upregulated in the MetaCancer ERneg breast tumor transcriptomics dataset (raw p < 0.05). Therefore, targeting specific isoforms of phosphofructokinase may be useful as potential target to deprive cancer cells from essential substrates and energy for proliferation while allowing the survival of normal cells [40].
Similarly, other underreported metabolic genes might serve as new therapeutic targets. Three isoforms of procollagen-lysine, 2-oxoglutarate 5-dioxygenase (PLOD) have been identified. PLOD1 hydroxylates a lysine residue in the alpha-helical or central domain of procollagens; PLOD2 is responsible for lysine hydroxylation in the telopeptide of procollagens whereas the substrate specificity of PLOD3 is unknown [41]. PLOD1 and PLOD2 expression was induced by hypoxia in breast cancer cells [41]. The expression of PLOD1 was upregulated in lysyl oxidase-like 4 (LOXL4) knockout xenograft tumor tissues and LOX4 knockdown could enhance tumor growth and metastasis through collagendependent extracellular matrix changes in TNBC [42]. We therefore propose that PLOD isoforms may be interesting targets to study the relevance of metabolism in the tumor microenvironment.
A third example is argininosuccinate synthetase 1 (ASS1). ASS1 converts citrulline to arginine and is a key enzyme in arginine biosynthesis that is critical for many functions, including nitric oxide signaling or polyamine biosynthesis, as well as the liver urea cycle. In the MetaCancer cohort, we found both ASS1 transcripts and the enzyme were up-regulated in ERneg tumors (Supplementary Tables 5, 7), and the ASS1 substrate citrulline was decreased in the metabolomic results (Supplementary Table 4). An inhibition should restrict arginine availability. Arginine starvation has been shown to impair mitochondrial respiratory function in ASS1deficient breast cancer cells [43], suggesting that arginine starvation therapy such as pegylated recombinant arginine deiminase (ADI-PEG20) could be an option for patients with low ASS1 expression [44]. Dietary arginine restriction has also been shown to reduce tumor growth in a xenograft model of ASS1-deficient breast cancer [45]. We propose that such drugs with undergoing clinical trials could be repurposed for ERneg tumor therapy.
One limitation of this study is that only one metabolomics platform (GC-TOF MS) was used. The combination of multi-platforms including CSH-QTOF MS/MS and HILIC-QTOF MS/MS will significantly increase the number of detected and annotated metabolites and the pathway coverage. Future studies can also use advanced mass spectrometry instruments for proteomics and the next generation sequencing methods for transcriptomics to increase the pathway coverage.

MetaCancer cohort details
The study included 276 fresh frozen breast tumor biopsies and 126 FFPE tissues. They were collected for the tissue bank of the European FP7 MetaCancer consortium at the Charité Hospital. The project was approved by the institutional review board of the Charité Hospital (EA1/139/05). Further details about the cohort can be found in our previous report [17]. Biopsies were used for transcriptomics and metabolomics analysis and FFPE tissues were used for proteomics analysis.

MetaCancer metabolomics and proteomics data
GC-TOF MS data acquisition of the fresh frozen tumor biopsies tissues was performed as previously published [46]. FFPE tissues were analyzed using an LTQ mass spectrometer for measuring proteins. Details are given in the Supplementary Text 1.

Public breast tumor transcriptomics datasets
The MetaCancer transcriptomics dataset was downloaded from the GEO omnibus database using the accession number GSE59198. Nine other gene expression datasets were selected for which estrogen receptor status was available in the GEO omnibus database. Details about these datasets are provided in the Supplementary Table 1.

Bioinformatics and statistical analysis
Metabolite enrichment analysis was performed using the ChemRICH tool [47]. Input file for the ChemRICH analysis is provided in the Supplementary  Table 3. MetaMapp network was visualized using the Cytoscape software. Metabolites were linked to proteins using the KEGG and Expasy databases. Links were visualized as an integrated network using the Cytoscape network visualization software. Metabolites were linked to enzyme commission (EC) number first using the KEGG database and EC number to protein mapping was obtained from the Expasy database. Pathway over-representation analysis was conducted using the Reactome pathway analysis tool [21] because it provides the most comprehensive coverage of metabolic pathway maps. Gene symbols and "breast cancer" were searched in the PubMed database to get the count of abstracts for a gene and breast cancer. NCBI eutils web services were used for running the searches for all the metabolic genes. All statistical analyses were conducted using R. Mann-Whitney-Wilcoxon test was performed on both metabolomics and proteomics datasets. GEO2R utility was used to analyze transcriptomics datasets.

CONCLUSIONS
We here show how multi-omics data can be utilized along with text mining to identify metabolic genes and metabolic pathways that are understudied in breast tumor research, specifically for ERneg tumors that are known to have poor clinical outcomes. We see this approach as a hypothesis-generating method, prioritizing the multitude of genes that are to be significant in classic transcriptomics studies. We also showed that a meta-analysis of multiple studies refines and strengthens candidate gene selections to study metabolic reprogramming. We propose that such understudied, significant metabolic genes could be used as additional potential therapy targets, especially if genes and proteins were found up-regulated in different breast cancer studies, and if metabolite abundance support protein activities.

Ethics approval
The project was approved by the institutional review board of the Charité Hospital (EA1/139/05). All participants provided informed consent.