Profiles of miRNAs matched to biology in aromatase inhibitor resistant breast cancer

Aromatase inhibitor (AI) resistance during breast cancer treatment is mimicked by MCF-7:5C (5C) and MCF-7:2A (2A) cell lines that grow spontaneously. Survival signaling is reconfigured but cells are vulnerable to estradiol (E2)-inducible apoptosis. These model systems have alterations of stress related pathways including the accumulation of endoplasmic reticulum, oxidative, and inflammatory stress that occur prior to E2-induced apoptosis. We investigated miRNA expression profiles of 5C and 2A to characterize their AI resistance phenotypes. Affymetrix GeneChip miRNA2.0 arrays identified 184 miRNAs differentially expressed in 2A and 5C compared to E2-free wild-type MCF-7:WS8. In 5C, 34 miRNAs of the DLK1-DIO3 locus and miR-31 were overexpressed, whereas miR-222 was low. TCGA data revealed poor and favorable overall survival for low miR-31 and miR-222 levels, respectively (HR=3.0, 95% CI:1.9-4.8; HR=0.3, 95% CI:0.1-0.6). Targets of deregulated miRNAs were identified using CLIP-confirmed TargetScan predictions. KEGG enrichment analyses for 5C- and 2A-specific target gene sets revealed pathways associated with cell proliferation including insulin, mTOR, and ErbB signaling as well as immune response and metabolism. Key genes overrepresented in 5C- and 2A-specific pathway interaction networks including EGFR, IGF1R and PIK3R1 had lower protein levels in 5C compared to 2A and were found to be differentially modulated by respective miRNA sets. Distinct up-regulated miRNAs from the DLK1-DIO3 locus may cause these attenuative effects as they are predicted to interact with corresponding 3′ untranslated regions. These new miRNA profiles become an important regulatory database to explore E2-induced apoptotic mechanisms of clinical relevance for the treatment of resistant breast cancer.

miRNA expression in PAM50 subtypes: Based on raw read counts, miRNAs significantly differentially expressed between PAM50 subtypes were determined using non-parametric Kruskal-Wallis tests as well as pairwise Wilcoxon-Mann-Whitney post-tests.

Survival analyses:
Overall survival was used as endpoint in all survival analyses. Associations between survival and miRNA expression was explored using the conditional inference tree framework from the R package partykit. The method recursively partitions the samples in subsets with significantly differing survival functions. The P-value cut-off to accept a split was set to 0.1. The significance of the association between groups of samples and overall survival was analyzed by log-rank tests. Holm correction procedure [6] was applied to adjust resulting P-values for multiple testing. Over-represented functional categories are usually identified by comparing the observed overlap between category and miRNA target set with the expected overlap estimated from the background gene set. Notably, the definition of the background gene set is of crucial importance. It is well known, that miRNA targets are enriched for certain functions compared to non-miRNA targets [7,8] and therefore, the background gene set was compiled in such a way to include only predicted miRNA targets. This is critical in order to avoid a bias in the calling of functional terms. The resulting miRNA-gene interaction map included 1,067 miRNAs and 11,795 genes connected via 263,451 interactions.

KEGG pathway analyses:
We based our pathway analyses on the number of interactions between miRNAs and pathway genes. The interaction map was filtered to include genes for which KEGG pathway annotations were available. This resulted in a background interaction map of 1,065 miRNAs, 4,330 genes and 99,246 interactions. Fisher's exact test was used to test if the number of observed interactions between genes of a certain pathway and a miRNA set significantly exceeded the number of expected interactions as estimated from the background interaction map. Additionally, we applied an approach in line with Bleazard et al. [9] using permutation tests for the identification of significantly associated pathways. Here, the number of interactions between a miRNA set and a pathway was compared with the number of interactions obtained from a random set of miRNAs of the same size as the input set sampled (without replacement) from the background miRNA set. Sampling was repeated 10,000 times and a permutation Pvalue was calculated using the proportion of random miRNA sets that exhibited equal or more interactions with the pathway genes. The overlap between gene sets from enriched KEGG pathways (P < 0.05) were illustrated as graph diagrams executed with yED Graph Editor (http://www.yworks.com/products/yed). Two nodes, representing two different KEGG pathways were connected if their intersection was significantly greater than expected (one-sided Fisher's exact test, correction for multiple testing by Holm's method). Importantly, non-miRNA targets among the pathway genes were not considered for the network construction. The node size refers to the pathway size in terms of the number of miRNA targets. The edge weight illustrates the degree of similarity between two pathways as measured by the Jaccard index.
General KEGG pathways of the category "global and overview maps" and "pathways in cancer" were neglected.
Gene Ontology (GO) term enrichment: Over-representation of GO terms were analyzed for the sub-category 'Biological process'. Target gene lists generated from the overlap between gene sets from significantly enriched KEGG pathways were searched for enriched GO terms using the GORILLA tool (http://cbl-gorilla.cs.technion.ac.il/). The background gene set was defined as all genes with a GO annotation. GO terms were summarized by removing redundant terms using REVIGO (http://revigo.irb.hr/) using default parameters.     (red) and down (green) regulating miRNAs were obtained from specific miRNA sets given in Table 1.  (Figs. 5A, B). Resulting GO terms were summarized by removing redundant terms with REVIGO using default parameters. Terms were then filtered according to the following thresholds: frequency < 5% and -log 10 P < 40 (intersection gene set), -log 10 P < 15 (5C-, 2A-specific gene sets). Dark color bars represent major GO terms; light color bars specify minor sub-terms of major terms (bold). A.

Supplementary Figure 8: Abbreviated GO terms of the subcategory 'Biological process' enriched in
5C and 2A models. Major (bold) and minor terms are listed for the AI resistance phenotype common to the 5C and 2A models (grey boxes), 5C-specific (blue boxes) and 2A-specific phenotype (green boxes).