Landscape of genome-wide age-related DNA methylation in breast tissue

Despite known age-related DNA methylation (aDNAm) changes in breast tumors, little is known about aDNAm in normal breast tissues. Breast tissues from a cross-sectional study of 121 cancer-free women, were assayed for genome-wide DNA methylation. mRNA expression was assayed by microarray technology. Analysis of covariance was used to identify aDNAm’s. Altered methylation was correlated with expression of the corresponding gene and with DNA methyltransferase protein DNMT3A, assayed by immunohistochemistry. Publically-available TCGA-BRCA data were used for replication. 1,214 aDNAm’s were identified; 97% with increased methylation, and all on autosomes. Sites with increased methylation were predominantly in CpG lslands and non-enhancers. aDNAm’s with decreased methylation were generally located in intergenic regions, non-CpG Islands, and enhancers. Of the aDNAm’s identified, 650 are known to be involved in cancer, including ESR1 and beta-estradiol responsive genes. Expression of DNMT3A was positively associated with age. Two aDNAm’s showed borderline significant associations with DNMT3A expression; KRR1 (OR 6.57, 95% CI: 2.51–17.23) and DHRS12 (OR 6.08, 95% CI: 2.33–15.86). A subset of aDNAm’s co-localized within vulnerable regions for somatic mutations in cancers including breast cancer. Expression of C19orf48 was inversely and significantly correlated with its methylation level. In the TCGA dataset, 84% and 64% of the previously identified aDNAm’s were correlated with age in both normal-adjacent and tumor breast tissues, with differential associations by histological subtype. Given the similarity of findings in the breast tissues of healthy women and breast tumors, aDNAm’s may be one pathway for increased breast cancer risk with age.


INTRODUCTION
It is well established that breast cancer incidence increases with age [1,2]. Among U.S. women, approximately 12% of invasive breast cancers are diagnosed in women < 45 years of age while 68% are diagnosed among women who are over age 55 [1,2]. The underlying mechanism for this large difference in risk is not well understood. One potential mechanism is epigenetic alterations, including DNA methylation, which is one of the hallmarks of aging [3] and breast carcinogenesis [4][5][6][7].
Increased age is associated with global hypomethylation of CpG loci outside of CpG Islands and also regional hypermethylation of CpG Islands [6][7][8][9][10][11]. Methylation of the tumor suppressor genes ESR1, IGFBP3, and RASSF1A specifically increases with age [6,7,10,12], but the converse occurs with methylation of repetitive elements [11]. Little is known about the timing of altered DNA methylation or age-related DNA methylation (aDNAm) in breast carcinogenesis, and whether it is different from normal aging in normal breast epithelial cells. A few studies that consist of only small sample sizes (n = 23 and n = 15 in two separate analyses) have examined aDNAm in normal breast tissues from healthy individuals [7,13]. A recently published study with 100 samples showed aDNAm at regulatory regions [14]. However, to date, there has been no study of the role of aDNAm with gene expression in normal breast tissues, although, there some contradictory evidence for this in blood cells [13,[15][16][17][18].
In this study, aDNAm was assessed in the breasts of women with no prior history of cancer, and the results were compared to aDNAm tumor tissues from The Cancer Genome Atlas (TCGA). Further, to better understand the mechanism of these changes and their impact on carcinogenesis, gene expression for these aDNAm were assessed, and if the aDNAm varied with DNA methyltransferase protein levels.

Characteristics of study subjects
Characteristics of study subjects are given in Table 1. Subjects' ages ranged from 17 to 76 years, with a mean of 38 years. The women were 77% premenopausal and 67% were of European American ancestry. Consistent with patients who typically seek breast reduction surgery, many women were overweight or obese (mean BMI: 30 kg/m 2 ; range 21-46).
A Manhattan plot of the aDNAm's is shown in Figure 1. The aDNAm's are spread across all of the autosomes; although 2.3% of aDNAm's would be expected to be on the X chromosome by chance, none were found. The top 30 aDNAm's are listed. A substantially higher frequency of aDNAm's were found on chromosomes 3,9,18, and 19 (> 20%, indicated in red), and a substantially lower frequency on chromosomes 12, 14, 21, and 22 (< -20%, blue) than expected given the proportion of CpGs on the array ( Figure 1B).
Overall, the correlations for those aDNAm's with increased methylation were stronger (partial correlations 0.41 to 0.78) than for those with decreased methylation (partial correlations -0.53 to -0.45) (Supplementary Table  1). aDNAm's most strongly associated with age, with increased and decreased methylation are shown as dot plots in Figures 1C and 1D. The three aDNAm's with the strongest statistical association with age were localized in the 5'UTR or 1stExon of ZNF274, the gene body of PTPRN, and an intergenic CpG locus (cg04880546) ( Figure 1C). Those which decreased mostly strongly in association with age were localized in the TSS1500 of CDKN1A, the body of PALLD, and one intergenic CpG (cg02315421) ( Figure 1D).
The overlapping genomic location of aDNAm's with sites prone to mutation in human cancer was found www.impactjournals.com/oncotarget using the COSMIC mutation database. Fifteen aDNAm's were co-localized within vulnerable genomic regions where somatic mutations occur, including for breast cancer (NEFM) (Supplementary Table 2). Three of these loci were in genes for transcriptional regulators (PAX5, SOX21, and ZGPAT). All overlapped aDNAm's were located in CpG Islands (n = 10) or shores/shelf (n = 4) except one locus (Supplementary Table 2).

Correlation between aDNAm's and their gene expression
We found a significant inverse correlation between gene expression of C19orf48 and its methylation level (cg01534416 located in TSS1500) (r = -0.42, FDR q = 0.011) ( Figure 4B). Another four genes showed borderline significant correlations at FDR q < 0.1. Three of these were located in functional promoter regions (corresponding to genes MYEF2, SPATA18, and RFC4) and the other was located in the gene body (C9orf41). All aDNAm's were located in CpG Islands or shores and inversely correlated with gene expression ( Figure 4B). An expected % was calculated based on a total 437,653 CpGs analyzed on the array and the observed % was calculated among 1,214 aDNAm's at Bonferroni corrected P < 0.05. Chromosomes in red and blue are those substantialy (20%) higher or lower than expected, respectively. Dot plots of beta-values for top 3 aDNAm's for gain (C) and loss (D) of methylation based on the p-value. Each point represents the beta-value for an individual (n = 121). Three age groups are colored for < 35 (blue) , 35-49 (red) , and > = 50 (green) for visualization. The raw and bonferroni corrected P-value for the association of DNA methylation with age are shown.

TCGA-BRCA dataset: aDNAm's in breast adjacent normal and tumor tissues
The aDNAm's identified for the healthy women were queried in the TCGA-BRCA data. The characteristics of TCGA samples were provided in Supplementary  Table 4. The mean methylation levels across all CpG loci differed little between breast tissue from healthy women (the present study) and adjacent "normal" and tumor tissue in the TCGA dataset in both pooled and paired samples ( Figure 5A). The mean aDNAm level in the breast tissues from the women without a history of cancer was 39% lower than for the TCGA adjacent normal tissues (n = 95) (P = 1.18 × 10 −26 ) and 53% lower than for the TCGA tumor tissues (n = 698) (P < 1.89 × 10 −37 ) ( Figure 5A). An unsupervised clustering for the respectively. A proportion was tested (Chi-Squre) for a difference across regions between gain and loss of methylation (P-value in the box) and for a difference of gain/loss methylation compared to the expected aDNAm's (P-value on the topt of box plot) . www.impactjournals.com/oncotarget 1214 aDNAm's consistently showed higher methylation of aDNAm's among TCGA tumor compared to adjacent normal tissues, and to the breast tissues of women without a history of cancer (Supplementary Figure 1A).
The mean methylation of the 1,214 aDNAm's was similar by tumor subtype for Luminal A (mean beta 0.29), Luminal B (mean beta 0.31), and HER2 breast cancers (mean beta 0.28), but was statistically different for Basal (mean beta 0.23) ( Figure 5B, Supplementary Figure  1B). The mean methylation of the 1,214 aDNAm's was significantly higher for tumors that were ER+, PR+, or HER2+ compared to ER-, PR-, or HER2-at P = 1.60 × 10 −21 , P = 4.82 × 10 −15 , or P = 9.95 × 10 −3 , respectively ( Figure  At the locus level in the TCGA dataset, among the 1,214 aDNAm's, 49.3% of the 1,214 aDNAm's (n = 599) were associated with age in both adjacent normal and tumor tissues at FDR q < 0.05 ( Figure 6A). These replicated aDNAm's in both tissues were significantly enriched for CpG Islands compared to the ones that were not replicated (only observed in the breast tissues from the women without cancer) (70% vs. 44%; P < 0.05) ( Figure 6A). The genes corresponding to these replicated aDNAm's were predicted to be involved in cellular function and maintenance as the primary cellular functional role ( Figure 6A). Also, 35% (n = 419) or 11% (n = 133) were significantly associated with age in only adjacent normal or only tumor tissues, respectively (Spearman correlations with FDR q < 0.05). The direction of change with age was the same for all loci.
We further examined if the 732 aDNAm's in TCGA tumor tissues were differentially methylated by hormone receptor status. Of these aDNAm's, 565 (77%), 492 (67%), and 102 (14%) were significantly differentially Lists of beta-estradiol or/and ESR1 responsive genes corresponding to aDNAm's are shown. (C) Among genes corresponding to aDNAm's, 86 breast cancer related genes are shown by spatial location of molecules. Genes confirmed to be associated with age are colored. Orange colored genes was confirmed to be associated with age in both breast adjacent normal and tumor tissues from TCGA independent datasets. Light blue or pink colored genes were associated with age in either adjacent normal or tumor, respectively. Gray colored genes were not validated. www.impactjournals.com/oncotarget methylated by ER, PR, and HER2 status, respectively, at FDR q < 0.05 ( Figure 6B). The majority of them (92-97%) were significantly more methylated (7-23% higher) in positive compared to negative hormone receptor tumors ( Figure 6B).
The replicated aDNAm's in the TCGA dataset and differentially methylated aDNAm's by hormone receptor status are provided in Supplementary Table 5.

DISCUSSION
Age is a significant and established risk factor for breast cancer [1,2]. Gene methylation in breast cancer also changes with aging [5][6][7], but the extent of these changes in normal breast tissues and the direct relevance to breast cancer development is unknown. Studying aDNAm's in normal breast tissues from women without a history of cancer or evidence of benign breast disease should provide insight for what occurs naturally in the breast over time, and potentially contribute to breast cancer development. In this study, 1,214 aDNAm's were identified, all autosomal, most showing increased methylation with age, and usually found in CpG Islands and non-enhancers. Two aDNAm's co-localized within vulnerable regions for somatic mutations in breast cancer. In addition to a significant inverse correlation between gene expression of C19orf48 and its methylation level, we found another four genes to be borderline statistical significant correlations (C9orf41, MYEF2, SPATA18, and RFC4). High expression of DNMT3A protein was significantly associated with increasing age and two aDNAm's (corresponding to KRR1 and DHRS12). A majority of the genes corresponding to aDNAm's were associated with cancer including breast cancer; the genes were highly enriched for ESR1 or betaestradiol responsive genes. Using the independent TCGA-BRCA dataset, we found that aDNAm's identified in breast tissues of women without a history of breast cancer were highly methylated in breast tumors compared to both adjacent normal tissues and to the normal breast tissues. Methylation of the aDNAm's was lower for basal tumors compared to other subtypes.
Global gene methylation on repetitive elements has been shown to be inversely correlated with aging [8,10]. In contrast, it also is known that DNA methylation at specific loci is positively correlated with age [19]. We found a positive correlation for methylation with age for 97% of the CpGs identified; 60% of these were in CpG Islands as compared to only 32% expected by chance. In line with these findings, other studies have shown regional hypermethylation of CpG Islands in a variety of other organs [7,11,13,[20][21][22]. There also is some consistency in these findings for aDNAm's reported in blood for a population-based longitudinal study of healthy individuals (n = 400) for 31% of the detected 162 aDNAm in blood [20]. However, in the only prior study of normal breast tissues that we are aware of, there was low agreement of results, (8% [16/199 aDNAm's]), perhaps due to their using a different assay platform or small study size (n = 23 and n = 15 in two separate analyses) [7].
We found alterations in DNA methylation related to age for 803 unique genes, most corresponding to coding genes known to be involved in cancer (e.g., TP73, CDKN1A and ESR1). aDNAm's involved ESR1 and beta-estradiol responsive genes. ESR1 is one of the well-established master transcriptional regulators in the breast [23] and is epigenetically silenced in breast cancer [24]. Some aDNAm's were identified in genes for noncoding RNAs (ncRNAs) that may affect regulation of Oncotarget 114656 www.impactjournals.com/oncotarget gene expression [25,26] and involved in breast cancer [27,28]. It has previously been reported that ncRNAs may be associated with aging [29]., 8 aDNAm's were located in the promoters of precursor-miRNAs (pre-miRNAs), indicating a possible regulatory role of the aDNAm's in miRNA expression. These include mir-15b/16-2 that targets the BCL2 oncogene [30][31][32]; mir-7-3, a let-7 family member that regulates the RAS oncogene [33]; mir-148a, known to induce apoptosis by targeting IGF-1R and IRS1 in breast cancer cells [34]; mir-426 known to promote cell proliferation in breast cancer [35]; and, mir-935 known to be differentially expressed in hormone-responsive breast cancer cells [36]. A role for two of the miRNAs (mir-596 and mir-1253) is unclear for breast cancer, but hypermethylation of these have been found in endometrial cancer cell lines [37].
None of the 1,214 aDNAm's identified in the present study mapped to the X-chromosome, and this was also found, except for one in the 1,685 aDNAm's identified in TCGA tumor tissues (Bonferroni corrected P < 0.05; data not shown). This paucity of aDNAm's on the X chromosome is surprising considering that DNA methylation plays a key role in X chromosome inactivation [38]. However, a role in cancer seems less likely and is consistent with the small number of mutations on the X chromosome compared to autosomes [39]. Similarly, a recent study showed a significantly higher stability of X chromosome transcripts than for autosomal transcripts in ). The differences were tested using Mann-Whitney rank tests and Benjamini and Hochberg False Discovery Rate (FDR) q < 0.05 was considered significant. In pie charts, higher and lower methylation of differentially methylated aDNAm's in positive compared to negative hormone receptor status are shown as red and blue section, respectively. www.impactjournals.com/oncotarget various human cell lines, both male and female, and in mice [40]. Taken together, the X chromosome may be both genetically and epigenetically more stable than autosomes, even over aging, in both normal and tumor breast.
The mechanisms for changes in DNA methylation with aging are poorly understood. The de novo DNA methyltransferases play a key role in early development, are down-regulated in adult somatic tissues, and conversely are over-expressed in breast cancer and other tumors [41,42]. In this study, expression of DNMT3A protein in the breast tissue was correlated with age and its expression was significantly correlated with two aDNAm's including KRR1 and DHRS12, indicating a possible mediation of DNMT3A in DNA methylation of these two CpGs during aging particularly for those aDNAm's. A recent review suggested that de novo methylation related to age is involves DNA methyltransferases, consistent with the findings at least for these two CpGs herein [10].
In our study, we found only a few aDNAm's associated with gene expression in normal breast tissue. Although we could not validate the direct relationship of DNA methylation and gene expression for most of the aDNAm's genes, we found correlates to some of the aDNAm's herein; gene expression was decreased for 5 aDNAm genes (C9orf41, C19orf48, MYEF2, SPATA18, and RFC4). Understanding how differences in aDNAm' related to their gene expression has been challenging. This is because gene expression is regulated in many ways in addition to DNA methylation [43], and so associations are plausibly weak. For example, there are important roles for CpG Islands, promoters, and enhancers [44], and ncRNAs [45]. A recent study of normal breast tissues showed aDNAm's enriched for regions of chromatin remodeling and transcriptional control [14], suggesting its possible contribution to gene expression. In our study, the most statistically significant correlation was found for C19orf48, which encodes a minor histocompatibility antigen. Although its functional role is unknown, the promoter methylation site of C19orf48 (cg01410314) correlated with its gene expression in the current study is marked by the enhancer-related histone mark H3K27Ac based on the UCSC Genome Browser (data not shown), indicating a possible contribution of DNA methylation on its gene expression. SPATA18 is suggested to be a novel transcriptional target of P53 and is down-regulated in breast cancer compared to normal breast tissues, indicating a novel tumor suppressor gene [46]. RFC4 is involved in DNA replication and chromosomal stability, and its upregulation was found in the poor prognostic group of breast cancer [47]. Given that the aDNAm's in SPATA18 and RFC4 are located in CpG Islands, some of the aDNAm's identified in this study may contribute to gene regulation. Although a limited number of aDNAm's was found, another functional significance of the a subset of aDNAm's identified that overlapped with mutation-prone sites in human cancer, mostly located in CpG Islands, shores, or shelves, indicating a possible involvement of these aDNAm's in somatic mutations in cancers.
This study has several strengths. It provides a comprehensive analysis of aDNAm's in breast tissues from healthy women with no previous history of breast cancer. Prior studies mostly focus on blood and other tissues, and there has been limited study in the breast [20][21][22]. It is unknown how aDNAm in blood and other tissues reflect aDNAm's in the breast. Also, the study of breast tissues from women without cancer identifies aDNAm's that may be playing a role in breast carcinogenesis, when they are found to also occur in breast tumors, as demonstrated herein. Another strength is the assessment of aDNAm's biological effects on gene expression. Further the understanding of the association of altered DNA methylation with the expression of the DNMT3A protein adds to our understanding of these processes.
This study also has limitations. We studied breast normal tissues from women who had undergone reduction mammoplasty; these women necessarily have larger breasts than other women and have greater BMIs, possibly limiting the generalizability of these findings. However, this limitation was tempered by multivariable adjustment analysis for BMI. Further, the concordance between the findings in the healthy women and those in the TCGA data set provides some indication that the findings are more generalizable. Another limitation is that the cross-sectional study design provides evidence of association but not causality, and it is not know which of these women, if any, would develop breast cancer later in life. While a longitudinal examination of changes in methylation with aging would be ideal, it would be difficult to collect human breast tissues on multiple occasions for such a study. Also, although adipose tissue was dissected from epithelia tissue at the time of specimen collection, blunt dissection does not perfectly segregate the epithelia from adipose tissue components, so potential confounding due to cellular proportions may have entered into our determinations of aDNAm. Given the rising this concern about DNA methylation differences by cellular proportions, we assessed breast tissue heterogeneity on a subset of samples (93/121) as described in our previous study [48]. Although this assessment may not directly address the heterogeneity issue due to different biological materials used (slides for heterogeneity and tissues for DNA methylation), we found that 88% of aDNAm's (999/1214) were not associated with the heterogeneity index (unadjusted P > 0.05) (Data not shown). This provides evidence that heterogeneity had little influence on the majority of aDNAm's associated with age. Moreover, we were limited to prove biological mechanisms of aDNAm's although we found an evidence of estrogen influences on aDNAm's. Although cell-tocell signaling and interactions, cell death and survival, cell morphology, cellular growth and proliferation, and gene expression functions were enriched in IPA analyses, it is possible that these analyses are biased toward genes www.impactjournals.com/oncotarget with higher numbers of probes (average aDNAm's = 29, average non-aDNAm's = 18); however, we are uncertain if this biases the IPA analyses toward any particular function or regulator. Separately, we utilized an array-based method to identify aDNAm's. Although HumanMethylation450 BeadChips provide quantitative methylation levels at a single-base resolution, the coverage of total CpGs is low (approximately 2%). Also this array does not allow the detection of allele-specific changes in DNA methylation. Thus, new methods such as next-generation sequencing will further provide large-scale methylation data without loss of information in the entire genome. Although we found the lower mean methylation of aDNAm's identified in breast tissues of women without a history of breast cancer than those for TCGA tissues, it is possible that this reflects differences in collection and technical protocols, rather than a true biological difference.
This study is the first comprehensive report of changes in DNA methylation with age in normal breast tissues of women without a history of cancer. We found altered methylation at 1,214 aDNAm's, almost all of which were increased methylation. The alterations were present only on autosomes. The affected genes included those known to be important in breast cancer, such as ESR1 or beta-estradiol responsive genes. The results are consistent with the hypothesis that the relationship of aging to breast cancer may be explained at least in part by age-related changes in DNA methylation and gene expression in normal tissues before clinical cancer develops. Given that age is one of the strongest risk factors for breast cancer, understanding the mechanism of that association provides critical insights. Further understanding of the underlying mechanisms for age-related effects on DNA methylation warrants further study.

Study samples
A subset of samples was from our previous study and detailed methods of this study have been described elsewhere [48][49][50][51]. Briefly, women (n = 121) who underwent reduction mammoplasty at Georgetown University Medical Center (Washington, DC), the University of Maryland (College Park, MD), the Washington Hospital Center (Washington, DC) and the Center for Plastic Surgery (Buffalo, NY) provided written informed consent, an epidemiologic questionnaire, blood and their residual breast tissues. Institutional Review Boards was received at each institution. Breast tissues were grossly blunt-dissected to separate epithelial tissues from adipose, snap frozen in liquid nitrogen, and stored at -80 °C until use. A part of the sample was immersed in RNA later (Ambion, Inc., Austin, Texas). Women with evidence of premalignant benign breast disease were excluded. Demographic, lifestyle, reproductive, and family medical history data were assessed by an interviewer-administered questionnaire.

Genome-wide DNA methylation analysis and quality checks
Genomic DNA was extracted from dissected frozen fresh breast tissue using a MasterPure DNA purification kit (Epicenter, Madison, WI). Following bisulfite treatment of DNA using the EZ DNA Methylation kit (Zymo Research, Irvine, CA), genome-wide DNA methylation was analyzed using HumanMethylation450 BeadChips (Illumina, San Diego, CA) (HM450), according to the manufacturer's instructions. In order to minimize the impact of batch effects, samples were randomized by age and ancestry [52]. Illumina .idat file were imported into Partek Genomics Suite™ 6.6 (Partek Inc., St. Louis, MO) and normalized by Subset-quantile Within Array Normalization (SWAN) [53]. GRCh37/hg19 (Human Genome version 19) was used as a reference genome. Any probes with the following criteria were filtered out before further statistical analysis: detection P > 0.05 (n = 6,576), probes in Y chromosome (n = 416), and cross-reactive probes (n = 41,248) [54,55]. An ANOVA model was used to remove the batch effect, with processing data adjusted to remove these effects. 19 out of 121 samples (about 16%) were duplicated as internal quality controls (QCs) while processing the samples. The correlation coefficient for duplicate signal intensities in the arrays was 99% (data not shown). Previously, we have shown high concordance between HM450 and gene methylation by pyrosequencing in a subset of samples used in this study [50,51]. The HM450 data were deposited to under NCBI GEO GSE101961.

Human transcriptome array
Total RNA was extracted from frozen breast tissue stored in RNAlater (Ambion) using 1.5 mm Triple-Pure Zirconium Beads (Benchmark Scientific, Edison, NJ) and RNeasy Plus Mini Kit (Qiagen, Valencia, CA). To profile gene expression, the GeneChip ® Human Transcriptome Array 2.0 (Affymetrix Inc, Santa Clara, CA) was used. The data available were limited to 104 out of 121 samples because of the RNA quality. The raw data (CEL files) were imported into the Affymetrix Expression Console ® Software (version 1.3.1) for log 2 transformation and quantile normalization. Batch effect was removed as described above. Ten percent of samples were duplicated for quality control while processing the samples. The correlation coefficient for duplicate signal intensities in the arrays was 99% (data not shown). The Affymetrix gene expression data were deposited to under NCBI GEO GSE102088. www.impactjournals.com/oncotarget Immunohistochemistry (IHC) for DNMT3A IHC staining of DNA Methyltransferase 3 Alpha (DNMT3A) was done on formalin-fixed paraffin-embedded (FFPE) tissues (n = 91) using antibodies purchased from Novusbio (NBP-1-85961). Heat induced epitope retrieval was performed by immersing FFPE samples at 98°C (20 minutes) in citrate buffer (10 mM; pH 6.0) with Tween (0.05%). IHC was performed using the VectaStain Kit from Vector Labs according to manufacturer's instructions. Slides were exposed to biotin-conjugated secondary antibodies, and counterstained with hematoxylin (Fisher, Harris Modified Hematoxylin), blued in 1% ammonium hydroxide, dehydrated, and mounted with AcryMount. Consecutive sections with the primary antibody omitted were used as negative controls. Nuclear DNMT3A staining within epithelial cells was scored by the pathologist (BVSK) using the modified Allred method (scaled 0-6). The score combined an estimated proportion score on a scale of 0 to 3 (0: negative, 1: less than 10%, 2: 10-50%, and 3: greater than 50%) with an average intensity score of 0 to 3 (0: negative, 1: weak, 2: moderate, and 3: intense). 17.5% of data (16/91) was duplicated and agreed on all intensity (16/16) and distribution scores except for a distribution score from one sample (15/16).

The cancer genome atlas data (TCGA)
Level 1 data from TCGA-Breast Invasive Carcinoma (BRCA) database were downloaded as .idat files via https:// tcga-data.nci.nih.gov/tcga/. The data were normalized using SWAN [53]. There were data for 698 women with breast cancer who were either European American or African American; 88 of those also had data from paired adjacent normal tissues. An ANOVA model was used to remove the batch effects.

Statistical analyses
For initial identification of aDNAm's, a Bonferroni threshold of 0.05 was used to identify the most promising signatures. For all downstream analyses, Benjamini and Hochberg False Discovery Rate (FDR) = 0.05 was used as the threshold. If not stated otherwise, a raw P < 0.05 was considered statistically significant.

Locus-by-locus analysis to identify aDNAm's
For modeling purposes, M-values were derived from Beta-values by logit-transformation. To identify aDNAm's, analysis of covariance (ANCOVA) was used for age as a continuous variable with adjustment by race as a categorical variable (European American vs. African American), and body mass index (BMI; kg/m 2 ) as a continuous variable, variables that were significantly correlated with methylation [50]. A Bonferroni-corrected P < 0.05 (corresponding to raw P < 1.14 × 10 −7 ) was considered statistically significant.

Genomic features of aDNAm's
aDNAm's were classified by genomic location based on the Illumina annotation file (HumanMethylation450_15017482_v1-2): CpG Islands, 2 kb regions upstream and downstream of the CpG Islands (shores), 2 kb regions upstream and downstream of the shores (shelves), functional promoters [within 1500 base pairs (bps) of a transcription start site (TSS) (TSS1500); within 200 bps of a TSS (TSS200); 5' untranslated regions (5'UTR); first exon (1stExon)] and other regions [body, 3'UTR, or a stretch of DNA region located between genes (intergenic )]. To investigate potential sites prone to mutation due to DNA methylation of aDNAm's, we used the Catalogue of Somatic Mutations in Cancer database (COSMIC) (http://cancer.sanger.ac.uk) and searched genomic locations where aDNAm's are located in order to identify the overlap of aDNAm's with sites prone to mutation in human cancer.

Comparisons of distribution of aDNAm's across the genomic location
The distribution of aDNAm's across genomic location (CpG Island, functional location, and enhancer) was compared to the distribution of total CpGs analyzed (n = 437,653). The enhancers are derived from the Illumina annotation [56] based on enhancer elements determined by the Encyclopedia of DNA Elements (ENCODE). A chi-square raw P < 0.05 was considered statistically significant.

Correlation between methylation of aDNAm's and gene expression
Probes from HM450 and Affymetrix were matched for gene names (perfectly matching). The CpGs located up to 1500 bps upstream of the gene, gene body, and 3'UTR were included in the analysis. The aDNAm (M-value) was correlated with expression of a corresponding gene (mean of log 2 transformed intensities if there was more than one probe) by Spearman correlation. FDR q < 0.05 (corresponding to raw P < 8.66 × 10 −6 ) were considered significant.

Correlation of DNMT3A expression with age or aDNAm's
A logistic regression model was used because of the skewed distribution of DNMT3A expression (data not shown). Median values were used to dichotomize age (37 years) and aDNAm's. DNMT3A by IHC was dichotomized using the median Allred score as the cutpoint: ≤ 5 = lower expression (n = 43) and 6 = higher expression (n = 66). Logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals www.impactjournals.com/oncotarget (CI) for associations of DNMT3A protein expression with age at a raw P < 0.05 or with methylation levels of aDNAm's at FDR q < 0.05 (corresponding to raw P < 4.22 × 10 −5 ).

aDNAm's in the TCGA dataset
Mann-Whitney rank tests were performed to compare mean methylation levels between groups (normal, pooled adjacent normal, pooled tumor, paired normal, and paired tumor tissues) and between different breast cancer types. A raw P < 0.05 was considered statistically significant. To assess aDNAm's identified from normal breast tissues in TCGA samples, M-values of 1,214 CpGs were first filtered from TCGA datasets and correlated with age using the Spearman correlation. Spearman correlation FDR q < 0.05 were considered statistically significant.

Ingenuity pathway analysis (IPA)
The unique gene list corresponding to aDNAm's was created and used for IPA. The imported genes were classified by IPA (Ingenuity ® Systems, www.ingenuity.com) using the biological functions to be presented as being used for annotation, ranked by score. The score [score = -log 10 (pvalue)] computed by IPA is a measure of the probability of finding identified genes in a set of a list of biological functions stored in the IPA knowledge base (IPKB).