Targeted exome sequencing of Korean triple-negative breast cancer reveals homozygous deletions associated with poor prognosis of adjuvant chemotherapy-treated patients

Triple-negative breast cancer is characterized by the absence of estrogen and progesterone receptors and human epidermal growth factor receptor 2, and is associated with a poorer outcome than other subtypes of breast cancer. Moreover, there are no accurate prognostic genes or effective therapeutic targets, thereby necessitating continued intensive investigation. This study analyzed the genetic mutation landscape in 70 patients with triple-negative breast cancer by targeted exome sequencing of tumor and matched normal samples. Sequencing showed that more than 50% of these patients had deleterious mutations and homozygous deletions of DNA repair genes, such as ATM, BRCA1, BRCA2, WRN, and CHEK2. These findings suggested that a large number of patients with triple-negative breast cancer have impaired DNA repair function and that therefore a poly ADP-ribose polymerase inhibitor may be an effective drug in the treatment of this disease. Notably, homozygous deletion of three genes, EPHA5, MITF, and ACSL3, was significantly associated with an increased risk of recurrence or distant metastasis in adjuvant chemotherapy-treated patients.


INTRODUCTION
Breast cancer is one of the most prevalent cancers worldwide, with over 1,300,000 newly diagnosed patients and 450,000 deaths each year [1]. Breast cancer is a highly heterogeneous disease with diverse pathophysiological and clinical features that can be caused by distinct genetic, epigenetic, and transcriptomic changes. Based on expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2), breast cancer can be categorized into three subtypes: hormone receptor-positive (ER+ or PR+), HER2positive (ER-, PR-, and HER2+), and triple-negative breast cancer (TNBC) (ER-, PR-, and HER-) types [2,3]. TNBC accounts for approximately 10-20% of invasive breast cancers, and the mortality rate of women with TNBC during the 5 years after diagnosis is high [4,5]. Based on ethnicity, breast cancer incidence rates are higher in Caucasian than in African-American, Hispanic, and Asian women. However, aggressive and advanced-stage breast cancer diagnosed at an early age, in particular TNBC, is more frequent in African-American than in Caucasian women [6].
Although agents targeting hormone receptors and HER2 can be used to treat hormone receptor-positive and HER2-positive types of breast cancer, these agents are ineffective against TNBC because of the absence of the targeted receptors (ER, PR, and HER2) [7,8]. Despite several pioneering genome-wide studies that aimed to identify diagnostic and therapeutic biomarkers in TNBC, there has been no comprehensive effort to date that has attempted to identify TNBC biomarkers in the Korean population [9][10][11]. Because there is no conventional therapy targeting TNBC, studies that intensively evaluate genomic alterations are essential to identify novel prognostic biomarkers and/or therapeutic targets for TNBC.
Owing to its greater cost-effectiveness than whole genome or whole exome sequencing, targeted exome sequencing has recently revolutionized human clinical cancer diagnosis, facilitated studies towards understanding cancer-causing mechanisms, and enabled the identification of therapeutic targets [12][13][14][15]. In particular, the HaloPlex target enrichment system for targeted exome sequencing has shown high efficiency in capturing targeted regions on the exome and high library complexity [16].
This study was designed to characterize the somatic mutation profiles of 368 cancer-associated genes in 70 Korean patients with TNBC and to identify novel somatic mutations and potential prognostic genes. We found that more than half of the patients in our cohort had deleterious mutations in several DNA repair-related genes, suggesting that poly ADP-ribose polymerase (PARP) inhibitors may be effective in treating patients with TNBC therapy. Moreover, we identified three candidate prognostic genes whose homozygous deletions were significantly associated with the prognosis of patients who had been treated with adjuvant chemotherapy.

Analysis of somatic single nucleotide variants and small insertions and deletions
Clinicopathological characteristics of the 70 patients with TNBC included in this study are described in Table 1. Of these patients, 15 (21%) experienced tumor recurrence, including eight with distant metastases. The mean follow-up period was 4.88 years. We determined whether clinicopathological factors, such as age, primary tumor stage (pT), and lymph node metastasis, were associated with patient outcomes, including diseasefree survival (DFS) and distant metastasis-free survival (DMFS), finding no evidence of association between these factors and either DFS or DMFS (Supplementary Table 1).
The average target coverage depths were 130.36× for tumor samples and 139.71× for normal samples, and target regions with read coverage depths >2× and >100× accounted for over 93% and over 40%, respectively, of the entire target region (Supplementary Table 2). Analysis showed 292 somatic single nucleotide variants (SNVs) and 30 somatic small insertions and deletions (INDELs) in 157 genes. Of these variants, 238 mutations were novel SNVs or INDELs that had not been reported previously in either the COSMIC or dbSNP database ( Figure 1A, Supplementary Table 3). Supplementary Table 4 lists all  somatic SNVs and INDELs, whereas Tables 2 and 3  Because deleterious germline mutations in BRCA1 and BRCA2 have been significantly associated with breast cancer [17,18], we assessed whether germline mutations in these two genes were present in our cohort. We found two deleterious germline mutations in BRCA1 in three patients, and one stop-gain germline mutation in BRCA2 in one patient (Supplementary Table 5). BRCA1 c.922_924delAGCinsT (p.Ser308fs), found in two patients, and BRCA2 c.8363G>A (p.W2788X), found in another patient, are mutations shown to have highly detrimental clinical impact [19][20][21], whereas BRCA1 c.279delA (p.Phe93fs), found in a fourth patient, was identified as a novel germline frameshift mutation.

Analysis of copy number variations
Copy number variation (CNV) analysis identified an average of 37.77 (range, 0-214) amplified genes and 26.86 (range, 1-170) homozygously deleted genes per patient ( Figure 1B). Supplementary Table 6 lists all genes with CNV amplifications and homozygous deletions, whereas Table 2 lists the most frequently altered of these genes. Homozygous deletion of TP53, a tumor suppressor gene with the highest mutation frequency in this study, was observed in ten patients with TNBC, indicating that 55 (79%) of the 70 patients in our study cohort had either mutated or deleted TP53. Frequent amplification of NDRG1 and deletion of WRN and ATM were validated by qPCR (Supplementary Figure 2).
In addition to the deleterious germline mutations described previously, somatic homozygous deletions of BRCA1 and BRCA2 were observed in the genomes of 12 and 10 patients, respectively ( Table 2, Supplementary  Table 5). Some of these homozygous deletions were limited to a single exon, whereas other encompassed several exons (Supplementary Figure 3).

Association of homozygous deletions with clinical outcomes
Using a Cox proportional-hazards regression model, we determined whether these somatic mutations were associated with the prognosis of the 67 patients who had been treated with adjuvant chemotherapy. We found that homozygous deletion of the three genes identified in our study was associated with an increased risk of recurrence or distant metastasis in patients with TNBC (Supplementary Table 7). Figure 3A shows the hazard ratios (HRs) and 95% confidence intervals (CIs) of each gene for DFS and DMFS. In addition, Kaplan-Meier analysis was performed to confirm the association between homozygous deletions of these three genes and poor prognosis. These analyses showed that homozygous deletions of EPHA5 (P < 0.001 for DFS; P = 0.003 for DMFS), MITF (P < 0.001 for DFS; P < 0.001 for DMFS), and ACSL3 (P < 0.001 for DFS; P = 0.001 for DMFS) were significantly associated with a negative prognosis in patients with TNBC ( Figure 3B).

The cancer genome atlas data analysis
Associations between levels of mRNA expression and copy number alteration of genes identified as frequently amplified in our 70 Korean TNBC samples were analyzed using CNV and mRNA expression data from The Cancer Genome Atlas (TCGA) breast cancer database. We found that copy number gain or amplification of six genes (NDRG1, UBR5, MYC, EXT1, NBN, and COX6C) was positively correlated with high mRNA expression ( Figure 4A). Kaplan-Meier analysis showed that the overall survival rates were significantly lower in breast cancer patients with than without amplification of one of these genes (log rank test; NDRG1, P = 0.0554; UBR5, P = 0.0122; MYC, P = 0.0094; EXT1, P = 0.0103; NBN, P = 0.0030; and COX6C, P = 0.0073) ( Figure 4B).
Next, we used STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) v.10 [22] to perform a network interaction analysis of proteins encoded by these genes with the most frequent genetic alterations (i.e., somatic non-synonymous mutations and CNVs) in our cohort of 70 Korean patients with TNBC. We found that DNA damage response genes, such as TP53 and WRN, were frequently mutated in our TNBC cohort ( Figure 4C). Notably, mutual exclusivity analysis using 500 clinical breast cancer samples from the TCGA database indicated a high likelihood of co-occurrence of alterations in the TP53, MYC, WRN, NDRG1, NOTCH3, UBR5, and BRD4 genes, all of which are involved in the above-mentioned interaction network. This finding supports the reliability and robustness of our analysis (Supplementary Figure 4).

DISCUSSION
Despite recent attempts to understand the clonal evolution of TNBC and to determine a detailed mutational spectrum in these tumors, little is known about the unique mutational profiles and therapeutic targets in TNBC patients from diverse ethnic populations [9]. This study revealed a comprehensive mutational spectrum specific to Korean patients with TNBC, as well as identifying novel, potentially prognostic genes. Compared with a cohort of Western European-North American patients with TNBC, our cohort of Korean patients possessed unique genetic features, which also included commonly mutated genes, such as TP53 and PIK3CA (Supplementary Table 8). Several recent studies have reported that mutations in NOTCH3 and NOTCH4 may cause breast cancer [23][24][25]. Similarly, we discovered novel recurrent SNVs in the N-terminal cytoplasmic domain of NOTCH3, including c.6841C>G (p.A2281P) in seven patients, and in the EGFlike domain of NOTCH4, including c.625T>G (p.T209P) in nine patients, c.118T>G (p.T40P) in five patients, and c.3064C>G (p.A1022P) in four patients. These mutations may have an important role in inducing oncogenic activity by inhibiting the binding of their ligands to NOTCH3 and NOTCH4. In addition, three patients in our cohort had the PIK3CA c.3140A>G (p.H1047R) mutation, which was recently reported as being crucial in inducing multipotency and heterogeneity of breast cancer [26,27]. These findings reinforce the likelihood that the other novel recurrent mutations identified in our cohort warrant further investigation as molecular pathogenic biomarkers.
We also found that 26 (37%) of the 70 patients in our cohort had mutations in BRCA1 and BRCA2, including 12 and 10 patients with homozygous deletions of BRCA1 and BRCA2, respectively, two and four with deleterious somatic mutations, respectively, and three and one with deleterious germline mutations, respectively. In addition, homozygous deletions of DNA damage repair genes, such as ATM, WRN, and CHEK2, were present in more than half of our study cohort, suggesting that a large proportion of Korean patients with TNBC have an impaired DNA repair system, such as a homologous recombination deficiency. These findings suggest that a PARP inhibitor may have potential for treatment of TNBCs.   Adjuvant chemotherapy has been reported to dramatically increase DFS and overall survival of patients with basal-like breast cancer (BLBC) [28]. Of the 70 patients in our TNBC cohort, 67 had been treated with adjuvant chemotherapy. Nevertheless, we found that three homozygously deleted genes were significantly associated with poor prognosis in patients who had received adjuvant chemotherapy. These findings suggest that homozygous deletion of these genes may contribute to resistance to adjuvant chemotherapy. Moreover, our results may provide clues about the mechanism of TNBC resistance to chemotherapy.

Ethics statement
This study was approved by the Institutional Review Board of the Samsung Medical Center, Seoul (South Korea), and performed in accordance with the principles of the Declaration of Helsinki. Because the study was retrospective in nature, the requirement for informed consent was waived. Patient information was anonymized and de-identified prior to analysis.

Patients and tissue samples
Seventy TNBC and matched normal tissues were collected from the pathology department at Samsung Medical Center, Seoul, South Korea. Immediately upon removal, the specimens had been frozen immediately in liquid nitrogen or fixed in formalin, with the latter used to produce formalin-fixed and paraffin-embedded (FFPE) blocks. Sections of each FFPE sample were stained with hematoxylin and eosin for sample validation by a pathologist (YLC). The expression of ER, PR, and HER2 was assessed by the same pathologist (YLC), as previously described [28].

Selection of target genes
Of the 368 selected target genes, 234 had previously been reported to be cancer-associated genes frequently mutated in solid tumors and sarcomas, but not in hematological cancers, and listed in the Cancer Gene Census of the Wellcome Trust Sanger Institute (http:// cancer.sanger.ac.uk/census/), and 134 were genes encoding cell growth-and kinase-related factors and transcription factors. These 368 genes included 5,700 regions encoding exons. The total size of the target region was 961,497 bp (Supplementary Table 9).

Targeted exome sequencing using HaloPlex target enrichment
Genomic DNA was extracted from frozen samples using DNeasy Blood & Tissue kits (Qiagen, Hilden, Germany) according to the manufacturer's instructions. DNA of sufficient purity was defined spectrophotometrically using a 260 nm/280 nm ratio between 1.8-2.1 and a 260 nm/230 nm ratio ≥ 1.5. After digestion and denaturation, targeted fragment DNA was hybridized with biotinylated probes designed to guide circularization of the target DNA fragments, with incorporation of sequencing motifs. Targeted fragments bound to biotinylated HaloPlex probes (Agilent Technologies, Santa Clara, CA, USA) were retrieved using magnetic streptavidin beads. Circularized molecules were closed by ligation, which ensured that only perfectly hybridized fragments were circularized and that only circular DNA targets were amplified by PCR, thus providing enriched and bar-coded amplified products for sequencing with a HiSeq 2000 (Illumina, San Diego, CA, USA).

Bioinformatic analysis of SNVs and INDELs
Paired-end sequence raw reads were trimmed and filtered to produce clean reads with good base quality (Phred Q score > 20). Burrows-Wheeler Alignment (BWA 0.5.9), the Genome Analysis Toolkit (GATK), and SAMtools were used to align these paired-end sequencing reads with the human reference genome hg19. Identified SNVs and small INDELs were analyzed using the variant databases dbSNP135, dbNSFP COSMIC, and the 1000 Genomes, and several software programs, such as SNPEff, SIFT, PolyPhen2, LRT, PhyloP, Mutation_Taster, Mutation_Assessor, FATHMM, and GERP_NR. Somatic non-synonymous SNVs and INDELs were selected using the following criteria: a >20% read-allele frequency at the position; ≥15 mapped reads at the position; and zero SNV or INDEL allele reads in the targeted sequence of corresponding normal tissue. Variants were confirmed by visualization in the Interactive Genomic Viewer and NextGENe software v2.3.1 (SoftGenetics, State College, PA, USA), as well as by quantitative PCR (qPCR).

Bioinformatic analysis of CNVs
Genomic CNVs were assessed using NextGENe v2.3.1 (SoftGenetics), which compares the median read coverage levels between target genomic regions of cancer and matched normal tissues after global normalization of genome-wide read coverage levels. CNVs were calculated as the log2 ratio of read coverage in cancer and matched normal tissues. CNVs with a log2 ratio >1.5 were considered amplified, whereas CNVs with a log2 ratio <-1.2 were considered homozygous loss-of-function mutations.

Survival analysis
Survival was analyzed by the Cox proportionalhazards regression method [29] using clinical information and somatic mutation data of patients who had been treated with adjuvant chemotherapy. After determining the HR and p-value of each mutation, Benjamini-Hochberg multiple testing correction was applied to address the risk of false positives because of multiple analysis (false discovery rate = 0.05) [30].

Protein-protein interaction networks and gene expression analysis
STRING, KEGG (Kyoto Encyclopedia of Genes and Genomes), and DAVID (Database for Annotation, Visualization, and Integrated Discovery) were used to analyze oncogenic and tumor-suppression pathways in TNBC samples. In addition, CNV information, RNA expression, and mutation data of our TNBC samples were compared with those of TNBC samples from the TCGA database.

Validation of genomic alterations
Two SNV regions in TP53 were selected for experimental validation of somatic mutations. Target regions in genomic DNA from tumor and matched normal tissues of patients TNBC030 and TNBC045 were amplified by PCR, and products were either sequenced directly or cloned into the T vector for Sanger sequencing. Five clones from each sample were selected. Frequent CNVs, such as amplification of NDRG1 and deletion of ATM, BRCA1, BRCA2, and WRN, were selected for validation by qPCR. Genomic DNA from tumor and matched normal tissues of patients TNBC038 and TNBC048 for ATM; patients TNBC026, TNBC031, TNBC038, and TNBC066 for BRCA1; patients TNBC004, TNBC011, TNBC014, and TNBC068 for BRCA2; and patient TNBC030 for WRN was analyzed by qPCR. The relative expression of these genes in corresponding samples was calculated according to the ddCt method, using TERT as a reference gene [31,32]. Details regarding mutated and altered genomic regions, and the primers used in the validation experiments are provided in Supplementary Table 10.

Author contributions
YKS and YLC conceived, designed, and supervised the study. HMJ drafted the manuscript and experimentally validated the identified genetic alterations. HMJ, RNK, YKS, and YLC performed targeted exome sequencing data analyses. RNK, EO, JH, and JWN assisted with bioinformatics analysis and statistical analysis. YLC and JSC interpreted the results of hematoxylin and eosin staining and of immunohistochemical staining. HL and BHN helped with statistical analysis. MJK, SKL, SP, SJN, GYG, and DHC helped draft the manuscript. All authors read and approved the final manuscript.