Whole-exome sequencing identified mutational profiles of high-grade colon adenomas

Although gene-to-gene analyses identified genetic alterations such as APC, KRAS and TP53 mutations in colon adenomas, it is largely unknown whether there are any others in them. Mutational profiling of high-grade colon adenoma (HGCA) that just precedes colon carcinoma might identify not only novel adenoma-specific genes but also critical genes for its progression to carcinoma. For this, we performed whole-exome sequencing (WES) of 12 HGCAs and identified 11 non-hypermutated and one hypermutated (POLE-mutated) cases. We identified 22 genes including APC, KRAS, TP53, GNAS, NRAS, SMAD4, ARID2, and PIK3CA with non-silent mutations in the cancer Census Genes. Bi-allelic and mono-allelic APC alterations were found in nine and one HGCAs, respectively, while the other two harbored wild-type APC. Five HGCAs harbored either mono-allelic (four HGCAs) or bi-allelic (one HGCA) SMAD4 mutation or 18q loss that had been known as early carcinoma-specific changes. We identified MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4 mutations for the first time in colon adenomas. Our WES data is largely matched with the earlier ‘adenoma-carcinoma model’ (APC, KRAS, NRAS and GNAS mutations), but there are newly identified SMAD4, MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4 mutations in this study. Our findings provide resource for understanding colon premalignant lesions and for identifying genomic clues for differential diagnosis and therapy options for colon adenomas and carcinomas.


INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer in males, and the second most common in females worldwide [1]. Although there have been advances in the treatment of CRC and the mortality rate has declined in the past few decades, it remains the fourth most common cause of cancer-related deaths worldwide [2]. The Research Paper multistep model in the adenoma-to-carcinoma sequence is well recognized in CRC development [3]. According to this model, most CRCs are considered to arise from pre-existing adenomas. Microscopically, conventional colorectal adenomas are classified on the basis of their architectural phenotype as either tubular or villous or tubulovillous type. According to the cellular atypia, they are classified as either low-or high-grade colon adenoma (HGCA) [4]. APC, KRAS and β-catenin mutations are considered key events in adenoma development while PIK3CA and TP53 mutations are considered key events in the progression to invasive CRC [5][6][7]. SMAD4 mutation is rarely detected in colon adenomas but occurs in intramucosal carcinoma and more commonly in invasive carcinomas with metastases [8,9].
All of these mutations were first identified by conventional gene-to-gene analyses [10,11]. With advances in sequencing technology (next-generation sequencing, NGS), a comprehensive molecular characterization of CRC has been studied using wholeexome (WES) and/or whole-genome sequencing (WGS) that allows for the interrogation of thousands of variants from multiple genes within a given tumor sample at the same time [12]. Although mutational landscapes of CRC have been reported several times, the genomic data of colon adenomas, especially those generated by highthroughput genome profiling technologies such as WES, is scarce. An earlier study analyzed a case of synchronous colon adenoma and CRC by WES [13]. Another study analyzed 82 nucleotide positions of 14 genes in colon adenomas with various types and grades by targeted sequencing [9]. Due to the small case number or narrow extent of genes analyzed in these reports, they did not reveal a general mutational landscape of colon adenomas. In this study, by using NGS-based WES for 12 HGCAs, we attempted to identify the mutational profiles of HGCA and found several mutations that had not been identified in colon adenomas in previous studies.  Table 1). One HGCA (HGCA12) was found to be a hypermutated adenoma that harbored an exceptionally high incidence of somatic mutations (883 non-synonymous single nucleotide variants (SNVs) and 12 indels). It had a somatic mutation (p.E491K) in polymerase ε (POLE) gene that encodes the catalytic and proofreading subunits of the DNA polymerase ε [12]. Excluding this hypermutated HGCA, a total of 772 somatic non-silent mutations (752 non-synonymous SNVs and 20 indels) (average: 70.2, range: 36-102) were identified from the other 11 HGCAs ( Figure 1A, Supplementary Table 2 and Supplementary  Table 3). The C: G->T: A transitions were the most significant changes in the HGCA samples ( Figure 1B and Supplementary Table 2), which was consistent with a previous report that analyzed a colon adenoma [13]. Based on the notion that DNA methylation frequently occurs within the context of CpG sites, the enrichment of C: G->T: A transitions in 5′-CpG-3′ dinucleotides might be related to extensive DNA methylation at 5′-CpG-3′ sites in CRC tumorigenesis [14,15].

Whole-exome sequencing and somatic variants
To address whether the mutations found in our study were causally implicated in tumor development, we looked up the cancer Gene Census [16] and the Cancer Driver Database [17] with the CHASM [18] analysis. In this study, mutations that were significantly predicted as drivers (FDR < 0.2) in the CHASM analysis and overlapped the driver databases (the cancer Gene Census or the Cancer Driver Database) were considered potential driver mutations. Under this criterion, 17 genes were considered the candidate driver genes (APC, KRAS, SMAD4, GNAS, RET, NRAS, MTOR, ACVR1B, GNAQ, ATM, TP53, PIK3CA, ERBB2, TRRAP, MAP2K4, MAP3K4 and CNOT1). In addition, six genes with truncating mutations overlapped the driver databases were also included (APC, ARID2, ERBB4, TCF7L2, AMER1 and EP300). Overall, we defined 22 genes as candidate drivers (Table 1 and Figure 2). Mutations identified with WES were validated by either digital-polymerase chain reaction (PCR) or Sanger sequencing (Supplementary  Table 3). As these genes besides APC, KRAS and SMAD4 are listed in neither of the cancer Census Genes nor the COSMIC CRC genes, they might be passenger mutations or HGCAspecific mutations that might not play an important role in HGCA progression to CRC.
Of the ten HGCA cases with APC mutations, eight harbored bi-allelic APC mutations, while two cases (HGCA3 and HGCA8) harbored mono-allelic APC mutations ( Figure 2). All of the KRAS mutations were detected either in the amino acid residue p.G12 or p.G13 that is a mutational hotspot [19,20]. Of the SMAD4 mutations (p.R361H, p.G386S and p.D404E), p.R361H and p.G386S have been found in many cancers, including colon, stomach and esophageal cancers [21], while p.D404E is a novel mutation that has not been reported. Missense mutations of TP53 (p.R175H and p.R249S) in this study have been reported in many cancers including colon adenoma [22] and p.R175H is one of the mutational hotspots in TP53 (the COSMIC database). Two HGCAs harbored nonsense mutations (p.R358X and p.Q414X) of AMER1, which encodes an X-linked negative regulator of WNT signaling [23]. Mutations of AMER1, TCF7L and MTOR have been reported in CRC [12,24]. One case (HGCA4) harbored an NRAS hotspot mutation p.Q61H, which has been reported in many cancers, including thyroid, lung, skin and CRC [25]. PIK3CA mutation identified in APC wild-type case (HGCA10) was the hotspot mutation p.H1047R that is common in CRC as well [12]. ATM p.R337H mutation was the hotspot site and common in colon and liver cancers [26]. GNAQ p.T96S mutation has been reported in skin and prostate cancers [27,28]. GNAS mutation identified in this study was the hotspot mutation p.R201C that is common in low-grade appendiceal mucinous neoplasm and pancreatic intraductal papillary mucinous neoplasm [29,30]. Nonsense mutation of ARID2 p.R1273X in our study has been reported in esophageal and head/neck cancers [31,32]. RET p.G423R mutation has been reported in hairy cell leukemias [33]. ERBB2 p.V842I and EP300 p.R1312X mutations have been reported in colon and ovary cancers, respectively [12,34].

Copy number alterations
We analyzed somatic copy number alterations based on the read depth difference in the exome sequencing data between HGCA and matched normal tissues. Of the tumor-related gene mutations detected, 13 genes were accompanied by copy number alterations as well ( Figure  2 and Supplementary Table 4). Copy number gains were found in six genes (KRAS, AMER1, GNAS, ARID2, NRAS and GNAQ) and copy number losses in eight genes (APC, SMAD4, TCF7L2, TP53, MTOR, NRAS, MAP2K4 and EP300). Of note, the case with mono-allelic APC mutation (HGCA3) exhibited copy loss as well, suggesting a bi-allelic alteration (mutation + copy loss). Likewise, two cases with mono-allelic TP53 mutation (HGCA9 and HGCA11) also exhibited copy loss in the locus. As for KRAS, in total, eight of twelve HGCAs (67%) exhibited mutation and/or copy gain. Similarly, SMAD4 gene showed mutation or copy loss in five of twelve HGCAs (42%).  Twenty-two candidate driver genes are mutated in the adenomas. Block colors represent the functional categories of mutation and copy number alteration. Asterisks represent the somatic mutations that overlap the COSMIC database at variant level. www.impactjournals.com/oncotarget

Pathway analysis of mutated genes in the HGCA
The mutated genes identified in HGCAs were annotated through the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using DAVID tools [35] and found that mutated genes in the HGCA were significantly associated with tumorigenesis-related gene functions (Supplementary Table 5).

DISCUSSION
Comprehensive analysis of genomic profiles is crucial for understanding tumorigenic mechanism in cancer development. Compared to CRC, mutational profiles of HGCA, a premalignant lesion that precedes invasive CRC, are not well studied. The aim of this study was twofold. First, we attempted to disclose the genomic profiles of HGCAs to find the mutational abundance compared to that of CRC especially with respect to the driver mutations. The next aim was to find cancerrelated gene mutations that emerge from the HGCA stage rather than invasive CRC stage. Consistent with the previous report, total numbers and spectrum of somatic mutations identified in the HGCAs were not significantly different from those in CRCs [12].  [20,36]. As for TP53 and PIK3CA, their mutations are known to occur mainly in the late stage of the CRC model (invasive carcinoma) [20]. However, an earlier study [36] reported that TP53 mutations were fairly common in colon adenomas albeit less than CRC. PIK3CA mutations are rare in in colon adenoma, but often occur in advanced adenomas greater than 5 cm in diameter [37]. In our study, the HGCA10 with PIK3CA mutation was found in a large adenoma (3.7 cm in diameter) with a synchronous carcinoma component, suggesting that the HGCA case may be a full-grown late-stage one. SMAD4 mutation that had seldom been reported in HGCAs was identified in our study. Also, previously unreported mutations in colon adenomas (MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4) were newly discovered in this study.
Earlier studies showed that SMAD4 mutation followed APC mutation and precedes TP53 mutation in CRC development [20,38]. Our data on SMAD4 mutation in HGCAs is in agreement with the gene sequence (APC-SMAD4-TP53 mutations). However, it is not in agreement with the previous notion that SMAD4 mutations rarely occurred either in adenoma or intramucosal carcinoma, but was common in advanced CRCs [8,38]. SMAD4 somatic mutations have been reported in 0.8% of colon adenomas (the COSMIC database), in 0% of low-and high-grade adenomas and 0% of adenomas and intramucosal carcinomas [8,9], indicating SMAD4 mutation is a rare event in early CRC tumorigenesis. The patterns of SMAD4 alteration in our HGCAs are somewhat different from those described in advanced CRCs where SMAD4 mutations were usually accompanied by allelic loss of 18q where SMAD gene resides [8]. Of the five SMAD4 alterations four (HGCA1, HGCA5, HGCA6 and HGCA9) were either somatic mutation or 18q loss, indicating they were mono-allelic, while one (HGCA10) harbored both 18q loss and somatic mutation. Size of HGAC10 (3 cm in diameter) and with a synchronous CRC, suggesting that it may be a late-stage adenoma. Together, our data suggest that mono-allelic inactivation of SMAD4 either by mutation or copy loss may occur in HGCAs and that it may require additional hit during or after the progression to invasive or metastatic CRC.
The mutations (MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4) newly detected in HGCAs in the present study have been reported in CRCs at low incidences (0.9%-8.9%) as well as other cancers [31][32][33]. The COSMIC database, however, shows that there is no mutation for these genes in 29 colon adenomas. Together with this, our data indicate that these gene mutations albeit low incidences may occur in adenomas, at least in HGCAs, and might possibly contribute to early CRC development.
Somatic mutations of POLE in the exonuclease domain are found in CRC and endometrial cancers that show association with hypermutability [39]. POLE somatic mutations in these cancers were recurrent in seven amino acids (p.P286R/H/L, p.S297F/Y, p.V411L, p.P436R, p.M444K, p.A456P and p.S459F), but the POLE p.E491K mutation in our result has not been reported. Association of germline POLE mutation with familial adenoma development suggests that POLE mutation might occur as an early event in sporadic tumors [40], but there has been no such data that support the hypothesis. In this study, we show data supporting that loss of proofreading activity by POLE mutation could play a role in early CRC development.
In summary, our study for the first time attempted to disclose mutational profiles of HGCAs by WES that had not been analyzed before except a study for one case. Our data is largely in agreement with the earlier 'colorectal adenoma-carcinoma model' that was made by gene-to-gene approaches [10,11]. Our HGCA mutation list includes not only those in this classical model (APC and KRAS) but also those already known as adenoma genes (NRAS and GNAS). In addition, we newly identified SMAD4, MTOR, ACVR1B, GNAQ, ATM, CNOT1, EP300, ARID2, RET and MAP2K4 in HGCAs. Our data are based www.impactjournals.com/oncotarget on the analysis of twelve HGCA genomes. The small size of the study set is due to the difficulties in obtaining histologically defined fresh HGCA tissues in a single institution since in most of the cases they are used up for diagnostic purpose. Further investigation in a larger cohort of HGCA may reveal valuable information to confirm our data, e.g., SMAD4 mutation, 18q loss and other newly discovered mutations. Also, such an approach would reveal additional information, e.g., discovery of potential additional driver mutations in HGCAs. In addition, metaanalysis of multiple cases with diverse ethnic backgrounds may be required to investigate whether such mutational contexts are population-specific. Our findings may provide a useful resource for understanding this premalignant disease and identifying genomic clues for differential diagnosis and therapy options for colon adenoma and carcinoma.

Tumor specimen and DNA extraction
Fresh frozen normal and HGCA tissues from 12 Korean patients were obtained from the tissue banks of Seoul Saint Mary Hospital (Seoul, Korea), Guro Hospital of Korea University (Seoul, Korea) and Busan National University Hospital (Busan, Korea). All of the cases were sporadic, without any family history of CRC. Clinicopathologic features of the 12 HGCA patients are summarized in Table 2. We defined HGCA according to the World Health Organization [41] where is defined by architectural complexity and cytologic features including extent of nuclear stratification and severity of abnormal nuclear morphology. All of the samples from tumor and normal areas were frozen, cut, and stained with hematoxylin and eosin. A pathologist identified that HGCA lesions were in the samples (Supplementary Figure  2). All cases were tubulovillous adenomas (Table 2). We observed high grade dysplasia in both tubular and villous components in all cases. HGCA areas with rich tumor cell population (at least 70%) were selected and sliced from the frozen tissues with clean blades, and were subsequently used in the study. We also examined the histology by FFPE that showed definite HGCA. After the slicing, we examined again the histology of remnant frozen tissues under microscope and found no changes in the histologic diagnoses of HGCA. We used normal tissues as controls belonged to the same patients.

Whole-exome sequencing and copy number inference
WES was performed for genomic DNA from the HGCA and matched normal samples using Agilent SureSelect Human All Exome 50Mb kit (Agilent Technologies) and Illumina HiSeq2000 platform. The collection and processing of the sequencing data was performed as previously described [42]. The preparation of genomic DNA libraries and 101 bp paired-end sequencing reads was performed according to the manufacturer's instructions. Firstly, we used Burrows-Wheeler Alignment tool (BWA) to align the pairedend sequences onto the UCSC hg19 human reference genomes [43]. Local alignment and score recalibration of the sequencing data were performed, using the Genome Analysis ToolKit [44]. Additional downstream analyses, following alignment, were performed using Picard (http://picard.sourceforge.net) and Samtools [45]. We used Mutect and SomaticIndelDetector to call somatic single nucleotide variants (SNVs) and small insertions/ deletions (indels) by comparing the sequencing data from the adenoma samples with those from the matched normal tissue samples [44,46]. ANNOVAR package was used for the analysis of mutations on coding sequences, and for the functional annotation of somatic variants [47]. For the identification of copy number alteration, we used VarScan 2 to obtain the read depth differences between the tumors and matched normal exome sequencing data [48]. The GC-corrected read depth was log 2 -transformed and segmented using circular binary segmentation algorithm [49].

Validation of the whole-exome sequencing
We validated mutations of 12 genes either by digital-PCR or Sanger sequencing as described previously [50]. Digital-PCR was performed using the TaqMan Genotyping assay and the QuantStudio 3D digital PCR system (Life Technologies) according to the manufacturer's protocol.