POLE and POLD1 screening in 155 patients with multiple polyps and early-onset colorectal cancer

Germline mutations in POLE and POLD1 have been shown to cause predisposition to colorectal multiple polyposis and a wide range of neoplasms, early-onset colorectal cancer being the most prevalent. In order to find additional mutations affecting the proofreading activity of these polymerases, we sequenced their exonuclease domains in 155 patients with multiple polyps or an early-onset colorectal cancer phenotype without alterations in the known hereditary colorectal cancer genes. Interestingly, none of the previously reported mutations in POLE and POLD1 were found. Among the genetic variants detected, only two stood out as putatively pathogenic, both in the POLE gene: c.1359 + 46del71 and c.1420G > A (p.Val474Ile). The first variant, detected in two families, was not shown to alter RNA splicing. In contrast, c.1420G > A (p.Val474Ile) was detected in one early-onset colorectal cancer patient and is located right next to the exonuclease domain. The pathogenicity of this change was suggested by its rarity and bioinformatics predictions, and it was further supported by functional assays in Schizosaccharomyces pombe. This is the first study to functionally analyze a POLE genetic variant outside the exonuclease domain, and it widens the spectrum of genetic changes in this DNA polymerase that could lead to colorectal cancer predisposition.


INTRODUCTION
Colorectal cancer (CRC) is one of the most common tumors and an important cause of mortality in the developed world [1]. It is caused by environmental and genetic factors, with 35% of the variation in CRC susceptibility probably explained by inherited causes [2]. The best known examples of inherited CRC predisposition are Mendelian forms such as Lynch syndrome and familial adenomatous polyposis. They account for ~5% of all cases, and are due to germline mutations in APC, MUTYH and the mismatch repair (MMR) genes, which confer a high risk of developing this disease [3]. However, there is still a considerable number of cases with strong familial CRC aggregation and early disease onset with an unknown inherited genetic basis. An example could correspond to familial CRC type X, where Amsterdam clinical criteria used for Lynch Syndrome identification are fulfilled but there are no alterations in the MMR system [4].
In order to identify new inherited risk factors, whole-genome sequencing combined with linkage disequilibrium studies was recently conducted in families affected by multiple colorectal adenomas and early-onset CRC [5]. By doing so, the p.Leu424Val and p.Ser478Asn mutations in the POLE and POLD1 DNA polymerases, respectively, were established as a new high-penetrance cause of germline CRC predisposition with an autosomal dominant pattern of inheritance. POLD1 p.Ser478Asn was also found to be involved in germline predisposition to endometrial cancer [5,6]. These mutations are located in the exonuclease domain of the protein, which provides proofreading activity by removing misincorporated nucleotides during DNA replication [7]. Therefore, mutations in this protein domain are expected to disrupt the fidelity of DNA replication, leading to a mutator phenotype and, ultimately, to tumorigenesis. The pathogenicity of these first identified variants was confirmed by functional studies in the orthologous genes in yeast. Regarding somatic studies, tumors from POLE and POLD1 mutation carriers showed a hypermutated phenotype with an excess of G > T/C > A transversions and C > T/G > A transitions, especially in the contexts TCT > TAT and TCG > TTG [5,8-10].
Since the discovery of these two new hereditary CRC genes, additional efforts have been made to characterize the mutational spectrum and the clinical features associated with this new syndrome, which was accordingly named polymerase proofreading-associated polyposis [6]. To date, POLE mutations have been found to be the germline predisposition factor in families with multiple adenomas and early-onset CRC [11-16], as well as in other neoplasms such as endometrial, ovarian, brain, pancreatic and small intestine cancers and cutaneous melanoma [15,17-19]. On the other hand, POLD1 mutations have been found to predispose carriers to multiple adenomas, CRC, endometrial and breast cancer [11,14,15]. Recently, some CRC patients with a deficient MMR system caused by tumor biallelic inactivation were reported to also carry germline mutations in POLE, these MMR somatic mutations being the consequence of the POLE hypermutator tumor phenotype [12,20].
Regarding the recurrent mutations reported so far, POLE p.L424V has been detected in 24 independent families [5,11-15,17], whereas POLD1 p.S478N has been found in 4 independent families [5,14]. Additionally, new rare variants located in the active exonuclease domain of these two proteins have also been reported. For some of them, alteration of the proofreading activity was confirmed as evidence of pathogenicity by functional assays in the orthologous polymerases of yeast, T4 bacteriophage or E. coli. A functional validation of the exonuclease activity was previously available for POLE p.D368V and p.Y458F [14,18], and for POLD1 p.D316H, p.D316G, p.P327L and p.L474P [5,11,15], or was specifically generated for POLE p.W347C [19].
Finally, the phenotype associated with this new hereditary CRC syndrome is not yet well defined, and a better definition of its clinical characteristics will most likely help to detect potential mutation carriers in the general population. Accordingly, the aim of our study was to screen the exonuclease domain of POLE and POLD1 in 155 patients with multiple polyps and early-onset CRC in order to find mutations affecting the replication fidelity of these proteins and to shed light on this matter. Our final goal was to facilitate genetic counseling so that preventive strategies can be correctly implemented in those families.

Patient characteristics
Three subgroups of patients were studied: those presenting multiple polyps, early-onset CRC, or MMR-defective CRC without germline alterations in the known hereditary CRC genes. Patients with multiple polyps presented 10-100 polyps, the main precursor lesion being adenomatous, serrated or a combination of adenomatous and serrated polyps, with an age of onset < 70 and no alterations in the APC or MUTYH genes. A further selection criterion for the multiple polyps group was having at least one first-degree relative with multiple polyps, CRC, advanced adenomas or endometrial cancer diagnosed before the age of 70. Early-onset CRC patients were selected with an age of onset < 50, no alterations in MUTYH or the MMR genes/proteins, and tumor microsatellite stability. CRC patients with MMR deficiency presented loss of MMR protein (MLH1, MSH2, MSH6, PMS2) expression by immunohistochemistry, with neither a detected germline mutation in the MMR genes nor somatic MLH1 hypermethylation. Clinical characteristics are summarized in Table 1. Our patient cohort included predominantly cases with a multiple polyps phenotype (83 cases, 53.5%), most of them presenting adenomas (67.5%), with a mean age of polyp onset of around 57 years. The early-onset CRC group corresponded to 59 cases (38.1%), and distal location of the tumor was predominant. No alterations in the APC or MUTYH genes were present in the polyposis group, and alterations in MUTYH or the MMR genes/proteins and tumor microsatellite instability were absent in the early-onset CRC group. The smallest group (13 patients, 8.4%) was the MMR-defective CRC

Variant detection
After screening the exonuclease domain of POLE and POLD1 in our cohort of patients, we detected several genetic variants, which are listed in Table 2. Firstly, it is important to highlight that neither of the two previously described recurrent mutations, POLE p.Leu424Val and POLD1 p.Ser478Asn, was present in our cohort. Among the detected variants, some corresponded to polymorphisms with an allelic frequency > 10%; they were also present in the accessed human genetic variant databases with a similar frequency and were not considered for further analysis. We proceeded to select those genetic variants with an allelic frequency < 10% or not present in any of the checked human genetic variant databases. Among them, some variants were located in intronic sequences or corresponded to coding synonymous variants. Their putative involvement in abnormal splicing was evaluated by using several bioinformatics tools. Results are summarized in Supplementary Table 1. None of the analyzed variants was predicted to alter splicing by 2 or more tools. However, one of the variants corresponded to a deletion of 71 base pairs in intron 13 of the POLE gene. This variant was not present in any of the human genetic variation databases, but it was found in 2 early-onset CRC patients in our cohort, so we decided to characterize it further. To do so, we analyzed its putative effect on splicing in a carrier at the RNA level by using RT-PCR. When compared to a control RNA not carrying this variant, we did not detect any additional amplification band in the deletion carrier, indicating no splicing alteration, which was also confirmed by Sanger sequencing (Supplementary Figure 1).

Novel POLE missense variant
Instead of the previously described variants, we detected an interesting POLE missense variant in an early-onset CRC patient, c.1420G > A (p.Val474Ile) (Table 2). The carrier proband presented a distal MMR-proficient CRC at the age of 47 and 2 non-advanced adenomas. The tumor showed expression of the MMR proteins, was microsatellite stable and did not present any histological feature linked to MMR-defective tumors (tumor-infiltrating lymphocytes, Crohn's-like inflammatory reaction, mucinous or signet ring cell histology, medullary growth pattern). Her father and paternal aunt also had CRC, at the ages of 58 and 47, respectively, and one of her sisters presented non-advanced adenomas (Figure 1). This family fulfilled the Amsterdam criteria and could be considered an example of familial CRC type X. Unfortunately, variant segregation analysis was not feasible since DNA was unobtainable from additional family members.
The POLE c.1420G > A (p.Val474Ile) variant was suggestive of being a new putatively pathogenic variant. It was not present in any of the accessed human genetic variation databases, including a Spanish repository, and the affected amino acid was found to be conserved in 100 vertebrates as well as in D. melanogaster, C. elegans and yeast. Regarding its protein location, Val474 is located in the N-terminal domain but very close to the C-terminal end of the exonuclease domain (only 3 amino acids away). In order to assess the potential functional impact of the POLE Val474Ile variant, we further analyzed its position in the polymerase structure. To do so, we took advantage of the crystal structure of the Saccharomyces cerevisiae DNA polymerase epsilon (PDB ID 4M8O). Indeed, the acceptable sequence identity between the human and yeast proteins (57%) had already been exploited for the construction of a homology model for human POLE (available at the ModBase database, UniProt Q07864). Analysis of this model allowed us to locate the Val474Ile variant in the N-terminal domain but in very close proximity to the exonuclease domain (Figure 2A). Superposition of the model with the template structure along with DNA (Figure 2B) showed that the variant may not affect DNA binding directly. Instead, it could have an indirect effect on the helical packing of the exonuclease and N-terminal domains, which could distort the physiological conformation needed for correct polymerase activity.

Functional studies in yeast for POLE Val474Ile
In order to further test the putative functional effect of this variant, we proceeded to analyze it in yeast. The POLE V474 residue is highly conserved in eukaryotes, including Schizosaccharomyces pombe (S. pombe). We constructed the equivalent substitution in this organism, pol2 p.V475I, and another strain carrying the change equivalent to the previously reported POLE p.L424V mutation (pol2 p.L425V) as a positive control. We compared the ade6-485 allele reversion rates in these mutant strains with those of the wild-type pol2 strain as a negative control (Figure 3). When comparing the Pol2 p.L425V substitution with the wild-type strain, 40 times more revertants were observed (P-value = 0.0108). For Pol2 p.V475I, the mutation rate was 17 times higher than in the wild-type strain (P-value = 0.0040).

Somatic studies for POLE Val474Ile
Paired tumor tissue was only available from the patient carrying the POLE p.Val474Ile variant. When tested on the corresponding tumor, loss of heterozygosity (LOH) could not be detected by Sanger sequencing when comparing with her germline DNA. Whole-exome sequencing (WES) was also performed on the tumor DNA of the patient carrying the POLE p.Val474Ile variant to study the number and spectrum of somatic mutations. A second mutational event in the POLE gene was not found. To compare the number of substitutions and the mutational spectrum with those of our POLE mutant, we used available WES data from four other MMR-proficient CRC tumors that did not present germline or tumor alterations in POLE or POLD1. Regarding tumor WES results in the five samples analyzed, mean coverage was > 90× and > 79% of DNA in each tumor was sequenced with ≥ 30× coverage. First, a tumor profile for each sample was generated by eliminating variants present in a germline exome dataset. Mutation density plots suggested that tumor profiles were correctly generated and that germline variants were mostly eliminated (Supplementary Figure 2). Additionally, WES data was normalized by selecting only those sequenced  Table 2). However, when the mutation spectrum was analyzed, the tumor DNA from the p.Val474Ile germline carrier did not show the increase in G > T/C > A transversions or C > T/G > A transitions reported by previous studies.
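The spectrum comparison described above relies on grouping each substitution with its reverse complement (so that G > T and C > A are counted together) and on distinguishing transitions from transversions. The following is a minimal sketch of that bookkeeping, not the pipeline used in this study; the function names and the toy substitution list are hypothetical and serve only as illustration.

```python
from collections import Counter

# Watson-Crick complements, used to fold strand-equivalent changes together.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def collapse(ref, alt):
    """Report a substitution with a pyrimidine (C or T) reference base,
    so that e.g. G>T and C>A fall into the same class (C>A)."""
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref, alt):
    """Transitions swap purine for purine or pyrimidine for pyrimidine;
    every other substitution is a transversion."""
    return (ref in PURINES) == (alt in PURINES)


def spectrum(substitutions):
    """Fraction of substitutions in each of the six collapsed classes."""
    counts = Counter(collapse(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical toy input: (reference base, alternate base) pairs.
toy = [("G", "T"), ("C", "A"), ("C", "T"), ("G", "A"), ("A", "G"), ("T", "C")]
for cls, frac in sorted(spectrum(toy).items()):
    kind = "transition" if is_transition(cls[0], cls[2]) else "transversion"
    print(f"{cls}: {frac:.2f} ({kind})")
```

In this scheme, G > T/C > A is a transversion class and C > T/G > A is a transition class, which is why the two are reported separately when tumor profiles are compared.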

DISCUSSION
Our molecular screening of the POLE and POLD1 exonuclease domains in a cohort of 155 patients with multiple polyps or early-onset CRC identified a novel POLE mutation in one family. We also detected several intronic variants most likely polymorphic and without pathogenic involvement.
Importantly, we did not find either of the previously described recurrent mutations, POLE p.Leu424Val and POLD1 p.Ser478Asn, in our cohort. As reported by previous studies, the POLE p.Leu424Val mutation frequency in multiple colorectal adenoma or familial CRC cohorts seems to be typically ≤ 0.3% [5,11,12,14,15], whereas the POLD1 p.Ser478Asn mutation seems to be even less frequent (< 0.1%) [5,14]. Our results are consistent with these mutations having a very low frequency. On the other hand, our sample size was probably not large enough to detect any carrier of these mutations (on average, 334 samples would need to be screened to detect one carrier at a 0.3% frequency). Additionally, the frequency of these mutations may be even lower in the Spanish population, since only one carrier of the POLE p.Leu424Val mutation and no POLD1 p.Ser478Asn mutation carriers have been reported so far [11].
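The sample size argument above can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that carriers occur independently at a single frequency of 0.3% across the whole cohort; the function names are hypothetical and the calculation is not taken from the study's methods.

```python
import math


def samples_for_one_expected_carrier(freq):
    """Smallest number of samples for which the expected carrier count is >= 1."""
    return math.ceil(1.0 / freq)


def prob_at_least_one_carrier(freq, n_samples):
    """Chance of seeing one or more carriers among n_samples independent
    individuals (simple binomial model, an assumption of this sketch)."""
    return 1.0 - (1.0 - freq) ** n_samples


pole_freq = 0.003   # upper estimate quoted in the text (<= 0.3%)
cohort_size = 155   # patients screened in this study

print(samples_for_one_expected_carrier(pole_freq))
print(round(prob_at_least_one_carrier(pole_freq, cohort_size), 3))
```

Under these assumptions, a cohort of 155 patients would be expected to contain at least one p.Leu424Val carrier in only about 37% of cases, so the absence of carriers is not surprising.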
Leaving aside these potentially recurrent mutations, other variants have already been reported in POLE and POLD1. Most of them are located in the exonuclease domain and include p.Trp347Cys, p.Asn363Lys, p.Asp386Val, p.Lys425Arg, p.Pro436Ser and p.Tyr458Phe in POLE [13,14,17-20], and p.Asp316His, p.Asp316Gly, p.Pro327Leu, p.Arg409Trp and p.Leu474Pro in POLD1 [5,11,15]. These previously reported variants and our newly identified variant indicate that the entire coding region of POLE and POLD1 should be screened instead of focusing only on a few variants. It should also be noted that the phenotype selection criteria in our screened cohort included multiple polyps with at least one affected first-degree relative, early-onset CRC or MMR-defective CRC without germline alterations in the known genes. Previous studies have used either similar [11] or more permissive selection criteria [12-14] with similar results. It could be argued that including only multiple polyps patients with a family history may have reduced the chances of detecting carriers. However, the phenotypic spectrum of previously reported POLE or POLD1 mutation carriers included multiple polyps and early-onset CRC, as well as family history, which supports the phenotype selection criteria used in our cohort. Regarding our molecular screening approach, we used PCR amplification of genomic DNA and subsequent Sanger sequencing covering the entire exonuclease domain and adjacent intronic sequences. This approach is not restricted to detecting only the POLE p.Leu424Val and POLD1 p.Ser478Asn mutations, as was the case for some previous studies [11,12], and allowed the detection of additional mutations located in this region.
We were able to detect a new mutation in the POLE gene, c.1420G > A (p.Val474Ile). The heterozygous carrier was recruited in the early-onset CRC group and belonged to an Amsterdam I family without alterations in the MMR system. Its rarity, the conservation of the affected amino acid across species and its location in the POLE protein already suggested a plausible pathogenic role. Our results in yeast suggest that the human POLE p.L424V and POLE p.V474I variants cause an increased mutation rate due to faulty proofreading activity of this protein, with POLE p.V474I showing an attenuated phenotype when compared to POLE p.L424V. However, since POLE p.Val474Ile has a smaller effect on proofreading, a contribution of other genetic variants to CRC predisposition in this family cannot be ruled out.
Exome sequencing of the tumor from the p.V474I variant carrier revealed a slightly higher number of substitutions compared with four POLE wild-type tumors, although the increase was not as high as could be expected from previously reported data [5,10]. In this sense, when compared with POLE exonuclease mutant CRC tumor samples present in the COSMIC database (e.g. TCGA-CA-6718-01, p.P286R, 5,946 substitutions; and TCGA-A6-6141-01, p.S297F, 1,981 substitutions), a hypermutated profile was not evident. However, it should be noted that our tumor exome data normalization and filtering strategy was more stringent, and some somatic mutations were probably overlooked. In addition, the percentage of G > T/C > A transversions was not increased compared with the POLE wild-type tumors.
It is also worth mentioning that the same variant was previously reported as somatic in the COSMIC database in a gastric cancer with microsatellite instability (TCGA-BR-6452-01) [24]. This gastric tumor presented, as defined by the authors, "an ultramutated profile", with 11,375 substitutions and a mutation rate of approximately 283 mutations per megabase. The percentage of G > T/C > A transversions in this sample was 10.29%. Notably, this ultramutated tumor presented three other somatic missense variants in POLE located far away from the exonuclease domain (p.R1111Q, p.S681R and p.Y1889C) that could be promoting an even stronger effect. In this regard, previous reports on the POLE tumor mutation profile were generated mostly by using gene panel sequencing and with much higher coverage than that used in the present study. We can hypothesize that our tumor profiling may have failed to show a clearly distinctive profile for our novel POLE mutation due to suboptimal filtering of germline variants and limited sequencing coverage, as well as a milder mutator effect, as shown by the functional assays in yeast.
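For orientation, a mutation rate per megabase is simply a mutation count divided by the size of the sequenced territory. The sketch below shows that arithmetic with the figures quoted above; the function names are hypothetical, and it assumes for illustration that the quoted rate refers to the same substitutions counted in the total, which the source does not state.

```python
def mutations_per_mb(n_mutations, callable_bases):
    """Mutation burden as mutations per megabase of callable sequence."""
    return n_mutations / (callable_bases / 1_000_000)


def implied_territory_mb(n_mutations, rate_per_mb):
    """Sequenced territory (in Mb) implied by a count and a per-Mb rate."""
    return n_mutations / rate_per_mb


# Figures quoted in the text for TCGA-BR-6452-01.
substitutions = 11_375
reported_rate = 283  # mutations per Mb, approximate

print(round(implied_territory_mb(substitutions, reported_rate), 1))
```

Under that assumption, the two figures imply a territory of roughly 40 Mb, which is in the range of a typical exome capture.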
In conclusion, we detected a new plausibly pathogenic POLE mutation, p.V474I, in an early-onset CRC patient. This variant is located right next to the exonuclease domain and affects protein function, leading to a proofreading activity defect as shown by yeast studies. The pathogenicity of this change was also suggested by bioinformatics and protein structure predictions. The tumor profile showed an increase in the number of variants, although not as strong as that reported for the POLE p.L424V mutation, in agreement with the yeast functional results. To our knowledge, this is the first study to functionally analyze a POLE genetic variant outside the exonuclease domain, and it widens the spectrum of genetic changes in this DNA polymerase that could lead to CRC predisposition.

Patients
Three subgroups of patients were studied: those presenting multiple polyps, early-onset CRC, or MMR-defective CRC without germline alterations in the known hereditary CRC genes. Multiple polyps patients presented 10-100 polyps, the main precursor lesion being adenomatous, serrated or a combination of adenomatous and serrated polyps, with an age of onset < 70 and no alterations in the APC or MUTYH genes. A further selection criterion for the multiple polyps group was having at least one first-degree relative with multiple polyps, CRC, advanced adenomas or endometrial cancer diagnosed before the age of 70. Early-onset CRC patients were selected with an age of onset < 50, no alterations in MUTYH or the MMR genes/proteins, and tumor microsatellite stability. CRC  Germline DNA samples used for Sanger sequencing were obtained from peripheral blood, whereas in one case showing a relevant POLE variant, formalin-fixed, paraffin-embedded (FFPE) tumor DNA was also isolated for LOH and tumor profiling studies, using the QIAamp DNA Blood Kit or QIAamp Tissue Kit, respectively (QIAGEN, Redwood City, USA), following the manufacturer's instructions. RNA for splicing analysis was obtained from peripheral blood collected in a PAXgene Blood RNA tube in one patient and isolated using the PAXgene Blood RNA Kit (PreAnalytiX, Hombrechtikon, Switzerland) following the manufacturer's protocol.
One rare intronic variant in the POLE gene was studied at the RNA level by RT-PCR and PCR amplification using custom primers located two exons upstream and downstream from the variant (Supplementary Table 3), followed by Sanger sequencing to verify correct exon splicing. A nonsynonymous variant with an allelic frequency < 10% was evaluated with bioinformatics tools (Polyphen, http://genetics.bwh.harvard.edu/pph2/ and CADD_phred, http://cadd.gs.washington.edu/score) in order to predict its possible effect on protein function. This variant has been submitted to the ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/; accession number SUB1552845). Amino acid conservation in 100 vertebrates was also checked in the UCSC genome browser (http://genome.ucsc.edu/).