Profiling cancer-associated genetic alterations and molecular classification of cancer in Korean gastric cancer patients

Recently, the Cancer Genome Atlas (TCGA) Research Network and the Asian Cancer Research Group (ACRG) proposed new molecular classifications of gastric cancer (GC) to aid the development of biomarkers for targeted therapy and to predict prognosis. We studied associations between the genetic alteration profiles of cancer-related genes, environmental factors, and histopathological features in 107 paired gastric tumor and non-tumor tissue samples. Of our GC cases, 6.5% were classified as the EBV subtype, 17.8% as the MSI subtype, 43.0% as the CIN subtype, and 32.7% as the GS subtype. The distribution of the four GC subtypes was similar in the TCGA dataset and in our dataset. The MSI subtype showed a hypermutated status and the best prognosis among the molecular subtypes. However, molecular classification into the four GC subtypes showed no significant differences in overall survival (p=0.548) or relapse-free survival (RFS, p=0.518). The ZBTB20 variant P619fs*43 was limited to the MSI group (n=5/19, 26.3%), a trend similar to that observed in the TCGA dataset. Genetic alterations of the RTK/RAS/MAPK and PI3K/AKT/mTOR pathways were detected in 34.6% of GC cases (37 cases). We also found two cases carrying a likely pathogenic variant in the CDH1 gene (NM_004360.4: c.2494G>A, p.V832M). Here, we classified the molecular subtypes of GC according to the TCGA system, and our comprehensive analysis of genetic alterations in Korean GC patients provides a critical starting point for the design of more appropriate clinical trials.


INTRODUCTION
Gastric cancer (GC) ranks fifth in cancer incidence and second in cancer deaths, and 1 in 36 men and 1 in 84 women develop stomach cancer before age 79 [1]. The histologic classification of gastric carcinoma has been based on the Lauren classification [2] and the 2010 WHO classification, the latter recognizing four histological subtypes [3]. Neither system is particularly useful clinically, because neither has sufficient prognostic or predictive value to guide patient management. New classifications of GC are therefore needed to provide insights into pathogenesis and to identify new biomarkers and novel treatment targets [4]. Recent advances in high-throughput technologies have improved our understanding of the genetic basis of GC. To provide a roadmap for patient stratification and trials of targeted therapies, the Cancer Genome Atlas (TCGA)

Molecular subtype classification and clinical phenotype
We classified molecular subtypes from genomic data according to the subtypes defined by TCGA and correlated the clinical covariates of the 107 GC patients with these subtypes (Table 4). The EBV subtype (6.5% of GC) showed a significantly higher EBV burden and was characterized by uncommon histological subtypes (Table 5). No TP53 mutations were detected in the EBV subtype, whereas mutations of ARID1A (4 cases, 57.1%), CDH1 (3 cases, 42.9%), PIK3CA (2 cases, 28.6%), and RHOA (2 cases, 28.6%) were relatively frequent. No genetic alterations of JAK2 or PDCD1LG2 were detected in the EBV subtype, and only one case harbored a mutant CD274 (Figure 1 & Table 4).
The MSI subtype (17.8% of GC) showed instability at one or more loci in the MSI assay. This subtype had an elevated mutation rate (6.6 mutations per case) and was characterized by alterations of genes involved in mismatch repair. Almost all cases with mutations in MLH1 www.impactjournals.com/oncotarget significance. Interestingly, mutations of ZBTB20 were limited to the MSI group (Table 4).
The CIN subtype (43.0% of GC) was characterized by a relatively low somatic mutation rate (1.8 mutations per case) and a high frequency of TP53 mutations. In the CIN subtype, we observed amplifications of

Prognosis analysis in 107 gastric cancer patients
Follow-up data (date of last follow-up in months, loco-regional recurrence, distant metastasis, and cause of death) were available for 72 of the 107 GC patients. The median follow-up period was 459.5 days, and there were 19 (26.4%) cases of gastric cancer relapse and 12 (16.7%) gastric cancer-related deaths. Survival analysis showed no significant difference in overall survival (p=0.898) or relapse-free survival (RFS, p=0.548) among the four GC subtypes (Figure 3 (a) & (b)). In contrast, classification by AJCC stage showed significant differences in overall survival (p=0.001) and RFS (p<0.001) (Figures 3 (c)

DISCUSSION
We analyzed germline mutations in paired non-tumor and GC tissue samples from 107 Korean patients. Two cases harbored a likely pathogenic variant in the CDH1 gene (NM_004360.4: c.2494G>A, p.V832M). The V832M mutation has been identified in a Japanese family with hereditary diffuse gastric cancer (HDGC), in which the probands were diagnosed at the age of 56 [21]. This mutation was functionally characterized as pathogenic [22] and has also been detected in familial lobular breast cancer patients with wild-type BRCA1/2 [23]. In this study, the two cases with V832M were diagnosed at ages 66 and 75, respectively. Both were advanced GC (stage IIB) at diagnosis, and their family histories of GC were unknown.
Among the somatic variants, ARID1A Q1334del/dup (n=23/52) and CDH1 L15del (n=6/13) were detected at altered allele frequencies of 5-33% in tumor tissue (Table 3 & Supplementary 2). The in-frame indel Q1334del/dup, which increases the amount of ARID1A protein in the nucleus and restores its tumor suppressor functions, has also been reported in GC samples [24]. This variant has also been occasionally reported in the COSMIC database (COSMIC v78) and in pancreatic cancers [25]. The three-nucleotide deletion c.44_46delTGC (L15del) in exon 1 of CDH1, which lies in the signal peptide region of the E-cadherin protein, was also identified in Chinese GC patients, whereas it was not detected in 240 controls [26] and endometrial carcinomas [27]. RHOA belongs to the Rho family, which regulates the actin cytoskeleton, and functional evidence indicates that mutant RHOA acts in a gain-of-function manner [28]. RHOA mutations were observed in 8.4% of GC cases (n=9/107), affecting the Arg5, Gly17, Thr37, Tyr42, and Glu64 residues (Table 3 & Supplementary 2). Of these, mutations at Arg5, Gly17, and Tyr42 are recurrently detected in GC [28,29].
EBV-infected GC constitutes 5-10% of all GC cases [9,10], and the Cancer Genome Atlas project identified EBV-infected GC as one of four molecular subtypes [28]. In our cohort, EBV-infected GC likewise formed a distinct molecular subtype. In the EBV subtype, ARID1A mutations (4 cases, 57.1%) were prevalent and no TP53 mutations were detected; these frequencies were similar to those in the TCGA data [28]. Inhibitors of the PI3K/AKT/mTOR pathway, the JAK2 pathway, and the PD-1/PD-L1/PD-L2 pathway are considered potentially applicable targeted therapies for EBV-infected GC [4,30]. However, only 28.6% of EBV-infected GC harbored PIK3CA mutations (n=2/7), and drug-related amplifications of JAK2, CD274, PDCD1LG2, and ERBB2 were not detected in the EBV subtype (Figure 2). To identify applicable therapeutic options, the genetic alterations of PIK3CA, JAK2, CD274, PDCD1LG2, and ERBB2 should be further validated in large-scale cohorts of EBV-infected GC.
We observed that the MSI subtype was associated with hypermutation and a more favorable prognosis than the other molecular subtypes. The TCGA and ACRG classifications likewise characterized the MSI subtype by a high mutation frequency and the best prognosis [6,28]. In intestinal-type GC, patients with a good prognosis were characterized by a high mutation rate and microsatellite instability, and mutations of PIK3CA (29.4%) and KRAS (26.5%) were enriched in this good-prognosis subgroup [31]. In our study, mutations of KRAS (26.3%) and PIK3CA (36.8%) were significantly enriched in the MSI subtype, and KRAS G13D (4 cases) and PIK3CA H1047R (3 cases) were frequently observed (Table 2). PIK3CA H1047R mutations were also frequently detected in the MSI subtype in a previous study [6]. The ZBTB20 alteration P619fs*43 (n=5) was limited to the MSI group. This variant (P619fs*43; rs758277701; COSM267785) showed a similar trend in the TCGA data, where it was also confined to the MSI group (20% of MSI cases) [28]. The clinical significance of this variant should be evaluated in further studies.
Genetic alterations of the RTK/RAS/MAPK and PI3K/AKT/mTOR pathways were detected in 34.6% of GC cases (n=37) (Figure 2). Thirteen samples (12.1% of GC) harbored ERBB2 alterations: 8 carried somatic base substitutions and 5 carried amplifications, and these events were mutually exclusive. The ERBB2 substitutions S310F (two cases) and V842I (two cases) were recurrently detected in this study; both have been functionally characterized as activating and sensitive to lapatinib in ERBB2-negative breast cancers. Whether ERBB2 R678Q, which was also recurrently detected in this study, responds to anti-ERBB2 (HER2)-targeted therapy has not been tested [34]. Ten cases (9.4%) harbored mutated PIK3CA, and KRAS G13D co-occurred in 4 of these cases (Figure 2). The effect of co-existing PIK3CA and KRAS alterations on therapeutic response has yet to be evaluated [4]. Dual PI3K and STAT3 blockade with NVP-BKM120 (a PI3K inhibitor) and AG490 (a STAT3 inhibitor) showed a synergistic effect in GC cells harboring mutated KRAS by inducing apoptosis [35].
These biomarkers may facilitate enrollment of GC patients into clinical trials evaluating targeted therapies and provide the basis for developing solid therapeutic approaches in Korean GC patients [36][37][38].
Molecular classification into the four GC subtypes showed no significant differences in overall survival (p=0.898) or RFS (p=0.548) in this dataset. Similarly, subtypes based on the ACRG classification showed no significant association with survival in Korean GC [33]. We therefore suggest that prognosis in Korean GC patients may be predicted more simply and effectively with AJCC staging [39] than with molecular classification.
We classified the molecular subtypes of gastric cancer according to the TCGA system using a targeted NGS panel of 43 genes, EBV, MSI, and H. pylori assays, and an SNP array. The 43-gene cancer panel consisted of significantly mutated genes from the TCGA and ACRG cohorts [6,28,40], genes associated with new targeted therapies for GC (EGFR, ERBB2, FGFR2, and KDR), and genes associated with hereditary cancer syndromes (CDH1, MSH2, MLH1, STK11, and TP53) [12]. We characterized 1) the distribution of GC subtypes according to the TCGA molecular groups, 2) heritable genetic alterations, 3) environmental factors (EBV and H. pylori), 4) somatic genetic aberration profiles, including driver mutations and drug-targeted genetic alterations, and 5) histopathological features in Korean GC patients.

Subject selection
We obtained a total of 107 gastric tumors and matched non-tumor tissue samples from the Yonsei University Wonju Medical Center Biobank (n=138; 69 paired samples) and the Samkwang Medical Laboratory Biobank (n=76; 38 paired samples). Tumor samples were obtained from patients who had not received prior chemotherapy or radiotherapy. The gastric cancer tissues consisted of 69 fresh-frozen (FF) and 38 formalin-fixed paraffin-embedded (FFPE) paired tumor and non-tumor tissue samples. Clinical data, including age, sex, clinical follow-up data, and pathology reports, were provided by the tissue source institutions. Tumors were classified histologically according to Lauren's criteria [2] and the 2010 WHO classification system [3]. Tumor TNM stages were reviewed for consistency with the 7th edition of the TNM classification of the American Joint Committee on Cancer (AJCC) [39]. Pathologic findings were reviewed by experienced gastrointestinal pathologists (S.N.K. and M.C.). The study was approved by the Institutional Review Boards of Samkwang Medical Laboratories and Yonsei University Wonju College of Medicine.

DNA preparation
DNA was extracted from FFPE tumor and adjacent non-tumor gastric tissues using a QIAamp DNA extraction kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. H&E-stained sections from FFPE blocks were reviewed by a board-certified pathologist, and representative sections containing tumor or benign tissue were identified. For FF tumor and matched non-tumor tissues, a G-DEX genomic DNA extraction kit (Intron Biotechnology, Korea) was used according to the manufacturer's protocol. The quality and concentration of genomic DNA (gDNA) were evaluated with a NanoDrop spectrophotometer (ND-1000; Thermo Scientific, DE, USA) and the Agilent 2200 TapeStation system (Agilent Technologies,

Detection of EBV and H. pylori infection
EBV infection was detected using the Real-Q EBV quantification kit (Biosewoom, Seoul, Korea) and CFX96 real-time PCR system (Bio-Rad, USA) following the manufacturer's recommendations.
The total length of the region of interest (ROI) of the 43-gene NGS panel was 124,132 bp. To validate the performance of the panel, NA12878 reference material was sequenced 7 times in 3 batches. The average panel coverage was 1,710×, with 97% of targeted bases covered at >20×. We downloaded the VCF file for NA12878 (https://www.ncbi.nlm.nih.gov/variation/tools/get-rm/) and compared it with the 7 variant call sets from our NA12878 control runs. The sensitivity and specificity of the 43-gene cancer panel were 96.4% (95% CI: 94.1-97.9%) and 100%, respectively.
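The comparison against the NA12878 truth set can be sketched as follows, assuming each call set has been reduced to (chromosome, position, ref, alt) tuples inside the panel ROI. The function name `concordance` and the rule that every ROI base outside both the truth set and the call set counts as a true negative are illustrative assumptions, not the pipeline used in this study.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(truth: Iterable[Variant], calls: Iterable[Variant], roi_bp: int) -> Dict[str, float]:
    """Compare one call set with the reference truth set, both restricted to the ROI."""
    truth_set: Set[Variant] = set(truth)
    call_set: Set[Variant] = set(calls)
    tp = len(truth_set & call_set)  # truth variants that were called
    fn = len(truth_set - call_set)  # truth variants that were missed
    fp = len(call_set - truth_set)  # calls absent from the truth set
    # Assumption: each remaining ROI base is treated as one true-negative position.
    tn = roi_bp - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

With the 124,132-bp ROI, one could call `concordance(truth, run_calls, 124132)` for each of the 7 replicate runs; pooling TP, FN, and FP across runs before dividing is one way to obtain a single panel-level estimate, and that pooling choice is likewise an assumption.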
All candidate variants were passed through the post-filters recommended by the authors of these tools. Somatic mutations were called with VarScan2 and post-filtered to retain exonic variants with an altered allele frequency >5% in the tumor, coverage >50×, and a population frequency <0.005 in the 1000 Genomes Project (http://www.1000genomes.org), ESP6500 (http://evs.gs.washington.edu/EVS/), and the Exome Aggregation Consortium (ExAC, http://exac.broadinstitute.org/). Somatic variants detected more than twice in non-tumor tissue were excluded. Germline variants were identified by post-filtering for an altered allele frequency >30%, coverage >50×, and a population frequency <0.01 in the 1000 Genomes Project, ESP6500, and ExAC databases; these variants were present in both the GC and the matched non-tumor tissue. Filtered calls were inspected visually with Integrative Genomics Viewer 2.3 (IGV; Broad Institute, Cambridge, MA, USA).
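The post-filtering thresholds above can be written as two simple predicates. This is a minimal sketch: the `VariantCall` record and its field names are hypothetical, `max_pop_af` is assumed to hold the highest allele frequency across 1000 Genomes, ESP6500, and ExAC, "detected >2 times in non-tumor tissue" is read as a count across non-tumor samples, and applying the 30% germline threshold to both tumor and non-tumor tissue is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    gene: str
    exonic: bool         # True if the variant lies in an exon
    tumor_vaf: float     # altered allele frequency in tumor (0-1)
    normal_vaf: float    # altered allele frequency in matched non-tumor tissue (0-1)
    depth: int           # read depth at the site
    max_pop_af: float    # assumed: max AF across 1000 Genomes, ESP6500, ExAC
    normal_hits: int     # assumed: number of non-tumor samples carrying the variant


def passes_somatic_filter(v: VariantCall) -> bool:
    # Exonic, tumor VAF > 5%, coverage > 50x, population AF < 0.005,
    # and not seen more than twice in non-tumor tissue.
    return (
        v.exonic
        and v.tumor_vaf > 0.05
        and v.depth > 50
        and v.max_pop_af < 0.005
        and v.normal_hits <= 2
    )


def is_germline_candidate(v: VariantCall) -> bool:
    # VAF > 30% (assumed in both tumor and non-tumor), coverage > 50x,
    # population AF < 0.01.
    return (
        v.tumor_vaf > 0.30
        and v.normal_vaf > 0.30
        and v.depth > 50
        and v.max_pop_af < 0.01
    )
```

Keeping each filter as a separate predicate makes it easy to report how many calls each threshold removes before the IGV review.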

CNV analysis
CNV analysis of the 43-gene NGS panel was performed with the dispersion and hidden Markov model (HMM) methods on normalized read counts in the NextGENe v2.4.1.2 CNV tool (SoftGenetics, State College, PA, USA). The dispersion value was calculated automatically, and the HMM was used to merge multi-exon calls and apply a priori probabilities. Based on the coverage ratio and the amount of noise in each region, the copy number state of each region in the sample was reported as duplication, normal, or deletion [48]. The NextGENe Viewer (SoftGenetics) was used to visualize several large CNV calls. To validate the performance of this tool, we compared its results with HER2 immunohistochemistry (IHC) results. Of the 54 cases tested by HER2 IHC, 4 were positive (score 3+) and 39 were negative. The sensitivity and specificity of the NextGENe CNV analysis relative to HER2 IHC were 75.0% (95% CI: 19.4-99.3%) and 100% (95% CI: 92.9-100%), respectively.
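With only 4 IHC-positive cases, the confidence interval for sensitivity is very wide. The lower bound reported for 3/4 (19.4%) agrees with an exact (Clopper-Pearson) binomial interval, which the sketch below computes in pure Python by bisection on the binomial CDF. Whether the authors used this exact method is an assumption, and the function names are illustrative.

```python
from math import comb
from typing import Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05, tol: float = 1e-10) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""

    def bisect(f) -> float:
        # f is increasing on [0, 1] with f(0) < 0 < f(1); locate its root.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2 (increasing in p).
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2 (decreasing in p, so negate).
    upper = 1.0 if x == n else bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper
```

For example, `clopper_pearson(3, 4)` returns approximately (0.194, 0.994). The same function applies to specificity by passing the number of true negatives and the number of IHC-negative cases.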
The Infinium® Global Screening Array (Illumina, San Diego, CA, USA) was run on 69 FF and 22 FFPE tumor tissues according to the manufacturer's recommendations. The hybridized arrays were scanned with the HiScan system (Illumina, San Diego, CA, USA). Copy number alteration (CNA) analysis of the single nucleotide polymorphism (SNP) array data was performed with GISTIC 2.0 [49]. To eliminate bias from copy-number-variable regions in healthy individuals, we restricted the CNA analysis to defined regions, including somatic CNA regions reported in GC (http://www.cbioportal.org/) and dosage-sensitive regions of the genome [50].

Microsatellite instability (MSI) assay
Microsatellite status was assessed with the mononucleotide repeat markers BAT-25, BAT-26, NR-21, NR-24, and NR-27 in tumor and corresponding non-tumor tissues [51]. The five markers were co-amplified by multiplex PCR with Solg2X multiplex PCR Smart mix according to the manufacturer's recommendations. The PCR products were analyzed with the ABI 3500Dx system (Applied Biosystems, Foster City, CA, USA) and GeneMarker software (SoftGenetics, PA, USA). Tumors showing instability at two or more of the five markers were classified as high-frequency MSI (MSI-H), and tumors showing instability at only one locus were classified as low-frequency MSI (MSI-L) [52].
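The MSI-H/MSI-L rule above reduces to counting unstable markers. In this sketch the per-marker instability calls are assumed to come from comparing tumor and non-tumor peak profiles in GeneMarker, and the label "MSS" for tumors with no unstable marker is the conventional term rather than one stated in the text; the function name is illustrative.

```python
from typing import Dict

# The five mononucleotide markers named in the text.
MARKERS = ("BAT-25", "BAT-26", "NR-21", "NR-24", "NR-27")


def msi_status(unstable: Dict[str, bool]) -> str:
    """Map per-marker instability calls (tumor vs. matched non-tumor) to an MSI category."""
    missing = set(MARKERS) - set(unstable)
    if missing:
        raise ValueError(f"missing markers: {sorted(missing)}")
    n_unstable = sum(bool(unstable[m]) for m in MARKERS)
    if n_unstable >= 2:
        return "MSI-H"  # two or more of five markers unstable
    if n_unstable == 1:
        return "MSI-L"  # exactly one marker unstable
    return "MSS"        # assumed label: no unstable marker
```

Raising on a missing marker, rather than treating it as stable, avoids silently down-grading a tumor whose marker failed to amplify.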

Molecular subtype classification and statistical analysis
Following the TCGA classification sequence [5], we assigned tumors hierarchically to the EBV, MSI, and CIN subtypes based on the results of the EBV assay, MSI assay, and SNP array, respectively. The remaining tumors were classified into the GS subtype.
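The hierarchical assignment can be sketched as an ordered series of checks in which the first match wins. Two assumptions are labeled here and in the code: only MSI-H tumors enter the MSI subtype, and chromosomal instability is reduced to a single boolean derived from the SNP array, whose exact threshold is not given in this section. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def tcga_subtype(ebv_positive: bool, msi_status: str, chromosomally_unstable: bool) -> str:
    """Assign one tumor to EBV, MSI, CIN, or GS in that order of precedence."""
    if ebv_positive:
        return "EBV"
    if msi_status == "MSI-H":  # assumption: MSI-L tumors are not placed in the MSI subtype
        return "MSI"
    if chromosomally_unstable:  # assumption: boolean summary of SNP-array copy number data
        return "CIN"
    return "GS"                 # everything left over is genomically stable


def subtype_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of tumors in each subtype, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}
```

Because the checks are ordered, an EBV-positive tumor that is also MSI-H or chromosomally unstable is still labeled EBV. Applied to 7 EBV, 19 MSI, 46 CIN, and 35 GS labels (the counts implied by the reported percentages of 107 cases), `subtype_distribution` returns 6.5, 17.8, 43.0, and 32.7%.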
Fisher's exact and chi-squared tests were used to evaluate differences in the proportions of categorical variables between subgroups. Follow-up periods were calculated as the time between the date of surgery and the date of last follow-up (months). Relapse-free survival (RFS) was defined as survival free of loco-regional recurrence, distant metastasis, and death from any cause. GC-specific survival (GCSS) counted only deaths from GC-related causes as events. Kaplan-Meier survival curves were compared with log-rank tests to evaluate RFS and GCSS according to AJCC stage and molecular subtype. Cox proportional hazards models were used to assess the influence of prognostic factors on RFS. All statistical analyses were performed using SPSS 22.0 (SPSS, Chicago, IL, USA). Except for the univariate analysis, a p value <0.05 was considered significant.
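The analyses were run in SPSS; the sketch below re-implements the Kaplan-Meier estimator and a log-rank test in pure Python only to make the computation explicit. It is not the SPSS implementation, it compares two groups only (comparing the four subtypes requires the (k-1)-degree-of-freedom generalization), and the function names are illustrative.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) steps; events[i] is 1 for an event, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= d + c  # everyone observed at t leaves the risk set afterwards
    return curve


def logrank_two_groups(t1, e1, t2, e2) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(t1, e1)] + [(t, e, 1) for t, e in zip(t2, e2)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(1 for s, _, g in pooled if s >= t and g == 0)  # at risk, group 1
        n = sum(1 for s, _, _ in pooled if s >= t)               # at risk, both groups
        d1 = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no informative event times")
    chi2 = o_minus_e ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

The p-value uses the identity that the upper tail of a chi-square variable with one degree of freedom equals `erfc(sqrt(x / 2))`, which keeps the sketch free of external dependencies.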

Author contributions
K.L. contributed to the conception and design of the entire study, selected the eligible patients, and supervised the drafting of the manuscript. Y.K. carried out the molecular genetic studies and participated in the EBV assay, the MSI assay, the design of the gastric cancer-related target gene panel, data analysis, and writing of the manuscript. J.K. contributed to the design of the study, sample selection, interpretation of data, and critical revision of the manuscript. S.N.K. and M.C. reviewed the pathologic findings of the GC tissues, and S.C.O. supported the NGS data analysis pipeline. All authors read and approved the final manuscript.