Nasopharyngeal carcinoma risk prediction via salivary detection of host and Epstein-Barr virus genetic variants

Genetic susceptibility and Epstein-Barr virus (EBV) infection are important etiological factors in nasopharyngeal carcinoma (NPC). In this study conducted in southern China, where NPC is endemic, a single nucleotide polymorphism (SNP) in the EBV-encoded RPMS1 gene (locus 155391: G > A [G155391A]) and seven host SNPs (rs1412829, rs28421666, rs2860580, rs2894207, rs31489, rs6774494, and rs9510787) were confirmed to be significantly associated with NPC risk in 50 NPC cases versus 54 hospital-based controls with throat washing specimens and in 1925 NPC cases versus 1947 hospital-based controls with buffy coat samples, respectively. We established a strategy to detect the NPC-associated EBV and host SNPs in saliva samples in a single test that is convenient, noninvasive, and cost-effective and achieves good compliance. The potential utility of this strategy was tested by applying a risk prediction model integrating these EBV and host genetic variants to a population-based case-control study comprising 1026 incident NPC cases and 1148 controls. Receiver operating characteristic (ROC) curve analysis revealed that the NPC risk prediction model had an area under the curve of 0.74 (95% CI: 0.71−0.76). Net reclassification improvement (NRI) analysis showed that inclusion of the EBV SNP significantly improved the discrimination ability of the model (NRI = 0.30, P < 0.001), suggesting that EBV characteristics are valuable for identifying individuals at high risk of NPC in endemic areas. Taken together, we developed a promising NPC risk prediction model based on noninvasive saliva sampling. This approach might serve as a convenient and effective method for screening populations at high risk of NPC.


INTRODUCTION
Nasopharyngeal carcinoma (NPC) is a malignancy associated with Epstein-Barr virus (EBV). It is rare in most parts of the world, with incidence rates well below 1 per 100,000 person-years, but it is prevalent in southern China, Southeast Asia, and northern Africa [1].
NPC has a distinctive geographic distribution, with a strikingly high incidence in Guangdong Province, southern China, where incidence rates exceed 20 per 100,000. Moreover, NPC incidence has remained high for at least the last 30 years in Sihui County, Guangdong Province [2][3][4]. Radiotherapy is often curative for early-stage NPC [5], whereas patients diagnosed at advanced stages have poorer outcomes [6][7][8]. Thus, prevention and early detection are key to reducing the NPC burden.
Risk prediction models aim to identify individuals at high risk of specific diseases, such as cancer. The U.S. National Cancer Institute has compiled risk prediction models for 13 cancer sites, not including NPC [9]. Risk factors such as demographics, behavioral characteristics, medical history, and genetic variants may be useful in identifying high-risk individuals for earlier or more frequent screening and for preventive cancer counseling [10].
Several genome-wide association studies (GWASs) of NPC have recently been published [11][12][13][14]. The largest GWAS, based on subjects of southern Chinese descent, confirmed independent associations with three genetic variants in the human leukocyte antigen (HLA) region (rs2860580, rs2894207, and rs28421666) and revealed three new susceptibility loci (rs1412829, rs6774494, and rs9510787) [13]. In a GWAS meta-analysis, we also identified an NPC risk locus in the CLPTM1L-TERT region on chromosome 5p15.33 (rs31489) [15]. EBV infection is an early, possibly initiating event in the development of NPC and is thought to play a critical role in transforming nasopharyngeal epithelial cells into invasive cancer cells [16,17]. In endemic regions, high-risk EBV variants that contribute to NPC risk might exist.
In the current study, we aimed to construct a new risk prediction model of NPC based on both host and EBV genetic variants determined through saliva testing. Our objective was to develop a noninvasive and convenient method to identify individuals at high risk for NPC.

RESULTS
Association of a SNP in the EBV genome with NPC
Based on RPMS1 genotyping by nested PCR and Sanger sequencing in study population 1, we found that the frequency of G155391A was significantly higher in the 50 matched samples from NPC patients (84% in tumor biopsy samples and 82% in throat washing samples) than in the 54 throat washing samples from healthy controls (39%) (Table 1). The sex- and age-adjusted ORs for NPC associated with the A vs. G genotype were 8.41 (95% CI = 3.13-22.62, P = 2.42 × 10⁻⁵) based on tumor biopsy samples and 6.34 (95% CI = 2.52-15.98, P = 9.01 × 10⁻⁵) based on throat washing samples. These results confirmed the strong association of G155391A in EBV RPMS1 with NPC risk in patients from Guangdong Province.
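The ORs above were estimated by logistic regression with adjustment for sex and age. As a minimal sketch of the underlying quantity only, the unadjusted (crude) OR for carrying the A versus the G variant can be computed from a 2×2 table with a Woolf (log-scale) 95% CI. The function name and the counts below are hypothetical, and the result is not expected to reproduce the adjusted estimates reported in the text.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical 2x2 layout:
        a = cases with the A variant     b = cases with the G variant
        c = controls with the A variant  d = controls with the G variant
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell requires a continuity correction")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; these are not the study's data.
or_, lo, hi = odds_ratio_woolf(a=40, b=10, c=20, d=34)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Adjusting for sex and age, as in the text, requires fitting a logistic model with those covariates rather than this closed-form calculation.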

Saliva for simultaneous identification of host and EBV genetic variants
Paired saliva and buffy coat specimens from 103 controls in study population 3 were used to compare genotyping results for the seven host SNPs and the EBV variant between the two specimen types. The seven host SNPs were successfully genotyped in all tested samples, and the concordance rates between paired saliva and buffy coat samples were 100% for all seven SNPs (Table 3). The EBV variant was genotyped in 70.87% (73/103) of the saliva samples but in only 9.71% (10/103) of the buffy coat samples (Table 3), suggesting that saliva is richer in EBV DNA than buffy coat.
G155391A was also significantly associated with NPC in study population 3 (OR = 5.74, 95% CI = 4.42-7.46, P = 2.56 × 10⁻³⁹) (Table 4). Because NPC risk was also increased, albeit nonsignificantly, among subjects with the GA genotype, we combined GA and A carriers for subsequent analyses.
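Call rate and concordance, as summarized in Table 3, can be sketched as follows. This assumes that concordance is computed only over pairs in which both specimens yielded a genotype; the helper names are hypothetical, and only the numbers of successful EBV calls (73/103 in saliva, 10/103 in buffy coat) come from the text, while the genotype values themselves are placeholders.

```python
def call_rate(calls):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    return sum(g is not None for g in calls) / len(calls)


def concordance(saliva, buffy):
    """Agreement between paired specimens, restricted (by assumption) to
    pairs in which both specimens were successfully genotyped."""
    pairs = [(s, b) for s, b in zip(saliva, buffy)
             if s is not None and b is not None]
    if not pairs:
        return float("nan"), 0
    agree = sum(s == b for s, b in pairs)
    return agree / len(pairs), len(pairs)


# Placeholder genotypes; only the call counts match the text.
saliva_calls = ["A"] * 73 + [None] * 30
buffy_calls = ["A"] * 10 + [None] * 93
print(f"saliva call rate: {call_rate(saliva_calls):.2%}")      # 70.87%
print(f"buffy coat call rate: {call_rate(buffy_calls):.2%}")   # 9.71%
```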

Risk prediction models based on the host and EBV genetic variants
ROC curve analysis revealed that the area under the curve (AUC) of the NPC risk prediction model including the seven host SNPs and the EBV variant was 0.74 (95% CI = 0.71-0.76). This combined model […] (Table 2). Based on these calculations, the NRI was estimated to be 0.30 (P < 0.001), suggesting that inclusion of the EBV variant improved prediction performance.
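The AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that definition (the Mann-Whitney formulation, with ties counted as one half) follows; the function name and scores are hypothetical, it is not the SPSS or R routine used in the study, and it does not compute the 95% CI.

```python
def auc_mann_whitney(scores_cases, scores_controls):
    """ROC AUC = P(score of a random case > score of a random control),
    with ties counted as 1/2; equals U / (n_cases * n_controls)."""
    wins = 0.0
    for x in scores_cases:
        for y in scores_controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical predicted risks for three cases and three controls.
print(auc_mann_whitney([0.9, 0.6, 0.4], [0.5, 0.3, 0.2]))
```

The pairwise loop is quadratic, which remains manageable for a study of about 1000 cases and 1000 controls.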

DISCUSSION
We developed and tested a risk prediction model incorporating host and EBV genetic variants as a tool to identify individuals at high risk for NPC in an endemic population. The AUC of the full model was 0.74 in the population-based case-control study and addition of the EBV variant clearly improved the performance of the risk prediction model over the use of host genetic variants alone. Moreover, we established a strategy to simultaneously detect host and EBV genetic variants using a small amount of DNA in saliva. These results represent an advance in NPC-related genotyping methodology and in the prediction of NPC development using a practical approach.
Previous studies demonstrated associations of host SNPs and certain EBV variants with NPC risk using DNA from buffy coat and throat washing samples, respectively. However, such genotyping information is difficult to obtain because separately collecting and genotyping buffy coat and throat washing samples is complex. Saliva offers an efficient, cost-effective, and noninvasive means of examining predictors or biomarkers of complex diseases, and compliance is high because self-collection of saliva is feasible and painless, which could improve donor care [18]. In addition, previous findings show that saliva may be a suitable source of human DNA for SNP analysis [19]. In many studies, infectious disease markers detected in blood are also found in saliva. For instance, viral DNA/RNA, antibodies, and viral antigens have been detected in saliva, and their levels correlate strongly with those in blood samples [20,21]. We therefore hypothesized that saliva might be a good source of both human and EBV DNA [22,23]. Indeed, our results showed high concordance of host and EBV genotypes between blood and saliva and high call rates for both host and EBV genetic variants in saliva samples. Our findings therefore indicate that saliva is an appropriate material for simultaneously identifying host and EBV genetic variants and that saliva sampling might be convenient for large-scale population risk prediction of NPC. Only 8 of the 103 paired samples were successfully genotyped for the EBV variant in both saliva and buffy coat, and 6 of these 8 pairs showed concordant results; thus, the EBV genotyping results for saliva did not completely correspond to those for buffy coat. This may reflect the low EBV load in peripheral blood: the number of EBV genomes was previously estimated at a median of only 7 per 10⁶ B cells from the peripheral blood of healthy individuals [24]. Evidence has also indicated that the distribution and interchange of viral strains among peripheral blood mononuclear cells, plasma, and saliva are complex [25].
We confirmed the contribution of host genetic factors, particularly the HLA locus, to NPC etiology. Our independent analyses of the G155391A variant in EBV RPMS1 in tumor biopsy and throat washing samples from study population 1 and in saliva from study population 3 consistently revealed its strong association with NPC. Thus, G155391A in EBV RPMS1 could serve as a valuable indicator of high NPC risk in southern China. Furthermore, our results support the notion that the EBV strains in tumor biopsy and throat washing samples may share the same origin. EBV within intact oropharyngeal epithelium has been shown to derive from EBV-infected salivary cells through cell-to-cell contact [26]. Therefore, examining the EBV strain in throat washing or saliva samples could replace examination of tumor biopsies for this purpose.
Cancer risk prediction models may identify individuals at high risk who could benefit from targeted interventions. For instance, based on a risk prediction model [27,28], an interactive risk assessment tool was designed to estimate a woman's risk of developing invasive breast cancer (http://www.cancer.gov/bcrisktool/) to help guide prevention strategies [29]. In addition, because human papillomaviruses 16 and 18 cause most cases of cervical cancer [30,31], vaccines are now available to prevent initial infection with these strains. Multiple etiological factors are believed to be involved in NPC development, including genetic susceptibility, EBV infection, and diet [32,33]. Thus, we could potentially utilize our risk prediction model in regions with high NPC incidence to identify high-risk groups for prevention or screening strategies specific to this population.
Although EBV antibody and DNA levels in serum are considered sufficiently sensitive and specific for NPC screening [34], the EBV antibody level depends on the host immune response to EBV infection and changes over time.
The first NPC predictive genetic model did not include EBV antibody titers as predictors [35]. Another study showed that combining the most significant host SNPs with EBV IgA antibody status, which is currently used as a biomarker for NPC, did not improve the AUC estimate for NPC diagnosis [36]. Our results also showed that circulating antibodies (VCA-IgA and EBNA1-IgA) identified NPC patients with good performance, and that EBV antibody levels did not differ significantly among subgroups of subjects defined by EBV genetic variants (data not shown). EBV seromarkers might thus be useful only for early detection of NPC, whereas other risk predictors may be useful for long-term NPC risk prediction. More frequent antibody-based screening could then be applied to individuals identified as high risk by long-term, DNA-based prediction.
More work is required to further improve our NPC risk prediction model. Adding informative factors such as family history might improve its predictive ability, and long-term prospective cohort studies are needed to validate any risk prediction model. Given recent advances in EBV genome sequencing, whole-genome sequencing of EBV may help identify additional strain variation contributing to NPC risk. Hence, a large prospective study including environmental factors, host genetic factors, EBV characteristics (EBV antibodies, EBV DNA levels, and EBV variants), family history, and their interactions would be expected to yield a more comprehensive risk prediction model.
In summary, we established a strategy to determine host and NPC-associated EBV genetic variants via noninvasive saliva sampling. We further developed a promising NPC risk prediction model integrating host and EBV genetic variants. This approach might provide a convenient and effective method to identify individuals who are at high risk for NPC development, thus reducing the NPC burden in endemic populations.

MATERIALS AND METHODS
Study population 1: EBV genotyping
Fifty-four patients with histologically confirmed nasopharyngeal carcinoma (NPC), each with individually matched tumor biopsy (fresh or paraffin-embedded) and throat washing samples, were diagnosed at and enrolled from the Sun Yat-sen University Cancer Center in Guangzhou, China, between 2005 and 2007. Patients were eligible if they were Cantonese, newly diagnosed, and had no family history of NPC (no close relative with NPC).
Sixty healthy controls with throat washing samples were enrolled from the physical examination center at the First Affiliated Hospital of Sun Yat-sen University. Control subjects had no personal history of cancer and were matched to NPC cases by age, sex, region of residence, and ethnicity.

Study population 2: Host genotyping
Buffy coat samples were collected from 2023 histologically confirmed NPC cases treated at the Sun Yat-sen University Cancer Center between 2005 and 2010 and from 2009 healthy controls, free of NPC, other cancers, and infectious diseases, who were identified at the physical examination center of the First Affiliated Hospital of Sun Yat-sen University. All subjects resided in Guangdong Province. The characteristics of these subjects are summarized in Supplementary Table 3.

Study population 3: Risk prediction model testing
Between 2010 and 2014 in Zhaoqing, Guangdong Province, 1026 population-based NPC cases were identified and enrolled through a rapid case ascertainment system involving a network of physicians in the area who diagnose or treat NPC. Through random selection from total population registries, 1148 population-based control subjects were identified and enrolled, with frequency matching to the cases by age, sex, and area of residence. Participation rates were approximately 81% among cases and 83% among controls.
Saliva samples were collected into vials containing an equal volume of prepared lysis buffer (50 mM Tris, pH 8.0, 50 mM EDTA, 50 mM sucrose, 100 mM NaCl, 1% SDS) as previously described [19]. Venous blood was collected simultaneously. Saliva specimens were provided by approximately 93% of participating cases and 89% of participating controls, while blood specimens were provided by approximately 98% and 83% of cases and controls, respectively. Each participant completed a detailed interview conducted face-to-face by a trained interviewer employing a structured questionnaire. The characteristics of these subjects are summarized in Supplementary Table 4.

Ethics statement
Each subject provided informed consent, and the institutional review boards of all participating institutions approved this collaborative study.

DNA extraction
DNA was isolated from tumor biopsy, throat washing, and buffy coat samples using a commercial DNA extraction kit (Qiagen, Germany). DNA from 200 µl of saliva or buffy coat samples in the population-based case-control study (study population 3) was automatically extracted using Chemagic STAR (Hamilton Robotics, Sweden) according to the manufacturer's instructions. DNA concentration greater than 10 ng/µl was required.

Nested PCR and Sanger sequencing
To detect specific EBV genomic variants linked to NPC development, we previously sequenced several EBV-encoded genes, including EBNA1, LMP1, and the BamHI-A rightward transcript (BART) family, in NPC cases and controls from Guangdong Province [37]. The most striking finding was a significant association between NPC risk and a single nucleotide polymorphism (SNP) in the EBV-encoded RPMS1 gene (locus 155391: G > A, referred to here as G155391A, which results in an Asp-to-Asn substitution). RPMS1 encodes a major part of the BART family mRNA and is regularly transcribed in NPC tissues [38]. Thus, G155391A in EBV RPMS1 may represent a specific EBV variant in the NPC-endemic region of southern China that could serve as an indicator of high NPC risk in this population.
For the present study, we examined whether the nucleotide polymorphisms in RPMS1 were the same in EBV DNA from tumor biopsy and throat washing samples. The RPMS1 fragment was amplified by nested polymerase chain reaction (PCR) using two primer pairs (RPMS1-1/RPMS1-2 and RPMS1-3/RPMS1-4; Supplementary Table 5) and PCR Master Mix (Promega). DNA from the EBV-positive C666 cell line and a reaction without template DNA were used as the positive and negative controls, respectively. The sequences of the PCR products were determined by Sanger sequencing.

Genotyping
Seven host SNPs previously identified in GWASs (rs2860580, rs2894207, rs28421666, rs9510787, rs6774494, rs1412829, and rs31489), together with 264,405 autosomal SNPs mostly enriched in exonic regions, were genotyped in buffy coat samples from study population 2 using customized Illumina Infinium HumanExome BeadChips. For quality control of the exome-wide data, the SNP filtering criteria were singleton autosomal SNPs, a minor allele frequency above 1%, a genotyping call rate above 98%, and no deviation from Hardy-Weinberg equilibrium (P ≥ 0.05 in controls); the sample filtering criterion was a per-sample SNP call rate above 98%. After further examination for relatedness and population structure, 1925 cases and 1947 controls were retained for association analysis of the seven GWAS-identified SNPs.
www.impactjournals.com/oncotarget
Simultaneous genotyping of the seven human SNPs and the single EBV variant was performed in saliva samples from study population 3 using the Agena Bioscience MassArray platform. The primers used are shown in Supplementary Table 6. Eighty-two samples were regenotyped because of poor spectra or missing allele information. After quality validation, based on call rate (100%) and no deviation from Hardy-Weinberg equilibrium (P ≥ 0.05 in controls), 1026 cases and 1148 controls were included in further analysis.
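The SNP filters described above can be sketched in Python under stated assumptions: genotypes are coded as 0/1/2 copies of one allele, Hardy-Weinberg equilibrium is assessed with a 1-df chi-square test in controls (the text does not specify whether a chi-square or an exact test was used), and the function names and example data are hypothetical. Relatedness and population-structure checks are not shown.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(all_genotypes, control_genotypes,
                  maf_min=0.01, call_rate_min=0.98, hwe_p_min=0.05):
    """Hypothetical SNP filter mirroring the thresholds in the text.

    Genotypes are coded 0/1/2 (copies of allele A); None marks a failed call.
    Call rate and MAF use all samples; HWE uses controls only.
    """
    called = [g for g in all_genotypes if g is not None]
    ctrl = [g for g in control_genotypes if g is not None]
    if not called or not ctrl:
        return False
    call_rate = len(called) / len(all_genotypes)
    freq_a = sum(called) / (2 * len(called))
    maf = min(freq_a, 1.0 - freq_a)
    hwe_p = hwe_chisq_p(ctrl.count(2), ctrl.count(1), ctrl.count(0))
    return call_rate > call_rate_min and maf > maf_min and hwe_p >= hwe_p_min


# Hypothetical controls in perfect HWE (allele A frequency 0.4).
genos = [2] * 160 + [1] * 480 + [0] * 360
print(passes_snp_qc(genos, genos))
```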

Statistical analysis
We used logistic regression to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for associations with NPC risk, adjusting for age and sex and assuming an additive model for host genetic variants. For comparisons across SNPs, we used ORs for the high-risk alleles rather than the minor alleles. P values for trend were derived from Cochran-Armitage trend tests. All reported P values are two-sided.
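Under the additive model, each host SNP enters the regression as the number of copies (0, 1, or 2) of the high-risk allele, and the Cochran-Armitage test assesses a linear trend in the proportion of cases across those three genotype groups. A minimal sketch of both follows; the function names and counts are hypothetical, and the trend test shown does not adjust for age and sex, which requires the logistic model.

```python
import math


def risk_allele_count(genotype, risk_allele):
    """Additive coding: copies (0, 1, or 2) of the high-risk allele."""
    return genotype.count(risk_allele)


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test for a 2 x k genotype table.

    cases/controls: genotype counts ordered by number of risk alleles.
    Returns (z, p).
    """
    n = [r + s for r, s in zip(cases, controls)]
    big_n, big_r = sum(n), sum(cases)
    big_s = big_n - big_r
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * m for t, m in zip(weights, n))
    sum_t2n = sum(t * t * m for t, m in zip(weights, n))
    t_stat = big_n * sum_tr - big_r * sum_tn
    var = (big_r * big_s / big_n) * (big_n * sum_t2n - sum_tn ** 2)
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


print(risk_allele_count("AG", "A"))
# Hypothetical counts (0, 1, 2 risk alleles) for cases and controls.
print(cochran_armitage_trend((10, 20, 30), (30, 20, 10)))
```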
Using logistic regression, we constructed risk prediction models based on the seven host genetic variants, the single EBV variant, or all eight variants in saliva samples from study population 3 in which the EBV variant was successfully detected. Receiver operating characteristic (ROC) curve analysis was used to evaluate model performance.
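The prediction models were fitted in SPSS and R. Purely as a self-contained illustration of the same technique, the sketch below fits a maximum-likelihood logistic regression by Newton-Raphson in plain Python and returns predicted risks. The function names and toy data are hypothetical; in the study's setting, each row of X would hold the additively coded host SNPs and/or the EBV variant indicator.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; an intercept is prepended."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X^T W X (observed information)
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def predict_risk(beta, x):
    """Predicted probability of being a case for covariate vector x."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical toy data: one binary covariate (e.g., variant carrier).
X = [[0]] * 25 + [[1]] * 25
y = [1] * 5 + [0] * 20 + [1] * 15 + [0] * 10
beta = fit_logistic(X, y)
print(math.exp(beta[1]), predict_risk(beta, [1]))
```

With a single binary covariate, exp(slope) equals the crude odds ratio of the 2×2 table, which makes the toy example easy to check by hand.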
To quantify the improvement in discrimination of the model including the EBV variant and the seven host variants over the model including only the host variants, we computed the net reclassification improvement (NRI). Predicted-risk thresholds were set at 0.2 and 0.3, and we used a reclassification table to evaluate how accurately the two models assigned individuals to the low-, intermediate-, and high-risk categories.
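A minimal sketch of the categorical NRI follows, using the 0.2 and 0.3 cut-offs from the text. It assumes that a risk equal to a cut-off falls into the higher category and uses the usual asymptotic z-test for the P value; both are assumptions, and the function names and example data are hypothetical. Upward moves count in favor of the new model for cases, and downward moves for controls.

```python
import math


def risk_category(p, cutoffs=(0.2, 0.3)):
    """0 = low (< 0.2), 1 = intermediate (0.2 to < 0.3), 2 = high (>= 0.3);
    the boundary handling is an assumption."""
    return sum(p >= c for c in cutoffs)


def categorical_nri(p_old, p_new, y, cutoffs=(0.2, 0.3)):
    """Categorical NRI of the new model over the old one, with a z-test."""
    n_e = sum(y)
    n_ne = len(y) - n_e
    if n_e == 0 or n_ne == 0:
        raise ValueError("need both cases and controls")
    up_e = down_e = up_ne = down_ne = 0
    for po, pn, yi in zip(p_old, p_new, y):
        move = risk_category(pn, cutoffs) - risk_category(po, cutoffs)
        if yi == 1:
            up_e += int(move > 0)
            down_e += int(move < 0)
        else:
            up_ne += int(move > 0)
            down_ne += int(move < 0)
    nri = (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
    se = math.sqrt((up_e + down_e) / n_e ** 2 + (up_ne + down_ne) / n_ne ** 2)
    if se == 0:
        return nri, float("nan"), float("nan")
    z = nri / se
    return nri, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical predicted risks: everyone starts in the intermediate category.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.25] * 8
p_new = [0.35, 0.35, 0.1, 0.25, 0.1, 0.1, 0.1, 0.35]
print(categorical_nri(p_old, p_new, y))
```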
The statistical analyses were performed using SPSS (version 16.0) and R (version 2.14.0).