Detection of nasopharyngeal carcinoma susceptibility with single nucleotide polymorphism analysis using next-generation sequencing technology

Nasopharyngeal carcinoma (NPC) is a head and neck cancer with high incidence in South China and East Asia. To provide a theoretical basis for NPC risk screening and early prevention, we conducted a meta-analysis of relevant literature on the association of single nucleotide polymorphisms (SNP)s with NPC susceptibility. Further, expression of 15 candidate SNPs identified in the meta-analysis was evaluated in a cohort of NPC patients and healthy volunteers using next-generation sequencing technology. Among the 15 SNPs detected in the meta-analysis, miR-146a (rs2910164, C>G), HCG9 (rs3869062, A>G), HCG9 (rs16896923, T>C), MMP2 (rs243865, C>T), GABBR1 (rs2076483, T>C), and TP53 (rs1042522, C>G) were associated with decreased susceptibility to NPC, while GSTM1 (+/DEL), IL-10 (rs1800896, A>G), MDM2 (rs2279744, T>G), MDS1-EVI1 (rs6774494, G>A), XPC (rs2228000, C>T), HLA-F (rs3129055, T>C), SPLUNC1 (rs2752903, T>C; and rs750064, A>G), and GABBR1 (rs29232, G>A) were associated with increased susceptibility to NPC. In our case-control study, an association with increased risk for NPC was found for the AG vs AA genotype in HCG9 (rs3869062, A>G). In addition, heterozygous deletion of the GSTM1 allele was associated with increased susceptibility to NPC, while an SNP in GABBR1 (rs29232, G>A) was associated with decreased risk, and might thus have a protective role on NPC carcinogenesis. This work provides the first comprehensive assessment of SNP expression and its relationship to NPC risk. It suggests the need for well-designed, larger confirmatory studies to validate its findings.


INTRODUCTION
Nasopharyngeal carcinoma (NPC) is a type of head and neck neoplasm typically associated with cervical lymphadenopathy, epistaxis, and rhinocleisis, that shows high incidence in South China and East Asia. Although the exact cause of NPC is in most cases unclear, environmental factors, certain dietary factors, and Epstein-Barr virus (EBV) infection are known to be closely related with its development. In addition, it is believed that genetic factors are important determinants of NPC risk [1][2][3].
A single nucleotide polymorphism (SNP) is a thirdgeneration genetic marker, after first-generation restriction fragment length polymorphism and second-generation simple sequence length (microsatellite) polymorphism. A SNP is a polymorphism in a gene's sequence caused www.impactjournals.com/oncotarget/ Oncotarget, 2017, Vol. 8, (No. 32), pp: 52708-52723 Research Paper by a variation in a single nucleotide. Research on SNPs focus mainly on their impact on three classes of genes: tumor suppressor genes, metabolic enzymes genes, and DNA repair genes, and is an important new approach to understand disease susceptibility and pathogenesis, and to guide diagnosis and individual treatment choice.
SNPs are usually detected by polymerase chain reaction (PCR), which is a technology with low speed and efficiency. The introduction of next-generation sequencing technology (NGS), however, made it possible the detection of SNPs with high-throughput. NGS, used to sequence hundreds of thousands to millions of genetic molecules in parallel, is a second-generation sequencing technology developed after first generation (Sanger) sequencing. NGS is much more convenient than the traditional genetic testing technologies [4,5]. At present, NGS is mainly used in the analysis of whole-genome gene expression profiles, tumor genome resequencing, whole-genome small molecule RNA analysis, whole-genome methylation analysis, and chromosome structure analysis, among others [6][7][8][9][10].
Although many studies have focused on the detection of tumor SNPs by NGS, for example in colorectal cancer, renal cell carcinoma, ovarian cancer, etc. [11][12][13], there are no reports, to our knowledge, evaluating the association of SNPs with NPC susceptibility using NGS. Therefore, we conducted both a meta-analysis of the association of SNPs with NPC risk in East Asian populations, as well as a case-control study to assess the expression of the identified SNPs using NGS. Our aim was to obtain concise, reliable data on the association of SNPs with NPC susceptibility, as to provide a theoretical basis for NPC screening and help guide early prevention efforts in high-risk populations. In addition, we hope our findings will provide new strategies and ideas to improve the diagnosis and treatment of NPC.

Meta-analysis
3754 relevant articles were initially retrieved, and 1821 articles remained after eliminating duplicates. After screening the titles and abstracts according to the inclusion criteria, 184 articles were selected. Data from 97 articles were entered after full-text review. From these, we found a relationship between 279 polymorphic loci in 126 genes and susceptibility to NPC.

Case-control study
Peripheral venous blood samples from 40 NPC patients and 40 healthy volunteers in South China were collected. There were 33 men and 7 women in the case group (Group A), in which the age distribution (mean ± standard deviation) was 46.05 ± 13.77 years. There were 15 men and 25 women in the control group (Group B), with an age distribution (mean ± standard deviation) of 36.13 ± 10.20 years. The results of the DNA and target fragments' quality testing are shown in Figure 1. A single, clear main band and few number of side band indicated that the quality of the extracted DNA and the amplified target fragments was high.
Partial data of automatic sequence, base recognition and data processing by the P-STAR high-throughput instrument are shown in Figure 2. The 15 SNP loci genotypes of all 80 samples could be obtained through automatic analysis with the PSTAR sequencer.
The specific gene distribution of the 15 SNPs in the case group (Group A) and the control group (Group B) is shown in Table 2 and Table 3. SNP-15 (GSTM1) could not be subjected to Hardy-Weinberg equilibrium (HWE) test because it has only two genotypes. Thus, the HWE test was performed on the case and control groups for the other 14 SNPs, and the results are shown in Table 4.
Statistical analysis on all SNPs were performed using the homozygous wild-type genotype as reference to assess the relationship between SNPs and NPC susceptibility ( Table 5). The statistical analysis yielded the following results. Firstly, the mutant heterozygous type of SNP-04 (HCG9, rs3869062, A>G) might be related to the incidence of NPC, as an increased risk was found for the AG versus the AA genotype (OR = 3.061, 95%CI: 1.064-8.804), while no association was found between the GG genotype and NPC risk. Secondly, the SNP-11 variation (GABBR1, rs29232, G>A) might have a protective role in the development of NPC, because the risk of the SNP-11 mutant genotype was lower than that of the wild-type (OR = 0.204, 95% CI: 0.079-0.528). Thirdly, genetic deletion of SNP-15 (GSTM1) might contribute to increased susceptibility to NPC. Finally, no associations were found between the remaining SNPs and NPC risk in our casecontrol study.

DISCUSSION
The study of the association of SNPs with tumor development and progression has become a hot area of research. For instance, using NGS, Gerlinger et al. found that VHL gene aberrations and an increased proportion of C>T transitions at CpG sites were closely related with the occurrence and development of clear cell renal carcinoma [12]. Stemke-Hale et al., on the other hand, reported that several SNPs and other mutations on multiple gene loci, such as KRAS c.35G>A p.Gly12Asp, BRAF V600E, and PIK3CA H1047Y, were related to borderline ovarian tumors [13]. In addition, NGS-based studies describing the association of SNPs and genetic mutations with colorectal cancer [11], breast cancer [36], glioblastoma [37], and adenoid cystic carcinoma [6], have been recently published. Although numerous studies investigated the  relationship of SNPs with NPC susceptibility, the results are generally underpowered and somewhat controversial. Although many methods exist to study SNPs, such as PCR and probe hybridization technologies, only a few studies have analyzed the association of SNPs with NPC susceptibility by high-throughput sequencing technologies, especially NGS. Therefore, and to provide a theoretical basis for NPC screening and early prevention, we performed a meta-analysis to review existing SNP/ NPC association data in East Asian populations, and carried out a case-control study to directly assess, using NGS, the expression of candidate SNPs in 40 NPC patients.
Our meta-analysis revealed that 15 SNPs were associated with the risk of NPC: TP53 (rs1042522, C>G), GSTM1 (+/DEL), IL-10 (rs1800896, A>G), GABBR1 (rs2076483, T>C; rs29232, G>A), MDM2 (rs2279744, T>G), miR-146a (rs2910164, C>G), MDS1-EVI1 (rs6774494, G>A), XPC (rs2228000, C>T), HCG9 (rs3869062, A>G; rs16896923, T>C), HLA-F (rs3129055, T>C), MMP2 (rs243865, C>T), and SPLUNC1 (rs2752903, T>C; rs750064, A>G). TP53 is a tumor suppressor gene located on human chromosome 17p13, and plays an important role in the stress response to hypoxia, gene damage, oncogene activation, and other injuries. TP53 also contributes to maintaining gene stability and regulating cell cycle [14][15][16][17][18]. Glutathione S-transferase M1 (GSTM1), a member of the glutathione s-transferase family, mediates cellular detoxification of electrophilic compounds, including carcinogens, by conjugation with glutathione [19][20][21]. The interleukin-10 (IL-10) gene is located on chromosome 1q31-32, and is mainly expressed in macrophages, mononuclear cells, activated B cells, and Th2 cells. IL-10 is a multifunctional negative regulatory factor and promotes tumor evasion of the immune response by downregulating the production of interferon-γ [22,23]. The gamma-aminobutyric acid type B receptor subunit 1 gene (GABBR1), located on chromosome 6p21.3, encodes the GABAB1 receptor of gamma-aminobutyric acid, the main inhibitory neurotransmitter in the central nervous system. GABBR1 is a metabolic pathway gene, mainly related to neural network remodeling and inhibition of the diffusion of abnormal brain electrical activity [24,25]. The murine double minute 2 (MDM2) proto-oncogene is located on chromosome 12q13-14. The MDM2 T309G  polymorphism (rs2279744) can enhance MDM2 binding to the transcription factor SP1, thereby enhancing its own transcription and increasing the inhibition of the tumor suppressor TP53. MDM2 can also mediate TP53independent tumorigenicity and control cell growth and proliferation through regulation of E2F1 expression via the MDM2-TP53-p21 signaling pathway [16,26]. The micro ribonucleic acid-146a (miR-146a) gene, located on chromosome 5q33.3, is overexpressed or underexpressed in some tumors and functions as an oncogene or tumor suppressor gene [27,28]. The Myelodysplastic Syndrome 1 / Ecotropic Viral Integration site-1 (MDS1-EVI1) complex locus encodes three kinds of proteins, namely EVI1, MDS1, and the fusion protein MDS1-EVI1. MDS1-EVI1 is a strong activator of promoters containing a GATA motif [29,30]. Xeroderma pigmentosum group C (XPC), located on chromosome 3p25, is one of the key genes of the nucleotide excision repair (NER) pathway in the gene damage repair system. The main function of XPC is to identify and excise damaged sites in the genes [30,31]. HCG9, the human leukocyte antigen (HLA) complex 9 gene, belongs to the HLA complex family along with HLA-F. The HLA complex, located on chromosome 6p21.3, is composed of a group of closely linked genes   and is the most complex gene system in the human genome [24,25]. Matrix metalloproteinase 2 (MMP2) is the main and the most widely distributed member of the matrix metalloproteinase family. It is involved in the degradation of extracellular matrix, and can destroy tissue barriers during the process of tumor cell invasion [32,33]. The short palate lung and nasal epithelium clone 1 (SPLUNC1) gene belongs to the palate lung and     were then sequenced by the PSTAR high-throughput instrument. We found a significantly increased risk for NPC in individuals carrying the AG genotype of HCG9 (rs3869062, A>G), as compared with those carrying AA alleles. A positive association with NPC was also detected for a genetic deletion (i.e. heterozygous genotype) of the GSTM1 allele. In contrast, our study indicated a negative association between a SNP in GABBR1 (rs29232, G>A) and NPC, which might indicate a protective role for this gene variant in the carcinogenesis of NPC. In contrast with most previous findings, no associations were found between the remaining candidate SNPs and NPC risk.
In our case-control study, the positive association of HCG9 (rs3869062) and GABBR1 (rs29232) with NPC risk contrasts with the negative association detected between these SNPs and NPC risk in our meta-analysis. The reason of this discrepancy may be that the sample size of our case-control experiment is small, and statistical analysis could not be carried out for some genotypes. Other limitations of our case-control study is that no consideration was paid to possible gene interactions, likely to affect the development of NPC and other cancers as well. However, our study identified three SNPs which are most likely related to NPC susceptibility, and provides a basis for further SNP analyses of NPC risk.
Hundreds of thousands to millions of DNA molecules can be analyzed in parallel using NGS technology, making gene analysis much more convenient and quick. Therefore, our future research plan is to enlarge the sample size to confirm through NGS the association of the three significant genetic variants [HCG9 (rs3869062), GABBR1 (rs29232), and GSTM1 ( + /DEL)] with NPC susceptibility. This should allow validation of a robust NPC risk screening protocol useful to detect at-risk population and guide early prevention efforts.

Meta-analysis methods and SNP identification
A comprehensive search strategy was performed to identify relevant case-control studies about the association of SNPs with NPC susceptibility. Using electronic databases, including PubMed (http://www. ncbi.nlm.nih.gov/pubmed/), Web of Science (http:// wokinfo.com/) and SD (http://www.science-direct.com/), a search strategy was performed based on combinations of the keywords 'nasopharyngeal, rhinopharyngeal, nasopharynx, rhinopharynx, or nasal part of pharynx' and 'cancer, tumor, tumour, neoplasm, carcinoma or adenocarcinoma' and 'polymorphism, variation, genetic, mutation, variant, or SNP'. The last search was updated on April 1, 2014. Although no language restrictions were applied initially, for the full-text review and final analysis the resources only permitted the review of studies published in English. The following eligibility criteria were used: i) studies published in English; ii) case-control design; iii) evaluating the association between SNP and NPC susceptibility; iv) participants were from East Asia, including China, Japan and Korea; and v) the studies indicated sample size, distribution of alleles, genotypes, or other information which could aid in inferring the estimation of odds ratios (OR) and their 95% confidence intervals (CI). Data from the eligible studies, selected in strict accordance with the inclusion criteria, were independently extracted by two investigators. Controversial issues were resolved following group discussion. The following data were extracted from each study: first author's name, publication year, gene, SNP, rs number, country of origin, genotyping method, conclusion (positive/negative), total number of cases and controls, genotype or allele distribution, and the P-value for Hardy-Weinberg equilibrium (HWE). The quality of the eligible studies was assessed by two investigators independently, according to a set of predefined criteria (Supplementary Table 16) originally proposed by Thankkinstian et al [38]. The revised criteria cover the representativeness of cases, the credibility of controls, specimens of cases determining genotypes, HWE in the controls, and total sample size. Disagreements between investigators were resolved by consensus. Study quality scores ranged between 0 (lowest) and 15 (highest). Studies with scores ≥10 were considered high-quality, while those with scores <10 were considered low-quality studies. Articles addressing the same SNP(s) were combined for meta-analysis. Summary ORs and corresponding 95% CIs were used to estimate the strength of the associations between each SNP and NPC risk in different comparison models. Z test was used to evaluate the statistical significance of pooled OR values. The statistical heterogeneity among studies was assessed by Q test and I 2 statistics. A random-effects model (DerSimonian and Laird method) was used to estimate the summary OR when the result of the Q test was P < 0.1 or I 2 ≥ 50%, which indicated the presence of heterogeneity. The fixed-effects model (Mantel-Haenszel method) was used when the result of the Q test was P ≥ 0.1 and I 2 < 50%, which indicated the absence of heterogeneity. Stata software, version 11.0 (Stata Corp., College Station, TX, USA) was used for all statistical analyses. All P-values were two-sided.

Subjects recruitment
The case group consisted of 40 subjects from South China that were pathologically diagnosed with NPC and treated at the Department of Oncology of the First Affiliated Hospital of Guangdong Pharmaceutical University between June 2014 and July 2014. The control group consisted of 40 healthy people from South China that underwent physical examination in the Medical Center of the same hospital during the same time. All subjects signed informed consent. This study was approved by the ethics committees of the First Affiliated Hospital of Guangdong Pharmaceutical University.

SNP genotyping
A 5 ml venous blood sample was collected from each subject. The samples from the 40 NPC patients in the case group (Group A), were numbered A01 to A40. Likewise, the 40 blood samples from the healthy volunteers in the control group (Group B), were numbered B01 to B40. Numbering and basic information of the 15 SNPs associated with NPC susceptibility, as detected by meta-analysis, are shown in Table 6. DNA was extracted from blood samples using the HYK Blood DNA mini Extraction Kit (HYK, Shenzhen, China). The quality and concentration of genomic DNA for each sample were determined using a NanoDrop Spectrophotometer (Thermo Scientific, Wilmington, DE). If the extracted DNA did not meet the experiment's concentration requirement (at least 50 ng/μl), it was reextracted. The quality of the extracted DNA was assessed by agarose gel electrophoresis (Tanon, Shanghai, China). A single, clear main band with a faint side band indicated that the extracted DNA quality was high and uncontaminated. Primer Premier 5 (Premier Biosoft, CA) primer design software was used to design primers targeting a ~100bp region around the polymorphic loci, and primer sequences were compared in the NCBI database to determine specificity. The primers used for PCR amplification of the 15 SNPs, synthesized by Sangon Biotech Co., Ltd (Shanghai, China), are shown in Table 7. A PCR instrument (Eppendorf, DE) was used to amplify the sequence of the detection region of each polymorphic locus, which was called target fragment. The quality of the amplified target fragment was assessed by PAGE gel electrophoresis (Tanon, Shanghai, China). Individually labeled magnetic beads were linked to each target fragment and then retrieved for high-throughput sequencing. The magnetic bead's label corresponding to each target fragment was annotated for proper identification. Finally, a PSTAR-II highthroughput sequencing instrument (HYK, Shenzhen, China) was used, according to the manufacturer's instructions, to detect the target fragments in order to analyze the corresponding polymorphic loci. The sequencing principle of the PSTAR-II was as follows: the sequencing primer was combined to the joint sequence of the sequencing sample by annealing based on the principle of base complementation. Then the fluorescent probe was combined with the sample under the action of the sequencing enzyme. The fluorescent signal emitted at specific excitation wavelengths was collected, and the base of the test site on the sample was determined by the type of fluorescent signal. Information on the sequencing primers is shown in Table 8.

Statistical analysis
PSTAR-II v33.211 software was used for PSTAR high-throughput data analysis. After HWE test (http:// www.oege.org/software/hardy-weinberg.html), the genotype frequencies between Group A and Group B were compared by two-sided χ 2 test. Unconditional logistic regression analysis was used for estimating OR and 95% CI. Three genotypes were discriminated for each SNP (except for GSTM1), and the homozygous wild-type was used as the reference for statistical analysis. A P-value <0.05 was considered statistically significant. Statistical analysis was performed using SPSS version 19.0 software (SPSS Inc., Chicago, IL, USA).