SF3A1 and pancreatic cancer: new evidence for the association of the spliceosome and cancer

A two-stage case-control study was conducted to examine the association between six candidate U2-dependent spliceosome genes (SRSF1, SRSF2, SF3A1, SF3B1, SF1 and PRPF40B) and pancreatic cancer (PC). Subjects with one or two T alleles at rs2074733 in SF3A1 had a lower risk of PC than those with two C alleles in the two populations combined (OR: 0.59, 95% confidence interval: 0.48–0.73, false discovery rate (FDR)-P = 1.5E-05). Moreover, the presence of the higher-risk genotype at rs2074733 plus smoking or drinking had synergic effects on PC risk. These findings indicate that RNA splicing-related genes appear to be associated with the occurrence of PC and show synergic interactions with smoking and drinking in the additive model. These novel findings should be confirmed by functional studies and independent large-scale population studies.


INTRODUCTION
Pancreatic cancer (PC), a highly malignant tumor associated with poor prognosis [1], is the fourth leading cause of cancer death in the United States, with an estimated 38,460 deaths in 2013 [2]. In China, the mortality rate has risen gradually since 1991 [3]. PC is caused by the interaction of environmental and genetic factors. Although many environmental factors (e.g., cigarette smoking, alcohol drinking and dietary factors) have been associated with the development of PC, smoking is the only confirmed factor [1,4]. With respect to germline variation in PC, multiple genes and loci, including BRCA2, PALB2, ATM, the ABO locus, NR5A2, CLPTM1L-TERT, BACH1, DAB2, and FAM19A5, have been shown to be associated with the progression of PC [1,5-9]. However, although many genetic factors have been identified, their PC-related interactions with environmental factors have not yet been fully elucidated.
As the field of cancer genomics continues to expand, scores of new cancer genes have been revealed, including genes involved in RNA splicing [10]. RNA splicing is an essential function in almost all eukaryotic organisms. During this process, introns are removed from pre-messenger RNAs; splicing factors recognize specific sequences in the pre-mRNA; and the spliceosome connects the exons. There are two types of spliceosomes, the major U2-dependent spliceosomes and the minor U12-dependent spliceosomes, which recognize the major U2- and minor U12-dependent intron groups, respectively [11]. During U2-dependent intron recognition, a number of factors play prominent roles, including: the U1 small nuclear ribonucleoproteins (U1 snRNPs), responsible for recognizing the 5′-splice site (5′SS); splicing factor 1 (SF1), which binds the branch-site A residue; the U2 small nuclear ribonucleoprotein auxiliary factor 35/65 (U2AF35/65) heterodimer, responsible for recognizing the 3′-splice site (3′SS) by binding the AG dinucleotide and the downstream polypyrimidine tract of the branch site; and the serine-arginine (SR) rich family of RNA-binding proteins (e.g., SRSF1 and SRSF2), which can bind to splicing enhancers through the arginine-serine rich domain and recruit the U1 snRNP and U2AF to the 5′ or 3′SS. SF3B1 (splicing factor 3b, subunit 1) encodes the SF3b1 protein, which, together with SF3a1, helps the U2 snRNP bind to the 3′SS (Supplementary Figure S1).
Although genetic variants of the spliceosome complex are generally believed to play a causative role in cancer, relatively few association studies have been performed on solid tumors, especially PC. We previously verified an association between genetic polymorphisms of U2AF35/U2AF65 and PC, and showed that the interaction of U2AF65 with smoking may increase the risk of PC [19]. Subjects carrying the C allele at rs310445 of the U2AF65 gene had a 1.31-fold higher risk of pancreatic cancer than those with the TT genotype (p = 0.010). A synergic effect of smoking and the C allele of rs310445 was also observed, with a synergy index (SI) of 2.08 (95% confidence interval (CI): 1.37-2.78) [19]. Here, we selected six other U2-dependent spliceosome genes previously shown to have cancer-associated mutations (SRSF1, SRSF2, SF3A1, SF3B1, SF1 and PRPF40B), and screened 17 putative tag single nucleotide polymorphisms (tagSNPs) in these genes. We used a two-stage case-control design, in which two independent Chinese populations served as the screening and validation populations, to explore the genetic effect of U2-dependent spliceosome genes on PC susceptibility. We also examined potential gene-environment interactions between the identified variant, smoking status, drinking status, and PC risk.

Subject characteristics
Five samples (2 cases and 3 controls) with call rates less than 95% were excluded. In total, 298 PC cases and 525 controls were included in the screening stage, and 413 PC cases and 557 controls were included in the validation stage. The characteristics of the subjects are summarized in Table 1. There was no significant

Genotyping and assessing the associations between SNPs and PC
In the screening population, we genotyped 17 tagSNPs in six candidate genes. Three SNPs (rs11231868 in SF1, and rs5749066 and rs5749068 in SF3A1) deviated significantly from Hardy-Weinberg equilibrium (HWE) in the control group [false discovery rate (FDR)-P < 0.05] and were thus excluded from the analysis (Supplementary Table S1). Therefore, 14 SNPs were included in our analysis. Among them, six SNPs in three genes showed potential associations with PC: rs4073998 and rs8626 in PRPF40B; rs2074733, rs5994293 and rs9608886 in SF3A1; and rs8819 in SRSF1. The genotype distributions of these SNPs differed significantly between the patient and control groups after adjustment for age and gender (Supplementary Table S2). The same trend persisted after adjustment for age, gender, smoking and drinking. The P values after FDR correction were 0.014, 5.5E-05, 0.014, 0.023, 0.014 and 0.001, respectively. The ORs and 95% CIs are summarized in Table 2. Further analysis showed evidence of strong linkage disequilibrium between rs4073998 and rs8626 (D' = 0.964, r² = 0.717). Therefore, five SNPs (rs4073998 in PRPF40B; rs2074733, rs5994293 and rs9608886 in SF3A1; and rs8819 in SRSF1) were selected for further validation.
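The HWE filter described above can be illustrated with a short sketch. This is a minimal example of a goodness-of-fit chi-square test on genotype counts, not the authors' SAS code; the function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study data.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and returns the chi-square
    statistic and its P value on 1 degree of freedom.
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p                                # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control-group counts (illustrative only)
chi2_ok, p_ok = hwe_chi_square(100, 200, 100)     # counts in exact HWE
chi2_bad, p_bad = hwe_chi_square(150, 100, 150)   # deficit of heterozygotes
print(chi2_ok, p_ok)
print(chi2_bad, p_bad)
```

In the study, the resulting P values were additionally FDR-corrected before SNPs were excluded; that step is not shown in this sketch.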
Genotyping of these five SNPs was performed in the larger validation population of 413 PC cases and 557 controls. All of the tested SNPs conformed to HWE in the control group. Interestingly, only rs2074733 differed significantly between PC cases and controls after adjustment for age, gender, smoking and drinking. Compared with the CC genotype, subjects with the TT or CT genotypes had a lower risk of PC [OR 0.54 (95% CI: 0.41-0.73); FDR-P = 3.0E-04] (Table 3).
We further analyzed the effect of rs2074733 in the combined populations of the two stages; the P value for the Breslow-Day homogeneity test was 0.396. In agreement with the abovementioned results, rs2074733 was significantly associated with PC risk in the combined population. Subjects with one or two T alleles had a lower risk of PC than those with the CC genotype [OR 0.59 (95% CI: 0.48-0.73), FDR-P = 1.5E-05].

Gene-environment interactions with respect to PC risk
Table 4 shows the results of our multiplicative and additive interaction analyses between rs2074733 and smoking or drinking in the combined group. Smoking had a synergic additive interaction with the CC genotype of rs2074733. Compared to nonsmokers with one or two T alleles, there were higher risks of PC among smokers with one or two T alleles, nonsmokers with the CC genotype, and smokers with the CC genotype. The ORs were 1.53 (95% CI: 1.12-2.08), 1.45 (95% CI: 1.12-1.88) and 3.05 (95% CI: 2.07-4.50), respectively. The SI, the attributable proportion due to interaction (AP) and the relative excess risk due to interaction (RERI) were 2.43 (95% CI: 1.85-3.01), 1.38 (95% CI: 0.37-2.40), and 0.41 (95% CI: 0.22-0.61), respectively. We also observed a positive additive interaction between drinking and rs2074733 in the combined population. Compared to non-drinkers with one or two T alleles, the ORs for drinkers with one or two T alleles, non-drinkers with the CC genotype, and drinkers with the CC genotype were 0.84 (95% CI: 0.61-1.14), 1.50 (95% CI: 1.17-1.92) and 1.68 (95% CI: 1.16-2.43), respectively. The SI was 2.44 (95% CI: 1.06-3.81).
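To make the three additive-interaction measures concrete, the sketch below applies their standard textbook definitions to the three smoking ORs quoted above. It is an illustration only: the function name `additive_interaction` is hypothetical, confidence intervals are not computed, and values derived from rounded point estimates need not reproduce the reported estimates, which came from the authors' adjusted models.

```python
def additive_interaction(or_exposure_only, or_genotype_only, or_both):
    """Standard additive-interaction measures from three odds ratios.

    All ORs are relative to the doubly unexposed reference group.
    Returns (RERI, AP, SI); no interaction gives RERI = 0, AP = 0, SI = 1.
    """
    reri = or_both - or_exposure_only - or_genotype_only + 1
    ap = reri / or_both
    si = (or_both - 1) / ((or_exposure_only - 1) + (or_genotype_only - 1))
    return reri, ap, si

# Point-estimate ORs for smoking and rs2074733 quoted in the text:
# smokers with T allele, nonsmokers with CC, smokers with CC
reri, ap, si = additive_interaction(1.53, 1.45, 3.05)
print(f"RERI = {reri:.2f}, AP = {ap:.2f}, SI = {si:.2f}")
```

The SI formula is only meaningful when both single-factor ORs lie on the same side of 1, which holds for the smoking estimates used here.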

DISCUSSION
This two-stage case-control study in two independent Chinese populations investigated whether potentially functional SNPs in six U2-dependent spliceosome genes are associated with PC. We identified a reproducible association between rs2074733 of SF3A1 and PC risk in both populations, and found potential synergic additive interactions between the CC genotype and smoking/drinking status. This is the first study to show an association of an SF3A1 SNP, and of its interaction with smoking/drinking, with the risk of PC in central and northern Chinese populations. SF3A1 encodes subunit 1 of the splicing factor 3a heterotrimer, which, together with SF3b1, can facilitate the binding of the U2 snRNP to the 3′SS. Studies have shown that the SF3a heterotrimer is necessary for the in vitro conversion of the 15S U2 snRNP into the active 17S particle, which performs pre-mRNA splicing, and that the knockdown of single SF3 subunits blocks splicing [20]. Population studies have revealed that SF3A1 expression may be up-regulated in head and neck cancers, rectal carcinomas, and human non-small and small-cell lung cancers [21][22][23]. The aberrant expression of splicing factors, including SF3A1, is known to modify splice site selection [24], thereby influencing the splicing of oncogenes and tumor suppressors and inducing the production of mRNA isoforms. This could contribute directly or indirectly to the development, progression and therapeutic response of cancers [25]. For example, the abnormal expression of SR family proteins (e.g., SRSF1 and SRSF3) in various human tumors was found to affect the alternative splicing of cancer-related genes, such as RON, BIN1, MNK2, S6K1, KLF6, FoxM1 and HIPK2, thereby creating protein isoforms that could contribute to the proliferation, avoidance of apoptosis, cell cycle modulation, or signal transduction of tumor cells [26][27][28][29][30].
Our bioinformatic analysis showed that three SNPs (rs5753071, rs10376 and rs10427610) were in complete linkage disequilibrium with rs2074733. They have been reported to lie at sites of transcription factor binding, histone modification and open chromatin, and evidence suggests that they act as expression quantitative trait loci for the SF3A1 gene [31]. Notably, rs2074733 is located at 22q12.2, a region reportedly associated with the occurrence of lung cancer in Han Chinese [32]. Thus, the genetic variant rs2074733 in SF3A1 may be involved in other cancer types beyond PC. Future studies are warranted to determine whether changes in SF3A1 can alter the splicing of specific oncogenes or tumor suppressors to promote PC.
We observed synergistic effects between smoking/drinking and rs2074733. These findings seem biologically plausible. First, smoking is the only confirmed environmental risk factor for PC. Although the underlying mechanisms are poorly understood, a higher frequency of splicing variants of the MDM2 oncogene has been observed in tumors from smokers than in those from nonsmokers [33]. MDM2 overexpression reportedly contributes to pancreatic neoplastic transformation [34], and its splicing variants may promote p53-independent cell growth, inhibit apoptosis and contribute to tumorigenesis [35]. Furthermore, an in vitro study showed that the activated carcinogenic metabolites of polycyclic aromatic hydrocarbons can induce alternative splicing of MDM2 [33]. Second, with respect to drinking, alcohol consumption has been shown to modulate numerous genes in terms of their transcript levels and the ratios of their splice variants [36]. Thus, smoking and drinking may alter the splicing of specific oncogenes and tumor suppressors, increasing the risk of tumorigenesis among subjects with the risk variant of rs2074733 in SF3A1.
Two other SNPs of SF3A1 (rs5994293 and rs9608886), which are also located in the 22q12.2 region, were significantly associated with PC in the screening population. However, these associations were not replicated in the validation population. The same phenomenon was observed for PRPF40B rs4073998 and SRSF1 rs8819, suggesting that these positive associations in the screening population may have been due to chance. In the future, additional studies in diverse populations, including PC patients at different stages, should be conducted to examine the possible associations between PC and these SNPs.

Study subjects
In this two-stage case-control study, the screening population comprised 298 PC patients and 525 cancer-free controls from Central China. Patients were consecutively recruited from January 2008 to September 2012 at Tongji Hospital (Huazhong University of Science and Technology, Wuhan, China). The controls were cancer-free volunteers randomly selected from health examination programs given during the same period, some of whom were also included in our previous case-control studies [37]. Then, 413 PC patients and 557 cancer-free controls from North China were genotyped for validation. All cases were enrolled from January 2008 to December 2012 at Peking Union Medical College Hospital. The controls were cancer-free individuals selected from a community cancer-screening program for early detection, offered in the same region during the same period. All the PC patients in the two stages were diagnosed with pancreatic ductal adenocarcinoma by histopathologic examination of biopsy or resected tissue specimens, diagnostic imaging studies (computed tomography scan, ultrasound, and endoscopic retrograde cholangiopancreatography or magnetic resonance cholangiopancreatography), or exploratory laparotomy. All enrolled subjects were unrelated Han Chinese. The cases were primary incident pancreatic ductal adenocarcinoma patients. Blood was collected prior to radiotherapy/chemotherapy.
The cancer-free controls were frequency-matched to the cases by age (±5 years) and gender. Prior to recruitment, written informed consent was obtained from each subject, and information on demographic characteristics (e.g., gender, age, smoking status, and drinking status) was collected in a face-to-face interview. Regular smoking was defined as smoking at least one cigarette a day for a year or more. Regular drinking was defined as drinking at least once a day for three consecutive months or more. This study was approved by the institutional review boards of the Chinese Academy of Medical Sciences Cancer Institute and the Tongji Medical College of Huazhong University of Science and Technology. All methods were performed in accordance with the approved guidelines and regulations.
First, SNP data for the candidate gene regions were downloaded from the HapMap database for the Chinese Han Beijing population (Release 27 phase I+II+III; http://www.HapMap.org). The Haploview 4.2 software was then used to select tagSNPs based on the criteria of r² > 0.8 and a minor allele frequency (MAF) > 0.05. We selected a total of 82 tagSNPs covering the six candidate genes. Thereafter, we used the SNPinfo database to prioritize the 82 tagSNPs by predicting their potential functions, which included splicing regulation, stop codon, PolyPhen prediction, transcription factor-binding motif, microRNA binding site, amino acid substitution, and other functional effects. Finally, we selected 17 SNPs for genotyping in the screening population (Supplementary Table S1) using the TaqMan OpenArray assay system (Applied Biosystems, Foster City, CA, USA). Thereafter, the five SNPs with the strongest associations were genotyped using TaqMan Real-time Polymerase Chain Reaction (RT-PCR, Applied Biosystems, Foster City, CA, USA) in the validation population. To ensure the accuracy of genotyping, quality control was monitored by including 5% random duplicate samples; these duplicates yielded a concordance rate of 100% (data not shown). We excluded SNPs that had genotype call rates of < 95% or showed deviation from HWE.

Statistical analysis
For each tagSNP, we evaluated HWE in both control groups using a goodness-of-fit chi-square test. In both populations, a t-test was used to compare the age distributions of cases and controls. Pearson chi-square tests were used to detect differences between cases and controls in the distributions of gender, smoking status, drinking status and SNPs. Unconditional logistic regression was used to calculate the ORs and their 95% CIs for associations between genotypes and PC in both stages after adjustment for covariates (i.e., age, gender, drinking status, and smoking status). A two-tailed P < 0.05 was used as the criterion of statistical significance. The false discovery rate (FDR) is the expected ratio of erroneous rejections of the null hypothesis to the total number of rejected hypotheses among all the genes or SNPs analyzed in this study. The Benjamini and Hochberg method was used to calculate FDR values in SAS [40]. All P values were adjusted by FDR, and the P values from the validation and combined stages were adjusted together with those from the screening stage.
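The Benjamini and Hochberg adjustment mentioned above can be sketched in a few lines. This is a generic illustration of the step-up procedure, not the authors' SAS implementation; the function name `benjamini_hochberg` and the raw P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value is scaled by m / rank (rank 1 = smallest P), and a
    running minimum taken from the largest rank downward keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                          # largest P first
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four SNPs (illustrative only)
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Pooling the P values from several stages into one call, as the text describes, simply means passing them all in a single list so that m counts every comparison.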
Before we combined the populations of the two stages, the Breslow-Day test [41] was used to assess the homogeneity of the ORs in the two populations. Stage was included as an adjustment variable in the combined analysis [42]. Multiplicative and additive interactions were tested to evaluate positive interactions between genes and environmental factors. Multiple logistic regression models were used to detect potential multiplicative interactions, and the log likelihood ratio (LLR) test was used to assess whether the model was significantly improved by adding an interaction term. SI, AP, and RERI were evaluated in the additive interaction analyses. Lack of interaction is indicated by SI = 1, AP = 0 and RERI = 0 [43]. All statistical analyses were conducted using the SAS 9.2 software (SAS Institute, Cary, NC, USA). The linkage disequilibrium of the SNPs was estimated using the Haploview 4.2 software. The additive interaction analysis was performed using the R 3.0.0 software (http://www.r-project.org/).

CONCLUSIONS
We herein report SF3A1 rs2074733 as a new susceptibility locus for PC that shows additive interactions with smoking and alcohol drinking. These findings support the potential importance of genetic variants of splicing factors in PC and could help increase the personalization of strategies to prevent PC. Although SNPs with rare polymorphisms may have been missed in this study, our results suggest that additional studies with larger sample sizes should use targeted SNP fine mapping to further identify true causal variants, especially in 22q12.2. Furthermore, additional functional studies are warranted to verify the biological mechanism(s) underlying the interactions between smoking, drinking and spliceosome gene polymorphisms.

Table 2: Effects of 14 tag SNPs from eight RNA splicing-related genes on PC risk in the screening population
a The last genotype was used as the reference for OR estimation.
b Adjusted by gender, age, smoking and drinking in the unconditional logistic regression.
c Each P value was modified by FDR correction for multiple comparisons (the number of comparisons = 14).
* Significant difference after FDR correction.

Table 3: Effects of five candidate tagSNPs on PC risk in the validation population
a The last genotype was used as the reference for OR calculations.
b Adjusted by gender, age, smoking and drinking in the unconditional logistic regression.
c Each P value was modified by FDR correction for multiple comparisons (the number of comparisons = 19).
* Significant difference after FDR correction.

Table 4: Interactions between smoking, drinking and rs2074733 in the occurrence of PC in the combined group
a Adjusted by gender, age, stage and drinking or smoking.
* Statistically significant.