Quantitative assessment of the diagnostic role of FHIT promoter methylation in non-small cell lung cancer

Aberrant methylation of CpG islands in promoter regions plays an important role in carcinogenesis. Accumulated evidence demonstrates that FHIT gene promoter hypermethylation is involved in non-small cell lung cancer (NSCLC). To test the diagnostic ability of FHIT methylation status in NSCLC, thirteen studies comprising 2,119 samples were included in our meta-analysis. In addition, four independent DNA methylation datasets from the TCGA and GEO databases were analyzed for validation. The pooled odds ratio of FHIT promoter methylation in cancer samples was 3.43 (95% CI: 1.85 to 6.36) compared with that in controls. In subgroup analysis, a significant difference in FHIT gene promoter methylation status between NSCLC and controls was found in Asians but not in the Caucasian population. In the validation stage, 950 Caucasian samples were analyzed, including 126 paired samples from TCGA and 568 cancer tissues and 256 normal controls from the GEO database, and none of the 8 CpG sites near the promoter region of the FHIT gene was significantly differentially methylated. Thus, the diagnostic role of the FHIT gene in lung cancer may be relatively limited in the Caucasian population but useful in Asians.


INTRODUCTION
Lung cancer is a complex disease involving genetic and epigenetic variation, and is one of the leading causes of cancer death worldwide [1]. Lung cancer often lacks symptoms in its early stages; however, the five-year survival rate can be increased from 5% to 63% when NSCLC is detected at an early stage, which shows the importance of early diagnosis of NSCLC [2,3]. DNA methylation is one of the epigenetic modifications in eukaryotes, and it regulates the expression of genes and microRNAs [4] as well as alternative splicing events [5]. It has been observed and confirmed that DNA methylation changes are widespread in tumor tissues. Hence, with advantages such as good chemical stability, suitability for non-invasive detection, quantitative signal, reasonable cost and low requirements for sample quality [6], DNA methylation could be a promising biomarker for early cancer detection.
FHIT (fragile histidine triad) belongs to the histidine triad gene family and encodes an Ap3A hydrolase [7]; the FHIT-Ap3A enzyme-substrate complex appears to be the tumor suppressor signal [8]. FHIT is located on chromosome 3 and encompasses the common fragile site FRA3B. As a result, translocations and aberrant transcripts of FHIT frequently arise from carcinogen-induced damage [9]. FHIT loss was observed in 64% of non-small cell lung cancer patients and was significantly associated with squamous cell carcinoma and poor tumor grade [10]. In addition, aberrant transcripts of FHIT have been found in other kinds of tumors, such as gastric [11], esophageal [12], and colon carcinomas [13]. FHIT has recently been regarded as a genome caretaker that is of great importance for genome stability. Multiple studies have found reduced FHIT expression in precancerous lesions, indicating its potential suppressing role in carcinogenesis [14-19]. FHIT-/- mice were more prone than wild-type mice to develop both carcinogen-induced and spontaneous tumors [20,21]. FHIT viral gene therapy was also found to prevent and reverse carcinogen-induced tumors in a gastric cancer mouse model [22]. Moreover, recent studies have found that FHIT can also function as a tumor suppressor by inhibiting EMT [23,24]. In summary, FHIT is now considered a tumor suppressor gene, and the loss or aberrant transcripts of FHIT may be associated with carcinogenesis.
In this study, we performed a meta-analysis to evaluate whether the FHIT methylation level can be used for early lung cancer diagnosis. Moreover, we searched The Cancer Genome Atlas project (TCGA) as well as the Gene Expression Omnibus (GEO) database, collecting hundreds of NSCLC samples with whole-genome DNA methylation datasets and comprehensive clinical information to validate our meta-analysis and correct for publication bias [25]. Several studies have shown that combining data from papers and databases improves robustness [26,27]. Therefore, we integrated high-throughput data and published articles to assess and validate the diagnostic ability of the FHIT methylation test in NSCLC.

Study characteristics
Based on our search strategy, we first identified 948 potentially relevant articles (Medline, 229; Web of Science, 549; Embase, 170; Cochrane Library, 0). Reference lists of the relevant articles, including reviews, were also manually screened for inclusion. More detailed information about the inclusion and exclusion criteria is shown in Figure 1. Finally, 12 studies [28-39] were pooled for analysis (Figure 2 and Table 1). The selection criteria are described in the methods section. All these articles were written in English. In total, 1090 lung cancer tissue/plasma samples and 1029 normal counterpart tissue/plasma samples were collected. The age of the subjects in the 12 studies ranged from 28 to 86, with the mean or median age ranging from 53 to 68. The proportion of stage I samples in the 12 studies ranged from 0 to 67.33%, and the percentage of male individuals among the NSCLC samples ranged from 65.2% to 83.8% (Table 1). As for the study aim, 4 articles specifically aimed at NSCLC diagnosis, while the others were designed to study NSCLC prognosis or pathogenesis. As for the methylation detection methods, 10 of the 12 included studies used methylation-specific polymerase chain reaction (MSP), while the others used quantitative MSP (MethyLight). In addition, three kinds of  Table 1).

Meta-analysis and heterogeneity source identification
The odds ratio (OR) for FHIT methylation in the cancer group was 3.43 (95% CI: 1.85-6.36) in the random effects model and 2.03 (95% CI: 1.60-2.57) in the fixed effects model, indicating a slight increase of methylation in lung cancer tissues (Figure 2). Comprehensive subgroup analyses were also conducted, including sample type (tissue or plasma), age, counterpart category (autogenous or heterogeneous), proportion of stage I, proportion of stage I and II, proportion of males, aim of the study (diagnosis or non-diagnosis), ratio of adenocarcinoma to squamous cell carcinoma (Ad/Sc) and other potential confounding factors (Supplementary Table 2). Significant differences were found between the ORs of the younger (51.4, 95% CI: 12.07-221.80) and older (3.30, 95% CI: 1.64-6.64) subgroups (Figure 3A) and between the ORs of the subgroups with a higher (29.58, 95% CI: 6.82-128.37) and a lower (2.67, 95% CI: 1.32-5.40) proportion of stage I and II samples (Figure 3B). Interestingly, a difference was found between the Asian (3.50, 95% CI: 1.50-8.14, P = 0.005) and Caucasian (2.55, 95% CI: 0.86-7.57, P = 0.09) subgroups (Figure 3C), and the differential methylation in the Caucasian population was not significant, indicating that the diagnostic ability of FHIT methylation might be limited in the Caucasian population. Both the tissue and plasma groups showed a significant association between FHIT methylation and NSCLC (OR = 3.68 and 3.89, respectively) (Figure 3D), which suggests that the FHIT methylation test is a promising biomarker for NSCLC diagnosis with either tissue or plasma samples. FHIT methylation has been reported to be related to smoking history rather than to cancer; we therefore conducted a subgroup analysis by the percentage of smokers among the samples. We found no significant difference between the subgroups with smoker% < 68% and smoker% ≥ 68% (Supplementary Figure 7).
In addition, a significant difference between cases and controls was found in both the MSP and qMSP subgroups (OR = 3.22 and 4.31, respectively), suggesting that both methods are robust in detecting the methylation status of the FHIT promoter region. Heterogeneity analysis revealed that heterogeneity existed among the 13 studies (I² = 78.8%, Q = 61.05, P < 0.0001) (Figure 2), and that age, study aim and stage were significant sources of heterogeneity. The trend in ORs was inversely correlated with age (beta = -3.92, P = 0.05), and age accounted for 40.03% of the total variance. Aim and stage were also two important sources of heterogeneity (P = 0.028 and 0.006), explaining about 51.44% and 17.07% of the overall heterogeneity, respectively. Other factors, such as sample type, proportion of males and detection method, failed to explain the heterogeneity (Table 2).
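The fixed effects and random effects estimates and the heterogeneity statistics reported above can be illustrated with a minimal sketch. It assumes per-study log odds ratios and standard errors as input and uses the DerSimonian and Laird estimator named in the methods; the function name `pool_log_odds_ratios` and the four example studies are hypothetical and are not data from the included articles. The analysis in this paper was run with R packages, so this Python version only illustrates the arithmetic.

```python
import math

def pool_log_odds_ratios(log_ors, ses, z=1.96):
    """Inverse-variance fixed effects and DerSimonian-Laird random effects pooling."""
    k = len(log_ors)
    w = [1.0 / se ** 2 for se in ses]                      # fixed effects weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed effects estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # I-squared in percent
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]          # random effects weights
    random = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_fixed = math.sqrt(1.0 / sum(w))
    se_random = math.sqrt(1.0 / sum(w_re))

    def summary(est, se):
        # Back-transform from the log scale to an odds ratio with a Wald interval
        return {"or": math.exp(est),
                "ci": (math.exp(est - z * se), math.exp(est + z * se)),
                "se": se}

    return {"q": q, "df": df, "i2": i2, "tau2": tau2,
            "fixed": summary(fixed, se_fixed),
            "random": summary(random, se_random)}

# Hypothetical studies (log OR, standard error); illustrative values only
example = pool_log_odds_ratios([0.2, 1.5, 0.9, 2.4], [0.3, 0.4, 0.35, 0.6])
print("I2 = %.1f%%, tau2 = %.3f" % (example["i2"], example["tau2"]))
print("fixed OR = %.2f, random OR = %.2f"
      % (example["fixed"]["or"], example["random"]["or"]))
```

When the studies disagree, the between-study variance is added to every study's variance, so the random effects interval is wider than the fixed effects interval and small studies receive relatively more weight. This is one reason the two pooled ORs can differ noticeably.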
To obtain a robust estimate and assess bias in our results, a funnel plot was constructed; the result showed significant publication bias (Egger test, z = 2.76, P = 0.019), and 7 studies fell outside the 95% confidence limits (Supplementary Figure 1). The adjusted pooled OR after the trim and fill analysis was 2.09 (95% CI: 1.10-3.96, P = 0.024) in the random effects model, indicating a significantly positive association between FHIT methylation and NSCLC (Supplementary Figure 2). Moreover, in the sensitivity analysis, the overall ORs ranged between 2.97 (95% CI: 1.64-5.37) and 4.10 (95% CI: 2.17-7.76) in the random effects model, indicating that the combined OR was consistent and reliable (Supplementary Figure 3). Finally, the cumulative meta-analysis ordered by publication time showed that the OR tended to stabilize (Supplementary Figure 4).
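The sensitivity analysis described above omits one study at a time and re-pools the rest. A minimal sketch of that loop follows, assuming per-study log odds ratios and standard errors and DerSimonian and Laird random effects pooling; the helper names `dl_pooled_or` and `leave_one_out` and the input values are hypothetical.

```python
import math

def dl_pooled_or(log_ors, ses):
    """Pooled odds ratio under the DerSimonian-Laird random effects model."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(w) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    return math.exp(pooled)

def leave_one_out(log_ors, ses):
    """Re-pool the studies once per study, each time omitting that study."""
    results = []
    for i in range(len(log_ors)):
        rest_y = log_ors[:i] + log_ors[i + 1:]
        rest_se = ses[:i] + ses[i + 1:]
        results.append(dl_pooled_or(rest_y, rest_se))
    return results

# Hypothetical inputs; not data from the included studies
ors = leave_one_out([0.2, 1.5, 0.9, 2.4], [0.3, 0.4, 0.35, 0.6])
print("leave-one-out pooled ORs: " + ", ".join("%.2f" % x for x in ors))
print("range: %.2f to %.2f" % (min(ors), max(ors)))
```

A narrow range of leave-one-out estimates suggests that no single study drives the pooled result.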

Validation with independent TCGA and GEO lung cancer datasets
To validate the above meta-analysis results with independent datasets, we searched and obtained several datasets from TCGA and GEO. From TCGA, we downloaded the lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) methylation datasets. Eight CpG sites located in the same CpG islands as the three sets of primers (Table 3) were obtained after data filtering. In the LUAD dataset, although five of the eight CpG sites showed statistically significant p-values in both the Wilcoxon rank sum test and logistic regression, the absolute mean difference was < 0.1 for all of them (Table 3). As a result, none of the eight CpG sites could be considered differentially methylated between lung adenocarcinoma tissues and adjacent normal tissues. Concordantly, in the LUSC dataset, 3 of the 8 CpG sites showed a p-value < 0.05 after multiple-testing correction, but the absolute mean differences of these 3 CpG sites were < 0.1, as in the LUAD dataset, so they could not be regarded as significantly differentially methylated either (Table 3 and Figure 4). Because the results from the meta-analysis and the TCGA dataset conflicted, we obtained other datasets from the GEO website. The first dataset was the combination of GSE39279 and GSE52401. GSE39279 included 322 lung adenocarcinoma and 122 lung squamous cell carcinoma tissues, while GSE52401 included a total of 244 normal lung tissues; both datasets used the Illumina HumanMethylation450 BeadChip for methylation measurement. The two datasets were combined, and a total of 444 tumor tissues and 244 normal tissues were included in the subsequent analysis. We performed the same analysis as for the TCGA dataset, and the result was almost the same. Owing to the large number of samples, all the p-values of the eight CpG sites were < 0.05 even after multiple-testing correction (Supplementary Table 3 and Supplementary Figure 5). However, the absolute mean differences of the eight CpG sites were all < 0.1, so none of them could be considered a significantly differentially methylated CpG site.
Moreover, we downloaded GSE56044, with 124 NSCLC tissues and 12 adjacent normal tissues, for further validation. GSE56044 did not include clinical information on the subtypes of NSCLC, so we used all NSCLC tissues together for the subsequent comparison. The result was the same as for the two datasets mentioned above, showing no significant differential methylation of the eight CpG sites (Supplementary Figure 6).

Table notes: P-values in bold (lower than 0.05) indicate that the item is a significant source of heterogeneity. QE is used to test for residual heterogeneity in the meta-regression analysis. McaM and McoM represent the mean methylation (Beta) of cases and the mean methylation (Beta) of controls, respectively. Methylation levels are calculated with the formula Beta = M/(M + U). P-values a were calculated with the Wilcoxon rank sum test after false discovery rate (FDR) adjustment. P-value b, OR b and 95% CI b are from logistic regression analysis, and P-value b is also FDR-adjusted.

Gene expression data with TCGA RNA-Seq dataset
DNA methylation plays a key role in regulating gene expression. Given the very different results obtained from the microarray data and the meta-analysis, it may be informative to examine whether the expression of FHIT is changed. We downloaded level 3 RNA-Seq data of LUAD and LUSC from the TCGA project. However, after calculating the fold change and the p-value with multiple-testing correction, FHIT did not meet the criteria for differential expression in either LUAD (P = 0.58, fold change = 1.30) or LUSC (P = 5.7 × 10^-7, fold change = 1.86, below the fold-change threshold of 2.0) when compared with the adjacent normal tissues. Furthermore, the expression level of FHIT is relatively low in LUAD (mean RPKM = 37.04) and its adjacent normal tissues (mean RPKM = 28.49) as well as in LUSC (mean RPKM = 17.29) and its adjacent normal tissues (mean RPKM = 32.18), which implies that the role of the FHIT gene in NSCLC carcinogenesis needs to be further confirmed (Figure 4).

DISCUSSIONS
FHIT gene loss was observed in 64% of NSCLC patients and is reported to be significantly associated with squamous cell carcinoma and poor tumor grade. However, the diagnostic ability of the methylation status of the FHIT gene in lung cancer remains unclear. We therefore performed an integrated analysis to give a comprehensive evaluation of the diagnostic ability of the FHIT promoter methylation level as a biomarker in NSCLC. As expected, the meta-analysis found a significant association between FHIT promoter methylation and NSCLC (OR = 3.43).
In the validation stage, none of the three independent datasets showed significant differential methylation between NSCLC and normal tissues, on account of the small mean methylation difference. In the TCGA dataset, none of the eight CpG sites that shared the same CpG island with the primers in the meta-analysis was significantly differentially methylated, and this result was further confirmed by the other two datasets from the GEO database. Furthermore, we downloaded the RNA-Seq data from the TCGA project, and still no significant differential expression of the FHIT gene was found in either LUAD or LUSC when compared with adjacent normal tissues. Besides, the expression level of the FHIT gene is relatively low in comparison with other functional genes in cancer. It should be noted that all the independent datasets from TCGA and GEO were based on the Caucasian population (Supplementary Tables 5-6). The result for the Caucasian population from these datasets is consistent with the result from the meta-analysis, so the finding on the relationship between FHIT methylation and NSCLC in the Caucasian population is robust. In addition, we also examined the methylation status of the FHIT promoter in other kinds of cancers using TCGA datasets for further validation; similar results were obtained, showing limited diagnostic ability (Supplementary Table 4). Besides, more microarray and RNA-Seq data based on the Asian population are needed to determine whether the diagnostic role of FHIT is specific to Asians.
In our meta-analysis, we found a high degree of heterogeneity between the studies (P < 0.0001). We therefore explored the influential confounding factors further and found that age, stage and study aim are the sources of heterogeneity (Table 2). However, significant odds ratios between FHIT promoter methylation and NSCLC were still retained in most of the subgroups, which is in accordance with the overall meta-analysis results (Supplementary Table 2). Subgroup analysis showed that FHIT methylation is significantly associated with NSCLC in Asians (OR = 3.50, 95% CI: 1.50-8.14) but not in the Caucasian population (OR = 2.55, 95% CI: 0.86-7.57), indicating that aberrant methylation of FHIT can be a diagnostic biomarker for NSCLC in the Asian population. In comparison with other studies, Wu et al. found differential methylation of the FHIT promoter in both Caucasian and Asian populations, which differs from our findings [40]. In addition, a highly significant difference in FHIT promoter methylation between NSCLC and normal controls was observed in our meta-analysis, in Wu's study and in Yan's study [41]. The above consistencies and inconsistencies between the three studies imply the need to test the association between FHIT methylation and NSCLC with larger sample sizes and more advanced technology.
There are several limitations in our study. Firstly, the strong heterogeneity of the included studies may decrease the statistical power of our results. Secondly, although we conducted the trim and fill analysis and the sensitivity analysis, publication bias may still be present. Thirdly, we searched only for papers written in English, so papers written in other languages were ignored. Given these limitations, we strongly recommend using more advanced methylation detection methods, such as WGBS (whole genome bisulfite sequencing) and RRBS (reduced representation bisulfite sequencing), to explore the association between FHIT promoter methylation and NSCLC with larger sample sizes.

Search strategy, selection of studies and data extraction
This pooled study involved searching a range of computerized databases, including Embase, Cochrane Library, OVID Medline and Web of Science, for articles published in English by October 2015. The study used a subject and text word strategy with (FHIT OR AP3Aase OR FRA3B) AND (lung cancer) as the primary search terms. Wildcard characters such as the asterisk and the dollar sign, or other truncation operators, were applied according to the rules of each database to allow effective article collection.
Two independent reviewers (Geng, Guo) screened the titles and abstracts derived from the literature search to identify relevant studies. The following types of studies were excluded: animal and cell experiments, case reports, reviews or meta-analyses, non-case-control studies, studies with insufficient data, and studies that proved inaccessible after contacting the authors. The remaining articles were further examined to see if they met the inclusion criteria: 1) the patients had to be diagnosed with NSCLC (Ad and Sc), 2) the studies contained FHIT gene promoter methylation data from tissue, blood or plasma, 3) the studies had to be case-control studies with matched sample types in cases and controls (tissue-tissue, blood-blood or plasma-plasma), 4) the OR could be calculated or extracted from the text. The reference sections of all retrieved articles were searched to identify further relevant articles. Potentially relevant papers were obtained, and the full-text articles were screened for inclusion by two independent reviewers (Geng, Guo). Disagreements were resolved by discussion with WP, ZL and AW. Included studies were summarized in data extraction forms. Authors were contacted when relevant data were missing. The name of the first author, year of publication, sample size, age (mean or median), gender proportion (male/female, M2F), the proportion of TNM stage I and II samples (proportion of early-stage NSCLC samples), publication aim (for diagnosis or not), analysis of multiple genes or not (one or more genes detected simultaneously in the study design), control type (autogenous or heterogeneous counterpart) and methylation status of the FHIT promoter in human NSCLC and normal or control tissues were extracted (Table 1).

Meta-analysis and heterogeneity source identification
Data were analyzed and visualized mainly using R Software (R version 3.1.0) including meta, metafor and mada packages [42]. The strength of association was expressed as pooled odds ratio (OR) with corresponding 95% confidence intervals (95% CI). Data were extracted from the original studies and recalculated if necessary.
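As an illustration of how a study-level odds ratio and its 95% confidence interval can be recalculated from extracted counts, the sketch below uses the standard log odds ratio with a Wald interval. The function name `odds_ratio_ci`, the 0.5 continuity correction for zero cells and the example counts are assumptions made for illustration; they are not taken from the included studies or from the R packages used here.

```python
import math

def odds_ratio_ci(case_meth, case_unmeth, ctrl_meth, ctrl_unmeth, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 methylation table."""
    cells = [case_meth, case_unmeth, ctrl_meth, ctrl_unmeth]
    # Assumed handling of empty cells: add 0.5 to every cell if any cell is zero
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)  # SE of the log OR
    return {"or": math.exp(log_or),
            "ci": (math.exp(log_or - z * se), math.exp(log_or + z * se)),
            "log_or": log_or,
            "se": se}

# Hypothetical counts: 30 of 100 cases and 10 of 100 controls are methylated
result = odds_ratio_ci(30, 70, 10, 90)
print("OR = %.2f (95%% CI: %.2f-%.2f)"
      % (result["or"], result["ci"][0], result["ci"][1]))
```

The log odds ratio and its standard error are the quantities that a pooled analysis would then combine across studies.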
Heterogeneity was tested using the I² statistic and the Chi-squared test, with I² values over 50% and P ≤ 0.1 indicating strong heterogeneity between the studies. Tau-squared (τ²) was used to determine how much heterogeneity was explained by subgroup differences. The data were pooled using the DerSimonian and Laird random effects model (I² > 50%, P ≤ 0.1) or the fixed effects model (I² < 50%) according to the heterogeneity statistic I² [43]. Unless otherwise noted, a two-sided P ≤ 0.05 was set as the threshold for significance. In the absence of heterogeneity among the included studies, the pooled odds ratio estimates were calculated using the fixed-effects model [44]; otherwise, the random-effects model was used [45]. Random effects meta-regression was employed to determine how much of the heterogeneity (between-study variance) was explained by the explanatory variables when the heterogeneity was significant. Nine variables were analyzed in the meta-regression: control type (autogenous or heterogeneous), gender proportion, proportion of TNM stage I and II samples, mean or median age (> 59 or ≤ 59), single or multiple target detection, sample type (plasma or tissue), methylation detection method (MSP or qMSP), study design (diagnosis or non-diagnosis) and primer set. Sensitivity analyses were performed to assess the contribution of each single study to the final result by omitting one article at a time. Publication bias was analyzed by funnel plot with the mixed-effects version of the Egger test [46]. If bias was suspected, the conventional meta-trim method was used to re-estimate the effect size.

[48][49][50]. All of the above datasets used the Illumina HumanMethylation450 BeadChip for methylation measurement. The methylation level of each CpG probe was estimated from the signals of the methylated (M) and unmethylated (U) alleles as Beta = M/(M + U), where M and U represent the mean signal intensities of about 30 replicates on the array.
The methylation signals of the CpG sites in the datasets mentioned above were all defined according to the beta value. A CpG site was omitted if its value was missing in one or more samples. Because of this quality control, the CpG sites of the FHIT gene in the TCGA and GEO datasets were not completely the same. P-values were calculated with the Wilcoxon rank sum test and corrected for multiple testing with the Benjamini and Hochberg procedure. To identify differentially methylated CpG sites, an adjusted P-value ≤ 0.05 and an absolute mean difference ≥ 0.1 were set as the criteria. In addition, logistic regression was conducted to calculate the OR and p-value for every CpG site, followed by Benjamini and Hochberg multiple comparison correction. Data were analyzed and visualized mainly with R software (R 3.1.0) [51,52].
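The differential methylation criteria above combine a multiple-testing adjustment with an effect-size cut-off. The sketch below shows the beta value, a plain implementation of the Benjamini and Hochberg adjustment, and the two-part decision rule. The function names and all numeric inputs are hypothetical, and the raw p-values are taken as given rather than computed, since the rank sum test itself was run in R.

```python
def beta_value(m, u):
    """Methylation level from methylated (M) and unmethylated (U) intensities."""
    return m / (m + u)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def is_differentially_methylated(case_betas, ctrl_betas, adj_p,
                                 p_cut=0.05, diff_cut=0.1):
    """Apply both criteria: adjusted P <= p_cut and |mean difference| >= diff_cut."""
    mean_case = sum(case_betas) / len(case_betas)
    mean_ctrl = sum(ctrl_betas) / len(ctrl_betas)
    return adj_p <= p_cut and abs(mean_case - mean_ctrl) >= diff_cut

# Hypothetical raw p-values for four CpG sites
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(["%.3f" % p for p in adj])

# Hypothetical site: very small adjusted p-value but a mean difference near 0.05
cases = [0.24, 0.26, 0.25, 0.25]
controls = [0.19, 0.21, 0.20, 0.20]
print(is_differentially_methylated(cases, controls, adj_p=1e-6))
```

The second example mirrors the situation described in the validation datasets: with many samples a p-value can pass the threshold while the mean difference stays below 0.1, so the site is not called differentially methylated.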

RNA-Seq data extraction and analysis
RNA-Seq data were downloaded from the TCGA Data Portal, including 114 lung adenocarcinoma tissues, 104 lung squamous cell carcinoma tissues and 218 paired adjacent normal lung tissues. Level 3 RNA-Seq data were obtained, and reads per kilobase per million mapped reads (RPKM) was used for gene expression quantification. We assessed the significance of differential gene expression by comparing the tumor tissues with the paired adjacent normal tissues using the Wilcoxon rank sum test, followed by the Benjamini and Hochberg false discovery rate (FDR) correction [52]. To identify differentially expressed genes, an adjusted p-value ≤ 0.05 and a fold change ≥ 2.0 were set as the criteria. All data analysis was conducted with the open-source R software (version 3.1.0).
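The two-part expression criterion can be checked against the mean RPKM values and p-values reported for FHIT in this paper. The sketch below assumes that the fold change is the ratio of the larger group mean to the smaller one, which reproduces the reported values of 1.30 and 1.86; the function names are hypothetical.

```python
def fold_change(mean_tumor, mean_normal):
    """Direction-free fold change: larger group mean over smaller (assumed definition)."""
    high = max(mean_tumor, mean_normal)
    low = min(mean_tumor, mean_normal)
    return high / low

def is_differentially_expressed(adj_p, fc, p_cut=0.05, fc_cut=2.0):
    """Apply both criteria: adjusted p-value <= p_cut and fold change >= fc_cut."""
    return adj_p <= p_cut and fc >= fc_cut

# Mean RPKM values and p-values for FHIT as reported in the text
luad_fc = fold_change(37.04, 28.49)
lusc_fc = fold_change(17.29, 32.18)
print("LUAD: fold change = %.2f, DE = %s"
      % (luad_fc, is_differentially_expressed(0.58, luad_fc)))
print("LUSC: fold change = %.2f, DE = %s"
      % (lusc_fc, is_differentially_expressed(5.7e-7, lusc_fc)))
```

Under these assumptions LUAD fails on the p-value and LUSC fails on the fold change, which is why neither subtype is reported as showing differential FHIT expression.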

CONCLUSION
The diagnostic role of the FHIT gene in lung cancer is relatively limited in the Caucasian population but may be useful in Asians. However, more datasets and studies with large sample sizes are needed for further confirmation.

ACKNOWLEDGMENTS
The computations involved in this study were supported by Fudan University High-End Computing Center.

CONFLICTS OF INTEREST
The authors declare that they have no competing interests.

GRANT SUPPORT
This study was partially supported by the grants from the 111 Project (B13016).