Systematic analyses and comprehensive field synopsis of genetic association studies in hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is one of the most common malignancies in the world. To comprehensively examine the association between genetic variants and HCC risk, we performed a systematic literature search and meta-analyses of the available evidence. With data from 301 articles, we conducted meta-analyses for 69 polymorphisms in 46 distinct genes. The results showed that 31 polymorphisms in 25 genes are significantly associated with HCC risk. Cumulative epidemiological evidence for a significant association with HCC risk was graded strong for one polymorphism (NQO1 rs1800566). Furthermore, we built a database to integrate and analyze the associations between genetic variants and HCC risk. To the best of our knowledge, this is the first comprehensive field synopsis and systematic meta-analysis of genetic associations with HCC risk. It provides a useful resource and platform for investigators exploring the association between sequence polymorphisms and HCC risk.


INTRODUCTION
Hepatocellular carcinoma (HCC) is the most common primary liver malignancy and the third-leading cause of cancer mortality worldwide [1]. The prevalence of this cancer shows remarkable geographic heterogeneity, with the highest rates observed among East Asian and African populations [1,2]. Recently, the number of cases has increased dramatically in Western countries, and it has been estimated that the annual number of new cases exceeds 700,000 worldwide [3]. Despite rapid progress in diagnostic and therapeutic modalities, the overall 5-year survival rate for HCC is extremely low (18%) [4]. Many etiological factors for HCC have been reported [5]. Nearly 80% of HCC cases are associated with infection by hepatitis B virus or hepatitis C virus. Alcoholic liver disease and non-alcoholic fatty liver disease are also major risk factors for HCC [6]. HCC is the combined result of a multi-stage, multi-factor process of long-term exposure and accumulation [7-10]. Multiple studies have shown that cancer is a genetic disease that is thought to develop through the acquisition of genetic alterations [11-13]. Much effort has been devoted to uncovering genetic aberrations in HCC [8,14-16], such as point mutations in p53 (TP53) and β-catenin (CTNNB1). However, our understanding of the genetic landscape of HCC is still far from complete, and the key drivers of HCC tumourigenesis remain poorly understood.
Over the past two decades, numerous studies enrolling tens of thousands of subjects have examined the role of genetic variation in HCC carcinogenesis. Many sequence polymorphisms have been identified as potential genetic factors associated with HCC susceptibility. Notably, a given polymorphism has often been investigated by several studies, and the results of these studies are not always consistent. A systematic review covering all tested polymorphisms is therefore necessary. Here, we comprehensively evaluate the candidate-gene association studies of HCC risk and perform meta-analyses for variants with sufficient data. We provide a systematic synopsis of the current understanding of the genetic basis of HCC susceptibility.

Characteristics of the eligible studies
In this work, we identified a total of 505 eligible articles, comprising 282,042 subjects (cases: 124,452, 44.1%). A total of 255 polymorphisms in 198 genes were eligible for our analysis (Figure 1). Most of these articles (n=496, 98.22%) have been published since 2000. We conducted meta-analyses for the 69 polymorphisms in 46 genes that had at least three data sources (leaving 301 eligible articles). Across the 69 main meta-analyses, the mean sample size was 4087 (range 607 to 14425), with an average of 7.6 independent studies per meta-analysis.

Meta-analyses
Detailed meta-analysis results were recorded for each polymorphism. We first evaluated the polymorphisms using the additive model. In the allele-contrast meta-analyses, 21 (30%) polymorphisms in 19 genes showed nominally significant associations with HCC risk (P-value < 0.05). The number of subjects enrolled in the meta-analyses ranged from 607 to 14425 (mean: 4087, Table 1). The genotype distributions of these polymorphisms in the control groups were all in accordance with Hardy-Weinberg equilibrium (HWE). A strong association with HCC (OR > 2) was detected for only one polymorphism (PNPLA3 rs738409, OR=2.01). Moderate associations with HCC (ORs of 1.5-2.0 or 0.5-0.8) were identified for 13 polymorphisms (Table 1). Five additional variants were significantly associated with HCC risk in meta-analyses stratified by ethnicity (Table 1). We also used dominant and recessive models to further evaluate the associations of genetic variants with HCC risk. Another 5 significantly associated variants in 5 genes were identified using either the dominant or the recessive model (Table 2). Hepatitis B virus (HBV) infection is a particularly important risk factor for HCC. Many studies have documented the association between genetic polymorphisms and risk of HBV-related HCC.
To assess the cumulative epidemiological evidence for the significant meta-analysis results, the Venice criteria [17] were applied. Epidemiological credibility was scored as 'strong', 'moderate' or 'weak' by a composite assessment of the amount of evidence, the extent of replication, and protection from bias. For the amount of evidence, 14 grades of 'A', 16 grades of 'B', and 0 grades of 'C' were given. For the extent of replication, 10 grades of 'A', 7 grades of 'B', and 13 grades of 'C' were given. For protection from bias, 16 grades of 'A', 4 grades of 'B', and 10 grades of 'C' were given. One polymorphism (NQO1 rs1800566) was graded strong for evidence of association with HCC risk under the Venice criteria. Moderate and weak evidence of a true association with HCC was assigned to 14 and 16 polymorphisms, respectively.
Previous meta-analyses have independently analyzed the association between individual genetic variants and risk of HCC [18-22]. Here, we compiled these studies and compared their findings with our results (Supplementary Table S2). A total of 42 polymorphisms have been reported by previous meta-analyses, and the results for seven polymorphisms (rs11614913, rs1143627, rs25487, rs2910164, rs1801131, rs17401966 and rs861539) are inconsistent with ours, which we attribute to the limited data available to the earlier analyses. A total of 27 polymorphisms have been comprehensively evaluated here for the first time.

HCCdb: a database of HCC-related polymorphisms
To facilitate the integration and online querying of HCC-related polymorphisms, we constructed a database of HCC-related polymorphisms (HCCdb). Currently, HCCdb contains 69 polymorphisms in 46 genes. The current version of HCCdb provides a user-friendly search engine that allows users to search the basic gene and literature information. Furthermore, HCCdb provides a module for carrying out a meta-analysis directly on the polymorphisms. Users can select the genetic model and the effect model when performing an online meta-analysis. In HCCdb, the OR and 95% CI are calculated to evaluate the strength of association between polymorphisms and HCC risk. The database is freely available at http://donglab.ecnu.edu.cn/databases/HCCdb/.

DISCUSSION
Sequence polymorphisms in genes have been considered as underlying candidates in hepatocellular carcinogenesis. In this work, we described the results of the first systematic synopsis and meta-analysis in the field of genetic predisposition to HCC. We found that 31 polymorphisms in 25 genes showed significant associations with HCC risk. Our work provides a comprehensive research synopsis of candidate-gene association studies of HCC risk. Using the Venice criteria, we graded one polymorphism (NQO1 rs1800566) strong for cumulative epidemiological evidence of association with HCC risk. NQO1 is a cytosolic enzyme that plays an important role in protecting cells against oxidative stress by catalyzing the two-electron reduction of numerous quinoid compounds into their less toxic forms. Many epidemiological studies have investigated the effect of the NQO1 rs1800566 polymorphism on carcinogenesis [23,24], and the effect appears to differ between malignant tumors. This work showed that the NQO1 rs1800566 polymorphism is a risk factor for HCC (OR=1.34). However, we were unable to evaluate the association of the NQO1 rs1800566 polymorphism with HCC susceptibility in specific populations because of the limited number of eligible published case-control studies.
This work has several limitations. First, although we thoroughly searched the literature in the PubMed database to identify eligible studies, some studies might have been missed. To extend our search, we also checked the related meta-analyses in Google Scholar, which links multiple databases. Second, we did not evaluate gene-gene or gene-environment interactions. Additional studies specifically designed to detect these interactions are needed. Third, although the Venice criteria have the advantage of assessing various sources of potential bias, some of the indicators are difficult to measure, such as genotyping error, population stratification and phenotype misclassification.
To the best of our knowledge, this work is the largest and most comprehensive assessment of the literature on genetic association with HCC susceptibility conducted to date. It not only summarizes the current literature on the genetic epidemiology of HCC, but also provides comprehensive data and helpful clues for designing future studies of genetic risk factors for HCC. In the future, continued updating of web-based data collections for disease-related studies would help to improve the cumulative evidence for genetic associations in HCC.

Literature search and selection criteria
To identify the genetic variants associated with HCC risk, we searched the PubMed database using the following keywords: "(liver cancer OR hepatocellular carcinoma) AND (polymorphism OR polymorphisms)" in the title field, without language restrictions, for studies published up to 18 June 2015. Google Scholar was also used to search for eligible publications using the same keywords. This search produced 2667 potentially relevant publications. After further evaluation, 505 eligible publications were retained (Figure 1). All eligible studies had to meet the following criteria: 1) the publication appeared in a peer-reviewed journal; 2) the study used a case-control or other appropriate cohort design in human beings; 3) the study was not family-based; 4) HCC cases were diagnosed by pathological and/or histological examination, excluding liver cirrhosis, chronic hepatitis B, acute liver failure, asymptomatic HBV carriers and so on; 5) sufficient genotype data were presented to calculate the odds ratios (ORs) and 95% confidence intervals (CIs).

Data extraction
Data were independently extracted by two reviewers (PZ and YZ) and then checked by another reviewer (DD). The extracted results for a total of 16 publications showed inconsistencies or disagreements, all of which arose from careless errors. We collected the disagreements, discussed them with DD, and made the final decision after repeating the extraction. Only studies that provided enough information on the genotypic or allelic distribution of individual variants for both HCC cases and controls were used. The following characteristics were collected: first author's surname, publication year, ethnicity (categorized as Caucasian, Asian, African, or mixed, the last including more than one ethnic category [25]), dbSNP ID, gene symbol, variants, source of controls (hospital based, population based, family based), numbers of cases and controls with each genotype, genotyping method, PubMed ID, and the type of infecting virus.

Statistical analysis
To stabilize the heterogeneity test statistic (I²) and to allow sensitivity analyses, we performed meta-analyses for the 69 genetic variants with case-control data available from at least three independent sources. All statistical tests were conducted with STATA, version 12.0. All tests were two-tailed, and only a P-value < 0.05 was considered significant. To comprehensively analyze the relationship between genetic variants and HCC risk, we selected three genetic models: additive, dominant and recessive. To illustrate the models, assume a polymorphic locus with two alleles, labeled A and a, where A is the high-risk candidate allele and a is the lower-risk allele. The additive model is treated here as the allele model and represents the effect of the A allele vs. the a allele; the dominant model represents the effect of the A/A + A/a genotypes vs. the a/a genotype, i.e. an effect present with either one or two copies of the A allele; the recessive model represents the effect of the A/A genotype vs. the A/a + a/a genotypes, i.e. an effect present only with two copies of the A allele. Summary odds ratios (ORs) with 95% confidence intervals (CIs) for alleles or genotypes were calculated by the random-effects method [26] to assess the strength of associations between genetic variants and HCC risk. In the primary analyses, pooled ORs were obtained for the allele contrast. In addition, dominant and recessive models were assessed for all eligible polymorphisms. For some specific variants, such as GSTM1 and GSTT1 'Present/Null', we followed the conventional comparisons used in the original studies. If data permitted (at least three data sources), we also performed subgroup analyses by ethnicity. For common variants (MAF ≥ 5%), the minor and major alleles are sometimes reversed between ethnicities, which may bias the pooled estimate. To minimize false-negative errors, variants that showed no evidence of association with HCC risk in the meta-analyses were selected for presentation only if they had at least six independent datasets.
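As an illustration of how genotype counts map onto the three genetic models, the following minimal Python sketch collapses the counts of one study into a 2x2 table and computes the OR with a 95% CI. It is not the STATA code used in this work. It assumes the conventional definitions with respect to the risk allele A (dominant: A/A + A/a vs. a/a; recessive: A/A vs. A/a + a/a), a Woolf (log-based) confidence interval, and invented genotype counts; the function names model_table and odds_ratio are hypothetical.

```python
import math

def model_table(cases, controls, model):
    """Collapse genotype counts (n_AA, n_Aa, n_aa) into a 2x2 table.

    Returns (exposed cases, unexposed cases, exposed controls, unexposed controls).
    """
    AA, Aa, aa = cases
    cAA, cAa, caa = controls
    if model == "allele":      # A allele vs. a allele (allele counts)
        return 2 * AA + Aa, 2 * aa + Aa, 2 * cAA + cAa, 2 * caa + cAa
    if model == "dominant":    # A/A + A/a vs. a/a
        return AA + Aa, aa, cAA + cAa, caa
    if model == "recessive":   # A/A vs. A/a + a/a
        return AA, Aa + aa, cAA, cAa + caa
    raise ValueError(f"unknown model: {model}")

def odds_ratio(a, b, c, d):
    """OR with a Woolf 95% CI; assumes all four cells are non-zero."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

# Invented genotype counts (A/A, A/a, a/a) for illustration only.
cases = (30, 50, 20)
controls = (20, 50, 30)
for model in ("allele", "dominant", "recessive"):
    or_, lo, hi = odds_ratio(*model_table(cases, controls, model))
    print(f"{model}: OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Studies with a zero cell would need a continuity correction before this calculation, which the sketch deliberately leaves out.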
Heterogeneity was assessed with the chi-square-based Q-test [27]. The I² statistic was also used to assess heterogeneity [28]. Generally, I² values less than 25% correspond to mild heterogeneity, values between 25% and 50% correspond to moderate heterogeneity, and values greater than 50% correspond to large heterogeneity between studies. Sensitivity analyses were performed by excluding studies whose allele frequencies in controls deviated significantly from Hardy-Weinberg equilibrium (HWE), given that such deviation may denote bias. Moreover, the extent to which the combined risk estimate might be influenced by individual studies was assessed by omitting each study in turn from the meta-analyses (leave-one-out sensitivity analyses). Begg's funnel plots [29] and Egger's linear regression test [30] were used to investigate publication bias.
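The pooling and heterogeneity statistics described in this section can be sketched as follows. This is an illustrative Python example, not the STATA routine used in this work. It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not name explicitly, and takes per-study log ORs and their variances as input; the function names random_effects_pool and heterogeneity_label are hypothetical.

```python
import math

def random_effects_pool(log_ors, variances):
    """Pool log ORs with a DerSimonian-Laird random-effects model (assumed estimator)."""
    k = len(log_ors)
    w = [1.0 / v for v in variances]                 # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }

def heterogeneity_label(i2):
    """Map I² (in percent) to the categories used in the text."""
    if i2 < 25:
        return "mild"
    if i2 <= 50:
        return "moderate"
    return "large"

# Two invented studies with very different effects, for illustration only.
result = random_effects_pool([0.0, 1.0], [0.1, 0.1])
print(result["or"], result["i2"], heterogeneity_label(result["i2"]))
```

A leave-one-out sensitivity analysis amounts to calling random_effects_pool repeatedly, each time with one study removed from both input lists.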

Evaluation of cumulative evidence
To assess the statistically significant associations identified by the meta-analyses, the Venice criteria were employed in this work. They grade the cumulative evidence on three major criteria: (i) amount of evidence; (ii) replication of results; (iii) protection from bias. Amount of evidence was graded by the sum of test alleles or genotypes among both cases and controls in the meta-analysis: 'A' for over 1,000, 'B' for 100 to 1,000, and 'C' for less than 100. Caution was taken when applying this criterion to rare variants with frequency < 1%, as an 'A' grade is unobtainable. Replication was graded by the heterogeneity statistic: 'A' for I² < 25%, 'B' for I² between 25% and 50%, and 'C' for I² > 50%. Bias may be caused by factors that lead to systematic deviations from the true effect of a genetic association, and it may operate at the level of a single study, a collection of studies (e.g. a meta-analysis), or a research field at large. Protection from bias was graded 'A' if there was no observable bias and bias was unlikely to explain the presence of the association, 'B' if bias could be present or could explain the presence of the association, or 'C' if bias was evident or was likely to explain the presence of the association. Assessment of protection from bias also included consideration of the magnitude of the association; a score of 'C' was assigned to an association with a summary OR < 1.15 unless the association had been replicated prospectively by multiple studies with no evidence of publication bias. Cumulative epidemiological evidence was defined as strong, moderate, or weak: strong if all three grades were 'A', weak if one or more grades were 'C', and moderate for all other combinations.
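The grading rules above can be summarized in a short Python sketch. It only encodes the thresholds stated in this section; the protection-from-bias grade involves judgment and is therefore taken as an input rather than computed. The function names are hypothetical, and assigning an I² of exactly 50% to 'B' is an assumption about the boundary.

```python
def grade_amount(n_test):
    """Amount of evidence: sum of test alleles or genotypes in cases and controls."""
    if n_test > 1000:
        return "A"
    if n_test >= 100:
        return "B"
    return "C"

def grade_replication(i2):
    """Replication: graded from the heterogeneity statistic I² (in percent)."""
    if i2 < 25:
        return "A"
    if i2 <= 50:   # boundary handling at exactly 50% is an assumption
        return "B"
    return "C"

def overall_evidence(amount, replication, bias):
    """Combine the three grades into strong / moderate / weak."""
    grades = (amount, replication, bias)
    if all(g == "A" for g in grades):
        return "strong"
    if any(g == "C" for g in grades):
        return "weak"
    return "moderate"

# Invented inputs for illustration only; the bias grade is supplied by the assessor.
print(overall_evidence(grade_amount(3500), grade_replication(12.0), "A"))
print(overall_evidence(grade_amount(3500), grade_replication(40.0), "A"))
print(overall_evidence(grade_amount(800), grade_replication(65.0), "B"))
```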

Database construction
The HCCdb system is based on a three-tier architecture: client, server and database. It includes a user-friendly web interface, PHP's DBI module and a MySQL database. HCCdb was developed on MySQL v4.1 with the MyISAM storage engine.

Figure 1: Profiles of literature search, meta-analysis and evaluation of cumulative evidence.