Global proteomic profiling in multistep hepatocarcinogenesis and identification of PARP1 as a novel molecular marker in hepatocellular carcinoma

More accurate biomarkers have long been desired for hepatocellular carcinoma (HCC). Here, we characterized the global large-scale proteome of multistep hepatocarcinogenesis in an attempt to identify novel biomarkers for HCC. Quantitative data on 37874 sequences and 3017 proteins during hepatocarcinogenesis were obtained in cohort 1 of 75 samples (5 pooled groups: normal livers, hepatitis livers, cirrhotic livers, peritumoral livers, and HCC tissues) by iTRAQ 2D LC-MS/MS. The diagnostic performance of the six most upregulated proteins in the HCC group, with HSP70 as a reference, was subsequently validated in cohort 2 of 114 samples (hepatocarcinogenesis from normal livers to HCC) using immunohistochemistry. Of the seven candidate protein markers, PARP1, GS and NDRG1 showed the best diagnostic performance for HCC. PARP1, as a novel marker, showed diagnostic performance comparable to that of the classic markers GS and NDRG1 in HCC (AUCs = 0.872, 0.856 and 0.792, respectively). A significantly higher AUC of 0.945 was achieved when the three markers were combined. For the diagnosis of HCC, the sensitivity and specificity were 88.2% and 81.0% when at least two of the markers were positive. Similar diagnostic values of PARP1, GS and NDRG1 were confirmed by immunohistochemistry in cohort 3 of 180 HCC patients. Further analysis indicated that PARP1 and NDRG1 were associated with some clinicopathological features and were independent prognostic factors for HCC patients. Overall, global large-scale proteomic profiles across the spectrum of multistep hepatocarcinogenesis were obtained. PARP1 is a novel and promising diagnostic/prognostic marker for HCC, and a three-marker panel (PARP1, GS and NDRG1) with excellent diagnostic performance for HCC was established.


INTRODUCTION
Hepatocellular carcinoma (HCC) is a prevalent and deadly neoplasm worldwide that almost always arises in the background of a cirrhotic liver as a result of chronic hepatitis virus infection. The prevalence of HBV carriage is reported to be 350 million people worldwide [1]. China has a high HBV prevalence, with approximately 93 million individuals with chronic HBV [2]. It has been reported that 15-20% of chronic hepatitis B patients progress to cirrhosis within 5 years and that the annual incidence of HCC is 2.8% [3]. Hepatocarcinogenesis is a typical multistage process characterized by chronic viral infection, liver cirrhosis, and HCC [4,5].
Despite remarkable advances in diagnostic and therapeutic techniques [6,7], the molecular pathogenesis of HCC is extremely complex and heterogeneous, even within a single tumor [8], and long-term survival rates remain poor. The best method of achieving long-term survival is diagnosing the disease at an asymptomatic stage when potentially curative treatments are feasible [9]. Surveillance of patients at the highest risk for developing HCC, i.e., patients with cirrhosis, is a critical strategy that can potentially decrease the cancer-related mortality rate [10]. Therefore, more accurate markers have long been desired, either to discriminate HCCs from dysplastic nodules (DN) and liver cirrhosis or to provide good prognostic performance for patients with HCC [11]. Global large-scale protein profiles of multistep hepatocarcinogenesis are an important step toward the identification of new diagnostic and/or prognostic biomarkers and therapeutic targets. Most protein markers of HCC arise from various established methods, including indirect gene expression analysis (gene arrays) and direct proteomics [12]. The establishment of isobaric tags for relative and absolute quantitation (iTRAQ) combined with two-dimensional liquid chromatography-tandem mass spectrometry (2D LC-MS/MS) for large-scale analysis of protein expression provides a new tool for marker discovery that has not yet been applied to multistage hepatocarcinogenesis. iTRAQ 2D LC-MS/MS has made it possible to search larger-scale proteomic data for novel molecular markers for diagnosis and outcome prediction, and to identify molecules involved in carcinogenesis during tumor development.
In this study, iTRAQ 2D LC-MS/MS was used to quantitatively analyze the protein alterations of multistep HBV-related hepatocarcinogenesis. We compared protein profiles in a series of 5 pooled samples: healthy subjects, patients with HBV hepatitis, patients with HBV cirrhosis, patients with HCC and their peritumoral tissues. The resulting molecular profiles should provide insight into the histologic process of hepatocarcinogenesis. From the seven candidate markers, we identified PARP1 as a new and promising diagnostic/prognostic biomarker for HCC and established a three-marker panel (PARP1, GS and NDRG1) that greatly improves the diagnostic accuracy for HCC among liver nodules. This is the first report concerning the clinical utility of PARP1 for diagnosis and prognosis in patients with HCC.

Protein expression profiles in multistep hepatocarcinogenesis by iTRAQ
To identify protein expression patterns across the spectrum of hepatocarcinogenesis, we performed comparative protein profiling in 5 pooled samples (Figure 1, Table S1): NLs, HLs, CLs, PLs and HCCs using iTRAQ 2D LC-MS/MS. To reduce false positive results, a strict cutoff for protein identification was applied, requiring an unused ProtScore > 1.3 and at least one peptide with a 95% confidence limit [13]. We obtained quantitative data on 37874 sequences and 3017 proteins during hepatocarcinogenesis (Tables S2, S3). Figure 2A shows the fold changes of all identified proteins in the HL, CL, PL, and HCC groups relative to NLs. The number of differentially expressed proteins (fold change cutoff ratio < 0.5 or > 2.0) was highest in HLs, followed by HCCs, and lowest in PLs. We compared global protein expression patterns among the different groups. Hierarchical clustering of all identified proteins was performed (Figure 2B). The NL and PL groups clustered well together. The HL and HCC groups had similar proteomic patterns and comprised a major sample cluster; adding CLs to this group yielded another cluster. These data suggest that the difference between PLs and NLs is minimal, supporting the clinical practice of removing the malignant nodule while leaving the PL in the patient. Functional annotations of the differentially expressed proteins (compared to the NL group) were analyzed. The top 10 biological process terms are shown in Figure 2C. Biological processes responsible for the production of extracellular matrix/components were significantly involved in CLs and PLs, but more markedly in CLs than in PLs. Although PLs are peritumoral tissues, they suffer from chronic hepatitis and proceed to cirrhosis. Biological processes related to energy production (essential for cell division, e.g., carboxylic acid and glucose metabolic processes) as well as oxidation reduction were most markedly observed in HLs and HCCs.
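The fold-change cutoff stated above (ratio < 0.5 or > 2.0 relative to NL) can be illustrated with a minimal sketch. Python is used here purely for illustration; the helper names `classify_ratio` and `count_differential` and the example ratios are hypothetical and are not taken from Tables S2 or S3.

```python
# Minimal sketch: classify iTRAQ ratios (group / NL) using the cutoffs
# stated in the text (ratio < 0.5 or > 2.0). All values are hypothetical.

def classify_ratio(ratio, low=0.5, high=2.0):
    """Return 'down', 'up', or 'unchanged' for a ratio relative to NL."""
    if ratio < low:
        return "down"
    if ratio > high:
        return "up"
    return "unchanged"


def count_differential(ratios_by_protein):
    """Count proteins whose ratio falls outside the [low, high] range."""
    return sum(
        1 for ratio in ratios_by_protein.values()
        if classify_ratio(ratio) != "unchanged"
    )


# Hypothetical HCC/NL ratios for three placeholder proteins.
example_hcc_vs_nl = {"protein_A": 3.1, "protein_B": 0.4, "protein_C": 1.2}
n_diff = count_differential(example_hcc_vs_nl)
```

Applying the same count to each group's ratios would give the per-group totals of differentially expressed proteins that the text compares.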

Screening and validation of diagnostic marker for HCC
To screen for diagnostic markers of HCC, we focused on the proteins that were upregulated in HCCs. The six most upregulated proteins were 14-3-3sigma, N-myc downstream regulated 1 (NDRG1), tumor protein D52 (TPD52), farnesyl pyrophosphate synthase (FDPS), glutamine synthetase (GS), and poly [ADP-ribose] polymerase 1 (PARP1) (Table 1). To validate whether these proteins are exclusively overexpressed in HCC, we stained them with antibodies in 40 nonmalignant nodules (7 NLs, 19 CLs, 14 DNs), 51 HCCs (24 grade 1-2 and 27 grade 3) and 23 ICCs (Table S4). Notably, heat shock 70 kDa protein (HSP70) ranked No. 11 on the list and has been repeatedly reported as a diagnostic marker for HCC [2,14,15]. Therefore, HSP70 was also stained for reference. ROC curves were constructed to evaluate the area under the curve (AUC) for these potential markers. The AUCs for FDPS, 14-3-3sigma, TPD52, HSP70, NDRG1, GS, and PARP1 were 0.513, 0.650, 0.674, 0.699, 0.792, 0.856, and 0.872, respectively, and all AUCs except that for FDPS differed significantly from the reference line. PARP1, GS, and NDRG1 showed the best diagnostic performance, with AUCs significantly higher than those for 14-3-3sigma, TPD52, and HSP70 (Figure 3A, 3B). PARP1, GS, and NDRG1 were further analyzed for their diagnostic performance for HCC. Of the three biomarkers, PARP1 was validated here for the first time as a novel diagnostic marker for HCC, GS is the classic marker for HCC [2,15,16], and NDRG1 was previously identified in HCC by our center [17] and other groups [18,19]. Next, we compared the diagnostic performance of PARP1 for HCC with that of GS and NDRG1. The results demonstrated comparable AUCs for the three markers. Strikingly, the AUC increased significantly to 0.945 when PARP1, GS and NDRG1 were combined (p < 0.001, Figure 3A, 3B). The immunohistochemical features of the three markers are shown in Figure 3C.
The optimal cut-points for positive expression of the three markers were determined by ROC curve analysis as the points closest to the point of maximum sensitivity and specificity. Thus, tumors designated positive for PARP1, GS, and NDRG1 were those with values above 25%, 5% and 5%, respectively. The results obtained using these criteria are summarized in Table 2. All NLs were negative for each of the three markers. The number of cases positive for at least one marker increased from 7/19 (36.8%) in CLs to 8/14 (57.1%) in DNs and to 51/51 (100%) in HCCs. Cases positive for at least two markers (regardless of their identity) were observed in 0/7 NLs, in 1/19 (5.3%) CLs, in 4/14 (28.6%) DNs, in 20/24 (83.3%) G1/G2 HCCs, in 25/27 (92.6%) G3 HCCs, and in 7/23 (30.4%) ICCs (Table 2). Further statistical analysis showed that when at least two positive markers were required, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy and Youden index for differentiating HCCs from non-HCC tissues were 88.2%, 81.0%, 78.9%, 89.5%, 84.2% and 0.69, respectively (Table 3).
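As a worked check, the diagnostic indices above can be recomputed from the per-group counts of cases positive for at least two markers. The sketch below is in Python and assumes that NLs, CLs, DNs and ICCs together form the non-HCC comparison group, an assumption that reproduces the reported values; the variable names are illustrative.

```python
# Counts of cases positive for at least two markers, as given in the text:
# group -> (positive cases, total cases).
groups = {
    "NL": (0, 7),
    "CL": (1, 19),
    "DN": (4, 14),
    "G1/G2 HCC": (20, 24),
    "G3 HCC": (25, 27),
    "ICC": (7, 23),
}
# Assumption: everything that is not HCC counts as the non-HCC group.
hcc_groups = {"G1/G2 HCC", "G3 HCC"}

tp = sum(pos for g, (pos, n) in groups.items() if g in hcc_groups)
fn = sum(n - pos for g, (pos, n) in groups.items() if g in hcc_groups)
fp = sum(pos for g, (pos, n) in groups.items() if g not in hcc_groups)
tn = sum(n - pos for g, (pos, n) in groups.items() if g not in hcc_groups)

sensitivity = tp / (tp + fn)          # 45 / 51
specificity = tn / (tn + fp)          # 51 / 63
ppv = tp / (tp + fp)                  # 45 / 57
npv = tn / (tn + fn)                  # 51 / 57
accuracy = (tp + tn) / (tp + fn + fp + tn)  # 96 / 114
youden = sensitivity + specificity - 1

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%}")
print(f"PPV={ppv:.1%} NPV={npv:.1%} accuracy={accuracy:.1%} Youden={youden:.2f}")
```

With these counts, 45 of 51 HCCs and 12 of 63 non-HCC cases are positive, which yields the sensitivity of 88.2% and specificity of 81.0% quoted above.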

Association between PARP1, GS and NDRG1 expression and clinicopathological features and survival of HCC patients
To validate the reproducibility of these findings and to determine the association with specific pathologic features of HCC and with survival, we further performed immunohistochemical staining of PARP1, GS and NDRG1 in another independent cohort including 180 paired HCCs and PLs. Using the positive criteria described above, the positive expression of PARP1, GS and NDRG1  Correlation analysis showed that the expression of NDRG1 in HCCs was significantly associated with tumor size and differentiation, and that of PARP1 with tumor size and stage (p < 0.05, Table 4). However, no significant associations with tumor characteristics such as tumor stage or tumor size were observed between the high and low GS groups.
Kaplan-Meier analysis showed that patients with high PARP1 expression had a significantly poorer prognosis than those with low PARP1 expression (p = 0.005, Figure 4A). In contrast, patients with high GS expression had significantly better survival than those with low GS expression (p = 0.023, Figure 4B). High NDRG1 expression was also associated with poorer overall survival, as we reported previously (p < 0.001, Figure 4C). Further univariate and multivariate Cox regression analysis indicated that PARP1 and NDRG1 expression were independent prognostic factors for poor survival of HCC patients (Table 5).

DISCUSSION
Timely and conclusive diagnostic reports are very important for the treatment of hepatocellular nodules. Effective markers are required to make diagnosis and prognosis more objective and accurate. Although some tissue biomarkers have been proposed and emphasized in clinical diagnostic/prognostic practice, these biomarkers should be validated in HCC of different etiologies and geographic origins and compared with markers newly discovered by novel technologies. In this study, iTRAQ was applied for the first time to multistage hepatocarcinogenesis and provided global protein profiles of the process. Of the 6 most upregulated proteins in HCC and the classic marker HSP70, PARP1, GS and NDRG1 exhibited the best diagnostic performance for HCC, and additional predictive power could be achieved by using a 3-marker panel. PARP1, GS and NDRG1 were further verified in an independent cohort for diagnostic value and showed different prognostic performance for HCC patients after operation. iTRAQ-based proteomics gave an overview of the global protein alterations across the spectrum of multistage hepatocarcinogenesis in this study. Distinct protein markers for different stages of hepatocarcinogenesis could be recognized on the basis of these global protein expression data. Overall, the greatest difference was observed between the NL and HL groups, not between the NL and HCC groups. The PL group clustered well together with the NL group, which supports the clinical practice of removing the malignant nodule while leaving the PL in the patient. Biological processes related to energy production (essential for cell division, e.g., carboxylic acid and glucose metabolic processes) as well as oxidation reduction were most markedly observed in HLs and HCCs. Biological processes responsible for the production of extracellular matrix/components were significantly involved in CLs and PLs.
We verified the diagnostic performance of these 6 upregulated proteins in HCCs, together with HSP70, in 40 non-HCC nodules, 51 HCCs, and 23 ICCs. Except for FDPS, the other 6 proteins had some diagnostic value for HCC. Consistent with Zhang [20], 14-3-3sigma was upregulated in HCC in our study, but its diagnostic value was not satisfactory (AUC = 0.650). 14-3-3sigma is upregulated in gastric breast cancer [21] but downregulated in esophageal squamous cell carcinoma [22]. Overexpression of TPD52 has been described in a multitude of cancer types, including breast, prostate, and ovarian cancer, and has been linked to poor prognosis in breast and prostate cancer patients [23-26]. Little is known about TPD52 expression in liver cancer; we observed its upregulation in HCC, but its diagnostic value was not good enough (AUC = 0.674). HSP70 has been reported as a diagnostic marker for HCC, alone or combined with others, with good accuracy [2,15,16], and it ranked 11th on our list. Therefore, we selected HSP70 as a reference to evaluate the diagnostic performance of the proteins of interest. However, HSP70 was not a sufficiently good diagnostic biomarker in our study (AUC = 0.699). This may be due to the different etiology of HCC. HCV-related HCC accounted for the great majority of cases in the previous publications [2,15,16], while our study included only HBV-related liver disease, which is the major etiology of HCC in China. These data may indicate that HSP70 is an inferior diagnostic biomarker for HBV-related HCC. Further analysis showed that the diagnostic performance improved greatly when PARP1, GS and NDRG1 were combined. Notably, the three markers stained a relatively high proportion of ICCs; if the ICCs were excluded from the cohort, the accuracy of HCC diagnosis would be considerably better. In particular, GS was positive in 13/23 (56.5%) ICCs. Of the three-biomarker panel, GS is the classic marker for HCC [2,15,16], and NDRG1 was identified by our center [17] and other groups [18,19]. PARP1 increases the replication efficiency of HBV, inhibiting DNA repair capacity and potentially contributing to the development of HCC [27]. PARP1 is identified here for the first time as a diagnostic/prognostic marker for HCC.
Our findings not only confirm the value of GS and NDRG1 for the detection of HCC, but also establish a three-marker panel (PARP1, GS and NDRG1) with good diagnostic performance for HBV-related HCC. We further performed immunohistochemistry in an independent cohort of 180 HCC patients. High PARP1 expression was associated with larger tumor size and poorer survival. As we reported previously, NDRG1 was associated with poorer survival. Interestingly, high GS staining was associated with better overall survival, although GS failed to be an independent factor for overall survival. Indeed, Dal Bello et al. [28] showed that GS immunostaining correlates with reduced tumor-specific and overall mortality after radiofrequency ablation. However, GS expression was reported by Osada [29] to be a risk factor for HCC recurrence. Further studies are needed to confirm the prognostic role of GS in HCC patients.
In conclusion, this study has characterized the global protein expression profiles during multistage hepatocarcinogenesis, which provide a rich resource of proteins for further exploration of carcinogenesis. We identified PARP1 as a novel biomarker for HCC and demonstrated that a panel composed of NDRG1, GS, and PARP1 is very useful in distinguishing between CLs, DNs and HCCs. The diagnostic and prognostic value of PARP1 and its possible therapeutic applications are worth further investigation.

Study cohorts and sample collection
The samples enrolled in this study were grouped into three independent cohorts based on their usage. The first cohort of 75 samples (Cohort 1, Figure 1, Table S1) was used for biomarker discovery by proteomic profiling. These included 15 normal livers (NLs), 15 HBV hepatitis livers (HLs), 15 HBV cirrhotic livers (CLs), 15 HBV-related HCC livers (HCCs) and 15 paired peritumoral livers (PLs). The liver tissues of cohort 1 were snap-frozen in liquid nitrogen immediately after operation and stored at -80°C until use. Formalin-fixed paraffin-embedded tissue samples were obtained from two other separate sets: cohort 2 of 114 subjects (Table S4) and cohort 3 of 180 patients (Table S5) (Figure 1).

iTRAQ coupled with 2D LC−MS/MS analysis
To create quantitative protein expression profiles, an iTRAQ experiment was performed with 2D LC-MS/MS. In each group, proteins extracted from 15 different liver samples were mixed in equal amounts for proteomic analysis to improve profiling coverage and quantitative accuracy [13,30]. The iTRAQ labeling was performed according to the manufacturer's instructions (Applied Biosystems, USA). Briefly, 100 μg of protein from each group was precipitated with ice-cold acetone overnight at −20°C, and the protein pellets were then dissolved and digested with trypsin. The peptide mixtures from each group were labeled with the iTRAQ reagents as follows: NL, 113; HL, 114; CL, 115; PL, 119; HCC, 121 (Figure 1). The differentially iTRAQ-labeled peptides were mixed equally, desalted, and dried for subsequent analysis. The first-dimension separation by high-pH RP chromatography was performed on an L-3000 HPLC System (Rigol) using a C18 RP column (5 μm, 250 mm × 4.6 mm i.d., Agela). In the second dimension, peptide fractions from the first-dimension RPLC were separated on a low-pH RP column (3 μm, 10 cm, 75 μm i.d., C18) and then analyzed on a Triple-TOF 5600 mass spectrometer (Applied Biosystems). For protein identification and quantification, the complete set of raw data files (*.wiff) from the Triple-TOF 5600 was searched with ProteinPilot version 4.2 using the Paragon search engine against the human RefSeq protein database. The ratios of the peak areas of the five iTRAQ reporter ions reflected the relative abundances of the peptides and proteins in the five groups. Cluster 3.0 software was used for hierarchical clustering of the identified proteins, and Java Treeview was used for visualization. The biological functions of the identified proteins were analyzed online using DAVID.

Immunohistochemistry assay
Expression of the proteins of interest was assessed by staining paraffin-embedded liver samples from HCC patients in Cohort 2 and Cohort 3 (Tables S4, S5). Briefly, 4-μm sections were de-waxed, subjected to an antigen retrieval procedure, and incubated in methanol containing 0.5% hydrogen peroxide for 20 min to block endogenous peroxidase. The sections were blocked in normal protein block serum solution, incubated with the primary antibody at 4°C overnight, and then washed 3 times with PBS buffer (5 min each) at room temperature. They were then incubated with HRP-conjugated secondary antibodies (BiotechInc, China) at room temperature for 1 hour. Finally, the sections were subjected to DAB staining and hematoxylin counterstaining. A negative control was obtained by replacing the primary antibody with normal murine or rabbit IgG. Immunoreactivity was scored using a semi-quantitative method by evaluating the number of positive cells over the total number of liver cells. Scores were assigned in 5% increments (0%, 5%, 10%, ..., 100%), as reported [14]. The results were assessed independently by two pathologists in a double-blind manner, and concordance on the agreed scores was achieved with a high kappa (κ) coefficient (> 0.80). The antibodies and dilutions are detailed in Table S6.
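The inter-observer agreement mentioned above can be illustrated with a short Python sketch of an unweighted Cohen's kappa. This is an assumption for illustration only: the text does not state which form of the coefficient was computed, and the pathologist calls below are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical positive/negative calls by two pathologists on ten sections.
pathologist_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
pathologist_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
kappa_example = cohen_kappa(pathologist_1, pathologist_2)
```

For scores recorded in 5% increments, a weighted kappa that gives partial credit for near-agreement may be more appropriate; the unweighted form is shown only because it is the simplest.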

Statistical analyses
Statistical analysis was performed using SPSS software version 18.0 (SPSS, Chicago, IL, USA). Qualitative variables were analyzed by Fisher's exact test and Pearson's chi-squared test, while quantitative variables were analyzed by Student's t test. Receiver operating characteristic (ROC) curves were used to assess the diagnostic value of candidate proteins. The statistical significance of the correlation between biomarker expression and disease-specific survival was estimated by the log-rank test. Cox proportional hazards regression was carried out to identify independent factors that significantly affected survival. All statistical tests were two-sided, and a p-value < 0.05 was considered statistically significant.
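The analyses in this study were run in SPSS; purely as an illustration of the ROC concepts involved, the Python sketch below computes an AUC as the probability that a randomly chosen HCC case scores higher than a randomly chosen non-HCC case, and selects the cut-point closest to the corner of perfect sensitivity and specificity. The function names and staining scores are hypothetical, and this is not the SPSS procedure itself.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC via pairwise comparison; ties between a pair count as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def closest_to_corner_cutoff(pos_scores, neg_scores):
    """Cutoff whose ROC point lies closest to sensitivity = specificity = 1.

    A case is called positive when its score is strictly above the cutoff.
    """
    best_dist, best_cutoff = None, None
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s > cutoff for s in pos_scores) / len(pos_scores)
        spec = sum(s <= cutoff for s in neg_scores) / len(neg_scores)
        dist = ((1 - sens) ** 2 + (1 - spec) ** 2) ** 0.5
        if best_dist is None or dist < best_dist:
            best_dist, best_cutoff = dist, cutoff
    return best_cutoff


# Hypothetical percentages of positively stained cells.
hcc_scores = [30, 50, 10, 80, 25]
non_hcc_scores = [0, 5, 10, 25, 0]
auc_example = roc_auc(hcc_scores, non_hcc_scores)
cutoff_example = closest_to_corner_cutoff(hcc_scores, non_hcc_scores)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formula would give the same value more efficiently.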

FINANCIAL SUPPORT
The study was supported by the National

CONFLICTS OF INTEREST
We declare that there are no ethical or legal conflicts involved in this article.