RHCG and TCAF1 promoter hypermethylation predicts biochemical recurrence in prostate cancer patients treated by radical prostatectomy

Purpose: The lack of biomarkers that can distinguish aggressive from indolent prostate cancer has caused substantial overtreatment of clinically insignificant disease. Here, by genome-wide DNA methylome profiling, we sought to identify new biomarkers to improve the accuracy of prostate cancer diagnosis and prognosis.

Experimental design: Eight novel candidate markers, COL4A6, CYBA, TCAF1 (FAM115A), HLF, LINC01341 (LOC149134), LRRC4, PROM1, and RHCG, were selected from Illumina Infinium HumanMethylation450 BeadChip analysis of 21 tumor (T) and 21 non-malignant (NM) prostate specimens. Diagnostic potential was further investigated by methylation-specific qPCR analysis of 80 NM vs. 228 T tissue samples. Prognostic potential was assessed by Kaplan-Meier analysis and uni- and multivariate Cox regression analysis in 203 Danish radical prostatectomy (RP) patients (cohort 1), and validated in an independent cohort of 286 RP patients from Switzerland and the U.S. (cohort 2).

Results: Hypermethylation of the 8 candidates was highly cancer-specific (area under the curve: 0.79-1.00). Furthermore, high methylation of the 2-gene panel RHCG-TCAF1 was predictive of biochemical recurrence (BCR) in cohort 1, independent of the established clinicopathological parameters Gleason score, pathological tumor stage, and pre-operative PSA (HR (95% confidence interval (CI)): 2.09 (1.26 - 3.46); P = 0.004), and this was successfully validated in cohort 2 (HR (95% CI): 1.81 (1.05 - 3.12); P = 0.032).

Conclusion: Methylation of the RHCG-TCAF1 panel adds significant independent prognostic value to established prognostic parameters for prostate cancer and thus may help to guide treatment decisions in the future. Further investigation in large independent cohorts is necessary before translation into clinical utility.

followed by statistical analysis on logit-transformed [3], peak-corrected [4] β-values in R [5] using LIMMA [6] (t-statistics) to identify differential methylation between T and NM samples (Δβ = mean β_T − mean β_NM). N and AN samples were pooled into one control group (NM) for further analysis, as analysis for differential methylation (LIMMA applied to peak-corrected M-values) revealed highly similar methylation patterns for the two groups. All samples passed the inclusion criterion of a detection P-value < 10⁻⁵. Multi-dimensional scaling analysis was performed using the 10,000 most variable CpG sites across all tissue samples. Correction for multiple testing (adjusted P-value, adj. P) was performed according to the Benjamini-Hochberg procedure [7].
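The transformations named above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which ran in R with LIMMA. Peak correction and the LIMMA t-statistics are omitted, and the function names and the clamping constant `eps` are assumptions made for illustration only.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Logit (base 2) transform of a beta-value into an M-value.

    eps is an illustrative clamp (an assumption, not a value from the
    study) that avoids log(0) when beta is exactly 0 or 1.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def delta_beta(tumor_betas, nm_betas):
    """Difference in mean beta-value between T and NM samples at one CpG site."""
    mean_t = sum(tumor_betas) / len(tumor_betas)
    mean_nm = sum(nm_betas) / len(nm_betas)
    return mean_t - mean_nm


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values remain monotone in the rank of the raw P-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `delta_beta([0.8, 0.6], [0.2, 0.2])` gives a Δβ of about 0.5, i.e. higher methylation in T than in NM at that site.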

Bisulfite sequencing
Bisulfite sequencing (BS) of genomic DNA from prostatic cell lines was performed as previously described [8].
Briefly, primers (Table S13) were designed using MethPrimer [9]. Bisulfite-converted DNA was PCR-amplified, gel-purified, and sub-cloned using the TOPO® TA Cloning® Kit for sequencing (Invitrogen). A minimum of 5 colonies were PCR-amplified and sequenced using the M13 forward and reverse primers (included in the TOPO cloning kit), followed by manual inspection of C/T status at each CpG site. Results were visualized and analyzed using QUMA [10].
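The C/T inspection described above amounts to counting, at each CpG site, how many clones retain a C (methylated, protected from conversion) rather than a T (unmethylated, converted). The sketch below illustrates only that counting step and does not reproduce QUMA. It assumes the clone reads are already aligned to the same strand of the amplicon without indels; the function name, the 0-based positions, and the toy sequences are hypothetical.

```python
def cpg_methylation(clone_sequences, cpg_positions):
    """Fraction of clones that retain C at each CpG cytosine position.

    clone_sequences: aligned bisulfite reads, one string per clone
                     (assumed same strand, same length frame, no indels).
    cpg_positions:   0-based positions of the CpG cytosines in the amplicon.
    Returns a dict mapping position -> methylated fraction, or None when
    no clone gives an informative C/T call at that position.
    """
    fractions = {}
    for pos in cpg_positions:
        calls = [
            seq[pos]
            for seq in clone_sequences
            if pos < len(seq) and seq[pos] in "CT"
        ]
        fractions[pos] = calls.count("C") / len(calls) if calls else None
    return fractions


# Toy data: five clones, CpG cytosines at positions 1 and 5.
clones = ["TCGATCG", "TTGATCG", "TCGATTG", "TCGATCG", "TTGATCG"]
result = cpg_methylation(clones, [1, 5])
```

With the toy clones, position 1 is called C in 3 of 5 clones and position 5 in 4 of 5 clones.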

RNA-seq
A total of 14 T, 6 N, and 6 AN fresh-frozen prostate tissue samples were collected at Dept. of Urology, Aarhus University Hospital, DK (2004-2011). Of these, 6 T samples were also included in a previous RNA-seq study [11]. Total RNA was isolated from tissue samples using the RNeasy Mini Kit (Qiagen) according to the manufacturer's instructions, except that at the time of extraction, 1.5x (vol.) 100% EtOH was added to the tissue samples. All included samples had a RIN score >7, according to RNA Pico chip analysis on a 2100 Bioanalyzer (Agilent Technologies). Whole transcriptome, strand-specific RNA-seq libraries for multiplexed paired-end sequencing were prepared using Ribo-Zero Gold and ScriptSeq v2 kit (Epicentre), as previously described [11].

RNA-seq data handling and statistical analysis
Paired-end RNA-seq reads were mapped to the human genome (hg19) using TopHat [12] with the Bowtie aligner [13]. HTSeq [14] was used to summarize reads per gene of interest with the "union" overlap resolution mode. Differential expression analysis was performed using edgeR [15] with the most complex dispersion found for each gene. Correction for multiple testing (adjusted P-value, adj. P) was performed according to the Benjamini-Hochberg procedure [7].

External datasets
450K and RNA-seq data sets (297 T, 34 AN) from The Cancer Genome Atlas (TCGA) were downloaded from the TCGA data portal [16,17] and processed as described above.
Marmal-aid data were downloaded from the Marmal-aid database [18]. Raw β-values were batch- and peak-corrected using ChAMP [19].