Circulating cell-free nucleosomes as biomarkers for early detection of colorectal cancer

The aim was to evaluate serum levels of circulating cell-free nucleosomes (ccfn) containing a variety of epigenetic signals, including 5-methylcytosine DNA; the histone modifications H3K9Me3, H3K9Ac, H3S10PO4, H3K36Me3, H4K20Me3, H4PanAc and pH2AX; the nucleosome variant H2AZ; and nucleosome adducts with HMGB1 and EZH2, as well as ccfn per se, and to develop and evaluate predictor models based on these ccfn together with serum levels of carcinoembryonic antigen (CEA) for early detection of colorectal cancer (CRC). Blood samples were collected from 4,105 individuals undergoing colonoscopy. Serum levels of ccfn and CEA were determined using enzyme-linked immunosorbent assay platforms. Individual assessment of levels of ccfn showed area under the receiver operating characteristic curve (AUC ROC) = 0.525–0.576 in discrimination of individuals with CRC from individuals with non-malignant findings. Predictor models including ccfn containing 5-methylcytosine DNA, CEA, age and gender improved results (AUC ROC = 0.736, sensitivity = 0.37 at specificity = 0.90). Further improvement was achieved in discrimination of individuals with CRC from individuals with clean colorectum (AUC ROC = 0.840, sensitivity = 0.57 at specificity = 0.90). The levels of ccfn among patients with CRC appeared to be stage-independent. In conclusion, the performance of the developed predictor models is potentially promising in early detection of CRC.


INTRODUCTION
During the last decade, major progress has been made in the diagnosis and treatment of colorectal cancer (CRC), but the disease still poses a major challenge to public health. CRC is one of the most frequent types of cancer and is estimated to account for some 700,000 deaths per year worldwide [1].
To counter this challenge, national population-based screening programs for CRC have been implemented in several countries, including Denmark [2]. Screening has been shown to reduce incidence and improve overall survival by early detection of CRC and by primary prevention through removal of pre-cancerous lesions (adenomas) [3,4].
Colonoscopy is considered the gold standard in screening for CRC due to high diagnostic accuracy and the possibility of intervention (polypectomy, biopsy, etc.) during the examination. However, the feasibility of a colonoscopy-based screening program is challenged by capacity requirements, costs and side-effects, such as abdominal pain, transient cardiovascular changes, bleeding episodes and bowel perforation [5]. Therefore, fecal occult blood tests (FOBT) are often used to identify high-risk individuals, who should be offered subsequent diagnostic colonoscopy. At present, the most frequently applied FOBT in national screening programs is the fecal immunochemical test (FIT) [2]; the sensitivity of FIT for detection of CRC is 0.79 at a specificity of 0.94 [6].
Despite the widespread use of feces-based tests in screening for CRC, the tests are challenged by low compliance; compliance for FIT is around 60% [7,8], which diminishes the effectiveness of screening. Hence, blood-based biomarkers, as an alternative or supplement to feces-based screening tests, have been subject to intense research. Blood samples are easy to retrieve, and their acceptance within a screening population consisting of mostly healthy individuals is higher than that of the fecal samples required for FIT [9,10]. Blood-based biomarkers are a heterogeneous group of various proteins, epigenetic markers, transcriptomes, metabolomes and circulating tumor-derived DNAs, and some have shown promising association with early stages of CRC [11][12][13][14]. In particular, combinations of various biomarkers have shown complementary performance in discrimination of CRC [15][16][17], and the application of data fusion to incorporate information regarding age, gender, health-status, etc. into the determination of biomarkers has further increased the performance of some biomarkers [18].
Recent studies indicate that changes in levels of circulating cell-free nucleosomes (ccfn) may have potential as biomarkers for CRC [12,19]. Nucleosomes consist of small DNA chains of approximately 147 bp, wrapped around a histone octamer that contains pairs of H2A, H2B, H3 and H4 proteins. Nucleosomes are bound together with linker DNA, linker histones and other non-histone proteins to form intracellular chromatin. During cell death the linker DNA is digested and nucleosomes are released into the circulation as ccfn, which can be detected by enzyme-linked immunosorbent assays (ELISA) [12]. Changes in levels of total ccfn have been shown in the circulation of individuals with cancer as a result of increased cell turnover [20]. Furthermore, altered epigenetic control of gene expression (DNA methylation, modifications in histone structure, etc.) plays a crucial role in carcinogenesis [21]. Changes in levels of ccfn that contain CRC-specific epigenetic alterations can be detected in individuals with CRC [12,19], and possess the potential to be a valuable biomarker in early detection of CRC.

RESULTS
A total of 4,105 individuals referred to diagnostic colonoscopy were included in the study, comprising 441 individuals with CRC (279 with colon cancer and 162 with rectal cancer), 143 with primary cancer of non-colorectal origin and 342 with high risk adenomas (Figure 1). The population included 1,964 men and 2,141 women with a median age of 64. Figure 2 shows the distribution of optical density (OD) measurements of 5 mC according to diagnostic groups. A significant trend (p < 0.001) of decreased levels in individuals with CRC, other cancers and colorectal adenomas is shown compared with individuals with other non-malignant findings and clean colorectum.
Spearman rank correlations were used to assess the association between the ccfn. Correlation coefficients were found to be > 0.5 except for H3K9Me3 where the correlation coefficients with the other ccfn were > 0.3 (data not shown). Figure 3 shows a scatter plot of OD-measurements of 5 mC vs. ccfn per se illustrating the correlation between the two measurements with a correlation coefficient of 0.81 (p < 0.001).
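As an illustration only, and not the software used in the study, the following minimal Python sketch shows how a Spearman rank correlation between two sets of OD measurements can be computed: both series are converted to ranks (ties receive the mean rank) and the Pearson correlation of the ranks is taken. The function names and the example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical OD values for two assays measured on the same four samples.
od_marker_a = [0.10, 0.40, 0.25, 0.90]
od_marker_b = [0.05, 0.30, 0.20, 1.50]
rho = spearman_rho(od_marker_a, od_marker_b)
```

Because only the ordering of the values matters, any monotone relationship between the two assays yields a coefficient of 1 or -1, regardless of the scale of the OD readings.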
Results of the assessment of the influence of disease stage, gender and comorbidity on levels of ccfn and CEA are presented in Table 1. Levels of ccfn were independent of disease stage (CRC stage I-IV), as comparisons of levels of the ccfn in the four disease groups did not reach significance. As expected, levels of CEA were significantly stage dependent, with higher levels associated with higher stage. H3S10PO4 was the only ccfn that showed an association with gender, with significantly lower levels in women compared to men; however, all ccfn had R² < 0.02 (Table 1). Regarding the influence of comorbidity, levels of 5 mC, H4PanAc, H3K9Ac, H3S10PO4, H3K36Me3, EZH2, HMGB1 and CEA (lung disease), H3K9Me3 and pH2AX (cardiovascular disease), and H3K9Me3 (diabetes I/II) were significantly lower in individuals with comorbidity compared to levels in individuals with no comorbidity and clean colorectum; R² < 0.05 (Table 1). In the comparison of individuals with rheumatic disease to individuals with no comorbidity and clean colorectum, significance was not reached for levels of any ccfn or CEA (Table 1).
A weak dependence of levels of ccfn on age was shown, with decreasing levels at increasing age (R² < 0.02).
Results from the univariate analysis of the ccfn and CEA regarding the discrimination of individuals with CRC (endpoint 1), individuals with CRC and high-risk adenomas (endpoint 2), individuals with cancer (endpoint 3), individuals with high risk adenomas (endpoint 4) and individuals with other cancers excluding CRC (endpoint 5) are shown in Table 2. For endpoint 1, the area under the receiver operating characteristic (ROC) curve (AUC ROC) was in the range of 0.525-0.576; results for H3K36Me3 and H4K20Me3 were not significant, but significance was reached for the remaining ccfn. Regarding endpoints 2 and 3, the AUC ROC was found to be in the range of 0.526-0.574; results for all ccfn were significant. AUC ROC was 0.533-0.584 for endpoint 4; results for all ccfn were significant except ccfn per se. Results for endpoint 5 were not significant for the majority of ccfn, with the exception of ccfn per se, 5 mC, H3K9Me3 and pH2AX; AUC ROC was 0.508-0.564.
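For readers less familiar with the measure, the univariate AUC ROC values reported above can be understood as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The short Python sketch below illustrates this pairwise definition; it is a hedged illustration with hypothetical names and values, not the authors' implementation.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical OD values. When a marker is LOWER in cases, as described for
# 5 mC, the values can be negated so that a higher score points towards disease.
od_cases = [0.2, 0.3]
od_controls = [0.5, 0.4, 0.3]
auc = roc_auc([-v for v in od_cases], [-v for v in od_controls])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why values of 0.525-0.576 indicate only weak discrimination by a single marker.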
Results from the predictor models (AUC ROC with associated p-value and sensitivities at specificities 0.70, 0.80, and 0.90) are presented in Table 3. The only ccfn with independent significance, and thus included in predictor models 1-5 (based on endpoints 1-5, respectively), was 5 mC. The discrimination of individuals with CRC from individuals with non-malignant findings (endpoint 1) yielded AUC ROC = 0.736 (sensitivity = 0.37 at specificity = 0.90) when applying predictor 1. Similar AUC ROC values were shown in the discrimination of individuals with cancer (CRC and other cancers) (endpoint 3) and individuals with other cancers (excluding CRC) (endpoint 5); AUC ROC = 0.736 (sensitivity = 0.37 at specificity = 0.90) and AUC ROC = 0.741 (sensitivity = 0.39 at specificity = 0.90), respectively. In comparison, performance was lower in the discrimination of individuals with high risk adenomas (endpoint 4) and individuals with CRC and high risk adenomas (endpoint 2) from individuals with non-malignant findings; AUC ROC = 0.646 (sensitivity = 0.21 at specificity = 0.90) and AUC ROC = 0.691 (sensitivity = 0.28 at specificity = 0.90), respectively.
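The sensitivities at fixed specificity quoted above can be illustrated with a small Python sketch. It assumes that a higher predictor score indicates disease and that the cut-off is chosen among the observed control scores; both the function name and the example scores are hypothetical, and the study's own software may have handled cut-offs differently.

```python
def sensitivity_at_specificity(case_scores, control_scores, specificity):
    """Sensitivity at the lowest observed cut-off reaching the target specificity.

    A sample is called positive when its score is strictly above the cut-off.
    """
    for cutoff in sorted(set(control_scores)):
        # Specificity: fraction of controls correctly called negative.
        achieved = sum(s <= cutoff for s in control_scores) / len(control_scores)
        if achieved >= specificity:
            # The lowest qualifying cut-off gives the highest sensitivity.
            return sum(s > cutoff for s in case_scores) / len(case_scores)
    return 0.0


# Hypothetical predictor scores for ten controls and four cases.
controls = list(range(1, 11))
cases = [11, 9.5, 5, 2]
sens_90 = sensitivity_at_specificity(cases, controls, 0.90)
```

Raising the required specificity moves the cut-off upwards, so fewer cases exceed it; this is the trade-off behind reporting sensitivity at specificities of 0.70, 0.80 and 0.90.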
To further explore the performance of the predictor models based on serum levels of ccfn, CEA, age and gender, subgroup-predictor models were developed. Compared to endpoint 1, the subgroup-predictor model based on the discrimination of individuals with CRC from individuals with clean colorectum showed an improved AUC ROC and associated sensitivity (AUC ROC = 0.840, sensitivity = 0.57 at specificity = 0.90). The subgroup-predictor models based on the discrimination of individuals with early stage CRC (stage I or II) and late stage CRC (stage III or IV) from individuals with clean colorectum resulted in a difference in AUC ROC of 0.026 in favor of late stage CRC. Eliminating CEA from the subgroup-predictor models resulted in a difference of 0.026 in favor of early stage CRC (data not shown).
Finally, a subgroup-predictor model based on the discrimination of individuals with CRC stage I + II + III from individuals with non-malignant findings, thus eliminating the influence of metastatic disease, showed AUC ROC = 0.711, sensitivity = 0.31 at specificity = 0.90. Figure 4 shows ROC curves and AUC ROC of predictor models 1 and 2 and the subgroup-predictor models.

DISCUSSION
Assessed individually, the determined ccfn have a weak ability to discriminate individuals with high risk adenomas and cancer including CRC from individuals with non-malignant findings (AUC ROC = 0.508-0.584). However, the performance was substantially improved when predictor models based on endpoint 1-5 were constructed combining levels of ccfn determinations (5 mC) with levels of CEA, age and gender. The performance in discrimination of individuals with CRC from individuals with non-malignant findings yielded AUC ROC of 0.736. This is in line with previous research showing association between specific ccfn alterations and CRC [12,19,22], and the improved performance by combination of ccfn as biomarker for cancer [17,19].
The performance of the predictor models in discrimination of individuals with high risk adenomas and CRC (endpoint 2) and individuals with high risk adenomas exclusively (endpoint 4) obtained AUC ROC = 0.691 and AUC ROC = 0.646, respectively. The inferior AUC ROC compared with the performance in discrimination of individuals with CRC may be explained by the non-invasive nature of adenomas compared to carcinomas, which could cause a reduction in release of altered ccfn into the circulation. Supporting this speculation, previous research has shown that the profile of epigenetic alterations in tissue from colorectal adenomas and CRC is similar, and that the expression of some alterations is higher in tissue from CRC compared to tissue from adenomas [23,24].
In comparison to the performance in discrimination of individuals with CRC, the performance in discrimination of individuals with cancer including CRC (endpoint 3) and individuals with other cancers excluding CRC (endpoint 5) obtained similar results, AUC ROC = 0.736 and AUC ROC = 0.741, respectively. This may indicate that ccfn represent epigenetic changes associated with carcinogenesis in general and not only carcinogenesis of CRC. However, the findings regarding discrimination of individuals with other cancers are based on a relatively small number of individuals, and further studies are needed to draw a final conclusion.
The performance of the predictor models was further improved in discrimination of individuals with CRC from individuals with clean colorectum, which yielded AUC ROC of 0.840 (sensitivity 0.57 at specificity 0.90). This discrimination was performed to provide an indication of the maximum potential performance of the predictor models in a screening setting, as the performance based on endpoints 1-5 might be underestimated due to the design of the study with the inclusion of a symptomatic population, in which the individuals of the control group are also symptomatic.
The progression of CRC does not appear to influence levels of ccfn, as variations in levels of ccfn between the groups of individuals of the four stages of CRC were not significant (Table 1). The difference in performance of predictor model 1 between discrimination of individuals with early stage CRC and discrimination of individuals with late stage CRC is small, and the difference in AUC ROC in favor of late stage disease reverts to favor early stage disease when CEA is eliminated from the predictor model. Furthermore, similar performance of the predictor model was obtained when metastatic CRC (stage IV) was eliminated from the discrimination. Altogether, this indicates that levels of ccfn are independent of progression of CRC. This might be explained by epigenetic changes playing a qualitative role in carcinogenesis. Hence, it is speculated that the amount of DNA and histone changes in the nucleosomes is persistent from the initial neoplastic transformation to the advanced cancer stage, making ccfn ideal as a biomarker for early detection of CRC. The independence of disease stage would be an advantageous quality for a potential blood-based screening biomarker for CRC.
5 mC was the only ccfn included in the predictor models. This may be explained by the relatively strong correlations observed between the ccfn (Spearman rank correlation), indicating a close and complex interaction between the different histone modifications, variants and nucleosomes containing methylated DNA. The effect of a single nucleosomic modification on the development of CRC may be linked to the effect of numerous modifications in a dynamic complex of various epigenetic regulatory mechanisms [25].
In this study, the blood samples were collected before colonoscopy but after bowel preparation which potentially could affect levels of biomarkers and thereby might induce bias. However, it has been shown that levels of ccfn are not influenced by bowel preparation [26] which supports the potential use of ccfn in screening for CRC even further.
Levels of ccfn decline with increasing age although the overall influence of age is minor as the proportion of the explained variance is very low. In general, the effect of aging on epigenomic regulation of DNA is complex and involves interaction between multiple regulatory proteins and pathways. During aging, a general loss of heterochromatin has been shown which causes a loss of repressive histone marks and methylated DNA [27]. This supports the present findings of decreasing levels of ccfn with increasing age. The effect of gender on levels of ccfn was concluded to be of limited importance since only levels of H3S10PO4 showed a difference between genders and all ccfn obtained R² values < 0.02. Lung disease, cardiovascular disease and diabetes I/II were found to affect levels of some ccfn, but again the influence was found to be minor.
Figure 4. Circulating cell-free nucleosomes containing 5-methylcytosine DNA and CEA were included as explanatory variables along with age and gender in all five predictor models. Endpoint 1: Individuals with colorectal cancer (CRC) vs. individuals with non-malignant findings; Endpoint 2: Individuals with CRC and high risk adenomas vs. individuals with non-malignant findings excluding individuals with high risk adenomas; Endpoint 3: Individuals with cancer vs. individuals with non-malignant findings; Endpoint 4: Individuals with high risk adenomas vs. individuals with non-malignant findings excluding individuals with high risk adenomas; Endpoint 5: Individuals with cancer excluding CRC vs. individuals with non-malignant findings.
The levels of ccfn were measured in output from ELISA as OD values. However, the assay does not distinguish between the number of times a specific alteration appears in a single nucleosome. Hence, the OD output from the ELISA detection is not equivalent to a concentration of ccfn, but is an expression of the amount of altered epigenetic control of gene expression in serum samples.
The Sept9 DNA methylation assay was the first blood-based CRC screening test to be approved for use in an average risk population [28], with a reported sensitivity of 0.48 at specificity 0.915 [29] in an asymptomatic population. The detection rates are correlated with the progression of CRC, as advanced disease leads to increased detection rates, and the detection rates for high risk adenomas are low. The performance of the Sept9 DNA methylation assay has been shown to be influenced by age and comorbidities such as diabetes [30]. A direct comparison of the levels of ccfn as blood-based biomarkers for early detection of CRC found in this study with the performance of the Sept9 DNA methylation assay is not possible due to differences in study design. However, the independence of progression of CRC, the minor influence of comorbidity and age, the relatively high performance in discrimination of individuals with high risk adenomas and the high performance in discrimination of individuals with CRC from individuals with clean colorectum found in this study may indicate that levels of ccfn are competitive with the Sept9 DNA methylation assay as blood-based biomarkers for early detection of CRC. However, further studies based on asymptomatic populations are needed to confirm the results of this study.
The epigenetic regulation of DNA comprises a complex interaction between different modifications leading to diverse patterns of gene expression emerging from the same genome [25]. The ccfn evaluated in this study are all associated with CRC, but are not separate events in the epigenetic regulation of the genome during carcinogenesis, reflecting only the tip of the iceberg of the entire set of epigenetic alterations leading to CRC. This may explain the finding of relatively strong correlations between the ccfn.
In conclusion, the performance of the designed predictor models is potentially promising in early detection of CRC. Addition of biomarkers of a different kind, for example protein biomarkers, to the present predictor model based on epigenetic modifications might improve the performance even further, making the biomarker combination potentially competitive with FIT and thereby a beneficial supplement in a screening program. Another future aspect is the possible use of the predictor models as a supplement to FIT in order to potentially improve the performance of current screening programs. However, further studies are needed to support these proposals.

MATERIALS AND METHODS
The study was a part of the Endoscopy II Project, which has been approved by the Ethics Committee of the Capital Region of Denmark (H-3-2009-110) and the Danish Data Protection Agency (2007-58-0015) and performed according to the Helsinki II Declaration.
Individuals scheduled for first time ever diagnostic colonoscopy due to symptoms attributable to CRC at seven hospitals in Denmark (Bispebjerg, Herning, Hillerød, Horsens, Hvidovre, Randers, and Aarhus hospitals) were included from 1 May 2010 to 30 November 2012.
Inclusion criteria were age ≥ 18, symptoms attributable to CRC and signed informed consent. Exclusion criteria were previous malignant disease, previous large bowel adenoma, membership of a Hereditary Non-Polyposis Colorectal Cancer or Familial Adenomatous Polyposis family and surgical intervention within 3 months before inclusion. Furthermore, individuals unable to understand the written study information were excluded from the study. On the day of the colonoscopy an informed consent was signed by the participating individual, and personal data (age, gender, comorbidity (cardiovascular disease, lung disease, rheumatic disease, and diabetes I/II)) were recorded by a dedicated research nurse. Blood samples for serum were collected before colonoscopy according to a validated Standard Operating Procedure and subsequently centrifuged at 3,000 × g for 10 minutes at 21°C. The supernatant serum was transferred to separate freezing tubes leaving 0.5 cm of serum untouched above the buffy coat to avoid contamination from white cells and platelets. Serum samples were labelled with unique barcodes to ensure subsequent identification (FreezerWorks®), and all samples were stored at −80°C under 24/7 electronic surveillance.
Findings at colonoscopy including pathological diagnosis were registered in a database in addition to the obtained personal data. After final data compilation a rigorous audit of the database was performed to ensure the validity, and the database was locked.
The endpoints considered were: Endpoint 1: Individuals with CRC (all stages); Endpoint 2: Individuals with CRC and high risk adenomas; Endpoint 3: Individuals with cancer (CRC and cancer of non-colorectal origin); Endpoint 4: Individuals with high risk adenomas; Endpoint 5: Individuals with other cancers (excluding CRC).
The control group for endpoint 1, 3 and 5 was individuals with non-malignant findings including low and high risk adenomas. The control group for endpoint 2 and 4 was individuals with non-malignant findings including low risk adenomas and excluding high risk adenomas.
The criteria for high risk adenoma were: ≥ 1 cm in size or ≥ 3 lesions or villous component or high grade dysplasia [31].
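The high risk adenoma criteria above form a simple "any of" rule, which can be written as a short Python function for clarity. This is an illustrative sketch only; the parameter names are hypothetical and do not correspond to fields in the study database.

```python
def is_high_risk_adenoma(max_size_cm, lesion_count,
                         has_villous_component, has_high_grade_dysplasia):
    """Return True if any one of the four high risk criteria is met."""
    return (
        max_size_cm >= 1.0            # size of at least 1 cm
        or lesion_count >= 3          # three or more lesions
        or has_villous_component      # villous component present
        or has_high_grade_dysplasia   # high grade dysplasia present
    )


# Hypothetical finding: two small tubular adenomas with low grade dysplasia.
low_risk_example = is_high_risk_adenoma(0.6, 2, False, False)
```

Because the criteria are joined by "or", a single small lesion is still classified as high risk if it shows a villous component or high grade dysplasia.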
The panel of 12 ccfn assays was constructed based on literature searches on published results for epigenetic modifications associated with carcinogenesis of CRC as well as previous experience of ccfn modifications and adducts associated with CRC.
Ccfn and CEA levels in the serum samples were determined by Belgian Volition SPRL, Isnes, Belgium using Nu.Q™ ELISA platforms as previously described [12]. In brief, ccfn were captured onto an ELISA plate using an anti-nucleosome antibody and the level of nucleosomes containing a particular epigenetic feature was quantified by binding of a separate detection antibody directed to bind to the epigenetic feature of the nucleosomes.
Absolute mass or other SI units of ccfn have not yet been defined, and no International Standard preparations, absolute or relative, have been reported. Therefore, the performed measurements of levels of ccfn and CEA were expressed in the output from the ELISA detection as OD.
All ELISA measurements on each serum sample were performed in duplicate and results used for the statistical analysis were expressed as the mean of the duplicate measurement. Only results from individuals with a complete set of biomarker determinations (5 mC, H3K9Me3, H3K9Ac, H3S10PO4, H3K36Me3, H4K20Me3, H4PanAc, pH2AX, H2AZ, HMGB1, EZH2, ccfn per se and CEA) were used for statistical analyses. The flow chart (Figure 1) presents an overview of individuals recruited and included or excluded in the study. Missing biomarker determinations were due to lack of sample material and were missing at random.
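The data preparation described above (averaging duplicate readings and keeping only individuals with a complete set of determinations) can be sketched in Python as follows. The marker names follow the panel listed in the text, but the dictionary layout, function names and example values are assumptions made purely for illustration.

```python
from statistics import mean

# Marker names follow the panel in the text; the data layout is hypothetical.
REQUIRED_MARKERS = (
    "5mC", "H3K9Me3", "H3K9Ac", "H3S10PO4", "H3K36Me3", "H4K20Me3",
    "H4PanAc", "pH2AX", "H2AZ", "HMGB1", "EZH2", "ccfn", "CEA",
)


def has_duplicate(readings, marker):
    """True when the marker has exactly two OD readings for this sample."""
    values = readings.get(marker)
    return values is not None and len(values) == 2


def complete_case_means(samples, required=REQUIRED_MARKERS):
    """Average duplicate OD readings; drop samples missing any required marker."""
    result = {}
    for sample_id, readings in samples.items():
        if all(has_duplicate(readings, m) for m in required):
            result[sample_id] = {m: mean(readings[m]) for m in required}
    return result


# Hypothetical input: one complete sample and one with missing determinations.
samples = {
    "s1": {m: (0.2, 0.4) for m in REQUIRED_MARKERS},
    "s2": {"5mC": (0.1, 0.3)},
}
prepared = complete_case_means(samples)
```

In this sketch sample "s2" is dropped because most of its determinations are missing, mirroring the complete-case rule stated in the text.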

Statistics
Comparisons of ccfn levels between CRC stages, comorbidities, gender and age were done using linear modelling with the ccfn levels log transformed. Analyses of comorbidities, gender and age were restricted to individuals with a clean colorectum result. In addition, the analysis of comorbidities was adjusted for age and gender. The results of these analyses are presented by the R² value (reflecting the proportion of the total variation explained by the model) and the p-value.
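To make the R² measure concrete, the following Python sketch fits an ordinary least-squares line to log-transformed marker levels against a single covariate (for example age) and returns the proportion of variance explained. It is a simplified illustration with hypothetical names and values; it handles one covariate only and does not reproduce the adjustment for age and gender used in the comorbidity analyses.

```python
import math


def r_squared_log_linear(covariate, levels):
    """R² of an ordinary least-squares fit of log(level) on one covariate."""
    y = [math.log(v) for v in levels]  # levels must be strictly positive
    n = len(covariate)
    mean_x = sum(covariate) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(covariate, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual = sum((yi - (intercept + slope * x)) ** 2
                   for x, yi in zip(covariate, y))
    return 1.0 - residual / syy  # proportion of variance explained


# Hypothetical ages and OD levels for four individuals.
ages = [50, 60, 70, 80]
od_levels = [0.9, 1.1, 0.7, 0.8]
r2 = r_squared_log_linear(ages, od_levels)
```

An R² below 0.02, as reported for age and gender, means that the covariate accounts for less than 2% of the variation in the log-transformed levels, even when the association is statistically significant.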
All endpoints considered are binary as described above. Logistic regression analysis was performed with the endpoints as the dependent variable and the biomarkers as explanatory variables. Age and gender were also included as explanatory covariates. Initial analyses were done for each explanatory variable as a univariate analysis, and then multivariate analysis was performed combining all explanatory variables as well as age and gender, reducing the model to only include statistically significant ccfn. The final model was developed using 10-fold cross validation with backwards selection using the Akaike information criterion. The results are presented by the ROC curves with AUC ROC as a measure of discrimination and the sensitivities at pre-specified specificities (70, 80, and 90%). For each endpoint, a predictor, i.e. a linear combination of the significant explanatory covariates, has been established; only 5 mC and CEA were included as explanatory variables along with age and gender in all five predictor models. Subgroup analyses restricted to early stage CRC (stage I or II), late stage CRC (stage III or IV), and restricting the controls to individuals with clean colorectum have been performed. P-values less than 5% are considered significant. Database management and calculations have
predictor models based on the discrimination of endpoint 1 (A), endpoint 2 (B), the discrimination of early stage CRC (Fa) and late stage CRC (Fb) from individuals with non-malignant findings, the discrimination of individuals with CRC (G) from individuals with clean colorectum, and the discrimination of individuals with CRC and high risk adenomas (I) from individuals with clean colorectum. Letters in parentheses refer to results from predictor models listed in Table 3.
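The idea of a predictor as a linear combination of covariates fitted by logistic regression, as described in the Statistics paragraph, can be sketched in Python as follows. This is a deliberately simplified illustration and not the authors' method: the coefficients are fitted by plain batch gradient descent (a substitute for the usual statistical-software fitting routine), the cross validation and Akaike-based backward selection are omitted, and the data, function names and settings are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(weights, x):
    """Intercept plus weighted sum of covariates (weights[0] is the intercept)."""
    return weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))


def fit_logistic(rows, labels, learning_rate=0.1, iterations=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(iterations):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(linear_predictor(weights, x)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad)]
    return weights


# Hypothetical standardized covariates: [marker lower in cases, marker higher in cases].
rows = [[-1.0, 1.0], [-0.5, 0.8], [-0.8, 0.2],
        [1.0, -1.0], [0.6, -0.4], [0.9, 0.1]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control
weights = fit_logistic(rows, labels)
scores = [linear_predictor(weights, r) for r in rows]
```

The resulting scores play the role of the predictor: they can be ranked to draw a ROC curve or thresholded to obtain a sensitivity at a chosen specificity. In practice, covariates such as age and gender would be added as further columns.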