Magnetic resonance tumor regression grade (MR-TRG) to assess pathological complete response following neoadjuvant radiochemotherapy in locally advanced rectal cancer

This study aims to evaluate the feasibility of an automatic magnetic resonance (MR) method for quantitative assessment of the percentage of fibrosis developed within locally advanced rectal cancers (LARC) after neoadjuvant radiochemotherapy (RCT). A total of 65 patients were enrolled in the study and MR studies were performed on a 3.0 Tesla scanner; patients were followed up for 30 months. The percentage of fibrosis was quantified on T2-weighted images, using an automatic K-means clustering algorithm. According to the percentage of fibrosis, an optimal cut-off point for separating patients into favorable and unfavorable pathologic response groups was identified by ROC analysis, and tumor regression grade (MR-TRG) classes were determined and compared to histopathologic TRG (P-TRG). An optimal cut-off point of 81% of fibrosis was identified to differentiate between favorable and unfavorable pathologic response groups, resulting in a sensitivity of 78.26% and a specificity of 97.62% for the identification of complete responders (CRs). Interobserver agreement was good (0.85). The agreement between P-TRG and MR-TRG was excellent (0.923). Significant differences in terms of overall survival (OS) and disease free survival (DFS) were found between favorable and unfavorable pathologic response groups. The automatic quantification of fibrosis determined by MR is feasible and reproducible.


INTRODUCTION
Magnetic resonance (MR) is the most accurate imaging modality to stage locally advanced rectal cancer (LARC). The role of MR in stratifying patient risk and in guiding patient management has been widely investigated [1][2][3][4][5][6]. The strength of this technique lies in its ability to distinguish normal rectal wall from pathologic tissues on the basis of the differences in signal intensity achievable on T2-weighted sequences [4].
Neoadjuvant radiochemotherapy (RCT), which is the treatment of choice in patients with LARC [7,8], induces the development of fibrosis within the tumor, which decreases the contrast with vital tissue. Thus, the use of MR for restaging after RCT is hampered by the difficulty of distinguishing post-treatment fibrosis from residual tumor, due to their very similar T2 signal intensity [9].

that in these patients, surgery may be deferred and an active surveillance can be performed [10,11]. There is no consensus on the method to identify CR after RCT [12].
Histopathological tumor regression grade (P-TRG), defined as the ratio between fibrosis and residual tumor, is routinely used to assess response to therapy and has been demonstrated to be an important predictor of patient outcome [13][14][15][16].
Previous studies [17][18][19] developed a TRG system based on MR T2-weighted sequences by applying the principles of histopathological grading (MR-TRG) and demonstrated a good correlation with patient outcome. However, in all published studies, the quantification of fibrosis was based on a visual evaluation performed by experienced radiologists, with variable results [17][18][19].
The primary aim of our study was to develop an algorithm for automatic quantification of the fibrosis induced by RCT and to evaluate whether it can be used to identify CRs. The secondary aim of the study was to use the quantitative evaluation of fibrosis to develop an MR-TRG score and to evaluate the agreement with P-TRG.

RESULTS
A total of 65 patients completed all three phases of the study (Figure 1). Twenty-three patients (35.3%) achieved complete response (pCR) at histology, while 42 patients (64.7%) achieved either partial response (pPR) or no response (pNR). No differences in terms of sex, age or tumor characteristics were observed between pCR and pP/NR. Patient characteristics are summarized in Table 1. Mean time consumed for contouring the entire tumor volume was 10 minutes (± 3.62 minutes, median: 9.5 minutes).
According to ROC analysis, a cut-off value of 81% of fibrosis was identified to discriminate between favorable and unfavorable pathologic response groups. Thus, the favorable pathologic response group included only tumors with a percentage of fibrosis equal to or greater than 81%. Accordingly, the unfavorable pathologic response group included all tumors with a percentage of fibrosis equal to or lower than 80%.
The performance of the automatic fibrosis quantification algorithm is summarized in Table 2. Using the aforementioned cut-off value of fibrosis, a sensitivity of 78.26% (95% CI: 56.3-92.5) and a specificity of 97.62% (95% CI: 87.4-99.9) were calculated (Figure 2). Eighteen of the 23 pCRs (78%) were included in the favorable group. Five pCRs (22%) were included in the unfavorable group. One pP/NR (2%) was included in the favorable group.
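As a check on the arithmetic, the reported sensitivity and specificity can be reproduced from the counts given above. The following is a minimal sketch, not the software used in the study; the function name `diagnostic_metrics` is hypothetical, and the number of true negatives (41) is an assumption derived as 42 pP/NR minus the one pP/NR placed in the favorable group.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Counts taken from the text: 18 pCR in the favorable group (true positives),
# 5 pCR in the unfavorable group (false negatives), 1 pP/NR in the favorable
# group (false positive). tn = 42 - 1 = 41 is derived, not quoted.
metrics = diagnostic_metrics(tp=18, fn=5, fp=1, tn=41)
print(round(metrics["sensitivity"], 2))  # 78.26
print(round(metrics["specificity"], 2))  # 97.62
```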
During the follow-up period (30 months), 14 patients (21.5%) died as a result of cancer-related causes. Twenty-five (38%) patients had disease progression for local recurrences with or without metastatic disease.
A significant difference between favorable and unfavorable pathologic response groups for OS and DFS

DISCUSSION
Our results demonstrated that automatic fibrosis quantification is feasible and reproducible. The proposed method provided high sensitivity and specificity for the identification of CR after neoadjuvant RCT. The automatic fibrosis quantification was able to identify CR and P/NR, with high sensitivity (78%), specificity (97%) and AUC (0.947). These results should be compared to previous studies using a visual assessment and histology as reference standard. Bhoday et al [19] correctly identified 17 out of 18 CRs in a total population of 143 patients using a cut-off value of 50% of fibrosis to classify the favorable MR-TRG group. However, using this cut-off value a large number of false positives were observed, resulting in an overall sensitivity of 15.3% and a specificity of 96.9%. Patel et al [6], using the same visual approach as Bhoday et al, reported a sensitivity of 61.7% and a specificity of 90.9% for identification of ypT0-T3a tumors, considered as a favorable result after neoadjuvant RCT. The performance of our software-based method is notable not only in terms of sensitivity and specificity, but especially because in our cohort only pT0 tumors were considered as CRs.
Other MRI biomarkers have been proposed for identification of CRs. In particular, diffusion weighted imaging (DWI) and dynamic contrast-enhanced MRI (DCE-MRI) showed the highest accuracy. A recent study combining DWI and T2w morphologic sequences reported a sensitivity of 35% and a specificity of 94% [25]. It has been demonstrated that the evaluation of DWI volumes increases the sensitivity of this biomarker up to 64% [26]. Several studies investigated the accuracy of DCE-MRI, reporting variable results. However, the evaluation of the standardized index of shape (SIS) was demonstrated to be the most accurate and reproducible method to identify CRs, with a sensitivity higher than 90% and a specificity higher than 80% [27,28].
In the era of organ preservation strategies, it is crucial to correctly identify CRs, needing a high sensitivity, but it is even more important to correctly classify P/NR, needing a high specificity, to avoid the delay of surgery in a patient with residual tumor.
Because the automatic fibrosis quantification can be used to determine MR-TRG classes, a comparison with P-TRG should be performed. We found a good correlation between the two methods (weighted kappa: 0.91), in contrast to a previously published study [29]. One possible explanation of the good results obtained is the use of a 3.0 Tesla MRI scanner, while most of the previous literature is based on datasets acquired on 1.5 Tesla systems. The higher strength of the magnetic field is, in fact, associated with a higher signal and overall higher image quality.
Despite our good results, a limitation of our approach should be underlined. The algorithm we used necessarily divides pixels into two groups: high signal for residual tumor and low signal for fibrosis. With this approach, MR-TRG 0 and 5 (respectively 100% and 0% of fibrosis) cannot be calculated; thus, we grouped p-TRG 0 and 1 as well as p-TRG 4 and 5, and we finally compared four TRG classes for both p-TRG and MR-TRG. However, a real site-to-site comparison between the two methods is difficult to achieve in clinical practice, since it would require the surgical specimen to maintain the in vivo anatomy (as in MRI), while it is preliminarily dissected and fixed in formalin. Another major difference is that the evaluation with MRI is performed on the entire tumor volume, while histopathology is performed on sample sections.
Histopathologic TRG has been shown to be an independent prognostic factor after neoadjuvant RCT [30,31]. Studies using MR-TRG demonstrated similar data, showing a significant difference in terms of OS and DFS between poor and good MR-TRG groups [6,32,33]. Our results also showed a significant difference in terms of OS and DFS between the favorable and unfavorable pathologic response groups determined with our algorithm.
One of the main limits of visual fibrosis quantification is the poor interobserver agreement. The reproducibility of the method has been previously investigated, with reported agreement between readers ranging from poor to good [18,29,34]. In these previous publications, a higher agreement was observed for identification of poor responders compared with good responders. Moreover, Patel et al [18] found disagreement between the central reviewer and the second reviewer (expert reader and less experienced reader, respectively) in the identification of complete responders, due to a higher percentage of good responders assessed by the central reviewer; this result demonstrated the influence of experience in visual assessment. In our study, to reduce potential bias related to reader expertise, we used less experienced readers (fifth-year residents in radiology), and we found a moderate interobserver agreement (ICC: 0.48) for the manual quantification of the tumor volume, which is an operator-dependent process. However, the interobserver agreement for the automatic fibrosis percentage quantification, based on the datasets manually contoured by the operators, was very good (ICC = 0.89). This result underlines the usefulness of an automatic quantification method, which is crucial to standardize the procedure.

Table 3. Exclusion criteria
Contraindication to the use of neoadjuvant therapy or surgical treatment.
Suspension of neoadjuvant combination chemotherapy-radiation treatment prior to surgery, presence of synchronous tumors, mucinous histotype, neurological or psychiatric disorders or previous pelvic radiotherapy.
Hypersensitivity to the study drug or to one of the excipients.
Legal incapacity.
Concurrent treatment with experimental drugs or participation in another clinical trial with any investigational drug within 30 days before study screening.
Alcohol or drug abuse.

IV administration of 2 ml/kg of body weight of gadolinium chelate followed by a 15 ml saline flush at a rate of 2 ml/s.
The histogram shows the distribution of pixels on the basis of their signal intensity.

The time needed to draw the ROI around the margins of the tumor on all axial slices, about 10 minutes per dataset, should be considered a limit of the proposed method. However, despite this drawback, the computing system we used takes only a few seconds to perform the analysis.
Despite the excellent performances of the proposed method, we should underline some limitations. First, the quantification of the percentage of fibrosis was a retrospective process performed on a prospectively recruited population. Second, the study population is relatively small. Nevertheless, our sample size is similar to the ones reported in most of the previous studies. Third, we did not validate the method on a control group. Finally, we applied our approach only to T2 weighted images. Theoretically, the algorithm we used also works with other sequences like DWI or perfusion maps. However, this would require a dedicated study.
In conclusion, our results demonstrated that automatic fibrosis quantification with MRI is feasible, provides better results compared to visual assessment and can be considered a reliable method to identify CRs after neoadjuvant RCT.

Subjects
The study was conducted according to Good Clinical Practice (GCP)-International Conference on Harmonization (ICH) [20]. All patients signed a written informed consent to be enrolled in the study. The protocol was approved by the Local Ethical Committee (Rif. 2737/28.03.2013).
All patients were enrolled between May 2013 and December 2014.
Patients with histologically confirmed rectal adenocarcinoma (Stage II and Stage III according to the International Union Against Cancer (UICC) classification [21]) were included in this non-randomized, prospective, multi-center trial (two centers). Exclusion criteria are listed in Table 3.
All patients underwent optical colonoscopy with biopsy for immunohistochemical analysis and an MR study for tumor staging. Two weeks after staging, patients started the neoadjuvant RCT protocol. An MR study for restaging after RCT was performed within one week before surgery. All patients underwent total mesorectal excision (TME) 6-8 weeks after the end of RCT. All gross specimens were analyzed by one single pathologist. Patients were followed up for thirty months at intervals of three months with physical examination, routine blood tests and yearly whole body computed tomography to assess local recurrences or distant metastases.

MR protocol and image analysis
All MR examinations were performed on a 3T scanner (Discovery MR750, General Electric, Milwaukee, Wisconsin, USA) with a phased-array coil, using the protocol described in Table 4.
For the purpose of this study, only T2 weighted images acquired after neoadjuvant RCT were analyzed.
Following a previous experience [22], we decided to use the K-means algorithm for the automatic quantification of fibrosis. This algorithm, implemented in in-house software developed in MATLAB (The MathWorks Inc., Natick, Massachusetts, United States), automatically partitions n data points into k mutually exclusive clusters (k < n). The number of partitions is set by the operator. We set k = 2 to cluster pixels into two partitions on the basis of their median signal intensity: a high signal intensity partition assumed to represent the residual tumor and a low signal intensity partition assumed to represent fibrosis (Figure 4).
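The two-partition clustering described above can be illustrated with a short sketch. This is not the in-house MATLAB software used in the study: it is a plain NumPy version of the standard K-means (Lloyd) iteration with k = 2 applied to one-dimensional signal intensities, and the function name `two_cluster_kmeans`, the min/max initialisation and the toy intensities are assumptions made for illustration only.

```python
import numpy as np


def two_cluster_kmeans(intensities, max_iter=100):
    """Split 1D pixel intensities into a low (0) and a high (1) cluster."""
    x = np.asarray(intensities, dtype=float).ravel()
    # Assumed initialisation: start the two centroids at the extreme values.
    centroids = np.array([x.min(), x.max()])
    labels = np.zeros(x.size, dtype=int)
    for _ in range(max_iter):
        # Assignment step: label 1 if the pixel is closer to the high centroid.
        labels = (np.abs(x - centroids[1]) < np.abs(x - centroids[0])).astype(int)
        # Update step: each centroid becomes the mean of its assigned pixels.
        new_centroids = centroids.copy()
        for k in (0, 1):
            if np.any(labels == k):
                new_centroids[k] = x[labels == k].mean()
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy intensities (not study data): three dark pixels and three bright pixels.
labels, centroids = two_cluster_kmeans([10, 12, 11, 90, 95, 92])
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this sketch, label 0 would correspond to the low signal intensity partition (assumed fibrosis) and label 1 to the high signal intensity partition (assumed residual tumor).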
Before the automatic analysis, a manual contouring of the entire tumor volume was performed on each axial section of T2-weighted images.
Tumor volume was considered as the entire appreciable mass, including both fibrosis and vital tissue and excluding the lumen of the colon.
Two fifth-year residents in radiology (each with 1 year of experience in MR of the rectum) separately and manually contoured each lesion with 3D Slicer, a free platform for biomedical research (Brigham and Women's Hospital, Boston, MA, USA). Time consumed to contour the entire tumor volume was recorded.
The manual contouring process provided as output the total tumor volume in cubic millimeters. The contoured dataset was processed with the k-means algorithm which provided two outputs, the volumes of low (fibrosis) and high (residual tumor) signal intensity pixels in cubic millimeters.
The percentage of fibrosis within the total tumor volume was calculated according to the following formula:
(Fibrosis volume / Tumor volume) x 100
Then, we divided tumors into four classes, according to the P-TRG described by Dworak-Rodel, on the basis of the percentage of fibrosis developed after RCT. Thus, MR-TRG class 1 included all tumors with ≤ 25% of fibrosis, class 2 those between 26% and 50%, class 3 those between 51% and 75% and class 4 those with ≥ 76%.
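The formula and the four-class grouping can be sketched as two small functions. The names `fibrosis_percentage` and `mr_trg_class` are hypothetical, and the handling of non-integer percentages that fall between the stated ranges (for example 25.4%) is an assumption: here each upper bound is treated as an inclusive threshold.

```python
def fibrosis_percentage(fibrosis_volume_mm3, tumor_volume_mm3):
    """(Fibrosis volume / Tumor volume) x 100, with volumes in cubic millimeters."""
    if tumor_volume_mm3 <= 0:
        raise ValueError("tumor volume must be positive")
    return 100.0 * fibrosis_volume_mm3 / tumor_volume_mm3


def mr_trg_class(percent_fibrosis):
    """Map a percentage of fibrosis to one of the four MR-TRG classes."""
    # Assumption: values between the integer ranges in the text fall into
    # the lower class (e.g. 25.4% is treated as class 2 only above 25).
    if percent_fibrosis <= 25:
        return 1
    if percent_fibrosis <= 50:
        return 2
    if percent_fibrosis <= 75:
        return 3
    return 4


# Toy volumes (not study data): 810 mm3 of fibrosis in a 1000 mm3 tumor.
pct = fibrosis_percentage(810, 1000)
print(pct, mr_trg_class(pct))  # 81.0 4
```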

P-TRG Assessment
A pathologist, blinded to MR and biopsy findings, analyzed all the gross specimens in random order. The rectal segment harboring the neoplasm was examined by sectioning orthogonal to the long axis, obtaining 2-3 mm thick macro section specimens. According to the Dworak-Rodel technique, tumor regression was semiquantitatively assessed by the amount of viable tissue versus the amount of fibrosis, ranging from no evidence of fibrosis to a complete response with no residual tumor identifiable [23].

Neoadjuvant radiochemotherapy
Radiochemotherapy was administered following the standard of care in our hospital [24]. Radiation therapy was performed with a fractionated 3D conformal technique (45 Gy in 5 weeks) to the whole pelvis. An additional dose of 5.4-9 Gy was administered to the tumor volume in 3-5 days (6-15 MV energy photons).
Chemotherapy was administered through a central venous access (port-a-cath) as follows: 5 or 6 cycles of oxaliplatin (2-hour infusion, 50 mg/m2) on the first day of each week of radiotherapy, followed by five daily continuous infusions of 5-FU 200 mg/m2/day. Oxaliplatin infusion was preceded by dexamethasone (8 mg) and ondansetron (8 mg) administration. Toxicity was evaluated according to NCI-CTC version 3.0.

Statistical analysis
All continuous variables were expressed as median and mean ± standard deviation (SD).
The percentage of fibrosis developed after RCT was automatically determined by the K-means algorithm. An optimal cut-off value of percentage of fibrosis for separating patients into favorable and unfavorable pathologic response groups was identified by receiver operating characteristic (ROC) analysis, using the Youden index. This was done by plotting the percentage of fibrosis as an absolute value against the result of histology, dichotomized as complete responder (CR) or partial/non-responder (P/NR).
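The Youden index selects the threshold that maximizes J = sensitivity + specificity - 1. The sketch below illustrates the idea only; it is not the MedCalc procedure used in the study. The function name `youden_cutoff`, the rule "positive if value ≥ threshold" and the toy data are assumptions, and both outcome classes must be present in the input.

```python
import numpy as np


def youden_cutoff(values, is_positive):
    """Return the threshold maximizing J = sensitivity + specificity - 1."""
    values = np.asarray(values, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    best_j, best_t = -1.0, None
    # Every observed value is tried as a candidate threshold.
    for t in np.unique(values):
        predicted = values >= t  # assumed rule: at or above threshold is positive
        sensitivity = np.mean(predicted[is_positive])
        specificity = np.mean(~predicted[~is_positive])
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_t = j, float(t)
    return best_t, float(best_j)


# Toy data (not study data): percentages of fibrosis and CR status at histology.
percent_fibrosis = [85, 90, 95, 40, 60, 70]
complete_responder = [True, True, True, False, False, False]
cutoff, j = youden_cutoff(percent_fibrosis, complete_responder)
print(cutoff, j)  # 85.0 1.0
```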
Diagnostic performance for the identification of CR was calculated by means of ROC curves. The area under the curve (AUC), sensitivity (SE), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV) were calculated.
Differences in terms of overall survival (OS) and disease free survival (DFS) between favorable and unfavorable pathologic response groups were calculated by using Kaplan-Meier product limit method with univariate log-rank test.
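For readers unfamiliar with the product limit method, the estimator can be sketched in a few lines. This is an illustrative NumPy version, not the commercial software used for the analysis; the function name `kaplan_meier` and the toy follow-up times are assumptions, and the log-rank comparison between groups is not implemented here.

```python
import numpy as np


def kaplan_meier(times, events):
    """Product limit survival estimate at each distinct event time.

    times  : follow-up time for each patient (e.g. months)
    events : True if the event occurred, False if the patient was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        n_events = np.sum((times == t) & events)  # events occurring exactly at t
        s *= 1.0 - n_events / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Toy data (not study data): two events, three censored observations.
t, s = kaplan_meier([6, 12, 12, 30, 30], [True, True, False, False, False])
print(t.tolist(), np.round(s, 2).tolist())  # [6.0, 12.0] [0.8, 0.6]
```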
Since the process includes two steps, consisting of manual contouring of the tumor and the automatic quantification of the percentage of fibrosis, the reproducibility was evaluated for each step by means of intraclass correlation coefficient (ICC).
Weighted Kappa statistic was performed to evaluate the agreement between MR-TRG and P-TRG classes.
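A weighted kappa penalizes disagreements in proportion to the distance between the two assigned classes. The sketch below assumes linear weights, since the weighting scheme is not stated in the text; the function name `weighted_kappa` and the toy ratings are likewise assumptions. The statistic is undefined when the expected disagreement is zero (for example when all cases fall in a single class).

```python
import numpy as np


def weighted_kappa(rater_a, rater_b, n_classes, weights="linear"):
    """Weighted kappa for ordinal classes coded 1..n_classes."""
    a = np.asarray(rater_a) - 1
    b = np.asarray(rater_b) - 1
    # Observed joint distribution of the two classifications.
    observed = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Expected joint distribution under independence of the marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Disagreement weights grow with the distance between classes.
    idx = np.arange(n_classes)
    distance = np.abs(idx[:, None] - idx[None, :]) / (n_classes - 1)
    disagreement = distance if weights == "linear" else distance ** 2
    return 1.0 - (disagreement * observed).sum() / (disagreement * expected).sum()


# Toy classes (not study data): four cases graded on a four-class scale.
print(weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], n_classes=4))  # 1.0
```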
Statistical analyses were carried out using commercially available statistical software (MedCalc Statistical Software version 16.4.3, MedCalc Software bvba, Ostend, Belgium and GraphPad Prism version 5.0, GraphPad Software, La Jolla, California, USA). A two-tailed P < 0.05 was considered statistically significant.

Author contributions
Marco Rengo: data collection and analysis and manuscript editing.
Simona Picchia: data analysis and manuscript editing.
Simona Marzi: data analysis.
Vincenzo Tombolini: supervisor of the radiotherapy.
Andrea Laghi: corresponding author and supervisor of the entire study.