Quantitative radiomic profiling of glioblastoma represents transcriptomic expression

Quantitative imaging biomarkers have increasingly emerged in the field of research utilizing available imaging modalities. We aimed to identify good surrogate radiomic features that can represent genetic changes of tumors, thereby establishing noninvasive means for predicting treatment outcome. From May 2012 to June 2014, we retrospectively identified 65 patients with treatment-naïve glioblastoma with available clinical information from the Samsung Medical Center data registry. Preoperative MR imaging data were obtained for all 65 patients with primary glioblastoma. A total of 82 imaging features including first-order statistics, volume, and size features, were semi-automatically extracted from structural and physiologic images such as apparent diffusion coefficient and perfusion images. Using commercially available software, NordicICE, we performed quantitative imaging analysis and collected the dataset composed of radiophenotypic parameters. Unsupervised clustering methods revealed that the radiophenotypic dataset was composed of three clusters. Each cluster represented a distinct molecular classification of glioblastoma; classical type, proneural and neural types, and mesenchymal type. These clusters also reflected differential clinical outcomes. We found that extracted imaging signatures does not represent copy number variation and somatic mutation. Quantitative radiomic features provide a potential evidence to predict molecular phenotype and treatment outcome. Radiomic profiles represents transcriptomic phenotypes more well.


INTRODUCTION
Glioblastoma (GBM) is the most common and fatal type of glioma, accounting for 40% of all malignant primary brain tumors and showing a median survival of 14 months despite treatment by surgical resection followed by concurrent chemoradiotherapy. This dismal prognosis is largely attributed to the cancer heterogeneity of GBM [1]. Thus, great effort has been made to advance technology in the fields of molecular and genomic biology in order to better characterize individual cancer.
Quantitative magnetic resonance (MR) imaging can be defined as the extraction of quantifiable features from advanced magnetic resonance images and quantitatively describes the tumor phenotypes [2,3]. At the initial diagnosis or during treatment, quantitative imaging biomarkers can represent a biomarker of normal biological or pathological processes or treatment response to therapeutics [2]. To date, MR-based morphological measurements using T1 contrast-enhancement or T2 FLAIR imaging are well-known and widely accepted quantitative methods under some clinical situations such as Alzheimer disease, osteoarthritis, and a variety of cancers [2]. Several studies have demonstrated that MR imaging signatures can indicate clinical outcomes in high-grade gliomas [4]. However, few studies have demonstrated that physiologic biomarkers derived from conventional imaging data are closely associated with clinical outcome or have investigated their association with the underlying genetic messages [5]. Recently, rapid developments in advanced imaging analysis for tumor classification or quantification of tumor vascularity and permeability have been introduced. In radiographic studies of glioblastoma, physiologic imaging biomarkers are gradually being uncovered and are believed to be crucial for predicting the treatment response of the tumor or natural course of tumor progression.
Here, we report a comprehensive radiogenomic study of glioblastoma based on integration of quantitative largescale radiomic profiling and genomic data. We integrated structural and physiologic MR imaging data with multiplatform genomic data from 65 glioblastomas. We also evaluated quantitative large-scale radiomic profiling to identify statistically significant associations between genomic features and radiomic signatures in glioblastomas.

Patient characteristics
From May 2012 to June 2014, we retrospectively identified 65 patients with treatment-naïve GBM with available clinical information from the Samsung Medical Center data registry. The median age of the patients was 58.7 years (range, 29-74 years), and the population was composed of 35 males and 30 females. 46 of 65 patients (70.8%) were expired at the investigation and median overall survival of the total population was 13.2 months (95% CI 9.3-17.6 months). The metadata for the 65 GBM samples are provided in Table 1.

Consensus clustering identifies three subtypes of GBM
We performed a quantitative study to investigate the association of genomic features such as gene expression or copy number variation with 82 radiomic phenotypes (Table  1). Table 1 summarizes the name and brief description of all 82 radiomic profiles. A total of 5,330 quantitative MR imaging features (representing first-order statistics, size, and volume) were extracted from volume masks to maximize characterization of the tumor (82 x 65, no of features x no of patients). Using a high-throughput approach, we identified radiomic signatures consisting of 82 MR imaging features that segregated patients with GBM into three distinct consensus clusters ( Figure 1A). The similarity matrix formed by Pearson's correlation coefficients of the 65 samples suggested three robust stable clusters. Based on the gene functions of discriminatory mRNA transcripts, radiomic clustering represented interpretation in the context of existing molecular subclassification of GBM.

Association between genetic pathway and radiomic phenotypes
For each of the 82 radiomic phenotypes, we investigated the Gene Set Enrichment Analysis (GSEA) for identifying the KEGG pathways whose transcriptional changes associated with the change of tumor radiomic phenotype. The gene-level statistics used to characterize the association between gene expression and radiomic phenotype were obtained by the p-value resulted from the Spearman rank correlation test. ClaNC, a nearest centroidbased classifier that balances the number of genes per class, identified signature genes for all three subtypes. An 840 gene signature (210 genes per class), was established from the smallest gene set with the lowest cross validation (CV) and prediction error. As a proof of concept, we showed that a significant number of genes overexpressed in radiomic subgroup 1 were associated with anion channel activity, peroxisome, and the classic subtype of GBM suggested by Verhaak et al. [1]. Group 2 was characterized by high expression of genes associated with mitosis, cell cycle, ribosomes, and the proneural or neural subtype. The third group showed overexpression of extracellular matrix molecules, defense response, immune signaling molecules, and the mesenchymal subtype ( Figure 1B).

Sample-specific differentially expressed genes
We investigated the correlation matrix between radiomic data and gene expression profiles in the heatmap (Figure 2A). We found that each radiomic www.impactjournals.com/oncotarget profile representing volume, blood flow or volume and diffusion had his own molecular characteristics.
To identify pathways that are differentially correlated with radiomic phenotypes, using these four gene sets, a single-sample GSEA (ssGSEA) enrichment score was calculated for all samples. We used ssGSEA projection as a hypothesis-generating gene set identification tool. Single sample GSEA identifies biologically relevant gene sets that correlate with a radiomic phenotypes by estimating pathway activities of the radiomic phenotypes in this study. We performed an unsupervised clustering analysis to associate transcriptional activities of all genetic pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database with radiomic phenotypes ( Figure  2B). As a result, we found each physiologic or structural MR imaging features showed its own transcriptomic pathway. Specifically, pathway transcriptional activities were associated with radiomic phenotypes with statistical significance. Radiomic features related to CBV and CBF showed upregulation of Wnt, PDGF, EGFR, ALK, Rb, VEGF, and Myc pathways and downregulation of PTEN pathway (p < 0.01). In contrast, radiomic features representing volumetric parameters based on T1CE, T2FLAIR, or ADC demonstrated an inverse relationship with radiomic features representing CBV or CBF.

Associations between somatic gene mutations/ CNV and radiomic phenotypes
We compared measurements of a radiomic phenotype for patients harboring somatic mutations in a gene versus those without. For each mutation/ CNV and each radiomic phenotype, a linear regression was used to fit the value of the radiomic phenotype and examine whether the mutation has a significant effect on the phenotype. However, the number of patients  with exactly the same mutation was quite small and might not provide reliable statistics. Compared to the transcriptional activities of genetic pathways, the CNVs and representative somatic mutations showed far fewer statistically significant associations (Figure 3).

DISCUSSION
Radiogenomics investigates the association between imaging phenotypes and genomic features and is an emerging field in cancer medicine. As an inexpensive and noninvasive method, quantitative radiomics can provide the great promise of personalized medicine by inferring underlying genomic phenotypes. It can also serve as an imaging biomarker to characterize the underlying genomics. If the radiomic data can represent the temporal-spatial intra/inter tumoral heterogeneity, repeated MR imaging would be a powerful tool to show the molecular status of a tumor sample using a noninvasive technique [9]. In the field of radiologic studies of glioblastomas, physiologic MR imaging and structural MR imaging provide invaluable information about the tumor characteristics. For instance, dynamic susceptibility contrast (DSC) perfusion MR imaging is commonly used in clinical practice to interpret the tumor microvasculature for the purpose of estimation of rCBV or MTT for cerebral blood flow or volume. In addition, apparent diffusion coefficient (ADC) imaging is another important MR imaging biomarker that represents the cellular density and the movement of free fluid water. Interpretation of advanced physiologic imaging requires specialized well-trained radiologists; however, interpretation is still subjective and can produce potential inter-observer variability. To overcome this drawback, attempts to quantify the imaging data have aimed to identify and characterize imaging biomarkers, but prediction of the genomic characteristics remains challenging.
To date, several studies have reported that imaging features represent the clinical outcome in patients with malignant glioma [4,[9][10][11][12][13][14][15][16][17]. Most studies have investigated clinical application in glioblastomas by quantification of structural imaging such as volume or shape [13,15,[18][19][20][21][22][23], while comprehensive integration of physiologic imaging such as diffusion or perfusion imaging, clinical outcome, and genomic profiles has been rarely reported [4,9,16,24,25]. Integrated analyses using gene expression, copy number, methylation, and somatic mutation patterns have demonstrated distinct GBM subtypes, possibly related to the treatment response and clinical outcome [1,26]. Several authors have demonstrated that specific molecular subtypes in glioblastoma correlate with certain imaging traits [22,27]. We present a comprehensive radiogenomic study of glioblastoma based on the integration of multi-omics molecular data from our genome data bank with MRIbased radiomic data for 65 glioblastomas. We found that radiomic data could provide information about molecular subtypes of GBM based on the transcriptomic profile. In addition, the radiomic group associated with mesenchymal subtypes showed a worse prognosis than other groups. Compared with the close relationship between transcription level and radiomic phenotype, we identified far fewer statistically significant associations for CNVs and gene somatic mutations. This finding was compatible with the results of a breast cancer study by Zhu et al. [28]. Genetic mutations are further upstream in the functional activities of the cellular system, and Zhu et al. suggested that gene expressions are therefore more closely associated with phenotype in the process of genetic events influencing phenotype development. On unsupervised clustering analysis between radiomic features and transcriptomic profiles, multiple parameters representing CBV and CBF had similar transcriptomic patterns and parameters, indicating that MTT and TTP have their own genomic signatures. Volumetric radiomic data had a similar genomic pattern to ADC. This finding demonstrates that specific radiomic phenotypes have similar genomic signature patterns.
However, our study has some limitations for interpreting the association between genomic signature and imaging phenotype. Genomic data in this study were generated using a single tissue sample from each primary tumor. A single biopsy sample of a tumor cannot represent a heterogeneous cell population; therefore, the genomic data of a single sample only partially reflects the overall genomic landscape of the entire tumor. In the future, topographic analysis of radiomic data combined with multiple biopsies will provide more intensive and accurate information. Through the study of radiogenomics, we aimed to identify good surrogate radiomic features that can reveal genetic changes of tumors, thereby establishing noninvasive means for monitoring tumor progression. We believe that our initial results provide the motivation to investigate the relationships between the multi-layer molecular system of the tumor and the various quantitative radiomic phenotypes of glioblastoma.

Patient enrollment
Between May 2012 to June 2014, we retrospectively identified 65 patients who met the following criteria: (1) previously untreated and histologically confirmed grade IV GBM according to the World Health Organization (WHO) classification; (2) available clinical variables including patient demographics; (3) available genomic data such as transcriptome and exome; and (4) available preoperative MR imaging data consisting of pre-contrast axial T1-weighted (T1), post-contrast axial T1-weighted (T1CE), T2-weighted fluid attenuation inversion recovery (T2FLAIR) images, MR-diffusion weighted imaging (DWI) for assessment of apparent diffusion coefficient (ADC), and MR-perfusion weighted imaging (PWI) for assessment of relative cerebral blood flow (rCBF), relative cerebral blood volume (rCBV), mean transit time (MTT), and time to perfusion (TTP). These data were obtained from the medical records at Samsung Medical Center, Seoul, Korea. Patients with recurrent GBM, secondary GBM, grade III glioma, or previous history of treatment were excluded from this study. All tissue samples were collected with written informed consent under a protocol approved by the Institutional Review Board (IRB) of Samsung Medical Center (2010-04-004, Seoul, Korea). All patients were treated by the concomitant chemo-radiotherapy followed by adjuvant Temozolomide 6 cycles. Enrolled patients in this study were not enrolled for any clinical trials.

MRI data acquisition and preprocessing
All MR imaging was preoperatively conducted on a 3-T scanner, and post-contrast images were acquired 5 minutes after injection of contrast agent. The standard MRI protocol included axial T1-weighted imaging, T2-weighted imaging, fluid-attenuated inversion recovery (FLAIR), perfusion-weighted, and diffusion-weighted MR images. Diffusion-weighted images (DWI) were acquired before injection of contrast and were obtained with TE/TR 80 ms/3 s, section thickness 5 mm with 1 mm intersection gap, matrix size 164 × 162, and FOV 24 cm using monopolar spin-echo echo-planar preparation. Apparent diffusion coefficient (ADC) images were calculated from acquired DWI with b-values of 0 s/mm2 and 1,000 s/mm2, and dynamic susceptibility contrast perfusion-weighted Values are number (%), median (range), or mean ± SD.
images (PWI) (TR/TE 1720/35 ms, flip angle 40°, section thickness 5 mm, acquisition matrix 128 × 128, 50 volumes, acquisition time 1 minute 30 seconds) were performed. ADC maps were generated using an EWS Workstation (Philips Healthcare), and dynamic susceptibility contrast perfusion images were processed using a dedicated software package (nordicICE; NordicNeuroLab, Bergen, Norway). All Digital Imaging and Communications in Medicine (DICOM) images were adjusted for spatial smoothing, noise filtering, and motion correction as supported in the dedicated software. Gamma variate fitting was applied to avoid a recirculation effect and dynamic curves were corrected mathematically to reduce contrast agent leakage effects.

Volumetric segmentation and ROI analysis
For region of interest (ROI) analysis, T1CE and T2FLAIR images were evaluated to define tumor margins by manual segmentation. The image processing software package (nordicICE; NordicNeuroLab, Bergen, Norway) was used for the manual ROI-based image segmentation and volume measurement. In detail, two ROI were obtained from both T1CE-and T2FLAIR-based image in each patient. An ROI was drawn on each slice where the contrast-enhancing area on the T1CE image and high signal intensity area on the T2FLAIR image were visible. Each tumor lesion was segmented by two independent neuro-radiologists (ST Kim and HY Kim) for both T1CE and T2FLAIR modalities. Regions with central necrotic area and normal vascular structures were subtracted from ROIs. Single ROIs from each case were selected for further analysis with consensus between reviewers. Volumes of masked ROIs were also validated with a fully automated method using BraTumIA v1.2 [6], which shows results consistent with previous studies [7,8]. All MR images were transferred to a high-performance cluster server for post-processing. After post-processing of perfusion and diffusion-weighted images to generate rCBV and ADC maps from both T1CE and T2FLAIR images (5%, 25%, 75%, 95% of total intensity, mean, standard deviation and median value), histogram statistics from voxels in the ROI were extracted. Several volumetric parameters (number of pixels in mask on ROI, mask area (cm 2 ) on ROI, mask volume (mL) on ROI from T1CE and T2FLAIR) were separately investigated, obtaining segmental volumes of contrast-enhancing tumor region and edematous or infiltrative peritumoral T2FLAIR region, and other relevant volumetric parameters. Overall, 82 volumetric and physiologic parameters were investigated in this study (Table 2).

Tissue specimens
Following receipt of informed consent in accordance with the appropriate IRB, glioblastoma specimens were obtained from patients undergoing surgery. For genomic analysis, tumor specimens that were diagnosed by the pathologists were snap-frozen and preserved in liquid nitrogen. Genomic DNA and mRNA were extracted using the DNeasy kit and the RNeasy kit (Qiagen), respectively for whole exome and transcriptomic sequencing.

Next-generation RNA sequencing (RNA-Seq)
RNA-Seq was performed for all 65 patients. RNA-Seq-based transcriptome profiling was performed by the Samsung Institute for Intractable Cancer Research (Seoul, Korea) using the Illumina TrueSeq RNA Sample Prep kit. Trimmed sequenced reads of 30 nucleotides (nt) were mapped on hg19 using GSNAP and the resulting alignment SAM files were summarized into BED files using SAMtools and bedTools (bamToBed). DEGseq R package was used to calculate RPKM (Reads Per Kilobase of transcript per Million reads) values from the hg19 refFlat file downloaded from the UCSC genome browser and the BED files during the per nucleotide coverage analysis. Log transformation was applied to correct for the skewed distribution.

Somatic mutation
The sequenced reads in FASTQ files were aligned to the human genome assembly (hg19) with Burrows-Wheeler Aligner version 0.6.2. Before mutation calling using SAMtools, Picard version 1.73, and Genome Analysis ToolKit (GATK version 2.5.2.), the initial alignment BAM files were subjected to conventional preprocessing We used MuTect (version 1.1.4), SomaticIndelDetector (GATK version 2.2) and Variant Effect Predictor (VEP).

Copy number variation
The ngCGH python package was used to generate aCGH-like data from whole exome sequencing (WES). Matching normal samples were used as the reference for calculating copy number variations in tumors. Segmentation and copy number calculation of each gene were performed.

Data analysis
Unsupervised hierarchical clustering was performed on the radiomic phenotypes extracted from the quantitative MR imaging data using NordicICE software program. Before clustering analysis, all radiomic phenotypes were standardized to have a zero mean and a unit standard deviation for clustering analysis and producing heatmap. We then performed consensus clustering (Pearson's correlation coefficients) on the radiomic phenotypes. We selected k = 3 as it gives sufficiently stable similarity matrix. Putative candidates of which radiomic phenotypes correlated to mRNA gene expression were identified to determine