Molecular stratification of metastatic melanoma using gene expression profiling: Prediction of survival outcome and benefit from molecular targeted therapy.

Melanoma is currently divided on a genetic level according to mutational status. However, this classification does not optimally predict prognosis. In prior studies, we have defined gene expression phenotypes (high-immune, pigmentation, proliferative and normal-like), which are predictive of survival outcome as well as informative of biology. Herein, we employed a population-based metastatic melanoma cohort and external cohorts to determine the prognostic and predictive significance of the gene expression phenotypes. We performed expression profiling on 214 cutaneous melanoma tumors and found an increased risk of developing distant metastases in the pigmentation (HR, 1.9; 95% CI, 1.05-3.28; P=0.03) and proliferative (HR, 2.8; 95% CI, 1.43-5.57; P=0.003) groups as compared to the high-immune response group. Further genetic characterization of melanomas using targeted deep-sequencing revealed similar mutational patterns across these phenotypes. We also used publicly available expression profiling data from melanoma patients treated with targeted or vaccine therapy in order to determine if our signatures predicted therapeutic response. In patients receiving targeted therapy, melanomas resistant to targeted therapy were enriched in the MITF-low proliferative subtype as compared to pre-treatment biopsies (P=0.02). In summary, the melanoma gene expression phenotypes are highly predictive of survival outcome and can further help to discriminate patients responding to targeted therapy.

p-value <0.01 were kept if present in minimum 80% of the samples. Of the 13955 probes retained, 13744 probes corresponded to RefSeq features. These RefSeq features mapped to 11051 unique genes and the most varying probe for each gene was kept in the data. In order to compare gene expression levels across samples, the 11051 probes were mean centered. The centroids from Harbst et al. was used to classify the samples into any of the four recently identify melanoma subtypes [2]. In detail, the nearest centroid classification was used to classify a sample according to the maximum correlation of the sample with the four centroids (Pearson's correlation > 0.2, otherwise set as "unclassified").

Network construction and module expression
A gene co-expression network was created based on correlations between the genes in the data set and five modules of highly connected genes were extracted [3]. Briefly, the initial network was created by connecting nodes (genes) by edges (representing correlations) using a correlation cutoff of 0.6 and only including genes with five or more neighbors. Based on gene ontology analysis and published associations with melanoma-specific tumor biology, the five modules were further entitled as: the micropthalmia-associated transcription factor (MITF) module, including genes such as MITF, MLANA, SILV, CDK2 and ETV5; the cell cycle module, containing genes related to the M phase, such as CCNB2 and CENPF; the stroma module consisting of extracellular matrix-related genes; the immune response module representative of genes enrolled in tumor infiltrating lymphocyte and antigen presentation; and finally also an interferon module consisting of genes responsive of interferon treatment. The resulting gene expression landscape consisting of 394 genes scattered around the five core modules was visualized in Cytoscape ( Figure 1B) [4]. Module activity scores were calculated for all samples, i.e. mean expression values of all genes representing the particular modules.

SureSelect deep targeting sequencing
Target enrichment design, library preparation and data processing were performed as previously described [5]. Briefly, 1697 frequently mutated cancer-associated genes were selected based on information in the COSMIC database and in the literature to create the SureSelect target enrichment design (5.5 MB coverage by 120 bp-long tiling probes). DNA samples were sheared, end-repaired and given a unique adapter (sample specific barcode) before PCR-amplification. The adapter-ligated fragments were subjected to target enrichment (SureSelect, Agilent), PCR-amplification and sequencing according to Illumina Paired-End Sequencing Library Protocol on a HighSeq2000.

Principal component analysis (PCA)
PCA was used to assess that the variation in the gene expression data was due biological factors and not systematic experimental artifacts. Briefly, PCA functions by reducing the dimensionality of the data by extracting principal components (PCs), i.e. uncorrelated linear vectors further explaining the variation in the data [6]. The PCs are then tested for association with the biological and technical variables to elucidate their impact on the variation in the data. We performed PCA by using the swamp package in R [7]. Results are shown as a heatmap of the associations between PCs and sample annotations ( Figure S4).

Immunohistochemistry (IHC)
A subset of the melanoma tumors (n = 59), representing the four major gene expression phenotypes, was selected for immunohistochemistry. After formalin-fixed paraffinembedding, sections (4 µm) were captured and prepared according to standard procedures. Staining was performed using antibodies against MITF (clone: C5, Thermo  The data was quantile-normalized between samples using limma [8], given an offset of 32, capped at 65,000, log-transformed and median-centered over genes. As described in the above section "Gene expression analysis and profiling", samples were classified into any of the four melanoma subtypes using the nearest centroid classification (Pearson's correlation > 0.1, otherwise set as "unclassified"). The 438 available centroid genes found in the data were monitored for batch effects using TCGA's technical annotations.
The gene expression phenotypes and their clinical relevance were evaluated in three independent external datasets obtained from the Gene Expression Ominbus (GEO) repository (GSE50509 [10]; GSE61992 [11]; GSE35640 [12]). In the first study, 21 patients were treated with the BRAF inhibitors (BRAFi) dabrafenib or vemurafenib and evaluated for best objective response (RECIST response, %), progression-free survival (PFS) and screened for resistance mechanisms [10]. In total, 50 specimens were being analyzed in this study including 21 pre-treatment tumors and 29 post-relapse tumors. In the second study, 10 patients were treated with a combination of BRAFi (dabrafenib) and the MEK inhibitor (MEKi) trametinib, and analyzed for similar endpoints as in the prior study. However, gene expression data was only available for 9 patients, thus including 19 specimens in total (9 pre-treatment tumors and 10 post-relapse tumors) [11]. In the third study, 56 patients were analyzed for a pre-treatment gene expression signature predictive of response to MAGE-A3 immunotherapeutic treatment and each patient were assigned a status of responder or non-responder [12].
Genome-wide gene expression microarray was performed using Illumina Human HT-12 v4 BeadChip arrays (Rizos' and Long's studies) and Affymetrix HG-U133.Plus 2.0 system (Ulloa-Montoya's study) [10,12]. We performed similar preprocessing of the three already normalized datasets obtained from GEO including addition of a probe presence filter, log2 transformation and KNN-imputation (the latter step was only performed in Rizos' and Long's data). In detail, for samples run on the Illumina platform, we removed probes when less than 80% of the samples had a detection p-value < 0.01, whereas in the Affymetrix data the cutoff was set to 0.1.

Combining datasets using Distance Weighted Discrimination (DWD)
DWD is a well-known classification method, which has also proven to be very useful in microarray datasets to compensate for systematic biases [13,14]. Based on the findings that molecular portraits are conserved across platforms [15], we combined our dataset with the above external datasets and applied DWD adjustment before performing nearest centroid classification. Initially, all external datasets were preprocessed as described in the above section "Validation sets" and pairwise analyzes were performed between our dataset and the external datasets (pairwise merging).
Briefly, common centroid genes across the itemed datasets were extracted (Our/Rizos, 260 genes; Our/Long, 281 genes; Our/Ulloa-Montoya, 220 genes). Individual datasets (genes not mean centered) were combined and adjusted in a pairwise manner using the InSilicoMerging package in R calling the "DWD" method. The genes in the pairwise combined and adjusted data were mean centered across all the data and further classified. In order to check for remaining source biases, the data were visualized after DWD adjustment in a multidimensional scaling (MDS) plot ( Figure S5) but also analyzed using hierarchical clustering to conclude sample dispersion in a source independent manner (results not shown).

All statistical analyzes were performed using R (two-sided tests). Fisher's exact test and
Kruskal-Wallis test were performed to compare gene expression phenotypes with clinical characteristics. All survival analyses were made using the survival package in R.
In addition, we performed non-parametric Kruskal-Wallis tests to analyze the module activity scores against clinical characteristics.     p-value) less than 0.01 or 0.05 for each sample.

Supplementary Figure 5
Visualization of potential sample source bias after pairwise merging of datasets using inSilicoMerging (R package) calling DWD adjustment. The DWD adjustment was performed using common centroid genes across the itemed datasets (Our/Rizos, 260 genes; Our/Long, 281 genes; Our/Ulloa-Montoya: 220 genes).