Cross-species genomic and epigenomic landscape of retinoblastoma.

Genetically engineered mouse models (GEMMs) of human cancer are important for advancing our understanding of tumor initiation and progression as well as for testing novel therapeutics. Retinoblastoma is a childhood cancer of the developing retina that initiates with biallelic inactivation of the RB1 gene. GEMMs faithfully recapitulate the histopathology, molecular, cellular, morphometric, neuroanatomical and neurochemical features of human retinoblastoma. In this study, we analyzed the genomic and epigenomic landscape of murine retinoblastoma and compared them to human retinoblastomas to gain insight into shared mechanisms of tumor progression across species. Similar to human retinoblastoma, mouse tumors have low rates of single nucleotide variations. However, mouse retinoblastomas have higher rates of aneuploidy and regional and focal copy number changes that vary depending on the genetic lesions that initiate tumorigenesis in the developing murine retina. Furthermore, the epigenetic landscape in mouse retinoblastoma was significantly different from human tumors and some pathways that are candidates for molecular targeted therapy for human retinoblastoma such as SYK or MCL1 are not deregulated in GEMMs. Taken together, these data suggest there are important differences between mouse and human retinoblastomas with respect to the mechanism of tumor progression and those differences can have significant implications for translational research to test the efficacy of novel therapies for this devastating childhood cancer.


Supplementary Tables
Supplemental Table 1. Genome-wide AGDEX analysis of human retinoblastoma 20 and mouse data Supplemental

SUPPLEMENTAL EXPERIMENTAL PROCEDURES Clonal Analysis of Human and Mouse Retinoblastoma
Human (Y79) and mouse (SJmRBL8) retinoblastoma cell lines were seeded in 96 well plates using single-cell deposition by a high-speed cell sorter (Aria, BD Biosciences).
Single cells were allowed to grow and form large clonal populations. Clones were karyotyped using the cytogenetic method described below.

Sister Chromatid Analysis and Cytogenetics
Chromosome spreads were imaged by using a Zeiss Axio Imager Z1. Images were imported into Slidebook 5 software (Intelligent Imaging Innovations, 3i, Denver, CO).
The intensity profile of a line crossing the middle points of the chromatids' long arms was generated. The line profiles were then loaded into Igor Pro software (Wavemetrics Inc., Portland, OR) for fitting with a 2-peak Gaussian equation: A -amplitudes, σ -widths of the peaks.
The difference between centers of the peaks (i.e., x 1 and x 2 ) extracted from the fits was calculated as the distance between chromatids. Such measurements were made for 5 selected chromosomes per cell. Between 6 and 10 cells per sample were analyzed for each 4-or 20-hr colcemid treatment.
Spectral karyotyping (SKY) of chromosomes was performed by using a SkyPaint probe (Applied Spectral Imaging, Vista, CA) per the manufacturer's recommendations. Probes were detected by using Applied Spectral Imaging's concentrated antibodies detection kit (CAD) as described by the manufacturer. Images were acquired with a Nikon fluorescence microscope equipped with an interferometer (Spectra Cube; Applied Spectral Imaging) and a custom-designed filter cube (Chroma Technology Corporation, Rockingham, VT). SKY analysis was performed with SKY View version 2.1 software (Applied Spectral Imaging).

⎥
For clonal studies, the chromosome spread slides were allowed to air dry for optimal banding of the chromosome with trypsin and Wright's stain. The mitotic index was adequate for a comparative chromosome analysis for all cell lines.

Statistical Analysis
Cytogenetics -Fisher's exact test was used to compare the proportion of cells with a normal genome, a doubled genome, and/or a genome with a few whole chromosomal gains or losses across cell lines. The genomic stability of cell line clones was measured by estimating the probability that the clone retained the number of chromosomes present in its parent cell. This retention estimate was computed by first determining the proportion of clone cells with the same number of chromosomes as each parental cell and then averaging these proportions over all the parental cells. The Wilcoxon rank sum test was used to compare the retention probability estimates across two groups.
Sister chromatid cohesion -The inter-chromatid distances varied substantially between individual cells within the same cell population. Therefore, to perform pairwise comparisons across cell populations, the mean inter-chromatid distance was computed for each cell and these within-cell means and the Wilcoxon rank-sum test was used to perform pairwise comparisons of the cell-mean values across cell populations as previously described (Pounds and Dyer, 2008).

ROS Quantitation
Intracellular reactive oxygen species (ROS) were analyzed by flow cytometry using dichlorofluorescein diacetate (DCFH-DA; Sigma) as a specific dye probe which fluoresces upon oxidation by ROS. Cells were seeded at 1·10 5 cells per 35 mm dish. Cells loaded with DCFH-DA (50 μg/ml) with light exclusion for 60 min were washed three times with PBS. Intracellular accumulation of fluorescent DCF-DA was measured (10,000 cells each) by flow cytometry.

aCGH
A genome-wide microarray (G4838A, Agilent Mouse CGH 1x1M), consisting of 60mer in situ synthesized oligonucleotides, was designed and manufactured by Agilent Technologies (Santa Clara, CA). This array contained 963,261 unique biological probes spaced across the mouse genome. Array hybridization was performed according to the manufacturer's recommended protocols. In brief, 1.5 µg of each genomic DNA sample was labeled using the Agilent's ULS Labeling kit. Hybridization was carried out in an Agilent oven at 65°C for 40 hours at 20 rpm, followed by standard wash procedures. The microarray was then scanned in an Agilent scanner at 3 μm resolution, and the array data was extracted using the default CGH settings with LOWESS normalization of Agilent Feature Extraction Software (v10.5.1.1). Tumor DNA was being compared to control DNA from tail of the same animal. The circular binary segmentation method implemented in Partek software (Partek Inc., St. Louis, MO) was applied for copy number variation calling. Aberration calls were assigned to those segments with at least 5 markers (probes), p-value less than 0.05, and mean log2 ratio greater than 0.15 or less than -0.25. In addition, the calls on whole chromosome gain or loss were made for those aberrant segments having size greater than 80% of that of the chromosome, with mean log2 ratio greater than 0.1 or less than -0.12, respectively. The recurrent gains and losses at genomic location level and gene level, except for the sex chromosomes, were further summarized in three categories (whole chromosome, large chromosomal segment (>3Mb), and focal lesion (>20Kb and <3Mb)) among each of the three tumor groups.

Exome Sequencing
Whole genome amplified DNA was prepared using the repli-G WGA kit (Qiagen).
Whole exon enrichment was accomplished using the SureSelect XT Mouse All Exon Kit (Agilent) as directed except 4µg of WGA material was substituted for 3µg of genomic DNA. Sequencing was performed on an Illumina HiSeq 2000 with V3 flow cells, using the paired end 100 cycle protocol, with the samples multiplexed to an equivalent of 3 per lane. The resulting data files were demultiplexed and converted to FASTQ files using CASAVA 1.8.2 (Illumina) and mapped using BWA to the mouse reference build MM9.

Validation of exome sequence data
PCR amplification using TAQGOLD 360 (ABI) of regions containing putative variants was performed with synthetic oligos (IDT) designed (PRIMER 3) to match sequence flanking the putative variants (primers and locations listed in Sup. Table 11). Both tumor and paired normal samples were amplified and pooled separately. Amplicon libraries were prepared using the Nextera XT (Illumina) sample preparation kit and sequenced using a MiSeq following the 150 cycle paired end protocol. Variants were identified by running MiSeq reporter or Bambino software, and scored as either somatic or germline.

AGDEX analysis
Agreement of differential expression within and cross-species retinoblastoma genomics was conducted as described previously (Pounds et al., 2011a;Pounds et al., 2011b). The Affymetrix 430v2 array was used to profile the expression of 45101 probe-sets for 27 RBTKO, 27 MDMX, 26 p53TKO retinoblastoma samples, and 6 wt (p5) control samples.
Additionally, the Affymetrix U133+2 array was used to profile the expression of 54 675 probe-sets for 57 human retinoblastoma samples and 8 control samples (fetal retina, FW18). The expression data were normalized with the MAS 5.0 algorithm.
We used the Affymetrix best-match dataset (available from www.affymetrix.com) to define 79361 pairs of ortholog-matched probe-sets across the two arrays. The best-match dataset was used to define the gene-sets for the mouse array probesets. The 1454 biological process gene-set definitions from the geneset enrichment analysis website (www.broadinstitute.org/gsea) were used for the U133+2 array.
Adaptive permutation with Bmin=100 and Bmax=10000 was used to compute P-values for gene-set statistics. P-values for individual probe-set statistics and the genome-wide dop and cosine statistics were determined using 10000 permutations.
Kruskal-Wallis test was used to compare three mouse strains (RBTKO, p53TKO and MDMX) and Wilcoxon Rank-sum test to compare two groups of RBTKO versus p53TKO, RBTKO versus MDMX, and MDMX versus p53TKO.

Integrated Analysis
First, mRNA expression from 430v2 Affymetrix arrays was RMA summarized in Partek Genomics Suite 6.6 (St. Louis Mo) and compared between the RBTKO tumors and P5 wt retinae controls employing the unequal variance t test and log2ratio. Next, peak scores for methylation data provided by Nimblegen arrays were compared between RBTKO and P5 wt samples by unequal variance t test and mean peak difference for each observation.
Finally, Nimblegen ChIP-chip data was collated by histone mark and evaluated for RBTKO and control samples and scores by the following protocol. For any histone class for each genotype and gene, a signal must be present in at least 2 of the 3 samples for it to be considered present. If histone marks are present only in RBTKO samples then the mark is given a 1. If they are absent in both or present in both tumor and control, then a 0 is given. When only the control peak is present, it is assigned a -1. The calls for each histone class are then combined and scored jointly (Sup. Table 12).
The three data types were joined by refseq id or by genesymbol. Data that did not match to expression data was excluded. Data was filtered by removing all probesets that did not have at least one expression value greater than 5 or |fold change RBTKO vs wt | > 2.
Repeated measures within a gene were removed by selecting the probeset with the greatest |logratio|. Repeated measures in ChIP-chip were removed by retaining the greatest |combined score -4|. For repeated measures of methylation data the greatest mean diff was retained.
Prior to ranking p values for methylation, data were inferred to be 1 with mean diff of zero for any genes symbols where that data was not observed. Likewise, for any gene without ChIP-chip data a combined score of 4 was imposed. In order to achieve a two metric for p-values from expression and methylation scores were produced. The scores were produced by taking the -log10(pvalue) and then multiplying data with logratios below zero by -1. This creates a dispersed two tail p value derived metric. Care was then taken to coherently rank the scores such that expression was ranked high to low, methylation was ranked hypo to hyper methylated and histones were ranked unmarked to highly marked in RBTKO. The pvalue metrics were multiplied by -1 if not positively correlated to their magnitude change. The ranks between expression and the methylation marks were negatively correlated. In this way, each magnitude difference value was two sided with genes of interest at the extremes of the distribution. This process yielded 5 ranks, which were then additively combined by the following formula for each gene: Combined rank = (logratio WT/TKO expression rank + pvalue expression metric rank)/2 + combined histone score rank + (rank mean diff TKO-WT + pvalue methylation metric rank)/2.
The 5 rank lists were each randomly sampled with replacement to create a new combine rank 1M times. For each gene the observed test statistic was then compared to the population of bootstrapped test values to determine the p value. The Benjamini and Hochberg(Benjamini and Hochberg, 1995) method for controlling false discovery rate was implemented in STATA/MP 11.2. This entire process was then applied to the previously published retinoblastoma human data (Zhang et al., 2012), so that the results could be compared (Sup. Fig. 5).

Supplemental Figure 1. Genome-wide AGDEX Analysis for Mouse Retinoblastoma.
Agreement of differential expression (AGDEX) for the mouse comparison (x-axis) and human comparison (y-axis) for each of the 79361 probe-set pairs for (A) RbTKO, (B) MDMX, and (C) p53TKO mouse retinoblastoma models.