Genome-wide multi-omics profiling of the 8p11-p12 amplicon in breast carcinoma

Genomic instability contributes to the neoplastic phenotype by deregulating key cancer-related genes, which in turn can have a detrimental effect on patient outcome. DNA amplification of the 8p11-p12 genomic region has clinical and biological implications in multiple malignancies, including breast carcinoma where the amplicon has been associated with tumor progression and poor prognosis. However, oncogenes driving increased cancer-related death and recurrent genetic features associated with the 8p11-p12 amplicon remain to be identified. In this study, DNA copy number and transcriptome profiling data for 229 primary invasive breast carcinomas (corresponding to 185 patients) were evaluated in conjunction with clinicopathological features to identify putative oncogenes in 8p11-p12 amplified samples. Illumina paired-end whole transcriptome sequencing and whole-genome SNP genotyping were subsequently performed on 23 samples showing high-level regional 8p11-p12 amplification to characterize recurrent genetic variants (SNPs and indels), expressed gene fusions, gene expression profiles and allelic imbalances. We now show previously undescribed chromothripsis-like patterns spanning the 8p11-p12 genomic region and allele-specific DNA amplification events. In addition, recurrent amplification-specific genetic features were identified, including genetic variants in the HIST1H1E and UQCRHL genes and fusion transcripts containing MALAT1 non-coding RNA, which is known to be a prognostic indicator for breast cancer and stimulated by estrogen. In summary, these findings highlight novel candidate targets for improved treatment of 8p11-p12 amplified breast carcinomas.


Research Paper
Oncotarget 24141 www.oncotarget.com

INTRODUCTION
Molecular profiling of cancer genomes and epigenomes with microarray and next-generation sequencing (NGS) technologies has, in recent years, provided a more in-depth overview of disease-specific aberrations, thereby identifying novel targets for treatment. These complex landscapes of somatic structural rearrangements and epigenomic modulations comprise a composite of driver and bystander aberrations either acquired via chromothripsis or accumulated over time [1,2]. Nevertheless, certain structural variants (SVs) confer selective advantage because they contain one or more genes with tumorigenic potential [3]. One such recurrent genetic aberration is DNA amplification of the 8p11-p12 genomic region, which has clinical and biological implications in multiple malignancies [4,5]. In breast carcinoma (both familial and sporadic cases), the 8p11-p12 genomic region is a frequent target for DNA amplification and loss, resulting in the deregulation of multiple putative "driver" genes and aggressive tumor features [6-10]. However, the 8p11-p12 genomic region spans over 10 Megabases (Mb) and encompasses over 50 known genes, many of which have been shown to be activated by more than one molecular mechanism, i.e. translocation and DNA amplification [11]. Therefore, the aggressive phenotype imposed by the 8p11-p12 amplicon may be the result of one or more interacting genes in this genomic region and/or crosstalk with other genetic and epigenomic aberrations [12]. However, little is known about the type and extent of other structural rearrangements (translocations and fusion genes) and genetic variants (indels and substitutions) found in 8p11-p12 amplified tumors and their contribution to aggressive features.
In this study, we evaluated array-CGH and gene expression microarray data for 229 primary invasive breast carcinomas (corresponding to 185 patients) in relation to clinicopathological features and clinical outcome to identify putative oncogenes and tumor suppressors associated with 8p11-p12 amplification [12]. Furthermore, we performed RNA sequencing (RNA-seq) in conjunction with SNP genotyping analysis for 23 amplified tumors to identify common chromosomal rearrangements and genetic variants.

Nine minimal common 8p11-p12 amplification peaks identified using DNA copy number analysis
We recently described the effect of genetic and epigenetic crosstalk in breast carcinomas harboring DNA amplification on chromosome 8p11-p12, suggesting that aberrant DNA methylation patterns on chromosome 8q may also contribute to the aggressive phenotype [12]. To further define the role that 8p11-p12 amplification may have on breast cancer pathophysiology, we examined genomic profiling data for 229 invasive breast carcinomas and transcriptomic data for 150/229 samples, as previously presented [12,14-16]. The array-CGH copy number analysis identified 83 samples (36%) with recurrent CNAs on chromosome bands 8p11-p12 (47 high-level amplifications, 20 low-level gains and 16 losses) and 146 samples (64%) with neutral DNA dosage on chromosome 8p11-p12. Furthermore, the amplicon contained five major sub-regions mapping to a 12.0 Mb region spanning 31.9-43.9 Mb (from telomere to centromere on the 8p arm, according to the hg17 build 35 reference assembly of the human genome), which was further refined to nine minimal common amplification peaks (range, 41.2-377.4 kb) from 34.3-42.5 Mb (Figure 2A-2B and Supplementary Table 2).
One of the smallest peaks, and notably the most common, mapped to a 67.9 kb region spanning the WHSC1L1 gene on chromosome band 8p12 (amplified in 32/47 cases). Dual-color interphase FISH performed using a contig of 58 overlapping BAC clones spanning the 8p11-p12 genomic region (Supplementary Table 3) demonstrated extensive intra- and intertumoral heterogeneity in amplified cases (Figure 2C). Intratumoral heterogeneity frequently ranged from neutral DNA copy number (two copies per FISH probe) to high-level amplification (up to 50 copies per FISH probe). Two main types of hybridization patterns were observed, i.e. hybridization signals that were clustered in set positions in the interphase nuclei or scattered signals, suggesting the presence of homogeneously staining regions at 8p11-p12, translocation events with DNA sequences from chromosome 8p on other chromosomes, double minutes containing sequences from chromosome 8p and/or aneuploidy of chromosome 8.

Allele-specific copy number analysis reveals elevated DNA copy number for one allele at the 8p11-p12 locus
Whole-genome SNP genotyping analysis was then performed, followed by genome-wide allele-specific copy number analysis with the ASCAT algorithm for 23 samples (16 Luminal B/HER2-, two Luminal B/HER2+, four HER2/ER-, and one Basal-like subtype) harboring high-level regional 8p11-p12 amplification. High-level amplification of the 8p11-p12 genomic region was shown in the ASCAT profiles for all but one case (T7631). Furthermore, 48% of cases were classified as aneuploid (ploidy > 2.7) with on average 40% non-aberrant cell admixture. Allelic imbalances spanning the 8p11-p12 region (31.9-43.9 Mb) were detected (Pearson correlation = −0.21), where the minor allele (least frequent allele) frequently displayed CN = 0, CN = 1 or CN = 2, while the overall copy number ranged from CN = 1 to CN > 10 (Figures 3-4). These findings suggest that certain parts of chromosome 8p11-p12 were amplified on the major allele, whereas neutral DNA dosage or DNA loss was found on the minor allele.
A review of the array-CGH data showed that only one-third of fusion breakpoints could be attributed to DNA copy number gains and losses. However, SNP genotyping revealed that the majority of fusions occur at DNA breakpoints in addition to allelic imbalance on almost all chromosomes (Figure 5). Intrachromosomal fusions, in particular, frequently spanned regions of high-level amplification. As expected, the majority of recurrent fusions spanning genomic regions with DNA copy number changes included fusions with the MALAT1, NEAT1, and TRPS1 genes, but also NDUFC2-KCTD14-TMSB15A. Consequently, the Oncofuse Bayesian classifier pipeline classified 83/1,245 (6.7%, range 0-20) fusion transcripts (65 fusion genes) as "driver" fusion events with oncogenic properties, including the COL1A2-TRPS1 and MAST2-PRKCA fusions that were identified in more than one tumor and several other known breast cancer-related genes, e.g. BCL2, ESR1, ERBB2, IGFBP5, TRPS1 (Supplementary Table 4). One or both of the gene fusion partners (27/65 and 19/65 fusion transcripts, respectively) also frequently exhibited high gene expression patterns irrespective of DNA copy number in samples harboring "driver" fusion events (Supplementary Figure 2). Pathway analysis showed that these fusion transcripts play a pivotal role in cancer, cell cycle, DNA replication, recombination and repair, cell death and survival, cellular growth and proliferation, cellular movement, ErbB signaling, PTEN signaling, and DNA double-strand break repair by homologous recombination (P < 0.05).
The mutational landscape was also assessed in 10 non-amplified breast carcinomas from The Cancer Genome Atlas (TCGA). As expected, the mean number of genomic variants was significantly lower in the non-amplified TCGA samples (10,822.2 ± 1,113.0; range, 4,073-15,688) than in the 8p11-p12 amplified tumors, owing to the use of whole transcriptome sequencing in the current investigation and mRNA-seq for the TCGA dataset. However, there was no significant difference in the mean number of exonic variants (358.7 ± 43.1; range, 216-677, in the TCGA cohort), the type of exonic variants or single-nucleotide substitutions identified in the two study groups (Supplementary Figure 3B and 3D). The distribution of indels and SNVs in coding regions was also evaluated in the non-amplified samples to identify exonic variants associated with 8p11-p12 amplification. A frameshift insertion in HIST1H1E (encoding p.Ala167fs) and a nonsynonymous SNV in UQCRHL (encoding p.His56Arg) were only present in samples harboring 8p11-p12 amplification and resulted in mutation-dependent changes in gene expression levels. Neither of the two genetic variants has previously been reported in the Catalogue Of Somatic Mutations In Cancer (COSMIC) database (Figure 6A).
Sequence Ontology analysis was then performed to identify potentially deleterious genetic variants predicted to have a disruptive effect on the protein by resulting in protein truncation, gain/loss of function or nonsense-mediated decay, i.e. frameshift insertion, frameshift deletion, frameshift block substitution, stopgain, or stoploss. In total, 33 potentially deleterious genetic variants were identified in ≥ 20% of the amplified tumors; none of the 33 genetic variants were found in the COSMIC database (Figure 6B). Pathway analysis showed that the genes associated with the genetic variants play a pivotal role in cancer, cell cycle, cell death and survival, cell morphology, and gene expression. To distinguish whether the 33 deleterious genetic variants were 8p11-p12 amplification-specific, the mutation frequency was then [...] (AP2M1, BECN1, C3, HIST1H1E, INPP5B, MAGI3, MTRNR2L8, MTUS1, PIMREG) had significantly higher mutation rates in 8p11-p12 amplified samples, of which 7/10 genetic variants were only found in amplified samples (P < 0.05). The 10 genetic variants were further evaluated to assess the effect of 8p11-p12 amplification status and/[...] (AP2M1, BECN1, C3, HIST1H1E, INPP5B, MAGI3, MTRNR2L8, MTUS1, PIMREG) were dependent on amplification status (8p11-p12 amplified versus non-amplified samples). Three genetic variants (BECN1 frameshift deletion, MTUS1 frameshift insertion, PIMREG frameshift deletion) showed mutation-dependent (mutated versus wild-type samples) changes in gene expression in 8p11-p12 amplified samples (P < 0.05; Figure 6C).

DISCUSSION
We report that few genes spanning the 8p11-p12 amplicon in breast carcinoma are involved in genetic mutations and DNA methylation modifications, suggesting that DNA amplification is the primary mode of gene activation for this genomic region [12,17-19]. In this study, an integrative analysis with multi-omics screening identified previously unknown recurrent genetic features, ranging from chromothripsis events to fusion transcripts, associated with 8p11-p12 amplification in breast carcinoma. Using array-CGH and SNP genotyping data, we illustrated that invasive breast tumors frequently contain complex rearrangements on one or two chromosomes (chromothripsis-like patterns) spanning regions of DNA amplification, including the 8p11-p12 genomic region. Further examination of the 8p11-p12 amplicon showed that DNA amplification was restricted to only one of the alleles, indicating allele-specific amplification events. DNA copy number analysis revealed recurrent chromothripsis-like events spanning the 8p11-p12 amplicon, including an amplification peak comprising the histone lysine methyltransferase WHSC1L1 (Wolf-Hirschhorn syndrome candidate 1-like 1), also known as NSD3. WHSC1L1 has been studied extensively to better understand its role in 8p11-p12 amplification in breast carcinoma and other malignancies [17,20-23]. At least two co-expressed WHSC1L1 isoforms (the long and short isoforms) compete for binding sites on target proteins [24]. The full-length WHSC1L1 protein contains several functional domains with methyltransferase and protein binding activity, which play a pivotal role in chromatin modification and regulation of transcription by methylating lysine-27 of histone H3 (epigenetic tag denoting inhibition of transcription). In contrast, the short isoform contains a single PWWP-domain (proline-tryptophan-tryptophan-proline) that may be involved in cell growth. In the absence of 8p11-p12 amplification, Zhou et al. showed an increase in cell proliferation and cell invasion in the MDA-MB-231 breast cancer cell line following WHSC1L1-long knockdown [24,25]. These findings are in contrast with results found in breast cancer cell lines harboring the 8p11-p12 amplicon, where cell proliferation decreased after WHSC1L1 depletion [23]. In addition, it is still unclear whether WHSC1L1 overexpression plays a role in cell cycle regulation of G2/M transition by activating CCNG1 and NEK7 [26,27]. WHSC1L1 was also found to regulate methylation of lysine-36 on histone H3 and transcriptional elongation by binding to LSD2 (a H3K4-specific lysine demethylase), and G9a (a H3K9-specific methyltransferase) [28-31]. Although recent reports have shown that protein methyltransferases can be targeted with small-molecule inhibitors, none are currently used in clinical practice [26,32,33].

Figure 6: (A) [...] (denoted 8p amp) and none of the TCGA breast carcinoma samples with neutral 8p11-p12 copy number (denoted no amp). (B) Bar plot illustrating the percentage of 8p11-p12 amplified samples (blue bars) and TCGA breast carcinoma samples with neutral 8p11-p12 copy number (orange bars) harboring a putative deleterious genetic variant in exonic regions (in at least 20% of amplified samples). Genetic variants with significantly different mutation frequencies in the two groups were marked with an asterisk (*) symbol (P < 0.05). (C) Box plots illustrating the effect of 8p11-p12 amplification and mutation status on FPKM values. Asterisk denotes significant p-values (* P < 0.05, ** P < 0.01, *** P < 0.001).
Whole-transcriptome RNA-seq and genome-wide SNP genotyping analysis highlighted the prevalence of fusion transcripts and genetic variants in 8p11-p12 amplified tumors. These analyses revealed that almost 90% of the identified fusion gene partners were ncRNAs, such as MALAT1 (metastasis-associated lung adenocarcinoma transcript 1; also known as NEAT2) and NEAT1. MALAT1 was highly promiscuous with over 400 gene partners (as both the 5′- and 3′-gene partner), suggesting that these fusions occur at the RNA level. MALAT1 is an evolutionarily conserved gene that has been shown to be involved in chromosomal translocations and to contain genetic variants [34,35]. Like other ncRNAs, MALAT1 can migrate from the nucleus to the cytoplasm, allowing it to interact with DNA and proteins in the nucleus as well as with RNA molecules and proteins in the cytoplasm [36]. MALAT1 is particularly interesting because a) it is frequently overexpressed in different malignancies, b) it is a prognostic indicator of poor survival in breast cancer, c) it has been shown to be controlled by 17β-estradiol stimulation in prostate cancer, d) c-MYC has been shown to bind to the MALAT1 promoter, thereby inducing MALAT1 transcription, and e) it has been shown to be associated with cell proliferation, metastasis, and the cell cycle [37-41]. Furthermore, a Malat1 knockout mouse model resulted in normal pre- and postnatal development and Malat1 inhibition in a mouse model for luminal B breast cancer gave rise to poorly developed metastatic tumors, suggesting that MALAT1 inhibition may be a feasible approach to reduce tumor growth and metastasis with minimal adverse effects on normal tissue [40,41].
Among protein-coding genes, several breast cancer-related genes were predicted to form fusion transcripts with oncogenic potential, e.g. BCL2, ESR1, ERBB2, IGFBP5, TRPS1. Interestingly, we have previously shown that TRPS1 is among other genes spanning chromosome 8q that are hypomethylated in 8p11-p12 amplified breast tumors [12]. Fusions are commonly produced during the formation of structural rearrangements, transcription read-throughs, and alternative splicing, where one fusion partner frequently deregulates the other [42]. As expected, the majority of the fusion transcripts identified here spanned genetically unstable regions with DNA breakpoints, particularly intrachromosomal fusions. In recurrent fusions, such as MALAT1-AHNAK/AHNAK-MALAT1 and MALAT1-TRPS1/TRPS1-MALAT1, our data suggest that MALAT1 only deregulated the expression patterns of its gene partner (AHNAK or TRPS1) when MALAT1 was the 5′-gene partner. Additionally, several interesting in-frame fusions and in-frame kinase fusions were identified, several of which may be targetable with kinase inhibitors.
In contrast to the fusion transcripts, few ncRNAs contained genetic variants such as indels and substitutions. Intriguingly, genetic variants in HIST1H1E encoding p.Ala167fs (frameshift insertion) and UQCRHL encoding p.His56Arg (nonsynonymous SNV) were found in all 23 amplified samples and in none of the non-amplified TCGA samples. These genetic variants also resulted in significant up-regulation of the two genes in mutated/amplified samples. HIST1H1E is a linker histone gene that may play a role in epigenetic regulation, whereas UQCRHL has been identified as a prognostic factor for hepatocellular carcinoma that plays a pivotal role in mitochondrial respiration [43-45]. The few exonic variants and fusion transcripts identified in 8p11-p12 genes were tumor-specific rather than amplification-specific, suggesting these molecular mechanisms may be secondary modes of gene activation. Three exonic variants were identified in the RAB11FIP1 and ZNF703 genes and FGFR1, RAB11FIP1 and WHSC1L1 were among fourteen genes spanning the 8p11-p12 amplicon to be identified as fusion gene partners.
In summary, we describe the genetic landscape of 8p11-p12 amplification in breast carcinoma, including previously undescribed chromosomal rearrangements and gene fusions. Our work may pave the way for future studies investigating the mechanisms by which specific oncogenes within the 8p11-p12 amplification region promote breast tumorigenesis, which may lead to more specific targeted therapies and thereby improve treatment for patients with 8p11-p12 amplified breast carcinomas.

Evaluation of genomic and transcriptomic profiling data
To further investigate the clinical significance of 8p11-p12 DNA amplification in breast carcinomas, genomic profiling data for 229 primary invasive breast carcinomas (corresponding to 185 patients) previously profiled with microarray-based comparative genomic hybridization (array-CGH) and gene expression microarray data for 150 of the 229 samples (corresponding to 140 patients) [12,14-16] were evaluated and correlated with clinicopathological features and clinical outcome. Normalized values from five normal breast samples profiled with Illumina HumanWG-6 Expression Beadchips (GEO, accession number GSE17072) were used as normal controls [46]. The patients were diagnosed in Western Sweden between 1988 and 1999 and the fresh-frozen tumor samples were stored in the tumor biobank at the Sahlgrenska University Hospital Oncology Lab (Gothenburg, Sweden).

Nucleic acid isolation and purification
For SNP genotyping and RNA sequencing (RNA-seq) analysis, genomic DNA and total RNA were isolated from 10-20 mg sections of fresh-frozen tumor specimens for 23/47 samples with focal 8p11-p12 amplification. Prior to nucleic acid isolation, each specimen was evaluated for neoplastic cell content using touch preparation imprints stained with May-Grünwald Giemsa (Chemicon). Highly representative specimens with at least 70% neoplastic cell content were included in downstream analyses. Genomic DNA was isolated using the Wizard Genomic DNA extraction kit (Promega), including proteinase K treatment (Roche) followed by phenol-chloroform purification (Sigma). Total RNA was isolated with the RNeasy Lipid Tissue Mini Kit (Qiagen) according to the manufacturer's instructions. DNA and RNA concentrations were measured using a NanoDrop ND-1000 (NanoDrop Technologies). The total RNA concentration was also evaluated using Qubit (ThermoFisher Scientific). RNA integrity was assessed using the RNA 6000 Nano LabChip Kit with an Agilent 2100 Bioanalyzer (Agilent Technologies).

Whole transcriptome RNA sequencing (RNA-seq)
Total RNA samples from 23 breast carcinomas with high-level regional 8p11-p12 amplification were processed at the Science for Life Laboratory (National Genomics Infrastructure Stockholm). Illumina TruSeq strand-specific RNA libraries (ribosomal depletion using RiboZero human) containing 125 bp paired-end reads were obtained for each sample on a HiSeq2000 sequencer (Illumina). The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project b2015076, as described in the Supplementary Methods [51].

Quality control
Quality control of raw RNA-seq reads was performed prior to assembly using FastQC (0.11.5). The RNA-seq reads were then trimmed and filtered with TrimGalore (0.3.3) to remove adapter sequences and reads with Phred quality scores below 20, followed by alignment to the hg19 build 37 reference assembly of the human genome using STAR (2.5.1b) [52]. Read alignment yielded approximately 40-50 million aligned reads per sample. Counts and Fragments Per Kilobase of transcript per Million mapped reads (FPKM) were calculated using HTSeq (0.6.1) [53] and Cufflinks (2.2.1) [54], respectively. Quality control statistics for mapped reads (e.g. gene body coverage and read distribution) were obtained using RSeQC (2.3.6).
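
The FPKM normalization referred to above can be illustrated with a minimal Python sketch. This is not the Cufflinks implementation, which additionally estimates effective transcript lengths and resolves ambiguously mapped fragments; the function name `fpkm` and the example numbers are hypothetical and serve only to show the arithmetic.

```python
def fpkm(fragment_count: float,
         transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Simplified textbook definition (assumption): raw transcript length is
    used instead of the effective length estimated by Cufflinks.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragment_count / (kilobases * millions)


# Hypothetical example: 500 fragments on a 2 kb transcript in a library
# of 40 million mapped fragments (library size chosen to match the
# 40-50 million aligned reads per sample reported above).
example_value = fpkm(500, 2_000, 40_000_000)
```

Dividing by both transcript length and library size is what makes values comparable between transcripts of different lengths and between samples of different sequencing depth.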

Fusion gene identification
Fusion transcripts were identified with FusionCatcher (0.99.5a) using criteria to remove false positive candidate fusion events, followed by classification of "driver" fusion events (Bayesian probability scores < 0.5) with oncogenic potential using Oncofuse (1.1.1) [55,56]. Circos plots were generated with the Circos module (0.66) to visualize DNA copy number alterations, SNP plots, fusion genes, and exonic variants for each sample [57]. The difference in gene expression patterns for specific fusion transcripts was determined using a t-test or ANOVA, as appropriate (P < 0.05).

Variant calling and filtering
The Genome Analysis Toolkit (GATK 3.5.0) variant calling pipeline [58] and the ANNOVAR tool (2016.05.11) were used to identify and annotate genetic variants, e.g. SNPs and indels, in individual samples with the SplitNCigarReads, BaseRecalibrator (with dbSNP Build 138 hg19), HaplotypeCaller, and VariantFiltration (Fisher Strand (FS > 30.0) and Qual By Depth values (QD < 2.0)) tools, respectively. Common genetic variants found in the human population were removed with ANNOVAR using the dbSNP (hg19_snp138), 1000 Genomes Project (1000g2015aug) with a minor allele frequency (MAF) threshold of 0.01, SweGen dataset [59], and NHLBI GO Exome Sequencing Project (hg19_esp6500siv2_all) databases. Genetic variants not found in the COSMIC database version 70 (cosmic70) were denoted as "novel" genetic variants. Sequence Ontology analysis was performed to identify a conservative set of potentially deleterious genetic variants resulting in amino acid changes, i.e. frameshift insertion (SO:0001909), frameshift deletion (SO:0001910), frameshift block substitution (SO:0001589), stopgain (SO:0001587), or stoploss (SO:0001578) [60]. To determine whether the deleterious genetic variants were associated with 8p11-p12 amplification, the mutation frequency was also evaluated in mRNA-seq data for 10 primary breast carcinomas sequenced by The Cancer Genome Atlas (TCGA) that lacked 8p11-p12 amplification (SNP segmented mean < 0.4) [61,62]. BAM files for the 10 TCGA samples were downloaded from the Genomic Data Commons (GDC) Portal, converted to FASTQ format with BEDTools BAMTOFASTQ (2.25.0), and compressed with Gzip before running the GATK variant calling pipeline with RNA-seq reads aligned to the hg19 build 37 reference assembly.
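
The filtering logic described above can be summarized in a short Python sketch. It is only an illustration of the stated thresholds and is not the GATK or ANNOVAR code: the `Variant` record, its field names, and the helper functions are hypothetical, and the several population databases are collapsed into a single `population_maf` field as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Consequence classes treated as potentially deleterious in the text.
DELETERIOUS_CLASSES = {
    "frameshift insertion",
    "frameshift deletion",
    "frameshift block substitution",
    "stopgain",
    "stoploss",
}


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not an ANNOVAR output format)."""
    gene: str
    consequence: str
    fs: float                        # Fisher Strand value
    qd: float                        # Qual By Depth value
    population_maf: Optional[float]  # None if absent from population databases


def passes_hard_filters(v: Variant) -> bool:
    # Variants with FS > 30.0 or QD < 2.0 are flagged and discarded.
    return not (v.fs > 30.0 or v.qd < 2.0)


def is_rare(v: Variant, maf_threshold: float = 0.01) -> bool:
    # Assumption: variants absent from the databases count as rare, and
    # the threshold is exclusive (the text does not specify this).
    return v.population_maf is None or v.population_maf < maf_threshold


def candidate_deleterious(variants: List[Variant]) -> List[Variant]:
    """Keep rare, high-confidence variants in a deleterious class."""
    return [
        v for v in variants
        if passes_hard_filters(v)
        and is_rare(v)
        and v.consequence in DELETERIOUS_CLASSES
    ]
```

Applying the quality filters before the population and consequence filters mirrors the order in which the steps are described, although the result of this sketch does not depend on the order.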

Genome-wide SNP genotyping analysis
Genome-wide SNP genotyping analysis was performed for the 23 amplified samples with Illumina Infinium HumanOmni 2.5-8 v1.3 Beadchips at the SCIBLU Genomics DNA Microarray Resource Center (SCIBLU), Department of Oncology, Lund University. The beadchips were scanned on an iScan (Illumina) and the data were processed using the Illumina GenomeStudio Genotyping Module software (V2011.1) and the hg19 build 37 reference assembly of the human genome to calculate B-allele frequencies (BAF) and logR ratios (LRR). Genome-wide allele-specific copy number profiles were generated in R/Bioconductor (version 3.3.2) using the ASCAT (allele-specific copy number analysis of tumors, version 2.5) algorithm and the germline genotype prediction function for Illumina 2.5M SNP arrays, as previously described [63]. ASCAT profiles illustrate the copy number for the minor allele (least frequent allele) and the estimated overall copy number (sum of the minor and major allele counts).
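
Because the overall copy number is the sum of the minor and major allele counts, the major allele count can be recovered from an ASCAT-style segment by subtraction. The Python sketch below illustrates this relationship only; it does not reproduce the ASCAT algorithm, and the `Segment` type, the labels, and the `amp_threshold` cutoff of five copies are hypothetical choices made for illustration.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical copy number segment (positions in Mb)."""
    start_mb: float
    end_mb: float
    total_cn: int   # estimated overall copy number
    minor_cn: int   # copy number of the minor (least frequent) allele


def major_cn(seg: Segment) -> int:
    """Major allele count = overall copy number - minor allele count."""
    if seg.minor_cn < 0 or 2 * seg.minor_cn > seg.total_cn:
        raise ValueError("minor allele count must lie between 0 and total/2")
    return seg.total_cn - seg.minor_cn


def classify(seg: Segment, amp_threshold: int = 5) -> str:
    """Label a segment; amp_threshold is an illustrative assumption."""
    major = major_cn(seg)
    if major >= amp_threshold and seg.minor_cn <= 2:
        return "allele-specific amplification"
    if major != seg.minor_cn:
        return "allelic imbalance"
    return "balanced"
```

For example, a segment with an overall copy number of 12 and a minor allele count of 1 has a major allele count of 11, which this sketch labels as an allele-specific amplification.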

Fluorescence in situ hybridization (FISH)
Probe labeling and hybridization were done using locus-specific bacterial artificial chromosome (BAC; BACPAC Resources Center) probes to verify gene amplification and fusion genes. Touchprint preparations were made from fresh-frozen tumor samples on Superfrost Plus microscope slides (Erie Scientific Company). Dual-color FISH was performed using co-hybridized biotin-16-dUTP and digoxigenin-11-dUTP labeled probes (Supplementary Table 3). The slides were analyzed using a Leica DMRA2 fluorescent microscope (Leica) equipped with an ORCA Hamamatsu CCD (charge-coupled device) camera and filter cubes specific for green fluorescein isothiocyanate (FITC), red rhodamine, and UV for DAPI visualization. Digitized black and white images were acquired using the Leica CW4000 software package.

Ingenuity pathway analysis (IPA)
Ingenuity Pathway Analysis (Ingenuity Systems, Redwood City, USA) was performed to assess the functional relevance of the differentially expressed transcripts, deleterious genetic variants (identified in ≥ 20% of samples), and fusion genes with oncogenic potential. Canonical pathways, diseases and bio functions, and upstream regulator analyses were generated using Fisher's exact test (P < 0.05). The activation state of the upstream regulators was determined with the z-score, where z > 2 and z < −2 were denoted as activation and inhibition, respectively.
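
The two decision rules above can be sketched in Python as follows. This is not the IPA software: the function names are hypothetical, and the enrichment p-value is computed as a one-sided (right-tailed) Fisher's exact test via the hypergeometric distribution, which is an assumption because the sidedness of the test is not stated here.

```python
from math import comb


def activation_state(z_score: float) -> str:
    """Apply the stated cutoffs: z > 2 activated, z < -2 inhibited."""
    if z_score > 2:
        return "activated"
    if z_score < -2:
        return "inhibited"
    return "not determined"


def enrichment_p_value(overlap: int, query_size: int,
                       pathway_size: int, background_size: int) -> float:
    """P(at least `overlap` query genes fall in the pathway by chance).

    Right-tailed hypergeometric probability (assumed sidedness).
    """
    upper = min(query_size, pathway_size)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap is inconsistent with the set sizes")
    if max(query_size, pathway_size) > background_size:
        raise ValueError("sets cannot exceed the background")
    outside = background_size - pathway_size
    # Sum exact integer counts first, then divide once.
    favourable = sum(
        comb(pathway_size, k) * comb(outside, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(background_size, query_size)
```

Note that a z-score of exactly 2 or −2 is left undetermined here because the stated cutoffs are strict inequalities.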

Statistical analyses
Statistical analyses were performed using a 0.05 p-value cutoff in R/Bioconductor (version 3.3.2). All p-values are two-sided. Differences in mutation frequency and gene expression patterns between 8p11-p12 amplified and non-amplified samples were determined using the Wilcoxon rank-sum test or the pairwise Wilcoxon rank-sum test.
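
For readers unfamiliar with the rank-sum test, the following Python sketch shows a two-sided version based on the normal approximation with a tie correction. It is an illustrative assumption rather than the procedure used in R: the function name is hypothetical, no continuity correction is applied, and R may compute exact p-values for small samples without ties, so results can differ slightly.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_rank_sum(x: Sequence[float],
                      y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value).

    Normal approximation with tie correction, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    # Pool and sort the observations, remembering the group of each value.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        tied = j - i
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += tied ** 3 - tied
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all observations identical: no evidence of a shift
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided
```

A typical call passes the per-sample values of the two groups, e.g. `wilcoxon_rank_sum(amplified_fpkm, non_amplified_fpkm)`, where both argument names are placeholders.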

Data availability
The data reported in this study have been deposited in the NCBI Gene Expression Omnibus and are accessible through GEO Series accession number GSE97293 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE97293).

Author contributions
K.H., P.K., A.K., and E.F.-A. were responsible for overall study concept, design of experiments, and collection of clinical data. K.T., H.E., and J.B. contributed to the computational analyses of the RNA-seq data. S.N. contributed to the statistical analyses. E.W.R. and G.S. provided technical and material support. T.Z.P. performed the experiments, analyzed the data, and wrote the manuscript. All authors reviewed, edited, and approved the final manuscript.