Whole-exome analysis in osteosarcoma to identify a personalized therapy

Osteosarcoma is the most common pediatric primary non-hematopoietic bone tumor. Survival of these young patients is related to the response to chemotherapy and development of metastases. Despite many advances in cancer research, chemotherapy regimens for osteosarcoma are still based on non-selective cytotoxic drugs. It is essential to investigate new specific molecular therapies for osteosarcoma to increase the survival rate of these patients. We performed exomic sequence analyses of 8 diagnostic biopsies of patients with conventional high grade osteosarcoma to advance our understanding of their genetic underpinnings and to correlate the genetic alteration with the clinical and pathological features of each patient to identify a personalized therapy. We identified 18,275 somatic variations in 8,247 genes and we found three mutated genes in 7/8 (87%) samples (KIF1B, NEB and KMT2C). KMT2C showed the highest number of variations; it is an important component of a histone H3 lysine 4 methyltransferase complex and it is one of the histone modifiers previously implicated in carcinogenesis, never studied in osteosarcoma. Moreover, we found a group of 15 genes that showed variations only in patients that did not respond to therapy and developed metastasis and some of these genes are involved in carcinogenesis and tumor progression in other tumors. These data could offer the opportunity to get a key molecular target to identify possible new strategies for early diagnosis and new therapeutic approaches for osteosarcoma and to provide a tailored treatment for each patient based on their genetic profile.


INTRODUCTION
Osteosarcoma is the most common nonhaematological primary malignant tumor of the bone, it arises from mesenchymal cells that produce osteoid and immature bone and affects mainly the extremities of adolescents and young adults [1-2].
The 5-years survival rate for patients with osteosarcoma without evidence of metastasis is 60% to 65%, whereas it is only 20% to 28% for osteosarcoma patients with metastases at the time of diagnosis [3]. Treatment of high grade osteosarcoma is based on a multidisciplinary approach that includes neoadjuvant chemotherapy, surgical excision of the primary tumor and metastasis excision; evaluation of response to therapy in the surgical specimen is crucial to eventually schedule a postoperative chemotherapy [4]. Standard therapy regimens often involve the use of high-dose methotrexate, doxorubicin, cisplatin and ifosfamide [4][5]. Evaluation of response to therapy was assessed by percentage area of necrosis in the specimen after surgery; patients who respond to therapy showing tumor necrosis ≥ 90% after surgery and they do not develop metastasis (Responder) while patients that do not respond to neoadjuvant chemotherapy showing tumor necrosis ≤ 90% after surgery and eventually developing metastasis (Non-responder) [6].
Although the survival rate has improved considerably after the introduction of neoadjuvant chemotherapy and surgery, metastatic or recurrent disease still occurs and the survival rate of patients is mainly linked to the resistance to therapy and to the development of metastasis [7]. In patients with metastatic osteosarcoma treated with neoadjuvant therapy, the "Responder" status shows improved survival (82% at 5-years) compared to "Non-Responder" (70% at 5-years) [8][9]. Over the last decade, scientific research has provided essential information about the pathogenic, molecular and biochemical pathway involved in osteosarcoma [10][11]; however, molecular mechanisms related to carcinogenesis, progression and resistance to therapy are still largely unknown. Indeed, even now the osteosarcoma karyotype is considered complex and only mutations of tumor suppressors genes TP53 and RB1 are commonly associated with the development of osteosarcoma [12].
There are many examples of "molecular targeted therapy", where tailored therapeutic agents have been selected to aim against specific molecules and their downstream effector pathways in each patient [13]. Nonetheless survival rates of patients have not greatly improved [14], as little is known at genetic level about the pathogenesis of osteosarcomas resulting in a lack of more effective and tailored chemotherapy drug regimens [15].
Our hypothesis is that only the analysis of the entire coding genome could lead to a better understanding of the molecular mechanisms underlying the development and progression of osteosarcoma. Next generation sequencing (NGS) technologies can unveil DNA sequences whose study can be applied to characterize both common and rare genomic alterations across cancer types. The NGS approach could clarify the landscape of genetic alteration in osteosarcoma and provide relevant biological data. In particular, the knowledge of genetic alterations in coding exon regions, through WES (Whole Exome Sequencing) approach, may result in an easier discovery of new proteins as molecular target which could be aimed at personalized therapies. Indeed, WES could lead to create algorithms to identify relevant genetic alterations networks relevant to the clinical context of the disease [16][17][18]. The aim of this study was to perform WES analysis in selected tumors samples of osteosarcoma, both from responder and non-responder patients that developed metastasis. The goal will be to improve the knowledge of osteosarcoma carcinogenesis, progression and resistance to therapy, to identify new strategies for therapeutic approaches and personalized therapy.

Sequence coverage and mutation analysis
This analysis provided a vast new reservoir of data and, after filtering the data, within our discovery set of eight osteosarcomas, 5 responder and 3 non-responder, we identified 18,275 somatic variations in 8,247 genes. The number of coding variations in each sample ranged from 496 to 8,500 with an average of 2,347.5 variations. All the variations were missense and the 98.5% were SNVs (Single Nucleotide Variants) while the 1.5% were MNPs (Multiple Nucleotide Polymorphisms).
The mutation rate in the responder group was slightly higher than the non-responder group (Table 1, Figure 1). Moreover, we found that, as in several other cancers [19], C:G>T:A transitions were the most predominant somatic substitution both in the responder group (43%) and in the non-responder group (34%) and we found that the vast majority of these transitions were identified in the context of 5'-CpG-3' dinucleotide as the result of a 5'-methylcytosine deamination (data not shown).
We did not find common mutated genes in all eight samples but we found three mutated genes (KIF1B, NEB, KMT2C) in 7/8 (87%) samples. Seventeen genes were mutated in 6 samples (75%, 6/8) and 57 genes showed variations in 5 samples (62.5%, 5/8). We noted a great variability of genes mutated among the samples as shown in Supplementary Table 1. Indeed, taking into consideration common mutated genes in at least 5 samples, we noted that sample n°5 showed variations in 77 genes while the sample n°6 showed variations only in 22 genes (Supplementary Table 1).
These 77 genes that were found mutated in more than half of the samples are involved in important biological processes such as transcription regulation, ion transport, DNA damage and repair, cell adhesion and angiogenesis ( Table 2).
Despite variations of TP53 and RB1 are associated with development of human osteosarcoma, we found that variations of TP53 were present in only one patient (12.5%, sample n°5) while only two patients showed variations of RB1 gene (25%, sample n°1 and 5).

Genes that may be involved in osteosarcoma tumorigenesis
We found that KIF1B, NEB and KMT2C were the only three genes mutated in 7/8 samples. KIF1B is located on chromosome 1p36, and belongs to the kinesin superfamily of intermediate filaments [20]. NEB, maps to chromosome 2q22, encodes nebulin, a giant protein component of the cytoskeletal matrix [21]. We found 10 and 27 variations respectively in these two genes and they were mutated in all samples except in sample n°7, while  sample n°5 had the higher number of variations in both cases (Table 3). Finally, KMT2C is located on chromosome 7q36.1 and it is a member of the mixed-lineage leukemia (MLL) family and it has histone H3 lysine-specific methylation activity. KMT2C is an important component of histone H3 lysine 4 methyltransferase complex, implicated in chromatin remodelling, called ASCOM (ASC-2-and Mll3-containing complex) [22]. Among these three genes, KMT2C showed the highest number of variations: we found 36 variations, with sample n°5 revealing the highest number of variations, except in sample n°3 (Table 4). Particularly, sample n°3 was a 14-year old boy with diagnosis of high grade chondroblastic osteosarcoma made on femur biopsy (stage IVa), follow-up was 11 years, he did not respond to neoadjuvant chemotherapy and develop metastasis.

Genes that may be involved in osteosarcoma metastatic progression and resistance to therapy
We found that 0.18% (15/8247) of genes showed variations, all missense (94% SNV, 6% MNP), only in the non-responder group (Table 5). Categorizing them by KEGG pathway analysis and literature review, we found that some of these are involved in important biological processes (Table 6). ERBB4 is located on chromosome 2q34 and it is a receptor tyrosine kinase member of the epidermal growth factor receptor (EGFR) family [23]. ERBB4 is involved in biological process such as signal transduction, transport and catabolism. It is involved in human disease like cancer: it participates in pathways such as ERBB signalling, implicated in tumor migration and invasion, and in other pathway implicated in inhibition of tumor angiogenesis [24]. In signal transduction, transport and catabolism and human disease biological process is involved THBS1 too and moreover is implicated in cell growth and death process. THBS1, thrombospondin 1, is located on chromosome 15q14 [25] and it is involved in RAP1 and TP53 signalling pathways, involved in inhibition of angiogenesis and metastasis, in PI3K-AKT and TGF-beta signalling pathway, involved in the control of cell cycle and apoptosis, and it takes part in focal adhesion and ECM-receptor interaction process [26]. Other genes are enzymes with specific activity like as rRNA and tRNA biogenesis (DIS3, KARS), spliceosome activity (BCLAF1) or charboidrate metabolism (PDHX) and all involved in carcinogenesis [27][28][29][30][31][32].

DISCUSSION
Osteosarcoma is a heterogeneous tumor and it is a so-called "orphan cancer" with no known driver oncogenes [33]. It is essential to find useful biomarkers and to detect the potential targets for new drugs, to increase overall survival of these patients. This could be done only if we better understand the complex biology of this tumor and the molecular pathways that lead to the development of metastases and resistance to therapy. At present time, the most innovative approach is represented by NGS, which allows to analyse the entire human genome [18]. In this study, we performed the analysis of exomic sequences of osteosarcomas to advance our understanding of their genetic underpinnings and to correlate the genetic alterations with the clinical and pathological features of each patient. Our aim was to identify clinically relevant variations in the landscape of somatic events in osteosarcoma pathogenesis and so we performed WES analysis on eight high grade osteosarcoma biopsies. Using stringent criteria for the analysis of these presumptive variations, we identified 18,275 somatic variations, all missense, in 8,247 genes. We noted that the mutation rate was similar between responder and non-responder group but overall this rate was higher than previous studies probably due to the small number of our samples [34][35][36]. We did not find specific structural abnormalities common to all samples, such as specific translocations like other sarcomas (e.g. Ewing sarcoma EWS/FLI1 t(11;22)(q24;q12)), confirming the complexity of osteosarcoma karyotype [15]. Despite TP53 and RB1 genes are considered top driver genes in osteosarcoma cancerogenesis, our study showed lower mutation frequency of TP53 and RB1 than other studies [34,35,[37][38][39]. Indeed, we observed that only one patients (12.5%, 1/8) showed variations of TP53 gene and that only two patients showed variations of RB1 gene (25%, 2/8). Our results could be the effect of the small number of samples, however the range of mutation rate described in osteosarcoma is very large going from 31% to 82% and from 19% to 64% for TP53 and RB1 respectively [41]. Moreover, most TP53 mutations described in some studies, are structural variations in intron 1 [34,37,38] that we did not study and the mutation rate outside the intron 1 is low (19-20%) [34,40]. However, as reported in literature, among all mutations of TP53 reported in previous studies only some mutations are clonal events [37]. Lastly, our data was filtered by Sift and Polyphen-2 bioinformatics software, so we reported only mutations that are considered deleterious for the protein function.
However, beyond TP53 and RB1, between genes considered candidate driver genes in osteosarcoma carcinogenesis [36] we found that ATM, a gene involved in DNA damage control signalling pathway, showed variations in 5/8 osteosarcoma samples. ATM is a tumor suppressor gene that belongs to the family of proteins that respond to DNA damage by phosphorylating key substrates involved in DNA repair and/or cell cycle control [36]. Moreover, we did not find variations in genes associated at inherited cancer predisposition syndromes associated with osteosarcoma as RECQL4, BLM, and WRN [34,35,37].
Moreover, we did not find variations common to all osteosarcoma samples but we found three mutated genes in 7/8 samples (87%) never studied in osteosarcoma. Variations in these three genes may be early molecular events and so they may be related to osteosarcoma tumorigenesis. KIF1B is responsible for intracellular vesicular transport [20], a process involved in the production matrix in osteosarcoma [42]. KIF1B has been shown to act as a tumor suppressor in multiple cancers by acting on various inhibitors of cell proliferation and activators of apoptosis [20]. Another interesting NEB, maps to chromosome 2q22, encodes nebulin, a giant protein component of the cytoskeletal matrix [21], that coexists with thick and thin filaments within the sarcomeres of skeletal muscle [21]. Finally, KMT2C catalyses the monomethylation of H3K4 in collaboration with hormone receptors and transcription factors involved in developmental signalling [22,43]; it is one of the histone modifiers previously implicated in some cancer types [22]. It caught our attention showing more variations than KIF1B and NEB, finally because this gene is found to be inactivated in other tumors causing an alteration of tumor suppressor gene TP53 [43], which is also frequently altered in osteosarcoma [44]. Moreover, recent evidence showed that mutations of KMT2C can modify its cooperation with oestrogen receptor [45], this hormone playing a key role in the development and support of bone remodelling and matrix production [46]. Recently, it has been demonstrated that highly conserved epigenetic  regulators are frequently mutated in cancer [47]; indeed, some studies suggest that variations in the coding sequences of regulatory elements, that act on enhancers to recognize specific transcription factors, could be the cause of tumor development [47]. KMT2C acts in collaboration with hormone receptors and transcription factors involved in growth-promoting pathways and it performs its action at transcription enhancer regions [22]. KMT2C is frequently mutated in a broad spectrum of cancers and it has been related to tumorigenesis [48]. Therefore, our hypothesis is that KMT2C could be involved in osteosarcoma carcinogenesis too. We found that the responder group showed more variations than the non-responder group and we noted that 15 genes showed variations only in the non-responder group which subsequently developed a metastatic disease. We found that some of them are involved in carcinogenesis and tumor progression but not in osteosarcoma. Particularly, ERBB4 is a member of the Tyr protein kinase family and the epidermal growth factor receptor subfamily that regulates cell proliferation and differentiation [23]. It has been shown that ERBB4 is overexpressed in Ewing Sarcoma cell lines derived from chemoresistant or metastatic Ewing sarcoma, a tumor of the bone [24].
ERBB4 leads to activation of PI3K-AKT, focal adhesion kinase (FAK) and the RAC1 GTPase, a mediator of cell migration and invasion [24], pathways that are already been described in osteosarcoma pathogenesis [49][50].
ERBB4 is rarely mutated in human cancers [51] but recently it was shown that ERBB4 can inhibit the tumor suppressor TP53 by regulating the MDMX-MDM2 complex stability, the primary inhibitors of TP53, leading to TP53 inactivation in tumor [52]. Thrombospondin I (THBS1) is an endogenous inhibitor of angiogenesis, which limits blood vessel density in normal tissues and curtails tumor growth. It interacts with a variety of ECM molecules and cell surface receptors and this protein has been shown to play roles in tumorigenesis; particularly, it was showed that THBS1 induced cell migration in several tumor cell lines suggesting that THBS1 is involved in cancer invasion [26]. DIS3 is one of catalytic components of the human RNA processing and degrading exosome it has both endonucleolytic and 3-prime/5-prime exonucleolytic activity [27]; mutations in DIS3 have been observed in several cancer types [28] and in multiple myeloma was observed a shorter survival in patients with DIS3 mutations [29]. BCLAF1 encodes a transcriptional  repressor that interacts with several members of the BCL2 family of proteins and ectopical BCLAF1 expression induces apoptosis in various cell types or autophagic cell death in myeloma cells. It was hypothesized that BCLAF1 plays a critical role repressing the transcription of survival genes through TP53 inhibition. Moreover, it was found that tumor cells suppress BCLAF1 expression inducing a cascade of antiapoptotic cellular events that all together determine an increase survival of tumor cells [31]. The KARS gene encodes lysyl-tRNA synthetase, which catalyses the aminoacylation of tRNAlys in the cytoplasm and mitochondria. It was showed that it was implicated in cancer metastasis indeed the phosphorylation of KARS at the Thr52 residue by p38MAPK, causes the dissociation from the cytosolic multi-tRNA synthetase complex and the following translocation to the plasma membrane, where it associates with a 67 kDa laminin receptor (p67LR) involved in migration and metastasis [30]. The PDHX gene encodes component X of the pyruvate dehydrogenase (PDH) complex that is in the mitochondrial matrix and catalyzes the conversion of pyruvate to acetyl coenzyme A [32]. It was showed that in colorectal cancer cells the glucose metabolism was regulated by miR-26a direct targeting the PDHX, which inhibits the conversion of pyruvate to acetyl coenzyme A in the citric acid cycle to require the glucose uptake to the energy needs of cancer cells [32]. In successive studies, it will be necessary to study whether and how these mutations may modify the expression of these genes and the function of the associated protein to discovery a possible molecular target.
Some of these genes identified in our study are involved in metastatic progression and poor survival in tumors, also of mesenchymal origin, but these have never been studied in osteosarcoma [24,26,29,30]. Our hypothesis is that alterations in these genes could be involved in a mechanism of metastatic progression and drug resistance in osteosarcoma.
We recognize these preliminary results may arise due to the small sample size, which is a limitation of the study, and we believe that it will be necessary first to ascertain our data in a larger cohort of patients.
In conclusion, our WES analysis confirms that the osteosarcoma karyotype is complex, however, we found genes probably involved in osteosarcoma carcinogenesis, as KMT2C, and we identified a group of 15 genes probably involved in metastasis development and drug resistance that could offer the opportunity to identify potential drug targets in order to create a personalized therapy and increase the survival rate among young patients with osteosarcoma.

Population study
The population of this study included 8 diagnostic biopsies of patients with conventional high grade osteosarcoma (sec. Enneking classification) obtained before neoadjuvant therapy from the files of the Department of Medical and Surgical Sciences and Biotechnologies, Sapienza University of Rome, Polo There were not inherited cancer predisposition syndromes associated with osteosarcoma among our samples. The clinical and pathological characteristics of each osteosarcoma patients are described in Table 7. All the tissue samples were reviewed on microscopy examinations by two dedicated pathologists.

Whole-exome sequencing (WES)
Genomic DNA was extracted from FFPE osteosarcoma biopsies using RecoverAll Total Nucleic Acid isolation FFPE kit (Life Technologies, Foster City, CA) according to the manufacturer's instruction and was used to prepare fragment libraries suitable for massively parallel paired-end sequencing.
DNA was amplified with GenomePlex Single Cell Whole Genome Amplification kit, following the manufacturer's protocol, because of the degraded material, typical of osteosarcoma samples. We measured DNA concentration of the samples using Qubit 2.0 Fluorometer (Invitrogen, Life Technologies, Grand Island, NY) with the Qubit DNA HS assay kit or Qubit DNA BR Assay kit according to the quantity of the starting material. Moreover, we checked the quality of DNA with 2200 Tapestation Instrument (Agilent Technologies, Santa Clara, CA, USA) using the D1000 screen tape or the High sensitivity D1000 screen tape as appropriate, to see the level of fragmentation of our samples.
To prepare the DNA library we used Nextera Rapid Capture Expanded Exome Kit (Illumina, San Diego, CA, USA) following the guidelines of the protocol but we used the half of the amount of the fragmentation enzyme because of the high level of fragmentation of all the samples tested with Tapestation. We load a maximum of 5 pooling libraries for each cartridge NextSeq High Output (300 cycles) on NextSeq500 (Illumina, San Diego, CA, USA).

WES data analysis
We sequenced the whole exomes of eight highgrade osteosarcoma tissues using NextSeq550 Illumina platform. The instrument generated approximately an average read length of 120.6 bases. After mapping to the human reference genome GRCh37 (hg19) using Burrows-Wheeler Alignment tool (BWA), we obtained the average depth of each base in the target regions as 17.7x. The average coverage of target regions was 121.7x.
The raw data generated from Illumina NextSeq550 were converted using Bcl2toFastq tools provided by Illumina. The data analysis of exomes was performed by using the SeqMule pipeline1. The final Variant Call Format files (VCFs) were uploaded on the VariantStudio Illumina (Illumina San Diego, CA, USA) software to perform annotation and filtration steps. We filtered genomic data by quality score of 30, by mean read depth of 5, and by GMAF ≤ 1 (Global Minor Allele Frequency: the frequency of the second most frequent allele in the population). Next, the detected mutations were filtered for the variants absent in database of Single Nucleotide Polymorphisms (dbSNP) to distinguish known somatic mutations from germline mutations. We further selected the somatic variations in the coding sequence by excluding those variations in flanking sequences like splicing sites, 5ʹ-UTR, 3ʹUTR, introns and intergenic regions. Moreover, we performed bioinformatics analysis using SIFT (v4.0.3) and PolyPhen-2 (v2.1.0) tools, that predict whether an amino acid substitution affects protein function based on sequence homology. To be sure that the variations to study were only those that caused a protein alteration, we selected only those that were at the same time deleterious for SIFT and probably damaging for PolyPhen-2.

Validation of variations
Genes that showed variation in the 87% of osteosarcoma samples (KIF1B, NEB and KMT2C) were selected for target resequencing and validation by Sanger sequencing. DNAs were amplified using the Gene Amp PCR System 9700 (Life Technologies, Foster City, CA). The PCR products were then purified using Exosap-IT (Affymetrix, Santa Clara, CA) and then sequenced using Big Dye Terminator version 1.1 Cycle Sequencing Kit (Life Technologies, Foster City, CA). Unincorporated primers and dye terminators were removed using the Montage-SEQ96 Sequencing Reaction Cleanup Kit (Merck Millipore, Billerica, MA). Sequencing was performed on an ABI PRISM 3100 Genetic Analyzer (Life Technologies, Foster City, CA) with 3100 Genetic Analyzer Data Collection software version 1.1. The sequencing and each reaction were performed in triplicate.