Genome-wide association analysis identifies SNPs predictive of in vitro leukemic cell sensitivity to cytarabine in pediatric AML

Cytarabine has been an integral part of acute myeloid leukemia (AML) chemotherapy for over four decades. However, development of resistance and high rates of relapse is a significant impediment in successfully treating AML. We performed a genome-wide association analysis (GWAS) and identified 113 (83 after adjusting for Linkage Disequilibrium) SNPs associated with in vitro cytarabine chemosensitivity of diagnostic leukemic cells from a cohort of 50 pediatric AML patients (p<10-4). Further evaluation of diagnostic leukemic cell gene-expression identified 19 SNP-gene pairs with a concordant triad of associations: i)SNP genotype with cytarabine sensitivity (p<0.0001), ii) gene-expression with cytarabine sensitivity (p<0.05), and iii) genotype with gene-expression (p<0.1). Two genes from SNP-gene pairs, rs1376041-GPR56 and rs75400242-IGF1R, were functionally validated by siRNA knockdown in AML cell lines. Consistent with association of rs1376041 and gene-expression in AML patients siRNA mediated knock-down of GPR56 increased cytarabine sensitivity of AML cell lines. Similarly for IGF1R, knockdown increased the cytarabine sensitivity of AML cell lines consistent with results in AML patients. Given both IGF1R and GPR56 are promising drug-targets in AML, our results on SNPs driving the expression/function of these genes will not only enhance our understanding of cytarabine resistance but also hold promise in personalizing AML for targeted therapies.


INTRODUCTION
Acute myeloid leukemia (AML) is the second most common pediatric leukemia with poor outcomes indicated by 5-year survival rates of 50-60% [1,2]. Cytarabine is one of the most effective chemotherapeutic agents used in AML treatment and has contributed to improved remission rates and overall survival. [3] Although most patients achieve remission after intensive cytarabine-containing induction therapy, a significant proportion of patients experience relapse. Development of resistance to antileukemic agents, such as cytarabine, is one of the biggest challenges in achieving successful treatment outcome in AML. Thus, there is a need to better understand the www.oncotarget.com Oncotarget, 2018, Vol. 9, (No. 79), pp: 34859-34875 Research Paper www.oncotarget.com factors that impact the chemosensitivity of leukemic cells. Previous efforts to identify single nucleotide polymorphisms (SNPs) associated with cytarabine resistance in AML have focused on SNPs within candidate pharmacokinetic genes involved in cytarabine metabolism and transport, as well as a few (LCLs) genome-wide efforts studies using of lymphoblastoid cell lines (LCLs) [4][5][6][7][8][9][10][11][12]. One of the limitations of LCLs used in GWAS studies is that these were derived from healthy individuals that are available through the International HapMap project. [9][10][11][12] Given that LCLs do not represent leukemic cells, these cell lines are not the best models to study leukemic cell chemosensitivity to cytarabine. Other efforts have focused on identifying differences in gene expression among AML cell lines with varying sensitivity to cytarabine. [13][14][15][16][17][18][19] To the best of our knowledge, no GWAS has so far been performed to identify SNPs that associate with the sensitivity of AML patients' diagnostic leukemic cells to cytarabine in vitro. The difficulty of obtaining bone marrow samples and the in vitro cytarabine chemosensitivity in primary AML cells is probably the reason for the lack of such studies, especially in pediatric patients.

Patient demographics
The current study included 65 patients enrolled in the multi-center St. Jude AML02 clinical trial with specimens available for determination of in vitro leukemic cell cytarabine chemosensitivity. Of these 65 patients, 50 (including 40 white patients) had leukemic cells with in vitro cytarabine with IC50 < 5 ng/μL and 15 (including 10 white patients) had leukemic cells that were resistant to cytarabine (IC50 not achieved). The baseline demographics of the patients included in the study are summarized in Table 1. There was no significant difference in age, gender and race between the patients classified as having leukemic cells sensitive or resistant to cytarabine. However, as expected, patients with high-risk group features were more abundant in cytarabine resistant cases. Given that our overall sample size was limited and most (77%) of the participants were genetically white and had a similar proportion of resistant vs. sensitive cases as the whole cohort, we restricted our analysis to the 50 genetically white patients.

Genome-wide association analysis
After rigorous quality control (QC) as described in the methods below, we evaluated 1,317,106 variants in 50 white patients. With this limited sample size, none of the SNPs reached genome-wide significance at the Bonferroni threshold for the familywise error rate (P<5 × 10 -8 ; Figure 1). Nevertheless, as our study likely has the largest feasible sample size in this rare disease setting, we selected 113 SNPs with p < 0.0001 to evaluate for further scientific evidence of involvement in cytarabine response ( Table 2).
The top GWAS SNP associated with cytarabine resistance is rs721947 (P=1.77x10 -7 ). It is located on chromosome 12 downstream of NEDD1 and co-localized within a long intergenic non-coding RNA (LincRNA) region. A few other potentially interesting SNPs localized within or close to the genes of potential relevance in myeloid malignancies. A synonymous SNP, rs1376041 G>A (P=1.42x10 -5 ), is located within a conserved region close to exon/intron splice junction within the G proteincoupled receptor 56 (GPR56) gene. Presence of the minor allele A for rs1376041 was associated with cytarabine resistance. Our results on this SNP are very exciting given that GPR56 contributes to AML development and its expression has been associated with inferior outcome in AML. [18,20] rs75400242 G>A (P=2.1x10 -5 ) SNP is located in a region upstream of IGF1R (Insulin like growth factor receptor) a gene expressed in human leukemia cell lines with pathological significance in AML [21][22][23]. IGF1R is among one of the significantly phosphorylated receptor tyrosine kinase in AML that has been associated with activation of PI3K/AKT signaling and thus cell growth and survival in AML [23].
Several SNPs were located in or near cell adhesion genes (ALCAM, BCAM, CDH4, NEDD9, OCML, PCDH1 and PCDH17) and genes involved death receptor signaling pathways (HTRA2 and ZC3HAV1a PARP family member). SNPs with in or close to other biologically interesting genes included: CXC3R1, a chemokine receptor which is expressed at higher levels in AML [26]; DLK1, a member of the NOTCH signally pathway with a potentially oncogenic role in myeloid dysplastic syndrome [27], a disease that frequently progresses into AML; VEGFC (vascular endothelial growth factor C), for which higher expression has been previously associated with chemo-resistance and adverse prognosis in AML [28]. www.oncotarget.com

Expression quantitative trait loci (eQTL) analysis for differentially expressed genes
We then evaluated the association of the expression of the genes using 578 unique microarray probe sets located within 500kb of the 113 SNPs identified in the GWAS analysis above. Thirty-eight probes representing 38 unique genes showed significant differential expression between the 10 cytarabineresistant cases and the 40 cytarabine-sensitive cases (p < 0.05; Supplementary Table 1). Next, we determined cis-eQTL associations between expression levels of the 38 genes with differential expression between sensitive and resistant cell lines and the 113 identified SNPs. We identified 19 SNP-gene pairs among 13 unique SNPs and 12 unique genes with a concordant triad of associations (Table 3): SNP genotype with cytarabine sensitivity (p < 0.0001), mRNA expression with cytarabine sensitivity (p < 0.05), and SNP genotype with mRNA expression (p < 0.1). Five SNP-gene pairs were informatively redundant due to linkage among SNPs. Figures 2A-2C provides heat-maps illustrating the genotype distribution of the 13 SNPs between cytarabine sensitive and resistant leukemia cells as well expression levels of the 12 genes between cytarabine sensitive and resistant leukemia cells.     Guided by the results above and existing evidence from literature on genes with relevance in AML, we further selected GPR56 and IGF1R for further in vitro validation of the impact on cytarabine resistance in AML cell lines.

rs1376041-GPR56
GPR56 has been reported previously to play a role in leukemogenesis [20]and has been recently identified to be part of a 17-gene leukemia stem cell signature, LSC17. [29] The GPR56 SNP rs1376041 G>A minor allele A was associated with cytarabine resistance (P=1.42x10 -5 , Figure 3A); greater GPR56 mRNA expression levels in AML patients' diagnostic leukemic cells were associated with resistance to cytarabine (P=0.036, Figure 3B) and consistent with these, presence of rs1376041 minor allele A was associated with greater GPR56 mRNA expression in leukemic cells (P=0.022, Figure 3C). Other genes in the flanking region within 500kb of the GPR56 SNP include other G protein-coupled receptors, namely GPR97 and GPR114 ( Figure 3D). In addition to the ciseQTL association of rs1376041 with GPR56 expression, a significant association was also observed with the neighboring gene KATNB1 (Table 3). We further investigated the impact of siRNA mediated transient knockdown of GPR56 on cellular sensitivity to cytarabine. siRNA mediated knockdown resulted in a significant reduction in GPR56 mRNA in both THP1 and K562 cell lines (P<0.0005, Figure 4A). GPR56 knockdown resulted in significant increase in apoptosis as shown by increase in Annexin V assay (P<0.01, Figure 4B and 4C) and reduction in cell viability (P<0.05, Figure 4D and 4E) in response to cytarabine at 5μM and 200μM of cytarabine in both the cell lines.

rs75400242-IGF1R
The second SNP-gene pair of interest is rs74500242-IGF1R, an insulin-like growth factor 1 receptor with tyrosine kinase activity. As shown in Figure 5A, the presence of the minor allele A for rs75400242 was associated with cytarabine resistance (P=2.19x10 -5 ). Furthermore, IGF1R mRNA expression level within diagnostic leukemic cells was significantly higher in cytarabine resistant cases (P=0.031, Figure  5B). The presence of an A allele for rs75400242 showed a marginally significant association with greater IGF1R expression (P=0.062, Figure 5C). Other genes near rs75400242 include ARRDC4 and FAM169B ( Figure 5D).
We further investigated the impact of siRNA mediated transient knockdown of IGF1R on cellular sensitivity to cytarabine. siRNA mediated knockdown resulted in reduction in IGF1R mRNA in THP1 and K562 cell lines (P<0.05, Figure 6A). IGF1R knockdown enhanced apoptosis and correspondingly had reduced cell viability in response to cytarabine treatment, in THP1 cell lines ( Figure 6B and 6D) and K562 cell lines ( Figure 6C and 6E).

DISCUSSION
Development of resistance to cytarabine adversely impacts treatment outcome, making it very critical to understand the molecular mechanisms underlying cytarabine resistance. Past efforts to understand factors www.oncotarget.com    influencing cytarabine chemosensitivity have focused on the use of lymphoblast cell lines (LCLs) derived from healthy population as a model system or comparing gene-expression differences between parental and cytarabine resistant AML cells lines. [9][10][11][12] However, each of these systems have limitations as LCLs do not represent leukemic cells and AML cell lines can undergo changes in culture that might impact gene expression. In this study, we report the first GWAS analysis to identify genomic markers predictive of AML patients' leukemic cell cellular sensitivity to cytarabine. We identified 113 SNPs associated with cytarabine in vitro sensitivity of leukemic cells obtained from pediatric AML patients. Within 500 kb of these SNPs, there were 38 genes that were differentially expressed between cytarabine sensitive and resistant cases. Nineteen SNP-gene pairs fulfilled the following three criteria: (1) association of SNPs with cytarabine in vitro sensitivity (p< 0.0001), (2) association of gene expression with cytarabine in vitro sensitivity (p<0.05) and (3) association of SNP with gene expression in a consistent direction (p < 0.1). Although this approach highlights significance of some SNPs, it is possible that other significant SNPs impact gene function by mechanisms other than regulating gene expression. Figure 7 summarizes the overall results of the current study. Although in-depth mechanistic investigation of all the 113 SNPs is beyond the scope of this work study, we selected two genes GPR56 and IGF1R for further in vitro validation. Consistent with the SNP and geneexpression results, siRNA mediated knockdown of these genes increased cellular susceptibility to cytarabine. GPR56, also commonly referred to as ADGRG1, encodes for G protein-coupled adhesion molecule and is highly expressed in leukemic stem cells and has been implicated in development of AML. [30] We have previously shown higher expression of GPR56 to be predictive of inferior outcome and more recently our data shows a significant relationship between GPR56 methylation and expression with outcome in pediatric AML. [18], Consistent with our findings, another study investigating DNMT3B as a player in leukemogenesis reported GPR56 as one of the target genes of DNMT3B and in adult AML high expression of GPR56 was associated with inferior overall survival. [31] GPR56 is also part of the 17 gene leukemic stem cell score (LSC17) that has been recently shown to have prognostic significance in predicting therapy resistance and relapse risk. [32] Saito et al. showed greater GPR56 expression in AML samples with high ecotopic viral integration site-1 expression (EVI1 high ), which is considered a refractory type with poor prognosis. Further, GPR56 knockdown in mice has been shown to result in an increase in cell migration and a decrease in cellular adhesion to the extracellular matrix, resulting in a reduction in cell growth rate and enhanced apoptosis. [20].
Our results show that the A allele of the GPR56 rs1376041 SNP was associated with cytarabine resistance as well as with higher mRNA expression levels of GPR56 gene, which in turn was associated with greater cytarabine resistance. This result indicates that rs1376041 SNP's contribution to cytarabine in vitro sensitivity might actually be through its regulation of GPR56 gene expression levels. This adds a unique level of evidence indicating GPR56 rs1376041 SNP to be predictive of its expression and cytarabine chemosensitivity of patient leukemia cells. Further, GPR56 knockdown in AML cell lines increased cytarabine sensitivity, suggesting its potential role in drug resistance and thus inferior outcome. To the best of our knowledge, even though recent data suggests significant role of GPR56 in AML, SNPs in GPR56 have not been studied in context of developing AML or resistance to treatment response.
The second signal, the rs75400242 SNP, is a G>A intronic SNP located 360 kb upstream of IGF1R, a gene encoding a transmembrane receptor protein having tyrosine kinase activity. [33,34] Upon activation by insulin-like growth factor 1 (IGF-1), the IGF system regulates proliferation and differentiation of hematopoietic cells. [35] Activated IGF1R, can lead to upregulation of phosphoinositide 3 kinase/protein kinase B (PI3K/Akt) signaling, promoting growth and survival of AML cells. [23,36] Cell growth and survival through IGF1R receptor-mediated activation of the PI3K/Akt signaling pathway has been reported previously [23,37,38], however its association with development of drug resistance is limited. Our results show IGF1R to be associated with cytarabine in vitro chemosensitivity and in conjunction with reports of specific targeting of IGF1/ IGF1R signaling pathway having potent anti-leukemic activity in AML cells with constitutive PI3K activation, makes it a likely drug target.
Our study clearly adds another level of evidence by mapping a genetic variation that is predictive of leukemic cell cytarabine resistance. Although further studies are warranted to both validate the associations of SNPs identified in our study and to understand the functional underpinnings of the mechanisms contributing to cytarabine resistance, this is difficult due to the lack of such data sets owing to the practical challenges associated with obtaining in vitro drug sensitivity data from primary leukemia cells. To the best of our knowledge, even though limited by the relatively small sample size as compared to other GWAS studies, our study is the largest data set with in vitro cytarabine chemosensitivity in leukemic cells from pediatric AML patients.
In conclusion, although future work in larger clinical cohorts is warranted, our results identified SNPs, mapping to genes of relevance in AML.
[39] In the event the blast count was <80%, samples were enriched to achieve more than 80% blasts using magnetic cell sorting (Miltenyi Biotech, Germany). The sensitivity of leukemic cells to cytarabine was determined in vitro using the 4-day 3-(4,5-dimethylthiazol-2-yl)-2,5diphenyltetrazolium bromide (MTT) cytotoxicity assay. [18] The leukemic cells were exposed to six different concentrations of cytarabine (ranging from 0.002-2.5 ng/ μL) or to culture media (without drug) in a 96-well plate. MTT assay was performed after 96 hrs of drug treatment to determine cytarabine IC50. Fifteen patient samples did not achieve IC50 even at highest drug concentration tested, which was 2.5 ng/μL. Therefore, IC50 values from the 65 samples were dichotomized into those less than 5 ng/μL, which represented the sensitive group (n=50 total, n=40 Caucasians), and those that did not achieve IC50, represented as a resistant group (n=15 total, n=10 Caucasians). www.oncotarget.com

Gene expression
The mRNA expression levels in diagnostic leukemic blasts from the patient samples were previously determined using the Affymetrix U133A microarray data. [40] All gene expression values were natural log transformed prior to data analysis.

Quality control procedure
QC steps were performed to obtain a high quality dataset for use in statistical analysis. QC is crucial since the capability of a GWAS to reveal true associations and the utility of those GWAS results is dependent on the quality of the data with significant impact on downstream analyses and replication studies. [41]The initial dataset had 2,612,357 variants and 65 participants. Marker QC comprised of filtering out 21,442 low quality SNPs having call rate ≤95% and 578,851 monomorphic SNPs. Of the remaining SNPs, the non-mitochondrial/non-haploid SNPs with MAF>5% were the ones analyzed. Multi-step sample QC procedure included identification and removal of individuals with a large proportion of SNPs failing i.e samples with a call rate ≤95%, individuals with mismatch between genetic and reported sex, related or duplicated individuals and individuals with outlying heterozygosity rate. [41,42] Sample QC also confirmed the individual's identified race and if genetic race was not consistent with their reported race, it was reassigned. All the samples passed sample QC steps and none were excluded in the analysis. Details of the QC procedure can be found in the Supplementary Table 2. All QC steps were performed using PLINK 1.9. [43,44].

Statistical analysis
Given the phenotype of interest, namely cytarabine IC50, was not normally distributed (Supplementarty Figure 1), the patients were classified into two groups: cytarabine sensitive: leukemic cells with IC50 less than 5 ng/μL, (n= 40 Caucasian patients), and cytarabine resistant: leukemic cells that never achieved IC50 and thus were assigned IC50 of 5 ng/μL, (n= 10 Caucasian patients). We utilized Cochran-Armitage trend test to assess SNPs associated with cytarabine sensitive or resistant groups within our white patients. [43,44] Genome-wide significance was set at an alpha level of 5x10 -8 [45] and a suggestive level was set at 1x10 -4 for follow-up with gene expression data. All GWAS was conducted using PLINK 1.9. [43,44].

Gene-expression analysis
Differential gene expression for genes within +/-500kb of the significant/suggestive GWAS SNPs was conducted using Wilcoxon test to detect genes differentially expressed between the sensitive and resistant groups. cis eQTL analysis was conducted on candidate SNP-gene pairs identified in the GWAS and differential gene expression analysis above using Wilcoxon and Kruskal-Wallis tests, depending on how many genotype groups are available for each SNP. These analyses were done using R statistical software version 3.4.0. [46].

In vitro functional validation
In vitro validation of two genes identified in GWAS and validated in gene expression analysis was performed using AML cell lines: THP-1, K562 (ATCC, Manassas, VA, USA). The cell lines were maintained in RPMI-1640 (Invitrogen, USA) supplemented with 10% FBS (Invitrogen, USA) at 37°C in a humidified atmosphere containing 5% CO2. The cells were passaged every 2-3 days in order to maintain them in logarithmic growth phase. siRNAs targeting GPR56 and IGF1R as well as scrambled siRNA were procured from Dharmacon (Accell SMARTpool, USA). Just before transfection, the cells were grown in RPMI-1640-medium free of antibiotics. Delivery of siRNA at a final concentration of 50 nM was performed using RNAi Max reagent (Invitrogen, USA) according to the manufacturer's instructions. To monitor the effect of siRNA on gene silencing, transfection (5 × 105 cells/well) was done in 6-well plates for 48 hours. RNA levels of target genes GPR56 and IGF1R as well as house-keeping gene GAPDH was then measured by Taqman gene expression assays (ThermoFisher Scientific, USA). Transfected cells harvested and exposed to cytarabine at 5 μM and 200 μM final concentration for 48 hours. Cells were stained with acridine orange (AO) and propidium iodide (PI) for viability assays and fluorescein isothiocyanate-conjugated Annexin V for apoptosis assays followed by manufacturer's procedure, (Cellometer Inc., Lawrence, MA, USA). Cell imaging was undertaken by the Cellometer Vision CBA and FCS Express-6 software. In order to measure threshold in the software, it was set to 0% to quantify total fluorescence of each counted cell from the captured images. Based on fluorescence intensities, the results were calculated and converted to FCS file for analysis in the De Novo Software.55.