Association of a cytarabine chemosensitivity related gene expression signature with survival in cytogenetically normal acute myeloid leukemia

The prognosis of cytogenetically normal acute myeloid leukemia (CN-AML) varies greatly among patients. Achievement of complete remission (CR) after chemotherapy is indispensable for a better prognosis. To develop a gene signature predicting overall survival (OS) in CN-AML, we performed data mining procedure based on whole genome expression data of both blood cancer cell lines and AML patients from open access database. A gene expression signature including 42 probes was derived. These probes were significantly associated with both cytarabine half maximal inhibitory concentration values in blood cancer cell lines and OS in CN-AML patients. By using cox regression analysis and linear regression analysis, a chemo-sensitive score calculated algorithm based on mRNA expression levels of the 42 probes was established. The scores were associated with OS in both the training sample (p=5.13 × 10−4, HR=2.040, 95% CI: 1.364-3.051) and the validation sample (p=0.002, HR=2.528, 95% CI: 1.393-4.591) of the GSE12417 dataset from Gene Expression Omnibus. In The Cancer Genome Atlas (TCGA) CN-AML patients, higher scores were found to be associated with both worse OS (p=0.013, HR=2.442, 95% CI: 1.205-4.950) and DFS (p=0.015, HR=2.376, 95% CI: 1.181-4.779). Results of gene ontology (GO) analysis showed that all the significant GO Terms were correlated with cellular component of mitochondrion. In summary, a novel gene set that could predict prognosis of CN-AML was identified presently, which provided a new way to identify genes impacting AML chemo-sensitivity and prognosis.


INTRODUCTION
Acute myeloid leukemia (AML), characterized by the rapid growth of abnormal white blood cells interfering with the production of normal blood cells, is the most common type of acute leukemia affecting adults.Presently, induction chemotherapy with cytarabine and anthracyclines is the first-line treatment for AML except for acute promyelocytic leukemia (According to NCCN AML Guidelines 2015, version 1).However, outcomes of AML vary greatly among patients after chemotherapy.Clinical studies have shown that the five-year survival rate of AML varies from 18% to 82%, and relapse rate varies from 33% to 80% [1][2][3][4].
It is well known that chromosomal abnormalities are major prognostic factors in AML.Based on karyotype, AML patients can be divided into favorable, intermediate and unfavorable cytogenetics risk groups which show different survival profiles [5].However, the detailed mechanism of the prognosis variation in normal karyotype
Cancer cell lines, which were descended from naturally occurring tumors, are commonly used for molecular biology research and drug discovery in vitro.Previous studies have confirmed that the use of pharmacological data and genomic information of cancer cell lines can help researchers to identify therapeutic biomarkers [15][16][17].These data are also helpful to find genomic variants associated with sensitivity of antitumor drugs and thus potentially affect cancer prognosis.Cytarabine is a first-line drug for inducing remission in common therapeutic schedule for AML.The achievement of complete remission (CR) is indispensable for a better prognosis; meanwhile, chemo-resistance is essential for treatment failure and poor outcomes for AML [18][19][20].Therefore, further identification of genomic information affecting drug response phenotype in leukemia cell lines may help to find new factors affecting AML prognosis.
Recently, a lot of classified studies based on RNA microarray have been performed to establish the classifier which might help to predict the outcomes of CN-AML.For example, Metzeler et al. reported an 86-probe-set mRNA expression signature which was correlated with overall survival (OS) of AML [21].Gentles et al. found a leukemic stem cell gene expression signature correlated with clinical outcomes in AML [22].Garzon et al. derived a lncRNA score composed of 48 lncRNAs and found that the score was an independent marker for the outcome in CN-AML patients [23].However, no studies have focused on genes related to chemo-sensitivity.In 2013, an open access database named Genomics of Drug Sensitivity in Cancer (GDSC) was developed [24].This database includes date of half maximal inhibitory concentration (IC 50 ) values for 138 anticancer drugs on more than 800 cancer cell lines.Whole genome mutations and expression levels are also available for the GDSC database.To find out whether genes correlated with chemo-sensitivity could also predict the outcome of AML, we downloaded IC 50 values of cytarabine from 96 leukemia cell lines and raw whole genome expression data of these cell lines from GDSC in this study.Probes significantly associated with IC 50 levels in the cells and OS in GSE12417 dataset, a whole genome expression dataset for CN-AML in NCBI Gene Expression Omnibus (GEO), were selected.Based on the selected probes, chemo-sensitivity score was derived and replicated in a TCGA CN-AML dataset.

Cluster analysis in blood cell lines
At first, we conducted a cluster analysis based on expression of all probes.The blood cancer cell lines were clustered into two classes except for Daudi (a human Burkitt's lymphoma cell line) and U-698-M (a B-Acute Lymphoblastic Leukemia cell line) (Figure 1A).Significant difference in mean cytarabine IC 50 values between the two classes was observed (p=0.030, Figure 1B).These results indicated that the whole genome expression pattern can affect cytarabine sensitivity in blood cell lines.

Function prediction of genes notably impacting cytarabine IC 50 in blood cancer cell lines
Results of linear-regression analysis showed that 4207 probes were significantly associated with cytarabine IC 50 values (raw p<0.05) in the GDSC blood cell lines.GO annotation was carried out to predict the potential function of genes targeted by the probes.In terms of biological processes, 4 of the top 10 categories belonged to RNA processing modification and metabolic process (Figure 2A).Interestingly, molecular functions of all top 10 categories were related to protein or nucleic acid binding (Figure 2B).With regard to cellular components, lumen appeared in 3 most enriched categories (Figure 2C).KEGG database was also used to identify gene networks affected by these genes (Figure 2D).Most pathways were related to immune system.The details of pathway analysis based on all 4207 probes were listed in supplementary materials (Supplementary material S2).

A 42 probe set correlated with both cytarabine sensitivity and AML OS
Among the 4207 probes associated with cytarabine IC 50 values, 453 were significantly associated with OS in GSE12417 U133 AB sample, while 337 were associated with OS in GSE12417 U133 plus sample (univariate cox regression analysis P<0.05).Finally, 42 probes correlated with both cytarabine sensitivity and OS in the same direction were identified.Details of these probes were listed in Table 1.After analysis in David database, 33 GO Terms with raw p<0.05 were identified, and four (GO:0044429, Bonferroni p=1.74 × 10 -5 ; GO:0005759, Bonferroni p=0.002;GO:0031980, Bonferroni p=0.002;GO:0005739, Bonferroni p=0.006) of them passed correction for multiple tests (Table 2).Interestingly, all these four GO Terms correlated with cellular component of mitochondrion which plays important roles in AML progression.

Association of the chemo-sensitivity score based on 42 probes with AML survival in patients from GSE12417 dataset
The detailed information including univariate cox score and combined cox score of the 42 selected probes mapped to 44 annotated genes were summarized in Table 1.Among these probes, 3 targeted two genes, and 2 genes were represented by 2 probes simultaneously.As compared to patients with a favorable chemo-sensitivity score, those with an unfavorable score had a shorter OS in both samples with U133 AB (p=0.006,HR=1.739, 95% CI: 1.174-2.575,Figure 3A) and U133 plus data (p=0.003,HR=2.456, 95% CI: 1.358-4.441,Figure 3B).As age was significantly associated with OS in both U133 AB and U133 plus samples in univariate analysis, we performed further multivariate analysis adjusted by age.The results showed that the influence of chemo-sensitivity score on OS was more notable (p=5.13 × 10 -4 , HR=2.040, 95% CI: 1.364-3.051for U133AB sample; p=0.002,HR=2.528, 95% CI: 1.393-4.591 in U133 plus sample).

Validation of the association between 42-probebased chemo-sensitivity score and survival in TCGA CN-AML patients
TCGA AML dataset was downloaded and association between chemo-sensitivity score based on the 42 probes and OS in 56 de novo CN-AML patients was also assessed.Patients with an unfavorable score had a shorter OS (p=0.013,HR=2.442, 95% CI: 1.205-4.950,Figure 4A).However, age was not associated with OS in TCGA CN-AML patients (p=0.367).As disease-free survival (DFS) data was also available for the TCGA AML dataset.Our results showed that patients with an unfavorable score had significantly shorter DFS as compared with patients with a favorable chemo-sensitivity (p=0.015,HR=2.376, 95% CI: 1.181-4.779,Figure 4B).
To confirm whether chemo-sensitivity score was an independent risk factor for OS in TCGA CN-AML patients, association between the score and some known risk factors for AML outcome, including cytogenetic risk, NPM1 mutations, FLT3-IDT mutations, CEBPA mutations, IDH mutations, and DMNT3A R882 mutations were analyzed.However, no significant association was observed for any of the somatic mutations (Table 3).

DISCUSSION
In this study, we derived a gene signature including 42 probes which mapped to 43 annotated genes from GDSC blood cell lines and the GEO GSE12417 dataset.The chemo-sensitivity score calculated based on the expression level of the 42 probes was significantly associated with OS in GSE12417 CN-AML patients and replicated in the TCGA AML dataset.Multivariate analysis showed that the chemo-sensitivity score might be an independent risk factor for AML outcome.Our results indicated that chemo-sensitivity score might be used for predicting prognosis of AML patients after cytarabine based chemotherapy.
Cytarabine is a key drug used for the induction therapy of AML.Evidence has shown that AML patients who achieved CR had longer OS than non-CR patients [18][19][20], and chemo-resistance is the main reason for treatment failure in AML [25][26][27].Hence, factors affect IC 50 of cytarabine may also influence prognosis of AML.Our findings presently confirmed the conjecture.In this study, we obtained a gene signature related to both IC 50 values of cytarabine in blood cancer lines and AML OS accepted induction therapy based on cytarabine.According to the chemo-sensitivity score calculated by the expression level of the gene signature, AML patients could be divided into favorable and unfavorable groups that have different OS and DFS.These results indicated that identification of factors related to cytarabine response on cancer cell lines maybe a viable strategy for finding potential factors affecting AML survival.
For the 44 genes involved in the signature, results of pathway analysis by using the David database showed that 4 GO terms were significantly correlated with cellular component of mitochondrion after multiple testing corrections.Mitochondria is a key organelles in human cells participating in cell apoptosis [28,29], and it also plays important roles in AML progress and chemosensitivity [30].Energy is greatly demanded in cancer cells, therefore, more adenosine triphosphate (ATP) synthesized by mitochondrion is required.Using AML cell lines, Vo et al. demonstrated that increasing mitochondrial priming enhances chemo-sensitivity, and selection for reduced mitochondrial priming in relapsed AML may be an important determinant for the chemo-resistant   phenotype [31].Xiao et al found that mitochondrial ATPsyn-b played an important role in multidrug resistance in AML [32].Wang et al observed that arsenic trioxide (ATO), used for the treatment of acute promyelocytic leukemia (APL), could induce apoptosis at therapeutic concentrations (1-2mM) through the mitochondrial pathway in APL NB4 cells [33].
There is also study shows that AML blast cells have higher copy number of mtDNA and consume more oxygen [34].When AML cells were treated by tigecycline to inhibit mitochondrial translation, both mitochondrial translated proteins level and oxygen consumption were reduced with subsequent cell death [35].Moreover, activation of the molecular cascade of apoptosis through mitochondrial is supposed to be an important mechanism by which cytarabine kills cancer cells [35].Previous research reported that cytarabine could induce the release of reduced form of cytochrome c from mitochondrial into the cytosol and initiated caspase-3 activity, which could prompt apoptotic program and result in cells death in AML cells [36,37].
We further compared our results with previous studies that focused on the effect of gene expression in cytarabine response or leukemia treatment outcome.In a whole genome expression association (WES) study that focused on cytarabine sensitive and resistant murine cell lines, SLC14A1 was found to be upregulated in cytarabine-resistant cells [38].Interestingly, SLC14A1 is also included in our gene signature, but in an opposite direction.In our results, SLC14A1 high expression correlated with low IC 50 values in blood cancer cell lines and acted as a good prognostic predictor in AML patients for both GSE12417 and TCGA CN-AML datasets.We further queried this gene in GEO database, and found that there were remarkable inter-individual variations in the expression of SLC14A1 in AML patients (Supplementary Figure S1).In the cell line WES study, both B117P and B140P cell lines demonstrated low SLC14A1 expression and could not reflect the expression distribution of SLC14A1 in AML patients [38].This may therefore lead to the controversial findings.Further mechanism study is needed to confirm whether SLC14A1 expression increases cytarabine response in AML.Another gene, named TAL1, was reported to be associated with outcome of T-cell acute lymphoblastic leukemia [39].As in our results, high TAL1 expression predicted better prognosis.Three probes in TPD52 were included in our gene signature.Previous study found that high TPD52 expression was correlated with worse outcome in infant ALL, and this was consistent with our results [40].Moreover, UCK1, SLC25A38, VDAC1 in our signature have been reported to be involved in proliferation or apoptosis of leukemia cell lines [41][42][43].We also compared our signature with other prognostic signatures for AML.All the genes but SLC25A37 in our signature were reported for the first time [4,21,22,44,45].These indicate that our research tactics might help researchers to find new functionally relevant genes involved in AML progression or cytarabine efficacy.
There are also some limitations for this study.First of all, the data used in this study is obtained from public database, therefore information for some common AML somatic mutations and other known factors affecting AML drug response were not available for constructing prediction model, which may result in uncertainty of whether the chemo-sensitivity score was an independent factor.Integration of gene expression data and somatic mutation data is supposed to build a more precision prediction model.Secondly, only 17 AML cell lines are deposited in GDSC database, and in order to gain better statistic power of linear regression analysis, all the blood cancer cell lines were used to screen the probes that were associated with IC 50 of cytarabine.Therefore, this process might cause false positive results due to specificity of blood cancers.Finally, we failed to validate our prediction model in prospective study.Therefore, further studies are required to validate the clinic significance of the model.
In conclusion, we identified a novel gene set that could affect both the cytarabine sensitivity in blood cell lines and OS of AML patients underwent cytarabine therapy.The prediction model based on the gene set could predict prognosis of CN-AML.Most genes included in the set are correlated with mitochondrial, which hints that mitochondrial might be important in cytarabine response and AML outcomes.Our study proposed a new way to identify factors impacting AML prognosis, and this would provide some scientific basis for elucidating the individual difference in outcomes of AML and realizing personalized medicine according to gene expression pattern of AML.

GDSC cell lines
The open accessed database GDSC (http://www.cancerrxgene.org/)includes more than 800 cancer cell lines with whole genome mRNA expression, mutation and copy number variation information.Based on GDSC release 5.0, 96 blood cancer cell lines with both natural log transformed IC 50 values of cytarabine and normalized whole genome mRNA expression levels were selected in our study (Supplementary Table S1).The whole genome mRNA expression level of the cell lines were detected by Affymetrix GeneChip Human Genome U133A Array.The normalized method was Affymetrix Micro Array Suite 5.0 algorithm.Details for data processing method are described elsewhere [15].As in some cell lines, mRNA expression levels were tested by more than two microarrays, and mRNA expression level results were randomly selected from one of the arrays during data analysis.

GEO initial sample
The dataset GSE12417 included 242 cytogenetically normal AML patients with whole genome mRNA expression data and clinical prognostic information is available from the GEO database [21].Three types of arrays were used to detect the whole genome mRNA expression levels: Affymetrix GeneChip Human Genome U133A and U133B arrays (U133 AB sample) for 163 patients, and Affymetrix GeneChip Human Genome U133 plus 2.0 array (U133 plus sample) for the other 79 patients.The MAS5 algorithm was used to normalize the expression data.

TCGA validation sample
The TCGA AML dataset includes 200 de novo patients [46].As the TCGA dataset is very complex, only patients with cytogenetically normal, non-M3 subtype, induction chemotherapy based on cytarabine, and percentage of blast cells more than 50% were selected in our analysis.Gene expression level of this dataset was detected by Affymetrix GeneChip Human Genome U133 plus 2.0 array and normalized by MAS5 algorithm.The clinical information of TCGA AML dataset was downloaded from cBioPortal, a well-known tool used to extract and manage data from TCGA database [47].

Statistical analysis
Linear-regression analysis was performed to identify probes correlated with cytarabine IC 50 in 96 blood cancer cell lines.Probes with P<0.05 were then selected for analysis in the GSE12417 dataset.Multivariate cox regression analysis (including age) was conducted to confirm whether the probes affect OS of CN-AML patients.Probes exhibited significance (p<0.05) in both the linear-regression analysis and the cox regression analysis in the same direction were screened.For example, if high expression of a probe was associated with high cytarabine IC 50 value, the high expression of this probe must correlate with poor AML prognosis.
To obtain the chemo-sensitivity score for each patients, the expression values of the selected probes were standardized transformed (centered to a mean of 0 and then scaled the SD).Then, the univariate cox scores (which indicated the correlation between probe expression levels , where CSj was the chemo-sensitivity score of patient j, n was the number of selected probes, and g ij was the expression level of probe i in patient j. To obtain the Kaplan-Meier plots of chemosensitivity score, patients were divided into favorable and unfavorable groups based on their chemo-sensitivity scores.The median was used as cut-point.Patients with chemo-sensitivity scores equal to or higher than the median were classified as unfavorable, the others were classified as favorable.Heatmap.2 command in gplots package was utilized for the cluster analysis.Gene Ontology (GO) and pathway analysis were conducted by the tool David 6.7 (https://david.ncifcrf.gov/).The GO analysis was performed based on biological process (BP), cellular component (CC) and molecular function (MF).The Kaplan-Meier plots were generated by Graphpad 5.0.All of the data arrangement and statistical analysis were conducted by using R (version 3.1.2).Methods flow chart of analysis process was shown in Figure 5.

Figure 1 :
Figure 1: Whole genome expression pattern impact cytarabine sensitivity in blood cancer cell lines.A. Cluster analysis of all probes in blood cancer cell lines, B. Comparison of cytarabine IC 50 values between two classes of blood cancer cell lines divided by cluster analysis.

Figure 2 :
Figure 2: GO and KEGG pathway analysis of genes notably impacting cytarabine IC50 in blood cancer cell lines.The top ten significantly enriched GO categories and pathways were calculated and plotted as the − 1 × log10 (p value).A. Biological process, B. Molecular function, C. Cellular component, D. KEGG Pathways.

Figure 4 :
Figure 4: Influence of chemo-sensitivity score on OS and DFS of CN-AML in TCGA AML dataset.

Table 3 : Comparison of chemo-sensitivity score between genotypes of well-known somatic mutations affecting AML outcome
the AML patients in GSE12417) of the selected probes were acquired and used to calculate combined cox score for each probe by the following formula: CCSi=α i1 ×β 1 +α i2 ×β 2 , where CCSi was the combined cox score of probe i.Because GSE12417 included two samples, α i1 and α i2 represented the univariate cox score for U133 AB sample and U133 plus sample, respectively; while β 1 and β 2 were the proportion of U133 AB and U133 plus samples in GSE12417, respectively.To get the chemo-sensitivity score of each sample, the following formula was used: CSj=