Speed of leukemia development and genetic diversity in xenograft models of T cell acute lymphoblastic leukemia

T cell acute lymphoblastic leukemia (T-ALL) develops through the accumulation of multiple genomic alterations within T-cell progenitors, resulting in clonal heterogeneity among leukemic cells. Xenotransplantation of human T-ALL into immunodeficient mice is a gold-standard approach to study leukemia biology, and we recently found that the speed of leukemia development varies with the T-ALL sample. The resulting human leukemia may arise through genetic selection, and we previously showed that human T-ALL development in immunodeficient mice is significantly enhanced upon transplantation of CD7+/CD34+ leukemic cells. Here we investigated the genetic characteristics of CD7+/CD34+ and CD7+/CD34− cells from newly diagnosed human T-ALL and correlated them with the speed of leukemia development. We observed that CD7+/CD34+ and CD7+/CD34− T-ALL cells that promote leukemia within a short time period are genetically similar, as are the xenograft-derived leukemias arising from both cell fractions. In cases of delayed T-ALL growth, CD7+/CD34+ and CD7+/CD34− cells were either genetically diverse, with the resulting xenograft leukemia arising from different but branched subclones present in the original sample, or genetically similar, indicating decreased fitness to the mouse microenvironment. Altogether, our work provides new information relating the speed of leukemia development in xenografts to the genetic diversity of T-ALL cell compartments.


Targeted exome sequencing
Library preparation, exome capture, sequencing and data analysis were performed by IntegraGen SA (Evry, France).
Genomic DNA is captured using Agilent in-solution enrichment methodology (SureSelect XT Clinical Research Exome, Agilent) with the corresponding biotinylated oligonucleotide probe library (SureSelect XT Clinical Research Exome - 54 Mb, Agilent), followed by paired-end 75-base massively parallel sequencing on an Illumina HiSeq4000. For a detailed explanation of the process, see the publication by Gnirke in Nature Biotechnology [1].
Sequence capture, enrichment and elution are performed according to the manufacturer's instructions and protocols (SureSelect, Agilent) without modification, except for library preparation, which is performed with the NEBNext® Ultra kit (New England Biolabs®). For library preparation, 600 ng of each genomic DNA is fragmented by sonication and purified to yield fragments of 150-200 bp. Paired-end adaptor oligonucleotides from the NEB kit are ligated onto repaired, A-tailed fragments, which are then purified and enriched by 8 PCR cycles. 1200 ng of these purified libraries are then hybridized to the SureSelect oligo probe capture library for 72 h. After hybridization, washing, and elution, the eluted fraction is PCR-amplified with 9 cycles, purified and quantified by qPCR to obtain sufficient DNA template for downstream applications. Each eluted-enriched DNA sample is then sequenced on an Illumina HiSeq4000 as paired-end 75-base reads. Image analysis and base calling are performed using Illumina Real Time Analysis (2.7.3) with default parameters.

Bioinformatics
Base calling is performed using the Real-Time Analysis software sequence pipeline (2.7.3) with default parameters. Sequence reads are mapped to the human genome build hg19/GRCh37 using Elandv2e (Illumina, CASAVA1.8.2), allowing multiseed and gapped alignments. Duplicated reads (i.e., paired-end reads in which the insert DNA molecule shows identical start and end locations in the human genome) are removed.
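The duplicate-removal criterion described above can be illustrated with a minimal Python sketch. It is not the CASAVA implementation: the `ReadPair` record, its field names, and the choice to keep the first pair seen for each insert are assumptions made only for illustration.

```python
from typing import Iterable, List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record for one aligned paired-end read."""
    name: str
    chrom: str
    start: int  # leftmost genomic position of the insert
    end: int    # rightmost genomic position of the insert


def remove_duplicates(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    """Keep a single read pair per (chrom, start, end) insert.

    Assumption: the first pair encountered is retained; the text does
    not state which copy of a duplicate set is kept.
    """
    kept = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.end)
        if key not in kept:
            kept[key] = pair
    # dicts preserve insertion order, so input order is retained
    return list(kept.values())


pairs = [
    ReadPair("a", "chr1", 100, 300),
    ReadPair("b", "chr1", 100, 300),  # same insert as "a": duplicate
    ReadPair("c", "chr1", 100, 310),  # different end: kept
]
deduplicated = remove_duplicates(pairs)
```

Here `deduplicated` contains pairs "a" and "c" only, because "b" shares both start and end locations with "a".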
CASAVA1.8.2 is used to call single-nucleotide variants (SNVs) and short insertions/deletions (maximum size 300 nt), taking into account all reads per position.
We used an in-house IntegraGen SA (Evry, France) algorithm that compares normal and tumor genotypes from exome sequencing data to determine the somatic nature of each variation. A somatic score ranging from 1 to 30 is calculated for each variant, with a score of 30 indicating the highest confidence. This score takes into account the frequencies and counts of the mutated allele in both samples to minimize false-positive variations. Finally, variants displaying mutated reads in the constitutional sample above 5 percent are considered germline or false positive and are removed from the somatic variant table. The somatic variant caller handles indels similarly, comparing the number of alignments covering a given position that include a particular indel (the variant count) with the overall coverage at that position. For SNV analysis, samples with an SNV quality (Qsnv)<20 and somatic score<10 were eliminated. The remaining hazardous (Qsnv<20 and somatic score≥10) and non-hazardous (Qsnv>20 and somatic score≥5) somatic insertions/deletions and SNVs were checked using IGV2.3.72 software.
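The filtering thresholds stated above can be summarized in a short Python sketch. This is not the in-house IntegraGen algorithm, whose somatic score calculation is not described here. The `Variant` fields, the category labels, and the reading of "mutated reads above 5 percent" as a variant allele fraction in the constitutional sample are assumptions. Combinations the text does not cover (for example Qsnv exactly 20) are reported as such rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative."""
    qsnv: float            # SNV quality (Qsnv)
    somatic_score: float   # in-house somatic score, 1 to 30
    normal_alt_reads: int  # mutated reads in the constitutional sample
    normal_depth: int      # total reads at the position in that sample


def classify(v: Variant, germline_cutoff: float = 0.05) -> str:
    """Apply the thresholds quoted in the text, in the order they appear."""
    # Assumption: the 5 percent rule refers to the fraction of mutated reads.
    normal_fraction = v.normal_alt_reads / v.normal_depth if v.normal_depth else 0.0
    if normal_fraction > germline_cutoff:
        return "germline_or_false_positive"
    if v.qsnv < 20 and v.somatic_score < 10:
        return "eliminated"
    if v.qsnv < 20 and v.somatic_score >= 10:
        return "hazardous"       # retained, checked manually in IGV
    if v.qsnv > 20 and v.somatic_score >= 5:
        return "non_hazardous"   # retained, checked manually in IGV
    return "not_covered_by_stated_rules"


labels = [
    classify(Variant(qsnv=30, somatic_score=12, normal_alt_reads=0, normal_depth=100)),
    classify(Variant(qsnv=15, somatic_score=12, normal_alt_reads=0, normal_depth=100)),
    classify(Variant(qsnv=15, somatic_score=8, normal_alt_reads=0, normal_depth=100)),
    classify(Variant(qsnv=30, somatic_score=12, normal_alt_reads=10, normal_depth=100)),
]
```

Both the hazardous and non-hazardous categories still require visual confirmation in IGV, so the sketch only sorts variants into review categories and does not produce a final call.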
Variant annotation takes into account data available in dbSNP (dbSNP144), the 1000 Genomes Project (phase1_release_v3.20101123), the Exome Variant Server (ESP6500SI-V2-SSA137), the Exome Aggregation Consortium (ExAC r3.0), and an in-house database (201 whole exomes for SNVs and 130 whole exomes for indels). Functional consequences of variants on genes, transcripts, and protein sequence, as well as on regulatory regions, are predicted by Variant Effect Predictor (VEP release 83) (stop, splicing, missense, synonymous…), together with the location of the variants (e.g. upstream of a transcript, in coding sequence, in noncoding RNA, in regulatory regions). For missense changes, two bioinformatic predictions of pathogenicity are available: SIFT (sift5.2.2) and PolyPhen (2.2.2). Other information is also reported, including quality score, homozygous/heterozygous status, count of variant allele reads, mutation type (somatic or germline), somatic score, and the presence of the variant in the COSMIC database (version 71).