A comparison of isolated circulating tumor cells and tissue biopsies using whole-genome sequencing in prostate cancer

Previous studies have demonstrated focal but limited molecular similarities between circulating tumor cells (CTCs) and biopsies using isolated genetic assays. We hypothesized that molecular similarity between CTCs and tissue exists at the single cell level when characterized by whole genome sequencing (WGS). By combining the NanoVelcro CTC Chip with laser capture microdissection (LCM), we developed a platform for single-CTC WGS. We performed this procedure on CTCs and tissue samples from a patient with advanced prostate cancer who had serial biopsies over the course of his clinical history. We achieved 30X depth and ≥ 95% coverage. Twenty-nine percent of the somatic single nucleotide variations (SSNVs) identified were founder mutations that were also identified in CTCs. In addition, 86% of the clonal mutations identified in CTCs could be traced back to either the primary or metastatic tumors. In this patient, we identified structural variations (SVs) including an intrachromosomal rearrangement in chr3 and an interchromosomal rearrangement between chr13 and chr15. These rearrangements were shared between tumor tissues and CTCs. At the same time, highly heterogeneous short structural variants were discovered in PTEN, RB1, and BRCA2 in all tumor and CTC samples. Using high-quality WGS on single-CTCs, we identified the shared genomic alterations between CTCs and tumor tissues. This approach yielded insight into the heterogeneity of the mutational landscape of SSNVs and SVs. It may be possible to use this approach to study heterogeneity and characterize the biological evolution of a cancer during the course of its natural history.


CTC isolation by polymer nanofiber (PN)-nanovelcro CTC chips
We employed a previously described method for CTC isolation (1). Briefly, we fabricated our polymer nanofiber NanoVelcro substrate using an electrospinning method (2)(3)(4) for the deposition of PLGA nanofibers onto a commercially available laser micro-dissection (LMD) membrane slide. Then, we fabricated the PDMS chaotic mixer based on a "soft-lithography" approach (5), which utilizes a silicon mold with designed structures to identically replicate the structure of the PDMS chaotic mixer. After assembling the PDMS chaotic mixer onto our PN-NanoVelcro substrate, we create a microfluidic channel that facilitates the chemical modification and subsequent CTC capture on the PN-NanoVelcro CTC Chip.
To achieve specific immobilization of CTCs we utilized streptavidin to conjugate the polymer nanofibers to facilitate the capture of CTCs labeled with biotinylated anti-EPCM. Briefly, 0.5 mL EDC (8 mg mL -1 ) and sulfo-NHS (2 mg mL -1 ) is prepared with 1x PBS and is slowly loaded into the channel to convert the carboxyl group on the terminal of the PLGA molecule to an amine-reactive sulfo-NHS ester. The channel is then rinsed multiple times with a PBS solution to eliminate the free EDC and sulfo-NHS. Streptavidin (250 μg mL -1 ) is then introduced into the channel, which reacts with the sulfo-NHS ester. After removal of the free streptavidin molecules, the PN-NanoVelcro chip coated with streptavidin is then rinsed with PBS multiple times and is ready for subsequent blood processing.
Patient blood samples were collected into EDTA-containing vacutainer tubes (BD bioscience) according to standard phlebotomy protocols. Whole blood was treated with an RBC lysis buffer (Biolegend) following the manufacturer's protocol. The peripheral blood mononuclear cells (PBMCs) were treated with biotinylated anti-EpCAM (R&D, 8μg mL-1, in 200μL PBS with 2% (w/v) BSA) at room temperature for 0.5h. After washing, the PBMCs (200μL) along with the biotinylated anti-EpCAM labeled CTCs were then loaded into 1 mL disposable syringes and introduced into the PN-NanoVelcro CTC Chips at a constant flow rate (0.5 mL/hr) via a syringe pump (KDS200, KD-Scientific). After processing, captured cells were fixed by 100% ice-cold ethanol. After rinsing away the remaining ethanol, an antibody cocktail consisting of FITCconjugated anti-CK and PE-conjugated anti-CD45 (in 2% Donkey Serum) was used for immunocytochemistry staining for one hour at room temperature away from light. After staining, the PN-NanoVelcro substrate was first washed by PBS and then dissembled from the PDMS chaotic mixer. After drying for 10 minutes in a vacuum container, the chips were loaded into a LCM microscope (ArcturusXT TM , Life Technologies) for CTC isolation.
In order to achieve the isolation of high purity single-CTCs, CTCs were first identified under the LCM microscope based on their morphology and fluorescence staining (CK+/CD45-). Selective dissection of CTCs was achieved by UV laser, and the dissected single CTCs were captured by the IR-activated conical polymer pillar onto a CapSure TM HS Cap. Finally, the HS Caps with dissected CTCs were stored in -80°C until analysis could be performed. To ensure these CTCs are free of contamination, the entire procedure is carried out in a Class 1000 clean room in the core facility of the California NanoSystems Institute at UCLA.

Single-CTC and tissue sample preparation and sequencing
After obtaining single CTCs, WGA is achieved on these samples using the REPLI-g Single Cell Kit according to the manufacturer's manual (QIAGEN GmbH). A reaction of a total volume of 50 ml was performed at 30 o C for 8 hours and then terminated at 65°C for 10 min. Amplified DNA products were stored at -20 o C. For quality assessment of the amplification product, a multiplex PCR protocol was performed over eight preselected housekeeping genes on different chromosomes, with the assumption that the integrity of these areas may represent the amplification quality of the entire genome. In our protocol, single-cell WGA products with successful PCR

SUPPLEMENTARY DATA
www.impactjournals.com/oncotarget/ Oncotarget, Supplementary Materials 2015 products in more than 6 housekeeping genes were chosen as qualified samples for subsequent sequencing.
For the FFPE archival primary tumor tissue, adjacent normal and liver metastasis tissues, the tissue core biopsies were obtained from the pathology core at CSMC. Tumor content in the primary tumor was >80%, and the tumor content was close to 100% in the metastatic liver tumor. DNA extraction was performed on these samples using the RecoverAll TM Total Nucleic Acid Isolation Kit for FFPE (Life Technologies). For the metastatic liver biopsy, due to the insufficient quantity of the FFPE DNA, we performed WGA by the REPLI-g FFPE Kit (Qiagen), which consists of random ligation of DNA fragments and MDA. For normal WBC control samples, we collected the flow-through from the PN-NanoVelcro CTC Chips, which are void of CTCs. This flow-through went through DNA extraction by the DNeasy tissue and blood kit (Qiagen). Due to unsatisfactory DNA quantity, direct MDA was performed using the REPLI-g Single Cell Kit.
DNA libraries were prepared using the TruSeq DNA kit (Illumina). The DNA library was then put on the Illumina CBot for template enrichment. Sequencing was performed on the Illumina HiSeq 2000 platform with paired-end 100 bp runs. For the WBC, tumor tissue and CTC samples, the targeted sequencing depths were 30X. For normal adjacent tissue, due to the insufficient DNA material, the targeted sequencing depth was 10X.

WGS data analysis
We used the Human (Homo sapiens) reference genome sequence (hg19) and its annotation files (dbSNP v137) for our analysis. They were downloaded from the University of California Santa Cruz Genome Bioinformatics website (http://genome.ucsc.edu/). For the WGS analysis pipeline, BWA 0.6.2 (6) was used for the alignment. After removing PCR duplicates by Picard 1.72, GATK 2.2.3 was used for indel-realignment and base quality recalibration (7). SNPs were called by the GATK unified genotyper algorithm.
Lorenz curves were used to assess the coverage uniformity in the single-CTC WGS data. Briefly, 100 million reads were randomly sampled from the WGS data. The human genome reference (hg19) was equally divided into 100 segments and the number of reads aligned to each area was calculated. The cumulative fraction of the total reads that cover a given cumulative fraction of the genome was used to calculate the Gini coefficient using ineq package in R. The Lorenz curve of our single-CTC WGS was compared with previously published singlecell sequencing data (8). The diagonal line indicates a perfectly uniform distribution of reads, and deviation from the diagonal line indicates an uneven distribution of reads. Raw data will be available in GenBank with the Accession number SRA121256.

SSNV analysis
SSNVs were identified by the following methods: 1. The bam file of a tumor genome was directly compared with the WBC reference using MuTect 1.1.4 (9).
2. These SSNVs were not present in the dbSNP 137 database.
3. The SSNVs in the normal adjacent tissue was identified by the comparison of normal tissue to the WBC reference via the MuTect algorithm. The SSNVs identified from the differences between normal adjacent tissue and WBC were used to further filter the SSNVs of CTCs and tumor tissues.
Founder SSNVs were identified as the SSNVs shared between the primary and metastatic tumors. Clonal SSNVs in the CTCs were defined as SNVs detected in at least three single CTCs. For the assessment of supporting reads of founder SSNVs in the CTC WGS, Samtools mpileup commend was used to examine the presence of reads supporting the founder SSNVs in the CTCs. The Samtools mpileup commend was also used to examine the primary and metastatic tumors for the presence of supporting reads of the clonal SSNVs in CTCs.

Statistical model for assessing the probability of SSNVs in single-CTCs
Bayesian probabilistic models are widely used in common SSNV calling tools, such as GATK for calling SNPs and Mutect for calling SSNVs. However, for singlecell sequencing data, we have to take into account the ADO rate when assessing the confidence of SSNV calls. For each site, denote bi and ei as the called base and error probability of the ith read (i = 1…n) spanning the site from a single-CTC sequencing data. We then calculate the posterior probability for the true diploid genotype Gk in { AA, AC, AG, AT, CC, CG, CT, GG, GT, TT } of the single-CTC given the sequencing data by averaging out all possible sequenced genotypes Al that might have an allele dropped out after WGA: where e ADO is the ADO rate and we assume the two alleles have the same chance to be dropped out. For the prior probability, we use the equation published by Larson et al (10)., such that where R is the reference base and π = 10 −5 is the expected rate of heterozygous mutations in human samples. For a specific alternative allele M, the LOD score is defined as .
For the determination of ADO, since there is no CTC bulk sequencing that can be compared with single-CTC data, we utilized an estimation approach. First, the heterozygous loci in the WBC, primary and metastatic tumors were determined individually by the GATK unified genotyper. Due to the possible loss of heterozygosity in tumor cells, we selected only loci present as heterozygous in all three of the tissues into our analysis. The genotype of each single-CTC was compared with the abovementioned list of loci. The ADO was estimated as the total number of homozygous loci in the list divided by the total number of loci.
A heatmap was constructed using the LOD score. For the concern of visualization, we truncated the maximum LOD score to 1, where LOD≥1 means the posterior probability of the CTC harboring the specific alternative allele is greater than 10 times the probability of no mutations. We note the minimum value -5 was introduced by the prior probability of observing the somatic mutation. Thus, LOD≥-5 means that if we ignore the prior probability, the likelihood of the CTC harboring the specific alternative allele is greater than no mutations.

WES of CTCs
CTCs isolated for WGA that showed one to five bands with correct product size from the multiplex PCR were marked as suboptimal for WGS. Eight of these CTCs isolated from samples collected from July to September 2012 were sent for WES. The DNA library was prepared using the TruSeq DNA kit (Illumina) followed by exome enrichment using the SeqCap EZ Human Exome library kit (v3.0, Roche). Enriched exome DNA was then put on the Illumina CBot for template enrichment. Sequencing was performed on the Illumina HiSeq2000 platform with paired-end 100 bp runs. The SSNVs in the CTC WES were analyzed utilizing exactly the same pipeline described above.

Validation of variations
Due to the fact that false positives can be introduced during the amplification process of single-cell DNA, validation of our findings couldn't be performed on the same cells. Therefore, our mutation sites were validated by the following approach. First, we performed additional whole exome sequencing to validate that the somatic exonic mutations sites were not present in WBCs. In brief, DNA from another batch of WBCs was extracted using the same method. The DNA library was then prepared using the TruSeq DNA kit (Illumina) followed by exome enrichment using the SeqCap EZ Human Exome library (v3.0, Roche). Enriched exome DNA was then put on the Illumina CBot for template enrichment. Sequencing was performed on the Illumina HiSeq2000 platform with paired-end 100 bp runs. The data alignment was performed as mentioned earlier, but the genotypes of all bases were called by GATK. For the validation of false somatic mutations, randomly selected somatic mutation sites were checked in this new dataset. The success of validation was defined as somatic mutation sites with REF genotypes found in the WBC exome sequencing.
Randomly selected SSNVs were further validated by Sanger sequencing. A total of 30 exonic mutations were selected for validation in all tumor and WBC samples. Among them, five of them did not have suitable primers for PCR amplification. The success of validation is defined as the detection of identified mutations in any of the tumor tissues divided by the total number of sites with PCR products.

Structural variation analysis
For the analysis of structural variations (SVs), the CREST algorithm (14) was used for the identification of breakpoints of possible SVs. Due to the concern that CREST was not designed for FFPE and WGA tissues, the identification of supporting reads of the SVs were done manually on large segment rearrangements. In brief, the paired-end reads near the two ends of the identified breakpoints were plotted. The reads with indels were eliminated to avoid possible mapping errors. The reads with two ends on each sides of the breakpoint were identified as supporting reads. The supporting reads condition was also assessed using WBC and normal adjacent tissue controls to validate the somatic nature of the SVs. For the SVs in the important oncogenes, the lengths of these SVs are mostly < 1kb, so that they cannot be assessed by the manual validations of supporting reads. For the validation of ETS rearrangement status, reads in ERG and its reported fusion partners, SLC45A3, HERPUD1, TMPRSS2 and NDRG1 were examined. SVs in PTEN, RB1 and BRCA2 were too small to be validated by the supporting reads method. CASS (15) was used for the identification of CNV analysis with a floating window of 100kb in size. The identified CNVs were filtered based on their p-values (<0.05). FREEC (16) with the setting of a 100kb fixed window was used to compare the tumor and normal tissue for the construction of Figure 2D and S5. The final results of SSNV analysis were based on the overlap areas between those identified by CASS and FREEC. For the aCGH, the DNA from the FFPE preserved primary prostate tumor tissue was prepared using the Recoverall total nucleic acid isolation kit (Ambion, Life Technologies). The CytoScan HD array assay (Affymetrix) was performed following the manufacturer's manual. The results were analyzed using the Affymetrix Chromosome Analysis Suite (ChAS) 2.0 software.