Mutation profiling of tumor DNA from plasma and tumor tissue of colorectal cancer patients with a novel, high-sensitivity multiplexed mutation detection platform

BACKGROUND Circulating tumor DNA (ctDNA) holds promise as a non-invasive means for tumor monitoring in solid malignancies. Assays with high sensitivity and multiplexed analysis of mutations are needed to enable broad application. METHODS We developed a new assay based on sequence-specific synchronous coefficient of drag alteration (SCODA) technology, which enriches for mutant DNA to achieve high sensitivity and specificity. This assay was applied to plasma and tumor tissue from non-metastatic and metastatic colorectal cancer (CRC) patients, including patients undergoing surgical resection for CRC liver metastases. RESULTS Across multiple characterization experiments, the assay demonstrated a limit of detection of 0.001% (1 molecule in 100,000) for the majority of the 46 mutations in the panel. In CRC patient samples (n=38), detected mutations were concordant in tissue and plasma for 93% of metastatic patients versus 54% of non-metastatic patients. For three patients, ctDNA identified additional mutations not detected in tumor tissue. In patients undergoing liver metastatectomy, ctDNA anticipated tumor recurrence earlier than carcinoembryonic antigen (CEA) value or imaging. CONCLUSIONS The multiplexed SCODA mutation enrichment and detection method can be applied to mutation profiling and quantitation of ctDNA, and is likely to have particular utility in the metastatic setting, including patients undergoing metastatectomy.


Mutation enrichment and sequencing workflow
Internal positive controls (IPCs), used to calculate the process yield and to monitor assay performance for each mutation individually, were then added to each sample. The IPCs have identical sequence to the mutant alleles at PCR primer and SCODA probe sites, but additionally contain random identifier sequences (RIDs), which are random DNA barcodes that facilitate yield calculations for individual input molecules and allow controls to be easily distinguished from mutant DNA sequences arising from plasma genomic DNA. Approximately 50 internal positive control molecules were added for each mutation in the 46-mutation panel. A negative control sample, containing 300 ng wild-type DNA (Roche Diagnostics, Indianapolis, IN), was run in parallel to the test samples.
Each sample was then assigned a unique sample DNA barcode in a multiplex 10-cycle barcoding PCR reaction with Q5 DNA polymerase (New England BioLabs, Ipswich, MA). 99% of each sample (up to a maximum of 300 ng DNA) was used as template to amplify 7 loci containing the 46 mutations: BRAF exon 15, KRAS exon 2, PIK3CA exons 9 and 20, and EGFR exons 19, 20, and 21. The remaining 1% of the sample DNA was barcoded in a separate reaction in which mutation panel loci and two additional control loci (COG5 and ALB, for quantification) were amplified. In both barcoding PCR reactions, all primers contain 5' tags used as universal linkers, allowing amplification of all loci with a single primer set in later steps. The barcoded amplified products and quantification reaction products were then pooled and purified with Zymo DNA Clean and Concentrator columns (Zymo Research, Irvine, CA).
Purified PCR products were enriched for mutations using the Boreal Genomics OnTarget platform, which is based on the previously described sequence-specific SCODA technology [11]. Polyacrylamide gel, containing short probes complementary to all 46 mutations of interest and the two wild-type control sequences, was cast into cartridges designed for the OnTarget platform. Samples were loaded into the cartridges, and synchronous, time-varying electric fields were applied to the cartridge at a fixed baseline temperature. The enrichment process focused all mutants and control sequences into an extraction well, while rejecting wild-type and off-target sequences to a waste well. Enriched DNA recovered from this process was further purified with Zymo Oligo Clean and Concentrator columns (Zymo Research, Irvine, CA).

Data Analysis
Sequencing data were analyzed in a fully automated fashion using custom analysis scripts that call BWA (Burrows-Wheeler Aligner) [12] for alignment to a custom reference library based on sequences from within the 46-mutation panel, and SAMtools (Sequence Alignment/Map) [13] for further data manipulation following alignment. Mutation quantification, quality control, and visualization were performed using scripts written in Perl, Python, and MySQL, together with tools including Graphviz.
Raw FASTQ files from the MiSeq were first de-multiplexed by sample barcode, trimmed to retain only the endogenous regions of each molecule lying between the barcoding PCR primers, and then filtered according to the following criteria:

a) Forward and reverse reads must align to the same reference sequence
b) Both reads must carry the same mutation
c) The mutation identified must be contained within the 46-mutation panel
Reads satisfying the above conditions were binned according to sample barcode and mutation. The remaining reads were then re-analyzed to determine whether they aligned to a separate reference library for the internal positive control molecules as follows:
1. The first 15 bases of the endogenous section of each read were aligned to a reduced set of reference sequences for the loci within the SCODA 46-mutation panel
2. RID barcodes were found by searching for the flanking sequence specific to each locus
3. RID sequences were then removed from the endogenous sequence; the remaining endogenous sequence was then passed through the tests (a)-(c) above.
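The filtering tests (a)-(c) and the subsequent binning can be sketched as follows. This is a minimal illustration rather than the authors' scripts: the `ReadPair` record, the mutation names, and the four-entry `PANEL` are hypothetical stand-ins, and alignment and mutation identification are assumed to have been done upstream.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical subset of the 46-mutation panel, for illustration only.
PANEL = frozenset({"KRAS_G12C", "KRAS_G12R", "BRAF_V600E", "EGFR_T790M"})


@dataclass(frozen=True)
class ReadPair:
    """One de-multiplexed, trimmed read pair (hypothetical record layout)."""
    sample_barcode: str
    fwd_ref: str                  # reference the forward read aligned to
    rev_ref: str                  # reference the reverse read aligned to
    fwd_mutation: Optional[str]   # mutation seen on the forward read, if any
    rev_mutation: Optional[str]   # mutation seen on the reverse read, if any


def passes_filter(pair: ReadPair, panel=PANEL) -> bool:
    same_reference = pair.fwd_ref == pair.rev_ref                  # test (a)
    same_mutation = (pair.fwd_mutation is not None
                     and pair.fwd_mutation == pair.rev_mutation)   # test (b)
    in_panel = pair.fwd_mutation in panel                          # test (c)
    return same_reference and same_mutation and in_panel


def bin_reads(pairs: Iterable[ReadPair], panel=PANEL) -> Counter:
    """Count passing read pairs per (sample barcode, mutation) bin."""
    bins = Counter()
    for pair in pairs:
        if passes_filter(pair, panel):
            bins[(pair.sample_barcode, pair.fwd_mutation)] += 1
    return bins
```

Read pairs rejected by `passes_filter` would, in the workflow described above, be handed to the separate IPC analysis rather than discarded.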
IPC reads passing the filter were corrected for sequencing errors within the RIDs and binned according to sample barcode, mutation, and unique RID sequence. The average single-molecule yield through the entire workflow for each sample barcode / mutation combination was then calculated as the average number of reads per unique RID for that barcode and mutation.
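As a minimal sketch of the yield calculation, each unique RID can be treated as one input IPC molecule, so that the single-molecule yield is the mean read count per unique RID. The function name and input layout are hypothetical, and the RIDs are assumed to have been error-corrected already.

```python
from collections import Counter
from typing import Iterable, Optional


def single_molecule_yield(rid_per_read: Iterable[str]) -> Optional[float]:
    """Mean reads per unique RID for one (sample barcode, mutation) bin.

    rid_per_read holds one error-corrected RID string per IPC read.
    Returns None when no IPC reads were recovered for the bin.
    """
    reads_per_rid = Counter(rid_per_read)
    if not reads_per_rid:
        return None
    return sum(reads_per_rid.values()) / len(reads_per_rid)
```

For example, six IPC reads spread over three unique RIDs give a yield of two reads per input molecule.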
The number of input mutant molecules for each mutation within each sample was then calculated by dividing the number of mutant reads for a given barcode by the average single-molecule yield for that mutation and barcode. A similar process was followed for the wild-type COG5 and ALB sequences and used to measure the total number of genomes that entered the workflow, taking into account that only 1% of the sample DNA was used to amplify these loci in the barcoding PCR reaction. Mutation abundances were calculated as the ratio of input mutant copies to total input genome copies.
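The quantification arithmetic can be illustrated with the short sketch below. The function names and the example numbers are hypothetical, and how the COG5 and ALB measurements are combined into a single control copy number is not specified here, so the sketch takes one control value as input.

```python
def input_mutant_copies(mutant_reads: float, yield_per_molecule: float) -> float:
    """Mutant reads divided by the IPC-derived single-molecule yield."""
    return mutant_reads / yield_per_molecule


def total_input_genomes(control_copies: float, fraction_sampled: float = 0.01) -> float:
    """Scale control-locus copies measured in the 1% aliquot to the whole sample."""
    return control_copies / fraction_sampled


def mutation_abundance(mutant_copies: float, genome_copies: float) -> float:
    """Ratio of input mutant copies to total input genome copies."""
    return mutant_copies / genome_copies


# Hypothetical example: 600 mutant reads at a yield of 200 reads per molecule,
# and 900 control copies measured in the 1% quantification reaction.
copies = input_mutant_copies(600, 200)
genomes = total_input_genomes(900)
abundance_percent = 100 * mutation_abundance(copies, genomes)
```

With these illustrative inputs the sample contains 3 mutant copies among roughly 90,000 genomes, an abundance of about 0.0033%.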
Mutations were called as positive only if they passed the following criteria:
• The detected number of input mutant copies must be ≥ 1.
• The detected input mutation abundance must be greater than the average number of copies detected in historical wild-type samples plus 3 standard deviations (99.9% confidence interval, 1 tailed).
• The detected input mutation abundance must be greater than the greatest number of copies seen in the wild-type samples run in parallel with test samples.
• The detected mutation abundance must be ≥ 5% of the abundance of any other mutation having an edit distance of 1 (i.e. a single base change, insertion or deletion). This criterion prevents false positive calls due to cross-talk, i.e. conversion of one mutant into another mutant due to PCR errors following multiplexed SCODA mutation enrichment. In cases where one mutation is present at high abundance, this criterion can have a significant impact on the limit of detection for closely related mutations.
Note that the reported limit of detection is the maximum mutation abundance (%) associated with the above criteria.
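The four calling criteria can be combined into a single decision as sketched below. This is an illustration under stated assumptions, not the authors' implementation: all background values are assumed to be expressed in the same unit as the test-sample abundance, the sample standard deviation is used for the historical wild-type data, and the function and parameter names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def call_mutation(copies: float,
                  abundance: float,
                  historical_wt: Sequence[float],
                  parallel_wt: Sequence[float],
                  neighbor_abundances: Sequence[float],
                  crosstalk_fraction: float = 0.05) -> bool:
    """Return True only if all four calling criteria are met.

    historical_wt needs at least two values so a standard deviation exists.
    neighbor_abundances lists the abundances of other mutations at edit
    distance 1 from the mutation being called.
    """
    at_least_one_copy = copies >= 1
    above_historical = abundance > mean(historical_wt) + 3 * stdev(historical_wt)
    above_parallel = abundance > max(parallel_wt, default=0.0)
    no_crosstalk = all(abundance >= crosstalk_fraction * other
                       for other in neighbor_abundances)
    return (at_least_one_copy and above_historical
            and above_parallel and no_crosstalk)
```

The last test shows why a highly abundant mutation raises the limit of detection for its close neighbors: a neighbor present at 1% abundance requires at least 0.05% abundance before a related mutation can be called.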

Assay Characterization
Prior to running clinical samples from cancer patients, the assay was characterized for specificity, sensitivity, and reproducibility. Some mutations with higher background and higher limits of detection (e.g. EGFR T790M) cannot be reliably detected at 10 copies nominal input because of the background signal observed for these mutations in wild-type DNA (see Supplementary Figure 1). This issue is less pronounced in samples with a lower overall mass of input DNA, where the background would have a lower absolute copy number (data not shown). Most of the mutation tests on wild-type (WT) samples showed <1 input copy (478 of 552 total mutation tests, 87%), with only 1 false positive observed over 12 runs (KRAS G12C, 3 copies, 0.003%; LOD 1 copy, 0.001%).
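The relationship between copy number and percent abundance quoted above can be checked with a short calculation. The sketch assumes roughly 3.3 pg of DNA per haploid human genome, a commonly used approximation that is not stated in this section, and a 300 ng input as used for the negative control; both function names are hypothetical.

```python
HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome


def genome_copies(mass_ng: float, pg_per_genome: float = HAPLOID_GENOME_PG) -> float:
    """Approximate number of haploid genome copies in a DNA mass given in ng."""
    return mass_ng * 1000.0 / pg_per_genome


def percent_abundance(copies: float, mass_ng: float) -> float:
    """Mutant copies expressed as a percentage of total genome copies."""
    return 100.0 * copies / genome_copies(mass_ng)


lod_percent = percent_abundance(1, 300)             # one copy in 300 ng
false_positive_percent = percent_abundance(3, 300)  # three copies in 300 ng
```

Under this assumption 300 ng corresponds to about 91,000 genome copies, so 1 and 3 copies correspond to roughly 0.001% and 0.003%, consistent with the values reported for the KRAS G12C false positive.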
The assay was further characterized by assessment of plasma DNA from a separate set of 47 healthy donors with no clinically apparent evidence of cancer. DNA was extracted from 10 mL of plasma from each subject, and the resulting DNA was analyzed by the 46-mutation SCODA assay. As expected, no mutant DNA sequences were detected in plasma DNA for the vast majority of these subjects (43 of 47, 91%). In 4 of the 47 subjects, mutant DNA sequences were detected at very low levels (1-3 copies) approaching the limit of detection. Three of the four subjects had sufficient plasma to allow for re-testing.
For two of these subjects, re-testing revealed no mutant DNA sequences. For the third subject, the same mutant DNA sequence (KRAS G12R) was again detected. In the absence of an orthogonal method with a similar level of analytical performance, it was not possible to characterize these cases as false positives or low-level signals corresponding to somatic mutation events that were not clinically evident. Even in the cases where re-testing did not confirm the presence of mutant DNA sequences, the low level of mutant DNA could have been missed due to sampling noise in the second plasma sample.
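The effect of sampling noise on re-testing can be illustrated with a simple Poisson model. This is an assumption made here for illustration only: the second plasma sample is taken to contain, on average, the same number of mutant copies as the first, and any sample containing at least one copy is taken to be detected.

```python
import math


def prob_zero_copies(expected_copies: float) -> float:
    """Poisson probability that a sample contains no mutant copies."""
    return math.exp(-expected_copies)


# Probability of sampling zero mutant copies on re-test when the true mean
# is 1, 2, or 3 copies per sample.
miss_probabilities = {n: prob_zero_copies(n) for n in (1, 2, 3)}
```

Under this model a true mean of 1 copy per sample would be missed on re-test about 37% of the time, and a mean of 3 copies about 5% of the time, so a negative re-test at these levels does not by itself establish that the first result was a false positive.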