Rapid, ultra low coverage copy number profiling of cell-free DNA as a precision oncology screening strategy

Current cell-free DNA (cfDNA) next generation sequencing (NGS) precision oncology workflows are typically limited to targeted and/or disease-specific applications. In advanced cancer, disease burden and cfDNA tumor content are often elevated, yielding unique precision oncology opportunities. We sought to demonstrate the utility of a pan-cancer, rapid, inexpensive, whole genome NGS of cfDNA approach (PRINCe) as a precision oncology screening strategy via ultra-low coverage (~0.01x) tumor content determination through genome-wide copy number alteration (CNA) profiling. We applied PRINCe to a retrospective cohort of 124 cfDNA samples from 100 patients with advanced cancers, including 76 men with metastatic castration-resistant prostate cancer (mCRPC), enabling cfDNA tumor content approximation and actionable focal CNA detection, while facilitating concordance analyses between cfDNA and tissue-based NGS profiles and assessment of cfDNA alteration associations with mCRPC treatment outcomes. Therapeutically relevant focal CNAs were present in 42 (34%) cfDNA samples, including 36 of 93 (39%) mCRPC patient samples harboring AR amplification. PRINCe identified pre-treatment cfDNA CNA profiles facilitating disease monitoring. Combining PRINCe with routine targeted NGS of cfDNA enabled mutation and CNA assessment with coverages tuned to cfDNA tumor content. In mCRPC, genome-wide PRINCe cfDNA and matched tissue CNA profiles showed high concordance (median Pearson correlation = 0.87), and PRINCe detectable AR amplifications predicted reduced time on therapy, independent of therapy type (Kaplan-Meier log-rank test, chi-square = 24.9, p < 0.0001). Our screening approach enables robust, broadly applicable cfDNA-based precision oncology for patients with advanced cancer through scalable identification of therapeutically relevant CNAs and pre-/post-treatment genomic profiles, enabling cfDNA- or tissue-based precision oncology workflow optimization.


SUPPLEMENTARY MATERIALS

TCGA data analysis
TCGA pan-cancer copy number analyses were run on somatic (scna_minus_germline_cnv_hg19__seg) segmented Affymetrix SNP6 array-based copy-number calls for 11,576 tumor samples across 32 tumor types contained in the most recent (01/28/2016) TCGA GDAC Firehose standard data run (stddata__2016_01_28) [1]. Data were downloaded from the TCGA GDAC Firehose repository using the firehose_get utility (v0.4.6), and the fraction of genome altered (FGA) was calculated as in cBioPortal (https://groups.google.com/forum/#!topic/cbioportal/HKLa9C9m4y4). Specifically, FGA was calculated for all tumor samples as the total number of bases in regions affected by copy-number alterations with log2(CopyNumberRatio) > 0.2 or < −0.2, divided by 3 billion (the approximate median number of bases in all segments for each sample across all analyzed samples and tumor types).
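The FGA definition above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function name, the `(start, end, log2_ratio)` tuple layout, and the half-open segment length (`end - start`) are assumptions made for the example, while the 0.2 threshold and the 3 billion base denominator come from the text.

```python
GENOME_SIZE_BP = 3_000_000_000  # denominator stated in the text
LOG2_THRESHOLD = 0.2            # |log2 ratio| above this counts as altered


def fraction_genome_altered(segments, threshold=LOG2_THRESHOLD,
                            genome_size=GENOME_SIZE_BP):
    """Return the fraction of genome altered for one sample.

    segments: iterable of (start, end, log2_ratio) tuples (hypothetical
    layout); segment length is taken as end - start.
    """
    altered_bp = sum(
        end - start
        for start, end, log2_ratio in segments
        if abs(log2_ratio) > threshold
    )
    return altered_bp / genome_size


# Toy sample: one near-neutral segment, one gain, one loss.
example_segments = [
    (0, 150_000_000, 0.05),
    (150_000_000, 450_000_000, 0.35),
    (0, 300_000_000, -0.5),
]
example_fga = fraction_genome_altered(example_segments)
```

In this toy sample the gain and the loss each span 300 Mb, so 600 Mb of 3 Gb is altered.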

Cell-free DNA extraction
Five milliliters of peripheral blood were collected for 92 samples from 76 patients with metastatic castration resistant prostate cancer (mCRPC) and 10 healthy controls (5 male, 5 female) using K2 EDTA blood collection tubes (Cat: 366643, BD, NJ) (Table S1). Within 4 hr, blood was mixed with an equal volume of PBS, and Ficoll-Paque Plus (Sigma-Aldrich; MO) was used to separate plasma from red blood cells and peripheral blood mononuclear cells (PBMCs). Plasma was centrifuged twice at 1500 g for 12 min to limit cell contamination and stored at −80°C.
For 11 patients (13 samples) with metastatic lung adenocarcinoma, 4 patients (7 samples) with metastatic colorectal cancer, 3 patients (3 samples) with leukemias, 2 patients (4 samples) with sarcoma, one patient with both sarcoma and breast cancer, and one patient with uterine leiomyosarcoma, 10 mL of peripheral blood was collected using Streck Cell-Free DNA BCT tubes (Streck; NE) (Table S1). Within 4 hr, blood was centrifuged at 1600 g for 10 min, and the plasma was then centrifuged at 1600 g for 10 min to remove cell debris and stored at −80°C. Cell-free DNA was extracted from all plasma samples (2 mL) with the QIAamp Circulating Nucleic Acid Kit (Qiagen; CA) according to the manufacturer's instructions. Sample collection and NGS were performed with Institutional Review Board approval.

Low-pass whole-genome sequencing and copy-number detection
Sequencing alignment and coverage analyses were performed using Torrent Suite version 5.0.2 (Ion Torrent, Carlsbad, CA). Initially, reads were aligned to the hg19 version of the human reference genome using tmap (v5.0.7), and aligned, non-PCR-duplicate reads (samtools v1.3) were used as input for our copy-number calling workflow. Genome-wide copy number alterations were first called using the QDNAseq R package (version 1.6.1) [2]. Briefly, the genome was divided into bins of variable size (15, 25, 50, 100, 500, and 1,000 kilobase-pair bins), and bin-level counts of high-quality mapped reads (MAPQ ≥ 37) were calculated separately for each sample. Raw bin-level counts were simultaneously corrected for GC content and mappability by fitting a LOESS surface through median read counts for bins with the same combination of GC content and mappability and dividing raw bin-level counts by the corresponding LOESS fitted value. GC- and mappability-corrected bin-level counts were then normalized by median bin-level corrected counts within each sample. Bins previously shown using either ENCODE or 1000G data to yield anomalous copy-number results due to germline copy number variants (CNVs), low mappability, or large stretches of uncharacterized nucleotides were excluded [2]. For each bin in each tumor sample, high-quality, corrected, median-normalized read counts were divided by average corrected, median-normalized read counts from our 5 normal male samples. Segmented copy-number events were called from bin-level corrected, median- and control-normalized read counts using the circular binary segmentation algorithm implemented by the DNAcopy (1.44.0) R package, and final segment- and bin-level copy-number values were used for subsequent analyses as described. Focal CNAs were defined as CNAs 1.5-20 Mb long with a log2(CNRatio) ≥ 0.2, thresholds similar to those described elsewhere [3].
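The median and control normalization steps, and the focal CNA definition, can be illustrated with a short Python sketch. It is not the QDNAseq or DNAcopy implementation and omits the LOESS correction and segmentation entirely; all function names are hypothetical, inputs are assumed to be already GC- and mappability-corrected with non-zero medians, and the focal rule is applied exactly as written in the text (log2 ratio ≥ 0.2, so gains only).

```python
from statistics import mean, median


def median_normalize(counts):
    """Divide each corrected bin count by the sample's median bin count."""
    m = median(counts)  # assumed non-zero
    return [c / m for c in counts]


def control_normalize(sample_counts, control_counts_list):
    """Divide median-normalized sample bins by the per-bin average of
    median-normalized control samples (e.g. normal male samples)."""
    sample_norm = median_normalize(sample_counts)
    controls_norm = [median_normalize(c) for c in control_counts_list]
    ratios = []
    for i, value in enumerate(sample_norm):
        reference = mean(control[i] for control in controls_norm)
        ratios.append(value / reference)
    return ratios


def is_focal(start_bp, end_bp, log2_ratio,
             min_len=1_500_000, max_len=20_000_000, min_log2=0.2):
    """Apply the focal CNA definition: 1.5-20 Mb long, log2 ratio >= 0.2."""
    length = end_bp - start_bp
    return min_len <= length <= max_len and log2_ratio >= min_log2


# Toy example: the third bin carries twice the expected signal.
toy_ratios = control_normalize(
    [10, 10, 20, 10, 10],
    [[10, 10, 10, 10, 10], [20, 20, 20, 20, 20]],
)
```

Dividing by pooled normal samples in this way removes bin-specific biases shared between tumor and control libraries, which is why the controls are normalized the same way as the sample before averaging.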

Targeted sequencing: Oncomine Comprehensive Assay
For 60 patient cfDNA samples (31 high tumor content mCRPC samples, 13 low tumor content mCRPC samples, 11 high tumor content non-mCRPC samples, 1 mCRPC sample with germline chr20 deletion, and 4 male normals; see Table S1) and both sheared UMUC-5 and VCaP gDNA samples, we performed targeted NGS using the DNA component of the Oncomine Comprehensive Assay (OCP), a custom multiplexed PCR-based panel of 2,530 amplicons targeting 126 genes. These genes were selected based on pan-cancer analysis that prioritized somatic, recurrently altered oncogenes, tumor suppressors, and genes subject to high-level copy alterations, combined with a comprehensive analysis of known/investigational therapeutic targets [4]. Barcoded libraries were generated from 1-20 ng of cfDNA per sample, and multiplexed sequencing was performed using the Ion Torrent Proton sequencer. Library preparation with barcode incorporation, template preparation on the OneTouch 2, and sequencing using the Ion Torrent Proton sequencer (Ion Torrent, Carlsbad, CA) were performed according to the manufacturer's instructions. Data analysis was performed using Torrent Suite 5.0.2, with alignment by TMAP using default parameters and variant calling using the Torrent Variant Caller plugin (5.0.2.1) with default low-stringency somatic variant settings. Variant annotation, filtering, and prioritization, along with gene-level copy number estimation, were performed essentially as described [4][5][6][7] using validated in-house pipelines, and gene-level copy-number calls, prioritized point mutations, small insertions/deletions (indels), and copy-number variants were reported for each patient sample (Table S2 & S3). Copy number alterations called from targeted NGS data with log2(copy number ratio) >= 0.6 or <= -1.0 were prioritized.
In silico dilutions were subsequently carried out on both undiluted whole-genome sequencing cell line samples with our coverage-matched normal male sample (for all integer percent dilutions 0-100%), where for each dilution the following steps were executed:
1) Shuffle undiluted cell line & normal male FASTQ files (using code above)
2) Sample appropriate portion of reads from each file using seqtk NGS toolkit (v1.0-r31) (seqtk sample -s100 <FASTQ file> <proportion_to_sample>)
3) Concatenate proportional FASTQ files (cat <vcap_prop_file> <normal_prop_file>)
4) Map mixed read set to the reference genome (hg19) using identical mapping approach to that used for original undiluted cell line and patient whole-genome sequencing samples: tmap mapall -f hg19.fasta -r input.fastq -s output.bam -v -Y -u -prefix-exclude 5 -o 1 stage1 map4
5) Sort and index aligned bam files for input to copy-number calling workflow
Genome-wide copy number variation calls were subsequently generated for each in silico dilution as described (see Methods).
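As a rough illustration of step 2, the sketch below derives the two sampling proportions for a given integer percent dilution and fills them into the seqtk command template quoted above. It assumes, as an interpretation of "coverage-matched", that the cell line and normal FASTQ files contain comparable read totals, so that a tumor fraction p is obtained by sampling p of the cell line reads and 1 − p of the normal reads. The function names and file names are hypothetical, and the sketch only builds command strings; it does not run seqtk.

```python
def dilution_proportions(tumor_percent):
    """Return (cell_line_proportion, normal_proportion) for one dilution.

    Assumes coverage-matched inputs, so proportions sum to 1.
    """
    if not 0 <= tumor_percent <= 100:
        raise ValueError("tumor_percent must be between 0 and 100")
    tumor = tumor_percent / 100
    return tumor, 1 - tumor


def seqtk_commands(cell_line_fastq, normal_fastq, tumor_percent, seed=100):
    """Fill the 'seqtk sample -s100 <FASTQ file> <proportion>' template
    from the text for both input files."""
    tumor, normal = dilution_proportions(tumor_percent)
    return [
        f"seqtk sample -s{seed} {cell_line_fastq} {tumor:.2f}",
        f"seqtk sample -s{seed} {normal_fastq} {normal:.2f}",
    ]


# Hypothetical file names for a 25% effective tumor content dilution.
commands_25 = seqtk_commands("vcap.fastq", "normal_male.fastq", 25)
```

The endpoints (0% and 100%) are better handled by using the undiluted normal or cell line file directly rather than sampling with a proportion of 0 or 1.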

Clustering
Mean-shift, k-means, and x-means clustering approaches were assessed and deployed to identify relevant clusters from segment-level (whole-genome sequencing) or gene-level (targeted sequencing) copy number ratio data. All clustering analyses were carried out in R (3.2.3) using packages LPCM (v0.45-0), RWeka (0.4-26), or base packages as applicable. For mean-shift clustering, variable bandwidths were evaluated, supporting a static bandwidth value of 0.01 on exome or whole-genome copy-number calls. Mean-shift clustering showed the most consistent expected cluster identification across in vitro/in silico dilutions, and was used for all analyses described herein.
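To make the clustering step concrete, the following is a minimal one-dimensional mean-shift sketch in Python. It is not the LPCM implementation used in the study: the Gaussian kernel, the convergence tolerance, and the rule for merging nearby modes are assumptions chosen for illustration, and the way LPCM scales its bandwidth may differ, so the default of 0.01 here is only nominally the same value as in the text.

```python
import math


def mean_shift_1d(values, bandwidth=0.01, max_iter=200, tol=1e-7,
                  merge_tol=None):
    """Return sorted cluster centers for 1-D data via mean shift.

    Each point is shifted to the Gaussian-weighted mean of the data until
    it stops moving; converged points closer than merge_tol are merged.
    """
    if merge_tol is None:
        merge_tol = bandwidth
    modes = []
    for x in values:
        for _ in range(max_iter):
            weights = [math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                       for v in values]
            new_x = sum(w * v for w, v in zip(weights, values)) / sum(weights)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        modes.append(x)

    centers = []  # list of (running mean, member count)
    for m in sorted(modes):
        if centers and abs(m - centers[-1][0]) <= merge_tol:
            c, k = centers[-1]
            centers[-1] = ((c * k + m) / (k + 1), k + 1)
        else:
            centers.append((m, 1))
    return [c for c, _ in centers]


# Toy segment-level log2 ratios: a neutral peak and a gained peak.
toy_centers = mean_shift_1d([0.0, 0.004, -0.004, 0.3, 0.304, 0.296])
```

Unlike k-means, mean shift does not need the number of clusters in advance, which suits copy-number ratio distributions where the number of distinct copy states varies by sample.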

Tumor content estimation
For whole-genome sequencing samples, reference segment-level copy-number ratio distributions were established through serial in vitro and in silico VCaP and bladder (UMUC-5) cell line dilutions as described. A heuristic least squares based distance metric (LSS) was used to approximate tumor content from whole-genome copy-number data. LSS between cluster centroids was calculated as a proxy for tumor content from $c$, the vector of cluster centroids for clusters identified by the mean-shift algorithm, where $n$ is the length of the cluster vector and $c_i$ is the $i$th element of this vector. If only one cluster was assigned for a given sample, LSS was calculated as the square root of the cluster center squared (equivalently, the absolute value of the cluster centroid): $\mathrm{LSS} = \sqrt{c_1^2} = |c_1|$.
Reference LSS distributions were established across serial in silico dilution experiments at all integer percent dilutions 0-100% as described, and these distributions were used to guide tumor content estimation for patient samples. While tumor content estimates were not generated for samples with LSS values < 0.1, these samples were specifically scanned for focal CNAs, as described above.
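One way to picture how reference LSS distributions might guide tumor content estimation is a simple lookup with linear interpolation, sketched below in Python. This is an assumption made for illustration, not the procedure reported in the study: the function name, the reference values, and the interpolation itself are hypothetical. Only the LSS < 0.1 cut-off, below which no estimate is generated, comes from the text.

```python
def estimate_tumor_content(lss, reference, min_lss=0.1):
    """Approximate tumor content (%) from an LSS value.

    reference: list of (tumor_percent, reference_lss) pairs from dilution
    experiments, assumed to increase monotonically with tumor content.
    Returns None when lss is below min_lss (no estimate generated).
    """
    if lss < min_lss:
        return None
    ref = sorted(reference, key=lambda pair: pair[1])
    if lss <= ref[0][1]:
        return ref[0][0]
    if lss >= ref[-1][1]:
        return ref[-1][0]
    for (p0, l0), (p1, l1) in zip(ref, ref[1:]):
        if l0 <= lss <= l1:
            if l1 == l0:
                return p0
            # Linear interpolation between neighboring reference points.
            return p0 + (p1 - p0) * (lss - l0) / (l1 - l0)
    return None


# Hypothetical reference points; real values would come from the
# in silico dilution series.
toy_reference = [(0, 0.0), (50, 0.5), (100, 1.0)]
toy_estimate = estimate_tumor_content(0.25, toy_reference)
```

Values outside the reference range are clamped to the nearest reference dilution rather than extrapolated.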

In silico experiments: downsampling
For the VCaP and UMUC-5 in silico dilutions, as well as 9 patient cfDNA samples (5 with highest tumor content, 1 germline chr21 deletion, 2 no tumor content), we carried out in silico downsampling experiments to evaluate the capacity to call copy-number alterations at variable effective whole-genome coverages (range: 0.005-0.1×). After downsampling (using samtools view -s <proportion of reads to sample> -bh <original.bam file>) for each sample, copy-number alterations were called across variable bin sizes as described. At the lowest effective coverages, the smaller bin sizes contained too few reads to be analyzed (e.g., 0.01× whole-genome coverage corresponds to approximately 150 k single-end reads, leaving < 10 reads per 100 kb bin, on average). For this reason, we considered only effective coverage and bin size combinations with ≥ 30 reads per bin as analyzable for this analysis.
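The reads-per-bin arithmetic behind this cut-off can be checked with a short Python sketch. The 3 Gb genome size is an assumption for the example (the same round figure the text uses for FGA), the function names are hypothetical, and the average ignores excluded or unmappable bins; the bin sizes, the 150 k read figure, and the 30 reads per bin threshold are taken from the text.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed round genome size


def mean_reads_per_bin(n_reads, bin_size_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average number of reads per bin if reads were spread evenly."""
    n_bins = genome_size_bp / bin_size_bp
    return n_reads / n_bins


def analyzable_bin_sizes(n_reads, bin_sizes_kb=(15, 25, 50, 100, 500, 1000),
                         min_reads=30):
    """Bin sizes (kb) averaging at least min_reads reads per bin."""
    return [kb for kb in bin_sizes_kb
            if mean_reads_per_bin(n_reads, kb * 1000) >= min_reads]


# ~0.01x coverage corresponds to roughly 150,000 single-end reads.
reads_100kb = mean_reads_per_bin(150_000, 100_000)
usable_at_001x = analyzable_bin_sizes(150_000)
```

Under these assumptions, 150,000 reads give an average of 5 reads per 100 kb bin and 25 per 500 kb bin, so only the 1,000 kb bins pass the threshold at 0.01× coverage.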
Serial in silico downsampling experiments were also carried out on targeted sequencing data from 10 mCRPC patient plasma cfDNA samples (5 high tumor content, 1 germline chr20 deletion, and 3 normals) to 500, 250, 100, 50, and 25x effective target coverage by the same sampling approach taken with whole-genome data.

VCaP cfDNA WGS vs COSMIC array-based CN calls
Of 500 segment-level copy-number calls for chromosomes 1-22 & X reported as present in VCaP by COSMIC, 464 (92.8%) overlapped ≥ 90% of at least one 15kbp bin from our low pass (0.26x whole-genome coverage) analysis of undiluted VCaP, with 496 (99.2%) showing at least some (≥ 1 bp) overlap of one bin or more. We calculated median of bin-level integer copy number values for all 15kbp bins overlapped at ≥ 90% by a COSMIC-reported copy-number segment, and compared these low-pass sequencing derived values to segment-level integer copy-number values reported in COSMIC. Given the known variability in reported copy-number estimates for VCaP focal AR amplification (copy number of 14 reported by COSMIC; at least 3-18 copies by FISH [8]), we explored correlations between COSMIC segmented copy-number and both raw and capped (copy-number = 14) sequencing-derived copy-number values.
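The bin-to-segment comparison described above can be sketched as follows. This is an illustrative Python sketch rather than the study's code: the function names and the `(start, end, copy_number)` tuple layout are hypothetical, and bins and segment are assumed to lie on the same chromosome. The ≥ 90% bin overlap rule, the median summary, and the cap at a copy number of 14 follow the text.

```python
from statistics import median


def overlap_fraction(bin_start, bin_end, seg_start, seg_end):
    """Fraction of the bin covered by the segment."""
    overlap = max(0, min(bin_end, seg_end) - max(bin_start, seg_start))
    return overlap / (bin_end - bin_start)


def segment_copy_number(bins, seg_start, seg_end, min_overlap=0.9, cap=None):
    """Median integer copy number of bins overlapped >= min_overlap by the
    segment; optionally capped (e.g. cap=14 for the VCaP AR locus).

    bins: list of (start, end, copy_number) on the segment's chromosome.
    Returns None if no bin meets the overlap requirement.
    """
    values = [cn for start, end, cn in bins
              if overlap_fraction(start, end, seg_start, seg_end) >= min_overlap]
    if not values:
        return None
    value = median(values)
    return min(value, cap) if cap is not None else value


# Toy 15 kb bins; the last bin mimics a highly amplified locus.
toy_bins = [(0, 15_000, 2), (15_000, 30_000, 4),
            (30_000, 45_000, 4), (45_000, 60_000, 30)]
toy_raw = segment_copy_number(toy_bins, 14_000, 46_000)
toy_capped = segment_copy_number(toy_bins, 45_000, 60_000, cap=14)
```

Capping limits the influence of a single extreme locus on the correlation, which matters here because reported AR copy number in VCaP varies widely between methods.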

UMUC-5 cfDNA WGS vs Targeted NGS CN Calls
Copy number calls from whole genome sequencing of sheared UMUC-5 genomic DNA (gDNA) were compared to calls derived from targeted sequencing (OCP) of sheared DNA in this study. Of 126 genes targeted on the OCP, 90 had more than 3 amplicons and amplicon-level estimate variability sufficient for gene-level copy-number analysis. Coding sequence for 87/90 genes (97%) overlapped at least one 15 kbp bin-level call from whole-genome sequencing data of sheared gDNA. Gene-level copy number estimates from whole-genome sequencing data were calculated as the mean log2 copy number ratio for 15 kb bins overlapping genome space from the first to the last coding base pair for each of the 87 genes.

Application to exome sequencing segmented copy-number calls
To test the efficacy of this approach for approximating tumor content from alternate datasets, we tested our LSS approach on segmented copy-number calls from 129 clinical advanced/treatment refractory cancer tissue samples subjected to exome sequencing as part of the MI-ONCOSEQ project at the University of Michigan [9,10]. Tumor content for all MI-ONCOSEQ samples is estimated through a model fitting variant allele frequencies of all somatic mutations and a model assessing zygosity shift of heterozygous SNPs and local copy number [9,10]. As in our analysis of TCGA copy-number data, the fraction of genome altered (FGA) was calculated for each MI-ONCOSEQ sample as the total number of bases in regions affected by segmented copy-number alterations with log2(CopyNumberRatio) > 0.2 or < -0.2, divided by 3 billion (the approximate median number of analyzable bases across all analyzed samples).

Concordance with tissue-based whole-exome sequencing copy-number profiles
Segmented log2 copy number ratio data from whole-exome sequencing of fresh frozen tissue specimens [9,11] were available for 23 of 27 patients also profiled by cfDNA low-pass WGS. Each of these 23 patients had at least 1 cfDNA plasma sample (range: 1-3), and 18 of 23 (78.3%) had at least 1 cfDNA sample with elevated tumor content (LSS ≥ 0.1) suitable for concordance analyses. For these 18, the median of cfDNA low-pass WGS bin-level copy number values for all 500kbp bins overlapped at ≥ 90% by a tissue-based copy-number segment was calculated as a pseudo-cfDNA segment call, and correlations between tissue- and cfDNA-based copy number ratios were evaluated.

Focal AR amplification determination
Given the difficulty of appropriate copy number segmentation on chrX, the median 100kb bin-level copy number estimate across the chrX q-arm was subtracted from the mean 100kb bin-level copy-number estimate at the AR locus (chrX:66.0-67.5Mb), and difference values >= 0.2 were used to call focal AR amplifications in our mCRPC cohort. Two high tumor content cfDNA samples (TP1216 and TP1295) met the above criteria, but were excluded as potential false positives due to use of 100kb bin width at low coverage (< 300,000 total high-quality (MAPQ >= 37) mapped reads). An additional low tumor content sample (TP1139) met the amplification criteria but, given excessive variability in chrX bin-level copy-number estimates, was considered negative for AR amplification for all subsequent analyses.
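The AR calling rule can be written out as a small Python sketch. It illustrates the stated rule only and is not the study's pipeline: the function name and the `(start, end, value)` bin layout are hypothetical, bins are counted as AR-locus bins if they overlap chrX:66.0-67.5 Mb at all (an assumption), and the sketch does not implement the manual exclusions for low read counts or noisy chrX estimates described above.

```python
from statistics import mean, median

AR_LOCUS = (66_000_000, 67_500_000)  # chrX coordinates given in the text


def call_focal_ar(chrx_q_bins, ar_locus=AR_LOCUS, min_diff=0.2):
    """Return (difference, is_amplified) for one sample, or None if no bin
    overlaps the AR locus.

    chrx_q_bins: list of (start, end, value) 100 kb bin-level copy-number
    estimates across the chrX q-arm.
    """
    q_arm_median = median(value for _, _, value in chrx_q_bins)
    ar_values = [value for start, end, value in chrx_q_bins
                 if start < ar_locus[1] and end > ar_locus[0]]
    if not ar_values:
        return None
    difference = mean(ar_values) - q_arm_median
    return difference, difference >= min_diff


# Toy q-arm: 100 kb bins from 60 to 80 Mb, elevated only at the AR locus.
toy_bins = [
    (start, start + 100_000,
     1.5 if AR_LOCUS[0] <= start < AR_LOCUS[1] else 0.0)
    for start in range(60_000_000, 80_000_000, 100_000)
]
toy_difference, toy_called = call_focal_ar(toy_bins)
```

Comparing the locus against the arm-wide median, rather than against zero, keeps a whole-arm or whole-chromosome gain from being called as a focal amplification.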

Validation of ThruPLEX cfDNA WGS for Ion Torrent Benchtop Sequencers
Given the limited amount of recoverable cfDNA requiring amplification for WGS, we first sought to validate the performance of ThruPLEX RGP-0003 WGA library construction with index barcode and adaptor incorporation for rapid sequencing on Ion Torrent benchtop sequencers. This single-tube approach is compatible with ≤ 50pg double stranded DNA and generates libraries with Ion Torrent barcode index and sequencing adaptors in a three-hour workflow. Using the Ion Torrent Proton sequencer, we sequenced pooled ThruPLEX cfDNA libraries from normal controls (n = 10), resulting in an average genome wide coverage depth of 0.62× (range: 0.16
We next sought to validate the ability of our low pass WGA cfDNA profiling approach to robustly detect genome-wide copy-number profiles to facilitate direct management (if CNAs are actionable) and guidance of additional testing through cfDNA tumor content approximation. Using our WGA cfDNA profiling approach, in vitro dilutions of sheared genomic DNA for VCaP and UMUC-5 cell lines were sequenced to an average genome wide coverage depth of 0.72x (range: 0.19-1.15x) with average sequencing uniformity of 91.6% (range: 90.8-96.8%) (Supplementary Table 1). As shown in Supplementary Figure 2A, the genome-wide copy-number profile from low-pass WGS of an undiluted artificial VCaP cfDNA sample (genomic DNA sheared to ~180bp) revealed known broad and focal CNAs, including characteristic 8p loss/8q gain, focal AR amplification, and evidence of the previously reported chromothripsis event on 5q [12]. We then compared regional overlap and magnitude of our low-pass WGS segmented copy number calls from the undiluted artificial VCaP cfDNA sample to those reported in COSMIC for VCaP [13].
We observed a highly significant correlation between our median segment-level low-pass sequencing-derived copy number calls from the undiluted VCaP sample and copy-number values reported in COSMIC (Pearson correlation = 0.77, p < 0.001), with even stronger correlation when the cfDNA copy-number estimation of the highly amplified AR locus was capped (see Supplementary Methods; Pearson correlation = 0.92, p < 0.001) (Supplementary Figure 2B). Likewise, we observed significant correlation between gene-level copy number estimates from a validated targeted multiplexed PCR based approach [4] on genomic DNA from the UMUC-5 urothelial carcinoma cell line compared to artificial UMUC-5 cfDNA subjected to ThruPLEX library preparation (Pearson correlation coefficient = 0.92, p < 0.001; see Supplementary Figure 2). Together, these results support the fidelity of copy number profiling from cfDNA using ThruPLEX library preparation and low coverage Ion Torrent sequencing.

Assessment of copy number calling robustness from ultra low pass cfDNA WGS across varying tumor content
Accurate approximation of tumor content from cfDNA is critical to guiding management based on low-pass WGS profiles. Hence, we next used our in vitro and in silico dilution series to develop a novel tumor content approximation approach (LSS) that can robustly approximate tumor content in cfDNA samples with tumor content > ~10% (Figure 2C). Our approach leverages the distribution of segment-level copy number ratios to inform on cfDNA tumor content; specifically, we use the expectation that as tumor content decreases, so too should the distance between peaks (or 'clusters') in segment-level copy-number ratio distributions (Supplementary Figure 3A and 3B). After clustering of segment-level copy-number ratios and identification of cluster centers, our LSS metric aggregates the distance between cluster centers to infer cfDNA tumor content from low-pass WGS genome-wide copy number calls using known in vitro and in silico dilutions (see Supplementary Figure 4 and Supplementary Methods). Supplementary Figure 3C highlights the near-linear relationship between LSS and effective tumor content across both VCaP and UMUC-5 in silico dilutions, enabling application of these reference LSS distributions for interpretation of LSS values and approximation of tumor content from WGS cfDNA patient samples described below. In addition, as shown in Supplementary Figure 3D, we found that our clustering and LSS approach applied to segmented genome-wide copy-number profiles from the 129 MI-ONCOSEQ profiled advanced cancer tissue samples (see Figure 1B) was highly concordant with tumor content estimates made in MI-ONCOSEQ using exome-wide SNVs and heterozygous SNPs. Taken together, these results confirm our ability to detect clinically relevant focal CNAs (such as EGFR and AR amplifications) down to 5% effective cfDNA tumor content, determine genome-wide copy number profiles of both focal and broad CNAs, and approximate tumor content from ThruPLEX cfDNA libraries.
We next sought to validate our ability to detect genome-wide CNAs across a range of effective cfDNA tumor content for ultra low pass WGS (0.005-0.1x whole-genome coverage) using in vitro dilution data and further in silico dilution experiments (see Supplementary Methods). Across both the UMUC-5 and VCaP in vitro dilution series, genome-wide bin-and segment-level copy-number ratio estimates were generated and both focal and broad copy-number alterations were detected (Supplementary Figures 2, 5, and 7). For VCaP, significant correlations (p < 0.05) were observed for segment-level copy-number values down to effective tumor contents of 10% and 5% for in vitro and in silico dilutions respectively, suggesting our approach is capable of systematically detecting genome-wide copy number profiles even at low tumor content (see Supplementary Figure 2). As high level focal amplifications are the majority of actionable CNAs, we also focused on the known focal AR amplification in VCaP, along with the known focal EGFR amplification in UMUC-5 [14,15], which could both be robustly detected down to 5% tumor content based on in vitro and in silico dilutions (Supplementary Figure 7). For UMUC-5, we observed a similar ability to identify both focal and broad copy number alterations at expected copy-number ratios across the full in vitro dilution series (Supplementary Figure 5). Significant correlations between bin-level sequencing-derived copy number calls and gene-level copy number calls derived from targeted NGS for the UMUC-5 cell line were also observed across the full in vitro dilution series (p < 0.001; see Supplementary Figure 5).

PRINCe concordance with comprehensive tissue-based profiling
To systematically evaluate concordance for somatic molecular alterations across multiple biocompartments, we focused on 27 of the 76 men (35.5%) with mCRPC already profiled by cfDNA low-pass WGS (corresponding to 33 of 93 (35.5%) mCRPC cfDNA samples) where synchronous or asynchronous comprehensive whole exome and whole transcriptome profiling was attempted on fresh frozen or FFPE tissue specimens. Of 27 men, 4 (14.8%) had either insufficient tumor content for comprehensive profiling by biopsy or incomplete tissue profiling data for analysis. Notably, all 4 men had cfDNA samples that yielded clinically informative results. TP1182 [TP_2007] was a plasma cfDNA sample taken 2 years after MI-ONCOSEQ biopsy (the biopsy showed 10% tumor content by pathology review, and no DNA sequencing was done on the MI-ONCOSEQ FFPE tissue), and plasma cfDNA profiling demonstrated a focal AR amplification and focal 2-copy PTEN deletion. TP1201 and TP1405 [TP_2278] were plasma cfDNA samples taken 2 years and 1 month, respectively, prior to MI-ONCOSEQ research biopsy of a metastatic bone lesion. Tissue biopsy yielded < 10% tumor content by pathologist review and was not profiled by DNA or RNA sequencing. While TP1201 (taken two years prior to tissue biopsy) had neither detectable cfDNA tumor content nor focal copy-number alterations, TP1405 (taken 1 month prior to the research biopsy) showed low (but detectable) tumor content and detectable focal AR amplification.
Another plasma cfDNA sample (TP1353 [MO_1579]) was taken 7 months after a MI-ONCOSEQ tissue biopsy that yielded 6 tissue blocks with no tumor content, while low-pass cfDNA WGS revealed a focal AR amplification and 2-copy PTEN deletion. Both copy-number alterations were validated by targeted NGS of the same cfDNA sample, and TP53 G245S (variant fraction: 32.7%) and ATM D817H (28%) mutations were also detected. Lastly, TP1354 [TP_2171] was taken 1 week before a MI-ONCOSEQ bone needle core biopsy that yielded no tumor by pathology review ('blind sequencing' of DNA & RNA was still attempted). While no somatic alterations were identified by comprehensive tissue-based profiling of the bone needle core biopsy sample, cfDNA low-pass WGS and targeted NGS identified a focal AR amplification, as well as TP53 R282W (14.2%) and KDR T1336I (5.4%) somatic mutations. These results highlight potential complementary clinical utility for plasma cfDNA profiling in comprehensive tissue-based NGS workflows.
Of 23 men with comprehensive tissue-based profiling and at least 1 profiled cfDNA sample (range of cfDNA samples per individual: 1-3), 20 (87%) had a cfDNA sample with elevated cfDNA tumor content amenable to analysis. To evaluate concordance between cfDNA- and tissue-based DNA copy-number profiles, segmented tissue-based copy-number profiles were compared to whole genome cfDNA segmented copy-number profiles for the 18 of 20 (90%) individuals with fresh frozen tissue specimens within 2 years of blood sample collection (see Supplementary Methods). Low-pass WGS cfDNA and tissue-based copy-number profiles were highly correlated (median r = 0.86 [range: 0.54-0.94]) for these samples despite variable specimen tumor content and sample collection synchronicity (median number of days between tissue and cfDNA specimen collection: 108 (range: 0-682 days)). One patient (TP_2073/TP1303) with synchronous tissue and blood specimens displayed both high correlation of genome-wide copy-number profiles (Pearson corr: 0.96) and fully concordant somatic mutations for regions targeted by both tissue whole exome and cfDNA targeted NGS approaches, including a putative homozygous TP53 splice mutation (p.R2494X) present at 89% in tissue and 49% in cfDNA (VF: 48.5%, 414/853 flow-corrected variant-containing reads). Further, 5 of 10 patient samples with clear 21q22.2 copy-number deletion consistent with deletion leading to TMPRSS2:ERG gene fusion were from patients with tissue-based whole transcriptome profiling, and TMPRSS2:ERG fusion isoform expression was confirmed in 5/5 (100%) corresponding tissue specimens.
In total, 17/58 (29.3%) high tumor content cfDNA samples demonstrated detectable focal PTEN deletions, of which 11 (64.7%) were likely 2-copy losses. Of these, 4 had near-synchronous analyzable tissue-based profiling data, and all 4 corresponding tissue-based copy-number profiles showed focal deep / likely 2-copy PTEN deletions. Copy number losses affecting RB1 were also frequent in our high tumor content cohort, with 4 samples (4 patients) exhibiting focal 2-copy RB1 deletions. While 3 of these 4 patients were lost to follow-up, the remaining patient (TP1320) also had detectable AR amplification and, having received a single (taxane-based) line of therapy post-ADT, progressed rapidly on abiraterone over the course of 3 months on therapy (PSA rising from 37.3 to 70.4), with PCa-related death 4 months after cfDNA profiling (Supplementary Table 1). Another patient (TP1282) also underwent comprehensive synchronous tissue profiling (MO_1473) of a left femur bone biopsy 2 weeks after blood collection; while overall copy-number concordance was high (Pearson corr: 0.94), tissue profiling identified only a 1-copy RB1 loss compared to the 2-copy loss seen by cfDNA low-pass WGS. Overall, given the known frequency of RB1 hemi- and homozygous copy number loss in advanced and castration-resistant neuroendocrine/small-cell prostate cancer [11,16], these results highlight our capacity to detect therapeutically relevant focal copy-number deletions from low-pass WGS of cfDNA from routine blood samples.
While high levels of overall genome-wide concordance were observed between tissue- and plasma cfDNA-based copy-number profiles, discrepancies with important clinical relevance were also identified. In one patient with a history of both primary prostatic adenocarcinoma and a metastatic lesion with small cell carcinoma/neuroendocrine features (TP1019/MO_1234), synchronous profiling of same-day specimens detected a clear focal AR amplification in the cfDNA that was absent in the tissue-based profiling of a small cell carcinoma focus (despite identical prioritized somatic point mutations), suggesting circulating evidence of both AR-driven and AR-independent clones. Further, applications of this approach in advanced, treatment-naïve patients have also suggested utility for identifying clinically relevant copy-number changes (including focal 2-copy PTEN and RB1 deletions) in patients with high tumor burden. Overall, these results suggest noninvasive profiling may yield high concordance with near-synchronous tissue profiling for clinically relevant molecular alterations, and can provide unique complementary advantages and opportunities for expansion into treatment-naïve patient cohorts.

PRINCe in other cancers and for disease monitoring
To demonstrate feasibility and potential applicability of PRINCe for actionable CNA detection and disease monitoring in other tumor types, we also assessed 31 plasma cfDNA samples from 24 patients with other cancers (including lung, breast, colon, and sarcomas) (Supplementary Table 1).
Our non-mCRPC cohort also contained paired pre- and post-treatment cfDNA samples from several patients. For example, we profiled pre- and post-EGFR inhibitor treatment initiation cfDNA samples from ULMC-125 (a patient with metastatic lung cancer). By PRINCe, ULMC-125's pre-treatment sample showed focal EGFR and FGFR1 amplifications and multiple arm-level and whole chromosome gains and losses, while previous ddPCR on the cfDNA identified an activating EGFR L858R hotspot mutation at a 62.5% variant fraction (consistent with amplification of mutant EGFR) (Figure 5E). PRINCe analysis of ULMC-125's post-treatment cfDNA sample showed no detectable CNAs, consistent with no detectable EGFR L858R by ddPCR. PRINCe analysis of serial pre- and post-treatment cfDNA samples for ULMC-151 and ULMC-194 (patients with colorectal adenocarcinoma) also showed substantial depletion in detectable genome-wide CNAs in the post-treatment samples, consistent with reduced (though non-absent) tumor-derived cfDNA fragment representation.
Low-pass WGS and targeted NGS profiling of a pretreatment plasma sample (PD-L1006_1) from a patient with stage IV lung adenocarcinoma who subsequently achieved a complete response after two doses of a PD-L1 inhibitor revealed high-level focal amplifications of CCND3 and CD93, along with a KRAS G12C hotspot mutation (6.4%, 29/456 variant containing reads). Three consecutive weekly plasma samples taken 5 months after completion of palliative radiotherapy from ULMC-185, an individual with a history of neurofibromatosis type 1 and pelvic sarcoma, highlight consistently elevated cfDNA tumor content and detectable focal EGFR amplification in each plasma sample, and putative germline ATM1 (I124V) and BAP1 (T423K) mutations in each sample. Together these results suggest substantial potential clinical utility for treatment response and disease monitoring using highly scalable complementary whole-genome and targeted cfDNA NGS-based profiling strategies.
These results further reinforce our ability to detect therapeutically relevant alterations across multiple cancer types using low-pass WGS of patient plasma cfDNA, even at tumor contents as low as 10%. Likewise, although our approach cannot detect molecular evidence of disease recurrence at ultra-low tumor content (in contrast to ultra-deep, extremely accurate or personalized sequencing/ddPCR methodologies [17][18][19]), it can identify pre-treatment genome-wide CNA profiles and cfDNA tumor content estimates that may enable low-cost and more frequent assessment of molecular evidence of recurrence in post-treatment cfDNA samples.

PRINCe as a precision medicine screening strategy
Given the above results demonstrating the utility of our approach for cfDNA-based CNA profiling and tumor content approximation with very low coverage (~0.1-1×), we next sought to determine the robustness of ultra-low pass WGS (0.005-0.1× coverage) in order to decrease sequencing costs per sample. Down-sampling across our cell line and mCRPC samples demonstrated that our approach robustly determined high-quality whole-genome copy-number profiles down to 0.005× whole-genome coverage. For example, Supplementary Figure 6 depicts the CNA profile across effective whole-genome coverages down to 0.005× (~82,000 reads) for patient sample TP1337, with robust detection of both broad and focal clinically relevant CNAs. Overall, we show this method performs well at 0.005× coverage down to effective tumor contents of ~10%, though accurate approximation of tumor content (vs. detection of CNAs) is challenging at such low coverage/tumor content. Importantly, however, we observed that the high-level, focal EGFR amplification in UMUC-5 cells, as well as the AR amplification observed in VCaP cells and 8 of 9 (89%) high tumor content mCRPC samples, can be robustly detected at 0.01× coverage. At 0.005× coverage, although automated detection of AR amplification is less reliable, bin-level copy number estimates demonstrate clear amplifications in the majority of samples. Taken together, our results support our ultra-low pass cfDNA WGS-based PRINCe approach as capable of estimating tumor content from genome-wide copy number profiles as well as identifying high-level focal amplifications, a key therapeutic class of somatic alterations in cancer.
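To make the relationship between read count and effective whole-genome coverage concrete, the following is a minimal sketch of the arithmetic behind a down-sampling experiment. The genome size (~3.1 Gb) and the mean aligned read length are assumptions that are not stated in the text, and the function names are hypothetical; this is not the pipeline used in the study.

```python
import math

# Assumption: approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3.1e9


def coverage_from_reads(n_reads, mean_read_len_bp, genome_size_bp=GENOME_SIZE_BP):
    """Effective coverage = total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def reads_for_coverage(target_cov, mean_read_len_bp, genome_size_bp=GENOME_SIZE_BP):
    """Number of reads needed to reach a target effective coverage."""
    return math.ceil(target_cov * genome_size_bp / mean_read_len_bp)


def downsample_fraction(current_reads, target_cov, mean_read_len_bp,
                        genome_size_bp=GENOME_SIZE_BP):
    """Fraction of reads to keep so the sample lands at the target coverage.

    Returns 1.0 if the sample is already at or below the target.
    """
    needed = reads_for_coverage(target_cov, mean_read_len_bp, genome_size_bp)
    return min(1.0, needed / current_reads)


if __name__ == "__main__":
    # Hypothetical mean read length; the study does not report one here.
    read_len = 189
    print(coverage_from_reads(82_000, read_len))
    print(downsample_fraction(1_000_000, 0.005, read_len))
```

Under the assumed genome size, ~82,000 reads at 0.005× would imply a mean aligned read length of roughly 190 bp; a different genome size or read length changes the read counts proportionally.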

PRINCe to guide additional precision oncology testing
Although ultra-low pass cfDNA WGS is capable of detecting high-level CNAs at relatively low tumor contents, additional approaches are needed to detect other alteration classes (mutations, short insertions/deletions and chromosomal rearrangements) and in patients with very low tumor content. As shown in Figure 1C, PRINCe approximation of cfDNA tumor content can be used to guide additional precision medicine testing in patients without potentially actionable/informative CNAs. For example, in patients with high tumor content by ultra-low pass cfDNA WGS, additional routine targeted sequencing, exome sequencing or WGS could all be performed on the cfDNA (or WGA cfDNA library), with coverage informed by the estimated tumor content, while ultra-deep cfDNA sequencing (or sequencing a tissue sample) can be reserved for patients with very low cfDNA tumor content.
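One way to reason about coverage informed by the estimated tumor content is a simple binomial model of variant read sampling. The sketch below is illustrative only: it assumes a clonal, heterozygous mutation in a copy-neutral region (so the expected variant fraction is half the tumor content), ignores sequencing error, and uses a hypothetical calling threshold of 5 supporting reads and a hypothetical 95% detection target. None of these parameters come from the study.

```python
from math import comb


def expected_vaf(tumor_content):
    """Assumption: clonal heterozygous mutation, diploid locus in tumor and normal."""
    return tumor_content / 2.0


def detection_probability(depth, vaf, min_alt_reads=5):
    """P(at least min_alt_reads variant reads) when X ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below


def min_depth_for_detection(tumor_content, power=0.95, min_alt_reads=5,
                            max_depth=100_000):
    """Smallest depth reaching the requested detection probability, or None."""
    vaf = expected_vaf(tumor_content)
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    # Hypothetical tumor content estimates, e.g. as produced by a CNA-based screen.
    for tc in (0.5, 0.2, 0.1, 0.05):
        print(tc, min_depth_for_detection(tc))
```

In this model the required depth rises steeply as tumor content falls, which is the motivation for reserving ultra-deep sequencing for low tumor content samples. Real variant callers apply additional filters, so these depths should be read as rough lower bounds.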
To demonstrate the potential utility and feasibility of PRINCe in guiding such additional testing, we subjected separate 1-20 ng aliquots of unamplified cfDNA from 61 of our patient samples (32 high tumor content and 14 low tumor content mCRPC samples, 11 high tumor content non-mCRPC samples, and 4 male control samples with sufficient DNA), as well as the undiluted artificial VCaP and UMUC-5 cfDNA samples as positive controls, to targeted multiplexed PCR-based NGS using the DNA component of the Oncomine Cancer Assay (OCP). The OCP assay targets 126 potentially actionable tumor suppressors and oncogenes recurrently altered across cancers; we have previously validated this assay for somatic mutation and copy number calling from FFPE-isolated DNA [4].
Sequencing of pooled patient samples resulted in a median average coverage of 1,075× (range: 42-17,944×), with average uniformity of 96.0% (higher than typically observed for FFPE DNA samples [4]). OCP on cfDNA confirmed high-level EGFR amplification in UMUC-5, and high-level AR amplifications in VCaP and all 22 high tumor content mCRPC samples sequenced (Supplementary Table 3). OCP sequencing of TP1337 cfDNA validated the high-level AR amplification, focal 2-copy PTEN deletion, 1-copy RB1 deletion, and 8q gain originally identified by low-pass cfDNA WGS, and enabled detection of a unique somatic 28 bp TP53 frameshift deletion (L264del28bp, variant allele frequency 20.8% with 504 covering reads) (Supplementary Figure 6). Critically, we observed high correlation between gene-level copy number alterations by targeted sequencing and segment-level calls in patient cfDNA samples from PRINCe (Pearson correlation coefficient: 0.80, p < 0.001). OCP sequencing of TP1291 cfDNA validated 1-copy PTEN and BRCA2 copy-number loss, and 2-copy RB1 deletion, as well as the focal AR amplification observed in cfDNA low-pass WGS. Interestingly, targeted NGS of TP1291 cfDNA also detected a known ClinVar pathogenic BRCA2 stop-gain SNV at 71.1% variant fraction (p.R2494X, 1022/1437 variant-containing reads), consistent with copy-number loss of the wild-type BRCA2 allele. Further, we identified the known homozygous TP53 R248W missense SNV in the VCaP sample (observed variant allele frequency 95.3%; 657× flow-corrected coverage), as well as somatic, prioritized TP53 mutations in all 5 (100%) high tumor content mCRPC patient samples sequenced (see Supplementary Table 4). In silico down-sampling experiments in targeted NGS data suggest mean coverages as low as 50× enable reliable detection of known putative clonal somatic point mutations, indels, and copy number variants in UMUC-5 simulated cfDNA and patient cfDNA samples with high tumor content (Supplementary Figures 12 and 13).
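The reasoning that a variant fraction well above 50% is consistent with copy-number loss of the wild-type allele can be illustrated with a standard two-component mixture expression for the expected variant fraction. The sketch below assumes cfDNA is a mixture of tumor-derived and normal-derived fragments, a diploid normal genome, and user-supplied allele copy numbers in the tumor. Whether a given variant is germline or somatic, and the copy states and tumor content used in the example, are illustrative assumptions rather than values reported in the study.

```python
def expected_variant_fraction(tumor_content, tumor_mut_copies, tumor_total_copies,
                              normal_mut_copies=0, normal_total_copies=2):
    """Expected variant fraction in a tumor/normal cfDNA mixture.

    VAF = (p * m_t + (1 - p) * m_n) / (p * C_t + (1 - p) * C_n)
    where p is tumor content, m_* are variant allele copies and C_* total copies.
    """
    num = tumor_content * tumor_mut_copies + (1.0 - tumor_content) * normal_mut_copies
    den = tumor_content * tumor_total_copies + (1.0 - tumor_content) * normal_total_copies
    if den <= 0:
        raise ValueError("total copy number in the mixture must be positive")
    return num / den


if __name__ == "__main__":
    p = 0.6  # hypothetical tumor content

    # Germline heterozygous variant; tumor has lost the wild-type allele (1 copy left).
    print(expected_variant_fraction(p, tumor_mut_copies=1, tumor_total_copies=1,
                                    normal_mut_copies=1))

    # Somatic heterozygous variant in a copy-neutral region, for comparison.
    print(expected_variant_fraction(p, tumor_mut_copies=1, tumor_total_copies=2))

    # Observed fraction from read counts, as quoted in the text (1022/1437).
    print(1022 / 1437)
```

Under these assumptions, a germline heterozygous variant with loss of the wild-type allele in the tumor rises above 50% as tumor content increases, whereas a somatic heterozygous variant in a copy-neutral region cannot exceed 50%.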
Taken together, these results underscore the potential for PRINCe followed by targeted sequencing (tuned to cfDNA tumor content) as part of a high-throughput, cost-effective clinical or translational research NGS workflow.