The subclonal structure and genomic evolution of oral squamous cell carcinoma revealed by ultra-deep sequencing

Recent studies suggest that head and neck squamous cell carcinomas are highly heterogeneous between patients; however, the subclonal structure remains unexplored, mainly because studies have used only a single biopsy per patient. To deconvolute the clonal structure and describe the genomic evolution of the cancer, we applied whole-exome sequencing combined with ultra-deep targeted sequencing to oral squamous cell carcinomas (OSCC). From each patient, a set of biopsies was sampled from geographically distinct sites in the primary tumor and lymph node metastasis. We demonstrate that the included OSCCs show a high degree of inter-patient heterogeneity but a low degree of intra-tumor heterogeneity. However, some OSCCs contain complex subclonal architectures comprising distinct subclones found only in geographically distinct regions of the primary tumors. In several cases we find mutations in the primary tumor that are not present in the lymph node metastasis. We conclude that metastatic potential in our population is acquired early in tumor evolution, as evidenced by the ongoing parallel evolution in several primary tumors.


Supplementary Figure 22: Illustration of the visual interpretation method for the BAF vs copy number plot analysis in a situation with 90% tumor content and one subclone constituting 44% of the tumor. LogR: log2 copy number ratio. BAF: B-allele frequency. TC: tumor content. AB: diploid, 1 wildtype allele (A-allele), 1 mutant allele (B-allele). B: 1 mutant allele, loss of heterozygosity. AAB: triploid, 2 wildtype alleles, 1 mutant allele. ABB: triploid, 1 wildtype allele, 2 mutant alleles. BB: 2 mutant alleles.

Genes identified as driver genes by iCAGES based on missense mutations. RadialSVM: score measuring the driving potential of the missense mutation. Phenolyzer: score measuring the gene's association with cancer based on prior knowledge and gene-gene interaction. iCAGESGeneScore: score measuring the final cancer driving potential.

One would expect a ratio of 2:1 if the mutations were random passenger mutations. A one-tailed binomial test was used.

Construction of phylogenetic trees
The phylogenetic trees are based on the BAF vs copy number plots (Supplementary Figures 4-22) and the mutational data (Supplementary Tables 3-7). Possible subclones were detected by visual interpretation of the plots (Supplementary Figure 23). A tumor contains somatic point mutations and copy number alterations that are present in all of the cancer cells; these are believed to be the earliest events and are represented in the figure by grey circles. We expect the early point mutations that are still located in diploid regions (AB_all, A being the wildtype allele and B the mutated allele) to form a pronounced cluster at BAF_all ≈ ½BAF_max and LogR ≈ 0. From this cluster the tumor content (TC) can easily be derived: TC = 2 × BAF_all. Correspondingly, there will be LOH regions in all cancer cells (B_all) in which the wildtype A-allele is lost, and the somatic point mutations in these regions are located at a distinct position in the plot.
Once the events present in all cancer cells have been identified, subclonal events can be found by locating clusters of mutations that deviate from these patterns. Subclone-specific point mutations, and subclonal copy number events affecting such subclone-specific mutations, are represented by red circles. The AB_subclone region will have a BAF lower than ½TC (LogR ≈ 0), and its corresponding LOH region (B_subclone) will be located to the right of AB_subclone with a LogR value between 0 and the LogR of B_all. Loss of the wildtype allele with two copies of the mutated allele (BB) in the subclone is located at LogR = 0 and a BAF ≥ that of AB_subclone (not shown in the figure). Duplication of either the A-allele or the B-allele gives a pattern for the subclone similar to that of the early mutations present in all tumor cells.
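The relation TC = 2 × BAF_all can be made explicit with a short derivation. The following is a sketch under assumptions that are not stated in the source: the sample is a mixture of tumor cells (fraction TC) and diploid normal cells that carry no mutant B-allele, LogR is the log2 of the mean total copy number divided by 2, and noise and capture bias are ignored. Under the same assumptions the expected position of the B_all cluster can also be written down; this is an illustration, not the expression used by the authors.

```latex
% Assumed mixture: tumor cells (fraction TC) plus diploid normal cells
% (fraction 1 - TC) that carry no mutant B-allele.
\begin{align*}
\mathrm{BAF}_{AB,\mathrm{all}} &= \frac{TC \cdot 1}{TC \cdot 2 + (1 - TC) \cdot 2} = \frac{TC}{2}
  \quad\Rightarrow\quad TC = 2\,\mathrm{BAF}_{AB,\mathrm{all}}, \\
\mathrm{LogR}_{AB,\mathrm{all}} &= \log_2 \frac{2}{2} = 0, \\
\mathrm{BAF}_{B,\mathrm{all}} &= \frac{TC \cdot 1}{TC \cdot 1 + (1 - TC) \cdot 2} = \frac{TC}{2 - TC}, \\
\mathrm{LogR}_{B,\mathrm{all}} &= \log_2 \frac{2 - TC}{2}.
\end{align*}
% Example with TC = 0.9:
%   BAF_{AB,all} = 0.45
%   BAF_{B,all}  = 0.9 / 1.1 \approx 0.82
%   LogR_{B,all} = \log_2(0.55) \approx -0.86
```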
Interestingly, a subclone can alter the copy number of somatic point mutations present in all cancer cells, but the LogR will be characteristic of the subclone. Subclone-specific copy number events that affect point mutations present in all cancer cells are represented by grey circles with a red outline. For a subclone that alters the copy number of somatic mutations present in all cells, B_subclone will be located diagonally between AB_all and B_all, and BB_subclone will have a LogR value between the LogR of B_all and 0 and a BAF greater than that of AB_all.
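The expected cluster positions described above can be reproduced numerically with a simple mixture model. The sketch below is an illustration only: the authors identified clusters by visual interpretation, and the function `expected_baf_logr`, the scenario names, and the assumption that normal cells are diploid without the mutant allele are all introduced here. The values 90% tumor content and a 44% subclone are taken from the figure legend.

```python
import math


def expected_baf_logr(populations):
    """Return (BAF, LogR) for a mixture of cell populations.

    populations: iterable of (fraction, a_copies, b_copies) tuples whose
    fractions sum to 1. A is the wildtype allele, B the mutant allele.
    Assumption: LogR is log2 of the mean total copy number over 2.
    """
    populations = list(populations)
    total_copies = sum(f * (a + b) for f, a, b in populations)
    mutant_copies = sum(f * b for f, a, b in populations)
    return mutant_copies / total_copies, math.log2(total_copies / 2)


TC = 0.90        # tumor content (from the figure legend)
SUBCLONE = 0.44  # subclone as a fraction of the tumor (from the figure legend)

normal = 1 - TC        # normal cells, assumed diploid without the mutation
sub = TC * SUBCLONE    # subclone cells as a fraction of the whole sample
rest = TC - sub        # remaining tumor cells

scenarios = {
    # Events present in all cancer cells
    "AB_all": [(normal, 2, 0), (TC, 1, 1)],
    "B_all": [(normal, 2, 0), (TC, 0, 1)],
    # Mutation found only in the subclone
    "AB_subclone": [(normal + rest, 2, 0), (sub, 1, 1)],
    "B_subclone_specific": [(normal + rest, 2, 0), (sub, 0, 1)],
    # Mutation in all cancer cells, copy number altered by the subclone
    "B_subclone_truncal": [(normal, 2, 0), (rest, 1, 1), (sub, 0, 1)],
    "BB_subclone_truncal": [(normal, 2, 0), (rest, 1, 1), (sub, 0, 2)],
}

positions = {name: expected_baf_logr(p) for name, p in scenarios.items()}

for name, (baf, logr) in positions.items():
    print(f"{name:22s} BAF={baf:.3f} LogR={logr:.3f}")
```

Under these assumptions AB_subclone falls at a BAF of about 0.20 (below ½TC = 0.45) with LogR ≈ 0, and the subclonal loss of the wildtype allele at a mutation present in all cancer cells falls at a BAF of about 0.56 and a LogR of about -0.32, between the AB_all and B_all clusters.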
Combining the approach detailed above with color coding of the mutations according to the biopsies in which they appear, we can determine the distribution of subclones. Mutations that characterize different subclones can be found in different biopsies and with different distributions. The distribution of a clone is determined as the ratio between BAF(AB_subclone) and BAF(AB_all). Lastly, assuming that every clone inherits the somatic mutations of its ancestral clone, we can determine the tumor evolution. Mutations believed to be the earliest are present in all biopsies, as these mutations have been inherited by all clones. These mutations characterize the ancestral clone; however, as described above, a subclone can also be characterized by copy number events. This means that mutations that are present in all biopsies but deviate from the pattern of the ancestral clone represent one or more subclones. Mutations that are not present in all biopsies likewise characterize one or more subclones. The lineage of subclones that evolved in parallel can be difficult to determine: because they derive from the same ancestor, we cannot tell which evolved first. In contrast, linearly evolved subclones have a clear lineage.
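The two bookkeeping steps in this paragraph, the clone distribution ratio and the split of mutations by biopsy presence, can be sketched as follows. The function names, mutation labels, and biopsy labels are hypothetical and serve only as illustration; the BAF values are example inputs, not data from the study.

```python
def clone_distribution(baf_ab_subclone, baf_ab_all):
    """Fraction of the tumor occupied by a subclone: BAF(AB_subclone) / BAF(AB_all)."""
    return baf_ab_subclone / baf_ab_all


def split_by_biopsy_presence(mutation_biopsies, all_biopsies):
    """Partition mutations into those seen in every biopsy and the rest.

    Mutations seen in every biopsy are only candidates for the ancestral
    clone: as the text notes, they may still mark a subclone if their
    BAF/LogR position deviates from the ancestral cluster.
    """
    all_biopsies = set(all_biopsies)
    in_all, in_some = [], []
    for mutation, biopsies in mutation_biopsies.items():
        target = in_all if set(biopsies) == all_biopsies else in_some
        target.append(mutation)
    return sorted(in_all), sorted(in_some)


# Hypothetical example: two primary tumor biopsies and one lymph node biopsy.
biopsies = ["T1", "T2", "LN"]
observed = {
    "mut_a": ["T1", "T2", "LN"],  # present everywhere
    "mut_b": ["T1", "T2", "LN"],
    "mut_c": ["T1"],              # restricted to one primary tumor region
    "mut_d": ["T2", "LN"],
}

ancestral_candidates, subclonal = split_by_biopsy_presence(observed, biopsies)
print(ancestral_candidates, subclonal)
print(round(clone_distribution(0.198, 0.45), 2))
```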
The phylogenetic trees are constructed under the assumption that each lymph node metastasis is derived from a single cell of a clone in the primary tumor. The mutations seen in all cancer cells of the metastasis determine which primary tumor clone it was derived from.
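One way to express this assignment rule is as a set comparison. This is a minimal sketch and not the authors' procedure, which was based on manual interpretation. Each primary tumor clone is described by its parent and its own mutations, and the metastasis is assigned to the clone with the largest inherited mutation set that is fully contained in the mutations seen in all metastatic cells. The clone names, mutation labels, and the functions `inherited_mutations` and `origin_clone` are hypothetical.

```python
def inherited_mutations(clone, parents, own_mutations):
    """All mutations of a clone: its own plus those of every ancestor."""
    mutations = set()
    while clone is not None:
        mutations |= own_mutations[clone]
        clone = parents[clone]
    return mutations


def origin_clone(metastasis_clonal_mutations, parents, own_mutations):
    """Return the primary tumor clone most consistent with the metastasis.

    A clone qualifies if all of its inherited mutations are found in every
    cancer cell of the metastasis; among qualifying clones, the one with
    the most mutations (the most derived) is chosen.
    """
    metastasis = set(metastasis_clonal_mutations)
    best, best_size = None, -1
    for clone in parents:
        mutations = inherited_mutations(clone, parents, own_mutations)
        if mutations <= metastasis and len(mutations) > best_size:
            best, best_size = clone, len(mutations)
    return best


# Hypothetical primary tumor tree: two subclones evolved in parallel.
parents = {"ancestral": None, "subclone_1": "ancestral", "subclone_2": "ancestral"}
own_mutations = {
    "ancestral": {"m1", "m2"},
    "subclone_1": {"m3"},
    "subclone_2": {"m4"},
}

# m9 is private to the metastasis and does not affect the assignment.
print(origin_clone({"m1", "m2", "m4", "m9"}, parents, own_mutations))
```

A metastasis carrying only the ancestral mutations would be assigned to the ancestral clone, which corresponds to dissemination before the subclones diverged.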