Genomic characterization of human papillomavirus-positive and -negative human squamous cell cancer cell lines

Human cancer cell lines are the most frequently used preclinical models in the study of cancer biology and the development of therapeutics. Although anatomically diverse, human papillomavirus (HPV)-driven cancers have a common etiology and similar mutations that overlap with but are distinct from those found in HPV-negative cancers. Building on prior studies that have characterized subsets of head and neck squamous cell carcinoma (HNSCC) and cervical squamous cell carcinoma (CESC) cell lines separately, we performed genomic, viral gene expression, and viral integration analyses on 74 cell lines that include all readily-available HPV-positive (9 HNSCC, 8 CESC) and CESC (8 HPV-positive, 2 HPV-negative) cell lines and 55 HPV-negative HNSCC cell lines. We used over 700 human tumors for comparison. Mutation patterns in the cell lines were similar to those of human tumors. We confirmed HPV viral protein and mRNA expression in the HPV-positive cell lines. We found HPV types in three CESC cell lines that are distinct from those previously reported. We found that cell lines and tumors had similar patterns of viral gene expression; there were few sites of recurrent HPV integration. As seen in tumors, HPV integration did appear to alter host gene expression in cell lines. The HPV-positive cell lines had higher levels of p16 and lower levels of Rb protein expression than did the HPV-negative lines. Although the number of HPV-positive cell lines is limited, our results suggest that these cell lines represent suitable models for studying HNSCC and CESC, both of which are common and lethal.


INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) is the seventh most common cancer worldwide, with over 600,000 cases annually [1].More than 50,000 new cases occur each year in the United States alone [2].The age-adjusted incidence of oropharyngeal cancer has been increasing at an alarming rate of 5% per year over the past decade in the United States [3] and at a similar pace in other industrialized countries [4].No significant improvements in the overall survival of patients with advanced disease have been made in about 3 decades.One of the risk factors associated with the marked increase in HNSCC cases is infection with high-risk types of human papillomavirus (HPV), which is associated with 71% of oropharyngeal cancer cases [5].HPV was first identified as a causative agent for cervical squamous cell carcinoma (CESC) about 40 years ago [6].Despite advances in early screening, HPV-associated CESCs are still the number one cause of death in women worldwide, with nearly 90% of the mortalities occurring in the developing world [5,[7][8][9].It is estimated that by 2020, HPV-positive oropharyngeal cancer cases will outnumber HPV-driven CESC cases in the United States [3].Additionally, HPV has been implicated in squamous cell cancers of the anogenital region, including penile, vulvar, and anal cancers [8,9].
Although HPV-associated and non-HPV-associated HNSCCs display clinical, epidemiologic, and molecular differences, the therapeutic regimens remain the same for both etiologies [10].The discovery of new therapeutic interventions for HNSCC requires conducting preclinical studies, which rely heavily on human tumor cell lines.Several reports have identified actionable molecular targets by investigating the genomic characteristics of HNSCC cell lines and tumors [11][12][13].Recent mutational and transcriptomic analyses of up to 22 HNSCC cell lines have provided valuable data on the characteristics of HNSCCs [11,14,15].The results of short tandem repeat (STR) profiling of 61 unique HNSCC cell lines have also been published [16].While researchers worldwide frequently use these cell lines and some of their characteristics have been published in a piecemeal fashion, the genomic characteristics of the available HNSCC and CESC cell lines have not been thoroughly investigated and compared in a single study.To complement and extend prior studies, we performed genomic and proteomic analyses of 55 HPV-negative and 9 HPV-positive HNSCC cell lines.These cell lines represent the vast majority of the preclinical models currently used worldwide for in vitro and in vivo biomarker and targeted therapy identification for HNSCC.In order to tease out the genomic alterations attributable to HPV, we also included CESC cell lines, which are predominantly HPV-positive, in our analyses.The purpose of this study was to determine the genomic profiles-including mutations, HPV integration, and viral gene expression-of a large number of frequently used HPV-negative HNSCC cell lines and a majority of the HPV-positive tumor cell lines.Through these analyses, we found similar incidences of common mutations and genomic alterations in targetable genes as observed in previously performed preclinical studies as well as in the clinical samples curated by the cancer genome atlas (TCGA), suggesting that these cell lines are suitable candidates for studying HNSCC and CESC.

Cell line selection and authentication with STR profiling
Despite the recent increase in the incidence of HPVpositive HNSCC and the high global prevalence of CESC, only 17 HPV-positive cell lines are widely used in research (9 HNSCC and 8 CESC cell lines).We characterized these cell lines and, for comparison, 2 commonly used HPVnegative CESC cell lines and 55 HPV-negative HNSCC cell lines, for a total of 74 cell lines (Supplementary Table 1).Like the patient tumors in TCGA, HPV-positive HNSCC cell lines are predominantly HPV16-positive, and CESC cell lines are predominantly HPV16-and HPV18-positive [17].We performed STR analysis to confirm cell line identity and conducted whole exome sequencing, mRNA sequencing (RNASeq), and reverse phase protein array (RPPA) analysis for genomic and proteomic characterization of the cell lines.The STR profiles of many of these cell lines have been previously reported [16]; however, we report here for the first time STR profiles of several HPV-positive HNSCC cell lines (Supplementary Table 2).Before conducting the genomic and proteomic analyses, we determined the doubling times of the HPV-positive HNSCC and CESC cell lines.Characterization of the doubling times of the HPVnegative HNSCC cell lines had been previously performed [16].Therefore, we focused on the HPV-positive cell lines and all the CESC cell lines (Supplementary Table 3).Overall, the HPV-positive HNSCC cell lines had longer doubling times than did the CESC cell lines (mean: 69 hours vs. 51 hours), but the difference was not statistically significant (t test p = 0.12).

Viral gene expression patterns
The roles of HPV oncoproteins E6 and E7 that promote apoptosis and cell cycle progression by respectively targeting p53 and Rb proteins for degradation are well established [19,20].To determine mRNA levels of the HPV oncogenes E6 and E7, we performed real time, quantitative reverse transcription polymerase chain reaction (RT-PCR) amplification using primers targeting these genes [19,20].We detected the presence of HPV16 E6 and E7 mRNA and/or protein in all 10 HPV16positive cell lines (Figure 2A).We also detected HPV33 E7 mRNA in UTSCC45 cells and HPV18 E7 mRNA in 4 HPV18-positive cell lines.The HPV type for 3 cell lines differed from that reported in the literature.The HPV type for ME180 cells is variably reported as HPV18, HPV39, or HPV68 [21][22][23].We detected neither E6 nor E7 from HPV18 or HPV39 in ME180 cells using RT-PCR.However, we did detect HPV68 E6 expression by RT-PCR and HPV68 E1, E2, E6, and E7 expression by RNASeq in these cells (Supplementary Figure 1).Furthermore, we detected no integrated HPV sequences in ME180 cells (integration described below).In MS751 cells, we did not detect HPV18 E7 by RT-PCR but did detect integration of HPV45 (described below).Finally, in CaSki cells, we detected HPV16 but not HPV18 [24].Detection of E6 and E7 proteins was limited by a lack of reliable antibodies, but we were able to detect HPV16 E7 protein in all 10 HPV16positive cell lines using Western blotting (Figure 2B).
We used transcriptome analysis (RNASeq) to identify the HPV genes expressed (Supplementary Figures 1-2) and integrated (Table 1) in the HPV-positive cell lines and TCGA tumors and detected E1, E2, E4, E5, E6, E7, L1, and L2 expression at variable levels in tumors and cell lines.
The expression level of E5 was lower in cell lines than in tumors.Otherwise, viral gene expression levels were similar in cell lines and tumors, although the low number of cell lines and wide variability of expression levels precluded a formal comparison (Figures 2C-2D; Supplementary Figures 1 and 2).As previously described [25,26], the predominant form of E6 in both cell lines and tumors was E6 * .

Detection of HPV integration sites
We investigated HPV integration in the cell lines and patient tumors from the TCGA using whole exome sequencing data.Similar to several previous reports summarizing studies of HPV integration sites, we confirmed HPV integration sites in HeLa, SiHa, CaSki, UPCISCC090, 93VU147T, HMS001, UDSCC2, and UMSCC47 cell lines by using RNASeq [12,27].Furthermore, we newly identified HPV integration sites in UPCISCC152, UPCISCC154, UMSCC104, UTSCC45, C4-I, C4-II, SW756, and MS751 cell lines (Supplementary Table 4, Figure 3A).While we confirmed the presence of HPV68 in ME180 by RT-PCR, no HPV integration sites were present in this CESC cell line.

HPV integration alters host gene expression
Several reports have demonstrated the enrichment of HPV integration sites within or near cellular genes [12,27].We evaluated the effects of HPV integration on cellular gene expression levels in the HPV-positive cell lines.Genes that were altered by HPV integration were more highly expressed than were the same genes that were unaffected by HPV integration in other HPV-positive cell lines.For instance, we confirmed previously reported altered expression of FOXE1 in UPCISCC090 and UPCISCC152 cell lines and of SLC47A2 in UMSCC104 cells [12] (Figure 4).We also identified decreased expression of NR1P1, a nuclear receptor protein, and USP25, a ubiquitin-specific peptidase, in UPSCISCC154 cells (Supplementary Figure 4).

Differential gene and protein expression of HNSCC and cervical cancer cell lines
We compared global gene expression patterns in 74 HNSCC and CESC cell lines.The majority of the CESC cell lines clustered together, with more variable results for the HPV-positive HNSCC cell lines.Cell lines from the same patients also clustered together (Supplementary Figure 5).We performed RPPA proteomic profiling on all the HPV-positive HNSCC cell lines and a comparison group of HPV-negative HNSCC cell lines in the same batch to eliminate batch effects.The samples were probed with 305 antibodies targeting total and/or phosphorylated proteins found in oncogenic signaling pathways (Supplementary Table 6).Two sample t-test was applied to identify differentially expressed proteins.The resulting p values were fitted by a beta-uniform mixture model [27] in order to correct for multiple hypothesis testing and to estimate false discovery rate (FDR).We found 23 proteins that were differentially expressed at a FDR of 0.2 (Supplementary Table 7) including p16 that results from Rb inactivation and is used as a clinical marker for HPV expression [28].As anticipated, HPV-positive cell lines had significantly higher expression levels of p16 and lower expression of phosphorylated Rb (pRb) than did HPV-negative cell lines (Supplementary Figure 6).

DISCUSSION
In this study, we characterized mutations, viral gene expression, and viral integration in all the readily available HPV-positive HNSCC cell lines, 10 CESC cell lines, 55 HPV-negative HNSCC cell lines, and over 700 patient tumors from TCGA.Using whole exome sequencing, we identified mutations in the cell lines that were similar in frequency to those found in patient tumors.As expected, HPV-positive cell lines expressed higher levels of p16 and lower levels of pRb than did HPV-negative cell lines.With the exception of cell lines derived from the same patients, each cell line had a different HPV integration site.Similarly, patient tumors had few recurrent viral integration sites.Also, as previously demonstrated for tumors, HPV altered the expression of genes near the integration sites.
Several other investigators have characterized HNSCC and CESC cell lines [4,11,15,16,27,29,30].Those studies have served as resources for the cancer research community, which relies on these established models.Our research builds on and integrates those studies, as we have characterized and compiled the most comprehensive set of HPV-positive HNSCC and CESC cell lines, with a large number of HPV-negative HNSCC cell lines for comparison.We confirmed the HPV positivity of all the HPV-positive cell lines but found that the type of HPV differed from that reported in the literature for 3 of the CESC cell lines.The differences between our results and those reported in the literature are likely attributable to the different HPV detection methods we used, as exemplified by the results for ME180 cells.Before PCR came into wide use, HPV18 DNA was detected in ME180 cells by DNA hybridization, but no HPV18 RNA was detected in these cells [21].However, hybridization conducted under more stringent conditions failed to detect HPV18, though sequencing identified HPV39 in ME180 cells [22].Later, Longuet et.al. [23] cloned and amplified HPV68 from ME180 cells.
The mutation profiles of the 74 cell lines included in this study resembled those found in similar human tumors.We identified TP53 mutations almost exclusively in HPVnegative cell lines.We also identified frequent mutations in CASP8, NOTCH1, FAT1, and PIK3CA.One noteworthy finding in the HPV-positive HNSCC cell lines was the absence of PIK3CA mutations.However, HPV-negative HNSCC and HPV-positive CESC cell lines had PIK3CA mutations at rates similar to those found in human tumors, suggesting that these mutations are not selected against during cell line creation.
We confirmed published HPV integration sites for 8 cell lines and found identical integration sites in cell lines that originated from the same patients, demonstrating that our technique is reproducible.We also newly defined HPV integration sites in 8 additional cell lines.As in HPV-positive CESC and HNSCC tumors [12,27], we found very few recurrent integration sites in our cell lines, even though the majority of HPV-positive cell lines had viral integration.Until recently, HPV integration events were thought to be random, and the main function of HPV integration was believed to be stabilization of viral  (A) Triangles point to the exact integration locations and may overlap.Each circle represents an integration event; circles that are close to each other may be shifted for better visualization.The size of the circles represents discordant read pairs in different ranges.A read pair was reported as discordant if the paired-end reads were uniquely mapped with one end to a human chromosome and the other to the virus chromosome.Discordant read pairs are evidence of HPV integration.Larger circles indicate that more discordant read pairs were mapped to that integration site and, therefore, provide stronger evidence for integration.(B) Unfilled triangles point to the exact integration event and may overlap.The shapes of the symbols refer to different sample types or sources.Each color represents a gene, but the location of the symbols may be shifted slightly for better visualization.Only genes with HPV integration in 2 or more patient tumor samples in TCGA or cell lines were plotted.
oncogene transcription [31].However, more recent reports using next-generation sequencing showed that integration events frequently occur (> 80% of the time) within or near known oncogenes, tumor suppressor genes, or DNA repair genes [12,27,32].These findings support the idea of nonrandom integration, in which cells containing HPV integration breakpoints in specific genes are selected for and may play a role in cancer progression.Consistent with this hypothesis, we observed altered host gene expression at or near HPV integration sites, similar to published findings in HPV-positive tumors [12,32,33].The levels of HPV E6 and E7 expression as measured by RT-PCR, Western blot, and RNASeq varied significantly across cell lines.Similarly, E6 and E7 mRNA expression (as determined by RNASeq) varied significantly between tumors from the TCGA.Although more than 3 decades of research on HPV-associated cancers has solidified the importance of viral oncogenes E6 and E7 [20,34], their expression is not sufficient for tumor development in vivo [35,36].In addition to the oncogenic activities of E6 and E7, other events are required for full transformation to cancer.Likewise, inhibition or knockdown of E6 and E7 has varied effects in HPV-positive cell lines [37][38][39] which suggests that HPVpositive cancers have variable dependence on E6 and E7, leading to heterogeneity in the expression levels of these oncogenes.
One limitation of our study is that we identified HPV integration sites using whole exome sequencing, which is limited to exomes and prevents the detection of HPV integration events that do not occur in protein-coding regions of the genome.Whole genome sequencing may have identified additional sites of integration but was not performed because of its cost.Another limitation was the small number of HPV-positive cell lines that were available for study.The large disparity between the number of available HPV-positive and HPV-negative cell lines makes it difficult to perform statistical analyses that may reveal In summary, in this panel of 74 human HNSCC and CESC cell lines, the frequency and patterns of mutations resembled those found in human tumors.In the HPVpositive cell lines, viral gene expression and integration largely mirrored that found in human tumors.These results suggest that these widely used HNSCC and CESC cell lines are suitable models for studying HNSCC and CESC and may be useful in research that could lead to new therapies while emphasizing the need for the development of more HPV-positive models.Our analyses and characterization of these cell lines will serve as a valuable resource for the cancer research community.

Growth curve analysis
Growth curve analyses were performed on cells prior to the tenth passage after thawing.The cells were passaged during log phase growth before they reached confluence.Cells were trypsinized with 0.25% trypsin/ethylenediaminetetraacetic acid and allowed to detach for 2 to 3 minutes at 37 o C. Following trypsin inactivation with serum-containing media, the cells were counted and serially diluted from 2500 cells/ mL to 40,000 cells/mL.The cells were then seeded in a 384-well plate (50 μL/well and 36 wells per dilution).The plated cells were allowed to attach at room temperature for 45 minutes and then placed in the cell culture incubator overnight.Each day for 4 days, 7 wells from each cell dilution were fixed, stained with 4′,6-diamidino-2-phenylindole, and counted.The doubling times were calculated using the formula T d = 3 * (LOG [2]/LOG [cell number at 96 h/cell number at 24 h]).

DNA and RNA isolation
For each cell line, genomic DNA was extracted from 1 to 2 million cells using the ArchivePure DNA cell/ tissue kit (5Prime, South San Francisco CA) following the manufacturer's instructions.The extracted DNA was stored in nuclease-free water (Ambion, Austin, TX).For each cell line, RNA was isolated from 5 million cells using the RNeasy Plus Kit (Catalog #73404; Qiagen, Germantown, MD) according to the manufacturer's protocol and eluted into ribonuclease-free water.cDNA synthesis and real-time PCR for HPV mRNA and genomic DNA quantitation cDNA was synthesized from 1 μg of RNA using the iScript cDNA Synthesis Kit (Catalog #170-8890; Bio-Rad, Hercules, CA) according to the manufacturer's protocol.The synthesized cDNA was used as a template to determine the HPV E6 and E7 gene expression levels for HPV types 16, 18, 33, 45, and 68.To determine relative HPV viral gene expression levels, genomic DNA was isolated, and HPV E6, E7, and GAPDH primers were used for real-time, quantitative PCR (Supplementary Table 8).Each 20-μL RT-PCR reaction contained 1× SsoFast SYBR Green SuperMix (Bio-Rad), a 100-nM concentration of each primer, and 25 ng of DNA.Thermocycling conditions were 98°C for 2 minutes for 1 cycle, 98°C for 30 seconds, and 60°C for 5 seconds for 40 cycles.

Immunoblotting (western blotting)
Cells were seeded at a density of 0.5 x 10 6 /mL in 10-cm dishes.The following day, cells were harvested and lysed using cell lysis buffer (Cell Signaling Technology, Danvers, MA) supplemented with a protease inhibitor cocktail.Cell lysates were incubated for 15 minutes on ice and then centrifuged for 5 minutes at 14,000 rpm.The collected supernatants were stored at −80 o C. Protein concentrations were determined using a bicinchoninic acid assay (Catalog #23252; Thermo Scientific, Rockford, IL).Ten micrograms of total protein was separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis, transferred onto a nitrocellulose membrane, and detected using ECL detection reagents (Catalog #RPN2232; GE Healthcare, Pittsburgh, PA).The nitrocellulose membranes were blocked for 1 hour using blocking buffer (5% milk, 1× phosphate-buffered saline, 0.1% Tween 20).The cells were incubated with anti-HPV16 E7 (Cervimax, Vienna, Austria) (1:500) overnight at 4 o C. Protein expression was normalized using β-actin (Catalog #A1978; Sigma-Aldrich, St. Louis, MO).

RNA sequencing
RNA was isolated from the cell lines using the RNeasy Plus kit (Qiagen).Illumina-compatible libraries were prepared using a TruSeq Stranded Total RNA Sample Prep Kit (Illumina, Inc., San Diego, CA).Briefly, 500 ng of DNase I-treated total RNA was depleted of ribosomal RNA using biotinylated, target-specific oligos.Following purification, the RNA was fragmented using divalent cations, and first-strand cDNA synthesis was carried out using random primers.Following secondstrand synthesis, the ends of the resulting double-stranded cDNA fragments were repaired, 5′-phosphorylated, and 3′-A tailed.Illumina-specific Y-shaped indexed adapters were ligated.The products were then purified and enriched using PCR to create the final cDNA library.The libraries were quantified using qPCR (Kapa Biosystems, Wilmington, MA), assessed for size distribution using an Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA), and then multiplexed and sequenced on an Illumina HiSeq3000 sequencer using the 75-bp pairedend format.After sequencing, BCL files were converted to ".Fastq.gz"files, and individual sample libraries were de-multiplexed using CASAVA 1.8.2 software (Illumina) with no mismatches.

Whole exome sequencing and mutation calling
NimbleGen SeqCap EZ libraries (Roche-NimbleGen, Madison, WI) were prepared per the manufacturer's protocol.Briefly, indexed libraries were prepared from 0.5-1.0μg of sheared and RNasetreated genomic DNA using the KAPA HTP Library Preparation Kit (Kapa Biosystems).The indexed libraries were amplified by 7 cycles of ligation-mediated PCR.Following amplification and reaction cleanup using Agencourt AMpure XP beads (Beckman Coulter, Brea, CA), the libraries were quantified fluorometrically using the Qubit dsDNA HS Assay (ThermoFisher, Waltham, MA) and assessed for size distribution using the Fragment Analyzer (Advanced Analytical, Ankeny, IA).Library concentrations were normalized, and the libraries were pooled at 250 ng/library with 4 to 6 libraries/pool.Each multiplexed library pool was hybridized to a probe pool from the SeqCap EZ Human Exome Enrichment Kit v3.0 (Roche-NimbleGen).Following capture, the exomeenriched libraries were amplified using 6 cycles of PCR and then purified using Agencourt AMpure XP beads.The libraries were then quantified fluorometrically using the Qubit dsDNA HS Assay and assessed for size distribution using the Fragment Analyzer.Exome enrichment efficiency was determined using qPCR.Sequencing was performed on a HiSeq4000 Sequencer (Illumina) using the 75-nt paired-end format, with 1 pool (4-6 libraries) sequenced per lane.Following sequencing, BCL files were converted to ".Fastq."files, and individual sample libraries were de-multiplexed using bcl2fastq2 conversion software version 2.17.1.14(Illumina).
To account for the lack of matched normal DNA for established cancer cell lines, we employed several levels of variant filtering, as previously reported, to perform somatic mutation calls [11,53].The mutation filter removed any noncoding or silent sequences.The filtering steps also removed mutations found with high frequency in the 1000 Genomes Project and ESP6500 databases [54].Next, we removed mutations found in 3 or more unique patients and rescued any splice site mutations, frameshift indels, stopgain mutations, and stoploss mutations that may have been removed.We also rescued CLINVAR "pathogenic" mutations and COSMIC mutations that were present in more than 4 cases, as we described previously [55].

Reverse phase protein array
Cell lysates were serially diluted twofold for 5 dilutions (from undiluted to 1:16 dilution) and arrayed on nitrocellulose-coated slides in an 11 × 11 format, as we described previously [56,57].We stained the RPPA slides with 305 unique antibodies (Supplementary Table 6), which were analyzed as we previously described [56].

HPV detection and identification of integration sites using VirusSeq
For HPV detection and integration analysis, we applied the VirusSeq pipeline previously developed in the Department of Bioinformatics and Computational Biology at The University of Texas MD Anderson Cancer Center [58].The HPV detection analysis began by preparing reference genomes and annotation files.The reference sequences included human genome hg19 and the virus genomes from the Genome Information Broker for viruses (GIBV, http://gib-v.genes.nig.ac.jp/).Next, the paired-end reads of the samples were aligned against hg19 for human sequence subtraction using the MOSAIK algorithm [59,60].Unmapped reads were then aligned against virus genomes to detect the viral genomes.The overall count of the mapped reads in each viral genome was generated, and an empirical cutoff of 1000 read counts mapped within a viral genome was used to distinguish HPV-positive from HPV-negative samples.
To detect HPV integration, the reference genome and annotation files were prepared.The reference genome was a hybrid genome that combined the human hg19 and selected virus genomes containing different types of HPV and other viruses.Paired-end reads were aligned against the hybrid genome, hg19Virus, using MOSAIK to generate files containing all of the discordant reads for each chromosome.The read pair was reported as discordant if the paired-end reads were uniquely mapped with one end to a human chromosome and the other to the virus chromosome.Next, the first mate and the second mate of the paired-end reads were separately aligned against the hybrid hg19Virus genome in order to generate the genomic location for each uniquely mapped read, including information on readID, read sequences, and read mapping orientation.This information was then used for annotating the discordant reads generated previously.A curated refFlat file was used to annotate the discordant reads with human and virus genes.Finally, HPV integration sites were identified using VirusSeq and reported if there were at least 4 supporting discordant read pairs.

Human and HPV gene expression estimation
To compare global gene expression patterns in HNSCC and CESC cell lines, we performed unsupervised clustering in an unbiased manner using Pearson correlation metrics with Ward's linkage of the 5000 most variable genes, measured by median absolute deviation, in the 74 cell lines.RNAseq reads were first mapped to human genome hg19 using Tophat/2.0.13 [63].The read count for human genes was generated using HTSeq [64].The unmapped reads were then mapped to a combined reference genome of all HPV subtypes.The HPV reference genomes were downloaded from Papillomavirus Episteme database (https://pave.niaid.nih.gov/).The read counts for HPV genes were counted using a Python script developed in house.The reads per kilonucleotide per million-read (rpkm) values were calculated by dividing the read count by the total number of reads in millions and the gene length in kilonucleotides.

Figure 1 :
Figure 1: HNSCC and CESC cell lines have mutation frequencies similar to those in patient tumors.All HNSCC and CESC cell lines were subjected to whole exome sequencing.The 25 most frequently mutated genes are listed by cell line.For comparison, mutation frequencies of 431 HPV-negative HNSCC, 79 HPV-positive HNSCC, and 194 CESC human tumors were compiled from TCGA.

Figure 2 :
Figure 2: HPV viral gene expression in HNSCC and CESC cell lines and TCGA tumors.mRNA (A) and protein (B) were extracted from HPV-positive cell lines and subjected to qPCR and immunoblotting, respectively, for the viral oncogenes E6 and E7.GAPDH and β-actin were used as controls, respectively.Text color indicates HPV viral type: red, HPV16; blue, HPV18; gold, HPV68; black, HPV45; green, HPV33.HPV gene expression patterns in all HPV-positive samples from CESC and HNSCC cell lines (total 17) and CESC and HNSCC TCGA patient samples (total 301) for 5 HPV types (HPV16, HPV18, HPV33, HPV45, and HPV68) (C) or just HPV16-positive cell lines and TGCA samples (D).Reads coverage for HPV16-positive cell lines and TCGA samples.Expression levels are represented as reads counts per million reads (CPM).The HPV16 gene structures are presented in the bottom panel, with blue blocks represent exons and black lines represent introns.

Figure 3 :
Figure 3: HPV integration sites in HNSCC and CESC cell lines and tissues are diverse with few areas of recurrent integration.Viral integration sites for all HPV-positive cell lines alone (A) and combined with the TCGA HNSCC and CESC tumors (B).

Figure 4 :
Figure 4: HPV integration alters host gene expression.RNASeq reads occupancy profile for FOXE1 (A) and SLC47A2 (B) genes in CESC and HNSCC cell lines.Text and data for the cell lines with the HPV integration sites (red vertical lines) detected near these genes appear in orange.

Table 2 : Genes in which HPV was integrated in at least 5 TCGA tumors or HPV positive cell lines
differences in gene and protein expression levels.Another limitation is a lack of HPV-positive oropharynx cell lines.