Gene expression analysis of TIL rich HPV-driven head and neck tumors reveals a distinct B-cell signature when compared to HPV independent tumors

Human papilloma virus (HPV)-associated head and neck squamous cell carcinoma (HNSCC) has a better prognosis than it's HPV negative (HPV(−)) counterpart. This may be due to the higher numbers of tumor-infiltrating lymphocytes (TILs) in HPV positive (HPV(+)) tumors. RNA-Sequencing (RNA-Seq) was used to evaluate whether the differences in clinical behaviour simply reflect a numerical difference in TILs or whether there is a fundamental behavioural difference between TILs in these two settings. Thirty-nine HNSCC tumors were scored for TIL density by immunohistochemistry. After the removal of 16 TILlow tumors, RNA-Seq analysis was performed on 23 TILhigh/med tumors (HPV(+) n=10 and HPV(−) n=13). Using EdgeR, differentially expressed genes (DEG) were identified. Immune subset analysis was performed using Functional Analysis of Individual RNA-Seq/ Microarray Expression (FAIME) and immune gene RNA transcript count analysis. In total, 1,634 DEGs were identified, with a dominant immune signature observed in HPV(+) tumors. After normalizing the expression profiles to account for differences in B- and T-cell number, 437 significantly DEGs remained. A B-cell associated signature distinguished HPV(+) from HPV(−) tumors, and included the DEGs CD200, GGA2, ADAM28, STAG3, SPIB, VCAM1, BCL2 and ICOSLG; the immune signal relative to T-cells was qualitatively similar between TILs of both tumor cohorts. Our findings were validated and confirmed in two independent cohorts using TCGA data and tumor-infiltrating B-cells from additional HPV(+) HNSCC patients. A B-cell associated signal segregated tumors relative to HPV status. Our data suggests that the role of B-cells in the adaptive immune response to HPV(+) HNSCC requires re-assessment.


SUPPLEMENTARY METHODS
Supplementary Methods S1, Histology and immunohistochemistry TIL status was scored on frozen tumor sections that had been stained with H&E and viewed under low-power magnification (x2.5 objective) as described previously [1]; TIL high : diffuse, present in >80% of tumor/stroma; TIL mod : patchy, present in 20-80% of tumor/stroma; TIL low : weak/absent, present in <20% of tumor/stroma. Data regarding the percentage tumor cells, tumor grade and pattern of invasion were also recorded. Furthermore, IHC was performed on FFPE tumor sections against CD3, CD4, CD8 and CD20 (all from Novocastra, Milton Keynes, UK). TILs were quantified using a Zeiss AxioCam MRc5 microscope (Zeiss, Cambridge, UK) and Zeiss Axiovision software (version 4.8.1.0; Zeiss) in an average of 10 high-power (x400) fields across representative areas of each tumor to allow for intratumoral heterogeneity; an average intratumoral TIL score per high-power field was calculated. Additionally, IHC was performed against the antigenic targets, CD200 (Sigma-Aldrich Company Ltd., Gillingham, UK) and CD23 (Abcam, Cambridge, UK). HPV status was evaluated by IHC against p16 (CINtec, Roche, Burgess Hill, UK) and scored as HPV(+) (>50% tumor cells positive) or HPV(−) (<50% tumor cells positive); confirmation was by evaluation of E6 and E7 RNA transcript levels from the RNA-Seq data ( Table 1).

Supplementary Methods S2, RNA-Seq
RNA quality was assessed using the Agilent 2100 Bioanalyser (Agilent Technologies UK Ltd., Stockport, UK); an average RNA quality number (RIN) of 8.51±0.90 was observed across all tumor samples. Total RNA was converted into a library for sequencing on the HiSeq 2000 (Illumina Inc., San Diego, USA) using the TruSeq™ stranded mRNA Sample Preparation Kit (Illumina Inc.). Briefly, poly-A mRNA was purified from total RNA (100ng) using the Poly(A) Purist Mag Kit (Life Technologies Ltd., Paisley, UK), according to the manufacturer's instructions. The mRNA was then amplified and converted into cDNA, which was purified and used to construct libraries that were hybridized to the flow cell for single end (SE 35bp) sequencing.

Supplementary Methods S3, RNA-Seq data analysis
The quality of raw SE read data in FASTQ files was assessed and reads of low quality were trimmed or removed. SE reads were then mapped to the human genome (hg19) using TopHat (version 2.0.9) [2] and, following the removal of multi-mapping reads, converted to gene-specific read counts for 23,368 annotated genes using HTSeq-count (version 0.5.4) [3]. Non-specific filtering of count data was performed using the Bioconductor package EdgeR (version 3.4.2) [4,5] such that genes with less than 2 read counts per million in 25% of tumor samples were excluded from further analysis. The remaining 14,528 genes were subject to normalization using the TMM method [6] to account for differences in library size from sample to sample. Unsupervised clustering of samples was performed following variance stabilizing transformation of TMM normalized data and illustrated as a heatmap.
DEGs between HPV(+) and HPV(−) groups were identified with a FDR adjusted p-value <0.05 (i.e., q-value <0.05) and a fold change of >2 or <-2 using EdgeR [4]. Fold change was calculated in EdgeR as the log2 of geometric mean of intensities; a positive and a negative fold change represents genes that were expressed to a greater or lesser extent, respectively, in HPV(+) versus HPV(−) tumors. q-values were obtained from differential expression test in EdgeR using the generalized linear model likelihood ratio test and adjusted for multiple testing using the Benjamini and Hochberg method to control the FDR. This package models the negative binomial distribution and implements general linear models to identify DEGs. EdgeR was also used to identify DEGs while adjusting for covariates associated with varying proportions of lymphocyte subsets in each tumor sample as reflected in the expression of CD19 (B-cells) and CD4 and CD8A (T-cells) e.g. R-script used in EdgeR for the covariate adjustment was: design <model.matrix (~adjustv_CD19+adjustv_CD4+adjustv_CD8+Group).

Supplementary Methods S5, Functional analysis of individual microarray expression (FAIME) method
The FAIME method [9] was adapted to generate a score for a large number of tissue and cell types present in each tumor sample. Marker gene sets whose expression was associated with different tissue and cell types, including lymphocyte subsets (B-cells, NK cells and CD4 + and CD8 + T-cells), were accrued from the following resources: CTen [10], IRIS, [11], HeamAtlas [12], Palmer et al. [13], Grigoryev et al. [14] and Whitney et al. [15]. Particular attention was paid to gene expression markers of lymphocyte origin; a marker for a particular type of lymphocyte (e.g., a B-cell) needed to be expressed in that lymphocyte as confirmed in at least two of the resources and could not be expressed in another lymphocyte type (e.g., an NK or CD4 + or CD8 + T-cell). A FAIME score was then calculated for each tumor sample, for each cell type, by producing a weighted ranking of the genes in each sample and then determining the ranking of the marker genes for a particular cell type as compared to the genes not associated with that cell type. Finally, a student's t-test was used to assess whether the FAIME scores for a particular cell type were significantly different (q-value <0.05) between HPV(+) and HPV(−) tumors. In a separate group level assessment, the marker gene sets for each tissue and cell type significantly over-represented for DEGs (Bonferroni corrected p-value <0.05) were identified using a hypergeometric test.