Meta-analysis of transcriptome data identifies a novel 5-gene pancreatic adenocarcinoma classifier

Purpose: Pancreatic ductal adenocarcinoma (PDAC) is largely incurable due to late diagnosis. Superior early detection biomarkers are critical to improving PDAC survival and risk stratification.
Experimental Design: Optimized meta-analysis of PDAC transcriptome datasets identified and validated key PDAC biomarkers. PDAC-specific expression of a 5-gene biomarker panel was measured by qRT-PCR in microdissected patient-derived FFPE tissues. Cell-based assays assessed the impact of two of these biomarkers, TMPRSS4 and ECT2, on PDAC cells.
Results: A 5-gene PDAC classifier (TMPRSS4, AHNAK2, POSTN, ECT2, SERPINB5) achieved on average 95% sensitivity and 89% specificity in discriminating PDAC from non-tumor samples in four training sets and similar performance (sensitivity = 94%, specificity = 89.6%) in five independent validation datasets. This classifier accurately discriminated PDAC from chronic pancreatitis (AUC = 0.83), other cancers (AUC = 0.89), and non-tumor from PDAC precursors (AUC = 0.92) in three independent datasets. Importantly, the classifier distinguished PanIN from healthy pancreas in the PDX1-Cre;LSL-KrasG12D PDAC mouse model. Discriminatory expression of the PDAC classifier genes was confirmed in microdissected FFPE samples of PDAC and matched surrounding non-tumor pancreas or pancreatitis. Notably, knockdown of TMPRSS4 and ECT2 reduced PDAC soft agar growth and cell viability, and TMPRSS4 knockdown also blocked PDAC migration and invasion.
Conclusions: This study identified and validated a highly accurate 5-gene PDAC classifier for discriminating PDAC and early precursor lesions from non-malignant tissue that may facilitate early diagnosis and risk stratification upon validation in prospective clinical trials. Cell-based experiments on two overexpressed proteins encoded by the panel, TMPRSS4 and ECT2, suggest a causal link to PDAC development and progression, supporting them as potential therapeutic targets.

that utilizes information from large publicly available microarray databases to pre-compute and freeze estimates of probe-specific effects and variances. The frozen fRMA data is updated with information from new array datasets to provide a normalized summary of the combined data. When the probe-level data contained in .cel files was not available, we used the gene expression data matrix (GEDM) of Affymetrix average difference intensities. The normalized datasets were further standardized using Z-score to reduce the batch effects among different datasets (4).
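The Z-score standardization step can be illustrated with a minimal sketch. It assumes that each gene is standardized across the samples within one dataset, which is a common convention but is not stated explicitly above; the function name `zscore_rows` and the two small matrices are hypothetical and chosen only to show that datasets measured on different scales become comparable.

```python
from statistics import mean, stdev

def zscore_rows(matrix):
    """Standardize each gene (row) to mean 0 and unit sample SD within one dataset."""
    out = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)
        # A gene with zero variance cannot be scaled; map it to zeros instead
        out.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return out

# Hypothetical expression matrices (rows = genes, columns = samples).
# Dataset B is on a 10x larger scale than dataset A for the first gene.
dataset_a = [[2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
dataset_b = [[20.0, 40.0, 60.0], [5.0, 7.0, 9.0]]

standardized = {
    name: zscore_rows(m) for name, m in [("A", dataset_a), ("B", dataset_b)]
}
print(standardized["A"][0], standardized["B"][0])
```

After standardization the first gene has the same profile in both datasets even though the raw intensities differ by an order of magnitude, which is the sense in which the transformation reduces dataset-level scale differences.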

Differential gene expression analysis
For training set differential expression analysis, the two sample classes were normal pancreas (NP) and PDAC, and the null hypothesis was that no difference in gene expression exists between the NP and PDAC sample classes. Differentially expressed transcripts were identified using the Linear Models for Microarray Data (LIMMA) software package (5), which estimates the differences between NP and PDAC samples by fitting a linear model and using an empirical Bayes method to moderate the standard errors of the estimated log-fold changes for the expression values from each probe set. In LIMMA, all probes were ranked by t statistic using a pooled variance, a technique particularly suited to small numbers of samples per phenotype. Differentially expressed probes were identified on the basis of absolute fold change and Benjamini and Hochberg corrected P value (6). Genes with a multiple-testing corrected P value < 0.05 and a fold change (FC) of at least 1.5 were considered differentially expressed. Genes that were differentially expressed with concordant directionality (upregulation or downregulation) in three out of four datasets were used for training the PDAC classifier.
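The selection logic described above (Benjamini and Hochberg correction, fold-change cutoff, and directional concordance across datasets) can be sketched as follows. This is not LIMMA: the moderated statistics are assumed to have been computed already, and the sketch starts from per-gene log2 fold changes and raw P values. The gene labels G1 to G3, all numeric values, and the function names are hypothetical.

```python
import math

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_de(genes, log2fc, pvals, fc_cutoff=1.5, alpha=0.05):
    """Map gene -> +1 (up in PDAC) or -1 (down) if it passes both thresholds."""
    adj = benjamini_hochberg(pvals)
    min_lfc = math.log2(fc_cutoff)  # FC >= 1.5 on the log2 scale
    calls = {}
    for gene, lfc, q in zip(genes, log2fc, adj):
        if q < alpha and abs(lfc) >= min_lfc:
            calls[gene] = 1 if lfc > 0 else -1
    return calls

def concordant(calls_per_dataset, min_datasets=3):
    """Keep genes called in the same direction in at least min_datasets datasets."""
    counts = {}
    for calls in calls_per_dataset:
        for gene, direction in calls.items():
            key = (gene, direction)
            counts[key] = counts.get(key, 0) + 1
    return sorted(gene for (gene, _), n in counts.items() if n >= min_datasets)

# Hypothetical results for three genes in four training datasets
genes = ["G1", "G2", "G3"]
datasets = [
    call_de(genes, [2.0, 1.0, 0.2], [0.001, 0.01, 0.5]),
    call_de(genes, [1.8, -1.0, 0.1], [0.002, 0.01, 0.6]),
    call_de(genes, [2.2, 0.9, -0.1], [0.001, 0.02, 0.7]),
    call_de(genes, [0.3, 1.1, 0.0], [0.2, 0.3, 0.9]),
]
consensus = concordant(datasets)
print(consensus)
```

In this toy input only G1 is retained: G2 is significant in three datasets but in conflicting directions, so it fails the concordance requirement. The `min_datasets` default treats "three out of four" as "at least three", which is an interpretation rather than something the text specifies.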

Hierarchical clustering analysis of mouse GEM PDAC model microarray dataset
To evaluate the cross-species differential expression of the 5-gene PDAC classifier in a genetically engineered mouse (GEM) model of PDAC, we performed unsupervised hierarchical clustering analysis (HCA) on the GSE33322 dataset. HCA was performed using Pearson correlation matrices with the complete linkage method.
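A minimal, dependency-free sketch of agglomerative clustering with a Pearson-based distance and complete linkage is given below. It assumes the distance between two samples is 1 minus their Pearson correlation, a common choice that is not spelled out above. The sample names and the five-value expression profiles are hypothetical and are not taken from GSE33322.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must have nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def complete_linkage(profiles, n_clusters):
    """Merge samples bottom-up until n_clusters remain.

    Distance between samples is 1 - Pearson r (assumed); distance between
    clusters is the largest pairwise sample distance (complete linkage).
    """
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical expression of the five classifier genes in four mouse samples
profiles = {
    "normal_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "normal_2": [1.1, 2.1, 2.9, 4.2, 4.8],
    "panin_1": [5.0, 4.0, 3.0, 2.0, 1.0],
    "panin_2": [4.9, 4.2, 3.1, 1.8, 1.2],
}
groups = complete_linkage(profiles, 2)
print(groups)
```

For real data, an established implementation would normally be preferred; this version only makes the linkage rule explicit.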

Cell culture
Capan-1, BxPC-3, MIAPaCa-2, Panc-1, ASPC1, PL45 and HPDE cells were purchased from American Type Culture Collection (Rockville, MD, US). These cells were maintained in Dulbecco's modified Eagle's medium (DMEM) containing 10% fetal bovine serum, 1% penicillin/streptomycin, and 1% glutamine. Cell lines were cultured in BD Primaria tissue culture dishes (100 × 20 mm) at 37°C with 5% CO₂ in a humidified incubator and carried at 2.0 × 10⁶ cells/ml, passaging two to three times weekly as needed. Confluent cells were harvested by trypsinization with 0.05% trypsin and 0.02% EDTA, pelleted by centrifugation at 2,500 rpm for 8 min at 4°C, resuspended in fresh complete DMEM, and plated in BD Primaria tissue culture dishes 24 h before use in experiments to avoid any confounding changes in gene expression caused by handling.

Lentiviral production and infection
Lentiviral shRNAs targeting TMPRSS4 (shTMPRSS4) and ECT2 (shECT2) were obtained from Harvard Medical School (Boston, MA). The lentivirus was packaged by co-transfection of 293T cells with the shRNA expression vector, VSV-G (vesicular stomatitis virus glycoprotein), and delta-VPR plasmids at a ratio of 1:0.9:0.1, using Lipofectamine 2000 (Invitrogen, USA). Forty-eight hours after transfection, the supernatants containing lentiviral particles were harvested and titered on HeLa cells.
Capan-1 and BxPC-3 cells were plated in 10 cm dishes and grown to 80% confluence. On the day of infection, the media was removed and replaced with 8 ml of complete media supplemented with polybrene (8 µg/ml) in each plate. Lentivirus (250 µl; shTMPRSS4, shECT2, shGFP or a scrambled shRNA control) was added to each plate and incubated for 24 hours. Cells were left to recover from infection for 24 hours before initiating selection with puromycin (3 µg/ml) for three days.
The absorbance of the solution was measured in a spectrophotometer (Bio-Rad Model 550, Bio-Rad Laboratories, Inc., Hercules, CA, USA) using a test wavelength of 540 nm.

Migration and invasion assays
Migration and invasion of cells infected with shTMPRSS4 or shECT2 were tested using a modified Transwell chamber migration assay (8-µm pore size membrane, BD Biosciences) or invasion assay (Matrigel-coated membrane, BD Biosciences). Cells (25 × 10⁴) were seeded in serum-free medium into the upper chamber and allowed to migrate or invade toward 10% FCS as a chemoattractant in the lower chamber for 16 h. Cells in the upper chamber were carefully removed using cotton buds, and cells at the bottom of the membrane were fixed and stained with crystal violet. Quantification was done by counting the stained cells.

Soft agar colony formation assay
The soft agar assay was used to determine colony formation of the cells as a measure of malignant cell transformation. Briefly, 1 × 10⁵ shTMPRSS4- or shECT2-infected cells were added to the top-layer soft agar mix in six-well plates. The plates were kept in the incubator for 14-20 days. A few drops of DMEM were added every two to three days to keep the plates moist. After incubation, colony formation was determined using p-iodonitrotetrazolium staining dye.

Quantitative real-time PCR (qRT-PCR) analysis of FFPE samples
With institutional review board (IRB) approval, human formalin-fixed paraffin-embedded (FFPE) tissue was obtained from 22 PDAC patients who underwent primary surgical resection (pancreaticoduodenectomy or partial pancreatectomy). The original slides (5 µm thickness) of FFPE tissue that were prepared and stained with hematoxylin and eosin (H&E) were reviewed by a fellowship-trained gastrointestinal and hepato-pancreato-biliary pathologist (EUY). Nine well-differentiated, 9 moderately differentiated, 3 poorly differentiated and 1 other (mucinous adenocarcinoma/colloid carcinoma) PDAC samples and their respective background, non-neoplastic pancreatic parenchyma (9 with no significant pathologic abnormality and 13 with pancreatitis) were selected. Regions of neoplastic and background pancreatic parenchyma were designated for analysis (areas at least 64 mm² in size were outlined with permanent marker).
After matching the tissue block with the H&E-stained slide, core punches, restricted to the regions that the pathologist marked as PDAC, pancreatitis or healthy pancreas, were extracted from the FFPE block. A 2.5 mm biopsy punch was used to punch three cores from each sample for RNA extraction.
Total RNA was isolated using the RecoverAll™ Total Nucleic Acid Isolation Kit (Ambion) after pooling the three cores for each sample.