Cell type specific gene expression analysis of prostate needle biopsies resolves tumor tissue heterogeneity.

A lack of cell surface markers for the specific identification, isolation and subsequent analysis of living prostate tumor cells hampers progress in the field. Specific characterization of tumor cells and their microenvironment in a multi-parameter molecular assay could significantly improve prognostic accuracy for the heterogeneous prostate tumor tissue. Novel functionalized gold-nano particles allow fluorescence-based detection of absolute mRNA expression levels in living cells by fluorescent activated flow cytometry (FACS). We use of this technique to separate prostate tumor and benign cells in human prostate needle biopsies based on the expression levels of the tumor marker alpha-methylacyl-CoA racemase (AMACR). We combined RNA and protein detection of living cells by FACS to gate for epithelial cell adhesion molecule (EPCAM) positive tumor and benign cells, EPCAM/CD45 double negative mesenchymal cells and CD45 positive infiltrating lymphocytes. EPCAM positive epithelial cells were further sub-gated into AMACR high and low expressing cells. Two hundred cells from each population and several biopsies from the same patient were analyzed using a multiplexed gene expression profile to generate a cell type resolved profile of the specimen. This technique provides the basis for the clinical evaluation of cell type resolved gene expression profiles as pre-therapeutic prognostic markers for prostate cancer.


INTRODUCTION
In men, prostate cancer is the most commonly diagnosed cancer and is the second leading cause of cancer related deaths [1]. For 2012, the incidence was estimated to be 238,000 and the mortality to be 29,000 in the USA. Even though 5-year survival rates exceed 98%, appropriate risk adapted therapies remain a challenge in identifying and separating patients with low and highrisk cancer [1]. Patients with slowly progressing tumors with a low risk for incurable metastatic disease have to be separated from quickly progressing tumors with a high risk for incurable metastatic disease. Data from one of the largest PSA screening trials (European Randomized Study of Screening for Prostate Cancer: ERSPC) [2] of more than 180,000 men estimated that rates for over-treatment are up to 50% in low risk groups. Moreover, high mortality rates in the high-risk group clearly demonstrate a need for improved therapeutic strategies.
Current pre-therapeutic risk assessment is based on limited clinical parameters such as histological grading of needle biopsies by the Gleason score, clinical TNM classification and blood serum-based PSA levels [3]. The needle biopsy is used as the gold standard for prostate cancer diagnosis, yet provides limited biological material (ca. 1x1x10mm). The cellular heterogeneity of the tissue consists of the epithelial tumor as well as benign cells, mesenchymal cells (stromal cells, myofibroblasts, endothelial cells, lipocytes) and lymphocytes, which contaminate the bulk tissue and decrease its usefulness for analysis [4]. Several published RNA-seq or microarray data sets have used bulk tumor tissue to generate profiles in order to classify prostate tumors [5,6]. Often, only a small distinct tissue section of the tumor is selected for the analysis, which might underestimate the heterogeneity of the tumor within the organ [7]. Additionally, the proportion of contaminating cells such as mesenchymal cells in a tumor sample can significantly influence the expression profile of the sample, even if corrected for genes typically expressed in such cells [8]. Micro laser dissection can resolve the cellular heterogeneity and allows for RNA-seq analysis, but is often limited by poor RNA quality due to the fixation process [9]. Furthermore, cells cannot be used for in vitro culture assays after fixation.
Until recently, isolation of living cells was primarily limited to detection based on the expression of cell surface proteins. Novel, functionalized gold nanoparticles allow for the isolation of living cells based on absolute mRNA expression levels of a specific target [10]. Alpha-methylacyl-CoA racemase (AMACR) is routinely used as a biomarker in prostate cancer diagnosis as it is overexpressed in 80% [11] of prostate cancers at the protein and mRNA level [12,13]. However, AMACR overexpression is typically also seen in HGPIN (high grade prostatic intraepithelial neoplasia), up to 21% of typical benign glands, in 10-79% of partial atrophy and 10% of adenosis [14]. On the other hand certain prostate cancer subtypes such as foamy gland carcinoma, atrophic and pseudohyperplastic carcinoma show low expression of AMACR [15]. We have to take into account that all the former entities might even coexist within the same specimen. Nonetheless, as such AMACR represents the best studied and routinely used potential target to identify living tumor cells using functionalized gold-nano particles (see methods). This technique might allow to further discriminate between tumor and benign cells, which both express the routinely used EPCAM cell surface protein. Isolated cell populations can now be separately analyzed for gene expression profiles. Advances in the technique for gene expression analysis allow for the detection of gene expression profiles down to the single cell level [16][17][18][19]. This allows for analyzing small samples from sparse input material such as needle biopsies.
In this study, we present a technique to characterize a prostate tumor by cell type resolved gene expression profiling from low input material such as needle biopsies. Distinct cell types were isolated simultaneously from needle biopsies. These cells were viable and were either used for in vitro culture or for multiplex gene expression analysis. Multiple biopsies were analyzed to cover different sections of the tumor.

Analysis of RNA-seq data sets
Two independent human prostate cancer RNA-seq studies with cancer and matched benign samples from 10 patients per study were analysed [6,43]. Both data sets were processed separately as follows: raw sequencing reads were mapped to the human genome (assembly hg19) with TopHat2 with first aligning reads against the transcriptome (Ensembl v65 gene annotation) (further non-default TopHat2 parameter chosen according to studyspecific read length and fragment length distributions: "-r 140 -mate-std-dev 20 -segment-length 19" for the former and "-r 150 -mate-std-dev 38 -segment-length 18" for the latter data set). Sequencing reads per annotated gene (Ensembl v65) were counted with htseq-count [44]. Differentially expressed genes between cancer and benign prostate samples were determined with DESeq2 , taking into account the patient-wise pairing of tumor and benign sample as additional factor.

Hierarchical clustering
Results from multiplex analysis on 48.48 dynamic array were exported as heatmaps and analyzed using the SINGuLAR TM script (Fluidigm) for R. A detection limit was defined and CT values were converted into (Detection limit -CTvalue) for easier visualization. Negative values are represented as zero. Euclidean hierarchical clustering was performed. Values range from 0 to 12 with increasing expression level and are color-coded from red to white.

Normalization
In the standard qRT-PCR experiments, expression values were normalized to AMACR low expressing populations. Considerable variation in the stochastic expression levels of housekeeping genes -and all other genes -at the single cell level makes the use of housekeeping genes not feasible (Fig. S3). We expected that the stochastic variation in expression levels would also affect cell numbers such as the 200 cells used in this study. The precise control of the number of input cells by FACS ensures the same amount of input RNA for the analysis.

Needle biopsy processing
Written consent was obtained from patients prior to any treatment including radical prostatectomy. The work was approved by a local ethics committee. Radical prostatectomy specimens were received and processed within 10 minutes after the dissection by a surgicalpathologist. Specimens were incised dorso-ventrally and needle biopsies were retrieved by conventional punch technique. Histological validation using H&E staining was performed on cross sections from the area where needle biopsies were retrieved. Needle cores were cut into 1x1mm pieces and incubated with 1:10 Collagenase (10mg/ml) in DMEM supplemented medium overnight at 37°C. Cores were washed and incubated with 5% Trypsin EDTA for 10 minutes at 37°C. Cores were separated into single cells by mechanical force (18 and 20G needles) and filtration (40μm). Cells were resuspended in 1ml DMEM and either used for analysis or frozen in liquid nitrogen.

SmartFlares (SF; MerckMillipore)
In brief, a 27 bp complementary target sequence is conjugated to a gold nanoparticle. A shorter complementary reporter strand binds to the target strand and is conjugated with a fluorophore (Cy5). The fluorophore is quenched by the gold nanoparticle when in the bound state. An endogenous target sequence with a higher binding affinity (due to its increased length) will replace the reporter strand and lead to the emission of a fluorescent signal, which can be detected by conventional FACS. No toxic or immunologic reactions induced by the gold nanoparticles have been reported to date [10]. The endo-and exocytosis machinery of living cells facilitates cellular uptake and release of the gold nanoparticles. This feature allows for cultivation of the cells after FACS isolation.

Fluorescent activated cell sorting (FACS)
Cell sorting was conducted with a FACSAria III cell sorter (BD Biosciences).

Antibodies and dyes
We used the following antibodies:

RESULTS
Two independent RNA-seq data sets [6,43] from human prostatectomy samples were analyzed. Only samples where tumor and benign control tissue matched to the same patient were included (n = 20). A total of 322 genes were identified as significantly differentially expressed in both data sets at a false discovery rate (FDR) of 0.1. The fold change in both data sets is highly correlated (Spearman's correlation coefficient ρ = 0.9037) (Fig. S1). A spearman coefficient of 0.9037 indicates a variation in the expression levels, which is most likely attributed to the biological variation of the two different sample populations (Fig. S1). Finally, 36 / 322 genes were selected according to a mean gene expression > 300 normalized read counts, log2 fold change > 1 for up-regulated and < -1 for downregulated genes (Prenser et al. data set [6]), and known regulatory or marker functions in prostate cancer. Additionally, marker genes for epithelial to mesenchymal transition (EMT), stemness and cell lineage for epithelial, stromal and lymphatic cells were included. Stemness and EMT genes are involved in metastatic and drug resistance pathways in various cancer entities [45]. Taqman primers for the selected genes were purchased from Life Technologies. Rigorous selection (efficiency between 90-110%) left 33 genes available for further analysis (see methods for complete list).
AMACR was among the 5 most up-regulated genes of the 322 genes identified and therefore was selected for customized SmartFlare design. The target sequence was directed against nucleotide 97-123 (according to the manufacturer). In a preceding experiment we confirmed the intracellular uptake of AMACR SmartFlare and a scramble control using confocal microspcopy in LNCAP cells after 4 h of incubation (Fig. S2). The intracellular AMACR signal correlates with FACS analysis, where 91% of the cells stained positive. To test the specificity of our AMACR SmartFlare oligos in cell culture systems, we selected two cell lines with significantly different expression levels of AMACR: the prostate tumor cell line LNCAP (high AMACR expression) and the benign prostate cell line RWPE-1 (low AMACR expression). qRT-PCR analysis resulted in an approximately 40-fold difference in AMACR expression between LNCAP and RWPE-1 cells when normalized against GAPDH. SmartFlares detect only absolute levels of mRNA expression and hence normalising qPCR data relative to a housekeeping gene can be misleading unless the input number of cells is precisely controlled. Therefore, we performed a cell culture-based in vivo validation of the AMACR SmartFlare specificity. LNCAP and RWPE-1 cells were incubated with 20μl of 1:20 diluted AMACR or a scrambled SmartFlare oligo as a control and monitored over 24 h for the absolute fluorescent intensity signal per cell using fluorescent time-lapse microscopy. Cells were automatically tracked by additional nuclear staining with Hoechst dye. Figure 1A demonstrates significantly higher fluorescence intensity after 24 h in LNCAP cells compared to RWPE-1 cells (p < 0.0005). The scramble signal is detectable at comparable levels in both cell lines and significantly lower compared to the AMACR signal (p < 0.005).
Next, we validated the expression levels of AMACR, CAMKK2, TMEFF2, REPS2 and ABCC4 identified in the above-mentioned RNA-seq data in the tumor cell line LNCAP and two benign cell lines RWPE-1 and PNT2 cells. All five tested genes showed significant up-regulation in LNCAP cells compared to RWPE-1 or PNT1A cells (p < 0.05). We also showed that the AMACR and p63 protein staining pattern in prostate tumors and benign glands mirrors the mRNA expression pattern in LNCAP cells and RWPE-1 cells. Tumor glands lack a basal cell layer and therefore do not stain positive for p63. In contrast, luminal benign epithelial cells show weak or absent staining for AMACR, whereas epithelial tumor cells show in most cases strong staining for AMACR instead (Fig. 1E and 1F). However, when AMACR staining intensity is compared between the two representative samples demonstrated in Fig. 1E and 1F it becomes obvious that the difference between benign and tumor cells is less prominent in E compared to F. This again underlines the heterogeneity in the AMACR expression. The gene expression pattern (up-regulated AMACR mRNA and down-regulated p63 mRNA) in LNCAP cells (normalized to benign RWPE-1 cells) represents the protein staining pattern (up-regulated AMACR and absent p63 protein) in primary prostate cancer tissue with high AMACR expression.
Next, single cell suspensions from fresh needle biopsies from human prostatectomy specimens were stained with AMACR SmartFlares and fluorescencelabeled antibodies against CD45 and Epcam. Single living cells were gated for CD45 positive lymphocytes and Epcam positive epithelial cells. CD45/Epcam double negative cells represent mesenchymal cells. Epcam positive cells were further gated into AMACR high and low expressing cells ( Fig. 2A). We observed significant enrichment of AMACR positive cells in the AMACR high compared to scramble control (Fig. 2B). We did not observe any enrichment in a benign control sample isolated from the same patient (Fig. 2B). Next, 200 cells from the high and low gate were sorted for qPCR analysis for a subset of tumor markers such as AMACR, CAMKK2, TMEFF2, REPS2 and ABCC4. Single tube assays were used for reverse transcription and preamplification. We observed higher expression of the tumor markers in the AMACR high expressing cells, which were identified by AMACR SmartFlares. Additionally, AMACR high expressing cells showed no expression of the basal cell marker p63 and higher expression of the luminal cell marker cytokeratin 8 (KRT8). This expression pattern is typical for prostate tumor cells and is in agreement with previous reports [46]. Additional cells were sorted, cultivated at a density of 1000 cells per 96-well and expanded for several days (Fig. 2D). Cells were cultivated in commercially available epithelial stem cell medium (PREGM, Lonza) supplemented with 15% matrigel (for detailed information see methods). Epcam positive epithelial cells, which comprise AMACR high and low expressing cells, represent only 17, 30% of all living cells (Table S4).
We decided to build upon the existing model and further characterize the cellular populations isolated from our samples. A separate sample was prepared under the same conditions except that four populations were sorted in parallel using a 4-way sort algorithm to minimize loss of cells. A total of 200 cells each from CD45 positive lymphocytes, CD45/Epcam double negative mesenchymal cells, Epcam positive AMACR high cells and Epcam positive AMACR low cells were sorted. We used the same tumor markers as before and included additional lineage markers such as Vimentin (stromal cells), CD45 (lymphocytes) and Epcam (epithelial cells). We confirmed that the sorted cell populations were identified by the expression of specific marker genes (Fig. 3C). Expression levels were normalized to AMACR low cells. In line with our previous observations (Fig. 2), the tumor markers AMACR, CAMKK2, TMEFF2, REPS2 and ABCC4 were significantly expressed in AMACR high cells as compared to AMACR low cells. Some tumor markers such as AMACR, TMEFF2 and ABCC4 were also expressed in  HOXc6  TWIST  HPN  ITPKA  ALDH1  CAMKK2  AR  AMACR  TMEFF2  NANOG  VIM  MIR200c  OCT_4  LAMB3  S100A14  SIM2  GOLM1  REPS2  SOX9  ABCC4  ADAM2  SNAIL2  SNAIL1  ZEB1  GATA3  SOX2  mesenchymal cells or lymphocytes at levels comparable to AMACR low cells. The analysis of the AMACR-Cy5 fluorescence intensity in the gates for CD45 and Epcam/ CD45 double negative cells were selected to represent the expression levels observed by qPCR (Fig. 3B). An additional 200 cells from the same needle biopsy were sorted and analyzed for the expression of 29 genes (Methods Table 1 except with additional lineage marker). Gene sets representative for stemness and epithelial to mesenchymal transition (EMT) were included, which provide information on the metastatic potential of the tumor cells. Also, genes that are commonly found to be down-regulated in prostate cancer were included (Methods Table 1). A hierarchical clustering analysis separated the four populations (Fig. 3D) resembling the qPCR results presented in Fig. 3C.
To account for the multifocal occurrence of prostate cancer, four separate needle biopsies (L1, L2, R1, R2) from another prostatectomy specimen were analyzed using the same setup as before (Fig. 4B). Distribution of cells in the AMACR high and low gates did show some variation in the four needle biopsies and only two biopsy cores contained detectable CD45 positive lymphocytes. Both findings can be explained by the expected heterogeneous cellular composition in each biopsy (Fig. 4A). Expression levels were analyzed by hierarchical clustering, and clearly separated mesenchymal cells and lymphocytes from epithelial cells (Fig. 4C). Epithelial cells were also clustered in AMACR high and low expressing cells. We could also detect the already described heterogeneous expression pattern of AMACR on the gene expression level. From biopsy L2 the AMACR low expressing cell population did show the gene expression signature from tumor cells. Vice versa the AMACR high expressing cell population from biopsy R2 did show a benign expression profile (Fig. 4C). These findings further support the reported cellular heterogeneity within separate biopsies represented by different proportions of cell types per biopsy. Distribution of cell types within the needle biopsies are described in Table S4.

DISCUSSION
We can conclude that AMACR mRNA expression level-based separation of epithelial cells generates cell populations that closely resemble tumor and benign prostate cells, with respect to their gene expression profile. Even representing the expected heterogeneity of AMACR expression within the same specimen.
It is the first time that this technology has been applied to primary prostate tissue and small input material such as a needle biopsy. We selected AMACR as a target in this study because of its routine use in cancer diagnostics and the good correlation of expression at the mRNA and protein level [12,13]. However, this technology can be applied to any target of interest. The increasing number of individual primary prostate tissue transcriptomes from RNA-seq provides the basis for the selection of such targets. Unfortunately, the majority of the above data is acquired from bulk tissue analysis and may not represent the expression levels found in a cell type resolved analysis. For the majority of samples analyzed, we observed only one population of AMACR positive cells, which was stretched over more than one log 10 step of fluorescence intensity. These findings most likely represent a continuous regulation of AMACR expression within the epithelial cell population. Technical explanations such as hydrolysis of the SmartFlare reporter oligo or detection of splice variants could account for such a signal distribution. Unspecific signals induced by hydrolysis are controlled by scramble SmartFlares, which showed significantly lower fluorescent signals than those detected in the AMACR high gate. The AMACR SmartFlare was designed to detect exon one, which is included in all known splice variants. No single cell analysis data regarding AMACR expression in living human prostate tumor cells are available for comparison.
A limitation of the expression profiles of the separated populations occurs when attempting to correlate the data with detailed histological analysis. The cross section of the prostatectomy sample showed a heterogeneous tumor tissue with tumor and benign glands and proportions of mesenchymal cells and inflammatory cells such lymphocytes. However, no exact estimation can be made to analyze the cellular composition along the needle biopsy. For future studies, improved biopsy technologies are needed, which would allow for simultaneous analysis of living cells and histological validation. We currently explore the use of a needle punch system to generate two tissue cylinders in very close proximity (< 1mm). One cylinder can be used for  cells and paraffin embedded material for downstream analysis in a feasibility experiment (data not shown). The use of 200 cells from each population allows for the detection of a large number of transcripts and samples in a cost efficient way. As yet, this does not allow resolution at the single cell level. The number of detectable genes per cell decreases at the single cell level, which is mainly attributed to highly stochastic expression of certain transcripts in a single cell at a given time point [16,17]. From 48 selected genes for single cell expression analysis in 23 LNCAP cells, only 16 were expressed in at least 10% of the cells (data not shown), an observation that has been published previously [17]. By using 200 cells, we can account for the stochastic single cell effect and allow for the detection of a broader gene expression profile in a defined cell population. For certain low abundant subpopulations such as tumor stem cells or infiltrating lymphocytes, single cell analysis remains a valuable tool that can be applied using our method [17][18][19].
The multiplex gene expression analysis on a 48.48 dynamic array is limited to 48 genes and 48 samples. However, it represents a cost efficient technology for analyzing a large number of samples with minimal reagent input and high precision due to the microfluidic driven assembly of the nano liter scaled reaction volumes. To further increase the power of this system in order to reveal novel interaction networks, potentially to be used as prognostic criteria, RNA-seq technology has to be applied via adopted single cell protocols [19,47].
Cell populations referred to as mesenchymal cells in this work are known to consist of further subpopulations such as stromal cells, endothelial cells, myofibroblast cells and cancer associated fibroblasts (CAFs) [48]. The same holds true for CD45 positive lymphocytes, which can be further subdivided into CD3+, CD4+, CD8+ and FoxP3+ cells [49]. These subpopulations will have to be taken into account in future studies.
Using a cell type resolved analytical method of living cells to dissect prostate tumor tissue can overcome several previous limitations and improve the quality of tissue analysis post-biopsy: Living cells from primary tissue can be analyzed for a gene expression signature and, from the same population of cells, could be used for functional assays such as drug screening, murine xenografts or three-dimensional tissue cultures to examine drug response within the complex interaction of different cell types. Such analyses cannot be performed with fixed tissue biopsies. Further more, potential "omics" analyses from living cells provides a higher quality of input material compared to formalin or alcohol fixed tissue.
A commercially available gene expression signature assay for prostate cancer has been recently validated in several patient cohorts and showed superior prediction for prostate cancer disease recurrence after the initial treatment when compared to clinical and histological variables [50]. However, it has focused on a single tumor focus per patient, which has been analyzed as a bulk sample from formalin fixed tissue. Our method has the potential to even further refine the prediction accuracy of such a test assay by dissecting and analyzing the heterogeneity of different cell types at multiple sites within the tumor tissue. Different cells types can be analyzed separately at multiple sites of the tumor. Furthermore, different gene expression signatures from distinct cell types such as tumor cells, benign cells, mesenchymal cells and lymphocytes can be correlated to identify better prognostic and therapeutic strategies. A limitation of using living cells is that data has to be collected prospectively with follow up periods of five to ten years to correlate gene expression signatures to clinical outcomes such as disease recurrence, treatment response or death from cancer. Still, several important issues for clinical decision making such as the presence of metastatic disease at the time of diagnosis, the identification of the biopsy of origin of the metastases or the initial treatment response, could be answered with significantly shorter follow up.
In summary, an integrated cell type resolved gene expression analysis derived from multiple histological validated needle biopsies from the same organ provides the basis to further analyze novel interaction networks, define prognostic signatures and identify therapeutic targets in heterogeneous tumor tissue. Such tools will help to identify prostate cancer patients at high risk for tumor progression or death from cancer and treat them effectively as well as to prevent unnecessary treatment of patients with indolent prostate cancer.

Authors' contributions
MW, MK, VD and CJ performed experiments, ASR and RB performed RNA-seq data analysis, DG and MF performed data analysis and wrote the manuscript, RS, MW and WSS supervised the project, MK developed the project, DG and MK wrote the manuscript.