Profiling the circulating mRNA transcriptome in human liver disease

The human circulation contains cell-free DNA and non-coding microRNA (miRNA). Less is known about the presence of messenger RNA (mRNA). This report profiles the human circulating mRNA transcriptome in people with liver cirrhosis (LC) and hepatocellular carcinoma (HCC) to determine whether mRNA analytes can be used as biomarkers of liver disease. Using RNAseq and RT-qPCR, we investigate circulating mRNA in plasma from HCC and LC patients and demonstrate detection of transcripts representing more than 19,000 different protein coding genes. Remarkably, the circulating mRNA expression levels were similar from person to person over the 21 individuals whose samples were analyzed by RNAseq. Liver derived circulating transcripts such as albumin (ALB), apolipoprotein (APO) A1, A2 & H, serpin A1 & E1, ferritin light chain (FTL) and fibrinogen like 1 (FGL1) were significantly upregulated in HCC patient samples. Higher levels of some of these liver-specific transcripts in the plasma of HCC patients were confirmed by RT-qPCR in another cohort of 20 individuals. Several less abundant circulating transcripts associated with cancer were detected in most HCC samples, but not in healthy subjects. Liver specificity of circulating transcripts was confirmed by investigating their expression in HCC tumor and liver cancer cell lines. Liver specific mRNA sequences in the plasma were predominantly present outside circulating extracellular vesicles. Conclusions: The circulating “mRNA” transcriptome is remarkably consistent in diversity and expression from person to person. Detection of transcripts corresponding to disease selective polypeptides suggests the possibility that circulating mRNA can work as a biomarker analyte for cancer detection.


RNAseq experiments
Total RNA from 1-2 ml plasma samples (sedimented at 2,000g for 5 min) was prepared using the Qiagen RNeasy Serum/Plasma Kit (Qiagen, Valencia, CA, USA) and quantified on a NanoDrop One spectrophotometer utilizing A260/A280 ratios to assess nucleic acid purity. RNAseq analysis was carried out at Cancer Genomics Facility at Thomas Jefferson University, Philadelphia, PA. RNA purity and integrity were assessed by Agilent 2100 BioAnalyzer and RIN values were determined. Libraries were prepared using the SMARTer ® Stranded Total RNA-Seq Kit v2 (Takara Bio USA, Inc., Mountain View, CA, USA) to support library preparation from as little as 250 pg of RNA. The protocol used was for preparation of total stranded RNA with ribosomal RNA and mitochondrial RNA targeted for removal. Quality of each library was assessed with the Agilent TapeStation 2200. Six to eight libraries were pooled (n = 6-8) and sequenced using Illumina v2 chemistry, 2 × 75 base paired-reads, on one high-output flow cell of an Illumina NextSeq 500 to achieve at least 50 million paired reads per sample in 75-base read length paired-end mode. Adapter trimming was performed on the resulting paired end fastq files, followed by trimming residual Pico v2 SMART Adapters by removing the first 3 nucleotides from the second read in each pair.

RNAseq data analysis
Paired-end sequence reads were analyzed according to currently available best practices for wholetranscriptome analysis, which include alignment onto the current human reference genome assembly (GRCh38) using the STAR splice aware aligner. Transcript counts and abundances expressed in transcripts per million (TPM) were estimated based on GENCODE v28 gene annotations using the RSEM algorithm. Differential expression between groups was tested with the R/Bioconductor DESeq 2 package. Tissue specific content was generated using the R/Bioconductor TissueEnrich package. Most abundant highly expressed genes from RNAseq data were inputted to determine which tissue-specific genes are enriched in those datasets. Tissue-specific genes were defined by processing RNAseq data from the Human Protein Atlas (HPA) [1], GTEx [2], and mouse ENCODE [3] using the algorithm from the HPA. The hypergeometric test was used to determine if the tissue-specific genes are enriched among the input genes. Tissue enrichment is defined as genes with an expression level greater than 1 (TPM or FPKM) that also have at least five-fold higher expression levels in a particular tissue compared to all other tissues. Fold change in liver tissue, for example, would be calculated as: Fold change = (k/n)/(Kb/Nb), where k is the number of detected liver specific genes, n is the number of all detected genes, Kb is the number of liver specific genes in the human genome, and Nb is the number of genes in the human genome. Read piles of liver specific transcripts were analyzed using alignment .bam files corresponding to genome version GRCh38.
The coverage density plots showed how many reads are aligned to a particular location, typically lining up with exon boundaries.

RT-qPCR analysis
Total RNA from 1-2 ml plasma samples (University of Texas, Houston) was prepared using the Qiagen RNeasy Serum/Plasma Kit (Qiagen, Valencia, CA, USA) and quantified on a Nanodrop One spectrophotometer. RNA (1 µg) was subjected to DNase digestion before reverse transcription to cDNA. Each cDNA sample was spiked with a 102 nt zebrafish specific DNA ultramer (IDT, Coralville, IA) for use as internal control. A 200 pM stock of DNA ultramer was made and a serial dilutions were prepared from 20 nM to 0.02 oM and subjected to qPCR using ultramer specific oligos. The standard curve demonstrated that ultramer concentration was inversely proportional to Ct values obtained and melting curves reflected the specificity of ultramer amplification (data not shown). To validate RNAseq results, all cDNA samples were spiked with 20 pM ultramer, which produced a Ct of 14 in the standard curve (data not shown). PCR reactions for each sample (n = 20) with 25 ng cDNA were subjected to RT-qPCR. For effective amplification of circulating mRNA fragments, primers were designed from RNAseq high sequencing depth regions. Relative expression of each transcript was calculated using the ∆∆Ct method by normalizing with internal reference DNA ultramer and the average ∆Ct value (n = 10) from age and gender matched NHC samples. Each PCR reaction was carried out in triplicate sets and relative expression determined.

Isolation and characterization of EVs
EVs were purified from HepG2 and Huh7 liver cancer cell lines as previously described [4][5][6] with some modifications. Briefly, the culture supernatant was collected after 48 hours of starvation. The supernatant was first pre-cleared of any cellular debris by centrifugation at 2,000g for 20 minutes at 4°C and subsequently transferred to a fresh ultracentrifuge tube and centrifuged at 10,000g for 30 minutes at 4°C. The remaining supernatant was centrifuged to purify the EV fraction at 100,000g for 120 minutes at 4°C. The EV pellet was washed by a second centrifugation in 1X PBS at 100,000g for 120 minutes at 4°C. The final EV pellet was resuspended in 1X PBS for storage at -80°C. Plasma samples from HCC patients and normal healthy controls (CMCVAMC) were subjected to centrifugation at high speed for 30 minutes to remove residual cellular debris and supernatant collected. A minimum of 1 ml of plasma from patients was used to purify EVs by commercial ExoQuick kit (System Biosciences, Palo Alto, CA, USA) and isolated EVs were further purified by ultracentrifugation using 1X PBS at 100,000g for 120 minutes at 4°C. EV pellet was resuspended in 100 μl PBS. One μL of the EV suspension was used for NanoSight NS300 (Malvern Instruments, Malvern, UK) analysis after diluting 1:1,000 in PBS, at Thomas Jefferson University. The NTA software (Malvern Instruments, Malvern, UK) was used to obtain the size distribution and concentration of particles. Remaining EV suspension was lysed for characterization by immunoblotting using primary Abs against CD9, CD63, CD81, and TSG101 followed by secondary antibody treatments.