Systematically characterizing dysfunctional long intergenic non-coding RNAs in multiple brain regions of major psychosis

Schizophrenia (SZ) and bipolar disorder (BD) are severe neuropsychiatric disorders with serious impact on patients, together termed “major psychosis”. Recently, long intergenic non-coding RNAs (lincRNAs) were reported to play important roles in mental diseases. However, little was known about their molecular mechanism in pathogenesis of SZ and BD. Here, we performed RNA sequencing on 82 post-mortem brain tissues from three brain regions (orbitofrontal cortex (BA11), anterior cingulate cortex (BA24) and dorsolateral prefrontal cortex (BA9)) of patients with SZ and BD and control subjects, generating over one billion reads. We characterized lincRNA transcriptome in the three brain regions and identified 20 differentially expressed lincRNAs (DELincRNAs) in BA11 for BD, 34 and 1 in BA24 and BA9 for SZ, respectively. Our results showed that these DELincRNAs exhibited brain region-specific patterns. Applying weighted gene co-expression network analysis, we revealed that DELincRNAs together with other genes can function as modules to perform different functions in different brain regions, such as immune system development in BA24 and oligodendrocyte differentiation in BA9. Additionally, we found that DNA methylation alteration could partly explain the dysregulation of lincRNAs, some of which could function as enhancers in the pathogenesis of major psychosis. Together, we performed systematical characterization of dysfunctional lincRNAs in multiple brain regions of major psychosis, which provided a valuable resource to understand their roles in SZ and BD pathology and helped to discover novel biomarkers.

we detected significantly differentially expressed lincRNAs in three contexts -comparisons between SZ cases and controls in BA24 (BA24_SZ for short) and BA9 (BA9_SZ for short) and comparisons between BD cases and controls in BA11 (BA11_BD for short), we constructed co-expression networks for these contexts. For each context, the FPKM values of filtered lincRNAs and PCGs (read count>2 in more than 50% samples) were normalized by log2 transformation (i.e. log2(FPKM+1)) and quantile normalized, then Pearson correlation coefficients were calculated for all gene pairs. The resulting correlation matrix was transformed to an adjacency matrix using a power function. A scale-free topology criterion [4] was adopted to choose the power (10, 9 and 5 for BA24_SZ, BA9_SZ and BA11_BD, respectively). Based on the adjacency matrix, topological overlap (TO, indicating relative interconnectedness between two genes) was calculated for each pair of genes. Then, modules were detected using average linkage hierarchical clustering based on the TO dissimilarity (1-TO). The minimum module size was set at 30 genes and the minimum height for merging modules at 0.25 for BA24_SZ and BA9_SZ, while minimum height of 0.15 was chosen for BA11_BD to obtain moderately large and distinct modules.

Identification of significantly co-expressed modules
To verify genes in the modules were co-expressed beyond by chance, we performed permutation test based on the assumption that the mean TO of a network module should be greater than that of a random module [5]. For each module, we randomly selected 100 sets of genes with equivalent size of the module from the network. The p-value was estimated as the ratio of random gene sets whose mean TO was greater than that of the module. Then, multiple comparison correction was performed for all modules in a network. Finally, modules with FDR<0.05 were considered significantly co-expressed modules.

Module characterization
Enrichment of DELincRNAs or DEPCGs in the significantly co-expressed modules were assessed using hypergeometric test with FDR<0.05. The modules which contained lincRNAs and were enriched with DELincRNAs or DEPCGs were retained for subsequent analyses.
Over-representation of brain-related markers were performed using the WGCNA function userListEnrichment with Bonferroni corrected p-value<0.05. Correlation test between module eigengenes (equivalent to the first principal component [6]) and clinical traits such as diagnosis, age, sex, race, PMI and brain PH were performed using corPvalueStudent function.
Disease-associated modules were identified as those showing significant correlation (p-value<0.05) with disease state while non-significant correlation with other clinical traits.
To identify potentially disease-causal modules, we utilized the method described in [7] to perform gene-set-based GWAS enrichment analysis. Summary results of association of SNPs with SZ or BD were retrieved from [8,9], respectively. For each gene in a given module, we assigned SNPs within 110 kb upstream and 40 kb downstream to it. We counted the number of SNPs (N) assigned to all genes in each module. Then we determined the proportion of SNPs with nominal association p-values<0.05 in the module. Next, we randomly selected N SNPs from GWAS data with replacement and calculated the proportion of SNPs with GWAS p-value<0.05. We repeated random selections 10000 times and the p-value of GWAS enrichment in a module was determined as the ratio of random selections whose proportion of nominally significant SNPs was greater than that of the given module. Finally, FDR correction was performed to identify modules with significant SZ-or BD-associated GWAS signal enrichment.

MeDIP-seq analysis
Methylated DNA immunoprecipitation and sequencing (MeDIP-seq) of 6 SZ patients and 6 controls was performed as previously described [10]. Briefly, DNA was collected from BA24 of samples and then fragmented by sonication to a mean size of approximately 250 bp, followed by end-blunting, dA addition to the 3'-end and ligation of adapters. After PCR amplification, DNA fragments were subjected to 50 bp paired-end sequencing using Illumina HiSeq2000. After quality control, raw reads were aligned to human reference genome hg19 using bowtie (version 1.1.1) [11] with two mismatches. Only the nonredundant, uniquely mapped reads were retained for subsequent analysis.
To quantify the methylation levels of lincRNA, MeDIP-seq data of BA24 was processed using MEDIPS package (version 1.12.0) [12]. Specially, the relative methylation score (rms) was calculated with a sliding window of 250 bp. Then the absolute methylation score (ams) which was corrected for CpG density was calculated for each DELincRNA in each sample through dividing mean rms by mean coupling factors of bins that located in the promoter (-2 to 0.5 kb from TSS). Differential methylation was calculated using Wilcoxon rank sum test among 35 DELincRNAs. A lincRNA was defined as significantly differentially methylated if P-value<0.05.