Evidence for the association of chromatin and microRNA regulation in the human genome

Both microRNAs (miRNAs) and chromatin regulation play important roles in cellular processes, and they function at different regulatory levels of transcription. Although considerable effort has been devoted to the investigation of miRNA and chromatin regulation, there is still no comprehensive work illustrating their relationship, owing to the lack of genome-wide datasets in different human cellular contexts. Based on recently published large-scale epigenetic data, we examined the association between miRNAs and the epigenetic machinery. Our work confirmed a general relationship between miRNA biogenesis and chromatin features around pre-miRNA genomic regions. Clear enrichments of DNA methylation and several histone modifications were observed within the pre-miRNA genomic regions, and these were correlated with miRNA expression levels. Furthermore, chromatin features at gene promoter regions were tightly associated with miRNA regulation. Interestingly, we found that genes whose promoter regions are located in active chromatin state regions are more likely to be targeted by miRNAs. This work suggests that miRNAs and chromatin features are often highly coordinated, which provides a guide for understanding the complexity of gene regulation more deeply.


Oncotarget, 2017, Vol. 8, (No. 41), pp: 70958-70966. Research Paper. www.impactjournals.com/oncotarget/

INTRODUCTION
Mounting evidence has shown that miRNAs, an abundant class of small non-coding RNAs, are key posttranscriptional regulators of gene expression in a wide variety of organisms ranging from plants to worms and mammals [1][2][3][4]. Recent studies have revealed various molecular mechanisms by which miRNAs downregulate their target mRNAs [5,6]. Around 30%-80% of human genes are predicted to be regulated by miRNAs [7][8][9][10][11]. Each miRNA can target multiple genes and, in turn, more than one miRNA can bind a single mRNA target. The mechanism of miRNA regulation is therefore quite complicated and needs to be scrutinized using network-based systems biology approaches [12].
Recent studies have revealed that chromatin is one of the most complex molecular ensembles in the cell [13]. Eukaryotic DNA is tightly wrapped around histone octamers to form nucleosomes. Chromatin consists of arrays of nucleosomes with many dynamic features, such as DNA methylation, post-translational histone modifications, and the binding of chromatin-remodeling complexes and modification-binding proteins [14][15][16]. It has been documented that chromatin features are involved in both activation and repression of transcription [17]. For example, H3K4me1 and H3K4me3 are tightly associated with transcriptional activation, while H3K27me3 and H3K9me3 are correlated with transcriptional repression [18][19][20][21]. The influence of chromatin on gene regulation is supported by the finding that histone post-translational modifications lead to the recruitment of protein complexes that regulate transcription [17,20,22].
Given the importance of miRNA and chromatin regulation in the control of gene expression, it is not surprising that miRNA and chromatin regulation are coordinated. Among the several regulatory mechanisms, one is based on epigenetic modifications. Recently, it has been shown that miRNA genes are subject to hyper-methylation and hypo-methylation in a tumor- and tissue-specific manner [23]. On the other hand, epigenetic features of specific genes are also correlated with miRNA regulation. For example, miR-148 has been shown to target the DNMT3B gene [24], reflecting a regulatory feedback loop between epigenetic regulation and miRNAs. In this way, the miRNA-epigenetic machinery forms an intricate network regulating gene expression.
The wealth of datasets from the ENCODE project [25] opens the door for us to comprehensively explore the relationship between miRNAs and chromatin features in different human cellular contexts. In this study, we employed chromatin accessibility, DNA methylation and different types of histone modification data generated by the ENCODE project in six human cell lines to analyze their relationship to miRNAs. Our work revealed several new insights: (1) certain chromatin features around pre-miRNA regions were tightly associated with miRNA expression in different cell lines; (2) the promoters of miRNA target genes were preferentially located in 'open' chromatin domains; (3) the promoters of miRNA target genes tended to have lower DNA methylation levels than those of non-targets; (4) active histone modification marks at gene promoter regions showed different patterns between miRNA targets and non-targets. These results provide a more comprehensive view of the relationship between miRNAs and chromatin features. A better understanding of the relationship between miRNA and chromatin regulation will help us to understand the complexity of transcriptional regulation.

Chromatin features can significantly influence miRNA transcription
Chromatin features have been thought to regulate the transcription of miRNA genes in a manner similar to that of protein-coding genes [26]. To comprehensively study the relationship between chromatin regulation and miRNA expression in multiple human cellular contexts, we characterized 12 genome-wide chromatin tracks, including 10 tracks of histone modification marks, one track of DNA methylation and one track of chromatin accessibility, in six human cell lines (see Materials and Methods). These data were derived from extensive experiments performed by different ENCODE production groups, enabling integration across various types of chromatin features in different cellular contexts.
First, we measured the enrichment of chromatin features around human pre-miRNA genomic sequences. Average levels of histone modifications were calculated across the 4,000 bp window surrounding the center of each pre-miRNA sequence. We found clearly higher signals of different chromatin features around the pre-miRNA sequences of expressed miRNAs than around those of silenced miRNAs and random sequences from intergenic regions (Figure 1), suggesting that chromatin features are highly enriched within miRNA precursor sequences. Next, we plotted the distributions of chromatin features around pre-miRNA sequences. Clear enrichments of several chromatin features were observed in expressed miRNAs (Figure 2), particularly in highly expressed miRNAs in comparison with lowly expressed and silenced miRNAs in human embryonic stem cells. The same conclusions were reached in all six human cell lines. Notably, activation-associated histone marks were significantly enriched at highly expressed miRNAs. Furthermore, consistent with previous work [23], DNA methylation around pre-miRNA regions was correlated with miRNA expression levels (Figure 2). These results further support the tight relationship between chromatin regulation and miRNA biogenesis.
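The window averaging described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes the signal track is available as half-open (start, end, value) intervals on a single chromosome, that the 4,000 bp window extends 2,000 bp to either side of the pre-miRNA center, and that bases without coverage contribute zero signal. The function name window_mean and the toy track are hypothetical.

```python
def window_mean(intervals, center, width=4000):
    """Mean per-base signal in a window of `width` bp centered on `center`.

    `intervals` is an iterable of half-open (start, end, value) tuples on a
    single chromosome. Bases not covered by any interval count as zero
    (an assumption of this sketch, not a detail taken from the paper).
    """
    half = width // 2
    win_start, win_end = center - half, center + half
    total = 0.0
    for start, end, value in intervals:
        # Length of the overlap between this interval and the window.
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            total += overlap * value
    return total / (win_end - win_start)


# Toy track: signal 2.0 on [0, 1000) and 4.0 on [1000, 3000).
track = [(0, 1000, 2.0), (1000, 3000, 4.0)]
print(window_mean(track, center=2000))
```

Applying the same function to expressed, silenced and random intergenic loci would give the per-group averages that are compared in the text.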
To further investigate the correlation between chromatin features and miRNA expression, we combined the chromatin features around pre-miRNA genomic sequences and used them as input features. Based on the expressed and silenced miRNA groups, we applied a support vector machine (SVM) classifier to predict miRNA expression. The performance of the model was assessed by cross-validation (see Materials and Methods). The model achieved comparable performance in predicting miRNA expression, measured by the area under the receiver operating characteristic curve (AUC), across the different cell lines (Figure 3A; AUC = 0.64 for H1-hESC, 0.65 for GM12878, 0.69 for HepG2, 0.65 for K562, 0.74 for HeLa-S3 and 0.71 for A549). When only the highly expressed and silenced miRNA groups were considered, classification accuracy was higher, with AUC values ranging from 0.83 to 0.91 (Figure 3B; AUC = 0.87 for H1-hESC, 0.83 for GM12878, 0.91 for HepG2, 0.85 for K562, 0.89 for HeLa-S3 and 0.86 for A549). Although these results do not establish a causal effect of chromatin features on miRNA expression, they indicate that chromatin features are involved in miRNA transcription.

Correlation between miRNA regulation and chromatin features of gene promoter regions
Next, we examined the association between chromatin and miRNA regulation. Because miRNA expression is highly variable across different human cell lines, only the miRNAs expressed in each cell line were considered. Previous studies reported that gene expression is related to histone modifications in promoter regions [17,27]. Therefore, the relationship between miRNA regulation and chromatin features (chromatin accessibility, DNA methylation and histone modification) of gene promoter regions (defined as a 4,000 bp window relative to the transcription start site) was investigated.

miRNAs preferentially target genes with open chromatin domain in their promoters
To investigate the relationship between miRNA regulation and chromatin accessibility, recently published DNase I hypersensitive site (DHS) data generated by DNase-Seq were compiled for the six human cell lines. We examined the average DNase signals within the promoter regions of miRNA targets and non-targets, respectively. The results showed that DHS peaks were located in the promoter regions of miRNA targets more often than in those of non-targets (Table 1). Since DNase I sensitivity provides a quantitative marker of open chromatin regions, we grouped all genes based on the DNase I hypersensitive signals in their promoter regions and calculated the miRNA target rate in each group. As shown in Figure 4, DNase I hypersensitive signals were significantly correlated with the miRNA target rate in all six human cell lines (R = 0.68 for H1-hESC, 0.69 for GM12878, 0.73 for HepG2, 0.63 for K562, 0.74 for HeLa-S3 and 0.7 for A549). These results indicate that genes whose promoter regions lie in an open chromatin state are more prone to be targeted by miRNAs.
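The grouping analysis above can be illustrated with a short sketch. The text does not state how many groups were used or how they were formed, so the following assumes equal-sized groups of genes ranked by promoter DNase signal, defines the within-group target rate as the fraction of genes in the group that are miRNA targets, and uses the Pearson coefficient. The function names and the toy data are hypothetical.

```python
from math import sqrt


def target_rate_by_group(signals, is_target, n_groups):
    """Rank genes by promoter signal, split them into `n_groups` groups of
    near-equal size, and return (mean signal, target rate) for each group.
    Assumes n_groups does not exceed the number of genes."""
    order = sorted(range(len(signals)), key=lambda i: signals[i])
    groups = []
    for g in range(n_groups):
        lo = g * len(order) // n_groups
        hi = (g + 1) * len(order) // n_groups
        members = order[lo:hi]
        mean_signal = sum(signals[i] for i in members) / len(members)
        rate = sum(1 for i in members if is_target[i]) / len(members)
        groups.append((mean_signal, rate))
    return groups


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data: 12 genes, with targets concentrated at high DNase signal.
signals = [float(i) for i in range(12)]
is_target = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
groups = target_rate_by_group(signals, is_target, n_groups=3)
r = pearson([m for m, _ in groups], [rate for _, rate in groups])
print(groups, r)
```

In the toy data the three groups have target rates of 0, 0.5 and 1, so the correlation with mean signal is perfect; real data would give values such as those reported for Figure 4.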

miRNAs preferentially target genes with low DNA methylation level in their promoters
Next, we explored the relationship between promoter methylation and miRNA regulation by taking advantage of the recently published human methylome data in six human cell lines. We found that the promoter DNA methylation levels of miRNA targets were significantly lower than those of miRNA non-targets in all six cell lines, suggesting that the two mechanisms are functionally complementary (Table 1). The ratio of the observed to the expected CpG content (CpG o/e) has been used as a proxy for DNA methylation status in the human genome. Accordingly, genes were classified into a hyper-methylated group and a hypo-methylated group according to their CpG o/e, such that the hyper-methylated group had lower-than-expected CpG o/e and the hypo-methylated group had high CpG o/e. We found that DNA methylation levels in hyper-methylated promoters did not differ significantly between miRNA targets and non-targets.

To better understand the relationship between histone modifications and miRNA regulation, we compared the enrichment profiles of 10 histone modification marks, generated by ChIP-seq, around the gene promoter regions of miRNA targets and non-targets. Except for H3K9me3, H3K27me3 and H3K36me3, we found that histone marks were significantly more enriched in the promoter regions of miRNA targets than in those of miRNA non-targets across the different cellular contexts (Table 1). H3K36me3 is associated with transcribed regions, while H3K9me3 and H3K27me3 are both considered marks of transcriptional repression. Since histone marks exhibit combinatorial patterns in the human genome [28], we used a hierarchical clustering method to analyze the activating and repressive histone modification patterns of gene transcription. In each cell line, genes were segregated into an active cluster (cluster A) and a repressed cluster (cluster R). Genes in cluster A carried activating histone marks (such as H3K4me3 and H3K27ac) and had higher expression levels, whereas genes in cluster R carried more repressive marks (such as H3K9me3 and H3K27me3) and tended to be lowly expressed. Our results indicated that genes in cluster A are preferentially targeted by miRNAs (Figure 5), which suggests an association between miRNA regulation and the histone modifications that correspond to gene activation.
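For the CpG o/e proxy used in the promoter methylation analysis above, the text does not give the formula. One commonly used definition, assumed here, is the number of CpG dinucleotides multiplied by the sequence length and divided by the product of the C and G counts. The sketch below computes that ratio for a promoter sequence; the function names are hypothetical, and the cutoff separating the two groups is left as a required argument because the value used in the study is not stated.

```python
def cpg_oe(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Returns None when the sequence has no C or no G, because the expected
    CpG count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return None
    # str.count counts non-overlapping matches; "CG" cannot overlap itself.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)


def methylation_group(oe, cutoff):
    """Assign a promoter to a group from its CpG o/e value.

    `cutoff` is a user-supplied, hypothetical threshold. Low CpG o/e is
    taken to indicate hyper-methylation, as described in the text.
    """
    return "hypo-methylated" if oe >= cutoff else "hyper-methylated"


# Toy sequence with 2 CpGs, 2 Cs, 2 Gs and length 6.
print(cpg_oe("ACGCGT"))
```

The direction of the classification follows the text: methylated CpGs are depleted over evolutionary time, so a lower-than-expected CpG o/e marks the hyper-methylated group.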

DISCUSSION
Transcription is a complicated dynamic process involving a combination of various regulators. In this study, we undertook a comprehensive analysis of the relationship between miRNAs and chromatin features across multiple human cell lines, and we found that chromatin features were associated with both miRNA biogenesis and posttranscriptional regulation. Chromatin states and miRNAs are principal classes of gene regulators. The epigenetic landscape can determine the chromatin structural states that ultimately control the transcriptional outcome of the cell to accommodate developmental or environmental requirements. Our work indicated an interconnection between miRNAs and the chromatin machinery, which could provide a useful starting point for exploring the molecular basis of morphological complexity.
Although the pathological and physiological importance of miRNAs has been appreciated, little is known regarding their regulation. To date, mounting evidence has indicated that a substantial number of miRNA genes are subject to epigenetic alterations [29,30]. An extensive analysis of miRNAs has shown that most of them are associated with CpG islands, suggesting that they are subject to regulation by DNA methylation [31]. Furthermore, several lines of evidence have shown that aberrant methylation status can be responsible for the deregulated expression of miRNAs in cancers [32]. Our results indicated that nearly all chromatin features are highly enriched in pre-miRNA regions, and that some specific histone modifications and DNA methylation are associated with miRNA expression. As with protein-coding genes, we found that chromatin features are predictive of miRNA expression, suggesting some similarities between mRNA maturation and miRNA biogenesis. However, our miRNA expression prediction model is less accurate than protein-coding gene expression prediction. This can be partially attributed to the fact that miRNA biogenesis is also under the control of other regulatory mechanisms, such as transcription factors [33].
The influences of miRNAs on target gene expression can be roughly classified into two types: 'tuning' and 'buffering' [34]. In expression tuning, miRNAs adjust the expression level of their targets, whereas in expression buffering they reduce expression variance. The regulation of miRNAs is a complicated process, and the association between miRNAs and chromatin regulation leads to an even more complicated scenario during transcription. Recent studies have documented that gene regulatory networks are composed of small sets of recurring interaction patterns called 'motifs' [35,36]. In the cases studied so far, these network motifs tend to preserve their phenotypes when wired into the regulatory networks of the cell. In this work, we provided evidence of associations between miRNAs and chromatin regulation. Both chromatin features and miRNAs can exert a widespread impact on gene expression, and both miRNA expression and miRNA-mediated post-transcriptional regulation are influenced by chromatin regulation. Together, they form an integrated transcriptional regulatory circuit. Posttranscriptional control of expression variation is carried out by miRNAs, such that the miRNA and its target genes are wired into an incoherent feed-forward loop. Within such an incoherent feed-forward loop architecture, miRNAs can buffer the expression variation of their target genes against fluctuations in chromatin features. The incoherent feed-forward loop is one of the most common network motifs in transcription networks and is prevalent in both bacteria [37] and fungi [38]. Given this setup, we might expect that a mechanism mediated by chromatin features and miRNAs can maintain homeostasis and increase network robustness.
Taken together, our work provides a comprehensive investigation of the miRNA-epigenetic relationship. The results suggest that these two modes of gene regulation are not entirely separable, and that a complicated mechanism might tie them together. We speculate that the emerging picture of transcriptional regulation is much more complicated than previously thought. This study provides a first comprehensive attempt to understand the complexity of gene regulatory control.

Epigenetic data sources
We compiled 12 epigenetic features in six human cell lines, consisting of embryonic stem cells (H1-hESC), B-lymphoblastoid cells (GM12878), hepatocellular carcinoma cells (HepG2), erythrocytic leukaemia cells (K562), epithelial carcinoma cells (HeLa-S3) and alveolar basal epithelial cells (A549). All data used in this work were downloaded from the University of California, Santa Cruz (UCSC) hg19 genome browser (http://genome.ucsc.edu/encode/). Histone modifications within the histone tails were compiled from ChIP-seq data generated by the ENCODE project. These datasets were generated using the same platform and contain 10 histone modifications in each cell line (H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K36me3, H3K9me3, H3K79me2, H3K20ac, H3K27me1 and H3K9ac). DNase I hypersensitivity is an alternative measurement of chromatin accessibility, and DNase-Seq provides a powerful technique for identifying genome-wide DNase I hypersensitive sites [39]. To determine whether genes targeted by miRNAs are preferentially located in open chromatin domains, we compiled DNase I hypersensitive site data generated by DNase-Seq from the different cell lines. DNA methylation data were also downloaded from the UCSC genome browser, and these data were all generated using the 450K methylation BeadChip.

Annotation of pre-miRNAs, miRNAs and their target genes
We downloaded the annotation of 1,594 human small hairpin precursors (pre-miRNAs) from miRBase (http://www.mirbase.org/) [40], and explored the distribution of the chromatin features across 4,000 bp windows surrounding the centers of the pre-miRNA genomic sequences. Mature miRNAs (2,233) were also retrieved from miRBase, and their expression levels were quantified from next-generation sequencing data generated by the ENCODE project.
In this study, we used three current in silico miRNA target prediction methods to determine miRNA target genes: TargetScan [41], PITA [42] and PicTar [43]. To minimize false positives in miRNA target prediction, a high-quality miRNA target data set was generated by retaining the targets predicted by at least two different in silico prediction methods. Genes not predicted as targets by any method were defined as miRNA non-targets. In this work, the miRNA target rate was defined as the ratio of the number of genes that were miRNA targets to the total number of human genes.
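The consensus rule described above can be sketched with plain set operations. This is an illustration under simplifying assumptions: predictions are collapsed to the gene level (a gene counts as predicted by a method if that method reports it as a target of any miRNA), and the gene identifiers and function names are hypothetical.

```python
from collections import Counter


def consensus_targets(predictions, min_support=2):
    """Genes predicted as targets by at least `min_support` methods.

    `predictions` maps a method name to the set of genes it predicts.
    """
    support = Counter(
        gene for genes in predictions.values() for gene in set(genes)
    )
    return {gene for gene, n in support.items() if n >= min_support}


def non_targets(all_genes, predictions):
    """Genes that no method predicts as a target."""
    predicted = set().union(*predictions.values())
    return set(all_genes) - predicted


def target_rate(targets, all_genes):
    """Fraction of all genes that are consensus targets."""
    return len(targets) / len(all_genes)


# Toy predictions for seven hypothetical genes A-G.
predictions = {
    "TargetScan": {"A", "B", "C"},
    "PITA": {"B", "C", "D"},
    "PicTar": {"C", "E"},
}
all_genes = set("ABCDEFG")
targets = consensus_targets(predictions)
print(sorted(targets), sorted(non_targets(all_genes, predictions)))
print(target_rate(targets, all_genes))
```

Genes supported by exactly one method (A, D and E in the toy data) fall into neither set, which mirrors the text: they are too weakly supported to count as targets but cannot be called non-targets.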

Support vector machine model for miRNA expression prediction
Based on the miRNA expression data, miRNAs were classified into expressed (sequencing reads detected) and silenced groups. Expressed miRNAs were further classified into highly and lowly expressed groups using the K-means clustering method. Chromatin features across the 4,000 bp windows surrounding the centers of pre-miRNA sequences were integrated. A support vector machine (SVM), implemented in the LIBSVM package [44], was then used for classification. We evaluated the performance of the models using two-fold cross-validation. Briefly, we randomly divided the data into two subsets of equal size, one training set and one testing set. The model was trained on the training set and applied to the testing set to predict expression. The predictive power of the SVM model was estimated on the testing set. The model generates a probability indicating how likely a miRNA is to be expressed. By setting different threshold values, we can depict the sensitivity (true positive rate) and the specificity (true negative rate) of the prediction. The receiver operating characteristic (ROC) curve was used to show the classification accuracy of our SVM model.
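The evaluation steps above (a random split into two equal halves, then a threshold sweep over predicted probabilities) can be sketched without reference to any particular classifier. The code below does not call LIBSVM; it assumes the probabilities for the testing set have already been produced by some model, and the function names, seed and toy scores are hypothetical.

```python
import random


def two_fold_split(n_items, seed=0):
    """Randomly split indices 0..n_items-1 into (training, testing) halves."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    half = n_items // 2
    return indices[:half], indices[half:]


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs obtained by lowering
    the decision threshold through every distinct score."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are required to draw a ROC curve")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


train_idx, test_idx = two_fold_split(10)
# Toy testing set: 1 = expressed, 0 = silenced, with predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(roc_points(labels, scores)))
```

Here the true positive rate is the sensitivity and the false positive rate is one minus the specificity, so each point corresponds to one threshold setting as described in the text.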