Genome-wide 5-hydroxymethylcytosine modification pattern is a novel epigenetic feature of globozoospermia.

Discovery of 5-hydroxymethylcytosine (5hmC) in mammalian genomes has excited the field of epigenetics, but information on the genome-wide distribution of 5hmC is limited. Globozoospermia is a rare but severe cause of male infertility. To date, the epigenetic mechanisms involved in globozoospermia progression, especially 5hmC profiles, remain largely unknown. Here, using a chemical-labeling and biotin-enrichment approach followed by Illumina HiSeq sequencing, we showed that (i) 6664, 9029 and 6318 genes contain 5hmC in normal, abnormal, and globozoospermia sperm, respectively; (ii) some 5hmC-containing genes are significantly involved in spermatogenesis, sperm motility and morphology, and gamete generation; (iii) 5hmC is localized predominantly in introns in sperm; (iv) approximately 40% of imprinted genes carry 5hmC modification in sperm genomes, but a large portion of imprinted genes lose the 5hmC modification in globozoospermia sperm; (v) six imprinted genes showed different 5hmC patterns in abnormal sperm (GDAP1L1, GNAS, KCNK9, LIN28B, RB1, RTL1), and five imprinted genes showed different 5hmC patterns in globozoospermia sperm (KCNK9, LIN28B, RB1, SLC22A18, ZDBF2). These results suggest that differences in genome-wide 5hmC patterns may in part be responsible for the sperm phenotype. These findings may improve our understanding of the basic molecular mechanisms underlying sperm biology and the etiology of male infertility.


INTRODUCTION
Globozoospermia (also called round-headed sperm syndrome), characterized by 100% round-headed spermatozoa lacking an acrosome, is a rare but severe cause of male infertility [1]. Familial globozoospermia suggests that the condition most probably originates in spermatogenesis, specifically in acrosome formation and sperm head elongation, which is largely determined by genetic and epigenetic factors [1][2][3][4]. Sperm are known to carry distinctive epigenetic modifications that are adjusted by reprogramming during spermatogenesis and fertilization [1,2]. However, to date, little is known about the epigenomics of globozoospermia, especially the role of 5-hydroxymethylcytosine (5hmC) profiles in its pathophysiology. 5hmC, a novel modified cytosine, is produced by oxidation of 5-methylcytosine by the ten-eleven translocation (TET) family of proteins, and the discovery of 5hmC in mammalian genomes has excited the field of epigenetics [5]. In addition, 5hmC, as a unique and dynamic mark of cellular state, has been shown to be involved in diverse cellular processes, including transcriptional regulation, DNA methylation regulation, stem cell pluripotency and tumorigenesis [6]. Notably, recent studies have suggested that highly ordered alterations of 5hmC are potentially responsible for the differentiation of spermatogenic cells [7]. For this reason, the present study was undertaken to profile 5hmC comprehensively in normal, abnormal, and globozoospermia sperm by a chemical-labeling and biotin-enrichment approach followed by Illumina HiSeq sequencing, and to provide novel insight into epigenetically mediated dysfunction in the pathogenesis of globozoospermia.

Isolation and identification of normal, abnormal, and globozoospermia sperm
All normal sperm had an oval head with a long tail (Figure 1A), whereas abnormal sperm showed various head or tail defects, such as amorphous heads and crooked or double tails (Figure 1B). The globozoospermia sperm had round heads (Figure 1C). Together, these three sets of sperm represent typical sperm types with different fertility potential and were used to study the impact of 5hmC on male infertility.

5hmC enrichment and sequencing in normal, abnormal, and globozoospermia sperm
To assess the overall 5hmC content, we first evaluated the presence of 5hmC in the sperm genome by dot blot assay (Figure 2A). We detected a significant quantity of 5hmC in as little as 100 ng of sperm genomic DNA. Notably, the amount of 5hmC varies markedly between tissues, in contrast to the stable patterns of 5-methylcytosine (5mC) [8].
To generate genome-wide maps of 5hmC in the sperm genome, we used a well-established chemical-labeling and biotin-enrichment approach to enrich 5hmC-containing DNA fragments from normal, abnormal, and globozoospermia genomic DNA and subjected them to high-throughput sequencing. We obtained 45 million to 60 million sequencing reads and mapped them to the human genome with a mapping rate of approximately 90% (Figure 2B). We identified 5hmC-enriched peaks using the Model-based Analysis of ChIP-Seq (MACS) software (P < 10^-5, fold enrichment > 10). In total, we identified 20486, 38282 and 19354 peaks in normal, abnormal, and globozoospermia sperm, respectively (Figures 2C and 2D, Supplementary Table S1).
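The two peak thresholds stated above (P < 10^-5 and fold enrichment > 10) amount to a simple filter. The following is a minimal sketch, not the authors' pipeline: it assumes each called peak has already been parsed into a record, and the field names `pvalue` and `fold_enrichment` are hypothetical (actual MACS output columns depend on the version used).

```python
from typing import NamedTuple

# Hypothetical record layout; field names are assumptions for illustration,
# not the column names of any particular MACS release.
class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    pvalue: float
    fold_enrichment: float

def filter_peaks(peaks, max_p=1e-5, min_fold=10.0):
    """Keep peaks with P < max_p and fold enrichment > min_fold."""
    return [p for p in peaks if p.pvalue < max_p and p.fold_enrichment > min_fold]

# Toy input (made-up values, not data from the study).
toy_peaks = [
    Peak("chr1", 1000, 1500, 1e-8, 25.0),  # passes both thresholds
    Peak("chr1", 9000, 9400, 1e-3, 30.0),  # fails the P threshold
    Peak("chr2", 200, 700, 1e-9, 4.0),     # fails the fold-enrichment threshold
]
kept = filter_peaks(toy_peaks)
print(len(kept))
```

Both comparisons are strict, matching the inequalities as written in the text.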

Genomic features of 5hmC in normal, abnormal, and globozoospermia sperm
We mapped these 5hmC peaks onto RefSeq-annotated genes and identified 6664, 9029 and 6318 genes containing 5hmC in normal, abnormal, and globozoospermia sperm, respectively (Figure 3A); 3576 genes were shared by all three 5hmC gene pools (Figure 3B). The total and specific 5hmC-containing gene lists are shown in Supplementary Table S2. Furthermore, analysis of genome-wide 5hmC-containing genes shows that 5hmC is not distributed randomly across chromosomes, but exhibits a distinct pattern on specific chromosomes (Figure 3C). With regard to genomic region, it is striking that most 5hmC peaks are located in introns (Figure 3D), whereas in ES cells 5hmC is preferentially present upstream of gene bodies and in the brain it is enriched in gene bodies [9,10].
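Assigning peaks to genomic regions such as introns comes down to interval overlap between peak coordinates and annotated features. The sketch below illustrates that idea under stated assumptions: the feature table, its half-open coordinate convention, and the toy peaks are all hypothetical, and a linear scan is used for clarity where a genome-scale analysis would normally use an indexed or sorted structure.

```python
from collections import Counter

# Hypothetical feature intervals (chrom, start, end, feature type), half-open
# coordinates; real annotations would come from a genome annotation table.
features = [
    ("chr1", 0, 200, "upstream"),
    ("chr1", 200, 500, "CDS"),
    ("chr1", 500, 2000, "intron"),
    ("chr1", 2000, 2300, "CDS"),
]

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

def annotate_peak(chrom, start, end, feature_table):
    """Return the set of feature types that a peak overlaps."""
    return {
        ftype
        for f_chrom, f_start, f_end, ftype in feature_table
        if f_chrom == chrom and overlaps(start, end, f_start, f_end)
    }

def feature_counts(peaks, feature_table):
    """Count, per feature type, how many peaks overlap at least one such feature."""
    counts = Counter()
    for chrom, start, end in peaks:
        counts.update(annotate_peak(chrom, start, end, feature_table))
    return counts

# Toy peaks (made-up coordinates, not data from the study).
toy_peaks = [("chr1", 600, 900), ("chr1", 1900, 2100), ("chr1", 250, 300)]
counts = feature_counts(toy_peaks, features)
print(dict(counts))
```

A peak that spans a feature boundary is counted once for each feature type it touches, so the per-feature counts can sum to more than the number of peaks.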

GO analysis of 5hmC-containing genes in normal, abnormal, and globozoospermia sperm
The preferential distribution of 5hmC in introns in the sperm genome suggested that it may have distinct roles in sperm maturation and function. As shown in Figure 4A and Supplementary Table S3, 5hmC-containing genes in all three genomes share cell motion and signal transduction pathways, indicating that 5hmC has conserved functions in these processes. Notably, cellular component organization is lost in globozoospermia but present in normal and abnormal sperm, whereas cell adhesion and response to (chemical) stimulus pathways are additionally involved in abnormal and globozoospermia sperm but not in normal sperm.
To evaluate aberrant 5hmC modification in sperm dysregulation, we further performed GO analysis of the genes containing 5hmC specifically in the normal, abnormal, or globozoospermia sperm genome (Figure 4B, Supplementary Table S3). We found that the organic substance metabolic process pathway is the most significantly affected in normal, abnormal, and globozoospermia sperm. In particular, 10 gamete generation genes are implicated in abnormal sperm (Table 1), suggesting that aberrant 5hmC modification of these genes may affect gamete generation, potentially leading to the sterility associated with abnormal sperm.

5hmC-containing genes overlap with imprinted genes among normal, abnormal, and globozoospermia sperm
To evaluate alterations of 5hmC modification in imprinted genes, we compared 5hmC-containing genes with 96 known imprinted genes from the imprinted gene database (http://www.geneimprint.com/site/home), and visualized the overlap with area-proportional Venn diagrams using the online tool BioVenn. In total, approximately 40% of the imprinted genes (38 imprinted genes) are 5hmC-containing genes in normal, abnormal, and globozoospermia sperm genomes (Figure 5A). In detail, 30, 30 and 21 imprinted genes contained 5hmC in normal, abnormal, and globozoospermia sperm genomes, respectively (Table 2). The Venn diagram shows that normal, abnormal, and globozoospermia sperm share 14 imprinted genes (Figure 5A). Compared with normal sperm, 6 imprinted genes lost 5hmC modification, while another 6 imprinted genes gained 5hmC modification in abnormal sperm (Figure 5B). Interestingly, compared with normal sperm, a large portion (14 out of 30) of imprinted genes lost 5hmC modification and 5 imprinted genes gained 5hmC modification in the globozoospermia patient (Figure 5C), suggesting that the loss of 5hmC in imprinted genes may be associated with globozoospermia.

Figure 2 (caption fragment): (B) Mapping results of 5hmC sequencing in normal, abnormal, and globozoospermia sperm genomes. Raw reads were aligned to human UCSC hg19 and peaks were called using MACS (P < 10^-5, fold enrichment > 10). (C) 5hmC peak numbers in normal, abnormal, and globozoospermia sperm. 20486, 38282 and 19354 peaks were identified in normal, abnormal, and globozoospermia sperm, respectively. (D) Average value of 5hmC fold enrichment in normal, abnormal, and globozoospermia sperm. www.impactjournals.com/oncotarget
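The lost and gained imprinted genes in this comparison are plain set differences, and the reported counts can be checked for internal consistency: a comparison set should contain the reference count minus the lost genes plus the gained genes. The sketch below is illustrative only; the function names are hypothetical and the toy gene names are placeholders, not the gene lists from Table 2.

```python
def compare_gene_sets(reference, other):
    """Return (lost, gained, shared) gene sets relative to the reference set."""
    reference, other = set(reference), set(other)
    return reference - other, other - reference, reference & other

def expected_size(reference_size, n_lost, n_gained):
    """Size of the comparison set implied by the lost and gained counts."""
    return reference_size - n_lost + n_gained

# Counts reported in the text: 30 imprinted genes in normal sperm;
# abnormal sperm: 6 lost, 6 gained; globozoospermia sperm: 14 lost, 5 gained.
print(expected_size(30, 6, 6))   # 30
print(expected_size(30, 14, 5))  # 21

# Toy example with placeholder gene names (not the study's gene lists).
lost, gained, shared = compare_gene_sets({"G1", "G2", "G3"}, {"G2", "G3", "G4"})
print(sorted(lost), sorted(gained), sorted(shared))
```

The two implied sizes, 30 and 21, agree with the numbers of 5hmC-containing imprinted genes reported for abnormal and globozoospermia sperm.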

DISCUSSION
Emerging evidence indicates that epigenetic mechanisms, especially aberrant DNA methylation (5mC) of imprinted genes in sperm DNA, play an important part in abnormal sperm parameters and male infertility [11]. Although sperm methylation profiles have recently been described [12], to date no studies have examined the distribution and features of 5hmC in the sperm genome, and few studies have linked 5hmC to male infertility. The current study generated the first landscape of 5hmC in normal, abnormal, and globozoospermia sperm genomes, and provides novel insights into 5hmC-related sperm physiology and pathology.
In this study, we identified 6664, 9029 and 6318 genes containing 5hmC in normal, abnormal, and globozoospermia sperm, respectively. Notably, 5hmC is localized predominantly in introns, in contrast to previous reports that 5hmC is preferentially enriched within exons and near transcriptional start sites in embryonic stem cells [10], and that alterations of 5hmC modification occur mainly in gene bodies in response to environmental changes [13] and during neuronal development [14]. In addition, the discrete distribution of 5hmC in the intronic regions of the sperm genome suggests that it may have specific roles in sperm maturation and function.
We also found that 5hmC-containing genes are involved in various functional pathways, some of which have important roles in sperm, such as spermatogenesis, sperm motility and morphology, and sperm cell maturation. For instance, (i) growth differentiation factor 9 (GDF9) is significantly associated with sperm quality traits, and is involved in the initiation or maintenance of spermatogenesis [15]; (ii) methylation patterns of the small nuclear ribonucleoprotein polypeptide N (SNRPN) promoters are associated with changes in sperm motility and morphology, which could lead to male infertility [16]; (iii) membrane-associated guanylate kinase, WW and PDZ domain containing 2 (MAGI2), which is known to localize at the tight junctions of epithelial cells, plays an important part in sperm cell maturation [17]; (iv) FIGLA (factor in the germ line, alpha) encodes a germ cell-specific basic helix-loop-helix transcription factor, which has essential roles in the repression of sperm-associated genes during normal postnatal oogenesis [18].

GO terms with gene counts (figure residue): Cell cycle (57); Single organism metabolic process (33); Transport (14); RNA processing (51); Secretion (20); Organic substance metabolic process (26); Regulation of multicellular organismal (92); Immune response (133); Organic substance metabolic process (138); Cell adhesion (113); Regulation of carbohydrate metabolic process (7); Cellular macromolecule metabolic process (52); Gamete generation (10); Cellular response to organic substance (14); Neurological system process (38); Multicellular organismal development (11); Single multicellular organism process (147); Regulation of tube size (7).

It is well known that aberrant DNA methylation patterns, mainly in imprinted genes, are associated with sperm dysfunction. Therefore, we further compared the 5hmC-containing genes among normal, abnormal, and globozoospermia sperm genomes with the imprinted gene database. We found that approximately 40% (38 out of 96) of imprinted genes had 5hmC modification in normal, abnormal, and globozoospermia sperm genomes. In globozoospermia sperm, a large portion (14 out of 30) of imprinted genes lost 5hmC modification compared with normal sperm, suggesting that the loss of 5hmC in imprinted genes may have essential roles in globozoospermia progression. As shown in Table 2, six imprinted genes showed different 5hmC patterns between normal and abnormal sperm (GDAP1L1, GNAS, KCNK9, LIN28B, RB1, RTL1), and five imprinted genes showed different 5hmC patterns between normal and globozoospermia sperm (KCNK9, LIN28B, RB1, SLC22A18, ZDBF2). These data may help to identify novel epigenetically regulated genes that are possibly involved in abnormal sperm and globozoospermia, and these aberrant 5hmC-related genes may also be potential biomarkers of abnormal sperm parameters.
Taken together, our study provides a genome-wide map of the 5hmC distribution in normal, abnormal, and globozoospermia sperm; understanding these epigenetic mechanisms may yield new insight into sperm biology and the etiology of male infertility.

Ethical statements
The investigation was conducted in accordance with the ethical standards and according to the Helsinki Declaration of 1975, and was approved by the Institutional Review Board at China Medical University.

Preparation of genomic DNA, 5hmC specific chemical labeling and affinity purification
Genomic DNA was prepared using a Wizard Genomic DNA Purification kit (Promega, Cat. A1120) following the manufacturer's instructions. Equal amounts of genomic DNA were extracted from 1 × 10^5 normal, abnormal, and globozoospermia sperm, respectively.

Sequencing of 5hmC-enriched genomic DNA
5hmC-enriched genomic DNA libraries were generated following the Illumina protocol 'Preparing Samples for ChIP Sequencing of DNA'. Then, 20 ng of 5hmC-enriched DNA was used to initiate the protocol. DNA fragments were gel purified after the adapter ligation step. PCR-amplified DNA libraries were quantified using an Agilent 2100 Bioanalyzer and quantitative PCR. We performed 100 bp single-end sequencing on an Illumina HiSeq 2000 to obtain the sequences of the 5hmC-enriched DNA fragments.

5hmC read mapping and peak calling
The deep-sequencing reads were stripped of adaptor sequences with the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Reads shorter than 25 nt or containing an ambiguous nucleotide were discarded. The remaining reads were aligned to the human UCSC hg19 genome with the BWA software [19], allowing up to two mismatches. All non-redundant, uniquely mapped reads were used for peak calling with MACS (P < 10^-5) [20]. 5hmC peaks were associated with genomic features by overlapping peak locations with known genomic features obtained from the hg19 database. Location information for CDS, intron, 3′UTR, 5′UTR, upstream (200 bp), and downstream (200 bp) regions was downloaded from UCSC.
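The read-level filter described above (discard reads shorter than 25 nt or containing an ambiguous nucleotide) can be sketched in a few lines. This is an illustration only, not the FASTX-Toolkit implementation: it assumes reads have already been parsed into (header, sequence, quality) tuples, treats any base other than A, C, G, or T as ambiguous, and uses hypothetical function names.

```python
def passes_filter(seq, min_len=25):
    """True if the read is at least min_len nt long and contains only A/C/G/T."""
    return len(seq) >= min_len and set(seq.upper()) <= set("ACGT")

def filter_reads(records, min_len=25):
    """Yield (header, sequence, quality) tuples that pass the filter."""
    for header, seq, qual in records:
        if passes_filter(seq, min_len):
            yield header, seq, qual

# Toy reads (made-up sequences, not data from the study).
toy_reads = [
    ("@r1", "ACGT" * 7, "I" * 28),   # 28 nt, unambiguous: kept
    ("@r2", "ACGT" * 5, "I" * 20),   # 20 nt: too short
    ("@r3", "ACGTN" * 6, "I" * 30),  # 30 nt but contains N: discarded
]
kept_headers = [header for header, _, _ in filter_reads(toy_reads)]
print(kept_headers)
```

Reads of exactly 25 nt are retained, since the text discards only reads shorter than 25 nt.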

Immuno-dot-blot assay
Genomic DNA was denatured in TE buffer for 10 min at 95°C and immediately chilled on ice for 5 min. Dot blotting was performed on a Bio-Dot Apparatus (#170-6545, Bio-Rad); 50, 100, 200 and 400 ng of each DNA sample were spotted on a positively charged nylon membrane. The membrane was baked for 2 h at 80°C until completely dry, then UV cross-linked (254 nm) for 10 min to fix the DNA on the membrane. The membrane was then blocked with 5% nonfat milk for 1.5 h at room temperature. The primary rabbit anti-5-hydroxymethylcytosine antibody (1:10000, #39769, Active Motif) was applied to the membrane and incubated at room temperature for 1 h or overnight at 4°C. After incubation with a peroxidase-conjugated anti-rabbit IgG secondary antibody, the signal was visualized using ECL (Millipore). Dot-blot densities were analyzed with ImageJ software. 5hmC-containing DNA was used as a positive control, and DNA containing unmodified C, 5mC, or 5-carC was used as negative controls to verify the specificity of the 5hmC antibody.

Accession number
All original data sets have been deposited in the Gene Expression Omnibus Database under the accession number GSE46135.

Gene ontology analysis
Gene ontology analyses were performed on sets of unique RefSeq identifiers using the DAVID Bioinformatics Resources 6.7 functional annotation tools [21]. GO biological process terms and the InterPro database were used. Categories with P < 0.05 were considered statistically significant. The results were visualized as an enrichment map using Cytoscape software [22].
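To make the P < 0.05 criterion concrete, the sketch below computes a one-sided over-representation P value for a single category from a hypergeometric tail. This is a stand-in for illustration, not DAVID's calculation: DAVID reports a modified Fisher's exact P value (the EASE score), which is more conservative. The function name and all numbers are hypothetical.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Toy numbers (not from the study): a background of 20 genes, 5 of which
# belong to the category; a gene list of 5 genes contains 3 category members.
p_value = hypergeom_sf(k=3, population=20, successes=5, draws=5)
print(round(p_value, 4))
print(p_value < 0.05)
```

In practice each category is tested against the same background gene set, and the resulting P values are compared with the chosen significance threshold.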