Ovarian cancer variant rs2072590 is associated with HOXD1 and HOXD3 gene expression

Ovarian cancer (OC) is a common cancer in women and the leading cause of death from gynaecological malignancies worldwide. In addition to candidate gene approaches for identifying OC susceptibility genes, genome-wide association studies (GWAS) have reported new variants that are associated with OC risk. The minor allele of rs2072590 at 2q31 was associated with an increased OC risk, and the association was primarily significant for the serous subtype. The OC risk-associated SNP rs2072590 lies in non-coding DNA downstream of HOXD3 and upstream of HOXD1, and it tags SNPs in the HOXD3 3′ UTR. We hypothesize that the non-coding rs2072590 variant may contribute to OC susceptibility by regulating the expression of HOXD1 and HOXD3. To investigate this association, we performed a bioinformatics analysis consisting of a functional annotation of the rs2072590 variant using RegulomeDB (version 1.1), HaploReg (version 4.1), and PhenoScanner (version 1.1). Using HaploReg, we identified 19 genetic variants tagged by rs2072590 with r² ≥ 0.8. Using RegulomeDB, we found that three of these variants are likely to affect TF binding + any motif + DNase Footprint + DNase peak, and that the other variants are likely to affect TF binding + DNase peak. Using PhenoScanner (version 1.1), we found that these 19 genetic variants could significantly regulate the expression of nearby genes, especially HOXD1 and HOXD3, in human ovary tissue.


INTRODUCTION
Ovarian cancer (OC) is a common cancer in women and the leading cause of death from gynaecological malignancies worldwide [1]. Like other complex human diseases, OC is caused by a combination of environmental factors and genetic variants, the latter including familial BRCA1 and BRCA2 mutations and common genetic variants of lower penetrance [1]. In addition to candidate gene approaches for identifying OC susceptibility genes, genome-wide association studies (GWAS) have also reported new variants that are associated with OC risk [1].
However, the exact genetic mechanisms underlying these OC susceptibility variants are still unclear [2]. It has been reported that associations between gene expression and OC risk alleles may connect risk variants to their putative target genes/transcripts and biological pathways [2]. The minor allele of rs2072590 at 2q31 was associated with an increased OC risk (OR = 1.16, 95% CI 1.12-1.21, p = 4.5 × 10⁻¹⁴), and the association was primarily significant for the serous subtype (OR = 1.20, 95% CI 1.14-1.25, p = 3.8 × 10⁻¹⁴) [3]. The 2q31 locus contains a family of homeobox (HOX) genes involved in regulating embryogenesis and organogenesis [3]. Altered expression of HOX genes has been reported in many cancers [3]. The OC risk-associated SNP rs2072590 lies in non-coding DNA downstream of HOXD3 and upstream of HOXD1, and it tags SNPs in the HOXD3 3′ UTR [3].

Research Paper www.impactjournals.com/oncotarget
We hypothesize that the non-coding rs2072590 variant may contribute to OC susceptibility by regulating the expression of HOXD1 and HOXD3. To investigate this association, we conducted a functional annotation of the rs2072590 variant using RegulomeDB (version 1.1) [4], HaploReg (version 4.1) [5], and PhenoScanner (version 1.1) [6].

LD analysis using HaploReg
Using the LD information from the 1000 Genomes Project (EUR), we identified 19 genetic variants tagged by rs2072590 with r² ≥ 0.8. These 19 genetic variants are located around HOXD4, HOXD3, AC009336.24 and HOXD-AS1. Detailed information about these variants, including the LD information, is given in Table 1.

Functional annotation using RegulomeDB
RegulomeDB was used to annotate these 19 genetic variants with known and predicted regulatory elements. The results showed that three genetic variants, rs1562315, rs2551802 and rs6433571, are likely to affect TF binding + any motif + DNase Footprint + DNase peak. The other genetic variants are likely to affect TF binding + DNase peak. More detailed results are described in Table 2.
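The grouping of variants by evidence category can be illustrated with a minimal Python sketch. It is not RegulomeDB's own code or API: the flag names, the helper functions, and the two placeholder entries (`tagged_variant_A`, `tagged_variant_B`, standing in for the remaining variants listed in Table 2) are assumptions made for illustration only. The three rs identifiers and the two category descriptions are the ones given in the text.

```python
from collections import defaultdict

# Evidence flags per variant. The three rs IDs are named in the text;
# "tagged_variant_A/B" are hypothetical placeholders for the other variants.
EVIDENCE = {
    "rs1562315": {"tf_binding", "any_motif", "dnase_footprint", "dnase_peak"},
    "rs2551802": {"tf_binding", "any_motif", "dnase_footprint", "dnase_peak"},
    "rs6433571": {"tf_binding", "any_motif", "dnase_footprint", "dnase_peak"},
    "tagged_variant_A": {"tf_binding", "dnase_peak"},
    "tagged_variant_B": {"tf_binding", "dnase_peak"},
}

# Fixed display order so that labels match the wording used in the text.
FLAG_LABELS = [
    ("tf_binding", "TF binding"),
    ("any_motif", "any motif"),
    ("dnase_footprint", "DNase Footprint"),
    ("dnase_peak", "DNase peak"),
]


def evidence_label(flags):
    """Build a category description such as 'TF binding + DNase peak'."""
    return " + ".join(label for key, label in FLAG_LABELS if key in flags)


def group_by_evidence(evidence):
    """Map each category description to the list of variants that carry it."""
    groups = defaultdict(list)
    for variant, flags in evidence.items():
        groups[evidence_label(flags)].append(variant)
    return dict(groups)


groups = group_by_evidence(EVIDENCE)
for label, variants in groups.items():
    print(f"{label}: {', '.join(variants)}")
```

Keying the groups on the full category description keeps the output directly comparable with the wording of Table 2.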

DISCUSSION
Overall, GWAS have reported new variants that are associated with OC risk [1]. However, the exact genetic mechanisms underlying these OC susceptibility variants are still unclear [2]. Evidence shows that associations between gene expression and OC risk alleles may connect risk variants to their putative target genes/transcripts and biological pathways [2]. Zhao et al. selected seven OC risk variants: rs3814113 on 9p22, rs2072590 on 2q31, rs2665390 on 3q25, rs10088218, rs1516982 and rs10098821 on 8q24, and rs2363956 on 19p13 [2]. They evaluated the associations between gene expression and OC risk alleles using whole genome mRNA expression data in 121 lymphoblastoid cell lines from 74 unrelated familial ovarian cancer patients and 47 unrelated non-cancer family controls [2]. They identified two cis-associations: between rs10098821 and c-Myc, and between rs2072590 and HS.565379.
The OC risk-associated SNP rs2072590 lies in non-coding DNA downstream of HOXD3 and upstream of HOXD1, and it tags SNPs in the HOXD3 3′ UTR [3]. However, Zhao et al. did not report any significant association between rs2072590 and HOXD1 or HOXD3. We hypothesize that the non-coding rs2072590 variant may contribute to OC susceptibility by regulating the expression of HOXD1 and HOXD3. Here, we conducted a functional annotation of the rs2072590 variant using RegulomeDB (version 1.1) [4], HaploReg (version 4.1) [5], and PhenoScanner (version 1.1) [6].
Using HaploReg, we identified 19 genetic variants tagged by rs2072590 with r² ≥ 0.8. Using RegulomeDB, we found that three of these variants are likely to affect TF binding + any motif + DNase Footprint + DNase peak, and that the other variants are likely to affect TF binding + DNase peak. Using PhenoScanner (version 1.1), we found that these 19 genetic variants could significantly regulate the expression of nearby genes, especially HOXD1 and HOXD3, in human ovary tissue.

LD analysis using HaploReg
HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks [5]. HaploReg includes LD information from the 1000 Genomes Project, chromatin state and protein binding annotation from the Roadmap Epigenomics and the Encyclopedia of DNA Elements (ENCODE) projects, sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on gene expression from eQTL studies [5]. We used HaploReg (version 4.1) to identify the variants tagged by rs2072590, using the LD information from the 1000 Genomes Project (EUR) with r² ≥ 0.8 [5].
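The r² ≥ 0.8 tagging criterion can be sketched as follows. This is a minimal illustration in Python, not HaploReg's implementation (HaploReg supplies precomputed LD values). The function names `ld_r2` and `tagged_variants` are hypothetical, and the r² values in the usage example are invented for illustration; they are not taken from Table 1. The formula itself is the standard definition of r² between two biallelic sites.

```python
def ld_r2(p_ab, p_a, p_b):
    """Standard LD r^2 between two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


def tagged_variants(index_snp, r2_by_variant, threshold=0.8):
    """Return variants whose r^2 with the index SNP meets the threshold."""
    return [
        name
        for name, r2 in r2_by_variant.items()
        if name != index_snp and r2 >= threshold
    ]


# Illustrative r^2 values only (assumed, not from Table 1).
example_r2 = {
    "rs2072590": 1.0,     # the index SNP itself
    "variant_X": 0.93,    # hypothetical variant, kept
    "variant_Y": 0.80,    # hypothetical variant, kept (threshold is inclusive)
    "variant_Z": 0.42,    # hypothetical variant, dropped
}
print(tagged_variants("rs2072590", example_r2))
```

The threshold comparison is inclusive, matching the "r² ≥ 0.8" wording used in the text.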

Functional annotation using RegulomeDB
RegulomeDB (version 1.1) is a database that annotates SNPs with known and predicted regulatory elements in the intergenic regions of the human genome [4]. Known and predicted regulatory DNA elements include regions of DNase hypersensitivity, binding sites of transcription factors, and promoter regions that have been biochemically characterized to regulate transcription [4]. RegulomeDB (version 1.1) includes public datasets from Gene Expression Omnibus (GEO), the ENCODE project, and published literature [4].

Functional annotation using PhenoScanner
PhenoScanner (version 1.1) is a curated database holding publicly available results from large-scale GWAS [6]. The motivation for creating this tool is to facilitate "phenome scans", the cross-referencing of genetic variants with a broad range of phenotypes, to aid the understanding of disease pathways and biology [6]. The catalogue currently contains nearly 3 billion associations and over 10 million unique SNPs [6]. The results are aligned across traits to the same effect and non-effect alleles for each SNP [6].
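Aligning results to the same effect and non-effect alleles can be illustrated with a short Python sketch. It is an assumption about how such harmonization is commonly done, not PhenoScanner's actual code: the function `align_effect`, the record fields, and the example values are hypothetical. The sketch only handles swapped alleles on the same strand and deliberately rejects anything else (for example strand flips) instead of guessing.

```python
def align_effect(record, ref_effect, ref_other):
    """Return a copy of `record` expressed relative to the reference alleles.

    record: dict with keys "effect_allele", "other_allele" and "beta"
    ref_effect, ref_other: the effect / non-effect alleles to align to
    """
    effect, other = record["effect_allele"], record["other_allele"]
    if (effect, other) == (ref_effect, ref_other):
        return dict(record)  # already aligned
    if (effect, other) == (ref_other, ref_effect):
        # Alleles are swapped: exchange them and flip the sign of the effect.
        return {
            **record,
            "effect_allele": ref_effect,
            "other_allele": ref_other,
            "beta": -record["beta"],
        }
    raise ValueError("alleles do not match the reference pair")


# Hypothetical association record, for illustration only.
raw = {"snp": "example_snp", "effect_allele": "T", "other_allele": "G", "beta": 0.15}
aligned = align_effect(raw, ref_effect="G", ref_other="T")
print(aligned)
```

Once every record for a SNP is expressed against the same allele pair, effect directions can be compared directly across traits.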