Multiple analyses of large-scale genome-wide association study highlight new risk pathways in lumbar spine bone mineral density

Osteoporosis is a common human complex disease. It is mainly characterized by low bone mineral density (BMD) and low-trauma osteoporotic fractures (OF). Until now, a large proportion of heritability has yet to be explained. The existing large-scale genome-wide association studies (GWAS) provide strong support for the investigation of osteoporosis mechanisms using pathway analysis. Recent findings showed that different risk pathways may be involved in BMD in different tissues. Here, we conducted multiple pathway analyses of a large-scale lumbar spine BMD GWAS dataset (2,468,080 SNPs and 31,800 samples) using two published gene-based analysis software including ProxyGeneLD and the PLINK. Using BMD genes from ProxyGeneLD, we identified 51 significant KEGG pathways with adjusted P<0.01. Using BMD genes from PLINK, we identified 38 significant KEGG pathways with adjusted P<0.01. Interestingly, 33 pathways are shared in both methods. In summary, we not only identified the known risk pathway such as Wnt signaling, in which the top GWAS variants are significantly enriched, but also highlight some new risk pathways. Interestingly, evidence from further supports the involvement of these pathways in MBD.


INTRODUCTION
Osteoporosis is a common human complex disease [1][2].It is mainly characterized by low bone mineral density (BMD) and/or low-trauma osteoporotic fractures (OF), both of which have strong genetic determination [1][2].However, the specific genes influencing these phenotypic are largely unknown [1][2].Much effort has been put into identifying the genetic determinants of this disease, especially the genome-wide association studies (GWAS), which have recently provided rapid insights into genetics of osteoporosis [1][2][3].
In 2012, Estrada et al. performed the largest metaanalysis to date on lumbar spine BMD (LS-BMD; n = 31,800 cases) and femoral neck BMD (FN-BMD; n = 32,961), including 17 GWAS datasets from individuals of European and East Asian ancestry [3].They replicated the top BMD-associated markers in 50,933 independent subjects and the risk of low-trauma fracture in 31,016 individuals with a history of fracture (cases) and 102,444 controls [3].They identified 56 loci (32 new) associated with BMD at genome-wide significance [3].In 2015, Zheng et al. reported novel non-coding genetic variants with large effects on BMD (n = 53,236) and fracture (n = 508,253) individuals of European ancestry from the general population, and identified EN1 as a determinant of bone density and fracture by whole-genome sequencing [4].However, these newly identified susceptibility loci exert very small risk effects and cannot fully explain the underlying genetic risk.A large proportion of heritability has yet to be explained.
Pathway analyses of GWAS have been widely conducted in human complex diseases or phenotypes, such as Alzheimer's disease [5][6][7][8], rheumatoid arthritis [9][10] and body mass index [11].The existing largescale GWAS datasets provide strong support for the investigation of osteoporosis mechanisms using pathway analysis methods.Zhang et al. used a novel pathway-based analysis approach in wrist ultradistal radius BMD GWAS, examining approximately 500,000 single nucleotide polymorphisms (SNPs) from 984 unrelated whites [12].They identified the regulation-of-autophagy pathway to be the most significant signal for association with wrist ultradistal radius BMD.They confirmed the regulation-ofautophagy pathway to be significantly associated with arm BMD in the Framingham Heart Study sample containing 2187 subjects [12].
Lee et al. performed a pathway analysis of hip BMD GWAS in 5,715 Europeans [13].They identified eight significant pathways including gammahexachlorocyclohexane degradation, regulation of the smoothened signaling pathway, transmembrane activator and CAML-interactor and B cell maturation antigen stimulation of B cell immune response, endonuclease activity, regulation of defense response to virus, serine type endopeptidase inhibitor_activity, endoribonuclease activity, and myeloid leukocyte differentiation.
All these above findings show different risk pathways may be involved in BMD in different tissues.Here, we conducted multiple pathway analyses of a largescale lumbar spine BMD GWAS using two published gene-based analysis methods including roxyGeneLD [14] and PLINK [15].

Pathway analysis of BMD genes from the ProxyGeneLD
Using ProxyGeneLD, these 2,468,080 SNPs are assigned to 16898 genes, which include at least one adjusted SNP.Using the 1236 significant genes (unadjusted P<0.05), we performed a pathway analysis.We identified 51 significant KEGG pathways (adjusted P<0.01).Based on the classifications of the KEGG pathways, these 51 pathways can be mainly divided into immune system and diseases (n=10), environmental information processing (n=10), cellular processes (n=8), cancers (n=6), and infectious diseases (n=3).These significant pathways were described in

Pathway analysis of BMD genes from the PLINK
Using PLINK, these 2,468,080 SNPs are assigned to 16543 genes, which include at least one SNP.Using the 824 significant genes (unadjusted P<0.05), we performed a pathway analysis and identified 38 significant KEGG pathways (adjusted P<0.01).These 38 pathways can be mainly divided into immune system and diseases (n=10), cellular processes (n=8), environmental information processing (n=7), genetic and environmental information processing (n=3), infectious diseases (n=4).These significant pathways were described in Table 2.All p-values of individual gene from PLINK are described in Supplementary Table 3.The detailed gene information in significant KEGG pathways is described in Supplementary Table 4.

Shared KEGG pathways using both ProxyGeneLD and PLINK
We identified 33 pathways shared in pathway analyses of BMD genes using the "best SNP per gene" method and the meta-analysis method.These 33 pathways are related with immune system and diseases (n=9), cellular processes (n=7), environmental information processing (n=7), infectious diseases (n=3), genetic and environmental information processing (n=2).These shared pathways are highlighted in Table 2.

DISCUSSION
Osteoporosis is a major public health problem.However, the specific genes or pathways influencing these phenotypic are largely unknown.Recently, two pathway analyses of MBD GWAS datasets have been conducted in wrist ultradistal radius and hip.Zhang et al. highlighted the regulation-of-autophagy pathway in wrist ultradistal radius BMD GWAS dataset [12].Lee et al. identified eight significant pathways in hip BMD GWAS [13].However, no shared pathway was identified in both studies.We think that different risk pathways may be involved in BMD in different tissues.Here, we conducted multiple pathway analyses of a large-scale lumbar spine BMD GWAS using the BMD genes from ProxyGeneLD [14] and PLINK [15].Using BMD genes from ProxyGeneLD, we identified 51 significant KEGG pathways.Using BMD genes from PLINK, we identified 38 significant KEGG pathways.Interestingly, 33 pathways are shared in both methods.
We further searched the PubMed and Google Scholar databases to verify our findings.Interestingly, evidence further supports the involvement of these pathways in MBD.Take the Wnt signaling (hsa04310) C, the number of reference genes in the category; O, the number of genes in the gene set and also in the category; E, expected number in the category; R, the ratio of enrichment, rawP, the p value from hypergeometric test; adjP, the p value adjusted by the multiple test adjustment.
for example.We identified it to be the most significant pathway (P = 3.26E-09) and 7 th significant pathway (P = 1.00E-04) using genes from the "best SNP per gene" method and the meta-analysis method, respectively.It is reported that Wnt signaling plays major roles in almost all aspects of skeletal development and homeostasis.Promising effective therapeutic agents for bone diseases are being developed by targeting the Wnt signaling pathway [16].Wnt signaling regulates BMD through the lipoprotein receptor-related protein 5 (LRP5), a receptor of this signaling.Genetic variations at the LRP5 gene locus are associated with osteoporosis, which suggests that genetic variations in Wnt signaling genes may affect the pathogenesis of osteoporosis [17].
In addition to the Wnt signaling, there is also some literature evidence supporting the involvement of other risk pathways in BMD.More detailed information is described in Table 3. C, the number of reference genes in the category; O, the number of genes in the gene set and also in the category; E, expected number in the category; R, the ratio of enrichment, rawP, the p value from hypergeometric test; adjP, the p value adjusted by the multiple test adjustment.Y, this pathway is shared in pathway analysis of BMD genes using the "best SNP per gene" method; N, this pathway is not shared in pathway analysis of BMD genes using the "best SNP per gene" method; Table 3: Literature evidences supporting that genes in measles pathway are associated with bone mineral density or osteoporosis

Rheumatoid arthritis
BMD data of patients with low to moderately active RA demonstrated an association between high radiological RA damage and low BMD at the hip, which suggests an association between the severity of RA and the risk of generalised bone loss, which also occurred in corticosteroid naive patients [27].There is a significant negative relationship between femoral neck BMD and disease duration in RA.The value of OR increases proportionately with lengthening of disease duration which can be reduced significantly by methotrexate therapy [28].
[ [27][28] TGF-beta signaling pathway TGF-beta is the possible Link between loss of bone mineral density and chronic inflammation [29].TGF-β/BMPs have widely recognized roles in bone formation during mammalian development and exhibit versatile regulatory functions in the body [30].
[ [29][30] Focal adhesion Proline-rich tyrosine kinase 2 (PYK2), a member of the focal adhesion kinase family, plays a key role in the regulation of bone formation, and so inhibitors of this kinase might represent potential bone-building therapies for osteoporotic disease [31].The focal adhesion, the actin cytoskeleton and cell-cycle are connected pathways and their genes are implicated in the pathogenesis of low BMD [32].
[ [31][32] Type I diabetes mellitus The lower BMD in type 1 versus type 2 diabetic patients and control subjects probably results from more rapid bone loss after the onset of type 1 diabetes [33].patients with type 1 diabetes have a 6.9-fold increased incidence of hip fracture compared to controls [34].

Regulation of actin cytoskeleton
The focal adhesion, the actin cytoskeleton and cell-cycle are connected pathways [32].Genes in these three pathways are implicated in the pathogenesis of low BMD [32].Genome-wide linkage studies have highlighted the chromosomal region 3p14-p22 as a quantitative trait locus for BMD [35].The FLNB gene, which is thought to have a role in cytoskeletal actin dynamics, is located within this chromosomal region and presents as a strong candidate for BMD regulation [35].Mullin et al. identified significant associations between SNPs in the FLNB gene and BMD in Caucasian women [35].[32,35] Until now, there are kinds of software tools for pathway analysis of the GWAS dataset [19].Some tools including SNP ratio test [20], GenGen [21], GRASS [22], accept raw genotype datasets as input data.Other tools including ProxyGeneLD [14], ALIGATOR [19], i-GSEA4GWAS [19], and PLINK set-test [23], MAGENTA [24], and GESBAP [19] accept the summary results to subsequent pathway analysis.Here, we selected ProxyGeneLD and PLINK for gene-based test, as we did not have access to raw BMD genotype data.Both the ProxyGeneLD and PLINK have different approaches, assumptions regarding genomic architecture of risk variants in pathways, and different methods for the correction of LD and gene size effects.ProxyGeneLD produces a gene-wide p-value using the lowest p-value of the SNPs (the best SNP), or the lowest p-value in a cluster with multiple SNPs and clusters that fall within the gene boundaries [25].The P value was adjusted for the LD patterns in the human genome and gene length.PLINK SET SCREEN TEST is a meta-analysis method that combines P values across all the SNPs in genes and adjusts for the LD [15].
Based on these different software tools for pathway analysis, we recognize some limitations using ProxyGeneLD and PLINK.First, the multiple testing corrections may not be sufficient to account for all biases in pathway analysis.The results from the BMD GWAS should be adjusted using a permutation test.However, the original SNP genotype data for each individual are not available to us now.When we get the SNP genotype data, we will further perform a pathway analysis using some available software such as SNP ratio test [20], GenGen [21], and GRASS [22].These pathway analysis methods or software can be used to analyze the SNP genotype data, and can conduct a permutation test.Future replication studies using genotype data are required to replicate our findings.
In summary, we not only identified the known risk pathway such as Wnt signaling, in which the top GWAS SNPs are significantly enriched, but also highlight some new risk pathways.Interestingly, evidence from further supports the involvement of these pathways in MBD.We believe that our results advance our understanding of BMD mechanisms and will be very informative for future genetic studies in BMD.Further functional evaluation of this pathway to determine the mechanism by which it regulates BMD should be pursued to provide new insights into the pathogenesis of osteoporosis.

The BMD GWAS dataset
The second lumbar spine BMD GWAS dataset used here comes from the summary results of a large-scale BMD GWAS conducted by Estrada et al [3], which is part of the GEnetic Factors for OSteoporosis consortium (GEFOS).Estrada et al. performed a meta-analysis of GWAS for BMD of the lumbar spine (LS-BMD; n=31,800) including ~2.5 million autosomal genotyped or imputed SNPs from 17 GWAS datasets from North America, Europe, East Asia and Australia [3].BMD of the lumbar spine (LS-BMD) was measured in all cohorts using dual-energy X-ray absorptiometry following standard manufacturer protocols [3].GWAS genotyping was followed by imputation to ~2.5 million SNPs from HapMap37 Phase II release 22 using Genome Build 36 [3].Association between each SNP and BMD in each study was analyzed using sex-specific, age-weight-and principal components-adjusted standardized residuals under an additive genetic model [3].In the end, we got the association results about 2,468,080 SNPs and BMD [3].More detailed information is described in the original study [3].

Gene-based testing for GWAS dataset by ProxyGeneLD
ProxyGeneLD assigns SNPs to specific genes [14].This software flexibly takes into consideration the complex linkage disequilibrium (LD) patterns in the human genome and corrects for the inflation of significance caused by gene length.The LD information comes from the HapMap phase II CEU samples (release 22) [14].Using the lowest p-value of the SNPs (the best SNP), or the lowest p-value in a cluster with multiple SNPs and clusters that fall within the gene boundaries, ProxyGeneLD produces a gene-wide p-value [25].Intergenic SNPs, which is in high LD with the mapped genes, will have been mapped to genes for the next analysis.

Gene-based testing for GWAS dataset by PLINK
PLINK is used to test for the GWAS dataset by a meta-analysis of all the SNPs in genes [15].The set screen test uses an approximate Fisher's test to combine P values across all the SNPs in genes and adjusts for LD [15].It is reported that Fisher's method is asymptotically optimal to get the overall significance by combining a set of Pvalues obtained from independent tests of the same null hypothesis (each SNP is not associated with disease) [15].We applied this method to the BMD GWAS dataset using the LD information from the HapMap CEU population.

Pathway-based testing for BMD GWAS dataset
The KEGG pathways in WebGestalt were used for pathway analysis (June 16, 2015) [26].For a given pathway, a hypergeometric test was used to detect the overrepresentation of BMD-related genes among all of the genes in the pathway [26].The p-value of K BMD-related genes in the pathway was calculated using www.impactjournals.com/oncotargetwhere N is the total number of genes of interest, S is the number of all of the BMD-related genes, m is the number of genes in the pathway and K is the number of BMD-related genes in the pathway.
In order to avoid testing overly narrow or broad pathways, we select WebGestalt KEGG pathways that contain at least 20 and at most 300 genes for subsequent analysis.The reference gene list is the entire entrez gene set.The minimum number of genes for a category is 5.The false discovery rate (FDR) method was used to correct for multiple testing.KEGG pathways with an adjusted P<0.01 are considered to be significantly associated with BMD.

Table 1 .
. All p-values of individual gene from ProxyGeneLD are described in Supplementary The detailed gene information in significant KEGG pathways is described in Supplementary Table 2.