Evolutionary selected Tibetan variants of HIF pathway and risk of lung cancer

Tibetans existed in high altitude for ~25 thousand years and have evolutionary selected unique haplotypes assumed to be beneficial to hypoxic adaptation. EGLN1/PHD2 and EPAS1/HIF-2α, both crucial components of hypoxia sensing, are the two best-established loci contributing to high altitude adaptation. The co-adapted Tibetan-specific haplotype encoding for PHD2:p.[D4E/C127S] promotes increased HIF degradation under hypoxic conditions. The Tibetan-specific 200 kb EPAS1 haplotype introgressed from an archaic human population related to Denisovans which underwent evolutionary decay; however, the functional variant(s) responsible for high-altitude adaptation at EPAS1/HIF-2α have not yet been identified. Since HIF modulates the behavior of cancer cells, we hypothesized that these Tibetan selected genomic variants may modify cancer risk predisposition. Here, we ascertained the frequencies of EGLN1D4E/C127S and EGLN1C127S variants and ten EPAS1/HIF-2α variants in lung cancer patients and controls in Nepal, whose population consists of people with Indo-Aryan origin and Tibetan-related Mongoloid origin. We observed a significant association between the selected Tibetan EGLN1/PHD2 haplotype and lung cancer (p=0.0012 for D4E, p=0.0002 for C127S), corresponding to a two-fold increase in lung cancer risk. We also observed a two-fold or greater increased risk for two of the ten EPAS1/HIF-2α variants, although the association was not significant after correcting for multiple comparisons (p=0.12). Although these data cannot address the role of these genetic variants on lung cancer initiation or progression, we conclude that some selected Tibetan variants are strongly associated with a modified risk of lung cancer.


INTRODUCTION
High altitude is one of the most challenging environments for human existence. Nevertheless, longterm high-altitude populations have adapted through evolutionary mechanisms to survive for millennia in environments with extremely low atmospheric oxygen pressure (hypoxia). The molecular basis of such adaptation is still not completely understood. In Tibetans, alterations in hypoxia inducible transcription factor (HIF) pathway have been shown to be such an adaptive response [1][2][3]. Genome-wide scans for positive selection in Tibetan population revealed several evolutionarily selected genetic loci which may allow adaptive biological changes in humans (reviewed in [4]). Two Tibetan specific loci with the highest prevalence and strongest fixation index (F ST ) [5] are EGLN1 encoding prolyl hydroxylase 2 (PHD2) -a principal negative regulator of the HIFs levels, and a large genomic region including EPAS1 which encodes HIF-2αan isoform of one of the three α-subunits of HIFs (encoded by EPAS1 gene); both these genomic loci were reported to protect from polycythemia [3,[6][7][8].
We previously identified EGLN1/PHD2, p.D4E (NM_022051:c.12C>G), a coding variant with functional impact, that is present in cis with the previously reported variant p.C127S (NM_022051:c.380G>C, rs12097901). This Tibetan-specific selected variant EGLN1 D4E/C127S has high gene frequency (~80%) among Tibetans [7,9] and is a gain-of-function variant of PHD2 in hypoxia resulting in downregulation of HIF-1 and HIF-2. However, in another report using a different methodology, this variant was described as a loss-of-function (hypomorphic) variant leading to augmented HIF activation [10]. Several studies of high-altitude adaptation have reported that Tibetan EPAS1/HIF-2α haplotype was influenced by strong positive selection in Tibet [reviewed in [4]] and protection from polycythemia [3,11]. The Tibetan advantageous EPAS1 haplotype originated from an archaic Homo population related to Denisovans. The adaptive EPAS1 haplotype entered the Tibetan population via archaic introgression and then increased to a very high frequency among Tibetans due to positive selection [12]. Because the advantageous EPAS1 haplotype diverged from the Denisovan haplotype approximately one million years ago, there are many Tibetan-specific variants in the EPAS1 region that are absent from the Denisovan reference genome [5,13]. The functional variant(s) contributing to high-altitude adaptation on the EPAS1 haplotype is (are) currently unknown.
HIFs control the expression of a large array of genes in order to mediate hypoxia response, including those that promote glucose uptake, facilitate glycolysis, stimulate angiogenesis, or regulate apoptosis and erythropoiesis [14]. HIFs isoforms have similar structure and share important, but non-redundant role in hypoxia regulation. Despite having closely related and conservative protein domains, they have distinct and tissue-specific gene expression patterns. Whereas HIF-1α is expressed ubiquitously, the expression of HIF-2α is restricted to certain tissues and is especially important for mediating erythropoiesis, vascularization, and pulmonary development [15]. The pleiotropic effect of HIFs is underlined by diseases in which they have both protective responses (i.e. coronary artery disease, peripheral artery disease, colitis) and also contribute to pathogenesis (i.e. congenital polycythemias, cancer, pulmonary arterial hypertension). In many solid tumors the increased HIF-1α and/or HIF-2α levels in diagnostic tumor biopsies are associated with poor prognosis and increased risk of mortality due to their invasive and metastatic properties [16].
Due to the conflicting data on whether HIFs are uniformly tumor initiators or suppressors [17,18], or whether HIFs function differently depending on tissue context, we hypothesized that Tibetan selected genomic variants may modify cancer risk predisposition. We therefore genotyped patients and controls from a casecontrol study for lung cancer in Nepal. Nepal is a land of diverse ethnic groups, but they can be organized into two major groups: Indo-Aryan origin from the Indian subcontinent, and those belonging to the Tibetan-related Mongoloid race from the Himalayas. Since most of the Tibetan-related mongoloid Nepalese and Tibetans share the same genetic background [13], we believe they would also have the Tibetan-specific EGLN1 and EPAS1 genetic variants. We evaluated the association between lung cancer and Tibetan-specific EGLN1 and EPAS1 genetic variants. To do so, we genotyped the two EGLN1 variants, EGLN1 D4E and EGLN1 C127S and ten EPAS1 SNPs with strong evidence of positive selection.

RESULTS
The study was carried out on 214 lung cancer patients (114 male, 53%; 100 female, 47%) and 213 controls (129 male, 61%; 84 female, 39%). The demographic characteristics are shown in Table 1. The distribution of self-reported race/ethnicity, religion and residence are shown since these are variables that we need to consider as potential confounders. The majority of both cases and controls are Hindu and live in the plains regions. Approximately 2.8% of cases and 1.9% of controls live in the high altitude mountain region. Both the cases and controls were widely distributed across the western to eastern regions of Nepal for their residence.
All samples were genotyped using a high resolution melting assay and then validated by Sanger sequencing. Among 427 total samples, 20 healthy controls and 40 patients were heterozygous or homozygous for EGLN1 D4E .
As previously observed [9], all individuals with EGLN1 D4E were also heterozygous or homozygous for EGLN1 C127S , i.e. EGLN1 D4E has always been observed in cis with EGLN1 C127S . The EGLN1 D4E/C127S haplotype was associated with an approximately two-fold increase in lung cancer risk (p=0.0012, p=0.0002) ( Table 2). The associations remained significant after adjusting for genotypes at the EPAS1 locus (p=0.0094, p=0.0022) ( Table 2). We further investigated the EGLN1 D4E/C127S variant by adjusting for smoking status and the associations still held ( Table 3). The association between the EGLN1 D4E/C127S variants and lung cancer risk was observed among men, by study participants residing in the West of Nepal and reported to live in the high altitude, and among Hindus.
We also explored whether the association differed depending on the cancer type and disease stage. These analyses were limited due to smaller numbers of cases. For lung squamous cell carcinoma, the odds ratio (ORs) for the EGLN1 D4E carriers were 3.79 (95%CI=1.79, 8.01) adjusted for Tibetan ethnicity, 3.8 (95%CI-1.78, 8.11) when adjusted for Hindu religion and 3.38 (95%I=1.59, 7.21) when adjusted for smoking status. The ORs for the  [19]. The CMS test combined five independent tests (XP-EHH, iHS, diHH, FST, and DAF) to improve the power to detect recent positive selection. Using simulations, the log-posterior probability that the observed data under the positive selection were calculated for each test; then these log-posterior probabilities were summed together to generate the final CMS score. Our CMS test used the whole genome sequences of 17 Tibetans [5]. The details of the CMS test are described elsewhere [5].
As shown in Table 4 and 5, we observed an odds ratio of greater than two for two of the ten EPAS1 variants, rs117813469 (p=0.050) and rs142764723 (p=0.025). However, the associations were not significant after correcting for multiple comparisons (p=0.12).

DISCUSSION
HIF pathway plays a crucial role in Warburg effect, a unique metabolic feature of cancer cell metabolism. In our case-control study, we evaluated association between lung cancer and Tibetan-specific evolutionary selected EGLN1 variants, EGLN1 D4E and EGLN1 C127S and ten EPAS1 SNPs with strong evidence of positive selection. We found that the adaptive Tibetan EGLN1 haplotype was associated with a two-fold increase in the risk of lung cancer (p=0.0012 for D4E, p=0.0002 for C127S, Table 2). In addition, we observed a two-fold increase in the risk of lung cancer for two of ten interrogated EPAS1 SNPs, although the associations were not significant after correction for multiple testing.
Lung cancer is the most common cancer worldwide and generally can be divided into two categories: non-small-cell lung cancer (adenocarcinoma, squamous cell cancer and large cell carcinoma) and small cell lung cancer. The morphology and molecular profiles are distinct for each subtype but the recent discoveries of mutations in candidate driver genes, e.g. epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), or c-ROS oncogene1 (ROS1), brought a deeper understanding of the lung cancer genome [20]. Nevertheless, the role of hypoxia in the transformation of lung epithelial cells and its role in the tumor supporting microenvironment is not completely elucidated. Surprisingly, the overexpression of the HIF hydroxylases PHD1, PHD2, PHD3 individually and collectively were found as indicators of poor prognosis in non-small cell lung cancer, with PHD2 as the most significant prognostic marker [21]. The silencing of PHD2 in cancer cells provided conflicting data, since when increased [22] or decreased [23], tumor growth was observed and was attributed to different molecular mechanisms. The PHD2 haplodeficiency reduced metastasis without affecting tumor growth by decreased activation of cancer-associated fibroblasts [24] indicating the number of HIF-independent roles of PHDs.
Our data cannot address the specific role of EGLN1 and EPAS1 genetic variants in lung cancer initiation, progression, or survival. We cannot rule out the possibility that the inheritance of the EGLN1 D4E/ C127S variants may in fact be associated with longer survival and more indolent clinical course and thus a higher likelihood of detection while those without the EGLN1 D4E/C127S variant, perhaps living in the more remote isolate area of Nepal, may never reach the referral center to undergo diagnostic tests because of their impending rapid clinical course and/or feeble clinical status precluding the travel. Alternatively, the differential rate of acquiring or eliminating EGFR, ALK, or ROS1 lung cancer somatic mutations may be also influenced by the inheritance of the EGLN1 D4E/C127S variant which modulates HIFs-stability. In addition, in Asian patients with adenocarcinoma histology, an EGFR mutation is found to be present at twice the rate of other populations, reflecting the underlying genetic differences [25]. Only a prospective study taking into account these clinical and molecular markers can answer these lung cancer covariant questions.
However, we conclude that high-altitude adapted Tibetan variants in the EGLN1 and possibly EPAS1 regions are associated with the modified lung cancer risk, and that the role of these Tibetan evolutionary selected haplotypes needs to be further elucidated. The evolutionary-selected HIF signaling pathways in Tibetans provide a unique opportunity to gain a deeper understanding of cancer pathophysiology in this minority population subjected to unique evolutionary pressures.

Sample acquisition
A hospital-based case-control study was conducted at the B. P. Koirala Memorial Cancer Hospital (BPKMCH) in the city of Bharatpur, in the Chitwan district (Nepal). In this study, we included 214 lung cancer cases (ICD-O2 codes C33 and C34), and 213 controls matched on age (±five years), sex and geographical area of residency (mountain, hilly and plain region). All eligible cases were recruited as soon as possible after initial diagnosis of lung cancer, with a target interval of one day between initial diagnosis and recruitment, and a maximum interval of three months to ensure that very few cases are not recruited because they die before there is an opportunity to interview them. The source of controls were visitors of non-lung cancer patients at the hospital. A standardized questionnaire was administered to all study participants by trained staff members, who collected data on demographic and socioeconomic status, family history of cancer, tobacco and alcohol consumption habits, dietary factors, occupation, residential history and usage of different chewing products available locally. All study subjects gave informed consent. This study has been approved by the IRB committees at the University of Utah, University of Maryland and the Nepal Health Research Council.

DNA isolation
Genomic DNA was isolated from whole blood using Blood & Cell Culture DNA Mini Kit (Qiagen, Germantown, MD) according to manufacturer's protocol, standardized to 100 ng/μL.

EGLN1/PHD2 genotyping -EGLN1 D4E high resolution melting genotyping
Because the EGLN1 haplotype is located in a GCrich region and is difficult to detect by next-generation or dideoxynucleotide sequencing, we developed a high resolution melting assay suitable for analyzing limited amounts of DNA [9]. Double layer capillary method was used to perform polymerase chain reaction and high resolution melting of EGLN1/PHD2, D4E (rs186996510) using LightScanner 32 capillary real-time instrument

EGLN1/PHD2 genotyping -EGLN1 exon one Sanger-based genotyping
We subsequently improved the Sanger sequencing method for EGLN1 genotyping. This improved Sanger method now allows the analysis of this genetic region without requiring the LightScanner 32 capillary real-time instrument (BioFire Defense). All samples were examined by both methods with complete concordance of the results. The 5' portion of EGLN1 exon one (c.118 -471) was amplified using Platinum® Taq DNA Polymerase (Life Technologies), FailSafe™ PCR 2X PreMix "J" (Epicentre, Madison, WI, USA) and M13-tailed primers: 5'-TGT AAA ACG ACG GCC AGC ATG GCG CAG TAA CG and 5'-CAG GAA ACA GCT ATG ACC TGG AAC AGC GAT GAG. The PCR conditions were: 98°C for 30 secs, and

EPAS1/HIF-2α SNPs genotyping
The ten SNPs were amplified using multiplex touchdown PCR and analyzed by single nucleotide extension (SNE) protocol (Life Technologies). Briefly, multiplex PCR was run with Platinum® Taq DNA Polymerase (Life Technologies), FailSafe™ PCR 2X PreMix "D" (Epicentre) and the primers listed in Supplementary Table  1. The PCR conditions were: 98°C for 30 sec, and ten cycles of 20 seconds at 98°C, 20 seconds annealing at 62°C minus 0.5°C/cycle and 45 seconds extension at 72°C, followed by 25 cycles of 20 seconds at 98°C, 20 seconds annealing at 57°C and 45 seconds extension at 72°C. Amplification products were purified using EXOSapIt (Affymetrix) to remove excess PCR primers and dNTPs. Amplified products were then added to SNaPshot Multiplex System (Life Technologies) reagents according to manufacturer's protocol on a GeneAMp PCR system 9700 (Applied Biosystems, Foster City, CA). Following SNE cycling reaction excess ddNTPs were removed using Shrimp Alkaline Phosphatase (Affymetrix, Santa Clara, CA, USA) and analyzed using a 3130xl Genetic Analyzer (Applied Biosystems). Primers for PCR were designed using MPprimer [26]. Primers for SNE were manually selected and screened using Primer Designer (v.5) software (Scientific & Educational Software, Morrisville, NC, USA). The SNE primers were designed to lengths between 15 and 60 nucleotides. Fragment analysis was conducted using GeneMarker software (Sofgenetics).

Statistical analysis
Unconditional logistic regression was used to estimate the odds ratios (OR) and their 95% confidence intervals (95%CI), to evaluate the risk of lung cancer according to various genetic variants. Given the extensive linkage disequilibrium in the EPAS1 region among Tibetans, a Bonferroni correction would be overly conservative. Therefore, we performed a randomization test to estimate the family wise error rate for the ten SNPs in the EPAS1 region. Specifically, we computed the max(T) permutation p-value for Cochran-Armitage test for trend using PLINK v1.07.

CONFLICTS OF INTEREST
All the authors declare no competing financial interests.

GRANT SUPPORT
Funding for this research study was supported by the HCI Cancer Center Support Grant P30CA042014 (MH), 1P01CA108671-01A2 (JP). LL was supported by Czech Science Foundation, project GACR 15-18046Y and Kontakt II (MSMT LH15223). TB was supported by "Increasing Opportunities for Career Growth in Research and Development in the Field of Medical Sciences", ITMS: 26110230067, co-funded from EU sources and European Social Fund.