Polymorphisms of long non-coding RNA HOTAIR with breast cancer susceptibility and clinical outcomes for a southeast Chinese Han population

Hox transcript antisense intergenic RNA (HOTAIR) is a well-known long non-coding RNA (lncRNA) that participates in the tumorigenesis and progression of multiple cancers. However, the associations among HOTAIR polymorphisms, breast cancer (BC) susceptibility and clinical outcomes remain unclear. In this case-control study, we assessed the effects of three lncRNA HOTAIR single nucleotide polymorphisms (SNPs) (rs1899663, rs4759314 and rs7958904) on the risk and clinical outcome of breast cancer in a Chinese Han population. In total, 969 breast cancer cases and 970 healthy controls were enrolled in this study. Associations among genotypes, BC risk and survival were evaluated by univariate and multivariate logistic regression to estimate the odds ratio (OR), hazard ratio (HR) and 95% confidence interval (CI). Disease-free survival (DFS) and overall survival (OS) were calculated by the Kaplan–Meier method. The frequencies of the T allele of rs1899663 and the C allele of rs7958904 both differed significantly between cases and controls in the single-locus analyses (P = 0.017 and 0.010, respectively). Multivariate analyses also revealed that the rs1899663 TT genotype and the rs7958904 CC genotype were both associated with a higher risk of breast cancer compared with the GG homozygotes (OR = 2.08, 95% CI = 1.20–3.60 and OR = 1.45, 95% CI = 1.01–2.08, respectively). In survival analysis, the T allele of rs1899663 was significantly associated with both DFS (HR = 1.64, 95% CI = 1.12–2.40) and OS (HR = 2.10, 95% CI = 1.29–3.42) in younger subjects (age ≤ 40). Our findings may provide new insights into the associations among genetic susceptibility, fine classification and prognosis of breast cancer. Further studies with larger sample sizes, together with functional research, should be conducted to validate our findings and better elucidate the underlying biological mechanisms.


INTRODUCTION
Worldwide, breast cancer (BC) is one of the most commonly diagnosed malignancies and the primary cause of death from cancer in women [1]. In the United States, it was estimated that approximately 252,710 women would be diagnosed with breast cancer and 40,610 would die from it in 2017 [2]. During the past few decades in China, the incidence of BC has increased rapidly, and BC has become the most frequent cancer among women in major cities [3,4]. The development of breast cancer is a multifactorial and complex process, involving both environmental and genetic factors. Epidemiological studies have demonstrated that age, obesity, menstrual status, positive family history and previous benign breast disease are correlated with the development of breast cancer [5][6][7][8][9][10]. Accumulating evidence has also revealed that some genetic variants, such as single nucleotide polymorphisms (SNPs) in tumor suppressor genes or oncogenes, could play a critical role in the genetic susceptibility to breast cancer [11][12][13][14][15][16][17]. Although a great proportion of publications have focused on cancer-related polymorphisms located in protein-coding genes, several SNPs located in chromosomal regions that do not encode proteins have also been reported to contribute to the risk of different cancers.
In the past few years, one novel class of non-coding RNAs, long non-coding RNA (lncRNA), has attracted extensive attention for its wide-ranging regulatory functions in human diseases. LncRNAs are RNA transcripts longer than 200 nucleotides with no protein-coding capacity [18]. In addition to their involvement in multiple biological processes [19][20][21][22], lncRNAs are known to play important roles in tumorigenesis, including transcriptional, post-transcriptional and epigenetic regulation of cancer-associated genes, thereby affecting cell progression, migration, invasion and apoptosis [23][24][25]. One of these RNAs, lncRNA Hox transcript antisense intergenic RNA (HOTAIR), which is located on chromosome 12q13.13, has been linked with the development and progression of multiple cancers, such as hepatocellular cancer [26,27], esophageal cancer [28][29][30], lung cancer [31][32][33], gastric cancer [34][35][36][37] and breast cancer [38][39][40]. HOTAIR plays a crucial role in gene regulation by modifying the chromatin structure [41]. The 5′ domain of HOTAIR can bind polycomb repressive complex 2 (PRC2), leading to histone H3 lysine 27 trimethylation (H3K27me3) in the HOXD locus, whereas the 3′ domain binds the LSD1/CoREST/REST complex, which mediates H3 lysine 4 demethylation; together they regulate various downstream genes and promote cancer cell metastasis [42]. In breast cancer, increasing evidence has suggested that lncRNA HOTAIR is an oncogene correlated with BC carcinogenesis, progression and prognosis. Firstly, aberrant up-regulation of HOTAIR was found in breast cancer tissue or plasma samples compared with normal adjacent non-tumorous tissue or healthy controls [43,44]. Additionally, this high expression of HOTAIR was a significant predictor of subsequent metastasis and correlated with a shorter survival time in breast cancer patients [38,43].
Moreover, in vitro studies have shown that HOTAIR was robustly expressed in basal-like breast cancer cells and that inhibition of HOTAIR could reduce basal-like gene expression and growth [45]. Recently, several single nucleotide polymorphisms located in HOTAIR were also reported to show highly significant associations with breast cancer.
For example, one study by Yan et al. [46] found that the T allele of rs920778 conferred a significantly increased risk of BC, and another study in Turkey indicated that the TT genotype of rs12826786 might play a critical role in genetic susceptibility to breast cancer [47]. However, our understanding of the association between lncRNA HOTAIR polymorphisms and the genetic susceptibility to BC is still at an early stage. As far as we know, no published study has evaluated the relationships between HOTAIR SNPs and clinical outcomes in breast cancer patients. Accordingly, we selected five SNPs (rs12826786, rs1899663, rs4759314, rs7958904 and rs920778) that were previously identified as associated with cancer risk and conducted the present case-control study involving 969 BC patients and 970 healthy controls, aiming to investigate the role of HOTAIR tag SNPs in the risk and clinical outcome of breast cancer in a southeast Chinese Han population.

Subject characteristics
A total of 1939 subjects (969 cases and 970 healthy controls) were involved in this study. The selected demographic characteristics and clinicopathological features of breast cancer cases and control subjects are displayed in Table 1. No significant differences were observed between cases and controls in age, menopausal status, age at menopause and previous benign disease (P > 0.05). Compared with the healthy controls, the BC patients were more likely to have a lower mean BMI, an earlier age at menarche, a later age at first live birth and a higher proportion of family history of breast cancer (P < 0.05). Among the 969 breast cancer cases, 584 (60.3%) had a tumor size >2 cm and 385 (39.7%) had a tumor size ≤2 cm; 490 (50.6%) patients had lymph node involvement and 479 (49.4%) did not. Moreover, 644 (66.4%) cases were luminal type, 149 (15.4%) were HER-2 overexpressing and 176 (18.2%) were triple negative breast cancer (TNBC).

Effects of HOTAIR SNPs and breast cancer risk
In linkage disequilibrium (LD) analysis, the SNP rs12826786 was found to be in strong LD with rs1899663, with a Pearson's correlation coefficient (r²) of 0.983. Similarly, the SNP rs920778 was also in strong LD with rs7958904, with a Pearson's correlation coefficient (r²) of 0.984 (Figure 1). Therefore, rs1899663, rs4759314 and rs7958904 were selected as the three tag SNPs in this study. The genotype distributions of all three tag SNPs are shown in Table 2. The observed genotype frequencies of the three SNPs were consistent with those expected from Hardy-Weinberg equilibrium (HWE) in healthy controls (P = 0.402 for rs1899663, P = 0.295 for rs4759314 and

Stratified analysis of HOTAIR polymorphisms and breast cancer
To further assess the suggestive association between HOTAIR polymorphisms and the risk of breast cancer, we conducted stratified analyses among different subgroups of demographic characteristics and reproductive factors under the dominant model (Table 3). For the T carriers of rs1899663, elevated risks of BC were found in the subgroups of younger patients (age ≤ 40) (OR = 1.48, 95% CI = 1.04-2.12), individuals with earlier menarche (OR = 1.38, 95% CI = 1.05-1.82) and subjects with an earlier age at first live birth (OR = 1.36, 95% CI = 1.06-1.75). As for the C carriers of rs7958904, we observed significantly increased risks in the subgroups of lower-BMI individuals (BMI ≤ 24) (OR = 1.26, 95% CI = 1.01-1.57), individuals with an earlier age at first live birth (OR = 1.37, 95% CI = 1.08-1.74) and patients with ER-positive (OR = 1.32, 95% CI = 1.07-1.61) or PR-positive (OR = 1.32, 95% CI = 1.07-1.64) tumors. No positive associations were detected in any of the subgroups for rs4759314 (Supplementary Table 1). In addition, no significant heterogeneity was found within any of the subgroups for the three tag SNPs.
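The odds ratios and 95% confidence intervals reported above come from multivariate logistic regression. As a simplified illustration of how such a ratio and its interval are formed, the sketch below computes an unadjusted OR with a Wald interval from a 2×2 table of carriers versus non-carriers under a dominant model. The function name `odds_ratio_wald` and the cell counts are hypothetical and are not taken from the study's tables; an adjusted estimate would additionally require the covariates.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: carrier cases        b: non-carrier cases
    c: carrier controls     d: non-carrier controls
    z: normal quantile (1.96 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical subgroup counts, for illustration only.
or_, lower, upper = odds_ratio_wald(120, 180, 90, 200)
print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval whose lower bound exceeds 1 corresponds to a significantly elevated risk at the two-sided 0.05 level, which is how the subgroup results above can be read.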

Effects of clinicopathological features and HOTAIR SNPs on breast cancer survival
As shown in Table 4, the associations of clinicopathological features and HOTAIR polymorphisms with patients' disease free survival and overall survival were evaluated by Cox regression analyses. The results demonstrated that tumor size, lymph node involvement and molecular subtype were significantly associated with DFS and OS in breast cancer patients (all P < 0.05, log-rank test). For the HOTAIR tag SNPs, however, no statistically significant associations were observed between the genotypes and breast cancer survival in any of the genetic models (Figure 2). To further assess the prognostic value of HOTAIR polymorphisms, we also performed stratified analyses by age, tumor size, lymph node involvement and molecular subtype. Multivariate analyses revealed that the T carriers of rs1899663 had significantly worse DFS (HR = 1.64, 95% CI = 1.12-2.40) and OS (HR = 2.10, 95% CI = 1.29-3.42) in the subgroup of younger patients (age ≤ 40) (Table 5 and Figure 3). As for the G carriers of rs4759314, we observed a decreased risk for OS (HR = 0.26, 95% CI = 0.08-0.83) in patients without lymph node involvement (Table 6). However, we did not observe any significant difference in survival for rs7958904 (Supplementary Table 2) or within any of the other subgroups for rs1899663 and rs4759314.
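As a rough plausibility check on the stratified estimates above, a ratio and its 95% CI can be converted back to an approximate z statistic and two-sided P value. The sketch below assumes that the reported intervals are Wald intervals symmetric on the log scale, which the text does not state; the function name `p_from_ratio_ci` is hypothetical, and rounding of the published values makes the result approximate.

```python
import math

def p_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate z and two-sided P from a ratio (HR or OR) and its CI.

    Assumes a Wald interval on the log scale:
        log(upper) - log(lower) = 2 * z_crit * SE
    """
    if not (0 < lower < upper) or estimate <= 0:
        raise ValueError("need 0 < lower < upper and estimate > 0")
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hazard ratios quoted in the text above.
reported = {
    "rs1899663 DFS, age <= 40": (1.64, 1.12, 2.40),
    "rs1899663 OS, age <= 40": (2.10, 1.29, 3.42),
    "rs4759314 OS, node-negative": (0.26, 0.08, 0.83),
}
for label, (hr, lo, hi) in reported.items():
    z, p = p_from_ratio_ci(hr, lo, hi)
    print(f"{label}: z = {z:.2f}, approximate P = {p:.3f}")
```

A wide interval on the log scale, as for the rs4759314 estimate, signals that few events underlie the result, which is worth keeping in mind when interpreting subgroup findings.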

DISCUSSION
A deeper understanding of lncRNAs and their roles in tumor pathogenesis, progression and prognosis could provide many potential clues for developing novel therapeutic approaches for breast cancer. HOTAIR (HOX transcript antisense RNA) is known as a functional lncRNA that participates in several tumor types, including breast cancer [26][27][28][29][30][31][32][33][34][35][36][37][38][39][40]. The oncogenic roles of HOTAIR in breast cancer have attracted extensive attention, whereas the tumor susceptibility and prognosis conferred by genetic polymorphisms in its locus have not been widely investigated in epidemiological studies [38][39][40]43]. In the present study, we evaluated the effects of three potentially functional HOTAIR polymorphisms (rs1899663, rs4759314 and rs7958904) on breast cancer susceptibility and clinical outcomes in a Chinese population. We found that individuals with the T allele of rs1899663 or the C allele of rs7958904 had an increased risk of developing breast cancer, and that T carriers of rs1899663 had worse DFS and OS in the subgroup of younger subjects. Our findings support the hypothesis that functional genetic variants located in HOTAIR may explain part of the genetic basis of BC. To the best of our knowledge, this is the first study to evaluate the correlations between HOTAIR variants and breast cancer survival.
LncRNA HOTAIR is located on chromosome 12q13.13 and plays a key role in gene regulation by modifying the chromatin structure [41]. The 5′ domain of HOTAIR can bind polycomb repressive complex 2 (PRC2), leading to histone H3 lysine 27 trimethylation (H3K27me3) in the HOXD locus, while the 3′ domain binds the LSD1/CoREST/REST complex, which mediates H3 lysine 4 demethylation; together they regulate various downstream genes and promote cancer cell metastasis [42]. HOTAIR has been widely explored in breast cancer and has been suggested to be a functional lncRNA correlated with the carcinogenesis, progression and prognosis of BC. Aberrant up-regulation of HOTAIR was found in breast cancer tissue or plasma samples compared with normal adjacent tissue or healthy controls [43,44], and this high expression was also reported as a predictor of subsequent metastasis and correlated with a shorter survival time in breast cancer patients [38,43]. In addition, HOTAIR was reported to be robustly expressed in basal-like breast cancer cells, and inhibition of HOTAIR could reduce basal-like gene expression and growth in in vitro studies [45]. Therefore, understanding the biological roles of HOTAIR may help establish this lncRNA as a diagnostic or predictive biomarker in breast cancer.
In the current study, we demonstrated that the T allele frequency of rs1899663 and the C allele frequency of rs7958904 were both significantly higher in breast cancer cases than in the cancer-free controls. Multivariate analyses of genotype distributions also revealed that the TT carriers of rs1899663 and the CC carriers of rs7958904 were consistently associated with an elevated risk of breast cancer. In further stratified analyses, we observed that the T carriers of rs1899663 were associated with elevated risks of BC in the subgroups of younger patients (age ≤ 40), individuals with earlier menarche and subjects with an earlier age at first live birth. As for the C carriers of rs7958904, increased risks of breast cancer were more evident in the subgroups of lower-BMI individuals (BMI ≤ 24), individuals with an earlier age at first live birth and patients with ER-positive or PR-positive tumors. These results suggest that the effects of HOTAIR genetic variants on breast cancer risk could be modulated by specific environmental exposures as well as demographic factors, and they support the view that carcinogenesis is a complex process involving both genetic and environmental factors. Previous studies have suggested that the T allele of rs1899663 was associated with a higher risk of developing prostate cancer [48], whereas this significant positive correlation was not detected in cervical cancer [49] or esophageal squamous cell carcinoma [50]. In one study concerning HOTAIR polymorphisms and breast cancer [51], the frequency of the rs1899663 T allele did not differ significantly between cancer patients and healthy controls in the overall analysis, while the follow-up stratified analysis indicated that the GT+TT genotypes were associated with a significantly lower risk of BC among women with age at menarche >14 (OR = 0.42, 95% CI = 0.21-0.82) and number of pregnancies >2 (OR = 0.65, 95% CI = 0.49-0.95).
As for rs7958904, several studies have indicated that the C allele was associated with a significantly decreased risk of colorectal cancer [52], ovarian cancer [53] and [54] when compared with the G allele, which contrasts with the result of our study. This discrepancy may reflect different susceptibilities to a disease among different populations, and different kinds of cancer may have distinct etiologies involving diverse genetic or epigenetic modifications. The polymorphisms rs1899663 and rs7958904 are located in intron 2 and exon 6 of the HOTAIR gene, respectively. Guo et al. [55] reported that the HOTAIR SNP rs12826786, which is in strong LD with rs1899663 (r² = 0.983), was associated with gastric cardia adenocarcinoma risk and had an allele-specific effect on HOTAIR expression. It is plausible that rs1899663 or the polymorphisms in LD with it could affect BC susceptibility by altering HOTAIR expression. In silico analyses have revealed that the secondary structure of HOTAIR was distinctly changed by the rs7958904 G/C variants, indicating that this polymorphism may participate in tumorigenesis through the alteration of HOTAIR structure [52]. Another explanation for the relation of rs7958904 to breast cancer susceptibility is that the real functional SNP is rs920778, which is in high LD (r² = 0.984) with rs7958904. Polymorphism rs920778 is also located in an intron of the HOTAIR gene and was shown to enhance intronic enhancer activity and increase HOTAIR expression in several cancer cells [49,50].

In the overall survival analysis, we did not observe any significant association between the genotypes of the three tag SNPs and breast cancer survival in any of the genetic models. In the subsequent stratified analysis, however, the T allele of rs1899663 was significantly associated with both DFS (HR = 1.64, 95% CI = 1.12-2.40) and OS (HR = 2.10, 95% CI = 1.29-3.42) in younger subjects (age ≤ 40), and the G allele of rs4759314 showed a decreased risk for OS (HR = 0.26, 95% CI = 0.08-0.83) in patients without lymph node involvement. However, given the small number of rs4759314 GG carriers among patients without lymph node involvement (3 cases), we speculate that the association observed for rs4759314 in the OS analysis may be a false positive result.
In conclusion, we identified two SNPs located in HOTAIR (rs1899663 and rs7958904) that were significantly associated with an increased risk of breast cancer and, for the first time, investigated the role of HOTAIR tag SNPs in the clinical outcome of BC in a southeast Chinese Han population. However, several limitations of this study should be mentioned. Firstly, the sample size of the current study was still not large enough, which might limit the statistical power and affect the accuracy and precision of the results. Secondly, we only included three lncRNA HOTAIR polymorphisms in the present study, whereas studies comprising more functional SNPs in HOTAIR might better illuminate the precise role of genetic variants in BC carcinogenesis and progression. Thirdly, the biological function of the HOTAIR polymorphisms is not clear, and further functional studies are still needed to explore the relationship. In spite of these limitations, the findings of our study remain informative for researchers and physicians in this field. Additional prospective population-based studies with larger sample sizes and different ethnicities, as well as relevant functional studies, are still needed to confirm our findings.

Ethical statement
This study and the consent procedure were approved by the Ethical Committee of Affiliated Union Hospital of Fujian Medical University. Each participant included in the study provided written informed consent.

Study subjects
This hospital-based study was conducted on a total of 969 breast cancer patients and 970 healthy controls.
All participants were genetically unrelated Chinese Han residents of Fujian Province and its surrounding regions. Breast cancer subjects were all histopathologically confirmed with primary breast cancer and recruited from the Affiliated Union Hospital of Fujian Medical University between July 1995 and October 2010. Healthy controls (frequency-matched to cases on age ±3 years) were randomly selected from individuals attending routine health examination in the outpatients' department during the same period. Each patient and healthy control was interviewed face-to-face by two trained oncologists to gather information on demographic factors, menstrual status, fertility status, previous benign breast disease history and the family history of breast cancer. Specific clinicopathological data of breast cancer cases including tumor size, lymph node involvement, estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER-2) status were all extracted

Outcome collections
Disease free survival (DFS) and overall survival (OS) were the main study endpoints. Patients alive on the last follow-up date were considered censored. DFS was measured as the time from the date of diagnosis to the first local or distant recurrence or to the last follow-up. OS was defined as the time from the date of diagnosis to the date of death from any cause (including breast cancer) or to the last follow-up. The date of death was obtained from inpatient and outpatient records or from the relatives of patients through follow-up telephone calls. The last follow-up date of this study was November 1st, 2016.
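The censoring rule and the DFS/OS definitions above map directly onto the Kaplan-Meier product-limit estimator. The following is a minimal sketch of that estimator, not the SPSS procedure used in the study; the function name `kaplan_meier` and the follow-up times are hypothetical and serve only as an illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times:  follow-up time for each patient (e.g. months from diagnosis)
    events: 1 if the event (recurrence or death) occurred at that time,
            0 if the patient was censored (e.g. alive at last follow-up)
    Returns a list of (time, survival_probability) tuples.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Patients censored at time t still count as at risk at t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (months), for illustration only.
months = [6, 12, 12, 20, 30, 45]
status = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(months, status):
    print(f"t = {t:>2} months: S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk until their last follow-up and then drop out without lowering the curve, which is why the choice of the last follow-up date matters for both endpoints.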

DNA extraction and genotyping
Each participant was asked to provide a 5-ml peripheral blood sample after enrolling in this study. Genomic DNA was extracted from the peripheral blood samples using a Whole-Blood DNA Extraction Kit (Bioteke, Beijing, China) following the manufacturer's instructions. LncRNA HOTAIR tag SNPs were genotyped with a 2 × 48-Plex SNPscan Kit (Cat#:G0104K; Genesky Biotechnologies Inc., Shanghai, China). The DNA samples were ligated and amplified by polymerase chain reaction (PCR) according to the standard protocol recommended by the manufacturer. Ligation products were run on an ABI3730XL sequencer and the raw data were analyzed with GeneMapper 4.1 Software (Applied Biosystems, Foster City, CA). For quality control, all genotyping was performed without knowledge of case or control status. About 10% of the DNA samples were randomly selected for direct sequencing (BGI Sequencing, Beijing), and the results were 100% concordant.

Statistical analysis
All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS, version 21.0) for Windows (SPSS, Chicago, IL). The differences between breast cancer cases and healthy controls in demographic characteristics and environmental risk factors were evaluated using Student's t-test (for continuous variables) and the chi-squared (χ²) test (for categorical variables). Hardy-Weinberg equilibrium (HWE) was assessed with a goodness-of-fit chi-squared (χ²) test comparing the expected and observed genotype frequencies in control subjects. Associations between genotypes and breast cancer risk were evaluated by computing the odds ratio (OR) and its 95% confidence interval (CI) from univariate and multivariate logistic regression analyses, and associations with survival by computing the hazard ratio (HR) and its 95% CI from Cox regression analyses. Linkage disequilibrium was calculated from genotype data using Haploview 4.1 (http://www.broad.mit.edu/mpg/haploview/). The power analysis of this study was performed using the QUANTO program, version 1.2.4, assuming a disease risk for the Chinese population of 268 per 100,000. Disease free survival and overall survival were calculated by the Kaplan-Meier method, with the log-rank test used to compare the differences. All statistical tests were two-sided, and a P value less than 0.05 was considered significant.
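The HWE goodness-of-fit test described above can be sketched as follows. This is an illustrative reimplementation rather than the SPSS analysis itself; the function name `hwe_chi_square` and the genotype counts are hypothetical. For a biallelic SNP the test has one degree of freedom, so the P value can be obtained from the complementary error function without external libraries.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-squared test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts (two homozygotes and the
    heterozygote) among control subjects.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("at least one genotyped subject is required")
    # Allele frequencies estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(610, 320, 40)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above 0.05 in controls, as reported for the tag SNPs in this study, indicates no detectable departure from HWE and is commonly taken as a basic genotyping quality check.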