Clinical significance of putative markers of cancer stem cells in gastric cancer: A retrospective cohort study

Cancer stem cells (CSCs) are thought to be the source of tumor maintenance, and many CSC markers have been identified. Because gastric cancer (GC) is heterogeneous, TNM stage alone is not sufficient to predict prognosis accurately. The aim of this study was to investigate the clinical significance of CSC markers (Lgr5, Oct4, CD133, EpCAM, CD54 and Sox2) and to establish a new model based on these markers to accurately predict the prognosis of GC. We retrospectively enrolled 377 GC tissues collected from January 2006 to October 2012 for immunohistochemistry (IHC), and 93 pairs of GC tissues and corresponding adjacent normal gastric tissues collected from December 2011 to October 2012 for quantitative PCR (qPCR). Clinicopathological and follow-up characteristics were collected. In IHC, Oct4, CD133 and EpCAM were independently related to tumor progression, while Sox2 was associated with well or moderate differentiation (all p<0.05). Cox regression showed that Oct4-EpCAM was an independent prognostic factor: the group with double low expression of Oct4-EpCAM had a significantly better prognosis than the control group (p=0.035). In qPCR, CD133 was an independent prognostic factor: patients with high CD133 expression had a significantly worse prognosis than those with low CD133 expression (p<0.001). The prognostic prediction accuracy of the nomogram based on Oct4-EpCAM expression in IHC was significantly better than that of TNM stage alone (p=0.003). Low expression of Oct4-EpCAM in IHC and of CD133 in qPCR were favorable prognostic factors in GC. The nomogram based on Oct4-EpCAM was valuable for the prognostic prediction of GC patients.


INTRODUCTION
Gastric cancer (GC) is one of the leading causes of cancer-related mortality worldwide, with a high incidence in Asia [1]. For patients with resectable GC, surgery and adjuvant chemoradiotherapy are the main curative treatments. However, many patients still suffer recurrence and metastasis despite receiving standard treatment. The concept of cancer stem cells (CSCs) has been put forward to explain therapy resistance, and many studies have found that CSCs might play a pivotal role in tumor recurrence and metastasis [2,3].

Meanwhile, some specific markers, or combinations of markers, have been shown to be valuable in identifying CSCs. These markers might be key to targeted therapy and prognostic prediction. Previous studies have identified intercellular adhesion molecule 1 (CD54), leucine rich repeat containing G protein coupled receptor 5 (Lgr5), prominin 1 (CD133), POU class 5 homeobox 1 (Oct4), epithelial cell adhesion molecule (EpCAM) and sex determining region Y-box 2 (Sox2) as putative markers of CSCs in many kinds of tumors [4-6]. The relationship between these markers and clinicopathological characteristics, as well as their prognostic significance, has been investigated in GC [7,8]. However, most studies focused on only a few of these six markers, and the results remain controversial. Therefore, the significance of these markers is still under debate and requires further investigation.
TNM stage, which reflects tumor invasion depth, regional metastatic lymph nodes (LNs) and distant metastasis, is currently one of the most important classifications of tumor progression and a useful clinical tool for prognostic prediction in GC [1,9]. Nevertheless, TNM stage cannot capture complete information about tumors and patients. Heterogeneity exists extensively in many tumors [10,11], and in clinical practice some patients with the same TNM stage have different prognoses. Therefore, it is necessary to find new tools, or important supplements to TNM stage, that represent individual characteristics in order to accurately predict the prognosis of patients with GC. The aim of this study was to investigate the clinical significance of these six CSC markers and to establish a new model based on these markers to predict the prognosis of patients with GC.

Expressions of these markers in gastric CSCs (GCSCs) and primary lesions
In GCSCs, EpCAM and CD54 were highly expressed on the cytomembrane; in the cytoplasm, CD133 was weakly expressed whereas Lgr5 and Oct4 were highly expressed; and Sox2 was highly expressed in the cytoplasm and nucleus (Figure 1). The expression patterns of these markers in primary lesions and metastatic LNs of GC, as assessed by immunohistochemistry (IHC), were similar to those in GCSCs, except that CD133 was mainly expressed in the lumen of glands and Oct4 was expressed in the nucleus (Figure 2). The relationships among these six markers in primary lesions were also analyzed. In IHC, EpCAM was significantly negatively correlated with Lgr5 (p=0.007), CD133 (p=0.006) and Oct4 (p=0.001), while Oct4 was positively associated with Lgr5 (p<0.001) and CD133 (p=0.024). In quantitative polymerase chain reaction (qPCR), Lgr5 and Sox2 were significantly positively correlated with CD54 (p=0.006) and Oct4 (p=0.006), respectively (Table 1).

Relationship between the expressions of these markers in primary lesions and clinicopathological characteristics
The univariate correlation analyses of IHC and qPCR are shown in Table 2 and Table 3, respectively. The multivariate analyses of IHC are shown in Table 4.

Lgr5
In IHC, 276 (73.2%) and 101 (26.8%) patients were in the Lgr5 low and high expression groups, respectively. The Lgr5 high expression group had significantly more patients aged >60 years (p=0.001), male patients (p=0.024), and patients with macroscopic type III (p=0.030), tumor size 4cm-7cm (p=0.044) and TNM stage III (p=0.046) than the Lgr5 low expression group. Multivariate analysis revealed that Lgr5 expression was independently associated only with age (p=0.001).
In qPCR, there were 65 (69.9%) and 28 (30.1%) patients in the EpCAM low and high expression groups, respectively. EpCAM expression was significantly associated with M stage (p=0.006) in univariate analysis, but no clinicopathological traits were significantly related to EpCAM expression in multivariate analysis.

CD54
In IHC, there were 321 (85.1%) patients in the CD54 low expression group and 56 (14.9%) patients in the CD54 high expression group. In univariate analysis, CD54 expression was significantly related only to T stage (p=0.047): the CD54 high expression group had more T4 stage tumors than the CD54 low expression group. However, no clinicopathological features were independently associated with CD54 expression.
In qPCR, 53 (57.0%) and 40 (43.0%) patients were assigned to the CD54 low and high expression groups, respectively, but no clinicopathological characteristics were significantly related to CD54 expression.
In the primary lesions tested by both qPCR and IHC (n=93), we found no correlation between the IHC and qPCR results for CD54 (p=0.477), Lgr5 (p=0.576), CD133 (p=0.792), Oct4 (p=0.834), EpCAM (p=0.630) or Sox2 (p=0.250). We also compared the expression of these markers in some patients by Western Blot; the results were similar to those of IHC but not consistent with those of qPCR for EpCAM, CD133, Oct4 and Lgr5 (Figure 3).

Prognostic significance of the expressions of these markers in primary lesions
In this study, 341 (90.5%) patients in IHC and 89 (95.7%) in qPCR were followed up. However, only the 325 (86.2%) patients in IHC (Sox2: 86.0%, 264/307) and 80 (86.0%) patients in qPCR who underwent R0 resection were included in the survival analyses. The median survival time (MST) and 2-year overall survival rates of the different expression groups of these markers in IHC and qPCR are shown in Table 5. The MST was not applicable when the survival rate at the end of follow-up was still higher than 50%.
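The rule for when the MST is "not applicable" can be illustrated with a minimal Kaplan-Meier sketch. This is an illustration only, not the SPSS procedure used in this study; the function names and the follow-up times (in months) are hypothetical and are not data from this cohort.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time of each patient
    events: 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival_time(curve):
    """First time at which survival is 50% or lower; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data, for illustration only.
reached = kaplan_meier([6, 12, 18, 24, 30], [1, 1, 1, 0, 0])
not_reached = kaplan_meier([6, 12, 18, 24, 30], [1, 0, 0, 0, 0])
print(median_survival_time(reached))      # 18
print(median_survival_time(not_reached))  # None
```

In the second example the estimated survival never falls to 50%, so no median can be read from the curve, which corresponds to the "not applicable" entries described above.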
In IHC, Kaplan-Meier analyses showed that patients with high expression of Oct4 (p=0.024) and EpCAM (p=0.005) had significantly worse prognosis than those with low expression (Figure 4). The differences between low and high expression of the other four markers were not significant (Figure 4). To eliminate the potential bias from TNM stage, we compared the prognosis between low and high expression of these markers stratified by TNM stage (Figure 5). The prognostic differences were significant between Oct4 low and high expression in TNM stage IV (p=0.045), and between EpCAM low and high expression in TNM stage I (p=0.045) and TNM stage II (p<0.001). More patients received chemotherapy in the EpCAM low expression group than in the EpCAM high expression group (p=0.003). To eliminate the influence of chemotherapy, we compared the prognosis between EpCAM low and high expression separately in patients with and without chemotherapy. The Kaplan-Meier curves showed that, although the differences between EpCAM low and high expression were not significant in patients with (p=0.078) or without chemotherapy (p=0.126), the trend toward worse prognosis in the EpCAM high expression group was still visible (Figure 6). In Cox regression of the patients in whom all markers except Sox2 were tested, no marker was an independent prognostic factor.
In qPCR, the univariate survival analyses revealed that patients with Lgr5 low expression (p=0.038) and CD133 high expression (p<0.001) had significantly worse prognosis than those with Lgr5 high expression and CD133 low expression, respectively (Figure 8). In contrast, no significant differences in prognosis were found between low and high expression of Oct4 (p=0.351), CD54 (p=0.237), EpCAM (p=0.172) and Sox2 (p=0.189) (Figure 8).
In multivariate analyses, only CD133 was demonstrated to be independently related to survival outcomes (p<0.001, HR=4.338, 95%CI [2.152-8.747]).

Nomogram of prognostic prediction based on Oct4-EpCAM expression in primary lesions in IHC
Further, we used a nomogram to predict the 2-year overall survival rate of individual patients in IHC. Age, T stage, N stage, M stage and Oct4-EpCAM expression (p=0.040, HR=1.484, 95%CI 1.019-2.160) were included in the nomogram (Figure 9), indicating that the Oct4-EpCAM double low expression group had better survival outcomes, which was consistent with the aforementioned multivariate analyses. The calibration curve of the nomogram showed that the predicted probability of 2-year survival was very close to the actual 2-year survival (Figure 10). Subsequently, we compared the accuracy of prognostic prediction between the nomogram based on Oct4-EpCAM expression and TNM staging (only T stage, N stage and M stage included; Figures 11 and 12). The C-index of the nomogram was 0.711 (95%CI 0.676-0.746), compared with 0.698 (95%CI 0.659-0.737) for the TNM staging system in this study. The results indicated that the prognostic prediction accuracy of the nomogram based on Oct4-EpCAM expression and other parameters was significantly better than that of the TNM staging system alone (p=0.003).

Clinical significance of these markers in metastatic LNs in IHC
Among the 275 patients with positive N stage, we collected metastatic LNs from 206 (74.9%) to perform IHC. The high expression groups comprised 24 (11.7%) patients for CD54, 45 (21.8%) for Lgr5, 23 (11.2%) for CD133, 43 (20.9%) for Oct4 and 167 (81.1%) for EpCAM. For Sox2, metastatic LNs were collected from 185 (83.0%) of 223 patients with positive N stage and examined by IHC, and the high expression rate was 33.5% (n=62). The expression of every marker in metastatic LNs was significantly related to its expression in primary lesions (all p<0.001).

DISCUSSION
Accurate prognostic prediction in GC is pivotal in clinical practice. At present, the TNM staging system is the main method used to describe tumor progression and predict patient prognosis. Nevertheless, TNM stage reflects only general information about tumor progression. Because of heterogeneity, the individual characteristics of tumors and patients cannot be completely captured by the TNM staging system alone. With the development of biological technology, heterogeneity at the genomic and proteomic levels has gradually been discovered in GC [12]. Therefore, besides TNM stage, it is very important to take the heterogeneity of GC into consideration in prognostic prediction. Additionally, CSCs, as the putative source of tumor maintenance and therapy resistance, have been investigated deeply and widely in many tumors in recent years. In the present study, we focused on the expression of specific CSC markers in GC tissues to determine their clinical significance in GC and their potential application in clinical practice.
In this study, we investigated the expression of Lgr5, Oct4, CD133, EpCAM, CD54 and Sox2 by IHC and qPCR in GC tissues. In IHC, multivariate analyses demonstrated that Lgr5, CD133 and EpCAM were independently related to old age, CD133 and Sox2 were independently associated with well or moderate differentiation, and Oct4, CD133 and EpCAM were independently related to tumor progression. In qPCR, logistic regression analyses showed that Lgr5 and Sox2 were independently related to well or moderate differentiation and young age, respectively. With respect to prognosis in IHC, only patients with high expression of Oct4 and EpCAM had significantly worse survival outcomes than those with low expression in univariate analyses. The differences between the high and low expression groups of the other four markers were not significant. However, none of these six markers was an independent prognostic factor. Based on the differences in survival outcomes between patients with high and low expression of Oct4 and EpCAM, we combined Oct4 and EpCAM to compare survival outcomes between low expression of both Oct4 and EpCAM (Oct4-EpCAM double low expression) and high expression of Oct4 or EpCAM (control group). Cox regression showed that Oct4-EpCAM was an independent prognostic factor. In qPCR, only CD133 was an independent prognostic factor: patients with CD133 high expression had significantly worse prognosis than those with CD133 low expression.
Lgr5 was identified as a putative marker of CSCs in colon cancers [13]. Lgr5 was found to be related to depth of invasion, LN metastasis, distant metastasis and poor prognosis, and after Lgr5 was inhibited by siRNAs, fewer GC cells migrated in a transwell model [14]. Lgr5 has also been considered an important marker in the carcinogenesis of GC, with Lgr5 expression gradually increasing from normal control tissues to GC tissues [15]. Lgr5 has also been studied as a potential novel biomarker of chemoresistance in GC cells and as a predictor of response to chemotherapy and of prognosis [16]. However, another study demonstrated that Lgr5 was more highly expressed in well-to-moderately differentiated tumors and in stage I and stage II disease than in stage III and stage IV disease [17].
Oct4 was identified as a putative marker of oral cancer stem-like cells and played a pivotal role in the chemoresistance of CSCs derived from prostate cancer [18,19]. A previous study showed that GC patients with negative expression of Oct4 had worse prognosis than those with positive expression [20]; however, another report showed that Oct4 was expressed at higher levels in GC tissues than in non-cancerous tissues and was associated with poor differentiation [21]. It was also demonstrated that metastatic lesions more often showed positive than negative Oct4 expression [22].
CD133 has been widely investigated as a specific marker of CSCs in brain tumors, prostate cancer, melanoma and pancreatic cancer, but there are still some controversial results indicating that CD133 negative cells might also include CSCs [4,5,23,24]. In our study, CD133 was generally expressed in the lumen of carcinoma glands, which has also been reported previously [25]. However, besides luminal expression, CD133 can also be expressed in the cytoplasm, and previous research showed that cytoplasmic expression of CD133 was related to metastasis and tumor progression, whereas this relationship was not observed for luminal expression [25]. CD133 positive expression in GC patients was related to poor differentiation and to significantly poorer survival outcomes than CD133 negative expression [26,27]. In our study, although CD133 expression was related to well or moderate differentiation, it was also associated with N2-3 stage, which was similar to previous studies [28,29]. However, we did not find that CD133 was related to survival outcomes in IHC. Instead, we demonstrated through qPCR that patients with CD133 high expression had significantly worse prognosis.
EpCAM has also been targeted as a putative marker of epithelial CSCs in ovarian cancer and pancreatic cancer [24,30]. In our IHC analyses, EpCAM was significantly related to large tumor size and poor survival outcomes, which was similar to a previous study [31]. EpCAM was also found to be highly expressed in peritoneal metastasis of GC, indicating that only GC cells with high expression of EpCAM might metastasize to the peritoneum [32]. In some experiments, the proliferation of GC cell lines and their capacity to form tumors in nude mice were impaired after EpCAM downregulation [33]. However, another study reported that patients with loss of EpCAM expression had significantly worse prognosis than those without loss, and that in stage I and II disease, loss of EpCAM expression was related to aggressive tumors [34].
CD54 was identified as a surface marker of cancer stem cells in hepatocellular carcinoma, GC and rectal cancer [6,35,36]. We found that CD54 was not independently associated with any clinicopathological characteristics and was not related to prognosis. However, a previous report of 108 patients demonstrated that CD54 was significantly related to advanced stage and liver metastasis [37]. Additionally, many reports found that the serum level of soluble CD54 was closely associated with GC progression, hematogenous metastasis and prognosis [7,8].
Sox2 was expressed in the spheres of glioblastoma and gliosarcoma, and also played an important role in the epithelial mesenchymal transition of glioma stem cells [38]. Our study found that Sox2 was expressed in nuclei and cytoplasm. Some studies found that Sox2 high expression might be associated with invasion of gastric cancer and poor survival outcomes [20,22,39]. Loss of expression of Sox2 indicated a worse prognosis [40]. However, we found that Sox2 high expression was a favorable prognostic factor.
In this study, we used X-tile software to calculate the cut-point of each marker tested by IHC and qPCR and to divide the patients into low expression and high expression groups. These cut-points were based on survival data. Hence, we thought that the cut-points could reveal the differences between low and high expression more realistically. Additionally, we used monoclonal antibodies for all these markers in IHC to make the results as specific and solid as possible, whereas polyclonal antibodies were used in some studies [17,21,25,29,37,41-44]. We thought that the different types of antibodies might also explain the variation in high expression rates between our study and some previous ones. Moreover, we applied the C-index to compare the accuracy of prognostic prediction, and a nomogram, a visualized method based on several valuable parameters, to illustrate the prognosis of individual patients. Through the C-index and nomogram, we found that the nomogram based on Oct4 and EpCAM, age and TNM stage had significantly better prognostic accuracy than TNM stage alone, which indicated that the expression of Oct4 and EpCAM was valuable in the prognostic prediction of patients with GC. The results suggested that we should not only focus on TNM stage, but also pay attention to specific characteristics of patients and tumors. However, we found that few previous studies had applied these kinds of methods in GC.
Our study applied IHC and qPCR to investigate the protein and mRNA expression of GC tissues, respectively. Moreover, qPCR also tested the mRNA expression of normal gastric tissues. These might be the reasons why the results of IHC and qPCR were not correlated with each other. The results of Western Blot were similar to those of IHC, but not consistent with those of qPCR for some markers. We thought that the expression of mRNA in qPCR might differ from the expression of protein in IHC because of changes after transcription and translation. Additionally, our study only investigated the clinical significance of these markers and did not address their molecular mechanisms. At present, these markers are mainly applied in the identification of CSCs and the investigation of their clinical significance. Regarding mechanisms, one study found that the TR4-Oct4-IL1Ra axis might play a critical role in the development of chemoresistance in prostate cancer stem/progenitor cells [45]. Leptin enhanced GC cell migration by increasing CD54 through the Rho/ROCK pathway [46].
There were still some limitations in our study. This was a retrospective study with 377 patients in IHC and 93 patients in qPCR. Because of the difficulty of obtaining GC tissues, we could collect and test only tumors large enough that sampling did not interfere with the postoperative pathological examination. Therefore, most GC tissues in qPCR were TNM stage III-IV. In addition, CD44 is another very important marker of GC and gastric CSCs and has been reported widely; in this study, we mainly focused on these six markers, for which there were fewer reports. Finally, this study only enrolled patients in our hospital, and the results should be further confirmed through external validation.
In conclusion, our study showed that low expression of Oct4-EpCAM in IHC and of CD133 in qPCR were favorable prognostic factors in GC. The nomogram based on the expression of Oct4-EpCAM was accurate and valuable in the prognostic prediction of patients with GC.

Patients
Available formalin-fixed paraffin-embedded primary lesions (n=377) and metastatic LNs (n=194) of patients with GC treated in the Department of Gastrointestinal Surgery, West China Hospital, Sichuan University from January 2006 to October 2012 were retrospectively enrolled for CD54, Lgr5, Oct4, CD133 and EpCAM tests by IHC. For Sox2, available formalin-fixed paraffin-embedded primary lesions (n=307) and metastatic LNs (n=184) were collected from January 2007 to October 2012. We also collected 93 pairs of primary lesions and corresponding adjacent normal gastric tissues, stored in liquid nitrogen, from patients with GC from December 2011 to October 2012 to perform qPCR. The primary lesions of these 93 patients had also been tested by IHC. All patients were followed up by telephone, mail and outpatient visits up to January 2015. The clinicopathological characteristics and follow-up details were collected. The West China Hospital research ethics committee approved the retrospective analysis of anonymous data. Signed patient informed consent was waived per the committee approval because this was a retrospective analysis. The flow chart of the patients is shown in Figure 13.

qPCR
Total RNA was isolated from GC tissues with TRIzol (Invitrogen) according to the manufacturer's instructions. Reverse transcription of total RNA was carried out with the PrimeScript RT reagent kit (TAKARA Biotechnology (Dalian) Co., Ltd) on a PCR amplifier under the following conditions: 37°C for 15 min, 85°C for 5 seconds. The cDNA was then tested by real-time qPCR on the CFX96 Real Time PCR System using Premix Ex Taq (SYBR and probe qPCR, TAKARA Biotechnology (Dalian) Co., Ltd) under the following conditions: 95°C activation for 30 seconds, followed by 40 cycles of 95°C denaturation for 5 seconds and 60°C annealing and elongation for 30 seconds. The results were recorded as CT values. After comparison of amplification efficiency, the Livak method ($2^{-\Delta\Delta C_T}$) was used to compare the difference between GC tissues and corresponding adjacent normal gastric tissues, with GAPDH as the reference gene and the corresponding adjacent normal gastric tissues as the calibration control. All the primer and probe sequences for these genes were designed and synthesized by TAKARA Biotechnology (Dalian) Co., Ltd (Table 6).
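The Livak calculation described above can be sketched as follows. This is a minimal illustration that assumes comparable amplification efficiency for the target and reference genes; the function name and the CT values are hypothetical and are not data from this study.

```python
def livak_fold_change(ct_target_tumor, ct_ref_tumor,
                      ct_target_normal, ct_ref_normal):
    """Relative expression of a target gene in tumor versus paired
    normal tissue by the Livak (2^-ddCT) method."""
    # Normalize the target gene to the reference gene (GAPDH in this study).
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Calibrate the tumor sample against its adjacent normal tissue.
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values, for illustration only.
fold = livak_fold_change(24.0, 18.0, 26.0, 18.0)
print(fold)  # 4.0
```

A value above 1 indicates higher relative expression in the tumor than in the paired normal tissue, and a value below 1 indicates lower expression.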

Western Blot
Protein was extracted from gastric cancer tissues with RIPA Buffer (Aidlab) containing protease and phosphatase inhibitor cocktails (Roche). The BCA protein assay (KeyGen biotech) was used to quantify protein concentration. Protein (30 μg) was separated on 10% Tris Glycine SDS gels and transferred to polyvinylidene difluoride membranes (Millipore). The membranes were blocked with 5% milk in TBST for 60 minutes at room temperature, then incubated with primary antibodies overnight at 4°C. After incubation with secondary antibodies (ZhongShan Golden Bridge Biotechnology Co., Ltd), the membranes were developed with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Scientific).

Statistical analyses
Statistical analyses were mainly conducted with SPSS software (Version 22, IBM). The chi-square test and the rank sum test (Mann-Whitney U test) were used to analyze unordered categorical variables and ranked data, respectively. Student's t-test was used to analyze continuous data when the assumptions of homogeneity of variance and normal distribution were met; otherwise, the rank sum test was used. Logistic regression was applied in multivariate correlation analysis. Kaplan-Meier and life-table methods were used to calculate the cumulative survival rates. The log-rank test and Cox's proportional hazard regression model were used for univariate and multivariate survival analyses, respectively. Prism 5 for Windows (Version 5.01, GraphPad Software) was used to draw the Kaplan-Meier curves. Nomograms and calibration curves were generated in R for Windows (Version 3.2.0, R Foundation for Statistical Computing) with the Regression Modeling Strategies (rms) package, in which the variables were selected by a stepwise algorithm according to the Akaike information criterion. Comparison between the nomogram and TNM stage was performed with the Harrell Miscellaneous (Hmisc) package and evaluated by the C-index: the larger the C-index, the more accurate the prognostic prediction. A two-sided p value less than 0.05 was considered statistically significant.
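To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only and is not the Hmisc implementation used in this study; the function name, the convention that a higher score means a worse predicted prognosis, and the example values are assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output; a higher score means a worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died strictly before
            # the follow-up time of patient j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data, for illustration only.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(times, events, risk))  # 0.8
```

A value of 0.5 corresponds to random prediction and 1.0 to perfect ranking of patients by risk, which is why a larger C-index is read as more accurate prognostic prediction.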

ACKNOWLEDGMENTS
The authors thank the Volunteer Team of Gastric Cancer Surgery (VOLTGA), West China Hospital, Sichuan University, China for their substantial work in data collection and follow-up of the database.