NOD2 may be a biomarker for the survival of kidney cancer patients

Background: Nucleotide-binding oligomerization domain-containing protein 2 (NOD2) may play an important role in the outcome of kidney cancer patients. To explore the relationship between NOD2 and the prognosis of kidney cancer patients, a databank-based reanalysis was conducted. Materials and Methods: Data on kidney cancer patients with at least survival information were obtained mainly from The Cancer Genome Atlas (TCGA). Some clinical data not available online were collected by personal email to the author. We then reanalyzed all the data to draw a conclusion about the relationship between the NOD2 gene and the prognosis of kidney cancer patients. Results: A total of 1953 samples with NOD2 information from four TCGA databanks were enrolled in this study. For KIPAN, we obtained the Kaplan-Meier curves for the risk groups, the concordance index, and the p-value of the log-rank test for equality of survival curves (Concordance Index = 56.57, Log-Rank Equal Curves p = 0.0009006, R^2 = 0.036/0.953, Risk Groups Hazard Ratio = 1.61 (conf. int. 1.21 ~ 2.13), p = 0.001005), together with a box plot across risk groups that includes the p-value for the difference from a t-test (or f-test for more than two groups). The result was statistically significant (p < 0.01). Similar results were seen in KIRC and in the fourth dataset (468 samples). Conclusions: The status of the NOD2 gene may be a biomarker for the survival of kidney cancer patients.


INTRODUCTION
The nucleotide-binding oligomerization domain 2 (NOD2)-like receptors (NLRs) belong to the evolutionarily conserved family of pattern recognition receptors (PRRs) located in the cytoplasm, and are closely related to responses, innate immunity and adaptive immunity [1]. Schroder K reported that they can be triggered by exogenous pathogen-associated molecular patterns (PAMPs) or endogenous damage-associated molecular patterns (DAMPs) [2]. NLRs may play a vital role in stimulating the NF-κB and mitogen-activated protein kinase (MAPK) signaling pathways and in the expression of immune response cytokines and chemokines [3]. Stimulation of NLRs can
Formerly, it was mainly considered to be a pathogenic factor in immune inflammation [6][7][8]. As research developed, it was found that the NLR family, including NOD1, NOD2 and others, may also play a role in cancer. Single nucleotide polymorphisms (SNPs) of NOD1 and NOD2 have been associated with the risk of gastric cancer (GC) and precancerous lesions in Caucasian populations [9][10][11]. A similar result was later seen in a Chinese population [12]. However, the relationship between NOD2 and kidney cancer remains poorly understood. Although some clinical trials have investigated the role of immunotherapy for kidney cancer patients, scarcely any have considered NOD2 genetic status. To clarify the role of NOD2 in kidney cancer patients, we downloaded published data and analyzed them further with the help of online tools.

RESULTS
A total of 1953 samples with NOD2 information from four TCGA databanks were enrolled in this study. Of the 1953 subjects with survival status, 468 cases also had data on clinical grade and stage. Their overall characteristics are shown in Table 1, and the primary data are listed in the supplementary material. We performed all analyses in SurvExpress using the maximum row average for NOD2 with multiple probe sets, two risk groups split at the prognostic index median, and Cox fitting.
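The "maximum row average" setting can be illustrated with a minimal sketch. It assumes that, when several probe sets map to one gene, the row with the highest mean expression across samples represents the gene. The function name `max_average_row`, the probe identifiers and the values are hypothetical and are not taken from SurvExpress or from the study data.

```python
from statistics import mean


def max_average_row(probe_rows):
    """Pick the probe set whose expression has the highest mean across samples.

    probe_rows maps a probe identifier to its list of expression values
    (one value per sample). Returns (probe_id, values) for the chosen row.
    """
    best_id = max(probe_rows, key=lambda pid: mean(probe_rows[pid]))
    return best_id, probe_rows[best_id]


# Hypothetical probe sets for a single gene (made-up log2 expression values).
probes = {
    "probe_a": [2.0, 3.0, 4.0],
    "probe_b": [5.0, 5.5, 6.5],
    "probe_c": [1.0, 1.5, 2.0],
}
chosen_id, chosen_values = max_average_row(probes)
```

In this toy input, `probe_b` has the largest mean and is the row carried forward to the Cox fitting.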
We first analyzed the data that had survival information only. In all results, green represents low-risk groups and red represents high-risk groups. Figure 1A shows the Kaplan-Meier curves for the risk groups, the concordance index (CI), and the p-value of the log-rank test for equality of survival curves for KIPAN (CI = 56.57, Log-Rank Equal Curves p = 0.0009006, R^2 = 0.036/0.953, Risk Groups Hazard Ratio = 1.61 (conf. int. 1.21 ~ 2.13), p = 0.001005), as recommended by Bovelstad HM [13], while Figure 1B shows a box plot across risk groups, including the p-value for the difference from a t-test (or f-test for more than two groups). The result was statistically significant (P < 0.01). From this analysis, we found that the expression level of the NOD2 gene might be an unfavorable signal for the prognosis of kidney cancer patients. We then analyzed two further TCGA datasets, KIRC and KIRP, but similar results were observed only in KIRC (Figure 1C and Figure 1D). As is shown in Figure 1E and Figure 1F
To further analyze NOD2 gene expression, we tested NOD2 in the fourth TCGA dataset, which contains 468 samples with survival, grade and stage data, using the SurvExpress stratification functionality. The overall results for the fourth dataset are summarized in Figure 2A and 2B (CI = 59.7, Log-Rank Equal Curves p = 0.0008616, R^2 = 0.042/0.972, Risk Groups Hazard Ratio = 1.74 (conf. int. 1.25 ~ 2.41), p = 0.001002); the p-value was clearly significant. The 468 samples were then stratified by grade, stage, pathology, and death. The survival curves were clearly separated from each other in Figure 3A and Figure 4A, when all patients were grouped by tumor grade and stage.
However, when every subgroup was divided into two risk groups (Figure 3B-3E and Figure 4B-4D), the results of the stratification analysis for patients at each stage were ambiguous, except for Figure 4E. No statistical significance was observed. Similarly inconclusive results were seen when the risk subgroups were stratified by pathology and death; these are gathered in Figure 5 and Figure 6. The details of the stratification analysis results are given in Table 2.
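The reported hazard ratios can be cross-checked against their p-values from the confidence intervals alone. The sketch below assumes that each interval is a 95% Wald interval computed on the log hazard ratio scale. This is an assumption, since the text does not state how SurvExpress derives the interval. The function name `wald_from_hazard_ratio` is hypothetical.

```python
import math


def wald_from_hazard_ratio(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z and a two-sided p-value from a hazard ratio and its CI.

    Assumes a symmetric Wald interval on the log scale:
    log(hr) +/- z_crit * se, with z_crit = 1.96 for a 95% interval.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# KIPAN values quoted in the text: HR = 1.61, conf. int. 1.21 ~ 2.13.
se, z, p = wald_from_hazard_ratio(1.61, 1.21, 2.13)
```

Under this assumption the KIPAN values give z of about 3.3 and a two-sided p of about 0.001, in line with the reported p = 0.001005. Small differences are expected because the hazard ratio and interval are rounded to two decimals.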

DISCUSSION
Nearly 270,000 patients suffer from kidney cancer, which leads to over 115,000 deaths each year [14]. Kidney cancer comprises many histologically and genetically distinct types of cancer, and the histological type is associated with the clinical course and response to treatment [15,16]. If the kidney tumor is small, it can be surgically removed, and patients can expect 5- to 10-year survival. However, among patients who present with metastatic kidney cancer, nearly 80% will die of the disease within 2 years [17]. Multiple genes linked with kidney cancer, including VHL, MET, FLCN, fumarate hydratase, succinate dehydrogenase, TSC1, TSC2, and TFE3, have been identified by genomic studies and have significantly altered the way patients with kidney cancer are managed.
Although seven FDA-approved agents targeting the VHL pathway are available for the treatment of patients with advanced kidney cancer, further genomic studies, such as whole genome sequencing and gene expression profiling, are still needed to fully understand the genetic basis of kidney cancer and the kidney cancer gene pathways and, most importantly, to provide the foundation for the development of effective forms of therapy for patients with the disease [18]. More new cancer-related genes need to be investigated to provide useful guidance for cancer treatment.
The NOD2-like receptors (NLRs) belong to the evolutionarily conserved family of pattern recognition receptors (PRRs) located in the cytoplasm, and are closely related to responses, innate immunity and adaptive immunity [1]. NLRs are expressed in various cell types, including macrophages, neutrophils, epithelial and endothelial cells, and dendritic cells, as well as in malignant tumors [19]. Genetic variations in NOD1 and NOD2 are associated with increased susceptibility to Crohn's disease [20]. A. Marijke Keestra-Gounder found that NOD1 and NOD2, two members of the NLR family of PRRs, are important mediators of ER stress-induced inflammation. The association of NOD1 and NOD2 with pro-inflammatory responses induced by the IRE1-α/TRAF2 signaling pathway provides a novel link between innate immunity and ER stress-induced inflammation [21]. With the development of whole genome sequencing, plenty of data about the NOD2 gene in cancers can be acquired online from sources such as TCGA and Gene Expression Omnibus (GEO). Previously, most attention was paid to the relationship between NOD2 and common cancers other than kidney cancer. Some databases contained information about the NOD2 gene, but it was overlooked. Therefore, we reanalyzed the online data to clarify the role of NOD2 in kidney cancer patients.
Figure 1: (B) Box plot across risk groups in KIPAN, including the p-value for the difference from a t-test (or f-test for more than two groups). The ordinate (Y-axis) is the expression percentage of the gene; the abscissa (X-axis) represents the risk groups. (C) Kaplan-Meier curves for the risk groups, concordance index (CI), and p-value of the log-rank test for equality of survival curves in KIRC. Red and green curves denote high- and low-risk groups respectively. The ordinate (Y-axis) indicates the percentage of survival; the abscissa (X-axis) represents survival days and the number of survivors at the corresponding time. Censored samples are shown as "+" marks. The number of individuals, the number censored, and the CI of each risk group are shown in the top-right insets. (D) Box plot across risk groups in KIRC, with axes as in (B). (E) Kaplan-Meier curves for the risk groups, CI, and log-rank p-value in KIRP, with colors, axes, censoring marks and insets as in (C). (F) Box plot across risk groups in KIRP, with axes as in (B).
Oncotarget, www.impactjournals.com/oncotarget
The data used for reanalysis covered a wide variety of sample sizes. To obtain results representative of the greatest possible number of study parameters, every reanalysis included as much data as possible. The analysis results for the four TCGA datasets, including the Kaplan-Meier curves for the risk groups, the concordance index (CI), and the p-value of the log-rank test for equality of survival curves, are shown in Figures 1-6.
The NOD2 expression level appeared to be an adverse factor for the survival of kidney cancer patients in the KIPAN and KIRC datasets, as is evident in Figure 1A and Figure 1C, and the box plots across risk groups in Figure 1B and Figure 1D provide similar supporting evidence. As the figures show, the Kaplan-Meier survival curves, displayed in green and red, were distinctly separated from each other, and the p-value of the log-rank test was statistically significant. The concordance index (CI) is one of the most commonly used performance measures of survival models. It can be interpreted as the fraction of all pairs of subjects whose predicted survival times are correctly ordered among all subjects that can actually be ordered. In other words, it is the probability of concordance between the predicted and the observed survival. The concordance indices of KIPAN and KIRC (CI = 56.57 and 56.48) thus indicate the fraction of all pairs of samples whose predicted survival times are correctly ordered.
The reanalysis results of KIRP, shown in Figure 1E
To obtain more information about the NOD2 gene in kidney cancers and find out the reason for the result of KIRP, we used SurvExpress to stratify the 468 samples by death, grade, stage, and pathology. The details of this reanalysis, which were similar to those of KIPAN and KIRC, are presented graphically in Figure 2. Our reanalysis indicated that the NOD2 gene may be related to survival in kidney cancer. This relevance was also supported by the stratification analyses in Figure 3A and Figure 4A, according to tumor grade and stage respectively. Although all survival curves showed a trend toward separation in the deeper stratification analysis of the 468 samples, the p-values were not statistically significant. We infer that this may be due to the limited number of samples assigned to the different groups. The deeper analysis of the fourth dataset also showed a tendency for the NOD2 expression level to be an unfavorable signal for the survival of kidney cancer patients.
Many studies have suggested that the NOD2 gene may be relevant to cancers [21][22][23][24], especially gastric cancer, but this is the first time that the relationship between the NOD2 gene and the prognosis of kidney cancer has been examined by a reanalysis of sequencing data. Using deeper stratification analysis, we examined the relationship between the NOD2 gene, tumor stage, and the survival of kidney cancer patients. In the future, as more kidney cancer patient data become available and science develops, more of the unknown relevance of the NOD2 gene may be elucidated. In the present analysis, there was not enough evidence to confirm the association between NOD2 and kidney cancer. More sequencing data, studies on the mechanism of cancer occurrence, and clinical trials on the NOD2 gene, such as those in the inflammation and immune disease areas [25,26], are needed to further reveal its specific mechanism of action in kidney cancer.

Data collection
We obtained the data mainly from TCGA using keywords related to NOD2, kidney cancer, survival, and gene expression technologies. All TCGA data were obtained at the gene level (level 3). RNA-Seq count data were log2 transformed. We then analyzed the data with the help of online tools such as ITTACA, KMPlot, Recurrence Online, bc-GeneExMiner, GOBO, PrognoScan and SurvExpress. We found SurvExpress, available at http://bioinformatica.mty.itesm.mx/SurvExpress, to be the most convenient online tool for this analysis. It includes a tutorial that describes the analysis options, plots, tables, key concepts related to survival analysis, and representative methods for identifying biomarkers from gene expression data. The characteristics of all the data are shown in Table 1.
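As an illustration of the log2 transformation of count data, a minimal sketch follows. The pseudocount of 1, added so that zero counts remain defined, is an assumption and is not stated in the text. The function name `log2_transform` and the example counts are hypothetical.

```python
import math


def log2_transform(counts, pseudocount=1.0):
    """Return log2(count + pseudocount) for each raw count.

    The pseudocount (assumed here to be 1) avoids taking log2 of zero.
    """
    return [math.log2(c + pseudocount) for c in counts]


# Made-up counts for one gene across four samples.
values = log2_transform([0, 1, 3, 1023])
```

With these made-up counts the transformed values are 0, 1, 2 and 10, which shows how the transformation compresses the wide dynamic range of raw counts before the Cox fitting.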

Prognostic index estimation
During this analysis, we focused on the prognostic index (PI), also known as the risk score, which is commonly used to generate risk groups and is the linear component of the Cox model [27]: $PI = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$, where $x_i$ is the expression value of gene $i$ and each $\beta_i$ can be considered a risk coefficient. SurvExpress, which offers more procedures than the other tools, was adopted to estimate the β coefficients. In the first procedure, the classical Cox model, all genes are included in a single model, as performed in R (http://cran.r-project.org) using the survival package. In the second, a weight is set for each gene, using the values from the Cox fitting.
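The prognostic index is a weighted sum of expression values, which a short sketch makes concrete. The coefficients and expression values below are made up for illustration and are not the fitted values from this study. The function name `prognostic_index` is hypothetical.

```python
def prognostic_index(betas, expression):
    """Compute PI = beta_1 * x_1 + ... + beta_p * x_p for one sample."""
    if len(betas) != len(expression):
        raise ValueError("betas and expression must have the same length")
    return sum(b * x for b, x in zip(betas, expression))


# Hypothetical Cox coefficients for two genes and one sample's expression.
betas = [0.5, -0.25]
sample_expression = [4.0, 2.0]
pi_value = prognostic_index(betas, sample_expression)  # 0.5*4.0 - 0.25*2.0
```

For a single-gene analysis such as the one described here, the sum reduces to one term, so ordering samples by PI is the same as ordering them by expression (reversed if the coefficient is negative).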
The concordance index (CI) is one of the most commonly used performance measures of survival models. It can be interpreted as the fraction of all pairs of subjects whose predicted survival times are correctly ordered among all subjects that can actually be ordered [28]. In other words, it is the probability of concordance between the predicted and the observed survival. The CI, which quantifies the quality of rankings, is the standard performance measure for model assessment in survival analysis. Survival was plotted using the Kaplan-Meier method and analyzed using the log-rank test. The frequencies of categorical variables were compared using Pearson's χ2 test or Fisher's exact test, as appropriate. A value of P < 0.01 was considered significant.
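The pairwise definition of the concordance index can be sketched as follows. This is a simplified version of Harrell's estimator, given as an illustration and not as the SurvExpress implementation. It treats a pair as comparable only when the subject with the shorter time had an observed event, counts ties in the risk score as half concordant, and ignores pairs with tied times. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher risk score fails first.

    times: follow-up time per subject
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: predicted risk, e.g. the prognostic index (higher = worse)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair can be ordered only if subject i had an event
            # strictly before subject j was last observed.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four subjects, the third one is censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

In this toy example five pairs are comparable and four are correctly ordered, so the function returns 0.8. A value of 0.5 corresponds to random ordering. If the values quoted in this paper (for example 56.57) are percentages, which is an assumption, they would correspond to 0.5657 on this scale.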

Risk estimation
We used the first method, which generates the risk groups by splitting the ordered PI (higher values corresponding to higher risk) so that an equal number of samples falls in each of the chosen number of risk groups. For two risk groups, the PI is split at the median. The second method produces risk groups by applying an optimization algorithm to the ordered PI. In brief, for two groups, a log-rank test is performed at every value of the ordered PI, and the split point with the minimum p-value is chosen. For more than two groups, this procedure is generalized by repeatedly optimizing one risk group at a time until no changes are found. The whole risk estimation process can be completed easily with the online analysis tools [29]. In this survival analysis, the hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of risk groups.
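The two ways of forming two risk groups can be sketched in a few lines. The code below is an illustration under stated assumptions and is not the SurvExpress implementation. It uses a standard two-group log-rank statistic with a chi-square (1 degree of freedom) p-value, and a minimum group size of 2 for the optimized split. All function names and the toy data are hypothetical.

```python
import math


def logrank_p(times, events, groups):
    """Two-group log-rank test. groups holds 0 (low) or 1 (high) per subject.

    Returns (chi2, p) using the chi-square distribution with 1 df.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [(ti, e, g) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d1 = sum(1 for ti, e, g in at_risk if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def split_at(pi, cut):
    """Label the `cut` lowest PI values 0 (low risk) and the rest 1 (high risk)."""
    order = sorted(range(len(pi)), key=lambda i: pi[i])
    groups = [0] * len(pi)
    for i in order[cut:]:
        groups[i] = 1
    return groups


def median_split(pi):
    """First method: equal-sized groups, split at the median of the ordered PI."""
    return split_at(pi, len(pi) // 2)


def optimized_split(times, events, pi, min_size=2):
    """Second method: try every cut point and keep the minimum log-rank p-value."""
    best = None
    for cut in range(min_size, len(pi) - min_size + 1):
        groups = split_at(pi, cut)
        _, p = logrank_p(times, events, groups)
        if best is None or p < best[0]:
            best = (p, cut, groups)
    return best


# Toy data: eight subjects with made-up PI, follow-up days and event flags.
pi = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
times = [30, 28, 25, 27, 10, 8, 6, 4]
events = [0, 0, 1, 0, 1, 1, 1, 1]

median_groups = median_split(pi)
_, median_p = logrank_p(times, events, median_groups)
best_p, best_cut, best_groups = optimized_split(times, events, pi)
```

Because the optimized split also evaluates the median cut, its p-value can never be larger than that of the median split. For the same reason, a minimum p-value chosen this way is optimistically biased, and the median split is the more conservative choice.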