Clinical Research Papers:

A rank-based transcriptional signature for predicting relapse risk of stage II colorectal cancer identified with proper data sources

Wenyuan Zhao, Beibei Chen, Xin Guo, Ruiping Wang, Zhiqiang Chang, Yu Dong, Kai Song, Wen Wang, Lishuang Qi, Yunyan Gu, Chenguang Wang, Da Yang and Zheng Guo


Oncotarget. 2016; 7:19060-19071. https://doi.org/10.18632/oncotarget.7956



Wenyuan Zhao¹, Beibei Chen¹, Xin Guo¹, Ruiping Wang¹, Zhiqiang Chang¹, Yu Dong¹, Kai Song¹, Wen Wang¹, Lishuang Qi¹, Yunyan Gu¹, Chenguang Wang¹, Da Yang³,⁴,⁵ and Zheng Guo¹,²

1 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China

2 Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Department of Bioinformatics, Fujian Medical University, Fuzhou, China

3 Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, PA, USA

4 Women’s Cancer Research Center, University of Pittsburgh Cancer Institute, Pittsburgh, PA, USA

5 Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA

Correspondence to:

Zheng Guo, email:

Da Yang, email:

Keywords: gene expression profiles, prognostic signatures, gene pairs, experimental batch effect, relative expression

Received: December 11, 2015 Accepted: February 25, 2016 Published: March 07, 2016


Irreproducibility seriously hinders studies of transcriptional signatures for predicting relapse risk in early-stage colorectal cancer (CRC) patients. Reviewing 34 recently published studies that developed CRC prognostic signatures from gene expression profiles, we found, surprisingly, that 33 of them analyzed CRC samples from patients with and without adjuvant chemotherapy together in the training and/or validation datasets. This data misuse can be partially attributed to unclear and incomplete data annotation in public data sources. Furthermore, all the signatures proposed by these studies were based on risk scores summarized from gene expression levels, which are sensitive to experimental batch effects and to the risk composition of the samples analyzed together. To avoid these problems, we carefully selected three large, qualified datasets to develop and validate a signature consisting of three gene pairs. The within-sample relative expression orderings of these gene pairs could robustly predict relapse risk of stage II CRC samples assessed in different laboratories. Transcriptional and functional analyses provided clear evidence that the high-risk patients predicted by the proposed signature represent patients with micro-metastases.
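To illustrate why within-sample relative expression orderings are less sensitive to batch effects than score-based signatures, the idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the gene names are hypothetical placeholders, and the majority-vote rule (`min_votes=2`) is an assumed decision rule that the abstract does not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder gene pairs; NOT the genes reported in the paper.
GENE_PAIRS: List[Tuple[str, str]] = [
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
]


def pair_votes(sample: Dict[str, float],
               pairs: List[Tuple[str, str]]) -> List[bool]:
    """For each pair (a, b), vote True when a is expressed above b
    within this single sample. Only the ordering matters, not the values."""
    return [sample[a] > sample[b] for a, b in pairs]


def predict_risk(sample: Dict[str, float],
                 pairs: List[Tuple[str, str]] = GENE_PAIRS,
                 min_votes: int = 2) -> str:
    """Assumed rule: call a sample high risk when at least `min_votes`
    pairs vote True (a simple majority for three pairs)."""
    votes = sum(pair_votes(sample, pairs))
    return "high" if votes >= min_votes else "low"


# One sample's expression values (made-up numbers for illustration only).
sample = {"GENE_A": 8.1, "GENE_B": 6.3, "GENE_C": 4.0,
          "GENE_D": 7.2, "GENE_E": 9.5, "GENE_F": 9.1}

# Simulate a batch effect as a monotonically increasing distortion of the
# sample's measurements; the within-sample orderings are unchanged.
shifted = {gene: 2.0 * value + 5.0 for gene, value in sample.items()}

print(predict_risk(sample), predict_risk(shifted))
```

Because the prediction depends only on which gene of each pair is higher within one sample, any distortion that preserves those orderings leaves the call unchanged, whereas a risk score summed from expression levels would shift with it.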

Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License.
PII: 7956