Oncotarget

Research Papers:

Sequence-based predictive modeling to identify cancerlectins

Hong-Yan Lai, Xin-Xin Chen, Wei Chen, Hua Tang and Hao Lin

Oncotarget. 2017; 8:28169-28175. https://doi.org/10.18632/oncotarget.15963


Abstract

Hong-Yan Lai1, Xin-Xin Chen1, Wei Chen1,2, Hua Tang3, Hao Lin1

1Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China

2Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China

3Department of Pathophysiology, Southwest Medical University, Luzhou, China

Correspondence to:

Hua Tang, email: Tanghua771211@aliyun.com

Hao Lin, email: hlin@uestc.edu.cn

Keywords: cancerlectins, binomial distribution, optimal tripeptides, SVM

Received: January 18, 2017     Accepted: February 24, 2017     Published: March 07, 2017

ABSTRACT

Lectins are a diverse class of glycoproteins or carbohydrate-binding proteins that are widely distributed across species. They specifically recognize and exclusively bind to particular saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promotes the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracy of most of these methods is limited. In this work, we constructed a benchmark dataset based on the CancerLectinDB database, developed a new amino acid sequence-based strategy for feature description, and then applied the binomial distribution to screen for the optimal feature set. Finally, an SVM-based predictor was built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. These results indicate that our prediction model performs better than published predictive tools.
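The feature-screening step named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function names (`tripeptide_counts`, `binomial_tail`, `confidence_levels`) are hypothetical, tripeptides are counted with an overlapping window, and each tripeptide is scored by one common binomial formulation in which the confidence level is one minus the probability of seeing at least the observed count in a class by chance. It is not the authors' implementation.

```python
from collections import Counter
from math import comb


def tripeptide_counts(seq):
    """Count overlapping tripeptides in one protein sequence."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def binomial_tail(n_obs, n_total, q):
    """P(X >= n_obs) for X ~ Binomial(n_total, q)."""
    return sum(
        comb(n_total, m) * q ** m * (1 - q) ** (n_total - m)
        for m in range(n_obs, n_total + 1)
    )


def confidence_levels(pos_seqs, neg_seqs):
    """Score each tripeptide by its highest class-wise confidence level.

    Assumed formulation: CL = 1 - P(X >= observed count in the class),
    where the prior q of a class is its share of all tripeptide counts.
    """
    pos, neg = Counter(), Counter()
    for s in pos_seqs:
        pos.update(tripeptide_counts(s))
    for s in neg_seqs:
        neg.update(tripeptide_counts(s))

    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    q_pos = total_pos / (total_pos + total_neg)

    scores = {}
    for tri in set(pos) | set(neg):
        n_all = pos[tri] + neg[tri]
        cl_pos = 1 - binomial_tail(pos[tri], n_all, q_pos)
        cl_neg = 1 - binomial_tail(neg[tri], n_all, 1 - q_pos)
        scores[tri] = max(cl_pos, cl_neg)
    return scores


def top_features(scores, k):
    """Return the k tripeptides with the highest confidence level."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Under these assumptions, the tripeptides returned by `top_features` would form the reduced feature set whose frequencies are passed to a classifier such as an SVM. The classifier and the jackknife evaluation are left out of the sketch.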


Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License.
PII: 15963