Research Papers:

Identification of self-interacting proteins by exploring evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix

Ji-Yong An, Zhu-Hong You, Xing Chen, De-Shuang Huang, Zheng-Wei Li, Gang Liu and Yin Wang

Oncotarget. 2016; 7:82440-82449. https://doi.org/10.18632/oncotarget.12517


Ji-Yong An1,*, Zhu-Hong You2,*, Xing Chen3, De-Shuang Huang4, Zheng-Wei Li1, Gang Liu5, Yin Wang1

1School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

2Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China

3School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China

4School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

5College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China

*joint First Authors

Correspondence to:

Zhu-Hong You, email: zhuhongyou@gmail.com

Xing Chen, email: xingchen@amss.ac.cn

De-Shuang Huang, email: dshuang@tongji.edu.cn

Keywords: disease, position-specific scoring matrix, protein self-interaction, cancer

Received: June 29, 2016     Accepted: September 28, 2016     Published: October 08, 2016


Self-interacting proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because experimental identification of self-interacting proteins has limitations, developing an effective sequence-based computational method to detect SIPs is very important. In this study, we propose a novel computational approach called RVMBIGP that combines the Relevance Vector Machine (RVM) model and Bi-gram probability (BIGP) to predict SIPs from protein sequences. The proposed prediction model includes the following steps: (1) an effective feature extraction method named BIGP is used to represent protein sequences on the Position Specific Scoring Matrix (PSSM); (2) Principal Component Analysis (PCA) is employed to integrate the useful information and reduce the influence of noise; (3) the robust RVM classifier is used to carry out classification. When applied to yeast and human datasets, the proposed RVMBIGP model achieves very high accuracies of 95.48% and 98.80%, respectively. The experimental results show that our proposed method is very promising and may provide a cost-effective alternative for SIP identification. In addition, to facilitate extensive studies for future proteomics research, the RVMBIGP server is freely available for academic use at
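
The first two steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used bi-gram formulation, in which feature (m, n) sums the product of the PSSM probability of residue m at position i and residue n at position i+1 over the sequence, giving a 400-dimensional vector; the abstract itself does not state the formula. The function names bigram_features and pca_reduce, the toy random PSSMs, and the choice of 3 components are all hypothetical. The RVM classification step is omitted.

```python
import numpy as np

def bigram_features(pssm):
    """Bi-gram features from a PSSM (assumed formulation, see lead-in).

    pssm: array of shape (L, 20), one row of residue probabilities per position.
    Returns a flat vector of 20 * 20 = 400 features, where
    B[m, n] = sum_i pssm[i, m] * pssm[i + 1, n].
    """
    pssm = np.asarray(pssm, dtype=float)
    return (pssm[:-1].T @ pssm[1:]).ravel()

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Toy data: random row-normalised matrices standing in for real PSSMs.
rng = np.random.default_rng(0)
pssms = [rng.random((length, 20)) for length in (30, 45, 60, 25, 50)]
pssms = [p / p.sum(axis=1, keepdims=True) for p in pssms]

X = np.stack([bigram_features(p) for p in pssms])  # (5, 400) feature matrix
Z = pca_reduce(X, n_components=3)                  # (5, 3) reduced features
```

Because the bi-gram vector has a fixed length regardless of sequence length, proteins of different sizes can be stacked into one matrix before dimensionality reduction and classification.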

Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License.
PII: 12517