Identification of self-interacting proteins by exploring evolutionary information embedded in PSI-BLAST-constructed position specific scoring matrix
Ji-Yong An1,*, Zhu-Hong You2,*, Xing Chen3, De-Shuang Huang4, Zheng-Wei Li1, Gang Liu5, Yin Wang1
1School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
2Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
3School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
4School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
5College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
*Joint first authors
Zhu-Hong You, email: email@example.com
Xing Chen, email: firstname.lastname@example.org
De-Shuang Huang, email: email@example.com
Keywords: disease, position-specific scoring matrix, protein self-interaction, cancer
Received: June 29, 2016 Accepted: September 28, 2016 Published: October 08, 2016
Self-interacting proteins (SIPs) play an essential role in a wide range of biological processes, such as gene expression regulation, signal transduction, enzyme activation and immune response. Because experimental methods for identifying self-interacting proteins have limitations, it is important to develop an effective sequence-based computational method for detecting SIPs. In this study, we propose a novel computational approach called RVMBIGP, which combines the Relevance Vector Machine (RVM) model with Bi-gram probability (BIGP) features to predict SIPs from protein sequences. The prediction model consists of the following steps: (1) an effective feature extraction method, BIGP, is applied to the Position Specific Scoring Matrix (PSSM) of each protein sequence to represent it numerically; (2) Principal Component Analysis (PCA) is employed to integrate the useful information and reduce the influence of noise; (3) the robust Relevance Vector Machine (RVM) classifier carries out the classification. On the yeast and human datasets, the proposed RVMBIGP model achieves high accuracies of 95.48% and 98.80%, respectively. These experimental results show that the proposed method is promising and may provide a cost-effective alternative for SIP identification. In addition, to facilitate future proteomics research, the RVMBIGP server is freely available for academic use at.
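Step (1) can be illustrated with a short sketch. It assumes the common bigram-probability formulation in which, for an L×20 PSSM P whose entries have been mapped into (0, 1), each feature is B(m, n) = Σ_{i=1}^{L−1} P(i, m)·P(i+1, n), giving a 400-dimensional vector per sequence. The logistic normalization and the helper names `normalize_pssm` and `bigram_features` are illustrative assumptions, not the authors' RVMBIGP code, and the PCA and RVM stages are not shown.

```python
import math
from typing import List, Sequence


def _logistic(x: float) -> float:
    """Numerically stable logistic function mapping a real score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def normalize_pssm(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map raw PSSM log-odds scores into (0, 1).

    Assumption: a logistic transform is used as preprocessing; the paper's
    exact normalization may differ.
    """
    return [[_logistic(x) for x in row] for row in pssm]


def bigram_features(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Return B[m][n] = sum_i P[i][m] * P[i+1][n], flattened row-major.

    For a standard L x 20 PSSM this yields 20 * 20 = 400 features,
    independent of the sequence length L.
    """
    if not pssm:
        raise ValueError("empty PSSM")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")

    b = [[0.0] * n_cols for _ in range(n_cols)]
    # Accumulate products of scores at consecutive positions i and i + 1.
    for i in range(len(pssm) - 1):
        cur, nxt = pssm[i], pssm[i + 1]
        for m in range(n_cols):
            cm = cur[m]
            for n in range(n_cols):
                b[m][n] += cm * nxt[n]
    return [v for row in b for v in row]


if __name__ == "__main__":
    # Toy 5 x 20 PSSM of all-zero scores (hypothetical data, not from the paper).
    toy = [[0.0] * 20 for _ in range(5)]
    features = bigram_features(normalize_pssm(toy))
    print(len(features))  # 400 features; each equals 4 * 0.5 * 0.5 = 1.0 here
```

Because the sum runs over sequence positions, every protein is reduced to the same fixed-length vector regardless of its length, which is what lets a dimensionality-reduction step such as PCA and a standard classifier operate on sequences of different lengths.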
All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License.