Research Papers:

Prediction of the aquatic toxicity of aromatic compounds to tetrahymena pyriformis through support vector regression

Qiang Su, Wencong Lu, Dongshu Du, Fuxue Chen, Bing Niu and Kuo-Chen Chou


Oncotarget. 2017; 8:49359-49369. https://doi.org/10.18632/oncotarget.17210



Qiang Su1, Wencong Lu2, Dongshu Du1,3, Fuxue Chen1, Bing Niu1,4 and Kuo-Chen Chou4,5,6

1College of Life Science, Shanghai University, Shanghai 200444, China

2Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China

3Department of Life Science, Heze University, Shandong 274500, China

4Gordon Life Science Institute, Boston, MA 02478, USA

5Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China

6Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia

Correspondence to:

Bing Niu, email: [email protected], [email protected]

Fuxue Chen, email: [email protected]

Dongshu Du, email: [email protected]

Keywords: aromatic compounds, tetrahymena pyriformis, QSAR, genetic algorithm, mRMR

Received: March 10, 2017    Accepted: March 30, 2017    Published: April 13, 2017


Toxicity evaluation is an extremely important process during drug development. It usually begins with animal experiments, which are time-consuming and costly. To speed up this process, a quantitative structure-activity relationship (QSAR) study was performed to develop a computational model correlating the structures of 581 aromatic compounds with their aquatic toxicity to Tetrahymena pyriformis. A set of 68 molecular descriptors derived solely from the structures of the aromatic compounds was calculated using Gaussian 03, HyperChem 7.5, and TSAR V3.3. A comprehensive feature selection method combining minimum Redundancy Maximum Relevance (mRMR), a genetic algorithm (GA), and support vector regression (SVR) was applied to select the best descriptor subset for the QSAR analysis. The SVR method was then used to model toxic potency from a training set of 500 compounds, with five-fold cross-validation used to optimize the parameters of the SVR model. The resulting SVR model was tested on an independent dataset of 81 compounds. Both high internal consistency and high external predictive performance were obtained, indicating that the SVR model is a promising tool for rapid toxicity detection.
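To make the descriptor-selection idea in the abstract concrete, the following is a minimal sketch of an mRMR-style greedy ranking together with a five-fold split of a 500-compound training set. It is an illustration under stated assumptions, not the authors' implementation: absolute Pearson correlation stands in for the mutual information that mRMR normally uses, the GA and SVR stages are omitted, the data are synthetic, and the names `pearson`, `mrmr_rank`, and `five_fold_indices` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_rank(columns, target, k):
    """Greedily select k column indices.

    Score = relevance to the target minus mean redundancy with the
    columns already selected. ASSUMPTION: |Pearson r| is used as a
    simple proxy for mutual information.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def five_fold_indices(n, seed=0):
    """Shuffle range(n) and split it into five nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::5] for i in range(5)]


# Synthetic stand-in for descriptor columns (NOT data from the paper).
rng = random.Random(42)
n = 500
f0 = [rng.gauss(0, 1) for _ in range(n)]
f1 = [v + rng.gauss(0, 0.01) for v in f0]  # near-duplicate of f0 (redundant)
f2 = [rng.gauss(0, 1) for _ in range(n)]   # independent, informative
f3 = [rng.gauss(0, 1) for _ in range(n)]   # pure noise
target = [a + 0.6 * c + rng.gauss(0, 0.1) for a, c in zip(f0, f2)]

columns = [f0, f1, f2, f3]
ranking = mrmr_rank(columns, target, 2)
folds = five_fold_indices(n)
print("selected descriptor indices:", ranking)
print("fold sizes:", [len(f) for f in folds])
```

In this toy setup the redundancy penalty keeps the near-duplicate column from being picked alongside its twin, which is the behavior mRMR is designed to give. In a fuller pipeline, each fold would be held out in turn while an SVR model is fitted on the other four, and a GA would search over subsets of the top-ranked descriptors.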

Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 17210