Oncotarget

Research Papers:

Tumor gene expression data classification via sample expansion-based deep learning

Jian Liu, Xuesong Wang, Yuhu Cheng and Lin Zhang

Oncotarget. 2017; 8:109646-109660. https://doi.org/10.18632/oncotarget.22762


Abstract

Jian Liu1,*, Xuesong Wang1,*, Yuhu Cheng1 and Lin Zhang1

1 School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China

*Joint First Authors

Correspondence to:

Xuesong Wang, email: [email protected]

Yuhu Cheng, email: [email protected]

Keywords: gene expression data; classification; sample expansion; deep learning; 1-dimensional convolutional neural network

Received: August 31, 2017    Accepted: October 29, 2017    Published: November 30, 2017

ABSTRACT

Because tumors seriously harm human health, effective diagnostic measures are urgently needed for tumor therapy. Early detection of tumors is particularly important for better treatment of patients. A notable issue is how to effectively discriminate tumor samples from normal ones. Many classification methods, such as Support Vector Machines (SVMs), have been proposed for tumor classification. Recently, deep learning has achieved satisfactory performance in classification tasks in many areas. However, deep learning is rarely applied to tumor classification because gene expression data provide too few training samples. In this paper, a Sample Expansion method is proposed to address this problem. Inspired by the idea of the Denoising Autoencoder (DAE), a large number of samples are obtained by randomly cleaning partially corrupted input many times. The expanded samples not only retain the merits of corrupted data in the DAE but also alleviate, to a certain extent, the shortage of training samples of gene expression data. Since Stacked Autoencoder (SAE) and Convolutional Neural Network (CNN) models show excellent performance in classification tasks, the applicability of SAE and 1-dimensional CNN (1DCNN) to gene expression data is analyzed. Finally, two deep learning models, Sample Expansion-Based SAE (SESAE) and Sample Expansion-Based 1DCNN (SE1DCNN), are designed to classify tumor gene expression data using the expanded samples. Experimental studies indicate that SESAE and SE1DCNN are very effective in tumor classification.
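
The sample expansion idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the corruption mechanism, so this example assumes DAE-style masking noise, in which a random fraction of each sample's features is set to zero. The function name `expand_samples`, the parameters `n_copies` and `corruption_rate`, the toy data, and the choice to keep the original samples alongside the corrupted copies are all hypothetical and are not taken from the paper.

```python
import numpy as np


def expand_samples(X, y, n_copies=10, corruption_rate=0.2, seed=0):
    """Return the original samples plus n_copies randomly masked variants.

    Assumption: corruption is masking noise (random features set to zero),
    as commonly used in denoising autoencoders. The paper's exact scheme
    may differ.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    expanded_X = [X]  # keep the uncorrupted samples (an assumption)
    expanded_y = [y]
    for _ in range(n_copies):
        # Each feature is kept with probability 1 - corruption_rate.
        keep = rng.random(X.shape) >= corruption_rate
        expanded_X.append(X * keep)
        expanded_y.append(y)  # a corrupted copy inherits its source label

    return np.vstack(expanded_X), np.concatenate(expanded_y)


# Toy stand-in for a gene expression matrix: 3 samples, 4 genes.
X = np.arange(1, 13, dtype=float).reshape(3, 4)
y = np.array([0, 1, 0])

X_big, y_big = expand_samples(X, y, n_copies=5, corruption_rate=0.25)
print(X_big.shape, y_big.shape)
```

With 3 input samples and `n_copies=5`, the expanded set holds 18 samples, each labeled like the sample it was derived from. The expanded matrix could then be passed to a classifier such as an SAE or a 1DCNN in place of the original training set.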


Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 22762