Research Papers:
Tumor gene expression data classification via sample expansion-based deep learning
Jian Liu1,*, Xuesong Wang1,*, Yuhu Cheng1 and Lin Zhang1
1School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
*Joint First Authors
Correspondence to:
Xuesong Wang, email: [email protected]
Yuhu Cheng, email: [email protected]
Keywords: gene expression data; classification; sample expansion; deep learning; 1-dimensional convolutional neural network
Received: August 31, 2017 Accepted: October 29, 2017 Published: November 30, 2017
ABSTRACT
Because tumors are seriously harmful to human health, effective diagnostic measures are urgently needed for tumor therapy. Early detection of tumors is particularly important for better treatment of patients. A notable issue is how to effectively discriminate tumor samples from normal ones. Many classification methods, such as Support Vector Machines (SVMs), have been proposed for tumor classification. Recently, deep learning has achieved satisfactory performance in classification tasks in many areas. However, deep learning is rarely applied to tumor classification because gene expression data provide too few training samples. In this paper, a Sample Expansion method is proposed to address this problem. Inspired by the idea of the Denoising Autoencoder (DAE), a large number of samples are obtained by randomly cleaning partially corrupted input many times. The expanded samples not only retain the merits of corrupted data in DAE but also alleviate, to a certain extent, the shortage of training samples in gene expression data. Since the Stacked Autoencoder (SAE) and Convolutional Neural Network (CNN) models show excellent performance in classification tasks, the applicability of SAE and 1-dimensional CNN (1DCNN) to gene expression data is analyzed. Finally, two deep learning models, Sample Expansion-Based SAE (SESAE) and Sample Expansion-Based 1DCNN (SE1DCNN), are designed to classify tumor gene expression data using the expanded samples. Experimental studies indicate that SESAE and SE1DCNN are very effective in tumor classification.
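The sample expansion idea can be illustrated with a minimal sketch. The abstract does not specify the corruption procedure, so the code below assumes the masking noise commonly used with denoising autoencoders: each expression value is independently set to zero with a fixed probability, and the process is repeated several times per sample. The function name expand_samples, its parameters (n_copies, corruption_rate, seed), and the toy data are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import random


def expand_samples(samples, labels, n_copies=10, corruption_rate=0.2, seed=0):
    """Return n_copies randomly masked versions of every sample.

    Assumption: "corruption" means masking noise, i.e. each feature is
    set to 0.0 with probability corruption_rate. The paper may use a
    different scheme.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    expanded_samples, expanded_labels = [], []
    for sample, label in zip(samples, labels):
        for _ in range(n_copies):
            # Draw a fresh random mask for every copy of the sample.
            corrupted = [
                0.0 if rng.random() < corruption_rate else value
                for value in sample
            ]
            expanded_samples.append(corrupted)
            # Each corrupted copy keeps the label of its source sample.
            expanded_labels.append(label)
    return expanded_samples, expanded_labels


# Toy data (made up): two samples with four expression values each.
samples = [[2.1, 0.3, 5.7, 1.2], [0.4, 3.9, 0.8, 4.4]]
labels = [1, 0]  # 1 = tumor, 0 = normal

expanded_samples, expanded_labels = expand_samples(samples, labels, n_copies=5)
print(len(expanded_samples))  # 10 (2 samples x 5 copies)
```

Under these assumptions, the expanded set would then be used in place of the original samples to train a classifier such as an SAE or a 1DCNN; the corruption rate and the number of copies are tuning choices that this sketch does not attempt to set.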
All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 22762