Research Papers:

Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers

Akram Mohammed, Greyson Biegert, Jiri Adamec and Tomáš Helikar

Oncotarget. 2017; 8:85692-85715. https://doi.org/10.18632/oncotarget.21127

Akram Mohammed¹, Greyson Biegert¹, Jiri Adamec¹ and Tomáš Helikar¹

¹Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA

Correspondence to:

Tomáš Helikar, email: [email protected]

Keywords: cancer classification, biomarker identification, microarray gene expression, machine learning, cancer biomarker

Received: June 08, 2017     Accepted: September 05, 2017     Published: September 21, 2017


Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent “OMICS” studies that include a variety of cancer and normal tissue samples, combined with machine learning approaches, have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29% and 100%. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89%. Given a sample of non-specific tissue type, the multi-tissue multi-class model classified the sample as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43%. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35%. The machine learning classifiers developed in this study identified potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and pointed to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.
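The evaluation scheme named in the abstract (ten-fold cross-validation, reported as accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, every function name is hypothetical, and a simple nearest-centroid rule stands in for the random forests classifier so that the example needs only the Python standard library.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_centroids(X, y):
    """Placeholder model (NOT a random forest): mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)),
    )


def cross_validate(X, y, k=10, positive="cancer"):
    """Pool held-out predictions over k folds and summarise them."""
    tp = tn = fp = fn = 0
    for held_out in stratified_folds(y, k):
        test = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in test]
        train_y = [y[i] for i in range(len(y)) if i not in test]
        model = fit_centroids(train_X, train_y)
        for i in held_out:
            guess = predict(model, X[i])
            if y[i] == positive and guess == positive:
                tp += 1
            elif y[i] == positive:
                fn += 1
            elif guess == positive:
                fp += 1
            else:
                tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of cancer samples detected
        "specificity": tn / (tn + fp),  # fraction of normal samples cleared
    }


# Synthetic stand-in for expression data: 60 "normal" and 60 "cancer"
# samples with 5 features each, drawn from shifted Gaussians.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(60)]
X += [[rng.gauss(3.0, 1.0) for _ in range(5)] for _ in range(60)]
y = ["normal"] * 60 + ["cancer"] * 60

folds = stratified_folds(y, k=10)
metrics = cross_validate(X, y, k=10)
print(metrics)
```

Each sample is held out exactly once, so the pooled counts cover the whole data set. Stratifying the folds keeps the cancer-to-normal ratio roughly constant across them, which matters when one class is much smaller than the other.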

All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 21127