Research Papers:
Machine learning-based survival prediction in colorectal cancer combining clinical and biological features
Lucas M. Vieira1,2, Natasha A.N. Jorge3, João B. Sousa4, João C. Setubal5, Peter F. Stadler3 and Maria E.M.T. Walter1
1 Department of Computer Science, University of Brasília, Campus Universitario Darcy Ribeiro, Prédio CIC/EST, Brasília, DF 71910-900, Brazil
2 Current Affiliation - Department of Pharmacology, School of Medicine, University of California San Diego, California, CA 92093, USA
3 Bioinformatics, Institute for Informatics, Leipzig University, Leipzig, Saxony 610101, Germany
4 Division of Coloproctology, Department of Surgery, University of Brasilia, Campus Universitario Darcy Ribeiro, Faculdade de Medicina, Brasília, DF 70910-900, Brazil
5 Institute of Chemistry, Department of Biochemistry, University of São Paulo, Av. Prof. Lineu Prestes, São Paulo, SP 05508-000, Brazil
Correspondence to: Lucas M. Vieira
Keywords: colorectal cancer; machine learning; feature selection; non-coding RNAs; genes
Received: April 19, 2025 Accepted: November 24, 2025 Published: December 15, 2025
ABSTRACT
Colorectal cancer (CRC) is one of the most common and lethal types of cancer worldwide. Understanding both the biological and clinical characteristics of patients is essential to uncover the mechanisms underlying CRC prognosis. However, most current approaches focus primarily on either clinical or biological features, which can limit their ability to capture the full complexity of CRC prognosis. This study aims to improve understanding of CRC mechanisms by combining clinical and biological data from CRC patients with machine learning (ML) techniques to assess feature importance and predict patient survival. First, we performed differential expression analysis and inspected patient survival curves to identify relevant biological features. Then, we applied ML techniques to evaluate the individual impact of each clinical and biological feature on patient survival. E2F8, WDR77, and hsa-miR-495-3p stood out as biological features, while pathological stage, age, new tumor event, lymph node count, and chemotherapy emerged as informative clinical features. Furthermore, our ML model achieved an accuracy of 89.58% in predicting patient survival. The clinical and biological features proposed here, in conjunction with ML, can improve the interpretation of CRC mechanisms and the prediction of patient survival.
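The abstract does not state how the survival curves were estimated or how patients were grouped. As a minimal sketch, assuming the widely used Kaplan-Meier (product-limit) estimator and a hypothetical median split of a single feature's expression into "high" and "low" groups, the Python below shows how per-group survival curves can be derived from follow-up times and event indicators. The function names `kaplan_meier` and `survival_by_expression_group` and the toy data are illustrative only, not the authors' code or data.

```python
from collections import Counter
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t), returned as [(t, S(t))] at each event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients leaving the risk set at t (deaths + censored)
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]  # Counter returns 0 for times with no observed death
        if d > 0:
            # Product-limit step: multiply by the conditional survival at time t
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve


def survival_by_expression_group(expression, times, events):
    """Hypothetical helper: median-split one feature, then estimate S(t) per group."""
    cutoff = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for x, t, e in zip(expression, times, events):
        g_times, g_events = groups["high" if x > cutoff else "low"]
        g_times.append(t)
        g_events.append(e)
    return {name: kaplan_meier(ts, es) for name, (ts, es) in groups.items()}


if __name__ == "__main__":
    # Toy data for five hypothetical patients (not taken from the study)
    expression = [1.0, 5.0, 2.0, 6.0, 3.0]
    times = [2, 3, 3, 5, 8]
    events = [1, 1, 0, 1, 0]
    print(kaplan_meier(times, events))
    print(survival_by_expression_group(expression, times, events))
```

Clearly diverging curves between the high and low groups would flag a feature as potentially prognostic; a formal comparison, such as a log-rank test, would still be needed before drawing conclusions.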
All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 28783
