Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes

Hui Peng; Chaowang Lan; Yuansheng Liu; Tao Liu; Michael Blumenstein; Jinyan Li

doi:10.18632/oncotarget.20481

Oncotarget

Oncotarget. 2017; 8:78901-78916. https://doi.org/10.18632/oncotarget.20481

Hui Peng¹, Chaowang Lan¹, Yuansheng Liu¹, Tao Liu², Michael Blumenstein³ and Jinyan Li¹

¹ Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia

² Centre for Childhood Cancer Research, University of New South Wales, Sydney, Kensington, NSW, Australia

³ School of Software, University of Technology Sydney, Broadway, NSW, Australia

Correspondence to:

Jinyan Li, email: [email protected]

Keywords: chromosome preference, vectorization, long noncoding RNA

Received: April 23, 2017 Accepted: July 19, 2017 Published: August 24, 2017

ABSTRACT

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector representation for diseases and applies the vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. The representation consists of two sub-vectors. The first has 45 elements, which characterize the information entropies of the distribution of disease genes over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the chromosome 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional and characterizes the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. It complements the first sub-vector in differentiating between diseases. Our prediction method outperforms state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
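
The abstract does not state the exact formula behind the 45-element sub-vector, so the following is only a minimal sketch of one plausible reading. It assumes each element is the entropy term -p·log2(p), where p is the fraction of a disease's known genes that fall in a given chromosome substructure. The function name `entropy_terms`, the shortened substructure list, and the gene locations are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def entropy_terms(gene_locations, substructures):
    """Return one value per substructure: -p * log2(p).

    p is the fraction of the disease's genes located in that substructure.
    This per-element definition is an assumption, not the paper's formula.
    """
    counts = Counter(gene_locations)
    total = sum(counts[s] for s in substructures)
    vector = []
    for s in substructures:
        p = counts[s] / total if total else 0.0
        # By convention, a substructure with no genes contributes 0.
        vector.append(-p * math.log2(p) if p > 0 else 0.0)
    return vector


# Hypothetical, shortened list of chromosome arms (the paper uses 45).
substructures = ["1p", "1q", "6p", "6q", "21p", "21q"]
# Hypothetical locations of four genes linked to one disease.
gene_locations = ["6p", "6p", "1q", "6p"]

vector = entropy_terms(gene_locations, substructures)
print(dict(zip(substructures, vector)))
```

Under this reading, the elements sum to the Shannon entropy of the disease's gene distribution over substructures, and a substructure that holds none of the disease's genes contributes zero.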

All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 20481
