
Comprehensive landscape of subtype-specific coding and non-coding RNA transcripts in breast cancer

Trung Nghia Vu, Setia Pramana, Stefano Calza, Chen Suo, Donghwan Lee and Yudi Pawitan


Oncotarget. 2016; 7:68851-68863. https://doi.org/10.18632/oncotarget.11998



Trung Nghia Vu (1,*), Setia Pramana (1,*), Stefano Calza (1,2,*), Chen Suo (1), Donghwan Lee (1,3), Yudi Pawitan (1)

(1) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE 17177 Stockholm, Sweden

(2) Department of Molecular and Translational Medicine, University of Brescia, 25125 Brescia, Italy

(3) Department of Statistics, Ewha Womans University, Seodaemun-gu, Seoul 120-750, South Korea

(*) These authors contributed equally to this work.

Correspondence to:

Yudi Pawitan

Keywords: breast cancer, RNA sequencing, subtype-specific isoforms, subtype co-expression, non-coding RNAs

Received: June 30, 2016     Accepted: August 24, 2016     Published: September 13, 2016


Molecular classification of breast cancer into clinically relevant subtypes helps to improve prognostication and adjuvant-treatment decisions. The aim of this study is to better characterize the molecular subtypes by providing a comprehensive landscape of subtype-specific isoforms, including coding, long non-coding RNA (lncRNA) and microRNA (miRNA) transcripts. Isoform-level expression of all coding and non-coding RNAs is estimated from RNA-sequencing data of 1168 breast samples obtained from The Cancer Genome Atlas (TCGA) project. We then systematically search the whole transcriptome for subtype-specific isoforms using a novel algorithm based on a robust quasi-Poisson model. We discover 5451 isoforms that are specific to a single subtype. Of these subtype-specific isoforms, 27% classify the intrinsic subtypes more accurately than their corresponding genes. We find three subtype-specific miRNAs and 707 subtype-specific lncRNAs. Isoforms from lncRNAs also separate the Luminal A and Luminal B subtypes well, with an AUC of 0.97 in the discovery set and 0.90 in the validation set. In addition, we discover 1500 isoforms preferentially co-expressed in two subtypes, including 369 isoforms co-expressed in both the Normal-like and Basal subtypes, which are commonly considered to have distinct estrogen receptor (ER) status. Finally, protein-level analyses reveal four subtype-specific proteins and two proteins with subtype co-expression, which confirm the isoform-level results.
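The abstract names a quasi-Poisson model but does not specify it. For readers unfamiliar with the idea, the following is a minimal illustrative sketch of a plain (non-robust) quasi-Poisson comparison of one subtype against all other samples for a single isoform. It is not the authors' algorithm. It makes several assumptions of its own: a per-sample library-size offset, one rate parameter per group, and a Pearson-based dispersion estimate φ with Var(y) = φ·μ. The function name `quasi_poisson_one_vs_rest` is hypothetical.

```python
import numpy as np

def quasi_poisson_one_vs_rest(counts, lib_sizes, in_subtype):
    """Hypothetical sketch: quasi-Poisson test of one subtype vs. the rest.

    Fits log E[y_i] = log(s_i) + beta_g, with g = subtype or rest, and inflates
    the Poisson variance by a dispersion factor phi (Var[y_i] = phi * mu_i).
    This is NOT the robust algorithm used in the paper. It assumes both groups
    have a nonzero total count.
    """
    y = np.asarray(counts, dtype=float)
    s = np.asarray(lib_sizes, dtype=float)
    g = np.asarray(in_subtype, dtype=bool)

    # Closed-form Poisson MLE per group: exp(beta_g) = sum(y) / sum(s) in group.
    rate_in = y[g].sum() / s[g].sum()
    rate_out = y[~g].sum() / s[~g].sum()
    mu = np.where(g, s * rate_in, s * rate_out)

    # Pearson-based dispersion with n - p degrees of freedom (p = 2 parameters).
    phi = np.sum((y - mu) ** 2 / mu) / (y.size - 2)

    # Log fold change; at the MLE the Fisher information for beta_g is the
    # group's total count, so Var(beta_g) = phi / sum(y in group).
    log_fc = np.log(rate_in) - np.log(rate_out)
    se = np.sqrt(phi * (1.0 / y[g].sum() + 1.0 / y[~g].sum()))
    return log_fc, phi, log_fc / se
```

A large positive Wald statistic points to an isoform that is up-regulated in the chosen subtype. Fitting with the dispersion, instead of the plain Poisson variance, keeps overdispersed RNA-seq counts from producing overconfident calls. The robustness step the abstract mentions (for example, down-weighting outlying samples) is not modelled here.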

All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 11998