Research Papers:

Distinct distributions of genomic features of the 5’ and 3’ partners of coding somatic cancer gene fusions: arising mechanisms and functional implications

Yongzhong Zhao, Won-Min Song, Fan Zhang, Ming-Ming Zhou, Weijia Zhang, Martin J. Walsh and Bin Zhang


Oncotarget. 2017; 8:66769-66783. https://doi.org/10.18632/oncotarget.10734



Yongzhong Zhao1,2, Won-Min Song1,2, Fan Zhang3, Ming-Ming Zhou4, Weijia Zhang3, Martin J. Walsh1,4,5 and Bin Zhang1,2

1Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA

2Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA

3Department of Medicine, Icahn School of Medicine at Mount Sinai, NY 10029, USA

4Department of Structural and Chemical Biology, Icahn School of Medicine at Mount Sinai, NY 10029, USA

5Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

Correspondence to:

Bin Zhang, email: [email protected]

Keywords: cancer somatic gene fusions, gene age, GC skew, DNA-RNA R-loops, somatic amplification

Received: April 02, 2016     Accepted: June 06, 2016     Published: July 20, 2016


The genomic features and arising mechanisms of coding cancer somatic gene fusions (CSGFs) remain largely elusive. In this study, we show a gene origin stratification pattern of CSGF partners: fusion partners in human cancers are significantly enriched for genes with the gene age of Euteleostomes and with the gene family age of Bilateria. GC skew, a measure of G versus C nucleotide content bias defined as (G-C)/(G+C), is a useful indicator of the DNA leading strand, lagging strand, replication origin, and replication terminus, and of DNA-RNA R-loop formation. We find GC skew bias at the 5 prime (5’) but not the 3 prime (3’) partners of CSGFs, coincident with a polarity in gene expression breadth: the 5’ partners are generally more ubiquitously expressed, while the 3’ partners are more tissue specific. We reveal distinct length and composition distributions of the 5’ and 3’ partners of CSGFs, including sequence features corresponding to the 5’ untranslated regions (UTRs), the 3’ UTRs, and the N-terminal sequences of the encoded proteins. Oncogenic somatic gene fusions are most enriched for somatic amplification of both the 5’ and 3’ genes, alongside a substantial proportion of other types of combinations. At the functional level, 5’ partners of CSGFs appear more likely to be tumour suppressor genes, while many 3’ partners appear to be proto-oncogenes. Such distinct polarities of CSGFs at the evolutionary, structural, genomic and functional levels indicate heterogeneous arising mechanisms of CSGFs, including R-loops, and suggest potential novel targeted therapeutics specific to CSGF functional categories.
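
As a small illustration of the GC skew definition quoted in the abstract, (G-C)/(G+C), the following Python sketch computes the skew of a DNA sequence and of sliding windows along it. The function names, the window and step parameters, and the choice to return 0.0 when a window contains no G or C are illustrative assumptions; this is not the authors' analysis pipeline.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a DNA sequence.

    Assumption: a sequence with no G and no C is assigned a skew of 0.0
    so that the ratio is always defined.
    """
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    total = g + c
    return (g - c) / total if total else 0.0


def windowed_gc_skew(seq: str, window: int, step: int) -> List[float]:
    """Return GC skew for each full-length window, moving by `step` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

A positive value means G outnumbers C on the strand being read, and a negative value means the reverse. A windowed profile along a gene is one simple way to look for the kind of strand bias that the abstract associates with the 5’ partners.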

Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License.
PII: 10734