Research Papers:

A framework for exploring associations between biomedical terms in PubMed

Haixiu Yang, Lingling Zhao, Ying Zhang, Hong Ju, Dong Wang, Yang Hu, Jun Zhang and Liang Cheng


Oncotarget. 2017; 8:103100-103107. https://doi.org/10.18632/oncotarget.21532



Haixiu Yang1,*, Lingling Zhao2,*, Ying Zhang3, Hong Ju4, Dong Wang5, Yang Hu6, Jun Zhang3 and Liang Cheng1

1College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China

2School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China

3Department of Rehabilitation and Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, PR China

4Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin 150001, PR China

5Department of Academic Research, Heilongjiang University of Science & Technology, Harbin 150022, PR China

6School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China

*These authors contributed equally to this work

Correspondence to:

Liang Cheng, email: [email protected]

Jun Zhang, email: [email protected]

Yang Hu, email: [email protected]

Keywords: co-occurrence relationship, text mining, framework, term association

Received: June 12, 2017     Accepted: September 08, 2017     Published: October 05, 2017


Co-occurrence relationships between terms in PubMed accelerate the recognition of term associations. The lack of manually curated relationships in vocabularies and the rapid growth of the biomedical literature highlight the importance of co-occurrence relationships. Here we propose a framework for exploring term associations based on a standard procedure that combines multiple text mining tools with methods for calculating relationship degree. PubMed text was first segmented into sentences by Apache OpenNLP, and the terms in each sentence were then recognized by MGREP. Two terms occurring in the same sentence were identified as a co-occurrence relationship. The relationship degree was then calculated using the Normalized MEDLINE Distance (NMD) or the relationship-scaled score (RSS) method. The framework was used to explore associations between terms of Gene Ontology (GO) and Disease Ontology (DO) based on co-occurrence relationships. The results show that pairs of terms with more co-occurrence relationships tend to share more ontology-based semantic relationships and more genes. The associated terms identified from co-occurrence relationships were used to construct a disease association network (DAN). The small giant component is consistent with the observation that diseases in the same class have more links than diseases in different classes.
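
The sentence-level co-occurrence procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a regular-expression splitter stands in for Apache OpenNLP, a case-insensitive substring match stands in for MGREP, and the distance function assumes that NMD takes the same form as the Normalized Google Distance, with sentence counts used as frequencies. The function names, example texts and vocabulary are hypothetical, and RSS is not sketched.

```python
import math
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Assumption: naive regex splitter standing in for Apache OpenNLP.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize_terms(sentence, vocabulary):
    # Assumption: case-insensitive substring match standing in for MGREP.
    lowered = sentence.lower()
    return {term for term in vocabulary if term.lower() in lowered}


def count_cooccurrences(texts, vocabulary):
    # Count, per sentence, each recognized term and each unordered term pair.
    term_counts = Counter()
    pair_counts = Counter()
    n_sentences = 0
    for text in texts:
        for sentence in split_sentences(text):
            n_sentences += 1
            terms = recognize_terms(sentence, vocabulary)
            term_counts.update(terms)
            pair_counts.update(
                frozenset(pair) for pair in combinations(sorted(terms), 2)
            )
    return term_counts, pair_counts, n_sentences


def ngd_style_distance(fx, fy, fxy, n):
    # Assumption: NMD is taken to follow the Normalized Google Distance form,
    # (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy)).
    if fxy == 0:
        return math.inf  # the two terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(lx, ly)
    if denominator == 0:
        return 0.0  # both terms occur in every sentence
    return (max(lx, ly) - lxy) / denominator


# Hypothetical toy input, not data from the paper.
texts = [
    "Apoptosis is altered in breast cancer. Breast cancer cells evade apoptosis.",
    "Glycolysis is elevated in diabetes. Apoptosis was not measured.",
]
vocabulary = ["apoptosis", "breast cancer", "glycolysis", "diabetes"]

term_counts, pair_counts, n_sentences = count_cooccurrences(texts, vocabulary)
pair = frozenset({"apoptosis", "breast cancer"})
distance = ngd_style_distance(
    term_counts["apoptosis"],
    term_counts["breast cancer"],
    pair_counts[pair],
    n_sentences,
)
print(pair_counts[pair], round(distance, 3))
```

Under this assumed form, a smaller distance indicates a stronger association, and a pair that never shares a sentence receives an infinite distance. Replacing the two stand-in functions with real sentence detection and term recognition leaves the counting and scoring steps unchanged.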

Creative Commons License All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 21532