Research Papers:
A framework for exploring associations between biomedical terms in PubMed
Haixiu Yang1,*, Lingling Zhao2,*, Ying Zhang3, Hong Ju4, Dong Wang5, Yang Hu6, Jun Zhang3 and Liang Cheng1
1College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China
2School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China
3Department of Rehabilitation and Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, PR China
4Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin 150001, PR China
5Department of Academic Research, Heilongjiang University of Science & Technology, Harbin 150022, PR China
6School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, PR China
*These authors contributed equally to this work
Correspondence to:
Liang Cheng, email: [email protected]
Jun Zhang, email: [email protected]
Yang Hu, email: [email protected]
Keywords: co-occurrence relationship, text mining, framework, term association
Received: June 12, 2017 Accepted: September 08, 2017 Published: October 05, 2017
ABSTRACT
Co-occurrence relationships between terms in PubMed accelerate the recognition of term associations. The lack of manually curated relationships in vocabularies and the rapid growth of the biomedical literature highlight the importance of co-occurrence relationships. Here we propose a framework for exploring term associations based on a standard procedure that combines multiple text-mining tools with methods for calculating relationship degree. PubMed text was first segmented into sentences with Apache OpenNLP, and the terms in each sentence were then recognized with MGREP. Two terms occurring in the same sentence were identified as a co-occurrence relationship. The relationship degree was then calculated using either the Normalized MEDLINE Distance (NMD) or the relationship-scaled score (RSS) method. The framework was applied to explore associations between terms of the Gene Ontology (GO) and the Disease Ontology (DO) based on co-occurrence relationships. The results show that term pairs with more co-occurrences share more semantic relationships in the ontologies and more genes. The associated terms identified from co-occurrence relationships were used to construct a disease association network (DAN). The small giant component of this network is consistent with the observation that diseases in the same class are more closely linked than diseases in different classes.
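The following is a minimal Python sketch of the sentence-level co-occurrence step followed by an NMD score. It is not the authors' implementation and rests on several assumptions. A regular-expression sentence splitter stands in for Apache OpenNLP. Whole-word dictionary lookup stands in for MGREP. Counts are taken over sentences. NMD is assumed to follow the Normalized Google Distance form, NMD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the number of sentences containing term x, f(x, y) is the number of sentences containing both terms, and N is the total number of sentences. The helper names (`split_sentences`, `recognize_terms`, `count_cooccurrences`, `nmd`) and the toy abstracts are hypothetical. The RSS method is not sketched here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Hypothetical stand-in for Apache OpenNLP sentence detection:
    # split after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize_terms(sentence, vocabulary):
    # Hypothetical stand-in for MGREP: case-insensitive whole-word lookup
    # of each vocabulary term in the sentence.
    lowered = sentence.lower()
    return {t for t in vocabulary
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)}


def count_cooccurrences(texts, vocabulary):
    """Count sentences containing each term and each unordered term pair."""
    term_counts = Counter()
    pair_counts = Counter()
    n_sentences = 0
    for text in texts:
        for sentence in split_sentences(text):
            n_sentences += 1
            terms = recognize_terms(sentence, vocabulary)
            term_counts.update(terms)
            # Two terms in the same sentence form one co-occurrence.
            pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_counts, pair_counts, n_sentences


def nmd(fx, fy, fxy, n):
    """Assumed NGD-style distance: 0 means the terms always co-occur."""
    if fxy == 0:
        return math.inf  # never co-occur: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Toy usage with invented example text (not data from the paper).
abstracts = [
    "Apoptosis is impaired in breast cancer. Obesity is a risk factor.",
    "Breast cancer cells evade apoptosis. Diabetes is linked to obesity.",
]
vocab = {"apoptosis", "breast cancer", "obesity", "diabetes"}
terms, pairs, n = count_cooccurrences(abstracts, vocab)
key = frozenset({"apoptosis", "breast cancer"})
print(pairs[key], nmd(terms["apoptosis"], terms["breast cancer"], pairs[key], n))
```

Lower NMD values indicate a stronger association. In the toy input, "apoptosis" and "breast cancer" appear only in the same sentences, so their distance is 0. A real pipeline would replace both stand-ins with the actual tools and compute counts over the PubMed corpus.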
All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 21532