A supervised term ranking model for diversity enhanced biomedical information retrieval.

Biomedical information retrieval Diversity-oriented retrieval Learning to rank Machine learning Supervised query expansion Term ranking model

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
02 Dec 2019
History:
entrez: 3 12 2019
pubmed: 4 12 2019
medline: 6 2 2020
Status: epublish

Abstract

The number of biomedical research articles has increased exponentially with the advancement of biomedicine in recent years, which makes it difficult for researchers to obtain the information they need. Information retrieval technologies seek to tackle this problem. However, information needs cannot be completely satisfied by directly applying existing information retrieval techniques. Biomedical information retrieval therefore focuses not only on the relevance of search results but also on their completeness, which is referred to as diversity-oriented retrieval. We address the diversity-oriented biomedical retrieval task using a supervised term ranking model. The model is learned through a supervised query expansion process for term refinement. Based on the model, the most relevant and diversified terms are selected to enrich the original query. The expanded query is then fed into a second retrieval to improve the relevance and diversity of search results. To this end, we propose three diversity-oriented optimization strategies in our model: a diversified term labeling strategy, biomedical resource-based term features, and a diversity-oriented group sampling learning method. Experimental results on the TREC Genomics collections demonstrate the effectiveness of the proposed model in improving the relevance and diversity of search results. The three proposed strategies jointly contribute to the improvement of biomedical retrieval performance. Our model yields more relevant and diversified results than the state-of-the-art baseline models. Moreover, our method provides a general framework for improving biomedical retrieval performance and can be used as the basis for future work.
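
The term selection and query expansion steps described in the abstract can be illustrated roughly as follows. This is a minimal sketch and not the authors' model: the relevance scores stand in for the output of a learned term ranker, the aspect sets, the `diversity_weight` parameter, and both function names are hypothetical, and the greedy aspect-coverage heuristic is a simple stand-in for the paper's diversity-oriented strategies.

```python
from typing import Dict, List, Set


def select_expansion_terms(
    scores: Dict[str, float],
    aspects: Dict[str, Set[str]],
    k: int,
    diversity_weight: float = 0.5,
) -> List[str]:
    """Greedily pick up to k terms, rewarding relevance and newly covered aspects.

    `scores` is assumed to come from some term ranking model (hypothetical here);
    `aspects` maps each term to the topic aspects it is assumed to cover.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    candidates = set(scores)

    def gain(term: str) -> float:
        # Relevance plus a bonus for each aspect not yet covered.
        new_aspects = aspects.get(term, set()) - covered
        return scores[term] + diversity_weight * len(new_aspects)

    while candidates and len(selected) < k:
        # Ties are broken alphabetically so the result is deterministic.
        best = min(candidates, key=lambda t: (-gain(t), t))
        selected.append(best)
        covered |= aspects.get(best, set())
        candidates.remove(best)
    return selected


def expand_query(query: str, terms: List[str]) -> str:
    """Append the selected terms to the original query string."""
    return " ".join([query] + terms)


# Toy data, invented purely for illustration.
toy_scores = {"p53": 0.9, "tp53": 0.85, "apoptosis": 0.6, "mdm2": 0.5}
toy_aspects = {
    "p53": {"gene"},
    "tp53": {"gene"},
    "apoptosis": {"process"},
    "mdm2": {"protein"},
}

diverse_terms = select_expansion_terms(toy_scores, toy_aspects, k=3)
relevance_only_terms = select_expansion_terms(
    toy_scores, toy_aspects, k=3, diversity_weight=0.0
)
expanded = expand_query("tumor suppressor", diverse_terms)
```

With the diversity bonus switched off, the selection falls back to the three highest-scoring terms, two of which cover the same aspect. With the bonus on, a lower-scoring term that covers a new aspect displaces the redundant one. The expanded query string would then be submitted to the second retrieval pass.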

Identifiers

pubmed: 31787087
doi: 10.1186/s12859-019-3080-2
pii: 10.1186/s12859-019-3080-2
pmc: PMC6886246

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

590

Authors

Bo Xu (B)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China. xubo@dlut.edu.cn.
State Key Laboratory of Cognitive Intelligence, iFLYTEK, Hefei, People's Republic of China. xubo@dlut.edu.cn.

Hongfei Lin (H)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China. hflin@dlut.edu.cn.

Liang Yang (L)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Kan Xu (K)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Yijia Zhang (Y)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Dongyu Zhang (D)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Zhihao Yang (Z)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Jian Wang (J)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Yuan Lin (Y)

WISE Lab, School of Public Administration and Law, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.

Fuliang Yin (F)

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Linggong Road, Dalian, People's Republic of China.
