A supervised term ranking model for diversity enhanced biomedical information retrieval.
Biomedical information retrieval
Diversity-oriented retrieval
Learning to rank
Machine learning
Supervised query expansion
Term ranking model
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
02 Dec 2019
History:
entrez: 3 Dec 2019
pubmed: 4 Dec 2019
medline: 6 Feb 2020
Status:
epublish
Abstract
Abstract sections
BACKGROUND
The number of biomedical research articles has increased exponentially with the advancement of biomedicine in recent years. This growth makes it difficult for researchers to obtain the information they need. Information retrieval technologies seek to tackle the problem. However, information needs cannot be completely satisfied by directly applying existing information retrieval techniques. Therefore, biomedical information retrieval focuses not only on the relevance of search results but also on their completeness, which is referred to as diversity-oriented retrieval.
RESULTS
We address the diversity-oriented biomedical retrieval task using a supervised term ranking model. The model is learned through a supervised query expansion process for term refinement. Based on the model, the most relevant and diversified terms are selected to enrich the original query. The expanded query is then fed into a second retrieval to improve the relevance and diversity of search results. To this end, we propose three diversity-oriented optimization strategies in our model: a diversified term labeling strategy, biomedical resource-based term features, and a diversity-oriented group sampling learning method. Experimental results on the TREC Genomics collections demonstrate the effectiveness of the proposed model in improving the relevance and diversity of search results.
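The expansion pipeline described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two toy features, the linear scorer with fixed `weights`, and all function names are hypothetical stand-ins for the learned term ranking model, and the diversity-oriented labeling, biomedical resource-based features and group sampling are not modeled here.

```python
from collections import Counter


def candidate_terms(feedback_docs, query_terms):
    """Count terms in the top-ranked (feedback) documents, skipping query terms."""
    counts = Counter()
    for doc in feedback_docs:
        for term in doc.lower().split():
            if term not in query_terms:
                counts[term] += 1
    return counts


def term_features(term, counts, feedback_docs):
    """Two toy features (assumed): term frequency and the fraction of
    feedback documents that contain the term."""
    tf = counts[term]
    df = sum(1 for doc in feedback_docs if term in doc.lower().split())
    return [tf, df / len(feedback_docs)]


def rank_terms(counts, feedback_docs, weights):
    """Score candidates with a linear model; `weights` stands in for a
    trained ranker and is a hypothetical placeholder."""
    scored = []
    for term in counts:
        feats = term_features(term, counts, feedback_docs)
        score = sum(w * f for w, f in zip(weights, feats))
        scored.append((score, term))
    # Descending score, then alphabetical order to break ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored]


def expand_query(query, feedback_docs, weights, k=2):
    """Append the k best-ranked candidate terms to the original query terms."""
    query_terms = query.lower().split()
    counts = candidate_terms(feedback_docs, set(query_terms))
    return query_terms + rank_terms(counts, feedback_docs, weights)[:k]


if __name__ == "__main__":
    docs = [
        "brca1 mutation breast cancer",
        "brca1 gene repair",
        "tumor suppressor brca1 gene",
    ]
    print(expand_query("breast cancer", docs, weights=[1.0, 1.0], k=2))
```

The expanded term list would then be submitted to the retrieval system for the second retrieval pass; in a supervised setting the weights would be fitted from labeled candidate terms rather than fixed by hand.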
CONCLUSIONS
The three proposed strategies jointly contribute to the improvement of biomedical retrieval performance. Our model yields more relevant and diversified results than the state-of-the-art baseline models. Moreover, our method provides a general framework for improving biomedical retrieval performance and can serve as a basis for future work.
Identifiers
pubmed: 31787087
doi: 10.1186/s12859-019-3080-2
pii: 10.1186/s12859-019-3080-2
pmc: PMC6886246
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
590