Optimizing biomedical information retrieval with a keyword frequency-driven prompt enhancement strategy.

MeSH terms: Natural Language Processing; Data Mining / methods; Information Storage and Retrieval / methods

Keywords: Biomedical question-answering; Generative pretrained transformer; Large language models; Prompt enhancement strategy

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
27 Aug 2024

History:

received: 26 Apr 2024

accepted: 15 Aug 2024

medline: 28 Aug 2024

pubmed: 28 Aug 2024

entrez: 27 Aug 2024

Status: epublish

Abstract

BACKGROUND: Mining the vast pool of biomedical literature to extract accurate responses and relevant references is challenging due to the domain's interdisciplinary nature, specialized jargon, and continuous evolution. Early natural language processing (NLP) approaches often produced incorrect answers because they failed to capture the nuances of natural language. Transformer models have since advanced the field significantly by enabling the creation of large language models (LLMs), which improve question-answering (QA) tasks. Despite these advances, current LLM-based solutions for specialized domains like biology and biomedicine still struggle to generate up-to-date responses while avoiding "hallucination", that is, the generation of plausible but factually incorrect responses.

RESULTS: Our work focuses on enhancing prompts using a retrieval-augmented architecture to guide LLMs in generating meaningful responses for biomedical QA tasks. We evaluated two approaches: one relying on text embedding and vector similarity in a high-dimensional space, and our proposed method, which uses explicit signals in user queries to extract meaningful contexts. For robust evaluation, we tested these methods on 50 specific and challenging questions from diverse biomedical topics, comparing their performance against a baseline model, BM25. The retrieval performance of our method was significantly better than that of the others, achieving a median Precision@10 of 0.95, where Precision@10 is the fraction of the top 10 retrieved chunks that are relevant. We used GPT-4, OpenAI's most advanced LLM, to maximize answer quality and manually assessed the LLM-generated responses. Our method achieved a median answer quality score of 2.5, surpassing both the baseline model and the text embedding-based approach. We developed a QA bot, WeiseEule (https://github.com/wasimaftab/WeiseEule-LocalHost), which utilizes these methods for comparative analysis and also offers advanced features for review writing and identifying relevant articles for citation.

CONCLUSIONS: Our findings highlight the importance of prompt enhancement methods that use explicit signals in user queries, rather than traditional text embedding-based approaches, to improve LLM-generated responses for specialized queries in domains such as biology and biomedicine. By giving users complete control over the information fed into the LLM, our approach addresses some of the major drawbacks of existing web-based chatbots and LLM-based QA systems, including hallucinations and the generation of irrelevant or outdated responses.
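
The retrieval idea and the Precision@10 metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the WeiseEule implementation: the scoring rule (a raw count of query keywords in each chunk), the function names, and the sample chunks are all hypothetical, chosen only to show how keyword-frequency ranking and Precision@k fit together.

```python
import re
from collections import Counter


def keyword_frequency_score(chunk: str, keywords: list[str]) -> int:
    """Count occurrences of the query keywords in a chunk (case-insensitive).

    Assumption: a plain occurrence count stands in for the paper's
    keyword frequency signal; the real scoring may differ.
    """
    tokens = Counter(re.findall(r"[a-z0-9]+", chunk.lower()))
    return sum(tokens[kw.lower()] for kw in keywords)


def rank_chunks(chunks: list[str], keywords: list[str], k: int = 10) -> list[int]:
    """Return the indices of the top-k chunks, highest keyword count first."""
    scores = [keyword_frequency_score(c, keywords) for c in chunks]
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(retrieved: list[int], relevant: set[int], k: int = 10) -> float:
    """Fraction of the top-k retrieved chunk indices that are relevant.

    Divides by the number of chunks actually retrieved, which equals k
    whenever at least k chunks are returned.
    """
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for i in top if i in relevant) / len(top)


# Hypothetical toy corpus and query keywords, for illustration only.
chunks = [
    "Histone acetylation marks active chromatin; acetylation is reversible.",
    "Protein folding is assisted by chaperones.",
    "Histone variants alter nucleosome stability.",
]
keywords = ["histone", "acetylation"]

ranked = rank_chunks(chunks, keywords, k=2)
print(ranked, precision_at_k(ranked, relevant={0}, k=2))
```

In this toy setup a chunk counts as relevant only if a human judge marked it so (the `relevant` set); the abstract's median Precision@10 would correspond to computing this value with `k=10` for each of the 50 questions and taking the median.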

Identifiers

DOI: 10.1186/s12859-024-05902-7 PMID: 39192204

pubmed: 39192204

doi: 10.1186/s12859-024-05902-7

pii: 10.1186/s12859-024-05902-7

Publication types

Journal Article

Languages

eng

Pagination

281

Grants

Agency: Deutsche Forschungsgemeinschaft

ID: SFB1064-Z04

Authors

Wasim Aftab

Zivkos Apostolou

Karim Bouazoune

Tobias Straub
