Optimizing biomedical information retrieval with a keyword frequency-driven prompt enhancement strategy.

Keywords: Biomedical question-answering; Generative pretrained transformer; Large language models; Prompt enhancement strategy

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
27 Aug 2024
History:
received: 26 Apr 2024
accepted: 15 Aug 2024
medline: 28 Aug 2024
pubmed: 28 Aug 2024
entrez: 27 Aug 2024
Status: epublish

Abstract

Mining the vast pool of biomedical literature to extract accurate responses and relevant references is challenging due to the domain's interdisciplinary nature, specialized jargon, and continuous evolution. Early natural language processing (NLP) approaches often led to incorrect answers because they failed to capture the nuances of natural language. Transformer models have since advanced the field significantly by enabling the creation of large language models (LLMs), which improve question-answering (QA) tasks. Despite these advances, current LLM-based solutions for specialized domains like biology and biomedicine still struggle to generate up-to-date responses while avoiding "hallucination", that is, generating plausible but factually incorrect responses.
Our work focuses on enhancing prompts using a retrieval-augmented architecture to guide LLMs in generating meaningful responses for biomedical QA tasks. We evaluated two approaches: one relying on text embedding and vector similarity in a high-dimensional space, and our proposed method, which uses explicit signals in user queries to extract meaningful contexts. For robust evaluation, we tested these methods on 50 specific and challenging questions from diverse biomedical topics, comparing their performance against a baseline model, BM25. Our method retrieved significantly better than the others, achieving a median Precision@10 of 0.95, where Precision@10 is the fraction of the top 10 retrieved chunks that are relevant. We used GPT-4, OpenAI's most advanced LLM, to maximize answer quality and manually assessed the LLM-generated responses. Our method achieved a median answer quality score of 2.5, surpassing both the baseline model and the text embedding-based approach. We developed a QA bot, WeiseEule (https://github.com/wasimaftab/WeiseEule-LocalHost), which uses these methods for comparative analysis and also offers advanced features for review writing and identifying relevant articles for citation.
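As an illustration of the retrieval metric named above, the following is a minimal Python sketch of Precision@k and of taking its median across questions. It is not the authors' evaluation code: the function name `precision_at_k`, the chunk identifiers, and the relevance judgments are hypothetical toy values chosen only to show the arithmetic.

```python
from statistics import median


def precision_at_k(retrieved, relevant, k=10):
    """Return the fraction of the top-k retrieved chunk IDs that are relevant.

    `retrieved` is a ranked list of chunk IDs; `relevant` is a set of IDs
    judged relevant. The denominator is always k, so a retriever that
    returns fewer than k chunks is penalized.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k


# Toy data (hypothetical): two questions, ten ranked chunks each.
ranked_q1 = [f"c{i}" for i in range(1, 11)]
relevant_q1 = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}  # 8 of 10 relevant

ranked_q2 = [f"d{i}" for i in range(1, 11)]
relevant_q2 = set(ranked_q2)  # all 10 relevant

scores = [
    precision_at_k(ranked_q1, relevant_q1),
    precision_at_k(ranked_q2, relevant_q2),
]
median_score = median(scores)
```

In an evaluation like the one described, one such score would be computed per question and the median reported across all questions.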
Our findings highlight the importance of prompt enhancement methods that utilize explicit signals in user queries over traditional text embedding-based approaches to improve LLM-generated responses for specialized queries in specialized domains such as biology and biomedicine. By providing users complete control over the information fed into the LLM, our approach addresses some of the major drawbacks of existing web-based chatbots and LLM-based QA systems, including hallucinations and the generation of irrelevant or outdated responses.
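To make the idea of "explicit signals in user queries" concrete, here is a minimal Python sketch that ranks text chunks by how often user-supplied keywords occur in them. This is an assumption-laden illustration, not the method implemented in WeiseEule: the function name `rank_chunks_by_keyword_frequency`, the simple tokenizer, the raw-count scoring, and the toy chunks are all hypothetical.

```python
import re
from collections import Counter


def rank_chunks_by_keyword_frequency(chunks, keywords, top_k=10):
    """Rank chunks by the total number of keyword occurrences (case-insensitive).

    Only single-token keywords are handled in this sketch. Chunks with no
    keyword hits are dropped; ties keep their original order.
    """
    wanted = [kw.lower() for kw in keywords]
    scored = []
    for index, chunk in enumerate(chunks):
        # Split on anything that is not a letter or digit.
        counts = Counter(re.findall(r"[a-z0-9]+", chunk.lower()))
        score = sum(counts[kw] for kw in wanted)
        if score > 0:
            scored.append((score, index))
    # Highest score first; the original position breaks ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[index] for _, index in scored[:top_k]]


# Toy chunks (hypothetical placeholder text, not real findings).
chunks = [
    "geneX is mentioned once in this chunk.",
    "This chunk is about an unrelated topic.",
    "geneX appears here, and geneX appears again.",
]
ranked = rank_chunks_by_keyword_frequency(chunks, ["geneX"])
context = "\n".join(ranked)  # text that could be placed in the LLM prompt
```

Because the user chooses the keywords, the user also decides which chunks can reach the prompt, which is the kind of control over LLM input that the conclusions describe.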

Identifiers

pubmed: 39192204
doi: 10.1186/s12859-024-05902-7
pii: 10.1186/s12859-024-05902-7

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

281

Grants

Agency: Deutsche Forschungsgemeinschaft
ID: SFB1064-Z04

Copyright information

© 2024. The Author(s).

Authors

Wasim Aftab (W)

Core Facility Bioinformatics, Biomedical Center, LMU Munich, Grosshaderner Str. 9, 82152, Martinsried, Germany. wasim.aftab@med.uni-muenchen.de.

Zivkos Apostolou (Z)

Molecular Biology Division, Biomedical Center, LMU Munich, Grosshaderner Str. 9, 82152, Martinsried, Germany.

Karim Bouazoune (K)

Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA.

Tobias Straub (T)

Core Facility Bioinformatics, Biomedical Center, LMU Munich, Grosshaderner Str. 9, 82152, Martinsried, Germany. tstraub@bmc.med.lmu.de.
