Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall.
Automatic query formulation
Diversity maximization
Diversity preservation
Global recall
Information retrieval
Information-theoretic fitness functions
Learning complex queries
Multi-objective genetic programming
Topic-based search
Journal
PeerJ. Computer science
ISSN: 2376-5992
Abbreviated title: PeerJ Comput Sci
Country: United States
NLM ID: 101660598
Publication information
Publication date:
2023
History:
received: 25 May 2023
accepted: 30 October 2023
medline: 11 December 2023
pubmed: 11 December 2023
entrez: 11 December 2023
Status:
epublish
Abstract
Topic-based search systems retrieve items by contextualizing the information-seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest so that precision, recall, and diversity are all achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with this approach are loss of diversity and low global recall when the results of multiple queries are combined. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result-set similarity and on the information-theoretic notion of entropy. Extensive experiments allow us to conclude that, while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies achieve significantly higher precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.
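The quantities the abstract balances (per-query precision, recall and F1, global recall over several queries, and similarity or entropy among result sets) can be made concrete with a small sketch. The abstract does not give the formulas of the paper's three objective functions, so the pairwise Jaccard similarity and the document-occurrence entropy below are generic stand-ins, not the authors' definitions. "Global recall" is assumed here to mean the recall of the union of all queries' result sets. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def precision_recall_f1(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Standard set-based precision, recall and F1 for a single query."""
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def global_recall(result_sets: list[set[str]], relevant: set[str]) -> float:
    """Assumed definition: recall of the union of the result sets of all queries."""
    union = set().union(*result_sets)
    return len(union & relevant) / len(relevant) if relevant else 0.0


def mean_pairwise_jaccard(result_sets: list[set[str]]) -> float:
    """Hypothetical similarity objective (to be minimized): mean Jaccard over query pairs."""
    pairs = list(combinations(result_sets, 2))
    if not pairs:
        return 0.0

    def jaccard(a: set[str], b: set[str]) -> float:
        u = a | b
        return len(a & b) / len(u) if u else 0.0

    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def result_entropy(result_sets: list[set[str]]) -> float:
    """Hypothetical diversity objective (to be maximized): Shannon entropy, in bits,
    of how often each document occurs across all result sets."""
    counts = Counter(doc for s in result_sets for doc in s)
    total = sum(counts.values())
    if not total:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    relevant = {"d1", "d2", "d3", "d4"}
    results = [{"d1", "d2", "d5"}, {"d2", "d3"}, {"d6"}]  # one set per evolved query
    print(precision_recall_f1(results[0], relevant))
    print(global_recall(results, relevant))  # 0.75
    print(mean_pairwise_jaccard(results))
    print(result_entropy(results))
```

In a multi-objective setup, a population of queries would be scored on several of these values at once (for example, maximizing precision and recall while minimizing `mean_pairwise_jaccard`). The example also shows why global recall can differ from per-query recall: the first query alone recalls half of the relevant documents, while the union of all three recalls three quarters.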
Identifiers
pubmed: 38077536
doi: 10.7717/peerj-cs.1710
pii: cs-1710
pmc: PMC10702999
Publication types
Journal Article
Languages
eng
Pagination
e1710
Copyright information
© 2023 Baggio et al.
Competing interests statement
Ana G. Maguitman is an Academic Editor of PeerJ Computer Science.