Google trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions.
Journal
International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057
Publication information
Publication date:
21 Jul 2024
History:
received: 11 Oct 2023
revised: 10 Jul 2024
accepted: 20 Jul 2024
medline: 24 Jul 2024
pubmed: 24 Jul 2024
entrez: 23 Jul 2024
Status:
aheadofprint
Abstract
Abstract sections
BACKGROUND
Google Trends is a widely used tool for infodemiological surveys. However, irregularities in the random sampling and aggregation algorithms compromise the reliability of the relative search volume (RSV) and the regional online interest (ROI).
OBJECTIVE
The study aims to expose methodological pitfalls that are commonly ignored when conducting infodemiological surveys with Google Trends. A guide to avoiding these shortcomings is also provided.
MATERIAL AND METHODS
The Google Topic "Coronavirus disease 2019" was investigated using different time frames, categories, and IP addresses. The same samples were manually collected multiple times to evaluate the stability of RSV and ROI. Stability was estimated through indicators of variability (e.g., the percentage coefficient of variation "CV%" and its 4-surprisal interval "4-I"). The content aggregation capacity of the topic and category algorithms was evaluated through quantitative analysis of RSV and ROI and qualitative examination of the related queries.
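The stability check described in the methods can be illustrated with a minimal sketch. This record does not give the authors' formulas, so everything here is an assumption: CV% is taken as 100 × sample standard deviation / mean over repeated downloads of the same data point, and the "4-I" is approximated by a percentile bootstrap interval at level 1 − 2⁻⁴ = 0.9375, based on the surprisal definition S = −log2(p). The authors' actual procedure may differ. The function names and the RSV readings are hypothetical.

```python
import random
import statistics


def cv_percent(values):
    """Percentage coefficient of variation: 100 * sample SD / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV% is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


def bootstrap_interval(values, stat, level=1 - 2 ** -4, n_boot=2000, seed=0):
    """Percentile bootstrap interval for stat(values).

    The default level (0.9375) is an assumption: it corresponds to
    4 bits of surprisal under S = -log2(p), i.e. p = 0.0625.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper


# Hypothetical RSV readings for one region and date, re-downloaded 8 times.
repeated_rsv = [42, 47, 39, 45, 50, 41, 44, 48]
cv = cv_percent(repeated_rsv)
low, high = bootstrap_interval(repeated_rsv, cv_percent)
print(f"CV% = {cv:.1f}, assumed 4-I: [{low:.1f}, {high:.1f}]")
```

A larger CV% across repeated downloads indicates a less stable RSV or ROI value; the interval gives a rough sense of how uncertain that CV% estimate is for the number of repetitions collected.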
RESULTS
The stability of Google Trends' RSV and ROI is not linked exclusively to the dataset dimension or the IP address. Subregional datasets can be highly unstable (e.g., CV% = 10, 4-I: [8,13]). Google Trends categories and topics can exclude relevant queries or include unnecessary queries. The statistical scenario is consistent with the following hypotheses: i) datasets containing too few queries are highly unstable, ii) the "interest over time" data format is generally reliable for evaluating trends and correlations, iii) Google Trends improvements have altered the RSV historical trends.
CONCLUSIONS
Google Trends can be an effective and efficient infodemiological tool as long as the reliability of web search indexes is appropriately analyzed and weighted for the scientific goal. The methodological steps discussed in this study are critical to drawing valid and relevant scientific conclusions.
Identifiers
pubmed: 39043059
pii: S1386-5056(24)00226-0
doi: 10.1016/j.ijmedinf.2024.105563
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
105563
Copyright information
Copyright © 2024 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.