Google trends in infodemiology: Methodological steps to avoid irreproducible results and invalid conclusions.


Journal

International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057

Publication information

Publication date:
21 Jul 2024
History:
received: 11 10 2023
revised: 10 07 2024
accepted: 20 07 2024
medline: 24 7 2024
pubmed: 24 7 2024
entrez: 23 7 2024
Status: aheadofprint

Abstract

Google Trends is a widely used tool for infodemiological surveys. However, irregularities in the random sampling and aggregation algorithms compromise the reliability of the relative search volume (RSV) and the regional online interest (ROI). The study aims to unmask methodological criticalities commonly ignored in carrying out infodemiological surveys via Google Trends. A guide to avoiding these shortcomings is also provided. The Google Topic "Coronavirus disease 2019" has been investigated using different timelapses, categories, and IP addresses. The same samples were manually collected multiple times to evaluate the RSV and ROI stability. Stability was estimated through indicators of variability (e.g., coefficient of percentage variation "CV%" and its 4-surprisal interval "4-I"). The content aggregation capacity of the algorithms relating to topics and categories was evaluated through the quantitative analysis of RSV and ROI and the qualitative examination of the related queries. The stability of Google Trends' RSV and ROI is not linked exclusively to the dataset dimension or the IP address. Subregional datasets can be highly unstable (e.g., CV% = 10, 4-I: [8,13]). Google Trends categories and topics can exclude relevant queries or include unnecessary queries. The statistical scenario is consistent with the following hypotheses: i) datasets containing too few queries are highly unstable, ii) the "interest over time" data format is generally reliable for evaluating trends and correlations, iii) Google Trends improvements have altered the RSV historical trends. Google Trends can be an effective and efficient infodemiological tool as long as the reliability of web search indexes is appropriately analyzed and weighted for the scientific goal. The methodological steps discussed in this study are critical to drawing valid and relevant scientific conclusions.
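
The stability indicator named in the abstract, CV%, can be illustrated with a minimal sketch. It assumes the usual definition of the coefficient of variation (sample standard deviation divided by the mean, times 100); the record does not state the exact formula the author used, and the 4-surprisal interval is not reproduced here. The function name `cv_percent` and the RSV values are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean.

    Assumed definition; the source abstract does not give the formula.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV% is undefined when the mean is zero")
    return 100 * stdev(values) / m


# Hypothetical RSV values for one region, one value per repeated
# manual download of the same Google Trends query.
repeated_rsv = [52, 47, 55, 50, 46]

print(f"CV% = {cv_percent(repeated_rsv):.1f}")
```

Under this definition, a higher CV% across repeated downloads of the same query indicates a less stable dataset.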

Identifiers

pubmed: 39043059
pii: S1386-5056(24)00226-0
doi: 10.1016/j.ijmedinf.2024.105563

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

105563

Copyright information

Copyright © 2024 Elsevier B.V. All rights reserved.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Alessandro Rovetta (A)

Redeev SRL, Napoli 80121, Italy. Electronic address: alessandrorovetta@redeev.com.

MeSH classifications