A New Natural Language Processing-Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study.


Journal

JMIR medical informatics
ISSN: 2291-9694
Abbreviated title: JMIR Med Inform
Country: Canada
NLM ID: 101645109

Publication information

Publication date:
28 Oct 2024
History:
received: 02 Nov 2023
accepted: 07 Jul 2024
revised: 30 May 2024
medline: 28 Oct 2024
pubmed: 28 Oct 2024
entrez: 28 Oct 2024
Status: epublish

Abstract sections

BACKGROUND
Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding these shifts for assessing factors that affect patient outcomes.
OBJECTIVE
This study aims to propose detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data.
METHODS
The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing-inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes from these 3 steps, our results can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets.
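The detection step can be illustrated with a minimal sketch, assuming a single categorical variable compared between a reference time window and a later one. The function names, the base-2 logarithm, and the 0.05 threshold are illustrative assumptions and are not taken from the paper; the authors' actual implementation may differ.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in a non-empty list of values."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def detect_drift(reference, current, threshold=0.05):
    """Return (score, flag) for two windows of one categorical variable.

    The threshold is a hypothetical cut-off chosen for illustration only.
    """
    categories = sorted(set(reference) | set(current))
    p = distribution(reference, categories)
    q = distribution(current, categories)
    score = js_divergence(p, q)
    return score, score > threshold
```

With a base-2 logarithm the score is 0 for identical distributions and 1 for distributions with no category in common, so a single threshold can be compared across variables with different numbers of categories.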
RESULTS
Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the cooccurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period.
CONCLUSIONS
We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can enhance predictive algorithms, improve patient outcomes, and optimize health care resource allocation, ultimately improving the effectiveness of machine learning predictive algorithms applied to health care data. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper.

Identifiers

pubmed: 39467275
pii: v12i1e54246
doi: 10.2196/54246

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e54246

Copyright information

©Bruno Paiva, Marcos André Gonçalves, Leonardo Chaves Dutra da Rocha, Milena Soriano Marcolino, Fernanda Cristina Barbosa Lana, Maira Viana Rego Souza-Silva, Jussara M Almeida, Polianna Delfino Pereira, Claudio Moisés Valiense de Andrade, Angélica Gomides dos Reis Gomes, Maria Angélica Pires Ferreira, Frederico Bartolazzi, Manuela Furtado Sacioto, Ana Paula Boscato, Milton Henriques Guimarães-Júnior, Priscilla Pereira dos Reis, Felício Roberto Costa, Alzira de Oliveira Jorge, Laryssa Reis Coelho, Marcelo Carneiro, Thaís Lorenna Souza Sales, Silvia Ferreira Araújo, Daniel Vitório Silveira, Karen Brasil Ruschel, Fernanda Caldeira Veloso Santos, Evelin Paola de Almeida Cenci, Luanna Silva Monteiro Menezes, Fernando Anschau, Maria Aparecida Camargos Bicalho, Euler Roberto Fernandes Manenti, Renan Goulart Finger, Daniela Ponce, Filipe Carrilho de Aguiar, Luiza Margoto Marques, Luís César de Castro, Giovanna Grünewald Vietta, Mariana Frizzo de Godoy, Mariana do Nascimento Vilaça, Vivian Costa Morais. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 28.10.2024.

Authors

Bruno Paiva (B)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Marcos André Gonçalves (MA)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Leonardo Chaves Dutra da Rocha (LCD)

Computer Science Department, Universidade Federal de São João del-Rei, São João del-Rei, Brazil.

Milena Soriano Marcolino (MS)

Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Fernanda Cristina Barbosa Lana (FCB)

Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Maira Viana Rego Souza-Silva (MVR)

Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Jussara M Almeida (JM)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Polianna Delfino Pereira (PD)

Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Claudio Moisés Valiense de Andrade (CMV)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Angélica Gomides Dos Reis Gomes (AGDR)

Hospitais da Rede Mater Dei, Belo Horizonte, Brazil.

Maria Angélica Pires Ferreira (MAP)

Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil.

Frederico Bartolazzi (F)

Hospital Santo Antônio, Curvelo, Brazil.

Manuela Furtado Sacioto (MF)

Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Brazil.

Ana Paula Boscato (AP)

Hospital Tacchini, Bento Gonçalves, Brazil.

Priscilla Pereira Dos Reis (PP)

Hospital Metropolitano Doutor Célio de Castro, Belo Horizonte, Brazil.

Felício Roberto Costa (FR)

Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Alzira de Oliveira Jorge (AO)

Hospital Risoleta Tolentino Neves, Belo Horizonte, Brazil.

Laryssa Reis Coelho (LR)

Faculdade de Medicina, Universidade Federal dos Vales do Jequitinhonha e Mucuri, Teófilo Otoni, Brazil.

Marcelo Carneiro (M)

Hospital Santa Cruz, Santa Cruz do Sul, Brazil.

Thaís Lorenna Souza Sales (TLS)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Silvia Ferreira Araújo (SF)

Hospital Semper, Belo Horizonte, Brazil.

Daniel Vitório Silveira (DV)

Hospital Unimed BH, Belo Horizonte, Brazil.

Karen Brasil Ruschel (KB)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Fernanda Caldeira Veloso Santos (FCV)

Hospital Universitário de Santa Maria, Santa Maria, Brazil.

Evelin Paola de Almeida Cenci (EPA)

Hospital Moinhos de Vento, Porto Alegre, Brazil.

Luanna Silva Monteiro Menezes (LSM)

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.

Fernando Anschau (F)

Hospital Nossa Senhora da Conceição, Porto Alegre, Brazil.

Maria Aparecida Camargos Bicalho (MAC)

Fundação Hospitalar do Estado de Minas Gerais, Belo Horizonte, Brazil.

Euler Roberto Fernandes Manenti (ERF)

Hospital Mãe de Deus, Porto Alegre, Brazil.

Renan Goulart Finger (RG)

Hospital Regional do Oeste, Chapecó, Brazil.

Daniela Ponce (D)

Faculdade de Medicina de Botucatu, Universidade Estadual Paulista Júlio de Mesquita Filho, Botucatu, Brazil.

Filipe Carrilho de Aguiar (FC)

Hospital das Clínicas, Universidade Federal de Pernambuco, Recife, Brazil.

Luiza Margoto Marques (LM)

Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Brazil.

Luís César de Castro (LC)

Hospital Bruno Born, Lajeado, Brazil.

Giovanna Grünewald Vietta (GG)

Hospital SOS Cárdio, Florianópolis, Brazil.

Mariana Frizzo de Godoy (MF)

Hospital Santo Antônio, Curvelo, Brazil.

Mariana do Nascimento Vilaça (MDN)

Hospital Metropolitano Odilon Behrens, Belo Horizonte, Brazil.

Vivian Costa Morais (VC)

Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Brazil.
