Dealing with confounders and outliers in classification medical studies: The Autism Spectrum Disorders case study.

Autism Spectrum Disorder / diagnosis; Brain / diagnostic imaging; Humans; Machine Learning; Magnetic Resonance Imaging; ROC Curve

Autism Spectrum Disorders; Autoencoder; Confounders; Confounding Index; MRI; Machine learning; Outliers; Reproducibility

Journal

Artificial intelligence in medicine

ISSN: 1873-2860

Abbreviated title: Artif Intell Med

Country: Netherlands

NLM ID: 8915031

Publication information

Publication date:
08 2020

History:

received: 19 07 2019

revised: 13 12 2019

accepted: 02 07 2020

entrez: 25 9 2020

pubmed: 26 9 2020

medline: 19 8 2021

Status: ppublish

Abstract

Machine learning (ML) approaches have been widely applied to medical data to find reliable classifiers that improve diagnosis and detect candidate biomarkers of a disease. However, as a powerful, multivariate, data-driven approach, ML can be misled by biases and outliers in the training set and find sample-dependent classification patterns. This phenomenon often occurs in biomedical applications, where the scarcity of data, combined with their heterogeneous nature and complex acquisition process, makes outliers and biases very common. In this work we present a new workflow for biomedical research based on ML approaches that maximizes the generalizability of the classification. The workflow relies on two data selection tools: an autoencoder to identify outliers, and the Confounding Index to determine which characteristics of the sample can mislead classification. As a case study we adopt the controversial research on extracting brain structural biomarkers of Autism Spectrum Disorders (ASD) from magnetic resonance images. A classifier trained on a dataset of 86 subjects, selected using this framework, obtained an area under the receiver operating characteristic curve of 0.79. The feature pattern identified by this classifier still captures the mean differences between the ASD and Typically Developing Control classes on 1460 new subjects in the same age range as the training set, thus providing new insights into the brain characteristics of ASD. We show that the proposed workflow makes it possible to find generalizable patterns even when the dataset is limited, whereas skipping the two steps above and using a larger but poorly designed training set would have produced a sample-dependent classifier.
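
The abstract does not describe the autoencoder's architecture or the outlier criterion, so the following is only a minimal sketch of the general idea of flagging outliers by reconstruction error. It assumes a linear autoencoder (equivalent to PCA) as a stand-in for whatever network the authors used, synthetic data instead of MRI features, and a hypothetical median-plus-MAD threshold. The names `reconstruction_errors`, `flag_outliers`, `n_components`, and `k` are illustrative and do not come from the paper. The Confounding Index is not reproduced here.

```python
import numpy as np

def reconstruction_errors(X, n_components=2):
    """Per-sample reconstruction error of a linear autoencoder (PCA stand-in)."""
    Xc = X - X.mean(axis=0)                       # center the features
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]                         # rows are principal directions
    codes = Xc @ W.T                              # encode to the bottleneck
    recon = codes @ W                             # decode back to feature space
    return np.linalg.norm(Xc - recon, axis=1)

def flag_outliers(errors, k=5.0):
    """Flag samples whose error exceeds median + k * MAD (assumed rule)."""
    med = np.median(errors)
    mad = np.median(np.abs(errors - med))
    return errors > med + k * mad

# Synthetic data: 200 samples lying near a 2-D subspace of a 10-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))
X[:3] += rng.normal(scale=3.0, size=(3, 10))      # corrupt the first three samples

errors = reconstruction_errors(X)
outliers = flag_outliers(errors)
print("flagged sample indices:", np.flatnonzero(outliers))
```

In a workflow like the one described, flagged samples would be reviewed or excluded before training the classifier. A nonlinear autoencoder would replace the SVD step but leave the thresholding logic unchanged.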

Identifiers

DOI: 10.1016/j.artmed.2020.101926

PMID: 32972657

PII: S0933-3657(19)30608-6

Publication types

Journal Article

Languages

eng

Pagination

101926

Authors

Elisa Ferrari (E)

Paolo Bosco (P)

Sara Calderoni (S)

Piernicola Oliva (P)

Letizia Palumbo (L)

Giovanna Spera (G)

Maria Evelina Fantacci (ME)

Alessandra Retico (A)

MeSH classifications