Machine Learning of Plasma Proteomics Classifies Diagnosis of Interstitial Lung Disease.

Connective tissue disease associated interstitial lung disease; Differential diagnosis; Idiopathic pulmonary fibrosis; Machine learning model; Plasma proteomics

Journal

American journal of respiratory and critical care medicine

ISSN: 1535-4970

Abbreviated title: Am J Respir Crit Care Med

Country: United States

NLM ID: 9421642

Publication information

Publication date:
29 Feb 2024

History:

medline: 29 Feb 2024

pubmed: 29 Feb 2024

entrez: 29 Feb 2024

Status: ahead of print

Abstract

Distinguishing connective tissue disease-associated interstitial lung disease (CTD-ILD) from idiopathic pulmonary fibrosis (IPF) can be clinically challenging. The objective was to identify proteins that separate and classify patients with CTD-ILD from those with IPF. Four registries with 1247 IPF and 352 CTD-ILD patients were included in the analyses. Plasma samples were subjected to high-throughput proteomics assays. Protein features were prioritized using Recursive Feature Elimination (RFE) to construct a proteomic classifier. Multiple machine learning models, including Support Vector Machine, LASSO regression, Random Forest (RF), and imbalanced-RF, were trained and tested in independent cohorts. The validated models were used to classify each case iteratively in external datasets. A classifier with 37 proteins (PC37) was enriched in the biological processes of bronchiole development, smooth muscle proliferation, and immune responses. Four machine learning models used PC37 with sex and age score to generate continuous classification values. Receiver operating characteristic curve analyses of these scores demonstrated a consistent area under the curve of 0.85-0.90 in the test cohort and 0.94-0.96 in the single-sample dataset. Binary classification demonstrated 78.6%-80.4% sensitivity and 76%-84.4% specificity in the test cohort, and 93.5%-96.1% sensitivity and 69.5%-77.6% specificity in the single-sample classification dataset. Composite analysis of all machine learning models confirmed 78.2% (194/248) accuracy in the test cohort and 82.9% (208/251) in the single-sample classification dataset. Multiple machine learning models trained with large cohort proteomic datasets consistently distinguished CTD-ILD from IPF. The identified proteins were involved in immune pathways. We further developed a novel approach for single-sample classification, which could facilitate honing the differential diagnosis of ILD in challenging cases and improve clinical decision-making.
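As a small worked check of the composite accuracy figures quoted in the abstract, the sketch below recomputes the percentages from the reported counts (194/248 and 208/251). The function names are illustrative and not taken from the paper. The sensitivity and specificity helpers are included only to show the standard definitions; the abstract does not report per-class counts, so no numbers are plugged into them here.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of cases classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Counts as reported in the abstract for the composite analysis.
test_cohort = accuracy(194, 248)
single_sample = accuracy(208, 251)

print(f"Test cohort accuracy:   {test_cohort:.1%}")    # 78.2%
print(f"Single-sample accuracy: {single_sample:.1%}")  # 82.9%
```

Both recomputed values agree with the percentages stated in the abstract.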

Identifiers

DOI: 10.1164/rccm.202309-1692OC

PMID: 38422478

Publication types

Journal Article

Languages

eng

Grants

Agency: NHLBI NIH HHS

ID: R01 HL166290

Country: United States

Agency: NHLBI NIH HHS

ID: R01 HL169166

Country: United States

Authors

Yong Huang (Y)

Shwu-Fan Ma (SF)

Justin M Oldham (JM)

Ayodeji Adegunsoye (A)

Daisy Zhu (D)

Susan Murray (S)

John S Kim (JS)

Catherine Bonham (C)

Emma Strickland (E)

Angela L Linderholm (AL)

Cathryn T Lee (CT)

Tessy Paul (T)

Hannah Mannem (H)

Toby M Maher (TM)

Philip L Molyneaux (PL)

Mary E Strek (ME)

Fernando J Martinez (FJ)

Imre Noth (I)
