Generalizable and Automated Classification of TNM Stage from Pathology Reports with External Validation.

Journal

medRxiv : the preprint server for health sciences

Abbreviated title: medRxiv

Country: United States

NLM ID: 101767986

Publication information

Publication date:
27 Jun 2023

History:

pubmed: 10 Jul 2023

medline: 10 Jul 2023

entrez: 10 Jul 2023

Status: epublish

Abstract

Cancer staging is an essential clinical attribute informing patient prognosis and clinical trial eligibility. However, it is not routinely recorded in structured electronic health records. Here, we present a generalizable method for the automated classification of TNM stage directly from pathology report text. We train a BERT-based model using publicly available pathology reports across approximately 7,000 patients and 23 cancer types. We explore the use of different model types, with differing input sizes, parameters, and model architectures. Our final model goes beyond term-extraction, inferring TNM stage from context when it is not included in the report text explicitly. As external validation, we test our model on almost 8,000 pathology reports from Columbia University Medical Center, finding that our trained model achieved an AU-ROC of 0.815-0.942. This suggests that our model can be applied broadly to other institutions without additional institution-specific fine-tuning.

Identifiers

DOI: 10.1101/2023.06.26.23291912 PMID: 37425701 PMC: PMC10327265

Publication types

Preprint

Languages

eng

Grants

Agency: NIGMS NIH HHS

ID: R35 GM131905

Country: United States

Authors

Jenna Kefeli (J)

Nicholas Tatonetti (N)

MeSH classifications