Machine learning application for incident prostate adenocarcinomas automatic registration in a French regional cancer registry.
Cancer registry
Machine learning
Prostate adenocarcinoma
Journal
International journal of medical informatics
ISSN: 1872-8243
Abbreviated title: Int J Med Inform
Country: Ireland
NLM ID: 9711057
Publication information
Publication date:
07 2020
History:
received: 2019-09-06
revised: 2020-01-07
accepted: 2020-03-31
pubmed: 2020-04-25
medline: 2020-10-24
entrez: 2020-04-25
Status:
ppublish
Abstract
Cancer registries are collections of curated data about malignant tumor diseases. The amount of data processed by cancer registries increases every year, making manual registration increasingly tedious. We sought to develop an automatic analysis pipeline able to identify and preprocess registry input for incident prostate adenocarcinomas in a French regional cancer registry. Notifications submitted to the Bas-Rhin cancer registry from different sources were used: pathology data, and ICD-10 diagnosis codes from hospital discharge data and healthcare insurance data. We trained a Support Vector Machine model (machine learning) to predict whether a patient's data should be considered an incident case of prostate adenocarcinoma and therefore be registered. The final registration of all identified cases was manually confirmed by a specialized technician. Text mining tools (regular expressions) were used to extract clinical and biological data from unstructured pathology reports. We performed two successive analyses. First, we used 982 cases manually labeled by registrars from the 2014 dataset to predict the registration of 785 cases submitted in 2015. Then, we repeated the procedure using the 2089 cases labeled by registrars from the 2014 and 2015 datasets to predict the registration of 926 cases submitted in 2016. The algorithm identified 663 cases of prostate adenocarcinoma in 2015 and 610 in 2016. From these findings, 663 and 531 cases, respectively, were added to the registry, and 641 and 512 cases were confirmed by the specialized technician. This registration process achieved a precision above 96%. The algorithm obtained an overall precision of 99% (99.5% in 2015 and 98.5% in 2016) and a recall of 97% (97.8% in 2015 and 96.9% in 2016). When the information was present in the pathology report, text mining was more than 90% accurate for the major indicators (PSA test, Gleason score, and incidence date). For both PSA and tumor side, information was not detected in the majority of cases. Machine learning was able to identify new cases of prostate cancer, and text mining was able to prefill the data about incident cases. Machine-learning-based automation of the registration process could reduce delays in data production and allow investigators to devote more time to complex tasks and analysis.
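Two steps in the abstract lend themselves to a brief illustration: regular-expression extraction from free-text pathology reports, and the precision of the registration process. The sketch below is illustrative only. The patterns `GLEASON_RE` and `PSA_RE`, the function names, and the sample report wording are hypothetical, since the abstract does not publish the registry's actual expressions. The precision check assumes that the "above 96%" figure corresponds to technician-confirmed cases divided by cases added to the registry, which is one plausible reading of the reported counts.

```python
import re

# Hypothetical patterns, not the registry's own expressions.
# Gleason: capture the two grades of a "3+4"-style pattern near the word "Gleason".
GLEASON_RE = re.compile(r"gleason[^\n]{0,20}?(\d)\s*\+\s*(\d)", re.IGNORECASE)
# PSA: capture a value in ng/ml, accepting a comma or a dot as decimal separator.
PSA_RE = re.compile(
    r"\bPSA\b[^\d\n]{0,20}(\d+(?:[.,]\d+)?)\s*ng\s*/\s*ml", re.IGNORECASE
)


def extract_indicators(report: str) -> dict:
    """Return the Gleason score and PSA value found in a report, or None for each."""
    found = {"gleason": None, "psa_ng_ml": None}

    gleason = GLEASON_RE.search(report)
    if gleason:
        # The Gleason score is the sum of the primary and secondary grades.
        found["gleason"] = int(gleason.group(1)) + int(gleason.group(2))

    psa = PSA_RE.search(report)
    if psa:
        # Normalize a French-style decimal comma before converting.
        found["psa_ng_ml"] = float(psa.group(1).replace(",", "."))

    return found


def precision(confirmed: int, flagged: int) -> float:
    """Fraction of flagged cases that were confirmed."""
    return confirmed / flagged


if __name__ == "__main__":
    # Invented report text, for illustration only.
    print(extract_indicators("Score de Gleason 7 (3+4). PSA à 6,5 ng/ml."))
    # Counts taken from the abstract: confirmed / added, for 2015 and 2016.
    print(round(precision(641, 663), 3), round(precision(512, 531), 3))
```

A report that mentions neither indicator yields `None` for both fields, which mirrors the abstract's observation that PSA was often not detected.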
Identifiers
pubmed: 32330852
pii: S1386-5056(19)30950-5
doi: 10.1016/j.ijmedinf.2020.104139
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
104139
Copyright information
Copyright © 2020 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors have no conflicts of interest to declare.