Do no harm: a roadmap for responsible machine learning for health care.
Journal
Nature medicine
ISSN: 1546-170X
Titre abrégé: Nat Med
Pays: United States
ID NLM: 9502015
Informations de publication
Date de publication:
09 2019
09 2019
Historique:
received:
10
07
2019
accepted:
17
07
2019
pubmed:
21
8
2019
medline:
13
11
2019
entrez:
21
8
2019
Statut:
ppublish
Résumé
Interest in machine-learning applications within medicine has been growing, but few studies have progressed to deployment in patient care. We present a framework, context and ultimately guidelines for accelerating the translation of machine-learning-based interventions in health care. To be successful, translation will require a team of engaged stakeholders and a systematic process from beginning (problem formulation) to end (widespread deployment).
Identifiants
pubmed: 31427808
doi: 10.1038/s41591-019-0548-6
pii: 10.1038/s41591-019-0548-6
doi:
Types de publication
Journal Article
Review
Langues
eng
Sous-ensembles de citation
IM
Pagination
1337-1340Commentaires et corrections
Type : ErratumIn
Références
Lazer, D., Kennedy, R., King, G. & Vespignani, A. Big data. The parable of Google Flu: traps in big data analysis. Science 343, 1203–1205 (2014).
doi: 10.1126/science.1248506
Hutson, M. Even artificial intelligence can acquire biases against race and gender. Science https://doi.org/10.1126/science.aal1053 (2017).
He, J. et al. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 30–36 (2019).
doi: 10.1038/s41591-018-0307-0
Silva, I., Moody, G., Scott, D. J., Celi, L. A. & Mark, R. G. Predicting in-hospital mortality of ICU patients: the Physionet/Computing in Cardiology Challenge 2012. Comput. Cardiol. 39, 245–248 (2012).
Luo, Y., Cai, X., Zhang, Y. & Xu, J. Multivariate time series imputation with generative adversarial networks. in Advances in Neural Information Processing Systems 1596–1607 (NeurIPS, 2018).
O’Malley, K. J. et al. Measuring diagnoses: ICD code accuracy. Health Serv. Res. 40, 1620–1639 (2005).
doi: 10.1111/j.1475-6773.2005.00444.x
Saria, S. & Subbaswamy, A. Tutorial: safe and reliable machine learning. Preprint at https://arxiv.org/abs/1904.07204 (2019).
Chen, I. Y., Szolovits, P. & Ghassemi, M. Can AI help reduce disparities in general medical and mental health care? AMA J. Ethics 21, E167–E179 (2019).
doi: 10.1001/amajethics.2019.167
Schulam, P. & Saria, S. Reliable decision support using counterfactual models. in Advances in Neural Information Processing Systems 1697–1708 (NeurIPS, 2017).
O’neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Broadway Books, 2016).
Williams, D. R., Mohammed, S. A., Leavell, J. & Collins, C. Race, socioeconomic status, and health: complexities, ongoing challenges, and research opportunities. Ann. NY Acad. Sci. 1186, 69–101 (2010).
doi: 10.1111/j.1749-6632.2009.05339.x
Rajpurkar, P. et al. Chexnet: radiologist-level pneumonia detection on chest X-rays with deep learning. Preprint at https://arxiv.org/abs/1711.05225 (2017).
Liu, V.X., Bates, D.W., Wiens, J. & Shah, N.H. The number needed to benefit: estimating the value of predictive analytics in healthcare. J. Am. Med. Inform. Assoc. https://doi.org/10.1093/jamia/ocz088 (2019).
Oh, J. et al. A generalizable, data-driven approach to predict daily risk of Clostridium difficile infection at two large academic health centers. Infect. Control Hosp. Epidemiol. 39, 425–433 (2018).
doi: 10.1017/ice.2018.16
Schulam, P. & Saria, S. Can you trust this prediction? Auditing pointwise reliability after learning. in The 22nd International Conference on Artificial Intelligence and Statistics 1022–1031 (PMLR, 2019).
Henderson, P. et al. Deep reinforcement learning that matters. in Thirty-second AAAI Conference on Artificial Intelligence (AAAI, 2018).
Nestor, B. et al. Rethinking clinical prediction: why machine learning must consider year of care and feature aggregation. Preprint at https://arxiv.org/abs/1811.12583 (2018).
Henry, K. E., Hager, D. N., Pronovost, P. J. & Saria, S. A targeted real-time early warning score (TREWScore) for septic shock. Sci. Transl. Med. 7, 299ra122 (2015).
doi: 10.1126/scitranslmed.aab3719
Hemming, K., Haines, T. P., Chilton, P. J., Girling, A. J. & Lilford, R. J. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. Br. Med. J. 350, h391 (2015).
doi: 10.1136/bmj.h391
Evans, B. & Ossorio, P. The challenge of regulating clinical decision support software after 21
doi: 10.1177/0098858818789418
Okoro, A. O. Preface: The 21
doi: 10.1177/0098858818793388
Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) (U.S. Food & Drug Administration, 2019); https://www.fda.gov/media/122535/download
Massachusetts Institute of Technology. Self-driving cars, robots: identifying AI ‘blind spots’. ScienceDaily (25 January 2019).
Chien, S. & Wagstaff, K. L. Robotic space exploration agents. Sci. Robot. 2, eaan4831 (2017).
doi: 10.1126/scirobotics.aan4831