Simulator Pre-Screening of Underprepared Drivers Prior to Licensing On-Road Examination: Clustering of Virtual Driving Test Time Series Data.

accidents, traffic; adolescent; automobile driving; cause of death; child; humans; licensure; machine learning; motor vehicle; motor vehicles; on-road exam; simulated driving assessment; support vector machines

Journal

Journal of medical Internet research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882

Publication information

Publication date:
18 06 2020
History:
received: 12 03 2019
accepted: 16 12 2019
revised: 11 11 2019
entrez: 20 6 2020
pubmed: 20 6 2020
medline: 1 12 2020
Status: epublish

Abstract sections

BACKGROUND
A large Midwestern state commissioned a virtual driving test (VDT) to assess driving skills preparedness before the on-road examination (ORE). Since July 2017, a pilot deployment of the VDT in state licensing centers (VDT pilot) has collected both VDT and ORE data from new license applicants with the aim of creating a scoring algorithm that could predict those who were underprepared.
OBJECTIVE
Leveraging data collected from the VDT pilot, this study aimed to develop and conduct an initial evaluation of a novel machine learning (ML)-based classifier using limited domain knowledge and minimal feature engineering to reliably predict applicant pass/fail on the ORE. Such methods, if proven useful, could be applicable to the classification of other time series data collected within medical and other settings.
METHODS
We analyzed an initial dataset that comprised 4308 drivers who completed both the VDT and the ORE, in which 1096 (25.4%) drivers went on to fail the ORE. We studied 2 different approaches to constructing feature sets to use as input to ML algorithms: the standard method of reducing the time series data to a set of manually defined variables that summarize driving behavior and a novel approach using time series clustering. We then fed these representations into different ML algorithms to compare their ability to predict a driver's ORE outcome (pass/fail).
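The abstract does not specify how the time series clustering features were built, so the following is only a minimal sketch of one common way to do it, under stated assumptions: each driver's trace is cut into fixed-width windows, all windows are grouped with a plain k-means, and each driver is then described by the share of their windows that falls into each cluster. The window width, the number of clusters, the helper names, and the synthetic speed traces are all hypothetical and are not taken from the study.

```python
import random

WIDTH = 5   # samples per window (illustrative choice, not from the study)
K = 3       # number of clusters (illustrative choice, not from the study)

def windows(series, width):
    """Split a series into consecutive, non-overlapping windows."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def cluster_features(series, centroids, width):
    """Fraction of a driver's windows assigned to each cluster."""
    wins = windows(series, width)
    counts = [0] * len(centroids)
    for w in wins:
        counts[nearest(w, centroids)] += 1
    return [c / len(wins) for c in counts]

# Synthetic speed traces standing in for VDT recordings (not real data):
# two steady drivers and two erratic ones.
rng = random.Random(1)
traces = [[30 + rng.gauss(0, sd) for _ in range(40)] for sd in (0.5, 0.5, 8.0, 8.0)]

all_windows = [w for t in traces for w in windows(t, WIDTH)]
centroids = kmeans(all_windows, K)
features = [cluster_features(t, centroids, WIDTH) for t in traces]
```

Each row of `features` is a fixed-length vector that could be passed to a classifier such as logistic regression in place of manually defined summary variables. No driving-specific knowledge is needed beyond choosing the window width and the number of clusters.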
RESULTS
The new method using time series clustering performed similarly compared with the standard method in terms of overall accuracy for predicting pass or fail outcome (76.1% vs 76.2%) and area under the curve (0.656 vs 0.682). However, the time series clustering slightly outperformed the standard method in differentially predicting failure on the ORE. The novel clustering method yielded a risk ratio for failure of 3.07 (95% CI 2.75-3.43), whereas the standard variables method yielded a risk ratio for failure of 2.68 (95% CI 2.41-2.99). In addition, the time series clustering method with logistic regression produced the lowest ratio of false alarms (those who were predicted to fail but went on to pass the ORE; 27.2%).
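As a reading aid for the risk ratios and the false-alarm figure above, the sketch below computes both quantities from a 2x2 table of predicted versus actual ORE outcomes. The abstract does not report the underlying counts or the interval method, so the counts here are made up, the confidence interval uses the standard log-scale (Katz) approximation, and the false-alarm share is assumed to be taken over drivers predicted to fail.

```python
import math

def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio for failure with an approximate 95% CI (log method).

    a: predicted fail, failed ORE    b: predicted fail, passed ORE
    c: predicted pass, failed ORE    d: predicted pass, passed ORE
    """
    risk_flagged = a / (a + b)      # failure rate among predicted-fail drivers
    risk_unflagged = c / (c + d)    # failure rate among predicted-pass drivers
    rr = risk_flagged / risk_unflagged
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

def false_alarm_share(a, b):
    """Share of predicted-fail drivers who went on to pass (assumed definition)."""
    return b / (a + b)

# Hypothetical counts, NOT the study's data: 0.60 / 0.20 gives a risk ratio of 3.
rr, lower, upper = risk_ratio(a=60, b=40, c=80, d=320)
alarms = false_alarm_share(a=60, b=40)
```

A risk ratio near 3 reads as follows: drivers the classifier flags are about three times as likely to fail the ORE as drivers it does not flag.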
CONCLUSIONS
Our results provide initial evidence that the clustering method is useful for feature construction in classification tasks involving time series data when resources are limited to create multiple, domain-relevant variables.

Identifiers

pubmed: 32554384
pii: v22i6e13995
doi: 10.2196/13995
pmc: PMC7333075

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e13995

Copyright information

©David Grethlein, Flaura Koplin Winston, Elizabeth Walshe, Sean Tanner, Venk Kandadai, Santiago Ontañón. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.06.2020.

Authors

David Grethlein (D)

Diagnostic Driving, Inc, Philadelphia, PA, United States.
Computer Science Department, Drexel University, Philadelphia, PA, United States.

Flaura Koplin Winston (FK)

Diagnostic Driving, Inc, Philadelphia, PA, United States.
Center for Injury Research and Prevention, Children's Hospital of Philadelphia, Philadelphia, PA, United States.
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.

Elizabeth Walshe (E)

Center for Injury Research and Prevention, Children's Hospital of Philadelphia, Philadelphia, PA, United States.
Annenberg Public Policy Center, University of Pennsylvania, Philadelphia, PA, United States.

Sean Tanner (S)

Diagnostic Driving, Inc, Philadelphia, PA, United States.
Geography Department, Rutgers University, New Brunswick, NJ, United States.

Venk Kandadai (V)

Diagnostic Driving, Inc, Philadelphia, PA, United States.

Santiago Ontañón (S)

Computer Science Department, Drexel University, Philadelphia, PA, United States.
