Evaluation of supervised machine-learning methods for predicting appearance traits from DNA.


Journal

Forensic science international. Genetics
ISSN: 1878-0326
Abbreviated title: Forensic Sci Int Genet
Country: Netherlands
NLM ID: 101317016

Publication information

Publication date:
07 2021
History:
received: 06 11 2020
revised: 26 02 2021
accepted: 17 03 2021
pubmed: 9 4 2021
medline: 18 8 2021
entrez: 8 4 2021
Status: ppublish

Abstract

The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. While predictive models have already been established for a large set of EVCs using multinomial logistic regression (MLR), the prediction performance of other possible classification methods has not been thoroughly investigated thus far. To identify a classifier that might outperform these trait-specific models, we conducted a systematic comparison between the widely used MLR and three popular machine learning (ML) classifiers that have shown good performance outside EVC prediction, namely support vector machines (SVM), random forest (RF) and artificial neural networks (ANN). As examples, we used eye, hair and skin color categories as phenotypes and genotypes based on the previously established IrisPlex, HIrisPlex, and HIrisPlex-S DNA markers. We compared and assessed the performance of each of the four methods, complemented by detailed hyperparameter tuning that was applied to some of the methods in order to maximize their performance. Overall, we observed that all four classification methods showed rather similar performance, with no method being substantially superior to the others for any of the traits, although performance varied slightly across the different traits and more so across the trait categories. Hence, based on our findings, none of the ML methods applied here provides any advantage for appearance prediction, at least for the categorical pigmentation traits and the selected DNA markers used here.
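To make the baseline concrete, the following is a minimal sketch of multinomial logistic regression, which maps a genotype to one probability per trait category. It is not the authors' implementation or any published IrisPlex model. The genotype coding (minor-allele counts 0, 1 or 2 per SNP), the synthetic data, the function names and the plain gradient-descent fit are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, genotype):
    """Return one probability per trait category for a single genotype.

    weights holds one row per category: [intercept, w_snp1, w_snp2, ...].
    """
    scores = [row[0] + sum(w * g for w, g in zip(row[1:], genotype))
              for row in weights]
    return softmax(scores)


def mean_log_loss(weights, X, y):
    """Average negative log-likelihood of the observed categories."""
    return -sum(math.log(predict_proba(weights, x)[label])
                for x, label in zip(X, y)) / len(X)


def fit_mlr(X, y, n_classes, epochs=300, lr=0.2):
    """Fit the model by full-batch gradient descent (illustrative only)."""
    n_features = len(X[0])
    weights = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                grads[k][0] += err
                for j, g in enumerate(x):
                    grads[k][j + 1] += err * g
        for k in range(n_classes):
            for j in range(n_features + 1):
                weights[k][j] -= lr * grads[k][j] / len(X)
    return weights


# Synthetic data (assumption): 3 hypothetical SNPs coded as allele counts,
# 3 hypothetical color categories driven mostly by the first SNP.
rng = random.Random(0)
X = [[rng.choice([0, 1, 2]) for _ in range(3)] for _ in range(300)]
y = [x[0] if rng.random() < 0.9 else rng.randrange(3) for x in X]

initial = [[0.0] * 4 for _ in range(3)]
weights = fit_mlr(X, y, n_classes=3)
loss_before = mean_log_loss(initial, X, y)
loss_after = mean_log_loss(weights, X, y)
probs = predict_proba(weights, [2, 0, 1])  # category probabilities
```

In a comparison of the kind described above, each alternative classifier would be trained and evaluated on the same genotype-phenotype data, so that only the classification method differs.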

Identifiers

pubmed: 33831816
pii: S1872-4973(21)00046-6
doi: 10.1016/j.fsigen.2021.102507

Chemical substances

Genetic Markers 0
DNA 9007-49-2

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

102507

Copyright information

Copyright © 2021 The Authors. Published by Elsevier B.V. All rights reserved.

Authors

Maria-Alexandra Katsara (MA)

Cologne Center for Genomics, University of Cologne, Cologne, Germany.

Wojciech Branicki (W)

Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.

Susan Walsh (S)

Department of Biology, Indiana University Purdue University Indianapolis (IUPUI), Indianapolis, IN, USA.

Manfred Kayser (M)

Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands.

Michael Nothnagel (M)

Cologne Center for Genomics, University of Cologne, Cologne, Germany; Faculty of Medicine and the Cologne University Hospital, Cologne, Germany. Electronic address: michael.nothnagel@uni-koeln.de.
