Unsupervised machine learning for species delimitation, integrative taxonomy, and biodiversity conservation.

Biodiversity Conservation Integrative Taxonomy Machine Learning Management Units Species Delimitation Species Limits

Journal

Molecular phylogenetics and evolution
ISSN: 1095-9513
Titre abrégé: Mol Phylogenet Evol
Pays: United States
ID NLM: 9304400

Informations de publication

Date de publication:
Dec 2023
Historique:
received: 12 06 2023
revised: 25 09 2023
accepted: 04 10 2023
medline: 28 11 2023
pubmed: 8 10 2023
entrez: 7 10 2023
Statut: ppublish

Résumé

Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.

Identifiants

pubmed: 37804960
pii: S1055-7903(23)00239-7
doi: 10.1016/j.ympev.2023.107939
pii:
doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

IM

Pagination

107939

Informations de copyright

Copyright © 2023 Elsevier Inc. All rights reserved.

Déclaration de conflit d'intérêts

Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Auteurs

R Alexander Pyron (R)

Department of Biological Sciences, The George Washington University, Washington, DC 20052 USA. Electronic address: rpyron@colubroid.org.

Articles similaires

Genome, Chloroplast Phylogeny Genetic Markers Base Composition High-Throughput Nucleotide Sequencing

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C

Classifications MeSH