Unsupervised machine learning for species delimitation, integrative taxonomy, and biodiversity conservation.
Biodiversity Conservation
Integrative Taxonomy
Machine Learning
Management Units
Species Delimitation
Species Limits
Journal
Molecular phylogenetics and evolution
ISSN: 1095-9513
Abbreviated title: Mol Phylogenet Evol
Country: United States
NLM ID: 9304400
Publication information
Publication date:
Dec 2023
History:
received: 2023-06-12
revised: 2023-09-25
accepted: 2023-10-04
medline: 2023-11-28
pubmed: 2023-10-08
entrez: 2023-10-07
Status:
ppublish
Abstract
Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.
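The multi-layer idea described in the abstract can be illustrated with a small sketch. This is not the author's implementation: the function names `train_multilayer_som` and `assign_cells`, the fixed user-supplied layer weights (the abstract describes weights that are estimated empirically), the per-layer mean squared distance, the learning-rate and neighbourhood schedules, and the toy data are all assumptions made for illustration. Averaging each layer's squared distance over its own features is one simple way to keep a wide layer (such as a genetic matrix) from dominating a narrow one. The final step of delimiting grid cells into K units is not shown.

```python
import numpy as np


def _layer_dist(X, cb):
    """Mean squared distance between each sample in X and each codebook row."""
    return ((X[:, None, :] - cb[None, :, :]) ** 2).mean(axis=2)


def train_multilayer_som(layers, weights, grid=(3, 3), n_iter=500,
                         lr0=0.5, sigma0=2.0, seed=0):
    """Train a toy multi-layer SOM (hypothetical helper, not a library API).

    layers  : list of (n_samples, n_features) arrays, one per data layer
    weights : one fixed weight per layer (an assumption of this sketch)
    Returns one codebook array per layer, each of shape (n_cells, n_features).
    """
    rng = np.random.default_rng(seed)
    layers = [np.asarray(X, dtype=float) for X in layers]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = layers[0].shape[0]
    # Row/column position of every grid cell, used by the neighbourhood kernel.
    coords = np.array([(r, c) for r in range(grid[0]) for c in range(grid[1])],
                      dtype=float)
    # Initialise every cell from a randomly drawn sample (same sample in all layers).
    start = rng.integers(n, size=len(coords))
    codebooks = [X[start].copy() for X in layers]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        i = rng.integers(n)
        # Combined distance: weighted sum of per-layer, feature-averaged distances.
        dist = sum(wl * _layer_dist(X[i:i + 1], cb)[0]
                   for wl, X, cb in zip(w, layers, codebooks))
        bmu = int(np.argmin(dist))                # best-matching unit for sample i
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        # Pull the BMU and its grid neighbours toward the sample in every layer.
        for X, cb in zip(layers, codebooks):
            cb += lr * h[:, None] * (X[i] - cb)
    return codebooks


def assign_cells(layers, weights, codebooks):
    """Return the index of the best-matching grid cell for every sample."""
    layers = [np.asarray(X, dtype=float) for X in layers]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    dist = sum(wl * _layer_dist(X, cb)
               for wl, X, cb in zip(w, layers, codebooks))
    return dist.argmin(axis=1)


# Toy data (invented for illustration): two well-separated groups of 20 samples,
# described by a wide 20-column layer and a narrow 2-column layer.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
alleles = group[:, None] * 5.0 + rng.normal(0.0, 0.1, size=(40, 20))
space = group[:, None] * 5.0 + rng.normal(0.0, 0.1, size=(40, 2))
layers = [alleles, space]
weights = [0.5, 0.5]

codebooks = train_multilayer_som(layers, weights)
cells = assign_cells(layers, weights, codebooks)
print(cells)
```

In this toy setting the two groups should occupy different grid cells. A full analysis would then group the cells into K units, for example by clustering the codebook vectors, and would estimate the layer weights rather than fixing them.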
Identifiers
pubmed: 37804960
pii: S1055-7903(23)00239-7
doi: 10.1016/j.ympev.2023.107939
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
107939
Copyright information
Copyright © 2023 Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.