WilsonGenAI a deep learning approach to classify pathogenic variants in Wilson Disease.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2024
History:
received: 2023-11-14
accepted: 2024-05-01
medline: 2024-05-17
pubmed: 2024-05-17
entrez: 2024-05-17
Status: epublish

Abstract

Advances in next-generation sequencing have made rapid variant discovery and detection widely accessible. To facilitate a better understanding of the nature of these variants, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) have issued a set of guidelines for variant classification. However, given the vast number of variants associated with any disorder, it is impossible to manually apply these guidelines to all known variants. Machine learning methodologies offer a rapid way to classify large numbers of variants, as well as variants of uncertain significance, as either pathogenic or benign. Here we classify ATP7B genetic variants by employing machine learning (ML) and artificial intelligence (AI) algorithms trained on our well-annotated WilsonGen dataset. We have trained and validated two algorithms, TabNet and XGBoost, on a high-confidence dataset of manually annotated, ACMG-AMP classified variants of the ATP7B gene associated with Wilson's disease. Using an independent validation dataset of ACMG-AMP classified variants, as well as a patient set of functionally validated variants, we showed how both algorithms perform and how they can be used to classify large numbers of variants in clinical as well as research settings. We have created a ready-to-deploy tool that can classify variants linked with Wilson's disease as pathogenic or benign, and that can be used by both clinicians and researchers to better understand the disease through the nature of the genetic variants associated with it.

Identifiers

pubmed: 38758754
doi: 10.1371/journal.pone.0303787
pii: PONE-D-23-35773

Chemical substances

ATP7B protein, human 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e0303787

Copyright information

Copyright: © 2024 Vatsyayan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Aastha Vatsyayan (A)

CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.

Mukesh Kumar (M)

CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.

Bhaskar Jyoti Saikia (BJ)

CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.

Vinod Scaria (V)

CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.

Binukumar B K (B)

CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India.
Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.
