DisVar: an R library for identifying variants associated with diseases using large-scale personal genetic information.
Bioinformatics
Disease diagnosis
Disease prevention
GWAS
Genetic disease databases
Genetic diseases
Genetic variants
R package
Journal
PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425
Publication information
Publication date: 2023
History:
received: 2023-06-06
accepted: 2023-08-22
medline: 2023-11-01
pubmed: 2023-10-04
entrez: 2023-10-04
Status:
epublish
Abstract
Background: Genetic variants may contribute to the development of diseases. Several genetic disease databases are used in medical research and diagnosis, but the web applications used to search these databases for disease-associated variants have limitations: they may not be able to search large-scale sets of genetic variants, their results may be difficult to interpret, and they may not support variants mapped to the latest reference genome (GRCh38/hg38).
Methods: In this study, we developed a novel R library called "DisVar" to identify disease-associated genetic variants in large-scale individual genomic data. The library is compatible with variants from the latest reference genome version and uses five databases of disease-associated variants. Over 100 million variants can be searched simultaneously for associated diseases.
Results: The package was evaluated using 24 Variant Call Format (VCF) files (215,054 to 11,346,899 sites) from the 1000 Genomes Project. Disease-associated variants were detected in 298,227 hits across all the VCF files, taking a total of 63.58 min to complete. The package was also tested on ClinVar's VCF file (2,120,558 variants), where 20,657 disease-associated hits were identified with an estimated elapsed time of 45.98 s.
Conclusions: DisVar can overcome the limitations of existing tools and is a fast and effective diagnostic and preventive tool that identifies disease-associated variants in large-scale genetic variant data mapped to the latest reference genome.
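The abstract describes looking up the variants of a VCF file in databases of disease-associated variants. The sketch below illustrates that general idea only, in Python rather than R. It is not DisVar's implementation or API: the function names (`parse_vcf_sites`, `find_disease_hits`), the matching key (chromosome, position, REF, ALT), and the toy association table are all assumptions made for illustration.

```python
import io


def parse_vcf_sites(handle):
    """Yield (chrom, pos, ref, alt) for each ALT allele in a VCF text stream."""
    for line in handle:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _variant_id, ref, alts = fields[:5]
        # Normalise "chr1" and "1" to the same chromosome label (assumed convention).
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        # A multi-allelic site lists its ALT alleles separated by commas.
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def find_disease_hits(vcf_handle, associations):
    """Return (variant_key, disease) pairs for variants found in the table."""
    hits = []
    for key in parse_vcf_sites(vcf_handle):
        for disease in associations.get(key, ()):
            hits.append((key, disease))
    return hits


# Hypothetical association table; a real one would be built from curated databases.
ASSOCIATIONS = {
    ("1", 1000, "A", "G"): ["example disease A"],
    ("2", 2000, "C", "T"): ["example disease B", "example disease C"],
}

# Hypothetical three-record VCF used only to exercise the functions above.
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.\n"
    "chr1\t1500\t.\tT\tC\t.\tPASS\t.\n"
    "2\t2000\t.\tC\tA,T\t.\tPASS\t.\n"
)

hits = find_disease_hits(io.StringIO(TOY_VCF), ASSOCIATIONS)
for (chrom, pos, ref, alt), disease in hits:
    print(f"{chrom}:{pos} {ref}>{alt}\t{disease}")
```

Keying a dictionary on the full variant tuple keeps each lookup at constant expected cost, which is one plausible way to keep a search over millions of sites tractable; whether DisVar takes this approach is not stated in the abstract.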
Identifiers
pubmed: 37790633
doi: 10.7717/peerj.16086
pii: 16086
pmc: PMC10542659
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e16086
Copyright information
©2023 Chanasongkhram et al.
Conflict of interest statement
The authors declare there are no competing interests.