GRIEVOUS: Your command-line general for resolving cross-dataset genotype inconsistencies.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
30 Jul 2024
History:
received: 01 Feb 2024
revised: 19 Jul 2024
accepted: 29 Jul 2024
medline: 30 Jul 2024
pubmed: 30 Jul 2024
entrez: 30 Jul 2024
Status: ahead of print

Abstract

Harmonizing variant indexing and allele assignments across datasets is crucial for data integrity in cross-dataset studies such as multi-cohort genome-wide association studies, meta-analyses, and the development, validation, and application of polygenic risk scores. Ensuring this indexing and allele consistency is a laborious, time-consuming, and error-prone process requiring a certain degree of computational proficiency. Here, we introduce GRIEVOUS, a command-line tool for cross-dataset variant homogenization. By means of an internal database and a custom indexing methodology, GRIEVOUS identifies, formats, and aligns all biallelic SNPs across all summary statistic and genotype files of interest. Upon completion of dataset harmonization, GRIEVOUS can also be used to extract the maximal set of biallelic SNPs common to all datasets. GRIEVOUS and all supporting documentation and tutorials can be found at https://github.com/jvtalwar/GRIEVOUS. It is freely and publicly available under the MIT license and can be installed via pip. Supplementary data are available at Bioinformatics online.

Identifiers

pubmed: 39078222
pii: 7723992
doi: 10.1093/bioinformatics/btae489

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Authors

James V Talwar (JV)

Department of Medicine, Division of Medical Genetics, University of California San Diego, CA 92093, USA.
Bioinformatics and Systems Biology Program, University of California San Diego, CA 92093, USA.

Adam Klie (A)

Department of Medicine, Division of Medical Genetics, University of California San Diego, CA 92093, USA.
Bioinformatics and Systems Biology Program, University of California San Diego, CA 92093, USA.

Meghana S Pagadala (MS)

Biomedical Science Program, University of California San Diego, CA 92093, USA.

Hannah Carter (H)

Department of Medicine, Division of Medical Genetics, University of California San Diego, CA 92093, USA.
Bioinformatics and Systems Biology Program, University of California San Diego, CA 92093, USA.
Moores Cancer Center, University of California San Diego, CA 92093, USA.

MeSH classifications