GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs.


Journal

PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922

Publication information

Publication date:
09 2021
History:
received: 11 01 2021
accepted: 10 09 2021
revised: 15 10 2021
pubmed: 28 9 2021
medline: 27 11 2021
entrez: 27 9 2021
Status: epublish

Abstract

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes Project, we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
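
The idea of scanning a PWM over a sequence-labelled graph rather than a linear sequence can be illustrated with a minimal sketch. This is not GRAFIMO's implementation: the motif, the graph, and the helper names (`log_odds`, `score`, `kmers`) are hypothetical, the toy graph encodes a single SNP (ACG on the reference, ATG on the alternate allele), and the sketch enumerates every walk on the forward strand without restricting to observed haplotypes or computing P-values.

```python
import math

# Hypothetical toy motif of width 3: per-position base probabilities.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def log_odds(pfm, bg=BACKGROUND):
    """Convert base probabilities into a log-odds PWM (log2 scale)."""
    return [{b: math.log2(p / bg) for b, p in col.items()} for col in pfm]


def score(kmer, pwm):
    """Sum the per-position log-odds of a k-mer as long as the motif."""
    return sum(col[base] for col, base in zip(pwm, kmer))


# Hypothetical variation graph: node id -> sequence label, plus edges.
# Nodes 2 and 3 are the two alleles of one SNP.
NODES = {1: "A", 2: "C", 3: "T", 4: "G"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def kmers(nodes, edges, k):
    """Yield (node_path, kmer) for every length-k string spelled by a walk."""
    def extend(node, seq, path):
        if len(seq) >= k:
            yield path, seq[:k]
            return
        for nxt in edges[node]:
            yield from extend(nxt, seq + nodes[nxt], path + (nxt,))

    for start, label in nodes.items():
        for offset in range(len(label)):
            yield from extend(start, label[offset:], (start,))


if __name__ == "__main__":
    pwm = log_odds(PFM)
    for path, kmer in kmers(NODES, EDGES, len(pwm)):
        print(path, kmer, round(score(kmer, pwm), 3))
```

In this toy case the reference walk scores higher than the walk through the alternate allele, which is the kind of variant-dependent weakening of a candidate binding site described above; a walk through a different variant could equally create or strengthen a site that a reference-only scan would miss.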

Identifiers

pubmed: 34570769
doi: 10.1371/journal.pcbi.1009444
pii: PCOMPBIOL-D-21-00041
pmc: PMC8519448

Chemical substances

Transcription Factors 0

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

e1009444

Grants

Agency: NHGRI NIH HHS
ID: R00 HG008399
Country: United States
Agency: NHGRI NIH HHS
ID: R35 HG010717
Country: United States

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Manuel Tognon (M)

Computer Science Department, University of Verona, Verona, Italy.

Vincenzo Bonnici (V)

Computer Science Department, University of Verona, Verona, Italy.

Erik Garrison (E)

University of Tennessee Health Science Center, Memphis, Tennessee, United States of America.

Rosalba Giugno (R)

Computer Science Department, University of Verona, Verona, Italy.

Luca Pinello (L)

Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America.
Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America.
Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.
