SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
01 06 2020
History:
received: 23 12 2019
revised: 06 03 2020
accepted: 17 04 2020
pubmed: 25 4 2020
medline: 29 12 2020
entrez: 25 4 2020
Status:
ppublish
Abstract
We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources. SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. Contact: lswang@pennmedicine.upenn.edu. Supplementary data are available at Bioinformatics online.
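The integration step described in the abstract, overlapping GWAS variants with functional genomics intervals, can be illustrated with a minimal sketch. This is not SparkINFERNO's code and uses neither Apache Spark nor Giggle; it is a plain-Python stand-in, and the function names, the tuple layouts, the half-open coordinate convention and the example data are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(intervals):
    """Group (chrom, start, end, label) intervals by chromosome, sorted by start.

    Coordinates are assumed to be half-open [start, end); this is an
    illustrative convention, not one taken from the source.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, label in intervals:
        by_chrom[chrom].append((start, end, label))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([start for start, _, _ in items], items)
    return index


def annotate(variants, index):
    """Map each variant ID to the sorted labels of intervals containing it."""
    hits = {}
    for variant_id, chrom, pos in variants:
        starts, items = index.get(chrom, ([], []))
        # Only intervals that start at or before the position can contain it.
        upper = bisect_right(starts, pos)
        hits[variant_id] = sorted(
            label for start, end, label in items[:upper] if start <= pos < end
        )
    return hits


# Hypothetical annotation intervals and variants (not real data).
intervals = [
    ("chr1", 100, 200, "enhancer:tissueA"),
    ("chr1", 150, 300, "TFBS:tissueB"),
    ("chr2", 50, 80, "enhancer:tissueA"),
]
variants = [("var1", "chr1", 160), ("var2", "chr1", 250), ("var3", "chr2", 90)]

result = annotate(variants, build_index(intervals))
for variant_id, labels in result.items():
    print(variant_id, labels)
```

The sketch scans every interval that starts before the variant, which is adequate for small inputs; a dedicated interval index or a distributed join would be the natural choice at the scale the abstract describes.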
Identifiers
pubmed: 32330239
pii: 5824793
doi: 10.1093/bioinformatics/btaa246
pmc: PMC7320617
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
3879-3881
Grants
Agency: NIA NIH HHS
ID: U24 AG041689
Country: United States
Agency: NIA NIH HHS
ID: U54 AG052427
Country: United States
Agency: NIA NIH HHS
ID: U01 AG032984
Country: United States
Agency: NIA NIH HHS
ID: T32 AG000255
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press.