SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 06 2020
History:
received: 23 12 2019
revised: 06 03 2020
accepted: 17 04 2020
pubmed: 25 04 2020
medline: 29 12 2020
entrez: 25 04 2020
Status: ppublish

Abstract

We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional data across more than 400 tissues and cell types. Scalability is achieved through an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with both data size and the amount of computational resources. SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

Contact: lswang@pennmedicine.upenn.edu

Supplementary data are available at Bioinformatics online.
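The abstract credits part of the pipeline's scalability to Giggle-based genomic indexing. As a rough illustration of the kind of query such an index answers (which functional annotation intervals contain a given variant), the Python sketch below uses a simple per-chromosome sorted-start index with binary search. This is a swapped-in, simplified technique, not Giggle's actual index structure and not SparkINFERNO's code; the record layouts, the BED-style 0-based half-open coordinates, and the example labels are all hypothetical assumptions.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical annotation record: (chrom, start, end, label), BED-style 0-based half-open [start, end).
Annotation = Tuple[str, int, int, str]
# Hypothetical variant record: (variant_id, chrom, pos), pos in the same 0-based coordinates.
Variant = Tuple[str, str, int]


class IntervalIndex:
    """Per-chromosome sorted-start index for point-in-interval queries (a simplification, not Giggle)."""

    def __init__(self, annotations: List[Annotation]) -> None:
        by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        for chrom, start, end, label in annotations:
            by_chrom[chrom].append((start, end, label))
        self._intervals: Dict[str, List[Tuple[int, int, str]]] = {}
        self._starts: Dict[str, List[int]] = {}
        self._max_len: Dict[str, int] = {}
        for chrom, ivs in by_chrom.items():
            ivs.sort()  # sort by start so that starts can be binary-searched
            self._intervals[chrom] = ivs
            self._starts[chrom] = [s for s, _, _ in ivs]
            # The longest interval bounds how far left a containing interval can start.
            self._max_len[chrom] = max(e - s for s, e, _ in ivs)

    def query(self, chrom: str, pos: int) -> List[str]:
        """Return labels of all intervals on `chrom` with start <= pos < end."""
        ivs = self._intervals.get(chrom)
        if not ivs:
            return []
        starts = self._starts[chrom]
        # Any containing interval must start within [pos - max_len + 1, pos].
        lo = bisect_left(starts, pos - self._max_len[chrom] + 1)
        hi = bisect_right(starts, pos)
        return [label for _, end, label in ivs[lo:hi] if pos < end]


def annotate_variants(variants: List[Variant], index: IntervalIndex) -> Dict[str, List[str]]:
    """Map each variant ID to the labels of the annotation intervals that contain it."""
    return {vid: index.query(chrom, pos) for vid, chrom, pos in variants}


if __name__ == "__main__":
    # Toy, made-up inputs purely for illustration.
    annotations = [
        ("chr1", 100, 200, "enhancer:liver"),
        ("chr1", 150, 160, "TFBS:CTCF"),
        ("chr2", 0, 50, "enhancer:brain"),
    ]
    variants = [("rs1", "chr1", 155), ("rs2", "chr1", 200), ("rs3", "chr3", 5)]
    print(annotate_variants(variants, IntervalIndex(annotations)))
    # prints {'rs1': ['enhancer:liver', 'TFBS:CTCF'], 'rs2': [], 'rs3': []}
```

Because intervals are half-open, a variant at position 200 does not fall in the interval [100, 200). This sketch only illustrates the overlap query itself; in a Spark setting, a read-only index of this kind could plausibly be shipped to workers and applied per partition of the variant table, but that is an assumption, not a description of SparkINFERNO's implementation.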

Identifiers

pubmed: 32330239
pii: 5824793
doi: 10.1093/bioinformatics/btaa246
pmc: PMC7320617

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3879-3881

Grants

Agency: NIA NIH HHS
ID: U24 AG041689
Country: United States
Agency: NIA NIH HHS
ID: U54 AG052427
Country: United States
Agency: NIA NIH HHS
ID: U01 AG032984
Country: United States
Agency: NIA NIH HHS
ID: T32 AG000255
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Pavel P Kuksa (PP)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Chien-Yueh Lee (CY)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Alexandre Amlie-Wolf (A)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Genomics and Computational Biology Graduate Group.

Prabhakaran Gangadharan (P)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Elizabeth E Mlynarski (EE)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Yi-Fan Chou (YF)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Han-Jen Lin (HJ)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Heather Issen (H)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Emily Greenfest-Allen (E)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Otto Valladares (O)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Yuk Yee Leung (YY)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.

Li-San Wang (LS)

Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center.
Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
