NGS-Integrator: An efficient tool for combining multiple NGS data tracks using minimum Bayes' factors.

Keywords: Efficient data integration; Genome-wide NGS; Minimum Bayes factor; NGS data analysis

Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
ID NLM: 100965258

Publication information

Publication date:
19 Nov 2020
History:
received: 30 04 2020
accepted: 09 11 2020
entrez: 20 11 2020
pubmed: 21 11 2020
medline: 15 05 2021
Status: epublish

Abstract

Next-generation sequencing (NGS) is widely used for genome-wide identification and quantification of DNA elements involved in the regulation of gene transcription. Studies that generate multiple high-throughput NGS datasets require data integration methods for two general tasks: 1) generation of genome-wide data tracks representing an aggregate of multiple replicates of the same experiment; and 2) combination of tracks from different experimental types that provide complementary information regarding the location of genomic features such as enhancers. NGS-Integrator is a Java-based command line application, facilitating efficient integration of multiple genome-wide NGS datasets. NGS-Integrator first transforms all input data tracks using the complement of the minimum Bayes' factor so that all values are expressed in the range [0,1] representing the probability of a true signal given the background noise. Then, NGS-Integrator calculates the joint probability for every genomic position to create an integrated track. We provide examples using real NGS data generated in our laboratory and from the mouse ENCODE database. Our results show that NGS-Integrator is both time- and memory-efficient. Our examples show that NGS-Integrator can integrate information to facilitate downstream analyses that identify functional regulatory domains along the genome.
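The two-step procedure described in the abstract (transform each track with the complement of the minimum Bayes factor, then compute a joint probability per genomic position) can be illustrated with a small Java sketch. This is not NGS-Integrator's source code: the class and method names (`MbfIntegrationSketch`, `complementMbf`, `transformTrack`, `integrate`) are hypothetical, and several modelling choices are assumptions. The sketch uses the minimum Bayes factor for a Gaussian z-score, MBF = exp(-z²/2), so that 1 - MBF lies in [0, 1). It assumes that z is computed against each track's own mean and standard deviation as the background model, clips below-background positions to z = 0, and takes the joint probability at each position as the product of the per-track values, which treats the tracks as independent.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not NGS-Integrator's source code.
 * Step 1: map each value to 1 - MBF, where MBF = exp(-z^2 / 2) is the minimum
 *         Bayes factor for a Gaussian z-score.
 * Step 2: combine tracks position by position into a joint probability,
 *         ASSUMED here to be a product (tracks treated as independent).
 */
final class MbfIntegrationSketch {

    private MbfIntegrationSketch() {}

    /** Complement of the minimum Bayes factor for z; lies in [0, 1). */
    static double complementMbf(double z) {
        return 1.0 - Math.exp(-0.5 * z * z);
    }

    /**
     * ASSUMPTION: the background is modelled by the track's own mean and
     * population standard deviation; positions below the mean are clipped
     * to z = 0 so that depletion is not scored as signal.
     */
    static double[] transformTrack(double[] values) {
        final double mean = Arrays.stream(values).average().orElse(0.0);
        final double variance = Arrays.stream(values)
                .map(v -> (v - mean) * (v - mean))
                .average()
                .orElse(0.0);
        final double sd = Math.sqrt(variance);
        double[] probs = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double z = sd > 0.0 ? Math.max(0.0, (values[i] - mean) / sd) : 0.0;
            probs[i] = complementMbf(z);
        }
        return probs;
    }

    /** Joint probability per position over equal-length tracks (product rule). */
    static double[] integrate(double[]... tracks) {
        if (tracks.length == 0) {
            throw new IllegalArgumentException("at least one track is required");
        }
        int n = tracks[0].length;
        double[] joint = new double[n];
        Arrays.fill(joint, 1.0);
        for (double[] track : tracks) {
            if (track.length != n) {
                throw new IllegalArgumentException("all tracks must have the same length");
            }
            double[] probs = transformTrack(track);
            for (int i = 0; i < n; i++) {
                joint[i] *= probs[i];
            }
        }
        return joint;
    }

    public static void main(String[] args) {
        // Toy replicates (hypothetical data): both show a peak at index 3.
        double[] replicate1 = {0, 1, 0, 9, 1, 0};
        double[] replicate2 = {1, 0, 0, 8, 0, 1};
        System.out.println(Arrays.toString(integrate(replicate1, replicate2)));
    }
}
```

With the product rule, a position keeps a high integrated value only when every input track shows signal there. In the toy example, only index 3 ends up well above zero.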


Identifiers

pubmed: 33213365
doi: 10.1186/s12864-020-07220-7
pii: 10.1186/s12864-020-07220-7
pmc: PMC7678096

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

806

Grants

Agency: NIGMS NIH HHS
ID: R15GM120820
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM134384
Country: United States
Agency: NHLBI NIH HHS
ID: ZIA-HL006129
Country: United States
Agency: National Science Foundation
ID: NSF CAREER OAC 1925960
Agency: NHLBI NIH HHS
ID: ZIA-HL001285
Country: United States


Authors

Bronte Wen (B)

Epithelial Systems Biology Laboratory, Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA.

Hyun Jun Jung (HJ)

Epithelial Systems Biology Laboratory, Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA.
Division of Nephrology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA.

Lihe Chen (L)

Epithelial Systems Biology Laboratory, Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA.

Fahad Saeed (F)

School of Computing and Information Sciences, Florida International University, Miami, Florida, USA.

Mark A Knepper (MA)

Epithelial Systems Biology Laboratory, Systems Biology Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA. knepperm@nhlbi.nih.gov.

Similar articles


Odour generalisation and detection dog training.

Lyn Caldicott, Thomas W Pike, Helen E Zulch et al.
Animals Odorants Dogs Generalization, Psychological Smell

Selecting optimal software code descriptors-The case of Java.

Yegor Bugayenko, Zamira Kholmatova, Artem Kruglov et al.
Software Algorithms Programming Languages

MeSH classifications