GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms.

MeSH terms: Base Composition; Bias; Fusobacterium / genetics; Genome, Bacterial; High-Throughput Nucleotide Sequencing / methods; Metagenome; Metagenomics / methods; Nanopore Sequencing / methods; Software / standards

Keywords: GC bias; Illumina; Oxford Nanopore; PacBio; high-throughput sequencing; metagenomics

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date: 01 Feb 2020

History:

received: 16 Jul 2019

revised: 25 Nov 2019

accepted: 14 Jan 2020

entrez: 14 Feb 2020

pubmed: 14 Feb 2020

medline: 28 Jan 2021

Status: ppublish

Abstract

BACKGROUND

Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, conclusions drawn from it are at risk from biases inherent to DNA sequencing methods, including inaccurate abundance estimates as a function of genomic guanine-cytosine (GC) content.

RESULTS

We explored such GC biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC bias profiles varied among library preparation protocols and sequencing platforms. We found that our workflows using MiSeq and NextSeq were hindered by major GC biases, with problems becoming increasingly severe outside the 45–65% GC range. This led to falsely low coverage in GC-rich and especially GC-poor sequences: genomic windows with 30% GC content had >10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates tightly with coverage biases. The PacBio and HiSeq platforms showed GC bias profiles similar to each other but distinct from those seen in the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not affected by GC bias.
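The per-window comparison described in the Results (coverage of 30% GC windows relative to windows near 50% GC) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the 100 bp default window, the 5% GC bins and the use of the 50% bin as the reference are all assumptions, and per-base depth is assumed to have been computed beforehand (for example from reads aligned to a single reference contig). The toy data are invented solely to show a 10-fold drop.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


def gc_percent(seq: str) -> Optional[int]:
    """Floored integer GC % over unambiguous A/C/G/T bases, or None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return 100 * (seq.count("G") + seq.count("C")) // acgt


def window_gc_and_depth(seq: str, depth: Sequence[float], window: int) -> Iterator[Tuple[int, float]]:
    """Yield (GC %, mean depth) for each complete, non-overlapping window (hypothetical helper)."""
    if len(seq) != len(depth):
        raise ValueError("sequence and per-base depth must have the same length")
    for start in range(0, len(seq) - window + 1, window):
        pct = gc_percent(seq[start:start + window])
        if pct is not None:  # skip windows made only of N or other ambiguity codes
            yield pct, mean(depth[start:start + window])


def relative_coverage_by_gc(seq: str, depth: Sequence[float], window: int = 100,
                            bin_pct: int = 5, reference_pct: int = 50) -> Dict[int, float]:
    """Mean window depth per GC bin, divided by the mean depth of the reference bin."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for pct, d in window_gc_and_depth(seq, depth, window):
        bins[pct // bin_pct * bin_pct].append(d)  # key = lower bound of the GC bin
    ref_key = reference_pct // bin_pct * bin_pct
    if ref_key not in bins or mean(bins[ref_key]) == 0:
        raise ValueError("reference GC bin is empty or has zero coverage")
    ref = mean(bins[ref_key])
    return {key: mean(vals) / ref for key, vals in sorted(bins.items())}


# Toy example: one 30% GC window followed by one 50% GC window.
toy_seq = "GCGAAAAAAA" + "GCGCGAAAAA"
toy_depth = [2] * 10 + [20] * 10
print(relative_coverage_by_gc(toy_seq, toy_depth, window=10))
# {30: 0.1, 50: 1.0}
```

A relative coverage of 0.1 in the 30% bin is the kind of >10-fold shortfall the abstract reports; on real data the curve would be plotted across all bins to obtain a workflow's GC bias profile.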

CONCLUSIONS

These findings identify GC bias as a potential source of difficulty in genome sequencing, one that could be pre-emptively addressed with methodological optimizations provided that the GC biases inherent to the relevant workflow are understood. We further recommend a more critical approach to quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC bias before drawing conclusions, or they should use a demonstrably unbiased workflow.
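One simple way to "account for the effects of GC bias" is to divide each window's depth by the relative coverage expected for its GC bin before averaging per genome, then renormalize to relative abundances. The sketch below is an assumption-laden illustration, not the authors' method: `corrected_abundances` is a hypothetical name, the bias curve is assumed to have been measured with the same workflow on a mock community of known composition, and windows falling in uncalibrated GC bins are simply skipped, which can itself introduce error.

```python
from statistics import mean
from typing import Dict, List, Tuple


def corrected_abundances(
    genome_windows: Dict[str, List[Tuple[int, float]]],
    bias_curve: Dict[int, float],
    bin_pct: int = 5,
) -> Dict[str, float]:
    """Relative abundances after dividing each window's depth by its GC bin's expected coverage.

    genome_windows: genome name -> list of (window GC %, window mean depth).
    bias_curve: lower bound of GC bin (%) -> relative coverage (1.0 at the reference bin).
    """
    corrected = {}
    for name, windows in genome_windows.items():
        values = []
        for pct, depth in windows:
            factor = bias_curve.get(pct // bin_pct * bin_pct)
            if factor is None or factor <= 0:
                continue  # GC bin not calibrated: skip the window rather than guess
            values.append(depth / factor)
        if not values:
            raise ValueError(f"no calibrated windows for genome {name!r}")
        corrected[name] = mean(values)
    total = sum(corrected.values())
    return {name: value / total for name, value in corrected.items()}


# Toy curve: 30% GC windows receive 10-fold less coverage than 50% GC windows.
curve = {30: 0.1, 50: 1.0}
toy_windows = {
    "low_gc": [(30, 2.0), (32, 2.0)],
    "mid_gc": [(50, 20.0), (51, 20.0)],
}
print(corrected_abundances(toy_windows, curve))
# {'low_gc': 0.5, 'mid_gc': 0.5}
```

With raw depths, the GC-poor genome would appear to make up only about 9% of the toy community (2 / 22); after correction, both genomes are equally abundant, illustrating how uncorrected GC bias underrepresents GC-poor organisms.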

Identifiers

DOI: 10.1093/gigascience/giaa008 PMID: 32052832 PMC: PMC7016772

pii: 5735313

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Language

eng

Authors

Patrick Denis Browne (PD)

Tue Kjærgaard Nielsen (TK)

Witold Kot (W)

Anni Aggerholm (A)

M Thomas P Gilbert (MTP)

Lara Puetz (L)

Morten Rasmussen (M)

Athanasios Zervas (A)

Lars Hestbjerg Hansen (LH)
