Accurate detection of mosaic variants in sequencing data without matched controls.


Journal

Nature Biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
03 2020
History:
received: 04 12 2018
accepted: 23 11 2019
pubmed: 8 1 2020
medline: 11 4 2020
entrez: 8 1 2020
Status: ppublish

Abstract

Detection of mosaic mutations that arise in normal development is challenging, as such mutations are typically present in only a minute fraction of cells and there is no clear matched control for removing germline variants and systematic artifacts. We present MosaicForecast, a machine-learning method that leverages read-based phasing and read-level features to accurately detect mosaic single-nucleotide variants and indels, achieving a multifold increase in specificity compared with existing algorithms. Using single-cell sequencing and targeted sequencing, we validated 80-90% of the mosaic single-nucleotide variants and 60-80% of indels detected in human brain whole-genome sequencing data. Our method should help elucidate the contribution of mosaic somatic mutations to the origin and development of disease.

Identifiers

pubmed: 31907404
doi: 10.1038/s41587-019-0368-8
pii: 10.1038/s41587-019-0368-8
pmc: PMC7065972
mid: NIHMS1544381

Publication types

Letter
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

314-319

Grants

Agency: NINDS NIH HHS
ID: R01 NS032457
Country: United States
Agency: NIMH NIH HHS
ID: U01 MH106883
Country: United States
Agency: NHGRI NIH HHS
ID: T32 HG002295
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM007753
Country: United States

Authors

Yanmei Dou (Y)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Minseok Kwon (M)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Rachel E Rodin (RE)

Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA.
Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Harvard/MIT MD-PhD Program, Harvard Medical School, Boston, MA, USA.

Isidro Cortés-Ciriano (I)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.

Ryan Doan (R)

Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA.
Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Lovelace J Luquette (LJ)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Bioinformatics and Integrative Genomics PhD program, Harvard Medical School, Boston, MA, USA.

Alon Galor (A)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.

Craig Bohrson (C)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Bioinformatics and Integrative Genomics PhD program, Harvard Medical School, Boston, MA, USA.

Christopher A Walsh (CA)

Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children's Hospital, Boston, MA, USA.
Departments of Neurology and Pediatrics, Harvard Medical School, Boston, MA, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Peter J Park (PJ)

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. peter_park@hms.harvard.edu.
Ludwig Center at Harvard, Boston, MA, USA. peter_park@hms.harvard.edu.
