Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome.
Journal
PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922
Publication information
Publication date:
06 2019
History:
received: 16 11 2018
accepted: 17 05 2019
revised: 26 06 2019
pubmed: 15 6 2019
medline: 4 12 2019
entrez: 15 6 2019
Status:
epublish
Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
Identifiers
pubmed: 31199787
doi: 10.1371/journal.pcbi.1007112
pii: PCOMPBIOL-D-18-01935
pmc: PMC6594643
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
e1007112
Grants
Agency: NIMH NIH HHS
ID: R01 MH108528
Country: United States
Agency: NIMH NIH HHS
ID: U01 MH109501
Country: United States
Agency: NIMH NIH HHS
ID: R01 MH109885
Country: United States
Agency: NIMH NIH HHS
ID: R01 MH105524
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM008666
Country: United States
Conflict of interest statement
The authors have declared that no competing interests exist.