Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
07 04 2021
History:
received: 05 02 2021
accepted: 25 03 2021
entrez: 8 4 2021
pubmed: 9 4 2021
medline: 30 10 2021
Status: epublish

Abstract

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses multiple MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA's features at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs built with different E-value cutoffs improved the model's prediction performance compared to features from a single E-value MSA. A further improvement was observed when an attention layer was used, and even more when additional prediction tasks for bond angles were added. The improvements in distance prediction were successfully transferred to achieve better protein tertiary structure modeling.
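
As a rough illustration of the idea in the abstract, the sketch below weights the features from several MSAs separately for each residue pair and sums them. It is a minimal NumPy example under stated assumptions, not the AttentiveDist implementation: the function names, the linear scoring vector `score_weights`, and the toy sizes `L` and `C` are all hypothetical, and only the use of four MSAs comes from the abstract.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attend_over_msas(msa_features, score_weights):
    """Combine per-MSA pair features with per-residue-pair attention.

    msa_features:  array of shape (K, L, L, C) for K MSAs, L residues,
                   and C feature channels per residue pair.
    score_weights: array of shape (C,), a hypothetical scoring vector
                   standing in for learned attention parameters.
    Returns (combined, attention) with shapes (L, L, C) and (K, L, L).
    """
    # One scalar score per MSA and residue pair.
    scores = msa_features @ score_weights            # (K, L, L)
    # Normalize across the K MSAs so the weights for each pair sum to 1.
    attention = softmax(scores, axis=0)              # (K, L, L)
    # Attention-weighted sum of the K feature maps.
    combined = (attention[..., None] * msa_features).sum(axis=0)
    return combined, attention

rng = np.random.default_rng(0)
K, L, C = 4, 6, 8   # four MSAs as in the abstract; L and C are toy values
features = rng.normal(size=(K, L, L, C))
w = rng.normal(size=C)
combined, attention = attend_over_msas(features, w)
print(combined.shape, attention.shape)
```

Because the weights are computed per residue pair, one pair can rely mostly on one MSA while another pair relies on a different one, which is the sense of "importance at the inter-residue level" in the abstract.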

Identifiers

pubmed: 33828153
doi: 10.1038/s41598-021-87204-z
pii: 10.1038/s41598-021-87204-z
pmc: PMC8027171

Chemical substances

Proteins 0
Caspases EC 3.4.22.-

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

7574

Grants

Agency: NIGMS NIH HHS
ID: R01 GM123055
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM133840
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM132024
Country: United States

Authors

Aashish Jain (A)

Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.

Genki Terashi (G)

Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.

Yuki Kagaya (Y)

Graduate School of Information Sciences, Tohoku University, Sendai, Japan.

Sai Raghavendra Maddhuri Venkata Subramaniya (SR)

Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.

Charles Christoffer (C)

Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.

Daisuke Kihara (D)

Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA. dkihara@purdue.edu.
Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA. dkihara@purdue.edu.
