Identification of disease-associated loci using machine learning for genotype and network data integration.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
15 12 2019
History:
received: 22 12 2018
revised: 28 03 2019
accepted: 25 04 2019
pubmed: 10 5 2019
medline: 1 7 2020
entrez: 10 5 2019
Status: ppublish

Abstract

Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci.

We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals' ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user's research needs.

An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. Supplementary data are available at Bioinformatics online.
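To give a rough idea of what a non-negative matrix tri-factorization does, the sketch below factorizes a non-negative matrix X into three non-negative factors, X ≈ F S Gᵀ, using generic multiplicative updates for the Frobenius loss. It is not the cnmtf package and does not reproduce the authors' algorithm: the ancestry correction, the network terms and the variant scoring described in the abstract are all omitted. The function name `nmtf`, the toy genotype-like matrix (rows as variants, columns as individuals) and the cluster counts are assumptions made purely for illustration.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Plain non-negative matrix tri-factorization, X ~ F @ S @ G.T.

    Hypothetical helper for illustration only; it uses standard
    multiplicative updates and none of the corrections in cNMTF.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random strictly positive starting points for the three factors.
    F = rng.random((n, k1)) + eps   # row (e.g. variant) cluster indicators
    S = rng.random((k1, k2)) + eps  # links row clusters to column clusters
    G = rng.random((m, k2)) + eps   # column (e.g. individual) cluster indicators
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so the factors
        # stay non-negative throughout.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy data (assumed): 30 variants x 20 individuals, minor-allele counts 0/1/2.
toy_rng = np.random.default_rng(1)
X = toy_rng.integers(0, 3, size=(30, 20)).astype(float)

F, S, G, errors = nmtf(X, k1=3, k2=2)

# Hard cluster labels: the column with the largest weight in each row.
variant_clusters = F.argmax(axis=1)
individual_clusters = G.argmax(axis=1)
print("reconstruction error, first vs last iteration:", errors[0], errors[-1])
```

In a sketch like this, the rows of F and G can be read as soft cluster memberships, which is the general sense in which a tri-factorization acts as a co-clustering technique. How cNMTF turns such factors into prioritized loci is described in the paper and the package documentation, not here.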

Identifiers

pubmed: 31070705
pii: 5487393
doi: 10.1093/bioinformatics/btz310
pmc: PMC6954643

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

5182-5190

Grants

Agency: NIMH NIH HHS
ID: RL1 MH083268
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG004610
Country: United States
Agency: Wellcome Trust
ID: WT/104955/Z/14/Z
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U01 HG004608
Country: United States
Agency: Medical Research Council
ID: MR/M013138/2
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U01 HG004438
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG004424
Country: United States
Agency: Medical Research Council
ID: MR/M013138/1
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: R01 HL087679
Country: United States
Agency: Medical Research Council
ID: MR/S019669/1
Country: United Kingdom
Agency: NHGRI NIH HHS
ID: U01 HG004609
Country: United States

Copyright information

© The Author(s) 2019. Published by Oxford University Press.


Authors

Luis G Leal (LG)

Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK.

Alessia David (A)

Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK.

Marjo-Riitta Jarvelin (MR)

Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland.
Biocenter Oulu, University of Oulu, Oulu 90220, Finland.
Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland.
Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK.
Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK.

Sylvain Sebert (S)

Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland.
Biocenter Oulu, University of Oulu, Oulu 90220, Finland.

Minna Männikkö (M)

Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland.

Ville Karhunen (V)

Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland.
Biocenter Oulu, University of Oulu, Oulu 90220, Finland.
Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland.
Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK.
Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK.

Eleanor Seaby (E)

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Clive Hoggart (C)

Department of Medicine, Imperial College London, London W2 1PG, UK.

Michael J E Sternberg (MJE)

Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK.
