Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
15 07 2019
History:
entrez: 13 09 2019
pubmed: 13 09 2019
medline: 10 06 2020
Status:
ppublish
Abstract
Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including among others lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not present the previous drawbacks. We compare block HSIC Lasso to other state-of-the-art feature selection techniques in both synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes linked in the past to brain development and function are involved in the biological differences between the types of neurons. Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso). Supplementary data are available at Bioinformatics online.
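As an illustration of the dependence measure behind HSIC Lasso, the sketch below computes a biased empirical Hilbert-Schmidt Independence Criterion, tr(KHLH)/n^2, between two scalar variables using Gaussian kernels. It is not the pyHSICLasso implementation, and it performs neither the Lasso selection nor the block approximation. The function names (gaussian_kernel, center, hsic), the kernel width sigma=1.0 and the 1/n^2 normalization are assumptions chosen for this example.

```python
import math


def gaussian_kernel(values, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 * sigma^2)) for scalar samples."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
        for a in values
    ]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two equally long lists of scalars.

    Uses tr(K H L H) = tr((H K H)(H L H)), which for symmetric matrices
    equals the sum of element-wise products of the two centered matrices.
    The 1/n^2 normalization is an assumption of this sketch.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n = len(x)
    Kc = center(gaussian_kernel(x, sigma))
    Lc = center(gaussian_kernel(y, sigma))
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2
```

For x = [-2, -1, 0, 1, 2] and y = x^2, whose linear correlation is zero, hsic(x, y) is strictly positive, whereas it vanishes when y is constant. This is the kind of non-linear dependence a linear selector would miss.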
Identifiers
pubmed: 31510671
pii: 5529193
doi: 10.1093/bioinformatics/btz333
pmc: PMC6612810
Chemical substances
Biomarkers
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
i427-i435
Copyright information
© The Author(s) 2019. Published by Oxford University Press.