Hi-LASSO: High-performance Python and Apache Spark packages for feature selection with high-dimensional data.


Journal

PLoS One
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2022
History:
received: 2022-07-05
accepted: 2022-11-19
entrez: 2022-12-01
pubmed: 2022-12-02
medline: 2022-12-06
Status: epublish

Abstract

High-dimensional LASSO (Hi-LASSO) is a powerful feature selection tool for high-dimensional data. Our previous study showed that Hi-LASSO outperformed other state-of-the-art LASSO methods. However, the substantial computational cost of bootstrapping and the lack of experiments on a parametric statistical test for feature selection have impeded the application of Hi-LASSO in practice. In this paper, we present a Python package and a Spark library for Hi-LASSO, designed to run efficiently in parallel on real-world problems and to provide parametric statistical tests for feature selection on high-dimensional data. We demonstrate Hi-LASSO's superior performance through extensive experiments in practical settings. With these packages, Hi-LASSO feature selection can be performed efficiently and easily. The Hi-LASSO packages are publicly available at https://github.com/datax-lab/Hi-LASSO under the MIT license. They can be installed with pip, and additional documentation is available at https://pypi.org/project/hi-lasso and https://pypi.org/project/Hi-LASSO-spark.
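
The abstract mentions two ingredients: repeated LASSO fits on bootstrap samples, and a parametric statistical test that decides which features to keep. The sketch below illustrates that general idea in plain Python. It is not the Hi-LASSO algorithm and does not use the API of the published packages. The function names, the coordinate-descent solver, the synthetic data, and the choice of a one-sided binomial test on how often each coefficient is non-zero are all assumptions made for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + alpha*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_beta
    return beta


def bootstrap_nonzero_counts(X, y, alpha, n_boot, q, rng):
    """Count how often each feature gets a non-zero coefficient across bootstraps."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        cols = rng.sample(range(p), q)               # random subset of q features
        Xb = [[X[i][j] for j in cols] for i in rows]
        yb = [y[i] for i in rows]
        for j, b in zip(cols, lasso_cd(Xb, yb, alpha)):
            if b != 0.0:
                counts[j] += 1
    return counts


def binomial_upper_tail(k, n, prob):
    """P(K >= k) for K ~ Binomial(n, prob)."""
    return sum(math.comb(n, i) * prob ** i * (1.0 - prob) ** (n - i)
               for i in range(k, n + 1))


def run_demo(seed=0, n=50, p=10, q=8, n_boot=60, alpha=1.0, level=0.05):
    """Synthetic example: only features 0 and 1 influence the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [5.0 * row[0] - 4.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    counts = bootstrap_nonzero_counts(X, y, alpha, n_boot, q, rng)
    # Assumed null hypothesis: every feature is non-zero at the average rate.
    pi = sum(counts) / (n_boot * p)
    p_values = [binomial_upper_tail(c, n_boot, pi) for c in counts]
    selected = [j for j, pv in enumerate(p_values) if pv < level]
    return counts, p_values, selected


if __name__ == "__main__":
    counts, p_values, selected = run_demo()
    print("non-zero counts:", counts)
    print("selected features:", selected)
```

Each bootstrap fit is independent of the others, which is why this kind of procedure lends itself to parallel execution, for example across processes or Spark workers. The sketch runs the fits sequentially to stay dependency-free.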

Identifiers

pubmed: 36455001
doi: 10.1371/journal.pone.0278570
pii: PONE-D-22-19015
pmc: PMC9714948

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0278570

Copyright information

Copyright: © 2022 Jo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Jongkwon Jo

Department of Information and Statistics, Gyeongsang National University, Jinju-si, South Korea.

Seungha Jung

Department of Information and Statistics, Gyeongsang National University, Jinju-si, South Korea.

Joongyang Park

Department of Information and Statistics, Gyeongsang National University, Jinju-si, South Korea.

Youngsoon Kim

Department of Information and Statistics, Gyeongsang National University, Jinju-si, South Korea.

Mingon Kang

Department of Computer Science, University of Nevada, Las Vegas, Nevada, United States of America.
