Imputing dropouts for single-cell RNA sequencing based on multi-objective optimization.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
13 06 2022
History:
received: 06 09 2021
revised: 21 04 2022
accepted: 25 04 2022
pubmed: 30 04 2022
medline: 15 11 2022
entrez: 29 04 2022
Status: ppublish

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies have proven revolutionary in enabling the profiling of transcriptomes at single-cell resolution. However, excess zeros caused by various sources of technical noise, called dropouts, can mislead downstream analyses. Accurate imputation methods are therefore crucial for addressing the dropout problem. In this article, we develop a new dropout imputation method for scRNA-seq data based on multi-objective optimization. Our method differs from existing ones, which assume that the underlying data have a preconceived structure and impute the dropouts according to the information learned from that structure. Instead, we assume that the data combine three types of latent structure: the horizontal structure (genes are similar to each other), the vertical structure (cells are similar to each other) and the low-rank structure. The combination weights and latent structures are learned using multi-objective optimization, and the weighted average of the observed data and the imputation results learned from the three types of structure is taken as the final result. Comprehensive downstream experiments show the superiority of our method in terms of recovery of true gene expression profiles, differential expression analysis, cell clustering and cell trajectory inference. The R package is available at https://github.com/Zhangxf-ccnu/scMOO and https://zenodo.org/record/5785195. The code to reproduce the downstream analyses in this article can be found at https://github.com/Zhangxf-ccnu/scMOO_experiments_codes and https://zenodo.org/record/5786211. The detailed list of data sets used in the present study is given in Supplementary Table S1 of the Supplementary materials. Supplementary data are available at Bioinformatics online.
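The abstract describes the final estimate as a weighted average of the observed matrix and three structure-based imputations. The toy sketch below illustrates only that general idea in plain Python; it is not the scMOO implementation (an R package). The three stand-in reconstructions (nearest-neighbour averaging over genes, nearest-neighbour averaging over cells, and a rank-1 approximation by power iteration), the helper names and the fixed weights are all hypothetical choices made for illustration. In scMOO, the latent structures and the combination weights are learned by multi-objective optimization, which is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def transpose(m):
    return [list(col) for col in zip(*m)]


def knn_smooth_rows(m, k):
    """Replace each row by the mean of its k most cosine-similar other rows."""
    out = []
    for i, row in enumerate(m):
        sims = sorted(((cosine(row, other), j) for j, other in enumerate(m) if j != i),
                      reverse=True)
        nbrs = [m[j] for _, j in sims[:k]]
        if not nbrs:  # a single row has no neighbours; keep it unchanged
            out.append(list(row))
            continue
        out.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
    return out


def rank1_approx(m, iters=100):
    """Rank-1 approximation sigma * u * v^T, found by power iteration on m^T m."""
    n_rows, n_cols = len(m), len(m[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in m]                    # u = m v
        v = [sum(m[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = m^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # all-zero matrix: the best rank-1 approximation is zero
            return [[0.0] * n_cols for _ in range(n_rows)]
        v = [x / norm for x in v]
    u = [sum(a * b for a, b in zip(row, v)) for row in m]  # u = sigma * (unit left vector)
    return [[ui * vj for vj in v] for ui in u]


def combine(x, candidates, weights):
    """Weighted average of the observed matrix x and the candidate imputations."""
    mats = [x] + candidates
    assert len(weights) == len(mats) and abs(sum(weights) - 1.0) < 1e-9
    return [[sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(len(x[0]))]
            for i in range(len(x))]


def impute_sketch(x, k=1, weights=(0.4, 0.2, 0.2, 0.2)):
    """x is genes x cells. The weights are HYPOTHETICAL fixed values, not learned ones."""
    horizontal = knn_smooth_rows(x, k)                      # genes resemble other genes
    vertical = transpose(knn_smooth_rows(transpose(x), k))  # cells resemble other cells
    low_rank = rank1_approx(x)                              # low-rank structure
    return combine(x, [horizontal, vertical, low_rank], list(weights))
```

For example, `impute_sketch([[1.0, 0.0, 3.0], [2.0, 0.0, 6.0], [0.0, 5.0, 1.0]])` returns a 3 x 3 matrix in which the zero in the second row and second column receives a positive value, borrowed mainly from the most similar cell. Fixing the weights keeps the sketch short; learning them jointly with the structures is exactly the part the article addresses with multi-objective optimization.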

Identifiers

pubmed: 35485740
pii: 6575885
doi: 10.1093/bioinformatics/btac300

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3222-3230

Grants

Agency: National Natural Science Foundation of China
ID: 11871026
Agency: Hubei Provincial Science and Technology Innovation Base (Platform) Special Project
ID: 2020DFH002
Agency: Hong Kong Innovation and Technology Commission
Agency: Hong Kong Research Grants Council
ID: 11200818
Agency: City University of Hong Kong
ID: 9610460

Copyright information

© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Ke Jin (K)

Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China.

Bo Li (B)

Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China.

Hong Yan (H)

Department of Electrical Engineering and Center for Intelligent Multidimensional Data Analysis, City University of Hong Kong, Hong Kong 999077, China.

Xiao-Fei Zhang (XF)

Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China.
