Leveraging class hierarchy for detecting missing annotations on hierarchical multi-label classification.

Detecting missing annotations; Gene function prediction; Gene ontology hierarchy; Hierarchical multi-label classification; Random forest; Structured output prediction; Tree ensembles

Journal

Computers in biology and medicine
ISSN: 1879-0534
Abbreviated title: Comput Biol Med
Country: United States
NLM ID: 1250250

Publication information

Publication date:
01 2023
History:
received: 12 07 2022
revised: 09 11 2022
accepted: 11 12 2022
pubmed: 19 12 2022
medline: 6 1 2023
entrez: 18 12 2022
Status: ppublish

Abstract

With the development of new sequencing technologies, the availability of genomic data has grown exponentially. Over the past decade, numerous studies have used genomic data to identify associations between genes and biological functions. While these studies have shown success in annotating genes with functions, they often assume that genes are completely annotated and fail to take into account that datasets are sparse and noisy. This work proposes a method to detect missing annotations in the context of hierarchical multi-label classification. More precisely, our method exploits the relations between functions, represented as a hierarchy, by computing probabilities based on the paths of functions in the hierarchy. By performing several experiments on a variety of rice (Oryza sativa Japonica), we show that the proposed method accurately detects missing annotations and yields superior results when compared to state-of-the-art methods from the literature.
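
The abstract does not state how the path-based probabilities are computed, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes the hierarchy is a tree (the Gene Ontology is actually a directed acyclic graph), that some classifier has already produced a local probability for each function, and that a path score is the product of those probabilities from the root down to the function. The names `path_probability`, `candidate_missing`, the toy hierarchy, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def path_probability(
    node: str,
    parent: Dict[str, Optional[str]],
    prob: Dict[str, float],
) -> float:
    """Multiply the local probabilities along the path from `node` up to the root.

    Assumption: a tree hierarchy, so each node has at most one parent.
    """
    score = 1.0
    current: Optional[str] = node
    while current is not None:
        score *= prob[current]
        current = parent.get(current)
    return score


def candidate_missing(
    parent: Dict[str, Optional[str]],
    prob: Dict[str, float],
    annotated: Set[str],
    threshold: float = 0.5,
) -> List[str]:
    """Return unannotated functions whose path score reaches the threshold."""
    return sorted(
        node
        for node in prob
        if node not in annotated
        and path_probability(node, parent, prob) >= threshold
    )


# Toy hierarchy (hypothetical): root -> A, B and A -> A1, A2.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
# Hypothetical per-function probabilities, e.g. from a tree ensemble.
prob = {"root": 1.0, "A": 0.9, "B": 0.2, "A1": 0.8, "A2": 0.3}
# Functions the gene is currently annotated with.
annotated = {"root", "A"}

flagged = candidate_missing(parent, prob, annotated)
print(flagged)
```

In this toy example only `A1` is flagged: its path score is 1.0 × 0.9 × 0.8 = 0.72, while `A2` (0.27) and `B` (0.2) fall below the assumed threshold. Multiplying along the path means a function can score highly only if all of its ancestors do, which is one simple way to make scores respect the hierarchy.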

Identifiers

pubmed: 36529023
pii: S0010-4825(22)01131-3
doi: 10.1016/j.compbiomed.2022.106423

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

106423

Copyright information

Copyright © 2022. Published by Elsevier Ltd.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Miguel Romero (M)

Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia. Electronic address: miguelangel.romero@javerianacali.edu.co.

Felipe Kenji Nakano (FK)

Department of Public Health and Primary Care, KU Leuven Campus KULAK, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium; Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium. Electronic address: felipekenji.nakano@kuleuven.be.

Jorge Finke (J)

Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia. Electronic address: jfinke@javerianacali.edu.co.

Camilo Rocha (C)

Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia. Electronic address: camilo.rocha@javerianacali.edu.co.

Celine Vens (C)

Department of Public Health and Primary Care, KU Leuven Campus KULAK, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium; Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium. Electronic address: celine.vens@kuleuven.be.
