AdaSampling for Positive-Unlabeled and Label Noise Learning With Bioinformatics Applications.

Algorithms; Computational Biology / methods; Machine Learning; Models, Statistical; Phosphoproteins / chemistry; Phosphotransferases / chemistry; Proteins / chemistry; Transcription Factors

Journal

IEEE transactions on cybernetics

ISSN: 2168-2275

Abbreviated title: IEEE Trans Cybern

Country: United States

NLM ID: 101609393

Publication information

Publication date:
May 2019

History:

pubmed: 12 July 2018

medline: 18 June 2019

entrez: 12 July 2018

Status: ppublish

Abstract

Class labels are required for supervised learning but may be corrupted or missing in various applications. In binary classification, for example, when only a subset of positive instances is labeled whereas the remaining are unlabeled, positive-unlabeled (PU) learning is required to model from both positive and unlabeled data. Similarly, when class labels are corrupted by mislabeled instances, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models even when a large proportion of mislabeled instances is present in the data. We demonstrate the utilities of proposed methods using simulation and benchmark data, and compare them to alternative approaches that are commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications where AdaSampling is used to: 1) identify kinase-substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data.
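The iterative idea in the abstract can be illustrated with a minimal sketch for the PU case. This is not the authors' implementation: the function names (`predict_proba`, `fit_logistic`, `ada_sampling_pu`), the logistic-regression base classifier, the number of iterations, and the choice to resample as many unlabeled instances as there are in the unlabeled set are all assumptions made for illustration. Each round samples unlabeled instances as negatives in proportion to their current estimated probability of being truly negative, retrains, and then updates those probabilities from the new model.

```python
import math
import random


def predict_proba(w, b, x):
    """P(y = 1 | x) under a logistic model with weights w and bias b."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Batch gradient descent logistic regression (stand-in base classifier)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def ada_sampling_pu(X, s, iterations=5, seed=0):
    """Hypothetical adaptive-sampling loop for PU data.

    X: list of feature vectors; s: 1 = labeled positive, 0 = unlabeled.
    Returns the last model (w, b) and, for each unlabeled index, the
    estimated probability that the instance is truly negative.
    """
    rng = random.Random(seed)
    pos = [i for i, si in enumerate(s) if si == 1]
    unl = [i for i, si in enumerate(s) if si == 0]
    # Assumption: start by treating every unlabeled instance as negative.
    p_neg = {i: 1.0 for i in unl}
    for _ in range(iterations):
        # Sample negatives with probability proportional to p_neg, so likely
        # hidden positives are drawn less often in later rounds.
        weights = [max(p_neg[i], 1e-6) for i in unl]
        sampled = rng.choices(unl, weights=weights, k=len(unl))
        train_X = [X[i] for i in pos] + [X[i] for i in sampled]
        train_y = [1] * len(pos) + [0] * len(sampled)
        w, b = fit_logistic(train_X, train_y)
        # Update the estimated probability of being truly negative.
        for i in unl:
            p_neg[i] = 1.0 - predict_proba(w, b, X[i])
    return w, b, p_neg
```

On well-separated toy data, unlabeled instances drawn from the positive class should end with a lower estimated negative probability than those drawn from the negative class. The same loop could in principle be adapted to label noise by keeping a mislabeling probability for both classes, which this sketch does not attempt.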

Identifiers

DOI: 10.1109/TCYB.2018.2816984 PMID: 29993676

Chemical substances

Phosphoproteins 0

Proteins 0

Transcription Factors 0

Phosphotransferases EC 2.7.-

Publication types

Journal Article

Languages

eng

Citation subsets

Pagination

1932-1943

Authors

Pengyi Yang (P)

John T Ormerod (JT)

Wei Liu (W)

Chendong Ma (C)

Albert Y Zomaya (AY)

Jean Y H Yang (JYH)
