A heuristic approach to handling missing data in biologics manufacturing databases.

Biologics manufacturing data; Data pre-processing; Imputation; Missing data; Parameter recurrence

Journal

Bioprocess and biosystems engineering
ISSN: 1615-7605
Abbreviated title: Bioprocess Biosyst Eng
Country: Germany
NLM ID: 101088505

Publication information

Publication date:
Apr 2019
History:
received: 21 06 2018
accepted: 20 12 2018
pubmed: 9 1 2019
medline: 18 7 2019
entrez: 9 1 2019
Status: ppublish

Abstract

The biologics sector has amassed a wealth of data over the past three decades, in line with bioprocess development and manufacturing guidelines, and precise analysis of these data is expected to reveal behavioural patterns in cell populations that can be used to predict how future culture processes might behave. Historical bioprocessing data are likely to comprise experiments conducted with different cell lines, to produce different products, and possibly years apart; this situation causes inter-batch variability, as well as missing data points owing to human- and instrument-associated technical oversights. These unavoidable complications necessitate a pre-processing step prior to data mining. This study investigated the efficiency of mean imputation and multivariate regression for filling in the missing information in historical bio-manufacturing datasets, and evaluated their performance using symbolic regression models and Bayesian non-parametric models in subsequent data processing. Mean substitution was shown to be a simple and efficient imputation method for relatively smooth, non-dynamical datasets, whereas regression imputation was effective, whilst maintaining the existing standard deviation and shape of the distribution, in dynamical datasets with less than 30% missing data. The nature of the missing information, whether Missing Completely At Random, Missing At Random or Missing Not At Random, emerged as the key feature for selecting the imputation method.
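
The contrast between the two imputation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mean_impute` and `regression_impute`, the time points and the cell-density values are all hypothetical, and a single-predictor least-squares line is used here as a simplified stand-in for the multivariate regression described in the study.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def regression_impute(x, y):
    """Fill None entries of y from a least-squares line fitted on complete pairs.

    Simplified stand-in (one predictor) for multivariate regression imputation.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = statistics.fmean(p[0] for p in pairs)
    mean_y = statistics.fmean(p[1] for p in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical time course (hours) with a steadily rising measurement;
# two readings are missing. These numbers are made up for illustration.
time_h = [0, 24, 48, 72, 96, 120]
observed = [1.0, 2.0, None, 4.0, None, 6.0]

mean_filled = mean_impute(observed)
regression_filled = regression_impute(time_h, observed)

# Compare the spread of the two completed series.
sd_mean = statistics.stdev(mean_filled)
sd_regression = statistics.stdev(regression_filled)
print(f"stdev after mean imputation:       {sd_mean:.3f}")
print(f"stdev after regression imputation: {sd_regression:.3f}")
```

In this toy series, which changes steadily over time, mean substitution pulls the filled values toward the centre and shrinks the standard deviation, while the regression fill follows the trend. That is consistent with the abstract's distinction between smooth, non-dynamical datasets and dynamical ones, although a real dataset would need the missingness mechanism to be assessed first.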

Identifiers

pubmed: 30617419
doi: 10.1007/s00449-018-02059-5
pii: 10.1007/s00449-018-02059-5
pmc: PMC6430751

Chemical substances

Biological Products 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

657-663

Grants

Agency: Leverhulme Trust
ID: ECF-2016-681
Agency: MedImmune Beacon Project
ID: Project 008

Authors

Jeanet Mante (J)

Pembroke College, Cambridge, UK.

Nishanthi Gangadharan (N)

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.

David J Sewell (DJ)

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.

Richard Turner (R)

Cell Sciences, Biopharmaceutical Development, MedImmune, Cambridge, UK.

Ray Field (R)

Cell Sciences, Biopharmaceutical Development, MedImmune, Cambridge, UK.

Stephen G Oliver (SG)

Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK.
Department of Biochemistry, University of Cambridge, Cambridge, UK.

Nigel Slater (N)

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK.

Duygu Dikicioglu (D)

Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK. dd345@cam.ac.uk.
Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK. dd345@cam.ac.uk.
