An animal health example of managing and analyzing a large volume of data on a PC: Modeling body weight and age of over 13 million cats for explanatory and predictive purposes.


Journal

Preventive veterinary medicine
ISSN: 1873-1716
Abbreviated title: Prev Vet Med
Country: Netherlands
NLM ID: 8217463

Publication information

Publication date:
Jan 2020
History:
received: 26 06 2019
revised: 23 10 2019
accepted: 01 11 2019
pubmed: 17 11 2019
medline: 25 8 2020
entrez: 17 11 2019
Status: ppublish

Abstract

Large amounts of animal health data are available to researchers, but they are often stored in different formats and in information silos. Analysis of this existing information can provide new insights into the health and welfare of animals and may reduce the need to collect additional data. The objective of this study was to develop a method of managing and analyzing large amounts of data on a personal computer that runs within 24 h, limiting the time and resources spent deploying models on larger servers. This paper describes an overall approach that uses existing methods for data acquisition and modeling, but adapts and combines them so that large volumes of data can be manipulated and analyzed on a PC. The approach comprises five steps: removing errors; removing data points outside the scope of a specific hypothesis; creating descriptive statistics; developing explanatory and/or predictive models; and assessing the fit or accuracy of the models created. The approach was developed using electronic medical records for 19,416,753 feline patients from 3972 anonymized veterinary clinics in the United States and Canada, recorded between January 1981 and June 2016. Data on patient signalment (age, sex, breed, reproductive status) and body weight were extracted from the records and used to create linear regression models describing body weight in cats of different ages, breeds, sexes and reproductive statuses. Ordinary least squares linear regression and stochastic gradient descent linear regression were compared, using 10-fold cross-validation, to determine their effectiveness and suitability for creating predictive models with large datasets. This approach could be used to build workflows that create models to determine exploratory and predictive properties of health parameters for animals and people. The ability to work with large datasets on a PC or equivalent technology was demonstrated.
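The comparison of ordinary least squares and stochastic gradient descent under 10-fold cross-validation can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the model has a single predictor (age), and the function names, learning-rate schedule and all numeric settings are assumptions chosen for illustration only.

```python
import random

def make_data(n, seed=0):
    """Synthetic (age, weight) pairs. Hypothetical values, NOT the study's data."""
    rng = random.Random(seed)
    return [(age, 3.5 + 0.1 * age + rng.gauss(0.0, 0.5))
            for age in (rng.uniform(1.0, 15.0) for _ in range(n))]

def fit_ols(rows):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_sgd(rows, epochs=20, lr0=0.05, seed=0):
    """Linear regression by SGD on squared error, one row per update.

    The predictor is standardized so that a single learning rate suits
    both coefficients; the step size decays as training proceeds.
    """
    rng = random.Random(seed)
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    sx = (sum((x - mx) ** 2 for x, _ in rows) / n) ** 0.5
    rows = list(rows)          # copy so shuffling does not affect the caller
    b0, b1, step = 0.0, 0.0, 0
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            step += 1
            lr = lr0 / (1.0 + 0.001 * step)
            z = (x - mx) / sx
            err = (b0 + b1 * z) - y
            b0 -= lr * err
            b1 -= lr * err * z
    slope = b1 / sx            # convert back to the original age scale
    return b0 - slope * mx, slope

def k_fold_rmse(rows, fit, k=10):
    """Mean hold-out RMSE over k folds for the given fitting function."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        b0, b1 = fit(train)
        sq = [(b0 + b1 * x - y) ** 2 for x, y in folds[i]]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / k

rows = make_data(2000)
rmse_ols = k_fold_rmse(rows, fit_ols)
rmse_sgd = k_fold_rmse(rows, fit_sgd)
print(f"OLS RMSE: {rmse_ols:.3f}  SGD RMSE: {rmse_sgd:.3f}")
```

SGD visits one record at a time, so it does not need the whole dataset in memory, which is one reason it is often considered for large datasets; the closed-form fit serves here as the reference it should approach.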
Significant interactions were present among sex, reproductive status and age. Body weight peaked between 6 and 9 years of age, depending on sex, reproductive status and breed. The predictive ability of the two models was similar: both produced a root mean square error of 1.45, a mean absolute error of 1.09, and a mean error of approximately zero on the validation dataset.
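For readers unfamiliar with the three accuracy measures reported above, the following sketch shows how they are commonly computed. The function name and the example numbers are hypothetical and unrelated to the study's data; each measure is in the same units as the observed values.

```python
import math

def error_metrics(observed, predicted):
    """Return (mean error, mean absolute error, root mean square error)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    mean_error = sum(errors) / n                      # signed: reveals bias
    mae = sum(abs(e) for e in errors) / n             # typical miss size
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large misses
    return mean_error, mae, rmse

# Hypothetical body weights and predictions
observed = [4.0, 5.0, 3.0]
predicted = [4.5, 4.0, 3.5]
me, mae, rmse = error_metrics(observed, predicted)
print(f"ME={me:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```

In this toy case the over- and under-predictions cancel, so the mean error is zero while the mean absolute error and root mean square error are not. A mean error near zero therefore indicates an absence of systematic bias rather than small individual errors.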

Identifiers

pubmed: 31733427
pii: S0167-5877(19)30413-1
doi: 10.1016/j.prevetmed.2019.104824

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

104824

Copyright information

Copyright © 2019 Elsevier B.V. All rights reserved.

Authors

Adam Campigotto (A)

Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Rd E, Guelph, Ontario, N1G 2W1, Canada. Electronic address: acampigo@uoguelph.ca.

Theresa Bernardo (T)

Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Rd E, Guelph, Ontario, N1G 2W1, Canada.

Elizabeth Stone (E)

Department of Clinical Studies, Ontario Veterinary College, University of Guelph, 50 Stone Rd E, Guelph, Ontario, N1G 2W1, Canada.

Deborah Stacey (D)

School of Computer Science, University of Guelph, 50 Stone Rd E, Guelph, Ontario, N1G 2W1, Canada.

Zvonimir Poljak (Z)

Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Rd E, Guelph, Ontario, N1G 2W1, Canada.
