A predictive model of recreational water quality based on adaptive synthetic sampling algorithms and machine learning.

Adaptive synthetic sampling algorithm; Artificial neural network; Boosting decision tree; Nearest neighbour; Support vector machine; Water quality

Journal

Water research
ISSN: 1879-2448
Abbreviated title: Water Res
Country: England
NLM ID: 0105072

Publication information

Publication date:
15 Jun 2020
History:
received: 09 12 2019
revised: 01 04 2020
accepted: 02 04 2020
pubmed: 25 4 2020
medline: 6 5 2020
entrez: 25 4 2020
Status: ppublish

Abstract

Predicting recreational water quality is one of the most difficult tasks in water management, with major implications for humans and society. Many data-driven models have been used to predict water quality indicators to allow a real-time assessment of public health risk. This assessment is most commonly based on Faecal Indicator Bacteria (FIB), with the value of FIB compared with thresholds published in guidelines. However, FIB values tend to be unbalanced within water quality datasets, with small proportions of data exceeding guideline thresholds and far larger numbers that do not. This can be a limiting factor in the uptake of model predictions since, even if the overall accuracy is high, the sensitivity of the predictions can be low. To address this issue, this paper proposes an adaptive synthetic sampling algorithm (ADASYN) to generate synthetic above-threshold FIB instances and tests the validity of the approach for the prediction of recreational water quality. The models in this paper are based on four machine learning techniques: k-mean nearest neighbour, boosting decision tree, support vector machine, and multi-layer perceptron artificial neural network, and are applied to five different locations in Auckland, New Zealand. Aside from the support vector machine, all models provide favourable predictions with relatively high sensitivity (around 75%) and overall accuracy (over 90%), indicating that both the compliant and exceedance conditions can be effectively predicted through the use of more sophisticated model training which involves artificial data. Considering model accuracy and stability, boosting decision trees (BDT) and the multi-layer perceptron artificial neural network (MLP-ANN) are the two best models, and the multi-layer perceptron is the most efficient, with the shortest computation time.
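
The oversampling idea described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function name `adasyn`, the parameters `k`, `beta` and `seed`, and the toy two-feature dataset are all assumptions made for illustration. The sketch also simplifies the published ADASYN procedure by drawing seed points at random in proportion to their weights rather than rounding a per-point count.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority-class points (simplified ADASYN sketch)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    # Number of synthetic points to create; beta=1.0 aims for balanced classes.
    n_new = int(round((n_maj - n_min) * beta))
    if n_new <= 0 or n_min < 2:
        return []

    def nearest(i, candidates):
        # Indices of the k candidates closest to point i (excluding i itself).
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # For each minority point: share of majority points among its k neighbours.
    # A high share marks a point that is harder to learn.
    ratios = []
    for i in minority_idx:
        neigh = nearest(i, range(len(X)))
        ratios.append(sum(1 for j in neigh if y[j] != minority) / len(neigh))
    total = sum(ratios)
    weights = [r / total for r in ratios] if total > 0 else [1.0 / n_min] * n_min

    # Harder points are chosen more often as seeds (assumed simplification).
    synthetic = []
    for i in rng.choices(minority_idx, weights=weights, k=n_new):
        j = rng.choice(nearest(i, minority_idx))  # a minority-class neighbour
        lam = rng.random()                         # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Hypothetical toy data: 20 compliant samples (label 0), 4 exceedances (label 1).
X = [[float(i % 5), float(i // 5)] for i in range(20)]
X += [[4.5, 3.5], [5.0, 4.0], [5.5, 3.0], [6.0, 4.5]]
y = [0] * 20 + [1] * 4

new_points = adasyn(X, y, minority=1, k=3)
X_balanced = X + new_points
y_balanced = y + [1] * len(new_points)
```

In this sketch the synthetic points are added to the training data only; evaluating a classifier on held-out, unmodified observations is what would show whether sensitivity improves without a large loss in overall accuracy.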

Identifiers

pubmed: 32330740
pii: S0043-1354(20)30325-0
doi: 10.1016/j.watres.2020.115788

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

115788

Copyright information

Copyright © 2020 Elsevier Ltd. All rights reserved.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Tingting Xu (T)

School of Environment, Faculty of Science, University of Auckland, New Zealand. Electronic address: txu648@aucklanduni.ac.nz.

Giovanni Coco (G)

School of Environment, Faculty of Science, University of Auckland, New Zealand.

Martin Neale (M)

School of Environment, Faculty of Science, University of Auckland, New Zealand.

MeSH classifications