Multi-step ahead prediction of hourly influent characteristics for wastewater treatment plants: a case study from North America.

High temporal resolution; Lead time; Machine-learning models; Multivariate analysis; Univariate analysis; Wastewater influent

Journal

Environmental monitoring and assessment
ISSN: 1573-2959
Abbreviated title: Environ Monit Assess
Country: Netherlands
NLM ID: 8508350

Publication information

Publication date:
21 Apr 2022
History:
received: 08 11 2021
accepted: 12 03 2022
entrez: 21 4 2022
pubmed: 22 4 2022
medline: 26 4 2022
Status: epublish

Abstract

Prediction of influent characteristics, before any treatment takes place, is of great importance to the operation and management of wastewater treatment plants (WWTPs). In this study, four machine-learning models, including multilayer perceptron (MLP), long short-term memory network (LSTM), K-nearest neighbour (KNN), and random forest (RF), are introduced to utilize real-time wastewater data from three WWTPs in North America (i.e., Tres Rios, Woodward, and one confidential plant) for predicting hourly influent characteristics. Input variables are selected using an autocorrelation analysis and a variable importance measure from RF. Both univariate and multivariate analyses are investigated to improve model accuracy. The performances of one- and multiple-step-ahead models are compared. With a short prediction horizon, all the models derived from both univariate and multivariate analyses show excellent performance. It was found that the performance deterioration as the prediction horizon expands could be mitigated significantly by including extra variables, such as meteorological variables. This work can provide valuable support for the high-temporal-resolution prediction of wastewater influent characteristics for WWTPs. The proposed models can also bridge the gap between data and decision-making in the wastewater sector.
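The workflow summarized in the abstract (lag selection by autocorrelation, then one- and multiple-step-ahead prediction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the series is synthetic, the autocorrelation threshold of 0.5 is arbitrary, the function names are hypothetical, each horizon gets its own model (a "direct" strategy), and a plain NumPy K-nearest-neighbour regressor stands in for the four models compared in the study.

```python
import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D series at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:-k], xc[k:]) / denom
                     for k in range(1, max_lag + 1)])


def select_lags(x, max_lag=24, threshold=0.5):
    """Keep lags whose absolute autocorrelation exceeds a threshold.

    The threshold is an assumption made for this sketch, not a value
    taken from the study.
    """
    acf = autocorr(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > threshold]


def make_supervised(x, lags, horizon):
    """Pair lagged inputs with the value `horizon` steps ahead.

    Lag 1 is the latest observation available at forecast time t,
    lag k is x[t - k + 1]; the target is x[t + horizon].
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(max(lags) - 1, len(x) - horizon)
    X = np.column_stack([x[t - (k - 1)] for k in lags])
    y = x[t + horizon]
    return X, y


def knn_predict(X_train, y_train, X_query, k=5):
    """Average the targets of the k nearest training rows (Euclidean)."""
    X_query = np.atleast_2d(X_query)
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Synthetic hourly "influent flow": a daily cycle plus noise (not real data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 30)
flow = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

lags = select_lags(flow)
rmse_by_horizon = {}
for horizon in (1, 6, 12):
    X, y = make_supervised(flow, lags, horizon)
    split = int(0.8 * len(y))  # chronological split, no shuffling
    pred = knn_predict(X[:split], y[:split], X[split:])
    rmse_by_horizon[horizon] = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))

print(lags)
print(rmse_by_horizon)
```

A multivariate variant would append extra columns, such as lagged meteorological variables, to X before fitting. The abstract reports that this kind of addition mitigates the loss of accuracy at longer horizons, but the sketch above does not attempt to reproduce that result.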

Identifiers

pubmed: 35445887
doi: 10.1007/s10661-022-09957-y
pii: 10.1007/s10661-022-09957-y

Chemical substances

Waste Water 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

389

Copyright information

© 2022. The Author(s), under exclusive licence to Springer Nature Switzerland AG.

Authors

Pengxiao Zhou (P)

Department of Civil Engineering, McMaster University, Hamilton, ON, L8S 4L7, Canada.

Zhong Li (Z)

Department of Civil Engineering, McMaster University, Hamilton, ON, L8S 4L7, Canada. zoeli@mcmaster.ca.

Spencer Snowling (S)

Hatch Ltd., Sheridan Science & Technology Park, 2800 Speakman Drive, Mississauga, ON, L5K 2R7, Canada.

Rajeev Goel (R)

Hatch Ltd., Sheridan Science & Technology Park, 2800 Speakman Drive, Mississauga, ON, L5K 2R7, Canada.

Qianqian Zhang (Q)

Department of Civil Engineering, McMaster University, Hamilton, ON, L8S 4L7, Canada.
