Machine learning models for net photosynthetic rate prediction using poplar leaf phenotype data.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2020
History:
received: 04 Oct 2019
accepted: 20 Jan 2020
entrez: 12 Feb 2020
pubmed: 12 Feb 2020
medline: 1 May 2020
Status: epublish

Abstract

As an essential component of reducing anthropogenic CO2 emissions to the atmosphere, tree planting is key to keeping carbon dioxide emissions under control. In 1992, at the Earth Summit, the United Nations agreed to take action to stabilize global anthropogenic CO2 emissions and reduce them toward net zero. Tree planting was identified as an effective method to offset CO2 emissions. Fast-growing trees with a high net photosynthetic rate (Pn) could efficiently fulfill the goal of CO2 emission reduction. A net photosynthetic rate model can provide a reference for the stability of a plant's photosynthetic productivity. Using leaf phenotype data to predict the Pn can help guide tree planting policies intended to offset CO2 release into the atmosphere. Tree planting has been proposed as one climate change solution, and poplars are among the most popular trees to plant. This study used a Populus simonii (P. simonii) dataset collected from 23 artificial forests in northern China. The samples represent almost the entire geographic distribution of P. simonii, and their locations cover most of the major provinces of northern China. The northwestern point reaches (36°30'N, 98°09'E), the northeastern point (40°91'N, 115°83'E), the southwestern point (32°31'N, 108°90'E), and the southeastern point (34°39'N, 113°74'E). The collected leaf phenotypic trait data are sparse, noisy, and highly correlated, and the photosynthetic rate data are nonnormal and skewed. Many machine learning algorithms can still produce reasonably accurate predictions despite these data issues. Influential outliers are removed to allow accurate and precise prediction. Cluster analysis is performed as part of an exploratory data analysis to investigate the dataset in further detail.
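The abstract does not state how influential outliers were identified. One common criterion is Cook's distance, which combines a point's residual with its leverage. The sketch below makes three assumptions, none of which come from the study:

- there is a single predictor;
- the model is an ordinary least-squares fit;
- points are flagged using the conventional 4/n rule-of-thumb cut-off.

`cooks_distance` and `drop_influential` are hypothetical helper names.

```python
from statistics import fmean
from typing import Optional


def cooks_distance(x: list[float], y: list[float]) -> list[float]:
    """Cook's distance of each point in an OLS fit y = b0 + b1 * x (one predictor only)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    p = 2  # fitted parameters: intercept and slope
    mse = sum(e * e for e in residuals) / (n - p)
    if mse == 0:
        return [0.0] * n  # all residuals are zero, so every distance is zero
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage (hat value) of this point
        if 1.0 - h < 1e-12:
            distances.append(float("inf"))  # leverage of 1: distance is undefined
        else:
            distances.append(e * e / (p * mse) * h / (1.0 - h) ** 2)
    return distances


def drop_influential(
    x: list[float], y: list[float], cutoff: Optional[float] = None
) -> tuple[list[float], list[float]]:
    """Drop points whose Cook's distance exceeds cutoff (default 4/n, a rule of thumb)."""
    limit = 4.0 / len(x) if cutoff is None else cutoff
    d = cooks_distance(x, y)
    kept = [i for i, di in enumerate(d) if di <= limit]
    return [x[i] for i in kept], [y[i] for i in kept]
```

With many correlated leaf traits, the leverage term would instead come from the diagonal of the hat matrix of the full design matrix. The one-predictor form is used here only to keep the example short.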
We select four regression methods suited to the dataset in this study: extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF), and generalized additive model (GAM). Cross-validation and regularization are applied in all four algorithms to ensure the validity of the outputs. The best-performing approach is XGBoost, whose net photosynthetic rate predictions have a correlation of 0.77 with the actual rates. Its root mean square error (RMSE) is 2.57, approximately 35 percent smaller than the standard deviation of 3.97. The other metrics, i.e., the mean absolute error (MAE), R2, and min-max accuracy, are 1.12, 0.60, and 0.93, respectively. This study demonstrates that machine learning models can use noisy leaf phenotype data to predict the net photosynthetic rate with good accuracy. Most net photosynthetic rate prediction studies have been conducted on herbaceous plants. The prediction for P. simonii, a woody plant, therefore offers useful guidance to plant science and environmental science on the predictive relationship between leaf phenotypic characteristics and the Pn of woody plants in northern China.
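The abstract reports correlation, RMSE, MAE, R2, and min-max accuracy but does not define them. The sketch below uses common textbook definitions. Two of these are assumptions, not the authors' stated formulas:

- R2 is taken as 1 - SS_res/SS_tot.
- Min-max accuracy is taken as the mean of min(actual, predicted) / max(actual, predicted). This is only meaningful when all rates are positive.

`regression_metrics` is a hypothetical helper.

```python
import math
from statistics import fmean


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Common definitions of the evaluation metrics named in the abstract."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least 2 paired values")
    errors = [p - a for a, p in zip(actual, predicted)]
    a_bar, p_bar = fmean(actual), fmean(predicted)
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((a - a_bar) ** 2 for a in actual)       # total sum of squares
    ss_pred = sum((p - p_bar) ** 2 for p in predicted)
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    # Min-max accuracy (assumed definition); undefined for non-positive rates.
    if all(a > 0 and p > 0 for a, p in zip(actual, predicted)):
        min_max = fmean(min(a, p) / max(a, p) for a, p in zip(actual, predicted))
    else:
        min_max = math.nan
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": fmean(abs(e) for e in errors),
        "r2": 1.0 - ss_res / ss_tot,                  # coefficient of determination
        "corr": cov / math.sqrt(ss_tot * ss_pred),    # Pearson correlation
        "min_max_accuracy": min_max,
    }
```

The abstract's arithmetic can also be checked directly:

- 1 - 2.57/3.97 ≈ 0.35, matching the stated 35 percent reduction of the RMSE relative to the standard deviation.
- 0.77² ≈ 0.59, close to the reported R2 of 0.60.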

Identifiers

pubmed: 32045452
doi: 10.1371/journal.pone.0228645
pii: PONE-D-19-27819
pmc: PMC7012418

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0228645

Conflict of interest statement

Author Andrew Siu was employed by Amgen. This disclosure does not alter our adherence to the PLOS ONE policies on sharing data and materials. All authors declare no competing interests.

Authors

Xiao-Yu Zhang (XY)

College of Science, Beijing Forestry University, Beijing, P. R. China.

Ziyuan Huang (Z)

Data Science, Harrisburg University of Science and Technology, Harrisburg, PA, United States of America.

Xuehui Su (X)

Jiaozuo Academy of Agriculture and Forestry Sciences, Jiaozuo, P. R. China.

Andrew Siu (A)

Amgen Inc., Thousand Oaks, CA, United States of America.

Yuepeng Song (Y)

College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China.

Deqiang Zhang (D)

College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P. R. China.

Qing Fang (Q)

Faculty of Science, Yamagata University, Yamagata, Japan.

Similar articles

Photosynthesis Ribulose-Bisphosphate Carboxylase Carbon Dioxide Molecular Dynamics Simulation Cyanobacteria
Populus Soil Microbiology Soil Microbiota Fungi

Exploring blood-brain barrier passage using atomic weighted vector and machine learning.

Yoan Martínez-López, Paulina Phoobane, Yanaima Jauriga et al.
1.00
Blood-Brain Barrier Machine Learning Humans Support Vector Machine Software

Understanding the role of machine learning in predicting progression of osteoarthritis.

Simone Castagno, Benjamin Gompels, Estelle Strangmark et al.
1.00
Humans Disease Progression Machine Learning Osteoarthritis

MeSH classifications