Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy.

Keywords: Linear mixed models; Machine learning; Model accuracy; Model precision; Sustainable intensification

Journal

Field crops research

ISSN: 0378-4290

Abbreviated title: Field Crops Res

Country: Netherlands

NLM ID: 101147219

Publication information

Publication date:
15 Oct 2023

History:

received: 21 Dec 2022

revised: 06 Jun 2023

accepted: 18 Jul 2023

medline: 16 Oct 2023

pubmed: 16 Oct 2023

entrez: 16 Oct 2023

Status: ppublish

Abstract

Collection and analysis of large volumes of on-farm production data are widely seen as key to understanding yield variability among farmers and improving resource-use efficiency. The aim of this study was to assess the performance of statistical and machine learning methods to explain and predict crop yield across thousands of farmers' fields in contrasting farming systems worldwide. A large database of 10,940 field-year combinations from three countries in different stages of agricultural intensification was analyzed. Random effects models were used to partition crop yield variability and random forest models were used to explain and predict crop yield within a cross-validation scheme with data re-sampling over space and time. Yield variability in relative terms was smallest for wheat and barley in the Netherlands and for wheat in Ethiopia, intermediate for rice in the Philippines, and greatest for maize in Ethiopia. Random forest models comprising a total of 87 variables explained a maximum of 65 % of cereal yield variability in the Netherlands and less than 45 % of cereal yield variability in Ethiopia and in the Philippines. Crop management related variables were important to explain and predict cereal yields in Ethiopia, while predictive (i.e., known before the growing season) climatic variables and explanatory (i.e., known during or after the growing season) climatic variables were most important to explain and predict cereal yield variability in the Philippines and in the Netherlands, respectively. Finally, model cross-validation for regions or years not seen during model training reduced the R² [...]

Big data from farmers' fields is useful to explain on-farm yield variability to some extent, but not to predict it across time and space. The results call for moderate expectations towards big data and machine learning in agronomic studies, particularly for smallholder farms in the tropics where model performance was poorest independently of the variables considered and the cross-validation scheme used.
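The cross-validation idea described in the abstract, where whole regions or whole years are held out of training, can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the field names (`region`, `year`, `n_rate`, `yield`) and all function names are hypothetical, and a one-variable least-squares line stands in for the random forest so that the example needs only the Python standard library.

```python
import random
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination; can be negative on held-out data."""
    mu = mean(observed)
    ss_tot = sum((y - mu) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for one predictor (stand-in for a random forest)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def leave_one_group_out(records, group_key):
    """Yield (held_out, train, test), holding out one region or year at a time."""
    for held_out in sorted({r[group_key] for r in records}):
        train = [r for r in records if r[group_key] != held_out]
        test = [r for r in records if r[group_key] == held_out]
        yield held_out, train, test


def make_synthetic_records(seed=0):
    """Synthetic field-year records; region offsets mimic unobserved site effects."""
    rng = random.Random(seed)
    region_offsets = {"A": 0.0, "B": 1.5, "C": -1.0}  # assumed values
    records = []
    for region, offset in region_offsets.items():
        for year in (2019, 2020, 2021):
            for _ in range(30):
                n_rate = rng.uniform(0, 200)
                y = 2.0 + 0.02 * n_rate + offset + rng.gauss(0, 0.5)
                records.append(
                    {"region": region, "year": year, "n_rate": n_rate, "yield": y}
                )
    return records


def cross_validate(records, group_key):
    """Return the held-out R² for each group defined by group_key."""
    scores = {}
    for held_out, train, test in leave_one_group_out(records, group_key):
        slope, intercept = fit_line(
            [r["n_rate"] for r in train], [r["yield"] for r in train]
        )
        predicted = [intercept + slope * r["n_rate"] for r in test]
        scores[held_out] = r_squared([r["yield"] for r in test], predicted)
    return scores


records = make_synthetic_records()
by_region = cross_validate(records, "region")  # spatial hold-out
by_year = cross_validate(records, "year")      # temporal hold-out
```

In this toy setup the regions differ by an offset the model never observes, so holding out a region tends to hurt the score more than holding out a year. Which kind of hold-out is harder in real data depends on where the unexplained variability sits.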

Identifiers

DOI: 10.1016/j.fcr.2023.109063

PMID: 37840838

PMC: PMC10565834

PII: S0378-4290(23)00256-3

Publication types

Journal Article

Languages

eng

Pagination

109063

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



Authors

João Vasco Silva (JV)

Joost van Heerwaarden (JV)

Pytrik Reidsma (P)

Alice G Laborte (AG)

Kindie Tesfaye (K)

Martin K van Ittersum (MKV)
