Empirical Bayes small area prediction under a zero-inflated lognormal model with correlated random area effects.

Keywords: correlated random effects; empirical Bayes; sheet and rill erosion; small area prediction; zero-inflated lognormal

Journal

Biometrical journal. Biometrische Zeitschrift
ISSN: 1521-4036
Abbreviated title: Biom J
Country: Germany
NLM ID: 7708048

Publication information

Publication date:
Dec 2020
History:
received: 23 Jan 2020
revised: 05 Jul 2020
accepted: 06 Jul 2020
medline: 30 Jul 2020
pubmed: 30 Jul 2020
entrez: 30 Jul 2020
Status: ppublish

Abstract

Many variables of interest in agricultural or economic surveys have skewed distributions and can equal zero. Our data are measures of sheet and rill erosion called Revised Universal Soil Loss Equation - 2 (RUSLE2). Small area estimates of mean RUSLE2 erosion are of interest. We use a zero-inflated lognormal mixed effects model for small area estimation. The model combines a unit-level lognormal model for the positive RUSLE2 responses with a unit-level logistic mixed effects model for the binary indicator that the response is nonzero. In the Conservation Effects Assessment Project (CEAP) data, counties with a higher probability of nonzero responses also tend to have a higher mean among the positive RUSLE2 values. We capture this property of the data by assuming that the two random effects for a county are correlated. We develop empirical Bayes (EB) small area predictors and a bootstrap estimator of the mean squared error (MSE). In simulations, the proposed predictor is superior to simpler alternatives. We then apply the method to construct EB predictors of mean RUSLE2 erosion for South Dakota counties. To obtain auxiliary variables for the population of cropland in South Dakota, we integrate a satellite-derived land cover map with a geographic database of soil properties. We provide an R Shiny application called viscover (available at https://lyux.shinyapps.io/viscover/) to visualize the overlay operations required to construct the covariates. On the basis of bootstrap estimates of the MSE, we conclude that the EB predictors of mean RUSLE2 erosion are superior to direct estimators.
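
The data-generating structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not use the saezero package. It assumes one plausible form of the model: a logistic mixed model for the nonzero indicator, a lognormal mixed model for the positive part, and a bivariate normal pair of area effects with correlation rho. The function name, the single covariate, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_zero_inflated_lognormal(n_areas=50, n_units=20,
                                     beta=(1.0, 0.5), alpha=(0.5, 0.3),
                                     sigma_u=0.5, sigma_b=0.8, rho=0.6,
                                     sigma_e=0.7, seed=0):
    """Simulate unit-level responses that are either zero or lognormal.

    Assumed model (illustrative, not taken from the paper's equations):
      log(positive part) = beta0 + beta1 * x + u_i + e_ij
      logit P(nonzero)   = alpha0 + alpha1 * x + b_i
      (u_i, b_i) bivariate normal with correlation rho.
    """
    rng = np.random.default_rng(seed)

    # Covariance matrix of the correlated pair of area effects (u_i, b_i).
    cov = np.array([[sigma_u ** 2, rho * sigma_u * sigma_b],
                    [rho * sigma_u * sigma_b, sigma_b ** 2]])
    effects = rng.multivariate_normal(np.zeros(2), cov, size=n_areas)
    u, b = effects[:, 0], effects[:, 1]

    # One hypothetical unit-level covariate per unit.
    x = rng.normal(size=(n_areas, n_units))

    # Lognormal part: log of the response, given that it is positive.
    log_pos = (beta[0] + beta[1] * x + u[:, None]
               + rng.normal(scale=sigma_e, size=x.shape))

    # Logistic part: probability that the response is nonzero.
    p_nonzero = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x + b[:, None])))
    delta = rng.random(x.shape) < p_nonzero

    # Observed response is zero unless the indicator is one.
    y = np.where(delta, np.exp(log_pos), 0.0)
    return y, delta, u, b


if __name__ == "__main__":
    y, delta, u, b = simulate_zero_inflated_lognormal()
    print("share of zero responses:", 1.0 - delta.mean())
    print("sample correlation of area effects:", np.corrcoef(u, b)[0, 1])
```

With a positive rho, areas with a larger effect in the logistic part tend to have a larger effect in the lognormal part, which mirrors the pattern the abstract reports for the CEAP data.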

Identifiers

pubmed: 32725804
doi: 10.1002/bimj.202000029

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

1859-1878

Grants

Agency: Natural Resources Conservation Service
ID: 017301-00001
Agency: Division of Social and Economic Sciences
ID: 1733572

Copyright information

© 2020 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Authors

Xiaodan Lyu (X)

Department of Statistics, Iowa State University, Ames, IA, USA.

Emily J Berg (EJ)

Department of Statistics, Iowa State University, Ames, IA, USA.

Heike Hofmann (H)

Department of Statistics, Iowa State University, Ames, IA, USA.
