Integrating explanation and prediction in computational social science.


Journal

Nature
ISSN: 1476-4687
Abbreviated title: Nature
Country: England
NLM ID: 0410462

Publication information

Publication date:
July 2021
History:
received: 23 February 2021
accepted: 20 May 2021
pubmed: 2 July 2021
medline: 3 August 2021
entrez: 1 July 2021
Status: ppublish

Abstract

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions: the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes. The schema also shows how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.

Identifiers

pubmed: 34194044
doi: 10.1038/s41586-021-03659-0
pii: 10.1038/s41586-021-03659-0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

181-188

References

Watts, D. J. A twenty-first century science. Nature 445, 489 (2007).
pubmed: 17268455
Lazer, D. et al. Computational social science. Science 323, 721–723 (2009).
pubmed: 19197046 pmcid: 2745217
Salganik, M. J. Bit by Bit: Social Research in the Digital Age (Princeton Univ. Press, 2018).
Lazer, D. M. J. et al. Computational social science: obstacles and opportunities. Science 369, 1060–1062 (2020).
pubmed: 32855329
Lazer, D. et al. Meaningful measures of human society in the twenty-first century. Nature https://doi.org/10.1038/s41586-021-03660-7 (2021).
Wing, J. M. Computational thinking. Commun. ACM 49, 33–35 (2006).
Hedström, P. & Ylikoski, P. Causal mechanisms in the social sciences. Annu. Rev. Sociol. 36, 49–67 (2010).
Breiman, L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16, 199–231 (2001). We view our paper as an extension of Breiman’s dichotomy (the ‘algorithmic’ and ‘data modelling’ cultures), arguing that these approaches should be integrated.
Mullainathan, S. & Spiess, J. Machine learning: an applied econometric approach. J. Econ. Perspect. 31, 87–106 (2017). This paper explores the relationships between predictive models and causal inference.
Molina, M. & Garip, F. Machine learning for sociology. Annu. Rev. Sociol. 45, 27–45 (2019).
Shmueli, G. To explain or to predict? Stat. Sci. 25, 289–310 (2010). We build on Shmueli’s distinction between prediction and explanation and propose a framework for integrating the two approaches.
Agrawal, M., Peterson, J. C. & Griffiths, T. L. Scaling up psychology via Scientific Regret Minimization. Proc. Natl Acad. Sci. USA 117, 8825–8835 (2020). This paper exemplifies what we call integrative modelling.
pubmed: 32241896 pmcid: 7183163
Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
pubmed: 33954258 pmcid: 7610724
Yarkoni, T. The generalizability crisis. Behav. Brain Sci. https://doi.org/10.1017/S0140525X20001685 (2020).
Ward, M. D., Greenhill, B. D. & Bakke, K. M. The perils of policy by p-value: predicting civil conflicts. J. Peace Res. 47, 363–375 (2010).
Yarkoni, T. & Westfall, J. Choosing prediction over explanation in psychology: lessons from machine learning. Perspect. Psychol. Sci. 12, 1100–1122 (2017).
pubmed: 28841086 pmcid: 6603289
Watts, D. J. Should social science be more solution-oriented? Nat. Hum. Behav. 1, 0015 (2017).
Berkman, E. T. & Wilson, S. M. So useful as a good theory? The practicality crisis in (social) psychological theory. Perspect. Psychol. Sci. https://doi.org/10.1177/1745691620969650 (2021).
Athey, S. Beyond prediction: Using big data for policy problems. Science 355, 483–485 (2017).
pubmed: 28154050
Lipton, Z. C. The mythos of model interpretability. Queue 16, 31–57 (2018).
Kleinberg, J., Ludwig, J., Mullainathan, S. & Sunstein, C. R. Discrimination in the age of algorithms. J. Legal Anal. 10, 113–174 (2018).
Coveney, P. V., Dougherty, E. R. & Highfield, R. R. Big data need big theory too. Philos. Trans. R. Soc. A 374, 20160153 (2016).
Gigerenzer, G. Mindless statistics. J. Socio-Econ. 33, 587–606 (2004).
Cohen, J. The earth is round (p < .05). Am. Psychol. 49, 997–1003 (1994).
Bertrand, M. & Mullainathan, S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Am. Econ. Rev. 94, 991–1013 (2004).
Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
pubmed: 16060722 pmcid: 1182327
Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).
pubmed: 22006061
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
Meehl, P. E. Why summaries of research on psychological theories are often uninterpretable. Psychol. Rep. 66, 195–244 (1990).
Gelman, A. Causality and statistical learning. Am. J. Sociol. 117, 955–966 (2011).
Dienes, Z. Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference (Macmillan, 2008).
Schrodt, P. A. Seven deadly sins of contemporary quantitative political analysis. J. Peace Res. 51, 287–300 (2014).
Lazer, D., Kennedy, R., King, G. & Vespignani, A. The parable of Google flu: traps in big data analysis. Science 343, 1203–1205 (2014).
pubmed: 24626916
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
pubmed: 31649194
Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M. & Watts, D. J. Predicting consumer behavior with web search. Proc. Natl Acad. Sci. USA 107, 17486–17490 (2010).
pubmed: 20876140 pmcid: 2955127
Hofman, J. M., Sharma, A. & Watts, D. J. Prediction and explanation in social systems. Science 355, 486–488 (2017).
pubmed: 28154051
Case, A. & Deaton, A. Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proc. Natl Acad. Sci. USA 112, 15078–15083 (2015).
pubmed: 26575631 pmcid: 4679063
Oliver, M. L. & Shapiro, T. M. Black Wealth, White Wealth: A New Perspective on Racial Inequality (Taylor & Francis, 2006).
Chetty, R., Hendren, N., Kline, P. & Saez, E. Where is the land of opportunity? The geography of intergenerational mobility in the United States. Q. J. Econ. 129, 1553–1623 (2014).
Wagner, C. et al. Measuring algorithmically infused societies. Nature https://doi.org/10.1038/s41586-021-03666-1 (2021).
Ba, B. A., Knox, D., Mummolo, J. & Rivera, R. The role of officer race and gender in police–civilian interactions in Chicago. Science 371, 696–702 (2021).
pubmed: 33574207
Provost, F. & Fawcett, T. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking (O’Reilly Media, 2013).
Makridakis, S., Wheelwright, S. C. & Hyndman, R. J. Forecasting Methods and Applications (Wiley, 1998).
Tetlock, P. E. Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press, 2005).
Kleinberg, J., Ludwig, J., Mullainathan, S. & Obermeyer, Z. Prediction policy problems. Am. Econ. Rev. 105, 491–495 (2015).
pubmed: 27199498 pmcid: 4869349
Dowding, K. & Miller, C. On prediction in political science. Eur. J. Polit. Res. 58, 1001–1018 (2019).
Galesic, M. et al. Human social sensing is an untapped resource for computational social science. Nature https://doi.org/10.1038/s41586-021-03649-2 (2021).
doi: 10.1038/s41586-021-03649-2 pubmed: 34194037
Cheng, J., Adamic, L., Dow, P. A., Kleinberg, J. M. & Leskovec, J. Can cascades be predicted? In WWW '14: Proc. 23rd International Conference on World Wide Web 925–936 (2014).
Pearl, J. The seven tools of causal inference, with reflections on machine learning. Commun. ACM 62, 54–60 (2019). This paper outlines the need for causal thinking in building predictive models.
Salganik, M. J. et al. Measuring the predictability of life outcomes with a scientific mass collaboration. Proc. Natl Acad. Sci. USA 117, 8398–8403 (2020).
pubmed: 32229555 pmcid: 7165437
Fudenberg, D., Kleinberg, J., Liang, A. & Mullainathan, S. Measuring the completeness of theories. SSRN https://doi.org/10.2139/ssrn.3018785 (2019).
Martin, T., Hofman, J. M., Sharma, A., Anderson, A. & Watts, D. J. Exploring limits to prediction in complex social systems. In WWW '16: Proc. 25th International Conference on World Wide Web 683–694 (2016).
Watts, D. J. Common sense and sociological explanations. Am. J. Sociol. 120, 313–351 (2014). This paper argues that sociologists should pay more attention to prediction versus interpretability when evaluating their explanations.
Zhou, F., Xu, X., Trajcevski, G. & Zhang, K. A survey of information cascade analysis: models, predictions, and recent advances. ACM Comput. Surv. 54, 1–36 (2021).
Goel, S., Watts, D. J. & Goldstein, D. G. The structure of online diffusion networks. In EC '12: Proc. 13th ACM Conference on Electronic Commerce (2012).
Wu, S., Hofman, J. M., Mason, W. A. & Watts, D. J. Who says what to whom on Twitter. In WWW '11: Proc. 20th International Conference on World Wide Web 705–714 (2011).
Goel, S., Anderson, A., Hofman, J. & Watts, D. J. The structural virality of online diffusion. Manage. Sci. 62, 180–196 (2015).
Berger, J. & Milkman, K. L. What makes online content viral? J. Mark. Res. 49, 192–205 (2012).
Bakshy, E., Hofman, J. M., Mason, W. A. & Watts, D. J. Everyone’s an influencer: quantifying influence on Twitter. In WSDM '11: Proc. Fourth ACM International Conference on Web Search and Data Mining 65–74 (2011).
Tan, C., Lee, L. & Pang, B. The effect of wording on message propagation: topic- and author-controlled natural experiments on Twitter. In Proc. 52nd Annual Meeting of the Association for Computational Linguistics 175–185 (2014).
Liu, T., Ungar, L. & Kording, K. Quantifying causality in data science with quasi-experiments. Nat. Comput. Sci. 1, 24–32 (2021).
Hochberg, I. et al. Encouraging physical activity in patients with diabetes through automatic personalized feedback via reinforcement learning improves glycemic control. Diabetes Care 39, e59–e60 (2016).
pubmed: 26822328
Athey, S. & Imbens, G. Recursive partitioning for heterogeneous causal effects. Proc. Natl Acad. Sci. USA 113, 7353–7360 (2016).
pubmed: 27382149 pmcid: 4941430
Charles, D., Chickering, M. & Simard, P. Counterfactual reasoning and learning systems: the example of computational advertising. J. Mach. Learn. Res. 14, 3207–3260 (2013).
Low, H. & Meghir, C. The use of structural models in econometrics. J. Econ. Perspect. 31, 33–58 (2017).
Athey, S., Levin, J. & Seira, E. Comparing open and sealed bid auctions: evidence from timber auctions. Q. J. Econ. 126, 207–257 (2011).
Awad, E. et al. The Moral Machine experiment. Nature 563, 59–64 (2018).
pubmed: 30356211
Aczel, B. et al. A consensus-based transparency checklist. Nat. Hum. Behav. 4, 4–6 (2020).
pubmed: 31792401
Kidwell, M. C. et al. Badges to acknowledge open practices: a simple, low-cost, effective method for increasing transparency. PLoS Biol. 14, e1002456 (2016).
pubmed: 27171007 pmcid: 4865119
Nosek, B. A. et al. Promoting an open research culture. Science 348, 1422–1425 (2015).
pubmed: 26113702 pmcid: 4550299
Nosek, B. A., Ebersole, C. R., DeHaven, A. C. & Mellor, D. T. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).
pubmed: 29531091 pmcid: 5856500
Donoho, D. 50 years of data science. J. Comput. Graph. Stat. 26, 745–766 (2017).
Gelman, A. & Loken, E. The statistical crisis in science. Am. Sci. 102, 460 (2014).
Rao, R. B., Fung, G. & Rosales, R. On the dangers of cross-validation. An experimental evaluation. In Proc. 2008 SIAM International Conference on Data Mining 588–596 (Society for Industrial and Applied Mathematics, 2008).
Dwork, C. et al. The reusable holdout: preserving validity in adaptive data analysis. Science 349, 636–638 (2015).
pubmed: 26250683
Chambers, C. D. Registered reports: a new publishing initiative at Cortex. Cortex 49, 609–610 (2013).
pubmed: 23347556
Nosek, B. A. & Lakens, D. Registered reports: a method to increase the credibility of published reports. Soc. Psychol. 45, 137–141 (2014).
Bennett, J. & Lanning, S. The Netflix Prize. In Proc. KDD Cup and Workshop 2007 (2007).
Dorie, V., Hill, J., Shalit, U., Scott, M. & Cervone, D. Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Stat. Sci. 34, 43–68 (2019).
Lin, A., Merchant, A., Sarkar, S. K. & D’Amour, A. Universal causal evaluation engine: an API for empirically evaluating causal inference models. In Proc. Machine Learning Research (eds Le, T. D. et al.) Vol. 104, 50–58 (PMLR, 2019).
Craver, C. F. Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience (Clarendon, 2007).
Salganik, M. J., Lundberg, I., Kindel, A. T. & McLanahan, S. Introduction to the special collection on the Fragile Families Challenge. Socius https://doi.org/10.1177/2378023119871580 (2019).
Strathern, M. ‘Improving ratings’: audit in the British university system. Eur. Rev. 5, 305–321 (1997).
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D. & Griffiths, T. L. Using large-scale experiments and machine learning to discover new theories of human decision-making. Science 372, 1209–1214 (2021).

Authors

Jake M Hofman (JM)

Microsoft Research, New York, NY, USA. jmh@microsoft.com.

Duncan J Watts (DJ)

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA. djwatts@seas.upenn.edu.
The Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA, USA. djwatts@seas.upenn.edu.
Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA. djwatts@seas.upenn.edu.

Susan Athey (S)

Graduate School of Business, Stanford University, Stanford, CA, USA.

Filiz Garip (F)

Department of Sociology, Princeton University, Princeton, NJ, USA.

Thomas L Griffiths (TL)

Department of Psychology, Princeton University, Princeton, NJ, USA.
Department of Computer Science, Princeton University, Princeton, NJ, USA.

Jon Kleinberg (J)

Department of Computer Science, Cornell University, Ithaca, NY, USA.
Department of Information Science, Cornell University, Ithaca, NY, USA.

Helen Margetts (H)

Oxford Internet Institute, University of Oxford, Oxford, UK.
Public Policy Programme, The Alan Turing Institute, London, UK.

Sendhil Mullainathan (S)

Booth School of Business, University of Chicago, Chicago, IL, USA.

Matthew J Salganik (MJ)

Department of Sociology, Princeton University, Princeton, NJ, USA.

Simine Vazire (S)

Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia.

Alessandro Vespignani (A)

Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, USA.

Tal Yarkoni (T)

Department of Psychology, University of Texas at Austin, Austin, TX, USA.
