Name-based demographic inference and the unequal distribution of misrecognition.
Journal
Nature human behaviour
ISSN: 2397-3374
Titre abrégé: Nat Hum Behav
Pays: England
ID NLM: 101697750
Informations de publication
Date de publication:
07 2023
07 2023
Historique:
received:
15
09
2022
accepted:
13
03
2023
medline:
26
7
2023
pubmed:
18
4
2023
entrez:
17
4
2023
Statut:
ppublish
Résumé
Academics and companies increasingly draw on large datasets to understand the social world, and name-based demographic ascription tools are widespread for imputing information that is often missing from these large datasets. These approaches have drawn criticism on ethical, empirical and theoretical grounds. Using a survey of all authors listed on articles in sociology, economics and communication journals in Web of Science between 2015 and 2020, we compared self-identified demographics with name-based imputations of gender and race/ethnicity for 19,924 scholars across four gender ascription tools and four race/ethnicity ascription tools. We found substantial inequalities in how these tools misgender and misrecognize the race/ethnicity of authors, distributing erroneous ascriptions unevenly among other demographic traits. Because of the empirical and ethical consequences of these errors, scholars need to be cautious with the use of demographic imputation. We recommend five principles for the responsible use of name-based demographic inference.
Identifiants
pubmed: 37069295
doi: 10.1038/s41562-023-01587-9
pii: 10.1038/s41562-023-01587-9
doi:
Types de publication
Journal Article
Research Support, Non-U.S. Gov't
Langues
eng
Sous-ensembles de citation
IM
Pagination
1084-1095Informations de copyright
© 2023. The Author(s), under exclusive licence to Springer Nature Limited.
Références
Matias, J. N., Szalavitz, S. & Zuckerman, E. FollowBias: supporting behavior change toward gender equality by networked gatekeepers on social media. In Proc. 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (eds Lee, S. & Poltrock, S.) 1082–1095 (Association for Computing Machinery, 2017).
Peng, H., Lakhani, K. & Teplitskiy, M. Acceptance in top journals shows large disparities across name-inferred ethnicities. Preprint at SocArXiv https://doi.org/10.31235/osf.io/mjbxg (2021).
Hofstra, B. & de Schipper, N. C. Predicting ethnicity with first names in online social media networks. Big Data Soc. https://doi.org/10.1177/2053951718761141 (2018).
King, M. M., Bergstrom, C. T., Correll, S. J., Jacquet, J. & West, J. D. Men set their own cites high: gender and self-citation across fields and over time. Socius https://doi.org/10.1177/2378023117738903 (2017).
Mihaljević, H., Tullney, M., Santamaría, L. & Steinfeldt, C. Reflections on gender analyses of bibliographic corpora. Front. Big Data https://doi.org/10.3389/fdata.2019.00029 (2019).
Keyes, O. The misgendering machines. In Proc. ACM on Human-Computer Interaction (eds Karahalios, K., Monroy-Hernández, A., Lampinen, A. & Fitzpatrick, G.) 1–22 (Association for Computing Machinery, 2018).
D’Ignazio, C. A Primer on Non-Binary Gender and Big Data (MIT Center for Civic Media, 2016); https://civic.mit.edu/index.html%3Fp=1165.html
Borch, C. & Pardo-Gurrera, J. P. (eds) Oxford Handbook of the Sociology of Machine Learning (Oxford Univ. Press, 2023).
Santamaría, L. & Mihaljević, H. Comparison and benchmark of name-to-gender inference services. PeerJ Comput. Sci. 4, e156 (2018).
doi: 10.7717/peerj-cs.156
pubmed: 33816809
pmcid: 7924484
Lindsay, J. & Dempsey, D. First names and social distinction: middle-class naming practices in Australia. J. Sociol. 53, 577–591 (2017).
doi: 10.1177/1440783317690925
Bertrand, M. & Mullainathan, S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Am. Econ. Rev. 94, 991–1013 (2004).
doi: 10.1257/0002828042002561
Fosch-Villaronga, E., Poulsen, A., Søraa, R. A. & Custers, B. H. M. A little bird told me your gender: gender inferences in social media. Inf. Process. Manag. 58, 102541 (2021).
doi: 10.1016/j.ipm.2021.102541
Van Buskirk, I., Clauset, A. & Larremore, D. B. An open-source cultural consensus approach to name-based gender classification. Preprint at http://arxiv.org/abs/2208.01714 (2022).
West, C. & Zimmerman, D. H. Doing gender. Gend. Soc. 1, 125–151 (1987).
doi: 10.1177/0891243287001002002
Bonilla-Silva, E. The essential social fact of race. Am. Sociol. Rev. 64, 899–906 (1999).
doi: 10.2307/2657410
Seguin, C., Julien, C. & Zhang, Y. The stability of androgynous names: dynamics of gendered naming practices in the United States 1880–2016. Poetics 85, 101501 (2021).
doi: 10.1016/j.poetic.2020.101501
Fryer, R. G. Jr. & Levitt, S. D. The causes and consequences of distinctively black names. Q. J. Econ. 119, 767–805 (2004).
doi: 10.1162/0033553041502180
Jensen, J. L. et al. Language models in sociological research: an application to classifying large administrative data and measuring religiosity. Sociol. Methodol. 52, 30–52 (2022).
doi: 10.1177/00811750211053370
Lieberson, S., Dumais, S. & Baumann, S. The instability of androgynous names: the symbolic maintenance of gender boundaries. Am. J. Sociol. 105, 1249–1287 (2000).
doi: 10.1086/210431
Kozlowski, D. et al. Avoiding bias when inferring race using name-based approaches. PLoS ONE 17, e0264270 (2022).
doi: 10.1371/journal.pone.0264270
pubmed: 35231059
pmcid: 8887775
Sebo, P. Using genderize.io to infer the gender of first names: how to improve the accuracy of the inference. J. Med. Libr. Assoc. 109, 609–612 (2021).
doi: 10.5195/jmla.2021.1252
pubmed: 34858090
pmcid: 8608220
Müller, D., Te, Y.-F. & Jain, P. Improving data quality through high precision gender categorization. In 2017 IEEE International Conference on Big Data (Big Data) (eds Baeza-Yeats, R., Hu, X. T. & Kepner, J.) 2628–2636 (IEEE, 2017).
Wang, Z. et al. Demographic inference and representative population estimates from multilingual social media data. In The World Wide Web Conference (eds Liu, L. & Whyte, R.) 2056–2067 (Association for Computing Machinery, 2019).
Silva, G. C., Trivedi, A. N. & Gutman, R. Developing and evaluating methods to impute race/ethnicity in an incomplete dataset. Health Serv. Outcomes Res. Methodol. 19, 175–195 (2019).
doi: 10.1007/s10742-019-00200-9
Mateos, P. A review of name-based ethnicity classification methods and their potential in population studies. Popul. Space Place 13, 243–263 (2007).
doi: 10.1002/psp.457
Barber, M. & Argyle, L. Misclassification and bias in predictions of individual ethnicity from administrative records. Am. Polit. Sci. Rev. (Forthcoming).
ASA membership (American Sociological Association, 2021); https://www.asanet.org/academic-professional-resources/data-about-discipline/asa-membership
Kessler, S. J. & McKenna, W. Gender: an Ethnomethodological Approach (Univ. Chicago Press, 1985).
Pascoe, C. J. Dude, You’re a Fag: Masculinity and Sexuality in High School (Univ. California Press, 2007).
McNamarah, C. T. Misgendering. Calif. Law Rev. 109, 2227–2322 (2021).
Lagos, D. Hearing gender: voice-based gender classification processes and transgender health inequality. Am. Sociol. Rev. 84, 801–827 (2019).
doi: 10.1177/0003122419872504
Browne, K. Genderism and the bathroom problem: (re)materialising sexed sites, (re)creating sexed bodies. Gend. Place Cult. 11, 331–346 (2004).
doi: 10.1080/0966369042000258668
Whitley, C. T., Nordmarken, S., Kolysh, S. & Goldstein-Kral, J. I’ve been misgendered so many times: comparing the experiences of chronic misgendering among transgender graduate students in the social and natural sciences. Sociol. Inq. 92, 1001–1028 (2022).
doi: 10.1111/soin.12482
The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979); https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html
Hamidi, F., Scheuerman, M. K. & Branham, S. M. Gender recognition or gender reductionism?: The social implications of embedded gender recognition systems. In Proc. 2018 CHI Conference on Human Factors in Computing Systems (eds Hancock, M. & Mandryk, R.) 1–13 (Association for Computing Machinery, 2018).
Scheuerman, M. K., Pape, M. & Hanna, A. Auto-essentialization: gender in automated facial analysis as extended colonial project. Big Data Soc. https://doi.org/10.1177/20539517211053712 (2021).
Bourg, C. Gender Mistakes and Inequality (Stanford Univ. Press, 2003).
Davis, G. & Preves, S. Intersex and the social construction of sex. Contexts 16, 80 (2017).
doi: 10.1177/1536504217696082
Fausto-Sterling, A. Sexing the Body: Gender Politics and the Construction of Sexuality (Basic Books, 2000).
Lockhart, J. W. Paradigms of sex research and women in STEM. Gend. Soc. 35, 449–475 (2021).
doi: 10.1177/08912432211001384
pubmed: 35958390
pmcid: 9365066
Science must respect the dignity and rights of all humans. Nat. Hum. Behav. 6, 1029–1031 (2022).
Slater, R. B. The blacks who first entered the world of white higher education. J. Blacks High. Educ. 4, 47–56 (1994).
doi: 10.2307/2963372
Blumenfeld, W. J. On the discursive construction of Jewish “racialization” and “race passing:” Jews as “U-boats” with a mysterious “queer light”. J. Crit. Thought Prax. 1, 2 (2012).
Nakamura, L. Cyberrace. PMLA 123, 1673–1682 (2008).
Sims, J. P. Reevaluation of the influence of appearance and reflected appraisals for mixed-race identity: the role of consistent inconsistent racial perception. Sociol. Race Ethn. 2, 569–583 (2016).
doi: 10.1177/2332649216634740
Buolamwini, J. & Gebru, T. Gender shades: intersectional accuracy disparities in commercial gender classification. In Proc. 1st Conference on Fairness, Accountability and Transparency (ed. Barocas, S.) 1–15 (ACM, 2018).
Tzioumis, K. Demographic aspects of first names. Sci. Data 5, 180025 (2018).
doi: 10.1038/sdata.2018.25
pubmed: 29509189
pmcid: 5839157
Di Bitetti, M. S. & Ferreras, J. A. Publish (in English) or perish: the effect on citation rate of using languages other than English in scientific publications. Ambio 46, 121–127 (2017).
doi: 10.1007/s13280-016-0820-7
pubmed: 27686730
Garcia, P. et al. No: critical refusal as feminist data practice. In Proc. 2020 Conference on Computer Supported Cooperative Work and Social Computing (eds Bietz, M. & Wiggins, A.) 199–202 (Association for Computing Machinery, 2020).
Caplan, R., Donovan, J., Hanson, L. & Matthews, J. Algorithmic Accountability: a Primer (Data & Society, 2018); https://datasociety.net/wp-content/uploads/2019/09/DandS_Algorithmic_Accountability.pdf
Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine Bias (ProPublica, 2016); https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Harcourt, B. E. Risk as a proxy for race: the dangers of risk assessment. Fed. Sentencing Rep. 27, 237–243 (2015).
doi: 10.1525/fsr.2015.27.4.237
Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
doi: 10.1126/science.aal4230
pubmed: 28408601
Eubanks, V. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St. Martin’s Press, 2017).
O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Allen Lane, 2016).
Benjamin, R. Race After Technology: Abolitionist Tools for the New Jim Code (Polity Press, 2019).
Genderize.io. Determine the gender of a name; https://genderize.io/
Mullen, L., Blevins, C. & Schmidt, B. gender: predict gender from names using historical data http://cran.nexr.com/web/packages/gender/README.html (2021).
Kaplan, J. predictrace: predict the race and gender of a given name using census and Social Security Administration data. GitHub https://github.com/jacobkap/predictrace (2021).
Laohaprapanon, S., Sood, G. & Naji, B. appeler/ethnicolor: impute race and ethnicity based on name. GitHub https://github.com/appeler/ethnicolor (2022).
Khanna, K., Bertelsen, B., Olivella, S., Rosenman, E. & Imai, K. wru: who are you? Bayesian prediction of racial category using surname, first name, middle name, and geolocation. GitHub https://github.com/kosukeimai/wru (2022).