Threats of Bots and Other Bad Actors to Data Quality Following Research Participant Recruitment Through Social Media: Cross-Sectional Questionnaire.
data accuracy
fraud
internet
methods
social media
Journal
Journal of medical Internet research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882
Publication information
Publication date:
2020-10-07
History:
received: 2020-07-30
accepted: 2020-09-16
revised: 2020-09-16
entrez: 2020-10-07
pubmed: 2020-10-08
medline: 2021-01-30
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Recruitment of health research participants through social media is becoming more common. In the United States, 80% of adults use at least one social media platform. Social media platforms may allow researchers to reach potential participants efficiently. However, online research methods may be associated with unique threats to sample validity and data integrity. Limited research has described issues of data quality and authenticity associated with the recruitment of health research participants through social media, and sources of low-quality and fraudulent data in this context are poorly understood.
OBJECTIVE
The goal of the research was to describe and explain threats to sample validity and data integrity following recruitment of health research participants through social media and summarize recommended strategies to mitigate these threats. Our experience designing and implementing a research study using social media recruitment and online data collection serves as a case study.
METHODS
Using published strategies to preserve data integrity, we recruited participants to complete an online survey through the social media platforms Twitter and Facebook. Participants were to receive $15 upon survey completion. Prior to manually issuing remuneration, we reviewed completed surveys for indicators of fraudulent or low-quality data. Indicators attributable to respondent error were labeled suspicious, while those suggesting misrepresentation were labeled fraudulent. We planned to remove cases with 1 fraudulent indicator or at least 3 suspicious indicators.
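As an illustration only, the removal rule described above can be sketched in Python. The class and function names are hypothetical and do not come from the study, and reading "1 fraudulent indicator" as "at least 1" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SurveyCase:
    """One completed survey with counts of flagged indicators (hypothetical structure)."""
    case_id: str
    fraudulent_indicators: int  # indicators suggesting misrepresentation
    suspicious_indicators: int  # indicators attributable to respondent error


def should_remove(case: SurveyCase,
                  fraud_threshold: int = 1,
                  suspicious_threshold: int = 3) -> bool:
    """Apply the planned rule: remove a case with at least 1 fraudulent
    indicator (assumed reading) or at least 3 suspicious indicators."""
    return (case.fraudulent_indicators >= fraud_threshold
            or case.suspicious_indicators >= suspicious_threshold)


# Made-up example cases, not data from the study.
cases = [
    SurveyCase("A", fraudulent_indicators=1, suspicious_indicators=0),
    SurveyCase("B", fraudulent_indicators=0, suspicious_indicators=2),
    SurveyCase("C", fraudulent_indicators=0, suspicious_indicators=3),
]
removed_ids = [c.case_id for c in cases if should_remove(c)]
print(removed_ids)
```

Keeping the thresholds as parameters makes it easy to see how many cases a stricter or looser rule would exclude before any remuneration is issued.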
RESULTS
Within 7 hours of survey activation, we received 271 completed surveys. We classified 94.5% (256/271) of cases as fraudulent and 5.5% (15/271) as suspicious. In total, 86.7% (235/271) provided inconsistent responses to verifiable items and 16.2% (44/271) exhibited evidence of bot automation. Of the fraudulent cases, 53.9% (138/256) provided a duplicate or unusual response to one or more open-ended items and 52.0% (133/256) exhibited evidence of inattention.
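The reported percentages follow directly from the counts given above. A short Python check, with a hypothetical helper name, reproduces them to one decimal place:

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total = 271       # completed surveys
fraudulent = 256  # cases classified as fraudulent
suspicious = 15   # cases classified as suspicious

# The two classifications account for every completed survey.
assert fraudulent + suspicious == total

reported = {
    "fraudulent": pct(fraudulent, total),
    "suspicious": pct(suspicious, total),
    "inconsistent responses": pct(235, total),
    "bot automation": pct(44, total),
    "duplicate or unusual open-ended response": pct(138, fraudulent),
    "inattention": pct(133, fraudulent),
}
for label, value in reported.items():
    print(f"{label}: {value}%")
```

Note that the last two percentages use the 256 fraudulent cases as the denominator, not all 271 completed surveys.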
CONCLUSIONS
Research findings from several disciplines suggest studies in which research participants are recruited through social media are susceptible to data quality issues. Opportunistic individuals who use virtual private servers to fraudulently complete research surveys for profit may contribute to low-quality data. Strategies to preserve data integrity following research participant recruitment through social media are limited. Development and testing of novel strategies to prevent and detect fraud is a research priority.
Identifiers
pubmed: 33026360
pii: v22i10e23021
doi: 10.2196/23021
pmc: PMC7578815
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e23021
Grants
Agency: NCI NIH HHS
ID: U54 CA156732
Country: United States
Copyright information
©Rachel Pozzar, Marilyn J Hammer, Meghan Underhill-Blazey, Alexi A Wright, James A Tulsky, Fangxin Hong, Daniel A Gundersen, Donna L Berry. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 07.10.2020.