#SurveyMethods

Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2026-02-09

Can researchers detect #AI bots taking paid surveys?

#Prolific tested humans and #LLM agents with various #dataQuality checks.
- The company says they caught 100% of the non-humans.
- My take-away: #reCAPTCHA and #mouseTracking were each about 95% accurate on their own.

prolific.com/resources/authent

#surveyMethods

- Four attention check questions: The classic "please select 'strongly agree' to show you're reading this" type questions.
- Two consistency checks: Paired questions where humans should answer in predictable patterns. For example, if the answer to one question was true, the answer to the other had to be false (and vice versa) for the pair to count as consistent.
- Two reverse shibboleth items: Questions that your intended sample (i.e., humans) should not know the answer to, but an AI agent likely would.
- Three cognitive traps: These are visual illusions designed to exploit fundamental limitations in vision-enabled AI systems. We used a modified Müller-Lyer illusion, a modified cafe wall illusion, and a ‘moving robot task’, all of which have been shown to be highly discriminative between AI agents and human participants.
- Comprehensive mouse tracking: Recording every cursor movement and click pattern throughout the survey.
- Qualtrics' reCAPTCHA score: The platform's built-in risk scoring system.

We tested five different AI agents, each completing the survey 25 times to match our human sample size:

- GPT Agent: OpenAI's GPT configured to complete surveys
- Claude: Anthropic's Claude accessed via Cursor
- Perplexity: Perplexity AI's search-enhanced agent
- Gemini: Google's Gemini via Project Mariner
- Custom Agent: Our in-house ‘white hat’ agent, designed by Prolific’s AI research engineers to take surveys and avoid detection

The custom agent was crucial to include because it’s specifically designed to take surveys undetected. It’s closer to Westwood’s bot (2025) as an adversarial, customisable agent than the more readily detectable commercial AI agents.

To take part in the survey, we provided all agents with an identical prompt that asked them to complete the survey as a human would and exhibit human-like behaviour.

Prolific’s bot authenticity check (100% accurate)

Mouse tracking (95.0% accurate)

Qualtrics’ reCAPTCHA score (94.2% accurate)

Cognitive traps (85.2% accurate)

Consistency checks (62.7% accurate)

Classic attention checks (59.7% accurate)

Reverse shibboleth questions (58.0% accurate)
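The accuracy figures above treat each check as a classifier that separates agents from humans. A minimal sketch of that scoring, assuming each respondent has a known human/agent label and a per-check flag. The function names and the toy data are hypothetical, not Prolific's code or results:

```python
def fails_consistency(answer_a: bool, answer_b: bool) -> bool:
    """Flag a respondent whose paired answers are not opposites.

    Assumes the pair is built so that a consistent respondent answers
    one item True and the other False (hypothetical item design).
    """
    return answer_a == answer_b


def detection_accuracy(flagged: list[bool], is_agent: list[bool]) -> float:
    """Share of respondents whose flag matches their true status."""
    if len(flagged) != len(is_agent) or not is_agent:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(flag == truth for flag, truth in zip(flagged, is_agent))
    return correct / len(is_agent)


# Toy data, made up for illustration: paired answers and true status.
paired_answers = [(True, False), (True, True), (False, True), (False, False)]
is_agent = [False, True, False, False]

flags = [fails_consistency(a, b) for a, b in paired_answers]
accuracy = detection_accuracy(flags, is_agent)
```

Here accuracy counts both kinds of error: agents that slip through and humans who are wrongly flagged (the fourth toy respondent is a human who fails the check).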
Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2026-01-19

Hey #SurveyMethods and #MedEd folks:

In a workshop for #MedSchool faculty about questionnaire design and survey research methods, what

- objectives should be prioritized?
- materials are out there?
- activities are worth it?
- assessments work?

Example: aamc.org/what-we-do/mission-ar

Questionnaire Design and Survey Research
This workshop will provide some basic principles in questionnaire/survey design and give workshop participants an opportunity for hands-on experience designing a questionnaire. After participating in this workshop, learners will be able to:

- Design a blueprint for a survey/questionnaire appropriate to their own application;
- Construct and edit questions to avoid common problems in wording and framing;
- Select an appropriate response format from a menu of alternatives;
- Design the overall format of the survey/questionnaire to facilitate data management and analysis.
Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2026-01-09

We've found recruiting people for online #research via #onlineAdvertising yielded good results on overt and covert #dataQuality measures (perhaps because participation incentives aren't financial):

Attention checks passed ≅ 2.6 out of 3

ReCAPTCHA (v3) ≅ 0.94 out of 1.0

Sample size > 5000 (from six continents)

doi.org/10.1017/S0034412525000

#surveyMethods #cogSci #psychology #xPhi #QualityControl #econ #marketing

Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2026-01-09

I forgot to share the #mTurk data quality result that got scooped:

“In late 2020…. Participants from the United States were recruited from Amazon Mechanical Turk, #CloudResearch, #Prolific, and a #university. One participant source yielded up to 18 times as many low-quality respondents as the other three.”

doi.org/10.1093/analys/anaf015

#psychology #philosophy #surveyMethods #quantMethods #dataScience #qualityControl

Figure 1b. Number of observations per sample,
before and after filtering for data quality (N = 460).
Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2025-12-17

RE: mastodon.acm.org/@neilernst/11

😳 The #AI survey taker "rendered [attention quality] checks [ACQs] effectively obsolete. Across 6,000 total trials..., [it] committed only 10 errors, achieving an overall pass rate of 99.8% and scoring perfectly on 18 of the 20 ACQ types."

#surveyMethods #psychometrics #psychology #tech #philSci #SciComm #dataQuality

Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2025-09-01

Can thinking aloud accurately capture how we decide?

Nisbett & Wilson's 1977 paper famously suggested it can't.

But #NLP and #AI methods may indicate that it can:
- escholarship.org/uc/item/4sb93
- openreview.net/forum?id=1Tny4K

#cogSci #psychology #philosophy #surveyMethods #thinkAloud #LLM

Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259. https://doi.org/10.1037/0033-295X.84.3.231

Xie, H., Xiong, H.-D., & Wilson, R. (2024, August 26). From Strategic Narratives to Code-Like Cognitive Models: An LLM-Based Approach in A Sorting Task. First Conference on Language Modeling. https://openreview.net/forum?id=1Tny4KgGO2

Zhang, Z., Xie, H., Baker, T. E., Peters, M., & Wilson, R. (2025). Linking Strategies to Think Aloud in A Stochastic Learning Task. Proceedings of the Annual Meeting of the Cognitive Science Society, 47(0). https://escholarship.org/uc/item/4sb936m3
2025-06-05

#ComparativeResearch #EducationMeasurement #SurveyMethods
Measuring education across countries is complex—but crucial for valid, comparable survey data. This study tests 16 coding strategies using ESS data and finds a strong contender for a new international standard.

Read the article:
Schneider S.L. & Urban J. (2025). A myriad of options: Validity and comparability of alternative international education variables. Survey Methods: Insights from the Field. DOI: 10.13094/SMIF-2025-00008

2025-05-09

@GESIS
Thanks for sharing the talk!
Here is the Question-Link R-Package and an in-depth tutorial on using it:
matroth.github.io/questionlink

And please note that we also offer consultations regarding harmonization techniques (and other survey method topics).
gesis.org/en/consulting/survey

#surveymethods #harmonization #rstats

Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2025-04-11

Given that surveys tend to overestimate belief in #conspiracyTheories (osf.io/preprints/psyarxiv/zsnc) and support for #politicalViolence (doi.org/10.1073/pnas.211687011), I wonder how much of the correlation between such variables remains after accounting for such measurement error.

#stats #psychometrics #surveyMethods

Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2025-04-07

New #surveyMethods paper replicates and extends differences in #dataQuality, attention, naivety, decision style, etc. by
- online #research recruitment platform (#mTurk, #Prolific, #Qualtrics, #Pollfish)
- device (#mobile v. #desktop)
- person's incentive

doi.org/10.3758/s13428-025-026

Differences in framing effects and cognitive reflection test performance in Study 1 and in cost/participant in Study 2.
Differences in conjunction fallacy, reflection test performance, incentives, attention, and pre-experiment activities for Study 2.
Differences in exclusion rates (between lenient and strict exclusion policies) by platform and differences in psychological scales (like self-esteem, need for cognition, and self-control) for Study 1.
Differences in psychological scales, reflection test performance, conjunction fallacy, attention, etc. for Study 2.
2025-03-26

Matthias Roth (@rothm) and I published a new paper on survey data harmonization.

"One harmonization fits all? – Impact of missing population invariance on harmonization error when harmonizing social science survey questions with equating"

The paper again corroborates that observed score equating outperforms linear stretching on average. However, it also shows that it is better to derive a harmonization solution for single-item measures from a sample that is drawn from a similar population to the population you are interested in researching.

doi.org/10.1080/13645579.2025.

#surveyresearch #surveymethods #harmonization

A figure from the linked paper showing its central finding. Observed score equating on average outperforms linear stretching, but the potential bias of observed score equating gets higher the more dissimilar the samples for equating are from the sample that the resulting recoding table is used on.
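To make the two approaches compared in this post concrete, here is a rough sketch. Linear stretching rescales by the scale endpoints alone, while the equating function below maps a score to the target-scale value with the closest percentile rank. That nearest-rank mapping is a deliberately simplified stand-in for observed score equating, not the paper's or the QuestionLink package's implementation. It assumes both samples come from similar populations, and the samples themselves are made up for illustration:

```python
from bisect import bisect_left, bisect_right


def linear_stretch(score, src_min, src_max, dst_min, dst_max):
    """Rescale a score linearly from the source range to the target range."""
    return dst_min + (score - src_min) * (dst_max - dst_min) / (src_max - src_min)


def percentile_rank(score, sample):
    """Mid-percentile rank of `score` within `sample`, between 0 and 1."""
    ordered = sorted(sample)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return (below + 0.5 * ties) / len(ordered)


def equate(score, src_sample, dst_sample):
    """Map `score` to the target value whose percentile rank is closest."""
    target_rank = percentile_rank(score, src_sample)
    candidates = sorted(set(dst_sample))
    return min(
        candidates,
        key=lambda value: abs(percentile_rank(value, dst_sample) - target_rank),
    )


# Hypothetical responses: a 1-5 item and a skewed 0-10 item.
src_sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
dst_sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 10]

stretched = linear_stretch(3, 1, 5, 0, 10)
equated = equate(3, src_sample, dst_sample)
```

With these toy samples the midpoint of the 1-5 item stretches to the middle of the 0-10 range, while the rank-based mapping places it much lower because the target responses are skewed toward small values.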
Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2024-11-28

Thankful to get a copy of Reflection and Intuition in A Crisis-Ridden World: doi.org/10.4324/978100330036

Hoping for a digital copy to annotate!

Can't access the #book? Some free papers cover similar work:
- doi.org/10.1111/meta.12534
- doi.org/10.3390/jintelligence1
- doi.org/10.31234/osf.io/y8sdm

I've only been able to glance at a few sections, but this seems like a book I would want to write:
- broad range of topics that interest more than academics
- situated in the history of ideas
- attention to replicability of results
- anticipation of future directions

Bravo! 👏

#cogSci #psychology #politics #epistemology #ethics #extremism #conspiracyTheory #rationality #logic #surveyMethods #psychometrics

Cover
Page 143, about how reflective reasoning is, among other things, more consciously accessible than intuitive reasoning.
Pages 154-155, about how reflective reasoning can support good reasoning or seemingly vicious reasoning (e.g., post hoc rationalization) a la "Bounded Reflectivism & Epistemic Identity" (Byrd 2022).
Pages 156-157, on the distinction between reflection and rumination. As I say in "A Two Factor Explication of Reflection:..." (under review), rumination is lacking reflection's deliberative inhibition of impulses. Instead, rumination is characterized by letting those impulses intrude and persist. So rumination has only one of reflection's two key features: consciously accessible thoughts.
Centre for Population Change (CPCpopulation@sciences.social)
2024-11-28

🗓️ Next week on Fri 6 Dec

Explore the challenges and opportunities for #socialsurvey #data collection in #Scotland.

Register online for this free in-person event #SocialResearch #SurveyMethods #ResearchMethods #DataCollection

eventbrite.co.uk/e/survey-futu

Bernd Weiß (berndweiss)
2024-11-14

Are you working in the area of adaptive survey design? Submit your abstract to our session on "Innovations in Adaptive Survey Designs" and showcase your work. europeansurveyresearch.org/con The deadline's approaching fast! 🗓️

Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2024-11-10

Excited to share YEARS of research about how to get people to think reflectively and how reflection impacts philosophical judgments at the 2025 #APA in #NewYorkCity (January 8 to 11): apaonline.org/mpage/2025easter

Can't make it?
- More about my talk: researchgate.net/publication/3
- More about my poster: researchgate.net/publication/3

Thanks to the #APA, James Beebe, and the Experimental Philosophy Society for the opportunity!

#decisionScience #philosophy #xPhi #epistemology #bioethics #cognitiveScience #mTurk #Prolific #UniversityParticipants #surveyMethods #DualProcessTheory

2025 APA Eastern website, with link to the program
The Experimental Philosophy Society's session with Nick Byrd's talk: "What Philosophical Tendencies Does Reflective Thinking Actually Cause (and What Did It Take to Find Out)?"
Friday afternoon's poster session with Nick Byrd's poster "Reflection-Philosophy Order Effects and Correlations: Aggregating and Comparing Results from mTurk, CloudResearch, Prolific, and Undergraduate Samples"
2024-10-17

This latest blog by Gerry Nicolaas, Director of Methods, first featured in the Social Research Association's Research Matters magazine, explores the possibility of producing more inclusive survey data and the ethics of differential incentives.

natcen.ac.uk/it-ok-use-differe #SurveyMethods

2024-09-30

In a new paper, I simulate the consequences of combining survey data without proper harmonization techniques.

I demonstrate that there is a plausible risk of biased correlative analyses based on the integrated data, if we do not harmonize measurement units across different survey sources and instruments first.

#surveyresearch #surveymethods #harmonization

doi.org/10.5964/miss.11217

A figure from the paper showing simulation results. The x-axis measures how large the difference in measurement units is for the survey variables formed from two survey data sources. The y-axis shows the possible bias for correlative analyses using such integrated variables. The result is that as the measurement unit problem increases, the potential for biased correlation also increases.
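The measurement-unit problem described in this post can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's simulation design: two hypothetical sources measure the same latent trait in different units, the true correlation with an outcome is 0.5 by construction, and within-source z-standardization stands in for a proper harmonization method. All parameter values are made up:

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 5000  # respondents per source (hypothetical)


def simulate_source(scale, shift):
    """Return (measured trait, outcome) for one survey source."""
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome correlates 0.5 with the latent trait by construction.
    outcome = [0.5 * t + rng.gauss(0, math.sqrt(0.75)) for t in latent]
    # Each source reports the trait in its own measurement units.
    measured = [scale * t + shift for t in latent]
    return measured, outcome


x_a, y_a = simulate_source(scale=1.0, shift=0.0)
x_b, y_b = simulate_source(scale=3.0, shift=5.0)

# Pooling raw scores mixes units; standardizing per source first does not.
naive_r = pearson(x_a + x_b, y_a + y_b)
harmonized_r = pearson(zscores(x_a) + zscores(x_b), y_a + y_b)
```

In this setup the naive pooled correlation falls well below the true value, because the unit difference adds between-source variance that is unrelated to the outcome, while the standardized version recovers a value close to 0.5.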
Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2024-08-23

Moral comparisons of utilitarian tradeoffs depended on the rating protocol?

Participants rated pairs of utilitarian tradeoffs. Relative differences for each pair depended on
- whether participants saw both tradeoffs at the same time or separately.
- whether the rating was comparative or quantitative.

Sometimes (although not most of the time), the average relative difference for one protocol reversed in the other protocol!

doi.org/10.1016/j.cognition.20

#SurveyMethods #xPhi #ethics #moralPsychology

Pages 5 and 6
Pages 7 and 8
Pages 9 and 10
Pages 11 and 12
Nick Byrd, Ph.D. (ByrdNick@nerdculture.de)
2024-08-19

Will people reason less reflectively when primed to think about threats to their health or resources?

Multiple pre-registered experiments (N > 3000) didn't detect any such reflection-suppressing effect of threat primes (compared to controls) — and this didn't seem to be a result of a failed manipulation.

doi.org/10.3758/s13428-024-024

#decisionScience #surveyMethods #medicine #health #economics #psychology #edu #policy

Experiment 1
Experiment 1 results and Experiment 2
Experiment 3 omitted the reflection text by accident.
Experiment 4
