New Evidence and Design Considerations
for Repeated Measure Experiments in Survey Research

Diana Jordan
Duke University

Trent Ollerenshaw
University of Houston

Andrew Trexler
Duke University

Graduate Research Workshop
April 18, 2025

Why Should You Care?


  • If you…
    • Conduct experiments with surveys
    • Review or consume scholarship with survey experiments
    • Care about causal inference
    • Care about replicability and theory building
  • We show…
    • Traditional experimental design is suboptimal
    • Repeated measure designs dramatically improve power
    • They suit many experiment types and sampling strategies
    • They fit within short survey modules
    • They are not without costs: modest (~20%) attenuation of ATEs

A Motivating Example

  • Do you think that federal spending on foreign aid should be increased or decreased?


\(\bar Y_0 = 0.310\)

  • Spending on foreign aid makes up about 1% of the federal budget. Do you think that federal spending on foreign aid should be increased or decreased?

\(\bar Y_1 = 0.398\)
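The implied difference-in-means estimate of the information treatment's effect:

\[\hat\beta_1 = \bar Y_1 - \bar Y_0 = 0.398 - 0.310 = 0.088\]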

Traditional Post-only Design


\(Y_i = \beta_0 + \beta_1T_i + \epsilon_i\)

  • Unbiased under weak assumptions (randomization, SUTVA, no differential attrition)
  • Imprecise
  • Requires large samples for adequate power
  • Can be improved with covariate adjustment if…
    • \(X \perp T\)
    • \(corr(X,Y) \neq 0\)
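The size of the gain follows from a standard large-sample variance argument (our gloss, assuming \(X \perp T\)): adjusting for a covariate with \(corr(X,Y) = \rho\) shrinks the residual variance, so

\[\mathrm{Var}(\hat\beta_1) \;\propto\; \frac{\sigma_Y^2\,(1 - \rho^2)}{n}.\]

For example, \(\rho = 0.8\) cuts the variance of the ATE estimate to roughly 36% of its unadjusted value.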

Repeated Measure Designs

  • Traditional post-only design
    • \(Y_i = \beta_0 + \beta_1T_i + \epsilon_i\)
  • Repeated measures designs
    • Pre-post: \(Y_{i,post} = \beta_0 + \beta_1T_i + \beta_2Y_{i,pre} + \epsilon_i\)
    • Quasi: \(Y_{i,post} = \beta_0 + \beta_1T_i + \beta_2Y_{i,quasi} + \epsilon_i\)
    • Within-subject: \(Y_{i,j} = \alpha + \beta_1 T_{i,j} + \epsilon_{i,j}\), where \(j\) indexes the repeated measurements and treatment varies within respondent
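To build intuition for the precision gains, here is a minimal simulation sketch comparing the post-only and pre-post estimators (all parameters are illustrative assumptions, not values from any study):

    # Illustrative precision comparison: post-only vs. pre-post (ANCOVA).
    # All parameters are assumptions for this sketch, not estimates from any study.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n, true_ate, rho = 1000, 0.10, 0.85   # sample size, true effect, corr(pre, post)

    latent = rng.normal(0, 1, n)          # stable underlying attitude
    load = np.sqrt(rho)                   # loading chosen so corr(pre, post) ~= rho
    pre = load * latent + np.sqrt(1 - rho) * rng.normal(0, 1, n)
    t = rng.integers(0, 2, n)             # random treatment assignment
    post = load * latent + np.sqrt(1 - rho) * rng.normal(0, 1, n) + true_ate * t

    df = pd.DataFrame({"post": post, "pre": pre, "t": t})
    post_only = smf.ols("post ~ t", data=df).fit()        # traditional design
    pre_post = smf.ols("post ~ t + pre", data=df).fit()   # repeated measure design

    print(f"post-only: ATE = {post_only.params['t']:.3f}, SE = {post_only.bse['t']:.3f}")
    print(f"pre-post:  ATE = {pre_post.params['t']:.3f}, SE = {pre_post.bse['t']:.3f}")

With \(corr(pre, post) \approx 0.85\), the pre-post standard error comes out at roughly half the post-only standard error (\(\sqrt{1 - 0.85^2} \approx 0.53\)), in line with the ~50% shrinkage reported in the Takeaways.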

Conventional Concerns


  • Consistency pressures
  • Demand incentives
  • Priming effects
  • Could produce treatment attenuation or exaggeration

Clifford, Sheagley, and Piston (2021)


  • 6 studies with randomized designs (post-only, pre-post, or quasi-post)
  • Found no evidence of design bias, but large precision gains
    • \(N=1000\) post-only experiment \(\rightarrow\) equivalent power at \(N \approx 200\)–\(600\) with repeated measures
  • Heavily cited by researchers adopting pre-post designs

“Given the clear gains in precision and weak evidence that repeated measures designs change treatment effects, we recommend that researchers use pre-post and within-subjects designs whenever possible.” (CSP, 1062)
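A rough way to see where such figures come from (our gloss, not CSP's own derivation): by the same variance logic as above, a pre-post design with pre/post correlation \(\rho\) needs only about

\[n_{\text{pre-post}} \approx n_{\text{post-only}} \times (1 - \rho^2)\]

respondents for equivalent power. Correlations between roughly 0.65 and 0.90 map an \(N = 1000\) post-only study into the quoted 200–600 range.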

Need for Further Study


  • CSP provide compelling evidence
    • But enough to shift experimental design doctrine?
  • CSP use nonprobability samples
    • Less professionalized respondents may behave differently
  • CSP analyze just 1 within-subject experiment
    • Large % of citations are for within-subject experiments
    • Student sample (\(n = 900\))

Aims of Our Study


  • Large-scale replication of the central claim (no design effect)
  • Three primary extensions:
    • Analyze both probability & non-probability samples
    • Field more within-subject experiments
    • Assess differences in design effects (DE) by proximity of repeated measures
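For exposition, the design effect can be formalized as the interaction coefficient in a model pooling the two designs (a generic specification we use to fix ideas, not necessarily CSP's exact estimator):

\[Y_i = \gamma_0 + \gamma_1 T_i + \gamma_2 RM_i + \gamma_3 (T_i \times RM_i) + \epsilon_i,\]

where \(RM_i\) flags assignment to a repeated measure design. Then \(\hat\gamma_3\) estimates the DE, and "no design effect" is the null \(\gamma_3 = 0\).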

Experimental Design

Design


  • We field 6 experiments on each of 3 samples (18 total studies)
  • Each respondent completes all 6 experiments
  • For each respondent, 2 randomly selected experiments use a post-only design
  • The remaining 4 use a repeated measure design
  • Question order is randomized to vary the distance between repeated measures

Experiments


  • Pre-post designs
    1. Information treatment on foreign aid (Gilens 2001)
    2. Party cues treatment on drug imports (CSP 2021)
    3. Framing treatment on GMOs (CSP 2021)
  • Within-subject designs
    1. Welfare/assistance to the poor (Smith 1987)
    2. Affirmative action for minorities/women (Wilson 2009)
    3. Opioid clinic nearby/distant (De Benedictis-Kessner 2019)

Samples


  • NORC AmeriSpeak Panel (probability sample, \(n = 4033\))
  • Prolific (nonprobability, \(n = 4261\))
  • Lucid (nonprobability, \(n = 4869\))
  • Combined \(N_{ij}=78,978\) observations (13,163 respondents \(\times\) 6 experiments)

Randomization


  1. Randomly assign treatment/control for each experiment.
  2. Randomly assign 2 experiments to post-only designs, 4 to repeated measures.
  3. Randomize order of pre-treatment blocks.
  4. Randomly assign each participant to 1 of 2 order randomizations.
  5. Randomize order of post-treatment blocks.
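A schematic of this assignment logic in code (a sketch only: the experiment labels and the two order schemes are illustrative placeholders, not the exact survey-flow implementation):

    # Sketch of per-respondent randomization, following steps 1-5 above.
    import random

    EXPERIMENTS = ["foreign_aid", "drug_imports", "gmo",
                   "welfare", "affirmative_action", "opioid"]

    def randomize_respondent(rng: random.Random) -> dict:
        # 1. Treatment/control for each experiment.
        treat = {e: rng.random() < 0.5 for e in EXPERIMENTS}
        # 2. 2 experiments post-only; the other 4 repeated measure.
        shuffled = rng.sample(EXPERIMENTS, k=len(EXPERIMENTS))
        post_only, repeated = shuffled[:2], shuffled[2:]
        # 3. Randomize the order of the pre-treatment (baseline) blocks.
        pre_order = rng.sample(repeated, k=len(repeated))
        # 4. Assign one of two overall order schemes (labels hypothetical).
        scheme = rng.choice(["order_A", "order_B"])
        # 5. Randomize the order of the post-treatment blocks.
        post_order = rng.sample(EXPERIMENTS, k=len(EXPERIMENTS))
        return {"treat": treat, "post_only": post_only, "pre_order": pre_order,
                "scheme": scheme, "post_order": post_order}

    print(randomize_respondent(random.Random(42)))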

Results

Average Treatment Effects

Bootstrapped ATEs

Estimated Design Effects

Meta-analysis

Respondent Professionalization

Respondent Attention


  • Identify the first between-groups RM experiment each respondent completed
  • Assess the accuracy of their self-reported attitude change
  • Estimate DEs in subsequent between-groups RM experiments for accurate vs. inaccurate respondents (sketch below)
  • No consistent differences
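A pandas-style sketch of that estimation step (column names and the accuracy flag are hypothetical; this shows the shape of the comparison, not our production code):

    # Estimate the design effect (DE) separately for respondents who did and
    # did not accurately report their own attitude change.
    # Hypothetical columns: y (outcome), t (treatment), rm (repeated measure
    # design indicator), accurate (0/1 accuracy of self-reported change).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    def design_effects_by_accuracy(df: pd.DataFrame) -> dict:
        out = {}
        for label, grp in df.groupby("accurate"):
            # The t:rm interaction coefficient is the estimated DE.
            fit = smf.ols("y ~ t * rm", data=grp).fit()
            out[label] = (fit.params["t:rm"], fit.bse["t:rm"])
        return out

    # Tiny synthetic demo (random data, so the true DE is zero).
    rng = np.random.default_rng(1)
    demo = pd.DataFrame({"t": rng.integers(0, 2, 400),
                         "rm": rng.integers(0, 2, 400),
                         "accurate": rng.integers(0, 2, 400)})
    demo["y"] = 0.1 * demo["t"] + rng.normal(0, 1, 400)
    print(design_effects_by_accuracy(demo))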

Repeated Measure Proximity


Takeaways

  • Repeated measure designs attenuate treatment effects by ~20%
  • But also shrink standard errors by ~50%
  • In most research applications, the precision gains outweigh the design bias
  • Suitable for short surveys and for varied sample and experiment types
  • Exceptions: large-N studies with large treatment effects; sensitive topics
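Combining the two headline numbers back-of-envelope (our arithmetic, from the ~20% and ~50% figures above):

\[\frac{t_{RM}}{t_{post}} \approx \frac{0.8\,\beta_1 / (0.5\,SE)}{\beta_1 / SE} = 1.6,\]

so the attenuated-but-precise repeated measure estimate carries a t-ratio about 1.6 times larger than the post-only estimate; since \(t\) grows with \(\sqrt{n}\), that is equivalent to multiplying the post-only sample size by about \(1.6^2 \approx 2.6\).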

Thank you!



Special thanks to TESS, the Rapoport Family Foundation, and Duke Bass Connections for supporting this research.

Contact:

Diana Jordan
Duke University
scholars.duke.edu

Trent Ollerenshaw
University of Houston
trentoll.github.io

Andrew Trexler
Duke University
atrexler.com