New Evidence and Design Considerations for Repeated Measure Experiments in Survey Research

Diana Jordan
Duke University

Trent Ollerenshaw
University of Houston

Andrew Trexler
UW-Madison

APW & MEAD
November 10, 2025

Why Should You Care?

If you…
- Conduct experiments with surveys
- Review or consume scholarship with survey experiments
- Want accurate estimates of ATEs
- Are concerned about the replication crisis

We show…
- Traditional experimental design is often suboptimal
- Repeated measure designs dramatically improve power
- Suitable for many experimental settings, but…
- Not without costs: slight attenuation of ATEs

A Motivating Example

Do you think that federal spending on foreign aid should be increased or decreased?

\(\bar Y_0 = 0.310\)

Spending on foreign aid makes up about 1% of the federal budget. Do you think that federal spending on foreign aid should be increased or decreased?

\(\bar Y_1 = 0.398\)
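In a post-only design, the estimated effect of the information treatment is the difference between these two group means:

\(\widehat{ATE} = \bar Y_1 - \bar Y_0 = 0.398 - 0.310 = 0.088\)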

Traditional Post-only Design

\(Y_i = \beta_0 + \beta_1T_i + \epsilon_i\)

Unbiased under weak assumptions (randomization, SUTVA, no differential attrition)
Imprecise
Requires large samples for adequate power
Precision can be improved with covariate adjustment if…
- \(X \perp T\)
- \(\operatorname{corr}(X,Y) \neq 0\)
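A minimal simulation sketch of why covariate adjustment helps, assuming a pre-treatment covariate \(X\) that is independent of \(T\) (guaranteed here by randomization) and correlated with \(Y\). The data-generating values (effect 0.1, slope 0.8, noise SD 0.6, \(n = 200\)) are hypothetical. The adjusted estimate is the OLS coefficient on \(T\) from regressing \(Y\) on \(T\) and \(X\), computed in closed form from the pooled within-group slope.

```python
import random
import statistics


def simulate_once(rng, n=200, ate=0.1, slope=0.8, noise_sd=0.6):
    """Simulate one post-only experiment with a prognostic covariate X (hypothetical values)."""
    treat = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(treat)  # complete randomization, so X is independent of T
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [ate * t + slope * xi + rng.gauss(0, noise_sd) for t, xi in zip(treat, x)]
    return treat, x, y


def estimates(treat, x, y):
    """Return (difference in means, covariate-adjusted ATE)."""
    groups = {g: [i for i, t in enumerate(treat) if t == g] for g in (0, 1)}
    mean_x = {g: statistics.fmean(x[i] for i in idx) for g, idx in groups.items()}
    mean_y = {g: statistics.fmean(y[i] for i in idx) for g, idx in groups.items()}
    diff = mean_y[1] - mean_y[0]
    # Pooled within-group slope equals the OLS coefficient on X in Y ~ T + X.
    sxy = sum((x[i] - mean_x[g]) * (y[i] - mean_y[g]) for g, idx in groups.items() for i in idx)
    sxx = sum((x[i] - mean_x[g]) ** 2 for g, idx in groups.items() for i in idx)
    b = sxy / sxx
    # Coefficient on T: raw difference minus the part explained by chance imbalance in X.
    adjusted = diff - b * (mean_x[1] - mean_x[0])
    return diff, adjusted


rng = random.Random(2025)
draws = [estimates(*simulate_once(rng)) for _ in range(500)]
sd_unadjusted = statistics.stdev(d for d, _ in draws)
sd_adjusted = statistics.stdev(a for _, a in draws)
print(f"SD of diff-in-means: {sd_unadjusted:.3f}, adjusted: {sd_adjusted:.3f}")
```

Both estimators center on the true effect; the adjusted one varies less across replications because \(X\) absorbs part of the outcome variance.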

Repeated Measure Designs

Traditional post-only design
- \(Y_{i_{post}} = \beta_0 + \beta_1T_i + \epsilon_i\)
Repeated measures designs
- Pre-post: \(Y_{i_{post}} = \beta_0 + \beta_1T_i + \beta_2Y_{i_{pre}} + \epsilon_i\)
- Quasi: \(Y_{i_{post}} = \beta_0 + \beta_1T_i + \beta_2Y_{i_{quasi}} + \epsilon_i\)
- True within-subject: \(Y_{ij} = \alpha + \beta T_{ij} + \epsilon_{ij}\)
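To see where the within-subject gains come from, here is a sketch assuming each respondent \(i\) has a stable baseline attitude plus independent noise in each condition \(j\), and no carryover between conditions (exactly what the concerns on the next slide question). All parameter values are hypothetical. The within-subject estimate of \(\beta\) is the mean of each respondent's treated-minus-control difference, so the stable baseline cancels out of its standard error.

```python
import math
import random
import statistics


def within_vs_between(n=500, beta=0.1, baseline_sd=0.8, noise_sd=0.6, seed=1):
    """Compare a within-subject estimate with a post-only (between) estimate of beta."""
    rng = random.Random(seed)
    baseline = [rng.gauss(0, baseline_sd) for _ in range(n)]
    # Within-subject: every respondent answers the control (j=0) and treated (j=1) item.
    y0 = [u + rng.gauss(0, noise_sd) for u in baseline]
    y1 = [u + beta + rng.gauss(0, noise_sd) for u in baseline]
    diffs = [b - a for a, b in zip(y0, y1)]
    within_est = statistics.fmean(diffs)
    within_se = statistics.stdev(diffs) / math.sqrt(n)
    # Post-only comparison: half the respondents see only control, the other half only treatment.
    half = n // 2
    control, treated = y0[:half], y1[half:]
    between_est = statistics.fmean(treated) - statistics.fmean(control)
    between_se = math.sqrt(statistics.variance(treated) / len(treated)
                           + statistics.variance(control) / len(control))
    return within_est, within_se, between_est, between_se


within_est, within_se, between_est, between_se = within_vs_between()
print(f"within: {within_est:.3f} (SE {within_se:.3f}); between: {between_est:.3f} (SE {between_se:.3f})")
```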

Conventional Concerns

Consistency pressures
Demand incentives
Priming effects
Could produce treatment attenuation or exaggeration

Clifford, Sheagley, and Piston (2021)

6 studies that randomly assigned respondents to a post-only, pre-post, or quasi design
Found no evidence of design bias, but large precision gains
- A post-only experiment with \(N = 1000\) \(\rightarrow\) \(N \approx 200\) to \(600\) for the same precision with RM
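One standard back-of-the-envelope way to read this range, assuming the RM estimator behaves like an ANCOVA-style adjustment whose error variance shrinks by a factor of \(1 - \rho^2\), where \(\rho\) is the correlation between the repeated measure and the outcome. This is an illustrative approximation, not necessarily the calculation CSP report.

```python
import math


def equivalent_n(n_post_only, rho):
    """Sample size an RM design needs to match a post-only design's precision.

    Assumes Var(RM estimate) is about (1 - rho**2) times Var(post-only estimate)."""
    return n_post_only * (1 - rho ** 2)


def rho_needed(n_post_only, n_rm):
    """Correlation at which an RM design with n_rm matches a post-only design with n_post_only."""
    return math.sqrt(1 - n_rm / n_post_only)


for rho in (0.63, 0.8, 0.89):
    print(rho, round(equivalent_n(1000, rho)))
```

Under this approximation, \(N \approx 200\) to \(600\) corresponds to \(\rho\) between roughly 0.63 and 0.89.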
Heavily cited by researchers adopting pre-post designs

“Given the clear gains in precision and weak evidence that repeated measures designs change treatment effects, we recommend that researchers use pre-post and within-subjects designs whenever possible.” (CSP, 1062)

Need for Further Study

CSP provide compelling evidence
- But enough to shift experimental design doctrine?
CSP use nonprobability samples
- Less professionalized respondents may behave differently
CSP use just 1 within-subject study
- A large share of studies citing CSP use within-subject designs
- Student sample (\(n = 900\))
Other design questions remain unanswered

Aims of Our Study

Large-scale replication of the central claim (no design effect)
Three primary extensions:
- Analyze both probability & non-probability samples
- Field more within-subject experiments
- Assess whether design effects (DEs) vary with the proximity of repeated measures

Experimental Design

Design

We field 6 experiments on three separate samples (\(N_j = 18\) total studies)
Each respondent completed 6 experiments
2 randomly selected experiments used a post-only design
The other 4 used a repeated measure design
Question order was randomized to vary distance between measures

Experiments

Pre-post designs
1. Info treatment on foreign aid (Gilens 2001)
2. Party cues treatment on drug imports (CSP 2021)
3. Framing treatment on GMOs (CSP 2021)

Within-subject designs
1. Welfare/assistance to the poor (Smith 1987)
2. Affirmative action for minorities/women (Wilson 2009)
3. Opioid clinic nearby/distant (De Benedictis-Kessner 2019)

Samples

NORC AmeriSpeak Panel (probability sample, \(n = 4033\))
Prolific (nonprobability, \(n = 4261\))
Lucid (nonprobability, \(n = 4869\))
Combined \(N_{ij}=78,978\) observations
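Treating each observation as one respondent-experiment pair, the combined count follows from the three sample sizes and the six experiments each respondent completed:

\(N_{ij} = 6 \times (4033 + 4261 + 4869) = 6 \times 13{,}163 = 78{,}978\)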

Randomization

Randomly assign treatment/control for each experiment.
Randomly assign 2 experiments to post-only designs, 4 to repeated measures.
Randomize order of pre-treatment blocks.
Randomly assign each participant to 1 of 2 order randomizations.
Randomize order of post-treatment blocks.
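A sketch of one respondent's draw under these steps, for illustration only. The experiment labels are hypothetical shorthand, pre-treatment blocks are assumed to exist only for the 4 RM experiments, every pre block is assumed to precede every post block, and the step assigning participants to "1 of 2 order randomizations" is not modeled because the slide does not specify it. The `gap` value is a hypothetical measure of the distance between measures: the number of blocks separating an experiment's pre and post questions.

```python
import random

# Hypothetical labels for the six experiments listed above.
EXPERIMENTS = ["foreign_aid", "drug_imports", "gmos", "welfare", "affirmative_action", "opioid_clinic"]


def randomize_respondent(rng):
    """Draw one respondent's assignment (hypothetical structure for illustration)."""
    # Step 1: treatment or control, independently for each experiment.
    arms = {e: rng.choice(["treatment", "control"]) for e in EXPERIMENTS}
    # Step 2: 4 experiments get a repeated measure, the other 2 are post-only.
    rm = set(rng.sample(EXPERIMENTS, 4))
    designs = {e: ("repeated" if e in rm else "post_only") for e in EXPERIMENTS}
    # Step 3: shuffle the pre-treatment blocks (assumed to exist only for RM experiments).
    pre_order = rng.sample(sorted(rm), len(rm))
    # Step 5: shuffle the post-treatment blocks (all six experiments).
    post_order = rng.sample(EXPERIMENTS, len(EXPERIMENTS))
    # Blocks between each RM experiment's pre and post measures,
    # assuming all pre blocks come before all post blocks.
    gap = {e: (len(pre_order) - 1 - pre_order.index(e)) + post_order.index(e) for e in rm}
    return arms, designs, pre_order, post_order, gap


arms, designs, pre_order, post_order, gap = randomize_respondent(random.Random(0))
print(designs)
print(gap)
```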

Results

Average Treatment Effects

Bootstrapped ATEs
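The slide does not say how the bootstrap was run. As a generic illustration only, a nonparametric percentile bootstrap for a difference-in-means ATE resamples respondents with replacement within each arm. The binary responses below are hypothetical, generated from the motivating example's group means, and 2,000 resamples is an arbitrary choice.

```python
import random
import statistics


def bootstrap_ate(treated, control, reps=2000, seed=7):
    """Percentile bootstrap 95% interval for mean(treated) - mean(control)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        t = rng.choices(treated, k=len(treated))  # resample with replacement within each arm
        c = rng.choices(control, k=len(control))
        draws.append(statistics.fmean(t) - statistics.fmean(c))
    draws.sort()
    lo = draws[int(0.025 * reps)]
    hi = draws[int(0.975 * reps) - 1]
    point = statistics.fmean(treated) - statistics.fmean(control)
    return point, lo, hi


# Hypothetical binary responses drawn at the motivating example's means.
rng = random.Random(42)
treated = [1 if rng.random() < 0.398 else 0 for _ in range(500)]
control = [1 if rng.random() < 0.310 else 0 for _ in range(500)]
point, lo, hi = bootstrap_ate(treated, control)
print(f"ATE {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```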

Estimated Design Effects

Meta-analyses
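The slides do not spell out these estimators, so here is one plausible sketch under stated assumptions: the design effect in each study is the RM ATE minus the post-only ATE, with standard errors combined as if independent (reasonable when, within an experiment, each respondent sees only one design), and studies are pooled with a fixed-effect inverse-variance meta-analysis. The authors may instead use a random-effects model or a different DE definition; the input numbers below are hypothetical.

```python
import math


def design_effect(ate_rm, se_rm, ate_post, se_post):
    """DE = ATE under RM minus ATE under post-only; SEs combined assuming independence."""
    return ate_rm - ate_post, math.sqrt(se_rm ** 2 + se_post ** 2)


def fixed_effect_pool(estimates, ses):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical study: RM ATE 0.08 (SE 0.03) vs post-only ATE 0.10 (SE 0.04).
de, de_se = design_effect(0.08, 0.03, 0.10, 0.04)
print(f"DE {de:.3f} (SE {de_se:.3f}), relative attenuation {de / 0.10:.0%}")
```

Dividing a DE by the post-only ATE expresses it as relative attenuation, the scale on which the ~20% figure below is stated.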

Design Considerations

We find no meaningful differences…

Between sample providers
By respondent professionalization
By respondent attentiveness
Between experiment types
By how far apart RM are placed

But repeated exposure to repeated measures may increase DEs!
Attitude recall questions may be the real culprit

Taking Stock

RM designs attenuate treatment effects by ~20%
But also shrink SEs by ~50%
Suitable for many applied settings
Precision gains usually trump design bias
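A rough way to see why precision can win, using a normal approximation to a two-sided test at \(\alpha = 0.05\). The post-only benchmark (true ATE 0.10, SE 0.05) is hypothetical; the RM case applies the slide's ~20% attenuation and ~50% SE reduction, so the expected z-statistic grows by a factor of \(0.8 / 0.5 = 1.6\).

```python
from statistics import NormalDist

Z = NormalDist()


def power(true_effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test for an estimate centered at true_effect."""
    crit = Z.inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    return Z.cdf(shift - crit) + Z.cdf(-shift - crit)


ate, se_post = 0.10, 0.05  # hypothetical post-only benchmark
post_only = power(ate, se_post)
repeated = power(0.8 * ate, 0.5 * se_post)  # ~20% attenuation, ~50% smaller SE
print(f"post-only power: {post_only:.2f}, repeated-measure power: {repeated:.2f}")
```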

Taking Stock

How to consider the attenuation and precision tradeoff?
We simulate 300,000 experiments
Vary design, sample size, true ATE, & attenuation
Evaluate power, absolute error, false discovery, & coverage
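A much smaller Monte Carlo sketch of this kind of evaluation, under a deliberately simple hypothetical model: each simulated estimate is drawn as Normal(true ATE × (1 − attenuation), SE), with SE \(= \sqrt{4/n}\) for a balanced two-arm design with unit outcome SD, shrunk by \(\sqrt{1 - \rho^2}\) when a repeated measure with correlation \(\rho\) is used. Sample size, \(\rho\), and replication count are illustrative, not the authors' simulation grid.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_design(n, true_ate, attenuation=0.0, rho=0.0, reps=20000, seed=3):
    """Monte Carlo power, mean absolute error, and 95% CI coverage for one configuration.

    Hypothetical model: estimate ~ Normal(true_ate * (1 - attenuation), se),
    with se = sqrt(4 / n) * sqrt(1 - rho**2)."""
    rng = random.Random(seed)
    se = math.sqrt(4 / n) * math.sqrt(1 - rho ** 2)
    crit = NormalDist().inv_cdf(0.975)
    significant, abs_error, covered = [], [], []
    for _ in range(reps):
        est = rng.gauss(true_ate * (1 - attenuation), se)
        significant.append(abs(est / se) > crit)  # rejects H0: ATE = 0
        abs_error.append(abs(est - true_ate))  # error relative to the true, unattenuated ATE
        covered.append(est - crit * se <= true_ate <= est + crit * se)
    return {"power": fmean(significant), "abs_error": fmean(abs_error), "coverage": fmean(covered)}


post_only = simulate_design(n=2000, true_ate=0.1)
repeated = simulate_design(n=2000, true_ate=0.1, attenuation=0.2, rho=0.8)
print(post_only)
print(repeated)
```

Attenuation pulls the RM intervals off the true ATE, so coverage falls even as power rises. Setting `true_ate=0` turns the "power" entry into a false-positive rate; the slides' exact false-discovery metric is not specified, so it is not reproduced here.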

Simulations

Caveats & Next Steps

All evidence is from online panels
Limited range of interventions & topics

Repeated measure designs in other survey modes
Sensitive topics/interventions
Measurement scales
Other design considerations?

Thank you!

Special thanks to TESS, the Rapoport Family Foundation, and Duke Bass Connections for supporting this research.

Contact:

Diana Jordan
Duke University
scholars.duke.edu

Trent Ollerenshaw
University of Houston
trentoll.github.io

Andrew Trexler
UW-Madison
atrexler.com

Appendix

Respondent Professionalization

Respondent Attention

Identify first between-groups RM experiment for each respondent
Assess accuracy of self-reported attitude change
Estimate DE in subsequent between-groups RM experiments for accurate and inaccurate respondents
No consistent differences
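The slide leaves the accuracy rule unspecified. One hypothetical rule, used here purely for illustration, counts a respondent as accurate when the direction of their self-reported attitude change matches the sign of their observed post-minus-pre change in the first between-groups RM experiment; the coding of `self_report` and the `tol` parameter are assumptions.

```python
def sign(value, tol=0.0):
    """Return -1, 0, or 1, treating |value| <= tol as no change."""
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0


def recall_accurate(pre, post, self_report, tol=0.0):
    """Hypothetical accuracy rule: self-reported direction matches the observed change.

    self_report is coded 1 (moved up), 0 (no change), or -1 (moved down)."""
    return sign(post - pre, tol) == self_report


print(recall_accurate(2, 4, 1), recall_accurate(2, 4, -1), recall_accurate(3, 3, 0))
```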

Repeated Measure Proximity