Ten Questions for Evaluating Evaluations

Ten Questions for Evaluating (Randomized) Evaluations (of Social Programs)

1. Was randomization really random?

“Using the administrative database to form the majority of the control group allowed a higher percentage of referrals from high-needs referral sources such as the courts to receive the program.”

2. Did the program group mostly get the program?

“Establishing relationships with intended beneficiaries was critical, as 22 out of 200 program members attended at least one training session.”

3. Did the control group mostly not get the program?

“While the central intake system established by the state filtered out individuals who had been placed on a list of study group members, it was impossible to determine if control group members self-enrolled in services themselves or through other referral sources separately.”

4. How much attrition was there?

“Over 220 of the 400 individuals randomly assigned eventually completed the 1 year follow-up survey.”

5. How non-random or differential was the attrition?

“This follow-up data included all 22 of the program group members who attended at least one training, as well as 198 out of 200 control group members.”
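The arithmetic behind questions 4 and 5 is worth doing by hand. The following is a minimal sketch using only the figures from the two mock quotations above, assuming 200 individuals were assigned to each group; the helper name `response_rate` is made up for illustration.

```python
def response_rate(responded: int, assigned: int) -> float:
    """Share of randomly assigned individuals who have follow-up data."""
    if assigned <= 0:
        raise ValueError("assigned must be positive")
    return responded / assigned


# Figures from the mock quotations for questions 4 and 5
# (assumes 200 assigned per group, 400 in total).
program_rate = response_rate(22, 200)
control_rate = response_rate(198, 200)
overall_rate = response_rate(22 + 198, 400)

# Differential attrition: the gap in follow-up rates between groups.
differential = control_rate - program_rate

print(f"overall follow-up rate: {overall_rate:.0%}")
print(f"program follow-up rate: {program_rate:.0%}")
print(f"control follow-up rate: {control_rate:.0%}")
print(f"differential (points):  {differential * 100:.0f}")
```

An overall rate just over half can look tolerable on its own. The group-level rates show that nearly all of the missing data comes from one arm, which is the point of asking question 5 separately from question 4.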

6. If two groups are shown to be comparable at baseline, are they the same two groups for which the alleged impacts are being shown?

“Randomization was successful, with program group members (n=232) scoring on average at the 32nd percentile on the Dweazil-Zappa III 1st Grade Assessment and control group members (n=234) at the 33rd…the nationally normed Moon-Unit-Zappa assessment used in the 5th grade showed that program group members (n=107) scored at the 47th percentile and control group members (n=85) at the 35th, a dramatic gain for the program group.”
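A quick way to check question 6 is to compare the sample sizes reported at baseline with those reported for the outcome. This is a rough sketch using the n's from the mock quotation above; the dictionary names are illustrative only.

```python
# Sample sizes reported in the mock quotation for question 6.
baseline_n = {"program": 232, "control": 234}
outcome_n = {"program": 107, "control": 85}

# Share of each baseline group that is still present in the outcome analysis.
retention = {group: outcome_n[group] / baseline_n[group] for group in baseline_n}

for group, share in retention.items():
    print(f"{group}: {outcome_n[group]} of {baseline_n[group]} retained ({share:.1%})")
```

When fewer than half of each baseline group appears in the outcome comparison, and the two groups lose members at different rates, baseline equivalence says little about the groups actually being compared.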

7. Any weird confounds?

“Classroom observations for the program schools were conducted by Author A and classroom observations for the control group schools were conducted by Author B.”

8. Were these unusually popular or stable program sites?

“In spite of declining enrollment state-wide, officials were able to identify seventeen sites which they deemed were likely to maintain high enrollment in spite of the challenges of random assignment.”

9. Do the unadjusted postintervention means tell the same story as the complicated parametric impact model?

“Although the program group showed slight declines in outcomes when viewed naively, when properly adjusted using growth-curve analysis it is clear that the Watching Puppet Plays About Feelings intervention outscored the Reading to Your Kids intervention by a statistically significant amount, particularly among high-shyness personality subgroups.”

10. Do the impacts sound too large to be believable, or would the described study design have been prohibitively expensive had it actually been carried out?

“As few of you know, I was born Kal-El on the planet Krypton, before being rocketed to Earth as an infant. The yellow star Sol has given me a number of powers unfamiliar to the citizens of Earth, among them the ability to personally collect perfectly normally distributed data with a 96 percent response rate from thousands of respondents, without any outside grant funding.”
