The effect sizes reported by a randomized controlled trial of a social program can easily be overstated. Perhaps the recipients of a program feel grateful for the help, and so overstate their outcomes in survey responses (this may have happened in Opportunity NYC, where survey outcomes showed an increase in earnings for recipients while state administrative data showed null effects). Perhaps the people who actually take up the offer to participate are more motivated individuals, or people whose outcomes are about to bounce back from a temporary setback (the so-called Ashenfelter Dip). The programs or regions that participate in an evaluation might have volunteered to participate precisely because they are likely to show larger effects than others implementing the same intervention (see Allcott and Mullainathan's studies of the Opower energy-savings evaluations).
But it is also possible for an RCT to understate effects, or to indicate negative effects where the truth is closer to zero. The recent, much-discussed study of publicly provided pre-K in Tennessee is a potential example. The study showed some positive outcomes at the end of the intervention year, roughly null effects in early follow-ups, and negative effects in later follow-ups.
My own view is that the long-term effects of pre-K on cognitive outcomes are probably negligible, and that such outcomes will fade out pretty much monotonically once the intervention is completed. This is for the simple reason that kids grow up. They're changing all the time, in variable and individually-directed ways. Once they leave an educational setting, that individually-directed variability comes to dominate quite quickly.
But shouldn't that individually-directed variability (unobserved heterogeneity, in other words) cancel out, on average, in a randomized trial? In a perfectly executed one, perhaps, but the Tennessee Pre-K study had several methodological problems that suggest this was not the case. First, it was based on joiners: consent to the study happened after randomization. More problematically, the "negative effects" appear between the third and fourth waves of data collection, which is also when the study appears to be losing far more treatment-group sample members than control-group members.
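To see why differential attrition matters, here is a minimal simulation sketch (not the Tennessee study's actual data or design, and the dropout rates are invented for illustration). Both groups are drawn from the same outcome distribution, so the true effect is exactly zero; but if treatment-group children with better outcomes are more likely to leave the sample, the comparison of those who remain turns negative:

```python
import random

random.seed(0)

N = 5000
# True effect is zero: both groups share the same outcome distribution.
treatment = [random.gauss(0, 1) for _ in range(N)]
control = [random.gauss(0, 1) for _ in range(N)]

def mean(xs):
    return sum(xs) / len(xs)

def attrit(xs, p_drop_low, p_drop_high):
    """Keep each observation with outcome-dependent probability:
    above-average outcomes drop out at rate p_drop_high,
    below-average ones at rate p_drop_low."""
    return [x for x in xs
            if random.random() > (p_drop_high if x > 0 else p_drop_low)]

# Hypothetical attrition pattern: better-off treatment families are
# likelier to exit the study; control attrition is outcome-neutral.
treatment_obs = attrit(treatment, p_drop_low=0.05, p_drop_high=0.30)
control_obs = attrit(control, p_drop_low=0.05, p_drop_high=0.05)

effect = mean(treatment_obs) - mean(control_obs)
print(f"estimated 'effect': {effect:.3f}")  # clearly negative despite a true effect of zero
```

Under these made-up numbers the estimated "effect" comes out around -0.1 standard deviations, entirely an artifact of who stayed in the sample. The point generalizes: once attrition correlates with both treatment status and outcomes, randomization no longer guarantees comparable groups.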
There's no reason to believe that a pre-K program couldn't do harm instead of good; in fact, if we believe the Iron Law of Evaluation, we should see negative effects about as often as positive ones. But when negative effects grow larger and larger over time, after the kids are no longer exposed to the intervention, my money is on the randomization not having done its job.