A mother of a three-year-old, looking for childcare, calls her local Head Start office and asks if she can enroll her child. The program is oversubscribed, the person on the other end replies, but they are holding a lottery, and if she agrees to participate, her child could get in, either this year or the next. If she is selected by the lottery, her child can enroll this year; if not, she will have to wait another year, until her child is four. She and her child would also be followed: taking surveys and sharing test scores and other information with government researchers over a period of several (or many) years.
She agrees, they check her eligibility, and an hour or a day later they call her back. She got in or she didn’t; she can enroll her child or she has to wait a year; she will be in the treatment group or the control group of the study. They’ll follow up with her, her child, and her child’s teacher, over and over, and on average the difference between the treatment and control groups is the impact, the net effect, of a year in Head Start. This is about as clear a counterfactual as social science offers us.
But even this example (based on the Head Start Impact Study) is not as clear-cut as it might appear. She might have decided that the study sounded too invasive (maybe she is an illegal immigrant and is worried about her information ending up in the wrong government hands), or realized she didn’t want her child in Head Start after all; she could have declined the study, or decided even after “winning” the lottery that her child shouldn’t enroll. She could have died, or moved away. She could have such good connections with the local Head Start director that they let her child in even after “losing” the lottery. She could choose not to pick up the phone the following year when the survey firm the researchers hired calls again, and again, and again. Her child could be absent on the day the follow-up reading test is administered.
There are technical fixes and statistical patches that try to address non-response bias, cross-over, low take-up of the program, or differences in take-up rates between groups. In the end, however, the researchers can only reliably generate what is called an “intent to treat” estimate of the program’s effects: how much a child’s and parent’s outcomes change, on average, simply from being offered a spot in the program through the lottery. That is not the same as how much participating in the program actually changes a child’s outcomes, and still less how Head Start changes outcomes in other times and places, or how the country would change if there were no Head Start.
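The gap between the intent-to-treat estimate and the effect of actually attending can be made concrete with a toy simulation. Every number below is an invented assumption for illustration (it is not drawn from the Head Start Impact Study): suppose attending the program raises test scores by 10 points, 80% of lottery winners enroll, and 10% of lottery losers find their way in anyway. Comparing winners to losers then recovers only the diluted, offered-a-spot effect.

```python
import random

random.seed(0)
N = 100_000            # total families in the hypothetical lottery
TRUE_EFFECT = 10.0     # assumed test-score gain from actually attending
TAKE_UP = 0.8          # assumed share of lottery winners who enroll
CROSS_OVER = 0.1       # assumed share of lottery losers who enroll anyway

def outcome(attended):
    """A child's test score: a noisy baseline plus the gain if they attended."""
    base = random.gauss(50, 10)
    return base + (TRUE_EFFECT if attended else 0.0)

treat, control = [], []
for _ in range(N):
    if random.random() < 0.5:               # lottery winner: offered a spot
        attended = random.random() < TAKE_UP
        treat.append(outcome(attended))
    else:                                   # lottery loser: not offered a spot
        attended = random.random() < CROSS_OVER
        control.append(outcome(attended))

# The intent-to-treat estimate: mean difference between those offered
# a spot and those not offered one, regardless of who actually attended.
itt = sum(treat) / len(treat) - sum(control) / len(control)
print(f"intent-to-treat estimate: {itt:.2f}")
```

With these made-up parameters the intent-to-treat estimate converges on roughly TRUE_EFFECT × (TAKE_UP − CROSS_OVER) = 7 points, not the 10-point effect of attendance itself; the randomization is clean, but non-compliance on both sides dilutes what the comparison can measure.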
Angus Deaton, the most recent Nobel laureate in economics and co-author of the recent study on midlife mortality among white Americans, raises related issues in an essay, notorious among advocates of randomized evaluations, called “Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development”:
Finally, I want to return to the issue of “heterogeneity,” a running theme in this lecture. Heterogeneity of responses first appeared in Section 2 as a technical problem for instrumental variable estimation, dealt with in the literature by local average treatment estimators. Randomized controlled trials provide a method for estimating quantities of interest in the presence of heterogeneity, and can therefore be seen as another technical solution for the “heterogeneity problem.” They allow estimation of mean responses under extraordinarily weak conditions. But as soon as we deviate from ideal conditions, and try to correct the randomization for inevitable practical difficulties, heterogeneity again rears its head, biasing estimates, and making it difficult to interpret what we get. In the end, the technical fixes fail and compromise our attempts to learn from the data. What this should tell us is that the heterogeneity is not a technical problem, but a symptom of something deeper, which is the failure to specify causal models of the processes we are examining. This is the methodological message of this lecture, that technique is never a substitute for the business of doing economics.
There is an implicit problem in defining causality as the kind of thing you can learn from an experiment: any experiment becomes only a thing-in-itself, and not a clue to a broader order or understanding of the world. In education, there are a multiplicity of slogans posing as theories (“Using Our Understanding of How the Brain Works to Guide Instruction”) and a near infinitude of competing agendas and programs, but very little attempt in recent years to advance a coherent theory of how children develop and learn, in a way that acknowledges relevant facts instead of wishing them away. The federal government and private funders have recently put increased emphasis on RCTs as a way of separating ineffective programs from effective ones, but technique is only a tool, and never a substitute for the business of trying to understand the world.