Neoteny as Ashenfelter’s Dip

A couple weeks ago I argued for “Toad’s Corollary” as a general guideline for empirical analyses of social interventions:

If long-term impacts of a social intervention are larger than short-term impacts, in either the negative or positive direction, you’re probably/might be/could be missing something.

It occurred to me that there’s a familiar-among-economists version of this principle that might be worth thinking about. This is the so-called “Ashenfelter’s Dip,” named for Orley Ashenfelter and discussed more systematically by James Heckman in a 2000 article.

The idea is that people who end up in government programs often end up there because something bad happened to them. If they are in a job training or employment program in particular, it could be because they recently lost their job, or because there was a downturn in their region or sector of the economy that caused a training program to be opened up. Consequently, Ashenfelter observed that the earnings of people in these programs showed a “dip” for about a year before they entered the program. Something like this:
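To make the shape of the dip concrete, here’s a toy simulation (my own sketch, not Ashenfelter’s data; all magnitudes are made up): participants have stable baseline earnings, plus a transitory negative shock concentrated just before program entry.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500                    # hypothetical number of participants
years = np.arange(-3, 4)   # years relative to program entry (year 0)

# Stable individual baseline earnings plus year-to-year noise.
base = rng.normal(40_000, 5_000, size=n)
earnings = base[:, None] + rng.normal(0, 2_000, size=(n, len(years)))

# The "dip": a transitory negative shock concentrated in the year
# before entry, fading after the program starts. (Magnitudes made up.)
earnings += np.where(years == -1, -8_000, 0) + np.where(years == 0, -4_000, 0)

mean_path = earnings.mean(axis=0)
for y, e in zip(years, mean_path):
    print(f"year {y:+d}: mean earnings ~ {e:,.0f}")
```

Mean earnings sag in the year before entry and recover afterwards, even though nothing in this simulation does anything to help anyone.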


Why is this important? Well, because a lot of times, economists and other social scientists try to determine the effects of a program using a “difference-in-differences” or matching design.

For example, let’s say we’re trying to estimate whether a new program for unemployed people in California works to get them back on their feet and earning money again. Everyone in California who is unemployed gets the program, so we can’t randomly assign it. Instead, we’ll compare the California unemployed people to unemployed people in Nevada. Since Californians and Nevadans don’t have the same income, we look only at those Californians who can be matched to Nevadans with very similar earnings over the 12 months before the evaluation starts.


The problem here is that the Californians are different from the Nevadans: they’ll bounce back to a higher level regardless of what the program does. Moreover, they’ll keep getting more different the further we get from the program in time. We could call that a long-term result of the program, but it’s much more likely that the Californians are just recovering from the initial shock that caused the “Ashenfelter’s Dip” in the first place.

To make this a little more quantitative, we can imagine that both groups have some measure of human capital, for example SAT scores. Let’s say the Californians are in group 1 (and will be getting the program) and the Nevadans are in group 0 (they won’t be getting the program):
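A sketch of that setup (illustrative numbers of my own, not the post’s original code): each person gets an SAT-like human-capital score, and pre-shock earnings are partly random and partly driven by that score, with the Californians about $10,000 higher on average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Illustrative SAT-like human-capital scores: group 1 (California,
# will get the program) is drawn higher than group 0 (Nevada).
hc0 = rng.normal(1000, 100, n)   # Nevadans
hc1 = rng.normal(1100, 100, n)   # Californians

def pre_shock_earnings(hc):
    # Earnings are partly random, partly driven by human capital.
    return 100 * hc - 50_000 + rng.normal(0, 5_000, len(hc))

e0 = pre_shock_earnings(hc0)   # group mean comes out near $50,000
e1 = pre_shock_earnings(hc1)   # group mean comes out near $60,000

print(f"pre-shock gap: ~${e1.mean() - e0.mean():,.0f}")
```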


Before whatever bad thing happened to the Californians (group 1), they are making about $10,000 more on average than the Nevadans (group 0), with their earnings somewhat random but somewhat based on their human capital.


The two groups are matched on their earnings in Year 2, and the program occurs in Year 3. The program has zero impact, but everyone’s earnings regress to an individual mean determined by their human capital. Each group’s earnings wander around somewhat randomly over time, but the Californians’ gradually rise back toward their pre-bad-thing level. The negative shock to the Californians’ earnings (blue below) wears off over time, while the Nevadans’ earnings (red below) stay at their lower, stable level.
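Here is a self-contained toy version of that whole scenario (my sketch, not the post’s original code; the exact numbers it prints won’t match the table below, but the pattern will): long-run means set by human capital, a transitory shock that makes the Californians matchable in Year 2, and a Year 3 program with zero true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
years = np.arange(1, 11)

# Long-run mean earnings set by human capital: Californians (group 1)
# are about $10,000 above Nevadans (group 0). Illustrative numbers.
mu0 = rng.normal(50_000, 5_000, n)
mu1 = rng.normal(60_000, 5_000, n)

# Transitory shock to the Californians: worst at Year 2 (when the
# groups get matched on earnings), decaying geometrically afterwards.
# The program in Year 3 has ZERO true effect in this simulation.
shock1 = np.where(years >= 2, -10_000 * 0.7 ** (years - 2), 0.0)

nev = mu0[:, None] + rng.normal(0, 3_000, (n, len(years)))
cal = mu1[:, None] + rng.normal(0, 3_000, (n, len(years))) + shock1

diff = cal.mean(axis=0) - nev.mean(axis=0)
for y, d in zip(years, diff):
    print(f"Year {y:2d}: California-Nevada difference ~ {d:+9.0f}")

# The Year 2 difference is near zero (that is when we matched), and
# the gap then grows toward $10,000 as the shock wears off, despite
# the program doing nothing at all.
```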


Or, just looking at the differences between the two groups:

Year   Difference in Earnings (California - Nevada)
  3        -1108
  4          348
  6         2748
  8         2966
 10         4008

Our “estimated treatment effect” (which is really just the negative shock wearing off over time) keeps getting larger: naively, we might think that the program has larger long-term effects than short-term effects, but really this is a sign of the differences between the two groups emerging over time.

An obvious extension of this is to education programs: just because you can match on baseline test scores at the time the kids enter the program, doesn’t mean that they are “really” the same kind of kids. This is particularly true since many cognitive characteristics won’t be stable and reliably observable until later in adolescence or even adulthood.

But what if intelligence in general involves a kind of Ashenfelter’s Dip? That is, one of the ways humans differ from other animals is how very incompetent we are as children, and how much care and support we require over time. Our neoteny as a species (our carrying of fetal and infant characteristics long into development) appears to be intrinsically related to our capacity for general problem solving and creativity. This immediately suggests a problem with “matching” one group of humans to another at an early stage of development to determine the effects of later programs or experiences. Just because one group appears more precocious, on one measure or another, does not mean that they will end up more cognitively competent at a later date. Instead, they might simply be less neotenous and have less ground to make up as they grow up.




2 thoughts on “Neoteny as Ashenfelter’s Dip”

  1. I think your two main points are valid: (i) a naive calculation of the benefits of, say, an employment training program would likely overestimate its effects, and (ii) the exact same process could bias the findings of studies on interventions on children that aim at improving their prospects as adults.
    That said, let’s think about point (i) a bit more. A relatively weak proposal to avoid the error you mention would be to consider an extensive set of studies of the same (or similar) kind of intervention, with the same (or similar) kind of people. One could dig into the evolution of subjects’ earnings to establish empirical estimates of the size of Ashenfelter’s dip, and try to correct for that effect in future studies. The reason this is a relatively weak proposal is that appropriate studies, in terms of methodology and available raw data, may not exist. So I have two further ideas.
    1. Using a synthetic control group. “Nevadans with very similar earnings over the 12 months before the evaluation starts” may not be that good a control group; instead, you can use data on a wide array of groups to build a synthetic control by assigning weights to each of them according to their distance relative to Californians in terms of the variables you find relevant. This doesn’t *completely* solve the problem, though, because you still have to go through variable selection and defining appropriate generalized distance measures.
    2. An even better idea, if the government of California were really interested in continuous improvement of its policies, would be to drop your fundamental premise that “Everyone in California who is unemployed gets the program, so we can’t randomly assign it”. Synthetic control groups can be very useful, but quasi-experimental techniques are not meant to be the gold standard of research anyway. Granted, RCTs may not solve *all* our problems: if you were to use a lottery to decide who gets the job training program, you could only do that among those who had applied to the program, and those people would probably be different from those who hadn’t applied. In other words, external validity will always be an issue. But I’m pretty sure randomizing the treatment is still the best option by far in the case of such government programs, and the ethical and political arguments against it are usually weak, and valid only in very specific settings.
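The synthetic-control idea in point 1 can be sketched in a few lines (a crude toy of my own with made-up data: a proper synthetic control solves a constrained quadratic program for the weights, whereas here I just do unconstrained least squares, clip negatives, and renormalize as a rough stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pre-program data: 24 months of mean earnings for the
# treated group (California) and a donor pool of 5 comparison groups.
months = 24
treated = 50_000 + rng.normal(0, 2_000, months)
donors = 48_000 + rng.normal(0, 2_000, (months, 5))

# Fit weights so the weighted donor average tracks the treated unit's
# pre-period path; then clip to nonnegative and renormalize so the
# weights lie on the simplex. (A real implementation would solve the
# constrained problem directly instead of clipping afterwards.)
w, *_ = np.linalg.lstsq(donors, treated, rcond=None)
w = np.clip(w, 0, None)
w = w / w.sum()

synthetic = donors @ w   # the synthetic control's earnings path
```

The post-program gap between `treated` and `synthetic` would then serve as the effect estimate, with the usual caveats about variable selection the commenter raises.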

