Boosting the R-Squared

Human beings and human societies are complicated, and there are a lot of things that could influence how people and events turn out. Hotter days in the summer have more assaults and murders than cooler days, even controlling for the day of the year and the city or neighborhood. The Nazis had trouble invading the USSR in part because their trains ran on a different rail gauge than the Soviets’. New York overtook Philadelphia as the premier American city in part because the completion of the Erie Canal allowed for shipping up the Hudson to travel all the way to the Great Lakes. If Charles Darwin hadn’t dropped out of medical school, he never would have ended up as naturalist on the Beagle or collecting finches in the Galapagos.

There is a temptation therefore to view the world as a series of contingent events, random occurrences that build on one another, butterflies flapping their wings to cause hurricanes, history and chance. This temptation is intensified by the pervasive desire not to understand the world but to change it: to engineer each small change and build one upon another to generate a transformed world.

But the world isn’t that random, or rather the random events, all those pluses and minuses, add up to just around zero. The Iron Law is the Iron Law, and the patterns of social behavior and difference are fairly stable. Haiti isn’t Switzerland, Mumbai isn’t LA.

Sociologists get a lot of well-earned derision (there is a reason the Real Peer Review Twitter account finds so many sociology articles to mock), but there is one way in which sociologists, the good ones anyway, tend to outperform their clever economist cousins: because they are trained to view the world as the result of entrenched systems (of power) and institutionalized structures (of oppression), they tend to acknowledge persistent patterns rather than focus on the temporary or marginal deviations from those patterns. The outcomes of children in one-parent versus two-parent families, the geography of segregation in all but the most dynamic cities, the racial hierarchy of school achievement, and racial disparities in longevity and health don’t change much from year to year, and whatever cause you attribute these differences to, recognizing the patterns is a useful first step.

A recent kerfuffle in economics illustrates the value of admitting the big-picture patterns before homing in on the small deviations (which are, in theory, more “policy relevant”). A paper accepted at the prestigious American Economic Review was sniffed out by some anonymous commenters at Economics Job Market Rumors as being suspiciously similar in methods and data sources to an obscure paper published years before in the Journal of Biosomatic Medicine. Two hundred pages of comments later, George Borjas, the well-known Harvard economist of immigration, came out on the side of the anonymous commenters and against the failures in peer review that allowed the study to be published. Brett Matsumoto, a young economist at the Bureau of Labor Statistics, made a more careful methodological critique and showed how the authors’ methods could be used to make an entirely spurious effect appear plausible.

The paper itself is part of the growing literature of “birth timing” studies that use seemingly random variation in the date of conception or birth to make inferences about the effects of intrauterine environment, prenatal nutrition, or maternal health care. In this case, the authors claim that experiencing the death of a relative during pregnancy leads to pre-term birth, lower birthweight, and worsened long-term mental health. I’ve written before about why I think this literature is mostly bunk: those seemingly random variations in birth date often aren’t random at all.

Were the same paper published using American data, there would be an additional obvious critique: the authors leave out race as a potential confounder. Race of mother is strongly correlated both with the outcomes (pre-term birth and low birthweight) and with family mortality (which determines treatment status), so the omitted variable bias could substantially affect the impact estimates. The paper uses Swedish data and includes some estimates that exclude foreign-born mothers (though not Swedish-born mothers of non-Swedish ancestry), so the bias may be smaller than it would be for an American sample with more heterogeneous ancestry.
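To make the omitted-variable worry concrete, here is a minimal simulation sketch. Nothing in it comes from the paper: the group share, exposure rates, birthweight gap, and noise level are all made-up assumptions, and the true effect of the "treatment" is set to exactly zero. A hypothetical binary group variable raises the chance of treatment and lowers the outcome, so a naive treated-versus-untreated comparison picks up the group gap, while comparing within groups does not.

```python
import random
from statistics import mean

def simulate(n=200_000, seed=0):
    """Return (group, treated, outcome) rows. All parameters are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = 1 if rng.random() < 0.3 else 0                    # hypothetical confounder
        t = 1 if rng.random() < (0.15 if g else 0.05) else 0  # exposure is likelier when g == 1
        y = 3400 - 200 * g + rng.gauss(0, 500)                # outcome depends on g, NOT on t
        rows.append((g, t, y))
    return rows

def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def adjusted_diff(rows):
    """Average the within-group differences, weighted by group size."""
    total = len(rows)
    estimate = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        estimate += len(stratum) / total * diff_in_means(stratum)
    return estimate

rows = simulate()
naive = diff_in_means(rows)      # leaves the confounder out
adjusted = adjusted_diff(rows)   # holds the confounder fixed
print(f"naive estimate:    {naive:8.1f}")
print(f"adjusted estimate: {adjusted:8.1f}")
```

Under these assumptions the naive estimate comes out clearly negative even though the treatment does nothing, and the stratified estimate sits near zero. The size of the gap depends entirely on the invented parameters; the direction of the problem is the point.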

But the point remains: where very little of the underlying variation is explained and assignment to the treatment is not truly random, almost any identification strategy will be a matter of debate, and arguably as likely to produce spurious effects as reliable estimates. Needless to say, the large majority of variation in birthweight and propensity to pre-term birth has nothing to do with whether a relative died during pregnancy versus afterwards.
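A rough sketch of how those two conditions can combine, again with entirely invented numbers: the treatment below has no effect at all and explains well under one percent of the variation in the outcome, yet because exposure is slightly non-random and the sample is large, a simple regression of outcome on treatment reports a slope many standard errors away from zero.

```python
import math
import random
from statistics import mean

rng = random.Random(1)
n = 200_000
t_vals, y_vals = [], []
for _ in range(n):
    g = 1 if rng.random() < 0.3 else 0                    # hypothetical unobserved group
    t = 1 if rng.random() < (0.15 if g else 0.05) else 0  # exposure is not truly random
    y = 3400 - 200 * g + rng.gauss(0, 500)                # true treatment effect is zero
    t_vals.append(t)
    y_vals.append(y)

# Simple one-regressor OLS of y on t, computed by hand.
t_bar, y_bar = mean(t_vals), mean(y_vals)
cov_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(t_vals, y_vals)) / n
var_t = sum((t - t_bar) ** 2 for t in t_vals) / n
var_y = sum((y - y_bar) ** 2 for y in y_vals) / n

slope = cov_ty / var_t
r_squared = cov_ty ** 2 / (var_t * var_y)   # share of outcome variance "explained" by t
std_err = math.sqrt(var_y * (1 - r_squared) / ((n - 2) * var_t))
t_stat = slope / std_err

print(f"slope = {slope:.1f}, R-squared = {r_squared:.4f}, t-statistic = {t_stat:.1f}")
```

A tiny R-squared does not by itself make an estimate wrong, but here it shows how little of the outcome the treatment accounts for, and how a modest amount of unmodeled sorting is enough to manufacture a "significant" effect.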

The appeal of papers that focus on unexpected events, however, is the implicit claim that, because these events are contingent, they are more “policy-relevant”: they show that the world can be changed, the wounded land healed. But this is backwards.

When my wife was six months pregnant with our first child, we were finishing up the seemingly endless round of mid-year parent-teacher conferences at 8:30 one night (the longest day of the teacher’s year) when she started having shooting pain in her back. We left the school and went to get some pizza, and the pain continued. She called her doctor: was it all right to take Advil at this point? Her doctor asked her to come down to the hospital, just to get checked out. We got there around 10; her doctor examined her and told us she was in early labor and in danger of delivering the baby three months early. They admitted her to the hospital, put her on intravenous terbutaline, and around 3 in the morning they sent us home and put her on bed rest. The baby came three months later, a week late and over nine pounds. A victory for modern medicine.

Did our doctor know that my wife was, by virtue of her race, much more likely to go into pre-term labor when she told her to come down to the hospital? I’d certainly hope so. There’s no reason why the big patterns in social life can’t help us act. We can’t erase them with a well-chosen statistical wand; they will outlast me and perhaps the American Economic Review. But admitting these patterns, documenting them and seeking to understand them, rather than scoping out every potential butterfly wing that might someday cause a hurricane, is probably our best chance at a better world.

6 thoughts on “Boosting the R-Squared”

  1. “New York overtook Philadelphia as the premier American city in part because the completion of the Erie Canal allowed for shipping up the Hudson to travel all the way to the Great Lakes.”

    -New York City had overtaken Philadelphia by 1790, long before the Erie Canal.


    1. It’s a bit ambiguous: Philadelphia wasn’t defined by its present-day borders until the mid-19th century, while Manhattan was Manhattan. Suffice it to say the Erie Canal helped New York consolidate its economic dominance.
