Let’s say we take 500 bright-eyed 6-year-olds, just beginning the 1st grade, and give them a test of reading ability. Their scores are normally distributed, from those who still don’t know their letters to those who are already reading chapter books independently.

Then we find those same 500 kids, 12 years later, and give them another test of reading ability. Their scores are still normally distributed, but most likely over a wider range: some kids are reading Joyce and Tolstoy and physics textbooks at age 18, some kids are still barely literate or essentially not literate at all.

We can look at each kid’s score over time: did they go up or down, relative to other kids their age, from age 6 to age 18?

We might be tempted to attribute their change in scores to their experiences in school, whether they had involved parents who encouraged them to learn, or lived in a supportive neighborhood.

In fact, maybe we could go find the kids who scored a standard deviation higher as 18-year-olds than as 6-year-olds: those kids must have had great teachers, so give them a raise!

And then we could find the kids who scored a standard deviation lower as 18-year-olds than as 6-year-olds: those kids must have had lousy teachers, so better send ’em packing!

This is the basic intuition behind value-added modeling, although VAM is done over a shorter time period (usually from one school year to the next) and with more caveats and controls.

In fact, the basic intuition behind many quasi-experimental social science models is that, “if you can go back far enough,” you can control for the influence of genes, and that everything going forward can be attributed to environment, or policy, or parenting, or the quality of public services.

But genetics is a process, not an endowment. And the way it contributes to how much you know is dynamic, operating through preferences and habits as well as through abilities. For example, kids who have a genetically higher propensity to read well at age 10 also have a largely heritable higher propensity to enjoy reading independently. Particularly in rich societies, kids seek out environments that suit their cognitive abilities and (largely genetically determined) interests.

As a result, perhaps the single most well-replicated finding in behavioral genetics is the linearly increasing heritability of cognitive abilities from infancy (20%) through adulthood (60%).

So, let’s take that finding seriously and go back to our group of 500 kids we were observing. What would the same group of kids’ scores look like if 35% of their scores at age 6 were due to some unobserved genetic factor, and that factor increased in influence linearly to age 18?

Oh. Kinda familiar.

And what would it look like if we were able to single out those with high values of that unobserved factor (call them “lucky genes”), 1 or more standard deviations above the mean, as well as low values (call them “unlucky”), 1 or more standard deviations below the mean?

Double oh.
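The thought experiment above can be run as a quick simulation. Here is a minimal sketch in Python, assuming heritability rises from 35% at age 6 to 60% at age 18 (the 60% endpoint is my assumption, taken from the adulthood figure cited earlier) and that environmental influences at each age are drawn fresh:

```python
import random

random.seed(0)

N = 500
H2_6, H2_18 = 0.35, 0.60  # heritability of scores at ages 6 and 18 (endpoint assumed)

def score(g, h2):
    # Score = weighted sum of the fixed genetic factor and fresh environmental
    # noise, so the expected variance is 1 at every age.
    return (h2 ** 0.5) * g + ((1 - h2) ** 0.5) * random.gauss(0, 1)

genes = [random.gauss(0, 1) for _ in range(N)]  # unobserved "lucky/unlucky" factor
score6 = [score(g, H2_6) for g in genes]
score18 = [score(g, H2_18) for g in genes]
change = [s18 - s6 for s6, s18 in zip(score6, score18)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

# Because genes matter more at 18 than at 6, score *changes* correlate with genes
corr_change_genes = pearson(change, genes)

# Kids whose standing rose or fell by at least 1 SD of the change distribution
mean_change = sum(change) / N
sd_change = (sum((c - mean_change) ** 2 for c in change) / N) ** 0.5
risers = [g for g, c in zip(genes, change) if c >= mean_change + sd_change]
fallers = [g for g, c in zip(genes, change) if c <= mean_change - sd_change]
mean_g_risers = sum(risers) / len(risers)
mean_g_fallers = sum(fallers) / len(fallers)
```

Even though this model contains no teacher effects at all, the "improvers" carry systematically luckier draws of the genetic factor than the "decliners," which is the point of the post.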

Some responses:

1. Reading ability is (I would think) measured on an ordinal scale, not a cardinal one. So while it would make sense to say that the variance in height increases with age (measured in inches), I don’t quite understand what it means to say the variance in reading scores increases. Similarly, I don’t know what it means to say the scores are normally distributed: surely this depends on the choice of questions, and maybe on a post-test score normalization (like they do on the SAT).

2. The pattern in heritability could be all or partly a measurement issue, i.e. it’s hard to measure cognitive abilities for young children.

3. Most importantly: I don’t understand your criticisms of VAM. The validity of VAM (or any measurement of treatment effects) depends on random assignment to treatment. Random assignment is what balances out the unobserved factors. An attack on VAM can’t just point out the existence of unobserved factors; it has to show that there is systematic non-random assignment to teachers.

(BTW, glad you’re blogging again.)


1. We would normally normalize scores by age-specific levels, but if you gave a test with sufficient range to capture 6-year-old through 18-year-old ability, the variance would certainly expand, since 18-year-olds have to show a wider range (and more nested skills): you need to recognize letters to decode words, decode words to understand sentences, understand sentences to summarize paragraphs, and so on, up to comparing multiple passages and synthesizing across multiple previously read texts (the way you’d have to in an AP test given to HS seniors, for example).

It’s not critical for this argument, but I think it gets lost in discussions of why, for example, the gaps in 12th grade NAEP scores are so much larger than gaps in 4th grade scores, though there are other reasons as well. And visually, it helps convey the other points more clearly.

2. I think you have pretty reliable and valid measures by age 8 or so. Obviously a problem for infancy. But in terms of between-group tests, the gaps are smaller at earlier ages, which would not be the case (i.e., would average out) if it were simply a test-retest reliability/measurement issue.

3. In the other (linked) VAM post I discuss random assignment some, and make a more complete version of this argument: you’re right that, conditional on random assignment, unobserved differences just increase the noisiness of VAM without introducing bias. But most VAM is observational; even if you randomly assign kids to kindergarten teachers, calculate VAM from their scores at the end of the year, and then correlate this with earnings at age 26, à la Chetty, you’re still introducing the increasing influence of genetics over the course of the year. What you want to do is find high-VAM teachers and only then randomly assign the kids to a high-VAM group and a low-VAM group. Studies that do this, like Transferring Talented Teachers, show smaller effects.


https://spottedtoad.wordpress.com/2016/02/03/value-added-modeling-and-behavioral-genetics/


On number 1: I still don’t know what it means to say the variance expands over time. The tests for 6-year-olds and 18-year-olds are (I assume) totally different. They aren’t measured in consistent units.

If you want to look at test score “gaps,” the gap is commonly measured relative to the within-grade variability on that particular test. Comparing a 12th-grade gap to a 4th-grade gap just means comparing the differences in means for a grade to the standard deviations for that grade.

Correct me if I’m wrong, but I still don’t understand how you can compare variability over tests that are not measured in consistent units.
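The within-grade-SD convention mentioned above can be made concrete with a toy calculation (all numbers here are made up for illustration, not actual test values): the same raw-score gap reads as a much smaller standardized gap when the grade-specific SD is larger.

```python
def standardized_gap(mean_a, mean_b, sd_within):
    # Gap between two group means, expressed in units of within-grade SD,
    # which makes it comparable across tests with different raw scales.
    return (mean_a - mean_b) / sd_within

# Identical 12-point raw gaps, but different grade-specific SDs (hypothetical)
gap_grade4 = standardized_gap(220, 208, 15)   # 12 / 15 = 0.8 SD
gap_grade12 = standardized_gap(300, 288, 30)  # 12 / 30 = 0.4 SD
```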


I’ll concede that it was probably distracting to put it that way, when I really just wanted to visually highlight increases versus decreases (though if everything were on the same scale, obviously the right side of the graph would be shifted way up). You’re right that gaps will be expressed in terms specific to that grade level and test, but I do think it’s useful to say that there is just more stuff for kids to know or not know as they get older.


NWEA’s Measures of Academic Progress is a computer-adaptive assessment given across a wide age range (so it should tell us something about whether the variance increases with age): https://www.nwea.org/content/uploads/2015/06/2015-MAP-Normative-Data-AUG15.pdf It looks like end-of-year standard deviations are about 40% higher for reading and 55% higher for math at grade 11 than in kindergarten, but SDs for “Language Usage” (which I guess is writing) and General Science are fairly constant over time.

It’s interesting that, contrary to the idea that summer vacation is at the root of (some) American educational inequality, variance in the chart above increases over the course of the year. This could be because differences in school quality compound over the course of the year, and then wear off during summer break, or because similarity in inputs increases the effect of heritable variation, I guess.
