Value Added Modeling and Behavioral Genetics

The “Goya Beans and Wildflowers” story I posted last week (true, by the way) is my way of explaining the following ideas:

Idea #1) If we take behavioral genetics results even somewhat seriously, then VAM must have limited validity as a measure of teacher quality.

A) Behavioral genetics results indicate that the influence of genetics on cognitive skills *increases* as kids age. This is perhaps the most consistent single result in decades of behavioral genetics studies. As one review of robustly replicated findings puts it:

In the context of current concerns about replication in psychological science, we describe 10 findings from behavioral genetic research that have replicated robustly. These are “big” findings, both in terms of effect size and potential impact on psychological science, such as linearly increasing heritability of intelligence from infancy (20%) through adulthood (60%).

B) The particular ages at which VAM is primarily used (8-13, and soon to be 8-17) are precisely the ages at which scientists observe a big increase in the percentage of variance in cognitive ability attributable to genetics rather than environment (see Figure 1 of the linked source).

[Figure: change in heritability]

C) VAM assumes the reverse: that you can “cancel out” student-specific prior factors by controlling for prior achievement (the pretest below) and observed student characteristics (eligibility for free lunch, ELL status, race, gender):

[Figure: value-added testing graphic]

D) For this reason, existing VAM models mostly account for the fact that different groups, on average, are likely to learn more or less in a given year (though they still often combine Asian and white students). But the within-group variance in how much the kids in a classroom learn is assumed, by the nature of the model, to be a direct measure of how effective the kids’ teacher was that year.
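To make the logic in C and D concrete, here is a minimal Python sketch, not any district’s actual model: it predicts each student’s posttest from the pretest alone, then treats the average residual of a teacher’s students as that teacher’s value added. Everything in it is an assumption for illustration: the function names (`ols_fit`, `pearson`, `simulate_vam`), the single covariate, random assignment of students, and the made-up effect sizes, including a small true teacher effect and a larger student-specific growth rate that follows the kid regardless of classroom.

```python
import random
import statistics


def ols_fit(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_vam(n_teachers=200, class_size=25, teacher_sd=0.05,
                 student_growth_sd=0.3, noise_sd=0.3, seed=0):
    """Simulate one year of data and return (true_effects, estimated_va).

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, teacher_sd) for _ in range(n_teachers)]
    teacher, pre, post = [], [], []
    for t in range(n_teachers):
        for _ in range(class_size):
            p = rng.gauss(0, 1)                    # prior achievement (pretest)
            g = rng.gauss(0, student_growth_sd)    # student-specific growth, unrelated to teacher
            teacher.append(t)
            pre.append(p)
            post.append(p + true_effects[t] + g + rng.gauss(0, noise_sd))
    # Step 1: predict the posttest from the pretest, pooling all students.
    a, b = ols_fit(pre, post)
    residuals = [y - (a + b * x) for x, y in zip(pre, post)]
    # Step 2: a teacher's "value added" is the mean residual of her students.
    sums = [0.0] * n_teachers
    for t, r in zip(teacher, residuals):
        sums[t] += r
    estimated_va = [s / class_size for s in sums]
    return true_effects, estimated_va


true_effects, estimated_va = simulate_vam()
print(f"corr(true effect, estimated VA) = {pearson(true_effects, estimated_va):.2f}")
```

Under these hypothetical numbers, estimated VA correlates only moderately with the true teacher effect, because the class average of 25 students’ own growth rates is still large relative to a small teacher effect. Real VAM specifications add more covariates and often shrink estimates toward the mean, which this sketch omits.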


E) Insofar as anyone has looked at the heritability of individual students’ value-added, it appears to be just as heritable as baseline achievement, at around or above 50 percent. This holds in relatively homogeneous populations too, so it isn’t just an artifact of discrimination against particular groups.
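For readers unfamiliar with how a figure like “around 50 percent heritable” is produced: twin studies compare how similar identical (MZ) and fraternal (DZ) twins are. The studies referred to above may well fit fuller ACE structural-equation models, so treat this as the textbook back-of-the-envelope version (Falconer’s formulas); the function name and the twin correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Back-of-the-envelope ACE split from twin correlations (Falconer's formulas).

    A (additive genetic)                            = 2 * (r_MZ - r_DZ)
    C (shared environment)                          = 2 * r_DZ - r_MZ
    E (non-shared environment + measurement error)  = 1 - r_MZ
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("twin correlations must lie in [-1, 1]")
    return {
        "A": 2.0 * (r_mz - r_dz),
        "C": 2.0 * r_dz - r_mz,
        "E": 1.0 - r_mz,
    }


# Hypothetical twin correlations for, e.g., yearly achievement gains.
shares = falconer_ace(r_mz=0.7, r_dz=0.45)
print({k: round(v, 2) for k, v in shares.items()})  # {'A': 0.5, 'C': 0.2, 'E': 0.3}
```

The three shares add to one by construction; the formulas rest on simplifying assumptions, among them that MZ and DZ pairs share their environments to the same degree.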

Idea #2) Some of the weird stuff people observe about VAM could be explained by assuming Idea #1 is true.

[Figure: validity and reliability]

A) VAM doesn’t have very good within-teacher reliability. For example, the correlation between NYC teachers’ VAM in one year and the same teachers’ VAM for the same subject the following year is only 0.35.

Teacher VAM in year t is just not a good predictor of teacher VAM in year t+1. Every time someone looks at this, they find fairly low correlations.
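If one assumes, as a simplification, that a teacher’s yearly VAM is a stable component plus independent year-specific noise, the year-to-year correlation estimates the reliability of a single year’s score, and the Spearman-Brown formula gives the reliability of an average over several years. A minimal sketch plugging in the 0.35 NYC figure; the function name is hypothetical.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of an average of k parallel measurements, each with reliability r_single."""
    if k < 1:
        raise ValueError("k must be a positive number of years")
    return k * r_single / (1 + (k - 1) * r_single)


for years in (1, 2, 3, 5):
    print(years, round(spearman_brown(0.35, years), 2))
# 1 0.35
# 2 0.52
# 3 0.62
# 5 0.73
```

Reliability is not validity: averaging raises the share of variance that is stable across years, but under the argument here, part of that stable share could be the kind of students a teacher is repeatedly assigned rather than the teacher’s own effect.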

B) In spite of the Gates Foundation investing hundreds of millions of dollars in encouraging the adoption of teacher evaluation systems and in attempting to demonstrate correlations among different measures of teacher quality (including VAM), its own data do not validate this. In settings in which students are randomized, VAM is only very weakly correlated with teacher quality as measured by multiple observers watching multiple lessons with a structured rubric: a teacher would have to move from the 4th to the 96th percentile of observer-rated quality to show a 0.06 SD increase in student achievement, equivalent to moving kids from the 50th to the 52nd percentile in achievement. Basically, going from the worst to the best teachers as measured by observers gives you a measurable but quite small increase in measured student achievement.
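The percentile arithmetic in B can be checked with Python’s standard-library `statistics.NormalDist`, assuming (as the percentile conversion implicitly does) that achievement is normally distributed, and further assuming that observer ratings are too:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# A student starting at the median (z = 0) who gains 0.06 SD.
new_percentile = std_normal.cdf(0.06) * 100
print(round(new_percentile, 1))  # 52.4

# Width, in SD units, of the jump from the 4th to the 96th percentile of observer ratings.
spread = std_normal.inv_cdf(0.96) - std_normal.inv_cdf(0.04)
print(round(spread, 2))  # 3.5

# Implied achievement gain per 1 SD of observer-rated quality, if the relation is linear.
print(round(0.06 / spread, 3))  # 0.017
```

So, under these assumptions, each standard deviation of observer-rated quality corresponds to roughly 0.017 SD of student achievement.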


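C) As Jesse Rothstein has repeatedly found, 5th-grade VAM predicts 4th-grade VAM for the same kids in almost every dataset examined. Since a 5th-grade teacher cannot cause gains the kids made the year before, this suggests the measure is picking up something about the students rather than the teacher.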
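A toy simulation shows how this can happen with no teacher effects at all. A minimal sketch under loudly hypothetical assumptions: every student has a persistent growth rate that shows up in both years, and schools place kids into 5th-grade classes partly on their 4th-grade progress. Rothstein’s actual test regresses past gains on future-teacher indicators; the single class-level correlation here is a simplified stand-in, and the function and parameter names are made up.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def falsification_check(n_classes=80, class_size=25, growth_sd=0.3,
                        noise_sd=0.3, tracking_noise_sd=0.5, seed=0):
    """Correlate 5th-grade classroom 'VA' with the same kids' 4th-grade gains.

    There are NO teacher effects in this simulation; all parameters are hypothetical.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n_classes * class_size):
        g = rng.gauss(0, growth_sd)            # persistent, student-specific growth rate
        gain4 = g + rng.gauss(0, noise_sd)     # 4th-grade gain
        gain5 = g + rng.gauss(0, noise_sd)     # 5th-grade gain
        # Assumed tracking rule: 5th-grade placement depends partly on 4th-grade progress.
        placement = gain4 + rng.gauss(0, tracking_noise_sd)
        students.append((placement, gain4, gain5))
    students.sort(key=lambda s: s[0])
    va5, prior_gain4 = [], []
    for c in range(n_classes):
        cls = students[c * class_size:(c + 1) * class_size]
        va5.append(statistics.fmean(s[2] for s in cls))          # 5th-grade "teacher VA"
        prior_gain4.append(statistics.fmean(s[1] for s in cls))  # same kids, previous year
    return pearson(va5, prior_gain4)


print(f"corr(5th-grade VA, same kids' 4th-grade gains) = {falsification_check():.2f}")
```

Because the sketch contains no teacher effects, any relationship between 5th-grade “VA” and 4th-grade gains comes entirely from the students.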


D) Impacts associated with observational VAM scores don’t fade out as fast as actual experimental impacts. Yes, I know people use this to say that teacher impacts are just super-duper important and class size or whatever isn’t. But the more reasonable interpretation is that experimental impacts fade out because that’s what impacts do. Why is VAM the only program “impact” that doesn’t show fade-out? Because it’s not an impact; it is measuring an underlying characteristic of the kids. Show me the component of VAM that does fade out, and you’ll be on your way to tracking down true teacher impacts.
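One way to see the distinction, as a sketch under made-up parameters: each simulated child has a persistent growth rate plus a mean-reverting deviation. A randomized program adds a one-time boost to the deviation; separately, children are grouped by how much they happened to gain in year 1, which is roughly what a high-VAM classroom is under the reading above. The function name, the score model, and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_fadeout(n=20000, years=5, boost=0.2, rho=0.5,
                     slope_sd=0.1, shock_sd=0.2, seed=0):
    """Return (experimental_gap, selected_gap), one entry per year.

    Score model (hypothetical): score_t = slope * t + dev_t,
    with dev_t = rho * dev_{t-1} + shock_t (mean-reverting when rho < 1).
    """
    rng = random.Random(seed)
    treated_paths, control_paths = [], []
    for i in range(n):
        treated = i % 2 == 0                          # alternate assignment; students are i.i.d.
        slope = rng.gauss(0, slope_sd)                # persistent, student-specific growth rate
        dev, path = 0.0, []
        for t in range(1, years + 1):
            dev = rho * dev + rng.gauss(0, shock_sd)  # transitory, mean-reverting deviation
            if treated and t == 1:
                dev += boost                          # one-time program impact in year 1
            path.append(slope * t + dev)
        (treated_paths if treated else control_paths).append(path)

    def mean_at(paths, t):
        return statistics.fmean(p[t] for p in paths)

    experimental_gap = [mean_at(treated_paths, t) - mean_at(control_paths, t)
                        for t in range(years)]
    # "High-VAM-like" group: control kids who gained the most in year 1 (baseline is 0).
    ranked = sorted(control_paths, key=lambda p: p[0])
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[half:]
    selected_gap = [mean_at(high, t) - mean_at(low, t) for t in range(years)]
    return experimental_gap, selected_gap


exp_gap, sel_gap = simulate_fadeout()
print("experimental gap by year:", [round(x, 3) for x in exp_gap])
print("selected-group gap by year:", [round(x, 3) for x in sel_gap])
```

Under these assumptions the experimental gap decays roughly geometrically (in expectation 0.2, 0.1, 0.05, and so on), while the gap for the group selected on year-1 gains does not fade out, because part of what was selected is the kids’ own growth rate.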

[Figure: fade-out curve]


So what gives?

Over time, people converge to their natural ability, and VAM is measuring a piece of that process of convergence. (This is especially true if you make the tests focus on abstract reasoning ability rather than school-specific knowledge.) So if, by the luck of the draw, you get a bunch of kids who are going to converge upward, then your VAM is high, and those kids go on to earn a bunch more as adults.

I think this is the whole difference between viewing the time series of human development as a random walk buffeted by environmental shocks (a unit-root process, in which every shock permanently shifts the path) and viewing it as mean-reverting toward each child’s own underlying trajectory. In any biological system, there is going to be significant variation in the future growth course, even under the same environmental conditions, as long as there is significant genetic variation.
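One way to state the contrast formally (notation ours, not from any cited source): let $y_{i,t}$ be child $i$’s measured ability at time $t$, $\varepsilon_{i,t}$ an environmental shock, and $\mu_{i,t}$ a child-specific underlying trajectory. The question is how much of a shock at time $t$ is still present $k$ periods later:

$$
\text{Random walk (unit root):}\quad y_{i,t} = y_{i,t-1} + \varepsilon_{i,t}
\;\Longrightarrow\; \frac{\partial y_{i,t+k}}{\partial \varepsilon_{i,t}} = 1 \quad \text{for all } k
$$

$$
\text{Mean reversion:}\quad y_{i,t} - \mu_{i,t} = \phi\,(y_{i,t-1} - \mu_{i,t-1}) + \varepsilon_{i,t},\quad |\phi| < 1
\;\Longrightarrow\; \frac{\partial y_{i,t+k}}{\partial \varepsilon_{i,t}} = \phi^{k} \to 0
$$

In the second case, environmental shocks (including a good or bad teacher) fade, while differences in $\mu_{i,t}$ persist; the random walk is the special case $\phi = 1$.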

[Figure: unit root]

 

There is a very small number of studies that potentially avoid these issues and show teacher effects and some validity of VAM. For example, the Transferring Talented Teachers study identified teachers who had high VAM in one school and paid them to move to another school, where students were randomly assigned either to their class or to another class. The study found positive effects in the elementary schools to which the high-VAM teachers were transferred.

Note, however, that:
A) The teachers’ VA shrank considerably after the teachers were transferred to a new group of kids. It didn’t shrink to zero, but it did shrink.
B) The study shows only that teachers transferred from another school to a lower-performing school (one with trouble staffing its classrooms) are better than the teachers who remain in those troubled schools, which is not that high a bar to pass.

C) This same study showed positive impacts only for elementary teachers: there were zero or slightly negative impacts for middle school teachers in their new schools.

[Figure: middle school impacts]

 

D) Other well-executed RCTs that would shed light on the validity of VAM generally find zero impacts. For example, you can search the studies in “Teacher and Leader Effectiveness” that meet the federal government’s standards for attrition and follow-up. There is a single study with “potentially promising” effects: a high-attrition RCT with an unusual pattern of zero initial impacts but positive long-term impacts. All the studies that met federal standards show zero impacts.
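Point A, the shrinkage of the transferred teachers’ VA, is what regression to the mean predicts whenever teachers are selected on a measure that is only partly stable. A minimal sketch, assuming year-1 and year-2 VA share a stable teacher component plus independent year-specific variation, with reliability set (hypothetically) to the 0.35 year-to-year correlation cited for NYC; the function name and the top-20-percent selection rule are made up.

```python
import random
import statistics


def selected_shrinkage(n_teachers=100_000, reliability=0.35, top_share=0.2, seed=0):
    """Select teachers on year-1 VA, then re-measure them with new students.

    Assumed model: VA = stable component + independent variation each year, scaled so
    that var(VA) = 1 and the year-to-year correlation equals `reliability`.
    Returns (mean year-1 VA of selected, mean year-2 VA of selected, ratio).
    """
    rng = random.Random(seed)
    stable_sd = reliability ** 0.5
    noise_sd = (1 - reliability) ** 0.5
    teachers = []
    for _ in range(n_teachers):
        stable = rng.gauss(0, stable_sd)
        year1 = stable + rng.gauss(0, noise_sd)   # VA with the original students
        year2 = stable + rng.gauss(0, noise_sd)   # VA with a new group of students
        teachers.append((year1, year2))
    teachers.sort(key=lambda v: v[0], reverse=True)
    top = teachers[: int(top_share * n_teachers)]
    mean1 = statistics.fmean(v[0] for v in top)
    mean2 = statistics.fmean(v[1] for v in top)
    return mean1, mean2, mean2 / mean1


m1, m2, ratio = selected_shrinkage()
print(f"year-1 VA of selected: {m1:.2f}, after the move: {m2:.2f}, ratio: {ratio:.2f}")
```

With reliability 0.35, the selected teachers’ second-year VA comes out at roughly 35 percent of their first-year VA in this sketch. The shrinkage alone doesn’t tell you whether the unstable part is sampling noise or, as argued above, the particular kids a teacher happened to get; both look the same in this model.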

The overall conclusion is not that VAM has zero validity (it almost certainly has some) but that it has much lower validity than is claimed, and that the main determinant of how much kids learn in school from year to year is not teachers but the kids themselves. It’s not that teachers don’t matter. I’ve spent my life believing that teaching matters quite a bit, at least in terms of whether school is an interesting or pleasant place to be, and good teachers (my own, my children’s, and the colleagues I had over ten years) are the people I respect most in the world. But teachers are not the main determinant of test scores or earnings for the students they teach.

 

14 thoughts on “Value Added Modeling and Behavioral Genetics”

  1. Found this blog through randomcriticalanalysis.wordpress.com. I am a public high school science teacher and thought that VAM might not be effective as students get older because of that increasing heritability factor as students get into high school. I have been reading a lot on heritability of IQ and correlations to what I observe as a public school teacher. My ninth grade students seem to struggle in my class compared to earlier grades. They and their parents express to me they used to be successful in science (and school in general) when they were younger, but struggle as they transitioned into high school. I know this could be due to many factors, but it does seem like genetic differences (specifically IQ) might be largely (?) to blame in the reduction of achievement.

    This connection you wrote about in this post makes so much sense to me. I used to think that SES was the most important driver of success in school, back when I drank the liberal Kool-Aid in college, but now it seems that it has been genetically driven all along. We are now being held accountable for trying to change a child’s biology through VAM accountability, and that seems ludicrous to me as a science teacher.

    Do you know of any other sources of information on this topic in this post, specifically IQ/genetic differences and the correlation to VAM scores? If I had the time and the money I would be interested in studying this myself in graduate school and doing the research!


    1. I’m not aware of any studies other than the two I linked to that look specifically at value-added; if you’re interested in studies of school performance that include genetic confounds, I recommend looking at the UK Twin Study, e.g. http://www.kcl.ac.uk/ioppn/news/records/2014/October/Why-is-educational-achievement-heritable.aspx

      My feeling is that VAM will be more valid as a measure when based on multiple years (since some of the variation due to student-specific genetic characteristics will cancel out) or when student assignment is random, but I agree that education policy makers currently seem far too optimistic about its efficacy as an all-purpose tool. Thanks for reading.

