Mismatch Under the Mountain

Something that wasn’t clear to me until recently was why affirmative action programs tend to exaggerate group differences in selective programs relative to the size of those differences in the overall population. This isn’t anything specific to the design of affirmative action programs (though some aspects of selective college admissions, like recruitment of athletes and use of nonquantitative factors in admissions, probably make the problem worse). Instead, it follows from how normally distributed characteristics like academic ability behave: groups tend to differ more at the extremes of the distribution than in the center. So here’s an example of how that might work, in a somewhat less politically charged setting.


After the Battle of Five Armies, the remaining dwarves of Thorin’s company and their recently arrived cousins from the Iron Hills decided to make a magnificent throne for Dain, the new King Under the Mountain. It was decided that this throne would be decorated with 1,000 of the sparkliest colored gemstones (diamonds would be saved for his crown) that they could find amid Smaug’s immense trove.


After counting up the 100,000 gemstones in the trove, the dwarves examined each one and applied the highly reliable Sparkliness Assessment Test (SAT) to it. The precise methods of the test are known only to the dwarves, but one of its components is waving the gemstone in front of Bombur’s face while he is sleeping. It was found that each type of gemstone had a different mean score on the SAT and a different standard deviation:

Gemstone | Mean SAT score | Standard deviation | Number of gemstones
Amethyst | 1110 | 160 | 25,000
Emerald | 900 | 135 | 25,000
Ruby | 860 | 135 | 25,000
Sapphire | 1050 | 145 | 25,000

There were thus very consistent gaps in Sparkliness between gemstone types:

Gemstones compared | Average SAT gap
Sapphire to Ruby | 190
Sapphire to Emerald | 150
Amethyst to Sapphire | 60

The dwarves showed these gaps to Dain in a box plot, which also showed the difference in outliers between the groups.


Initially, the dwarves decided to choose the very sparkliest 1,000 gems out of the 100,000 for the throne. This resulted in a very lopsided distribution, with many more Amethysts than Sapphires, and hardly any Emeralds or Rubies:

Gemstone | Number chosen for throne (among top 1,000) | Mean SAT among chosen | Minimum SAT among chosen | Maximum SAT among chosen
Amethyst | 815 | 1469 | 1407 | 1700
Emerald | 3 | 1423 | 1408 | 1448
Ruby | 1 | 1472 | 1472 | 1472
Sapphire | 181 | 1458 | 1407 | 1649
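
These counts can be roughly cross-checked without any simulation. The sketch below assumes scores are exactly normal with the means and SDs from the first table, and treats the lowest chosen score of 1407 as a fixed common cutoff (an approximation, since the real cutoff moves a little from draw to draw). The scalar names are only illustrative. It uses Stata's normal() function, the standard normal CDF, to get the expected number of each gem type above the cutoff.

```stata
* Expected number of gems of each type above a common cutoff, assuming
* exactly normal scores (means and SDs taken from the first table).
* The cutoff of 1407 is an assumption: the lowest score among the chosen gems.
scalar cutoff = 1407
scalar n_each = 25000

* normal() is the standard normal CDF, so 1 - normal(z) is the upper tail.
scalar exp_amethyst = n_each * (1 - normal((cutoff - 1110) / 160))
scalar exp_sapphire = n_each * (1 - normal((cutoff - 1050) / 145))
scalar exp_emerald  = n_each * (1 - normal((cutoff -  900) / 135))
scalar exp_ruby     = n_each * (1 - normal((cutoff -  860) / 135))

display "Amethyst: " %7.1f exp_amethyst
display "Sapphire: " %7.1f exp_sapphire
display "Emerald:  " %7.1f exp_emerald
display "Ruby:     " %7.1f exp_ruby
```

The cutoff sits less than two SDs above the Amethyst mean but about four SDs above the Ruby mean, which is why the expected counts differ by orders of magnitude even though the means differ by only 250 points.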

However, the dwarves noted that among the gemstones chosen in this way (by simply ranking all 100,000 by their sparkliness and picking the top 1,000), there were relatively small gaps between groups, smaller than in the general population of Smaug’s trove, and in one case in the opposite direction.

Gemstones compared | Average SAT gap among 1,000 chosen
Sapphire to Ruby | -14 (the one Ruby chosen is above average among the Sapphires chosen)
Sapphire to Emerald | 35
Amethyst to Sapphire | 11
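
One way to see why these gaps shrink: every chosen gem has to clear the same cutoff, so the chosen gems of each type are bunched just above it. Under the same simplifying assumptions (exactly normal scores and a fixed cutoff of 1407, taken from the minimum in the table of chosen gems), the expected score of a chosen gem is the mean of a truncated normal. The program name tailmean is made up for this sketch and is not part of the original do-file.

```stata
* Expected score of a gem given that it clears a common cutoff c, assuming
* exactly normal scores:
*   E[X | X > c] = mu + sd * normalden(z) / (1 - normal(z)),  z = (c - mu) / sd
* tailmean is a hypothetical helper written for this illustration.
capture program drop tailmean
program define tailmean, rclass
    args label mu sd cutoff
    tempname z m
    scalar `z' = (`cutoff' - `mu') / `sd'
    scalar `m' = `mu' + `sd' * normalden(`z') / (1 - normal(`z'))
    display "`label'" _col(12) %7.1f `m'
    return scalar mean = `m'
end

tailmean Amethyst 1110 160 1407
tailmean Sapphire 1050 145 1407
tailmean Emerald   900 135 1407
tailmean Ruby      860 135 1407
```

The four expected means should come out far closer together than the population means, because the shared cutoff matters much more than where each type's distribution is centered.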



In addition, the 1,000 taken as a whole were very sparkly: the mean SAT score was 1467.

Frustrated, however, with the idea of a throne for the King Under the Mountain that was so lopsided in the Amethyst direction, the dwarves came up with another idea. Rather than choosing the sparkliest 1,000 gemstones of any type, they would choose the 250 gemstones of each type with the highest Sparkliness Assessment Test scores. This is what the distribution of scores looked like under the new system:

Gemstone | Number chosen for throne | Mean SAT among chosen | Minimum SAT among chosen | Maximum SAT among chosen
Amethyst | 250 | 1535 | 1481 | 1700
Emerald | 250 | 1259 | 1210 | 1448
Ruby | 250 | 1223 | 1174 | 1472
Sapphire | 250 | 1441 | 1392 | 1649

Under this new system, not only were the differences in mean SAT among the chosen gemstones larger than under the old system, they were actually larger than in the overall population of 100,000 gemstones:

Gemstones compared | Average SAT gap under new (equal representation by gemstone) system
Sapphire to Ruby | 218
Sapphire to Emerald | 182
Amethyst to Sapphire | 94
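
The widening can be traced to the standard deviations. As a back-of-the-envelope sketch, assume scores are exactly normal and that taking the top 250 of 25,000 is the same as taking the top 1% of each type's distribution. Then the expected mean of the chosen gems of a type is its mean plus a fixed multiple of its SD, and the gap between two types a and b picks up a term proportional to the difference in their SDs:

```latex
\begin{align*}
z_{0.99} &= \Phi^{-1}(0.99) \approx 2.326, \qquad
\lambda = \frac{\varphi(z_{0.99})}{0.01} \approx 2.665 \\
E[X_g \mid \text{top } 1\%] &= \mu_g + \lambda\,\sigma_g \\
\text{gap}_{a,b} &= (\mu_a - \mu_b) + \lambda\,(\sigma_a - \sigma_b) \\
\text{Sapphire to Ruby:} \quad & (1050 - 860) + 2.665\,(145 - 135) \approx 217 \\
\text{Sapphire to Emerald:} \quad & (1050 - 900) + 2.665\,(145 - 135) \approx 177 \\
\text{Amethyst to Sapphire:} \quad & (1110 - 1050) + 2.665\,(160 - 145) \approx 100
\end{align*}
```

These are close to the simulated gaps of 218, 182, and 94. Under this approximation, the widening beyond the population gap comes entirely from the higher-scoring types also having larger SDs. With equal SDs, the gap among the chosen gems would simply match the population gap.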


The mean SAT score for the 1,000 chosen was 1365, over 100 lower than under the old system.

The dwarves presented the results of their investigation to Dain, and asked him to weigh in on which system was preferred.

Here is my Stata code for this. I may well have made a mistake somewhere. You can also see that if you change the random number generation seed at the top, you’ll often get none at all of one of the four groups under the first system. The differences in means are roughly the current differences in mean SAT scores between groups (so perhaps representative not of all students, just of all SAT test takers); I couldn’t find the actual SDs for individual groups on the total score, so I made some assumptions based on how much I thought the verbal and math subtests would covary…


* Simulate Smaug's trove: 100,000 gemstones, 25,000 of each type
set seed 123456
clear all
set obs 100000
gen id=_n

* Assign gem types in blocks of 25,000 by id (ids 75,001 and up stay Sapphire)
gen gemtype="Sapphire"
replace gemtype="Ruby" if id<=25000
replace gemtype="Emerald" if id>25000&id<=50000
replace gemtype="Amethyst" if id>50000&id<=75000

* Draw normally distributed scores with a type-specific mean and SD
gen testscore=invnorm(uniform())*145+1050
replace testscore=invnorm(uniform())*135+860 if gemtype=="Ruby"
replace testscore=invnorm(uniform())*135+900 if gemtype=="Emerald"
replace testscore=invnorm(uniform())*160+1110 if gemtype=="Amethyst"
graph box testscore, over(gemtype) ytitle("Sparkliness Assessment Test Score") title("Sparkliness Assessment Test (SAT) Scores") subtitle("By Gem Type")
graph export testscores.png, replace
bysort gemtype: sum testscore

* First system: the 1,000 sparkliest gems overall (ranks 99,001 to 100,000)
sort testscore
gen newrank=_n
gen getsin=(newrank>99000)
tab gemtype getsin
sum testscore if getsin
by gemtype, sort: sum testscore if getsin
graph bar (mean) testscore if getsin, over(gemtype) ytitle("Sparkliness Assessment Test") title("Sparkliness Assessment Test") subtitle("By Gemstone Type among 1000 Sparkliest Overall")
graph export thousandmost.png, replace

* Second system: the 250 sparkliest gems within each type
bysort gemtype: egen rank2 = rank(-testscore)
gen afact=rank2<251
sum testscore if afact
graph bar (mean) testscore if afact, over(gemtype) ytitle("Sparkliness Assessment Test") title("Sparkliness Assessment Test") subtitle("By Gemstone Type among 250 Sparkliest Of Each Type")
graph export afact.png, replace
by gemtype, sort: sum testscore if afact
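
The remark above about some seeds leaving a gem type out of the top 1,000 entirely can be sanity-checked with a rough calculation. This sketch again assumes exactly normal scores and a fixed cutoff of 1407 (the real cutoff moves a little with each seed), and treats the 25,000 gems of a type as independent draws, so the chance that none of them clears the cutoff is (1 - p)^25000. The scalar names are illustrative.

```stata
* Rough probability that a gem type is entirely absent from the top 1,000,
* assuming exactly normal scores and a fixed cutoff (both are assumptions).
scalar cutoff = 1407
scalar n_each = 25000

* p = chance that a single gem of the type clears the cutoff
scalar p_ruby    = 1 - normal((cutoff - 860) / 135)
scalar p_emerald = 1 - normal((cutoff - 900) / 135)

* chance that none of the 25,000 independent draws clears it
scalar none_ruby    = (1 - p_ruby)^n_each
scalar none_emerald = (1 - p_emerald)^n_each

display "P(no Rubies in top 1,000):   " %5.3f none_ruby
display "P(no Emeralds in top 1,000): " %5.3f none_emerald
```

Because only about one Ruby is expected above the cutoff in the first place, drawing none at all is not a rare event under these assumptions.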


2 thoughts on “Mismatch Under the Mountain”

  1. Having a gut sense for how normally distributed traits translate into the real world is a damn hard thing to master, but once done, it is like a philosopher’s stone, allowing feats to be performed usually thought impossible.
