Something that wasn’t clear to me until recently was why affirmative action programs tend to exaggerate group differences in selective programs relative to the size of those differences in the overall population. This isn’t specific to the design of affirmative action programs (though some aspects of selective college admissions, like the recruitment of athletes and the use of nonquantitative admissions factors, probably make the problem worse). Instead, it’s the result of how normally distributed characteristics like academic ability tend to differ between groups more at the extremes of the distribution than in the center. So here’s an example of how that might work, in a somewhat less politically charged setting.
After the Battle of Five Armies, the remaining dwarves of Thorin’s company and their recently arrived cousins from the Iron Hills decided to make a magnificent throne for Dain, the new King Under the Mountain. It was decided that this throne would be decorated with 1,000 of the sparkliest colored gemstones (diamonds would be saved for his crown) that they could find amid Smaug’s immense trove.
After counting all 100,000 gemstones in the trove, the dwarves examined each one and applied the highly reliable Sparkliness Assessment Test (SAT) to it. The test’s precise methods are known only to the dwarves, but one of its components is waving the gemstone in front of Bombur’s face while he is sleeping. Each type of gemstone turned out to have its own mean SAT score and its own standard deviation:
| Gemstone | Mean Sparkliness Assessment Test Result | Standard Deviation | Number of This Type of Gemstone |
|---|---|---|---|
There were thus very consistent gaps in Sparkliness between gemstone types:
| Gemstones Compared | Average Sparkliness Assessment Test Gap |
|---|---|
| Sapphire to Ruby | 190 |
| Sapphire to Emerald | 150 |
| Amethyst to Sapphire | 60 |
The dwarves showed these results to Dain in a box plot, which also showed the difference in outliers between the groups.
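A quick worked calculation shows how the same difference in means produces very different tail shares. It assumes that scores are normally distributed within each type, as the simulation code at the end of this post does. It uses that code’s Ruby parameters (mean 860, SD 135) and Amethyst parameters (mean 1110, SD 160). The cutoffs 1000 and 1400 are arbitrary illustrative choices, the figures are rounded, and \(\Phi\) is the standard normal CDF.

```latex
% Share of type g scoring above a cutoff c, assuming X_g ~ N(mu_g, sigma_g^2)
\[ P(X_g > c) = 1 - \Phi\!\left(\frac{c - \mu_g}{\sigma_g}\right) \]

% Near the middle of the distribution (c = 1000)
\[ \frac{P(X_{\text{Amethyst}} > 1000)}{P(X_{\text{Ruby}} > 1000)}
   = \frac{1 - \Phi(-0.6875)}{1 - \Phi(1.037)} \approx \frac{0.754}{0.150} \approx 5 \]

% Out in the tail (c = 1400)
\[ \frac{P(X_{\text{Amethyst}} > 1400)}{P(X_{\text{Ruby}} > 1400)}
   = \frac{1 - \Phi(1.8125)}{1 - \Phi(4)} \approx \frac{0.035}{0.0000317} \approx 1{,}100 \]
```

The gap in means is 250 at both cutoffs. Yet at 1000 an Amethyst is about 5 times as likely as a Ruby to clear the bar, and at 1400 about 1,100 times as likely. With roughly 25,000 of each type, that means about 870 Amethysts but less than one Ruby would be expected above 1400.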
Initially, the dwarves decided to choose the very sparkliest 1,000 gems out of the 100,000 for the throne. This resulted in a very lopsided distribution, with many more Amethysts than Sapphires, and hardly any Emeralds or Rubies:
| Gemstone | Number Chosen for Throne (among top 1,000) | Mean SAT among Gemstones Chosen for Throne | Minimum SAT among Gemstones Chosen for Throne | Maximum SAT among Gemstones Chosen for Throne |
|---|---|---|---|---|
However, the dwarves noted that among the gemstones chosen this way (by simply ranking all 100,000 by sparkliness and picking the top 1,000), the gaps between types were relatively small. They were smaller than in Smaug’s trove as a whole, and in one case reversed.
| Gemstones Compared | Average Sparkliness Assessment Test Gap Among 1,000 Chosen |
|---|---|
| Sapphire to Ruby | -14 (the one Ruby chosen scores above the average of the Sapphires chosen) |
| Sapphire to Emerald | 35 |
| Amethyst to Sapphire | 11 |
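Here is a rough way to see why these gaps shrink. Under this system every chosen gemstone clears the same cutoff c, whatever its type. Assuming normal scores, the mean of type g’s gems above c is the standard truncated-normal mean, where \(\phi\) and \(\Phi\) are the standard normal density and CDF. The second line uses the large-cutoff approximation \(\lambda(\alpha) \approx \alpha + 1/\alpha\), so it is only a sketch. It also ignores sampling noise, which dominates when a type contributes a single gem, as the Rubies do here.

```latex
% Mean of type g among gems scoring above a common cutoff c
\[ E[X_g \mid X_g > c] = \mu_g + \sigma_g\,\lambda(\alpha_g), \qquad
   \alpha_g = \frac{c - \mu_g}{\sigma_g}, \qquad
   \lambda(\alpha) = \frac{\phi(\alpha)}{1 - \Phi(\alpha)} \]

% For c far above mu_g, lambda(alpha) is approximately alpha + 1/alpha, so
\[ E[X_g \mid X_g > c] \approx c + \frac{\sigma_g^2}{c - \mu_g} \]
```

Every type’s chosen mean is pulled to just above the shared cutoff. The population gap in means survives only through the small \(\sigma_g^2/(c - \mu_g)\) term, which shrinks as the cutoff rises.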
In addition, the 1,000 taken as a whole were very sparkly: the mean SAT score was 1467.
Frustrated, however, by the idea of a throne for the King Under the Mountain that was so lopsided toward Amethysts, the dwarves came up with another idea. Rather than choosing the 1,000 sparkliest gemstones of any type, they would choose the 250 gemstones of each type with the highest Sparkliness Assessment Test scores. Under this new system, the distribution of scores looked like this:
| Gemstone | Number Chosen for Throne | Mean SAT among Gemstones Chosen for Throne | Minimum SAT among Gemstones Chosen for Throne | Maximum SAT among Gemstones Chosen for Throne |
|---|---|---|---|---|
Under this new system, not only were the differences in mean SAT among the chosen gemstones larger than under the old system, they were actually larger than in the overall population of 100,000 gemstones:
| Gemstones Compared | Average Sparkliness Assessment Test Gap under the New (Equal Representation by Gemstone) System |
|---|---|
| Sapphire to Ruby | 218 |
| Sapphire to Emerald | 182 |
| Amethyst to Sapphire | 94 |
The mean SAT score for the 1,000 chosen was 1365, over 100 lower than under the old system.
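The equal-representation system can be sketched the same way. Each type now contributes its own top fraction p (250 of roughly 25,000, so p = 0.01 in the simulation code). The cutoff therefore differs by type, but the standardized cutoff \(z_p\) is the same for all of them. Assuming normal scores and ignoring sampling noise, every type’s chosen mean is its own mean plus the same multiple of its own SD:

```latex
% Mean of the top fraction p of type g; phi is the standard normal density
\[ z_p = \Phi^{-1}(1 - p), \qquad
   E[X_g \mid \text{top } p \text{ of type } g] = \mu_g + \sigma_g\,\frac{\phi(z_p)}{p} \]

% For p = 0.01
\[ z_{0.01} \approx 2.326, \qquad \frac{\phi(2.326)}{0.01} \approx 2.665 \]

% Expected gap between the chosen gems of types A and B
\[ \text{gap}_{A,B} \approx (\mu_A - \mu_B) + 2.665\,(\sigma_A - \sigma_B) \]
```

When two types share an SD, the gap among the chosen matches the population gap: Ruby and Emerald, both with SD 135 in the code, stay about 40 apart. The gap widens when the higher-mean type also has the larger SD. For Amethyst (1110, 160) versus Ruby (860, 135), it is about 250 + 2.665 × 25 ≈ 317. The overall mean of the 1,000 is simply the average of the four per-type means \(\mu_g + 2.665\,\sigma_g\). The lower-scoring types’ chosen means (about 1220 for Rubies and 1260 for Emeralds) therefore pull it down relative to the old system.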
The dwarves presented the results of their investigation to Dain, and asked him to weigh in on which system was preferred.
Here is my Stata code for this. I may well have made a mistake somewhere. If you change the random-number seed at the top, you’ll see that the first system often chooses no gemstones at all from one of the four groups. The differences in means roughly match current differences in mean SAT scores between groups (so they are perhaps representative only of SAT test takers, not of all students). I couldn’t find the actual standard deviations of the total score for individual groups, so I made some assumptions based on how much I thought the verbal and math subtests would covary…
* Simulate 100,000 gemstones, 25,000 of each type
clear
set seed 123456
set obs 100000
gen id = _n
gen gemtype = "Sapphire"   // ids 75001-100000 remain Sapphires
replace gemtype = "Ruby"     if id <= 25000
replace gemtype = "Emerald"  if id > 25000 & id <= 50000
replace gemtype = "Amethyst" if id > 50000 & id <= 75000
* Scores are normal with a type-specific mean and SD
gen testscore = .
replace testscore = invnormal(runiform())*135 + 860  if gemtype == "Ruby"
replace testscore = invnormal(runiform())*135 + 900  if gemtype == "Emerald"
replace testscore = invnormal(runiform())*160 + 1110 if gemtype == "Amethyst"
* Sapphire: mean 1050 matches the gaps in the text; SD 150 is an assumed value
replace testscore = invnormal(runiform())*150 + 1050 if gemtype == "Sapphire"
graph box testscore, over(gemtype) ytitle(Sparkliness Assessment Test Score) title(Sparkliness Assessment Test (SAT) Scores) subtitle(By Gemstone Type)
graph export testscores.png, replace
bysort gemtype: sum testscore
* System 1: the 1,000 sparkliest gemstones overall
gsort -testscore
gen getsin = _n <= 1000
tab gemtype getsin
sum testscore if getsin
by gemtype, sort: sum testscore if getsin
graph bar (mean) testscore if getsin, over(gemtype) ytitle(Sparkliness Assessment Test) title(Sparkliness Assessment Test) subtitle(By Gemstone Type among 1000 Sparkliest Overall)
graph export thousandmost.png, replace
* System 2: the 250 sparkliest gemstones of each type
bysort gemtype: egen rank2 = rank(-testscore)
gen afact = rank2 <= 250
sum testscore if afact
graph bar (mean) testscore if afact, over(gemtype) ytitle(Sparkliness Assessment Test) title(Sparkliness Assessment Test) subtitle(By Gemstone Type among 250 Sparkliest Of Each Type)
graph export afact.png, replace
by gemtype, sort: sum testscore if afact