"The Bell Curve"--solid center or abnormal deviate?

Richard B. Darlington
Copyright © Richard B. Darlington. All rights reserved.

This analysis of The Bell Curve by Richard J. Herrnstein and Charles Murray (Free Press, 1994) is based on a talk given by Darlington at Cornell on April 24, 1995.

One can think of The Bell Curve (TBC) as an attempt to enlist Richard Herrnstein's knowledge of psychology and psychometrics in the service of the conservative social agenda that Charles Murray has long favored. This agenda includes cutting back strongly on welfare, eliminating affirmative action and related programs, redirecting education funds from the disadvantaged to the gifted, and placing more emphasis on skills and abilities in immigration policy. Three psychological claims are used to buttress this agenda:

  1. A person's success and adaptation to society--as measured by an extremely wide variety of measures (income, regularity of voting, avoiding having low-birth-weight babies or babies out of wedlock, etc.)--are determined far more by intelligence as measured by IQ tests than by socioeconomic background.
  2. Intelligence is primarily inherited, so governmental attempts to boost IQ scores are in vain.
  3. Although point #2 does not necessarily imply that the observed difference in IQ scores between the races is determined primarily by heredity, in fact it is.

At a number of points TBC cloaks its main message in brief denials that seem to be thrown in for camouflage--reminding me of lines like, "Whoever kills Martin Luther King will be a hero to all true Americans. Of course I'm not suggesting that any of you kill him." TBC does this on many different points, including point #3 in this list, and numerous commentators have remarked on the ambivalence. Above I have reported the book's message as nearly all commentators have read it, not as Herrnstein and Murray themselves admit it.

Conservative Reaction to TBC

TBC is a political phenomenon. Within a few months of its publication, two other books--The Bell Curve Debate (Times Books, 1995) and The Bell Curve Wars (Basic Books, 1995)--appeared containing numerous analyses of TBC. It comes as no surprise that liberal scientists such as Leon Kamin and Stephen Jay Gould have published vehement denunciations of TBC. Far more surprising is that equally ardent denunciations of the book have been published by prominent conservatives.

Consider for instance the December 5, 1994 issue of National Review, which features commentary on TBC. The review by historian Eugene D. Genovese contains phrases like "incoherent treatment of race", "socially dangerous irresponsibility", and "chilling naivete, if not disingenuousness". Sociologist Brigitte Berger writes, "...a narrow and deeply flawed book...The worst thing for conservatives would be to become identified with the Murray-Herrnstein position." Economist Glenn C. Loury writes, "...in every instance there are political arguments for these policy prescriptions that are more compelling and more likely to succeed in the public arena than the generalizations about human capacities that Herrnstein and Murray claim to have established...Herrnstein and Murray are in a moral and political cul-de-sac. I see no reason for serious conservatives to join them there."

Similar views were expressed by economist and lawyer James J. Heckman of the University of Chicago, who criticized TBC roundly in the March 1995 issue of the conservative magazine Reason. Heckman's review is the best single review I've seen. All these conservatives were underwhelmed by TBC's reasoning and evidence, even though they had long supported the book's policy conclusions.

With the understanding, then, that my comments do not concern TBC's policy conclusions and are unrelated to a general liberal-conservative split, I'll offer the following personal analysis of TBC. The rest of this piece is organized around the three psychological claims stated above.

IQ as a Determinant of Later Success

TBC's argument about the importance of IQ for later success occupies no fewer than 8 chapters of the 22-chapter book. The book reports analysis after analysis using data from the National Longitudinal Survey of Youth (NLSY). In this study, a great many measures were taken in 1979 on a sample of about 12,000 young people aged 14-22. The same people were restudied in 1990, when they were aged 25-33. At this later time, some 25 measures were taken related to the person's success and adaptation to modern society. Had the person graduated from high school or college? What was the person's income? Was the person permanently out of the labor force due to injury? Did the person vote regularly and know the names of his or her US Senators? If the subject was a woman who had given birth, did the baby have low birth weight? And so on for a long list of variables that we think of as in some way measuring success or adaptation to society.

Herrnstein and Murray studied the relation of each of these 1990 "success" variables to two variables measured in 1979: a measure of IQ (actually a composite measure from the Armed Forces Qualification Test battery or AFQT), and a measure of socioeconomic status (SES)--also a composite measure. They correlated the IQ measure with the "success" variable while statistically holding constant the SES measure, and simultaneously correlated the SES measure with the "success" variable while statistically holding constant the IQ measure. TBC reports that by these analyses, most of these measures of success were far more highly related to IQ than to SES.
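As a rough illustration of what "statistically holding constant" means in this kind of analysis, the following Python sketch computes first-order partial correlations. It is only a sketch under stated assumptions: the data are synthetic, the coefficients are arbitrary choices of mine, and the function names are invented for this example. It does not use NLSY data and does not reproduce TBC's actual models.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x with y after linearly removing z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def make_synthetic_sample(n=5000, seed=0):
    """Invented data; every coefficient here is arbitrary, not an estimate."""
    rng = random.Random(seed)
    ses, score, outcome = [], [], []
    for _ in range(n):
        s = rng.gauss(0, 1)                              # hypothetical SES
        t = 0.5 * s + rng.gauss(0, math.sqrt(0.75))      # hypothetical test score
        o = 0.6 * t + 0.1 * s + rng.gauss(0, 1)          # hypothetical "success"
        ses.append(s)
        score.append(t)
        outcome.append(o)
    return ses, score, outcome


if __name__ == "__main__":
    ses, score, outcome = make_synthetic_sample()
    print("outcome vs. score, SES held constant:",
          round(partial_corr(outcome, score, ses), 3))
    print("outcome vs. SES, score held constant:",
          round(partial_corr(outcome, ses, score), 3))
```

Because the synthetic outcome was built to depend mostly on the test score, the first partial correlation comes out larger than the second. With different arbitrary coefficients the ordering would reverse, which is why the quality of the two measures matters so much to what such an analysis can show.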

Although TBC claims that these results are entirely consistent with other work in the literature, they are in fact directly contradictory to work by Cornell's Stephen Ceci. In his 1990 book On Intelligence...More or Less, Ceci reports an analysis by himself and Charles Henderson in which adult income was predicted from measures of IQ and SES taken in the teenage years. In other words, the study was remarkably like the studies reported in TBC, although a completely different sample (also of several thousand subjects) was used. Yet Ceci and Henderson got results exactly opposite those reported by TBC. Statistically holding SES constant removed essentially all relationship between teenage IQ and later income, while holding IQ constant had little effect on the measured relation between teenage SES and later income. Notably, TBC never cites this 1990 study, though it cites other work published as recently as 1994 and repeatedly assures the reader it is providing a fair and comprehensive review of the scientific literature.

How then can two such similar studies yield such opposing results? To answer that, we have to look at the details. Although TBC's discussion of this work goes on for 8 chapters, it's important to remember that all these analyses use the same statistical methodology, the same sample of subjects, and the same measures of IQ and SES. Thus any defect in their approach affects dozens of analyses spanning all these chapters. The aforementioned review by Heckman was extremely critical of this series of analyses, and most of my criticisms of it are taken from that source.

First, unlike the College Board tests, the AFQT makes no attempt to distinguish between "aptitude" and "achievement" items; that would be irrelevant to identifying people who can serve as a radar technician or computer repairer in the armed forces. Thus it is doubtful that the AFQT composite that TBC calls an "IQ" measure is really that. Rather it seems to measure general knowledge, which reflects socioeconomic background as much as native intelligence.

Second, there are numerous defects in TBC's measure of socioeconomic status--defects that would tend to lower its observed relation to later success. To begin with, the measure is based entirely on self-report from teenage children, and often these children don't even know, for instance, how many years of schooling their parents completed. A great many of them didn't know their parents' income, so Herrnstein and Murray simply left that variable out of their SES composite--even though sociologists usually think of income as a major part of what they mean by SES. In addition, the questions the children were asked concerned the status of their families "right now"--not the average status across the 18 or so years of the youth's upbringing. Thus if a father had for years held a middle-management position but had been laid off and was temporarily working as a house painter, the house-painter occupation was the one entering the TBC measure of SES. Finally, Herrnstein and Murray arbitrarily rounded down the highest measures of SES and rounded up the lowest measures, thus introducing more error into the measurement of this construct.
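To see why error in the SES measure matters so much, consider the following Python simulation. It is a hedged sketch, not a reanalysis: the data are synthetic, every coefficient and the noise level are assumptions chosen for illustration, and the function names are hypothetical. By construction the outcome depends only on true SES and never on the test score; the test score is merely correlated with SES.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x with y after linearly removing z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate(ses_noise_sd, n=20000, seed=1):
    """Return two partial correlations for one synthetic sample:
    (outcome with score, measured SES held constant;
     outcome with measured SES, score held constant)."""
    rng = random.Random(seed)
    measured_ses, score, outcome = [], [], []
    for _ in range(n):
        true_ses = rng.gauss(0, 1)
        # Assumption: the outcome depends on true SES only, not on the score.
        outcome.append(0.5 * true_ses + rng.gauss(0, 1))
        # Assumption: the score is correlated with SES but has no effect of its own.
        score.append(0.6 * true_ses + rng.gauss(0, 0.8))
        # The analyst sees SES only through a measure with added error.
        measured_ses.append(true_ses + rng.gauss(0, ses_noise_sd))
    return (partial_corr(outcome, score, measured_ses),
            partial_corr(outcome, measured_ses, score))


if __name__ == "__main__":
    for sd in (0.0, 2.0):
        score_r, ses_r = simulate(ses_noise_sd=sd)
        print(f"SES error sd={sd}: score|SES={score_r:.3f}, SES|score={ses_r:.3f}")
```

When SES is measured without error, holding it constant leaves the test score with essentially no relation to the outcome. When the same SES variable is measured with substantial error, the test score appears to be the stronger predictor, even though in this synthetic world it has no effect at all. This is the pattern one would expect if the defects listed above are as serious as Heckman argues.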

After reviewing all these sources of error, Heckman concluded, "The authors have no good way to separate genetic from social influences on social behavior. Their environmental data are too crude and the AFQT score they use is obtained too late in life to make a genetic-environmental distinction meaningful." Recall this is a conservative writer fully sympathetic with TBC's policy conclusions. And if the criticisms seem like the kind that could be leveled against virtually any social research, recall that the Ceci-Henderson study did reach exactly the opposite conclusions. I regard the conclusions of these 8 chapters as unproven.

The Effect of Schooling on IQ Scores

TBC conveys the strong impression that there is consensus on this issue among experts--that schooling has little effect on IQ scores. However, Ceci's book has a chapter on exactly this issue, titled "The impact of schooling on intelligence". In it he reviews approximately a dozen studies or groups of studies concerning the effect of schooling on IQ scores. These studies are nearly unanimous in concluding that schooling does seem to have a large effect on IQ scores. These include studies on the IQ scores of "canal-boat children" in Great Britain during the early years of this century. Their parents were mostly successful small businesspeople who transported goods across Britain in canal boats. The children, living on the boats with their parents and living quite normal lives except for not attending school, tested very low on IQ. Other studies have shown that measured IQ drops noticeably over summer vacation. Still others have studied Indian children in South Africa. The years of schooling available to these children through much of the last 50 years depended heavily on chance factors having to do with the towns in which they lived; some towns had much more schooling available than others. A cohort of Dutch children missed several years of school during World War II. Other scholars have compared German children born in March to those born in April. By German law, the April-born group starts school a whole year later than the March-born group, and the later-starting group was found to have substantially lower measured IQ scores. Other investigators have studied American children in Appalachian communities, which early in this century often lacked schools. Still others have compared Northern American blacks to Southern blacks and to Southern whites.

It seems to me that virtually every one of these sets of studies exploits a pretty good natural experiment on the effect of schooling on IQ--a natural experiment that provides a better answer to the question than any of the evidence offered by Herrnstein and Murray. With virtual unanimity, these studies lead to the conclusion that schooling does have a substantial impact on measured IQ scores. Yet Herrnstein and Murray ignore this entire literature.

American Black-White Differences in IQ

TBC devotes one chapter (chapter 13) to arguing that observed black-white differences on IQ are primarily genetic in origin. This chapter relies heavily on a review of research by others--especially Arthur Jensen. A large fraction of the apparent "meat" in this chapter consists of work by Jensen. Again, Ceci reviews the same literature as Herrnstein and Murray, and comes to very different conclusions. However, here I'll emphasize a point about Jensen's work discovered by my former student Carolyn Boyce.

In 1980 the American Journal of Psychology asked me to review Jensen's Bias in Mental Testing, which made what seemed at the time to be a big splash in the media, though TBC has now set a new standard. When I read the book, I was unconvinced by very large sections of it. However, Jensen's description of one study left me very impressed. This was an unpublished 1951 doctoral dissertation by Frank McGurk. McGurk matched 213 black high-school students very closely to 213 white students. Each black student was matched with a white student in the same curriculum in the same school and with a nearly equal score on an SES scale. Thus McGurk's matching for "environment" was virtually unparalleled in the research literature on black-white differences. McGurk then administered a broad set of 74 IQ-test items to all 426 students (that's 2 x 213), and examined the black-white difference on each item individually.

According to Jensen, McGurk's results fit well with Jensen's own theory that

blacks perform better on tests involving rote learning and memory than on tests involving relation eduction or reasoning and problem solving, especially with content of an abstract nature...For example, from McGurk's study there are two items of equal difficulty (28 percent passing) in the combined groups...

1. ABYSMAL :: (a) bottomless (b) temporal (c) incidental (d) matchless

2. A hotel serves a mixture of three parts cream and two parts milk. How many pints of cream will it take to make 15 pints of this mixture?
(a) 5 (b) 6 (c) 7.5 (d) 9 (e) 12

Jensen reported that the mean white-black difference was very small on the first item and quite large on the second. Jensen then said, "Many similar examples can be found in McGurk's (1951) report." (Jensen, pp. 526-7).

Since Jensen was citing an unpublished doctoral dissertation, the temptation was strong to accept his summary of the findings. But when I showed these pages to Boyce, she used inter-library loan to obtain a copy of McGurk's unpublished doctoral dissertation. She found that the vocabulary item just mentioned was one item in a 10-item vocabulary scale while the milk-cream item was one item on a scale of 9 arithmetic word problems. The average black-white difference on the vocabulary scale was 4.4% while the average difference on the arithmetic word problems was nearly identical at 5.3%. In fact, the milk-cream item was the one item from its scale showing the largest black-white difference while the ABYSMAL item was the one item from its scale showing the smallest difference favoring whites--contrary to Jensen's assertion that "Many similar examples can be found in McGurk's report."

The milk-cream item did not appear to require more complex reasoning than other items in that scale. For instance, in a broad sense it seems virtually identical to the following item:

This item in effect asks students to find three-fifths of 20, while the milk-cream item asks them to find three-fifths of 15. Yet on this wire-cutting item, the black students actually slightly outperformed the whites. The only conclusion I can reach is that when Jensen is faced with a complex body of evidence, he is very selective in how he reports that evidence to his readers. Yet TBC relies very heavily on Jensen for its discussion of racial differences.

Let me mention another point about black-white differences in IQ scores that Boyce and I discovered at that time. With my permission, Ceci included this point on pages 148-9 of his 1990 book, so the point was available to Herrnstein and Murray if they had chosen to look at that book. The question is: what are the average IQ scores of Caucasians outside the developed world--defined for the decades under review as North America, Northwest Europe, Australia, and New Zealand? A comprehensive literature review by Richard Lynn (Ceci pp. 148-9) found just 12 published references to mean IQs of samples from this population--studying subjects from Portugal, Spain, Italy, southeastern Europe, Iraq, Iran, and India. The mean, median, and midrange of the 12 within-sample means are all 85. Further, when you discard the 6 studies which either failed to mention the sample size or reported sample sizes of 25 or fewer, the 6 remaining means range only from 83 to 87. Thus the mean IQ scores of Caucasians outside the industrialized world seem to be essentially equal to those of American blacks. Again, culture and schooling seem central.


Thus in all three areas in which TBC attempts to summarize the psychological literature, it provides an extremely biased summary. And even conservatives who agree with the book's policy conclusions find little merit in the book's attempt to use psychological knowledge to advance those policies. The title of Heckman's Reason review says it all: "Cracked Bell".