1/23/96

CONTACT: Stanford University News Service (650) 723-2558

Stereotypes found to affect performance on standardized test

STANFORD -- Standardized tests cannot accurately measure intellectual merit because racial and gender stereotypes interfere with the intellectual functioning of those taking the tests, according to Stanford psychology Professor Claude Steele.

Steele reported his findings at the annual convention of the American Psychological Association on Saturday, Aug. 12, in New York City. The meeting also featured a task force report on what is known about genetic links to intelligence, and three other sessions devoted to the controversy over The Bell Curve. In that 1994 book, Richard Herrnstein and Charles Murray argued that at least some part of group differences on IQ tests is inherited, reflecting some innate intellectual inferiority of African Americans.

In a symposium about several research projects at Stanford, the University of Michigan and the State University of New York, Steele detailed experiments on factors that can depress the academic performance of women and African Americans in college environments. His seven-year research project bears on three currently controversial issues:

In laboratory testing at Stanford and in a field program at the University of Michigan, Steele found that a dynamic that he calls "stereotype vulnerability" may be responsible for depressed performance. He also found that the performance gaps between men and women in mathematics, and between whites and African Americans as expressed in test scores, grades, and dropout rates, can be eliminated with appropriately designed affirmative action programs.

"These findings demonstrate another process that may be contributing to racial and gender differences in standardized test performance, a process that is an alternative to the genetic interpretation suggested in The Bell Curve," Steele said before leaving for the conference. "And they show that group differences in school achievement can be reduced substantially by programs that emphasize challenge instead of a 'dumbing down' remediation."

Steele also said the findings "underscore the danger of relying too heavily on standardized test results in college admissions or otherwise. The research shows that societal stereotypes can systematically depress the test performance of some groups more than others, even when those groups enter the test situation with equal knowledge."

Steele's research - conducted since 1991 in the psychology department at Stanford with Joshua Aronson, now an assistant professor at the University of Texas at Austin, graduate students Joseph Brown and Kirsten Stoutemyer, and with Steven Spencer at the State University of New York at Buffalo - is supported by grants from the Russell Sage Foundation and the National Institute of Mental Health. It is the latest entry in a century-long controversy over alleged intelligence differences among groups such as European, African and Asian Americans, or women and men. Psychologists periodically argue over whether group differences on standardized tests stem from genetic differences and are thus more difficult to eradicate, or from environmental differences between groups, which are easier to change. Still others argue they merely reflect bias in the tests.

"To this set of explanations, our findings add a new possibility - that stereotype vulnerability and its differential impact on groups in the immediate testing situation" are responsible for a difference in performance, Steele wrote in a paper prepared for the annual meeting of the psychology association, an organization of 132,000 researchers, educators, clinicians, consultants and students organized into four dozen sub-fields of psychology. (Stanford Psychology Professor Philip Zimbardo participated at the convention in a symposium on shyness; psychology graduate student Lisa Stallworth presented research she did with Stanford Professor Felicia Pratto in another session on militaristic and nationalist attitudes.)

IQ research status: inconclusive

At a Sunday session on "Intelligence: Knowns and Unknowns," a task force established by the Board of Scientific Affairs of the APA discussed a report that was commissioned to grapple with the scientific, as opposed to the political, issues raised by the publication of The Bell Curve. Ulric Neisser of Emory University chaired the task force.

"Like every other trait, intelligence is the joint product of genetic and environmental variables," the report said. "Gene action always involves a (biochemical or social) environment; environments always act via structures to which genes have contributed. Given a trait on which individuals vary, however, one can ask what fraction of that variation is associated with differences in their genotypes (this is the heritability of the trait), as well as what fraction is associated with differences in environmental experiences."

The differentials in average African American and European American IQ scores were treated in the report as a continuing puzzle requiring more study.

"African American IQ scores have long averaged about 15 points below those of whites, with correspondingly lower scores on academic achievement tests. In recent years the achievement-test gap has narrowed appreciably. It is possible that the IQ-score differential is narrowing as well, but this has not been clearly established. . . . Several culturally based explanations of the black/white IQ differential have been proposed; some are plausible, but so far none has been conclusively supported. There is even less empirical support for a genetic interpretation. In short, no adequate explanation of the differential between the IQ means of blacks and whites is presently available," the report said.

Steele's studies do not directly address ethnic group differences in IQ scores, but they do provide evidence about a possible mechanism for another less-well-known phenomenon in standardized test data - that equally prepared blacks will do more poorly in college than their white contemporaries.

"The gap can be substantial," Steele said in his paper. "In a recent cohort of graduates from a large prestigious university, the mean ACT score for white students with a C+ cumulative average was at the 34th percentile, while that for blacks with this average was at the 98th percentile."

The national college dropout rate for African Americans is 70 percent, compared to a 42 percent rate across all groups nationally. "We are in a crisis stage concerning African Americans and their schooling," he told the symposium audience at the Marriott Marquis Hotel.

Some have attributed such achievement gaps to lower motivation and achievement expectations, but Steele says that explanation seems inadequate, given that "the racial achievement gap is just as great among students testing at the 98th percentile - scores that presumably reflect high academic motivation and expectations - as it is among more typical students."

Steele's theory is that stereotype vulnerability, the unsettling expectation that one's membership in a stigmatized group will limit individual ability, may be at the root of lower grades and SAT scores for African Americans. Stereotype vulnerability raises interfering anxiety during testing or classroom situations, Steele wrote. The same dynamic also could explain why highly skilled women at the university level drop out of programs in math, engineering and the physical sciences, he added.

"Surprisingly, you don't have to believe in the stereotype to be vulnerable to it," he pointed out to his audience at the APA convention.

"Everyone in a collective knows the stereotypes about a given target group, including the group members themselves, and everyone knows that everyone knows," Steele wrote in a paper prepared for the convention. "Thus the predicament of 'stereotype vulnerability': The group members then know that anything about them or anything they do that fits the stereotype can be taken as confirming it as self-characteristic, in the eyes of others, and perhaps even in their own eyes. This vulnerability amounts to a jeopardy of double devaluation: once for whatever bad thing the stereotype-fitting behavior or feature would say about anyone, and again for its confirmation of the bad things alleged in the stereotype.

"Consider the woman student who gives the wrong answer in math class. She is vulnerable to the judgment, as is anyone, that she lacks a particular skill. But she is also vulnerable to confirming, or to being seen as confirming, the deeper limitation alleged in the stereotype."

Steele's experimental evidence

In a number of experiments, Steele and his colleagues were able to depress the average performance of high-achieving African American and women college students by subtly implying that well-known stereotypes about those groups' intellectual ability might apply to the test they were about to take.

In one case, Steele and his colleagues tested to see whether "stereotype vulnerability" also could be induced among white males by indicating to test takers that Asians have tended in the past to do better than Americans on a difficult mathematics exam.

In this experiment, white male Stanford students, who presumably do not have a lifetime of experience with being stigmatized, performed less well than a control group of white males who were not "placed under suspicion" by the circumstances of the testing. That suggests, Steele said, that stereotype vulnerability is something that can afflict people in general.

Circumstances that the researchers set up in the laboratory are common in some classrooms. They included such practices as having students check off their race on a form before taking a test, or having an instructor indicate that a math test that is about to be taken is one that may show gender differences.

But in control groups where similar students were given no reason to suspect that the demeaning stereotypes would apply to their performance, both African Americans and women performed as well as whites and males, respectively, on extremely challenging tests, the report said.

In one trial, for example, Steele reported, he and Spencer "recruited women and men students, mostly sophomores, who were both good at math and strongly identified with it - to approximately the same degree - and then gave them a very difficult math test one at a time. The items were taken from the advanced GRE in math and, we assumed, would strain the skills of these students without totally exceeding them. We expected that women would underperform in relation to men on this test even though their skills and identification with math were essentially the same. This is because the relevant gender stereotypes should make the frustration they experience more self-threatening, and in turn, more disruptive of their performance.

"This is precisely what happened," he reported.

In another experiment with an advanced test in literature, rather than math, women performed as well as equally qualified men. A second experiment with an easier math test also did not show women underperforming equally qualified men. "The lack of performance frustration on this easier test, we reasoned, disconfirmed the self-relevance of the stereotype to women test-takers and, in this way, made their test-taking experience less self-threatening."

Steele argues that stereotype vulnerability varies with situations so that women are more subject to it in math than in English, and African Americans in academic classes than in athletics. "Performing in domains where prevailing stereotypes allege one's inferiority. . . creates a predicament in which any faltering of performance can confirm the stereotype as self-characteristic. This predicament. . . can cause apprehension and self-consciousness about conforming to the stereotype that directly interferes with performance in that situation. Disruptive pressures such as evaluation apprehension, test anxiety, choking and token status have long been shown to disrupt immediate performance through a variety of mediating mechanisms - interfering anxiety, reticence to respond and distracting thoughts, self-consciousness and the like," he reported. "The proposal here is that stereotype vulnerability is another such interfering pressure."

To further test the theory, another experiment was devised in which takers of the hard math test were "either told that the test generally showed gender differences - implying the stereotype of women's math inferiority was relevant to interpreting their own frustration - or that it showed no gender differences - implying that the gender stereotype was not relevant to their performance on this particular test. . . . In dramatic support of our reasoning, women performed worse than men when they were told that the test produced gender differences - replicating women's underperformance in earlier experiments - but they performed equal to men when the test was represented as insensitive to gender differences, even though, of course, the same difficult test was used in both conditions. Genetic limitation did not cap the performance of women in these experiments."

Not every individual in an ability-stigmatized group is vulnerable to negative stereotypes every time he or she takes a test, Steele cautioned, but "across the full range of test-takers in stereotype vulnerable groups, the weight of this vulnerability may substantially depress the group's overall performance, a depression that could account for a significant portion of that group's underperformance in relation to other groups." Such an interaction could also explain why stigmatized minorities in a number of other countries show about a 15-point IQ gap from the dominant population group.

Another portion of the gap may be the end result of repeated experience with stereotype vulnerability, he wrote. Recent research by others suggests that many individuals within stereotyped groups eventually "disidentify" with school achievement in general or with a particular subject, reformulating their sense of who they are in order to feel less vulnerable.

At the same convention session, for instance, psychologist Brenda Major of the State University of New York at Buffalo reported on her work on stigmatized individuals and groups. Using a scale measure of school disidentification, she and her colleagues found that black students were more disidentified than white students in several small college samples, and that for disidentified students of both races, negative feedback about an intellectual task had less effect on their self-esteem than it did for those students who identified more with school.

"The more an individual disidentifies with a domain, the less motivated and persistent he or she will be to succeed in that domain," Major told the convention symposium.

Psychology Professor Jennifer Crocker of the University of Michigan wrote in her report for the symposium that "it seems likely that, over time, the direction of causality goes both ways - repeated experience with stereotype vulnerability and its consequent debilitating effects on performance will lead to disidentification with academic pursuits, which in turn will further depress performance."

Implications for affirmative action

The hopeful side of his research, both Steele and Crocker said, is that the results demonstrate that performance can be improved by making changes in academic environments so that they don't support or amplify ability-demeaning stereotypes. Steele also reported on his demonstration project at the University of Michigan, which has raised the college grades and reduced the drop-out rate of African American participants. John Jonides of Michigan reported on another project that has resulted in a 45 percent reduction in drop-out rates.

"You have to do something to break the sense of being under suspicion, in order to allow these students to be less defensive and more openly engaging of their academic work," Steele said before the symposium. Honorific recruiting and mentoring programs that allow people to say "I really do belong" are examples. Schools and teachers also should provide "challenge over remediation" and "portray ability as something that's expandable, because it is."

He was critical of college affirmative action programs that are remediation-based. "Remediation is a sin in this work, because remediation reifies the stereotype," he said at the symposium. Programs that "provide special counselors, special orientations, special graduations even . . . these programs say the institution is worried about, not confident of, your abilities as a student."

But he also said that "the playing field is not even" for groups who are "under suspicion" of not being as smart, without some sort of institutional recognition of the predicament. Stereotype vulnerable students "may be most helped by tactics that reduce their situational risk of confirming or being judged by the stereotypes about them," he wrote.

At the University of Michigan, Steele helped design the 21st Century Program as one alternative. The program is a racially integrated transition program for new students that includes voluntary, challenging workshops in addition to regular classes and a seminar on adjustment to college life. In its first two years of operation, black students in the program earned significantly better grades than a control group of black students, and those in the top two-thirds of the standardized test distribution earned first-semester grades essentially the same as white students with equivalent entering test scores. "We also know from follow-up data that their higher grade performance continued at least through their sophomore year, and that by that time, only one of them had dropped out," Steele reported.

Jonides of the University of Michigan also described that university's Undergraduate Research Opportunity Program, which gets "underrepresented minorities" into early mentoring relationships with faculty. It combines peer advising, research groups and mentoring.

"The good news in this work is that the differences in performance are not immutable. But I want to raise a caution," said Michigan's Crocker, who does research on the emotions of people who are stigmatized. "It's not going to be as easy to eliminate stereotype vulnerability in the classroom or on standardized tests as it is to eliminate them in laboratory experiments."

Asked afterward to elaborate, Crocker said that in the laboratory, Steele and his colleagues can tell students the test they are about to take is just a lab exercise that is not a test of ability. "Once students are taking a real GRE test or are in a classroom, they know that tests are diagnostic of ability.

"I don't mean to undermine the importance of this research because I think it is terribly important. It shows us [that] race and gender differences in performance are not just genetic or [a symptom of] the ways that socialization has damaged people. It says that those explanations don't give the whole picture," she said.

Both of the Michigan programs described at the session "have had enormous success," she said, "but it is not something you say you'll do tomorrow and it's done. . . . I don't want people to think it's easy or cheap [to set up such a program] and then quickly decide it didn't work."

Steele, however, pointed out in his paper that there have already been a number of successful "wise schooling" efforts, such as the well-known example of math teacher Jaime Escalante, depicted in the movie Stand and Deliver; various college programs at Xavier, Howard and Georgia State University; and elementary school strategies devised by James Comer of Yale and Henry Levin of Stanford.

Possible research directions

Crocker also pointed out that the research by Steele, Aronson and Spencer doesn't directly address the question of whether ethnic group differences in IQ scores or gender differences in math scores are genetically based. In an article in the San Jose Mercury News the same day as the symposium in New York, Arthur Jensen, a retired University of California-Berkeley psychologist who has argued for a genetic explanation for the gap in white and black American IQ scores, noted that differences show up on IQ tests of 3-year-olds. "There's so much other evidence that one wonders if this anxiety about stereotypes has gotten to 3-year-olds," Jensen was quoted as saying about Steele's research.

Crocker said that racial differentials on IQ tests were not as large for pre-schoolers, and that she would not be surprised to find out 3-year-olds are aware of racial and gender stereotypes.

"The research evidence and also my own experience as a parent suggests [3-year-olds] know and value social categories" such as race and gender, she said, although they perhaps do not yet know the specific content of stereotypes. An interesting research direction, she said, would be to verify when "awareness of devaluation happens among girls and African American children. I also would like to see [Steele] do his kinds of studies with intelligence tests."

Crocker predicted that Steele would get the same effect on IQ exams as on the Graduate Record Exam segments he used, basing her judgment on earlier research by Irwin Katz in the 1950s and 1960s. "Katz didn't compare blacks and whites, but he showed that black kids' scores on IQ tests varied with the testing context," Crocker said.

In a paper she prepared for the symposium, Crocker warned that scientific conferences focused on The Bell Curve may themselves contribute to the stereotype vulnerability of women and minorities.

"Whether these seminars are, in the end, critical or flattering to The Bell Curve may matter less than the fact that we, as social scientists, are taking the book so seriously, despite the fact that it provides no new data. As scholars, our eagerness to look once again at these issues simply reinforces the stereotype vulnerability and disidentification experienced by our students of color." -ko


