A little more than two years after the College Board released research rebutting findings concerning the board’s testing methods, a professor at Indiana University and his colleagues have raised new questions in a paper about test bias, based on the testing service’s own data.
The paper suggests that hundreds of thousands of college students have been affected by differential and varied predictions of their success based on how they perform on standardized tests such as the SAT and GRE.
“Our main implication is that tests do not work in the same way across colleges and universities, and we have found that hundreds of thousands of people’s predicted GPA based on SAT scores were under- or overestimated,” says lead author Herman Aguinis, a professor of organizational behavior and human resources at Indiana University’s Kelley School of Business.
“If the prediction is not the same, that means that you can benefit or suffer based only on your ethnicity or gender, because your performance is expected to be higher or lower than it will be, which means you’re more or less likely to be offered a scholarship or you’re more or less likely to be offered admission.”
His coauthors on the paper are Steven A. Culpepper of the University of Illinois at Urbana-Champaign and Charles A. Pierce of the University of Memphis. The results are timely, given that an increasing number of colleges and universities are making the SAT an optional part of the admissions process.
In 2010, the same coauthors first examined the issue in a paper in the Journal of Applied Psychology, which concluded that the methods used by the College Board and other entities for admissions or employment testing may be flawed. They did not conclude that the tests were biased, but they suggested that the tests had the potential to be biased and that the methods used to reveal bias were deficient.
Rebuttal to a Rebuttal
The 2010 paper received a great deal of attention. In 2013, two research scientists then at the College Board, the organization that administers and markets the SAT, published a paper in response in the same journal.
The authors, Krista Mattern and Brian F. Patterson, raised questions about Aguinis’ and his coauthors’ paper because it was based on a simulation. In their paper, Mattern and Patterson used actual data involving more than 475,000 students at more than 200 colleges from 2006 to 2008.
Mattern and Patterson studied the relationship between SAT data and first-year grade-point averages for those students and found—on average—that the relationship between the two factors was the same across various groups.
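One common way to formalize whether "the relationship is the same across groups" is a regression that includes a group indicator and its interaction with the test score. The equation below is an illustrative sketch; the symbols are assumptions chosen for this example and are not taken from either paper.

```latex
\widehat{\mathrm{GPA}} = b_0 + b_1\,\mathrm{SAT} + b_2\,G + b_3\,(\mathrm{SAT} \times G),
\qquad G \in \{0, 1\}
```

Here $G$ marks group membership. If $b_2 = 0$ and $b_3 = 0$, the same SAT score yields the same predicted GPA for both groups. A nonzero $b_2$ (intercept difference) or $b_3$ (slope difference) means that one group's GPA is over- or underpredicted relative to the other's.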
The Journal of Applied Psychology required Mattern and Patterson to make the College Board’s data available for the first time, in the form of a 400-page PDF file. Aguinis, Culpepper, and Pierce decided to extract that data, and their new paper is based on their analysis of it.
Data From More Than 475,000 People
“The first thing we did was to do what they did, exactly what they did,” Aguinis says. “And we found that our results are exactly like theirs—on average—across the 200 colleges.”
But while Aguinis, Culpepper, and Pierce found the same average results as the College Board scientists, their research found much variation when data for each college was studied individually.
They argue that admissions policies, grading approaches, and academic support resources differ greatly across institutions and even within them, which raises questions about how useful and fair the SAT can be as a predictor of student success across gender and ethnic groups.
“We have all these things that happen—not only on the test side but also on the GPA side—that make that prediction less precise and create differences across groups,” Aguinis says.
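The contrast between matching averages and college-level variation can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the college names, group labels, and scores are hypothetical, and it omits the corrections the researchers describe (sample size, range restriction, subgroup proportions). It fits a separate line per group within each college and compares predicted GPAs at the same SAT score.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def prediction_gap(college, sat):
    """Predicted GPA for group A minus predicted GPA for group B at one SAT score."""
    a0, a1 = fit_line(*college["A"])
    b0, b1 = fit_line(*college["B"])
    return (a0 + a1 * sat) - (b0 + b1 * sat)


# Hypothetical toy data: (SAT scores, first-year GPAs) per group and college.
colleges = {
    "college_1": {
        "A": ([1000, 1200, 1400], [2.8, 3.2, 3.6]),
        "B": ([1000, 1200, 1400], [2.5, 2.9, 3.3]),
    },
    "college_2": {
        "A": ([1000, 1200, 1400], [2.5, 2.9, 3.3]),
        "B": ([1000, 1200, 1400], [2.8, 3.2, 3.6]),
    },
}

# Gap in predicted GPA between the groups at an SAT score of 1200, per college.
gaps = {name: prediction_gap(c, sat=1200) for name, c in colleges.items()}
average_gap = mean(gaps.values())

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:+.2f}")
print(f"average across colleges: {average_gap:+.2f}")
```

In this toy data the gap is about +0.3 GPA points in one college and about -0.3 in the other, so the average across colleges is about zero even though neither college predicts equally for both groups.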
“We have many implications for universities, admissions officers, the testing industry, and society in general. First of all, understanding that the test in any particular context may have bias,” he adds. “In the majority of colleges, we found differences, some in one direction and some in the other direction.
Hundreds of thousands of students probably have been denied admission or denied scholarships just because of their ethnicity or gender when standardized tests are central in the admissions process—but not against blacks or against women necessarily.
It goes both ways. The paper is about predicting performance for all people, and the bias we found sometimes benefits one group and some other times the other.”
The researchers compared 257,336 female and 220,433 male students across 339 samples, and 29,734 African American and 304,372 white students across 264 samples, collected from 176 colleges and universities from 2006 to 2008.
While the paper focuses on data for the SAT, Aguinis says its findings are applicable to other exams, such as the GRE, the GMAT, civil service exams, and many other pre-employment tests, which also measure intelligence and quantitative skills.
Aguinis is not suggesting that the SAT and other tests are irrelevant; they are among the best measures of future academic success available. However, the results need to be understood within the local context of a college, university or other organization, and he hopes the College Board will make additional data available to help with this process.
“You need to understand how the test works in your local context; the test may be working in ways you don’t know, and you may be over-predicting for some groups and under-predicting for others,” he says. “You need to understand if the test is predicting performance to the same extent across groups. Otherwise, the selection process may be unfair for members of certain groups, and the implications are critical for people’s future.”
In their methodology, the researchers set out to eliminate known factors—such as sample size, range restriction, and proportion of students across ethnicity- and gender-based subgroups—that may explain differences across institutions.
“We had to engage in a very large number of procedures, because the burden of proof was on us,” Aguinis says.
Five different reviewers evaluated the paper, and the researchers were asked to submit nine revised versions, in addition to the original manuscript, before it was accepted for publication. Usually, this process involves two or three reviewers and three or four revisions at most.
“I’ve been in this field since 1993, 23 years, and I’ve never seen, ever, nine revisions in any journal. I am on the board of 15 journals, and I was editor of a journal. I have been a reviewer for another 10 journals,” he says. “This paper was scrutinized like no other I have ever seen before.”
Aguinis, Herman; Culpepper, Steven A.; Pierce, Charles A. Differential Prediction Generalization in College Admissions Testing. Journal of Educational Psychology, Jan 21, 2016. doi: http://dx.doi.org/10.1037/edu0000104