Uni revealed it killed off its PhD-applicant screening AI – just as its inventors gave a lecture about the tech
Fears of bias put compsci dept into damage-limitation mode after years of using it to analyze applications
A university announced it had ditched its machine-learning tool, used to filter thousands of PhD applications, just as the software's creators were giving a talk about the code, which had drawn public criticism.
The GRADE algorithm was developed by a pair of academics at the University of Texas at Austin, and it was used from 2013 to this year to assess those applying for a PhD at the US college's respected computer-science department. The software was trained using the details of previously accepted students, the idea being to teach the system to identify people the school would favor, and to highlight them to staff who would make the final call on the applications. It's likely the program picked up biases against applicants of certain backgrounds excluded from that historical data.
Hopefuls were assigned a score from zero to five by the code, and those with high scores were pushed forward to university staff by GRADE. The software, according to its creators in a paper describing the technology, "reduced the number of full reviews required per applicant by 71 percent and, by a conservative estimate, cut the total time spent reviewing files by at least 74 percent.” That means poorly scored applicants were given less attention by staff.
The compsci department has now distanced itself from the GRADE algorithm, first saying the code had the potential to pick up unfair biases, and later saying it was difficult to maintain. “The University of Texas at Austin’s Department of Computer Science stopped using the graduate admissions evaluator (GRADE) in early 2020,” a spokesperson told The Register in a statement on Monday.
“The system was used to organize graduate admissions in the Department of Computer Science between the 2013 and 2019 academic years. Researchers developed the statistical system in response to a high volume of applicants for graduate programs in the department. It was never used to make decisions to admit or reject prospective students, as at least one person in the department directly evaluates applicants at each stage of the review process.
“Changes in the data and software environment made the system increasingly difficult to maintain, and its use was discontinued. The graduate school works with graduate programs and faculty members across campus to promote efficient and effective holistic application reviews.”
The decision earlier this year to stop using GRADE to screen computer-science PhD candidates, however, was only announced by the department on Twitter last week after plasma physicist Yasmeen Musthafa drew attention to potential flaws in the statistical machine-learning software. Musthafa tweeted their widely shared criticism on November 30, the day before the creators of GRADE were due to give a presentation about their code at a virtual event arranged by the University of Maryland’s Department of Physics. On the day of the lecture, UT Austin tweeted it had abandoned the software:
“TXCS is deeply committed to addressing the lack of diversity in our field. We are aware of the potential to encode bias into ML-based systems like GRADE, which is why we have phased out our reliance on GRADE and are no longer using it as part of our graduate admissions process.” – Computer Science at UT Austin (@UTCompSci), December 1, 2020
In fact, this damage-limitation move was made when GRADE's designers – Austin Waters and Risto Miikkulainen – were still presenting their work on the software to colleagues via Zoom. Although the presentation is not generally available, technical details and the effects of GRADE were shared in the form of a paper published in AI Magazine in 2014.
AI algorithms trained on historical data can inherit old biases
GRADE is trained on various features to rank applicants, including their GPA, the universities previously attended, letters of recommendation, area of research interest, and the faculty advisor they wish to study under. The algorithm then compares this information to that of PhD students the department has previously accepted, to predict whether an applicant is likely to be granted a place or not. GRADE is designed to weed out weaker prospective students so that the university does not have to spend time considering every application in full. In other words, it acts as a screening process helping the department focus on students who seem more promising.
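To make the idea concrete, here is a minimal sketch of a screener of this general kind: a logistic regression fitted to past decisions, with its predicted probability stretched onto a zero-to-five scale. It is not GRADE's code. The two features, the synthetic "committee", the function names, and the probability-to-score mapping are all assumptions made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_synthetic_history(n=200, seed=0):
    """Invent past applicants as [gpa, letters], each scaled to 0..1, with a past decision."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        gpa, letters = rng.random(), rng.random()
        # Made-up stand-in for the committee: a noisy weighted blend of both features.
        signal = 0.6 * gpa + 0.4 * letters + rng.gauss(0, 0.1)
        rows.append([gpa, letters])
        labels.append(1 if signal > 0.5 else 0)
    return rows, labels

def train_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit feature weights and a bias by batch gradient descent on log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def grade_score(x, weights, bias):
    """Assumed mapping: predicted admit probability times five gives a 0-5 score."""
    return 5.0 * sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

rows, labels = make_synthetic_history()
weights, bias = train_logistic(rows, labels)
for name, applicant in [("strong", [0.95, 0.9]), ("weak", [0.2, 0.1])]:
    print(name, round(grade_score(applicant, weights, bias), 2))
```

Whatever pattern sits in the historical labels is exactly what such a model learns to reproduce, which is why the choice of training data matters so much.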
“While every application is still looked at by a human reviewer," the 2014 paper noted, "GRADE makes the review process much more efficient. This is for two reasons. First, GRADE reduces the total number of full application reviews the committee must perform. Using the system’s predictions, reviewers can quickly identify a large number of weak candidates who will likely be rejected and a smaller number of exceptionally strong candidates."
UT Austin's computer-science department is ranked in the top ten of its ilk, and thousands of students fight for a place in its graduate programs.
When The Register asked if applicants were explicitly told that their applications were screened by an algorithm, and that the university stored their data to retrain and improve its system for the following year, UT Austin declined to answer the question. GRADE does not appear to have been rolled out for other departments or at other universities.
Professor Miikkulainen, who helped invent the GRADE algorithm, said the tool was not biased against race or gender.
“To the degree we could measure bias, we found that the process did not add biases,” he told The Register. "Back in 2013, bias was not yet a mainstream topic in AI, and there were few techniques available, but our choice of learning method created an opportunity: The logistic regression model learns to assign weights on features according to how important they are in decision making.
“We did a separate experiment where we included gender and ethnic origin, and found that GRADE assigned zero weights to them – in other words, these features had no predictive power, ie: reviewers had not used them in making decisions. So to the extent it was possible to measure then, GRADE was unbiased in those respects.”
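The kind of check Miikkulainen describes can be sketched roughly as follows: add a group attribute as an extra feature, fit a logistic regression to past decisions, and look at the weight it receives. This is an illustration on synthetic data, not a reconstruction of the GRADE experiment. The `merit` and `group` features, the `group_penalty` knob, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_history(group_penalty, n=400, seed=1):
    """Invent past decisions; group_penalty > 0 means reviewers marked one group down."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        merit = rng.random()       # hypothetical merit score in 0..1
        group = rng.randint(0, 1)  # hypothetical binary attribute, independent of merit
        signal = merit - group_penalty * group + rng.gauss(0, 0.1)
        # Centre both features (each spans a range of 1) so the weights are on a comparable scale.
        rows.append([merit - 0.5, group - 0.5])
        labels.append(1 if signal > 0.5 else 0)
    return rows, labels

def train_logistic(rows, labels, lr=1.0, epochs=500):
    """Fit feature weights and a bias by batch gradient descent on log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Same applicants and noise in both runs; only the simulated reviewers' behaviour differs.
fair_w, _ = train_logistic(*make_history(group_penalty=0.0))
biased_w, _ = train_logistic(*make_history(group_penalty=0.3))
print("group weight when reviewers ignored the attribute:", round(fair_w[1], 2))
print("group weight when reviewers penalised the attribute:", round(biased_w[1], 2))
```

In this toy setup the group weight stays close to zero when the simulated reviewers ignored the attribute, and turns clearly negative when they did not. The check only speaks to the attribute's direct predictive power in the data it was run on.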
Not a popular idea
Nevertheless, the university has pledged to stop using GRADE in its graduate admissions process over fears that it could be biased, and that opinion is echoed by other academics.
“I was listening in the talk, and it is true that during the talk the UT Austin compsci department tweeted that because of concerns of fairness, they would no longer be using it,” Steve Rolston, a physics professor at the University of Maryland, told The Register.
Concerned about GRADE’s potential to damage a student’s application, he sent an email out assuring students that the system would not be rolled out at Maryland. “According to the speakers, the point of GRADE was to replicate the decisions of their admissions committee, and in fact it was trained on previous admissions committee data," he said.
"While it is possible that it was successful at that specific task, it would simply be replicating any biases that existed in the committee's decisions, let alone the fact that [machine-learning] algorithms do not really give one any guidance on how they are classifying things. When they used GRADE, its results were always checked by a human, but I would be concerned that if you are told the algorithm rated someone low, it would inevitably color your opinion and was thus not necessarily a good check on the system.
“While [machine learning] is fine to do image classification for example, I think it is very dangerous to use it for things such as hiring or admissions. When we are admitting someone to the graduate program we are evaluating their potential to be successful, given a limited set of input data, much of which is subjective, [for example] letters of recommendation. There is no quantitative process to make such identifications, so an algorithm is unlikely to be helpful.”
Miikkulainen confirmed to El Reg that UT Austin has no plans to deploy another machine-learning algorithm to process applications in the future. ®