Software that predicts how likely a criminal is to reoffend – and is used by the courts to mete out punishments – is about as smart as a layperson off the street.
That's according to a paper published in Science Advances on Wednesday. The research highlights the potential dangers of relying on black-box algorithms, particularly when a person's freedom is at stake.
The software in question is called COMPAS, which stands for Correctional Offender Management Profiling for Alternative Sanctions. It was designed by Equivant, which was previously known as Northpointe.
Equivant provides “technological solutions” for court systems across America. What that means is, its software takes in a load of information about defendants and produces reports that predict their lawfulness, guiding trial judges on sentencing. People condemned by the code to be future repeat offenders are put behind bars for longer, or sent on special courses, and so forth. And no one can see the code.
How COMPAS works internally is unclear since its maker doesn’t want to give away its commercial secrets. It’s a complex tool that, according to the study's researchers, takes into account a whopping 137 variables when deciding how likely a defendant is to reoffend within two years of their most recent crime. It has assessed more than a million lawbreakers since it was deployed at the turn of the millennium.
However, according to the published research, the application has the same level of accuracy as untrained people pulled off the street armed with only seven of the variables. In fact, we're told, the laypeople were just as accurate as COMPAS when the folks were given just two bits of information: a defendant's age and number of prior convictions.
In other words, if you took someone with no legal, psychological or criminal justice system training – perhaps you, dear reader – and showed them a few bits of information about a given defendant, they'd be able to guess whether the criminal would break the law again just as well as this software can.
Again, that's according to the above study, which was led by Julia Dressel, a software engineer who graduated last year from Dartmouth College in the US, and Hany Farid, a professor of computer science also at Dartmouth.
The team collected records on 1,000 defendants, and randomly divided the dataset into 20 groups of 50 people each. Then 20 human participants were recruited via Amazon Mechanical Turk – which pays people a small amount of cash to complete mundane tasks – and each participant was assigned to one of those 20 subsets.
These 20 non-expert human judges were then given a passage built from the following template for each defendant in their dataset, with the blanks filled in as appropriate: “The defendant is a [SEX] aged [AGE]. They have been charged with: [CRIME CHARGE]. This crime is classified as a [CRIMINAL DEGREE]. They have been convicted of [NON-JUVENILE PRIOR COUNT] prior crimes. They have [JUVENILE-FELONY COUNT] juvenile felony charges and [JUVENILE-MISDEMEANOR COUNT] juvenile misdemeanor charges on their record.”
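For the curious, the setup described above, a random split into equal groups plus a fill-in-the-blanks passage, can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names, the dictionary keys, the fixed random seed and the example defendant are all made up here, and the records are stand-in integers rather than real case data.

```python
import random

# Template text taken from the article; the {placeholder} names are our own.
TEMPLATE = (
    "The defendant is a {sex} aged {age}. "
    "They have been charged with: {charge}. "
    "This crime is classified as a {degree}. "
    "They have been convicted of {priors} prior crimes. "
    "They have {juv_felonies} juvenile felony charges and "
    "{juv_misdemeanors} juvenile misdemeanor charges on their record."
)


def split_into_groups(records, n_groups, seed=0):
    """Shuffle the records and cut them into n_groups equal-sized groups."""
    if len(records) % n_groups != 0:
        raise ValueError("records must divide evenly into the groups")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    size = len(shuffled) // n_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(n_groups)]


def render_prompt(record):
    """Fill the seven blanks in the template for one defendant."""
    return TEMPLATE.format(**record)


# Stand-in IDs for 1,000 defendants, dealt into 20 groups of 50.
groups = split_into_groups(list(range(1000)), 20)

# A hypothetical defendant, invented purely to show the template in use.
example = {
    "sex": "male",
    "age": 30,
    "charge": "EXAMPLE CHARGE",
    "degree": "misdemeanor",
    "priors": 2,
    "juv_felonies": 0,
    "juv_misdemeanors": 1,
}
prompt = render_prompt(example)
```

Each participant would then see 50 such passages, one per defendant in their group.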
Each human was asked: “Do you think this person will commit another crime within two years?” They then had to respond by selecting either yes or no. The seven variables – highlighted in square brackets in the passage above – are much less information to go on than the 137 apparently taken into account by COMPAS. Despite this, the human team was accurate in 67 per cent of the cases, better than the 65.2 per cent scored by the computer program.
Dressel told The Register the results show it’s important to understand how these algorithms work before using them.
“COMPAS’s predictions can have a profound impact on someone’s life, so this software needs to be held to a high standard,” she said.
“It’s essential that it at least outperforms human judgement. Advances in AI are very promising, but we think that it is important to step back and understand how these algorithms work and how they compare against human-centric decision making before entrusting them with such serious decisions.”
Dressel reckoned the disappointing accuracy results for both man and machine stem from racial bias.
“Black defendants are more likely to be classified as medium or high risk by COMPAS, because black defendants are more likely to have prior arrests,” she explained.
“On the other hand, white defendants are more likely to be classified as low risk by COMPAS, because white defendants are less likely to have prior arrests. Therefore, black defendants who don’t reoffend are predicted to be riskier than white defendants who don’t reoffend.
“Conversely, white defendants who do reoffend are predicted to be less risky than black defendants who do reoffend. Mathematically, this means that the false positive rate is higher for black defendants than white defendants, and the false negative rate for white defendants is higher than for black defendants.
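To make those two terms concrete: the false positive rate is the share of people who did not reoffend but were predicted to, and the false negative rate is the share who did reoffend but were predicted not to. A minimal Python sketch of how such rates could be tallied per group follows. The function names and the toy rows, labelled simply "A" and "B", are invented for illustration and are not figures from the study.

```python
def error_rates(pairs):
    """Tally accuracy, false positive rate and false negative rate.

    pairs is an iterable of (predicted_reoffend, actually_reoffended) booleans.
    """
    tp = fp = tn = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1  # flagged as risky, but did not reoffend
        elif not predicted and actual:
            fn += 1  # flagged as safe, but did reoffend
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }


def error_rates_by_group(rows):
    """rows is an iterable of (group, predicted, actual) tuples."""
    grouped = {}
    for group, predicted, actual in rows:
        grouped.setdefault(group, []).append((predicted, actual))
    return {group: error_rates(pairs) for group, pairs in grouped.items()}


# Made-up toy rows, not data from the study.
toy_rows = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, True),
    ("B", False, False), ("B", False, False), ("B", False, True), ("B", True, True),
]
rates = error_rates_by_group(toy_rows)
```

In the toy rows, group A ends up with a higher false positive rate than group B even though both groups have the same false negative rate, which shows that the two error rates can diverge between groups independently of each other.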
"This same sort of bias appeared in the human results. Because the human participants saw only a few facts about each defendant, it is safe to assume that the total number of prior convictions was heavily considered in one’s predictions. Therefore, the bias of the human predictions was likely also a result of the difference in conviction history."
The results cast doubt on whether machines are any good at predicting whether or not a person will break the law again, and whether decision-making code should be used in the legal system at all.
This software, judging from the above findings, is no better than an untrained citizen, yet is used to guide the courts on sentencing. Perhaps the job of handing out punishments should be left purely to the professionals.
In a statement, Equivant claimed its software doesn't actually use 137 variables per defendant. It uses, er, six:
The cursory review of the article indicates serious errors related to misidentification of the COMPAS risk model and a lack of an external/independent validation sample. The authors have made an erroneous specification of the COMPAS risk model as using “137 inputs”. This part of the study is highly misleading. It falsely asserts that 137 inputs are used in the COMPAS risk assessment. In fact, the vast number of these 137 are needs factors and are NOT used as predictors in the COMPAS risk assessment. The COMPAS risk assessment has six inputs only. Risk assessments and needs assessments are not to be viewed as one and the same.
“Regardless how many features are used by COMPAS, the fact is that a simple predictor with only two features and people responding to an online survey are as accurate as COMPAS,” the research duo concluded in response. ®