Risk and Fairness - Mapping Pretrial Risk

One key issue with pretrial RATs is that they predict “risk” in ways their designers claim are fair. But risk, and what is fair and to whom, are subjective questions that rest on human judgment.

RATs lend a veneer of scientific objectivity to predicting “dangerousness,” an outcome that is both rare and hard to predict. Some argue that treating dangerousness as quantifiable could lead to more preventative detention.¹

Given that we are all presumed innocent until proven guilty, and that no one can perfectly predict whether someone will commit a crime, Sandra Mayson asks: “What statistical risk that a person will commit future crime justifies short-term detention — if any does?”²

As one paper argues, even if RATs predicted outcomes better than judges and magistrates do (a claim that, the authors note, research has not verified), using these assessments could still be unfair.³

Which factors an algorithm includes, how heavily it weights them, and where it sets the cutoffs between low, medium, and high risk may all be grounded in statistics. In the end, though, and especially when these choices are tied to a decision-making framework, they reflect how much risk a particular jurisdiction’s criminal legal system decision-makers are willing to accept.
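To make this concrete, here is a minimal sketch of how a points-based tool might turn factors into a score and then sort that score into a risk bin. The factor names, weights, and cutoffs are hypothetical, invented for illustration; they are not taken from the PSA, COMPAS, or any other real tool. The point is that the same person, with the same score, lands in a different bin depending only on where a jurisdiction draws the lines.

```python
# Hypothetical points-based risk score. The factor names, weights, and cutoffs
# are invented for illustration and are NOT taken from any real tool.
HYPOTHETICAL_WEIGHTS = {
    "prior_failure_to_appear": 2,
    "prior_convictions": 1,  # points per prior conviction
    "pending_charge": 1,
}


def risk_score(person: dict[str, int]) -> int:
    """Sum weighted factor values into a single integer score."""
    return sum(weight * person.get(factor, 0)
               for factor, weight in HYPOTHETICAL_WEIGHTS.items())


def risk_bin(score: int, low_max: int, medium_max: int) -> str:
    """Map a score to a bin. The cutoffs are a policy choice, not a statistical fact."""
    if score <= low_max:
        return "low"
    if score <= medium_max:
        return "medium"
    return "high"


person = {"prior_failure_to_appear": 1, "prior_convictions": 2, "pending_charge": 0}
score = risk_score(person)  # 2*1 + 1*2 + 1*0 = 4

# Two hypothetical jurisdictions draw the cutoffs in different places.
print(score, risk_bin(score, low_max=2, medium_max=4))  # 4 medium
print(score, risk_bin(score, low_max=1, medium_max=3))  # 4 high
```

Nothing about the person changes between the two calls; only the cutoffs do, which is where the judgment about “acceptable” risk enters.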

To dive into this subjectivity further:

“The designation of risk bins is somewhat arbitrary. What qualifies as low or high depends on the thresholds set by tool designers, and merely denotes the risk a group presents relative to other risk bins.
To illustrate, somewhere between 8.6-11% of those flagged for ‘new violent criminal activity’ (‘NVCA’) by the PSA were rearrested for a violent charge within six months of their release. That means 89-91% of people flagged were not arrested for a violent crime.”
Brandon Buskey and Andrea Woods, “Making Sense of Pretrial Risk Assessments”⁴

Our current push toward criminal legal reform and an end to pretrial incarceration is an opportunity to redefine what kind of risk we really care about. That definition cannot be left for the data scientists building these tools to work out on their own.

As legal scholar Bernard Harcourt argues: “The fact is, risk today has collapsed into prior criminal history, and prior criminal history has become a proxy for race. The combination of these two trends means that using risk‐assessment tools is going to significantly aggravate the unacceptable racial disparities in our criminal justice system.”⁵

The tools measure risk using factors we already know to be unfair.

Most pretrial RATs try to predict whether someone will return to court if released before trial. This has nothing to do with the risk a person poses to society and a lot to do with whether an accused person has access to court-date reminders, transportation to court, childcare, or a job that allows them to take time off.⁶

RATs also try to predict whether someone will be arrested again, which makes the person sound “dangerous,” even though the vast majority of arrests are for nonviolent crimes.⁷

A new arrest reflects the behavior of those doing the arresting, not just of those who are arrested. Someone from a more heavily surveilled neighborhood is more likely to be arrested simply because of where they live.
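The arithmetic behind this point fits in a few lines. Assume, hypothetically, that people in two neighborhoods commit offenses at exactly the same rate, but an offense in one neighborhood is 2.5 times as likely to end in an arrest because of heavier policing. The numbers are illustrative assumptions, not data from any study, and the model deliberately ignores everything except these two factors.

```python
# Illustrative model (an assumption, not a published formula):
#   recorded arrest rate = underlying offense rate x chance an offense leads to arrest
# All numbers are hypothetical.


def observed_arrest_rate(offense_rate: float, arrest_given_offense: float) -> float:
    """Share of people who end up with a recorded arrest."""
    return offense_rate * arrest_given_offense


same_offense_rate = 0.10  # identical underlying behavior in both neighborhoods
lightly_policed = observed_arrest_rate(same_offense_rate, 0.2)
heavily_policed = observed_arrest_rate(same_offense_rate, 0.5)

print(f"{lightly_policed:.0%} vs {heavily_policed:.0%}")  # 2% vs 5%
```

A tool trained on these arrest records would score residents of the heavily policed neighborhood as higher risk, even though, under these assumptions, their behavior is identical to their neighbors across town.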

Researchers investigating New York City’s supervised release risk assessment tool questioned whether felony rearrest, a seemingly reasonable measure of dangerousness, was a fair thing to measure at all,⁸ given racial disparities in arrests.

A tool might count as “accurate” because it correctly predicts what percentage of white and Black defendants will be rearrested. In other words, its developers might claim that its predictions are equally accurate across racial groups, and that the tool is therefore not racially biased.

However, that “correct” prediction builds in the expectation that Black defendants will be arrested more often. That may be true in practice because of systemic racism, but it is not fair.

Developers may try to balance different statistical metrics of fairness, but researchers have gone so far as to argue that “bias in criminal risk scores is mathematically inevitable.”⁹
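A small worked example shows why. A well-known result in the fairness literature holds that, unless prediction is perfect or groups have equal base rates, a score cannot be calibrated for every group (the same share rearrested within each risk bin, regardless of group) while also giving every group the same false positive rate. The sketch below uses invented counts for two hypothetical groups, “Group A” and “Group B,” that differ only in their recorded rearrest rates.

```python
# Hypothetical counts for two groups scored by the same tool. Within each bin,
# the share rearrested is identical across groups (the tool is "calibrated"),
# but the groups have different recorded base rates of rearrest.
groups = {
    # group: {bin: (people_in_bin, share_rearrested)}
    "Group A": {"high": (600, 0.6), "low": (400, 0.2)},
    "Group B": {"high": (300, 0.6), "low": (700, 0.2)},
}


def summarize(bins: dict[str, tuple[int, float]]) -> tuple[float, float]:
    """Return (base rate of rearrest, false positive rate) for one group."""
    total = sum(n for n, _ in bins.values())
    rearrested = sum(n * p for n, p in bins.values())
    # False positives: people labeled high risk who were NOT rearrested.
    n_high, p_high = bins["high"]
    false_positives = n_high * (1 - p_high)
    not_rearrested = total - rearrested
    return rearrested / total, false_positives / not_rearrested


for name, bins in groups.items():
    base_rate, fpr = summarize(bins)
    print(f"{name}: base rate {base_rate:.0%}, false positive rate {fpr:.0%}")
# Group A: base rate 44%, false positive rate 43%
# Group B: base rate 32%, false positive rate 18%
```

Both groups see the same rearrest rate within each bin, yet people in Group A who are never rearrested are more than twice as likely to be labeled high risk. Equalizing those false positive rates would mean moving people across the cutoff in one group, which would break calibration; that tension is the tradeoff developers are trying to balance.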

Here are a few examples of how risk assessments have led to different kinds of unfair outcomes in practice:

ProPublica’s analysis of COMPAS in Broward County, Florida, found that the COMPAS algorithm overstated Black defendants’ risk of rearrest and mislabeled white defendants as low risk more often than Black defendants.¹⁰

A study of the PSA in Mecklenburg County, North Carolina, found that although the PSA had no clear impact on racial disparities in the county’s jails, Black defendants were still more likely than others to be assessed as high risk.¹¹

In the New York City study, researchers found that the city’s supervised release risk assessment tool did not meet many common metrics of fairness,¹² such as limiting false positives (cases where someone is labeled high risk but does not actually reoffend), especially for Black defendants.

RATs can’t fix systemic issues or remove race and class bias in police, judges, courts, or society.

Risk assessment tools could try to maximize “fairness,” but their design and the practical limitations of real-world data mean they often set biases aside in favor of usability or simplicity.¹³

These tradeoffs are built into any predictive tool. But for an individual accused person who is detained only because group-based data places them in one risk category rather than another,¹⁴ the result can feel especially unfair.

Until we address these issues, our judgments about risk, fairness, and which tradeoffs are acceptable will remain bound up in our history of racially and economically biased ideas about who is “dangerous” and who should be protected.

Explore the complexities of determining fairness in algorithms through an interactive tool built by MIT Technology Review.¹⁵